Anachro-mputing blog

retro and not-so-retro computing topics
Related links:
  • homebrew games, demos, and misc software
  • Old links
  • NOWUT webpage
  • quick download links: DeHunk+UnSuper IMGTOOL 0.96 ImaginarySoundChip Mod player routine ModPZ NOWUT 0.26 MBFAST 0.93b

  • Updates

    2021 Sep 23 The video timing problem

    On 1980s hardware you could read some hardware register and determine whether you were in the vertical retrace period or maybe even the exact scanline you were on. Maybe you could even configure an interrupt to happen at a particular time. This made timing relatively easy.

    What do we have on 21st century PCs? We have a bunch of different timers that count at different frequencies, all unrelated to video refresh, and under OpenGL we have wglSwapIntervalEXT. This is a meager toolbox but it's all I've been able to uncover so far. Meanwhile, the demands are also greater. Not only is it useful to be able to synchronize a program with the display, it is also desirable to handle variable framerates. The monitor refresh rate could be anything. The framerate could drop due to performance reasons. Or maybe you want to run a benchmark and let the framerate go as high as possible. Last but not least, after each frame is rendered and the thread is waiting to begin the next one, it is good to have the thread go idle and surrender the unneeded CPU time rather than pegging a CPU core at 100% while it waits.

    I put my rendering code into its own thread so it would be unaffected by window messages or audio related duties. Then I tested four different strategies in my game engine, on five different systems.

    Strategy 1: For this one, vsync is turned OFF with wglSwapIntervalEXT. The rendering thread pauses itself after each frame with a call to WaitForSingleObject. It then becomes unpaused by an auto-reset event which is signalled by the multimedia timer (timeSetEvent) configured to run every 16ms. The result is a framerate capped at roughly 60Hz (1000 / 16 = 62.5 frames per second), but not synchronized with the display. The CPU is allowed to go idle. Overall, not a bad option. This was the only strategy that produced the same results on every test system!

    Strategy 2: Here I turn vsync ON, and do not call WaitForSingleObject. I just let the thread run wild and hope that OpenGL and/or the video driver will do what I want and put the thread to sleep as appropriate. This strategy succeeded on systems 1 and 4, producing perfectly synchronized screen updates and low CPU load. On system 2, it drove a CPU to 100% load. System 5 was unique in that it put both CPU cores at 100% load. On system 3 the video was synchronized and the CPU load was low but a different problem appeared. The time between frames as measured by QueryPerformanceCounter varied wildly, so moving objects in the game suffered from horrible jitter.
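    Why does uneven frame timing turn into visible jitter? Consider a variable-timestep update where each object moves by velocity times the measured frame delta. If the display refreshes at a steady rate but the measured deltas swing between, say, 16.7ms and 5ms, objects step by very different amounts even though every frame is on screen for the same time. The sketch below assumes that kind of update; the counter frequency and tick values are hypothetical.

```python
# Sketch of a variable-timestep update driven by a high-resolution counter
# (in the spirit of QueryPerformanceCounter). All numbers are hypothetical.

def frame_delta(prev_ticks: int, now_ticks: int, ticks_per_second: int) -> float:
    """Seconds elapsed between two counter readings."""
    return (now_ticks - prev_ticks) / ticks_per_second

def advance(position: float, velocity: float, dt: float) -> float:
    """Move an object by velocity (units per second) times dt (seconds)."""
    return position + velocity * dt

FREQ = 10_000_000  # hypothetical counter frequency: 10 MHz

# Two frames shown one refresh apart, but measured with uneven deltas.
steady = frame_delta(0, 166_667, FREQ)        # about 16.7 ms
uneven = frame_delta(166_667, 216_667, FREQ)  # 5 ms
print("frame A moves", advance(0.0, 300.0, steady))
print("frame B moves", advance(0.0, 300.0, uneven))
```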

    Strategy 3: This is the 'benchmark' mode. Vsync is turned OFF, and the thread is allowed to run wild. The results are mostly as expected. Everybody has a CPU core maxed out (both cores on system 5) and stupidly high framerates. However, system 3 appeared to have jitter sometimes but not always. I couldn't really nail it down. Also, on system 5 when running in OpenGL 1.x mode (but not 3.x mode), it ignored the vsync setting and continued to cap rendering at 60Hz.

    Strategy 4: This one is the same as #2 except I replaced glFlush with glFinish. That solved the jitter on system 3, but also caused the CPU loads to go to 100% again (both cores on system 5).

    I did not find a silver bullet.
    2021 Sep 8 Something a little different

    Retro machine? Yes. Computer? Ahem... maybe it is a sort of mechanical state machine with three states. In any case, this is a Whirlpool belt-drive washer sold under the Kenmore Heavy Duty label, model number 110.82081110, produced around 1980 AFAICT. It had been misbehaving a bit lately.

    There is an access panel which can be removed at the bottom rear to reveal the machine's innards. It is also open at the bottom, hence the obvious course of action was to tip the washer forward and get a good look at everything. Unfortunately, when I did this, oil began running out of the gearbox onto the floor.

    Here is the gearbox from underneath, with the drain pump removed. Loosening a nut above the electric motor allows the motor to pivot and release tension on the drive belt. Then there are two bolts underneath the pump which anchor it to the gearbox.

    This is the top of the washer, with the top cover and control panel removed. A plastic cap at the top of the agitator assembly conceals a bolt holding the agitator to a spindle. After removing this bolt, the top part of the agitator comes out. The bottom part can then also be pulled from the spindle, though it is likely to be quite stuck.

    Then after unplugging the four wires and removing the remaining bolts, the entire gearbox can be withdrawn out the bottom. Notice that when you spin the large pulley, the thing with the two actuators riding on top of two rails will swing back and forth. Evidently, this is called a Wig Wag. When both actuators are disengaged, the washer is in "drain mode" and the drain pump is engaged. You might say this is the default state. The second state is "agitate mode" and occurs when the actuator on the right engages. The right rail slides back, flipping the lever on the pump, and the long shaft coming out of the gearbox starts rotating back and forth.

    The third state is "spin mode" and this occurs when the left actuator engages. The left rail slides forward and allows a post to drop. The spring-loaded plate above it also drops and the clutch mechanism above the center pulley engages so that the washer's drum will spin with it. Note that the red wire on the left actuator leads to the door switch, so spin mode is disabled when the door is open.

    See the black plate with one screw in the center, holding down the two rails? The gearbox oil can be refilled through that bolt hole.

    While reinstalling, I noticed that this bolt securing the gearbox (closest to the front of the washer) goes through a spacer. Removing that one bolt and spacer ought to allow installing a new belt without removing everything else (though it would still need to slip between the clutch release plate and post).

    2021 Aug 28 368-byte Toadroar demo

    Having fixed many bugs in my CPU core, I was able to port some old x86 code to Toadroar assembly and get it running on the QMTECH FPGA dev board. The CPU is running at 75MHz but memory latency is holding it back some, so it is only able to fill the 256x224 RGB screen at about 6fps. If I get around to adding an L1 cache, at least I have something to use as a benchmark :)
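    To put the 6fps figure in perspective, here is a back-of-the-envelope calculation from the numbers above. It gives roughly 218 clock cycles per pixel, covering everything the CPU does per pixel including memory stalls. Since "about 6fps" is approximate, treat the result as an order of magnitude only.

```python
# Rough per-pixel cycle budget from the figures quoted in the post:
# 75 MHz clock, 256x224 screen, about 6 frames per second.

def cycles_per_pixel(clock_hz: float, width: int, height: int, fps: float) -> float:
    pixels_per_second = width * height * fps
    return clock_hz / pixels_per_second

budget = cycles_per_pixel(75e6, 256, 224, 6)
print(f"{budget:.0f} cycles per pixel")
```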

    source files

    noisy video recording from CRT

    2021 Aug 21 Cubemaps 2: Matrix Madness

    I had to revisit cubemaps after changing the projection matrix used in my renderer. I changed the matrix because the old one was funky. I had suspected it for a while, but it was hard to be certain because of lingering doubts over which reference material was showing transposed vs. non-transposed matrices.

    Finally, I made test code that called gluPerspective and read back the matrix data with glGetFloatv.

            dd 1.299038   0          0          0
            dd 0          1.732051   0          0
            dd 0          0         -1.002002  -1
            dd 0          0         -200.2002   0

    This confirmed that my old matrix was not good. That, and the fact that it had excessive Z-fighting problems. Of course, fixing the projection matrix broke everything else, which had been designed around the funky one. Getting the cubemap to render again was not too hard though.
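    The listing above is the standard gluPerspective matrix in OpenGL's column-major order (each dd line is one column). As a cross-check, the sketch below rebuilds it from the formula on the GLU reference page. Working backwards from the printed values, they correspond to a 60 degree vertical field of view, a 4:3 aspect ratio, and near/far planes of 100 and 100000. The post doesn't state those parameters; they are inferred from the numbers.

```python
import math

def glu_perspective(fovy_deg, aspect, z_near, z_far):
    """Return the gluPerspective matrix as 16 floats in column-major order,
    the same layout glGetFloatv reads back."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    c = (z_far + z_near) / (z_near - z_far)
    d = (2.0 * z_far * z_near) / (z_near - z_far)
    return [
        f / aspect, 0.0, 0.0, 0.0,   # column 0
        0.0, f, 0.0, 0.0,            # column 1
        0.0, 0.0, c, -1.0,           # column 2
        0.0, 0.0, d, 0.0,            # column 3
    ]

# Parameters inferred from the listing (assumption, not from the post).
m = glu_perspective(60.0, 4.0 / 3.0, 100.0, 100000.0)
for col in range(4):
    print(" ".join(f"{v:11.6f}" for v in m[col * 4:col * 4 + 4]))
```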


    The texture Z coordinate no longer needs to be inverted. As for the projection matrix, it turns out that the only relevant components in it are the ones related to the viewport size/aspect. So in the shader I just use those directly.

            callex ,gltexcoord3f,    1.0, 1.0,-1.0
            callex ,glvertex3f,      1.0, 1.0,-1.0
            callex ,gltexcoord3f,    1.0,-1.0,-1.0
            callex ,glvertex3f,      1.0,-1.0,-1.0
            callex ,gltexcoord3f,    1.0,-1.0, 1.0
            callex ,glvertex3f,      1.0,-1.0, 1.0
            callex ,gltexcoord3f,    1.0, 1.0, 1.0
            callex ,glvertex3f,      1.0, 1.0, 1.0

    The OpenGL 1.x path needs the same changes. Flip the texture Z coordinate sign, and instead of using the projection matrix as-is, build a separate one like this:

            dd 1.299038   0          0          0
            dd 0          1.732051   0          0
            dd 0          0          1.0        0
            dd 0          0          0          1.0
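    One way to see why only those two components matter: the skybox's texture coordinates just need to encode a viewing direction. Dividing a screen position by the projection's x and y scale factors gives the direction the camera looks through that point, before any camera rotation. A small sketch, assuming normalized device coordinates in [-1, 1] and the 1.299038 / 1.732051 values above. The forward component is fixed at 1.0 here, and its sign is an assumption that depends on the rest of the setup.

```python
def view_direction(ndc_x, ndc_y, sx=1.299038, sy=1.732051):
    """Unrotated viewing direction through a screen point.
    sx, sy: the projection matrix's x and y scale factors.
    The forward component (1.0) and its sign are assumptions."""
    return (ndc_x / sx, ndc_y / sy, 1.0)

corner = view_direction(1.0, 1.0)   # top-right corner of the screen
print(corner)
```

    For the top-right corner this gives roughly (0.770, 0.577, 1.0), where 0.577 is tan(30 degrees), half of the 60 degree vertical field of view.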
    2021 Jul 25 Cubemaps

    So I'm working on a 3D game engine in NOWUT. Currently it can render using the OpenGL 3/4 API with shaders, or the old 1.x API without shaders, because why not? There is a lot of common code between the two, and it's not clear at this point whether my plans for the game will preclude supporting older hardware.

    After getting some rudimentary models to render with a basic lighting configuration, I wanted to add a skybox. I didn't know how to do this in a game where the camera can look up and down, so I did a search and came up with the answer: cubemaps.

    Looking at diagrams like this one and reading various descriptions that made it sound like one is rendering a giant cube around the outside of an environment made it hard to understand how this could work. Turns out, that's not what it is. The term 'cubemap' is really a misnomer and this doesn't have much to do with cubes at all.

    A cubemap has six textures, but they don't correspond to any flat surface. Instead, each texture corresponds to a direction. In the game, when you look straight up you'll see the "positive Y axis" texture. Depending on your field of view you might see only part of it, or you might see parts of the other textures where they meet the edges of the PY texture.
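    Face selection works from the direction alone. Whichever of x, y, z has the largest magnitude (and its sign) picks the texture, and the other two components, divided by that largest one, give the position within it. Here is a minimal Python sketch of the lookup, based on the major-axis table in the OpenGL specification. The tie-breaking order is an arbitrary choice for this sketch.

```python
def cube_face(rx, ry, rz):
    """Return (face, s, t) for a non-zero direction (rx, ry, rz), with s and t
    in [0, 1], following the OpenGL cube map major-axis table.
    Ties between equal magnitudes are resolved in x, y, z order here."""
    ax, ay, az = abs(rx), abs(ry), abs(rz)
    if ax >= ay and ax >= az:
        face, sc, tc, ma = ("+X", -rz, -ry, ax) if rx > 0 else ("-X", rz, -ry, ax)
    elif ay >= az:
        face, sc, tc, ma = ("+Y", rx, rz, ay) if ry > 0 else ("-Y", rx, -rz, ay)
    else:
        face, sc, tc, ma = ("+Z", rx, -ry, az) if rz > 0 else ("-Z", -rx, -ry, az)
    return face, (sc / ma + 1) / 2, (tc / ma + 1) / 2

print(cube_face(0.0, 1.0, 0.0))  # looking straight up -> ('+Y', 0.5, 0.5)
```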

    The first step to rendering a skybox in OpenGL is to load the cubemap texture. This is the same in both old and new APIs:

            callex ,glgentextures,cubetex.a,1
            callex ,glbindtexture,cubetex,$8513              ; gl_texture_cube_map
            callex ,gltexparameteri,$812F,$2802,$8513        ; TEXTURE_WRAP_S = CLAMP_TO_EDGE
            callex ,gltexparameteri,$812F,$2803,$8513        ; TEXTURE_WRAP_T = CLAMP_TO_EDGE
            callex ,gltexparameteri,$812F,$8072,$8513        ; TEXTURE_WRAP_R = CLAMP_TO_EDGE
            callex ,gltexparameteri,$2601,$2800,$8513         ; gl_linear, mag_filter
            callex ,gltexparameteri,$2601,$2801,$8513         ; gl_linear, min_filter
            callex ,glteximage2d,testcubepx,$1401,$80E0,0,256,256,$1908,0,$8515        ; $1401 = bytes
            callex ,glteximage2d,testcubenx,$1401,$80E0,0,256,256,$1908,0,$8516        ; $80E0 = BGR
            callex ,glteximage2d,testcubepy,$1401,$80E0,0,256,256,$1908,0,$8517        ; $1908 = RGBA
            callex ,glteximage2d,testcubeny,$1401,$80E0,0,256,256,$1908,0,$8518
            callex ,glteximage2d,testcubepz,$1401,$80E0,0,256,256,$1908,0,$8519
            callex ,glteximage2d,testcubenz,$1401,$80E0,0,256,256,$1908,0,$851A

    Rendering it correctly proved to be tricky. I have the third edition OpenGL book which covers version 1.2, but cubemaps were introduced in 1.3. So I had to do more searching online and shuffle matrices around and flip coordinate signs back and forth until something worked.

            callex ,glenable,$8513                    ; gl_texture_cube_map
            callex ,glbindtexture,cubetex,$8513              ; gl_texture_cube_map
            callex ,glmatrixmode,$1701                ; gl_projection
            callex ,glloadmatrixf,projmatrix.a
            callex ,glmatrixmode,$1700                ; gl_modelview
            callex ,glloadidentity
            callex ,glmatrixmode,$1702                ; gl_texture
            callex ,glloadmatrixf,purerotate.a
            callex ,glbegin,7                         ; gl_quads
            callex ,gltexcoord3f,   -1.0, 1.0,-1.0
            callex ,glvertex3f,      1.0, 1.0,-1.0
            callex ,gltexcoord3f,   -1.0,-1.0,-1.0
            callex ,glvertex3f,      1.0,-1.0,-1.0
            callex ,gltexcoord3f,   -1.0,-1.0, 1.0
            callex ,glvertex3f,      1.0,-1.0, 1.0
            callex ,gltexcoord3f,   -1.0, 1.0, 1.0
            callex ,glvertex3f,      1.0, 1.0, 1.0
            callex ,glend

    The projection matrix is loaded as normal. The modelview stack gets an identity matrix. And then the matrix corresponding to the camera viewpoint goes on the TEXTURE matrix stack. Except not exactly, because you only want the angle, not the position (skybox doesn't move when you move), so the 'purerotate' matrix is like the view matrix with the 'translation' part stripped off. (This matrix is also useful for rotating normals in a shader.) Then we just draw one quad that covers the whole viewport, and the camera angle modifies the texture coordinates which determines what is drawn.
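    Stripping the translation only touches three numbers. A sketch, assuming the column-major 16-float layout that glLoadMatrixf expects, where the translation occupies elements 12 to 14; the example view matrix is made up. Reusing the result for normals only works while the view matrix has no non-uniform scale, which is normally the case for a camera.

```python
def pure_rotate(view):
    """Copy of a column-major 4x4 view matrix (16 floats) with its
    translation (elements 12, 13, 14) zeroed; the rotation is kept."""
    m = list(view)
    m[12] = m[13] = m[14] = 0.0
    return m

# Hypothetical camera: turned 90 degrees about Y, translation (5, -2, 3).
view = [0.0, 0.0, -1.0, 0.0,
        0.0, 1.0,  0.0, 0.0,
        1.0, 0.0,  0.0, 0.0,
        5.0, -2.0, 3.0, 1.0]
print(pure_rotate(view))
```

    Two cameras facing the same way from different positions produce the same purerotate matrix, which is why the skybox doesn't move when you move.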

    Example code that I saw online used texcoord4f, and set the fourth component to 0. However this didn't work on my 'low-spec' test machine (Radeon 7500) which only displayed a solid color with that method.

    That's it for OpenGL 1.3, now how to render it using shaders... Well, none of the preceding code is useful except for the 'bindtexture' part. For OpenGL 3+ it is necessary to prepare some vertex/texture coordinates in a VBO, a fragment shader that uses "samplerCube", and a vertex shader that calculates the needed position and texture coordinates.

    Looking here or elsewhere on the web one can find a set of shaders and vertex data to do the job. Except it appears to work completely differently from OpenGL 1.3. There is vertex data for a whole cube instead of one rectangle, and the vertex shader manipulates the position while passing texture coordinates through unchanged. Why? I don't really know, since I couldn't quite get this code to work (only part of the skybox would show up) and understanding what all this matrix stuff does at a high level is fairly confusing. So I tried to use shaders to do the same thing that the OpenGL 1.3 code was doing and came up with this:

            #version 150        // fragment shader for skybox
            out vec4 outcolor;
            in vec3 texcoord3;
            uniform samplerCube thetex;
            void main()

            #version 150        // vertex shader for skybox
            in vec3 position;
            out vec3 texcoord3;
            uniform mat4 projmatrix;
            uniform mat4 purerotate;
            void main()

            ; vertex data for skybox quad
            dd -1.0, -1.0,  1.0
            dd -1.0,  1.0,  1.0
            dd  1.0,  1.0,  1.0
            dd  1.0,  1.0,  1.0
            dd  1.0, -1.0,  1.0
            dd -1.0, -1.0,  1.0


    2021 Jul 2 NOWUT version 0.26 release

    ELF386 dynamic linking finally works, or at least enough to open a window with libX11 and implement the JPG loader example. There's a JPG example for EmuTOS too. OpenGL example has been updated.

    LINKBIN needs to know the name of required library files so it can put this information in the ELF file for ld-linux. I wanted to be able to put this in the program source, which meant hiding it in the OBJ somewhere. Rather than inventing my own scheme for this, I consulted the Go Tools documentation and found out about their #dynamiclinkfile directive. This makes GoAsm pass the library names to GoLink by putting them in a special section in the COFF file. I adopted this scheme for MULTINO and LINKBIN, with my own LINKLIBFILE statement, so it can be used for both Win32 programs and ELF386.

    Check the documentation. Download the complete archive.

    And be sure to get Go Link from Go Tools website.

    2021 May 12 P-ST8 utility for Win32 on AMD CPUs 10h, 15h

    This is the second release of my overclocking / power management utility. It's similar to PhenomMSRTweaker but designed to also support socket FM2 CPUs. It has only been tested on Windows 2000 with Extended Kernel and two different CPUs (Regor-based Athlon II X2 and Richland-based Athlon X4). It allows either manual selection of a p-state or automatic throttling based on load. Does not work on 64-bit Windows. NO WARRANTY

    2021 May 6 Atari ST, anyone?

    There was a post recently on hackernews about EmuTOS which got me thinking. I had never used an ST before, but given that it is another 68K system, how hard would it be to add it as another target for LINKBIN?

    I downloaded Hatari, and seeing the SDL2.DLL in there I expected it to have broken keyboard input on Windows 2000. Turns out, it works fine!

    Then I just needed some info on how to build a .PRG and do some basic GEMDOS stuff. In fact, these are nearly the same as .X and Human68K. Some old GEMDOS docs warned that function $3F for reading data from a file would return junk data in D0 if you tried to read past the end of the file. "Noooooooo! Every other platform returns zero!" But it appears that this behavior may have been corrected in EmuTOS... (hopefully)

    2021 Apr 16 DeHunk 0.97 and UnSuper

    Very minor update to the DeHunk 68000 disassembler. Now also includes UnSuper, which is a quick and dirty adaptation of the disassembler to handle SH2 code instead.

    DeHunk+UnSuper download
    2021 Apr 14 Remote kernel debugging with Windbg

    I never had much use for official SDKs, since I don't use any flavor of C programming language. But recently I saw mention in a few different places ( this article for instance ) of using a second system connected via serial cable to diagnose crashes or boot failures. It sounded like something I should try. Maybe I'll even get to the bottom of the video driver crashes on this A88X motherboard?

    The debugging tools are part of the Microsoft Platform SDK or Windows SDK. If you're lucky you might be able to find the (much smaller) dbg_x86.msi floating around on its own.

    2021 Apr 10 Higher-quality Mod player

    Here is an updated example of a Win32-based Mod player. It contains some minor changes over the 2019 version, and two big changes:

    First, it eliminates 'pops' in the audio caused by abrupt volume or sample changes. Part of this involves looking ahead one row to see which notes will end during that period, so they can be quickly faded out before a new note begins.

    The other big change has to do with aliasing. These are 'phantom' high frequencies that can result from converting from a low sampling rate to a higher one. For instance, a sample in a Mod file playing at C-5 (8287Hz) being mixed into the 44100Hz audio output stream. The simple way to do this conversion is to use an index into the instrument sample that has fraction bits (12 bits in my case) and which increases after each sample by a rate proportional to the note's pitch. (I believe the term for this is Phase Accumulator.) 44100 / 8287 = 5.32 which means each instrument sample is going to repeat in the output 5 or 6 times. The value added to the phase accumulator each time would be the reciprocal of that, shifted left 12 bits for the sake of the fixed-point math: 769. I'll call this the phase increment.

    Duplicating the same sample in the output those 5-6 times creates an ugly stair-step waveform which is a direct cause of phantom frequencies.
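    The phase accumulator and the stair-step it produces fit in a few lines of Python. This is a simplified illustration, not the player's actual code: loop points, volume, and mixing are left out, and the instrument data is made up. With a phase increment of 769, each instrument sample is repeated 5 or 6 times, as described above.

```python
FRAC_BITS = 12  # fixed-point fraction bits, as in the post

def phase_increment(note_hz: int, output_hz: int) -> int:
    """Instrument samples advanced per output sample, with 12 fraction bits."""
    return (note_hz << FRAC_BITS) // output_hz

def resample_nearest(samples, note_hz, output_hz, count):
    """Naive resampling: each output sample takes the instrument sample at
    the integer part of the phase accumulator (no interpolation)."""
    inc = phase_increment(note_hz, output_hz)
    phase = 0
    out = []
    for _ in range(count):
        out.append(samples[phase >> FRAC_BITS])
        phase += inc
    return out

print(phase_increment(8287, 44100))   # C-5 mixed into a 44100Hz stream
print(resample_nearest([0, 100, -100, 50], 8287, 44100, 16))
```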

    In practice, this is what I saw coming out of the 32X:

    Seeing how bad it was in visual terms bothered me enough to reconsider doing something about it :)

    A simple low-pass filter will smooth things out and block some of the aliasing. The more aggressive the filtering, the less aliasing will be heard; however, desirable high frequencies are also lost, and notes that are low enough are still distorted. Not good.

    I didn't want to go with linear interpolation because it requires fetching at least two samples from the instrument data. In the context of a Mod player where you have to be mindful of loop begin- and end-points, it seemed like too much of a hassle for something that I would like to run on older CPUs (like a 23MHz SH2). Instead, I came up with something workable that uses adjustable low-pass filters.

    My earlier attempt at a (fixed) low-pass filter looked like this: Output = ((New - Previous) * X) + Previous

    If X is one half, then this is the same as averaging each newly calculated value with the last output value. Using a different ratio for X alters the frequency response. What I did is replace X with the phase increment (PI). So for each channel, as long as the phase increment is less than 1.0 (or 4096 after being shifted), I have Output = (((New - Previous) * PI) shr 12) + Previous
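    Here is a sketch of that per-channel filter in the same 12-bit fixed point, fed with the channel's stair-stepped output. The post doesn't say what happens once the phase increment reaches 1.0 or more, so passing samples through unchanged in that case is an assumption. Lower notes have smaller increments and therefore get a stronger filter, which is exactly where the stair-steps are longest.

```python
FRAC_BITS = 12
ONE = 1 << FRAC_BITS  # 1.0 in fixed point (4096)

def filtered_channel(samples, phase_inc):
    """One-pole low-pass whose coefficient is the phase increment:
    out = (((new - prev) * inc) >> 12) + prev, while inc < 4096.
    For inc >= 4096 this sketch passes samples through (assumption)."""
    prev = 0
    out = []
    for new in samples:
        if phase_inc < ONE:
            prev = (((new - prev) * phase_inc) >> FRAC_BITS) + prev
        else:
            prev = new
        out.append(prev)
    return out

print(filtered_channel([1000] * 5, 2048))  # increment 0.5: plain averaging
print(filtered_channel([1000] * 5, 769))   # lower note: heavier smoothing
```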

    2021 Mar 13 Toadroar revisions and NOWUT version 0.25 release

    Running my FPGA CPU design through some more elaborate tests has revealed problems. For instance, instruction fetch waitstates caused instructions to be skipped, a few instructions operated on the wrong data, and no consideration was given to where bytes would be presented on the bus when accessing odd addresses. So I've been busy redesigning it while also tweaking the instruction set to suit the implementation details. I have plans to add interrupts and an instruction cache later.

    NOWUT 0.25 is here, with bug fixes in the x86 assembler and elsewhere, and two new IF statements.

    Check the documentation. Download the complete archive.

    And be sure to get Go Link from Go Tools website.

    2021 Feb 20 QMTECH again

    Got all three colors hooked up. Added a Toadroar CPU into the mix which can execute code from internal FPGA memory and write to SDRAM. I also have an assembler adapted from an old MultiNO and a utility to convert to text-based (argh!) MIF (memory initialization file) format used by Quartus. Using the MIF means I can reload the FPGA with updated code without having to go through the entire compilation process in Quartus (which takes the better part of a minute otherwise).
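    The MIF format itself is simple: a short header giving the word width, the depth, and the radix used, then one `address : data;` line per word. Here is a minimal Python sketch of such a converter, assuming 16-bit words and hexadecimal radix. It is not the author's utility, and the word width is a hypothetical choice.

```python
def to_mif(words, width=16):
    """Render a list of integers as a Quartus Memory Initialization File.
    The 16-bit default width is a hypothetical choice for this sketch."""
    digits = (width + 3) // 4
    mask = (1 << width) - 1
    lines = [
        f"WIDTH={width};",
        f"DEPTH={len(words)};",
        "ADDRESS_RADIX=HEX;",
        "DATA_RADIX=HEX;",
        "CONTENT BEGIN",
    ]
    for addr, word in enumerate(words):
        lines.append(f"    {addr:X} : {word & mask:0{digits}X};")
    lines.append("END;")
    return "\n".join(lines) + "\n"

print(to_mif([0x1234, 0xBEEF, 0x0042]))
```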

    Then I used a PLL to pump up the clock speed to 80MHz. When I did that, the inverted clock being sent to the SDRAM (as seen in my last project upload) became inadequate, and the first out of every eight pixels was showing garbage data on the screen. Setting up a separate PLL output to create another 80MHz clock at 90 degrees out-of-phase fixed this.
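    For a sense of the margins involved: an inverted clock is a 180 degree shift, which at 80MHz is half of a 12.5ns period, or 6.25ns, while a 90 degree shift is 3.125ns. The conversion is simply period × degrees / 360:

```python
def phase_delay_ns(clock_hz: float, degrees: float) -> float:
    """Time offset of a clock shifted by `degrees` relative to its reference."""
    period_ns = 1e9 / clock_hz
    return period_ns * degrees / 360.0

for deg in (180, 90):
    print(deg, "degrees at 80MHz =", phase_delay_ns(80e6, deg), "ns")
```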

    2021 Feb 13 more experiments on QMTECH Cyclone IV Core Board

    I revised my CRT controller, made a basic SDRAM controller with the help of the W9825G6KH datasheet, and then connected them together in an outer module. Now I can view uninitialized garbage data from the SDRAM on the screen, press a button to draw some black stripes in it, or press the other button to scroll down.

    The garbage data at power-on is itself a bit curious. There are tightly repeating patterns, and after scrolling down for a while (256K words I think it is...) the pattern changes completely.

    future plans:

  • add in a CPU core to push gfx data around
  • try boosting the memory clock / resolution / color depth
  • add a GPU that draws triangles?
  • add texture mapping?

    This is the complete Quartus project.

    2021 Feb 4 another FPGA toy

    This is a nice bang-for-the-buck Cyclone IV project board featuring 15K LEs. Sadly, it is devoid of connectors other than the rows of unpopulated solder pads, and includes only one LED and two buttons for general purpose use. However, it does boast 32MB of SDRAM!

    Having already experimented with the CPU, serial, and audio cores on the Storm_I board, video was the next thing on my agenda. I decided to start out with some 15KHz RGB, using my NEC CM-1991a monitor. My reasoning was that any VGA monitor since the mid '90s is likely to show a blank screen or an error message if the incoming signal is in any way defective, whereas I know the old NEC CRT will show something on the screen even if it is all garbage. Plus, it is already there sitting just a few feet away, with a dodgy custom RGB cable hanging out of it that I used to test an Amiga Firecracker 24 board.

    I read on another page the idea of using a 270 ohm resistor between the 3.3V FPGA output and the video input, to get something approximating the right voltage (assuming 75 ohm load in the monitor). I didn't have a 270 so I used a 330. Viewing the output signal (red, that is) with a scope showed a ~.5V DC offset (with ~1V peak) and I have to say I don't know why that was there but turning down the brightness on the monitor effectively removed it. I used 100 ohm resistors on the h-sync and v-sync.
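    The 270 ohm figure comes from treating the series resistor and the monitor's 75 ohm termination as a voltage divider: 3.3V × 75 / (270 + 75) ≈ 0.72V, close to the roughly 0.7V full-scale level of analog RGB video. With the 330 ohm resistor used here, the peak drops to about 0.61V. This ignores the FPGA pin's own output resistance, which is an assumption of the simple model.

```python
def video_level(v_out: float, r_series: float, r_load: float = 75.0) -> float:
    """Voltage at the monitor input for a series resistor feeding the
    monitor's (assumed) 75 ohm termination; pin resistance is ignored."""
    return v_out * r_load / (r_series + r_load)

for r in (270.0, 330.0):
    print(f"{r:.0f} ohm -> {video_level(3.3, r):.3f} V")
```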

    I divided 50MHz by ten, yielding roughly 240 pixels horizontally, and created a couple of extra intensity levels by switching the output off early during one pixel.
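    The arithmetic behind this: dividing 50MHz by ten gives a 5MHz pixel clock, so each pixel lasts 200ns and spans ten 20ns cycles of the base clock. 240 pixels therefore take 48µs of a line that lasts about 64µs at 15kHz, with the rest going to blanking and sync. Cutting the output off after some of those ten cycles gives a proportionally dimmer pixel, assuming the monitor effectively averages over the pixel. The cycle counts used below are hypothetical examples, not the levels the author picked.

```python
BASE_CLOCK_HZ = 50e6
DIVIDER = 10  # one pixel = 10 cycles of the 50MHz base clock

pixel_ns = DIVIDER * 1e9 / BASE_CLOCK_HZ  # 200 ns per pixel

def intensity(cycles_on: int) -> float:
    """Approximate brightness when the output is driven for only the first
    `cycles_on` of the pixel's 10 cycles (assumes the monitor averages)."""
    return cycles_on / DIVIDER

print(pixel_ns, [intensity(k) for k in (10, 5, 3)])  # hypothetical levels
```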

    Oddly, the camera saw the red bars as orange. I even mucked with the white balance and tint in DPP before saving the JPEG in an attempt to make it more red, which is certainly how it looked to the naked eye. But the brightest red bar still looks orange. *shrugs*

    This is the verilog code.

    Old updates

    entries from 2020 and prior