Anachro-mputing blog

retro and not-so-retro computing topics
Related links:
  • homebrew games, demos, and misc software
  • Old links
  • NOWUT webpage
  • quick download links: DeHunk+UnSuper IMGTOOL 0.96 ImaginarySoundChip Mod player routine ModPZ NOWUT 0.27 MBFAST 0.93b P-ST8 (2021 Nov)

  • Updates

    2021 Nov 30 revisiting Radeon drivers in Windows 2000

    I previously mentioned trying Catalyst 11.7 from BWC's site to run a Radeon 5670. While compatibility with DX9 games seems to be OK, OpenGL support is not too hot. This is unfortunate, as with DX10/11 not being available on Windows 2000, OpenGL is the only newer API available. Catalyst 11.7 also only supports cards up to the 6000 series.

    Catalyst 12.4 is somewhat better in terms of OpenGL (it runs Unigine Heaven with some graphical glitches) but it can freeze the system when attempting video playback (same deal for Catalyst 13). When running in debug mode I saw that there are some messages emitted when this problem occurs:

    Error[PP_PowerPlay]: rv770_hwmgr.c[3653]  -- Assertion '(PP_Result_OK == result)' has failed: Failed to unregister high thermal interrupt!
    Error[PP_PowerPlay]: rv770_hwmgr.c[3657]  -- Assertion '(PP_Result_OK == result)' has failed: Failed to unregister low thermal interrupt!
    Error[PP_PowerPlay]: eventinit.c[724]  -- Assertion '(PP_Result_OK == result)' has failed: Failed to unregister interrupt for CTF event!
    Error[PP_PowerPlay]: eventinit.c[730]  -- Assertion '(PP_Result_OK == result)' has failed: Failed to unregister interrupt for vbios events!
    Error[PP_PowerPlay]: rv770_clockgating.c[409]  -- Assertion '(PP_Result_OK == result)' has failed: Failed to unregister graphics busy interrupt.
    Error[PP_PowerPlay]: rv770_clockgating.c[412]  -- Assertion '(PP_Result_OK == result)' has failed: Failed to unregister graphics idle interrupt.

    Since there is a Catalyst 14.4 for Win XP, I wondered why this couldn't be used in Win 2000. The .INF file needs modification for the driver to install, and upon booting ATI2MTAG.SYS will load but ATI2DVAG.DLL does not load and the screen is stuck in standard VGA mode.

    After more hunting with the debugger, I found that ATI2MTAG allocates memory for some data structure with a size of $54 bytes, the size then being recorded at the beginning of this block. Later, some unknown Windows code checks the size, expecting it to be $50, and returns an error code $C0000059 STATUS_REVISION_MISMATCH. That seems to be the reason that ATI2DVAG was not loading.

    I tried changing the $54 back to a $50 to make Windows happy, but then I got a BSOD instead. Code 1E KMODE_EXCEPTION_NOT_HANDLED due to a null pointer reference in one of the power-related OS routines.

    No progress on this front :(

    2021 Nov 27 ReactOS...

    ReactOS can run inside a VM, but running it in a VM is boring and slow, like everything else in a VM. I've tried to run it on real hardware before but it usually fails. The only time I got an OS that would actually boot was once when I installed on a Pentium III with i8xx chipset.

    My last attempt was a year or two ago, on a Pentium M laptop. So I thought I'd give it another go, but on a desktop socket FM2 system this time.

    Installing ReactOS has always been badly complicated by the lack of a DOS setup program. Windows 9x has a SETUP.EXE, Windows NT/2K/XP have WINNT.EXE, ReactOS has nothing. It can't be installed from HDD. Even if you trick the setup program into booting from HDD, it doesn't work because they put the installation files in a directory called 'ReactOS' which is the same directory name that they want to install to.

    So my plan this time was to boot the ISO in Virtual PC and do the 'text mode setup' (first part) and then copy the resulting newborn ReactOS installation from the Virtual PC disk image to the C:\ FAT32 partition in a real PC. Will it work???

    I want ReactOS to be an option on my existing NTLDR boot menu. I don't want to let ReactOS rewrite the boot sector because I don't want to get stuck in a situation where I can't boot NTLDR anymore. Then I'd have to take the harddisk out and connect it to another system to straighten it out.

    Inside the ReactOS ISO is a FAT32.BIN which can be used to make a boot sector manually. The BPB data needs to be populated before it becomes usable: namely, the sectors/cluster value at offset $0D and some other stuff from $10 to $4F. I copied it right from my existing FAT32 boot sector (sector 63 of the HDD). Then FAT32.BIN can be placed in C:\ and referenced in BOOT.INI. However, it is 1024 bytes, which means it is actually 2 sectors. The second one needs to go somewhere too, because NTLDR (or the MBR code) will only load one sector. I disassembled the code and found that it loads the second sector from the location 14 sectors after the first one, i.e. the main boot sector is in #63 so the second one needs to be in #77. This location doesn't conflict with the sectors used by Win98 or 2000, which is good. So I used a sector editor to put the second 512 bytes of FAT32.BIN on my HDD, then rebooted and selected the new ReactOS option in my NTLDR boot menu.
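    The patching step can be sketched in a few lines. This is a Python illustration rather than what was actually done (that was a sector editor), and the function names patch_bpb and second_sector_lba are made up for the example. It assumes only what is described above: copy the byte at $0D and the range $10 to $4F from the existing boot sector, and expect the second half of FAT32.BIN 14 sectors after the first.

```python
SECTOR = 512

def patch_bpb(template: bytes, existing: bytes) -> bytes:
    """Copy BPB fields from an existing FAT32 boot sector into a template.

    Hypothetical helper: copies exactly the ranges named in the text,
    nothing else.
    """
    if len(template) < SECTOR or len(existing) < SECTOR:
        raise ValueError("need at least one full sector of each")
    out = bytearray(template)
    out[0x0D] = existing[0x0D]            # sectors per cluster
    out[0x10:0x50] = existing[0x10:0x50]  # the rest, offsets $10..$4F
    return bytes(out)

def second_sector_lba(boot_lba: int) -> int:
    """Where the second 512 bytes must live, per the disassembly above."""
    return boot_lba + 14

# Example with dummy data: a 1024-byte blank template and a fake boot sector.
template = bytes(1024)
existing = bytes(range(256)) * 2
patched = patch_bpb(template, existing)
```

    With the boot sector at #63, second_sector_lba(63) gives 77, matching the location worked out above.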

    I received the message "error opening freeldr.ini"

    Of course, I already had copied FREELDR.INI to C:\ along with FREELDR.SYS itself. So why the error? It so happens that a more helpful error message was also transmitted via the COM1 port. It read:

    (C:\buildbot_config\worker\Build_MSVC_x86\build\boot\freeldr\freeldr\disk\partition.c:186) err: Too many bootable (active) partitions found.

    LOL. Seriously? I disassembled FREELDR and found that yes, it counts bootable partitions and fails if there is more than one.
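    For illustration, here is roughly what such a check amounts to, written in Python. This is not FreeLoader's code, just a sketch of the idea: the classic MBR keeps four 16-byte partition entries starting at offset $1BE, and the first byte of each entry is $80 when the partition is marked active. The function name is hypothetical.

```python
def count_active_partitions(mbr: bytes) -> int:
    """Count MBR partition entries whose boot flag is 0x80 (active)."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    count = 0
    for i in range(4):
        entry = mbr[0x1BE + 16 * i : 0x1BE + 16 * (i + 1)]
        if entry[0] == 0x80:
            count += 1
    return count

# Build a fake MBR with two active partitions, like the disk in question.
fake = bytearray(512)
fake[0x1BE] = 0x80        # entry 0 active
fake[0x1BE + 16] = 0x80   # entry 1 active
fake[510:512] = b"\x55\xaa"
active = count_active_partitions(bytes(fake))
```

    A loader that insists on active == 1 would reject this disk, which is the behavior the COM1 message describes.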

    To work around this bug, I simply tweaked my partition table, temporarily making the other partition non-bootable. Then I tried to boot ReactOS again. This time it started to boot. Showed the ReactOS logo. Switched to the VESA screen mode. Then... BSOD.

    (base\services\umpnpmgr\install.c:112) Installing: PCIIDE\IDEChannel\4&d0ea710f&0
    (ntoskrnl\io\pnpmgr\pnpres.c:154) IopFindPortResource failed!
    (ntoskrnl\io\pnpmgr\pnpres.c:417) Failed to find an available port resource (0xf090 to 0xf097 length: 0x8)
    (ntoskrnl\io\pnpmgr\pnpres.c:468) Unable to satisfy required resource in list 0
    (ntoskrnl\ps\thread.c:119) PS: Unhandled Kernel Mode Exception Pointers = 0xF78C1628
    (ntoskrnl\ps\thread.c:126) Code c0000005 Addr 8047BD66 Info0 00000000 Info1 00000000 Info2 00000000 Info3 F78C1AC8
    
    *** Fatal System Error: 0x0000007e
                           (0xC0000005,0x8047BD66,0xF78C1A9C,0xF78C1728)
    
    Better luck next time.

    2021 Nov 17 PAL fail

    I have a Needham's EMP-11 programmer which I have used to read/program various types of ROM chips and some ATF750C. I was seeking a simpler device for a project where an ATF750 would be overkill, and so I obtained a handful of ATF16V8B. These are listed in the EMP-11 device list and programming software, but don't seem to actually work.

    I can put in an ATF16V8B and read it (and get all '1's). Attempting to erase or program a chip fails, and every operation thereafter will fail with "incorrect device ID" for the chip in question. Selecting the option under preferences to ignore the device ID has no effect. I don't know if the chips are being trashed during this process but they clearly will not work with the EMP-11. Lame.

    2021 Nov 10 NOWUT version 0.27 release

    Most of the work for this release went into floating-point support. It is still not complete, but variables can be marked as FP with a new '.fd' tag and assignments like this are now possible:

    area.fd=radius*radius*3.14159

    Having more than four decimal places no longer generates an error, though there will be a warning if digits are dropped due to lack of precision. You can go to nine decimal places when compiling from 386 platform (which is the only platform that implements FP at this point anyway). There is a short FPTEST program to demonstrate the parser's current capability.

    Another feature addition is the 'ea' special symbol (let's say it stands for Effective Address). The lack of being able to do fancy address calculations with an indirect address as the base has been a noticeable hole in the language. Now I've settled on a way to do it without any radical new syntax.

    [baseaddr+offset+$14].w=0                ; ever wish that this was allowed?
    ea.w(baseaddr+offset+$14)=0              ; now it can be done this way!

    This avoids having to calculate the address and store it in another variable first.

    Check the documentation. Download the complete archive.

    And be sure to get Go Link from Go Tools website.

    2021 Sep 23 The video timing problem

    On 1980s hardware you could read some hardware register and determine whether you were in the vertical retrace period or maybe even the exact scanline you were on. Maybe you could even configure an interrupt to happen at a particular time. This made timing relatively easy.

    What do we have on 21st century PCs? We have a bunch of different timers that count at different frequencies, all unrelated to video refresh, and under OpenGL we have wglSwapIntervalEXT. This is a meager toolbox but it's all I've been able to uncover so far. Meanwhile, the demands are also greater. Not only is it useful to be able to synchronize a program with the display, it is also desirable to handle variable framerates. The monitor refresh rate could be anything. The framerate could drop due to performance reasons. Or maybe you want to run a benchmark and let the framerate go as high as possible. Last but not least, after each frame is rendered and the thread is waiting to begin the next one, it is good to have the thread go idle and surrender the unneeded CPU time rather than pegging a CPU core at 100% while it waits.

    I put my rendering code into its own thread so it would be unaffected by window messages or audio related duties. Then I tested four different strategies in my game engine, on five different systems.

    Strategy 1: For this one, vsync is turned OFF with wglSwapIntervalEXT. The rendering thread pauses itself after each frame with a call to WaitForSingleObject. It then becomes unpaused by an auto-reset event which is signalled by the multimedia timer (timeSetEvent) configured to run every 16ms. The result is a framerate capped at roughly 60Hz, but not synchronized with the display. The CPU is allowed to go idle. Overall, not a bad option. This was the only strategy that produced the same results on every test system!

    Strategy 2: Here I turn vsync ON, and do not call WaitForSingleObject. I just let the thread run wild and hope that OpenGL and/or the video driver will do what I want and put the thread to sleep as appropriate. This strategy succeeded on systems 1 and 4, producing perfectly synchronized screen updates and low CPU load. On system 2, it drove a CPU to 100% load. System 5 was unique in that it put both CPU cores at 100% load. On system 3 the video was synchronized and the CPU load was low but a different problem appeared. The time between frames as measured by QueryPerformanceCounter varied wildly, so moving objects in the game suffered from horrible jitter.

    Strategy 3: This is the 'benchmark' mode. Vsync is turned OFF, and the thread is allowed to run wild. The results are mostly expected. Everybody has a CPU core maxed out (both cores on system 5) and stupidly high framerates. However system 3 appeared to have jitter sometimes but not always. I couldn't really nail it down. Also, on system 5 when running in OpenGL 1.x mode (but not 3.x mode) it ignored the vsync setting and continued to cap rendering at 60Hz.

    Strategy 4: This one is the same as #2 except I replaced glFlush with glFinish. That solved the jitter on system 3, but also caused the CPU loads to go to 100% again (both cores on system 5).
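    For reference, the pacing loop of Strategy 1 can be sketched with portable primitives. This is a Python stand-in, not the Win32 code: threading.Event plays the role of the auto-reset event, and a helper thread plays the role of the multimedia timer. The function name and the interval parameter are assumptions for the example.

```python
import threading
import time

def run_paced_frames(frame_count, interval_s=0.016):
    """Render frame_count frames, each released by a periodic tick."""
    tick = threading.Event()
    stop = threading.Event()

    def ticker():
        # Event.wait returns False on timeout, True once stop is set.
        while not stop.wait(interval_s):
            tick.set()

    timer = threading.Thread(target=ticker, daemon=True)
    timer.start()
    stamps = []
    try:
        for _ in range(frame_count):
            tick.wait()    # thread sleeps here, CPU is free to idle
            tick.clear()   # emulate the auto-reset behavior
            stamps.append(time.perf_counter())  # "render" happens here
    finally:
        stop.set()
        timer.join()
    return stamps

stamps = run_paced_frames(5, interval_s=0.005)
```

    The rendering thread never spins; it only wakes when the tick arrives, which is why this approach caps the framerate without pegging a core. It also shows why the result is not synchronized with the display: the tick comes from a timer that knows nothing about the refresh.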

    I did not find a silver bullet.

    2021 Sep 8 Something a little different

    Retro machine? Yes. Computer? Ahem... maybe it is a sort of mechanical state machine with three states. In any case, this is a Whirlpool belt-drive washer sold under the Kenmore Heavy Duty label, model number 110.82081110, produced around 1980 AFAICT. It had been misbehaving a bit lately.

    There is an access panel which can be removed at the bottom rear to reveal the machine's innards. It is also open at the bottom, hence the obvious course of action was to tip the washer forward and get a good look at everything. Unfortunately, when I did this oil began running out of the gearbox onto the floor.

    Here is the gearbox from underneath, with the drain pump removed. Loosening a nut above the electric motor allows the motor to pivot and release tension on the drive belt. Then there are two bolts underneath the pump which anchor it to the gearbox.

    This is the top of the washer, with the top cover and control panel removed. A plastic cap at the top of the agitator assembly conceals a bolt holding the agitator to a spindle. After removing this bolt, the top part of the agitator comes out. The bottom part can then also be pulled from the spindle, though it is likely to be quite stuck.

    Then after unplugging the four wires and removing the remaining bolts, the entire gearbox can be withdrawn out the bottom. Notice that when you spin the large pulley, the thing with the two actuators riding on top of two rails will swing back and forth. Evidently, this is called a Wig Wag. When both actuators are disengaged, the washer is in "drain mode" and the drain pump is engaged. You might say this is the default state. The second state is "agitate mode" and occurs when the actuator on the right engages. The right rail slides back, flipping the lever on the pump, and the long shaft coming out of the gearbox starts rotating back and forth.

    The third state is "spin mode" and this occurs when the left actuator engages. The left rail slides forward and allows a post to drop. The spring-loaded plate above it also drops and the clutch mechanism above the center pulley engages so that the washer's drum will spin with it. Note that the red wire on the left actuator leads to the door switch. It is disabled (no spin mode) when the door is open.
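    Since this was billed as a state machine, the three modes can be written down as one. This is a toy Python model built only from the description above. What the mechanism does if both actuators are energized at once is not covered here, so the sketch simply refuses that input (an assumption, not an observation).

```python
from enum import Enum

class Mode(Enum):
    DRAIN = "drain"
    AGITATE = "agitate"
    SPIN = "spin"

def washer_mode(left_on: bool, right_on: bool, door_closed: bool) -> Mode:
    """Map actuator states to the washer mode described in the text."""
    # The left actuator is wired through the door switch.
    left_effective = left_on and door_closed
    if left_effective and right_on:
        # Not described in the text; treated as invalid in this sketch.
        raise ValueError("both actuators engaged: behavior unknown")
    if left_effective:
        return Mode.SPIN
    if right_on:
        return Mode.AGITATE
    return Mode.DRAIN  # the default state: both actuators disengaged
```

    Opening the door during spin drops left_effective, so the model falls back to drain mode, which matches the "no spin mode when the door is open" wiring.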

    See the black plate with one screw in the center, holding down the two rails? The gearbox oil can be refilled through that bolt hole.

    While reinstalling, I noticed that this bolt securing the gearbox (closest to the front of the washer) goes through a spacer. Removing that one bolt and spacer ought to allow installing a new belt without removing everything else (though it would still need to slip between the clutch release plate and post).

    2021 Aug 28 368-byte Toadroar demo

    Having fixed many bugs in my CPU core, I was able to port some old x86 code to Toadroar assembly and get it running on the QMTECH FPGA dev board. The CPU is running at 75MHz but memory latency is holding it back some, so it is only able to fill the 256x224 RGB screen at about 6fps. If I get around to adding an L1 cache, at least I have something to use as a benchmark :)

    source files

    noisy video recording from CRT

    2021 Aug 21 Cubemaps 2: Matrix Madness

    I had to revisit cubemaps after changing the projection matrix used in my renderer. I changed the matrix because the old one was funky. I had suspected it for a while, but it was hard to be certain because of lingering doubts over which reference material was showing transposed vs. non-transposed matrices.

    Finally, I made test code that called gluPerspective and read back the matrix data with glGetFloatv.

            dd 1.299038   0          0          0
            dd 0          1.732051   0          0
            dd 0          0         -1.002002  -1
            dd 0          0         -200.2002   0
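    The same numbers can be reproduced from the documented gluPerspective formula. The Python sketch below lays the matrix out in the order glGetFloatv returns it (column-major). The parameters are not given above; a 60 degree field of view, 4:3 aspect, near plane at 100 and far plane at 100000 are inferred from the printed values, so treat them as an assumption.

```python
import math

def perspective_column_major(fovy_deg, aspect, z_near, z_far):
    """gluPerspective-style matrix as 16 floats in OpenGL memory order."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [
        f / aspect, 0.0, 0.0, 0.0,
        0.0, f, 0.0, 0.0,
        0.0, 0.0, (z_far + z_near) / (z_near - z_far), -1.0,
        0.0, 0.0, 2.0 * z_far * z_near / (z_near - z_far), 0.0,
    ]

# Parameters inferred from the listing above (assumption).
m = perspective_column_major(60.0, 4.0 / 3.0, 100.0, 100000.0)
for row in range(4):
    print("dd", *(f"{v:11.6f}" for v in m[row * 4 : row * 4 + 4]))
```

    Printed four values per line, this gives the same layout as the listing, which helps with the transposed vs. non-transposed confusion: the -1 sits in the third group of four, the -200.2002 in the fourth.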

    This confirmed that my old matrix was not good. That, and the fact that it had excessive Z-fighting problems. Of course, fixing the projection matrix broke everything else, which had been designed around the funky one. Getting the cubemap to render again was not too hard though.

    texcoord3=mat3(purerotate)*position;
    gl_Position=vec4(projmatrix[0][0]*position.x,projmatrix[1][1]*position.y,1.0,1.0);

    The texture Z coordinate no longer needs to be inverted. As for the projection matrix, it turns out that the only relevant components in it are the ones related to the viewport size/aspect. So in the shader I just use those directly.

            callex ,gltexcoord3f,    1.0, 1.0,-1.0
            callex ,glvertex3f,      1.0, 1.0,-1.0
            callex ,gltexcoord3f,    1.0,-1.0,-1.0
            callex ,glvertex3f,      1.0,-1.0,-1.0
            callex ,gltexcoord3f,    1.0,-1.0, 1.0
            callex ,glvertex3f,      1.0,-1.0, 1.0
            callex ,gltexcoord3f,    1.0, 1.0, 1.0
            callex ,glvertex3f,      1.0, 1.0, 1.0

    OpenGL 1.x is the same thing. Flip the texture Z coordinate sign, and instead of using the projection matrix as-is, build a separate one like this:

            dd 1.299038   0          0          0
            dd 0          1.732051   0          0
            dd 0          0          1.0        0
            dd 0          0          0          1.0

    2021 Jul 25 Cubemaps

    So I'm working on a 3D game engine in NOWUT. Currently it can render using the OpenGL 3/4 API with shaders, or the old 1.x API without shaders, because why not? There is a lot of common code between the two, and it's not clear at this point whether my plans for the game will preclude supporting older hardware.

    After getting some rudimentary models to render with a basic lighting configuration, I wanted to add a skybox. I didn't know how to do this in a game where the camera can look up and down, so I did a search and came up with the answer: cubemaps.

    Looking at diagrams like this one and reading various descriptions that made it sound like one is rendering a giant cube around the outside of an environment made it hard to understand how this could work. Turns out, that's not what it is. The term 'cubemap' is really a misnomer and this doesn't have much to do with cubes at all.

    A cubemap has six textures, but they don't correspond to any flat surface. Instead, each texture corresponds to a direction. In the game, when you look straight up you'll see the "positive Y axis" texture. Depending on your field of view you might see only part of it, or you might see parts of the other textures where they meet the edges of the PY texture.
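    The "direction picks the texture" idea can be shown in a few lines. In OpenGL the face is selected by whichever component of the direction vector has the largest magnitude, and its sign chooses the positive or negative texture. The Python sketch below covers only that selection step (not the 2D coordinate within the face). The face names and the tie-breaking order are choices made for this example.

```python
def cube_face(x: float, y: float, z: float) -> str:
    """Return which of the six cubemap textures a direction points at."""
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax == 0 and ay == 0 and az == 0:
        raise ValueError("zero vector has no direction")
    # Ties go to X, then Y, then Z in this sketch.
    if ax >= ay and ax >= az:
        return "PX" if x > 0 else "NX"
    if ay >= az:
        return "PY" if y > 0 else "NY"
    return "PZ" if z > 0 else "NZ"
```

    Looking straight up, cube_face(0, 1, 0) returns "PY". Tilt far enough toward the horizon and another component becomes the largest, which is the point where a neighboring texture takes over.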

    The first step to rendering a skybox in OpenGL is to load the cubemap texture. This is the same in both old and new APIs:

            callex ,glgentextures,cubetex.a,1
            callex ,glbindtexture,cubetex,$8513              ; gl_texture_cube_map
            callex ,gltexparameteri,$812F,$2802,$8513        ; TEXTURE_WRAP_S = CLAMP_TO_EDGE
            callex ,gltexparameteri,$812F,$2803,$8513        ; TEXTURE_WRAP_T = CLAMP_TO_EDGE
            callex ,gltexparameteri,$812F,$8072,$8513        ; TEXTURE_WRAP_R = CLAMP_TO_EDGE
            callex ,gltexparameteri,$2601,$2800,$8513         ; gl_linear, mag_filter
            callex ,gltexparameteri,$2601,$2801,$8513         ; gl_linear, min_filter
    
            callex ,glteximage2d,testcubepx,$1401,$80E0,0,256,256,$1908,0,$8515        ; $1401 = bytes
            callex ,glteximage2d,testcubenx,$1401,$80E0,0,256,256,$1908,0,$8516        ; $80E0 = BGR
            callex ,glteximage2d,testcubepy,$1401,$80E0,0,256,256,$1908,0,$8517        ; $1908 = RGBA
            callex ,glteximage2d,testcubeny,$1401,$80E0,0,256,256,$1908,0,$8518
            callex ,glteximage2d,testcubepz,$1401,$80E0,0,256,256,$1908,0,$8519
            callex ,glteximage2d,testcubenz,$1401,$80E0,0,256,256,$1908,0,$851A

    Rendering it correctly proved to be tricky. I have the third edition OpenGL book which covers version 1.2, but cubemaps were introduced in 1.3. So I had to do more searching online and shuffle matrices around and flip coordinate signs back and forth until something worked.

            callex ,glenable,$8513                    ; gl_texture_cube_map
            callex ,glbindtexture,cubetex,$8513              ; gl_texture_cube_map
    
            callex ,glmatrixmode,$1701                ; gl_projection
            callex ,glloadmatrixf,projmatrix.a
            callex ,glmatrixmode,$1700                ; gl_modelview
            callex ,glloadidentity
            callex ,glmatrixmode,$1702                ; gl_texture
            callex ,glloadmatrixf,purerotate.a
    
            callex ,glbegin,7                         ; gl_quads
            callex ,gltexcoord3f,   -1.0, 1.0,-1.0
            callex ,glvertex3f,      1.0, 1.0,-1.0
            callex ,gltexcoord3f,   -1.0,-1.0,-1.0
            callex ,glvertex3f,      1.0,-1.0,-1.0
            callex ,gltexcoord3f,   -1.0,-1.0, 1.0
            callex ,glvertex3f,      1.0,-1.0, 1.0
            callex ,gltexcoord3f,   -1.0, 1.0, 1.0
            callex ,glvertex3f,      1.0, 1.0, 1.0
            callex ,glend

    The projection matrix is loaded as normal. The modelview stack gets an identity matrix. And then the matrix corresponding to the camera viewpoint goes on the TEXTURE matrix stack. Except not exactly, because you only want the angle, not the position (skybox doesn't move when you move), so the 'purerotate' matrix is like the view matrix with the 'translation' part stripped off. (This matrix is also useful for rotating normals in a shader.) Then we just draw one quad that covers the whole viewport, and the camera angle modifies the texture coordinates which determines what is drawn.
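    Stripping the translation is a small operation. A minimal Python sketch, assuming the view matrix is stored as 16 floats in OpenGL's column-major order, where the translation lives in elements 12 to 14 (the function name is made up for the example):

```python
def strip_translation(view):
    """Return a copy of a column-major 4x4 view matrix with no translation."""
    if len(view) != 16:
        raise ValueError("expected 16 floats")
    out = list(view)
    out[12] = out[13] = out[14] = 0.0  # x, y, z translation
    return out

# Example: identity rotation with the camera moved to some position.
view = [1.0, 0.0, 0.0, 0.0,
        0.0, 1.0, 0.0, 0.0,
        0.0, 0.0, 1.0, 0.0,
        5.0, -2.0, 30.0, 1.0]
purerotate = strip_translation(view)
```

    The upper-left 3x3 block is untouched, so the result still carries the camera angle, which is all the skybox needs.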

    Example code that I saw online used texcoord4f, and set the fourth component to 0. However this didn't work on my 'low-spec' test machine (Radeon 7500) which only displayed a solid color with that method.

    That's it for OpenGL 1.3, now how to render it using shaders... Well, none of the preceding code is useful except for the 'bindtexture' part. For OpenGL 3+ it is necessary to prepare some vertex/texture coordinates in a VBO, a fragment shader that uses "samplerCube", and a vertex shader that calculates the needed position and texture coordinates.

    Looking here or elsewhere on the web one can find a set of shaders and vertex data to do the job. Except it appears to work completely differently from OpenGL 1.3. There is vertex data for a whole cube instead of one rectangle, and the vertex shader manipulates the position while passing texture coordinates through unchanged. Why? I don't really know, since I couldn't quite get this code to work (only part of the skybox would show up) and understanding what all this matrix stuff does at a high level is fairly confusing. So I tried to use shaders to do the same thing that the OpenGL 1.3 code was doing and came up with this:

    #version 150        // fragment shader for skybox
    
    out vec4 outcolor;
    in vec3 texcoord3;
    uniform samplerCube thetex;
    
    void main()
    {
    outcolor=texture(thetex,texcoord3);
    }
    
    #version 150        // vertex shader for skybox
    
    in vec3 position;
    out vec3 texcoord3;
    uniform mat4 projmatrix;
    uniform mat4 purerotate;
    
    void main()
    {
    texcoord3=vec3(purerotate*vec4(position*vec3(1.0,1.0,-1.0),1.0));
    gl_Position=projmatrix*vec4(position,1.0);
    }
    
    ; vertex data for skybox quad
            dd -1.0, -1.0,  1.0
            dd -1.0,  1.0,  1.0
            dd  1.0,  1.0,  1.0
            dd  1.0,  1.0,  1.0
            dd  1.0, -1.0,  1.0
            dd -1.0, -1.0,  1.0

    Phew!

    2021 Jul 2 NOWUT version 0.26 release

    ELF386 dynamic linking finally works, or at least enough to open a window with libX11 and implement the JPG loader example. There's a JPG example for EmuTOS too. OpenGL example has been updated.

    LINKBIN needs to know the name of required library files so it can put this information in the ELF file for ld-linux. I wanted to be able to put this in the program source, which meant hiding it in the OBJ somewhere. Rather than inventing my own scheme for this, I consulted the Go Tools documentation and found out about their #dynamiclinkfile directive. This makes GoAsm pass the library names to GoLink by putting them in a special section in the COFF file. I adopted this scheme for MULTINO and LINKBIN, with my own LINKLIBFILE statement, so it can be used for both Win32 programs and ELF386.

    Check the documentation. Download the complete archive.

    And be sure to get Go Link from Go Tools website.

    2021 May 12 P-ST8 utility for Win32 on AMD CPUs 10h, 15h

    This is the second release of my overclocking / power management utility. It's similar to PhenomMSRTweaker but designed to also support socket FM2 CPUs. It has only been tested on Windows 2000 with Extended Kernel and two different CPUs (Regor-based Athlon II X2 and Richland-based Athlon X4). It allows either manual selection of a p-state or automatic throttling based on load. Does not work on 64-bit Windows. NO WARRANTY

    2021 May 6 Atari ST, anyone?

    There was a post recently on hackernews about EmuTOS which got me thinking. I had never used an ST before, but given that it is another 68K system, how hard would it be to add it as another target for LINKBIN?

    I downloaded Hatari, and seeing the SDL2.DLL in there I expected it to have broken keyboard input on Windows 2000. Turns out, it works fine!

    Then I just needed some info on how to build a .PRG and do some basic GEMDOS stuff. In fact, these are nearly the same as .X and Human68K. Some old GEMDOS docs warned that function $3F for reading data from a file would return junk data in D0 if you tried to read past the end of the file. "Noooooooo! Every other platform returns zero!" But it appears that this behavior may have been corrected in EmuTOS... (hopefully)

    2021 Apr 16 DeHunk 0.97 and UnSuper

    Very minor update to the DeHunk 68000 disassembler. Now also includes UnSuper, which is a quick and dirty adaptation of the disassembler to handle SH2 code instead.

    DeHunk+UnSuper download

    2021 Apr 14 Remote kernel debugging with Windbg

    I never had much use for official SDKs, since I don't use any flavor of C programming language. But recently I saw mention in a few different places (this article, for instance) of using a second system connected via serial cable to diagnose crashes or boot failures. It sounded like something I should try. Maybe I'll even get to the bottom of the video driver crashes on this A88X motherboard?

    The debugging tools are part of the Microsoft Platform SDK or Windows SDK. If you're lucky you might be able to find the (much smaller) dbg_x86.msi floating around on its own.

    2021 Apr 10 Higher-quality Mod player

    Here is an updated example of a Win32-based Mod player. It contains some minor changes over the 2019 version, and two big changes:

    First, it eliminates 'pops' in the audio caused by abrupt volume or sample changes. Part of this is looking ahead one row to see which notes will be ending during that time period, so they can be quickly faded out before a new note begins.

    The other big change has to do with aliasing. These are 'phantom' high frequencies that can result from converting from a low sampling rate to a higher one. For instance, a sample in a Mod file playing at C-5 (8287Hz) being mixed into the 44100Hz audio output stream. The simple way to do this conversion is to use an index into the instrument sample that has fraction bits (12 bits in my case) and which increases after each sample by a rate proportional to the note's pitch. (I believe the term for this is Phase Accumulator.) 44100 / 8287 = 5.32 which means each instrument sample is going to repeat in the output 5 or 6 times. The value added to the phase accumulator each time would be the reciprocal of that, shifted left 12 bits for the sake of the fixed-point math: 769. I'll call this the phase increment.
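    The arithmetic above can be checked with a short Python sketch of a phase accumulator using 12 fraction bits. It is only an illustration of the technique, not the player's actual code: it ignores loop points and volume, and the function names are made up.

```python
FRAC_BITS = 12  # 12 fraction bits, as in the text

def phase_increment(note_rate_hz: int, output_rate_hz: int) -> int:
    """Fixed-point step added to the phase accumulator per output sample."""
    return (note_rate_hz << FRAC_BITS) // output_rate_hz

def resample_nearest(sample, note_rate_hz, output_rate_hz, count):
    """Naive resampling: the integer part of the phase indexes the sample."""
    inc = phase_increment(note_rate_hz, output_rate_hz)
    phase = 0
    out = []
    for _ in range(count):
        index = phase >> FRAC_BITS
        if index >= len(sample):
            break  # one-shot sample, no looping in this sketch
        out.append(sample[index])
        phase += inc
    return out

inc = phase_increment(8287, 44100)
out = resample_nearest([10, 20, 30], 8287, 44100, 12)
```

    Here inc comes out to 769, and out starts with six copies of 10 followed by five copies of 20: each instrument sample repeated 5 or 6 times.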

    Duplicating the same sample in the output those 5-6 times creates an ugly stair-step waveform which is a direct cause of phantom frequencies.

    In practice, this is what I saw coming out of the 32X:

    Seeing how bad it was in visual terms bothered me enough to reconsider doing something about it :)

    A simple low-pass filter will smooth things out and block some of the aliasing. The more aggressive the filtering, the less aliasing will be heard; however, desirable high frequencies are also lost, and notes that are low enough are still distorted. Not good.

    I didn't want to go with linear interpolation because it requires fetching 2 (at least) samples from the instrument data. In the context of a Mod player where you have to be mindful of loop begin- and end-points it seemed like too much of a hassle for something that I would like to run on older CPUs (like a 23MHz SH2). Instead, I came up with something workable that uses adjustable low-pass filters.

    My earlier attempt at a (fixed) low-pass filter looked like this: Output = ((New - Previous) * X) + Previous

    If X is one half, then this is the same as averaging each newly calculated value with the last output value. Using a different ratio for X alters the frequency response. What I did is replace X with the phase increment. So for each channel, as long as the phase increment is less than 1.0 (or 4096 after being shifted) then I have Output = (((New - Previous) * PI) shr 12) + Previous
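    As a rough Python sketch of that last formula (not the player's actual code): the filter coefficient is the channel's own phase increment in 12-bit fixed point. What happens when the increment is 1.0 or higher is not spelled out above; passing the sample through unfiltered is an assumption made here.

```python
FRAC_BITS = 12
ONE = 1 << FRAC_BITS  # 1.0 in fixed point (4096)

def smooth(new: int, previous: int, inc: int) -> int:
    """One step of the adjustable low-pass filter from the text."""
    if inc >= ONE:
        return new  # assumption: no filtering at or above 1.0
    # Python's >> floors negative values, like an arithmetic shift.
    return (((new - previous) * inc) >> FRAC_BITS) + previous

# A step from 0 to 1000 on a note with phase increment 769.
level = 0
ramp = []
for _ in range(4):
    level = smooth(1000, level, 769)
    ramp.append(level)
```

    A lower note has a smaller increment and therefore gets heavier filtering, which is where the stair-stepping is worst. With an increment of 2048 (one half) the step is exactly the average of new and previous, as described above.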

    2021 Mar 13 Toadroar revisions and NOWUT version 0.25 release

    Running my FPGA CPU design through some more elaborate tests has revealed problems. For instance, instruction fetch waitstates caused instructions to be skipped, a few instructions operated on the wrong data, and no consideration was given to where bytes would be presented on the bus when accessing odd addresses. So I've been busy redesigning it while also tweaking the instruction set to suit the implementation details. I have plans to add interrupts and an instruction cache later.

    NOWUT 0.25 is here, with bug fixes in the x86 assembler and elsewhere, and two new IF statements.

    Check the documentation. Download the complete archive.

    And be sure to get Go Link from Go Tools website.

    2021 Feb 20 QMTECH again

    Got all three colors hooked up. Added a Toadroar CPU into the mix which can execute code from internal FPGA memory and write to SDRAM. I also have an assembler adapted from an old MultiNO and a utility to convert to text-based (argh!) MIF (memory initialization file) format used by Quartus. Using the MIF means I can reload the FPGA with updated code without having to go through the entire compilation process in Quartus (which takes the better part of a minute otherwise).
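    For anyone who hasn't seen one, a MIF is just a header followed by address : data lines. A minimal Python sketch of a converter follows. It is not the utility mentioned above, the 16-bit default word width is an assumption, and the layout is the basic form of the format as I understand it, so check it against the Quartus documentation before relying on it.

```python
def words_to_mif(words, width_bits=16):
    """Format a list of integer words as Quartus MIF text."""
    digits = (width_bits + 3) // 4
    lines = [
        f"DEPTH = {len(words)};",
        f"WIDTH = {width_bits};",
        "ADDRESS_RADIX = HEX;",
        "DATA_RADIX = HEX;",
        "CONTENT",
        "BEGIN",
    ]
    for addr, word in enumerate(words):
        if not 0 <= word < (1 << width_bits):
            raise ValueError(f"word at {addr} does not fit in {width_bits} bits")
        lines.append(f"{addr:X} : {word:0{digits}X};")
    lines.append("END;")
    return "\n".join(lines) + "\n"

mif = words_to_mif([0x1234, 0xABCD, 0x0000])
```

    Being plain text is what makes the quick reload possible, argh or not: the file can be regenerated by a script after every assemble.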

    Then I used a PLL to pump up the clock speed to 80MHz. When I did that, the inverted clock being sent to the SDRAM (as seen in my last project upload) became inadequate, and the first out of every eight pixels was showing garbage data on the screen. Setting up a separate PLL output to create another 80MHz clock at 90 degrees out-of-phase fixed this.

    2021 Feb 13 more experiments on QMTECH Cyclone IV Core Board

    I revised my CRT controller, made a basic SDRAM controller with the help of the W9825G6KH datasheet, and then connected them together in an outer module. Now I can view uninitialized garbage data from the SDRAM on the screen, press a button to draw some black stripes in it, or press the other button to scroll down.

    The garbage data at power-on is itself a bit curious. There are tightly repeating patterns, and after scrolling down for a while (256K words I think it is...) the pattern changes completely.

    future plans:

  • add in a CPU core to push gfx data around
  • try boosting the memory clock / resolution / color depth
  • add a GPU that draws triangles?
  • add texture mapping?

    This is the complete Quartus project.

    2021 Feb 4 another FPGA toy

    This is a nice bang-for-the-buck Cyclone IV project board featuring 15K LEs. Sadly, it is devoid of connectors other than the rows of unpopulated solder pads, and includes only one LED and two buttons for general purpose use. However, it does boast 32MB of SDRAM!

    Having already experimented with the CPU, serial, and audio cores on the Storm_I board, video was the next thing on my agenda. I decided to start out with some 15KHz RGB, using my NEC CM-1991a monitor. My reasoning was that any VGA monitor since the mid '90s is likely to show a blank screen or an error message if the incoming signal is in any way defective, whereas I know the old NEC CRT will show something on the screen even if it is all garbage. Plus, it is already there sitting just a few feet away, with a dodgy custom RGB cable hanging out of it that I used to test an Amiga Firecracker 24 board.

    I read on another page the idea of using a 270 ohm resistor between the 3.3V FPGA output and the video input, to get something approximating the right voltage (assuming 75 ohm load in the monitor). I didn't have a 270 so I used a 330. Viewing the output signal (red, that is) with a scope showed a ~0.5V DC offset (with ~1V peak). I have to say I don't know why that was there, but turning down the brightness on the monitor effectively removed it. I used 100 ohm resistors on the h-sync and v-sync.
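    The resistor choice is a plain voltage divider against the monitor's termination. A quick Python check, assuming an ideal 3.3V source and exactly 75 ohms in the monitor (real outputs will sag a little):

```python
def divider_voltage(v_source: float, r_series: float, r_load: float = 75.0) -> float:
    """Voltage across the load in a series-resistor divider."""
    return v_source * r_load / (r_series + r_load)

v_270 = divider_voltage(3.3, 270.0)
v_330 = divider_voltage(3.3, 330.0)
print(f"270 ohm: {v_270:.3f} V, 330 ohm: {v_330:.3f} V")
```

    That works out to about 0.72V with the 270 and about 0.61V with the 330, so the substitution just costs a bit of peak brightness. Neither figure accounts for the offset seen on the scope.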

    I divided 50MHz by ten, yielding roughly 240 pixels horizontally, and created a couple of extra intensity levels by switching the output off early during one pixel.

    Oddly, the camera saw the red bars as orange. I even mucked with the white balance and tint in DPP before saving the JPEG in an attempt to make it more red, which is certainly how it looked to the naked eye. But the brightest red bar still looks orange. *shrugs*

    This is the verilog code.


    Old updates

    entries from 2020 and prior