Anachro-mputing blog

retro and not-so-retro computing topics
Related links:
  • homebrew games, demos, and misc software
  • Old links
  • NOWUT webpage
  • buy my rad t-shirts on Zazzle shirt 1 shirt 2 and miscellaneous art
  • quick download links: DeHunk+UnSuper IMGTOOL 0.96 ImaginarySoundChip Mod player routine ModPZ NOWUT 0.25 MBFAST 0.93b

  • Updates

    2021 May 6 Atari ST, anyone?

    There was a post recently on hackernews about EmuTOS which got me thinking. I had never used an ST before, but given that it is another 68K system, how hard would it be to add it as another target for LINKBIN?

    I downloaded Hatari, and seeing the SDL2.DLL in there I expected it to have broken keyboard input on Windows 2000. Turns out, it works fine!

    Then I just needed some info on how to build a .PRG and do some basic GEMDOS stuff. In fact, these are nearly the same as .X and Human68K. Some old GEMDOS docs warned that function $3F for reading data from a file would return junk data in D0 if you tried to read past the end of the file. "Noooooooo! Every other platform returns zero!" But it appears that this behavior may have been corrected in EmuTOS... (hopefully)

    2021 Apr 16 DeHunk 0.97 and UnSuper

    Very minor update to the DeHunk 68000 disassembler. Now also includes UnSuper, which is a quick and dirty adaptation of the disassembler to handle SH2 code instead.

    DeHunk+UnSuper download
    2021 Apr 14 Remote kernel debugging with Windbg

    I never had much use for official SDKs, since I don't use any flavor of C. But recently I saw mention in a few different places (this article, for instance) of using a second system connected via serial cable to diagnose crashes or boot failures. It sounded like something I should try. Maybe I'll even get to the bottom of the video driver crashes on this A88X motherboard?

    The debugging tools are part of the Microsoft Platform SDK or Windows SDK. If you're lucky you might be able to find the (much smaller) dbg_x86.msi floating around on its own.

    2021 Apr 10 Higher-quality Mod player

    Here is an updated example of a Win32-based Mod player. It contains some minor changes over the 2019 version, and two big changes:

    First, it eliminates 'pops' in the audio caused by abrupt volume or sample changes. In part this is done by looking ahead one row to see which notes will end during that time, so they can be quickly faded out before a new note begins.

    The other big change has to do with aliasing. These are 'phantom' high frequencies that can result from converting a low sampling rate to a higher one: for instance, a sample in a Mod file playing at C-5 (8287Hz) being mixed into the 44100Hz audio output stream. The simple way to do this conversion is to use an index into the instrument sample that has fraction bits (12 bits in my case) and which increases after each output sample by a rate proportional to the note's pitch. (I believe the term for this is Phase Accumulator.) 44100 / 8287 = 5.32, which means each instrument sample is going to repeat in the output 5 or 6 times. The value added to the phase accumulator each time would be the reciprocal of that, shifted left 12 bits for the sake of the fixed-point math: 769. I'll call this the phase increment.

    Duplicating the same sample in the output those 5-6 times creates an ugly stair-step waveform which is a direct cause of phantom frequencies.
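
    The stair-step comes straight out of the phase-accumulator arithmetic. Here is a small Python sketch of the unfiltered version described above (names and structure are mine; the real player is assembly):

```python
FRAC_BITS = 12  # fraction bits in the phase accumulator, as in the post

def phase_increment(note_rate, output_rate=44100):
    # reciprocal of (output_rate / note_rate), shifted left 12 bits
    return (note_rate << FRAC_BITS) // output_rate

def resample(instrument, n_out, inc):
    # Nearest-neighbour resampling: each instrument sample is simply
    # repeated until the integer part of the accumulator advances.
    out, acc = [], 0
    for _ in range(n_out):
        out.append(instrument[acc >> FRAC_BITS])
        acc += inc
    return out
```

    For a C-5 note, phase_increment(8287) comes out to 769, matching the figure above, and the first several output samples all duplicate instrument sample 0.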

    In practice, this is what I saw coming out of the 32X:

    Seeing how bad it was in visual terms bothered me enough to reconsider doing something about it :)

    A simple low-pass filter will smooth things out and block some of the aliasing. The more aggressive the filtering, the less aliasing will be heard; however, desirable high frequencies are also lost, and notes that are low enough are still distorted. Not good.

    I didn't want to go with linear interpolation because it requires fetching 2 (at least) samples from the instrument data. In the context of a Mod player where you have to be mindful of loop begin- and end-points it seemed like too much of a hassle for something that I would like to run on older CPUs (like a 23MHz SH2). Instead, I came up with something workable that uses adjustable low-pass filters.

    My earlier attempt at a (fixed) low-pass filter looked like this: Output = ((New - Previous) * X) + Previous

    If X is one half, then this is the same as averaging each newly calculated value with the last output value. Using a different ratio for X alters the frequency response. What I did is replace X with the phase increment. So for each channel, as long as the phase increment is less than 1.0 (or 4096 after being shifted) then I have Output = (((New - Previous) * PI) shr 12) + Previous
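
    As a sketch (assuming 12 fraction bits as above; this is my restatement, not the shipped assembly code):

```python
FRAC_BITS = 12

def filtered_channel(samples, phase_inc):
    # One-pole low-pass whose coefficient is the channel's phase increment:
    # Output = (((New - Previous) * PI) shr 12) + Previous
    prev, out = 0, []
    for new in samples:
        prev = (((new - prev) * phase_inc) >> FRAC_BITS) + prev
        out.append(prev)
    return out
```

    With a phase increment of 2048 (0.5 in 4.12 fixed point) this reduces to averaging each new value with the previous output, exactly the fixed filter above; lower-pitched notes get a smaller phase increment and therefore heavier filtering, which is the point.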

    2021 Mar 13 Toadroar revisions and NOWUT version 0.25 release

    Running my FPGA CPU design through some more elaborate tests has revealed problems. For instance, instruction fetch waitstates caused instructions to be skipped, a few instructions operated on the wrong data, and no consideration was given to where bytes would be presented on the bus when accessing odd addresses. So I've been busy redesigning it while also tweaking the instruction set to suit the implementation details. I have plans to add interrupts and an instruction cache later.

    NOWUT 0.25 is here, with bug fixes in the x86 assembler and elsewhere, and two new IF statements.

    Check the documentation. Download the complete archive.

    And be sure to get Go Link from the Go Tools website.

    2021 Feb 20 QMTECH again

    Got all three colors hooked up. Added a Toadroar CPU into the mix which can execute code from internal FPGA memory and write to SDRAM. I also have an assembler adapted from an old MultiNO and a utility to convert to text-based (argh!) MIF (memory initialization file) format used by Quartus. Using the MIF means I can reload the FPGA with updated code without having to go through the entire compilation process in Quartus (which takes the better part of a minute otherwise).

    Then I used a PLL to pump up the clock speed to 80MHz. When I did that, the inverted clock being sent to the SDRAM (as seen in my last project upload) became inadequate, and the first out of every eight pixels was showing garbage data on the screen. Setting up a separate PLL output to create another 80MHz clock at 90 degrees out-of-phase fixed this.

    2021 Feb 13 more experiments on QMTECH Cyclone IV Core Board

    I revised my CRT controller, made a basic SDRAM controller with the help of the W9825G6KH datasheet, and then connected them together in an outer module. Now I can view uninitialized garbage data from the SDRAM on the screen, press a button to draw some black stripes in it, or press the other button to scroll down.

    The garbage data at power-on is itself a bit curious. There are tightly repeating patterns, and after scrolling down for a while (256K words I think it is...) the pattern changes completely.

    future plans:

  • add in a CPU core to push gfx data around
  • try boosting the memory clock / resolution / color depth
  • add a GPU that draws triangles?
  • add texture mapping?

    This is the complete Quartus project.

    2021 Feb 4 another FPGA toy

    This is a nice bang-for-the-buck Cyclone IV project board featuring 15K LEs. Sadly, it is devoid of connectors other than the rows of unpopulated solder pads, and includes only one LED and two buttons for general purpose use. However, it does boast 32MB of SDRAM !

    Having already experimented with the CPU, serial, and audio cores on the Storm_I board, video was the next thing on my agenda. I decided to start out with some 15KHz RGB, using my NEC CM-1991a monitor. My reasoning was that any VGA monitor since the mid '90s is likely to show a blank screen or an error message if the incoming signal is in any way defective, whereas I know the old NEC CRT will show something on the screen even if it is all garbage. Plus, it is already there sitting just a few feet away, with a dodgy custom RGB cable hanging out of it that I used to test an Amiga Firecracker 24 board.

    I read on another page the idea of using a 270 ohm resistor between the 3.3V FPGA output and the video input, to get something approximating the right voltage (assuming 75 ohm load in the monitor). I didn't have a 270 so I used a 330. Viewing the output signal (red, that is) with a scope showed a ~.5V DC offset (with ~1V peak) and I have to say I don't know why that was there but turning down the brightness on the monitor effectively removed it. I used 100 ohm resistors on the h-sync and v-sync.
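
    For what it's worth, the plain voltage-divider arithmetic (ignoring the DC offset, whatever its source) works out like this:

```python
def video_level(vdrive=3.3, series=330.0, load=75.0):
    # FPGA pin drives `series` ohms into the monitor's 75-ohm termination
    return vdrive * load / (series + load)
```

    With the 330 ohm resistor that predicts about 0.61V peak, and the suggested 270 ohm would give about 0.72V, close to the usual 0.7V video level.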

    I divided 50MHz by ten, yielding roughly 240 pixels horizontally, and created a couple of extra intensity levels by switching the output off early during one pixel.

    Oddly, the camera saw the red bars as orange. I even mucked with the white balance and tint in DPP before saving the JPEG in an attempt to make it more red, which is certainly how it looked to the naked eye. But the brightest red bar still looks orange. *shrugs*

    This is the verilog code.

    2020 Dec 27 SetDIBitsToDevice

    Using GDI to transfer an image from memory to the screen is sure to be much slower than a hardware-accelerated OpenGL call, but sometimes it's good enough. Or, in the case of retro machines, it may be the only option available.

    The data in memory which is passed to SetDIBitsToDevice (or StretchDIBits) is nearly the same as a Windows BMP file, including the usual 1/4/8/24-bit color depth options, and it turns out, the lesser-known 16 and 32-bit formats.

    But what effect does the color depth have on performance? I set up a simple benchmark to compare the speed of a 24-bit buffer with a 32-bit buffer. It drew some rectangles and text into the memory buffer and then used GDI to repaint the 768x576 window. Just to be clear, the CPU time used to update the memory buffer is almost insignificant compared to the GDI call, so this really is a benchmark of SetDIBitsToDevice. Result: the 32-bit buffer was faster on every test setup.
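
    One bookkeeping detail when filling these buffers: DIB scanlines are padded to 4-byte boundaries, so the 24-bit case can have a stride that is not simply width*3. (At 768 pixels wide the 24-bit rows happen to be aligned already, so the speed gap here is presumably the cost of GDI converting 24-bit pixels to the display format, not alignment.) The standard stride rule, sketched in Python:

```python
def dib_stride(width, bits_per_pixel):
    # DIB/BMP rows are padded up to the next multiple of 4 bytes
    return ((width * bits_per_pixel + 31) // 32) * 4
```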

  • (frames per second at 24-bit -- and at 32-bit)
  • 83MHz Pentium Overdrive in a VLB board, Win95: 3fps -- 4fps
  • 600MHz Pentium M with VESA video driver, WinNT 3.51: 12fps -- 19fps
  • 600MHz Pentium M with Mobility Radeon 7500, WinXP: 22fps -- 57fps
  • 2.5GHz Athlon 64 X2 with Radeon HD 4550, Win2K: 120fps -- 274fps
  • 3.6GHz Athlon II X2 with GTX 650Ti, Win2K: 154fps -- 406fps

    2020 Oct 21 NOWUT version 0.24 release

    This update adds an i386 Linux build (tested on Ubuntu 10 running inside Virtual PC 2007 on Windows 2000), elf386 option for LINKBIN, plus minor tweaks and fixes.

    Check the documentation. Download the complete archive.

    And be sure to get Go Link from the Go Tools website.

    2020 Oct 21 Music and Computing

    The music player I use most often is Winamp, because it supports many formats via plugins, and generally just does the job reliably without drama.

    The Pentium 1 I was using at the time that I first downloaded the shareware version of Winamp would struggle a bit to decode MP3s and do other things at the same time. That is, if I even bothered to boot beyond a DOS prompt. By the time I had upgraded to a Celeron, the program had become freeware. But I guess I never saw the need to replace the version I already had installed with a different one.

    This week I happened to click on this 'shareware' tab which I don't believe I have ever clicked on before. It displayed the following usage statistics:

    186001 tracks played
    executed 2688 times
    522759 minutes total in 8037 days

    The number of days had me scratching my head at first because it seemed like too many. This Winamp is dated Oct 1999, which isn't quite that long ago. But I also noticed a playlist file hanging around with a date of Nov 1998, so I've hypothesized that I had a previous version at that time which was then overwritten by this version (2.24) and the count continued uninterrupted.
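
    A quick date calculation backs this up, assuming the day counter was still live as of this post:

```python
from datetime import date, timedelta

# If 8037 days of counting ended on the date of this post, the counter
# would have started in late October 1998, which lines up with that
# Nov 1998 playlist file.
start = date(2020, 10, 21) - timedelta(days=8037)
```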

    2020 Aug 15 MBFAST update and 3D progs on Win2000

    I just tried to load an STL file into MBFAST and was greeted by a crash. :( That bug has been fixed and a replacement MBFAST 0.93b is now here to download.

    The particular STL file was generated from MakeHuman v1.1.1 running on Win2000.

    I had been thinking about making a program like this, but I checked first to see if a similar program already existed. I found this. It's pretty comprehensive, even if some of the parameters seem redundant or counterintuitive. What I found strange about it is the awkward shoulders-back, chest-out, butt-out posture that most of the included poses exhibited. Maybe this would be natural for soldiers standing at attention but looks odd otherwise.

    A flat-shaded rendering of an exported STL file betrays the limited triangle count of the models (34,000 for this model), which is otherwise obscured by more elaborate shading.

    Many people use this program along with Blender. I decided to try it out.

    Blender v2.76b runs in Windows 2000, but keyboard input doesn't work at all. I guess this is because it uses SDL, which in turn tries to use the (non-existent) Raw Input API. I have seen at least one other program with this problem.

    2020 Aug 5 IMGTOOL version 0.96 release

    Made some bug fixes and added options for palette/colorspace and for tile/pixel editing. Download the new archive or check the documentation.

    2020 Jul 18 FPGA core for chiptunes

    I now have a working design for a "sound chip" running in the Cyclone 1. It currently stands at 1243 4-LUTs and 6K bits of memory for the main part, plus 218 4-LUTs for the I/O. I would describe it as a cross between a Hu6280 and an SCSP.

  • Hu6280 has 6 channels with user-defined waveforms, 5-bit volume, and stereo panning. One channel can be repurposed as an LFO.
  • SCSP has 32 channels with PCM samples, 32 LFOs, FM-synth-style envelopes and modulation, and a DSP.
  • My design, which shall be known as Tsynth unless I come up with a better name, has 16 channels with 4 to 8 user-defined waveforms, panning, the envelopes/modulation, and two channels that can be LFOs.
  • There's a noise generator too! I tried to keep register bloat down, and stuffed everything into 4x 16-bit registers per channel. Then there are 4x global registers, and a large area to contain some 64-byte waveform patterns.

    $0x - panning and volume
            bits 15-11 = Right volume             (0=off, 31=max)
            bits 10-6 = Left volume               (0=off, 31=max)
            bits 5-0 = operator attenuation (0 = 0dB, 63 = -47.25dB)
    $1x - octave and frequency
            bits 15-12 = octave
            bits 11-0 = frequency
    $2x - waveform and modulation
            bit 15 = LFO select
            bit 14-12 = LFO sensitivity (0=none)
            bit 11 = invert waveform
            bits 10-8 = wave pattern number
            bit 7 = noise enable
            bits 6-4 = self-modulation magnitude (0=none)
            bits 3-2 = second modulator  \  0=self                   2=operator before that
            bits 1-0 = first modulator   /  1=previous operator      3=operator before THAT
    $3x - envelope
            bits 15-12 = attack (0 disables envelope)
            bits 11-8 = sustain level (15 causes EG to skip right to decay2, use to do key-off)
            bits 7-4 = decay1
            bits 3-0 = decay2

    Using LFO select/sensitivity, channels 0 or 1 can be used for a vibrato effect. For FM synthesis (phase modulation to be exact) each channel can be a modulator, a carrier, or both simultaneously. The choice of modulators is limited to self-feedback or any of the three previously numbered channels. This is sufficient to handle any 4-op algorithm at the least.

    Envelope parameters are packed into one register by keeping everything to 4 bits and omitting a separate release rate. When you do 'key off' you can just write the register again with your release rate replacing the old decay2. Envelopes are ignored when the attack rate is set to zero, for PSG-style operation.
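
    To make the layout concrete, here is a hypothetical helper that packs the $0x panning/volume register from the table above (the field positions come from the post; the function and its names are mine):

```python
def pack_pan_vol(right_vol, left_vol, atten):
    # $0x: bits 15-11 = right volume, 10-6 = left volume, 5-0 = attenuation
    assert 0 <= right_vol <= 31 and 0 <= left_vol <= 31 and 0 <= atten <= 63
    return (right_vol << 11) | (left_vol << 6) | atten
```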

    misc register $00 - noise control
                            bits 6-4 = feedback bit
                            bits 2-0 = output bit
    misc register $01 - noise period
    misc register $02 - phase reset / 'key on'
    misc register $03 - envelope time constant (0-31)

    The noise generator is a 20-bit linear feedback shift register (LFSR) with selectable output and tap bits. Writing to the noise control register also initializes the LFSR to all ones. When the noise bit is set for any sound channel, the channel's output will invert when the LFSR output bit is one.
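
    A Python model of such an LFSR might look like this (the selectable bits are from the description above, but the exact feedback arrangement is my guess, not taken from the Verilog):

```python
def lfsr_step(state, tap_bit, out_bit, width=20):
    # Output the selected bit, then shift right with the feedback
    # (bit 0 XOR the selected tap bit) entering at the top.
    out = (state >> out_bit) & 1
    fb = (state ^ (state >> tap_bit)) & 1
    state = (state >> 1) | (fb << (width - 1))
    return state, out

state = (1 << 20) - 1  # writing the control register sets all ones
state, out = lfsr_step(state, tap_bit=3, out_bit=0)
```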

    The FPGA core uses 10 cycles to calculate everything for each channel, while five cycles overlap between adjacent channels. So there are 16 channels times 5 cycles = 80 cycles to output one sample. (The key-on/phase reset register can't be written more than once within 80 cycles.) Since the Storm FPGA board has a 50MHz clock, that's what I have run it at, although it doesn't need to be this high. 20MHz would be cool. Below that, envelopes might be too slow without further modification.

    To test out this project, I used two sigma-delta modulators and an RS-232 interface (contained in STSYNTH.V) to set the sound registers from my PC and output audio through the FPGA board's PS/2 port. The serial line is a bit slow for this purpose, hence my attempt to play some YM-2203 music doesn't keep time. Sounds like FM though, right?

    audio clip download

    2020 Jul 10 re: Making a (bootable) Saturn ROM cart

    I finally succeeded in getting the Saturn to boot from cartridge ROM with an ATF750 in place of the original PAL. In the end there really isn't much to it. A25 is high during access to the $2000000 address window (duh?) so enabling the ROM when A25 is high and the /AWR0 line is high was all it needed. A19-A24 could be ignored.

    The cartridge slot also has a ReaD signal on it, which is probably just as good for this purpose as using the WRite signal (with inverted logic) but it's not actually connected on this PCB so I haven't tested it.

    2020 Jun 23 DeHunk 0.96 release

    I encountered an Amiga executable with multiple data hunks. The old DeHunk only showed one of them. It also failed to disassemble an image at base address $000000. Hopefully these issues are now fixed, and the MOVEP instruction is now handled correctly as well.

    DeHunk 0.96
    2020 Jun 9 Audio test on the Cyclone 1

    I started making an assembler for Toadroar, but at the moment I have no plan to try making any complex programs due to the lack of RAM available (only ~6KB internal to the FPGA).

    What next then? I considered trying to generate a VGA signal. But with no VGA connector, this would have required making my own cable with several signals and a bunch of resistors to convert voltage levels. I'd rather do a project with less soldering. I also considered NTSC composite video, which would have fewer wires, but what's up with the sync signal? I never realized it was that complicated. Maybe some other time. I settled on sending audio out the PS/2 port instead.

    My first attempt at Sigma-Delta Modulation:

    This is supposed to be a 32-step sawtooth wave (carried by 'audleft', which is 16-bit signed PCM data). However, the signal is clipped, as the FPGA I/O pin seems incapable of driving the (filtered) signal much below 1.0V or above 2.3V, even at the default maximum drive strength of 24mA. The low-pass filter I made uses a 1K resistor and a .01uF ceramic capacitor, since I have a pile of these components. The values are probably not ideal. But lowering the amplitude of the waveform yields an output free of clipping:

    2020 May 24 Toadroar CPU test on the Cyclone 1

    Upon receiving the Storm_I board, I tried out a few obligatory 'hello world' type programs like counters that make the LEDs blink on and off. No problems there, but one thing I discovered is that the 50MHz clock is connected to a pin which doesn't seem to allow using the PLL. So for now I am stuck with 50MHz, though it may not be a problem. This is a -C8 speed grade (slower) Cyclone, and when doing a full compile of Toadroar in Quartus 9.1 (an old version is necessary for Cyclone 1 support) the fMAX estimate is right around 50MHz. But since I added the capability to redefine the size of a 'DWORD' in my source to anything from 17 to 32 bits, I can easily shrink the design down to use fewer LUTs and achieve higher speeds. At the moment I have an 18-bit CPU, with a 9x9 multiplier and 16x 18-bit registers, all implemented using 1325 4-LUTs. (Cyclone 1 doesn't have DSP blocks, and getting the register file to occupy the memory blocks under Quartus 9.1 is proving to be a challenge.)

    Click here to download the Verilog source. It's amazing what you can do with 500 lines of Verilog. Of course, this is still a barebones design and may still have bugs. Here is the test program I was able to run on the Storm_I board:

    case (iaddrbus)
    8'h00: idatabus = 16'h0009;         // ldi 9
    8'h01: idatabus = 16'hC902;         // mov r0,r2
    8'h02: idatabus = 16'hD200;         // shr 1,r0
    8'h03: idatabus = 16'hC925;         // mov r2,r5
    8'h04: idatabus = 16'h000F;         // ldi 15
    8'h05: idatabus = 16'h8410;         // movz [r0].b,r1      ; read keys
    8'h06: idatabus = 16'hAD12;         // xor r0,r1,r2        ; invert
    8'h07: idatabus = 16'hA832;         // and r3,r2
    8'h08: idatabus = 16'hC913;         // mov r1,r3           ; save old reading
    8'h09: idatabus = 16'hAC25;         // xor r2,r5           ; toggle LEDs on keypress
    8'h0A: idatabus = 16'h9450;         // mov r5,[r0].b
    8'h0B: idatabus = 16'hDBF7;         // bra -9
    
    default: idatabus = 16'b1100_1001_0000_0000;        // nop
    endcase

    It's just a case statement that exists in an outer shell module. The outer module (in stoad.v) feeds the status of keys 1-4 into the input data bus, and sets LEDs 1-4 according to the state of the output data bus buffers.

    The first four instructions of the test program don't do a whole lot, mainly they just cause some bits to have a known state so that when I run this in Icarus Verilog I don't get Xs everywhere. But the initial value of 9 can also be seen on the LEDs when the program is reset. What this program does is continuously read the state of the keys and update the state of the LEDs. When a key is pressed, the corresponding LED turns on or off.
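
    My reading of the loop body, modeled in Python (register roles per the listing; that the keys are active-low is an assumption on my part):

```python
MASK = 0xF

def scan_step(keys, old_keys, leds):
    # r2 = (r0 ^ r1) & r3: invert the reading, then keep only bits that
    # were high last time -> a one-shot pulse per new keypress
    pressed = (keys ^ MASK) & old_keys
    return keys, leds ^ pressed   # new r3 (saved reading), r5 (toggled LEDs)

r3, r5 = MASK, 0x9                   # keys idle, LEDs showing the initial 9
r3, r5 = scan_step(0b1110, r3, r5)   # key 0 goes low -> LED 0 toggles
r3, r5 = scan_step(0b1110, r3, r5)   # still held -> no further toggle
```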

    Before now, I was using Quartus 12.1 to compile this (targeting a Cyclone 4). I had problems initially getting the registers to occupy a memory block: it insisted that the address be present at the clock edge preceding the reading of data. So I took the address right out of the fetched instruction opcode. That caused the wrong register to be read for instructions that use R0 for one of the source operands, so I broke R0 off into a separate quantity and added logic to choose between reading from R0 or from another register during the decode stage. This succeeded in getting Quartus 12.1 to infer the memory; however, it still comes up with a warning about 'adding pass-through logic' even though I have my own pass-through logic and went so far as to specify the "no_rw_check" property. Quartus 9.1 complains about "asynchronous read logic" and does not infer the memory. Without the memory or DSP blocks, the number of 4-LUTs for a 32-bit CPU balloons from 1500 to 2300.

    2020 May 21 Storm_I_FPGA

    Since receiving shipments from China is proving to be problematic at the moment, I wasn't able to get the FPGA board that I originally wanted.

    I have found the selection of FPGA dev boards generally underwhelming: many are too expensive, and others are quite boring, having at best a pin header and an RS-232 port and no other I/O to play around with. The selection becomes all the worse when trying to find something in the western hemisphere.

    The best thing I was able to get my hands on turned out to be the Storm_I_FPGA which is nevertheless Chinese in origin and also unknown to web search engines. It has an EP1C3 and no memory except for a small 24C16. However it does have various switches and LEDs, a buzzer, and a few ports.

    I now have a "USB Blaster" JTAG interface which I needed for some other projects too.

    2020 Apr 28 ED7 hacking (SPOILER WARNING for anyone who cares)

    The Cutting Room Floor has pages up for Trails in the Sky (ED6) and Trails of Cold Steel (ED8) but no love for ED7 :( So I had the urge to do a little save file hacking of Zero no Kiseki PC version.

    The save files are 155,624 bytes long. The last two DWORDs are checksums. To recalculate, first add up all DWORDs (little endian) except those last two. This result will be the first of the checksums. Then subtract that number from 0, and subtract 38904 (decimal) to obtain the second and final checksum.
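
    In Python terms (my restatement; the sums are presumably modulo 2^32, since these are DWORDs):

```python
import struct

def recalc_checksums(save):
    # save is the full 155,624-byte file; the last 8 bytes are the checksums
    body = save[:-8]
    csum1 = sum(struct.unpack('<%dI' % (len(body) // 4), body)) & 0xFFFFFFFF
    csum2 = (0 - csum1 - 38904) & 0xFFFFFFFF
    return body + struct.pack('<II', csum1, csum2)
```

    Curiously, 38904 is exactly the number of DWORDs in the body (155,616 / 4), so the second checksum is equivalent to negating the sum of every body DWORD plus one.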

    Some offsets:

  • $1B2 - Yin's defense stat?
  • $33C - Yin's equipped armor?
  • $DB8 - beginning of inventory list
  • $19C10 - total play time?
  • $1AC04 - party members (4x 16-bit words followed by additional words for support members)
  • values that correspond to party members:

  • 0 = ロイド
  • 1 = エリィ
  • 2 = ティオ
  • 3 = ランディ
  • 4 = ワジ
  • 5 = イン
  • 6 = エステル
  • 7 = ヨシュア
  • 8 = ノエル
  • 9 = ダドリー
  • FF = empty slot

    2020 Apr 25 Making a Saturn ROM cart from the ST-Key

    Previously, I posted photos of the PCB from this Saturn cartridge. After that I added sockets and reinstalled the PAL. Then, armed with 10 blank 27C1024 EPROMs, I ran some test programs. Step 1 was to examine the old ST-Key code and compare it to the IP_GNU.BIN used on CDs. The first 3840 bytes looked the same or nearly the same in both, and it contained all of the Sega-related strings and had SH2 code immediately following it. So I took this 3840 bytes and used it as the header for my own ROM.

    Step 2 was to use fully position-independent code to setup a VDP2 screen and display the program counter address so I knew where I was. The ROM code does NOT run from the $2000000 cartridge space. Instead, part of it gets copied to $6002000 and run from there. After accounting for the header, my code was therefore starting at $6002F00.

    Step 3 was to probe the PAL and EPROM pins while reading from the $2000000 cartridge area. I saw that the PAL was driving /OE and /CS low all the time. The address bus also appears to hold the last value indefinitely. Access time is pretty slow at nearly 700ns, but this is configurable by software using the SCU A-bus control registers. Do I even need the PAL? I tried removing it and jumpering /OE and /CS low, but the Saturn failed to boot with that configuration.

    Step 4 was to make a test program that read each of 128x 256KB address ranges from the cartridge and observe /OE and /CS again. This is because the PAL has A19-A25 connected to it, therefore I suspected it was doing some address decoding. I actually didn't test A25 because at the time I was thinking that there was only 32MB total, but in fact it does have enough address lines for 64MB even though there isn't that much room in the address space of the Saturn (various other things are mapped into the $5000000-$6000000 range). In any case, the test showed that /OE and /CS were only low when all the high address bits were 0.

    The PAL also has 4x data bits connected to it and a WRite signal. It's probably possible for a write with a certain address/value to disable the ROM via the PAL. However I'm not really interested in testing this as my goal is just to find out what minimum of logic is needed to have a bootable cart. Next up I will have to try replacing the PAL with an ATF750 and see if I can get it to work with that doing the address decoding. Does it need to go all the way down to 256KB? Or would it be enough to just check A25?

    2020 Apr 21 NOWUT version 0.23 release

    Minor update with a couple of improvements and a couple of fixes.

    Check the documentation. Download the complete archive.

    And be sure to get Go Link from the Go Tools website.

    2020 Apr 19 preliminary spec for a load-store CPU: Toadroar

    While I haven't been able to do so much as a blinkin' LED hello-world program yet due to waiting on hardware, my CPU design has proceeded anyway. Toadroar (rhymes with load-store) is a pipelined design that uses operand-forwarding to execute any instruction in a single cycle except for branch, multiply, or memory read. It uses 16-bit fixed-length instructions but it can load a signed 15-bit integer into R0 with one instruction. Two loads in a row yield a 30-bit value. (Half the opcode space is devoted to this ability.) Hence, R0 is available as an optional third operand for address calculations and ALU instructions. General purpose registers range from R0 to R15 and there is a separate 31-bit PC.

    Several instructions modify a carry flag and a zero flag, including separate signed and unsigned CoMPares. The flags are used for conditional MOVes and branches. MOVes from memory have the option of zero-extension or sign-extension, and a two-bit field for operand size allows expansion beyond 32 bits. MOVes that write memory can pre-decrement or post-increment. Shifts can do up to 16 positions with one instruction.

    Currently I have code that compiles and can run a 10-line test program in a simulator. But there are still a few things missing like interrupts, saving the status register, division, or any memory controller. Using two small FPGA memory blocks for the register file and DSP units for multiplication, the rest of the design is hovering around 1500 4-LUTs.

    current opcode list:
    
    ldi imm               ; 0iii iiii iiii iiii        r0 is implicit
    
    ; operand1 , [operand2] , destination
    
    movz [rx+r0],ry       ; 1000 00ss yyyy xxxx        ss - 0=byte, 1=word, 2=dword
    movz [rx],ry          ; 1000 01ss yyyy xxxx
    movs [rx+r0],ry       ; 1000 10ss yyyy xxxx
    movs [rx],ry          ; 1000 11ss yyyy xxxx
    mov ry,[rx+r0]        ; 1001 00ss yyyy xxxx
    mov ry,[rx]           ; 1001 01ss yyyy xxxx
    mov ry,[rx+]          ; 1001 10ss yyyy xxxx        post-increment
    mov ry,[rx-]          ; 1001 11ss yyyy xxxx        pre-decrement
    
    add ry,rx             ; 1010 0000 yyyy xxxx
    add r0,ry,rx          ; 1010 0001 yyyy xxxx
    adc ry,rx             ; 1010 0010 yyyy xxxx
    adc r0,ry,rx          ; 1010 0011 yyyy xxxx
    rsb ry,rx             ; 1010 0100 yyyy xxxx        rx=ry-rx (reverse subtraction)
    rsb r0,ry,rx          ; 1010 0101 yyyy xxxx        rx=ry-r0
    and ry,rx             ; 1010 1000 yyyy xxxx
    and r0,ry,rx          ; 1010 1001 yyyy xxxx
    or ry,rx              ; 1010 1010 yyyy xxxx
    or r0,ry,rx           ; 1010 1011 yyyy xxxx
    xor ry,rx             ; 1010 1100 yyyy xxxx
    xor r0,ry,rx          ; 1010 1101 yyyy xxxx
    
    mul rx,ry             ; 1011 0000 yyyy xxxx        16*16->32 or 32*32->32 or something in between?
    mul r0,rx,ry          ; 1011 0001 yyyy xxxx
    sub rx,ry             ; 1011 0100 yyyy xxxx
    sub r0,rx,ry          ; 1011 0101 yyyy xxxx
    sbb rx,ry             ; 1011 0110 yyyy xxxx        ry=ry-rx
    sbb r0,rx,ry          ; 1011 0111 yyyy xxxx        ry=r0-rx
    
    cmpu ry,rx            ; 1100 0000 yyyy xxxx
    cmps ry,rx            ; 1100 0001 yyyy xxxx
    swapb ry,rx           ; 1100 0010 yyyy xxxx
    swapw ry,rx           ; 1100 0110 yyyy xxxx
    mov ry,rx             ; 1100 1001 yyyy xxxx
    nop                   ; 1100 1001 ???? ????        move a register to itself = NOP
    tst ry,rx             ; 1100 1000 yyyy xxxx
    movcc ry,rx           ; 1100 0011 yyyy xxxx        \
    movcs ry,rx           ; 1100 0111 yyyy xxxx         \ conditional MOVes
    movne ry,rx           ; 1100 1011 yyyy xxxx         /
    moveq ry,rx           ; 1100 1111 yyyy xxxx        /
    
    shl n,rx              ; 1101 0000 nnnn xxxx        shift n+1 times
    shr n,rx              ; 1101 0010 nnnn xxxx
    sar n,rx              ; 1101 0011 nnnn xxxx
    rol rx                ; 1101 0001 0000 xxxx
    ror rx                ; 1101 0001 0001 xxxx
    extub rx              ; 1101 0100 0000 xxxx        zero-extend byte
    extsb rx              ; 1101 0100 0001 xxxx        sign-extend byte
    add n,rx              ; 1101 0110 nnnn xxxx        add 1-16
    sub n,rx              ; 1101 0111 nnnn xxxx        sub 1-16
    bra disp10            ; 1101 10dd dddd dddd
    jmp rx                ; 1101 1100 0000 xxxx
    bra rx                ; 1101 1100 0001 xxxx
    jsr rx                ; 1101 1100 0010 xxxx
    bsr rx                ; 1101 1100 0011 xxxx
    
    bcc disp10            ; 1111 00dd dddd dddd
    bcs disp10            ; 1111 01dd dddd dddd
    bne disp10            ; 1111 10dd dddd dddd
    beq disp10            ; 1111 11dd dddd dddd
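For experimentation, here is a rough sketch of how a few of these opcode groups decode. The field layout (ss size bits, yyyy/xxxx register fields) is read straight off the table above, but this only covers a subset, ignores the size bits on loads, and is illustrative rather than a reference model of the core.

```python
# Hypothetical decoder for a few of the 16-bit opcode groups listed above.
# Mnemonic/sub-opcode assignments are read off the table; unhandled
# encodings return "?".

ALU_OPS = {0x0: "add", 0x2: "adc", 0x4: "rsb",
           0x8: "and", 0xA: "or",  0xC: "xor"}

def decode(word):
    """Return an assembly string for a 16-bit opcode word (subset only)."""
    rx = word & 0xF
    ry = (word >> 4) & 0xF
    group = word >> 12

    if group == 0b1001:                      # loads: mov ry,[rx...]
        mode = (word >> 10) & 3              # bits 11-10 select the mode
        # note: the ss size bits (bits 9-8) are ignored in this sketch
        addr = ["[r{0}+r0]", "[r{0}]", "[r{0}+]", "[r{0}-]"][mode]
        return "mov r%d,%s" % (ry, addr.format(rx))

    if group == 0b1010:                      # two/three-operand ALU ops
        sub = (word >> 8) & 0xF
        op = ALU_OPS.get(sub & ~1)
        if op is None:
            return "?"
        if sub & 1:                          # odd sub-opcode = r0 form
            return "%s r0,r%d,r%d" % (op, ry, rx)
        return "%s r%d,r%d" % (op, ry, rx)

    if group == 0b1111:                      # conditional branches, disp10
        cond = ["bcc", "bcs", "bne", "beq"][(word >> 10) & 3]
        return "%s %d" % (cond, word & 0x3FF)

    return "?"                               # everything else: not handled
```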
    
    2020 Apr 6 Fun with Verilog

    A couple of months ago I wanted to figure out how to use CPLDs to simplify some potential retro-computing hardware projects. This led to experimentation with WinCUPL and some ATF750/ATF1504 parts. What can you do with 20 or 64 flip-flops and 22 or 32 pins? Plenty.

    But this line of thought inevitably leads to wondering what can be done with even more chip resources. How about a Tiny CPU for instance?

    What if you had thousands or a million flip-flops like modern FPGAs do?

    I've been aware of various retro-computing related FPGA projects like One Chip MSX and Minimig. The possibilities are nearly endless. So I had to see if I could do this too.

    I knew that old versions of Quartus II Web-edition would work in Windows 2000. I downloaded one back in 2003 or so, but never did much with it at the time. With the Win2000 Extended Kernel, version 12.1sp1 will run. I did a few test compiles of projects that I found online.

    After a brief comparison of VHDL and Verilog I determined that the former looks more like the C programming language, which is the bane of my existence. Therefore I jumped into Verilog. (Icarus Verilog 0.9.7 works in Win2000, although GTKWave appears to get stuck in an infinite loop.)

    While waiting for a cheap FPGA dev board and a replacement JTAG interface (the first one I ordered was defective), I set out to create my first project. I decided to design a CPU. Go big or go home!

    I mocked up an instruction set first, then started writing code, revisiting the opcodes here and there to make changes. Once I had written enough that most of the functionality seemed to be there, and after working through various compiler errors, I achieved a design that used 0 LUTs. Much clever, so efficiency ;)

    There were some logic errors causing it to remain in an eternal pipeline stall / reset state, so the compiler had helpfully optimized away that part of the circuit (all of it). After fixing this and getting a real compile, I was at 2000-2200 LUTs. Yay! Kind of bloated though, isn't it?

    My original plan had been to use a register file that allowed two reads and one write per cycle, but towards the end I realized there was an instruction that required three reads. So I said what the heck, and put it in. But after I succeeded at compiling, I went back and took out this naughty instruction. This resulted in Quartus moving registers to memory blocks and cut the number of LUTs in half. Whoa.

    This design doesn't actually work yet; there are still big problems that I know about and more that I don't. And I have no hardware to run it on. So it remains a work in progress, and I'm not entirely sure what I even need a custom CPU in an FPGA for in the first place.

    2020 Apr 6 Windows 2000 registry

    The Windows registry is kind of a disaster and probably not one of my favorite things about Windows. A lot of it falls under the "ignorance is bliss" category but there are also some important settings which are otherwise inaccessible and benefit from being tweaked. Meanwhile, far too many programs pollute the registry with useless entries that don't need to be there. Or if they are particularly bad (like the Yandex installer) they might go through and change/create an association for every file extension known to man without even asking :/

    A bloated registry wastes RAM and CPU cycles. (Windows 2000 can also fail to boot if the SYSTEM hive becomes too big!) A corrupt registry, or one with incorrect settings, can likewise cause no-boot or other problems. Given these hazards, it can be useful to back up the registry, edit it from a separate OS, or move data between registries.

    Backing up or restoring the registry: the files are mostly located in SYSTEM32\CONFIG. The "user" portion is contained in NTUSER.DAT under one of the DOCUME~1 subdirectories. Of course, while the OS is running it doesn't let you copy these files, so you either need a third-party program to work around that or the ability to boot another OS that can access the relevant partition. (They are all short filenames, so use FAT32 and then you can boot Win98 DOS or FreeDOS on the same partition. Just a suggestion.)

    Windows 2000 has two registry editors: REGEDIT and REGEDT32. The latter can open arbitrary registry files that you may have backed up or copied from another computer. Use the "load hive" option to do this. You have to provide a name; don't use one of the normal names like "HARDWARE" or some conflict might occur (if the program even allows it, I'm not inclined to try). I also recommend unloading the hive as soon as you're done.

    What if you need to copy registry keys between your running system and another registry file? REGEDIT has the ability to import data from a text file or export it. REGEDT32 doesn't. However, if you run REGEDT32 first and use "load hive" then you can run REGEDIT and it will be able to access the additional tree.

    I used this capability to compare installed fonts between two computers and export "before" and "after" copies of the font keys as I added fonts from one to the other.
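The export format is plain text, so "before" and "after" files diff easily. A trimmed example of what REGEDIT writes out (the entries shown are just illustrations):

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Fonts]
"Arial (TrueType)"="arial.ttf"
"Courier New (TrueType)"="cour.ttf"
```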

    2020 Mar 7 MBFAST 0.93 release and re: SB600 SATA driver

    Here is a minor update to my 3D model editor written in FreeBASIC. It's still primitive as far as 3D applications go, but might be the only program with a DOS build that can open .STL files, so there's that... MBFAST 0.93 (Win32 build is also included of course)

    I think the SB600 SATA driver that I mentioned as potentially useful for Windows 2000 causes a blue screen on shutdown sometimes. Maybe not worth the trouble.

    2020 Feb 25 NOWUT version 0.22 release, and DeHunk 0.95

    My compiler was taking a long time to compile itself on old stuff. Three minutes per pass on a 68000, and double that on a 286 that has to fart around with segment registers all the time. I wasn't really impressed by this so I put some more time into optimizing the generated code. This shrunk the code size a bit but produced no speed gains. Then I split the symbol list into two parts to reduce the search space. This cut the time by 25% or so, which is something, but I'll continue to look for improvements here.

    The main motivation for this release though is improvements to the 68000 assembler. Missing addressing modes have finally been added so that MULTINO can be used to re-assemble the output of my new 68000 disassembler, DeHunk. DeHunk automatically labels branch targets and data references. It can disassemble itself, and the output, when fed directly into MULTINO, produces a working duplicate of DeHunk again. It supports 68000 and 68010 instructions and is able to distinguish data areas that are mixed in with code (though this can be disabled if it fails for some reason).

    In addition to Amiga Hunks, it now supports Human68K executables (even though I have never felt the need to reverse engineer one, since I have way better documentation for X68 than I do for Amiga), and ROM images or miscellaneous 68000 binaries (but these can't be reassembled yet).

    NOWUT 0.22 DeHunk 0.95
    2020 Feb 4 Sega Saturn cartridge "ST-Key"

    The small IC was a PAL, the big one an Atmel 27C1024

    2020 Feb 3 ATI/AMD SB600 SATA driver in Windows 2000

    I rarely use optical media, but recently I burned a DVD+R and noticed that drive performance was at a lowly 1.6MB/s. Not only during writing, but during verify as well. It also caused high CPU usage on one core. I can't remember if it has always been like this (?)

    Sounds like it must be stuck in PIO mode, right? That was a common problem on old PCI IDE controllers. Except this drive is connected via SATA, so there are no PIO/DMA settings in the BIOS or Windows device manager to mess with. The main HDD was also connected to the motherboard SATA and running fine, so the cause of the DVD slowness was not clear.

    The IDE interface was listed in device manager as "ATI IDE Controller" with a date of 2006/01/22, although it used the stock Microsoft driver files. Meanwhile the SATA interface was listed as "Standard Dual Channel PCI IDE Controller" with a 2003 Microsoft driver.

    I went looking around for a different driver and the best thing I could find was the 13-4_xp32-64_raid.exe from AMD's website. It doesn't specifically mention SB600 on the site (although if you pick 690 chipset it will lead to this file, and 690 was always paired with SB600 according to wikipedia) but it does have an SB600 or SB6xx directory buried in its contents. The .INF file contained various device IDs that matched my vendor and device but not the subsystem. Mine is VEN_1002&DEV_4380&SUBSYS_43801002. I added this ID to the .INF and gave it a try.
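For reference, the change amounts to one extra hardware-ID line in the .INF models section. The section and install-section names below are illustrative (from memory); only the hardware ID string is the real one from my board:

```
; added to the models section of the SB6xx .INF:
%AhciDesc% = ahcix86_Inst, PCI\VEN_1002&DEV_4380&SUBSYS_43801002
```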

    After rebooting, "Standard Dual Channel PCI IDE Controller" is gone and "AMD AHCI Compatible RAID Controller" has appeared. And the DVD drive can hit 10+MB/s like it should.

    2020 Jan 23 Miscellaneous

    Here is an update to ISC. It can load and save config now, play up to four notes simultaneously, and has two window size options.

    re: 2019/11/19 BIOS hacking - I tested an Athlon II X2 280 and it is fine for single power plane motherboards.

    re: 2019/6/1 Windows 2000 - It appears that my recommended sequence for installing Win2k updates/extensions can fail with the latest versions of Extended Kernel. An older one (eg. V28?) may have to be installed first for the newer one to succeed.

    2020 Jan 12 ISC - ImaginarySoundChip

    After reading more about FM synthesizer chips, I realized that the most famous ones like Yamaha OPM, OPNA, and OPN2 were actually lacking some features that could have produced interesting results in a more advanced chip. Of course, PCM sample-based synthesis largely took over after these FM synths ran their course and there was little attention paid to pushing the boundaries of parameter-based synthesis after that.

    Comparison of some sound generators:
  • SN76489 - has square waves and a noise generator
  • YM2149 - has square waves, noise generator, and an envelope generator
  • Famicom sound - has square waves with multiple duty cycles, triangle wave, and noise generator
  • Konami SCC - has user-defined waveforms
  • Hu6280 - has user-defined waveforms, noise generator, stereo output with variable panning, and one channel can act as an LFO
  • YM2151 (OPM) - 4x sine wave operators per channel, ADSR envelope, 8 algorithms, integer frequency ratios (or one half) and detune, 1x LFO with per-channel sensitivity and per-operator AM enable, output to left/right or both, noise generator on channel 8
  • YM2203/YM2612 (OPN/OPN2) - similar capabilities to YM2151 except one channel can use arbitrary frequency ratios, and there is an additional "SSG-type" envelope
  • OPL3 - has 8 waveforms instead of just sine waves, but lacks several algorithms from OPM/OPN, and has fewer bits to configure envelope/LFO
  • The Yamaha DX100 synthesizer (keyboard) has similar capabilities to OPM/OPN but with additional frequency ratios. The DX11 also has 8 waveforms.
  • And let us not forget the SID chip with its tunable analog filters, etc. (I would explain it better except I did forget)
  • The Sega Custom Sound Processor has 32 independently-controlled operators with PCM playback, 32 LFOs, envelope generators, panning, a DSP, and its own memory bus.

    I decided to make a program to experiment with adding different features to a 4-op FM synth. So far I have tried to emulate the base features of an OPM/OPN (no doubt there are some inaccuracies) and also put in 10 waveforms and a lame noise generator. It uses the mouse to adjust settings and keyboard to play notes (only one at a time for now).
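The core of all these 4-op designs is simple phase modulation: the output of one sine operator is bent into the phase of the next, and the "algorithms" are just different ways of chaining four of them. A minimal 2-operator sketch (ratio and modulation index here are arbitrary illustration values; real OPM/OPN hardware uses log-sine lookup tables, envelope generators, and fixed-point phase accumulators):

```python
import math

# One modulator feeding one carrier: the simplest FM "algorithm".
def fm_sample(t, fc, ratio=2.0, index=1.5):
    mod = math.sin(2.0 * math.pi * fc * ratio * t)         # modulator operator
    return math.sin(2.0 * math.pi * fc * t + index * mod)  # carrier operator
```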

    download it here

    2019 Nov 23 NOWUT version 0.21 release

    Fixed a few bugs, cleaned up some code, and added logic to optimize the generated code a little more. With the optimizations canceling out the added code, the compiler size remains approximately the same. 8086 version still weighs in just below the 64KB limit :)

    Check the documentation. Download the complete archive.

    And be sure to get Go Link from Go Tools website.

    2019 Nov 19 pro-tip

    I saw a few posts around where people had problems with audio not working on Windows 2000 after installing Extended Kernel and/or Unofficial Service Pack 5.1. My advice is to check the date/version of SYSTEM32\WDMAUD.DRV because it may get replaced by an older version (the Realtek/etc. audio drivers should have the correct version) and then the sysaudio service can't start.

    2019 Nov 19 BIOS hacking

    So in 2008 I got this motherboard, a BIOSTAR A770-A2+. At the time I also got an Athlon X2 4850e for $50 which was plenty fast for anything I could throw at it, and a nice improvement over my previous Athlon XP 2500+. But the board was advertised as socket AM2+ which meant it could be upgraded further in future. Except the official CPU support list never grew beyond 65nm Phenoms and BIOS updates never came.

    Eventually I became curious. If a Phenom works, then why not an Athlon X2 7xxx? These promised to be somewhat faster than the 4850e, but like the Phenom they had much higher power consumption which I wasn't crazy about. But as 2015 rolled around, the prices on these old chips had dropped enough that I didn't mind buying a few just to experiment with.

    First I tried an Athlon 7850 BE (2.8GHz, with 2MB L3). This worked with no problems. I see no reason why it would not have been added to the motherboard manual except for laziness. But as previously mentioned, the power consumption of this CPU is not the greatest, whereas the 45nm CPUs could potentially offer much more efficiency. So next I tried an Athlon II X2 B26 (3.2GHz, 1MB L2 per core, no L3). It came up as "AMD Processor Model Unknown" but it worked! I did notice that HyperTransport speed would only go up to 1.6GHz which was lower than what I thought it should be (and also slightly lower than with the Athlon 7850), but it performed well enough anyway. My curiosity was satisfied for a time.

    Later someone gave me a Phenom II X2 560 BE (3.3GHz, 6MB L3). This one has L3 and an unlocked multiplier so it might be faster, I thought. I tried it out and the system booted at only 800MHz. By this time I had already run across the AMD BIOS and Kernel Developer's Guide, and had begun writing a DOS program to tweak CPU speeds and voltages. (As of now I never finished the utility for K10 / family 10h CPUs, but utilities for K7/K8 and some Intel CPUs are elsewhere on my site.) So I looked at the P-State register contents and saw that everything was flagged disabled except P0 which was 800MHz. I was able to raise the speed by using MSR $C0010070 COFVID Control. Then I did the same thing under Windows using the DirectNT driver and ran some benchmarks. Results showed that the Phenom X2 was up to 16% faster than the Athlon II at the same clock speed. Not a huge difference, and maybe not worth trying to deal with weirdness from the BIOS.

    But why did one CPU have problems that the other one didn't? Because of the L3? I tried an Athlon X4 at 3.2GHz and that had no issues. Then I tried a Phenom II 945 (3.0GHz), no probs there either. I figured the BIOS couldn't deal with clock speeds above 3.2GHz.

    More time passed and I decided to revisit this question. Searching online, I learned about a utility called CBROM which can extract or insert modules in a Phoenix/Award BIOS. One of the modules contained in the BIOS is AGESACPU.ROM, which apparently comes from AMD and is related to CPU support. Also contained in the BIOS are microcode updates for some of the processors. An older version of CBROM, 1.55, can insert modules and update the BIOS image's checksum (which seems to be located at $EFFFE). This can change the order of modules though, which can result in a non-working BIOS, because a few modules (ie. the uncompressed MEMINIT, HT.DLL, HT32GATE) apparently have entry points that are hard-coded somewhere else. Either that or they run from L2 cache and need to be aligned so as not to be ejected as L2 victims (there is a note about this in the manual). There is a newer CBROM 1.98, which will attempt to update the entry points for the aforementioned modules when it inserts a module, but this always seems to result in a non-booting BIOS. Therefore, it's important to use CBROM 1.55 to add modules, as well as employing other measures to avoid changing the location of those modules (to be explained below).

    IMPORTANT: because of the high likelihood of bricking your motherboard with a modified BIOS, it's wise to have some external means of flashing the chip to recover it. I bought a CH341a-based USB device which can flash SPI/I2C chips using the AsProgrammer software. I had to reflash my BIOS several times (a Winbond 25X80AV).

    What is a module anyway? My BIOS image is 1MB. The first 128KB contains fixed, uncompressed code/data. The last 128KB does also. In between is where modules are located, one after another, and they are essentially LHA a.k.a. LZH compressed archives. You can see the string -lh5- in the header. Modules which are uncompressed show -lh0- instead. The LZH file header spec can be found on the web.
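A quick way to eyeball the module layout is to scan for those method strings. A minimal sketch, assuming the standard LZH level-0 layout where the "-lhN-" marker sits 2 bytes into the header (after the size and checksum bytes); real tools like CBROM parse the full header:

```python
import re

# List (offset, method) pairs for every module header found in a BIOS image.
def find_modules(image):
    return [(m.start() - 2, m.group().decode())
            for m in re.finditer(b"-lh[05]-", image)]
```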

    The first question becomes, how can we replace an existing module with one of a different size without causing MEMINIT to move? A couple of forum posts on the web described a strategy of using a "dummy" module that has the same size as the one you want to replace. My initial attempts at doing this failed (possibly for a different reason) and so I came up with something else. Both methods are dependent on having enough free space in the BIOS image to contain extra modules, but mine is nearly half empty... What I found is that the filename of the module is not important; rather, the "Item-Name" is what determines which modules fulfill which purpose. And that item-name is associated with a word value hidden in the date field of the LZH header.

    known values for item names
    02 40 = splash screen
    03 40 = ACPI
    0E 40 = ygroup
    0F 40 = MIB
    13 40 = OEM1
    14 40 = OEM2
    
    5D 40 = BIOSF0
    67 40 = GV3
    69 40 = MINIT
    77 40 = OEM4
    7A 40 = HTINIT
    7C 40 = 2 PE32 in MB (?)
    7F 40 = xgroup
    86 40 = PCI ROM A
    87 40 = PCI ROM B
    B5 40 = SETUP0
    B7 40 = SMI32
    00 50 = BIOS

    My strategy was to find the old AGESACPU.ROM that I wanted to replace and change its module type to an unused one such as OEM2. Changing that byte necessitates changing the header checksum byte as well (the one before the first hyphen). Then I used CBROM 1.55 to add a new version of AGESACPU.ROM.
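A sketch of that retyping step, assuming a standard LZH level-0 header (size byte at offset 0, checksum byte at offset 1 covering the header body, "-lh5-" at offset 2, and the item-name word in the date field at offset 15; the exact placement within the 4-byte date field is my assumption, not verified against a real BIOS image):

```python
# Change a module's Item-Name word and fix up the level-0 header checksum.
def retype_module(image, mod_off, new_type):
    buf = bytearray(image)
    # the type word lives in the DOS date/time field at offset 15 (assumed)
    buf[mod_off + 15] = new_type & 0xFF
    buf[mod_off + 16] = (new_type >> 8) & 0xFF
    # level-0 checksum (byte 1) = sum of the header body, i.e. header_size
    # bytes starting at offset 2, modulo 256
    hdr_size = buf[mod_off]
    buf[mod_off + 1] = sum(buf[mod_off + 2 : mod_off + 2 + hdr_size]) & 0xFF
    return bytes(buf)
```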

    There are many versions of AGESA. The first one I tried to add was 3.7.x and it did not work. But 3.3.x and 3.5.x do work, and correctly identify the later CPUs. However the issue of booting at 800MHz is not solved.

    I also replaced a chunk of microcode for an old 90nm CPU that existed in my BIOS with a Phenom II microcode patch obtained from a newer BIOS (from a different board). The way to find the microcode is by looking for an NCPUCODE string toward the end of the ROM image, and then looking at the data BEFORE that. It's a series of 2KB chunks. They have CPU ID numbers like $1043 (Phenom II rev C) which correspond to bytes 0 and 2 of the model/family info retrieved by the CPUID instruction. CBROM can display the microcode info, although it seems to have problems with BIOSs that are too new or have too many microcodes.
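A hedged sketch of that scan: walk backwards from the NCPUCODE marker in 2KB steps and read each chunk's CPU ID. The offset of the ID word within the chunk ($18, matching the AMD microcode patch header layout used by other loaders) is my assumption, not something taken from CBROM.

```python
# List the CPU revision IDs of the microcode chunks preceding NCPUCODE.
def list_microcode(image):
    end = image.find(b"NCPUCODE")
    if end < 0:
        return []
    ids = []
    off = end - 2048                         # chunks are 2KB each
    while off >= 0:
        rev = image[off + 0x18] | (image[off + 0x19] << 8)
        if rev == 0 or rev == 0xFFFF:        # ran past the patch area
            break
        ids.append(rev)
        off -= 2048
    return ids                               # e.g. $1043 = Phenom II rev C
```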

    I disassembled some of the AGESA code and part of the BIOS code in the last 128KB to try and figure out what is going on. (I'll note that it looks like hand-coded assembly although not the tidiest. It has likely been created and revised by many people.)

    There are three different sets of registers used to probe and configure the Phenoms. One set is readable using the CPUID instruction. One is MSRs. And finally there are some which are accessed through PCI configuration port I/O at $CF8 and $CFC. This is all explained in AMD's manual.

    I spent a lot of time searching for code that sets up P-State values but didn't find it. I'm not sure that these values are even determined by the BIOS; rather, they may be defaults that exist in the CPU and are loaded automatically at power-on. (I did find an AGESA routine that calculates CPU power requirements. And there was a bug in the part that translated VIDs into numeric voltages: for the low voltage range, which is never used anyway, it was subtracting .125V per step from a base of .7625V, whereas it should have been subtracting .0125V.)

    Although I can set the Phenom X2 560 to any clock speed I want after booting, the time stamp counter runs at 800MHz and this seems to cause timing issues in some games and wrong values being reported in diagnostic software. At power-on the TSC runs at the same frequency as the CPU-northbridge, but toggling a bit in MSR $C0010015 locks it to whatever speed the CPU is running at that time, hence 800MHz. I found a routine in AGESA that toggles that bit and I inserted code to fix the P-States first. This did result in the system booting at 3.3GHz, but didn't fix the TSC. I searched the BIOS code also, and found another routine that hit the TSC control bit. But my attempt to insert code into the BIOS failed, as it broke something else and resulted in crashing during POST. I had also tried patching the BIOS (the last 128KB section that is) to set the CPU-northbridge to 1800 or 2000MHz instead of 1600 but this failed too.

    It seems that basically there were two elements distinguishing socket AM2+ from AM2. One was faster HyperTransport, which this board has, and the other is dual power planes, which this board doesn't. I'm not sure what the importance of the dual power planes is, but there is a CPU register which tells the BIOS what CPU-northbridge speed and voltage to use in single plane systems, and this is where the 1600MHz setting comes from. But there is also a bit which flags CPUs that aren't meant to run with single plane at all, and the Phenom II 560 has this bit set, which I was not expecting. As such, my theory is that the CPU can detect it is in a single plane board and it then loads P-State 0 with an 800MHz setting automatically. I searched for a routine in the BIOS that checked the dual-plane-only bit and found none. Which makes sense, considering that the bit only appeared on 45nm Phenoms and my BIOS came out months before they did.

    It still seems possible that a BIOS mod could fix the pstates and allow an unsupported CPU to run with correct TSC frequency. However there is something else which may complicate matters that I hadn't considered. I believe every core of a multicore processor begins executing the BIOS code at power-on, and then the BIOS determines which core is the "Boot Strap Processor" by checking APIC IDs. Most likely the P-State fixing code would have to execute on all cores because they are capable of running at different speeds (I don't know if TSCs are separate as well).

    On the plus side, I saw this code in the BIOS:

    shr dx,1      ; low bit of DX falls into the carry flag...
    rcr ax,1      ; ...and is rotated into the top bit of AX
    shr dx,1      ; i.e. a 1-bit right shift of the 32-bit value
    rcr ax,1      ; in DX:AX, done twice

    And I said "wait... does that do what they think it does???"

    I had mistakenly believed that x86 shr/sar did not affect the carry flag. But after seeing this and checking some other documentation, I now know that they do. (And I fixed some stuff in NOWUT)
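To spell out what the snippet computes: each shr/rcr pair is a 1-bit right shift of the 32-bit value held in the DX:AX register pair, so the four instructions together are an unsigned divide by 4. A quick model (hypothetical helper, just mirroring the four instructions):

```python
# Model of the BIOS snippet: shift the 32-bit value in DX:AX right twice.
def shr2_dxax(dx, ax):
    val = ((dx << 16) | ax) >> 2                 # shr/rcr pair, applied twice
    return (val >> 16) & 0xFFFF, val & 0xFFFF    # new (dx, ax)
```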

    2019 Sep 07 updated .MOD player(s)

    three mod players: for Win32 command line, DOS (486+Sound Blaster), and 32X (a ROM image). Also included is NOWUT source, a guide to the .MOD file format, and one music track.

    2019 Aug 13 NOWUT version 0.20 release

    Many changes. MULTINO and LINKBIN can be built and run on Amiga, X68000, and 16-bit DOS, in addition to Win32. (Note that PE files still can't be linked without GoLink.) The compiler can run two passes for the purpose of optimizing some jumps/branches to short versions.

    A new example program, DISTG, can also be compiled for several platforms by linking to the same platform I/O modules. This is a simple Hu6280 disassembler.

    Check the documentation. Download the complete archive.

    And be sure to get Go Link from Go Tools website.

    2019 Jul 20 updated .MOD player

    a mod player for Win32 command line, now with bug fixes and improved audio (contains source and executable)

    2019 Jun 24 False Alarm...

    Boo :(

    Never mind! ^_^

    2019 Jun 1 SH2 benchmarking on 32X and Saturn

    from my post on sega16 forums:

    2019 Jun 1 Fooling with KiCad 4.0.7 on Win2K

    So far I've made a schematic and begun laying out a board. The UI is a bit alien and takes some getting used to. And then there's the mind-boggling "libraries" system... if you need to add a component or edit a footprint, be sure to RTFM and follow the instructions step-by-step, otherwise it will not make any sense. But after overcoming those hurdles I am actually getting stuff done (whether my circuit works is a different question).

    2019 Jun 1 a few more things about Win2K

    Some recent game releases on GOG are including a GOG Galaxy DLL file (with the non-Galaxy version of the game) which calls the kernel32 function QueryFullProcessImageName. This function doesn't exist in Win2K and will cause the game to fail with an error. Older versions of BWC's Extended Kernel did not add this function or had it disabled by default, but as of version 3.0b of Extended Kernel this function is now available. So even more games will run!

    Installing the Extended Kernel has some prerequisites, and along with all the other updates available, the process of bringing a fresh OS install up to speed can be tricky. After some trial and error, I've determined that this method should work:

  • begin with Win2K SP4
  • install unofficial service pack 5.1 (USP51.ZIP) (this contains some IE updates that are needed later, even if you have no intention of using IE)
  • install Windows2000-UpdateRollup2-x86-ENU.exe (or your language version if you can find it)
  • install IE6.0SP1-KB2722913-WINDOWS2000-X86-ENU.EXE
  • now Extended Kernel can be installed

    directx:

  • install the February 2010 release of DirectX 9
  • get the June 2010 (final) release of DirectX 9. This won't install but you can extract the files and manually add ones that are missing.
  • add D3DX9_43.DLL, D3DX10_43.DLL, D3DX11_43.DLL, D3DCompiler_43.dll, d3dcsx_43.dll, XAPOFX1_5.dll
  • xinput DLLs (can't remember where these are found, but they can be added to a game's own directory)
  • add the following files and execute the commands to register them (whatever that means)
    regsvr32 xactengine2_0.dll
    regsvr32 xactengine2_1.dll
    regsvr32 xactengine2_10.dll
    regsvr32 xactengine2_2.dll
    regsvr32 xactengine2_3.dll
    regsvr32 xactengine2_4.dll
    regsvr32 xactengine2_5.dll
    regsvr32 xactengine2_6.dll
    regsvr32 xactengine2_7.dll
    regsvr32 xactengine2_8.dll
    regsvr32 xactengine2_9.dll
    regsvr32 xactengine3_0.dll
    regsvr32 xactengine3_1.dll
    regsvr32 xactengine3_2.dll
    regsvr32 xactengine3_7.dll
    regsvr32 XAudio2_0.dll
    regsvr32 XAudio2_1.dll
    regsvr32 XAudio2_2.dll
    regsvr32 XAudio2_3.dll
    regsvr32 XAudio2_4.dll
    regsvr32 XAudio2_5.dll
    regsvr32 XAudio2_6.dll
    regsvr32 XAudio2_7.dll

    nVidia drivers:

  • the last official release appears to be 182.46_quadro_winxp2k_english_whql.exe which supports some rebadged Quadro versions of early DX10-level GeForce cards, eg. 8400GS, 9500GT, and even GTX 280
  • once kernelex is installed it becomes possible to use WinXP drivers up to 258.96 which supports GTX 480, GTX 460, and previous generation cards.
  • BWC's website has a modified version of 310.xx which supports GeForce 6xx series cards. I couldn't get the nVidia control panel to work with this release (which makes it very difficult to use multiple monitors), but the stability and compatibility with newer games appear to be superior.

    Radeon drivers: BWC's site has several versions of Radeon video drivers that work with Win2K. I use Catalyst 11.7 because the later versions are buggy on my system. This supports most of the Radeon HD 6000 cards and earlier. It includes OpenGL 4.1 support. Catalyst control center works as long as dotnetfx is installed. Unlike with the nVidia drivers, hardware-accelerated video playback does not work (but 1080p should be no problem for VLC 2.08 with a dual-core anyway)

    other stuff to install:

  • dotnetfx - there is a netfx4w2krc3.exe on BWC's site although when I ran this installer it got stuck in an infinite loop so I'm not sure if it was entirely successful. IIRC, before that I had NETFx20SP2_x86.exe installed. (update: there is also a NetFX35W2KEX.exe)
  • video/audio codecs...
  • MyPal, Opera 12.18, and/or otter-browser...
  • PPSSPP version 1.3, Avidemux 2.6, Foxit Reader 4.x or 5.4...
  • MSVC redistributables... the latest ones may not install on Win2K but you can manually add the files.
    07/12/2009  12:02 AM           569,664 msvcp90.dll
    07/12/2009  12:02 AM           653,120 msvcr90.dll
    07/12/2009  12:05 AM           225,280 msvcm90.dll
    03/18/2010  01:16 PM           771,424 msvcr100_clr0400.dll
    12/13/2011  06:39 PM           768,848 msvcr100.dll
    12/13/2011  06:39 PM           421,200 msvcp100.dll
    02/07/2012  07:12 PM            56,832 msvcirt.dll
    04/19/2013  02:06 PM           875,472 msvcr110.dll
    04/19/2013  02:06 PM           535,008 msvcp110.dll
    04/22/2015  05:18 PM           353,360 msvcrt.dll
    03/23/2018  01:09 PM           457,512 msvcp140.dll
    07/12/2018  02:16 AM           970,912 msvcr120.dll
    07/12/2018  02:16 AM           455,328 msvcp120.dll
    2019 May 15 IMGTOOL version 0.95 release

    Several changes to the auto-palettizer, and some other random stuff. Download the new archive.

    2019 May 11 NOWUT version 0.14b release

    Multino has had various bugs from 0.14 fixed. Download the new archive.

    Also, Here is a .MOD player (win32 command line).

    2019 May 1 NOWUT version 0.14 release

    This release combines NOWUT, NO68, and NOSH2 into a single compiler that supports all three CPUs (four? three-and-a-half?). 8086 support is complete, and there are small improvements to 68K and SH2. There is a JPEG decoder example program for the Amiga. Some old stuff has been removed from the archive (old archive links are still good if you need those).

    The compiler compiles itself in 6 seconds on a 100MHz 486. I tested a modified version compiled as 8086 code (not yet included in the release) which took about 11 seconds. That's not too bad for a 16-bit CPU pretending to be 32-bit?

    Check the documentation. Download the complete archive.

    And be sure to get Go Link from Go Tools website.

    2019 Apr 30 Amiga 2000

    After struggling with this system off and on over the course of many months, it finally seems to have reached a fully functional state. (Minus the RTC which hasn't had a battery in decades, but who cares about that.)

    The current configuration:

  • GVP A3001 with 68030/882 at 50MHz and 12MB of RAM
  • GVP HC+8 SCSI controller with 2MB of zorro-2 RAM
  • 270MB Quantum HDD
  • 2MB chipmem upgrade
  • deinterlacer/VGA upgrade

    Years ago it became increasingly unstable and I put it away. After digging it out again, and a bit of soldering, cleaning, and reseating of ICs, it was able to reliably boot at least to the purple checkmark screen, but not beyond that. With the SCSI card installed it would either show a blinking power LED, complain of a faulty expansion board, or freeze while attempting to load Workbench. An IDE HDD connected to the A3001 worked for a time and then died. I spent some time creating my own bootable disk image and writing it to another IDE HDD, but this one was only able to boot occasionally (checksum errors). Then one of my two working SCSI HDDs died.

    Eventually I got it to boot by disabling the RAM on the SCSI card and relocating the fast RAM on the A3001 RAM32 board to the sub-16MB space, something I did unintentionally as a result of changing jumper settings. The function of some jumpers on this revision of the board turned out to be different from what the documentation I found on the web specified:

  • When J5 on the RAM32 board is closed it locates the memory in the sub-16MB space, while also limiting memory size to 4MB. With the jumper open, all the memory becomes available at $1000000. This board has always had three 4MB SIMMs installed since I got it, in the top three slots. No 1MB SIMMs are present or necessary.
  • J15 on the CPU board can be used to disable the boot ROM and speed up the booting process when no IDE HDD is being used. However, disabling the ROM means that the A3001 RAM is not recognized by the OS anymore. It's better to use J18 to disable the IDE instead.
  • J7 on the CPU board cuts the CPU clock in half. The interesting part is that the FPU keeps running at 50MHz, and furthermore the DRAM continues to run at the same speed. In other words, the memory latency is greatly improved in relative terms. AIBB reports a latency of 7.1 at 50MHz, but only 3.3 at 25MHz. Benchmark results are somewhat lower but certainly better than half, and also better than a stock A3000. Too bad we couldn't run at the full 50MHz with that kind of memory latency, eh?

    For unknown reasons, OS 3.9 would not boot without "fast" RAM in the zorro-2 space. So my next step, wanting to use all 12MB on the A3001 again, was to determine why the HC+8 wouldn't behave when its memory was enabled. This came down to trying different SIMMs until it worked. It liked the 3-chip modules better than the 9-chip ones apparently.

    Now it's time to test some Amiga programs compiled with NOWUT (they work, yay!) and run some old scene demos.

    2019 Mar 30 Displaying images on the 16-bit Sega

    Here is a photo resized to 320x200 with some fancy color remapping and F-S dithering.

    Here it is on a CRT connected to the Genesis w/ composite video. It uses two layers to get 31 colors available anywhere. (click image for larger version)

    2019 Mar 23 NOWUT version 0.13 release

    This release adds experimental 8086 support. The dreaded segmented memory model has not yet been dealt with, so addressing is limited to 64KB. (Of course, inline assembly code can always modify segment registers itself.) However, the lack of a 32-bit ALU is masked by the compiler, which does 32-bit operations anyway. For instance, an addition compiles to an ADD AX,lowword followed by an ADC DX,highword, and then AX and DX are stored sequentially in a dword variable. This is a trade-off which sacrifices code efficiency in favor of convenience/compatibility.
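    The carry-chained pair can be sketched like this (an illustrative Python model of the generated ADD/ADC sequence, not actual compiler output):

```python
# Illustrative model (not actual NOWUT output) of synthesizing a 32-bit
# add from 16-bit ADD/ADC on the 8086. AX carries the low word, DX the
# high word; the Python names just mirror those registers.

def add32_via_16bit(a, b):
    ax = (a & 0xFFFF) + (b & 0xFFFF)               # ADD AX, lowword
    carry = ax >> 16                               # carry out of the low add
    ax &= 0xFFFF
    dx = ((a >> 16) + (b >> 16) + carry) & 0xFFFF  # ADC DX, highword
    return (dx << 16) | ax                         # AX/DX stored as one dword

print(hex(add32_via_16bit(0x0001FFFF, 0x00000001)))  # carry ripples into DX
```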

    Two more example programs were added. Here is the complete list:

  • Amiga 68K (plays a sound sample)
  • DOS 16-bit (JPEG decoder)
  • DOS 32-bit (JPEG decoder)
  • Sega 32X (JPEG decoder)
  • Sega Genesis (JPEG decoder)
  • Sega Saturn (JPEG decoder)
  • Sharp X68000 (JPEG decoder)
  • Win32 (JPEG decoder)
  • Win32 (OpenGL renderer)

    Check the documentation. Download the complete archive.

    And be sure to get Go Link from the Go Tools website.

    2019 Feb 24 NOWUT version 0.12 release

    This release contains bug fixes for NOSH2 and NOWUT, x87 FPU assembly support, and an OpenGL 3.2 demo.

    Check the documentation. Download the complete archive.

    And be sure to get Go Link from the Go Tools website.


    2019 Feb 2 Amiga harddisk image utility

    Here is the utility I constructed to make a bootable disk image for my GVP A3001 IDE interface. It's a Win32 executable (with NOWUT source) and includes a dump of the startup "LSEG" code from the original harddisk. I don't really know what this code does or where it comes from, but it's needed for booting. When creating a disk image, it's automatically copied over.

    example:
    MAKEHDF seagate.hdf 1001 15 34

    After the disk image is created it can be mounted in WinUAE and an icon will appear in Workbench. It can then be formatted and have WB files copied over. Finally, the image needs to be byte-swapped (use the included HDSWAP) and written to a disk.
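    A minimal stand-in for the included HDSWAP step (illustrative Python, not the actual tool), since the A3001 reads/writes the disk with upper and lower bytes reversed:

```python
# Swap the two bytes of every 16-bit word in a disk image, as needed
# before writing an image prepared in WinUAE out to a real drive on
# the A3001. Illustrative only; HDSWAP is the real utility.

def byteswap_image(data):
    assert len(data) % 2 == 0, "image length must be even"
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]   # odd-position bytes move to even positions
    swapped[1::2] = data[0::2]   # even-position bytes move to odd positions
    return bytes(swapped)

sample = bytes([0x12, 0x34, 0x56, 0x78])
print(byteswap_image(sample).hex())  # 34127856
```

    Swapping twice is a no-op, which makes for an easy self-check on a real image file.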

    Finding drives that work with the A3001 may be the biggest challenge. I tried the WD, a 200MB Toshiba, and a 2GB Transcend card, all of which failed to boot.

    2019 Jan 26 The web browser from 2004 that's better than anything from this decade.

    I will attempt to explain one of the reasons I've become disillusioned with the web by showing how awesome a browser could be, and was, before developers went mad.

    Here is a screenshot from Opera 7.54 (registered version).

    As you can see, it has real menus, with words and everything. There is an email/USENET reader built-in, and an IRC client, if you need those things.

    Look what happens if you press F12. You can disable JavaScript without having to download an extra program for that sole purpose.


    Now check this out. See how the link text is highlighted? That's because I pressed shift+down and selected that link. Keyboard controls can be used for nearly everything, a mouse is not needed. Then I pressed the context menu key. Look at the options that became available. New page, new window, background page, background window, download, copy link, and more.

    Of course, having multiple pages open in separate tabs was already old hat. In addition to opening the new page however you please, you can duplicate an existing tab. Let's say you see something interesting, but it's a little too long to read right now, and you need to look at some other items first. Just duplicate that tab and leave the dupe there for later. No need to copy and paste a URL or wait for the page to load again. You can also save your current roster of tabs as a "session" and reopen all of them again later, or next time you start the browser. There is a menu option and keyboard shortcut to reload all the open pages.

    I can browse photos in a local directory. I can zoom out on those big images, and hit F11 for a full screen view with plain, black border (if any). The image can be resized using a proper algorithm instead of nearest-neighbor ugliness (this is OS dependent). Then I can use the back/forward buttons, or their keyboard or mouse-gesture equivalents, to cycle through all of the photos in the directory.

    Now maybe the menus and toolbars were missing something important, or they took up too much space and had too many unneeded buttons. You can change it, easily.

    You can literally click-and-drag the buttons you want to the place you want them.


    Each one of the categories has additional options to pick from.

    I rearranged my buttons; now do I want toolbars at the top or bottom?

    I can change fonts and a million other things too. It's all accessible through the GUI, no need to use the registry-like about:config page.

    I'm sure I've left out many things, but this is a good start at least.

    2019 Jan 19 Amiga hacking

    So I wanted to test some of my NOWUT demo code on my dusty old Amiga 2000. Years ago the machine had become very temperamental and I had lost interest in it. Then last year I reflowed some solder joints, reseated the 68000, and got it to boot once again.

    However it would not boot from the 2GB SCSI HDD that I used to use. I don't know why, and since I have no compatible SCSI interface to connect the drive to a PC, I have no way to investigate.

    I was left with only the 80MB Quantum HDD that I received along with the GVP A3001 CPU board; it connects to the A3001's built-in IDE interface. And now that drive is on its last legs.

    Having a computer that can't do anything for lack of bootable media is always an annoying situation to be in. When you have another computer nearby, it's usually possible to "jump start" something, but it depends. When I got an Apple IIGS, all I had to do was load DOS 3.3 over a null modem cable and go from there. If a PC won't boot, I can take the harddisk out and connect it to a USB-IDE adaptor. When I got a 68k Mac with a dead HDD, I could not find any instructions on how to get it going again, so it had to go on the junk pile.

    In any case, if I were designing a computer I would definitely put a disk partitioning utility, hex/sector editor, and terminal program into ROM to make things easier.

    As for my A2000, it at least has an IDE interface, and I still have my old Amiga files backed up on my PC. The GVP A3001 IDE is said to have poor compatibility, but I have several old drives to try on it. Another important detail to know about it is that it swaps the upper and lower bytes when it reads/writes the disk.

    The first thing I tried was to create a bootable hardfile (.HDF) in WinUAE and write it to a Western Digital Caviar 1170. So how do you do that? Well, WinUAE is a bit clunky in this regard, but it is possible. First I created a blank HDF. It doesn't let you specify the exact size, it only asks for "MB" and I wasn't sure whether it used real megabytes or base 10 ones, so I put in 171. That ended up being a bit oversize, but it doesn't matter.

    Hit "full RDB" and "manual geometry" and put in the CHS values. You might need to exit this dialog and start again; it's a bit confusing.

    The next step was to put some files in it. I booted WinUAE and ran HDtoolbox. Now, I'm not sure if this is something I used to know and had forgotten, but running HDtoolbox only gives you a blank window with grayed-out buttons unless you start it with a secret command line argument. That argument needs to be the name of a device, in this case uaehf.device

    hdtoolbox uaehf.device

    I partitioned. I formatted. I copied over some backed up workbench files from a PC directory. I tested whether I could boot this HDF with WinUAE. It worked.

    Then I attached the WD1170 to the secondary IDE port of my old Athlon XP and wrote 333,300 sectors from the HDF file using Roadkil's Sector Editor.

    I connected that to the Amiga. But it went to the insert disk screen. FAIL.

    I'm not sure if that disk image WOULD have worked... But my later experiment proved that the WD1170 is simply not usable with the GVP A3001.

    So at this point I took a different approach. After disconnecting the 1170 from the Athlon system, I connected the old 80MB Quantum drive again. (Bet you didn't know you could hot-plug IDE drives under Windows 2000!) Previously the drive had been making a lot of clunking noises and refusing to read certain sectors. But I tried again to dump it, using Roadkil's Sector Editor. I was able to get a complete dump with minimal clunking. It turns out that a large portion is corrupt, but I at least got the first track, and a substantial amount of the second partition (it was divided into two 40MB partitions).

    I kept thinking "what if I just write this dump to another disk and plug it in?" The only problem was that the Quantum drive had 965 cylinders, 10 heads, and 17 sectors, while the 1170 had 1010 cylinders, 6 heads, and 55 sectors. The geometry didn't match up, and it would never work if the system expected 10 heads and had only 6. But I looked around and found a 260MB Seagate with 1001 cylinders, 15 heads, and 34 sectors. Aha!
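    The geometry arithmetic is easy to sanity-check (a sketch; assumes the usual 512-byte sectors):

```python
# Sanity-checking the drive geometries mentioned above. A substitute
# drive needs a matching head/sector layout and at least as many total
# sectors as the image expects; 512-byte sectors assumed.

def chs_capacity(cylinders, heads, sectors, bytes_per_sector=512):
    total = cylinders * heads * sectors
    return total, total * bytes_per_sector

quantum = chs_capacity(965, 10, 17)    # the original 80MB drive
wd1170  = chs_capacity(1010, 6, 55)    # only 6 heads: geometry mismatch
seagate = chs_capacity(1001, 15, 34)   # enough room for the Quantum image
print(quantum, wd1170, seagate)
```

    The WD1170 figure also matches the 333,300 sectors written out from the HDF earlier.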

    I connected the Seagate to the Athlon XP and set the parameters in the BIOS to match the Quantum drive. Then I booted to DOS. I couldn't use Windows 2000 because it bypasses the BIOS for disk access, and would surely discover the real geometry. I needed the computer to think that the CHS values were the Quantum ones, so that the disk image gets written out the way I want it to. I used Norton Diskedit for DOS to write the image. It was INCREDIBLY SLOW, for unknown reasons. Took about 3 hours to write 80MB.

    But when I connected this drive to the Amiga, it did something! Because of the aforementioned corruption in the Quantum dump, it didn't boot Workbench, but after some error messages it did land on a "1" prompt. Now, I'm not an expert on Amiga DOS, but miraculously I was able to deduce that the reason I couldn't run "dir" was because it's an external command, located in the "c" directory. I remembered the name of the second partition, so I went there. Did a "cd c" and then I could use "dir" AMAZING! It WORKS! (sort of). The next thing I did was run hdtoolbox and try to see what was up with the first partition. It said it needed to update something. But after it did that, then the Amiga would no longer boot. Hmmm...

    Having proved that the A3001 could read this Seagate drive, I wanted to get a better disk image put together that wasn't all corrupt and stuff. I found some info about the Amiga RDB, Partition block, and filesys block on the www.amigadev.elowar.com site. I found some more details on the Amiga OS Wiki. And I started writing a program to generate disk images with custom geometry, while keeping everything as close as possible to how it was laid out in the Quantum dump, in case there are any peculiarities there that the GVP card depends on.

    After several hours of monkeying around, I was able to generate images which would appear in WinUAE, where I could then format and copy files. Then I could write the image to a harddisk and try it on the A2000. (And let's not forget to swap the byte order in the image first!)

    It was at this point that I concluded that the WD1170 simply would not work with the GVP A3001. But I got the Seagate to boot to WB 3.1. However it seems that this setup isn't 100% reliable. While it does boot and run stuff, checksum errors are a regular occurrence (hitting retry does the trick) and I suspect that if I were to try to write anything to the disk it would go horribly wrong.

    My utility to create HDFs from nothing still needs work. But after everything that I went through to resurrect the A2000 I was able to run a terminal program, download the NOWUT demo code into RAM: and execute it. And it worked :)

    2019 Jan 19 NOWUT version 0.11b release

    This release contains bug fixes for NO68, NOSH2, and LINKBIN.

    Check the documentation. Download the complete archive.

    And be sure to get Go Link from the Go Tools website.

    2019 Jan 2 NOWUT version 0.11 release

    Here's an update to the self-hosted NOWUT compiler. Bug fixes, new linker, new DOS example program...

    Check the documentation. Download the complete archive.

    And be sure to get Go Link from the Go Tools website.

    2018 Dec 15 cache policy

    A write-through cache can only speed up reads, since writes still result in a bus cycle (this also avoids potential issues with "old" data in DRAM being accessed by another CPU/device). The 486DX, 68020, etc. used write-through.

    With a write-back policy, writes can be stored in the cache instead of going to the bus. The cache line containing the modified data eventually gets written to DRAM when it is retired. However, when a write addresses a memory area which is not currently represented in the cache, two things can happen. Either it can be treated as a write-through (486DX4 uses this scheme if I'm not mistaken), or the relevant memory can be first read into the cache. The latter is called allocate-on-write and is used on 68040 and newer x86 CPUs (it's optional on Cyrix 6x86).

    Allocate-on-write can degrade performance when large data blocks are written. Data from the destination area of DRAM crowds out other data in the cache, and by the time it is read later (if it ever is; data written to a video card frame buffer may never be read by the CPU at all) it has likely been evicted, resulting in a miss anyway.

    The improved efficiency of transferring an entire cache line at once in a burst, instead of accessing DRAM one word at a time, can mitigate slow-downs resulting from allocate-on-write while maintaining the advantages. Some socket 7 chipsets (eg. Via VP, MVP) don't do write bursts.

    If large data blocks are written one byte at a time, rather than using 32- or 64-bit words, the reduced number of bus cycles with allocate-on-write may save more time than is lost by doing pointless reads beforehand.

    A scheme called write-gathering or write-combining is used in specific circumstances (eg. video card frame buffer memory) to hold sequential writes in a buffer, without reading any data first, and then execute one large write.

    With all that being said, my idea is this. Maybe the decision on whether to allocate-on-write or not could take into account the size of the write. Eg. write-through (or combine) for 32-bit words, allocate for bytes. Large block copies would tend to use the larger size for speed. (Perhaps this has already been implemented.)
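    The burst-transfer point can be made concrete with a toy model (my own illustration, not any specific CPU): assume 16-byte lines, 4-byte word writes, a fill much larger than the cache, and count a burst line transfer as a single bus transaction.

```python
# Toy model: bus transactions for a large sequential fill under two
# write-miss policies. 16-byte lines, 4-byte word writes; a burst line
# transfer counts as one transaction. Purely illustrative numbers.

LINE_BYTES = 16
WORD_BYTES = 4

def fill_bus_transactions(total_bytes, policy):
    if policy == "write-through":
        return total_bytes // WORD_BYTES   # every word write hits the bus
    if policy == "allocate-on-write":
        lines = total_bytes // LINE_BYTES
        return lines * 2                   # burst read-in + burst write-back
    raise ValueError(policy)

for p in ("write-through", "allocate-on-write"):
    print(p, fill_bus_transactions(1 << 20, p))
```

    Under these assumptions the allocating cache moves the same megabyte in half as many bus transactions; on a chipset that can't burst writes, each line transfer decays into four word cycles and the allocate-on-write fill ends up moving twice as many words.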

    2018 Dec 10 IMGTOOL version 0.94

    New version of my bitmap editor IMGTOOL. Has a lot of loose ends at the moment, but also many new features like tile/pixel editor modes. Time for a release! Includes FreeBASIC source and DOS/Win32 executables. Download the complete archive.

    2018 Dec 4 NOWUT compiler released

    This is the first release of the NOWUT cross compiler for the low-level NOWUT language. It runs on Win32 and targets x86, Amiga 68K, X68000, Sega 32x and Saturn. Check the documentation. Download the complete archive.

    And be sure to get Go Link from the Go Tools website.


    2018 Dec 1 delayed immediates

    RISC CPUs sometimes employ a delayed branch, where the instruction immediately after the branch instruction is always executed, hence avoiding some disruption of the pipeline.

    RISC CPUs also tend not to allow immediate data (data following the opcode) to be used as operands. This is regrettable from a programming standpoint, and seems like it would be a mixed bag from a performance standpoint, since a memory access has to occur somewhere else to load that data anyway. So... what if, like a delayed branch, we had delayed immediate data? You could load a DWORD into a register with one instruction, but the data could go further down the stream.

    To be more specific, imagine that your CPU used a general purpose register as the program counter (eg. R15), and that you had a post-increment register-indirect addressing mode available to commonly used instructions. Loading from (R15+) would be the same thing as loading immediate data conceptually. You just have to assume that R15 was pointing at the location following the opcode, and that the post-increment would cause it to skip over the data before the next instruction fetch. Those assumptions wouldn't hold on a pipelined CPU, but perhaps the "delayed immediate" would make the situation workable, by lining up the data load and the "skip" with the proper pipeline stage of the opcode that needed it.
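    A toy interpreter (purely my own sketch, ignoring pipelining) makes the (R15+) idea concrete:

```python
# Toy interpreter for the idea above: R15 is the program counter, and
# "immediate" data is fetched through (R15)+, i.e. loaded from the word
# after the opcode, with a post-increment so the next fetch skips it.
# Illustrative only; no pipeline is modeled here.

def run(program):
    regs = {"R0": 0, "R15": 0}
    while regs["R15"] < len(program):
        op = program[regs["R15"]]
        regs["R15"] += 1                  # R15 now points past the opcode
        if op == "LOAD R0,(R15)+":        # a load immediate, in disguise
            regs["R0"] = program[regs["R15"]]
            regs["R15"] += 1              # post-increment skips the data word
        elif op == "HALT":
            break
    return regs["R0"]

print(hex(run(["LOAD R0,(R15)+", 0xCAFE, "HALT"])))  # 0xcafe
```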

    2018 Apr 14 video card framebuffer memory bandwidth

    From my post on VOGONS:

    486DX4-100, Trident 8900 ISA - 5.4MB/s
    486DX4-100, Trident 9440 VLB - 31MB/s
    Pentium II-350, i440BX, Trident 9680 PCI - 38MB/s
    Pentium II-350, i440BX, Trident 9680 PCI - 62MB/s (write combining enabled)
    Pentium III-600e, i440BX, GeForce FX5200 AGP - 47MB/s
    Pentium III-600e, i440BX, GeForce FX5200 AGP - 240MB/s (write combining enabled)
    Pentium M-1200, i855PM, Radeon 7500 AGP - 50MB/s
    Pentium M-1200, i855PM, Radeon 7500 AGP - 169MB/s (write combining enabled)
    Athlon XP, ViaKT333, GeForce FX5700 AGP - 83MB/s
    Athlon XP, ViaKT333, GeForce FX5700 AGP - 192MB/s (write combining enabled)
    Phenom II, AMD770, Radeon 5670 PCIe - 189MB/s
    Phenom II, AMD770, Radeon 5670 PCIe - 2500MB/s (write combining enabled)
    

    These tests use the REP STOSD instruction to fill video memory with a constant. My guess is that the higher AGP 4X/8X speeds aren't enabled under DOS, since I haven't seen any AGP system beat this 440 board. (The Pentium M and Athlon are AGP 4X)

    2018 Mar 18 SuperPi

    From my post on VOGONS:

    Athlon II X2 260 (3.2GHz, DDR2-800 CL5) - 25s
    Athlon X2 7850 (2.8GHz, DDR2-800 CL5) - 28s
    Core2 SU9600 (1.6GHz, 1.8GHz turbo, DDR2-667) - 34s
    Athlon X2 4850e (2.5GHz, DDR2-800 CL5) - 36s
    Turion X2 1.6GHz (DDR2-667) - 57s
    Athlon XP 2800+ (2083MHz, KT333, DDR-333 CL3) - 1m
    Athlon XP 2500+ (1833MHz, KT333, DDR-333 CL2) - 1m1s
    Sempron 2300+ (1583MHz, KT333, DDR-333 CL2) - 1m15s
    Pentium M 1MB 1.0GHz (i855pm, DDR-266) - 1m25s
    Pentium M 2MB 1.2GHz (i855pm, DDR-266) - 1m4s
    Pentium M 2MB 1.2GHz (i915gm, DDR2-400) - 1m1s
    Pentium M 1MB 1.7GHz (i855pm, DDR-333) - 56s
    Pentium M 2MB 1.6GHz (i855pm, DDR-333) - 53s
    Pentium 3 933MHz (PC-133) - 2m17s
    Pentium 3M 800 (i830gm, PC-133 CL2) - 2m35s
    Pentium 3M 1333MHz (i830gm, PC-133 CL3) - 2m20s
    Pentium 3M 1333MHz (i830gm, PC-133 CL2) - 2m06s
    Pentium 3M 1333MHz (i830gm, PC-133 CL2) - 1m57s (screen mode at 800x600x16 instead of 1024x768x32)
    Pentium 3 600e (PC-100) - 3m21s
    Pentium MMX 166 (256KB L2) - 15m2s
    

    2018 Jan 16 some old 3DMark2001SE benchmarks
    Socket 5 (Packard Bell C 115, i430VX chipset, Win98)
    Pentium MMX Overdrive 200, GeForce 2MX 200 PCI = 290 3dmarks
    
    Socket 7 (Shuttle HOT-591p, VIA MVP3 chipset, 512KB cache, Win98)
    IDT Winchip 200, Radeon 9200 = 892 3dmarks
    Pentium 166 at 188, Radeon 9200 = 1071 3dmarks
    Cyrix M2 PR400 (285MHz), Radeon 9200 = 1535 3dmarks
    K6-2 380, Radeon 9200 = 1891 3dmarks
    K6-3 380, Radeon 9200 = 2304 3dmarks
    K6-3 380, GeForce 2 GTS = 683 3dmarks
    K6-3 380, Matrox G250 (800x600) = 308 3dmarks
    
    Slot 1 (Intel SE440BX2, PC100 CL2 SDRAM, Win98)
    Pentium 3 600e, SiS 6326 (640x480) = 203 3dmarks
    Pentium 3 600e, S3 Savage 4 (16-bit color) = 870 3dmarks
    Pentium 3 600e, nVidia TNT (800x600) = 976 3dmarks
    Pentium 3 600e, S3 Savage 2000 (16-bit color) = 1098 3dmarks
    Pentium 3 600e, Voodoo 3 3000 (16-bit color, unofficial drivers) = 1243 3dmarks
    Pentium 3 600e, GeForce 4MX 420 = 2495 3dmarks
    Pentium 3 600e, GeForce FX 5200 64-bit = 3100 3dmarks
    Pentium 3 600e, GeForce FX 5700LE = 4438 3dmarks
    Pentium 3 550, GeForce FX 5200 64-bit = 2785 3dmarks
    Pentium 3 550, Radeon 9200 = 3530 3dmarks
    Pentium 2 350, Radeon 9200 = 2643 3dmarks
    
    Socket 370 (i815 chipset, Win98/2k)
    Celeron 533, GeForce 2 GTS = 1381 3dmarks
    Pentium 3 733, nVidia TNT (800x600) = 1011 3dmarks
    Pentium 3 733, nVidia TNT2 M64 (800x600) = 1162 3dmarks
    Pentium 3 733, Matrox G450 (800x600)= 1150 3dmarks
    Pentium 3 733, Matrox G450 = 896 3dmarks
    Pentium 3 933, i815 integrated (16-bit color) = 660 3dmarks
    Pentium 3 933, GeForce 2MX 200 = 1533 3dmarks
    Pentium 3 933, GeForce 4MX 420 = 3213 3dmarks
    Pentium 3 933, GeForce 2 GTS = 3106 3dmarks
    Pentium 3 933, GeForce FX 5200 64-bit = 3794 3dmarks
    Pentium 3 933, Radeon 9200 = 5059 3dmarks
    Pentium 3 933, GeForce FX 5700LE = 5230 3dmarks
    
    Socket A (Biostar M7VIP-pro, VIA KT333 chipset, DDR333, Win2k)
    Sempron 2300+ (1583MHz), nVidia TNT2 M64 (800x600) = 1884 3dmarks
    Sempron 2300+ (1583MHz), GeForce 2MX = 2913 3dmarks
    Sempron 2300+ (1583MHz), GeForce 4MX 420 = 3450 3dmarks
    Sempron 2300+ (1583MHz), GeForce 4MX 420 195MHz RAM = 3900 3dmarks
    Sempron 2300+ (1583MHz), GeForce 2 GTS = 4171 3dmarks
    Sempron 2300+ (1583MHz), Radeon 9200 = 6666 3dmarks
    Sempron 2300+ (1583MHz), GeForce 6200 = 7291 3dmarks
    Sempron 2300+ (1583MHz), GeForce FX 5700LE = 7790 3dmarks
    Athlon XP 2500+ (1833MHz), GeForce FX 5200 64-bit = 4884 3dmarks
    Athlon XP 2500+ (1833MHz), GeForce FX 5700LE = 8425 3dmarks
    Athlon XP 2500+ (1833MHz), GeForce FX 5700LE 20% overclock = 9280 3dmarks
    
    Socket A (nForce2 chipset, Win2k)
    Athlon XP 2800+ (2083MHz), GeForce 4MX integrated = 3683 3dmarks
    Athlon XP 2800+ (2083MHz), GeForce FX 5700LE = 8977 3dmarks
    Athlon XP 2800+ (2083MHz), Quadro FX 1000 = 10771 3dmarks
    
    Socket 479 (Itox mini-ITX board, i855GM chipset, DDR333, Win2k)
    Pentium M 2MB 1.6GHz, i855 integrated = 2500 3dmarks
    Pentium M 2MB 1.6GHz, GeForce 4MX PCI = 4080 3dmarks
    Pentium M 2MB 1.6GHz, GeForce 8400GS PCI = 9200 3dmarks
    
    Socket AM2 (nForce 430 chipset, DDR2-800, Win2k)
    Athlon X2 4850e (2.5GHz), GeForce 6150 integrated = 5481 3dmarks
    Athlon X2 4850e (2.5GHz), Quadro NVS 285 = 9125 3dmarks
    Athlon X2 4850e (2.5GHz), Radeon X1300 64-bit = 10291 3dmarks
    Athlon X2 4850e (2.5GHz), Quadro NVS 290 = 13944 3dmarks
    Athlon X2 4850e (2.5GHz), GeForce 210 520/600 = 17900 3dmarks 
    Athlon X2 4850e (2.5GHz), GeForce 9500 DDR2 450/400 = 19887 3dmarks
    Athlon X2 4850e (2.5GHz), GeForce GT220 625/700 = 21626 3dmarks
    
    Socket AM2+ (Biostar A770-A2+, AMD 770 chipset, DDR2-800, Win2k)
    Athlon X2 4850e (2.5GHz), GeForce 7600GT = 23001 3dmarks
    Athlon X2 7850 (2.8GHz), GeForce 7600GT = 29945 3dmarks
    Athlon II X2 260 (3.2GHz), GeForce 7600GT = 29055 3dmarks
    Athlon II X2 260 (3.2GHz), Radeon HD 4550 600/800 = 24945 3dmarks
    Athlon II X2 260 (3.2GHz), GeForce 9500 DDR2 450/400 = 25815 3dmarks
    Athlon II X2 260 (3.2GHz), Radeon HD 7510 650/700 = 31000 3dmarks
    Athlon II X2 260 (3.2GHz), Radeon HD 5570 650/700 = 32828 3dmarks
    Athlon II X2 260 (3.2GHz), GeForce GT220 625/700 = 32602 3dmarks
    Athlon II X2 260 (3.2GHz), GeForce GT240 550/1000 = 32689 3dmarks
    Athlon II X2 260 (3.2GHz), Radeon HD 3850 = 34945 3dmarks
    
    Laptops (Win2k/XP)
    Dell Latitude C400, Pentium 3M 1.33GHz, i830 integrated = 1080 3dmarks
    Fujitsu Lifebook B6110D, Pentium M 2MB 1.2GHz, i915 integrated = 3565 3dmarks
    NEC Versa S820, Pentium M 1MB 1.0GHz, Mobility Radeon 7500 64-bit = 4060 3dmarks
    Fujitsu Lifebook S series, Turion 64 X2 1.6GHz, DDR2-667, Radeon Xpress 200m = 4200 3dmarks
    Toshiba Portege M200, Pentium M 1MB 1.7GHz, GeForce Go 5200 64-bit = 4765 3dmarks
    Fujitsu Lifebook S series, Turion 64 2.0GHz, DDR2-800, Radeon Xpress 200m = 5300 3dmarks
    
    2017 Dec 20 power consumption
    Video cards at idle:
    GeForce 7600GT (560MHz) - 15W
    GeForce 9500 DDR2 (450MHz) - 10W
    
    cards that throttle to a lower speed at idle:
    GeForce 210 (135MHz) - 2.5W
    GeForce GT 220 (135MHz) - 7.5W
    GeForce GT 240 (135MHz) - 7.5W
    Radeon HD 3850 (300MHz) - 12W
    Radeon HD 4550 (110MHz) - 2.5W
    Radeon HD 5570 (400MHz) - 10W
    

    I also tested an IBM AT (286) loaded up with ISA memory board, VGA, sound, disk controller, and 3.5" harddisk. It drew 32W. A VLB 486DX4-100 system drew 28W.

    I test power consumption using a shunt resistor (10 ohms) in the AC supply and calculating the current and power draw from the voltage drop. (In the case of video cards, it's necessary to estimate and subtract out the power for the rest of the system.)
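    The arithmetic works out like this (a sketch; the post doesn't state the line voltage, so 120V is my assumption, and power factor and the shunt's own drop are ignored):

```python
# The arithmetic behind the shunt measurement. Assumptions (mine, not
# stated in the post): 120V RMS line voltage, power factor ignored,
# and the 10-ohm shunt's own voltage drop is small enough to neglect.

SHUNT_OHMS = 10.0

def power_from_shunt(v_drop, v_line=120.0):
    current = v_drop / SHUNT_OHMS    # I = V_shunt / R
    return v_line * current         # P = V_line * I

# e.g. a 2.67V drop across the shunt works out to about 32W at 120V:
print(round(power_from_shunt(2.67)))
```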

    2017 Oct 21 registry tweaks

    Use HDDs larger than 128GB in Windows 2000 by creating this DWORD registry value and setting it to 1:

    HKLM\System\CurrentControlSet\Services\Atapi\Parameters\EnableBigLba 
    
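    The same tweak in .reg form, importable with regedit (standard .reg file syntax; double-check it against your system before importing):

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Atapi\Parameters]
"EnableBigLba"=dword:00000001
```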

    2017 Jan 17 GUI thought experiment