Anachro-mputing blog

retro and not-so-retro computing topics
Related links:
  • homebrew games, demos, and misc software
  • Old links
  • buy my rad t-shirts on Zazzle shirt 1 shirt 2 and miscellaneous art

  • Updates

    2019 Jan 19 Amiga hacking

    So I wanted to test some of my NOWUT demo code on my dusty old Amiga 2000. Years ago the machine had become very temperamental and I had lost interest in it. Then last year I reflowed some solder joints and reseated the 68000 and got it to boot once again.

    However it would not boot from the 2GB SCSI HDD that I used to use. I don't know why, and since I have no compatible SCSI interface to connect the drive to a PC, I have no way to investigate.

    I was left with only the 80MB Quantum HDD that I received along with the GVP A3001 CPU board. It connects to the board's built-in IDE interface. And now that drive is on its last legs.

    Having a computer that can't do anything for lack of bootable media is always an annoying situation to be in. When you have another computer nearby, usually it's possible to "jump start" something, but it depends. When I got an Apple IIGS, all I had to do was load DOS 3.3 over a null modem cable and go from there. If a PC won't boot, I can take the harddisk out and connect it to a USB-IDE adaptor. When I got a 68k Mac with a dead HDD, I could not find any instructions on how to get it going again, so it had to go on the junk pile.

    In any case, if I were designing a computer I would definitely put a disk partitioning utility, hex/sector editor, and terminal program into ROM to make things easier.

    As for my A2000, it at least has an IDE interface, and I still have my old Amiga files backed up on my PC. The GVP A3001 IDE is said to have poor compatibility, but I have several old drives to try on it. Another important detail to know about it is that it swaps the upper and lower bytes when it reads/writes the disk.

    The first thing I tried was to create a bootable hardfile (.HDF) in WinUAE and write it to a Western Digital Caviar 1170. So how do you do that? Well, WinUAE is a bit clunky in this regard, but it is possible. First I created a blank HDF. It doesn't let you specify the exact size, it only asks for "MB" and I wasn't sure whether it used real megabytes or base 10 ones, so I put in 171. That ended up being a bit oversize, but it doesn't matter.
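
    For reference, here is a rough sketch of the size arithmetic in Python. It assumes 512-byte sectors and uses the 1010/6/55 geometry of the WD1170 quoted further down; the function and variable names are made up for illustration.

```python
SECTOR_BYTES = 512  # assumed sector size


def chs_capacity_bytes(cylinders, heads, sectors_per_track):
    """Total bytes addressable through a given CHS geometry."""
    return cylinders * heads * sectors_per_track * SECTOR_BYTES


# WD Caviar 1170 geometry: 1010 cylinders, 6 heads, 55 sectors
size = chs_capacity_bytes(1010, 6, 55)
decimal_mb = size / 1_000_000        # base 10 megabytes
binary_mb = size / (1024 * 1024)     # base 2 megabytes

print(f"{size} bytes = {decimal_mb:.1f} MB (base 10) or {binary_mb:.1f} MB (base 2)")
```

    The drive comes to roughly 171 base 10 megabytes but only about 163 base 2 megabytes, which would explain a 171 "MB" hardfile coming out oversize if the dialog counts in base 2.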

    Hit the full RDB and manual geometry options and put in the CHS values for the target drive. You might need to exit this dialog and start again; it's a bit confusing.

    The next step was to put some files in it. I booted WinUAE and ran HDtoolbox. Now, I'm not sure if this is something I used to know and had forgotten, but running HDtoolbox only gives you a blank window with grayed-out buttons unless you start it with a secret command line argument. That argument needs to be the name of a device, in this case uaehf.device:

    hdtoolbox uaehf.device

    I partitioned. I formatted. I copied over some backed up workbench files from a PC directory. I tested whether I could boot this HDF with WinUAE. It worked.

    Then I attached the WD1170 to the secondary IDE port of my old Athlon XP and wrote 333,300 sectors from the HDF file using Roadkil's Sector Editor.

    I connected that to the Amiga. But it went to the insert disk screen. FAIL.

    I'm not sure if that disk image WOULD have worked... But my later experiment proved that the WD1170 is simply not usable with the GVP A3001.

    So at this point I took a different approach. After disconnecting the 1170 from the Athlon system, I connected the old 80MB Quantum drive again. (Bet you didn't know you could hot-plug IDE drives under Windows 2000!) Previously the drive had been making a lot of clunking noises and refusing to read certain sectors. But I tried again to dump it, using Roadkil's Sector Editor. I was able to get a complete dump with minimal clunking. It turns out that a large portion is corrupt, but I at least got the first track, and a substantial amount of the second partition (it was divided into two 40MB partitions).

    I kept thinking "what if I just write this dump to another disk and plug it in?" The only problem was that the Quantum drive had 965 cylinders, 10 heads, and 17 sectors, while the 1170 had 1010 cylinders, 6 heads, and 55 sectors. The geometry didn't match up, and it would never work if the system expected 10 heads and had only 6. But I looked around and found a 260MB Seagate with 1001 cylinders, 15 heads, and 34 sectors. Aha!
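
    To illustrate why the geometry matters, here is a small Python sketch of the usual CHS/linear sector arithmetic. It is a simplification of the reasoning above, not a description of how any drive or BIOS actually translates addresses, and all the names are hypothetical.

```python
def chs_to_lba(cylinder, head, sector, heads, sectors_per_track):
    """Linear sector number for a CHS address (sectors count from 1)."""
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads, sectors_per_track):
    """CHS address for a linear sector number under a given geometry."""
    cylinder, rest = divmod(lba, heads * sectors_per_track)
    head, sector = divmod(rest, sectors_per_track)
    return cylinder, head, sector + 1


def fits_within(target, drive):
    """True if each of cylinders, heads, sectors in target is <= the drive's."""
    return all(t <= d for t, d in zip(target, drive))


# (cylinders, heads, sectors) as quoted in the text
QUANTUM = (965, 10, 17)
WD1170 = (1010, 6, 55)
SEAGATE = (1001, 15, 34)

# Start of cylinder 1 on the Quantum lands somewhere else entirely on the WD.
lba = chs_to_lba(1, 0, 1, heads=10, sectors_per_track=17)
print(lba, lba_to_chs(lba, heads=6, sectors_per_track=55))

print(fits_within(QUANTUM, WD1170))   # the WD has too few heads
print(fits_within(QUANTUM, SEAGATE))  # the Seagate is larger on every axis
```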

    I connected the Seagate to the Athlon XP and set the parameters in the BIOS to match the Quantum drive. Then I booted to DOS. I couldn't use Windows 2000 because it bypasses the BIOS for disk access, and would surely discover the real geometry. I needed the computer to think that the CHS values were the Quantum ones, so that the disk image gets written out the way I want it to. I used Norton Diskedit for DOS to write the image. It was INCREDIBLY SLOW, for unknown reasons. Took about 3 hours to write 80MB.

    But when I connected this drive to the Amiga, it did something! Because of the aforementioned corruption in the Quantum dump, it didn't boot Workbench, but after some error messages it did land on a "1" prompt. Now, I'm not an expert on Amiga DOS, but miraculously I was able to deduce that the reason I couldn't run "dir" was that it's an external command, located in the "c" directory. I remembered the name of the second partition, so I went there. Did a "cd c" and then I could use "dir". AMAZING! It WORKS! (sort of) The next thing I did was run hdtoolbox and try to see what was up with the first partition. It said it needed to update something. But after it did that, the Amiga would no longer boot. Hmmm...

    Having proved that the A3001 could read this Seagate drive, I wanted to get a better disk image put together that wasn't all corrupt and stuff. I found some info about the Amiga RDB, Partition block, and filesys block on the www.amigadev.elowar.com site. I found some more details on the Amiga OS Wiki. And I started writing a program to generate disk images with custom geometry, while keeping everything as close as possible to how it was laid out in the Quantum dump, in case there are any peculiarities there that the GVP card depends on.
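
    One detail any such generator has to get right is the block checksum. The Python sketch below is not the actual utility, just an illustration. It assumes the rule that the summed big-endian longwords of a block must add up to zero, and it assumes the block ID sits at offset 0, the summed-longs count at offset 4, and the checksum at offset 8. Those offsets should be checked against the references mentioned above.

```python
import struct


def block_checksum(block, summed_longs=64, checksum_offset=8):
    """Checksum longword that makes the summed longwords total zero mod 2**32.

    summed_longs and checksum_offset are assumed defaults, not values
    taken from the Quantum dump.
    """
    data = bytearray(block[:summed_longs * 4])
    # Ignore whatever is currently stored in the checksum field.
    data[checksum_offset:checksum_offset + 4] = b"\x00\x00\x00\x00"
    total = sum(struct.unpack(f">{summed_longs}L", data)) & 0xFFFFFFFF
    return (-total) & 0xFFFFFFFF


# Hypothetical, mostly empty block just to exercise the function.
block = bytearray(512)
block[0:4] = b"RDSK"
block[4:8] = struct.pack(">L", 64)
block[8:12] = struct.pack(">L", block_checksum(block))

check = sum(struct.unpack(">64L", block[:256])) & 0xFFFFFFFF
print(hex(check))
```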

    After several hours of monkeying around, I was able to generate images which would appear in WinUAE, where I could then format and copy files. Then I could write the image to a harddisk and try it on the A2000. (And let's not forget to swap the byte order in the image first!)
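
    The byte swap itself is simple. A minimal Python sketch, assuming the whole image fits in memory (the function name is made up):

```python
def swap_byte_pairs(data: bytes) -> bytes:
    """Exchange the two bytes of every 16-bit word in the image."""
    if len(data) % 2:
        raise ValueError("image length must be an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # odd bytes move to even positions
    swapped[1::2] = data[0::2]  # even bytes move to odd positions
    return bytes(swapped)


print(swap_byte_pairs(b"RDSK"))
```

    Running the swap twice gives back the original data, so the same function converts an image in either direction.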

    It was at this point that I concluded that the WD1170 simply would not work with the GVP A3001. But I got the Seagate to boot to WB 3.1. However it seems that this setup isn't 100% reliable. While it does boot and run stuff, checksum errors are a regular occurrence (hitting retry does the trick) and I suspect that if I were to try to write anything to the disk it would go horribly wrong.

    My utility to create HDFs from nothing still needs work. But after everything that I went through to resurrect the A2000 I was able to run a terminal program, download the NOWUT demo code into RAM: and execute it. And it worked :)

    2019 Jan 19 NOWUT version 0.11b release

    This release contains bug fixes for NO68, NOSH2, and LINKBIN.

    Check the documentation. Download the complete archive.

    And be sure to get Go Link from the Go Tools website.

    2019 Jan 2 NOWUT version 0.11 release

    Here's an update to the self-hosted NOWUT compiler. Bug fixes, new linker, new DOS example program...

    Check the documentation. Download the complete archive.

    And be sure to get Go Link from the Go Tools website.

    2018 Dec 15 cache policy

    A write-through cache can only speed up reads, as writes still result in a bus cycle. (This also avoids potential issues with "old" data in DRAM being accessed by another CPU/device.) 486DX, 68020, etc. used write-through.

    With a write-back policy, writes can be stored in the cache instead of going to the bus. The cache line containing the modified data eventually gets written to DRAM when it is retired. However, when a write addresses a memory area which is not currently represented in the cache, two things can happen. Either it can be treated as a write-through (486DX4 uses this scheme if I'm not mistaken), or the relevant memory can be first read into the cache. The latter is called allocate-on-write and is used on 68040 and newer x86 CPUs (it's optional on Cyrix 6x86).

    Allocate-on-write can result in a degradation of performance when large data blocks are written. Data from the destination area of DRAM crowds out other data in the cache, and even if it is read later (data written to a video card frame buffer may not ever be read by the CPU) it is likely to still result in a miss.

    The improved efficiency of transferring an entire cache line at once in a burst, instead of accessing DRAM one word at a time, can mitigate slow-downs resulting from allocate-on-write while maintaining the advantages. Some socket 7 chipsets (eg. Via VP, MVP) don't do write bursts.

    If large data blocks are written one byte at a time, rather than using 32- or 64-bit words, the reduced number of bus cycles using allocate-on-write may save more time than is lost by doing pointless reads beforehand.

    A scheme called write-gathering or write-combining is used in specific circumstances (eg. video card frame buffer memory) to hold sequential writes in a buffer, without reading any data first, and then execute one large write.

    With all that being said, my idea is this. Maybe the decision on whether to allocate-on-write or not could take into account the size of the write. Eg. write-through (or combine) for 32-bit words, allocate for bytes. Large block copies would tend to use the larger size for speed. (Perhaps this has already been implemented.)
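
    Here is a toy cost model of that idea in Python. The line size and clock counts are made-up illustrative numbers, not measurements of any real CPU or chipset, and the model ignores the cache pollution problem entirely. It only counts bus clocks for a sequential fill of memory that starts out uncached.

```python
LINE_BYTES = 16    # assumed cache line size
SINGLE_WRITE = 2   # assumed clocks for one non-burst bus write
BURST = 5          # assumed clocks for a burst line fill or write-back


def should_allocate(write_size, policy):
    """Decide whether a write miss pulls the line into the cache."""
    if policy == "never":
        return False
    if policy == "always":
        return True
    if policy == "by-size":
        return write_size < 4  # allocate for bytes/words, not for 32-bit writes
    raise ValueError(f"unknown policy: {policy}")


def fill_cost(total_bytes, write_size, policy):
    """Bus clocks to fill total_bytes using writes of write_size bytes."""
    lines = total_bytes // LINE_BYTES
    if should_allocate(write_size, policy):
        # One line fill (the pointless read), then every write hits the
        # cache, then one write-back when the line is retired.
        per_line = BURST + BURST
    else:
        # Every write goes straight to the bus.
        per_line = (LINE_BYTES // write_size) * SINGLE_WRITE
    return lines * per_line


for size in (1, 4):
    costs = {p: fill_cost(4096, size, p) for p in ("never", "always", "by-size")}
    print(f"{size}-byte writes: {costs}")
```

    With these particular numbers the size-based policy matches the cheaper of the two fixed policies in both cases, but different timings would shift the break-even point.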

    2018 Dec 10 IMGTOOL version 0.94

    New version of my bitmap editor IMGTOOL. Has a lot of loose ends at the moment, but also many new features like tile/pixel editor modes. Time for a release! Includes FreeBASIC source and DOS/Win32 executables. Download the complete archive.

    2018 Dec 4 NOWUT compiler released

    This is the first release of the NOWUT cross compiler for the low-level NOWUT language. It runs on Win32 and targets x86, Amiga 68K, X68000, Sega 32x and Saturn. Check the documentation. Download the complete archive.

    And be sure to get Go Link from the Go Tools website.


    2018 Dec 1 delayed immediates

    RISC CPUs sometimes employ a delayed branch, where the instruction immediately after the branch instruction is always executed, hence avoiding some disruption of the pipeline.

    RISC CPUs also tend not to allow immediate data (data following the opcode) to be used as operands. This is regrettable from a programming standpoint, and seems like it would be a mixed bag from a performance standpoint, since a memory access has to occur somewhere else to load that data. So... what if, like a delayed branch, we had delayed immediate data? You could load a DWORD into a register with one instruction, but the data could go further down the stream.

    To be more specific, imagine that your CPU used a general purpose register as the program counter (eg. R15), and that you had a post-increment register-indirect addressing mode available to commonly used instructions. Loading from (R15+) would be the same thing as loading immediate data conceptually. You just have to assume that R15 was pointing at the location following the opcode, and that the post-increment would cause it to skip over the data before the next instruction fetch. Those assumptions wouldn't hold on a pipelined CPU, but perhaps the "delayed immediate" would make the situation workable, by lining up the data load and the "skip" with the proper pipeline stage of the opcode that needed it.
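
    The non-pipelined version of this can be shown with a toy interpreter in Python. The instruction names and encoding are invented for illustration, and the model executes one instruction at a time, so it demonstrates only the conceptual equivalence and not the delayed variant.

```python
PC = 15  # R15 doubles as the program counter


def run(memory):
    """Execute a tiny made-up instruction set until HALT; return registers."""
    regs = [0] * 16
    while True:
        op = memory[regs[PC]]
        regs[PC] += 1  # PC now points at the word after the opcode
        name = op[0]
        if name == "LOAD_POSTINC":  # LOAD Rd, (Rs+)
            _, rd, rs = op
            regs[rd] = memory[regs[rs]]
            regs[rs] += 1  # when Rs is R15 this skips the data word
        elif name == "ADD":  # ADD Rd, Rs
            _, rd, rs = op
            regs[rd] = (regs[rd] + regs[rs]) & 0xFFFFFFFF
        elif name == "HALT":
            return regs
        else:
            raise ValueError(f"unknown instruction: {name}")


program = [
    ("LOAD_POSTINC", 0, PC), 0x12345678,  # R0 = immediate
    ("LOAD_POSTINC", 1, PC), 1,           # R1 = immediate
    ("ADD", 0, 1),
    ("HALT",),
]
regs = run(program)
print(hex(regs[0]))
```

    Nothing in the interpreter knows about immediates. The data words are skipped only because the post-increment happens to be applied to the program counter.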

    2018 Apr 14 video card framebuffer memory bandwidth

    From my post on VOGONS:

    486DX4-100, Trident 8900 ISA - 5.4MB/s
    486DX4-100, Trident 9440 VLB - 31MB/s
    Pentium II-350, i440BX, Trident 9680 PCI - 38MB/s
    Pentium II-350, i440BX, Trident 9680 PCI - 62MB/s (write combining enabled)
    Pentium III-600e, i440BX, GeForce FX5200 AGP - 47MB/s
    Pentium III-600e, i440BX, GeForce FX5200 AGP - 240MB/s (write combining enabled)
    Pentium M-1200, i855PM, Radeon 7500 AGP - 50MB/s
    Pentium M-1200, i855PM, Radeon 7500 AGP - 169MB/s (write combining enabled)
    Athlon XP, ViaKT333, GeForce FX5700 AGP - 83MB/s
    Athlon XP, ViaKT333, GeForce FX5700 AGP - 192MB/s (write combining enabled)
    Phenom II, AMD770, Radeon 5670 PCIe - 189MB/s
    Phenom II, AMD770, Radeon 5670 PCIe - 2500MB/s (write combining enabled)
    

    These tests use the REP STOSD instruction to fill video memory with a constant. My guess is that the higher AGP 4X/8X speeds aren't enabled under DOS, since I haven't seen any AGP system beat this 440 board. (The Pentium M and Athlon are AGP 4X)

    2018 Mar 18 SuperPi

    From my post on VOGONS:

    Athlon II X2 260 (3.2GHz, DDR2-800 CL5) - 25s
    Athlon X2 7850 (2.8GHz, DDR2-800 CL5) - 28s
    Core2 SU9600 (1.6GHz, 1.8GHz turbo, DDR2-667) - 34s
    Athlon X2 4850e (2.5GHz, DDR2-800 CL5) - 36s
    Turion X2 1.6GHz (DDR2-667) - 57s
    Athlon XP 2800+ (2083MHz, KT333, DDR-333 CL3) - 1m
    Athlon XP 2500+ (1833MHz, KT333, DDR-333 CL2) - 1m1s
    Sempron 2300+ (1583MHz, KT333, DDR-333 CL2) - 1m15s
    Pentium M 1MB 1.0GHz (i855pm, DDR-266) - 1m25s
    Pentium M 2MB 1.2GHz (i855pm, DDR-266) - 1m4s
    Pentium M 2MB 1.2GHz (i915gm, DDR2-400) - 1m1s
    Pentium M 1MB 1.7GHz (i855pm, DDR-333) - 56s
    Pentium M 2MB 1.6GHz (i855pm, DDR-333) - 53s
    Pentium 3 933MHz (PC-133) - 2m17s
    Pentium 3M 800 (i830gm, PC-133 CL2) - 2m35s
    Pentium 3M 1333MHz (i830gm, PC-133 CL3) - 2m20s
    Pentium 3M 1333MHz (i830gm, PC-133 CL2) - 2m06s
    Pentium 3M 1333MHz (i830gm, PC-133 CL2) - 1m57s (screen mode at 800x600x16 instead of 1024x768x32)
    Pentium 3 600e (PC-100) - 3m21s
    Pentium MMX 166 (256KB L2) - 15m2s
    

    2018 Jan 16 some old 3DMark2001SE benchmarks
    Socket 5 (Packard Bell C 115, i430VX chipset, Win98)
    Pentium MMX Overdrive 200, GeForce 2MX 200 PCI = 290 3dmarks
    
    Socket 7 (Shuttle HOT-591p, VIA MVP3 chipset, 512KB cache, Win98)
    IDT Winchip 200, Radeon 9200 = 892 3dmarks
    Pentium 166 at 188, Radeon 9200 = 1071 3dmarks
    Cyrix M2 PR400 (285MHz), Radeon 9200 = 1535 3dmarks
    K6-2 380, Radeon 9200 = 1891 3dmarks
    K6-3 380, Radeon 9200 = 2304 3dmarks
    K6-3 380, GeForce 2 GTS = 683 3dmarks
    K6-3 380, Matrox G250 (800x600) = 308 3dmarks
    
    Slot 1 (Intel SE440BX2, PC100 CL2 SDRAM, Win98)
    Pentium 3 600e, SiS 6326 (640x480) = 203 3dmarks
    Pentium 3 600e, S3 Savage 4 (16-bit color) = 870 3dmarks
    Pentium 3 600e, nVidia TNT (800x600) = 976 3dmarks
    Pentium 3 600e, S3 Savage 2000 (16-bit color) = 1098 3dmarks
    Pentium 3 600e, Voodoo 3 3000 (16-bit color, unofficial drivers) = 1243 3dmarks
    Pentium 3 600e, GeForce 4MX 420 = 2495 3dmarks
    Pentium 3 600e, GeForce FX 5200 64-bit = 3100 3dmarks
    Pentium 3 600e, GeForce FX 5700LE = 4438 3dmarks
    Pentium 3 550, GeForce FX 5200 64-bit = 2785 3dmarks
    Pentium 3 550, Radeon 9200 = 3530 3dmarks
    Pentium 2 350, Radeon 9200 = 2643 3dmarks
    
    Socket 370 (i815 chipset, Win98/2k)
    Celeron 533, GeForce 2 GTS = 1381 3dmarks
    Pentium 3 733, nVidia TNT (800x600) = 1011 3dmarks
    Pentium 3 733, nVidia TNT2 M64 (800x600) = 1162 3dmarks
    Pentium 3 733, Matrox G450 (800x600)= 1150 3dmarks
    Pentium 3 733, Matrox G450 = 896 3dmarks
    Pentium 3 933, i815 integrated (16-bit color) = 660 3dmarks
    Pentium 3 933, GeForce 2MX 200 = 1533 3dmarks
    Pentium 3 933, GeForce 4MX 420 = 3213 3dmarks
    Pentium 3 933, GeForce 2 GTS = 3106 3dmarks
    Pentium 3 933, GeForce FX 5200 64-bit = 3794 3dmarks
    Pentium 3 933, Radeon 9200 = 5059 3dmarks
    Pentium 3 933, GeForce FX 5700LE = 5230 3dmarks
    
    Socket A (Biostar M7VIP-pro, VIA KT333 chipset, DDR333, Win2k)
    Sempron 2300+ (1583MHz), nVidia TNT2 M64 (800x600) = 1884 3dmarks
    Sempron 2300+ (1583MHz), GeForce 2MX = 2913 3dmarks
    Sempron 2300+ (1583MHz), GeForce 4MX 420 = 3450 3dmarks
    Sempron 2300+ (1583MHz), GeForce 4MX 420 195MHz RAM = 3900 3dmarks
    Sempron 2300+ (1583MHz), GeForce 2 GTS = 4171 3dmarks
    Sempron 2300+ (1583MHz), Radeon 9200 = 6666 3dmarks
    Sempron 2300+ (1583MHz), GeForce 6200 = 7291 3dmarks
    Sempron 2300+ (1583MHz), GeForce FX 5700LE = 7790 3dmarks
    Athlon XP 2500+ (1833MHz), GeForce FX 5200 64-bit = 4884 3dmarks
    Athlon XP 2500+ (1833MHz), GeForce FX 5700LE = 8425 3dmarks
    Athlon XP 2500+ (1833MHz), GeForce FX 5700LE 20% overclock = 9280 3dmarks
    
    Socket A (nForce2 chipset, Win2k)
    Athlon XP 2800+ (2083MHz), GeForce 4MX integrated = 3683 3dmarks
    Athlon XP 2800+ (2083MHz), GeForce FX 5700LE = 8977 3dmarks
    Athlon XP 2800+ (2083MHz), Quadro FX 1000 = 10771 3dmarks
    
    Socket 479 (Itox mini-ITX board, i855GM chipset, DDR333, Win2k)
    Pentium M 2MB 1.6GHz, i855 integrated = 2500 3dmarks
    Pentium M 2MB 1.6GHz, GeForce 4MX PCI = 4080 3dmarks
    Pentium M 2MB 1.6GHz, GeForce 8400GS PCI = 9200 3dmarks
    
    Socket AM2 (nForce 430 chipset, DDR2-800, Win2k)
    Athlon X2 4850e (2.5GHz), GeForce 6150 integrated = 5481 3dmarks
    Athlon X2 4850e (2.5GHz), Quadro NVS 285 = 9125 3dmarks
    Athlon X2 4850e (2.5GHz), Radeon X1300 64-bit = 10291 3dmarks
    Athlon X2 4850e (2.5GHz), Quadro NVS 290 = 13944 3dmarks
    Athlon X2 4850e (2.5GHz), GeForce 210 520/600 = 17900 3dmarks 
    Athlon X2 4850e (2.5GHz), GeForce 9500 DDR2 450/400 = 19887 3dmarks
    Athlon X2 4850e (2.5GHz), GeForce GT220 625/700 = 21626 3dmarks
    
    Socket AM2+ (Biostar A770-A2+, AMD 770 chipset, DDR2-800, Win2k)
    Athlon X2 4850e (2.5GHz), GeForce 7600GT = 23001 3dmarks
    Athlon X2 7850 (2.8GHz), GeForce 7600GT = 29945 3dmarks
    Athlon II X2 260 (3.2GHz), GeForce 7600GT = 29055 3dmarks
    Athlon II X2 260 (3.2GHz), Radeon HD 4550 600/800 = 24945 3dmarks
    Athlon II X2 260 (3.2GHz), GeForce 9500 DDR2 450/400 = 25815 3dmarks
    Athlon II X2 260 (3.2GHz), Radeon HD 7510 650/700 = 31000 3dmarks
    Athlon II X2 260 (3.2GHz), Radeon HD 5570 650/700 = 32828 3dmarks
    Athlon II X2 260 (3.2GHz), GeForce GT220 625/700 = 32602 3dmarks
    Athlon II X2 260 (3.2GHz), GeForce GT240 550/1000 = 32689 3dmarks
    Athlon II X2 260 (3.2GHz), Radeon HD 3850 = 34945 3dmarks
    
    Laptops (Win2k/XP)
    Dell Latitude C400, Pentium 3M 1.33GHz, i830 integrated = 1080 3dmarks
    Fujitsu Lifebook B6110D, Pentium M 2MB 1.2GHz, i915 integrated = 3565 3dmarks
    NEC Versa S820, Pentium M 1MB 1.0GHz, Mobility Radeon 7500 64-bit = 4060 3dmarks
    Fujitsu Lifebook S series, Turion 64 X2 1.6GHz, DDR2-667, Radeon Xpress 200m = 4200 3dmarks
    Toshiba Portege M200, Pentium M 1MB 1.7GHz, GeForce Go 5200 64-bit = 4765 3dmarks
    Fujitsu Lifebook S series, Turion 64 2.0GHz, DDR2-800, Radeon Xpress 200m = 5300 3dmarks
    
    2017 Dec 20 power consumption
    Video cards at idle:
    GeForce 7600GT (560MHz) - 15W
    GeForce 9500 DDR2 (450MHz) - 10W
    
    cards that throttle to a lower speed at idle:
    GeForce 210 (135MHz) - 2.5W
    GeForce GT 220 (135MHz) - 7.5W
    GeForce GT 240 (135MHz) - 7.5W
    Radeon HD 3850 (300MHz) - 12W
    Radeon HD 4550 (110MHz) - 2.5W
    Radeon HD 5570 (400MHz) - 10W
    

    I also tested an IBM AT (286) loaded up with ISA memory board, VGA, sound, disk controller, and 3.5" harddisk. It drew 32W. A VLB 486DX4-100 system drew 28W.

    I test power consumption using a shunt resistor (10 ohms) in the AC supply and calculating the current and power draw from the voltage drop. (In the case of video cards, it's necessary to estimate and subtract out the power for the rest of the system.)
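
    The calculation can be sketched in Python as below. It assumes RMS readings and a power factor of 1, which is a simplification for a PC power supply, and the example voltages are hypothetical values rather than readings from the tests above.

```python
def power_from_shunt(v_drop, v_supply, shunt_ohms=10.0):
    """Estimate load power (watts) from the voltage drop across the shunt."""
    current = v_drop / shunt_ohms       # Ohm's law: I = V / R
    load_voltage = v_supply - v_drop    # what is left for the load
    return current * load_voltage


# Hypothetical example: 2.5 V across the shunt on a 120 V supply.
watts = power_from_shunt(2.5, 120.0)
print(f"{watts:.1f} W")
```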

    2017 Oct 21 registry tweaks

    Use HDDs larger than 128GB in Windows 2000 by creating this DWORD registry value and setting it to 1:

    HKLM\System\CurrentControlSet\Services\Atapi\Parameters\EnableBigLba 
    

    2017 Jan 17 GUI thought experiment