NOWUT v0.35 - a programming language and compiler

At this stage of development, both the language and compiler are incomplete. Errors may not be caught,
bugs may bite (see below for list of known bugs), and code will be suboptimal.

However, NOWUT can successfully compile itself, as well as several demo programs.

The compiler is licensed under the GPL (see COPYING). (Example programs included in the archive should be
considered public domain unless stated otherwise.)

http://www.hyakushiki.net/nowut.htm
http://www.hyakushiki.net/anachro.htm
damage_x@hyakushiki.net


----Contents----

1) About the NOWUT compiler
  a. compilation
  b. platform-specific code
  c. linking

2) About the NOWUT language
3) Example function
4) Symbols, labels, variables, constants
5) Operands
6) Calculation/assignment
7) NOWUT language statements/instructions
8) Functions/procedures

9) Internal assembler
  a. x86 instruction set
    (and 8086 mode peculiarities)
  b. 68K instruction set
  c. SH2 instruction set
  d. MIPS instruction set
  e. ARM instruction set
  f. Z280 instruction set

10) Known bugs and limitations
11) Changes from the last version


----About the NOWUT compiler----

  The NOWUT compiler is a program written in NOWUT, which can now run on Win32, DOS, X68000, Amiga, EmuTOS,
and i386 Linux. It produces a COFF object file as output.

  By default, the OBJ files contain exactly one each of BSS, data, and code sections, plus a ".drectve" section
which can be used to pass dynamic linking info to the linker. The SECTION statement can now be used in a
source file to specify additional sections with custom names and attributes.

  ----Compilation----

Command line:

        NOWUT platform [-one] [-lst] [-opt] file

  The input file is assumed to have the extension .NO, so FILE.NO will be read and FILE.OBJ will
be written.

  Since release 0.30, the platforms available in the compiler depend on which CPU modules were linked when
the compiler itself was built. It is possible to omit modules to produce a smaller build of NOWUT, mainly
for the sake of staying within real-mode DOS memory limits. Starting NOWUT with no command line arguments
will display a list of which CPU modules are present.

The CPUX86 module supports these platforms:

  386 - generates x86 32-bit code, outputs a standard COFF object file which can be fed to GoTools GoLink
        to create a PE executable, or to LINKBIN to create an ELF executable.

  8086tiny - generates 8086-compatible code which is limited to a 16-bit address space (CS/DS/ES/SS all
        assumed to be set to the same value), outputs a nonstandard COFF file which can be fed to LINKBIN
        to create a .COM file or MZ executable. 32-bit operations are done using multiple 16-bit
        operations, which causes some code bloat.

  8086 - similar to 8086tiny except that (non-stack) memory references are preceded by reloading the
        DS register (causes even more code bloat). This means that initialized data and BSS can now be
        larger than 64KB. Code size is limited to 65472 bytes. Stack size is determined by a constant
        within LINKBIN (default is 1KB). Pointers still use a 32-bit linear address space! However,
        anything beyond 2MB is not valid. The mapping between logical and physical addresses is determined
        by a segment lookup table located in the first 64 bytes of the code segment (32 words = 32 64KB
        segments = 2MB). This table is set up by the program itself or by the PIODOS platform code. When it
        is set up by PIODOS, the first 640KB is relative to the beginning of the program (minus 64 bytes) and
        640K-1M is mapped to absolute addresses (ie. $A0000 is VGA buffer). The table entries for 1MB-2MB are
        unused but left open to be used for addressing other potentially useful segments (eg. BIOS data area,
        DOS environment, etc.). Be sure to keep your words/dwords aligned so they don't straddle a 64KB
        boundary. Output can only be used to create an MZ executable.
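
  The lookup described above can be modelled roughly in Python. This is only a sketch, not NOWUT or PIODOS
code. It assumes that each table word holds a real-mode segment value, that the upper bits of the linear
address select the table entry, and that the low 16 bits are the offset. The function names and the sample
table are hypothetical, and the 64-byte adjustment is ignored for simplicity.

```python
SEGMENT_SIZE = 0x10000      # each table entry covers 64KB
TABLE_ENTRIES = 32          # 32 words -> 2MB of linear address space

def translate(linear, table):
    """Map a linear pointer to a physical real-mode address (assumed scheme)."""
    if not 0 <= linear < SEGMENT_SIZE * TABLE_ENTRIES:
        raise ValueError("addresses beyond 2MB are not valid")
    segment = table[linear >> 16]      # value that would be loaded into DS
    offset = linear & 0xFFFF
    return (segment << 4) + offset     # 8086 physical address = segment*16 + offset

def straddles_boundary(linear, size):
    """True if an access of 'size' bytes crosses a 64KB boundary."""
    return (linear >> 16) != ((linear + size - 1) >> 16)

# Hypothetical table: program loaded at segment $1000, entries 10-15 mapped
# to absolute addresses $A0000-$FFFFF, entries 16-31 left unused (0).
program_seg = 0x1000
table = [program_seg + 0x1000 * i for i in range(10)]
table += [0xA000 + 0x1000 * i for i in range(6)]
table += [0] * 16
```

  With this table, translate(0xA0000, table) gives $A0000 (the VGA buffer), and straddles_boundary(0xFFFF, 2)
is true, which is the case the alignment advice is meant to avoid.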

        The internal x86 assembler has been expanded to include all instructions of the Pentium MMX.
        However, a few mnemonics have changed and a nonstandard syntax is used.

        It's possible to mix 16-bit and 32-bit code within one program by using BITSxx to switch modes,
        but this hasn't been tested much.

The CPU68K module supports these platforms:

  68000 - generates plain 68000 code and outputs a standard(-ish?) COFF file. This can be fed into LINKBIN
        to create Amiga, Genesis, .PRG, or X68000 executables.

        The internal 68K assembler now handles all 68000 and 68010 instructions and all of their addressing
        modes. It also uses a nonstandard syntax.

The CPUSH module supports these platforms:

  sh2 - generates SuperH code and outputs a standard(-ish?) COFF file. This can be fed into LINKBIN to
        create Sega 32X or Saturn binaries.

  sh4 - essentially the same as SH2, but with little-endian byte order. Can be fed into LINKBIN to generate
        a Dreamcast binary. (Hint: use the SCRAMBLE utility, then add an IP.BIN, and use DIR2BOOT to produce
        a Dreamcast disc image.)

        The internal SuperH assembler handles all SH2 instructions except one, and uses a nonstandard syntax.

The CPUMIPS module supports these platforms:

  mips - generates VR4300-compatible (MIPS III) code and outputs a standard(-ish?) COFF file. This can be
        fed into LINKBIN to create Nintendo 64 binaries. (Hint: use rn64crc to add the correct checksum.)
  mipsle - uses little-endian byte order (for PIC32)

        The internal MIPS assembler uses a number of consolidated mnemonics, and a nonstandard syntax. It
        supports 32- and 64-bit instructions of the VR4300, but not floating-point. Additional instructions
        have been added from the MIPS32 Release II specification.

The CPUARM module supports these platforms:

  armle - generates 32-bit ARM code and outputs a standard(-ish?) COFF file. This can be fed into LINKBIN to
        create a GBA ROM image or RISC OS executable.
  armbe - uses big-endian byte order

        The internal ARM assembler supports most of ARM4, using a slightly modified set of mnemonics and
        a nonstandard syntax.

The CPUZ280 module supports these platforms:

  z280 - supports assembly only (does not compile NOWUT code) and outputs a nonstandard COFF file. This can
        be fed into LINKBIN to create a flat binary using the DOSCOM option.

(See the assembly section for details.)

NOWUT currently supports these command line switches:

  -one - makes the compiler do only one pass instead of two. Doing only one pass means that jump/branch
        instructions in the generated code will be long versions. Does not work if the source file has
        a non-default number of sections.

  -lst - makes the compiler generate a .LST file.

  -opt - toggles compiler/assembler optimizations. Normally they are on by default and this would then turn
        them off. (useful for debugging purposes)


  As of version 0.27 the language is beginning to have floating-point support, but only for the '386'
platform. The x86 assembler supports x87 FPU instructions and the parser can convert decimal values into
their single-precision FP representation (with some limits on range and precision). Note that FP code needs
to have a stack frame set up, so it can only be used after a BEGINFUNC statement. (And you might need to
stick an FINIT somewhere.)


  ----Platform-specific code----

  While NOWUT is linked with CPU modules to allow it to target different platforms, it is also linked with
a platform-I/O module to allow it to run on different platforms. Other programs can also use a platform-I/O
module. To do so, the program must call INITPLATFORM upon starting. This function will return the address
of a (null-terminated) command line parameters string (if any). These routines are currently available in
each module:

        initplatform
        endprogram
        printel                (explicit length)
        printnt                (null-terminated)
        printhex16
        printhex8
        printhexr
        printhex
        fileopen
        filecreate
        fileclose
        fileread
        filewrite
        fileseek               (currently limited to 32-bit file offset)
        fileskread             (seek and read)
        fileskwrite            (seek and write)
        filegetsize
        pioquerytimer          (returns a count in milliseconds. accuracy depends on platform.)
        appgetpath             (returns a null string on EmuTOS and Linux, returns PROGDIR: on Amiga)
        memrequest
        memreturn

These are the platform-I/O modules that are included:

  PIOAMI - Amiga version. Records the initial stack pointer and opens dos.library during init. On exit, closes
dos.library and any open files to prevent memory leaks. Converts backslashes in file names to
forward slashes.

  PIODOS - 8086 (DOSEXE) version. Sets up segment table during init. (Does not work with 8086tiny or .COM)

  PIOGEM - EmuTOS/GEMDOS version. Adjusts memory allocation during init.

  PIOLNX - 386/Linux version. Joins command line arguments into one big string during init. Converts
backslashes in file names to forward slashes.

  PIOWIN - 386/Win32 version. Gets a console handle during init. Skips over the program name part of the
command line. Can also be used for 32-bit DOS programs with the WDOSX Win32 wrapper.

  PIOX68 - X68000 version.


  Code that aims to be cross-platform should always take endianness and memory alignment into account.
On the 68000, words and dwords must be aligned to 2-byte boundaries. On SH2 and MIPS, words must be aligned
to 2-byte boundaries and dwords to 4-byte boundaries. NOWUT includes ALIGNW and ALIGND statements for this
purpose. The statements LOADBIG and LOADLITTLE are provided to read words/dwords with a particular
endianness on any CPU. (Note that the 386 LOADBIG code uses the BSWAP instruction, which is only available
on 486+.)
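
  To illustrate what reading with a particular endianness means, here is a small Python model. It does not
show the LOADBIG/LOADLITTLE syntax; the helper names load_big, load_little and is_aligned are hypothetical
and exist only for this sketch.

```python
def load_big(buf, offset, size):
    """Read a 'size'-byte unsigned value stored most significant byte first."""
    return int.from_bytes(buf[offset:offset + size], "big")

def load_little(buf, offset, size):
    """Read a 'size'-byte unsigned value stored least significant byte first."""
    return int.from_bytes(buf[offset:offset + size], "little")

def is_aligned(address, size):
    """True if 'address' is a multiple of 'size' (2 for words, 4 for dwords)."""
    return address % size == 0

data = bytes([0x12, 0x34, 0xAB, 0xCD])
big_dword = load_big(data, 0, 4)        # $1234ABCD
little_dword = load_little(data, 0, 4)  # $CDAB3412
```

  The same four bytes yield different dwords depending on the byte order, which is why a file format or
hardware register with a fixed endianness should be read with an explicit load rather than a plain one.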

  
  ----Linking----

  OBJ files produced in 386 mode can be linked using GoTools GoLink to make a Win32 executable. Using the
/base switch (eg. /base 00400000) causes GoLink to generate relocation data. These executables can then also
be used with Win32s (Windows 3.x) and may be used in a DOS environment with the WDOSX stub (a DPMI host
which implements a subset of Win32 functions).

  GoLink requires a list of applicable DLLs as command line arguments. I use this command to link my
Win32 stuff:

        golink %1.obj kernel32.dll user32.dll gdi32.dll winmm.dll /console

  As of NOWUT 0.26, it is possible to specify library names in the source code with LINKLIBFILE, which
removes the need for specifying them on the command line.

  LINKBIN is provided to transform one or more OBJ files, for targets other than Win32, into useful
executable formats. LINKBIN has been tested with up to seven input files at a time. Modify the MAXFILES
parameter in the source to enable more than this. (In theory, it should work...)

  LINKBIN version 0.30 and recent versions of GoLink include functionality that allows sections in COFF
object files to be arranged in the executable according to an alphanumeric index. This is necessary to
build the new Modular-NOWUT:

  - normal data sections (named ".data" in each OBJ) are concatenated as usual
  - sections named ".data$4" are placed -after- the normal ones
  - a section named ".data$x" is placed -after- those (because 'x' has a higher ASCII value than '4' does)
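
  The ordering rule above can be sketched in Python as follows. This is a model of the rule as stated
here, not LINKBIN's actual implementation; it assumes the part after the "$" is compared as plain ASCII text
and that sections with the same name keep the order of the input files.

```python
def section_sort_key(name):
    """Split '.data$x' into ('.data', 'x'); a plain '.data' gets an empty suffix."""
    base, _, suffix = name.partition("$")
    return (base, suffix)

def arrange(sections):
    # sorted() is stable, so sections with equal keys keep input-file order
    return sorted(sections, key=section_sort_key)

sections = [".data$x", ".data", ".data$4", ".data", ".data$4"]
ordered = arrange(sections)
# ordered is ['.data', '.data', '.data$4', '.data$4', '.data$x']
```
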

example command line for linking Win32 NOWUT:

        golink nowut.obj cpux86.obj cpu68k.obj cpush.obj cpumips.obj cpuarm.obj piowin.obj /console

example command line for linking real-mode DOS NOWUT with only x86 and 68k support:

        linkbin dosexe nowut cpux86 cpu68k piodos

  LINKBIN version 0.33 has command line options to override the default section base addresses. These are
-oc, -od, and -ob for code, data, and BSS, respectively. This is useful for generating a binary that needs
to run at a certain address.

These formats are supported by LINKBIN:

  genesis, 32x, 32x/68k - Sega Genesis/MD and 32x ROM images - SH2 or 68K side        

  amiga - Amiga 68K "hunk" format

  doscom, dosexe - 8086 PC .COM and .EXE programs

  dcast - SH4 binary based at $8C010000

  elf386 - an i386 Linux executable
           (output files need the filesystem executable flag set with chmod +x)

  gba - a Game Boy Advance ROM image

  n64 - a Nintendo 64 ROM image, automatically padded to at least 1MB for boot code requirements
        (Checksum is not calculated! Use rn64crc to do this.)

  optrom - 8086tiny ISA adaptor ROM (hasn't been tested yet)

  pic32 - little-endian MIPS binary with flash ROM at $BD000000 and RAM at $A0000000

  prg - runs under EmuTOS

  riscos - runs under RISC OS

  saturn, satsnd - Sega Saturn SH2 binary (load and execute at $06022000) and 68K sound CPU.

  x68 - Sharp X68000 executable (Human68K)

  The Sega 32X hardware is inactive when the system powers on. The 68K has control, and must enable the
32X. When generating a 32X ROM image with SH2 code, a stub file (68KPART8.32X) occupies the first 4KB of
the final image. The stub performs initialization and hands control over to the SH2. It also polls the
controller ports and passes the data to the SH2 through shared registers. The source code for the stub is
68KPART8.NO. The 32X master SH2 begins execution at the beginning of its code section. The slave SH2 starts
in an idle loop contained within the stub file. A dword write to address $26001020 will cause it to jump
to that location (the address in the dword that is written) so the second CPU can be utilized.

  N64 binaries also begin with a header and boilerplate boot code. This is linked from N64STUB.BIN.


----About the NOWUT language----

  The goal was to combine certain aspects of assembly and high-level languages in a different way than what
has been done before.

NOWUT borrows these ideas from assembly:

  simplistic syntax consisting of instructions followed by operands 

  manual layout of initialized data, uninitialized data, and data structures

  no enforcement of data types

It borrows these ideas from HLLs:

  avoids (mostly) being CPU specific

  no micro-management of CPU register usage

  calculations can be specified in a form similar to mathematical notation (assignment)

NOWUT also handles inline assembly code with a nonstandard syntax.

  I should mention that the name NOWUT is an acronym which expands to "No One Will Use This." I figure that
if anyone else shared my taste in programming languages, I wouldn't have been forced to create my own! I
suspect that NOWUT may not become extremely popular.


----Example function in NOWUT----

        ; this line is a comment. comments are preceded by ' or ; characters

examplefunc:                                        ; examplefunc is a label (address)

        beginfunc param1.d,param2.d                 ; this function receives two parameters
                                                    ; they will be referred to as param1 and param2
                                                    ; and are dwords (32-bit words)

        localvar xx.d                               ; a local dword variable will be added to the stack

        xx=param1+1                                 ; here is an example of assignment

        countdown xx                                ; this begins a simple type of loop

        param2=_ shl 1                              ; an underscore refers back to the left side of
                                                    ; the equals sign.
                                                    ; param2 becomes (itself) shifted left one bit

        nextcount                                   ; this is the end of the "countdown" loop

        endfunc param2                              ; the function's return value will be equal to
                                                    ; the value of param2

        returnex 8                                  ; this causes program flow to return to the caller.
                                                    ; it also removes 8 bytes from the stack (important)
                                                    ; which had been occupied by param1 and param2



----Symbols, labels, variables, constants----

  Labels, variables, and constants are considered symbols. Symbol names are currently limited to 64
characters. (Currently, long symbol names may cause internal buffers to overflow and generate an
error message.) Symbol names are not case-sensitive, though a particular case can be written to the
output file for the benefit of case-sensitive linkers (the EXTERN statement is used for this purpose).
Symbol names must contain at least one letter, and may contain other characters EXCEPT these:

 ' ; : ! $ & ( ) [ ] , . ? + - * / = " >

For future compatibility, it's probably best that no ASCII characters below $40 be used.

Symbol names that are the same as CPU register names should not be used.

  Normally, labels are defined with a colon. An exclamation point is used instead of a colon to define
an exported symbol (can be referred to in other modules). For instance, when using the GoTools GoLink
linker, the program's entry point should be defined like so:

start!

  (The program entry point for other platforms is the beginning of the code section.)

  Every label has an address associated with it, although the actual address is not determined until
linking or upon the executable code being loaded by the operating system. The label "examplefunc" can
be used as the target of a jump/branch/call (in assembly) or a goto/gosub/callex (in NOWUT), it can
have its address used in a calculation (eg. xx=examplefunc+40), or it can be used to refer to memory
contents (eg. xx=examplefunc.d).

  Labels and global variables are actually the same thing, except that labels used to refer to either
initialized data or uninitialized data will generally be defined with an appropriate default type.
However, this is not required, and when a symbol is referenced the default type can be overridden.

exampleaddr.a:                ; exampleaddr will be handled as an address
exampleaddr:                  ;      -       -     -        -     address (same as .a)
exampleaddr.b:                ;      -       -     -        -     byte value
exampleaddr.sb:               ;      -       -     -        -     signed byte value
exampleaddr.w:                ;      -       -     -        -     word value (16-bit)
exampleaddr.sw:               ;      -       -     -        -     signed word value
exampleaddr.d:                ;      -       -     -        -     dword value (32-bit)
exampleaddr.sd:               ;      -       -     -        -     signed dword value
exampleaddr.fd:               ;      -       -     -        -     floating-point value (32-bit)

  The default type is used when a label is referenced without any type tag (ie. no dot anything).

xx=exampleaddr                ; operation depends on the default type
xx=exampleaddr.a              ; always loads the address
xx=exampleaddr.d              ; always loads a dword

  The default type is determined when a label is FIRST referenced (not necessarily when it is defined!).
Because the compiler only does one pass of the source code, it is therefore recommended to place any
initialized or uninitialized data BEFORE the program code. 

Recommended:

        sectionbss

buffer.d:
        resd 16

        sectiondata

dwordval.d:
        dd $1234ABCD

        sectioncode

        buffer(4)=dwordval


Problematic:

        sectioncode

        buffer.d(4)=dwordval      ; dwordval will be interpreted as an address
                                  ; because its definition has not yet been read by the compiler

        sectiondata

dwordval.d:
        dd $1234ABCD

        sectionbss

buffer.d:
        resd 16

  Alternatively, the DEF statement can be used at the beginning of a source file (regardless of section) to
manually initialize a symbol's default type. This is especially useful when referencing symbols in another
module.

  Variables that are defined by a beginfunc or localvar statement are located on the stack and have some
differences. The first is that the names may be used independently in multiple functions, and they have
no relevance outside of the function in which they are defined. The second difference is that the address
of a stack variable is not valid.

        localvar xx.d,yy.d

        xx=yy.a                ; this does NOT work**
                    
This functionality might be added to a later version of the compiler. But currently, the compiler passes
variables to the internal assembler as-is, and the assembler simply alters the addressing mode to refer
to the stack; it does not insert additional instructions (eg. an LEA) to determine the address.

**In fact, referencing the address of a stack variable is now allowed for 386 and 68000.

Because the SH2 CPU only has register-indirect addressing with a 4-bit positive displacement, the SH2
version of the NOWUT compiler sets aside two registers for local variables. It limits the number of
local variables to 32, and they must be dwords (signed or unsigned). Function parameters are limited to
12 and also must be dwords.


Signed and unsigned types are handled the same during many operations, but there are some where it is
important to differentiate:

loading smaller data sizes:

        xx=array.b(5)                ; the byte will be zero-extended
        xx=array.sb(5)               ; the byte will be sign-extended
        xx=array.w(10)               ; the word will be zero-extended
        xx=array.sw(10)              ; the word will be sign-extended

comparisons:

        ifgreater xx.d,0,branchtarget        ; will branch unless xx is 0
        ifgreater xx.sd,0,branchtarget       ; will branch unless xx is 0 or negative

shifting right:

        xx=yy.d shr 1                ; top bit will become 0
        xx=yy.sd shr 1               ; top bit will become 0 (doh!)
        xx=yy shr 1.sb               ; top bit will remain the same

In previous versions of NOWUT, tagging the shift count as signed was the only way to do a signed shift-right
because the evaluator uses the second operand to make the decision on whether to do a signed operation. A
'sar' operator has been added as an alternate way to explicitly choose the signed operation.

        xx=yy sar 1                  ; top bit will remain the same
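
  The difference between the signed and unsigned forms can be modelled in Python on 32-bit values. This is
only an illustration of the arithmetic, not code generated by NOWUT; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def zero_extend(value, bits):
    """Keep the low 'bits' bits; upper bits of the dword become 0 (.b / .w)."""
    return value & ((1 << bits) - 1)

def sign_extend(value, bits):
    """Copy the top bit of a 'bits'-wide value into the upper bits (.sb / .sw)."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value & MASK32

def shr(value, count):
    """Logical shift right: the top bit becomes 0."""
    return (value & MASK32) >> count

def sar(value, count):
    """Arithmetic shift right: the top bit stays the same."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32
    return (value >> count) & MASK32
```

  For example, the byte $FF loads as $000000FF when zero-extended and as $FFFFFFFF when sign-extended, and
$80000000 becomes $40000000 with shr but $C0000000 with sar.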


Constants are defined with the CONST statement:

        const secretvalue,3579545

References to the symbol (eg. secretvalue) will be replaced with the value. Only numeric values are
allowed (although this includes ASCII values).


Numeric values containing a decimal point which is followed by additional digits (in contrast with a data
size/type tag) are assumed to represent floating-point data. Without the decimal point, a value is assumed
to be an integer. Code may be generated to convert between the two depending on the operation being
performed, but this behavior can be overridden using tags.

        x.d=1             ; source is integer, destination is integer, no conversion needed.
                          ;   x becomes $00000001
        x.d=1.0           ; 1.0 begins as $3F800000 (in FP32 form), but it is converted back to 1 before
                          ;   being written to a non-FP variable. x becomes $00000001
        x.d=1.0.d         ; 1.0 begins as $3F800000 (in FP32 form), and because of the .d tag it is NOT
                          ;   converted before being written. x becomes $3F800000
        x.d=1.fd          ; the value $00000001 is treated as FP32 data, it is converted to integer before
                          ;   being written to a non-FP variable. x becomes 0
        x.fd=1.0          ; source is FP, destination is FP, no conversion needed.
                          ;   x becomes $3F800000
        x.fd=1            ; 1 is converted to FP32 form before being written to an FP variable.
                          ;   x becomes $3F800000
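
  The bit patterns in the table above can be checked with a short Python sketch using the standard struct
module. It models reinterpreting bits versus converting values; it is not NOWUT code, and the use of round()
to stand in for the FP-to-integer conversion is an assumption about the rounding mode.

```python
import struct

def fp32_bits(value):
    """Return the 32-bit pattern of 'value' encoded as single-precision FP."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def bits_as_fp32(bits):
    """Reinterpret a 32-bit pattern as a single-precision FP value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

one_as_fp = fp32_bits(1.0)                   # like x.fd=1.0 or x.d=1.0.d
back_to_int = round(bits_as_fp32(one_as_fp)) # like x.d=1.0
tiny = round(bits_as_fp32(0x00000001))       # like x.d=1.fd
```

  Here one_as_fp is $3F800000, back_to_int is 1, and tiny is 0, because the pattern $00000001 read as FP32
is a very small denormal number.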


----Operands----

  Basically, everything that is accepted as part of an assignment/calculation or as an argument to an
instruction is considered an operand. These include numeric values, strings, symbols, and combinations
of such.

format in NOWUT             format in assembly          description

1234                        1234                        decimal number                \
12.34                       12.34                       floating-point number          \
$1234                       $1234                       hex number                      \
0x1234                      0x1234                      hex number                       \  immediate
"a".b                       "a".b                       ASCII byte                       /
"ab".w                      "ab".w                      ASCII word                      /
"abcd".d                    "abcd".d                    ASCII dword                    /
constname                   constname                   a symbol defined with CONST   /
                            99000.h                     high word of a value (8086 assembly only)
varname                                                 address or memory reference
                            varname                     address
varname.a                   varname.a                   address
                            varname.h                   high word of address (8086 assembly only)
varname.b                                               memory reference (byte variable)
varname.sb                                              memory reference (signed byte variable)
varname.w                                               memory reference (word variable)
varname.sw                                              memory reference (signed word variable)
varname.d                                               memory reference (dword variable)
varname.sd                                              memory reference (signed dword variable)
varname.fd                                              memory reference (FP32 variable)
                            [varname].b                 memory reference (byte variable)
                            [varname].w                 memory reference (word variable)
                            [varname].d                 memory reference (dword variable)
                            [varname].h                 memory reference (high word, 8086 assembly only)
                            [varname].q                 memory reference (x87, MIPS 64-bit)
                            reg                         a CPU register
                            reg.b                       a byte CPU register (68K only)
                            reg.w                       a word CPU register (68K only)
                            reg.d                       a dword CPU register (68K only)
                            [reg]                       memory reference (address contained in reg)
                            [reg].b                     memory reference (byte at address in reg)
                            [reg].w                     memory reference (word at address in reg)
                            [reg].d                     memory reference (dword at address in reg)
                            [reg].q                     memory reference (64-bit, x87 floating-point only)
                            [reg+]                      memory reference with post-increment (68K)
                            [reg+].b                    memory reference with post-increment (68K and SH2)
                            [reg+].w                    memory reference with post-increment (68K and SH2)
                            [reg+].d                    memory reference with post-increment (68K and SH2)
                            [reg-]                      memory reference with pre-decrement (68K)
                            [reg-].b                    memory reference with pre-decrement (68K and SH2)
                            [reg-].w                    memory reference with pre-decrement (68K and SH2)
                            [reg-].d                    memory reference with pre-decrement (68K and SH2)
                            [reg+xx].x                  other indirect addressing modes (CPU dependent)
[varname].b                                             indirect memory reference (byte variable)
[varname].sb                                            indirect memory reference (signed byte variable)
[varname].w                                             indirect memory reference (word variable)
[varname].sw                                            indirect memory reference (signed word variable)
[varname].d                                             indirect memory reference (dword variable)
[varname].sd                                            indirect memory reference (signed dword variable)
[varname].fd                                            indirect memory reference (FP32 variable)
varname(xx)                                             indexed address or memory reference
varname.a(xx)                                           indexed address
varname.b(xx)                                           indexed memory reference (byte)
varname.sb(xx)                                          indexed memory reference (signed byte)
varname.w(xx)                                           indexed memory reference (word)
varname.sw(xx)                                          indexed memory reference (signed word)
varname.d(xx)                                           indexed memory reference (dword)
varname.sd(xx)                                          indexed memory reference (signed dword)
varname.fd(xx)                                          indexed memory reference (FP32)
"abcxyz123"                                             string
"abcxyz123".a                                           address of a string
_                                                       "itself" (left side of an assignment)


Note that a blank operand in a series of operands separated by commas will be interpreted as 0.

        callex result.d,somefunction,param1,,,param4        ; zeros are passed as 2nd and 3rd parameters

  Also: indexed symbols with a number as index can now be handled the same as plain symbols (ie. the
addition is done during linking instead of at run time).

        base.d(4)=0                ; both statements generate the same number of instructions
        base.d=0                   ;

But don't use indexed symbols for jumps/calls/branches as this is not guaranteed to work:

        goto place(8)              ; outcome is uncertain

  There are some differences between operand formats in NOWUT vs. the internal assembler. Some CPU
instructions make indirect memory references while using the same encoding as instructions which access
memory without the indirection (and use the same syntax in their standard assembly languages), hence
these ambiguities persist here. In assembly mode, memory references always use square brackets around
a register name, address, or symbol. In NOWUT language mode, square brackets are only used for indirect
memory references. Also in NOWUT language mode, calculations are currently not allowed inside of square
brackets.

[address+48].d=65                                ; this does NOT work

tempvar=address+48 > [tempvar].d=65              ; use this instead

ea.d(address+48)=65                              ; or this

The 'ea' special symbol works like other indexed symbols but has a base address of zero. It is more
flexible than an indirect memory reference but it counts as a 'calculation' operand type, so it can't be
used in every place that a memory reference can.

  Plain strings are only used for initialized data, the EXTERN statement, and the INCBIN statement. However,
the address of a string can be used like any other address, with the string data being dumped at the end
of the section.

messageptr="stuff happens".a                ; pointer to a null-terminated string

messageptr2="stuff happens"r.a              ; the letter "r" adds a CR/LF

messageptr3="line 1"r"line 2"rr.a           ; multiple CR/LFs are possible, even inside a string

widestrptr="hello world"w.a                 ; the letter "w" at the end will insert a 0 byte after each
                                            ; character for Win32 APIs that use UCS-2 / UTF-16 encoding

  Strings that will be appended to the end of the section are buffered during compilation. The number of
strings that can be buffered is limited by the MAXSTRINGS constant in NOWUT.NO.

There is currently no other support for strings in the NOWUT language.


register names:

x86 -   eax ecx edx ebx esp ebp esi edi
        ax  cx  dx  bx  sp  bp  si  di
        al  cl  dl  bl  ah  ch  dh  bh
        es  cs  ss  ds  fs  gs
        cr0     cr2 cr3 cr4
        dr0 dr1 dr2 dr3         dr6 dr7
        st0 st1 st2 st3 st4 st5 st6 st7
        mm0 mm1 mm2 mm3 mm4 mm5 mm6 mm7

Since x86 registers have an inherent size, no size tag is needed.


68K -   d0  d1  d2  d3  d4  d5  d6  d7
        a0  a1  a2  a3  a4  a5  a6  a7
        ccr sr  pc

Data registers may be byte, word, or dword size. Address registers may be word or dword.


SH2 -   r0  r1  r2  r3  r4  r5  r6  r7
        r8  r9  r10 r11 r12 r13 r14 r15
        macl mach sr   gbr  vbr  pr  pc

SH2 registers are all 32 bits, so no size tag is needed.


MIPS -  r0  r1  r2  r3  r4  r5  r6  r7
        r8  r9  r10 r11 r12 r13 r14 r15
        r16 r17 r18 r19 r20 r21 r22 r23
        r24 r25 r26 r27 r28 r29 r30 r31

MIPS registers are either 32 or 64 bits depending on the instruction. No size tag is needed.


ARM -   r0  r1  r2  r3  r4  r5  r6  r7
        r8  r9  r10 r11 r12 r13 r14 r15
        spsr cpsr

ARM registers are all 32 bits, so no size tag is needed.


Z280 -  b  c  d  e  h  l  a
        ixh  ixl  iyh  iyl
        bc  de  hl  sp  ix  iy  i  r
        pc  af  usp

All Z280 registers have an inherent size, so no size tag is needed.


----Calculation/assignment----

  A calculation is a series of operands and operators that cause a value to be computed at runtime. This
includes steps taken to calculate the address of a memory reference. Assignment causes a value to be
stored in a memory location. Assignment uses the equals sign.

somevar=foo+1*100                

The value of foo is loaded, 1 is added, the sum is multiplied by 100, and the result is stored in somevar.

Needless to say, somevar should be a memory reference. If its default type is .a (an address) then
compilation will fail with an error message.

  Operations are performed in order from left to right, except where parentheses are used to specify
that a different order should be used. When the compiler encounters parentheses it pushes the current
value on the stack, performs the calculation inside the parentheses, then pops the stack to continue.
Because the compiler doesn't attempt to optimize such things, manually re-ordering operations to avoid
parentheses results in sleeker code.

somevar=foo+(bar*100)                ; OK, but suboptimal
somevar=bar*100+foo                  ; better

Currently, parentheses may be nested up to 20 levels deep.
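
  As a worked illustration of the ordering rules above, suppose (purely for the sake of example) that foo
holds 2 and bar holds 3:

somevar=foo+bar*100                  ; left to right: (2+3)*100 = 500
somevar=foo+(bar*100)                ; parentheses force the multiply first: 2+300 = 302
somevar=bar*100+foo                  ; same result, 302, without the push/pop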

supported operators:

  +    addition
  -    subtraction
  *    multiplication
  /    division
  and  logical AND
  or   logical OR
  shl  logical shift-left
  shr  logical or arithmetic shift-right
  sar  arithmetic shift-right
  xor  logical exclusive OR

When mixing floating-point and non-floating-point operands in a calculation (on an FP-capable platform)
non-FP values are generally converted to FP and the operations take place in the FPU. However, this is not
true for any of the logical operations. These are all performed using normal ALU instructions. In the case
of logical ops, FP data is used in its raw form without any conversion from FP to integer.

The parser handles operands and operators as a unit, and as a side effect of this, extra spaces should
not be inserted between them.

somevar=foo +10                  ; bad
somevar=foo+ 10                  ; acceptable
somevar=foo shl 2                ; good
somevar=foo  shl 2               ; bad

The underscore operand refers to the target of an assignment. It is particularly handy when the target
involves a complex address calculation that would otherwise need to be repeated.

array(index shl 4+12)=array(index shl 4+12)+20                ; OK, but ugly and slow
array(index shl 4+12)=_+20                                    ; better

Indices on symbols are always calculated in terms of bytes. If you want to access a numbered word in an
array of words, be sure to use a shift in your index.

array.d(0)=$00000001                ; the first dword
array.d(4)=$00000002                ; the second dword
array.d(N shl 2)=n                  ; dword number N, counting from 0

Hence, mixing of bytes, words, and dwords in a data structure is easily accomplished. But unaligned
memory accesses are also possible, and may not be desirable. The same memory location can also be
accessed with differing data sizes at different times, however this may cause unexpected results if
endianness is not taken into account.
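
  A sketch of both points, assuming a hypothetical symbol named record that labels at least 8 bytes of
storage, a hypothetical variable named temp, and that the .w and .b tags can be indexed in the same way
as .d:

record.d(0)=$12345678               ; dword at byte offset 0
record.w(4)=$ABCD                   ; word at byte offset 4
record.b(6)=$7F                     ; byte at byte offset 6
temp=record.b(0)                    ; $78 on a little-endian CPU (x86), $12 on a big-endian CPU (68K)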

  Calculations are sometimes accepted as operands in place of memory references or numeric values. See
the next section for details.


----NOWUT language statements/instructions----

The following is a list of recognized instructions in NOWUT language mode:

  instruction     number of operands   type of operand       description

ALIGNW            none                                       adds a byte if necessary to ensure an
                                                               even address

ALIGND            none                                       adds bytes if necessary to ensure an
                                                               address divisible by 4

ALIGNQ            none                                       adds bytes if necessary to ensure an
                                                               address divisible by 8

ALIGN16           none                                       adds bytes if necessary to ensure an
                                                               address divisible by 16

ASM               none                                       switches to assembly language mode

BEGINFUNC         variable             symbol                pushes registers on the stack and
                                                               optionally defines any parameters that
                                                               were provided by the caller
  Note:
        parameters will be listed in the reverse order compared to CALLEX

BITS16            none                                       switches x86 compilation mode to 8086

BITS16S           none                                       switches x86 compilation mode to 8086tiny

BITS32            none                                       switches x86 compilation mode to 386 

CALLAM            1                    memory reference      68000 only: used for Amiga OS system calls.
                  6                    calculation             First parameter is the return value.
                                                               Second is the base address.
                                                               Third is the function offset.
                                                               The remaining parameters are values to be
                                                               loaded in registers D0, D1, A0, A1
  example:
        callam dosbase.d,[4].d,-552,0,0,0,"dos.library".a        ; Exec - openlibrary

CALLEX            1                    memory reference      pushes any parameters on the stack, calls
                  1                    calculation             a function, and optionally stores a
                  variable             calculation             return value
  examples:
        callex result.d,functionname                 ; function call that receives a return value
        callex ,functionname                         ; function call without return value
        callex ,functionname,param1.d,foo*320+bar    ; function call with two parameters
        callex ,jumptable.d(x shl 2)                 ; calculating a function address
  Note:
        parameters are pushed on the stack from left to right and will appear in the reverse order
          compared to BEGINFUNC

CONST             1                    symbol                associates a value with a symbol
                  1                    immediate
  example:
        const bufsize,16384                          ; occurrences of bufsize are replaced with 16384

COPYBYTES         1                    calculation           copies an array of bytes from source address
                  1                    calculation             (first parameter) to destination address
                  1                    calculation             (second parameter). Third param is count.
  example:
        copybytes string1.a,string2.a+someoffset,foo-1
  Note:
        If the count parameter is equal to zero then no bytes are copied. Source and destination memory
          areas should not overlap. Count is limited to 32768 on 8086, and 65536 on 68000.

COUNTDOWN         1                    memory reference      marks the beginning of a loop
  example:
        countdown xx                                 ; 
        [...]                                        ; the code inside will execute xx times, and
        nextcount                                    ; afterward xx will be 0

DB                1 or more            immediate, string     initialized data

DW                1 or more            immediate, string     initialized data

DD                1 or more            immediate, string,    initialized data
                                         address, indexed
                                         address           

DEF               1 or more            symbol                provides the default type for a new symbol
                                                               (does not change an existing default)
  example:
        def var1.d,value2.b                          ; tells the compiler that var1 will default to
                                                     ; dword size, and value2 to byte size

END               none                                       jumps to the ENDPROGRAM routine in
                                                               a platform module

ENDFUNC           0 or 1               immediate, memory     invalidates local variables and stack-
                                         reference             based parameters, pops the stack, and
                                                               optionally loads a return value

EXTERN            1 or more            string                specifies a case-sensitive alias for symbol
                                                               names that will be written to output file
                                                               for the benefit of case-sensitive linkers

FILLPATTERN       1                    immediate             specifies the data bytes that will appear as 
                                         (bigend dword)        padding when ALIGNx or RESx (in sections
                                                               other than BSS) are used

FLUSHIMM          none                                       places an immediate data 'dump' (if any) at
                                                               the current address

GOSUB             1                    immediate, address    calls a subroutine without changing the
                                                               stack frame

GOTO              1                    immediate, address    jumps to an arbitrary location

IFCPU             1                    string (same as       skips subsequent statements on the same line
IFCPUNOT                                 platform name)        if the condition is not true
  example:
        ifcpu "68000" > const someparam,512          ; when compiling for 68000, someparam becomes 512  
        ifcpunot "68000" > const someparam,1024      ; otherwise, it becomes 1024

IFEQUAL           1                    calculation           compares the first and second operands,
                  1                    immediate, memory       then jumps to the location specified in
                                         reference             the third operand if they are equal
                  1                    address                                                    
  example:
        ifequal foo,15,someroutine                   ; jumps if foo is equal to 15

IFGREATER         1                    calculation           compares the first and second operands,
                  1                    immediate, memory       then jumps if the first is greater
                                         reference
                  1                    address                                                    
  example:
        ifgreater bar+10,foo,someroutine             ; jumps if bar+10 is greater than foo

IFLESS            1                    calculation           compares the first and second operands,
                  1                    immediate, memory       then jumps if the first is less
                                         reference
                  1                    address                                                    

IFNOTGREATER      1                    calculation           equivalent to less-than-or-equal
                  1                    immediate, memory
                                         reference
                  1                    address                                                    

IFNOTLESS         1                    calculation           equivalent to greater-than-or-equal
                  1                    immediate, memory
                                         reference
                  1                    address                                                    

IFUNEQUAL         1                    calculation           compares the first and second operands,
                  1                    immediate, memory       then jumps to the location specified in
                                         reference             the third operand if they are unequal
                  1                    address                                                    

INCBIN            1 or more            string                loads initialized data from a file
  example:
        incbin "gamegfx.bin"                         ; inserts the contents of GAMEGFX.BIN  

LINKLIBFILE       1 or more            string                adds the name of a dynamic link library to
                                                               the output file .drectve section
  example:
        linklibfile "kernel32.dll","user32.dll"

LOADBIG           1                    memory reference      loads a big-endian second operand and changes
                  1                    immediate, memory       the byte order if necessary
                                         reference

LOADLITTLE        1                    memory reference      loads a little-endian second operand and changes
                  1                    immediate, memory       the byte order if necessary
                                         reference

LOCALVAR          1 or more            symbol with tag       defines stack-based variables to be used
                                                               within a function (must come after
                                                               BEGINFUNC)

NEXTCOUNT         none                                       decrements the variable specified by the
                                                               associated COUNTDOWN, then jumps to the
                                                               beginning of the loop if the result is
                                                               not 0         

RESB              1                    immediate*            reserves the number of bytes specified

RESW              1                    immediate*            reserves the number of words specified

RESD              1                    immediate*            reserves the number of dwords specified

RESQ              1                    immediate*            reserves the number of qwords specified
  Note:
        The count can now include addition, subtraction, and multiplication, (performed at compile time)
          as well as symbols defined with CONST. An underscore can be used to refer to the current section
          offset.
  examples:
        resb 512-_                        ; pads the section to a size of 512 bytes
        resd tablesize*2+1

RETURN            none                                       returns from a routine called by GOSUB

RETURNEX          1                    immediate             returns from a routine called by CALLEX
                                                               and pops the specified number of bytes
                                                               from the stack

SECTION           1                    string                creates a custom section in the COFF output
                  1                    immediate               using the specified name and flags (dword)
  example:
        SECTION ".foodata",$C0300040

SECTIONBSS        none                                       marks the beginning of the BSS
                                                               (reserved storage)

SECTIONCODE       none                                       marks the beginning of the code section

SECTIONDATA       none                                       marks the beginning of the data section
                                                               (initialized data)
  Note:
        section headers for the same section should not appear more than once in a NOWUT program

SWAP              2                    memory reference      exchanges data between two memory locations

USEFASTBASE       1                    symbol                enables an optimization on 68K, MIPS, and ARM
                  1                    0 or 1                  CPUs. (see assembler section)
                                                               Second parameter selects slot 0 or 1.
                                                               Specifying 0 as the symbol disables the
                                                               optimization.

WEND              none                                       marks the end of a WHILE loop (jumps back
                                                               to the beginning)

WHILEEQUAL        1                    calculation           compares the operands and jumps to the
                  1                    immediate, memory       corresponding WEND if they are unequal
                                         reference
  example:
        WHILEEQUAL [pointer].b,0                     ; if the memory location [pointer] contains a
        pointer=_+1                                  ; zero byte then the loop will execute, and it
        WEND                                         ; will continue until a non-zero byte is found

WHILEGREATER      1                    calculation           compares the operands and jumps to the
                  1                    immediate, memory       corresponding WEND if the first is not
                                         reference             greater

WHILELESS         1                    calculation           compares the operands and jumps to the
                  1                    immediate, memory       corresponding WEND if the first is not
                                         reference             less    
  example:
        xx=0
        WHILELESS xx,5                               ; the beep routine will be called 5 times
        GOSUB beep > xx=_+1
        WEND

WHILEUNEQUAL      1                    calculation           compares the operands and jumps to the
                  1                    immediate, memory       corresponding WEND if they are equal
                                         reference
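
  As a sketch of how several of the statements above combine, the following sums the first 16 dwords of
a table. The symbols table, total, pos, remaining, and item are hypothetical, and are assumed to be
dword-sized global variables (or, in the case of table, a label on the data) that already exist.

        total=0 > pos=0 > remaining=16
        countdown remaining                          ; the loop body runs 16 times
        item=table.d(pos) > total=_+item             ; indices are in bytes
        pos=_+4                                      ; advance to the next dword
        nextcount                                    ; remaining is 0 after the loop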


----Functions/procedures----

  There are no declarations needed, return values are optional, and functions can even have multiple entry
and exit points. But care needs to be taken to avoid stack corruption and to refrain from referencing
local variables when they are not valid.

        ; example of a function with multiple entry points and an internal subroutine

somefunc2:
        globalsetting=defaultval
somefunc:
        beginfunc address.d
        localvar x.d,y.d

        x=address > y=8 > gosub printstuff

        ifequal globalsetting,0,label123

        x=carriagertn > y=2 > gosub printstuff
        goto label123


printstuff:
        callex ,writefile,,tempvar,y,x,chandle
        return        

label123:        
        endfunc
        returnex 4

  Imagine that you can call somefunc, and pass the address of an eight-character string which will then be
printed (using a Win32 call to write to an open console). If globalsetting is zero then no carriage return
is added to the output, otherwise it is added. If the caller wanted to override globalsetting then it could
call somefunc2 instead, which sets globalsetting itself, before the function proceeds.

  Having code execute before BEGINFUNC is no problem as long as it only references global variables. The
address, x, and y symbols do not become valid until after BEGINFUNC and LOCALVAR.

  Likewise, when printstuff is called from within somefunc, the x and y variables are valid. However,
printstuff can NOT be called from outside of somefunc because x and y will point to who-knows-what.

  The compiler doesn't invalidate stack variables belonging to one function until it sees a BEGINFUNC
pertaining to another function. At that point, it will give an error if access to such variables is
attempted. At runtime, they would become invalid as soon as an ENDFUNC is executed. It's possible to have
more than one ENDFUNC associated with a function.

        ; example function with two exit points

anotherfunc:
        beginfunc param1.d,param2.d

        ifequal param1,param2,labelxyz
        endfunc 0
        returnex 8
labelxyz:
        endfunc 1
        returnex 8

Note that if you don't need to pass any parameters, GOSUB and RETURN can be used instead of CALLEX and
RETURNEX.
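
  For completeness, a sketch of the calling side for anotherfunc. The symbols first, second, and same, and
the label theymatch, are hypothetical. Because CALLEX lists parameters in the reverse order compared to
BEGINFUNC, the value intended for param2 is written before the value intended for param1.

        callex same.d,anotherfunc,second,first       ; second -> param2, first -> param1
        ifequal same.d,1,theymatch                   ; anotherfunc returned 1: the values were equal

In these examples the operand of RETURNEX matches the size of the parameters the caller pushed: 4 bytes
for the single dword passed to somefunc, and 8 bytes for the two dwords passed to anotherfunc.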

  
----Internal assembler----

  The NOWUT compiler generates assembly code and feeds it back to its internal assembler. This code
can be seen in comments in the .LST file that is generated during compilation. If an error occurs
during compilation, it is convenient to look at the end of the .LST file to see how far it progressed
before encountering the error. Hand-written assembly language can be included in NOWUT programs by
using the ASM and ENDASM statements.

These CPU-independent statements are recognized by the assembler:

ALIGNW                \
ALIGND                 \
ALIGNQ                  \ same as NOWUT mode
DB                      /
DW                     /
DD                    /

ENDASM - returns to NOWUT mode


The x86 instruction set:

  Instruction names are as usual, except for 8-bit jumps which have been given their own separate forms
SJMP and SJcc. Memory operands are contained in square brackets, and .b .w .d tags are used to make the
operand size explicit. Destination operands go on the left, source operands on the right.

  In 32-bit mode, operand-size prefixes are inserted before instructions that use 16-bit words. The reverse
is true for 16-bit mode, where instructions with a 32-bit operand size will have a prefix inserted.
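
  A short sketch of these conventions in 32-bit mode, assuming hypothetical global symbols named counter
(a dword) and status (a word):

        asm
        mov eax,[counter].d          ; memory operand in square brackets, with a .d size tag
        add eax,4                    ; destination on the left, source on the right
        mov [counter].d,eax
        mov ax,[status].w            ; 16-bit operand, so an operand-size prefix ($66) is inserted
        endasm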

  The instruction listing below describes acceptable operands mostly in terms of how they are encoded. The
equivalent NOWUT syntax or operand types are as follows:

  x86 operand        NOWUT                           notes

imm8               immediate                         8 bits, often sign-extended
imm16              immediate                         (could be an address in 16-bit mode)
imm32              immediate or address
reg8                                                 al  cl  dl  bl  ah  ch  dh  bh
reg16                                                ax  cx  dx  bx  sp  bp  si  di
reg32                                                eax ecx edx ebx esp ebp esi edi
segreg                                               es  cs  ss  ds  fs  gs
CRx                                                  cr0 cr2 cr3 cr4
DRx                                                  dr0 dr1 dr2 dr3 dr6 dr7
freg                                                 st0 st1 st2 st3 st4 st5 st6 st7
mmxreg                                               mm0 mm1 mm2 mm3 mm4 mm5 mm6 mm7
mem8               [immediate].b, [address].b,
                   [reg].b, [reg+xx].b
mem16              [immediate].w, [address].w,
                   [reg].w, [reg+xx].w
mem32              [immediate].d, [address].d,
                   [reg].d, [reg+xx].d
mem48/64/80        [immediate], [address],           .q should be accepted for 64-bit data, while .w may be
                   [reg], [reg+xx]                   used to distinguish 80-bit (this hasn't been tested)

disp16/32          [immediate], [address]            accesses memory of various sizes but doesn't use
                                                       mod/rm encoding
rm8  - same as mem8 or reg8
rm16 - same as mem16 or reg16
rm32 - same as mem32 or reg32

  The assembler will accept operands without a size tag, however in some cases the size ambiguity will
mean that more than one opcode would be valid. Shorter opcodes are generally favored.

  example:
        PUSH 7                ; this will be assembled as an imm8 instead of imm32

Also note that on the x86, displacements can be negative: MOV EAX,[EBP-40]

  Scaled index registers should be specified before a base or displacement, with a SHL operator to indicate
the scaling factor (shift).

  example:
        mov eax,[ecx shl 1+ebx].d

Hand-written assembly code should not modify (or should save and restore) the EBP register as the compiler
uses it to address stack variables. EBP is used to address stack variables in assembly code as well, and
these variables will become invalid when it is modified.

AAA                     $37
AAD imm8                $D5
AAM imm8                $D4
AAS                     $3F

ADC AL,imm8             $14
ADC AX/EAX,imm16/32     $15
ADC rm8,reg8            $10
ADC rm16/32,reg16/32    $11
ADC reg8,rm8            $12
ADC reg16/32,rm16/32    $13
ADC rm8,imm8            $80 /2
ADC rm16/32,imm16/32    $81 /2
ADC rm16/32,imm8        $83 /2

ADD AL,imm8             $04
ADD AX/EAX,imm16/32     $05
ADD rm8,reg8            $00
ADD rm16/32,reg16/32    $01
ADD reg8,rm8            $02
ADD reg16/32,rm16/32    $03
ADD rm8,imm8            $80 /0
ADD rm16/32,imm16/32    $81 /0
ADD rm16/32,imm8        $83 /0

AND AL,imm8             $24
AND AX/EAX,imm16/32     $25
AND rm8,reg8            $20
AND rm16/32,reg16/32    $21
AND reg8,rm8            $22
AND reg16/32,rm16/32    $23
AND rm8,imm8            $80 /4
AND rm16/32,imm16/32    $81 /4
AND rm16/32,imm8        $83 /4

ARPL rm16,reg16         $63
BOUND reg16/32,mem16/32 $62
BSF reg16/32,rm16/32    $0F BC
BSR reg16/32,rm16/32    $0F BD
BSWAP reg32             $0F C8+r

BT rm16/32,reg16/32     $0F A3
BT rm16/32,imm8         $0F BA /4
BTC rm16/32,reg16/32    $0F BB
BTC rm16/32,imm8        $0F BA /7
BTR rm16/32,reg16/32    $0F B3
BTR rm16/32,imm8        $0F BA /6
BTS rm16/32,reg16/32    $0F AB
BTS rm16/32,imm8        $0F BA /5

CALL imm16/32           $E8
CALL rm16/32            $FF /2

CALLF rm16/32           $FF /3
CALLF imm16,[imm16/32]  $9A                ; first operand is the segment

CDQ                     $99
CLC                     $F8
CLD                     $FC
CLI                     $FA
CLTS                    $0F 06
CMC                     $F5

CMP AL,imm8             $3C
CMP AX/EAX,imm16/32     $3D
CMP rm8,reg8            $38
CMP rm16/32,reg16/32    $39
CMP reg8,rm8            $3A
CMP reg16/32,rm16/32    $3B
CMP rm8,imm8            $80 /7
CMP rm16/32,imm16/32    $81 /7
CMP rm16/32,imm8        $83 /7

CMPSB                   $A6
CMPSD                   $A7
CMPSW                   $A7

CMPXCHG rm8,reg8           $0F B0
CMPXCHG rm16/32,reg16/32   $0F B1
CMPXCHG8B mem64            $0F C7 /1

CPUID                   $0F A2
CWDE                    $98
DAA                     $27
DAS                     $2F

DEC reg16/32            $48+r
DEC rm8                 $FE /1
DEC rm16/32             $FF /1

DIV rm8                 $F6 /6
DIV rm16/32             $F7 /6

HLT                     $F4

IDIV rm8                $F6 /7
IDIV rm16/32            $F7 /7

IMUL rm8                $F6 /5
IMUL rm16/32            $F7 /5
IMUL AL,rm8             $F6 /5
IMUL AX/EAX,rm16/32     $F7 /5
IMUL reg16/32,rm16/32   $0F AF
IMUL reg16/32,imm8      $6B
IMUL reg16/32,imm16/32  $69
IMUL reg16/32,rm16/32,imm8      $6B
IMUL reg16/32,rm16/32,imm16/32  $69

IN AL,imm8              $E4
IN AX/EAX,imm8          $E5
IN AL,DX                $EC
IN AX/EAX,DX            $ED

INC reg16/32            $40+r
INC rm8                 $FE /0
INC rm16/32             $FF /0

INSB                    $6C
INSD                    $6D
INSW                    $6D
INT imm8                $CD
INT3                    $CC
INTO                    $CE
INVD                    $0F 08
INVLPG mem              $0F 01 /0
IRET                    $CF
JECXZ imm8              $E3

Jcc imm16/32            $0F 80+cc
JMP imm16/32            $E9
JMP rm16/32             $FF /4

JMPF rm16/32            $FF /5
JMPF imm16,[imm16/32]   $EA                ; first operand is the segment

LAHF                    $9F
LAR reg16/32,rm16/32    $0F 02

LDS reg16/32,mem32/48   $C5
LES reg16/32,mem32/48   $C4
LFS reg16/32,mem32/48   $0F B4
LGS reg16/32,mem32/48   $0F B5
LSS reg16/32,mem32/48   $0F B2

LEA reg16/32,mem        $8D
LEAVE                   $C9

LGDT mem48              $0F 01 /2
LIDT mem48              $0F 01 /3
LLDT rm16               $0F 00 /2
LMSW rm16               $0F 01 /6

LODSB                   $AC
LODSD                   $AD
LODSW                   $AD
LSL reg16/32,rm16/32    $0F 03
LTR rm16                $0F 00 /3

MOV AL,disp16/32        $A0
MOV AX/EAX,disp16/32    $A1
MOV disp16/32,AL        $A2
MOV disp16/32,AX/EAX    $A3
MOV rm8,reg8            $88
MOV rm16/32,reg16/32    $89
MOV reg8,rm8            $8A
MOV reg16/32,rm16/32    $8B
MOV reg8,imm8           $B0+r
MOV reg16/32,imm16/32   $B8+r
MOV rm8,imm8            $C6 /0
MOV rm16/32,imm16/32    $C7 /0
MOV rm16/32,segreg      $8C
MOV segreg,rm16/32      $8E
MOV reg32,CRx           $0F 20
MOV reg32,DRx           $0F 21
MOV CRx,reg32           $0F 22
MOV DRx,reg32           $0F 23

MOVSB                   $A4
MOVSD                   $A5
MOVSW                   $A5

MOVSX reg16/32,rm8      $0F BE
MOVSX reg32,rm16        $0F BF
MOVZX reg16/32,rm8      $0F B6
MOVZX reg32,rm16        $0F B7

MUL rm8                 $F6 /4
MUL rm16/32             $F7 /4

NEG rm8                 $F6 /3
NEG rm16/32             $F7 /3

NOP                     $90

NOT rm8                 $F6 /2
NOT rm16/32             $F7 /2

OR AL,imm8              $0C
OR AX/EAX,imm16/32      $0D
OR rm8,reg8             $08
OR rm16/32,reg16/32     $09
OR reg8,rm8             $0A
OR reg16/32,rm16/32     $0B
OR rm8,imm8             $80 /1
OR rm16/32,imm16/32     $81 /1
OR rm16/32,imm8         $83 /1

OUT imm8,AL             $E6
OUT imm8,AX/EAX         $E7
OUT DX,AL               $EE
OUT DX,AX/EAX           $EF

OUTSB                   $6E
OUTSD                   $6F
OUTSW                   $6F

POP reg16/32            $58+r
POP rm16/32             $8F /0
POP DS                  $1F
POP ES                  $07
POP SS                  $17
POP FS                  $0F A1
POP GS                  $0F A9
POPA                    $61
POPAD
POPAW
POPF                    $9D
POPFD
POPFW

PUSH reg16/32           $50+r
PUSH rm16/32            $FF /6
PUSH imm8               $6A        (this byte is sign-extended)
PUSH imm16/32           $68
PUSH CS                 $0E
PUSH DS                 $1E
PUSH ES                 $06
PUSH SS                 $16
PUSH FS                 $0F A0
PUSH GS                 $0F A8
PUSHA                   $60
PUSHAD
PUSHAW
PUSHF                   $9C
PUSHFD
PUSHFW

RCL rm8                 $D0 /2
RCL rm8,CL              $D2 /2
RCL rm8,imm8            $C0 /2
RCL rm16/32             $D1 /2
RCL rm16/32,CL          $D3 /2
RCL rm16/32,imm8        $C1 /2

RCR rm8                 $D0 /3
RCR rm8,CL              $D2 /3
RCR rm8,imm8            $C0 /3
RCR rm16/32             $D1 /3
RCR rm16/32,CL          $D3 /3
RCR rm16/32,imm8        $C1 /3

RDMSR                   $0F 32
RDTSC                   $0F 31

RET                     $C3
RET imm16               $C2
RETF                    $CB
RETF imm16              $CA

ROL rm8                 $D0 /0
ROL rm8,CL              $D2 /0
ROL rm8,imm8            $C0 /0
ROL rm16/32             $D1 /0
ROL rm16/32,CL          $D3 /0
ROL rm16/32,imm8        $C1 /0

ROR rm8                 $D0 /1
ROR rm8,CL              $D2 /1
ROR rm8,imm8            $C0 /1
ROR rm16/32             $D1 /1
ROR rm16/32,CL          $D3 /1
ROR rm16/32,imm8        $C1 /1

SAL rm8                 $D0 /4
SAL rm8,CL              $D2 /4
SAL rm8,imm8            $C0 /4
SAL rm16/32             $D1 /4
SAL rm16/32,CL          $D3 /4
SAL rm16/32,imm8        $C1 /4

SAHF                    $9E

SAR rm8                 $D0 /7
SAR rm8,CL              $D2 /7
SAR rm8,imm8            $C0 /7
SAR rm16/32             $D1 /7
SAR rm16/32,CL          $D3 /7
SAR rm16/32,imm8        $C1 /7

SBB AL,imm8             $1C
SBB AX/EAX,imm16/32     $1D
SBB rm8,reg8            $18
SBB rm16/32,reg16/32    $19
SBB reg8,rm8            $1A
SBB reg16/32,rm16/32    $1B
SBB rm8,imm8            $80 /3
SBB rm16/32,imm16/32    $81 /3
SBB rm16/32,imm8        $83 /3

SCASB                   $AE
SCASD                   $AF
SCASW                   $AF

SETcc rm8               $0F 90+cc /0

SGDT mem48              $0F 01 /0
SIDT mem48              $0F 01 /1
SLDT rm16               $0F 00 /0

SHL rm8                 $D0 /4
SHL rm8,CL              $D2 /4
SHL rm8,imm8            $C0 /4
SHL rm16/32             $D1 /4
SHL rm16/32,CL          $D3 /4
SHL rm16/32,imm8        $C1 /4

SHR rm8                 $D0 /5
SHR rm8,CL              $D2 /5
SHR rm8,imm8            $C0 /5
SHR rm16/32             $D1 /5
SHR rm16/32,CL          $D3 /5
SHR rm16/32,imm8        $C1 /5

SHLD rm16/32,reg16/32,imm8      $0F A4
SHLD rm16/32,reg16/32,CL        $0F A5
SHRD rm16/32,reg16/32,imm8      $0F AC
SHRD rm16/32,reg16/32,CL        $0F AD

SJcc imm8 or address              $70+cc
SJMP imm8 or address              $EB

SMSW rm16               $0F 01 /4

STC                     $F9
STD                     $FD
STI                     $FB
STOSB                   $AA
STOSD                   $AB
STOSW                   $AB
STR rm16                $0F 00 /1

SUB AL,imm8             $2C
SUB AX/EAX,imm16/32     $2D
SUB rm8,reg8            $28
SUB rm16/32,reg16/32    $29
SUB reg8,rm8            $2A
SUB reg16/32,rm16/32    $2B
SUB rm8,imm8            $80 /5
SUB rm16/32,imm16/32    $81 /5
SUB rm16/32,imm8        $83 /5

TEST AL,imm8            $A8
TEST AX/EAX,imm16/32    $A9
TEST rm8,reg8           $84
TEST rm16/32,reg16/32   $85
TEST rm8,imm8           $F6 /0
TEST rm16/32,imm16/32   $F7 /0

VERR rm16               $0F 00 /4
VERW rm16               $0F 00 /5

WAIT                    $9B
WBINVD                  $0F 09
WRMSR                   $0F 30

XADD rm8,reg8           $0F C0
XADD rm16/32,reg16/32   $0F C1

XCHG AX/EAX,reg16/32    $90+r
XCHG reg16/32,AX/EAX    $90+r
XCHG reg8,rm8           $86
XCHG rm8,reg8           $86
XCHG reg16/32,rm16/32   $87
XCHG rm16/32,reg16/32   $87

XOR AL,imm8             $34
XOR AX/EAX,imm16/32     $35
XOR rm8,reg8            $30
XOR rm16/32,reg16/32    $31
XOR reg8,rm8            $32
XOR reg16/32,rm16/32    $33
XOR rm8,imm8            $80 /6
XOR rm16/32,imm16/32    $81 /6
XOR rm16/32,imm8        $83 /6

XLATB                   $D7

The following prefixes are supported:

ASIZE        $67        (address size override)
CS           $2E
DS           $3E
ES           $26
FS           $64
GS           $65
SS           $36
LOCK         $F0
REPNZ/NE     $F2
REP/E/Z      $F3
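
As an illustration of how the opcode columns above turn into bytes, here is a minimal Python sketch (it is
not part of NOWUT). It assumes the usual x86 conventions: "+r" adds the register number to the opcode byte,
"/n" places n in the reg field of the ModRM byte, and immediates are stored little-endian. The helper names
are hypothetical, and only register-direct operands in 32-bit mode are handled.

```python
import struct

# Standard x86 register numbers for the 32-bit general-purpose registers.
REG32 = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
         "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}

def modrm_reg_direct(digit, rm):
    """Build a ModRM byte with mod=11 (register operand) and reg=digit."""
    return 0xC0 | (digit << 3) | rm

def mov_reg32_imm32(reg, imm):
    """MOV reg32,imm32 -> $B8+r followed by a little-endian dword."""
    return bytes([0xB8 + REG32[reg]]) + struct.pack("<I", imm & 0xFFFFFFFF)

def not_reg32(reg):
    """NOT rm32 -> $F7 /2 with a register-direct ModRM byte."""
    return bytes([0xF7, modrm_reg_direct(2, REG32[reg])])

def rep_movsb():
    """REP prefix ($F3) placed in front of MOVSB ($A4)."""
    return bytes([0xF3, 0xA4])

print(mov_reg32_imm32("EAX", 0x12345678).hex())   # b878563412
print(not_reg32("ECX").hex())                     # f7d1
print(rep_movsb().hex())                          # f3a4
```

Memory operands need additional ModRM/SIB and displacement bytes, which this sketch leaves out.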

The following x87 instructions are supported:

F2XM1                   $D9 $F0
FABS                    $D9 $E1

FADD mem32              $D8 /0
FADD mem64              $DC /0
FADD freg               $D8 $C0+r
FADD freg,ST0           $DC $C0+r
FADDP freg,ST0          $DE $C0+r

FBLD mem80              $DF /4
FBSTP mem80             $DF /6

FCHS                    $D9 $E0

FCLEX                   $9B $DB $E2
FNCLEX                  $DB $E2

FCMOVB freg             $DA $C0+r
FCMOVBE freg            $DA $D0+r
FCMOVE freg             $DA $C8+r        \
FCMOVNB freg            $DB $C0+r         \  P6 instructions
FCMOVNBE freg           $DB $D0+r         /
FCMOVNE freg            $DB $C8+r        /
FCMOVNU freg            $DB $D8+r
FCMOVU freg             $DA $D8+r

FCOM mem32              $D8 /2
FCOM mem64              $DC /2
FCOM freg               $D8 $D0+r
FCOMP mem32             $D8 /3
FCOMP mem64             $DC /3
FCOMP freg              $D8 $D8+r
FCOMPP                  $DE $D9

FCOMI freg              $DB $F0+r        \ P6
FCOMIP freg             $DF $F0+r        /

FCOS                    $D9 $FF

FDECSTP                 $D9 $F6

FDISI                   $9B $DB $E1
FNDISI                  $DB $E1              \ 8087 only
FENI                    $9B $DB $E0          /
FNENI                   $DB $E0

FDIV mem32              $D8 /6
FDIV mem64              $DC /6
FDIV freg               $D8 $F0+r
FDIV freg,ST0           $DC $F8+r
FDIVR mem32             $D8 /7
FDIVR mem64             $DC /7
FDIVR freg              $D8 $F8+r
FDIVR freg,ST0          $DC $F0+r
FDIVP freg,ST0          $DE $F8+r
FDIVRP freg,ST0         $DE $F0+r

FFREE freg              $DD $C0+r

FIADD mem16             $DE /0
FIADD mem32             $DA /0

FICOM mem16             $DE /2
FICOM mem32             $DA /2
FICOMP mem16            $DE /3
FICOMP mem32            $DA /3

FIDIV mem16             $DE /6
FIDIV mem32             $DA /6
FIDIVR mem16            $DE /7
FIDIVR mem32            $DA /7

FILD mem16              $DF /0
FILD mem32              $DB /0
FILD mem64              $DF /5
FIST mem16              $DF /2
FIST mem32              $DB /2
FISTP mem16             $DF /3
FISTP mem32             $DB /3
FISTP mem64             $DF /7

FIMUL mem16             $DE /1
FIMUL mem32             $DA /1

FINCSTP                 $D9 $F7

FINIT                   $9B $DB $E3
FNINIT                  $DB $E3

FISUB mem16             $DE /4
FISUB mem32             $DA /4
FISUBR mem16            $DE /5
FISUBR mem32            $DA /5

FLD mem32               $D9 /0
FLD mem64               $DD /0
FLD mem80               $DB /5
FLD freg                $D9 $C0+r

FLD1                    $D9 $E8
FLDL2E                  $D9 $EA
FLDL2T                  $D9 $E9
FLDLG2                  $D9 $EC
FLDLN2                  $D9 $ED
FLDP                    $D9 $EB
FLDZ                    $D9 $EE

FLDCW mem16             $D9 /5

FLDENV mem              $D9 /4

FMUL mem32              $D8 /1
FMUL mem64              $DC /1
FMUL freg               $D8 $C8+r
FMUL freg,ST0           $DC $C8+r
FMULP freg,ST0          $DE $C8+r

FNOP                    $D9 $D0

FPATAN                  $D9 $F3
FPTAN                   $D9 $F2

FPREM                   $D9 $F8
FPREM1                  $D9 $F5

FRNDINT                 $D9 $FC

FSAVE mem               $9B $DD /6
FNSAVE mem              $DD /6
FRSTOR mem              $DD /4

FSCALE                  $D9 $FD

FSETPM                  $DB $E4

FSIN                    $D9 $FE
FSINCOS                 $D9 $FB

FSQRT                   $D9 $FA

FST mem32               $D9 /2
FST mem64               $DD /2
FST freg                $DD $D0+r
FSTP mem32              $D9 /3
FSTP mem64              $DD /3
FSTP mem80              $DB /7
FSTP freg               $DD $D8+r

FSTCW mem16             $9B $D9 /7
FNSTCW mem16            $D9 /7

FSTENV mem              $9B $D9 /6
FNSTENV mem             $D9 /6

FSTSW mem16             $9B $DD /7
FSTSW AX                $9B $DF $E0
FNSTSW mem16            $DD /7
FNSTSW AX               $DF $E0

FSUB mem32              $D8 /4
FSUB mem64              $DC /4
FSUB freg               $D8 $E0+r
FSUB freg,ST0           $DC $E8+r
FSUBR mem32             $D8 /5
FSUBR mem64             $DC /5
FSUBR freg              $D8 $E8+r
FSUBR freg,ST0          $DC $E0+r
FSUBP freg,ST0          $DE $E8+r
FSUBRP freg,ST0         $DE $E0+r

FTST                    $D9 $E4

FUCOM freg              $DD $E0+r
FUCOMP freg             $DD $E8+r
FUCOMPP                 $DA $E9
FUCOMI freg             $DB $E8+r        \ P6
FUCOMIP freg            $DF $E8+r        /

FXAM                    $D9 $E5

FXCH freg               $D9 $C8+r

FXTRACT                 $D9 $F4

FYL2X                   $D9 $F1
FYL2XP1                 $D9 $F9

The following MMX instructions are supported:

EMMS                    $0F 77        (MMX)

MOVD mmxreg,rm32        $0F 6E        (MMX)
MOVD rm32,mmxreg        $0F 7E        (MMX)

MOVQ mmxreg,mem64       $0F 6F        (MMX)
MOVQ mem64,mmxreg       $0F 7F        (MMX)

PACKSSDW mmxreg,mmxreg/mem64        $0F 6B        (MMX)
PACKSSWB mmxreg,mmxreg/mem64        $0F 63        (MMX)
PACKUSWB mmxreg,mmxreg/mem64        $0F 67        (MMX)

PADDB mmxreg,mmxreg/mem64           $0F FC        (MMX)
PADDW mmxreg,mmxreg/mem64           $0F FD        (MMX)
PADDD mmxreg,mmxreg/mem64           $0F FE        (MMX)
PADDSB mmxreg,mmxreg/mem64          $0F EC        (MMX)
PADDSW mmxreg,mmxreg/mem64          $0F ED        (MMX)
PADDUSB mmxreg,mmxreg/mem64         $0F DC        (MMX)
PADDUSW mmxreg,mmxreg/mem64         $0F DD        (MMX)

PAND mmxreg,mmxreg/mem64            $0F DB        (MMX)
PANDN mmxreg,mmxreg/mem64           $0F DF        (MMX)

PCMPEQB mmxreg,mmxreg/mem64         $0F 74        (MMX)
PCMPEQW mmxreg,mmxreg/mem64         $0F 75        (MMX)
PCMPEQD mmxreg,mmxreg/mem64         $0F 76        (MMX)
PCMPGTB mmxreg,mmxreg/mem64         $0F 64        (MMX)
PCMPGTW mmxreg,mmxreg/mem64         $0F 65        (MMX)
PCMPGTD mmxreg,mmxreg/mem64         $0F 66        (MMX)

PMADDWD mmxreg,mmxreg/mem64         $0F F5        (MMX)
PMULHW mmxreg,mmxreg/mem64          $0F E5        (MMX)
PMULLW mmxreg,mmxreg/mem64          $0F D5        (MMX)

POR mmxreg,mmxreg/mem64             $0F EB        (MMX)

PSLLW mmxreg,mmxreg/mem64           $0F F1        (MMX)
PSLLW mmxreg,imm8                   $0F 71 /6
PSLLD mmxreg,mmxreg/mem64           $0F F2        (MMX)
PSLLD mmxreg,imm8                   $0F 72 /6
PSLLQ mmxreg,mmxreg/mem64           $0F F3        (MMX)
PSLLQ mmxreg,imm8                   $0F 73 /6
PSRAW mmxreg,mmxreg/mem64           $0F E1        (MMX)
PSRAW mmxreg,imm8                   $0F 71 /4
PSRAD mmxreg,mmxreg/mem64           $0F E2        (MMX)
PSRAD mmxreg,imm8                   $0F 72 /4
PSRLW mmxreg,mmxreg/mem64           $0F D1        (MMX)
PSRLW mmxreg,imm8                   $0F 71 /2
PSRLD mmxreg,mmxreg/mem64           $0F D2        (MMX)
PSRLD mmxreg,imm8                   $0F 72 /2
PSRLQ mmxreg,mmxreg/mem64           $0F D3        (MMX)
PSRLQ mmxreg,imm8                   $0F 73 /2

PSUBB mmxreg,mmxreg/mem64           $0F F8        (MMX)
PSUBW mmxreg,mmxreg/mem64           $0F F9        (MMX)
PSUBD mmxreg,mmxreg/mem64           $0F FA        (MMX)
PSUBSB mmxreg,mmxreg/mem64          $0F E8        (MMX)
PSUBSW mmxreg,mmxreg/mem64          $0F E9        (MMX)
PSUBUSB mmxreg,mmxreg/mem64         $0F D8        (MMX)
PSUBUSW mmxreg,mmxreg/mem64         $0F D9        (MMX)

PUNPCKHBW mmxreg,mmxreg/mem64       $0F 68        (MMX)
PUNPCKHWD mmxreg,mmxreg/mem64       $0F 69        (MMX)
PUNPCKHDQ mmxreg,mmxreg/mem64       $0F 6A        (MMX)
PUNPCKLBW mmxreg,mmxreg/mem64       $0F 60        (MMX)
PUNPCKLWD mmxreg,mmxreg/mem64       $0F 61        (MMX)
PUNPCKLDQ mmxreg,mmxreg/mem64       $0F 62        (MMX)

PXOR mmxreg,mmxreg/mem64            $0F EF        (MMX)

8086 mode peculiarities:

There are two pseudo-instructions used in 8086 mode (ignored in 8086tiny or 386 mode):

SEGC reg16              ; causes DS to be reloaded if the next memory reference is NOT on the stack.
                        ; the register specified is used as an intermediary to hold the value
                        ; (since there is no move-immediate instruction for segment registers)

SEGR                    ; causes the next SEGC to be ignored, in case DS has already been set up

Hence the method of accessing global (non-stack) variables in 8086 assembly:

        SEGC AX                  ; choose a register whose contents aren't needed
        MOV AX,[symbol]          ; (for instance, the one we are about to reload)

Don't assume that two different symbols have the same segment!

Symbol addresses are handled a few different ways:

        DD symbol                ; results in a 32-bit offset that is relative to beginning of program

        MOV AX,symbol            ; loads the low word of the 32-bit value
        MOV DX,symbol.h          ; loads the high word of the 32-bit value

        SEGC SI                  ;
        LEA SI,[symbol]          ; this sequence will load a valid segment/offset pair into DS:SI

  To translate a 32-bit address to a segment/offset pair, the compiler uses the high word to look up a
segment value in a table at CS:0000 (the table is populated by INITPLATFORM in PIODOS). This happens
whenever indexed or indirect addressing is used in NOWUT code. However, the resulting values are not
identical to the ones used by direct references in code after it has been linked and executed. The linker
tries to keep offsets under 32K so that there is room for an index to be added on, as would occur in this
example:

        SEGC SI
        MOV SI,[symbol(448)]

This mechanism does not allow for a negative index:

        SEGC SI
        MOV SI,[symbol(-12)]     ; does NOT work
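
One plausible way to picture the problem with a negative index is ordinary 8086 real-mode arithmetic, in
which a 16-bit offset wraps around inside its segment. The Python sketch below is an illustration only: the
segment and offset values are made up, and it does not model NOWUT's segment table or the linker.

```python
def physical(segment, offset):
    """8086 real-mode address: segment * 16 + 16-bit offset (20-bit result)."""
    return ((segment << 4) + (offset & 0xFFFF)) & 0xFFFFF

def indexed_offset(offset, index):
    """Add an index to an offset the way a 16-bit register would (wraps at 64K)."""
    return (offset + index) & 0xFFFF

seg, off = 0x2000, 0x0004            # hypothetical segment/offset of a symbol

ok = indexed_offset(off, 448)        # positive index: stays close to the symbol
bad = indexed_offset(off, -12)       # negative index: wraps to the top of the segment

print(hex(physical(seg, ok)))        # 0x201c4
print(hex(physical(seg, bad)))       # 0x2fff8, not the intended 0x1fff8
```

With a small offset, subtracting 12 lands almost 64K above the symbol instead of 12 bytes below it.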


The 68000 instruction set:

  All 68000/68010 instructions and addressing modes are now supported. Only plain 68000 opcodes are used by
the compiler. Normal branch instructions use the 16-bit displacement; the SBxx versions should be used for
the shorter 8-bit displacement form. Variations on a single mnemonic such as ADDA or ADDI seen in other
assemblers have been eliminated in favor of using one mnemonic for all forms. 32-bit words are referred to as
dwords and use the .d tag, just as they do in x86 NOWUT. Likewise, memory operands use square brackets, and
the size tag is attached to an operand rather than to the instruction. Destination operands go on the right,
source operands on the left.

  The assembler will accept operands without a size tag. In some cases, however, the size ambiguity means
that more than one opcode would be valid. Shorter opcodes are generally favored.

  example:
        MOVE 7,d0                ; assembled as 8-bit immediate instead of 16 or 32-bit

        MOVE [address],d0.d      ;  \
        MOVE [address].d,d0      ;    these all do the same thing
        MOVE [address].d,d0.d    ;  /

ea (effective address) operands can be any of the following:
  
  imm8/16/32        ; immediate
  [address]         ; memory reference (32-bit or signed 16-bit)
  [ax]              ; address register indirect
  [ax+xxxx]         ; address register indirect with displacement
  [ax+]             ; address register indirect with post-increment
  [ax-]             ; address register indirect with pre-decrement
  dx                ; data register
  ax                ; address register

  [ax+ry+xx]        ; extension word: ry can be an address or data register, optionally with .w or .d tag
  [PC+xxxx]         ; PC relative
  [PC+symbol]       ; PC relative that refers to a symbol
  [PC+ry+xx]        ; PC plus extension word

Not all modes are valid for all instructions (e.g. an immediate can't be a destination).

Also note that on the 68K, displacements can be negative: MOVE [a6-40].d,d0

Hand-written assembly code should not modify (or should save and restore) the A6 register as the compiler
uses it to address stack variables.

The USEFASTBASE statement can be used on 68000 to produce more compact executables. It loads registers A3
and/or A4 with the address of a specified symbol at run time, and uses 16-bit relative addressing to access
other nearby symbols in the same section, instead of 32-bit addressing which takes a longer opcode. This
scheme depends on A3/A4 not being disturbed by other code and can only be enabled in one module when a
program consists of multiple modules linked together. Example usage:

        usefastbase datbase,0        ; datbase symbol will be loaded in A3 to use for relative addressing
        usefastbase bssbase,1        ; bssbase symbol will be loaded in A4 to use for relative addressing

(These optimizations shrink NOWUT.X by about 3KB)
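
The "nearby symbols" condition follows from the 68000's signed 16-bit displacement, which reaches from
-32768 to +32767 bytes around the base register. A minimal Python sketch of that range check is shown below.
The function names and addresses are hypothetical, and how NOWUT itself decides between the short and long
form is not described here.

```python
def fits_signed16(value):
    """True if value can be stored in a signed 16-bit displacement."""
    return -0x8000 <= value <= 0x7FFF

def fastbase_displacement(base_addr, symbol_addr):
    """Return the displacement from the base symbol, or None if it is out of reach."""
    disp = symbol_addr - base_addr
    return disp if fits_signed16(disp) else None

base = 0x10000                                      # hypothetical address loaded into A3
print(fastbase_displacement(base, 0x10400))         # 1024: reachable as [a3+1024]
print(fastbase_displacement(base, 0x20000))         # None: needs a 32-bit address
```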

Instruction list:

ABCD dx,dy
ABCD [ax-],[ay-]                      ; byte only

ADD imm,ea               /imm8/16/32
ADD imm3,ea
ADD dy,ea
ADD ea,dy
ADD ea,ay

ADDX dx,dy
ADDX [ax-],[ay-]

AND imm,ea               /imm8/16/32
AND imm,ccr              /imm8
AND imm,sr               /imm16
AND dy,ea
AND ea,dy

ASL ea                                ; word only
ASL imm3,dx
ASL dy,dx  

ASR ea                                ; word only
ASR imm3,dx
ASR dy,dx

BKPT imm3                             ; 68010

Bcc label

BCHG imm,ea              /imm8
BCHG dn,ea                            ; byte only

BCLR imm,ea              /imm8
BCLR dn,ea                            ; byte only

BRA label

BSET imm,ea              /imm8
BSET dn,ea                            ; byte only

BSR label

BTST imm,ea              /imm8
BTST dn,ea                            ; byte only

CHK ea,dy                             ; word only
CLR ea

CMP imm,ea               /imm8/16/32
CMP [an+],[an+]
CMP ea,dy      
CMP ea,ay      

DBcc dx,label                         ; word only
DBRA dx,label                         ; word only 

DIVS ea,dy                            ; divide 32/16, remainder in high word
DIVU ea,dy

EOR imm,ea               /imm8/16/32
EOR imm,ccr              /imm8
EOR imm,sr               /imm16
EOR dy,ea

EXG dx,dy
EXG ax,ay
EXG ax,dy

EXT dx                                ; byte->word
EXT dx                                ; word->dword

ILLEGAL
JMP ea
JSR ea
LEA ea,ay
LINK ax,imm16

LSL ea                                ; word only
LSL imm3,dx
LSL dy,dx

LSR ea                                ; word only
LSR imm3,dx
LSR dy,dx

MOVE ea,ea
MOVE ea,an
MOVE sr,ea                            ; word only
MOVE ea,sr                            ; word only
MOVE ccr,ea                           ; word only, 68010
MOVE ea,ccr                           ; word only
MOVE imm8,reg                         ; sign extended

MOVEM imm,ea             /imm16       ; register bit mask
MOVEM ea,imm             /imm16       ; register bit mask

MOVEP dy,[ax+disp16]     /disp16
MOVEP [ax+disp16],dy     /disp16

MULS ea,dy                            ; 16x16->32
MULU ea,dy

NBCD ea                               ; byte only
NEG ea
NEGX ea
NOP
NOT ea

OR imm,ea                /imm8/16/32
OR imm,ccr               /imm8
OR imm,sr                /imm16
OR dy,ea   
OR ea,dy

PEA ea
RESET 
         
ROL ea
ROL imm3,dx
ROL dy,dx

ROR ea
ROR imm3,dx
ROR dy,dx  

ROXL ea     
ROXL imm3,dx
ROXL dy,dx  

ROXR ea     
ROXR imm3,dx
ROXR dy,dx  

RTD imm16                             ; 68010

RTE         
RTR         
RTS         

SBcc label
SBRA label
SBSR label

SBCD dx,dy                            ; byte only
SBCD [ax-],[ay-]                      ; byte only

Scc ea                                ; byte only
STOP imm                 /imm16   

SUB imm,ea               /imm8/16/32
SUB imm3,ea
SUB dy,ea  
SUB ea,dy  
SUB ea,ay  

SUBX dx,dy
SUBX [ax-],[ay-]

SWAP dx
TAS ea                                ; byte only
TRAP imm4
TRAPV
TST ea
UNLK ax

XDOS imm4            ; this pseudo-instruction generates "F-line" opcodes
                     ; for making system calls on the Sharp X68000


The SH2 instruction set:

  As with the 68K, the SH2 instruction set has undergone some cosmetic changes to bring it in line with
NOWUT norms. A few mnemonics were tweaked, memory operands use square brackets and a data size tag, and
long words (32-bit) are referred to as dwords. Destination operands go on the right, source operands on the
left.

  Since the SH2 doesn't allow dword immediate data or memory access using absolute addresses, the assembler
accepts a "fake" form of the MOV instruction and transparently inserts an extra instruction as needed. It
also adds immediate data to a buffer that is periodically flushed to the output file, with a BRA opcode
added to jump over the data. There are currently two issues with this:

  1) In assembly mode the buffer is not flushed. Using more than 16 "fake" MOVs in a single section of
     assembly code may cause overflow (using 49 definitely will)
  2) The FLUSHIMM statement should be used before an INCBIN statement, if it is in a section which also
     contains code.
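
The reason the buffer has to be flushed regularly is the limited reach of the SH2's PC-relative dword load:
the 8-bit displacement is zero-extended and multiplied by four, so a literal must lie within about 1KB after
the instruction that loads it. The Python sketch below computes that displacement under the standard SH2
rule (PC is the instruction address plus 4, with the low two bits masked off). The function name is
hypothetical and this is not NOWUT's own code.

```python
def literal_disp(insn_addr, literal_addr):
    """Displacement field for a PC-relative dword load, or ValueError if unreachable."""
    base = (insn_addr + 4) & ~3          # PC as seen by the instruction, dword aligned
    delta = literal_addr - base
    if delta < 0 or delta % 4 != 0 or delta > 255 * 4:
        raise ValueError("literal out of range for an 8-bit PC-relative displacement")
    return delta // 4

print(literal_disp(0x1000, 0x1008))      # 1
print(literal_disp(0x1002, 0x1400))      # 255, the furthest reachable literal
```

A literal placed any further away (or before the instruction) raises ValueError, which corresponds to the
overflow described above.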

  The SH2 doesn't do divides in a single instruction. When division is needed, the compiler generates a
call to a subroutine. The program source should include these division subroutines (or some variation
thereof):

sh2divideu:                ' unsigned 32/16 r1/r2
        asm
        shll16 r2
        div0u
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1

        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        rotcl r1
        extuw r1,r1                
        endasm
        return

sh2divides:                ' signed 32/16 r1/r2
        asm
        shll16 r2
        mov r1,r4
        rotcl r4
        mov 0,r4
        subc r4,r1
        div0s r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1

        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        extsw r1,r1
        rotcl r1
        addc r4,r1        
        extsw r1,r1
        endasm
        return
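
For readers unfamiliar with step-by-step division, the Python sketch below shows the general idea behind the
unsigned routine: the divisor is aligned with the upper 16 bits of the dividend, and 16 shift-and-subtract
steps each produce one quotient bit. Note that this is a plain restoring division, not a model of the DIV1
instruction (which is non-restoring and keeps extra state in flags), and the function name is hypothetical.

```python
def divide_u32_by_u16(dividend, divisor):
    """Unsigned 32/16 division in 16 steps; returns (quotient, remainder)."""
    if not 0 < divisor <= 0xFFFF:
        raise ValueError("divisor must be a non-zero 16-bit value")
    shifted = divisor << 16              # same alignment as 'shll16 r2'
    if not 0 <= dividend < shifted:
        raise ValueError("quotient would not fit in 16 bits")
    r = dividend
    for _ in range(16):
        r <<= 1                          # bring down the next dividend bit
        if r >= shifted:
            r -= shifted                 # subtract succeeds: quotient bit is 1
            r |= 1
    # Low 16 bits now hold the quotient, the upper bits hold the remainder.
    return r & 0xFFFF, r >> 16

print(divide_u32_by_u16(100000, 7))      # (14285, 5)
```

As in the assembly version, the result is only meaningful when the quotient fits in 16 bits.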

Note: delayed branch instructions cause the instruction following the branch to be executed before the
      branch takes place.

Hand-written assembly code should not modify (or should save and restore) R11, R13, R14 as the compiler
uses them to address stack variables. R12 is used by "fake" MOVs.


ADD     Rm,Rn
ADD     imm,Rn          (immediate is 8-bit sign-extended)

ADDC    Rm,Rn
ADDV    Rm,Rn

AND     Rm,Rn
AND     imm,R0          (immediate is 8-bit zero-extended)
AND     imm,[R0+GBR].b  (immediate is 8-bit zero-extended)

BF      label/imm                       (8-bit displacement)
BFS     label/imm       (delayed branch)(8-bit displacement)
BRA     label/imm       (delayed branch)(12-bit displacement)
BRAF    Rm              (delayed branch)
BSR     label/imm       (delayed branch)(12-bit displacement)
BSRF    Rm              (delayed branch)
BT      label/imm                       (8-bit displacement)
BTS     label/imm       (delayed branch)(8-bit displacement)

CLRMAC
CLRT

CMPEQ   imm,R0          (immediate is 8-bit sign-extended)
CMPEQ   Rm,Rn             
CMPGE   Rm,Rn             rn>=rm, signed
CMPGT   Rm,Rn             rn>rm, signed
CMPHI   Rm,Rn             rn>rm, unsigned
CMPHS   Rm,Rn             rn>=rm, unsigned
CMPPL   Rn                rn>0
CMPPZ   Rn                rn>=0
CMPSTR  Rm,Rn           

DIV0S   Rm,Rn
DIV0U
DIV1    Rm,Rn

DMULS   Rm,Rn             32x32->64 (MAC)
DMULU   Rm,Rn

DT      Rn

EXTSB   Rm,Rn
EXTSW   Rm,Rn
EXTUB   Rm,Rn
EXTUW   Rm,Rn

JMP     Rm              (delayed branch)
JSR     Rm              (delayed branch)

LDC     Rm,SR    
LDC     Rm,GBR   
LDC     Rm,VBR   
LDC     [Rm+],SR 
LDC     [Rm+],GBR
LDC     [Rm+],VBR

LDS     Rm,MACH   
LDS     Rm,MACL   
LDS     Rm,PR     
LDS     [Rm+],MACH
LDS     [Rm+],MACL
LDS     [Rm+],PR  

MAC     [Rm+],[Rn+].d
MAC     [Rm+],[Rn+].w

MOV     imm/address,Rn               ; this pseudo-instruction uses [PC+label] address mode to load a dword
                                     ; from an automatically-created data dump

MOV     [symbol/address].b,Rn        ; these pseudo-instructions cause another instruction to be inserted
MOV     [symbol/address].w,Rn        ; which loads the address, then memory is accessed using register-
MOV     [symbol/address].d,Rn        ; indirect mode.
                                     ; stack variables are an exception, the extra instruction isn't needed

MOV     Rm,Rn
MOV     imm,Rn          (immediate is 8-bit sign-extended)
MOV     Rm,[Rn].b
MOV     Rm,[Rn].w
MOV     Rm,[Rn].d
MOV     [Rm].b,Rn
MOV     [Rm].w,Rn
MOV     [Rm].d,Rn
MOV     [Rm+].b,Rn
MOV     [Rm+].w,Rn
MOV     [Rm+].d,Rn
MOV     Rm,[Rn-].b
MOV     Rm,[Rn-].w
MOV     Rm,[Rn-].d
MOV     Rm,[R0+Rn].b
MOV     Rm,[R0+Rn].w
MOV     Rm,[R0+Rn].d
MOV     [R0+Rm].b,Rn
MOV     [R0+Rm].w,Rn
MOV     [R0+Rm].d,Rn
MOV     R0,[GBR+disp].b    (8-bit displacement, zero-extended)
MOV     R0,[GBR+disp].w    (8-bit displacement, zero-extended)
MOV     R0,[GBR+disp].d    (8-bit displacement, zero-extended)
MOV     [GBR+disp].b,R0    (8-bit displacement, zero-extended)
MOV     [GBR+disp].w,R0    (8-bit displacement, zero-extended)
MOV     [GBR+disp].d,R0    (8-bit displacement, zero-extended)
MOV     R0,[Rn+disp].b     (4-bit displacement, zero-extended)
MOV     R0,[Rn+disp].w     (4-bit displacement, zero-extended, doubled)
MOV     Rm,[Rn+disp].d     (4-bit displacement, zero-extended, quadrupled)
MOV     [Rn+disp].b,R0     (4-bit displacement, zero-extended)
MOV     [Rn+disp].w,R0     (4-bit displacement, zero-extended, doubled)
MOV     [Rn+disp].d,Rm     (4-bit displacement, zero-extended, quadrupled)
MOV     [PC+label],Rn      (8-bit displacement, zero-extended)

MOVA    [PC+label],R0
MOVT    Rn

MUL     Rm,Rn
MULS    Rm,Rn
MULU    Rm,Rn

NEG     Rm,Rn
NEGC    Rm,Rn

NOP

NOT     Rm,Rn

OR      Rm,Rn
OR      imm,R0          (immediate is 8-bit zero-extended)
OR      imm,[R0+GBR].b  (immediate is 8-bit zero-extended)

ROTL    Rn
ROTR    Rn
ROTCL   Rn
ROTCR   Rn

RTE                     (delayed branch)
RTS                     (delayed branch)
SETT

SHAL    Rn
SHAR    Rn
SHLL    Rn
SHLR    Rn
SHLL2   Rn
SHLR2   Rn
SHLL8   Rn
SHLR8   Rn
SHLL16  Rn
SHLR16  Rn
SLEEP     
STC     SR,Rn
STC     GBR,Rn
STC     VBR,Rn
STC     SR,[Rn-]
STC     GBR,[Rn-]
STC     VBR,[Rn-]
STS     MACH,Rn
STS     MACL,Rn
STS     PR,Rn  
STS     MACH,[Rn-]
STS     MACL,[Rn-]
STS     PR,[Rn-]  

SUB     Rm,Rn
SUBC    Rm,Rn
SUBV    Rm,Rn

SWAPB   Rm,Rn
SWAPW   Rm,Rn

TAS     [Rn].b
TRAPA   imm             (immediate is 8-bit)

TST     Rm,Rn
TST     imm,R0          (immediate is 8-bit zero-extended)
TST     imm,[R0+GBR].b  (immediate is 8-bit zero-extended)

XOR     Rm,Rn
XOR     imm,R0          (immediate is 8-bit zero-extended)
XOR     imm,[R0+GBR].b  (immediate is 8-bit zero-extended)

XTRCT   Rm,Rn


The MIPS instruction set:

  MIPS assembly normally has a large number of mnemonics, having a one-to-one correspondence with each
opcode. It also defines a word as 32 bits, and a double-word as 64 bits. Rather than let these conventions
stand in contrast to the rest of NOWUT, the MIPS instruction set was mutated to conform:

  'i' for immediate was dropped from many mnemonics, so e.g. ADDU can use either register or immediate
        operands

  various load and store instructions were all rolled into MOV, MOVZ (zero-extended), and MOVH (high word)

  some 'rebadged' and 'fake' instructions were added
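
A "fake" instruction such as a 32-bit immediate load needs two opcodes because a MIPS opcode only has room
for a 16-bit immediate. The Python sketch below shows the usual way such a value is split into a high and a
low half. On MIPS this conventionally maps to a load-upper-immediate followed by an OR with the low half;
whether NOWUT emits exactly that pair is an assumption, and the function names are hypothetical.

```python
def split_imm32(value):
    """Split a 32-bit value into (high 16 bits, low 16 bits)."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF

def rebuild(high, low):
    """Combine the halves: high half shifted into place, low half ORed in (zero-extended)."""
    return ((high << 16) | low) & 0xFFFFFFFF

high, low = split_imm32(0x12348765)
print(hex(high), hex(low))               # 0x1234 0x8765
print(hex(rebuild(high, low)))           # 0x12348765
```

Because the OR form zero-extends its immediate, the high half needs no adjustment even when bit 15 of the
low half is set.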

Note: MIPS has delayed branch instructions which cause the instruction following the branch to be executed
        before the branch takes place. It also has 'likely' variants which skip the following instruction
        when the branch is not taken.
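
The difference between a normal delayed branch and a 'likely' variant can be shown with a toy model. The
Python sketch below is not a MIPS emulator: the four-instruction program and its labels are made up, and it
only records which instructions would run in each case.

```python
def run(taken, likely):
    """Return the execution order for a tiny program containing one branch."""
    program = [
        ("op", "before"),
        ("branch", 4),                   # branch to index 4
        ("op", "delay-slot"),
        ("op", "fall-through"),
        ("op", "target"),
    ]
    trace, pc = [], 0
    while pc < len(program):
        insn = program[pc]
        if insn[0] == "branch":
            trace.append("branch")
            # The delay slot runs unless this is a 'likely' branch that is not taken.
            if taken or not likely:
                trace.append(program[pc + 1][1])
            pc = insn[1] if taken else pc + 2
        else:
            trace.append(insn[1])
            pc += 1
    return trace

print(run(taken=True, likely=False))     # delay slot runs, then the target
print(run(taken=False, likely=False))    # delay slot runs, then falls through
print(run(taken=False, likely=True))     # delay slot is skipped
```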

Hand-written assembly code should not modify (or should save and restore) R13 as the compiler uses it to
address stack variables. R12 is used by "fake" MOVs.

The USEFASTBASE statement can be used on MIPS to produce more compact executables. It loads registers R16
and/or R17 with the address of a specified symbol at run time, and uses 16-bit relative addressing to access
other nearby symbols in the same section, instead of 32-bit addressing which takes a longer opcode. This
scheme depends on R16/R17 not being disturbed by other code and can only be enabled in one module when a
program consists of multiple modules linked together. Example usage:

        usefastbase datbase,0        ; datbase symbol will be loaded in R16 to use for relative addressing
        usefastbase bssbase,1        ; bssbase symbol will be loaded in R17 to use for relative addressing

For instructions that write a result to a general-purpose register or memory location, the first operand
is the destination. Instructions that write a coprocessor register may specify the destination with the
second operand.

** marks 64-bit instructions, which are not available in 32-bit user/supervisor mode


ADD rc,ra,rb                  (has overflow exception)
ADD rb,ra,imm                 (has overflow exception)
ADDU rc,ra,rb
ADDU rb,ra,imm

AND rc,ra,rb
AND rb,ra,imm                 (imm is zero-extended)

BCzF imm                      (branch on coprocessor false, z is coprocessor number 0..3)
BCzFL imm                     (... likely)
BCzT imm                      (branch coproc z true)
BCzTL imm

BEQ ra,rb,imm                 (branch on equal)
BEQL ra,rb,imm
BGEZ ra,imm                   (branch greater-or-equal-to-zero)
BGEZL ra,imm
BGEZAL ra,imm                 (... and link)
BGEZALL ra,imm                (... and link + likely)
BGTZ ra,imm                   (branch greater-than-zero)
BGTZL ra,imm
BLEZ ra,imm                   (branch less-or-equal-to-zero)
BLEZL ra,imm
BLTZ ra,imm                   (branch less-than-zero)
BLTZL ra,imm
BLTZAL ra,imm
BLTZALL ra,imm
BNE ra,rb,imm                 (branch not-equal)
BNEL ra,rb,imm

BRA imm                       (unconditional branch, rebadged beq r0,r0)

BREAK

CACHE imm5,[ra+imm]           (imm5 is an operation type)

CFCz rb,rc                    (get coproc control register)
COPz imm25                    (coproc operation)
CTCz rb,rc                    (put coproc control register)

DADD rc,ra,rb                 **(64-bit add, has overflow exception)
DADD rb,ra,imm                **(has overflow exception)
DADDU rc,ra,rb                **
DADDU rb,ra,imm               **

DDIV ra,rb                    **(64-bit divide)
DDIVU ra,rb                   **(64-bit divide unsigned)

DIV ra,rb                     (ra divided by rb, quotient -> lo, remainder -> hi)
DIVU ra,rb

DMFC0 rb,rc                   **(64-bit move from coproc)
DMTC0 rb,rc                   **(64-bit move to coproc)

DMULT ra,rb                   **(64-bit multiply)
DMULTU ra,rb                  **(64-bit multiply unsigned)

DSLL rc,rb,sa                 **(64-bit shift left using 5-bit count)
DSLLV rc,rb,ra                **(64-bit shift left using count in ra)
DSLL32 rc,rb,sa               **(64-bit shift left using 5-bit count + 32)
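
  Because the count field is only 5 bits, a constant 64-bit shift of 32 or more has to use the "32" form.
A small Python sketch of that choice (pick_dsll is a hypothetical helper, not something the assembler
provides):

```python
def pick_dsll(count):
    """Return (mnemonic, sa) for a 64-bit left shift by a constant count."""
    if not 0 <= count <= 63:
        raise ValueError("shift count must be 0..63")
    if count < 32:
        return ("DSLL", count)        # count fits the 5-bit field directly
    return ("DSLL32", count - 32)     # hardware adds 32 to the 5-bit field
```

  The same split applies to the DSRA/DSRA32 and DSRL/DSRL32 pairs below.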

DSRA rc,rb,sa                 **(64-bit signed shift right)
DSRAV rc,rb,ra                **
DSRA32 rc,rb,sa               **

DSRL rc,rb,sa                 **(64-bit unsigned shift right)
DSRLV rc,rb,ra                **
DSRL32 rc,rb,sa               **

DSUB rc,ra,rb                 **(has overflow exception)
DSUBU rc,ra,rb                **

ERET                          (return from exception, no delay slot)

JMP t26
JMP ra

JAL t26                       (jump and link)
JAL rc,ra                     (jump to ra, link rc)
JAL ra                        (same as above, link r31 implied)

LWCz rb,[ra+imm].d            (load 32-bit word to coproc) (cop =/= 0)
LDCz rb,[ra+imm].q            (load 64-bit word to coproc) (LDC3 not valid?)

LDL rb,[ra+imm]               **(load misaligned 64-bit word)
LDR rb,[ra+imm]               **
LWL rb,[ra+imm]               (load misaligned 32-bit word)
LWR rb,[ra+imm]

LL rb,[ra+imm]                (locked 32-bit load)
LLD rb,[ra+imm]               **(locked 64-bit load)

MOV rc,rb                     (rebadged ADDU instruction)
MOV [ra+imm],rb               (8/16/32-bit memory store)
MOV [ra+imm].q,rb             **(64-bit memory store)
MOV [address],rb              (fake store instructions, assemble to two opcodes)
MOV rb,imm                    (rebadged ADDU instruction)
MOV rb,imm (unsigned)         (rebadged OR instruction)
MOV rb,imm32                  (fake immediate load instruction, assembles to two opcodes)
MOV rb,[ra+imm]               (8/16/32-bit memory load)
MOV rb,[ra+imm].q             **(64-bit memory load)
MOV rb,[address]              (fake load instructions, assemble to two opcodes)

MOVH rb,imm                   (load high word)
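
  The "fake" two-opcode forms above have to split a 32-bit value into two 16-bit halves. The Python sketch
below shows one conventional way to do that on MIPS; whether NOWUT emits exactly these pairs is an
assumption, and the function names are invented for the example. The first split suits a MOVH followed by
an OR (whose immediate is zero-extended). The second suits a MOVH followed by a [ra+imm] access, where the
displacement is sign-extended and the high half must be adjusted to compensate.

```python
def split_imm32(value):
    """Halves for: load high half, then OR in the zero-extended low half."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF


def split_address(addr):
    """Halves for: load high half, then access [ra+imm] with signed imm."""
    addr &= 0xFFFFFFFF
    lo = addr & 0xFFFF
    if lo & 0x8000:
        lo -= 0x10000                     # displacement will be sign-extended
    hi = ((addr - lo) >> 16) & 0xFFFF     # high half absorbs the borrow
    return hi, lo
```

  For example, split_address($80018000) gives a high half of $8002 and a displacement of -$8000.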

MOVZ rb,[ra+imm].b            (load unsigned byte)
MOVZ rb,[ra+imm].w
MOVZ rb,[ra+imm].d            **
MOVZ rb,[address]             (fake load instructions, assemble to two opcodes)

MFCz rb,rc                    (32-bit move FROM coproc - 2nd reg is coproc reg)
MTCz rb,rc                    (32-bit move TO coproc - 2nd reg is still coproc reg)

MFHI rc                       (move from 'HI')
MFLO rc                       (move from 'LO')
MTHI ra                       (move to 'HI')
MTLO ra                       (move to 'LO')

MULT ra,rb
MULTU ra,rb

NOP                           ($00000000 opcode, does nothing)

NOR rc,ra,rb

OR rc,ra,rb
OR rb,ra,imm                  (imm is zero-extended)

SC rb,[ra+imm].d              (store conditional)
SCD rb,[ra+imm].q             **

SWCz rb,[ra+imm].d            (store 32 bits from coproc) (cop =/= 0)
SDCz rb,[ra+imm].q            (SDC3 not valid?)

SLL rc,rb,sa
SLLV rc,rb,ra

SLT rc,ra,rb                  (set on less-than)
SLT rb,ra,imm
SLTU rc,ra,rb                 (set on less-than, unsigned)
SLTU rb,ra,imm

SRA rc,rb,sa
SRAV rc,rb,ra
SRL rc,rb,sa
SRLV rc,rb,ra

SUB rc,ra,rb                  (has overflow exception)
SUBU rc,ra,rb

SDL rb,[ra+imm]               **(storing of unaligned data)
SDR rb,[ra+imm]               **
SWL rb,[ra+imm]
SWR rb,[ra+imm]

SYNC

SYSCALL imm20

TEQ ra,rb                     (trap on equal)
TEQ ra,imm
TGE ra,rb                     (trap on greater-or-equal)
TGE ra,imm
TGEU ra,rb                    (unsigned)
TGEU ra,imm
TLT ra,rb                     (trap on less)
TLT ra,imm
TLTU ra,rb                    (unsigned)
TLTU ra,imm
TNE ra,rb                     (trap on unequal)
TNE ra,imm

TLBP                          (TLB probe)
TLBR                          (read TLB entry)
TLBWI                         (write indexed TLB entry)
TLBWR                         (write random TLB entry)

XOR rc,ra,rb
XOR rb,ra,imm                 (imm is zero-extended)

Additional instructions from MIPS32 spec:

BAL imm                       (rebadged bgezal r0)

CLO rc,rb,ra
CLZ rc,rb,ra

DERET

DI
DI rb

EHB

EI
EI rb

MADD ra,rb
MADDU ra,rb
MSUB ra,rb
MSUBU ra,rb

MUL rc,ra,rb

RDHWR rb,rc
RDPGPR rc,rb

ROTR rc,rb,imm
ROTRV rc,rb,ra

SDBBP imm

SEB rc,rb
SEH rc,rb

SSNOP

SYNCI [ra+imm]

WRPGPR rc,rb
WSBH rc,rb


The ARM instruction set:

  In standard ARM assembly, groups of letters are selected corresponding to an operation, a condition code, a
data size, and whether flags should be set, then they are concatenated to make a mnemonic. (It's kind of like
Hangul.) The number of possible mnemonics under this system is HUGE, which is not good from the perspective
of having to accommodate it in NOWUT.

  In NOWUT, the instruction set has been altered so that:

  1) Except for branch instructions, a condition code (if any) goes in a prefix.
  2) Data sizes are distinguished with the usual .b .w .d (not by the mnemonic).
  3) Loads that do sign-extension use a LDSX mnemonic.

  Like the SH2 assembler, the ARM assembler supports a "fake" MOV instruction and a "fake" addressing mode
which allow arbitrary 32-bit data to be specified. An extra instruction is transparently inserted to load
this data, using PC-relative addressing, from a "data dump" which gets mixed into the code. The same
cautions apply:

  1) In assembly mode the buffer is not flushed. Using more than 16 "fake" MOVs in a single section of
     assembly code may cause overflow (using 49 definitely will).
  2) The FLUSHIMM statement should be used before an INCBIN statement, if it is in a section which also
     contains code.

  ARM doesn't always include a divide instruction. When division is needed, the compiler generates a call to
a subroutine. The program source should include these division subroutines (or some variation thereof):

armdivideu:                ; divide r8 by r9 (32/16 unsigned), result in r8
        asm
        mov r10,0
        mov r7,15
armdiv10:
        cmp r8,r9 shl r7
        cs sub r8,r8,r9 shl r7      ; if r8 >= (r9 shl r7) then r8 becomes r8-(r9 shl r7)
        adc r10,r10,r10             ; result bit is shifted left into r10
        subs r7,r7,1
        bpl armdiv10
        mov r8,r10
        mov r15,r14        ; return
        endasm


armdivides:                ; divide r8 by r9 (32/16 signed), result in r8
        asm
        eor r10,r8,r9
        mov r10,r10 sar 31          ; r10 becomes 0 (if signs the same) or -1 (if signs different)

        rsbs r7,r8,0
        pl mov r8,r7                ; make r8 positive
        rsbs r7,r9,0
        pl mov r9,r7                ; make r9 positive

        mov r7,15
armdiv11:
        cmp r8,r9 shl r7
        cs sub r8,r8,r9 shl r7      ; if r8 >= (r9 shl r7) then r8 becomes r8-(r9 shl r7)
        adc r10,r10,r10             ; result bit is shifted left into r10
        subs r7,r7,1
        bpl armdiv11
        movs r8,r10
        mi rsb r8,r8,r7 shl 16      ; r7 was -1, so we subtract r8 from -65536

        mov r15,r14        ; return
        endasm
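
  For readers who want to check the loop by hand, the unsigned routine can be modelled in a few lines of
Python. This is only a sketch of the same 16-step shift-and-subtract scheme, not code used by the compiler.
The register names are kept as variable names, and MASK32 is introduced here to mimic 32-bit registers.

```python
MASK32 = 0xFFFFFFFF


def armdivideu(r8, r9):
    """Model of the armdivideu loop. Returns (quotient, remainder)."""
    r8 &= MASK32
    r10 = 0
    for r7 in range(15, -1, -1):
        shifted = (r9 << r7) & MASK32         # r9 shl r7, truncated to 32 bits
        carry = 1 if r8 >= shifted else 0     # cmp sets C when r8 >= operand
        if carry:
            r8 = (r8 - shifted) & MASK32      # cs sub
        r10 = ((r10 << 1) | carry) & MASK32   # adc r10,r10,r10
    return r10, r8
```

  The model makes the "32/16" limit visible: only 16 quotient bits are produced, so armdivideu($10000, 1)
yields $FFFF rather than $10000. The remainder returned here is what r8 holds just before the final
mov r8,r10 overwrites it.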

  Hand-written assembly code should not modify (or should save and restore) R11 as the compiler uses it to
address stack variables. R12 is used by "fake" MOVs.

  The USEFASTBASE statement can be used on ARM to produce more compact executables. It loads registers R3
and/or R4 with the address of a specified symbol at run time, and uses 12-bit relative addressing to access
other nearby symbols in the same section, instead of 32-bit addressing which takes a longer opcode. This
scheme depends on R3/R4 not being disturbed by other code and can only be enabled in one module when a
program consists of multiple modules linked together. Example usage:

        usefastbase datbase,0        ; datbase symbol will be loaded in R3 to use for relative addressing
        usefastbase bssbase,1        ; bssbase symbol will be loaded in R4 to use for relative addressing
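
  Whether a symbol is "nearby" comes down to whether its distance from the base fits the 12-bit offset of
the [rn+imm12] / [rn-imm12] addressing modes. A Python sketch of that test follows. It assumes both
positive and negative offsets are usable, which this document does not state, and fastbase_offset is a
hypothetical name.

```python
def fastbase_offset(base_addr, sym_addr):
    """Return the offset from the base, or None if it needs more than 12 bits."""
    delta = sym_addr - base_addr
    # imm12 holds a magnitude of 0..4095; the sign selects + or - addressing.
    return delta if -4095 <= delta <= 4095 else None
```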

op2 (operand2) may include:
        rm                ; register
        rm shl imm5       ; register shifted left
        rm shr imm5       ; register shifted right
        rm sar imm5       ; register with signed-shift-right
        rm ror imm5       ; register rotated right
        rm shl rs         ; register shifted left with count from another register
        rm shr rs         ; register shifted right with count from another register
        rm sar rs         ; register with signed-shift-right with count from another register
        rm ror rs         ; register rotated right with count from another register
        $000000xx
        $00000xx0         ; rotated imm8
        etc.
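
  A constant fits the "rotated imm8" form when it equals an 8-bit value rotated right by an even number of
bits. The Python sketch below tests a constant against that rule; it illustrates the ARM encoding and is
not taken from the NOWUT source. The name encode_rotated_imm8 is made up.

```python
def encode_rotated_imm8(value):
    """Return (imm8, rotate_right) if value fits the form, else None."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Undo a rotate-right by `rot` with a rotate-left by the same amount.
        imm8 = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if imm8 <= 0xFF:
            return imm8, rot
    return None
```

  Constants that return None here (for example $101) are the ones that would need a "fake" MOV instead.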

addr2 (address mode 2) may include:
        [rn+imm12]
        [rn-imm12]
        [rn+rm shift imm5]        ; where shift is shl/shr/sar/ror
        [rn-rm shift imm5]        ; where shift is shl/shr/sar/ror

offset2 may include:
        imm12
        -imm12
        rm shift imm5             ; where shift is shl/shr/sar/ror
        -rm shift imm5            ; where shift is shl/shr/sar/ror

addr3 (address mode 3) may include:
        [rn+imm8]
        [rn-imm8]
        [rn+rm]
        [rn-rm]

offset3 may include:
        imm8
        -imm8
        rm
        -rm

condition code prefixes:
        EQ - Z set           - equal
        NE - Z clear         - not equal
        CS - C set           - higher or same (unsigned)
        CC - C clear         - lower (unsigned)
        MI - N set           - negative
        PL - N clear         - positive or zero
        VS - V set           - overflow
        VC - V clear         - no overflow
        HI - C set, Z clear  - higher (unsigned)
        LS - C clear OR Z set - lower or same (unsigned)
        GE - N == V          - greater or equal
        LT - N != V          - less than
        GT - N == V, Z clear - greater than
        LE - N != V OR Z set - less than or equal
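
  The table can be read as a set of boolean tests on the N, Z, C and V flags. The Python sketch below
spells them out, with LS taken as "C clear or Z set" as the ARM architecture defines it, and adds a model
of the flags left by CMP. Both function names are invented for this example.

```python
def condition_passes(cc, n, z, c, v):
    """Return True if condition prefix `cc` passes for the given flags."""
    tests = {
        "EQ": z, "NE": not z,
        "CS": c, "CC": not c,
        "MI": n, "PL": not n,
        "VS": v, "VC": not v,
        "HI": c and not z, "LS": (not c) or z,
        "GE": n == v, "LT": n != v,
        "GT": (n == v) and not z, "LE": (n != v) or z,
    }
    return bool(tests[cc.upper()])


def cmp_flags(a, b):
    """Flags after CMP a,b on 32-bit values, returned as (n, z, c, v)."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    result = (a - b) & 0xFFFFFFFF
    n = bool(result >> 31)
    z = result == 0
    c = a >= b                                   # C set means "no borrow"
    v = bool(((a ^ b) & (a ^ result)) >> 31)     # signed overflow
    return n, z, c, v
```

  For instance, comparing $FFFFFFFF with 1 passes both HI (unsigned) and LT (signed, since the first value
reads as -1).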

Except for STR instructions, the first register operand is generally the destination register (if any).

ADD rd,rn,op2                 ; does not set flags
ADDS rd,rn,op2                ; sets flags
ADC rd,rn,op2
ADCS rd,rn,op2

AND rd,rn,op2
ANDS rd,rn,op2

BIC rd,rn,op2
BICS rd,rn,op2

BEQ label/imm
BNE label/imm
BCS label/imm
BCC label/imm
BMI label/imm
BPL label/imm
BVS label/imm
BVC label/imm
BHI label/imm
BLS label/imm
BGE label/imm
BLT label/imm
BGT label/imm
BLE label/imm
BRA label/imm                 ; unconditional branch

BL label/imm
BX rn

CMP rn,op2
CMN rn,op2

EOR rd,rn,op2
EORS rd,rn,op2

LDM [rn],imm16                ; imm16 is a bit pattern representing a register list
LDM [rn+],imm16               ; post-increment
LDM [rn-],imm16               ; pre-decrement
LDMU [rn],imm16               ; user mode
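
  In the ARM encoding, bit n of the register list stands for register Rn. A small Python helper
(hypothetical, for working out the constant by hand) shows how an imm16 value is built:

```python
def reglist(*regs):
    """Build the 16-bit register-list mask: bit n set means Rn is included."""
    mask = 0
    for r in regs:
        if not 0 <= r <= 15:
            raise ValueError("register number must be 0..15")
        mask |= 1 << r
    return mask
```

  So a list of R0, R1, R2 and R14 corresponds to $4007. The same bit layout applies to STM.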

LDSX rd,[addr3].b             ; load byte with sign extension
LDSX rd,[addr3].b,_           ; pre-decrement / pre-increment
LDSX rd,[rn].b,offset3        ; post-decrement / post-increment

LDSX rd,[addr3].w             ; load 16-bit word with sign extension
LDSX rd,[addr3].w,_           ; pre-decrement / pre-increment
LDSX rd,[rn].w,offset3        ; post-decrement / post-increment

LDR rd,[addr2].b
LDR rd,[addr2].b,_            ; pre-decrement / pre-increment
LDR rd,[rn].b,offset2         ; post-decrement / post-increment

LDR rd,[addr3].w
LDR rd,[addr3].w,_            ; pre-decrement / pre-increment
LDR rd,[rn].w,offset3         ; post-decrement / post-increment

LDR rd,[addr2].d
LDR rd,[addr2].d,_            ; pre-decrement / pre-increment
LDR rd,[rn].d,offset2         ; post-decrement / post-increment

LDRU rd,[rn].b,offset2        ; user mode, post-decrement / post-increment
LDRU rd,[rn].w,offset3        ; user mode, post-decrement / post-increment
LDRU rd,[rn].d,offset2        ; user mode, post-decrement / post-increment

MOV rd,op2
MOV rd,imm32                  ; "fake" move
MOVS rd,op2

MVN rd,op2
MVNS rd,op2

MUL rm,rn,rs
MULS rm,rn,rs
MLA rn,rm,rs,rd               ; rn = rm * rs + rd
MLAS rn,rm,rs,rd

MRS rd,SPSR
MRS rd,CPSR
MSR SPSR,rm
MSR CPSR,rm
MSRF SPSR,rm                  ; flag bits only
MSRF CPSR,rm
MSRF SPSR,imm
MSRF CPSR,imm

ORR rd,rn,op2
ORRS rd,rn,op2

RSB rd,rn,op2
RSBS rd,rn,op2
RSC rd,rn,op2
RSCS rd,rn,op2

SUB rd,rn,op2
SUBS rd,rn,op2
SBC rd,rn,op2
SBCS rd,rn,op2

SMULL rd,rn,rm,rs             ; 32 * 32 -> 64 signed multiply. high 32 bits of result go in rn
SMULLS rd,rn,rm,rs
SMLAL rd,rn,rm,rs             ; multiply and accumulate
SMLALS rd,rn,rm,rs

STM [rn],imm16                ; imm16 is a bit pattern representing a register list
STM [rn+],imm16               ; post-increment
STM [rn-],imm16               ; pre-decrement
STMU [rn],imm16               ; user mode

STR rd,[addr2].b
STR rd,[addr2].b,_            ; pre-decrement / pre-increment
STR rd,[rn].b,offset2         ; post-decrement / post-increment

STR rd,[addr3].w
STR rd,[addr3].w,_            ; pre-decrement / pre-increment
STR rd,[rn].w,offset3         ; post-decrement / post-increment

STR rd,[addr2].d
STR rd,[addr2].d,_            ; pre-decrement / pre-increment
STR rd,[rn].d,offset2         ; post-decrement / post-increment

STRU rd,[rn].b,offset2        ; user mode, post-decrement / post-increment
STRU rd,[rn].w,offset3        ; user mode, post-decrement / post-increment
STRU rd,[rn].d,offset2        ; user mode, post-decrement / post-increment

SWI imm24

SWP rd,rm,[rn].b              ; rd gets loaded, rm gets stored
SWP rd,rm,[rn].d

TST rn,op2
TEQ rn,op2

UMULL rd,rn,rm,rs             ; 32 * 32 -> 64 unsigned multiply. high 32 bits of result go in rn
UMULLS rd,rn,rm,rs
UMLAL rd,rn,rm,rs             ; multiply and accumulate
UMLALS rd,rn,rm,rs
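
  To show how the 64-bit product is divided between the two destination registers of SMULL and UMULL,
here is a Python sketch. It follows the operand order given above (low half in rd, high half in rn). The
function names simply reuse the mnemonics.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= MASK32
    return x - 0x100000000 if x & 0x80000000 else x


def umull(rm, rs):
    """Unsigned 32*32->64 multiply. Returns (rd, rn) = (low, high)."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, product >> 32


def smull(rm, rs):
    """Signed 32*32->64 multiply. Returns (rd, rn) = (low, high)."""
    product = (to_signed32(rm) * to_signed32(rs)) & 0xFFFFFFFFFFFFFFFF
    return product & MASK32, product >> 32
```

  The low halves always agree between the two; only the high half depends on signedness.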


The Z280 instruction set:

  This is a superset of the official Z80 instruction set. Many common undocumented Z80 opcodes are also
included, but some more obscure ones (eg. DDCB shift-and-also-load) are not. Most Z80 ALU instructions
take the 'A' register as destination (first) operand, but it can also be omitted for compatibility with
Z80 syntax in which the 'A' register was implied.

operand types legend:
  R      - 8-bit registers a,b,c,d,e,h,l
  RX     - 8-bit registers ixl,ixh,iyl,iyh
  RR     - 16-bit registers bc,de,hl,sp
  XX     - 16-bit registers hl,ix,iy
  XY     - 16-bit registers ix,iy
  rel    - 8-bit relative jump
  addr   - 16-bit direct jump
  [addr] - 16-bit direct address
  [HL]   - hl indirect addressing
  [XY+d] - ix or iy indirect with (signed) 8-bit displacement
  mode1  - Z280 extended addressing modes [ix+disp16], [iy+disp16], [hl+disp16], [pc+disp16]
  mode2  - Z280 extended addressing modes [sp+disp16], [hl+ix], [hl+iy], [ix+iy]
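
  The [XY+d] displacement is a single two's-complement byte, so it covers -128..127. A Python sketch of
the encoding (encode_disp8 and decode_disp8 are names made up for this example):

```python
def encode_disp8(d):
    """Encode a signed displacement for [XY+d] as one byte."""
    if not -128 <= d <= 127:
        raise ValueError("displacement does not fit in a signed 8 bits")
    return d & 0xFF


def decode_disp8(byte):
    """Recover the signed displacement from its byte encoding."""
    byte &= 0xFF
    return byte - 0x100 if byte & 0x80 else byte
```

  Offsets outside that range need one of the 16-bit displacement forms listed under mode1.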

Note that 16-bit ALU instructions (ADDW, CPW, DECW, INCW, SUBW) do not allow the [hl+disp16] mode.

ADC A,R/RX
ADC A,imm8
ADC A,[HL]
ADC A,[XY+d]
ADC A,[addr]
ADC A,mode1/2
ADC XX,RR/XY

ADD A,R/RX
ADD A,imm8
ADD A,[HL]
ADD A,[XY+d]
ADD A,[addr]
ADD A,mode1/2
ADD XX,RR/XY                (16-bit add that affects carry flag but not others)
ADD XX,A                    (A is sign-extended)

ADDW HL,RR/XY
ADDW HL,imm16
ADDW HL,[HL]
ADDW HL,[addr]
ADDW HL,mode1

AND A,R/RX
AND A,imm8
AND A,[HL]
AND A,[XY+d]
AND A,[addr]
AND A,mode1/2

BIT imm,R
BIT imm,[HL]
BIT imm,[XY+d]

CALLcc [HL]
CALLcc addr
CALLcc [PC+disp16]
(includes call, callnz, callz, callnc, callc, callpo, callnv, callpe, callv, callp, callns, callm, calls)

CCF

CP A,R/RX
CP A,imm8
CP A,[HL]
CP A,[XY+d]
CP A,[addr]
CP A,mode1/2

CPD
CPDR
CPI
CPIR

CPL A

CPW HL,RR/XY
CPW HL,imm16
CPW HL,[HL]
CPW HL,[addr]
CPW HL,mode1

DAA

DEC R/RX
DEC imm8
DEC [HL]
DEC [XY+d]
DEC [addr]
DEC mode1/2

DEC(W) RR/XY

DECW [HL]
DECW [addr]
DECW mode1

DI
DI imm

DIV HL,R/RX
DIV HL,imm8
DIV HL,[HL]
DIV HL,[XY+d]
DIV HL,[addr]
DIV HL,mode1/2
DIVU HL,R/RX
DIVU HL,imm8
DIVU HL,[HL]
DIVU HL,[XY+d]
DIVU HL,[addr]
DIVU HL,mode1/2

DIVW RR/XY
DIVW imm16
DIVW [HL]
DIVW [addr]
DIVW mode1
DIVWU RR/XY
DIVWU imm16
DIVWU [HL]
DIVWU [addr]
DIVWU mode1

DJNZ rel

EI
EI imm

EX AF,AF
EX [SP],HL
EX [SP],XY
EX H,L
EX DE,HL
EX XY,HL

EX A,R/RX
EX A,[HL]
EX A,[XY+d]
EX A,[addr]
EX A,mode1/2

EXTS A
EXTS HL

EXX

HALT

IM0
IM1
IM2
IM3

IN A,imm8
IN R/RX,[C]
IN [addr],[C]
IN mode1/2,[C]

INC R/RX
INC imm8
INC [HL]
INC [XY+d]
INC [addr]
INC mode1/2

INC(W) RR/XY

INCW [HL]
INCW [addr]
INCW mode1

IND
INDW
INDR
INDRW
INI
INIW
INIR
INIRW

INW HL,[C]

JAF addr
JAR addr

JP XY
JPcc HL
JPcc addr
JPcc [PC+disp16]
(includes jp, jpnz, jpz, jpnc, jpc, jppo, jpnv, jppe, jpv, jpp, jpns, jpm, jps)

JR rel
JRNZ rel
JRZ rel
JRNC rel
JRC rel

LD R,R/RX
LD R/RX,R
LD R,[HL]
LD [HL],R
LD R,[XY+d]
LD [XY+d],R

LD A,I
LD I,A
LD A,R        ; refresh counter register (...which is not used for refresh on Z280)
LD R,A

LD R/RX,imm8
LD [HL],imm8
LD [XY+d],imm8
LD [addr],imm8
LD mode1/2,imm8

LD A,[BC]
LD A,[DE]
LD A,[addr]
LD A,mode1/2

LD(W) RR/XY,imm16            ; can use Z80-compatible LD mnemonic, or new LDW mnemonic
LD(W) RR/XY,[addr]
LD(W) [addr],RR/XY
LD(W) SP,XX

LEA XX,[addr]                ; load effective address (does not access memory)
LEA XX,mode1/2

LDCTL XX,[C]
LDCTL [C],XX
LDCTL XX,USP
LDCTL USP,XX

LDD
LDDR
LDI
LDIR

LDUD A,[HL]
LDUD A,[XY+d]
LDUD [HL],A
LDUD [XY+d],A

LDUP A,[HL]
LDUP A,[XY+d]
LDUP [HL],A
LDUP [XY+d],A

LDW RR,[HL]
LDW [HL],RR
LDW RR,[XY+d]
LDW [XY+d],RR
LDW HL/XY,mode1/2
LDW mode1/2,HL/XY

LDW [HL],imm16
LDW [addr],imm16
LDW [PC+disp16],imm16

MULT A,R/RX
MULT A,imm8
MULT A,[HL]
MULT A,[XY+d]
MULT A,[addr]
MULT A,mode1/2
MULTU A,R/RX
MULTU A,imm8
MULTU A,[HL]
MULTU A,[XY+d]
MULTU A,[addr]
MULTU A,mode1/2

MULTW HL,RR/XY
MULTW HL,imm16
MULTW HL,[HL]
MULTW HL,[addr]
MULTW HL,mode1
MULTWU HL,RR/XY
MULTWU HL,imm16
MULTWU HL,[HL]
MULTWU HL,[addr]
MULTWU HL,mode1

NEG A
NEG HL

NOP

OR A,R/RX
OR A,imm8
OR A,[HL]
OR A,[XY+d]
OR A,[addr]
OR A,mode1/2

OTDR
OTDRW
OTIR
OTIRW

OUT imm8,A
OUT [C],R/RX
OUT [C],[addr]
OUT [C],mode1/2

OUTD
OUTDW
OUTI
OUTIW

OUTW [C],HL

PCACHE

POP RR/XY                 ; AF is a valid operand instead of SP
POP [HL]
POP [addr]
POP [PC+disp16]

PUSH RR/XY                ; AF is a valid operand instead of SP
PUSH imm16
PUSH [HL]
PUSH [addr]
PUSH [PC+disp16]

RES imm,R
RES imm,[HL]
RES imm,[XY+d]

RETcc
(includes ret, retnz, retz, retnc, retc, retpo, retnv, retpe, retv, retp, retns, retm, rets)

RETI
RETIL
RETN

RL R
RL [HL]
RL [XY+d]

RLA

RLC R
RLC [HL]
RLC [XY+d]

RLCA

RLD

RR R
RR [HL]
RR [XY+d]

RRA

RRC R
RRC [HL]
RRC [XY+d]

RRCA

RRD

RST imm

SBC A,R/RX
SBC A,imm8
SBC A,[HL]
SBC A,[XY+d]
SBC A,[addr]
SBC A,mode1/2
SBC XX,RR

SC imm16

SCF

SET imm,R
SET imm,[HL]
SET imm,[XY+d]

SLA R
SLA [HL]
SLA [XY+d]

SRA R
SRA [HL]
SRA [XY+d]

SRL R
SRL [HL]
SRL [XY+d]

SUB A,R/RX
SUB A,imm8
SUB A,[HL]
SUB A,[XY+d]
SUB A,[addr]
SUB A,mode1/2

SUBW HL,RR/XY
SUBW HL,imm16
SUBW HL,[HL]
SUBW HL,[addr]
SUBW HL,mode1

TSET R
TSET [HL]
TSET [XY+d]

TSTI [C]

XOR A,R/RX
XOR A,imm8
XOR A,[HL]
XOR A,[XY+d]
XOR A,[addr]
XOR A,mode1/2


----Known bugs and limitations----

  OBJ files to be linked together must have different source file names or else automatically-generated
labels will conflict. (Specifically, they must differ by alphabetic characters because other characters are 
not used for name generation.)

  When using two-pass compilation on 68000, a GOTO, GOSUB, or IF statement that targets the next instruction
will cause an error (because SBRA 0 is not valid on the 68K).

  Absolute addresses as memory references don't work correctly in 8086 long mode:

        [$B8000].w=$0841                        ; does NOT work
        tempvar.d=$B8000 > [tempvar].w=$0841    ; use this instead
        ea.w($B8000)=$0841                      ; or this

  NOWUT 68K uses the processor's multiply and divide instructions despite the fact that they are not fully
32-bit. Multiplicands or quotients that would not fit in 16 bits will cause incorrect results. Signed vs.
unsigned multiplicands will also affect the result, whereas they would not with 32bit*32bit=32bit.

  NOWUT in 8086 mode uses the processor's multiply and divide instructions despite the fact that they are
not fully 32-bit. Multiplicands or quotients that would not fit in 16 bits will cause incorrect results.
Signed vs. unsigned multiplicands will also affect the result, whereas they would not with 32bit*32bit=32bit.
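
  The effect of the 16-bit multiply can be reproduced with a short Python model. It is a sketch that
assumes a 16*16->32 hardware multiply in signed and unsigned flavours. It does not model which flavour the
compiler picks, and the function names are invented.

```python
def s16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mul16_unsigned(a, b):
    """16*16->32 unsigned multiply: upper operand bits are ignored."""
    return ((a & 0xFFFF) * (b & 0xFFFF)) & 0xFFFFFFFF


def mul16_signed(a, b):
    """16*16->32 signed multiply: operands are sign-extended from 16 bits."""
    return (s16(a) * s16(b)) & 0xFFFFFFFF


def mul32(a, b):
    """Full 32*32->32 multiply, for comparison."""
    return (a * b) & 0xFFFFFFFF
```

  With these, 70000*3 comes out as 13392 instead of 210000, and $FFFF*2 gives $1FFFE unsigned but
$FFFFFFFE signed, whereas mul32 gives the same bit pattern either way.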

  Related to the above, when the compiler runs on 68K or 8086, compile-time calculations done as arguments
to a RESB/RESW/RESD can only multiply numbers in the range 0...65535.

  NOWUT SH2 only allows 12 dword parameters on the stack and 32 dword local variables at a time. (bytes and
words are not allowed.)

  The address of a stack variable is not valid. Likewise, it can't be indexed.

  When a calculation is used as an operand for a NOWUT instruction that involves comparison (ie. IFGREATER,
IFLESS, WHILEGREATER, WHILELESS) the value is assumed to be unsigned, regardless of any components of that
calculation. If signed comparison is desired, the second operand should be marked as signed.
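
  The practical consequence is that a calculation which goes negative looks like a very large number. A
Python sketch of the two interpretations of the same 32-bit pattern (function names are hypothetical):

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= MASK32
    return x - 0x100000000 if x & 0x80000000 else x


def is_greater_unsigned(a, b):
    """Compare as the compiler does by default for calculated operands."""
    return (a & MASK32) > (b & MASK32)


def is_greater_signed(a, b):
    """Compare the same bit patterns as signed values."""
    return to_signed32(a) > to_signed32(b)
```

  A calculation such as 4-5 yields $FFFFFFFF, which is greater than 1 when unsigned but less than 1 when
signed.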

  The maximum number of symbols supported by the compiler (and the corresponding memory allocation) is
specified in the source code of the compiler. Modify these lines and recompile if needed (same for fixups):

        const maxsymbols,2048
        const maxfixups,8192

  Maximum line length is 255 characters, maximum number of labels+statements+operands on a line is defined
by the "maxargs" constant (default: 32).

  BITS16/BITS32 shouldn't be used to switch modes within a BEGINFUNC...ENDFUNC structure as variables on the
stack would no longer be addressed properly.

  Floating-point support is only partially functional and only on 386. Calculations involving FP values
can't be used as operands with NOWUT statements/instructions, etc.

  A plain numeric index on a symbol, eg. foo(42), is limited to 32767 when compiling for 8086, or 65535 when
compiling for MIPS.

  A GOTO statement with a hard-coded address (eg. GOTO $3F00) will be incorrectly interpreted as a relative
branch on x86 or ARM. It can also fail on MIPS if the address does not fit in 26 bits.

  The x86 assembler is missing the ENTER instruction.


----Changes from the last version----

(First release was December 2018)


Release 0.11 (2019/1/2):

  When I attempted to link a program with two OBJ files (using GoLink) I received a pile of errors about
duplicate symbol definitions. Some were irrelevant symbol names used by local variables and constants. These
have been given a null section/class in the OBJ so that they don't cause trouble. But I also received errors
about program labels that had been automatically generated by the compiler, even though they were not marked
as exports. I was not expecting this. As a workaround, the compiler now uses the name of the source file in
its automatically generated labels so that they won't be the same in different OBJs. (No more 0NOWUT0000)

  The DEF statement was added, for manually setting a default type. This only required a change to a data
table, and no changes to the code ;)

  Introduced the LINKBIN utility, which in addition to supporting two input files, also correctly generates
X68000 executables that have relocations that are separated by more than 64K.

  Fixed some bugs in NO68 that caused subtraction and division to (sometimes) produce a wrong result.

  Added some code to NOSH2 to optimize shifts (when the value is known at compile time) instead of always
using a loop.

  Corrected a few typos and made a few additions to this document.

  Included a DOS example program.


Release 0.11b (2019/1/19):

  An endianness issue with indexed symbols in initialized data was fixed in NO68 and NOSH2.

  LINKBIN can now build an Amiga program from two OBJ files (it was all screwed up before).


Release 0.12 (2019/2/24):

  x87 FPU instructions were added to NOWUT x86.

  NOWUT x86 parser can accept numbers with a decimal point and convert them to 32-bit floats.

  Fixed the NOWUT x86 assembler bug pertaining to AX/EAX ambiguity.

  The offending filename is displayed when a file specified by INCBIN fails to open.
                          
  Fixed shifts when value to be shifted was on the stack (NOSH2).

  Fixed large negative immediates (NOSH2).

  CALLEX function address can be a calculation (NOWUT x86 only).


Release 0.13 (2019/3/23):

  Bug fix in NO68 for exclusive-or, and for shifts of values on the stack

  Changed the sh2divides routine, since the old one didn't work and could also corrupt a needed register
        
  Added experimental 8086 support, including 16-bit style MODRM addressing and BITS16/BITS32 commands

  Added genesis and doscom platform options to LINKBIN, as well as Genesis and 8086 example programs

  Added a "maxarg" constant to NOWUT x86 to increase the number of allowed arguments on one line

  Added single-operand versions of IMUL to the x86 assembler

  Did some reorganization of x86 NOWUT source and tweaking of the generated code (now slightly more
compact). In the future I will probably roll 68000 support back into NOWUT x86 to take advantage of some
potential optimizations.


Release 0.14 (2019/5/1):

  Combined NOWUT, NO68, and NOSH2 into the new MULTINO.

  8086 support is now fully functional (minus any bugs/limitations)

  68K and SH2 generated code has received modest improvements. In particular, the SH2 compiler does
deduplication of immediate data/addresses.

  Fixed the RETURNEX 0 and ADD 0 problems for 68K.

  Compilation is slightly faster.

  Removed some stuff from the archive.


Release 0.14b (2019/5/11):

  Fixed byte order of ASCII words/dwords on big-endian.

  Fixed shifts with word/dword memory operands on big-endian.

  Upon a duplicate label error, the label name will be displayed.

  Fixed pushing a byte from memory on 8086.

  Fixed division on 8086.

  Added islist and &, reducing stack operations in generated code when handling indexed symbols.

  Fixed mojibake in the auto-generated labels for string constants. These were not mentioned in the
documentation until now, but the way they work is "random string".a can be provided as an operand and
the address of the string will be passed, while the text itself will get dumped at the end of the section.
Currently there is no way to put a carriage return in the string. (but they are terminated with 0)

  Signed shift-right (with immediate operand) can be replaced with an unrolled loop on SH2.


Release 0.20 (2019/8/12):

  Made MULTINO and LINKBIN buildable on Win32, DOS, X68000, and Amiga, using platform modules.
  
  Reorganized and tweaked a lot of internal compiler code.

  Compiler does two passes by default, allowing some long jumps to be replaced with short ones.

  Tweaked some other generated code (8086 shifts, address calculation, and compares, 68K 3-bit quick form).

  Compiler does not generate a .LST file by default, but can show the offending line of code when aborting
due to an error.

  Replaced x87-specific FP conversion routine with a portable integer-based one (but it is badly written and
has range/precision limits).

  Added LOADBIG and LOADLITTLE statements to deal with endianness problems in a portable way (although the
implementation seems less than ideal).

  Fixed typo in SH2 code for pushing a word on the stack.

  Added CALLAM statement for making Amiga OS system calls in a slightly less messy way.

  Boosted file I/O buffers to 4KB (doesn't help much except maybe on a network file system).

  If compilation aborts with an error, the .OBJ header is left incomplete and LINKBIN won't attempt to link
it.

  Placing one (or multiple) letter "r" after a string inserts a CR/LF. String can also be continued.

  x86 assembler should now generate the correct modrm byte when ASIZE prefix is used.

  Fixed parsing of MOV rm,reg instructions with displacement.

  Fixed MOV instructions that use x86 control/debug registers.

  Added IFCPUxxx, IFCPUNOTxxx statements.

  Added x86 far calls (CALLF).

  Tested LINKBIN with three input files.

  Changed the amount of space reserved by LINKBIN in DOS executables for a segment lookup table to 64 bytes.

  
Release 0.21 (2019/11/23):

  Reworked parser/assembler to handle some indexed symbols the same as plain symbols, resulting in smaller
generated code. As a side effect, this led to some changes in how 8086 segment/offset addresses are
handled.

  Added some optimization logic that causes some redundant load instructions to be omitted.

  Fixed 8086 signed right shifts, and improved the other ones.

  Fixed the problem where the last line of the source file could be ignored.

  Fixed an x86 assembler bug for MOV [disp],imm instructions.

  When compilation fails due to an out-of-range jump, the target label will be displayed.

  Added far jump instructions to the x86 assembler and fixed the documentation pertaining to CALLF.

  Cleaned up some compiler code and made it so the assembler is invoked for each statement on a line with
multiple statements, instead of buffering the code until the entire line has been parsed/compiled.

  Added the optrom platform option to LINKBIN.

  Made the compiler store the number of segment relocs in the time stamp field of 8086 big OBJs so that
LINKBIN can allocate the correct amount of space in the MZ header.


Release 0.22 (2020/2/25):

  Fixed the problem where the last line of the source file could be ignored (for real this time).

  Added missing 68000 addressing modes and fixed which addressing modes were allowed for JMP/JSR.

  Also made it so 68000 branch instructions can accept a number.

  Fixed parser so that [-$1234] can work.

  Fixed extra byte being inserted in some data statements eg.: DW "blah",$1234

  Revised internal compiler code and added more optimization possibilities, plus -opt switch.

  Updated several sections in this document.


Release 0.23 (2020/4/21):

  Added a fix to allow compiling source files outside the current directory.

  Made an improvement to the optimizer and fixed a potential problem where, for instance, 'x' and '[x].d'
could have been interchanged.

  Rearranged the selecttables routine inside the compiler for smaller code.

  Only on 386, it is now possible to get the address of a stack variable. PIOWIN was changed to make most
routines (except printhex and such) safe for multithreading.

  The 68K assembler outputs LSL instead of ASL, which shouldn't matter (but I was messing around with an
emulator which wouldn't show ASL in its debugger...)


Release 0.24 (2020/10/21):

  Added minimal Linux support via the elf386 LINKBIN option and PIOLNX module.

  Faster compilation.

  More tweaks to generate smaller compiled code.

  Added the COPYBYTES statement for all CPUs. (correction: except 8086 tiny mode)

  (386) Fixed a bug where loading the address of a stack variable was optimized incorrectly.

  (SH2) Fixed a problem where loading unsigned bytes generated unintentionally inefficient code.

  (SH2) assembler now accepts PC-relative form of MOV with hard-coded disp ie. mov [PC+$20],r1

  (SH2) The address in R11 is only setup when needed by LOCALVAR, instead of every BEGINFUNC.

  Swapped BSS and data sections in the source code...


Release 0.25 (2021/3/13):

  Added a check and error message for code that attempts to index a stack symbol.

  Made further changes to LINKBIN ELF386 code toward eventually supporting dynamic linking.

  Fixed a bug where indexed, signed memory references were treated as unsigned.

  Fixed incorrect code optimizations after assignment to a byte or word variable.

  Added IFNOTGREATER and IFNOTLESS.

  Fixed an x86 assembler bug. mov [ebx+somelabel],eax now works and is used by the compiler.

  Fixed a parsing bug that affected x86 assembly prefixes before some instructions.


Release 0.26 (2021/7/2):

  Added LINKLIBFILE. OBJ files now contain a .drectve section header, for a total of 4 section headers.
Works with new version of LINKBIN and with GoLink.
 
  LINKBIN can output PRG files. ELF files with dynamic linking appear to work...

  Changed SH2 RETURN statement so it generates rts>mov 0,r2 instead of rts>nop. Mixing up RETURN with
RETURNEX 0 won't cause a crash now.

  Subtractions on SH2 can generate an add instruction with negative operand for efficiency.
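
  As a rough illustration of the idea (not the compiler's actual code): as far as I know, SH2 has an
ADD #imm,Rn form with a sign-extended 8-bit immediate but no immediate form of SUB, so x - c can become
x + (-c) whenever -c fits in 8 signed bits. The function names below are made up for this sketch, and the
fallback to "sub" only stands in for whatever the compiler does otherwise.

```python
MASK32 = 0xFFFFFFFF

def fits_simm8(value):
    """True if value fits a sign-extended 8-bit immediate."""
    return -128 <= value <= 127

def sub_const(x, c):
    """Model x - c on a 32-bit register; return (mnemonic, result).

    Hypothetical helper: picks 'add' with a negated immediate when the
    negated constant fits in 8 signed bits, otherwise 'sub'.
    """
    if fits_simm8(-c):
        return ("add", (x + (-c)) & MASK32)
    return ("sub", (x - c) & MASK32)

print(sub_const(10, 3))      # add with immediate -3
print(sub_const(1000, 500))  # -500 does not fit, falls back to sub
```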

  The optimizer no longer assumes dword and signed dword are different things. (acctype)

  Added ALIGN16 (section alignment is also configurable inside LINKBIN).

  PIOWIN fileskread/fileskwrite return the bytes read/written.


Release 0.27 (2021/11/10):

  Added partial floating-point support in NOWUT evaluator via the .fd type.

  Improved the parser's FP conversion code. Now allows more than 4 decimal places.

  Stack symbols can be defined more than once with different default types.

  Added 'ea' special symbol for doing address calculations.

  Rearranged internal code tables to make things more consistent between different target platforms.


Release 0.28 (2022/1/18):

  MIPS and N64 targets added.

  Fixed a possible 8086-big-mode linking bug.

  Compacted 8086 code slightly by replacing some xor dx,dx instructions with cwd.
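
  For reference, a small model of why this substitution saves space and when it is safe. CWD is a
one-byte instruction that sign-extends AX into DX, while xor dx,dx takes two bytes; the two leave the
same value in DX only when bit 15 of AX is clear. That the compiler applies the change only in such
places is my assumption, and the function names are invented for the sketch.

```python
def cwd(ax):
    """Model of CWD: return the DX value produced by sign-extending AX."""
    return 0xFFFF if ax & 0x8000 else 0x0000

def xor_dx_dx():
    """Model of xor dx,dx: DX is always cleared."""
    return 0x0000

def cwd_is_safe_replacement(ax):
    """True when CWD leaves the same DX as xor dx,dx for this AX."""
    return cwd(ax) == xor_dx_dx()

for ax in (0x0000, 0x1234, 0x7FFF, 0x8000, 0xFFFF):
    print(hex(ax), cwd_is_safe_replacement(ax))
```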

  The CONST statement now records tag bits, allowing floating-point numbers to be used.

  Fixed a bug where storing an FP value to an indirect memory reference could fail.

  Fixed COPYBYTES on 8086 tiny.


Release 0.29 (2022/5/18):

  Added SAR operator to replace the SHR 3.sb oddity.
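
  To show the difference that a dedicated SAR operator expresses, here is a sketch of logical versus
arithmetic right shift on 32-bit values. It is written in Python rather than NOWUT, and shr32/sar32 are
names invented for the example, not compiler routines.

```python
MASK32 = 0xFFFFFFFF

def shr32(x, n):
    """Logical shift right: zeros are shifted in from the top."""
    return (x & MASK32) >> n

def sar32(x, n):
    """Arithmetic shift right: the sign bit is replicated from the top."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32          # reinterpret as a negative number
    return (x >> n) & MASK32  # Python's >> on negatives is arithmetic

value = 0xFFFFFFF0            # -16 as a signed dword
print(hex(shr32(value, 3)))
print(hex(sar32(value, 3)))
```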

  Added SECTION keyword and capability for variable number of COFF sections.

  Fixed ea.b($401000) 

  Added 'w' mark for expanding ASCII strings to 16 bits per character.

  COFF section alignment flags now get set according to largest ALIGN statement encountered, with the
minimum alignment being DWORD.
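
  A sketch of that rule, assuming the flags follow the Microsoft PE/COFF encoding, where the alignment
field sits in bits 20-23 of the section flags (1 byte = $00100000, 4 bytes = $00300000, 16 bytes =
$00500000, up to 8192 bytes). The function names and the list-of-ALIGN-values input are invented for
illustration and do not reflect the compiler's internals.

```python
def coff_align_flag(alignment):
    """Return the PE/COFF section alignment flag for a power-of-two alignment."""
    if alignment < 1 or alignment > 8192 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two from 1 to 8192")
    # 1 -> 0x00100000, 2 -> 0x00200000, 4 -> 0x00300000, ...
    return alignment.bit_length() << 20

def section_align_flag(align_statements):
    """Use the largest ALIGN value seen, with DWORD (4 bytes) as the minimum."""
    largest = max(align_statements, default=4)
    return coff_align_flag(max(largest, 4))

print(hex(section_align_flag([])))          # no ALIGN statements: DWORD
print(hex(section_align_flag([4, 16, 8])))  # largest is 16
```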

  Tweaked code generation for SH2 and 8086.

  
Release 0.30 (2022/11/14):

  Modular-NOWUT replaced Multi-NOWUT, with new file names and compilation procedure.

  Added SH4 (little-endian) as a target platform.

  Added FILLPATTERN and USEFASTBASE statements.

  New, generic IFCPU and IFCPUNOT statements replaced the CPU-specific ones.

  Added a check and error message for when the compiler hits the maxsymbols limit.

  Added a check and warning to help prevent DB/DW/DD mixups.

  Fixed some problems with using GOTO/GOSUB with a hard-coded address.

  Tweaked some x86, 68K, and SuperH code output for size/performance.

  Fixed an 8086 assembler bug pertaining to indexed symbols with .h

  Added the SWAP statement to the documentation. (It had already been in the compiler for a while.)

  8086 GOSUB and GOTO can take a calculated address. (Needed for the compiler to work.)

  Fixed startup code in PIOX68 which didn't always work before.


Release 0.31 (2023/5/11):

  Added ARM CPU support.

  Reserved storage in non-BSS sections now utilizes the FILLPATTERN.

  Each reserved storage statement is no longer limited to 65535 bytes when compiling on 8086/68000.

  The size of reserved storage can now be calculated at compile time with +, -, *, _ operators.

  List files are now cleared before the second compilation pass, so old data will not remain at the end.


Release 0.32 (2023/8/26):

  Tweaked the compiler's parser code to better handle Shift-JIS in source code, and added the shiftjis
test program.

  Added filegetsize routine to PIOxxx modules.

  Fixed a bug in 8086 code generation and improved some conditional branches.

  Fixed reloc type mismatch between compiler and linker. Dreamcast example can be built again.


Release 0.33 (2024/5/20):

  Fixed a bug in 8086 COUNTDOWN/NEXTCOUNT loops with count >65535

  Fixed some ARM bugs.

  Added MIPSLE target to NOWUT, more MIPS instructions, and PIC32 platform in LINKBIN.

  Added Z280 module and MSXHELLO example program.

  Added SIB addressing and MMX instructions to CPUX86 module.

  Some redundant loads which previously occurred after a CALLEX, LOADBIG/LOADLITTLE, or a WHILExx
statement are no longer generated.

  Added -oc, -od, -ob options to LINKBIN, and fixed some bugs when running on a big-endian CPU.


Release 0.34 (2024/8/25):

  Fixed LOADBIG/LOADLITTLE on SH4, MIPSLE, and ARMLE.

  Fixed COPYBYTES on ARM.

  Fixed a bug in data dump placement, and added FLUSHIMM. Doing INCBIN without FLUSHIMM first produces
a warning message.

  Fixed data section COFF relocations on certain platforms.

  Moved a string from the CPUSH source file to the main NOWUT source file (since it is also referenced
by CPUARM).

  x86 assembler now has separate JCXZ and JECXZ instructions.


Release 0.35 (2025/11/26):

  Added -ac, -ad, -ab command line options to LINKBIN.

  8086 compilation now uses unrolled loops for shifts of 1 or 2 bit positions.

  Fixed X68000 JPG example so it doesn't cause a bus error.

  Added appgetpath and memory routines in the PIOxxx modules.

  USEFASTBASE can be disabled (and reenabled) within a source file.

  Fixed x86 FPU instructions: FSTP mem80 and FSTCW / FNSTCW

  Made corrections to Z280 instruction list.


