NOWUT v0.13 - a programming language and compiler

At this stage of development, both the language and compiler are incomplete. Errors may not be caught,
bugs may bite (see below for list of known bugs), and code will be suboptimal.

However, NOWUT can successfully compile itself, as well as several demo programs.

The compilers (NOWUT, NO68, NOSH2) are licensed under the GPL (see COPYING). (Example programs included
in the archive should be considered public domain unless stated otherwise.)

http://www.hyakushiki.net/anachro.htm
damage_x@hyakushiki.net


----Contents----

1) About the NOWUT compiler

2) About the NOWUT language
3) Example function
4) Symbols, labels, variables, constants
5) Operands
6) Calculation/assignment
7) NOWUT language statements/instructions
8) Functions/procedures

9) Internal assembler
  a. x86 instruction set
  b. 68K instruction set
  c. SH2 instruction set

10) Known bugs and limitations
11) Changes from the last version


----About the NOWUT compiler----

  The NOWUT compiler is a Win32 program written in NOWUT. During compilation it makes a single pass of
the source code and produces a COFF object file as output.

  Each OBJ file contains exactly one code section, one data section, and one BSS section. Note that a data
section with a size of 0 bytes is known to result in executables that won't run (some dummy data might be
needed).
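
  A minimal sketch of such dummy data, assuming nothing else needs to live in the data section (the label
name dummy is arbitrary):

        sectiondata

dummy.d:
        dd 0                          ; placeholder so the data section is not 0 bytes long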

Command line:

        NOWUT file

  The input file is assumed to have the extension .NO, so FILE.NO will be read and FILE.OBJ will be
written.

These CPUs are currently supported:

NOWUT
  x86 32-bit - the internal assembler handles most instructions up to the Pentium, but does not handle
                SIB addressing modes, and it uses a nonstandard syntax
  8086 (experimental) - uses the same assembler as in 32-bit mode, but generates 16-bit real mode code.
                The compiler (mostly) simulates a 32-bit CPU so that dword variables can be used, however
                segment registers aren't handled and addressing is currently limited to 64KB.

NO68
  M68000 - the internal assembler handles all 68000 instructions but not all addressing modes. It also
                uses a nonstandard syntax

NOSH2
  SuperH - the internal assembler handles all SH2 instructions except one, and uses a nonstandard syntax

See the assembly section for details.

  The language currently has no floating-point support. However, the x86 compiler supports x87 FPU
instructions in the internal assembler, and the parser can convert constants into 32-bit FP values.

  NO68 and NOSH2 can't compile themselves because of platform-specific I/O as well as endianness
and alignment issues. Being able to compile on the Amiga or X68 would take a bit of work and hasn't been
a priority so far.


  OBJ files produced by the x86 version of the compiler can be linked using GoTools GoLink to make a
Win32 executable. Using the /base switch (eg. /base 00400000) causes GoLink to generate relocation data.
These executables can then also be used with Win32s (Windows 3.x) and may be used in a DOS environment with
the WDOSX stub (a DPMI host which includes some emulation of Win32 functions).

  OBJ files produced in 8086 mode can be linked into a DOS .COM program using LINKBIN.

  GoLink requires a list of applicable DLLs as command line arguments. I use this command to link my
Win32 stuff:

  golink %1.obj kernel32.dll user32.dll gdi32.dll winmm.dll /console

  As there is currently no standard library for NOWUT, programs could be written without any use of Win32
calls and target some other platform instead, although a suitable linker would be needed. (The built-in
CALLEX, BEGINFUNC, ENDFUNC, and RETURNEX statements may present a problem as they are designed around
Win32 calling conventions.)

  Code that aims to be cross-platform should also take endianness and memory alignment into account.
On the 68000, words and dwords must be aligned to 2-byte boundaries. On the SH2, words must be aligned to
2-byte boundaries and dwords to 4-byte boundaries. NOWUT includes ALIGNW and ALIGND statements for this
purpose.
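
  For example, a byte followed by a dword could be laid out as in this sketch (the label names are made
up for illustration):

        sectiondata

flag.b:
        db 1                          ; one byte, so the next address may be odd
        alignd                        ; pad to an address divisible by 4
counter.d:
        dd 0                          ; now safe to access as a dword on the 68000 and SH2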

  The 68K and SH2 versions of the compiler also produce a COFF output file. The OBJ2BIN program was included
to turn these into executable formats, however it has now been replaced by LINKBIN. LINKBIN works the same
way but allows multiple OBJ files to be specified on the command line, joining them together into one binary.
This is handy for cross-platform code which could, for instance, include the main program in one module and
platform-specific functions in a separate one. (OBJ2BIN is still included just in case.)

  LINKBIN has only been tested with two input files at a time. Modify the MAXFILES parameter in the source
to enable more than two. (In theory, it should work...)

These formats are supported:

  Sega 32X and Genesis/MD ROM images - SH2 or 68K side        

  Amiga 68K executables

  8086 PC .COM programs

  Sega Saturn binary (load and execute at $06004000)

  Sharp X68000 executable (Human68K)

  The Sega 32X hardware is inactive when the system powers on. The 68K has control, and must enable the
32X. When generating a 32X ROM image with SH2 code, a stub file (68KPART8.32X) occupies the first 4KB of
the final image. The stub performs initialization and hands control over to the SH2. It also polls the
controller ports and passes the data to the SH2 through shared registers. The source code for the stub is
in 68KPART8.NO.


----About the NOWUT language----

  The goal was to combine certain aspects of assembly and high-level languages in a different way than what
has been done before.

NOWUT borrows these ideas from assembly:

  simplistic syntax consisting of instructions followed by operands 

  manual layout of initialized data, uninitialized data, and data structures

  no enforcement of data types

It borrows these ideas from HLLs:

  avoids (mostly) being CPU specific

  no micro-management of CPU register usage

  calculations can be specified in a form similar to mathematical notation (assignment)

NOWUT also handles inline assembly code with a nonstandard syntax.

  I should mention that the name NOWUT is an acronym which expands to "No One Will Use This." I figure that
if anyone else shared my taste in programming languages, I wouldn't have been forced to create my own! I
suspect that NOWUT may not become extremely popular.


----Example function in NOWUT----

        ; this line is a comment. comments are preceded by ' or ; characters

examplefunc:                                        ; examplefunc is a label (address)

        beginfunc param1.d,param2.d                 ; this function receives two parameters
                                                    ; they will be referred to as param1 and param2
                                                    ; and are dwords (32-bit words)

        localvar xx.d                               ; a local dword variable will be added to the stack

        xx=param1+1                                 ; here is an example of assignment

        countdown xx                                ; this begins a simple type of loop

        param2=_ shl 1                              ; an underscore refers back to the left side of
                                                    ; the equals sign.
                                                    ; param2 becomes (itself) shifted left one bit

        nextcount                                   ; this is the end of the "countdown" loop

        endfunc param2                              ; the function's return value will be equal to
                                                    ; the value of param2

        returnex 8                                  ; this causes program flow to return to the caller.
                                                    ; it also removes 8 bytes from the stack (important)
                                                    ; which had been occupied by param1 and param2
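
  A call to this function might look like the sketch below. It assumes result is a dword variable defined
elsewhere, and relies on the rule (see CALLEX) that parameters appear in the reverse order compared to
BEGINFUNC:

        callex result.d,examplefunc,1,3             ; param2 receives 1, param1 receives 3

                                                    ; xx becomes 4, so param2 is shifted left 4 times
                                                    ; and result should end up as 16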


----Symbols, labels, variables, constants----

  Labels, variables, and constants are considered symbols. Symbol names are currently limited to 64
characters. (Currently, long symbol names may cause internal buffers to overflow and generate an
error message.) Symbol names are not case-sensitive, though a particular case can be written to the
output file for the benefit of case-sensitive linkers (the EXTERN statement is used for this purpose).
Symbol names must contain at least one letter, and may contain other characters EXCEPT these:

 ' ; : ! $ ( ) [ ] , . ? + - * / = " >

Symbol names that are the same as CPU register names should not be used.

  Normally, labels are defined with a colon. An exclamation point is used instead of a colon to define
an exported symbol (can be referred to in other modules). For instance, when using the GoTools GoLink
linker, the program's entry point should be defined like so:

start!

  (The program entry point for other platforms is the beginning of the code section.)

  Every label has an address associated with it, although the actual address is not determined until
linking or upon the executable code being loaded by the operating system. The label "examplefunc" can
be used as the target of a jump/branch/call (in assembly), a goto/gosub/callex (in NOWUT), it can
have its address used in a calculation (eg. xx=examplefunc+40), or it can be used to refer to memory
contents (eg. xx=examplefunc.d).

  Labels and global variables are actually the same thing, except that labels used to refer to either
initialized data or uninitialized data will generally be defined with an appropriate default type.
However, this is not required, and when a symbol is referenced the default type can be overridden.

exampleaddr.a:                ; exampleaddr will be handled as an address
exampleaddr:                  ;      -       -     -        -     address (same as .a)
exampleaddr.b:                ;      -       -     -        -     byte value
exampleaddr.sb:               ;      -       -     -        -     signed byte value
exampleaddr.w:                ;      -       -     -        -     word value (16-bit)
exampleaddr.sw:               ;      -       -     -        -     signed word value
exampleaddr.d:                ;      -       -     -        -     dword value (32-bit)
exampleaddr.sd:               ;      -       -     -        -     signed dword value

  The default type is used when a label is referenced without any type tag (ie. no dot anything).

xx=exampleaddr                ; operation depends on the default type
xx=exampleaddr.a              ; always loads the address
xx=exampleaddr.d              ; always loads a dword

  The default type is determined when a label is FIRST referenced (not necessarily when it is defined!).
Because the compiler only does one pass of the source code, it is therefore recommended to place any
initialized or uninitialized data BEFORE the program code. 

Recommended:

        sectiondata

dwordval.d:
        dd $1234ABCD

        sectionbss

buffer.d:
        resd 16

        sectioncode

        buffer(4)=dwordval


Problematic:

        sectioncode

        buffer.d(4)=dwordval      ; dwordval will be interpreted as an address
                                  ; because its definition has not yet been read by the compiler

        sectiondata

dwordval.d:
        dd $1234ABCD

        sectionbss

buffer.d:
        resd 16

  Alternatively, the DEF statement can be used at the beginning of a source file (regardless of section) to
manually initialize a symbol's default type. This is especially useful when referencing symbols in another
module.
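
  A sketch of the problematic layout above, repaired with DEF (assuming DEF accepts both symbols on one
line, as in the example given with the DEF statement):

        def dwordval.d,buffer.d       ; both symbols default to dword from here on

        sectioncode

        buffer(4)=dwordval            ; dwordval is now loaded as a dword, not as an address

        sectiondata

dwordval.d:
        dd $1234ABCD

        sectionbss

buffer.d:
        resd 16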

  Variables that are defined by a beginfunc or localvar statement are located on the stack and have some
differences. The first is that the names may be used independently in multiple functions, and they have
no relevance outside of the function in which they are defined. The second difference is that the address
of a stack variable is not valid.

        localvar xx.d,yy.d

        xx=yy.a                ; this does NOT work!
                    
This functionality might be added to a later version of the compiler. But currently, the compiler passes
variables to the internal assembler as-is, and the assembler simply alters the addressing mode to refer
to the stack. It does not insert additional instructions (eg. an LEA) to determine the address.
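
One possible workaround, sketched here with a made-up symbol name, is to keep the data in a global variable
whenever its address is needed:

        sectionbss

scratch.d:
        resd 1                 ; global storage instead of a localvar

        sectioncode

        xx=scratch.a           ; OK: scratch is a global symbol, so its address is valid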

Because the SH2 CPU only has register-indirect addressing with a 4-bit positive displacement, the SH2
version of the NOWUT compiler sets aside two registers for local variables. It limits the number of
local variables to 32, and they must be dwords (signed or unsigned). Function parameters are limited to
12 and also must be dwords.


Signed and unsigned types are handled the same during many operations, but there are some where it is
important to differentiate:

loading smaller data sizes:

        xx=array.b(5)                ; the byte will be zero-extended
        xx=array.sb(5)               ; the byte will be sign-extended
        xx=array.w(10)               ; the word will be zero-extended
        xx=array.sw(10)              ; the word will be sign-extended

comparisons:

        ifgreater xx.d,0,branchtarget        ; will branch unless xx is 0
        ifgreater xx.sd,0,branchtarget       ; will branch unless xx is 0 or negative

shifting right:

        xx=yy.d shr 1                ; top bit will become 0
        xx=yy.sd shr 1               ; top bit will become 0 (doh!)
        xx=yy shr 1.sb               ; top bit will remain the same

Due to a certain situation in the compiler, a signed shift-right will not be used unless the shift count
is specified as a signed value. This is a bug.
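
As an illustration of the first two cases, assume the byte $FF is stored at the label values and xx is a
dword variable (both names are invented here, and branchtarget stands for any label):

        sectiondata

values.b:
        db $FF

        sectionbss

xx.d:
        resd 1

        sectioncode

        xx=values.b(0)                       ; zero-extended: xx is $000000FF (255)
        xx=values.sb(0)                      ; sign-extended: xx is $FFFFFFFF (-1)

        ifgreater xx.d,0,branchtarget        ; branches: unsigned $FFFFFFFF is greater than 0
        ifgreater xx.sd,0,branchtarget       ; does not branch: signed -1 is not greater than 0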


Constants are defined with the CONST statement:

        const secretvalue,3579545

References to the symbol (eg. secretvalue) will be replaced with the value. Only numeric values are
allowed (although this includes ASCII values).
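
For example, a constant can stand in for a literal number wherever an immediate value is accepted. In this
sketch the names rowbytes, offset, xx, and yy are invented for illustration:

        const rowbytes,320

        offset=yy*rowbytes+xx        ; same as offset=yy*320+xx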


----Operands----

  Basically, everything that is accepted as part of an assignment/calculation or as an argument to an
instruction is considered an operand. These include numeric values, strings, symbols, and combinations
of such.

  format in NOWUT        format in assembly                description

1234                        1234                        decimal number                 \
$1234                       $1234                       hex number                      \
0x1234                      0x1234                      hex number                       \  immediate
"a".b                       "a".b                       ASCII byte                       /
"ab".w                      "ab".w                      ASCII word                      /
"abcd".d                    "abcd".d                    ASCII dword                    /
constname                   constname                   a symbol defined with CONST   /
99000.h                     99000.h                     high word of a value (used by 8086 compiler)
varname                                                 address or memory reference
                            varname                     address
varname.a                   varname.a                   address
varname.b                                               memory reference (byte variable)
varname.sb                                              memory reference (signed byte variable)
varname.w                                               memory reference (word variable)
varname.sw                                              memory reference (signed word variable)
varname.d                                               memory reference (dword variable)
varname.sd                                              memory reference (signed dword variable)
                            [varname].b                 memory reference (byte variable)
                            [varname].w                 memory reference (word variable)
                            [varname].d                 memory reference (dword variable)
                            reg                         a CPU register
                            reg.b                       a byte CPU register (68K only)
                            reg.w                       a word CPU register (68K only)
                            reg.d                       a dword CPU register (68K only)
                            [reg]                       memory reference (address contained in reg)
                            [reg].b                     memory reference (byte at address in reg)
                            [reg].w                     memory reference (word at address in reg)
                            [reg].d                     memory reference (dword at address in reg)
                            [reg+]                      memory reference with post-increment (68K)
                            [reg+].b                    memory reference with post-increment (68K and SH2)
                            [reg+].w                    memory reference with post-increment (68K and SH2)
                            [reg+].d                    memory reference with post-increment (68K and SH2)
                            [reg-]                      memory reference with pre-decrement (68K)
                            [reg-].b                    memory reference with pre-decrement (68K and SH2)
                            [reg-].w                    memory reference with pre-decrement (68K and SH2)
                            [reg-].d                    memory reference with pre-decrement (68K and SH2)
                            [reg+xx].x                  other indirect addressing modes (CPU dependent)
[varname].b                                             indirect memory reference (byte variable)
[varname].sb                                            indirect memory reference (signed byte variable)
[varname].w                                             indirect memory reference (word variable)
[varname].sw                                            indirect memory reference (signed word variable)
[varname].d                                             indirect memory reference (dword variable)
[varname].sd                                            indirect memory reference (signed dword variable)
varname(xx)                                             indexed address or memory reference
varname.a(xx)                                           indexed address
varname.b(xx)                                           indexed memory reference (byte)
varname.sb(xx)                                          indexed memory reference (signed byte)
varname.w(xx)                                           indexed memory reference (word)
varname.sw(xx)                                          indexed memory reference (signed word)
varname.d(xx)                                           indexed memory reference (dword)
varname.sd(xx)                                          indexed memory reference (signed dword)
"abcxyz123"                                             string
_                                                       "itself" (left side of an assignment)


Note that a blank operand in a series of operands separated by commas will be interpreted as 0.

        callex result.d,somefunction,param1,,,param4        ; zeros are passed as 2nd and 3rd parameters


  There are some differences between operand formats in NOWUT vs. the internal assembler. Some CPU
instructions make indirect memory references while using the same encoding as instructions which access
memory without the indirection (and use the same syntax in their standard assembly languages), hence
these ambiguities persist here. In assembly mode, memory references always use square brackets around
a register name, address, or symbol. In NOWUT language mode, square brackets are only used for indirect
memory references. Also in NOWUT language mode, calculations are currently not allowed inside of square
brackets.

[address+48].d=65                                ; this does NOT work

tempvar=address+48 > [tempvar].d=65              ; use this instead


  Strings are only used for initialized data, the EXTERN statement, and the INCBIN statement. There is
currently no high-level support for strings in NOWUT code.
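
  A sketch of working with a zero-terminated string by hand, modeled on the example given with the
WHILEEQUAL statement (message, pointer, and length are assumed names; pointer and length are dword
variables defined elsewhere):

        sectiondata

message.b:
        db "Hello",0                         ; string bytes followed by a terminating zero

        sectioncode

        pointer=message.a                    ; start at the address of the string
        WHILEUNEQUAL [pointer].b,0           ; loop until the zero byte is found
        pointer=_+1
        WEND
        length=pointer-message.a             ; number of characters before the zero (5 here)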


register names:

x86 -   eax ecx edx ebx esp ebp esi edi
        ax  cx  dx  bx  sp  bp  si  di
        al  cl  dl  bl  ah  ch  dh  bh
        es  cs  ss  ds  fs  gs
        cr0     cr2 cr3 cr4
        dr0 dr1 dr2 dr3         dr6 dr7
        st0 st1 st2 st3 st4 st5 st6 st7

Since x86 registers have an inherent size, no size tag is needed.


68K -   d0  d1  d2  d3  d4  d5  d6  d7
        a0  a1  a2  a3  a4  a5  a6  a7
        ccr sr  pc

Data registers may be byte, word, or dword size. Address registers may be word or dword.


SH2 -   r0  r1  r2  r3  r4  r5  r6  r7
        r8  r9  r10 r11 r12 r13 r14 r15
        macl mach sr   gbr  vbr  pr  pc

SH2 registers are all 32 bits, no size tag is needed.


----Calculation/assignment----

  A calculation is a series of operands and operators that cause a value to be computed at runtime. This
includes steps taken to calculate the address of a memory reference. Assignment causes a value to be
stored in a memory location. Assignment uses the equals sign.

somevar=foo+1*100                

The value of foo is loaded, 1 is added, the sum is multiplied by 100, and the result is stored in somevar.

Needless to say, somevar should be a memory reference. If its default type is .a (an address) then
compilation will fail with an error message.

  Operations are performed in order from left to right, except where parentheses are used to specify
that a different order should be used. When the compiler encounters parentheses it pushes the current
value on the stack, performs the calculation inside the parentheses, then pops the stack to continue.
This also applies to indexed symbols. Because the compiler doesn't attempt to optimize such things,
manually re-ordering operations to avoid parentheses results in sleeker code.

somevar=foo+(bar*100)                ; OK, but suboptimal
somevar=bar*100+foo                  ; better

Currently, parentheses may be nested up to 20 levels deep.
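
A worked example of the evaluation order, assuming foo and somevar are dword variables and foo holds 2:

        somevar=foo+1*100                    ; (2+1)*100 = 300
        somevar=foo+(1*100)                  ; 2+(1*100) = 102
        somevar=1*100+foo                    ; also 102, without the push/pop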

supported operators:

  +    addition
  -    subtraction
  *    multiplication
  /    division
  and  logical AND
  or   logical OR
  shl  logical shift-left
  shr  logical or arithmetic shift-right
  xor  logical exclusive OR

The parser handles operands and operators as a unit, and as a side effect of this, extra spaces should
not be inserted between them.

somevar=foo +10                  ; bad
somevar=foo+ 10                  ; acceptable
somevar=foo shl 2                ; good
somevar=foo  shl 2               ; bad

The underscore operand refers to the target of an assignment. It is particularly handy when the target
involves a complex address calculation that would otherwise need to be repeated.

array(index shl 4+12)=array(index shl 4+12)+20                ; OK, but ugly and slow
array(index shl 4+12)=_+20                                    ; better

Indices on symbols are always calculated in terms of bytes. If you want to access a numbered word in an
array of words, be sure to use a shift in your index.

array.d(0)=$00000001                ; the first dword
array.d(4)=$00000002                ; the second dword
array.d(N shl 2)=n                  ; dword number N, counting from 0

Hence, mixing of bytes, words, and dwords in a data structure is easily accomplished. But unaligned
memory accesses are also possible, and may not be desirable. The same memory location can also be
accessed with differing data sizes at different times, however this may cause unexpected results if
endianness is not taken into account.
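
As a sketch, assume objects is a label in the BSS section, index and xx are dword variables, and each entry
is 8 bytes long (the field layout and all names here are invented):

        const objsize,8

        objects.w(index*objsize)=100         ; word at offset 0 of the entry
        objects.w(index*objsize+2)=50        ; word at offset 2
        objects.b(index*objsize+4)=1         ; byte at offset 4

        objects.d(0)=$11223344
        xx=objects.b(0)                      ; $44 on x86 (little-endian), $11 on 68K and SH2 (big-endian)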

  Calculations are sometimes accepted as operands in place of memory references or numeric values. See
the next section for details.


----NOWUT language statements/instructions----

The following is a list of recognized instructions in NOWUT language mode:

  instruction     number of operands   type of operand       description

ALIGNW            none                                       adds a byte if necessary to ensure an
                                                               even address

ALIGND            none                                       adds bytes if necessary to ensure an
                                                               address divisible by 4

ALIGNQ            none                                       adds bytes if necessary to ensure an
                                                               address divisible by 8

ASM               none                                       switches to assembly language mode

BEGINFUNC         variable             symbol                pushes registers on the stack and
                                                               optionally defines any parameters that
                                                               were provided by the caller
  Note:
        parameters will be listed in the reverse order compared to CALLEX

BITS16            none                                       switches x86 compilation mode to 8086

BITS32            none                                       switches x86 compilation mode to 386 

CALLEX            1                    memory reference      pushes any parameters on the stack, calls
                  1                    immediate, address      a function, and optionally stores a
                  variable             calculation             return value
  examples:
        callex result.d,functionname                 ; function call that receives a return value
        callex ,functionname                         ; function call without return value
        callex ,functionname,param1.d,foo*320+bar    ; function call with two parameters
  Note:
        parameters are pushed on the stack from left to right and will appear in the reverse order
          compared to BEGINFUNC

CONST             1                    symbol                associates a value with a symbol
                  1                    immediate
  example:
        const bufsize,16384                          ; occurrences of bufsize are replaced with 16384

COUNTDOWN         1                    memory reference      marks the beginning of a loop
  example:
        countdown xx                                 ; 
        [...]                                        ; the code inside will execute xx times, and
        nextcount                                    ; afterward xx will be 0

DB                1 or more            immediate, string     initialized data

DW                1 or more            immediate, string     initialized data

DD                1 or more            immediate, string,    initialized data
                                         address, indexed
                                         address           

DEF               1 or more            symbol                provides the default type for a new symbol
                                                               (does not change an existing default)
  example:
        def var1.d,value2.b                          ; tells the compiler that var1 will default to
                                                     ; dword size, and value2 to byte size

ENDFUNC           0 or 1               immediate, memory     invalidates local variables and stack-
                                         reference             based parameters, pops the stack, and
                                                               optionally loads a return value

EXTERN            1 or more            string                specifies a case-sensitive alias for symbol
                                                               names that will be written to output file
                                                               for the benefit of case-sensitive linkers

GOSUB             1                    immediate, address    calls a subroutine without changing the
                                                               stack frame

GOTO              1                    immediate, address    jumps to an arbitrary location

IFEQUAL           1                    calculation           compares the first and second operands,
                  1                    immediate, memory       then jumps to the location specified in
                                         reference             the third operand if they are equal
                  1                    address                                                    
  example:
        ifequal foo,15,someroutine                   ; jumps if foo is equal to 15

IFGREATER         1                    calculation           compares the first and second operands,
                  1                    immediate, memory       then jumps if the first is greater
                                         reference
                  1                    address                                                    
  example:
        ifgreater bar+10,foo,someroutine             ; jumps if bar+10 is greater than foo

IFLESS            1                    calculation           compares the first and second operands,
                  1                    immediate, memory       then jumps if the first is less
                                         reference
                  1                    address                                                    

IFUNEQUAL         1                    calculation           compares the first and second operands,
                  1                    immediate, memory       then jumps to the location specified in
                                         reference             the third operand if they are unequal
                  1                    address                                                    

INCBIN            1 or more            string                loads initialized data from a file
  example:
        incbin "gamegfx.bin"                         ; inserts the contents of GAMEGFX.BIN  

LOCALVAR          1 or more            symbol with tag       defines stack-based variables to be used
                                                               within a function (must come after
                                                               BEGINFUNC)

NEXTCOUNT         none                                       decrements the variable specified by the
                                                               associated COUNTDOWN, then jumps to the
                                                               beginning of the loop if the result is
                                                               not 0         

RESB              1                    immediate             reserves the number of bytes specified

RESW              1                    immediate             reserves the number of words specified

RESD              1                    immediate             reserves the number of dwords specified

RESQ              1                    immediate             reserves the number of qwords specified
  Note:
        If RESx are used in the code or data sections, zeros will be written.

RETURN            none                                       returns from a routine called by GOSUB

RETURNEX          1                    immediate             returns from a routine called by CALLEX
                                                               and pops the specified number of bytes
                                                               from the stack
  Note:
        Even if there are no parameters on the stack, RETURN and RETURNEX shouldn't be mixed up as the
          generated code is incompatible on the SH2.

SECTIONBSS        none                                       marks the beginning of the BSS section
                                                               (reserved storage)

SECTIONCODE       none                                       marks the beginning of the code section

SECTIONDATA       none                                       marks the beginning of the data section
                                                               (initialized data)
  Note:
        Each section header should appear at most once in a NOWUT program.

WEND              none                                       marks the end of a WHILE loop (jumps back
                                                               to the beginning)

WHILEEQUAL        1                    calculation           compares the operands and jumps to the
                  1                    immediate, memory       corresponding WEND if they are unequal
                                         reference
  example:
        WHILEEQUAL [pointer].b,0                     ; if the memory location [pointer] contains a
        pointer=_+1                                  ; zero byte then the loop will execute, and it
        WEND                                         ; will continue until a non-zero byte is found

WHILEGREATER      1                    calculation           compares the operands and jumps to the
                  1                    immediate, memory       corresponding WEND if the first is not
                                         reference             greater

WHILELESS         1                    calculation           compares the operands and jumps to the
                  1                    immediate, memory       corresponding WEND if the first is not
                                         reference             less    
  example:
        xx=0
        WHILELESS xx,5                               ; the beep routine will be called 5 times
        GOSUB beep > xx=_+1
        WEND

WHILEUNEQUAL      1                    calculation           compares the operands and jumps to the
                  1                    immediate, memory       corresponding WEND if they are equal
                                         reference
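
  The four WHILE statements differ only in the test made at the top of the loop. As a rough model of that
control flow (written in Python rather than NOWUT; run_while and WHILE_TESTS are names made up for this
sketch, and signed versus unsigned comparison is not modeled):

```python
import operator

# Test that keeps the loop running for each statement (illustrative model only).
WHILE_TESTS = {
    "WHILEEQUAL": operator.eq,
    "WHILEGREATER": operator.gt,
    "WHILELESS": operator.lt,
    "WHILEUNEQUAL": operator.ne,
}


def run_while(statement, get_first, second, body):
    """Run body() while the statement's test holds; return the iteration count."""
    test = WHILE_TESTS[statement]
    count = 0
    while test(get_first(), second):   # test fails -> leave the loop
        body()
        count += 1                     # WEND jumps back to the test
    return count


# Mirrors the WHILELESS example: xx starts at 0 and the body adds 1.
state = {"xx": 0}


def step():
    state["xx"] += 1


beeps = run_while("WHILELESS", lambda: state["xx"], 5, step)
```

  After this runs, beeps is 5, which matches the WHILELESS example above.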


----Functions/procedures----

  No declarations are needed, return values are optional, and functions can even have multiple entry
and exit points. However, care must be taken to avoid stack corruption and to avoid referencing local
variables when they are not valid.

        ; example of a function with multiple entry points and an internal subroutine

somefunc2:
        globalsetting=defaultval
somefunc:
        beginfunc address.d
        localvar x.d,y.d

        x=address > y=8 > gosub printstuff

        ifequal globalsetting,0,label123

        x=carriagertn > y=2 > gosub printstuff
        goto label123


printstuff:
        callex ,writefile,,tempvar,y,x,chandle
        return        

label123:        
        endfunc
        returnex 4

  Imagine that you can call somefunc, and pass the address of an eight-character string which will then be
printed (using a Win32 call to write to an open console). If globalsetting is zero then no carriage return
is added to the output, otherwise it is added. If the caller wanted to override globalsetting then it could
call somefunc2 instead, which sets globalsetting itself, before the function proceeds.

  Having code execute before BEGINFUNC is no problem as long as it only references global variables. The
address, x, and y symbols are not valid until after BEGINFUNC and LOCALVAR.

  Likewise, when printstuff is called from within somefunc, the x and y variables are valid. However,
printstuff can NOT be called from outside of somefunc because x and y will point to who-knows-what.

  The compiler doesn't invalidate stack variables belonging to one function until it sees a BEGINFUNC
pertaining to another function. At that point, it will give an error if access to such variables is
attempted. At runtime, they would become invalid as soon as an ENDFUNC is executed. It's possible to have
more than one ENDFUNC associated with a function.

        ; example function with two exit points

anotherfunc:
        beginfunc param1.d,param2.d

        ifequal param1,param2,labelxyz
        endfunc 0
        returnex 8
labelxyz:
        endfunc 1
        returnex 8

Note that if you don't need to pass any parameters, GOSUB and RETURN can be used instead of CALLEX and
RETURNEX.
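
  In both examples above, the RETURNEX operand equals the total size of the parameters named by BEGINFUNC
(4 for one dword, 8 for two). The Python model below shows why a mismatch corrupts the stack. It assumes a
target where the caller pushes each parameter as a dword and the call pushes a dword return address, as
on x86. That layout is an assumption of the sketch; other targets may pass parameters differently.

```python
DWORD = 4  # assumed size of each pushed parameter and of the return address


def callex_then_returnex(sp, param_count, returnex_bytes):
    """Return the stack pointer after one call/return pair (stack grows downward)."""
    sp -= param_count * DWORD   # caller pushes the parameters
    sp -= DWORD                 # the call pushes the return address
    sp += DWORD                 # the return pops the return address...
    sp += returnex_bytes        # ...and RETURNEX discards this many more bytes
    return sp


start = 0x1000
balanced = callex_then_returnex(start, 2, 8)   # like anotherfunc: two dwords, returnex 8
leaked = callex_then_returnex(start, 2, 4)     # wrong count: 4 bytes are left behind
```

  With the correct count the stack pointer returns to its starting value. With the wrong count it drifts
by the difference on every call.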

  
----Internal assembler----

  The NOWUT compiler generates assembly code and feeds it back to its internal assembler. This code
can be seen in comments in the .LST file that is generated during compilation. If an error occurs
during compilation, it is convenient to look at the end of the .LST file to see how far it progressed
before encountering the error. Hand-written assembly language can be included in NOWUT programs by
using the ASM and ENDASM statements.

These CPU-independent statements are recognized by the assembler:

ALIGNW                \
ALIGND                 \
ALIGNQ                  \ same as NOWUT mode
DB                      /
DW                     /
DD                    /

ENDASM - returns to NOWUT mode


The x86 instruction set:

  Instruction names are as usual, except for 8-bit jumps which have been given their own separate forms
SJMP and SJcc. The compiler always uses the long forms in 32-bit mode. Memory operands are contained in
square brackets, and .b .w .d tags are used to make the operand size explicit. Destination operands go on
the left, source operands on the right.
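
  The short jump forms have a limited range because their displacement is a single signed byte, counted
from the end of the two-byte instruction. A minimal Python sketch of that encoding follows. The function
name is made up for illustration, and the error behaviour is an assumption, not a description of what the
assembler reports.

```python
def encode_sjmp(instr_addr, target_addr):
    """Encode SJMP ($EB) with an 8-bit displacement.

    The displacement is signed and counted from the end of the 2-byte instruction.
    """
    rel = target_addr - (instr_addr + 2)
    if not -128 <= rel <= 127:
        raise ValueError("target is out of range for an 8-bit jump")
    return bytes([0xEB, rel & 0xFF])


forward = encode_sjmp(0x100, 0x110)    # jump ahead by 14 bytes past the instruction
backward = encode_sjmp(0x100, 0x100)   # jump to itself (displacement -2)
```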

  In 32-bit mode (the default mode), operand-size prefixes are inserted before instructions that use 16-bit
words. The reverse is true for 16-bit mode, where instructions with a 32-bit operand size will have a
prefix inserted.
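
  The prefix rule can be illustrated with MOV reg16/32,imm16/32 ($B8+r). The Python sketch below is a model
of the rule, not the assembler's own code. It uses the standard x86 operand-size prefix byte, $66, and a
made-up function name.

```python
import struct

OPERAND_SIZE_PREFIX = 0x66   # standard x86 operand-size override prefix


def encode_mov_reg_imm(reg, value, operand_bits, mode_bits=32):
    """Encode MOV reg16/32,imm16/32 ($B8+r) for 16-bit or 32-bit code."""
    out = bytearray()
    if operand_bits != mode_bits:        # operand size differs from the mode default
        out.append(OPERAND_SIZE_PREFIX)
    out.append(0xB8 + reg)               # register number is added to the opcode
    if operand_bits == 16:
        out += struct.pack("<H", value & 0xFFFF)
    else:
        out += struct.pack("<I", value & 0xFFFFFFFF)
    return bytes(out)


word_in_32bit_mode = encode_mov_reg_imm(0, 0x1234, 16, mode_bits=32)       # MOV AX,$1234
dword_in_16bit_mode = encode_mov_reg_imm(0, 0x12345678, 32, mode_bits=16)  # MOV EAX,$12345678
```

  Both calls produce a prefixed instruction, because in each case the operand size is not the default for
the mode.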

  SIB addressing modes (e.g. [ESI+EAX*4]) are currently not supported in any mode.

  The instruction listing below describes acceptable operands mostly in terms of how they are encoded. The
equivalent NOWUT syntax or operand types are as follows:

  x86 operand        NOWUT                           notes

imm8               immediate                         8 bits, often sign-extended
imm16              immediate                         (could be an address in 16-bit mode)
imm32              immediate or address
reg8                                                 al  cl  dl  bl  ah  ch  dh  bh
reg16                                                ax  cx  dx  bx  sp  bp  si  di
reg32                                                eax ecx edx ebx esp ebp esi edi
segreg                                               es  cs  ss  ds  fs  gs
CRx                                                  cr0 cr2 cr3 cr4
DRx                                                  dr0 dr1 dr2 dr3 dr6 dr7
freg                                                 st0 st1 st2 st3 st4 st5 st6 st7
mem8               [immediate].b, [address].b,
                   [reg].b, [reg+xx].b
mem16              [immediate].w, [address].w,
                   [reg].w, [reg+xx].w
mem32              [immediate].d, [address].d,
                   [reg].d, [reg+xx].d
mem48/64           [immediate], [address],
                   [reg], [reg+xx]
disp16/32          [immediate], [address]            accesses memory of various sizes but doesn't use
                                                       mod/rm encoding
rm8  - same as mem8 or reg8
rm16 - same as mem16 or reg16
rm32 - same as mem32 or reg32

  The assembler will accept operands without a size tag, however in some cases the size ambiguity will
mean that more than one opcode would be valid. Shorter opcodes are generally favored.

  example:
        PUSH 7                ; this will be assembled as an imm8 instead of imm32
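
  As a sketch of how the shorter form can be chosen for PUSH in 32-bit mode (Python, illustrative only;
the exact range test the assembler applies is an assumption):

```python
import struct


def encode_push_imm(value):
    """Pick the shortest PUSH immediate encoding for 32-bit code."""
    if -128 <= value <= 127:
        return bytes([0x6A, value & 0xFF])                        # PUSH imm8, sign-extended
    return bytes([0x68]) + struct.pack("<I", value & 0xFFFFFFFF)  # PUSH imm32


short_form = encode_push_imm(7)     # fits in a signed byte: 2 bytes
long_form = encode_push_imm(300)    # does not fit: 5 bytes
```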

Also note that on the x86, displacements can be negative: MOV EAX,[EBP-40]

Hand-written assembly code should not modify (or should save and restore) the EBP register as the compiler
uses it to address stack variables. EBP is used to address stack variables in assembly code as well, and
these variables will become invalid when it is modified.
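
  The listing below gives opcodes in a compact notation. Assuming it follows the usual Intel conventions,
"/n" means that the digit n is placed in the reg field of the mod/rm byte, and "+r" means that the register
number is added to the opcode byte. The Python sketch below works through one example of each, plus the
[EBP-40] operand mentioned above. The function names are made up, and register numbers follow the order
of the reg32 row in the operand table.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def modrm(mod, reg, rm):
    """Pack the three fields of a mod/rm byte."""
    return (mod << 6) | (reg << 3) | rm


def adc_reg32_imm8(reg_name, imm):
    """ADC rm16/32,imm8 is listed as $83 /2: the 2 goes in the reg field."""
    return bytes([0x83, modrm(0b11, 2, REG32.index(reg_name)), imm & 0xFF])


def push_reg32(reg_name):
    """PUSH reg16/32 is listed as $50+r: the register number is added to $50."""
    return bytes([0x50 + REG32.index(reg_name)])


def mov_reg32_from_ebp(reg_name, disp):
    """MOV reg32,[EBP+disp] ($8B) using mod=01, i.e. a signed 8-bit displacement."""
    if not -128 <= disp <= 127:
        raise ValueError("this sketch only handles 8-bit displacements")
    return bytes([0x8B, modrm(0b01, REG32.index(reg_name), 5), disp & 0xFF])


adc_bytes = adc_reg32_imm8("ecx", 5)          # ADC ECX,5
push_bytes = push_reg32("ebx")                # PUSH EBX
mov_bytes = mov_reg32_from_ebp("eax", -40)    # MOV EAX,[EBP-40]
```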

AAA                     $37
AAD imm8                $D5
AAM imm8                $D4
AAS                     $3F

ADC AL,imm8             $14
ADC AX/EAX,imm16/32     $15
ADC rm8,reg8            $10
ADC rm16/32,reg16/32    $11
ADC reg8,rm8            $12
ADC reg16/32,rm16/32    $13
ADC rm8,imm8            $80 /2
ADC rm16/32,imm16/32    $81 /2
ADC rm16/32,imm8        $83 /2

ADD AL,imm8             $04
ADD AX/EAX,imm16/32     $05
ADD rm8,reg8            $00
ADD rm16/32,reg16/32    $01
ADD reg8,rm8            $02
ADD reg16/32,rm16/32    $03
ADD rm8,imm8            $80 /0
ADD rm16/32,imm16/32    $81 /0
ADD rm16/32,imm8        $83 /0

AND AL,imm8             $24
AND AX/EAX,imm16/32     $25
AND rm8,reg8            $20
AND rm16/32,reg16/32    $21
AND reg8,rm8            $22
AND reg16/32,rm16/32    $23
AND rm8,imm8            $80 /4
AND rm16/32,imm16/32    $81 /4
AND rm16/32,imm8        $83 /4

ARPL rm16,reg16         $63
BOUND reg16/32,mem16/32 $62
BSF reg16/32,rm16/32    $0F BC
BSR reg16/32,rm16/32    $0F BD
BSWAP reg32             $0F C8+r

BT rm16/32,reg16/32     $0F A3
BT rm16/32,imm8         $0F BA /4
BTC rm16/32,reg16/32    $0F BB
BTC rm16/32,imm8        $0F BA /7
BTR rm16/32,reg16/32    $0F B3
BTR rm16/32,imm8        $0F BA /6
BTS rm16/32,reg16/32    $0F AB
BTS rm16/32,imm8        $0F BA /5

CALL imm16/32           $E8
CALL rm16/32            $FF /2

CDQ                     $99
CLC                     $F8
CLD                     $FC
CLI                     $FA
CLTS                    $0F 06
CMC                     $F5

CMP AL,imm8             $3C
CMP AX/EAX,imm16/32     $3D
CMP rm8,reg8            $38
CMP rm16/32,reg16/32    $39
CMP reg8,rm8            $3A
CMP reg16/32,rm16/32    $3B
CMP rm8,imm8            $80 /7
CMP rm16/32,imm16/32    $81 /7
CMP rm16/32,imm8        $83 /7

CMPSB                   $A6
CMPSD                   $A7
CMPSW                   $A7

CMPXCHG rm8,reg8           $0F B0
CMPXCHG rm16/32,reg16/32   $0F B1
CMPXCHG8B mem64            $0F C7 /1

CPUID                   $0F A2
CWDE                    $98
DAA                     $27
DAS                     $2F

DEC reg16/32            $48+r
DEC rm8                 $FE /1
DEC rm16/32             $FF /1

DIV rm8                 $F6 /6
DIV rm16/32             $F7 /6

HLT                     $F4

IDIV rm8                $F6 /7
IDIV rm16/32            $F7 /7

IMUL rm8                $F6 /5
IMUL rm16/32            $F7 /5
IMUL AL,rm8             $F6 /5
IMUL AX/EAX,rm16/32     $F7 /5
IMUL reg16/32,rm16/32   $0F AF
IMUL reg16/32,imm8      $6B
IMUL reg16/32,imm16/32  $69
IMUL reg16/32,rm16/32,imm8      $6B
IMUL reg16/32,rm16/32,imm16/32  $69

IN AL,imm8              $E4
IN AX/EAX,imm8          $E5
IN AL,DX                $EC
IN AX/EAX,DX            $ED

INC reg16/32            $40+r
INC rm8                 $FE /0
INC rm16/32             $FF /0

INSB                    $6C
INSD                    $6D
INSW                    $6D
INT imm8                $CD
INT3                    $CC
INTO                    $CE
INVD                    $0F 08
INVLPG                  $0F 01 /7
IRET                    $CF
JECXZ imm8              $E3

Jcc imm16/32            $0F 80+cc
JMP imm16/32            $E9
JMP rm16/32             $FF /4

LAHF                    $9F
LAR reg16/32,rm16/32    $0F 02

LDS reg16/32,mem32/48   $C5
LES reg16/32,mem32/48   $C4
LFS reg16/32,mem32/48   $0F B4
LGS reg16/32,mem32/48   $0F B5
LSS reg16/32,mem32/48   $0F B2

LEA reg16/32,mem        $8D
LEAVE                   $C9

LGDT mem48              $0F 01 /2
LIDT mem48              $0F 01 /3
LLDT rm16               $0F 00 /2
LMSW rm16               $0F 01 /6

LODSB                   $AC
LODSD                   $AD
LODSW                   $AD
LSL reg16/32,rm16/32    $0F 03
LTR rm16                $0F 00 /3

MOV AL,disp16/32        $A0
MOV AX/EAX,disp16/32    $A1
MOV disp16/32,AL        $A2
MOV disp16/32,AX/EAX    $A3
MOV rm8,reg8            $88
MOV rm16/32,reg16/32    $89
MOV reg8,rm8            $8A
MOV reg16/32,rm16/32    $8B
MOV reg8,imm8           $B0+r
MOV reg16/32,imm16/32   $B8+r
MOV rm8,imm8            $C6 /0
MOV rm16/32,imm16/32    $C7 /0
MOV rm16/32,segreg      $8C
MOV segreg,rm16/32      $8E
MOV reg32,CRx           $0F 20
MOV reg32,DRx           $0F 21
MOV CRx,reg32           $0F 22
MOV DRx,reg32           $0F 23

MOVSB                   $A4
MOVSD                   $A5
MOVSW                   $A5

MOVSX reg16/32,rm8      $0F BE
MOVSX reg32,rm16        $0F BF
MOVZX reg16/32,rm8      $0F B6
MOVZX reg32,rm16        $0F B7

MUL rm8                 $F6 /4
MUL rm16/32             $F7 /4

NEG rm8                 $F6 /3
NEG rm16/32             $F7 /3

NOP                     $90

NOT rm8                 $F6 /2
NOT rm16/32             $F7 /2

OR AL,imm8              $0C
OR AX/EAX,imm16/32      $0D
OR rm8,reg8             $08
OR rm16/32,reg16/32     $09
OR reg8,rm8             $0A
OR reg16/32,rm16/32     $0B
OR rm8,imm8             $80 /1
OR rm16/32,imm16/32     $81 /1
OR rm16/32,imm8         $83 /1

OUT imm8,AL             $E6
OUT imm8,AX/EAX         $E7
OUT DX,AL               $EE
OUT DX,AX/EAX           $EF

OUTSB                   $6E
OUTSD                   $6F
OUTSW                   $6F

POP reg16/32            $58+r
POP rm16/32             $8F /0
POP DS                  $1F
POP ES                  $07
POP SS                  $17
POP FS                  $0F A1
POP GS                  $0F A9
POPA                    $61
POPF                    $9D

PUSH reg16/32           $50+r
PUSH rm16/32            $FF /6
PUSH imm8               $6A
PUSH imm16/32           $68
PUSH CS                 $0E
PUSH DS                 $1E
PUSH ES                 $06
PUSH SS                 $16
PUSH FS                 $0F A0
PUSH GS                 $0F A8
PUSHA                   $60
PUSHF                   $9C

RCL rm8                 $D0 /2
RCL rm8,CL              $D2 /2
RCL rm8,imm8            $C0 /2
RCL rm16/32             $D1 /2
RCL rm16/32,CL          $D3 /2
RCL rm16/32,imm8        $C1 /2

RCR rm8                 $D0 /3
RCR rm8,CL              $D2 /3
RCR rm8,imm8            $C0 /3
RCR rm16/32             $D1 /3
RCR rm16/32,CL          $D3 /3
RCR rm16/32,imm8        $C1 /3

RDMSR                   $0F 32
RDTSC                   $0F 31

RET                     $C3
RET imm16               $C2
RETF                    $CB
RETF imm16              $CA

ROL rm8                 $D0 /0
ROL rm8,CL              $D2 /0
ROL rm8,imm8            $C0 /0
ROL rm16/32             $D1 /0
ROL rm16/32,CL          $D3 /0
ROL rm16/32,imm8        $C1 /0

ROR rm8                 $D0 /1
ROR rm8,CL              $D2 /1
ROR rm8,imm8            $C0 /1
ROR rm16/32             $D1 /1
ROR rm16/32,CL          $D3 /1
ROR rm16/32,imm8        $C1 /1

SAL rm8                 $D0 /4
SAL rm8,CL              $D2 /4
SAL rm8,imm8            $C0 /4
SAL rm16/32             $D1 /4
SAL rm16/32,CL          $D3 /4
SAL rm16/32,imm8        $C1 /4

SAHF                    $9E

SAR rm8                 $D0 /7
SAR rm8,CL              $D2 /7
SAR rm8,imm8            $C0 /7
SAR rm16/32             $D1 /7
SAR rm16/32,CL          $D3 /7
SAR rm16/32,imm8        $C1 /7

SBB AL,imm8             $1C
SBB AX/EAX,imm16/32     $1D
SBB rm8,reg8            $18
SBB rm16/32,reg16/32    $19
SBB reg8,rm8            $1A
SBB reg16/32,rm16/32    $1B
SBB rm8,imm8            $80 /3
SBB rm16/32,imm16/32    $81 /3
SBB rm16/32,imm8        $83 /3

SCASB                   $AE
SCASD                   $AF
SCASW                   $AF

SETcc rm8               $0F 90+cc /2

SGDT mem48              $0F 01 /0
SIDT mem48              $0F 01 /1
SLDT rm16               $0F 00 /0

SHL rm8                 $D0 /4
SHL rm8,CL              $D2 /4
SHL rm8,imm8            $C0 /4
SHL rm16/32             $D1 /4
SHL rm16/32,CL          $D3 /4
SHL rm16/32,imm8        $C1 /4

SHR rm8                 $D0 /5
SHR rm8,CL              $D2 /5
SHR rm8,imm8            $C0 /5
SHR rm16/32             $D1 /5
SHR rm16/32,CL          $D3 /5
SHR rm16/32,imm8        $C1 /5

SHLD rm16/32,reg16/32,imm8      $0F A4
SHLD rm16/32,reg16/32,CL        $0F A5
SHRD rm16/32,reg16/32,imm8      $0F AC
SHRD rm16/32,reg16/32,CL        $0F AD

SJcc imm8 or address              $70+cc
SJMP imm8 or address              $EB

SMSW rm16               $0F 01 /4

STC                     $F9
STD                     $FD
STI                     $FB
STOSB                   $AA
STOSD                   $AB
STOSW                   $AB
STR rm16                $0F 00 /1

SUB AL,imm8             $2C
SUB AX/EAX,imm16/32     $2D
SUB rm8,reg8            $28
SUB rm16/32,reg16/32    $29
SUB reg8,rm8            $2A
SUB reg16/32,rm16/32    $2B
SUB rm8,imm8            $80 /5
SUB rm16/32,imm16/32    $81 /5
SUB rm16/32,imm8        $83 /5

TEST AL,imm8            $A8
TEST AX/EAX,imm16/32    $A9
TEST rm8,reg8           $84
TEST rm16/32,reg16/32   $85
TEST rm8,imm8           $F6 /0
TEST rm16/32,imm16/32   $F7 /0

VERR rm16               $0F 00 /4
VERW rm16               $0F 00 /5

WAIT                    $9B
WBINVD                  $0F 09
WRMSR                   $0F 30

XADD rm8,reg8           $0F C0
XADD rm16/32,reg16/32   $0F C1

XCHG AX/EAX,reg16/32    $90+r
XCHG reg16/32,AX/EAX    $90+r
XCHG reg8,rm8           $86
XCHG rm8,reg8           $86
XCHG reg16/32,rm16/32   $87
XCHG rm16/32,reg16/32   $87

XOR AL,imm8             $34
XOR AX/EAX,imm16/32     $35
XOR rm8,reg8            $30
XOR rm16/32,reg16/32    $31
XOR reg8,rm8            $32
XOR reg16/32,rm16/32    $33
XOR rm8,imm8            $80 /6
XOR rm16/32,imm16/32    $81 /6
XOR rm16/32,imm8        $83 /6

XLATB                   $D7

The following prefixes are supported:

ASIZE        $67        (address size override)
CS           $2E
DS           $3E
ES           $26
FS           $64
GS           $65
SS           $36
LOCK         $F0
REPNZ/NE     $F2
REP/E/Z      $F3

The following x87 instructions are supported:

F2XM1                   $D9 $F0
FABS                    $D9 $E1

FADD mem32              $D8 /0
FADD mem64              $DC /0
FADD freg               $D8 $C0+r
FADD freg,ST0           $DC $C0+r
FADDP freg,ST0          $DE $C0+r

FBLD mem80              $DF /4
FBSTP mem80             $DF /6

FCHS                    $D9 $E0

FCLEX                   $9B $DB $E2
FNCLEX                  $DB $E2

FCMOVB freg             $DA $C0+r        \
FCMOVBE freg            $DA $D0+r         \
FCMOVE freg             $DA $C8+r          \
FCMOVNB freg            $DB $C0+r           \ P6 instructions
FCMOVNBE freg           $DB $D0+r           /
FCMOVNE freg            $DB $C8+r          /
FCMOVNU freg            $DB $D8+r         /
FCMOVU freg             $DA $D8+r        /

FCOM mem32              $D8 /2
FCOM mem64              $DC /2
FCOM freg               $D8 $D0+r
FCOMP mem32             $D8 /3
FCOMP mem64             $DC /3
FCOMP freg              $D8 $D8+r
FCOMPP                  $DE $D9

FCOMI freg              $DB $F0+r        \ P6
FCOMIP freg             $DF $F0+r        /

FCOS                    $D9 $FF

FDECSTP                 $D9 $F6

FDISI                   $9B $DB $E1         \
FNDISI                  $DB $E1              \ 8087 only
FENI                    $9B $DB $E0          /
FNENI                   $DB $E0             /

FDIV mem32              $D8 /6
FDIV mem64              $DC /6
FDIV freg               $D8 $F0+r
FDIV freg,ST0           $DC $F8+r
FDIVR mem32             $D8 /7
FDIVR mem64             $DC /7
FDIVR freg              $D8 $F8+r
FDIVR freg,ST0          $DC $F0+r
FDIVP freg,ST0          $DE $F8+r
FDIVRP freg,ST0         $DE $F0+r

FFREE freg              $DD $C0+r

FIADD mem16             $DE /0
FIADD mem32             $DA /0

FICOM mem16             $DE /2
FICOM mem32             $DA /2
FICOMP mem16            $DE /3
FICOMP mem32            $DA /3

FIDIV mem16             $DE /6
FIDIV mem32             $DA /6
FIDIVR mem16            $DE /7
FIDIVR mem32            $DA /7

FILD mem16              $DF /0
FILD mem32              $DB /0
FILD mem64              $DF /5
FIST mem16              $DF /2
FIST mem32              $DB /2
FISTP mem16             $DF /3
FISTP mem32             $DB /3
FISTP mem64             $DF /7

FIMUL mem16             $DE /1
FIMUL mem32             $DA /1

FINCSTP                 $D9 $F7

FINIT                   $9B $DB $E3
FNINIT                  $DB $E3

FISUB mem16             $DE /4
FISUB mem32             $DA /4
FISUBR mem16            $DE /5
FISUBR mem32            $DA /5

FLD mem32               $D9 /0
FLD mem64               $DD /0
FLD mem80               $DB /5
FLD freg                $D9 $C0+r

FLD1                    $D9 $E8
FLDL2E                  $D9 $EA
FLDL2T                  $D9 $E9
FLDLG2                  $D9 $EC
FLDLN2                  $D9 $ED
FLDPI                   $D9 $EB
FLDZ                    $D9 $EE

FLDCW mem16             $D9 /5

FLDENV mem              $D9 /4

FMUL mem32              $D8 /1
FMUL mem64              $DC /1
FMUL freg               $D8 $C8+r
FMUL freg,ST0           $DC $C8+r
FMULP freg,ST0          $DE $C8+r

FNOP                    $D9 $D0

FPATAN                  $D9 $F3
FPTAN                   $D9 $F2

FPREM                   $D9 $F8
FPREM1                  $D9 $F5

FRNDINT                 $D9 $FC

FSAVE mem               $9B $DD /6
FNSAVE mem              $DD /6
FRSTOR mem              $DD /4

FSCALE                  $D9 $FD

FSETPM                  $DB $E4

FSIN                    $D9 $FE
FSINCOS                 $D9 $FB

FSQRT                   $D9 $FA

FST mem32               $D9 /2
FST mem64               $DD /2
FST freg                $DD $D0+r
FSTP mem32              $D9 /3
FSTP mem64              $DD /3
FSTP mem80              $DB /7
FSTP freg               $DD $D8+r

FSTCW mem16             $9B $D9 /7
FNSTCW mem16            $D9 /7

FSTENV mem              $9B $D9 /6
FNSTENV mem             $D9 /6

FSTSW mem16             $9B $DD /7
FSTSW AX                $9B $DF $E0
FNSTSW mem16            $DD /7
FNSTSW AX               $DF $E0

FSUB mem32              $D8 /4
FSUB mem64              $DC /4
FSUB freg               $D8 $E0+r
FSUB freg,ST0           $DC $E8+r
FSUBR mem32             $D8 /5
FSUBR mem64             $DC /5
FSUBR freg              $D8 $E8+r
FSUBR freg,ST0          $DC $E0+r
FSUBP freg,ST0          $DE $E8+r
FSUBRP freg,ST0         $DE $E0+r

FTST                    $D9 $E4

FUCOM freg              $DD $E0+r
FUCOMP freg             $DD $E8+r
FUCOMPP                 $DA $E9
FUCOMI freg             $DB $E8+r        \ P6
FUCOMIP freg            $DF $E8+r        /

FXAM                    $D9 $E5

FXCH freg               $D9 $C8+r

FXTRACT                 $D9 $F4

FYL2X                   $D9 $F1
FYL2XP1                 $D9 $F9


The 68000 instruction set:

  NO68 supports all 68000 instructions but does not currently handle PC relative or extension-word
addressing modes. It always uses a 16-bit displacement for branches. Variations on a single mnemonic such
as ADDA or ADDI seen in other assemblers have been eliminated in favor of using one mnemonic for all forms.
32-bit words are referred to as dwords and use the .d tag, just as they do in x86 NOWUT. Likewise, memory
operands use square brackets, and operands receive a size tag rather than the instruction. Destination
operands go on the right, source operands on the left.
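
  Because branches always use a 16-bit displacement, every branch occupies two words: the opcode word
with its 8-bit displacement field left at zero, followed by a signed 16-bit displacement. On the 68000
that displacement is counted from the address just after the opcode word. A Python sketch for BRA follows.
The function name is made up, and the error behaviour is an assumption.

```python
import struct


def encode_bra_w(instr_addr, target_addr):
    """Encode 68000 BRA as opcode word $6000 plus a 16-bit displacement word."""
    disp = target_addr - (instr_addr + 2)   # counted from just after the opcode word
    if not -32768 <= disp <= 32767:
        raise ValueError("target is out of range for a 16-bit branch")
    return struct.pack(">Hh", 0x6000, disp)  # big-endian, signed displacement


forward = encode_bra_w(0x1000, 0x1010)
backward = encode_bra_w(0x1000, 0x1000)   # branch to itself (displacement -2)
```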

  The assembler will accept operands without a size tag, however in some cases the size ambiguity will
mean that more than one opcode would be valid. Shorter opcodes are generally favored.

  example:
        MOVE 7,d0                ; assembled as 8-bit immediate instead of 16 or 32-bit

        MOVE [address],d0.d      ;  \
        MOVE [address].d,d0      ;    these all do the same thing
        MOVE [address].d,d0.d    ;  /

ea (effective address) operands can be any of the following:
  
  imm8/16/32        ; immediate
  [address]         ; memory reference
  [ax]              ; address register indirect
  [ax+xx]           ; address register indirect with displacement
  [ax+]             ; address register indirect with post-increment
  [ax-]             ; address register indirect with pre-decrement
  dx                ; data register
  ax                ; address register

Not all modes are valid for all instructions (e.g. an immediate can't be a destination).
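
  For reference, the register-based operand forms in the list above correspond to the 68000's standard
3-bit addressing mode field as shown in this Python sketch. The parsing code and its names are made up for
illustration. Immediate and [address] operands are left out because the sketch does not try to guess which
absolute form NO68 selects.

```python
import re

# (pattern, mode field) for the register-based operand forms.
EA_PATTERNS = [
    (r"d([0-7])", 0b000),               # dx       data register
    (r"a([0-7])", 0b001),               # ax       address register
    (r"\[a([0-7])\]", 0b010),           # [ax]     indirect
    (r"\[a([0-7])\+\]", 0b011),         # [ax+]    post-increment
    (r"\[a([0-7])-\]", 0b100),          # [ax-]    pre-decrement
    (r"\[a([0-7])[+-]\w+\]", 0b101),    # [ax+xx]  indirect with displacement
]


def ea_fields(operand):
    """Return the (mode, register) fields for a register-based 68000 operand."""
    text = re.sub(r"\.[bwd]$", "", operand.strip().lower())   # drop a size tag if present
    for pattern, mode in EA_PATTERNS:
        match = re.fullmatch(pattern, text)
        if match:
            return mode, int(match.group(1))
    raise ValueError("operand form not covered by this sketch: " + operand)


stack_var = ea_fields("[a6-40].d")   # displacement mode on A6
post_inc = ea_fields("[a0+]")
```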

Also note that on the 68K, displacements can be negative: MOVE [a6-40].d,d0

Hand-written assembly code should not modify (or should save and restore) the A6 register as the compiler
uses it to address stack variables.


ABCD dx,dy
ABCD [ax-],[ay-]                      ; byte only

ADD imm,ea               /imm8/16/32
ADD imm3,ea
ADD dy,ea
ADD ea,dy
ADD ea,ay

ADDX dx,dy
ADDX [ax-],[ay-]

AND imm,ea               /imm8/16/32
AND imm,ccr              /imm8
AND imm,sr               /imm16
AND dy,ea
AND ea,dy

ASL ea                                ; word only
ASL imm3,dx
ASL dy,dx  

ASR ea                                ; word only
ASR imm3,dx
ASR dy,dx

Bcc label

BCHG imm,ea              /imm8
BCHG dn,ea                            ; byte only

BCLR imm,ea              /imm8
BCLR dn,ea                            ; byte only

BRA label

BSET imm,ea              /imm8
BSET dn,ea                            ; byte only

BSR label

BTST imm,ea              /imm8
BTST dn,ea                            ; byte only

CHK ea,dy                             ; word only
CLR ea

CMP imm,ea               /imm8/16/32
CMP [an+],[an+]
CMP ea,dy      
CMP ea,ay      

DBcc dx,label                         ; word only
DBRA dx,label                         ; word only 

DIVS ea,dy                            ; divide 32/16, remainder in high word
DIVU ea,dy

EOR imm,ea               /imm8/16/32
EOR imm,ccr              /imm8
EOR imm,sr               /imm16
EOR dy,ea

EXG dx,dy
EXG ax,ay
EXG ax,dy

EXT dx                                ; byte->word
EXT dx                                ; word->dword

ILLEGAL
JMP ea
JSR ea
LEA ea,ay
LINK ax,imm16

LSL ea                                ; word only
LSL imm3,dx
LSL dy,dx

LSR ea                                ; word only
LSR imm3,dx
LSR dy,dx

MOVE ea,ea
MOVE ea,an
MOVE sr,ea                            ; word only
MOVE ea,sr                            ; word only
MOVE ccr,ea                           ; word only
MOVE ea,ccr                           ; word only
MOVE imm8,reg                         ; sign extended

MOVEM imm,ea             /imm16       ; register bit mask
MOVEM ea,imm             /imm16       ; register bit mask

MOVEP dy,[ax+disp16]     /disp16
MOVEP [ax+disp16],dy     /disp16

MULS ea,dy                            ; 16x16->32
MULU ea,dy

NBCD ea                               ; byte only
NEG ea
NEGX ea
NOP
NOT ea

OR imm,ea                /imm8/16/32
OR imm,ccr               /imm8
OR imm,sr                /imm16
OR dy,ea   
OR ea,dy

PEA ea
RESET 
         
ROL ea
ROL imm3,dx
ROL dy,dx

ROR ea
ROR imm3,dx
ROR dy,dx  

ROXL ea     
ROXL imm3,dx
ROXL dy,dx  

ROXR ea     
ROXR imm3,dx
ROXR dy,dx  

RTE         
RTR         
RTS         

SBCD dx,dy                            ; byte only
SBCD [ax-],[ay-]                      ; byte only

Scc ea                                ; byte only
STOP imm                 /imm16   

SUB imm,ea               /imm8/16/32
SUB imm3,ea
SUB dy,ea  
SUB ea,dy  
SUB ea,ay  

SUBX dx,dy
SUBX [ax-],[ay-]

SWAP dx
TAS ea                                ; byte only
TRAP imm4
TRAPV
TST ea
UNLK ax

XDOS imm4            ; this pseudo-instruction generates "F-line" opcodes
                     ; for making system calls on the Sharp X68000


The SH2 instruction set:

  As with the 68K, the SH2 instruction set has undergone some cosmetic changes to bring it in line with
NOWUT norms. A few mnemonics were tweaked, memory operands use square brackets and a data size tag, and
long words (32-bit) are referred to as dwords. Destination operands go on the right, source operands on the
left.

  Since the SH2 doesn't allow dword immediate data or memory access using absolute addresses, the assembler
accepts a "fake" form of the MOV instruction and transparently inserts an extra instruction as needed. It
also adds immediate data to a buffer that is periodically flushed to the output file, with a BRA opcode
added to jump over the data. There are currently two issues with this:

  1) In assembly mode the buffer is not flushed. Using more than 16 "fake" MOVs in a single section of
     assembly code may cause failure (using 32 definitely will).
  2) The buffer is flushed before an INCBIN but any label immediately prior to the INCBIN will no longer
     point to the data as expected. Therefore, INCBIN should not be used in the same section as code.
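
  One reason the immediate data has to be written out close to the code that uses it is the limited reach
of the SH2's PC-relative dword load. Its displacement is 8 bits, zero-extended and multiplied by 4, and it
is counted from the instruction address plus 4, rounded down to a dword boundary. The Python sketch below
computes the 16-bit opcode for such a load. The function name is made up, and how the assembler itself
tracks these distances is not described here.

```python
def encode_mov_l_pc(instr_addr, literal_addr, rn):
    """Encode the SH2 PC-relative dword load (MOV [PC+label],Rn) as a 16-bit opcode."""
    base = (instr_addr + 4) & ~3          # PC as seen by the load, aligned down to a dword
    offset = literal_addr - base
    if offset < 0 or offset % 4 != 0 or offset // 4 > 255:
        raise ValueError("literal is not reachable with an 8-bit scaled displacement")
    return 0xD000 | (rn << 8) | (offset // 4)


aligned = encode_mov_l_pc(0x1000, 0x1008, 1)     # load into R1 from 4 bytes past the base
unaligned = encode_mov_l_pc(0x1002, 0x1008, 0)   # base is rounded down from $1006 to $1004
```

  The farthest literal that can be reached is 255 dwords past the base, so the data can never be more than
about 1KB ahead of the instruction that loads it.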

  The SH2 doesn't do divides in a single instruction. When division is needed, the compiler generates a
call to a subroutine. The program source should include these division subroutines (or some variation
thereof):

sh2divideu:                ' unsigned 32/16 r1/r2
        asm
        shll16 r2
        div0u
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1

        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        rotcl r1
        extuw r1,r1                
        endasm
        return

sh2divides:                ' signed 32/16 r1/r2
        asm
        shll16 r2
        mov r1,r4
        rotcl r4
        mov 0,r4
        subc r4,r1
        div0s r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1

        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        div1 r2,r1
        extsw r1,r1
        rotcl r1
        addc r4,r1        
        extsw r1,r1
        endasm
        return
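
  Each DIV1 produces one quotient bit, which is why the routines above contain sixteen of them for a
16-bit quotient. The Python model below shows the same idea for the unsigned case. It uses a plain
restoring shift-and-subtract loop for clarity, whereas DIV1 is a non-restoring step, so it models the
result of the routine and not the register contents along the way. The function name is made up.

```python
def divide_u32_by_u16(dividend, divisor):
    """Model of a 32/16 unsigned divide done in 16 one-bit steps.

    Returns (quotient, remainder). The quotient must fit in 16 bits.
    """
    if divisor == 0 or dividend >= (divisor << 16):
        raise ValueError("divide by zero, or quotient does not fit in 16 bits")
    shifted = divisor << 16        # corresponds to SHLL16 on the divisor
    remainder = dividend
    quotient = 0
    for _ in range(16):            # one iteration per DIV1 in the routine
        remainder <<= 1
        quotient <<= 1
        if remainder >= shifted:
            remainder -= shifted
            quotient |= 1
    return quotient, remainder >> 16


result = divide_u32_by_u16(100, 7)
```

  Here result is (14, 2). The range check at the top reflects the limits of a 32/16 divide: a result
that needs more than 16 bits cannot be represented.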

Note: delayed branch instructions cause the instruction following the branch to be executed before the
      branch takes place.

Hand-written assembly code should not modify (or should save and restore) R11, R13, R14 as the compiler
uses them to address stack variables.


ADD     Rm,Rn
ADD     imm,Rn          (immediate is 8-bit sign-extended)

ADDC    Rm,Rn
ADDV    Rm,Rn

AND     Rm,Rn
AND     imm,R0          (immediate is 8-bit zero-extended)
AND     imm,[R0+GBR].b  (immediate is 8-bit zero-extended)

BF      label/imm                       (8-bit displacement)
BFS     label/imm       (delayed branch)(8-bit displacement)
BRA     label/imm       (delayed branch)(12-bit displacement)
BRAF    Rm              (delayed branch)
BSR     label/imm       (delayed branch)(12-bit displacement)
BSRF    Rm              (delayed branch)
BT      label/imm                       (8-bit displacement)
BTS     label/imm       (delayed branch)(8-bit displacement)

CLRMAC
CLRT

CMPEQ   imm,R0          (immediate is 8-bit sign-extended)
CMPEQ   Rm,Rn             
CMPGE   Rm,Rn             rn>=rm, signed
CMPGT   Rm,Rn             rn>rm, signed
CMPHI   Rm,Rn             rn>rm, unsigned
CMPHS   Rm,Rn             rn>=rm, unsigned
CMPPL   Rn                rn>0
CMPPZ   Rn                rn>=0
CMPSTR  Rm,Rn           

DIV0S   Rm,Rn
DIV0U
DIV1    Rm,Rn

DMULS   Rm,Rn             32x32->64 (MAC)
DMULU   Rm,Rn

DT      Rn

EXTSB   Rm,Rn
EXTSW   Rm,Rn
EXTUB   Rm,Rn
EXTUW   Rm,Rn

JMP     Rm              (delayed branch)
JSR     Rm              (delayed branch)

LDC     Rm,SR    
LDC     Rm,GBR   
LDC     Rm,VBR   
LDC     [Rm+],SR 
LDC     [Rm+],GBR
LDC     [Rm+],VBR

LDS     Rm,MACH   
LDS     Rm,MACL   
LDS     Rm,PR     
LDS     [Rm+],MACH
LDS     [Rm+],MACL
LDS     [Rm+],PR  

MAC     [Rm+],[Rn+].d
MAC     [Rm+],[Rn+].w

MOV     imm/address,Rn               ; this pseudo-instruction uses [PC+label] address mode to load a dword
                                     ; from an automatically-created data dump

MOV     [symbol/address].b,Rn        ; these pseudo-instructions cause another instruction to be inserted
MOV     [symbol/address].w,Rn        ; which loads the address, then memory is accessed using register-
MOV     [symbol/address].d,Rn        ; indirect mode.
                                     ; stack variables are an exception, the extra instruction isn't needed

MOV     Rm,Rn
MOV     imm,Rn          (immediate is 8-bit sign-extended)
MOV     Rm,[Rn].b
MOV     Rm,[Rn].w
MOV     Rm,[Rn].d
MOV     [Rm].b,Rn
MOV     [Rm].w,Rn
MOV     [Rm].d,Rn
MOV     [Rm+].b,Rn
MOV     [Rm+].w,Rn
MOV     [Rm+].d,Rn
MOV     Rm,[Rn-].b
MOV     Rm,[Rn-].w
MOV     Rm,[Rn-].d
MOV     Rm,[R0+Rn].b
MOV     Rm,[R0+Rn].w
MOV     Rm,[R0+Rn].d
MOV     [R0+Rm].b,Rn
MOV     [R0+Rm].w,Rn
MOV     [R0+Rm].d,Rn
MOV     R0,[GBR+disp].b    (8-bit displacement, zero-extended)
MOV     R0,[GBR+disp].w    (8-bit displacement, zero-extended)
MOV     R0,[GBR+disp].d    (8-bit displacement, zero-extended)
MOV     [GBR+disp].b,R0    (8-bit displacement, zero-extended)
MOV     [GBR+disp].w,R0    (8-bit displacement, zero-extended)
MOV     [GBR+disp].d,R0    (8-bit displacement, zero-extended)
MOV     R0,[Rn+disp].b     (4-bit displacement, zero-extended)
MOV     R0,[Rn+disp].w     (4-bit displacement, zero-extended, doubled)
MOV     Rm,[Rn+disp].d     (4-bit displacement, zero-extended, quadrupled)
MOV     [Rn+disp].b,R0     (4-bit displacement, zero-extended)
MOV     [Rn+disp].w,R0     (4-bit displacement, zero-extended, doubled)
MOV     [Rn+disp].d,R0     (4-bit displacement, zero-extended, quadrupled)
MOV     [PC+label],Rn      (8-bit displacement, zero-extended)

MOVA    [PC+label],R0
MOVT    Rn

MUL     Rm,Rn
MULS    Rm,Rn
MULU    Rm,Rn

NEG     Rm,Rn
NEGC    Rm,Rn

NOP

NOT     Rm,Rn

OR      Rm,Rn
OR      imm,R0          (immediate is 8-bit zero-extended)
OR      imm,[R0+GBR].b  (immediate is 8-bit zero-extended)

ROTL    Rn
ROTR    Rn
ROTCL   Rn
ROTCR   Rn

RTE                     (delayed branch)
RTS                     (delayed branch)
SETT

SHAL    Rn
SHAR    Rn
SHLL    Rn
SHLR    Rn
SHLL2   Rn
SHLR2   Rn
SHLL8   Rn
SHLR8   Rn
SHLL16  Rn
SHLR16  Rn

SLEEP

STC     SR,Rn
STC     GBR,Rn
STC     VBR,Rn
STC     SR,[Rn-]
STC     GBR,[Rn-]
STC     VBR,[Rn-]

STS     MACH,Rn
STS     MACL,Rn
STS     PR,Rn
STS     MACH,[Rn-]
STS     MACL,[Rn-]
STS     PR,[Rn-]

SUB     Rm,Rn
SUBC    Rm,Rn
SUBV    Rm,Rn

SWAPB   Rm,Rn
SWAPW   Rm,Rn

TAS     [Rn].b

TRAPA   imm             (immediate is 8-bit)

TST     Rm,Rn
TST     imm,R0          (immediate is 8-bit zero-extended)
TST     imm,[R0+GBR].b  (immediate is 8-bit zero-extended)

XOR     Rm,Rn
XOR     imm,R0          (immediate is 8-bit zero-extended)
XOR     imm,[R0+GBR].b  (immediate is 8-bit zero-extended)

XTRCT   Rm,Rn
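
  The size notes in the list above (8-bit sign-extended or zero-extended immediates, 4-bit displacements
that are doubled or quadrupled) decide which values can be encoded directly. The Python sketch below is
not part of NOWUT; it only works out those ranges. The function names are made up for illustration, and it
assumes the displacement is counted in bytes before scaling. How the assembler reacts to an out-of-range
value is not covered here.

```python
def fits_simm8(value):
    """True if value fits an 8-bit sign-extended immediate (ADD, MOV, CMPEQ)."""
    return -128 <= value <= 127


def fits_uimm8(value):
    """True if value fits an 8-bit zero-extended immediate (AND, OR, TST, XOR)."""
    return 0 <= value <= 255


def disp4_field(byte_offset, size):
    """Return the 4-bit field for [Rn+disp] with an operand size of 1, 2 or 4 bytes.

    Assumption: byte_offset is given in bytes. The field is zero-extended and
    multiplied by the operand size, so the offset must be a multiple of size
    and no larger than 15*size. Returns None when the offset cannot be encoded.
    """
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4")
    if byte_offset < 0 or byte_offset % size != 0:
        return None
    field = byte_offset // size
    return field if field <= 15 else None
```

  For example, 200 fits a zero-extended immediate but not a sign-extended one, and the largest byte offset
reachable with [Rn+disp].d is 60.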


----Known bugs and limitations----

  NOWUT 68K uses the processor's multiply and divide instructions even though they are not fully
32-bit. Multiplicands or quotients that do not fit in 16 bits will cause incorrect results. Signed vs.
unsigned multiplicands will also affect the result, whereas they would not with 32bit*32bit=32bit.

  NOWUT in 8086 mode uses the processor's multiply and divide instructions even though they are
not fully 32-bit. Multiplicands or quotients that do not fit in 16 bits will cause incorrect results.
Signed vs. unsigned multiplicands will also affect the result, whereas they would not with 32bit*32bit=32bit.
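
  The effect of the two multiply limitations above can be modelled outside the compiler. The Python sketch
below is not NOWUT code; it compares a true 32bit*32bit=32bit product with a 16x16->32 multiply of the kind
the 68000 and 8086 provide. The function names are hypothetical, and which variant (signed or unsigned) the
compiler actually emits is not assumed.

```python
MASK32 = 0xFFFFFFFF


def to_signed16(x):
    """Interpret the low 16 bits of x as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mul_32x32(a, b):
    """Low 32 bits of the full product; identical for signed and unsigned inputs."""
    return (a * b) & MASK32


def mulu_16x16(a, b):
    """Unsigned 16x16->32 multiply: only the low 16 bits of each operand are used."""
    return ((a & 0xFFFF) * (b & 0xFFFF)) & MASK32


def muls_16x16(a, b):
    """Signed 16x16->32 multiply: the low 16 bits of each operand are treated as signed."""
    return (to_signed16(a) * to_signed16(b)) & MASK32
```

  With these definitions, 70000*3 gives 210000 as a 32-bit product but 13392 through the 16-bit multiply,
because 70000 does not fit in 16 bits. Likewise 0xFFFF*2 gives 0x1FFFE unsigned and 0xFFFFFFFE signed.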

  NOWUT SH2 allows at most 12 dword parameters on the stack and 32 dword local variables at a time. (Bytes
and words are not allowed.)

  Local variables in different functions that have the same name must have the same default size.

  Taking the address of a stack variable is not valid. Likewise, a stack variable can't be indexed.

  When a calculation is used as an operand for a NOWUT instruction that involves comparison (i.e. IFGREATER,
IFLESS, WHILEGREATER, WHILELESS) the value is assumed to be unsigned, regardless of any components of that
calculation. If signed comparison is desired, the second operand should be marked as signed.
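
  Why the signedness of a comparison matters can be shown with a small model. The Python sketch below is not
NOWUT code and its function names are hypothetical; it compares the same two 32-bit patterns once as
unsigned and once as signed values.

```python
def to_signed32(x):
    """Interpret the low 32 bits of x as a signed value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def greater_unsigned(a, b):
    """a > b with both operands taken as unsigned 32-bit values."""
    return (a & 0xFFFFFFFF) > (b & 0xFFFFFFFF)


def greater_signed(a, b):
    """a > b with both operands taken as signed 32-bit values."""
    return to_signed32(a) > to_signed32(b)
```

  A result of -1 is stored as 0xFFFFFFFF, so it compares as greater than 1 when treated as unsigned and as
less than 1 when treated as signed. For small positive values both comparisons agree.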

  Right-shifts do not maintain the sign bit unless the shift count is marked as signed. E.g.:

        signedvar=_ shr 3.sb                ; maintains the sign bit
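
  The difference between the two kinds of right-shift can be modelled as follows. This Python sketch is not
NOWUT code; the function names are hypothetical and a 32-bit value with a shift count of 0 to 31 is
assumed.

```python
def shr_logical(x, count):
    """Right-shift that fills the vacated top bits with zero."""
    return (x & 0xFFFFFFFF) >> count


def shr_arithmetic(x, count):
    """Right-shift that copies the sign bit into the vacated top bits."""
    x &= 0xFFFFFFFF
    if x & 0x80000000:
        x -= 0x100000000          # reinterpret as a negative number
    return (x >> count) & 0xFFFFFFFF
```

  Shifting 0xFFFFFFF0 (-16) right by 3 gives 0x1FFFFFFE with the logical shift and 0xFFFFFFFE (-2) with the
arithmetic one. For non-negative values the two agree.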

  Signed right-shifts in 8086 mode only operate on the bottom 16 bits.

  The maximum number of symbols supported by the compiler (and the corresponding memory allocation) is
specified in the source code of the compiler, as is the maximum number of fixups. Modify these lines and
recompile if needed:

        const maxsymbols,2048
        const maxfixups,8192

  Maximum line length is 255 characters. The maximum number of labels+statements+operands on a line is 20.

  When a RETURNEX 0 is compiled by NO68, it results in an ADD 0 instruction (quick form), which actually adds
eight. Then bad things happen. (Use RETURN as a workaround.)
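
  The reason is the encoding of the 68000 quick-form add (ADDQ): its immediate lives in a 3-bit field where
1 to 7 stand for themselves and 0 stands for 8. The Python sketch below is not taken from NO68; the
function names are hypothetical and it only models that field.

```python
def addq_decode(field):
    """Value the CPU adds for a 3-bit quick-form field (a field of 0 means 8)."""
    if not 0 <= field <= 7:
        raise ValueError("field must be 0..7")
    return 8 if field == 0 else field


def addq_encode(value):
    """Return the 3-bit field for value, or None if there is no quick form.

    Only 1..8 can be encoded; 0 has no quick-form encoding at all.
    """
    if 1 <= value <= 8:
        return value & 7
    return None
```

  An encoder that simply stores the low three bits of 0 produces a field of 0, which the CPU reads as 8. A
check like the one in addq_encode would catch that case before the instruction is emitted.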

  The last line in a source file is ignored if it has no carriage return after it.
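
  A source file can be checked for this before compiling. The Python sketch below is a hypothetical helper,
not part of the NOWUT distribution. It assumes CR LF line endings, as is usual for Win32 text files; pass a
different eol if the file uses another convention.

```python
def ends_with_line_break(data):
    """True if the source bytes end with CR or LF (which also covers CR LF)."""
    return data.endswith((b"\r", b"\n"))


def ensure_line_break(data, eol=b"\r\n"):
    """Append eol when the last line is unterminated; leave empty input alone."""
    if not data or ends_with_line_break(data):
        return data
    return data + eol
```

  Reading the file in binary mode, passing the bytes through ensure_line_break and writing them back leaves
an already terminated file unchanged.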

  BITS16/BITS32 shouldn't be used to switch modes within a BEGINFUNC...ENDFUNC structure as variables on the
stack would no longer be addressed properly.


----Changes from the last version----

Release 0.11:

  When I attempted to link a program with two OBJ files (using GoLink) I received a pile of errors about
duplicate symbol definitions. Some were irrelevant symbol names used by local variables and constants. These
have been given a null section/class in the OBJ so that they don't cause trouble. But I also received errors
about program labels that had been automatically generated by the compiler, even though they were not marked
as exports. I was not expecting this. As a workaround, the compiler now uses the name of the source file in
its automatically generated labels so that they won't be the same in different OBJs. (No more 0NOWUT0000)

  The DEF statement was added, for manually setting a default type. This only required a change to a data
table, and no changes to the code ;)

  Introduced the LINKBIN utility, which in addition to supporting two input files, also correctly generates
X68000 executables that have relocations that are separated by more than 64K.

  Fixed some bugs in NO68 that caused subtraction and division to (sometimes) produce a wrong result.

  Added some code to NOSH2 to optimize shifts (when the value is known at compile time) instead of always
using a loop.

  Corrected a few typos and made a few additions to this document.

  Included a DOS example program.


Release 0.11b (2019/1/19):

  An endianness issue with indexed symbols in initialized data was fixed in NO68 and NOSH2.

  LINKBIN can now build an Amiga program from two OBJ files (it was all screwed up before).


Release 0.12 (2019/2/24):

  x87 FPU instructions were added to NOWUT x86.

  NOWUT x86 parser can accept numbers with a decimal point and convert them to 32-bit floats.

  Fixed the NOWUT x86 assembler bug pertaining to AX/EAX ambiguity.

  The offending filename is displayed when a file specified by INCBIN fails to open.
                          
  Fixed shifts when value to be shifted was on the stack (NOSH2).

  Fixed large negative immediates (NOSH2).

  CALLEX function address can be a calculation (NOWUT x86 only).


Release 0.13 (2019/3/23):

  Bug fix in NO68 for exclusive-or, and for shifts of values on the stack.

  Changed the sh2divides routine, since the old one didn't work and could also corrupt a needed register.

  Added experimental 8086 support, including 16-bit style MODRM addressing and BITS16/BITS32 commands.

  Added genesis and doscom platform options to LINKBIN, as well as Genesis and 8086 example programs.

  Added a "maxarg" constant to NOWUT x86 to increase the number of allowed arguments on one line.

  Added single-operand versions of IMUL to the x86 assembler.

  Did some reorganization of x86 NOWUT source and tweaking of the generated code (now slightly more
compact). In the future I will probably roll 68000 support back into NOWUT x86 to take advantage of some
potential optimizations.