NOWUT v0.34 - a programming language and compiler

At this stage of development, both the language and compiler are incomplete. Errors may not be caught, bugs may bite (see below for list of known bugs), and code will be suboptimal. However, NOWUT can successfully compile itself, as well as several demo programs.

The compiler is licensed under the GPL (see COPYING). (Example programs included in the archive should be considered public domain unless stated otherwise.)

http://www.hyakushiki.net/nowut.htm
http://www.hyakushiki.net/anachro.htm
damage_x@hyakushiki.net

----Contents----

1) About the NOWUT compiler
   a. compilation
   b. platform-specific code
   c. linking
2) About the NOWUT language
3) Example function
4) Symbols, labels, variables, constants
5) Operands
6) Calculation/assignment
7) NOWUT language statements/instructions
8) Functions/procedures
9) Internal assembler
   a. x86 instruction set (and 8086 mode peculiarities)
   b. 68K instruction set
   c. SH2 instruction set
   d. MIPS instruction set
   e. ARM instruction set
   f. Z280 instruction set
10) Known bugs and limitations
11) Changes from the last version

----About the NOWUT compiler----

The NOWUT compiler is a program written in NOWUT, which can now run on Win32, DOS, X68000, Amiga, EmuTOS, and i386 Linux. It produces a COFF object file as output. By default, the OBJ files contain exactly one each of BSS, data, and code sections, plus a ".drectve" section which can be used to pass dynamic linking info to the linker. The SECTION statement can now be used in a source file to specify additional sections with custom names and attributes.

----Compilation----

Command line: NOWUT platform [-one] [-lst] file

The input file is assumed to have the extension .NO, therefore FILE.NO will be read and FILE.OBJ will be written.

Since release 0.30, the platforms available in the compiler depend on which CPU modules were linked when the compiler itself was built.
It is possible to omit modules to produce a smaller build of NOWUT, mainly for the sake of staying within real-mode DOS memory limits. Starting NOWUT with no command line arguments will display a list of which CPU modules are present.

The CPUX86 module supports these platforms:

386 - generates x86 32-bit code, outputs a standard COFF object file which can be fed to GoTools GoLink to create a PE executable, or to LINKBIN to create an ELF executable.

8086tiny - generates 8086-compatible code which is limited to a 16-bit address space (CS/DS/ES/SS all assumed to be set to the same value), outputs a nonstandard COFF file which can be fed to LINKBIN to create a .COM file or MZ executable. 32-bit operations are done using multiple 16-bit operations, which causes some code bloat.

8086 - similar to 8086tiny except that (non-stack) memory references are preceded by reloading the DS register (causes even more code bloat). This means that initialized data and BSS can now be larger than 64KB. Code size is limited to 65472 bytes. Stack size is determined by a constant within LINKBIN (default is 1KB).

  Pointers still use a 32-bit linear address space! However, anything beyond 2MB is not valid. The mapping between logical and physical addresses is determined by a segment lookup table located in the first 64 bytes of the code segment (32 words = 32 64KB segments = 2MB). This table is set up by the program itself or by the PIODOS platform code. When it is set up by PIODOS, the first 640KB is relative to the beginning of the program (minus 64 bytes) and 640K-1M is mapped to absolute addresses (ie. $A0000 is VGA buffer). The table entries for 1MB-2MB are unused but left open to be used for addressing other potentially useful segments (eg. BIOS data area, DOS environment, etc.).

  Be sure to keep your words/dwords aligned so they don't straddle a 64KB boundary.

  Output can only be used to create an MZ executable.
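The 8086-platform pointer scheme can be illustrated with a small Python model (Python is used here only as neutral pseudocode; this is not NOWUT code and not how the compiler implements it). It assumes that each word in the lookup table holds a real-mode segment value, which the text above does not state explicitly. The table contents, the load segment $1000, and the helper names `translate` and `straddles` are hypothetical, and the 64-byte adjustment applied by PIODOS is not modeled.

```python
# Model of the 8086-platform pointer scheme: a 32-entry table maps each
# 64KB slice of the 2MB linear space to a segment value.
# ASSUMPTION: each table word is a real-mode segment (paragraph number).

TABLE_ENTRIES = 32   # 32 words in the first 64 bytes of the code segment
SLICE = 0x10000      # each entry covers 64KB of linear address space


def translate(table, linear):
    """Return (segment, offset) for a linear pointer below 2MB."""
    if not 0 <= linear < TABLE_ENTRIES * SLICE:
        raise ValueError("pointers beyond 2MB are not valid")
    return table[linear >> 16], linear & 0xFFFF


def straddles(linear, size):
    """True if a size-byte access at 'linear' crosses a 64KB boundary."""
    return (linear >> 16) != ((linear + size - 1) >> 16)


# Hypothetical table for a program loaded at segment $1000, with the
# 640K-1M range mapped to absolute addresses.
table = [0x1000 + 0x1000 * i for i in range(10)]   # first 640KB, program-relative
table += [0xA000 + 0x1000 * i for i in range(6)]   # $A0000-$FFFFF, absolute
table += [0] * 16                                  # 1MB-2MB left unused

segment, offset = translate(table, 0xA0000 + 320)
physical = segment * 16 + offset                   # real-mode address arithmetic
```

The `straddles` check shows why alignment matters: a word stored at linear address $FFFF would need its two bytes fetched through two different table entries, which a single DS reload cannot do.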
The internal x86 assembler has been expanded to include all instructions of the Pentium MMX. However, a few mnemonics have changed and a nonstandard syntax is used. It's possible to mix 16-bit and 32-bit code within one program by using BITSxx to switch modes, but this hasn't been tested much.

The CPU68K module supports these platforms:

68000 - generates plain 68000 code and outputs a standard(-ish?) COFF file. This can be fed into LINKBIN to create Amiga, Genesis, .PRG, or X68000 executables.

The internal 68K assembler now handles all 68000 and 68010 instructions and all of their addressing modes. It also uses a nonstandard syntax.

The CPUSH module supports these platforms:

sh2 - generates SuperH code and outputs a standard(-ish?) COFF file. This can be fed into LINKBIN to create Sega 32X or Saturn binaries.

sh4 - essentially the same as SH2, but with little-endian byte order. Can be fed into LINKBIN to generate a Dreamcast binary. (Hint: use the SCRAMBLE utility, then add an IP.BIN, and use DIR2BOOT to produce a Dreamcast disc image.)

The internal SuperH assembler handles all SH2 instructions except one, and uses a nonstandard syntax.

The CPUMIPS module supports these platforms:

mips - generates VR4300-compatible (MIPS III) code and outputs a standard(-ish?) COFF file. This can be fed into LINKBIN to create Nintendo 64 binaries. (Hint: use rn64crc to add the correct checksum.)

mipsle - uses little-endian byte order (for PIC32)

The internal MIPS assembler uses a number of consolidated mnemonics, and a nonstandard syntax. It supports 32- and 64-bit instructions of the VR4300, but not floating-point. Additional instructions have been added from the MIPS32 Release II specification.

The CPUARM module supports these platforms:

armle - generates 32-bit ARM code and outputs a standard(-ish?) COFF file. This can be fed into LINKBIN to create a GBA ROM image or RISC OS executable.
armbe - uses big-endian byte order

The internal ARM assembler supports most of ARM4, using a slightly modified set of mnemonics and a nonstandard syntax.

The CPUZ280 module supports these platforms:

z280 - supports assembly only (does not compile NOWUT code) and outputs a nonstandard COFF file. This can be fed into LINKBIN to create a flat binary using the DOSCOM option. (See the assembly section for details.)

NOWUT currently supports these command line switches:

-one - makes the compiler do only one pass instead of two. Doing only one pass means that jump/branch instructions in the generated code will be long versions. Does not work if the source file has a non-default number of sections.

-lst - makes the compiler generate a .LST file.

-opt - toggles compiler/assembler optimizations. Normally they are on by default and this would then turn them off. (useful for debugging purposes)

As of version 0.27 the language is beginning to have floating-point support, but only for the '386' platform. The x86 assembler supports x87 FPU instructions and the parser can convert decimal values into their single-precision FP representation (with some limits on range and precision). Note that FP code needs to have a stack frame set up, so it can only be used after a BEGINFUNC statement. (And you might need to stick an FINIT somewhere.)

----Platform-specific code----

While NOWUT is linked with CPU modules to allow it to target different platforms, it is also linked with a platform-I/O module to allow it to run on different platforms. Other programs can also use a platform-I/O module. To do so, the program must call INITPLATFORM upon starting. This function will return the address of a (null-terminated) command line parameters string (if any).
These routines are currently available in each module:

  initplatform
  endprogram
  printel        (explicit length)
  printnt        (null-terminated)
  printhex16
  printhex8
  printhexr
  printhex
  fileopen
  filecreate
  fileclose
  fileread
  filewrite
  fileseek       (currently limited to 32-bit file offset)
  fileskread     (seek and read)
  fileskwrite    (seek and write)
  filegetsize
  pioquerytimer  (returns a count in milliseconds. accuracy depends on platform.)

These are the platform-I/O modules that are included:

PIOAMI - Amiga version. Records the initial stack pointer and opens dos.library during init. Closes it upon exiting as well as closing any open files to prevent memory leaks. Converts backslashes in file names to forward slashes.

PIODOS - 8086 (DOSEXE) version. Sets up segment table during init. (Does not work with 8086tiny or .COM)

PIOGEM - EmuTOS/GEMDOS version. Adjusts memory allocation during init.

PIOLNX - 386/Linux version. Joins command line arguments into one big string during init. Converts backslashes in file names to forward slashes.

PIOWIN - 386/Win32 version. Gets a console handle during init. Skips over the program name part of the command line. Can also be used for 32-bit DOS programs with the WDOSX Win32 wrapper.

PIOX68 - X68000 version.

Code that aims to be cross-platform should always take endianness and memory alignment into account. On the 68000, words and dwords must be aligned to 2-byte boundaries. On SH2 and MIPS they must be aligned to 2-byte or 4-byte boundaries, respectively. NOWUT includes ALIGNW and ALIGND statements for this purpose.

The statements LOADBIG and LOADLITTLE are provided to read words/dwords with a particular endianness on any CPU. (Note that the 386 LOADBIG code uses the BSWAP instruction, which is only available on 486+)

----Linking----

OBJ files produced in 386 mode can be linked using GoTools GoLink to make a Win32 executable. Using the /base switch (eg. /base 00400000) causes GoLink to generate relocation data.
These executables can then also be used with Win32s (Windows 3.x) and may be used in a DOS environment with the WDOSX stub (a DPMI host which implements a subset of Win32 functions).

GoLink requires a list of applicable DLLs as command line arguments. I use this command to link my Win32 stuff:

  golink %1.obj kernel32.dll user32.dll gdi32.dll winmm.dll /console

As of NOWUT 0.26, it is possible to specify library names in the source code with LINKLIBFILE, which removes the need for specifying them on the command line.

LINKBIN is provided to transform one or more OBJ files, for targets other than Win32, into useful executable formats. LINKBIN has been tested with up to seven input files at a time. Modify the MAXFILES parameter in the source to enable more than this. (In theory, it should work...)

LINKBIN version 0.30 and recent versions of GoLink include functionality that allows sections in COFF object files to be arranged in the executable according to an alphanumeric index. This is necessary to build the new Modular-NOWUT:

- normal data sections (named ".data" in each OBJ) are concatenated as usual
- sections named ".data$4" are placed -after- the normal ones
- a section named ".data$x" is placed -after- those (because 'x' has a higher ASCII value than '4' does)

example command line for linking Win32 NOWUT:

  golink nowut.obj cpux86.obj cpu68k.obj cpush.obj cpumips.obj cpuarm.obj piowin.obj /console

example command line for linking real-mode DOS NOWUT with only x86 and 68k support:

  linkbin dosexe nowut cpux86 cpu68k piodos

LINKBIN version 0.33 has command line options to override the default section base addresses. These are -oc, -od, and -ob for code, data, and BSS, respectively. This is useful for generating a binary that needs to run at a certain address.
These formats are supported by LINKBIN:

genesis, 32x, 32x/68k - Sega Genesis/MD and 32x ROM images - SH2 or 68K side
amiga                 - Amiga 68K "hunk" format
doscom, dosexe        - 8086 PC .COM and .EXE programs
dcast                 - SH4 binary based at $8C010000
elf386                - an i386 Linux executable (output files need the filesystem executable flag set with chmod +x)
gba                   - a Game Boy Advance ROM image
n64                   - a Nintendo 64 ROM image, automatically padded to at least 1MB for boot code requirements (Checksum is not calculated! Use rn64crc to do this.)
optrom                - 8086tiny ISA adaptor ROM (hasn't been tested yet)
pic32                 - little-endian MIPS binary with flash ROM at $BD000000 and RAM at $A0000000
prg                   - runs under EmuTOS
riscos                - runs under RISC OS
saturn, satsnd        - Sega Saturn SH2 binary (load and execute at $06022000) and 68K sound CPU.
x68                   - Sharp X68000 executable (Human68K)

The Sega 32X hardware is inactive when the system powers on. The 68K has control, and must enable the 32X. When generating a 32X ROM image with SH2 code, a stub file (68KPART8.32X) occupies the first 4KB of the final image. The stub performs initialization and hands control over to the SH2. It also polls the controller ports and passes the data to the SH2 through shared registers. The source code for the stub is 68KPART8.NO.

The 32X master SH2 begins execution at the beginning of its code section. The slave SH2 starts in an idle loop contained within the stub file. A dword write to address $26001020 will cause it to jump to that location (the address in the dword that is written) so the second CPU can be utilized.

N64 binaries also begin with a header and boilerplate boot code. This is linked from N64STUB.BIN

----About the NOWUT language----

The goal was to combine certain aspects of assembly and high-level languages in a different way than what has been done before.
NOWUT borrows these ideas from assembly:

  simplistic syntax consisting of instructions followed by operands
  manual layout of initialized data, uninitialized data, and data structures
  no enforcement of data types

It borrows these ideas from HLLs:

  avoids (mostly) being CPU specific
  no micro-management of CPU register usage
  calculations can be specified in a form similar to mathematical notation (assignment)

NOWUT also handles inline assembly code with a nonstandard syntax.

I should mention that the name NOWUT is an acronym which expands to "No One Will Use This." I figure that if anyone else shared my taste in programming languages, I wouldn't have been forced to create my own! I suspect that NOWUT may not become extremely popular.

----Example function in NOWUT----

; this line is a comment. comments are preceded by ' or ; characters

examplefunc:                   ; examplefunc is a label (address)
beginfunc param1.d,param2.d    ; this function receives two parameters
                               ; they will be referred to as param1 and param2
                               ; and are dwords (32-bit words)
localvar xx.d                  ; a local dword variable will be added to the stack
xx=param1+1                    ; here is an example of assignment
countdown xx                   ; this begins a simple type of loop
param2=_ shl 1                 ; an underscore refers back to the left side of
                               ; the equals sign.
                               ; param2 becomes (itself) shifted left one bit
nextcount                      ; this is the end of the "countdown" loop
endfunc param2                 ; the function's return value will be equal to
                               ; the value of param2
returnex 8                     ; this causes program flow to return to the caller.
                               ; it also removes 8 bytes from the stack (important)
                               ; which had been occupied by param1 and param2

----Symbols, labels, variables, constants----

Labels, variables, and constants are considered symbols. Symbol names are currently limited to 64 characters. (Currently, long symbol names may cause internal buffers to overflow and generate an error message.)
Symbol names are not case-sensitive, though a particular case can be written to the output file for the benefit of case-sensitive linkers (the EXTERN statement is used for this purpose). Symbol names must contain at least one letter, and may contain other characters EXCEPT these:

  ' ; : ! $ & ( ) [ ] , . ? + - * / = " >

For future compatibility, it's probably best that no ASCII characters below $40 be used. Symbol names that are the same as CPU register names should not be used.

Normally, labels are defined with a colon. An exclamation point is used instead of a colon to define an exported symbol (can be referred to in other modules). For instance, when using the GoTools GoLink linker, the program's entry point should be defined like so:

  start!

(The program entry point for other platforms is the beginning of the code section.)

Every label has an address associated with it, although the actual address is not determined until linking or upon the executable code being loaded by the operating system. The label "examplefunc" can be used as the target of a jump/branch/call (in assembly), a goto/gosub/callex (in NOWUT), it can have its address used in a calculation (eg. xx=examplefunc+40), or it can be used to refer to memory contents (eg. xx=examplefunc.d).

Labels and global variables are actually the same thing, except that labels used to refer to either initialized data or uninitialized data will generally be defined with an appropriate default type. However, this is not required, and when a symbol is referenced the default type can be overridden.
  exampleaddr.a:     ; exampleaddr will be handled as an address
  exampleaddr:       ;  -  -  -  -  address (same as .a)
  exampleaddr.b:     ;  -  -  -  -  byte value
  exampleaddr.sb:    ;  -  -  -  -  signed byte value
  exampleaddr.w:     ;  -  -  -  -  word value (16-bit)
  exampleaddr.sw:    ;  -  -  -  -  signed word value
  exampleaddr.d:     ;  -  -  -  -  dword value (32-bit)
  exampleaddr.sd:    ;  -  -  -  -  signed dword value
  exampleaddr.fd:    ;  -  -  -  -  floating-point value (32-bit)

The default type is used when a label is referenced without any type tag (ie. no dot anything).

  xx=exampleaddr     ; operation depends on the default type
  xx=exampleaddr.a   ; always loads the address
  xx=exampleaddr.d   ; always loads a dword

The default type is determined when a label is FIRST referenced (not necessarily when it is defined!). Because the compiler only does one pass of the source code, it is therefore recommended to place any initialized or uninitialized data BEFORE the program code.

Recommended:

  sectionbss
  buffer.d:
  resd 16
  sectiondata
  dwordval.d:
  dd $1234ABCD
  sectioncode
  buffer(4)=dwordval

Problematic:

  sectioncode
  buffer.d(4)=dwordval   ; dwordval will be interpreted as an address
                         ; because its definition has not yet been read by the compiler
  sectiondata
  dwordval.d:
  dd $1234ABCD
  sectionbss
  buffer.d:
  resd 16

Alternatively, the DEF statement can be used at the beginning of a source file (regardless of section) to manually initialize a symbol's default type. This is especially useful when referencing symbols in another module.

Variables that are defined by a beginfunc or localvar statement are located on the stack and have some differences. The first is that the names may be used independently in multiple functions, and they have no relevance outside of the function in which they are defined. The second difference is that the address of a stack variable is not valid.

  localvar xx.d,yy.d
  xx=yy.a            ; this does NOT work**

This functionality might be added to a later version of the compiler.
But currently, the compiler passes variables to the internal assembler as-is, and the assembler simply alters the addressing mode to refer to the stack; it does not insert additional instructions (eg. an LEA) to determine the address.

**In fact, referencing the address of a stack variable is now allowed for 386 and 68000

Because the SH2 CPU only has register-indirect addressing with a 4-bit positive displacement, the SH2 version of the NOWUT compiler sets aside two registers for local variables. It limits the number of local variables to 32, and they must be dwords (signed or unsigned). Function parameters are limited to 12 and also must be dwords.

Signed and unsigned types are handled the same during many operations, but there are some where it is important to differentiate:

loading smaller data sizes:

  xx=array.b(5)      ; the byte will be zero-extended
  xx=array.sb(5)     ; the byte will be sign-extended
  xx=array.w(10)     ; the word will be zero-extended
  xx=array.sw(10)    ; the word will be sign-extended

comparisons:

  ifgreater xx.d,0,branchtarget    ; will branch unless xx is 0
  ifgreater xx.sd,0,branchtarget   ; will branch unless xx is 0 or negative

shifting right:

  xx=yy.d shr 1      ; top bit will become 0
  xx=yy.sd shr 1     ; top bit will become 0 (doh!)
  xx=yy shr 1.sb     ; top bit will remain the same

In previous versions of NOWUT, tagging the shift count as signed was the only way to do a signed shift-right because the evaluator uses the second operand to make the decision on whether to do a signed operation. A 'sar' operator has been added as an alternate way to explicitly choose the signed operation.

  xx=yy sar 1        ; top bit will remain the same

Constants are defined with the CONST statement:

  const secretvalue,3579545

References to the symbol (eg. secretvalue) will be replaced with the value. Only numeric values are allowed (although this includes ASCII values).
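The signed/unsigned distinctions listed earlier in this section (extension on load, comparison, and right shift) can be checked with a short Python model of 32-bit arithmetic. This is only a sketch of the arithmetic, not NOWUT code; all function names here are made up for the illustration.

```python
MASK32 = 0xFFFFFFFF


def zero_extend(value, bits):
    """Model of loading a .b/.w operand: upper bits become 0."""
    return value & ((1 << bits) - 1)


def sign_extend(value, bits):
    """Model of loading a .sb/.sw operand: upper bits copy the sign bit."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & MASK32


def to_signed(value):
    """Reinterpret a 32-bit pattern as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def shr(value, count):
    """Logical shift-right: top bit becomes 0."""
    return (value & MASK32) >> count


def sar(value, count):
    """Arithmetic shift-right: top bit stays the same."""
    return (to_signed(value) >> count) & MASK32


def greater_unsigned(a, b):
    return (a & MASK32) > (b & MASK32)


def greater_signed(a, b):
    return to_signed(a) > to_signed(b)


byte = 0xFF
as_unsigned = zero_extend(byte, 8)   # 0x000000FF
as_signed = sign_extend(byte, 8)     # 0xFFFFFFFF, i.e. -1
```

With the pattern $FFFFFFFF, `greater_unsigned(x, 0)` is true while `greater_signed(x, 0)` is false, which mirrors the difference between the two ifgreater examples.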
Numeric values containing a decimal point which is followed by additional digits (in contrast with a data size/type tag) are assumed to represent floating-point data. Without the decimal point, it is assumed to be integer. Code may be generated to convert between the two depending on the operation being performed, but this behavior can be overridden using tags.

  x.d=1        ; source is integer, destination is integer, no conversion needed.
               ; x becomes $00000001
  x.d=1.0      ; 1.0 begins as $3F800000 (in FP32 form), but it is converted back to 1 before
               ; being written to a non-FP variable. x becomes $00000001
  x.d=1.0.d    ; 1.0 begins as $3F800000 (in FP32 form), and because of the .d tag it is NOT
               ; converted before being written. x becomes $3F800000
  x.d=1.fd     ; the value $00000001 is treated as FP32 data, it is converted to integer before
               ; being written to a non-FP variable. x becomes 0
  x.fd=1.0     ; source is FP, destination is FP, no conversion needed.
               ; x becomes $3F800000
  x.fd=1       ; 1 is converted to FP32 form before being written to an FP variable.
               ; x becomes $3F800000

----Operands----

Basically, everything that is accepted as part of an assignment/calculation or as an argument to an instruction is considered an operand. These include numeric values, strings, symbols, and combinations of such.
format in NOWUT   format in assembly   description

1234              1234                 decimal number               \
12.34             12.34                floating-point number        \
$1234             $1234                hex number                   \
0x1234            0x1234               hex number                   \ immediate
"a".b             "a".b                ASCII byte                   /
"ab".w            "ab".w               ASCII word                   /
"abcd".d          "abcd".d             ASCII dword                  /
constname         constname            a symbol defined with CONST  /
                  99000.h              high word of a value (8086 assembly only)

varname                                address or memory reference
                  varname              address
varname.a         varname.a            address
                  varname.h            high word of address (8086 assembly only)
varname.b                              memory reference (byte variable)
varname.sb                             memory reference (signed byte variable)
varname.w                              memory reference (word variable)
varname.sw                             memory reference (signed word variable)
varname.d                              memory reference (dword variable)
varname.sd                             memory reference (signed dword variable)
varname.fd                             memory reference (FP32 variable)
                  [varname].b          memory reference (byte variable)
                  [varname].w          memory reference (word variable)
                  [varname].d          memory reference (dword variable)
                  [varname].h          memory reference (high word, 8086 assembly only)
                  [varname].q          memory reference (x87, MIPS 64-bit)

                  reg                  a CPU register
                  reg.b                a byte CPU register (68K only)
                  reg.w                a word CPU register (68K only)
                  reg.d                a dword CPU register (68K only)
                  [reg]                memory reference (address contained in reg)
                  [reg].b              memory reference (byte at address in reg)
                  [reg].w              memory reference (word at address in reg)
                  [reg].d              memory reference (dword at address in reg)
                  [reg].q              memory reference (64-bit, x87 floating-point only)
                  [reg+]               memory reference with post-increment (68K)
                  [reg+].b             memory reference with post-increment (68K and SH2)
                  [reg+].w             memory reference with post-increment (68K and SH2)
                  [reg+].d             memory reference with post-increment (68K and SH2)
                  [reg-]               memory reference with pre-decrement (68K)
                  [reg-].b             memory reference with pre-decrement (68K and SH2)
                  [reg-].w             memory reference with pre-decrement (68K and SH2)
                  [reg-].d             memory reference with pre-decrement (68K and SH2)
                  [reg+xx].x           other indirect addressing modes (CPU dependent)

[varname].b                            indirect memory reference (byte variable)
[varname].sb                           indirect memory reference (signed byte variable)
[varname].w                            indirect memory reference (word variable)
[varname].sw                           indirect memory reference (signed word variable)
[varname].d                            indirect memory reference (dword variable)
[varname].sd                           indirect memory reference (signed dword variable)
[varname].fd                           indirect memory reference (FP32 variable)

varname(xx)                            indexed address or memory reference
varname.a(xx)                          indexed address
varname.b(xx)                          indexed memory reference (byte)
varname.sb(xx)                         indexed memory reference (signed byte)
varname.w(xx)                          indexed memory reference (word)
varname.sw(xx)                         indexed memory reference (signed word)
varname.d(xx)                          indexed memory reference (dword)
varname.sd(xx)                         indexed memory reference (signed dword)
varname.fd(xx)                         indexed memory reference (FP32)

"abcxyz123"                            string
"abcxyz123".a                          address of a string
_                                      "itself" (left side of an assignment)

Note that a blank operand in a series of operands separated by commas will be interpreted as 0.

  callex result.d,somefunction,param1,,,param4   ; zeros are passed as 2nd and 3rd parameters

Also: indexed symbols with a number as index can now be handled the same as plain symbols (ie. the addition is done during linking instead of at run time).

  base.d(4)=0        ; both statements generate the same number of instructions
  base.d=0           ;

But don't use indexed symbols for jumps/calls/branches as this is not guaranteed to work:

  goto place(8)      ; outcome is uncertain

There are some differences between operand formats in NOWUT vs. the internal assembler. Some CPU instructions make indirect memory references while using the same encoding as instructions which access memory without the indirection (and use the same syntax in their standard assembly languages), hence these ambiguities persist here.

In assembly mode, memory references always use square brackets around a register name, address, or symbol. In NOWUT language mode, square brackets are only used for indirect memory references.
Also in NOWUT language mode, calculations are currently not allowed inside of square brackets.

  [address+48].d=65                      ; this does NOT work
  tempvar=address+48 > [tempvar].d=65    ; use this instead
  ea.d(address+48)=65                    ; or this

The 'ea' special symbol works like other indexed symbols but has a base address of zero. It is more flexible than an indirect memory reference but it counts as a 'calculation' operand type, so it can't be used in every place that a memory reference can.

Plain strings are only used for initialized data, the EXTERN statement, and the INCBIN statement. However, the address of a string can be used as with any other address, with the string data being dumped at the end of the section.

  messageptr="stuff happens".a         ; pointer to a null-terminated string
  messageptr2="stuff happens"r.a       ; the letter "r" adds a CR/LF
  messageptr3="line 1"r"line 2"rr.a    ; multiple CR/LFs are possible, even inside a string
  widestrptr="hello world"w.a          ; the letter "w" at the end will insert a 0 byte after each
                                       ; character for Win32 APIs that use UCS-2 / UTF-16 encoding

Strings that will be appended to the end of the section are buffered during compilation. The number of strings that can be buffered is limited by the MAXSTRINGS constant in NOWUT.NO.

There is currently no other support for strings in the NOWUT language.

register names:

x86 - eax ecx edx ebx esp ebp esi edi
      ax cx dx bx sp bp si di
      al cl dl bl ah ch dh bh
      es cs ss ds fs gs
      cr0 cr2 cr3 cr4
      dr0 dr1 dr2 dr3 dr6 dr7
      st0 st1 st2 st3 st4 st5 st6 st7
      mm0 mm1 mm2 mm3 mm4 mm5 mm6 mm7

Since x86 registers have an inherent size, no size tag is needed.

68K - d0 d1 d2 d3 d4 d5 d6 d7
      a0 a1 a2 a3 a4 a5 a6 a7
      ccr sr pc

Data registers may be byte, word, or dword size. Address registers may be word or dword.

SH2 - r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15
      macl mach sr gbr vbr pr pc

SH2 registers are all 32 bits, no size tag is needed.
MIPS - r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15
       r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26 r27 r28 r29 r30 r31

MIPS registers are either 32 or 64 bits depending on the instruction. No size tag is needed.

ARM - r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15
      spsr cpsr

ARM registers are all 32 bits, no size tag is needed.

Z280 - b c d e h l a ixh ixl iyh iyl
       bc de hl sp ix iy i r pc af usp

All Z280 registers have an inherent size, no size tag is needed.

----Calculation/assignment----

A calculation is a series of operands and operators that cause a value to be computed at runtime. This includes steps taken to calculate the address of a memory reference. Assignment causes a value to be stored in a memory location. Assignment uses the equals sign.

  somevar=foo+1*100

The value of foo is loaded, 1 is added, it's multiplied by 100, and the result is stored in somevar. Needless to say, somevar should be a memory reference. If its default type is .a (an address) then compilation will fail with an error message.

Operations are performed in order from left to right, except where parentheses are used to specify that a different order should be used. When the compiler encounters parentheses it pushes the current value on the stack, performs the calculation inside the parentheses, then pops the stack to continue. Because the compiler doesn't attempt to optimize such things, manually re-ordering operations to avoid parentheses results in sleeker code.

  somevar=foo+(bar*100)   ; OK, but suboptimal
  somevar=bar*100+foo     ; better

Currently, parentheses may be nested up to 20 levels deep.
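Because the strict left-to-right rule differs from ordinary operator precedence, a small Python model may help in predicting results. This is a sketch of the evaluation order only, not the compiler's actual code: it handles just +, - and *, ignores 32-bit wraparound and type tags, and the function name `evaluate` is made up for the illustration.

```python
import re

# Only three operators are modeled; division and the logical operators
# are left out to avoid guessing at their exact semantics.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def evaluate(expr, variables):
    """Evaluate expr strictly left to right, with no operator precedence."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[-+*()]", expr)
    stack = []               # one saved (value, operator) pair per open parenthesis
    value, op = None, None
    for tok in tokens:
        if tok == "(":
            stack.append((value, op))   # push the current value, start afresh
            value, op = None, None
        elif tok == ")":
            inner = value
            value, op = stack.pop()     # pop and continue where we left off
            value = inner if op is None else OPS[op](value, inner)
            op = None
        elif tok in OPS:
            op = tok
        else:
            operand = int(tok) if tok.isdigit() else variables[tok]
            value = operand if op is None else OPS[op](value, operand)
            op = None
    return value


# foo+1*100 means (foo+1)*100 here, not foo+(1*100)
result = evaluate("foo+1*100", {"foo": 2})
```

With foo=2 and bar=3, both `foo+(bar*100)` and `bar*100+foo` give the same value, but only the first one needs the stack.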
supported operators:

  +     addition
  -     subtraction
  *     multiplication
  /     division
  and   logical AND
  or    logical OR
  shl   logical shift-left
  shr   logical or arithmetic shift-right
  sar   arithmetic shift-right
  xor   logical exclusive OR

When mixing floating-point and non-floating-point operands in a calculation (on an FP-capable platform) non-FP values are generally converted to FP and the operations take place in the FPU. However, this is not true for any of the logical operations. These are all performed using normal ALU instructions. In the case of logical ops, FP data is used in its raw form without any conversion from FP to integer.

The parser handles operands and operators as a unit, and as a side effect of this, extra spaces should not be inserted between them.

  somevar=foo +10        ; bad
  somevar=foo+ 10        ; acceptable
  somevar=foo shl 2      ; good
  somevar=foo  shl  2    ; bad

The underscore operand refers to the target of an assignment. It is particularly handy when the target involves a complex address calculation that would otherwise need to be repeated.

  array(index shl 4+12)=array(index shl 4+12)+20   ; OK, but ugly and slow
  array(index shl 4+12)=_+20                       ; better

Indices on symbols are always calculated in terms of bytes. If you want to access a numbered word in an array of words, be sure to use a shift in your index.

  array.d(0)=$00000001     ; the first dword
  array.d(4)=$00000002     ; the second dword
  array.d(N shl 2)=n       ; the Nth dword

Hence, mixing of bytes, words, and dwords in a data structure is easily accomplished. But unaligned memory accesses are also possible, and may not be desirable. The same memory location can also be accessed with differing data sizes at different times; however, this may cause unexpected results if endianness is not taken into account.

Calculations are sometimes accepted as operands in place of memory references or numeric values. See the next section for details.
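The byte-based indexing and the endianness caveat can be illustrated with Python's struct module standing in for target memory. This is only a sketch, not NOWUT code; the helper names `store_dword` and `load_word` are made up for the illustration.

```python
import struct


def store_dword(mem, byte_index, value, little_endian=True):
    """Model of array.d(byte_index)=value on a little- or big-endian CPU."""
    struct.pack_into("<I" if little_endian else ">I", mem, byte_index, value)


def load_word(mem, byte_index, little_endian=True):
    """Model of array.w(byte_index): a 16-bit read at a byte offset."""
    fmt = "<H" if little_endian else ">H"
    return struct.unpack_from(fmt, mem, byte_index)[0]


# Indices are byte offsets, so dword number n lives at offset n shl 2.
memory = bytearray(16)
store_dword(memory, 0, 0x00000001)        # dword at offset 0
store_dword(memory, 4, 0x00000002)        # dword at offset 4
n = 3
store_dword(memory, n << 2, 0x12345678)   # dword at offset 12

# Reading a word from the same offset gives a different half of the dword
# depending on byte order.
low_half = load_word(memory, 12)          # little-endian: low 16 bits

big = bytearray(4)
store_dword(big, 0, 0x12345678, little_endian=False)
high_half = load_word(big, 0, little_endian=False)   # big-endian: high 16 bits
```

So a word read at the same offset as a dword returns $5678 on a little-endian target (eg. 386) but $1234 on a big-endian one (eg. 68000), which is the kind of surprise the paragraph above warns about.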
----NOWUT language statements/instructions----

The following is a list of recognized instructions in NOWUT language mode:

instruction   number of   type of operand           description
              operands

ALIGNW        none                                  adds a byte if necessary to ensure an even address
ALIGND        none                                  adds bytes if necessary to ensure an address divisible by 4
ALIGNQ        none                                  adds bytes if necessary to ensure an address divisible by 8
ALIGN16       none                                  adds bytes if necessary to ensure an address divisible by 16
ASM           none                                  switches to assembly language mode
BEGINFUNC     variable    symbol                    pushes registers on the stack and optionally defines
                                                    any parameters that were provided by the caller
              Note: parameters will be listed in the reverse order compared to CALLEX
BITS16        none                                  switches x86 compilation mode to 8086
BITS16S       none                                  switches x86 compilation mode to 8086tiny
BITS32        none                                  switches x86 compilation mode to 386
CALLAM        1           memory reference          68000 only: used for Amiga OS system calls.
              6           calculation               First parameter is the return value. Second is the
                                                    base address. Third is the function offset.
                                                    The remaining parameters are values to be loaded in
                                                    registers D0, D1, A0, A1
              example:
                callam dosbase.d,[4].d,-552,0,0,0,"dos.library".a   ; Exec - openlibrary
CALLEX        1           memory reference          pushes any parameters on the stack, calls
              1           calculation               a function, and optionally stores a
              variable    calculation               return value
              examples:
                callex result.d,functionname                ; function call that receives a return value
                callex ,functionname                        ; function call without return value
                callex ,functionname,param1.d,foo*320+bar   ; function call with two parameters
                callex ,jumptable.d(x shl 2)                ; calculating a function address
              Note: parameters are pushed on the stack from left to right and will appear
                    in the reverse order compared to BEGINFUNC
CONST         1           symbol                    associates a value with a symbol
              1           immediate
              example:
                const bufsize,16384   ; occurrences of bufsize are replaced with 16384
COPYBYTES     1           calculation               copies an array of bytes from source address
              1           calculation               (first parameter) to destination address
              1           calculation               (second parameter). Third param is count.
              example:
                copybytes string1.a,string2.a+someoffset,foo-1
              Note: If the count parameter is equal to zero then no bytes are copied.
                    Source and destination memory areas should not overlap.
                    Count is limited to 32768 on 8086, and 65536 on 68000.
COUNTDOWN     1           memory reference          marks the beginning of a loop
              example:
                countdown xx
                ; [...]         ; the code inside will execute xx times, and
                nextcount       ; afterward xx will be 0
DB            1 or more   immediate, string         initialized data
DW            1 or more   immediate, string         initialized data
DD            1 or more   immediate, string,        initialized data
                          address, indexed address
DEF           1 or more   symbol                    provides the default type for a new symbol
                                                    (does not change an existing default)
              example:
                def var1.d,value2.b   ; tells the compiler that var1 will default to
                                      ; dword size, and value2 to byte size
END           none                                  jumps to the ENDPROGRAM routine in a platform module
ENDFUNC       0 or 1      immediate, memory         invalidates local variables and stack-based
                          reference                 parameters, pops the stack, and optionally
                                                    loads a return value
EXTERN        1 or more   string                    specifies a case-sensitive alias for symbol names
                                                    that will be written to the output file for the
                                                    benefit of case-sensitive linkers
FILLPATTERN   1           immediate                 specifies the data bytes that will appear as
                          (bigend dword)            padding when ALIGNx or RESx (in sections other
                                                    than BSS) are used
FLUSHIMM      none                                  places an immediate data 'dump' (if any) at the
                                                    current address
GOSUB         1           immediate, address        calls a subroutine without changing the stack frame
GOTO          1           immediate, address        jumps to an arbitrary location
IFCPU         1           string (same as           skips subsequent statements on the same line
IFCPUNOT                  platform name)            if the condition is not true
              example:
                ifcpu "68000" > const someparam,512       ; when compiling for 68000, someparam becomes 512
                ifcpunot "68000" > const someparam,1024   ; otherwise, it becomes 1024
IFEQUAL       1           calculation               compares the first and second operands,
              1           immediate, memory         then jumps to the location specified in
                          reference                 the third operand if they are equal
              1           address
              example:
                ifequal foo,15,someroutine   ; jumps if foo is equal to 15
IFGREATER     1           calculation               compares the first and second operands,
              1           immediate, memory         then jumps if the first is greater
                          reference
              1           address
              example:
                ifgreater bar+10,foo,someroutine   ; jumps if bar+10 is greater than foo
IFLESS        1           calculation               compares the first and second operands,
              1           immediate, memory         then jumps if the first is less
                          reference
              1           address
IFNOTGREATER  1           calculation               equivalent to less-than-or-equal
              1           immediate, memory
                          reference
              1           address
IFNOTLESS     1           calculation               equivalent to greater-than-or-equal
              1           immediate, memory
                          reference
              1           address
IFUNEQUAL     1           calculation               compares the first and second operands,
              1           immediate, memory         then jumps to the location specified in
                          reference                 the third operand if they are unequal
              1           address
INCBIN        1 or more   string                    loads initialized data from a file
              example:
                incbin "gamegfx.bin"   ; inserts the contents of GAMEGFX.BIN
LINKLIBFILE   1 or more   string                    adds the name of a dynamic link library to
                                                    the output file .drectve section
              example:
                linklibfile "kernel32.dll","user32.dll"
LOADBIG       1           memory reference          loads a big-endian second operand and changes
              1           immediate, memory         the byte order if necessary
                          reference
LOADLITTLE    1           memory reference          loads a little-endian second operand and changes
              1           immediate, memory         the byte order if necessary
                          reference
LOCALVAR      1 or more   symbol with tag           defines stack-based variables to be used within
                                                    a function (must come after BEGINFUNC)
NEXTCOUNT     none                                  decrements the variable specified by the associated
                                                    COUNTDOWN, then jumps to the beginning of the loop
                                                    if the result is not 0
RESB          1           immediate*                reserves the number of bytes specified
RESW          1           immediate*                reserves the number of words specified
RESD          1           immediate*                reserves the number of dwords specified
RESQ          1           immediate*                reserves the number of qwords specified
              Note: The count can now include addition, subtraction, and multiplication
                    (performed at compile time), as well as symbols defined with CONST.
                    An underscore can be used to refer to the current section offset.
              examples:
                resb 512-_             ; pads the section to a size of 512 bytes
                resd tablesize*2+1
RETURN        none                                  returns from a routine called by GOSUB
RETURNEX      1           immediate                 returns from a routine called by CALLEX and pops
                                                    the specified number of bytes from the stack
SECTION       1           string                    creates a custom section in the COFF output
              1           immediate (dword)         using the specified name and flags
              example:
                SECTION ".foodata",$C0300040
SECTIONBSS    none                                  marks the beginning of the BSS (reserved storage)
SECTIONCODE   none                                  marks the beginning of the code section
SECTIONDATA   none                                  marks the beginning of the data section
                                                    (initialized data)
              Note: section headers for the same section should not appear more than once
                    in a NOWUT program
SWAP          2           memory reference          exchanges data between two memory locations
USEFASTBASE   1           symbol                    enables an optimization on 68K, MIPS, and ARM
              1           0 or 1                    CPUs. (see assembler section)
                                                    Second parameter selects slot 0 or 1.
WEND          none                                  marks the end of a WHILE loop (jumps back to
                                                    the beginning)
WHILEEQUAL    1           calculation               compares the operands and jumps to the
              1           immediate, memory         corresponding WEND if they are unequal
                          reference
              example:
                WHILEEQUAL [pointer].b,0   ; if the memory location [pointer] contains a
                pointer=_+1                ; zero byte then the loop will execute, and it
                WEND                       ; will continue until a non-zero byte is found
WHILEGREATER  1           calculation               compares the operands and jumps to the
              1           immediate, memory         corresponding WEND if the first is not
                          reference                 greater
WHILELESS     1           calculation               compares the operands and jumps to the
              1           immediate, memory         corresponding WEND if the first is not
                          reference                 less
              example:
                xx=0
                WHILELESS xx,5         ; the beep routine will be called 5 times
                GOSUB beep > xx=_+1
                WEND
WHILEUNEQUAL  1           calculation               compares the operands and jumps to the
              1           immediate, memory         corresponding WEND if they are equal
                          reference


----Functions/procedures----

There are no declarations needed, return values are optional, and functions can even
have multiple entry and exit points. But care needs to be taken to avoid stack
corruption and to refrain from referencing local variables when they are not valid.

; example of a function with multiple entry points and an internal subroutine

somefunc2:
globalsetting=defaultval
somefunc:
beginfunc address.d
localvar x.d,y.d
x=address > y=8 > gosub printstuff
ifequal globalsetting,0,label123
x=carriagertn > y=2 > gosub printstuff
goto label123

printstuff:
callex ,writefile,,tempvar,y,x,chandle
return

label123:
endfunc
returnex 4

Imagine that you can call somefunc, and pass the address of an eight-character string
which will then be printed (using a Win32 call to write to an open console). If
globalsetting is zero then no carriage return is added to the output, otherwise it is
added. If the caller wanted to override globalsetting then it could call somefunc2
instead, which sets globalsetting itself, before the function proceeds.

Having code execute before BEGINFUNC is no problem as long as it only references
global variables. The address, x, and y symbols are not valid until after BEGINFUNC and
LOCALVAR. Likewise, when printstuff is called from within somefunc, the x and y
variables are valid. However, printstuff can NOT be called from outside of somefunc
because x and y will point to who-knows-what.

The compiler doesn't invalidate stack variables belonging to one function until it
sees a BEGINFUNC pertaining to another function. At that point, it will give an error
if access to such variables is attempted. At runtime, they would become invalid as
soon as an ENDFUNC is executed.

It's possible to have more than one ENDFUNC associated with a function.

; example function with two exit points

anotherfunc:
beginfunc param1.d,param2.d
ifequal param1,param2,labelxyz
endfunc 0
returnex 8
labelxyz:
endfunc 1
returnex 8

Note that if you don't need to pass any parameters, GOSUB and RETURN can be used
instead of CALLEX and RETURNEX.
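The relationship between the CALLEX argument order, the BEGINFUNC parameter order, and the RETURNEX byte count can be modelled in a few lines. This is a Python sketch with hypothetical helper names, not NOWUT code. It assumes that every parameter is a dword (4 bytes), which matches the "returnex 4" and "returnex 8" examples above but is otherwise an inference.

```python
def callex_push(args):
    """Model CALLEX: arguments are pushed left to right, so the last one ends up on top."""
    stack = []
    for value in args:
        stack.append(value)
    return stack


def beginfunc_bind(names, stack):
    """Model BEGINFUNC: names are listed in the reverse order of the CALLEX arguments."""
    if len(names) != len(stack):
        raise ValueError("caller and callee disagree on the number of parameters")
    return dict(zip(names, reversed(stack)))


def returnex_bytes(names, param_size=4):
    """Bytes RETURNEX must pop, assuming every parameter occupies param_size bytes."""
    return param_size * len(names)


# Hypothetical call: callex ,anotherfunc,10,20
stack = callex_push([10, 20])

# Callee side: beginfunc param1.d,param2.d
params = beginfunc_bind(["param1", "param2"], stack)
print(params)                                  # {'param1': 20, 'param2': 10}
print(returnex_bytes(["param1", "param2"]))    # 8
```

The first name given to BEGINFUNC receives the last argument given to CALLEX, so a mismatch in order or in the RETURNEX count is the kind of stack corruption the section warns about.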
----Internal assembler----

The NOWUT compiler generates assembly code and feeds it back to its internal
assembler. This code can be seen in comments in the .LST file that is generated during
compilation. If an error occurs during compilation, it is convenient to look at the
end of the .LST file to see how far it progressed before encountering the error.

Hand-written assembly language can be included in NOWUT programs by using the ASM and
ENDASM statements.

These CPU-independent statements are recognized by the assembler:

ALIGNW \
ALIGND  \
ALIGNQ   \ same as NOWUT mode
DB       /
DW      /
DD     /
ENDASM - returns to NOWUT mode


The x86 instruction set:

Instruction names are as usual, except for 8-bit jumps which have been given their own
separate forms SJMP and SJcc. Memory operands are contained in square brackets, and
.b .w .d tags are used to make the operand size explicit. Destination operands go on
the left, source operands on the right.

In 32-bit mode, operand-size prefixes are inserted before instructions that use 16-bit
words. The reverse is true for 16-bit mode, where instructions with a 32-bit operand
size will have a prefix inserted.

The instruction listing below describes acceptable operands mostly in terms of how
they are encoded.
The equivalent NOWUT syntax or operand types are as follows:

x86 operand   NOWUT                          notes
imm8          immediate                      8 bits, often sign-extended
imm16         immediate                      (could be an address in 16-bit mode)
imm32         immediate or address
reg8          al cl dl bl ah ch dh bh
reg16         ax cx dx bx sp bp si di
reg32         eax ecx edx ebx esp ebp esi edi
segreg        es cs ss ds fs gs
CRx           cr0 cr2 cr3 cr4
DRx           dr0 dr1 dr2 dr3 dr6 dr7
freg          st0 st1 st2 st3 st4 st5 st6 st7
mmxreg        mm0 mm1 mm2 mm3 mm4 mm5 mm6 mm7
mem8          [immediate].b, [address].b, [reg].b, [reg+xx].b
mem16         [immediate].w, [address].w, [reg].w, [reg+xx].w
mem32         [immediate].d, [address].d, [reg].d, [reg+xx].d
mem48/64/80   [immediate], [address],        .q should be accepted for 64-bit data, while .w may be
              [reg], [reg+xx]                used to distinguish 80-bit (this hasn't been tested)
disp16/32     [immediate], [address]         accesses memory of various sizes but doesn't use
                                             mod/rm encoding
rm8           -                              same as mem8 or reg8
rm16          -                              same as mem16 or reg16
rm32          -                              same as mem32 or reg32

The assembler will accept operands without a size tag; however, in some cases the size
ambiguity will mean that more than one opcode would be valid. Shorter opcodes are
generally favored.

example:
PUSH 7               ; this will be assembled as an imm8 instead of imm32

Also note that on the x86, displacements can be negative:
MOV EAX,[EBP-40]

Scaled index registers should be specified before a base or displacement, with a SHL
operator to indicate the scaling factor (shift).

example:
mov eax,[ecx shl 1+ebx].d

Hand-written assembly code should not modify (or should save and restore) the EBP
register as the compiler uses it to address stack variables. EBP is used to address
stack variables in assembly code as well, and these variables will become invalid when
it is modified.
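The "shorter opcodes are favored" rule and the scaled-index address form can be illustrated with a small model. This is a Python sketch with hypothetical function names. It uses the standard 32-bit x86 encodings of PUSH imm8 ($6A) and PUSH imm32 ($68) from the listing that follows; that NOWUT's assembler makes the same choice for every value is an assumption, since the text above only confirms the PUSH 7 case.

```python
def encode_push_imm(value):
    """Pick the shortest PUSH encoding in 32-bit mode.

    $6A takes a sign-extended 8-bit immediate, $68 a full 32-bit immediate.
    """
    if not -(2 ** 31) <= value < 2 ** 32:
        raise ValueError("immediate does not fit in 32 bits")
    if -128 <= value <= 127:
        return bytes([0x6A, value & 0xFF])
    return bytes([0x68]) + (value & 0xFFFFFFFF).to_bytes(4, "little")


def effective_address(index, shift, base=0, disp=0):
    """Address computed for an operand written as [index shl shift+base+disp]."""
    if shift not in (0, 1, 2, 3):
        raise ValueError("x86 scaling factors are 1, 2, 4 or 8 (shift 0..3)")
    return ((index << shift) + base + disp) & 0xFFFFFFFF


print(encode_push_imm(7).hex())      # 6a07
print(encode_push_imm(300).hex())    # 682c010000

# mov eax,[ecx shl 1+ebx].d with ecx=$10 and ebx=$1000 reads from $1020
print(hex(effective_address(0x10, 1, base=0x1000)))    # 0x1020

# MOV EAX,[EBP-40] with EBP=$2000 reads from $1FD8
print(hex(effective_address(0, 0, base=0x2000, disp=-40)))    # 0x1fd8
```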
AAA $37 AAD imm8 $D5 AAM imm8 $D4 AAS $3F ADC AL,imm8 $14 ADC AX/EAX,imm16/32 $15 ADC rm8,reg8 $10 ADC rm16/32,reg16/32 $11 ADC reg8,rm8 $12 ADC reg16/32,rm16/32 $13 ADC rm8,imm8 $80 /2 ADC rm16/32,imm16/32 $81 /2 ADC rm16/32,imm8 $83 /2 ADD AL,imm8 $04 ADD AX/EAX,imm16/32 $05 ADD rm8,reg8 $00 ADD rm16/32,reg16/32 $01 ADD reg8,rm8 $02 ADD reg16/32,rm16/32 $03 ADD rm8,imm8 $80 /0 ADD rm16/32,imm16/32 $81 /0 ADD rm16/32,imm8 $83 /0 AND AL,imm8 $24 AND AX/EAX,imm16/32 $25 AND rm8,reg8 $20 AND rm16/32,reg16/32 $21 AND reg8,rm8 $22 AND reg16/32,rm16/32 $23 AND rm8,imm8 $80 /4 AND rm16/32,imm16/32 $81 /4 AND rm16/32,imm8 $83 /4 ARPL rm16,reg16 $63 BOUND reg16/32,mem16/32 $62 BSF reg16/32,rm16/32 $0F BC BSR reg16/32,rm16/32 $0F BD BSWAP reg32 $0F C8+r BT rm16/32,reg16/32 $0F A3 BT rm16/32,imm8 $0F BA /4 BTC rm16/32,reg16/32 $0F BB BTC rm16/32,imm8 $0F BA /7 BTR rm16/32,reg16/32 $0F B3 BTR rm16/32,imm8 $0F BA /6 BTS rm16/32,reg16/32 $0F AB BTS rm16/32,imm8 $0F BA /5 CALL imm16/32 $E8 CALL rm16/32 $FF /2 CALLF rm16/32 $FF /3 CALLF imm16,[imm16/32] $9A ; first operand is the segment CDQ $99 CLC $F8 CLD $FC CLI $FA CLTS $0F 06 CMC $F5 CMP AL,imm8 $3C CMP AX/EAX,imm16/32 $3D CMP rm8,reg8 $38 CMP rm16/32,reg16/32 $39 CMP reg8,rm8 $3A CMP reg16/32,rm16/32 $3B CMP rm8,imm8 $80 /7 CMP rm16/32,imm16/32 $81 /7 CMP rm16/32,imm8 $83 /7 CMPSB $A6 CMPSD $A7 CMPSW $A7 CMPXCHG rm8,reg8 $0F B0 CMPXCHG rm16/32,reg16/32 $0F B1 CMPXCHG8B mem64 $0F C7 /1 CPUID $0F A2 CWDE $98 DAA $27 DAS $2F DEC reg16/32 $48+r DEC rm8 $FE /1 DEC rm16/32 $FF /1 DIV rm8 $F6 /6 DIV rm16/32 $F7 /6 HLT $F4 IDIV rm8 $F6 /7 IDIV rm16/32 $F7 /7 IMUL rm8 $F6 /5 IMUL rm16/32 $F7 /5 IMUL AL,rm8 $F6 /5 IMUL AX/EAX,rm16/32 $F7 /5 IMUL reg16/32,rm16/32 $0F AF IMUL reg16/32,imm8 $6B IMUL reg16/32,imm16/32 $69 IMUL reg16/32,rm16/32,imm8 $6B IMUL reg16/32,rm16/32,imm16/32 $69 IN AL,imm8 $E4 IN AX/EAX,imm8 $E5 IN AL,DX $EC IN AX/EAX,DX $ED INC reg16/32 $40+r INC rm8 $FE /0 INC rm16/32 $FF /0 INSB $6C INSD $6D INSW $6D INT imm8 
$CD INT3 $CC INTO $CE INVD $0F 08 INVLPG mem $0F 01 /0 IRET $CF JECXZ imm8 $E3 Jcc imm16/32 $0F 80+cc JMP imm16/32 $E9 JMP rm16/32 $FF /4 JMPF rm16/32 $FF /5 JMPF imm16,[imm16/32] $EA ; first operand is the segment LAHF $9F LAR reg16/32,rm16/32 $0F 02 LDS reg16/32,mem32/48 $C5 LES reg16/32,mem32/48 $C4 LFS reg16/32,mem32/48 $0F B4 LGS reg16/32,mem32/48 $0F B5 LSS reg16/32,mem32/48 $0F B2 LEA reg16/32,mem $8D LEAVE $C9 LGDT mem48 $0F 01 /2 LIDT mem48 $0F 01 /3 LLDT rm16 $0F 00 /2 LMSW rm16 $0F 01 /6 LODSB $AC LODSD $AD LODSW $AD LSL reg16/32,rm16/32 $0F 03 LTR rm16 $0F 00 /3 MOV AL,disp16/32 $A0 MOV AX/EAX,disp16/32 $A1 MOV disp16/32,AL $A2 MOV disp16/32,AX/EAX $A3 MOV rm8,reg8 $88 MOV rm16/32,reg16/32 $89 MOV reg8,rm8 $8A MOV reg16/32,rm16/32 $8B MOV reg8,imm8 $B0+r MOV reg16/32,imm16/32 $B8+r MOV rm8,imm8 $C6 /0 MOV rm16/32,imm16/32 $C7 /0 MOV rm16/32,segreg $8C MOV segreg,rm16/32 $8E MOV reg32,CRx $0F 20 MOV reg32,DRx $0F 21 MOV CRx,reg32 $0F 22 MOV DRx,reg32 $0F 23 MOVSB $A4 MOVSD $A5 MOVSW $A5 MOVSX reg16/32,rm8 $0F BE MOVSX reg32,rm16 $0F BF MOVZX reg16/32,rm8 $0F B6 MOVZX reg32,rm16 $0F B7 MUL rm8 $F6 /4 MUL rm16/32 $F7 /4 NEG rm8 $F6 /3 NEG rm16/32 $F7 /3 NOP $90 NOT rm8 $F6 /2 NOT rm16/32 $F7 /2 OR AL,imm8 $0C OR AX/EAX,imm16/32 $0D OR rm8,reg8 $08 OR rm16/32,reg16/32 $09 OR reg8,rm8 $0A OR reg16/32,rm16/32 $0B OR rm8,imm8 $80 /1 OR rm16/32,imm16/32 $81 /1 OR rm16/32,imm8 $83 /1 OUT imm8,AL $E6 OUT imm8,AX/EAX $E7 OUT DX,AL $EE OUT DX,AX/EAX $EF OUTSB $6E OUTSD $6F OUTSW $6F POP reg16/32 $58+r POP rm16/32 $8F /0 POP DS $1F POP ES $07 POP SS $17 POP FS $0F A1 POP GS $0F A9 POPA $61 POPAD POPAW POPF $9D POPFD POPFW PUSH reg16/32 $50+r PUSH rm16/32 $FF /6 PUSH imm8 $6A (this byte is sign-extended) PUSH imm16/32 $68 PUSH CS $0E PUSH DS $1E PUSH ES $06 PUSH SS $16 PUSH FS $0F A0 PUSH GS $0F A8 PUSHA $60 PUSHAD PUSHAW PUSHF $9C PUSHFD PUSHFW RCL rm8 $D0 /2 RCL rm8,CL $D2 /2 RCL rm8,imm8 $C0 /2 RCL rm16/32 $D1 /2 RCL rm16/32,CL $D3 /2 RCL rm16/32,imm8 $C1 /2 RCR 
rm8 $D0 /3 RCR rm8,CL $D2 /3 RCR rm8,imm8 $C0 /3 RCR rm16/32 $D1 /3 RCR rm16/32,CL $D3 /3 RCR rm16/32,imm8 $C1 /3 RDMSR $0F 32 RDTSC $0F 31 RET $C3 RET imm16 $C2 RETF $CB RETF imm16 $CA ROL rm8 $D0 /0 ROL rm8,CL $D2 /0 ROL rm8,imm8 $C0 /0 ROL rm16/32 $D1 /0 ROL rm16/32,CL $D3 /0 ROL rm16/32,imm8 $C1 /0 ROR rm8 $D0 /1 ROR rm8,CL $D2 /1 ROR rm8,imm8 $C0 /1 ROR rm16/32 $D1 /1 ROR rm16/32,CL $D3 /1 ROR rm16/32,imm8 $C1 /1 SAL rm8 $D0 /4 SAL rm8,CL $D2 /4 SAL rm8,imm8 $C0 /4 SAL rm16/32 $D1 /4 SAL rm16/32,CL $D3 /4 SAL rm16/32,imm8 $C1 /4 SAHF $9E SAR rm8 $D0 /7 SAR rm8,CL $D2 /7 SAR rm8,imm8 $C0 /7 SAR rm16/32 $D1 /7 SAR rm16/32,CL $D3 /7 SAR rm16/32,imm8 $C1 /7 SBB AL,imm8 $1C SBB AX/EAX,imm16/32 $1D SBB rm8,reg8 $18 SBB rm16/32,reg16/32 $19 SBB reg8,rm8 $1A SBB reg16/32,rm16/32 $1B SBB rm8,imm8 $80 /3 SBB rm16/32,imm16/32 $81 /3 SBB rm16/32,imm8 $83 /3 SCASB $AE SCASD $AF SCASW $AF SETcc rm8 $0F 90+cc /0 (corrected) SGDT mem48 $0F 01 /0 SIDT mem48 $0F 01 /1 SLDT rm16 $0F 00 /0 SHL rm8 $D0 /4 SHL rm8,CL $D2 /4 SHL rm8,imm8 $C0 /4 SHL rm16/32 $D1 /4 SHL rm16/32,CL $D3 /4 SHL rm16/32,imm8 $C1 /4 SHR rm8 $D0 /5 SHR rm8,CL $D2 /5 SHR rm8,imm8 $C0 /5 SHR rm16/32 $D1 /5 SHR rm16/32,CL $D3 /5 SHR rm16/32,imm8 $C1 /5 SHLD rm16/32,reg16/32,imm8 $0F A4 SHLD rm16/32,reg16/32,CL $0F A5 SHRD rm16/32,reg16/32,imm8 $0F AC SHRD rm16/32,reg16/32,CL $0F AD SJcc imm8 or address $70+cc SJMP imm8 or address $EB SMSW rm16 $0F 01 /4 STC $F9 STD $FD STI $FB STOSB $AA STOSD $AB STOSW $AB STR rm16 $0F 00 /1 SUB AL,imm8 $2C SUB AX/EAX,imm16/32 $2D SUB rm8,reg8 $28 SUB rm16/32,reg16/32 $29 SUB reg8,rm8 $2A SUB reg16/32,rm16/32 $2B SUB rm8,imm8 $80 /5 SUB rm16/32,imm16/32 $81 /5 SUB rm16/32,imm8 $83 /5 TEST AL,imm8 $A8 TEST AX/EAX,imm16/32 $A9 TEST rm8,reg8 $84 TEST rm16/32,reg16/32 $85 TEST rm8,imm8 $F6 /0 TEST rm16/32,imm16/32 $F7 /0 VERR rm16 $0F 00 /4 VERW rm16 $0F 00 /5 WAIT $9B WBINVD $0F 09 WRMSR $0F 30 XADD rm8,reg8 $0F C0 XADD rm16/32,reg16/32 $0F C1 XCHG AX/EAX,reg16/32 $90+r XCHG 
        reg16/32,AX/EAX         $90+r
XCHG    reg8,rm8                $86
XCHG    rm8,reg8                $86
XCHG    reg16/32,rm16/32        $87
XCHG    rm16/32,reg16/32        $87
XOR     AL,imm8                 $34
XOR     AX/EAX,imm16/32         $35
XOR     rm8,reg8                $30
XOR     rm16/32,reg16/32        $31
XOR     reg8,rm8                $32
XOR     reg16/32,rm16/32        $33
XOR     rm8,imm8                $80 /6
XOR     rm16/32,imm16/32        $81 /6
XOR     rm16/32,imm8            $83 /6
XLATB                           $D7

The following prefixes are supported:

ASIZE                           $67   (address size override)
CS                              $2E
DS                              $3E
ES                              $26
FS                              $64
GS                              $65
SS                              $36
LOCK                            $F0
REPNZ/NE                        $F2
REP/E/Z                         $F3

The following x87 instructions are supported:

F2XM1                           $D9 $F0
FABS                            $D9 $E1
FADD    mem32                   $D8 /0
FADD    mem64                   $DC /0
FADD    freg                    $D8 $C0+r
FADD    freg,ST0                $DC $C0+r
FADDP   freg,ST0                $DE $C0+r
FBLD    mem80                   $DF /4
FBSTP   mem80                   $DF /6
FCHS                            $D9 $E0
FCLEX                           $9B $DB $E2
FNCLEX                          $DB $E2
FCMOVB  freg                    $DA $C0+r
FCMOVBE freg                    $DA $D0+r
FCMOVE  freg                    $DA $C8+r   \
FCMOVNB freg                    $DB $C0+r    \ P6 instructions
FCMOVNBE freg                   $DB $D0+r    /
FCMOVNE freg                    $DB $C8+r   /
FCMOVNU freg                    $DB $D8+r
FCMOVU  freg                    $DA $D8+r
FCOM    mem32                   $D8 /2
FCOM    mem64                   $DC /2
FCOM    freg                    $D8 $D0+r
FCOMP   mem32                   $D8 /3
FCOMP   mem64                   $DC /3
FCOMP   freg                    $D8 $D8+r
FCOMPP                          $DE $D9
FCOMI   freg                    $DB $F0+r   \ P6
FCOMIP  freg                    $DF $F0+r   /
FCOS                            $D9 $FF
FDECSTP                         $D9 $F6
FDISI                           $9B $DB $E1
FNDISI                          $DB $E1     \ 8087 only
FENI                            $9B $DB $E0 /
FNENI                           $DB $E0
FDIV    mem32                   $D8 /6
FDIV    mem64                   $DC /6
FDIV    freg                    $D8 $F0+r
FDIV    freg,ST0                $DC $F8+r
FDIVR   mem32                   $D8 /7
FDIVR   mem64                   $DC /7
FDIVR   freg                    $D8 $F8+r
FDIVR   freg,ST0                $DC $F0+r
FDIVP   freg,ST0                $DE $F8+r
FDIVRP  freg,ST0                $DE $F0+r
FFREE   freg                    $DD $C0+r
FIADD   mem16                   $DE /0
FIADD   mem32                   $DA /0
FICOM   mem16                   $DE /2
FICOM   mem32                   $DA /2
FICOMP  mem16                   $DE /3
FICOMP  mem32                   $DA /3
FIDIV   mem16                   $DE /6
FIDIV   mem32                   $DA /6
FIDIVR  mem16                   $DE /7
FIDIVR  mem32                   $DA /7
FILD    mem16                   $DF /0
FILD    mem32                   $DB /0
FILD    mem64                   $DF /5
FIST    mem16                   $DF /2
FIST    mem32                   $DB /2
FISTP   mem16                   $DF /3
FISTP   mem32                   $DB /3
FISTP   mem64                   $DF /7
FIMUL   mem16                   $DE /1
FIMUL   mem32                   $DA /1
FINCSTP                         $D9 $F7
FINIT                           $9B $DB $E3
FNINIT                          $DB $E3
FISUB   mem16                   $DE /4
FISUB   mem32                   $DA /4
FISUBR  mem16                   $DE /5
FISUBR  mem32                   $DA /5
FLD     mem32                   $D9 /0
FLD     mem64                   $DD /0
FLD     mem80                   $DB /5
FLD     freg                    $D9 $C0+r
FLD1                            $D9 $E8
FLDL2E                          $D9 $EA
FLDL2T                          $D9 $E9
FLDLG2                          $D9 $EC
FLDLN2                          $D9 $ED
FLDP                            $D9 $EB
FLDZ                            $D9 $EE
FLDCW   mem16                   $D9 /5
FLDENV  mem                     $D9 /4
FMUL    mem32                   $D8 /1
FMUL    mem64                   $DC /1
FMUL    freg                    $D8 $C8+r
FMUL    freg,ST0                $DC $C8+r
FMULP   freg,ST0                $DE $C8+r
FNOP                            $D9 $D0
FPATAN                          $D9 $F3
FPTAN                           $D9 $F2
FPREM                           $D9 $F8
FPREM1                          $D9 $F5
FRNDINT                         $D9 $FC
FSAVE   mem                     $9B $DD /6
FNSAVE  mem                     $DD /6
FRSTOR  mem                     $DD /4
FSCALE                          $D9 $FD
FSETPM                          $DB $E4
FSIN                            $D9 $FE
FSINCOS                         $D9 $FB
FSQRT                           $D9 $FA
FST     mem32                   $D9 /2
FST     mem64                   $DD /2
FST     freg                    $DD $D0+r
FSTP    mem32                   $D9 /3
FSTP    mem64                   $DD /3
FSTP    mem80                   $DB /0
FSTP    freg                    $DD $D8+r
FSTCW   mem16                   $9B $D9 /0
FNSTCW  mem16                   $D9 /0
FSTENV  mem                     $9B $D9 /6
FNSTENV mem                     $D9 /6
FSTSW   mem16                   $9B $DD /7
FSTSW   AX                      $9B $DF $E0
FNSTSW  mem16                   $DD /7
FNSTSW  AX                      $DF $E0
FSUB    mem32                   $D8 /4
FSUB    mem64                   $DC /4
FSUB    freg                    $D8 $E0+r
FSUB    freg,ST0                $DC $E8+r
FSUBR   mem32                   $D8 /5
FSUBR   mem64                   $DC /5
FSUBR   freg                    $D8 $E8+r
FSUBR   freg,ST0                $DC $E0+r
FSUBP   freg,ST0                $DE $E8+r
FSUBRP  freg,ST0                $DE $E0+r
FTST                            $D9 $E4
FUCOM   freg                    $DD $E0+r
FUCOMP  freg                    $DD $E8+r
FUCOMPP                         $DA $E9
FUCOMI  freg                    $DB $E8+r   \ P6
FUCOMIP freg                    $DF $E8+r   /
FXAM                            $D9 $E5
FXCH    freg                    $D9 $C8+r
FXTRACT                         $D9 $F4
FYL2X                           $D9 $F1
FYL2XP1                         $D9 $F9

The following MMX instructions are supported:

EMMS                            $0F 77  (MMX)
MOVD    mmxreg,rm32             $0F 6E  (MMX)
MOVD    rm32,mmxreg             $0F 7E  (MMX)
MOVQ    mmxreg,mem64            $0F 6F  (MMX)
MOVQ    mem64,mmxreg            $0F 7F  (MMX)
PACKSSDW mmxreg,mmxreg/mem64    $0F 6B  (MMX)
PACKSSWB mmxreg,mmxreg/mem64    $0F 63  (MMX)
PACKUSWB mmxreg,mmxreg/mem64    $0F 67  (MMX)
PADDB   mmxreg,mmxreg/mem64     $0F FC  (MMX)
PADDW   mmxreg,mmxreg/mem64     $0F FD  (MMX)
PADDD   mmxreg,mmxreg/mem64     $0F FE  (MMX)
PADDSB  mmxreg,mmxreg/mem64     $0F EC  (MMX)
PADDSW  mmxreg,mmxreg/mem64     $0F ED  (MMX)
PADDUSB mmxreg,mmxreg/mem64     $0F DC  (MMX)
PADDUSW mmxreg,mmxreg/mem64     $0F DD  (MMX)
PAND    mmxreg,mmxreg/mem64     $0F DB  (MMX)
PANDN   mmxreg,mmxreg/mem64     $0F DF  (MMX)
PCMPEQB mmxreg,mmxreg/mem64     $0F 74  (MMX)
PCMPEQW mmxreg,mmxreg/mem64     $0F 75  (MMX)
PCMPEQD mmxreg,mmxreg/mem64     $0F 76  (MMX)
PCMPGTB
mmxreg,mmxreg/mem64 $0F 64 (MMX) PCMPGTW mmxreg,mmxreg/mem64 $0F 65 (MMX) PCMPGTD mmxreg,mmxreg/mem64 $0F 66 (MMX) PMADDWD mmxreg,mmxreg/mem64 $0F F5 (MMX) PMULHW mmxreg,mmxreg/mem64 $0F E5 (MMX) PMULLW mmxreg,mmxreg/mem64 $0F D5 (MMX) POR mmxreg,mmxreg/mem64 $0F EB (MMX) PSLLW mmxreg,mmxreg/mem64 $0F F1 (MMX) PSLLW mmxreg,imm8 $0F 71 /6 PSLLD mmxreg,mmxreg/mem64 $0F F2 (MMX) PSLLD mmxreg,imm8 $0F 72 /6 PSLLQ mmxreg,mmxreg/mem64 $0F F3 (MMX) PSLLQ mmxreg,imm8 $0F 73 /6 PSRAW mmxreg,mmxreg/mem64 $0F E1 (MMX) PSRAW mmxreg,imm8 $0F 71 /4 PSRAD mmxreg,mmxreg/mem64 $0F E2 (MMX) PSRAD mmxreg,imm8 $0F 72 /4 PSRLW mmxreg,mmxreg/mem64 $0F D1 (MMX) PSRLW mmxreg,imm8 $0F 71 /2 PSRLD mmxreg,mmxreg/mem64 $0F D2 (MMX) PSRLD mmxreg,imm8 $0F 72 /2 PSRLQ mmxreg,mmxreg/mem64 $0F D3 (MMX) PSRLQ mmxreg,imm8 $0F 73 /2 PSUBB mmxreg,mmxreg/mem64 $0F F8 (MMX) PSUBW mmxreg,mmxreg/mem64 $0F F9 (MMX) PSUBD mmxreg,mmxreg/mem64 $0F FA (MMX) PSUBSB mmxreg,mmxreg/mem64 $0F E8 (MMX) PSUBSW mmxreg,mmxreg/mem64 $0F E9 (MMX) PSUBUSB mmxreg,mmxreg/mem64 $0F D8 (MMX) PSUBUSW mmxreg,mmxreg/mem64 $0F D9 (MMX) PUNPCKHBW mmxreg,mmxreg/mem64 $0F 68 (MMX) PUNPCKHWD mmxreg,mmxreg/mem64 $0F 69 (MMX) PUNPCKHDQ mmxreg,mmxreg/mem64 $0F 6A (MMX) PUNPCKLBW mmxreg,mmxreg/mem64 $0F 60 (MMX) PUNPCKLWD mmxreg,mmxreg/mem64 $0F 61 (MMX) PUNPCKLDQ mmxreg,mmxreg/mem64 $0F 62 (MMX) PXOR mmxreg,mmxreg/mem64 $0F EF (MMX) 8086 mode peculiarities: There are two pseudo-instructions used in 8086 mode (ignored in 8086tiny or 386 mode): SEGC reg16 ; causes DS to be reloaded if the next memory reference is NOT on the stack. 
                ; the register specified is used as an intermediary to hold the value
                ; (since there is no move-immediate instruction for segment registers)
SEGR            ; causes the next SEGC to be ignored, in case DS has already been set up

Hence the method of accessing global (non-stack) variables in 8086 assembly:

SEGC AX                 ; choose a register whose contents aren't needed
MOV AX,[symbol]         ; (for instance, the one we are about to reload)

Don't assume that two different symbols have the same segment!

Symbol addresses are handled a few different ways:

DD symbol               ; results in a 32-bit offset that is relative to beginning of program
MOV AX,symbol           ; loads the low word of the 32-bit value
MOV DX,symbol.h         ; loads the high word of the 32-bit value
SEGC SI                 ;
LEA SI,[symbol]         ; this sequence will load a valid segment/offset pair into DS:SI

The way that the compiler translates a 32-bit address to a segment/offset pair is by
using the high word to look up a segment value from a table at CS:0000 (the table is
populated by INITPLATFORM in PIODOS). This happens whenever indexed or indirect
addressing is used in NOWUT code. However, the resulting values are not identical to
the ones used by direct references in code after it has been linked and executed. The
linker tries to keep offsets under 32K so that there is room for an index to be added
on, as would occur in this example:

SEGC SI
MOV SI,[symbol(448)]

This mechanism does not allow for a negative index:

SEGC SI
MOV SI,[symbol(-12)]    ; does NOT work


The 68000 instruction set:

All 68000/68010 instructions and addressing modes are now supported. Only plain 68000
opcodes are used by the compiler.

Normal branch instructions use the 16-bit displacement; the SBxx version should be
used for the shorter 8-bit displacement form. Variations on a single mnemonic such as
ADDA or ADDI seen in other assemblers have been eliminated in favor of using one
mnemonic for all forms. 32-bit words are referred to as dwords and use the .d tag,
just as they do in x86 NOWUT.
Likewise, memory operands use square brackets, and operands receive a size tag rather
than the instruction. Destination operands go on the right, source operands on the
left.

The assembler will accept operands without a size tag; however, in some cases the size
ambiguity will mean that more than one opcode would be valid. Shorter opcodes are
generally favored.

example:
MOVE 7,d0               ; assembled as 8-bit immediate instead of 16 or 32-bit

MOVE [address],d0.d     ; \
MOVE [address].d,d0     ;  these all do the same thing
MOVE [address].d,d0.d   ; /

ea (effective address) operands can be any of the following:

imm8/16/32      ; immediate
[address]       ; memory reference (32-bit or signed 16-bit)
[ax]            ; address register indirect
[ax+xxxx]       ; address register indirect with displacement
[ax+]           ; address register indirect with post-increment
[ax-]           ; address register indirect with pre-decrement
dx              ; data register
ax              ; address register
[ax+ry+xx]      ; extension word: ry can be an address or data register,
                ; optionally with .w or .d tag
[PC+xxxx]       ; PC relative
[PC+symbol]     ; PC relative that refers to a symbol
[PC+ry+xx]      ; PC plus extension word

Not all modes are valid for all instructions (eg. immediate can't be a destination).

Also note that on the 68K, displacements can be negative:
MOVE [a6-40].d,d0

Hand-written assembly code should not modify (or should save and restore) the A6
register as the compiler uses it to address stack variables.

The USEFASTBASE statement can be used on 68000 to produce more compact executables. It
loads registers A3 and/or A4 with the address of a specified symbol at run time, and
uses 16-bit relative addressing to access other nearby symbols in the same section,
instead of 32-bit addressing which takes a longer opcode. This scheme depends on A3/A4
not being disturbed by other code and can only be enabled in one module when a program
consists of multiple modules linked together.
Example usage:

usefastbase datbase,0   ; datbase symbol will be loaded in A3 to use for relative addressing
usefastbase bssbase,1   ; bssbase symbol will be loaded in A4 to use for relative addressing

(These optimizations shrink NOWUT.X by about 3KB)

Instruction list:

ABCD    dx,dy
ABCD    [ax-],[ay-]                     ; byte only
ADD     imm,ea          /imm8/16/32
ADD     imm3,ea
ADD     dy,ea
ADD     ea,dy
ADD     ea,ay
ADDX    dx,dy
ADDX    [ax-],[ay-]
AND     imm,ea          /imm8/16/32
AND     imm,ccr         /imm8
AND     imm,sr          /imm16
AND     dy,ea
AND     ea,dy
ASL     ea                              ; word only
ASL     imm3,dx
ASL     dy,dx
ASR     ea                              ; word only
ASR     imm3,dx
ASR     dy,dx
BKPT    imm3                            ; 68010
Bcc     label
BCHG    imm,ea
BCHG    dn,ea                           ; byte only
BCLR    imm,ea          /imm8
BCLR    dn,ea                           ; byte only
BRA     label
BSET    imm,ea          /imm8
BSET    dn,ea                           ; byte only
BSR     label
BTST    imm,ea          /imm8
BTST    dn,ea                           ; byte only
CHK     ea,dy                           ; word only
CLR     ea
CMP     imm,ea          /imm8/16/32
CMP     [an+],[an+]
CMP     ea,dy
CMP     ea,ay
DBcc    dx,label                        ; word only
DBRA    dx,label                        ; word only
DIVS    ea,dy                           ; divide 32/16, remainder in high word
DIVU    ea,dy
EOR     imm,ea          /imm8/16/32
EOR     imm,ccr         /imm8
EOR     imm,sr          /imm16
EOR     dy,ea
EXG     dx,dy
EXG     ax,ay
EXG     ax,dy
EXT     dx                              ; byte->word
EXT     dx                              ; word->dword
ILLEGAL
JMP     ea
JSR     ea
LEA     ea,ay
LINK    ax,imm16
LSL     ea                              ; word only
LSL     imm3,dx
LSL     dy,dx
LSR     ea                              ; word only
LSR     imm3,dx
LSR     dy,dx
MOVE    ea,ea
MOVE    ea,an
MOVE    sr,ea                           ; word only
MOVE    ea,sr                           ; word only
MOVE    ccr,ea                          ; word only, 68010
MOVE    ea,ccr                          ; word only
MOVE    imm8,reg                        ; sign extended
MOVEM   imm,ea          /imm16          ; register bit mask
MOVEM   ea,imm          /imm16          ; register bit mask
MOVEP   dy,[ax+disp16]  /disp16
MOVEP   [ax+disp16],dy  /disp16
MULS    ea,dy                           ; 16x16->32
MULU    ea,dy
NBCD    ea                              ; byte only
NEG     ea
NEGX    ea
NOP
NOT     ea
OR      imm,ea          /imm8/16/32
OR      imm,ccr         /imm8
OR      imm,sr          /imm16
OR      dy,ea
OR      ea,dy
PEA     ea
RESET
ROL     ea
ROL     imm3,dx
ROL     dy,dx
ROR     ea
ROR     imm3,dx
ROR     dy,dx
ROXL    ea
ROXL    imm3,dx
ROXL    dy,dx
ROXR    ea
ROXR    imm3,dx
ROXR    dy,dx
RTD     imm16                           ; 68010
RTE
RTR
RTS
SBcc    label
SBRA    label
SBSR    label
SBCD    dx,dy                           ; byte only
SBCD    [ax-],[ay-]                     ; byte only
Scc     ea                              ; byte only
STOP    imm             /imm16
SUB     imm,ea          /imm8/16/32
SUB     imm3,ea
SUB     dy,ea
SUB     ea,dy
SUB     ea,ay
SUBX    dx,dy
SUBX    [ax-],[ay-]
SWAP    dx
TAS     ea                              ; byte only
TRAP    imm4
TRAPV
TST     ea
UNLK    ax
XDOS    imm4                            ; this pseudo-instruction generates "F-line" opcodes
                                        ; for making system calls on the Sharp X68000


The SH2 instruction set:

As with the 68K, the SH2 instruction set has undergone some cosmetic changes to bring
it in line with NOWUT norms. A few mnemonics were tweaked, memory operands use square
brackets and a data size tag, and long words (32-bit) are referred to as dwords.
Destination operands go on the right, source operands on the left.

Since the SH2 doesn't allow dword immediate data or memory access using absolute
addresses, the assembler accepts a "fake" form of the MOV instruction and
transparently inserts an extra instruction as needed. It also adds immediate data to a
buffer that is periodically flushed to the output file, with a BRA opcode added to
jump over the data. There are currently two issues with this:

1) In assembly mode the buffer is not flushed. Using more than 16 "fake" MOVs in a
   single section of assembly code may cause overflow (using 49 definitely will)
2) The FLUSHIMM statement should be used before an INCBIN statement, if it is in a
   section which also contains code.

The SH2 doesn't do divides in a single instruction. When division is needed, the
compiler generates a call to a subroutine.
The program source should include these division subroutines (or some variation
thereof):

sh2divideu: ' unsigned 32/16 r1/r2
asm
shll16 r2
div0u
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
rotcl r1
extuw r1,r1
endasm
return

sh2divides: ' signed 32/16 r1/r2
asm
shll16 r2
mov r1,r4
rotcl r4
mov 0,r4
subc r4,r1
div0s r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
div1 r2,r1
extsw r1,r1
rotcl r1
addc r4,r1
extsw r1,r1
endasm
return

Note: delayed branch instructions cause the instruction following the branch to be
executed before the branch takes place.

Hand-written assembly code should not modify (or should save and restore) R11, R13,
R14 as the compiler uses them to address stack variables. R12 is used by "fake" MOVs.

ADD     Rm,Rn
ADD     imm,Rn                  (immediate is 8-bit sign-extended)
ADDC    Rm,Rn
ADDV    Rm,Rn
AND     Rm,Rn
AND     imm,R0                  (immediate is 8-bit zero-extended)
AND     imm,[R0+GBR].b          (immediate is 8-bit zero-extended)
BF      label/imm               (8-bit displacement)
BFS     label/imm               (delayed branch)(8-bit displacement)
BRA     label/imm               (delayed branch)(12-bit displacement)
BRAF    Rm                      (delayed branch)
BSR     label/imm               (delayed branch)(12-bit displacement)
BSRF    Rm                      (delayed branch)
BT      label/imm               (8-bit displacement)
BTS     label/imm               (delayed branch)(8-bit displacement)
CLRMAC
CLRT
CMPEQ   imm,R0                  (immediate is 8-bit sign-extended)
CMPEQ   Rm,Rn
CMPGE   Rm,Rn                   rn>=rm, signed
CMPGT   Rm,Rn                   rn>rm, signed
CMPHI   Rm,Rn                   rn>rm, unsigned
CMPHS   Rm,Rn                   rn>=rm, unsigned
CMPPL   Rn                      rn>0
CMPPZ   Rn                      rn>=0
CMPSTR  Rm,Rn
DIV0S   Rm,Rn
DIV0U
DIV1    Rm,Rn
DMULS   Rm,Rn                   32x32->64 (MAC)
DMULU   Rm,Rn
DT      Rn
EXTSB   Rm,Rn
EXTSW   Rm,Rn
EXTUB   Rm,Rn
EXTUW   Rm,Rn
JMP     Rm                      (delayed branch)
JSR     Rm                      (delayed branch)
LDC     Rm,SR
LDC     Rm,GBR
LDC     Rm,VBR
LDC     [Rm+],SR
LDC     [Rm+],GBR
LDC     [Rm+],VBR
LDS     Rm,MACH
LDS     Rm,MACL
LDS     Rm,PR
LDS
[Rm+],MACH LDS [Rm+],MACL LDS [Rm+],PR MAC [Rm+],[Rn+].d MAC [Rm+],[Rn+].w MOV imm/address,Rn ; this pseudo-instruction uses [PC+label] address mode to load a dword ; from an automatically-created data dump MOV [symbol/address].b,Rn ; these pseudo-instructions cause another instruction to be inserted MOV [symbol/address].w,Rn ; which loads the address, then memory is accessed using register- MOV [symbol/address].d,Rn ; indirect mode. ; stack variables are an exception, the extra instruction isn't needed MOV Rm,Rn MOV imm,Rn (immediate is 8-bit sign-extended) MOV Rm,[Rn].b MOV Rm,[Rn].w MOV Rm,[Rn].d MOV [Rm].b,Rn MOV [Rm].w,Rn MOV [Rm].d,Rn MOV [Rm+].b,Rn MOV [Rm+].w,Rn MOV [Rm+].d,Rn MOV Rm,[Rn-].b MOV Rm,[Rn-].w MOV Rm,[Rn-].d MOV Rm,[R0+Rn].b MOV Rm,[R0+Rn].w MOV Rm,[R0+Rn].d MOV [R0+Rm].b,Rn MOV [R0+Rm].w,Rn MOV [R0+Rm].d,Rn MOV R0,[GBR+disp].b (8-bit displacement, zero-extended) MOV R0,[GBR+disp].w (8-bit displacement, zero-extended) MOV R0,[GBR+disp].d (8-bit displacement, zero-extended) MOV [GBR+disp].b,R0 (8-bit displacement, zero-extended) MOV [GBR+disp].w,R0 (8-bit displacement, zero-extended) MOV [GBR+disp].d,R0 (8-bit displacement, zero-extended) MOV R0,[Rn+disp].b (4-bit displacement, zero-extended) MOV R0,[Rn+disp].w (4-bit displacement, zero-extended, doubled) MOV Rm,[Rn+disp].d (4-bit displacement, zero-extended, quadrupled) MOV [Rn+disp].b,R0 (4-bit displacement, zero-extended) MOV [Rn+disp].w,R0 (4-bit displacement, zero-extended, doubled) MOV [Rn+disp].d,Rm (4-bit displacement, zero-extended, quadrupled) MOV [PC+label],Rn (8-bit displacement, zero-extended) MOVA [PC+label],R0 MOVT Rn MUL Rm,Rn MULS Rm,Rn MULU Rm,Rn NEG Rm,Rn NEGC Rm,Rn NOP NOT Rm,Rn OR Rm,Rn OR imm,R0 (immediate is 8-bit zero-extended) OR imm,[R0+GBR].b (immediate is 8-bit zero-extended) ROTL Rn ROTR Rn ROTCL Rn ROTCR Rn RTE (delayed branch) RTS (delayed branch) SETT SHAL Rn SHAR Rn SHLL Rn SHLR Rn SHLL2 Rn SHLR2 Rn SHLL8 Rn SHLR8 Rn SHLL16 Rn SHLR16 Rn SLEEP STC SR,Rn STC GBR,Rn 
STC VBR,Rn STC SR,[Rn-] STC GBR,[Rn-] STC VBR,[Rn-] STS MACH,Rn STS MACL,Rn STS PR,Rn STS MACH,[Rn-] STS MACL,[Rn-] STS PR,[Rn-] SUB Rm,Rn SUBC Rm,Rn SUBV Rm,Rn SWAPB Rm,Rn SWAPW Rm,Rn TAS [Rn].b TRAPA imm (immediate is 8-bit) TST Rm,Rn TST imm,R0 (immediate is 8-bit zero-extended) TST imm,[R0+GBR].b (immediate is 8-bit zero-extended) XOR Rm,Rn XOR imm,R0 (immediate is 8-bit zero-extended) XOR imm,[R0+GBR].b (immediate is 8-bit zero-extended) XTRCT Rm,Rn The MIPS instruction set: MIPS assembly normally has a large number of mnemonics, having a one-to-one correspondence with each opcode. It also defines a word as 32 bits, and a double-word as 64 bits. Rather than let these conventions stand in contrast to the rest of NOWUT, the MIPS instruction set was mutated to conform: 'i' for immediate was dropped from many mnemonics, so eg. ADDU can use either register or immediate operands various load and store instructions were all rolled into MOV, MOVZ (zero-extended), and MOVH (high word) some 'rebadged' and 'fake' instructions were added Note: MIPS has delayed branch instructions which cause the instruction following the branch to be executed before the branch takes place. It also has 'likely' variants which skip the following instruction when the branch is not taken. Hand-written assembly code should not modify (or should save and restore) R13 as the compiler uses it to address stack variables. R12 is used by "fake" MOVs. The USEFASTBASE statement can be used on MIPS to produce more compact executables. It loads registers R16 and/or R17 with the address of a specified symbol at run time, and uses 16-bit relative addressing to access other nearby symbols in the same section, instead of 32-bit addressing which takes a longer opcode. This scheme depends on R16/R17 not being disturbed by other code and can only be enabled in one module when a program consists of multiple modules linked together. 
Example usage: usefastbase datbase,0 ; datbase symbol will be loaded in R16 to use for relative addressing usefastbase bssbase,1 ; bssbase symbol will be loaded in R17 to use for relative addressing For instructions that write a result to a general-purpose register or memory location, the first operand is the destination. Instructions that write a coprocessor register may specify the destination with the second operand. **64-bit instructions are not available in 32-bit user/supervisor mode ADD rc,ra,rb (has overflow exception) ADD rb,ra,imm (has overflow exception) ADDU rc,ra,rb ADDU rb,ra,imm AND rc,ra,rb AND rb,ra,imm (imm is zero-extended) BCzF imm (branch on coprocessor false, z is coprocessor number 0..3) BCzFL imm (... likely) BCzT imm (branch coproc z true) BCzTL imm BEQ ra,rb,imm (branch on equal) BEQL ra,rb,imm BGEZ ra,imm (branch greater-or-equal-to-zero) BGEZL ra,imm BGEZAL ra,imm (... and link) BGEZALL ra,imm (... and link + likely) BGTZ ra,imm (branch greater-than-zero) BGTZL ra,imm BLEZ ra,imm (branch less-or-equal-to-zero) BLEZL ra,imm BLTZ ra,imm (branch less-than-zero) BLTZL ra,imm BLTZAL ra,imm BLTZALL ra,imm BNE ra,rb,imm (branch not-equal) BNEL ra,rb,imm BRA imm (unconditional branch, rebadged beq r0,r0) BREAK CACHE imm5,[ra+imm] (imm5 is an operation type) CFCz rb,rc (get coproc control register) COPz imm25 (coproc operation) CTCz rb,rc (put coproc control register) DADD rc,ra,rb **(64-bit add, has overflow exception) DADD rb,ra,imm **(has overflow exception) DADDU rc,ra,rb ** DADDU rb,ra,imm ** DDIV ra,rb **(64-bit divide) DDIVU ra,rb **(64-bit divide unsigned) DIV ra,rb (ra divided by rb, quotient -> lo, remainder -> hi) DIVU ra,rb DMFC0 rb,rc **(64-bit move from coproc) DMTC0 rb,rc **(64-bit move to coproc) DMULT ra,rb **(64-bit multiply) DMULTU ra,rb **(64-bit multiply unsigned) DSLL rc,rb,sa **(64-bit shift left using 5-bit count) DSLLV rc,rb,ra **(64-bit shift left using count in ra) DSLL32 rc,rb,sa **(64-bit shift left using 5-bit count 
+ 32) DSRA rc,rb,sa **(64-bit signed shift right) DSRAV rc,rb,ra ** DSRA32 rc,rb,sa ** DSRL rc,rb,sa **(64-bit unsigned shift right) DSRLV rc,rb,ra ** DSRL32 rc,rb,sa ** DSUB rc,ra,rb **(has overflow exception) DSUBU rc,ra,rb ** ERET (return from exception, no delay slot) JMP t26 JMP ra JAL t26 (jump and link) JAL rc,ra (jump to ra, link rc) JAL ra (same as above, link r31 implied) LWCz rb,[ra+imm].d (load 32-bit word to coproc) (cop =/= 0) LDCz rb,[ra+imm].q (load 64-bit word to coproc) (LDC3 not valid?) LDL rb,[ra+imm] **(load misaligned 64-bit word) LDR rb,[ra+imm] ** LWL rb,[ra+imm] (load misaligned 32-bit word) LWR rb,[ra+imm] LL rb,[ra+imm] (locked 32-bit load) LLD rb,[ra+imm] (locked 64-bit load) MOV rc,rb (rebadged ADDU instruction) MOV [ra+imm],rb (8/16/32-bit memory store) MOV [ra+imm].q,rb **(64-bit memory store) MOV [address],rb (fake store instructions, assemble to two opcodes) MOV rb,imm (rebadged ADDU instruction) MOV rb,imm (unsigned) (rebadged OR instruction) MOV rb,imm32 (fake immediate load instruction, assembles to two opcodes) MOV rb,[ra+imm] (8/16/32-bit memory load) MOV rb,[ra+imm].q **(64-bit memory load) MOV rb,[address] (fake load instructions, assemble to two opcodes) MOVH rb,imm (load high word) MOVZ rb,[ra+imm].b (load unsigned byte) MOVZ rb,[ra+imm].w MOVZ rb,[ra+imm].d ** MOVZ rb,[address] (fake load instructions, assemble to two opcodes) MFCz rb,rc (32-bit move FROM coproc - 2nd reg is coproc reg) MTCz rb,rc (32-bit move TO coproc - 2nd reg is still coproc reg) MFHI rc (move from 'HI') MFLO rc (move from 'LO') MTHI ra (move to 'HI') MTLO ra (move to 'LO') MULT ra,rb MULTU ra,rb NOP ($00000000 opcode, does nothing) NOR rc,ra,rb OR rc,ra,rb OR rb,ra,imm (imm is zero-extended) SC rb,[ra+imm].d (store conditional) SCD rb,[ra+imm].q ** SWCz rb,[ra+imm].d (store 32 bits from coproc) (cop =/= 0) SDCz rb,[ra+imm].q (SDC3 not valid?) 
SLL rc,rb,sa
SLLV rc,rb,ra
SLT rc,ra,rb  (set on less-than)
SLT rb,ra,imm
SLTU rc,ra,rb  (set on less-than, unsigned)
SLTU rb,ra,imm
SRA rc,rb,sa
SRAV rc,rb,ra
SRL rc,rb,sa
SRLV rc,rb,ra
SUB rc,ra,rb  (has overflow exception)
SUBU rc,ra,rb
SDL rb,[ra+imm]  **(storing of unaligned data)
SDR rb,[ra+imm]  **
SWL rb,[ra+imm]
SWR rb,[ra+imm]
SYNC
SYSCALL imm20
TEQ ra,rb  (trap on equal)
TEQ ra,imm
TGE ra,rb  (trap on greater-or-equal)
TGE ra,imm
TGEU ra,rb  (unsigned)
TGEU ra,imm
TLT ra,rb  (trap on less)
TLT ra,imm
TLTU ra,rb  (unsigned)
TLTU ra,imm
TNE ra,rb  (trap on unequal)
TNE ra,imm
TLBP  (TLB probe)
TLBR  (read TLB entry)
TLBWI  (write indexed TLB entry)
TLBWR  (write random TLB entry)
XOR rc,ra,rb
XOR rb,ra,imm  (imm is zero-extended)

Additional instructions from MIPS32 spec:

BAL imm  (rebadged bgezal r0)
CLO rc,rb,ra
CLZ rc,rb,ra
DERET
DI
DI rb
EHB
EI
EI rb
MADD ra,rb
MADDU ra,rb
MSUB ra,rb
MSUBU ra,rb
MUL rc,ra,rb
RDHWR rb,rc
RDPGPR rc,rb
ROTR rc,rb,imm
ROTRV rc,rb,ra
SDBBP imm
SEB rc,rb
SEH rc,rb
SSNOP
SYNCI [ra+imm]
WRPGPR rc,rb
WSBH rc,rb


The ARM instruction set:

In standard ARM assembly, groups of letters corresponding to an operation, a condition code, a data size, and whether flags should be set are concatenated to make a mnemonic. (It's kind of like Hangul.) The number of possible mnemonics under this system is HUGE, which is not good from the perspective of having to accommodate it in NOWUT. In NOWUT, the instruction set has been altered so that:

1) Except for branch instructions, a condition code (if any) goes in a prefix.
2) Data sizes are distinguished with the usual .b .w .d (not by the mnemonic).
3) Loads that do sign-extension use a LDSX mnemonic.

Like the SH2 assembler, the ARM assembler supports a "fake" MOV instruction and a "fake" addressing mode which allow arbitrary 32-bit data to be specified.
An extra instruction is transparently inserted to load this data, using PC-relative addressing, from a "data dump" which gets mixed into the code. The same cautions apply:

1) In assembly mode the buffer is not flushed. Using more than 16 "fake" MOVs in a single section of assembly code may cause overflow (using 49 definitely will).
2) The FLUSHIMM statement should be used before an INCBIN statement, if it is in a section which also contains code.

ARM doesn't always include a divide instruction. When division is needed, the compiler generates a call to a subroutine. The program source should include these division subroutines (or some variation thereof):

armdivideu:  ; divide r8 by r9 (32/16 unsigned), result in r8
    asm
    mov r10,0
    mov r7,15
armdiv10:
    cmp r8,r9 shl r7
    cs sub r8,r8,r9 shl r7  ; if r8 >= (r9 shl r7) then r8 becomes r8-(r9 shl r7)
    adc r10,r10,r10  ; result bit is shifted left into r10
    subs r7,r7,1
    bpl armdiv10
    mov r8,r10
    mov r15,r14  ; return
    endasm

armdivides:  ; divide r8 by r9 (32/16 signed), result in r8
    asm
    eor r10,r8,r9
    mov r10,r10 sar 31  ; r10 becomes 0 (if signs the same) or -1 (if signs different)
    rsbs r7,r8,0
    pl mov r8,r7  ; make r8 positive
    rsbs r7,r9,0
    pl mov r9,r7  ; make r9 positive
    mov r7,15
armdiv11:
    cmp r8,r9 shl r7
    cs sub r8,r8,r9 shl r7  ; if r8 >= (r9 shl r7) then r8 becomes r8-(r9 shl r7)
    adc r10,r10,r10  ; result bit is shifted left into r10
    subs r7,r7,1
    bpl armdiv11
    movs r8,r10
    mi rsb r8,r8,r7 shl 16  ; r7 was -1, so we subtract r8 from -65536
    mov r15,r14  ; return
    endasm

Hand-written assembly code should not modify (or should save and restore) R11 as the compiler uses it to address stack variables. R12 is used by "fake" MOVs.

The USEFASTBASE statement can be used on ARM to produce more compact executables. It loads registers R3 and/or R4 with the address of a specified symbol at run time, and uses 12-bit relative addressing to access other nearby symbols in the same section, instead of 32-bit addressing which takes a longer opcode.
This scheme depends on R3/R4 not being disturbed by other code and can only be enabled in one module when a program consists of multiple modules linked together.

Example usage:

    usefastbase datbase,0  ; datbase symbol will be loaded in R3 to use for relative addressing
    usefastbase bssbase,1  ; bssbase symbol will be loaded in R4 to use for relative addressing

op2 (operand2) may include:

    rm  ; register
    rm shl imm5  ; register shifted left
    rm shr imm5  ; register shifted right
    rm sar imm5  ; register with signed-shift-right
    rm ror imm5  ; register rotated right
    rm shl rs  ; register shifted left with count from another register
    rm shr rs  ; register shifted right with count from another register
    rm sar rs  ; register with signed-shift-right with count from another register
    rm ror rs  ; register rotated right with count from another register
    $000000xx
    $00000xx0  ; rotated imm8
    etc.

addr2 (address mode 2) may include:

    [rn+imm12]
    [rn-imm12]
    [rn+rm shift imm5]  ; where shift is shl/shr/sar/ror
    [rn-rm shift imm5]  ; where shift is shl/shr/sar/ror

offset2 may include:

    imm12
    -imm12
    rm shift imm5  ; where shift is shl/shr/sar/ror
    -rm shift imm5  ; where shift is shl/shr/sar/ror

addr3 (address mode 3) may include:

    [rn+imm8]
    [rn-imm8]
    [rn+rm]
    [rn-rm]

offset3 may include:

    imm8
    -imm8
    rm
    -rm

condition code prefixes:

    EQ - Z set - equal
    NE - Z clear - not equal
    CS - C set - higher or same (unsigned)
    CC - C clear - lower (unsigned)
    MI - N set - negative
    PL - N clear - positive or zero
    VS - V set - overflow
    VC - V clear - no overflow
    HI - C set, Z clear - higher (unsigned)
    LS - C clear OR Z set - lower or same (unsigned)
    GE - N == V - greater or equal
    LT - N != V - less than
    GT - N == V, Z clear - greater than
    LE - N != V OR Z set - less than or equal

Except for STR instructions, the first register operand is generally the destination register (if any).
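The "rotated imm8" form of op2 listed above follows the usual ARM data-processing immediate rule: an 8-bit value rotated right by an even number of bits. The following is a minimal Python sketch of that rule, not code taken from the NOWUT assembler; the function name encode_arm_imm is made up for illustration. It shows which 32-bit constants fit directly in op2. Constants that do not fit would presumably have to go through the "fake" MOV rd,imm32 instead.

```python
def encode_arm_imm(value):
    """Try to express a 32-bit constant as imm8 rotated right by 2*rot.

    Returns (imm8, rot) with 0 <= imm8 <= 255 and 0 <= rot <= 15,
    or None if the constant has no such encoding.
    (Hypothetical helper, only illustrating the ARM encoding rule.)
    """
    value &= 0xFFFFFFFF
    for rot in range(16):
        shift = 2 * rot
        # Rotating LEFT by 'shift' undoes a rotate-right by 'shift'.
        imm8 = ((value << shift) | (value >> (32 - shift))) & 0xFFFFFFFF
        if imm8 <= 0xFF:
            return imm8, rot
    return None


if __name__ == "__main__":
    for constant in (0xFF, 0xFF0, 0xFF000000, 0x104, 0x102, 0x12345678):
        print(hex(constant), encode_arm_imm(constant))
```

Note that $102 is rejected even though its set bits span only 8 positions, because it would need an odd rotation.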
ADD rd,rn,op2 ; does not set flags ADDS rd,rn,op2 ; sets flags ADC rd,rn,op2 ADCS rd,rn,op2 AND rd,rn,op2 ANDS rd,rn,op2 BIC rd,rn,op2 BICS rd,rn,op2 BEQ label/imm BNE label/imm BCS label/imm BCC label/imm BMI label/imm BPL label/imm BVS label/imm BVC label/imm BHI label/imm BLS label/imm BGE label/imm BLT label/imm BGT label/imm BLE label/imm BRA label/imm ; unconditional branch BL label/imm BX rn CMP rn,op2 CMN rn,op2 EOR rd,rn,op2 EORS rd,rn,op2 LDM [rn],imm16 ; imm16 is a bit pattern representing a register list LDM [rn+],imm16 ; post-increment LDM [rn-],imm16 ; pre-decrement LDMU [rn],imm16 ; user mode LDSX rd,[addr3].b ; load byte with sign extension LDSX rd,[addr3].b,_ ; pre-decrement / pre-increment LDSX rd,[rn].b,offset3 ; post-decrement / post-increment LDSX rd,[addr3].w ; load 16-bit word with sign extension LDSX rd,[addr3].w,_ ; pre-decrement / pre-increment LDSX rd,[rn].w,offset3 ; post-decrement / post-increment LDR rd,[addr2].b LDR rd,[addr2].b,_ ; pre-decrement / pre-increment LDR rd,[rn].b,offset2 ; post-decrement / post-increment LDR rd,[addr3].w LDR rd,[addr3].w,_ ; pre-decrement / pre-increment LDR rd,[rn].w,offset3 ; post-decrement / post-increment LDR rd,[addr2].d LDR rd,[addr2].d,_ ; pre-decrement / pre-increment LDR rd,[rn].d,offset2 ; post-decrement / post-increment LDRU rd,[rn].b,offset2 ; user mode, post-decrement / post-increment LDRU rd,[rn].w,offset3 ; user mode, post-decrement / post-increment LDRU rd,[rn].d,offset2 ; user mode, post-decrement / post-increment MOV rd,op2 MOV rd,imm32 ; "fake" move MOVS rd,op2 MVN rd,op2 MVNS rd,op2 MUL rm,rn,rs MULS rm,rn,rs MLA rn,rm,rs,rd ; rn = rm * rs + rd MLAS rn,rm,rs,rd MRS rd,SPSR MRS rd,CPSR MSR SPSR,rm MSR CPSR,rm MSRF SPSR,rm ; flag bits only MSRF CPSR,rm MSRF SPSR,imm MSRF CPSR,imm ORR rd,rn,op2 ORRS rd,rn,op2 RSB rd,rn,op2 RSBS rd,rn,op2 RSC rd,rn,op2 RSCS rd,rn,op2 SUB rd,rn,op2 SUBS rd,rn,op2 SBC rd,rn,op2 SBCS rd,rn,op2 SMULL rd,rn,rm,rs ; 32 * 32 -> 64 signed multiply. 
high 32 bits of result go in rn SMULLS rd,rn,rm,rs SMLAL rd,rn,rm,rs ; multiply and accumulate SMLALS rd,rn,rm,rs STM [rn],imm16 ; imm16 is a bit pattern representing a register list STM [rn+],imm16 ; post-increment STM [rn-],imm16 ; pre-decrement STMU [rn],imm16 ; user mode STR rd,[addr2].b STR rd,[addr2].b,_ ; pre-decrement / pre-increment STR rd,[rn].b,offset2 ; post-decrement / post-increment STR rd,[addr3].w STR rd,[addr3].w,_ ; pre-decrement / pre-increment STR rd,[rn].w,offset3 ; post-decrement / post-increment STR rd,[addr2].d STR rd,[addr2].d,_ ; pre-decrement / pre-increment STR rd,[rn].d,offset2 ; post-decrement / post-increment STRU rd,[rn].b,offset2 ; user mode, post-decrement / post-increment STRU rd,[rn].w,offset3 ; user mode, post-decrement / post-increment STRU rd,[rn].d,offset2 ; user mode, post-decrement / post-increment SWI imm24 SWP rd,rm,[rn].b ; rd gets loaded, rm gets stored SWP rd,rm,[rn].d TST rn,op2 TEQ rn,op2 UMULL rd,rn,rm,rs ; 32 * 32 -> 64 unsigned multiply. high 32 bits of result go in rn UMULLS rd,rn,rm,rs UMLAL rd,rn,rm,rs ; multiply and accumulate UMLALS rd,rn,rm,rs The Z280 instruction set: This is a superset of the official Z80 instruction set. Many common undocumented Z80 opcodes are also included, but some more obscure ones (eg. DDCB shift-and-also-load) are not. Most Z80 ALU instructions take the 'A' register as destination (first) operand, but it can also be omitted for compatibility with Z80 syntax in which the 'A' register was implied. 
operand types legend: R - 8-bit registers a,b,c,d,e,h,l RX - 8-bit registers ixl,ixh,iyl,iyh RR - 16-bit registers bc,de,hl,sp XX - 16-bit registers hl,ix,iy XY - 16-bit registers ix,iy rel - 8-bit relative jump addr - 16-bit direct jump [addr] - 16-bit direct address [HL] - hl indirect addressing [XY+d] - ix or iy indirect with (signed) 8-bit displacement mode1 - Z280 extended addressing modes [ix+disp16], [iy+disp16], [hl+disp16], [pc+disp16] mode2 - Z280 extended addressing modes [sp+disp16], [hl+ix], [hl+iy], [ix+iy] ADC A,R/RX ADC A,imm8 ADC A,[HL] ADC A,[XY+d] ADC A,[addr] ADC A,mode1/2 ADC XX,RR/XY ADD A,R/RX ADD A,imm8 ADD A,[HL] ADD A,[XY+d] ADD A,[addr] ADD A,mode1/2 ADD XX,RR/XY (16-bit add that affects carry flag but not others) ADD XX,A (A is sign-extended) ADDW HL,RR/XY ADDW HL,imm16 ADDW HL,[HL] ADDW HL,[addr] ADDW HL,mode1 AND A,R/RX AND A,imm8 AND A,[HL] AND A,[XY+d] AND A,[addr] AND A,mode1/2 BIT imm,R BIT imm,[HL] BIT imm,[XY+d] CALLcc [HL] CALLcc addr CALLcc [PC+disp16] (includes call, callnz, callz, callnc, callc, callpo, callnv, callpe, callv, callp, callns, callm, calls) CCF CP A,R/RX CP A,imm8 CP A,[HL] CP A,[XY+d] CP A,[addr] CP A,mode1/2 CPD CPDR CPI CPIR CPL A CPW HL,RR/XY CPW HL,imm16 CPW HL,[HL] CPW HL,[addr] CPW HL,mode1 DAA DEC R/RX DEC imm8 DEC [HL] DEC [XY+d] DEC [addr] DEC mode1/2 DEC(W) RR/XY DECW [HL] DECW [addr] DECW mode1 DI DI imm DIV HL,R/RX DIV HL,imm8 DIV HL,[HL] DIV HL,[XY+d] DIV HL,[addr] DIV HL,mode1/2 DIVU HL,R/RX DIVU HL,imm8 DIVU HL,[HL] DIVU HL,[XY+d] DIVU HL,[addr] DIVU HL,mode1/2 DIVW DEHL,RR/XY DIVW DEHL,imm16 DIVW DEHL,[HL] DIVW DEHL,[addr] DIVW DEHL,mode1 DIVWU DEHL,RR/XY DIVWU DEHL,imm16 DIVWU DEHL,[HL] DIVWU DEHL,[addr] DIVWU DEHL,mode1 DJNZ rel EI EI imm EX AF,AF EX [SP],HL EX [SP],XY EX H,L EX DE,HL EX XY,HL EX A,R/RX EX A,[HL] EX A,[XY+d] EX A,[addr] EX A,mode1/2 EXTS A EXTS HL EXX HALT IM0 IM1 IM2 IM3 IN A,imm8 IN R/RX,[C] IN [addr],[C] IN mode1/2,[C] INC R/RX INC imm8 INC [HL] INC [XY+d] INC [addr] INC 
mode1/2 INC(W) RR/XY INCW [HL] INCW [addr] INCW mode1 IND INDW INDR INDRW INI INIW INIR INIRW INW HL,[C] JAF addr JAR addr JP [XY] JPcc [HL] JPcc addr JPcc [PC+disp16] (includes jp, jpnz, jpz, jpnc, jpc, jppo, jpnv, jppe, jpv, jpp, jpns, jpm, jps) JR rel JRNZ rel JRZ rel JRNC rel JRC rel LD R,R/RX LD R/RX,R LD R,[HL] LD [HL],R LD R,[XY+d] LD [XY+d],R LD A,I LD I,A LD A,R ; refresh counter register (...which is not used for refresh on Z280) LD R,A LD R/RX,imm8 LD [HL],imm8 LD [XY+d],imm8 LD [addr],imm8 LD mode1/2,imm8 LD A,[BC] LD A,[DE] LD A,[addr] LD A,mode1/2 LD(W) RR/XY,imm16 ; can use Z80-compatible LD mnemonic, or new LDW mnemonic LD(W) RR/XY,[addr] LD(W) [addr],RR/XY LD(W) SP,XX LEA XX,[addr] ; load effective address (does not access memory) LEA XX,mode1/2 LDCTL XX,[C] LDCTL [C],XX LDCTL XX,USP LDCTL USP,XX LDD LDDR LDI LDIR LDUD A,[HL] LDUD [XY+d],A LDUD A,[HL] LDUD [XY+d],A LDUP A,[HL] LDUP [XY+d],A LDUP A,[HL] LDUP [XY+d],A LDW RR,[HL] LDW [HL],RR LDW RR,[XY+d] LDW [XY+d],RR LDW HL/XY,mode1/2 LDW mode1/2,HL/XY LDW [HL],imm16 LDW [addr],imm16 LDW [PC+disp16],imm16 MULT A,R/RX MULT A,imm8 MULT A,[HL] MULT A,[XY+d] MULT A,[addr] MULT A,mode1/2 MULTU A,R/RX MULTU A,imm8 MULTU A,[HL] MULTU A,[XY+d] MULTU A,[addr] MULTU A,mode1/2 MULTW HL,RR/XY MULTW HL,imm16 MULTW HL,[HL] MULTW HL,[addr] MULTW HL,mode1 MULTWU HL,RR/XY MULTWU HL,imm16 MULTWU HL,[HL] MULTWU HL,[addr] MULTWU HL,mode1 NEG A NEG HL NOP OR A,R/RX OR A,imm8 OR A,[HL] OR A,[XY+d] OR A,[addr] OR A,mode1/2 OTDR OTDRW OTIR OTIRW OUT imm8,A OUT [C],R/RX OUT [C],[addr] OUT [C],mode1/2 OUTD OUTDW OUTI OUTIW OUTW [C],HL PCACHE POP RR/XY ; AF is a valid operand instead of SP POP [HL] POP [addr] POP [PC+disp16] PUSH RR/XY ; AF is a valid operand instead of SP PUSH imm16 PUSH [HL] PUSH [addr] PUSH [PC+disp16] RES imm,R RES imm,[HL] RES imm,[XY+d] RETcc (includes ret, retnz, retz, retnc, retc, retpo, retnv, retpe, retv, retp, retns, retm, rets) RETI RETIL RETN RL R RL [HL] RL [XY+d] RLA RLC R RLC [HL] RLC [XY+d] 
RLCA RLD RR R RR [HL] RR [XY+d] RRA RRC R RRC [HL] RRC [XY+d] RRCA RRD RST imm SBC A,R/RX SBC A,imm8 SBC A,[HL] SBC A,[XY+d] SBC A,[addr] SBC A,mode1/2 SBC XX,RR SC imm16 SCF SET imm,R SET imm,[HL] SET imm,[XY+d] SLA R SLA [HL] SLA [XY+d] SRA R SRA [HL] SRA [XY+d] SRL R SRL [HL] SRL [XY+d] SUB A,R/RX SUB A,imm8 SUB A,[HL] SUB A,[XY+d] SUB A,[addr] SUB A,mode1/2 SUBW HL,RR/XY SUBW HL,imm16 SUBW HL,[HL] SUBW HL,[addr] SUBW HL,mode1 TSET R TSET [HL] TSET [XY+d] TSTI [C] XOR A,R/RX XOR A,imm8 XOR A,[HL] XOR A,[XY+d] XOR A,[addr] XOR A,mode1/2 ----Known bugs and limitations---- OBJ files to be linked together must have different source file names or else automatically-generated labels will conflict. (Specifically, they must differ by alphabetic characters because other characters are not used for name generation.) When using two-pass compilation on 68000, a GOTO, GOSUB, or IF statement that targets the next instruction will cause an error (because SBRA 0 is not valid on the 68K). Absolute addresses as memory references don't work correctly in 8086 long mode: [$B8000].w=$0841 ; does NOT work tempvar.d=$B8000 > [tempvar].w=$0841 ; use this instead ea.w($B8000)=$0841 ; or this NOWUT 68K uses the processor's multiply and divide instructions despite the fact that they are not fully 32-bit. Multiplicands or quotients that would not fit in 16 bits will cause incorrect results. Signed vs. unsigned multiplicands will also affect the result, whereas they would not with 32bit*32bit=32bit. NOWUT in 8086 mode uses the processor's multiply and divide instructions despite the fact that they are not fully 32-bit. Multiplicands or quotients that would not fit in 16 bits will cause incorrect results. Signed vs. unsigned multiplicands will also affect the result, whereas they would not with 32bit*32bit=32bit. 
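The multiply limitation described above can be modelled in a few lines of Python. This is only a sketch of the arithmetic, assuming the 16x16->32 behaviour of the 68000 MULU/MULS and 8086 MUL/IMUL word forms; it does not show which instruction the compiler actually emits, and all function names are made up for illustration.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def to_signed16(x):
    """Reinterpret the low 16 bits of x as a signed value."""
    x &= MASK16
    return x - 0x10000 if x & 0x8000 else x


def mul_full32(a, b):
    """Low 32 bits of a full 32x32 multiply (same for signed and unsigned)."""
    return (a * b) & MASK32


def mul16_unsigned(a, b):
    """16x16->32 unsigned multiply: only the low 16 bits of each operand count."""
    return ((a & MASK16) * (b & MASK16)) & MASK32


def mul16_signed(a, b):
    """16x16->32 signed multiply: the low 16 bits are treated as signed."""
    return (to_signed16(a) * to_signed16(b)) & MASK32


if __name__ == "__main__":
    # 65535 * 2: the signed and unsigned 16-bit forms disagree.
    print(hex(mul_full32(0xFFFF, 2)))       # 0x1fffe
    print(hex(mul16_unsigned(0xFFFF, 2)))   # 0x1fffe
    print(hex(mul16_signed(0xFFFF, 2)))     # 0xfffffffe
    # 65536 * 3: the operand does not fit in 16 bits, so the result is wrong.
    print(hex(mul_full32(0x10000, 3)))      # 0x30000
    print(hex(mul16_unsigned(0x10000, 3)))  # 0x0
```

With a true 32bit*32bit=32bit multiply, the low 32 bits come out the same whether the operands are viewed as signed or unsigned, which is why the signedness only matters for the 16-bit forms.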
Related to the above, when the compiler runs on 68K or 8086, compile-time calculations done as arguments to a RESB/RESW/RESD can only multiply numbers in the range 0...65535.

NOWUT SH2 only allows 12 dword parameters on the stack and 32 dword local variables at a time. (bytes and words are not allowed.)

The address of a stack variable is not valid. Likewise, it can't be indexed.

When a calculation is used as an operand for a NOWUT instruction that involves comparison (ie. IFGREATER, IFLESS, WHILEGREATER, WHILELESS) the value is assumed to be unsigned, regardless of any components of that calculation. If signed comparison is desired, the second operand should be marked as signed.

The maximum number of symbols supported by the compiler (and the corresponding memory allocation) is specified in the source code of the compiler. Modify this line and recompile if needed (same for fixups):

    const maxsymbols,2048
    const maxfixups,8192

Maximum line length is 255 characters, maximum number of labels+statements+operands on a line is defined by the "maxargs" constant (default: 32).

BITS16/BITS32 shouldn't be used to switch modes within a BEGINFUNC...ENDFUNC structure as variables on the stack would no longer be addressed properly.

Floating-point support is only partially functional and only on 386. Calculations involving FP values can't be used as operands with NOWUT statements/instructions, etc.

A plain numeric index on a symbol, eg. foo(42), is limited to 32767 when compiling for 8086, or 65535 when compiling for MIPS.

A GOTO statement with a hard-coded address (eg. GOTO $3F00) will be incorrectly interpreted as a relative branch on x86 or ARM. It can also fail on MIPS if the address does not fit in 26 bits.

The x86 assembler is missing the ENTER instruction.
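To illustrate the note in this section about comparison operands being assumed unsigned, here is a small Python sketch of how the same 32-bit pattern compares under the two interpretations. The helper names are made up for illustration, and nothing here reflects the compiler's internal code.

```python
def as_unsigned32(x):
    """View x as an unsigned 32-bit value."""
    return x & 0xFFFFFFFF


def as_signed32(x):
    """View the low 32 bits of x as a signed (two's complement) value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def is_greater_unsigned(a, b):
    """What an unsigned 'greater than' test sees."""
    return as_unsigned32(a) > as_unsigned32(b)


def is_greater_signed(a, b):
    """What a signed 'greater than' test sees."""
    return as_signed32(a) > as_signed32(b)


if __name__ == "__main__":
    # A calculation that yields -1 is stored as $FFFFFFFF.
    result = 5 - 6
    print(is_greater_unsigned(result, 1))  # True: $FFFFFFFF is a huge unsigned number
    print(is_greater_signed(result, 1))    # False: -1 is less than 1
```

This is why a calculation that can go negative gives a surprising result in an IFGREATER or IFLESS test unless the comparison is made signed as described above.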
----Changes from the last version---- (First release was December 2018) Release 0.11 (2019/1/2): When I attempted to link a program with two OBJ files (using GoLink) I received a pile of errors about duplicate symbol definitions. Some were irrelevant symbol names used by local variables and constants. These have been given a null section/class in the OBJ so that they don't cause trouble. But I also received errors about program labels that had been automatically generated by the compiler, even though they were not marked as exports. I was not expecting this. As a workaround, the compiler now uses the name of the source file in its automatically generated labels so that they won't be the same in different OBJs. (No more 0NOWUT0000) The DEF statement was added, for manually setting a default type. This only required a change to a data table, and no changes to the code ;) Introduced the LINKBIN utility, which in addition to supporting two input files, also correctly generates X68000 executables that have relocations that are separated by more than 64K. Fixed some bugs in NO68 that caused subtraction and division to (sometimes) produce a wrong result. Added some code to NOSH2 to optimize shifts (when the value is known at compile time) instead of always using a loop. Corrected a few typos and made a few additions to this document. Included a DOS example program. Release 0.11b (2019/1/19): An endianness issue with indexed symbols in initialized data was fixed in NO68 and NOSH2. LINKBIN can now build an Amiga program from two OBJ files (it was all screwed up before). Release 0.12 (2019/2/24): x87 FPU instructions were added to NOWUT x86. NOWUT x86 parser can accept numbers with a decimal point and convert them to 32-bit floats. Fixed the NOWUT x86 assembler bug pertaining to AX/EAX ambiguity. The offending filename is displayed when a file specified by INCBIN fails to open. Fixed shifts when value to be shifted was on the stack (NOSH2). 
Fixed large negative immediates (NOSH2). CALLEX function address can be a calculation (NOWUT x86 only). Release 0.13 (2019/3/23): Bug fix in NO68 for exclusive-or, and for shifts of values on the stack Changed the sh2divides routine, since the old one didn't work and could also corrupt a needed register Added experimental 8086 support, including 16-bit style MODRM addressing and BITS16/BITS32 commands Added genesis and doscom platform options to LINKBIN, as well as Genesis and 8086 example programs Added a "maxarg" constant to NOWUT x86 to increase the number of allowed arguments on one line Added single-operand versions of IMUL to the x86 assembler Did some reorganization of x86 NOWUT source and tweaking of the generated code (now slightly more compact). In the future I will probably roll 68000 support back into NOWUT x86 to take advantage of some potential optimizations. Release 0.14 (2019/5/1): Combined NOWUT, NO68, and NOSH2 into the new MULTINO. 8086 support is now fully functional (minus any bugs/limitations) 68K and SH2 generated code has received modest improvements. In particular, the SH2 compiler does deduplication of immediate data/addresses. Fixed the RETURNEX 0 and ADD 0 problems for 68K. Compilation is slightly faster. Removed some stuff from the archive. Release 0.14b (2019/5/11): Fixed byte order of ASCII words/dwords on big-endian. Fixed shifts with word/dword memory operands on big-endian. Upon a duplicate label error, the label name will be displayed. Fixed pushing a byte from memory on 8086. Fixed division on 8086. Added islist and &, reducing stack operations in generated code when handling indexed symbols. Fixed mojibake in the auto-generated labels for string constants. These were not mentioned in the documentation until now, but the way they work is "random string".a can be provided as an operand and the address of the string will be passed, while the text itself will get dumped at the end of the section. 
Currently there is no way to put a carriage return in the string. (but they are terminated with 0) Signed shift-right (with immediate operand) can be replaced with an unrolled loop on SH2. Release 0.20 (2019/8/12): Made MULTINO and LINKBIN buildable on Win32, DOS, X68000, and Amiga, using platform modules. Reorganized and tweaked a lot of internal compiler code. Compiler does two passes by default, allowing to replace some long jumps with short ones. Tweaked some other generated code (8086 shifts, address calculation, and compares, 68K 3-bit quick form). Compiler does not generate a .LST file by default, but can show the offending line of code when aborting due to an error. Replaced x87-specific FP conversion routine with a portable integer-based one (but it is badly written and has range/precision limits). Added LOADBIG and LOADLITTLE statements to deal with endianness problems in a portable way (although the implementation seems less than ideal). Fixed typo in SH2 code for pushing a word on the stack. Added CALLAM statement for making Amiga OS system calls in a slightly less messy way. Boosted file I/O buffers to 4KB (doesn't help much except maybe on a network file system). If compilation aborts with an error, the .OBJ header is left incomplete and LINKBIN won't attempt to link it. Placing one (or multiple) letter "r" after a string inserts a CR/LF. String can also be continued. x86 assembler should now generate the correct modrm byte when ASIZE prefix is used. Fixed parsing of MOV rm,reg instructions with displacement. Fixed MOV instructions that use x86 control/debug registers. Added IFCPUxxx, IFCPUNOTxxx statements. Added x86 far calls (CALLF). Tested LINKBIN with three input files. Changed the amount of space reserved by LINKBIN in DOS executables for a segment lookup table to 64 bytes. Release 0.21 (2019/11/23): Reworked parser/assembler to handle some indexed symbols the same as plain symbols, resulting in smaller generated code. 
As a side effect, this led to some changes in how 8086 segment/offset addresses are handled.
Added some optimization logic that causes some redundant load instructions to be omitted.
Fixed 8086 signed right shifts, and improved the other ones.
Fixed the problem where the last line of the source file could be ignored.
Fixed an x86 assembler bug for MOV [disp],imm instructions.
When compilation fails due to an out-of-range jump, the target label will be displayed.
Added far jump instructions to the x86 assembler and fixed the documentation pertaining to CALLF.
Cleaned up some compiler code and made it so the assembler is invoked for each statement on a line with multiple statements, instead of buffering the code until the entire line has been parsed/compiled.
Added the optrom platform option to LINKBIN.
Made the compiler store the number of segment relocs in the time stamp field of 8086 big OBJs so that LINKBIN can allocate the correct amount of space in the MZ header.

Release 0.22 (2020/2/25):
Fixed the problem where the last line of the source file could be ignored (for real this time).
Added missing 68000 addressing modes and fixed which addressing modes were allowed for JMP/JSR. Also made it so 68000 branch instructions can accept a number.
Fixed parser so that [-$1234] can work.
Fixed extra byte being inserted in some data statements eg.:
    DW "blah",$1234
Revised internal compiler code and added more optimization possibilities, plus -opt switch.
Updated several sections in this document.

Release 0.23 (2020/4/21):
Added a fix to allow compiling source files outside the current directory.
Made an improvement to the optimizer and fixed a potential problem where, for instance, 'x' and '[x].d' could have been interchanged.
Rearranged the selecttables routine inside the compiler for smaller code.
Only on 386, it is now possible to get the address of a stack variable.
PIOWIN was changed to make most routines (except printhex and such) safe for multithreading.
The 68K assembler outputs LSL instead of ASL, which shouldn't matter (but I was messing around with an emulator which wouldn't show ASL in its debugger...)

Release 0.24 (2020/10/21):
Added minimal Linux support via the elf386 LINKBIN option and PIOLNX module.
Faster compilation.
More tweaks to generate smaller compiled code.
Added the COPYBYTES statement for all CPUs. (correction: except 8086 tiny mode)
(386) Fixed a bug where loading the address of a stack variable was optimized incorrectly.
(SH2) Fixed a problem where loading unsigned bytes generated unintentionally inefficient code.
(SH2) assembler now accepts PC-relative form of MOV with hard-coded disp ie. mov [PC+$20],r1
(SH2) The address in R11 is only set up when needed by LOCALVAR, instead of every BEGINFUNC.
Swapped BSS and data sections in the source code...

Release 0.25 (2021/3/13):
Added a check and error message for code that attempts to index a stack symbol.
Made further changes to LINKBIN ELF386 code toward eventually supporting dynamic linking.
Fixed a bug where indexed, signed memory references were treated as unsigned.
Fixed incorrect code optimizations after assignment to a byte or word variable.
Added IFNOTGREATER and IFNOTLESS.
Fixed an x86 assembler bug. mov [ebx+somelabel],eax now works and is used by the compiler.
Fixed a parsing bug that affected x86 assembly prefixes before some instructions.

Release 0.26 (2021/7/2):
Added LINKLIBFILE.
OBJ files now contain a .drectve section header, for a total of 4 section headers. Works with new version of LINKBIN and with GoLink.
LINKBIN can output PRG files.
ELF files with dynamic linking appear to work...
Changed SH2 RETURN statement so it generates rts>mov 0,r2 instead of rts>nop. Mixing up RETURN with RETURNEX 0 won't cause a crash now.
Subtractions on SH2 can generate an add instruction with negative operand for efficiency.
Optimizer code doesn't assume dword and signed dword are different things.
(acctype)
Added ALIGN16 (section alignment is also configurable inside LINKBIN).
PIOWIN fileskread/fileskwrite return the bytes read/written.

Release 0.27 (2021/11/10):
Added partial floating-point support in NOWUT evaluator via the .fd type.
Improved the parser's FP conversion code. Now allows more than 4 decimal places.
Stack symbols can be defined more than once with different default types.
Added 'ea' special symbol for doing address calculations.
Rearranged internal code tables to make things more consistent between different target platforms.

Release 0.28 (2022/1/18):
MIPS and N64 targets added.
Fixed a possible 8086-big-mode linking bug.
Compacted 8086 code slightly by replacing some xor dx,dx instructions with cwd.
The CONST statement now records tag bits, allowing floating-point numbers to be used.
Fixed a bug where storing an FP value to an indirect memory reference could fail.
Fixed COPYBYTES on 8086 tiny.

Release 0.29 (2022/5/18):
Added SAR operator to replace the SHR 3.sb oddity.
Added SECTION keyword and capability for variable number of COFF sections.
Fixed ea.b($401000)
Added 'w' mark for expanding ASCII strings to 16 bits per character.
COFF section alignment flags now get set according to largest ALIGN statement encountered, with the minimum alignment being DWORD.
Tweaked code generation for SH2 and 8086.

Release 0.30 (2022/11/14):
Modular-NOWUT replaced Multi-NOWUT, with new file names and compilation procedure.
Added SH4 (little-endian) as a target platform.
Added FILLPATTERN and USEFASTBASE statements.
New, generic IFCPU and IFCPUNOT statements replaced the CPU-specific ones.
Added a check and error message for when the compiler hits the maxsymbols limit.
Added a check and warning to help prevent DB/DW/DD mixups.
Fixed some problems with using GOTO/GOSUB with a hard-coded address.
Tweaked some x86, 68K, and SuperH code output for size/performance.
Fixed an 8086 assembler bug pertaining to indexed symbols with .h
Added SWAP statement to the documentation. (It was already in the compiler for a while)
8086 GOSUB and GOTO can take a calculated address. (Needed for the compiler to work.)
Fixed startup code in PIOX68 which didn't always work before.

Release 0.31 (2023/5/11):
Added ARM CPU support.
Reserved storage in non-BSS sections now utilizes the FILLPATTERN.
Each reserved storage statement is no longer limited to 65535 bytes when compiling on 8086/68000.
The size of reserved storage can now be calculated at compile time with +, -, *, _ operators.
List files are now cleared before the second compilation pass, so old data will not remain at the end.

Release 0.32 (2023/8/26):
Tweaked the compiler's parser code to better handle Shift-JIS in source code, and added the shiftjis test program.
Added filegetsize routine to PIOxxx modules.
Fixed a bug in 8086 code generation and improved some conditional branches.
Fixed reloc type mismatch between compiler and linker. Dreamcast example can be built again.

Release 0.33 (2024/5/20):
Fixed a bug in 8086 COUNTDOWN/NEXTCOUNT loops with count >65535
Fixed some ARM bugs.
Added MIPSLE target to NOWUT, more MIPS instructions, and PIC32 platform in LINKBIN.
Added Z280 module and MSXHELLO example program.
Added SIB addressing and MMX instructions to CPUX86 module.
Some redundant loads which previously occurred after a CALLEX, LOADBIG/LOADLITTLE, or a WHILExx statement are no longer generated.
Added -oc, -od, -ob options to LINKBIN, and fixed some bugs when running on a big-endian CPU.

Release 0.34 (2024/8/25):
Fixed LOADBIG/LOADLITTLE on SH4, MIPSLE, and ARMLE.
Fixed COPYBYTES on ARM.
Fixed a bug in data dumps placement, and added FLUSHIMM. Doing INCBIN without FLUSHIMM first produces a warning message.
Fixed data section COFF relocations on certain platforms.
Moved a string from the CPUSH source file to the main NOWUT source file (since it is referenced also by CPUARM).
x86 assembler now has separate JCXZ and JECXZ instructions.