NOWUT v0.26 - a programming language and compiler

At this stage of development, both the language and compiler are incomplete.
Errors may not be caught, bugs may bite (see below for list of known bugs),
and code will be suboptimal. However, NOWUT can successfully compile itself,
as well as several demo programs.

The compiler (MULTINO) is licensed under the GPL (see COPYING). (Example
programs included in the archive should be considered public domain unless
stated otherwise.)

http://www.hyakushiki.net/nowut.htm
http://www.hyakushiki.net/anachro.htm
damage_x@hyakushiki.net

----Contents----

 1) About the NOWUT compiler
 2) About the NOWUT language
 3) Example function
 4) Symbols, labels, variables, constants
 5) Operands
 6) Calculation/assignment
 7) NOWUT language statements/instructions
 8) Functions/procedures
 9) Internal assembler
     a. x86 instruction set (8086 mode peculiarities)
     b. 68K instruction set
     c. SH2 instruction set
10) Known bugs and limitations
11) Changes from the last version

----About the NOWUT compiler----

The NOWUT compiler is a program written in NOWUT, which can now run on Win32,
DOS, X68000, Amiga, EmuTOS, and i386 Linux. It produces a COFF object file as
output. Each OBJ file contains exactly one code section, one data section,
and one BSS section. Note that a data section with a size of 0 bytes is known
to result in (Win32) executables that won't run (some dummy data might be
needed).

Command line:

MULTINO platform [-one] [-lst] file

The input file is assumed to have the extension .NO, therefore FILE.NO will
be read and FILE.OBJ will be written.

CPU support was previously divided among separate compilers, but now they
have been unified into MULTINO. These modes are currently supported:

386 - generates x86 32-bit code, outputs a standard COFF object file which
   can be fed to GoTools GoLink to create a PE executable, or to LINKBIN to
   create an ELF executable.

8086tiny - generates 8086-compatible code which is limited to a 16-bit
   address space (CS/DS/ES/SS all assumed to be set to the same value),
   outputs a nonstandard COFF file which can be fed to LINKBIN to create a
   .COM file or MZ executable. 32-bit operations are done using multiple
   16-bit operations, which causes some code bloat.

8086 - similar to 8086tiny except that (non-stack) memory references are
   preceded by reloading the DS register (causes even more code bloat). This
   means that initialized data and BSS can now be larger than 64KB. Code size
   is limited to 65472 bytes. Stack size is determined by a constant within
   LINKBIN (default is 1KB).

   Pointers still use a 32-bit linear address space! However, anything beyond
   2MB is not valid. The mapping between logical and physical addresses is
   determined by a segment lookup table located in the first 64 bytes of the
   code segment (32 words = 32 64KB segments = 2MB). This table is set up by
   the program itself or by the PIODOS platform code. When it is set up by
   PIODOS, the first 640KB is relative to the beginning of the program (minus
   64 bytes) and 640K-1M is mapped to absolute addresses (ie. $A0000 is VGA
   buffer). The table entries for 1MB-2MB are unused but left open to be used
   for addressing other potentially useful segments (eg. BIOS data area, DOS
   environment, etc.). Be sure to keep your words/dwords aligned so they
   don't straddle a 64KB boundary.

   Output can only be used to create an MZ executable.

   The internal x86 assembler handles most instructions up to the Pentium,
   but does not handle SIB addressing modes, and it uses a nonstandard
   syntax. It's possible to mix 16-bit and 32-bit code within one program by
   using BITSxx to switch modes, but this hasn't been tested much.

68000 - generates plain 68000 code and outputs a standard(-ish?) COFF file.
   This can be fed into LINKBIN to create Amiga, Genesis, or X68000
   executables.

   The internal 68K assembler now handles all 68000 and 68010 instructions
   and all of their addressing modes. It also uses a nonstandard syntax.

sh2 - generates SuperH code and outputs a standard(-ish?) COFF file. This can
   be fed into LINKBIN to create Sega 32X or Saturn binaries.

   The internal SH2 assembler handles all SH2 instructions except one, and
   uses a nonstandard syntax. (See the assembly section for details.)

These switches are currently supported:

-one - makes the compiler do only one pass instead of two. (All jumps will be
   long versions.)
-lst - makes the compiler generate a .LST file.
-opt - toggles compiler/assembler optimizations. Normally they are on by
   default and this would then turn them off. (useful for debugging purposes)

The language currently has no floating-point support, however the x86
assembler supports x87 FPU instructions and the parser can convert constants
into 32-bit FP values. (The current conversion routine has limited range and
precision.)

MULTINO is linked with platform modules to handle platform-specific
initialization and console/file I/O. When using a platform module, programs
must call INITPLATFORM upon starting. This function will return a pointer to
a (null-terminated) command line parameters string (if any). Call or jump to
ENDPROGRAM to exit, or use the END statement. These are the platform modules:

PIOAMI - Amiga version. Records the initial stack pointer and opens
   dos.library during init. Closes it upon exiting as well as closing any
   open files to prevent memory leaks. Converts backslashes in file names to
   forward slashes.
PIODOS - 8086/DOSEXE version. Sets up segment table during init. (Does not
   work with 8086tiny or .COM)
PIOGEM - EmuTOS/GEMDOS version. Adjusts memory allocation during init.
PIOLNX - 386/Linux version. Joins command line arguments into one big string
   during init. Converts backslashes in file names to forward slashes.
PIOWIN - 386/Win32 version. Gets a console handle during init.
   Skips over the program name part of the command line. Can also be used
   for 32-bit DOS programs with the WDOSX Win32 wrapper.
PIOX68 - X68000 version.

LINKBIN also uses the platform modules.

OBJ files produced in 386 mode can be linked using GoTools GoLink to make a
Win32 executable. Using the /base switch (eg. /base 00400000) causes GoLink
to generate relocation data. These executables can then also be used with
Win32s (Windows 3.x) and may be used in a DOS environment with the WDOSX stub
(a DPMI host which implements a subset of Win32 functions).

GOLINK requires a list of applicable DLLs as command line arguments. I use
this command to compile my Win32 stuff:

golink %1.obj kernel32.dll user32.dll gdi32.dll winmm.dll /console

As of NOWUT 0.26, it is possible to specify library names in the source code
with LINKLIBFILE, which removes the need for specifying them on the command
line.

Code that aims to be cross-platform should always take endianness and memory
alignment into account. On the 68000, words and dwords must be aligned to
2-byte boundaries. On the SH2 they must be aligned to 2-byte or 4-byte
boundaries. NOWUT includes ALIGNW and ALIGND statements for this purpose. The
statements LOADBIG and LOADLITTLE are provided to read words/dwords with a
particular endianness on any CPU. (Note that the 386 LOADBIG code uses the
BSWAP instruction which is only available on 486+)

LINKBIN is provided to transform one or more OBJ files, from modes other than
386 mode, into useful executable formats. LINKBIN has only been tested with
three input files at a time. Modify the MAXFILES parameter in the source to
enable more than this. (In theory, it should work...)
These formats are supported:

genesis, 32x, 32x/68k - Sega Genesis/MD and 32x ROM images - SH2 or 68K side
amiga - Amiga 68K "hunk" format
doscom, dosexe - 8086 PC .COM and .EXE programs
elf386 - an i386 Linux executable (output files need the filesystem
   executable flag set with chmod +x)
optrom - 8086tiny ISA adaptor ROM (hasn't been tested yet)
prg - runs under EmuTOS
saturn, satsnd - Sega Saturn SH2 binary (load and execute at $06004000) and
   68K sound CPU.
x68 - Sharp X68000 executable (Human68K)

The Sega 32X hardware is inactive when the system powers on. The 68K has
control, and must enable the 32X. When generating a 32X ROM image with SH2
code, a stub file (68KPART8.32X) occupies the first 4KB of the final image.
The stub performs initialization and hands control over to the SH2. It also
polls the controller ports and passes the data to the SH2 through shared
registers. The source code for the stub is 68KPART8.NO.

The 32X master SH2 begins execution at the beginning of its code section. The
slave SH2 starts in an idle loop contained within the stub file. A dword
write to address $26001020 will cause it to jump to that location (the
address in the dword that is written) so the second CPU can be utilized.

----About the NOWUT language----

The goal was to combine certain aspects of assembly and high-level languages
in a different way than what has been done before.

NOWUT borrows these ideas from assembly:

   simplistic syntax consisting of instructions followed by operands
   manual layout of initialized data, uninitialized data, and data structures
   no enforcement of data types

It borrows these ideas from HLLs:

   avoids (mostly) being CPU specific
   no micro-management of CPU register usage
   calculations can be specified in a form similar to mathematical notation
      (assignment)

NOWUT also handles inline assembly code with a nonstandard syntax.

I should mention that the name NOWUT is an acronym which expands to "No One
Will Use This."
I figure that if anyone else shared my taste in programming languages, I
wouldn't have been forced to create my own! I suspect that NOWUT may not
become extremely popular.

----Example function in NOWUT----

; this line is a comment. comments are preceded by ' or ; characters

examplefunc:                  ; examplefunc is a label (address)
beginfunc param1.d,param2.d   ; this function receives two parameters
                              ; they will be referred to as param1 and param2
                              ; and are dwords (32-bit words)
localvar xx.d                 ; a local dword variable will be added to the stack

xx=param1+1                   ; here is an example of assignment
countdown xx                  ; this begins a simple type of loop
 param2=_ shl 1               ; an underscore refers back to the left side of
                              ; the equals sign.
                              ; param2 becomes (itself) shifted left one bit
nextcount                     ; this is the end of the "countdown" loop

endfunc param2                ; the function's return value will be equal to
                              ; the value of param2
returnex 8                    ; this causes program flow to return to the caller.
                              ; it also removes 8 bytes from the stack (important)
                              ; which had been occupied by param1 and param2

----Symbols, labels, variables, constants----

Labels, variables, and constants are considered symbols. Symbol names are
currently limited to 64 characters. (Currently, long symbol names may cause
internal buffers to overflow and generate an error message.) Symbol names are
not case-sensitive, though a particular case can be written to the output
file for the benefit of case-sensitive linkers (the EXTERN statement is used
for this purpose). Symbol names must contain at least one letter, and may
contain other characters EXCEPT these:

' ; : ! $ & ( ) [ ] , . ? + - * / = " >

Symbol names that are the same as CPU register names should not be used.

Normally, labels are defined with a colon. An exclamation point is used
instead of a colon to define an exported symbol (can be referred to in other
modules). For instance, when using the GoTools GoLink linker, the program's
entry point should be defined like so:

start!

(The program entry point for other platforms is the beginning of the code
section.)

Every label has an address associated with it, although the actual address is
not determined until linking or upon the executable code being loaded by the
operating system. The label "examplefunc" can be used as the target of a
jump/branch/call (in assembly), a goto/gosub/callex (in NOWUT), it can have
its address used in a calculation (eg. xx=examplefunc+40), or it can be used
to refer to memory contents (eg. xx=examplefunc.d).

Labels and global variables are actually the same thing, except that labels
used to refer to either initialized data or uninitialized data will generally
be defined with an appropriate default type. However, this is not required,
and when a symbol is referenced the default type can be overridden.

exampleaddr.a:     ; exampleaddr will be handled as an address
exampleaddr:       ;      -       -      -       -     address (same as .a)
exampleaddr.b:     ;      -       -      -       -     byte value
exampleaddr.sb:    ;      -       -      -       -     signed byte value
exampleaddr.w:     ;      -       -      -       -     word value (16-bit)
exampleaddr.sw:    ;      -       -      -       -     signed word value
exampleaddr.d:     ;      -       -      -       -     dword value (32-bit)
exampleaddr.sd:    ;      -       -      -       -     signed dword value

The default type is used when a label is referenced without any type tag (ie.
no dot anything).

xx=exampleaddr     ; operation depends on the default type
xx=exampleaddr.a   ; always loads the address
xx=exampleaddr.d   ; always loads a dword

The default type is determined when a label is FIRST referenced (not
necessarily when it is defined!). Because the compiler only does one pass of
the source code, it is therefore recommended to place any initialized or
uninitialized data BEFORE the program code.

Recommended:

sectiondata
dwordval.d: dd $1234ABCD
sectionbss
buffer.d: resd 16
sectioncode
buffer(4)=dwordval

Problematic:

sectioncode
buffer.d(4)=dwordval    ; dwordval will be interpreted as an address
                        ; because its definition has not yet been read by the compiler
sectiondata
dwordval.d: dd $1234ABCD
sectionbss
buffer.d: resd 16

Alternatively, the DEF statement can be used at the beginning of a source
file (regardless of section) to manually initialize a symbol's default type.
This is especially useful when referencing symbols in another module.

Variables that are defined by a beginfunc or localvar statement are located
on the stack and have some differences. The first is that the names may be
used independently in multiple functions, and they have no relevance outside
of the function in which they are defined. The second difference is that the
address of a stack variable is not valid.

localvar xx.d,yy.d
xx=yy.a                 ; this does NOT work**

This functionality might be added to a later version of the compiler. But
currently, the compiler passes variables to the internal assembler as-is, and
the assembler simply alters the addressing mode to refer to the stack, it
does not insert additional instructions (eg. an LEA) to determine the
address.

**In fact, referencing the address of a stack variable is now allowed for 386
and 68000

Because the SH2 CPU only has register-indirect addressing with a 4-bit
positive displacement, the SH2 version of the NOWUT compiler sets aside two
registers for local variables. It limits the number of local variables to 32,
and they must be dwords (signed or unsigned). Function parameters are limited
to 12 and also must be dwords.
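
The figure of 32 local variables is consistent with the SH2's MOV.L
@(disp,Rn) addressing form, where the 4-bit displacement is scaled by 4 for
dword accesses. The Python sketch below (not NOWUT code) works through that
arithmetic. That each of the two reserved registers covers 16 dword slots is
an assumption about how the limit arises, not something this document
states, and the names used are made up for illustration.

```python
DISP_BITS = 4        # width of the SH2 displacement field
DWORD_SIZE = 4       # the displacement is multiplied by 4 for dword accesses
BASE_REGISTERS = 2   # registers set aside for local variables (from the text)


def reachable_offsets(disp_bits=DISP_BITS, scale=DWORD_SIZE):
    """Byte offsets reachable from one base register with a scaled displacement."""
    return [disp * scale for disp in range(1 << disp_bits)]


offsets = reachable_offsets()
slots_per_register = len(offsets)                  # dword slots per base register
total_slots = slots_per_register * BASE_REGISTERS  # assumed: one window per register

print(offsets[0], offsets[-1], slots_per_register, total_slots)
```

Under that assumption, a byte or word local would not fit the same scheme,
which matches the requirement that locals be dwords.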

Signed and unsigned types are handled the same during many operations, but
there are some where it is important to differentiate:

loading smaller data sizes:

xx=array.b(5)      ; the byte will be zero-extended
xx=array.sb(5)     ; the byte will be sign-extended
xx=array.w(10)     ; the word will be zero-extended
xx=array.sw(10)    ; the word will be sign-extended

comparisons:

ifgreater xx.d,0,branchtarget    ; will branch unless xx is 0
ifgreater xx.sd,0,branchtarget   ; will branch unless xx is 0 or negative

shifting right:

xx=yy.d shr 1      ; top bit will become 0
xx=yy.sd shr 1     ; top bit will become 0 (doh!)
xx=yy shr 1.sb     ; top bit will remain the same

Due to a certain situation in the compiler, a signed shift-right will not be
used unless the shift count is specified as a signed value. This is a bug.

Constants are defined with the CONST statement:

const secretvalue,3579545

References to the symbol (eg. secretvalue) will be replaced with the value.
Only numeric values are allowed (although this includes ASCII values).

----Operands----

Basically, everything that is accepted as part of an assignment/calculation
or as an argument to an instruction is considered an operand. These include
numeric values, strings, symbols, and combinations of such.

format in NOWUT   format in assembly  description

1234              1234                decimal number               \
12.34             12.34               floating-point number         \
$1234             $1234               hex number                     \
0x1234            0x1234              hex number                      \  immediate
"a".b             "a".b               ASCII byte                      /
"ab".w            "ab".w              ASCII word                     /
"abcd".d          "abcd".d            ASCII dword                   /
constname         constname           a symbol defined with CONST  /
                  99000.h             high word of a value (8086 assembly only)

varname                               address or memory reference
                  varname             address
varname.a         varname.a           address
                  varname.h           high word of address (8086 assembly only)
varname.b                             memory reference (byte variable)
varname.sb                            memory reference (signed byte variable)
varname.w                             memory reference (word variable)
varname.sw                            memory reference (signed word variable)
varname.d                             memory reference (dword variable)
varname.sd                            memory reference (signed dword variable)
                  [varname].b         memory reference (byte variable)
                  [varname].w         memory reference (word variable)
                  [varname].d         memory reference (dword variable)
                  [varname].h         memory reference (high word, 8086 assembly only)
                  [varname].q         memory reference (64-bit, x87 floating-point only)

                  reg                 a CPU register
                  reg.b               a byte CPU register (68K only)
                  reg.w               a word CPU register (68K only)
                  reg.d               a dword CPU register (68K only)
                  [reg]               memory reference (address contained in reg)
                  [reg].b             memory reference (byte at address in reg)
                  [reg].w             memory reference (word at address in reg)
                  [reg].d             memory reference (dword at address in reg)
                  [reg].q             memory reference (64-bit, x87 floating-point only)
                  [reg+]              memory reference with post-increment (68K)
                  [reg+].b            memory reference with post-increment (68K and SH2)
                  [reg+].w            memory reference with post-increment (68K and SH2)
                  [reg+].d            memory reference with post-increment (68K and SH2)
                  [reg-]              memory reference with pre-decrement (68K)
                  [reg-].b            memory reference with pre-decrement (68K and SH2)
                  [reg-].w            memory reference with pre-decrement (68K and SH2)
                  [reg-].d            memory reference with pre-decrement (68K and SH2)
                  [reg+xx].x          other indirect addressing modes (CPU dependent)

[varname].b                           indirect memory reference (byte variable)
[varname].sb                          indirect memory reference (signed byte variable)
[varname].w                           indirect memory reference (word variable)
[varname].sw                          indirect memory reference (signed word variable)
[varname].d                           indirect memory reference (dword variable)
[varname].sd                          indirect memory reference (signed dword variable)

varname(xx)                           indexed address or memory reference
varname.a(xx)                         indexed address
varname.b(xx)                         indexed memory reference (byte)
varname.sb(xx)                        indexed memory reference (signed byte)
varname.w(xx)                         indexed memory reference (word)
varname.sw(xx)                        indexed memory reference (signed word)
varname.d(xx)                         indexed memory reference (dword)
varname.sd(xx)                        indexed memory reference (signed dword)

"abcxyz123"                           string
"abcxyz123".a                         address of a string

_                                     "itself" (left side of an assignment)

Note that a blank operand in a series of operands separated by commas will be
interpreted as 0.

callex result.d,somefunction,param1,,,param4   ; zeros are passed as 2nd and 3rd parameters

Also: indexed symbols with a number as index can now be handled the same as
plain symbols (ie. the addition is done during linking instead of at run
time).

base.d(4)=0     ; both statements generate the same number of instructions
base.d=0        ;

But don't use indexed symbols for jumps/calls/branches as this is not
guaranteed to work:

goto place(8)   ; outcome is uncertain

There are some differences between operand formats in NOWUT vs. the internal
assembler. Some CPU instructions make indirect memory references while using
the same encoding as instructions which access memory without the indirection
(and use the same syntax in their standard assembly languages), hence these
ambiguities persist here. In assembly mode, memory references always use
square brackets around a register name, address, or symbol. In NOWUT language
mode, square brackets are only used for indirect memory references.

Also in NOWUT language mode, calculations are currently not allowed inside of
square brackets.

[address+48].d=65                        ; this does NOT work
tempvar=address+48 > [tempvar].d=65      ; use this instead

Plain strings are only used for initialized data, the EXTERN statement, and
the INCBIN statement. However the address of a string can be used as with any
other address, with the string data being dumped at the end of the section.

messageptr="stuff happens".a             ; pointer to a null-terminated string
messageptr2="stuff happens"r.a           ; the letter "r" adds a CR/LF
messageptr3="line 1"r"line 2"rr.a        ; multiple CR/LFs are possible, even inside a string

Strings that will be appended to the end of the section are buffered during
compilation. The number of strings that can be buffered is limited by the
MAXSTRINGS constant in MULTINO. There is currently no other support for
strings in the NOWUT language.

register names:

x86 - eax ecx edx ebx esp ebp esi edi
      ax cx dx bx sp bp si di
      al cl dl bl ah ch dh bh
      es cs ss ds fs gs
      cr0 cr2 cr3 cr4
      dr0 dr1 dr2 dr3 dr6 dr7
      st0 st1 st2 st3 st4 st5 st6 st7

      Since x86 registers have an inherent size, no size tag is needed.

68K - d0 d1 d2 d3 d4 d5 d6 d7
      a0 a1 a2 a3 a4 a5 a6 a7
      ccr sr pc

      Data registers may be byte, word, or dword size. Address registers may
      be word or dword.

SH2 - r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15
      macl mach sr gbr vbr pr pc

      SH2 registers are all 32 bits, no size tag is needed.

----Calculation/assignment----

A calculation is a series of operands and operators that cause a value to be
computed at runtime. This includes steps taken to calculate the address of a
memory reference. Assignment causes a value to be stored in a memory
location. Assignment uses the equals sign.

somevar=foo+1*100

The value of foo is loaded, 1 is added, it's multiplied by 100, the result is
stored in somevar. Needless to say, somevar should be a memory reference. If
its default type is .a (an address) then compilation will fail with an error
message.
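
The order of operations in the somevar example can be modelled in a few lines
of Python. This is only a sketch of the evaluation order described above, not
the compiler's own code. The evaluate function and its token-list input are
made up for illustration, and integer division is assumed for the / operator.

```python
import operator

# Operators applied strictly in the order they appear (no precedence).
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.floordiv,  # assumption: integer division
}


def evaluate(tokens, variables):
    """Evaluate [operand, op, operand, op, ...] from left to right."""
    def value(tok):
        # Operands are either variable names (str) or literal integers.
        return variables[tok] if isinstance(tok, str) else tok

    result = value(tokens[0])
    for i in range(1, len(tokens), 2):
        result = OPS[tokens[i]](result, value(tokens[i + 1]))
    return result


env = {"foo": 5}
somevar = evaluate(["foo", "+", 1, "*", 100], env)  # (foo + 1) * 100
print(somevar)
```

With foo equal to 5 the result is (5+1)*100 rather than 5+(1*100), which is
the difference from conventional operator precedence.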

Operations are performed in order from left to right, except where
parentheses are used to specify that a different order should be used. When
the compiler encounters parentheses it pushes the current value on the stack,
performs the calculation inside the parentheses, then pops the stack to
continue. Because the compiler doesn't attempt to optimize such things,
manually re-ordering operations to avoid parentheses results in sleeker code.

somevar=foo+(bar*100)   ; OK, but suboptimal
somevar=bar*100+foo     ; better

Currently, parentheses may be nested up to 20 levels deep.

supported operators:

+     addition
-     subtraction
*     multiplication
/     division
and   logical AND
or    logical OR
shl   logical shift-left
shr   logical or arithmetic shift-right
xor   logical exclusive OR

The parser handles operands and operators as a unit, and as a side effect of
this, extra spaces should not be inserted between them.

somevar=foo +10         ; bad
somevar=foo+ 10         ; acceptable
somevar=foo shl 2       ; good
somevar=foo  shl  2     ; bad

The underscore operand refers to the target of an assignment. It is
particularly handy when the target involves a complex address calculation
that would otherwise need to be repeated.

array(index shl 4+12)=array(index shl 4+12)+20   ; OK, but ugly and slow
array(index shl 4+12)=_+20                       ; better

Indices on symbols are always calculated in terms of bytes. If you want to
access a numbered word in an array of words, be sure to use a shift in your
index.

array.d(0)=$00000001    ; the first dword
array.d(4)=$00000002    ; the second dword
array.d(N shl 2)=n      ; the Nth dword

Hence, mixing of bytes, words, and dwords in a data structure is easily
accomplished. But unaligned memory accesses are also possible, and may not be
desirable. The same memory location can also be accessed with differing data
sizes at different times, however this may cause unexpected results if
endianness is not taken into account.

Calculations are sometimes accepted as operands in place of memory references
or numeric values.
See the next section for details.

----NOWUT language statements/instructions----

The following is a list of recognized instructions in NOWUT language mode:

instruction   number of  type of operand           description
              operands

ALIGNW        none                                 adds a byte if necessary to ensure an even address
ALIGND        none                                 adds bytes if necessary to ensure an address
                                                   divisible by 4
ALIGNQ        none                                 adds bytes if necessary to ensure an address
                                                   divisible by 8
ASM           none                                 switches to assembly language mode
BEGINFUNC     variable   symbol                    pushes registers on the stack and optionally
                                                   defines any parameters that were provided by
                                                   the caller
  Note: parameters will be listed in the reverse order compared to CALLEX

BITS16        none                                 switches x86 compilation mode to 8086
BITS16S       none                                 switches x86 compilation mode to 8086tiny
BITS32        none                                 switches x86 compilation mode to 386
CALLAM        1          memory reference          68000 only: used for Amiga OS system calls.
              6          calculation               First parameter is the return value. Second is
                                                   the base address. Third is the function offset.
                                                   The remaining parameters are values to be
                                                   loaded in registers D0, D1, A0, A1
  example:
    callam dosbase.d,[4].d,-552,0,0,0,"dos.library".a   ; Exec - openlibrary

CALLEX        1          memory reference          pushes any parameters on the stack, calls
              1          calculation               a function, and optionally stores a
              variable   calculation               return value
  examples:
    callex result.d,functionname                 ; function call that receives a return value
    callex ,functionname                         ; function call without return value
    callex ,functionname,param1.d,foo*320+bar    ; function call with two parameters
    callex ,jumptable.d(x shl 2)                 ; calculating a function address
  Note: parameters are pushed on the stack from left to right and will appear
  in the reverse order compared to BEGINFUNC

CONST         1          symbol                    associates a value with a symbol
              1          immediate
  example:
    const bufsize,16384        ; occurrences of bufsize are replaced with 16384

COPYBYTES     1          calculation               copies an array of bytes from source address
              1          calculation               (first parameter) to destination address
              1          calculation               (second parameter). Third param is count.
  example:
    copybytes string1.a,string2.a+someoffset,foo-1
  Note: If the count parameter is equal to zero then no bytes are copied.
  Source and destination memory areas should not overlap. Count is limited to
  32768 on 8086, and 65536 on 68000.

COUNTDOWN     1          memory reference          marks the beginning of a loop
  example:
    countdown xx               ;
     [...]                     ; the code inside will execute xx times, and
    nextcount                  ; afterward xx will be 0

DB            1 or more  immediate, string         initialized data
DW            1 or more  immediate, string         initialized data
DD            1 or more  immediate, string,        initialized data
                         address, indexed address
DEF           1 or more  symbol                    provides the default type for a new symbol
                                                   (does not change an existing default)
  example:
    def var1.d,value2.b        ; tells the compiler that var1 will default to
                               ; dword size, and value2 to byte size

END           none                                 jumps to the ENDPROGRAM routine in a platform module
ENDFUNC       0 or 1     immediate, memory         invalidates local variables and stack-
                         reference                 based parameters, pops the stack, and
                                                   optionally loads a return value
EXTERN        1 or more  string                    specifies a case-sensitive alias for symbol
                                                   names that will be written to output file for
                                                   the benefit of case-sensitive linkers
GOSUB         1          immediate, address        calls a subroutine without changing the
                                                   stack frame
GOTO          1          immediate, address        jumps to an arbitrary location
IFCPU386      none                                 skips subsequent statements on the same line
IFCPUNOT386                                        if the condition is not true
IFCPU8086T
IFCPUNOT8086T
IFCPU8086
IFCPUNOT8086
IFCPU68000
IFCPUNOT68000
IFCPUSH2
IFCPUNOTSH2
  example:
    ifcpu68000 > const someparam,512       ; when compiling for 68000, someparam becomes 512
    ifcpunot68000 > const someparam,1024   ; otherwise, it becomes 1024

IFEQUAL       1          calculation               compares the first and second operands,
              1          immediate, memory         then jumps to the location specified in
                         reference                 the third operand if they are equal
              1          address
  example:
    ifequal foo,15,someroutine             ; jumps if foo is equal to 15

IFGREATER     1          calculation               compares the first and second operands,
              1          immediate, memory         then jumps if the first is greater
                         reference
              1          address
  example:
    ifgreater bar+10,foo,someroutine       ; jumps if bar+10 is greater than foo

IFLESS        1          calculation               compares the first and second operands,
              1          immediate, memory         then jumps if the first is less
                         reference
              1          address
IFNOTGREATER  1          calculation               equivalent to less-than-or-equal
              1          immediate, memory
                         reference
              1          address
IFNOTLESS     1          calculation               equivalent to greater-than-or-equal
              1          immediate, memory
                         reference
              1          address
IFUNEQUAL     1          calculation               compares the first and second operands,
              1          immediate, memory         then jumps to the location specified in
                         reference                 the third operand if they are unequal
              1          address
INCBIN        1 or more  string                    loads initialized data from a file
  example:
    incbin "gamegfx.bin"       ; inserts the contents of GAMEGFX.BIN

LINKLIBFILE   1 or more  string                    adds the name of a dynamic link library to
                                                   the output file .drectve section
  example:
    linklibfile "kernel32.dll","user32.dll"

LOADBIG       1          memory reference          loads a big-endian second operand and changes
              1          immediate, memory         the byte order if necessary
                         reference
LOADLITTLE    1          memory reference          loads a little-endian second operand and changes
              1          immediate, memory         the byte order if necessary
                         reference
LOCALVAR      1 or more  symbol with tag           defines stack-based variables to be used
                                                   within a function (must come after BEGINFUNC)
NEXTCOUNT     none                                 decrements the variable specified by the
                                                   associated COUNTDOWN, then jumps to the
                                                   beginning of the loop if the result is not 0
RESB          1          immediate                 reserves the number of bytes specified
RESW          1          immediate                 reserves the number of words specified
RESD          1          immediate                 reserves the number of dwords specified
RESQ          1          immediate                 reserves the number of qwords specified
  Note: If RESx are used in the code or data sections, zeros will be written.

RETURN        none                                 returns from a routine called by GOSUB
RETURNEX      1          immediate                 returns from a routine called by CALLEX and
                                                   pops the specified number of bytes from the
                                                   stack
SECTIONBSS    none                                 marks the beginning of the BSS section
                                                   (reserved storage)
SECTIONCODE   none                                 marks the beginning of the code section
SECTIONDATA   none                                 marks the beginning of the data section
                                                   (initialized data)
  Note: section headers should not appear more than once in a NOWUT program

WEND          none                                 marks the end of a WHILE loop (jumps back to
                                                   the beginning)
WHILEEQUAL    1          calculation               compares the operands and jumps to the
              1          immediate, memory         corresponding WEND if they are unequal
                         reference
  example:
    WHILEEQUAL [pointer].b,0   ; if the memory location [pointer] contains a
     pointer=_+1               ; zero byte then the loop will execute, and it
    WEND                       ; will continue until a non-zero byte is found

WHILEGREATER  1          calculation               compares the operands and jumps to the
              1          immediate, memory         corresponding WEND if the first is not
                         reference                 greater
WHILELESS     1          calculation               compares the operands and jumps to the
              1          immediate, memory         corresponding WEND if the first is not
                         reference                 less
  example:
    xx=0
    WHILELESS xx,5             ; the beep routine will be called 5 times
     GOSUB beep > xx=_+1
    WEND

WHILEUNEQUAL  1          calculation               compares the operands and jumps to the
              1          immediate, memory         corresponding WEND if they are equal
                         reference

----Functions/procedures----

There are no declarations needed, return values are optional, and functions
can even have multiple entry and exit points. But care needs to be taken to
avoid stack corruption and to refrain from referencing local variables when
they are not valid.

; example of a function with multiple entry points and an internal subroutine

somefunc2:
globalsetting=defaultval
somefunc:
beginfunc address.d
localvar x.d,y.d

x=address > y=8 > gosub printstuff
ifequal globalsetting,0,label123
x=carriagertn > y=2 > gosub printstuff
goto label123

printstuff:
callex ,writefile,,tempvar,y,x,chandle
return

label123:
endfunc
returnex 4

Imagine that you can call somefunc, and pass the address of an
eight-character string which will then be printed (using a Win32 call to
write to an open console). If globalsetting is zero then no carriage return
is added to the output, otherwise it is added. If the caller wanted to
override globalsetting then it could call somefunc2 instead, which sets
globalsetting itself, before the function proceeds.

Having code execute before BEGINFUNC is no problem as long as it only
references global variables. The address, x, and y symbols are not yet valid
until after BEGINFUNC and LOCALVAR. Likewise, when printstuff is called from
within somefunc, the x and y variables are valid. However, printstuff can NOT
be called from outside of somefunc because x and y will point to
who-knows-what.

The compiler doesn't invalidate stack variables belonging to one function
until it sees a BEGINFUNC pertaining to another function. At that point, it
will give an error if access to such variables is attempted. At runtime, they
would become invalid as soon as an ENDFUNC is executed.

It's possible to have more than one ENDFUNC associated with a function.

; example function with two exit points

anotherfunc:
beginfunc param1.d,param2.d
ifequal param1,param2,labelxyz
endfunc 0
returnex 8

labelxyz:
endfunc 1
returnex 8

Note that if you don't need to pass any parameters, GOSUB and RETURN can be
used instead of CALLEX and RETURNEX.

----Internal assembler----

The NOWUT compiler generates assembly code and feeds it back to its internal
assembler. This code can be seen in comments in the .LST file that is
generated during compilation.
If an error occurs during compilation, it is convenient to look at the end of the
.LST file to see how far it progressed before encountering the error.

Hand-written assembly language can be included in NOWUT programs by using the ASM
and ENDASM statements.

These CPU-independent statements are recognized by the assembler:

 ALIGNW \
 ALIGND  \
 ALIGNQ   \ same as NOWUT mode
 DB       /
 DW      /
 DD     /
 ENDASM - returns to NOWUT mode


The x86 instruction set:

Instruction names are as usual, except for 8-bit jumps which have been given their
own separate forms SJMP and SJcc. Memory operands are contained in square brackets,
and .b .w .d tags are used to make the operand size explicit. Destination operands
go on the left, source operands on the right.

In 32-bit mode, operand-size prefixes are inserted before instructions that use
16-bit words. The reverse is true for 16-bit mode, where instructions with a 32-bit
operand size will have a prefix inserted.

SIB addressing modes (eg. [ESI+EAX*4]) are currently not supported in any mode.

The instruction listing below describes acceptable operands mostly in terms of how
they are encoded.
The equivalent NOWUT syntax or operand types are as follows:

x86 operand   NOWUT                                             notes
imm8          immediate                                         8 bits, often sign-extended
imm16         immediate                                         (could be an address in 16-bit mode)
imm32         immediate or address
reg8          al cl dl bl ah ch dh bh
reg16         ax cx dx bx sp bp si di
reg32         eax ecx edx ebx esp ebp esi edi
segreg        es cs ss ds fs gs
CRx           cr0 cr2 cr3 cr4
DRx           dr0 dr1 dr2 dr3 dr6 dr7
freg          st0 st1 st2 st3 st4 st5 st6 st7
mem8          [immediate].b, [address].b, [reg].b, [reg+xx].b
mem16         [immediate].w, [address].w, [reg].w, [reg+xx].w
mem32         [immediate].d, [address].d, [reg].d, [reg+xx].d
mem48/64/80   [immediate], [address],                           .q should be accepted for 64-bit data, while .w may be
              [reg], [reg+xx]                                   used to distinguish 80-bit (this hasn't been tested)
disp16/32     [immediate], [address]                            accesses memory of various sizes but doesn't use
                                                                mod/rm encoding
rm8           - same as mem8 or reg8
rm16          - same as mem16 or reg16
rm32          - same as mem32 or reg32

The assembler will accept operands without a size tag, however in some cases the
size ambiguity will mean that more than one opcode would be valid. Shorter opcodes
are generally favored.

example:
 PUSH 7 ; this will be assembled as an imm8 instead of imm32

Also note that on the x86, displacements can be negative:

 MOV EAX,[EBP-40]

Hand-written assembly code should not modify (or should save and restore) the EBP
register as the compiler uses it to address stack variables. EBP is used to address
stack variables in assembly code as well, and these variables will become invalid
when it is modified.
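For readers unfamiliar with the opcode column in the listing that follows, here is a small Python sketch of how the "/n" and "+r" suffixes are usually read. It assumes the listing follows the common Intel convention, where "/n" is the value placed in the reg field of the mod/rm byte and "+r" means the register number is added to the opcode byte. The helper names are hypothetical, and only the register-operand case is shown (no memory operands, no prefixes).

```python
# Register numbers follow the order given in the operand table above.
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def modrm_register_direct(digit, reg_number):
    """Build a mod/rm byte with mod=11 (register operand), reg=digit, rm=register."""
    return 0xC0 | (digit << 3) | reg_number


def encode_slash_digit(opcode, digit, reg_name, regs, imm8):
    """Encode an 'opcode /digit' form with a register operand and an 8-bit immediate."""
    modrm = modrm_register_direct(digit, regs.index(reg_name))
    return bytes([opcode, modrm, imm8 & 0xFF])


def encode_plus_r(opcode_base, reg_name, regs):
    """Encode an 'opcode+r' form: the register number is added to the opcode."""
    return bytes([opcode_base + regs.index(reg_name)])


# ADC rm8,imm8 is listed as $80 /2; with BL as the operand and 5 as the immediate:
print(encode_slash_digit(0x80, 2, "bl", REG8, 5).hex())  # 80d305

# PUSH reg16/32 is listed as $50+r; with EBX in 32-bit mode:
print(encode_plus_r(0x50, "ebx", REG32).hex())           # 53
```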
AAA $37 AAD imm8 $D5 AAM imm8 $D4 AAS $3F ADC AL,imm8 $14 ADC AX/EAX,imm16/32 $15 ADC rm8,reg8 $10 ADC rm16/32,reg16/32 $11 ADC reg8,rm8 $12 ADC reg16/32,rm16/32 $13 ADC rm8,imm8 $80 /2 ADC rm16/32,imm16/32 $81 /2 ADC rm16/32,imm8 $83 /2 ADD AL,imm8 $04 ADD AX/EAX,imm16/32 $05 ADD rm8,reg8 $00 ADD rm16/32,reg16/32 $01 ADD reg8,rm8 $02 ADD reg16/32,rm16/32 $03 ADD rm8,imm8 $80 /0 ADD rm16/32,imm16/32 $81 /0 ADD rm16/32,imm8 $83 /0 AND AL,imm8 $24 AND AX/EAX,imm16/32 $25 AND rm8,reg8 $20 AND rm16/32,reg16/32 $21 AND reg8,rm8 $22 AND reg16/32,rm16/32 $23 AND rm8,imm8 $80 /4 AND rm16/32,imm16/32 $81 /4 AND rm16/32,imm8 $83 /4 ARPL rm16,reg16 $63 BOUND reg16/32,mem16/32 $62 BSF reg16/32,rm16/32 $0F BC BSR reg16/32,rm16/32 $0F BD BSWAP reg32 $0F C8+r BT rm16/32,reg16/32 $0F A3 BT rm16/32,imm8 $0F BA /4 BTC rm16/32,reg16/32 $0F BB BTC rm16/32,imm8 $0F BA /7 BTR rm16/32,reg16/32 $0F B3 BTR rm16/32,imm8 $0F BA /6 BTS rm16/32,reg16/32 $0F AB BTS rm16/32,imm8 $0F BA /5 CALL imm16/32 $E8 CALL rm16/32 $FF /2 CALLF rm16/32 $FF /3 CALLF imm16,[imm16/32] $9A ; first operand is the segment CDQ $99 CLC $F8 CLD $FC CLI $FA CLTS $0F 06 CMC $F5 CMP AL,imm8 $3C CMP AX/EAX,imm16/32 $3D CMP rm8,reg8 $38 CMP rm16/32,reg16/32 $39 CMP reg8,rm8 $3A CMP reg16/32,rm16/32 $3B CMP rm8,imm8 $80 /7 CMP rm16/32,imm16/32 $81 /7 CMP rm16/32,imm8 $83 /7 CMPSB $A6 CMPSD $A7 CMPSW $A7 CMPXCHG rm8,reg8 $0F B0 CMPXCHG rm16/32,reg16/32 $0F B1 CMPXCHG8B mem64 $0F C7 /1 CPUID $0F A2 CWDE $98 DAA $27 DAS $2F DEC reg16/32 $48+r DEC rm8 $FE /1 DEC rm16/32 $FF /1 DIV rm8 $F6 /6 DIV rm16/32 $F7 /6 HLT $F4 IDIV rm8 $F6 /7 IDIV rm16/32 $F7 /7 IMUL rm8 $F6 /5 IMUL rm16/32 $F7 /5 IMUL AL,rm8 $F6 /5 IMUL AX/EAX,rm16/32 $F7 /5 IMUL reg16/32,rm16/32 $0F AF IMUL reg16/32,imm8 $6B IMUL reg16/32,imm16/32 $69 IMUL reg16/32,rm16/32,imm8 $6B IMUL reg16/32,rm16/32,imm16/32 $69 IN AL,imm8 $E4 IN AX/EAX,imm8 $E5 IN AL,DX $EC IN AX/EAX,DX $ED INC reg16/32 $40+r INC rm8 $FE /0 INC rm16/32 $FF /0 INSB $6C INSD $6D INSW $6D INT imm8 
$CD INT3 $CC INTO $CE INVD $0F 08 INVLPG $0F 01 /0 IRET $CF JECXZ imm8 $E3 Jcc imm16/32 $0F 80+cc JMP imm16/32 $E9 JMP rm16/32 $FF /4 JMPF rm16/32 $FF /5 JMPF imm16,[imm16/32] $EA ; first operand is the segment LAHF $9F LAR reg16/32,rm16/32 $0F 02 LDS reg16/32,mem32/48 $C5 LES reg16/32,mem32/48 $C4 LFS reg16/32,mem32/48 $0F B4 LGS reg16/32,mem32/48 $0F B5 LSS reg16/32,mem32/48 $0F B2 LEA reg16/32,mem $8D LEAVE $C9 LGDT mem48 $0F 01 /2 LIDT mem48 $0F 01 /3 LLDT rm16 $0F 00 /2 LMSW rm16 $0F 01 /6 LODSB $AC LODSD $AD LODSW $AD LSL reg16/32,rm16/32 $0F 03 LTR rm16 $0F 00 /3 MOV AL,disp16/32 $A0 MOV AX/EAX,disp16/32 $A1 MOV disp16/32,AL $A2 MOV disp16/32,AX/EAX $A3 MOV rm8,reg8 $88 MOV rm16/32,reg16/32 $89 MOV reg8,rm8 $8A MOV reg16/32,rm16/32 $8B MOV reg8,imm8 $B0+r MOV reg16/32,imm16/32 $B8+r MOV rm8,imm8 $C6 /0 MOV rm16/32,imm16/32 $C7 /0 MOV rm16/32,segreg $8C MOV segreg,rm16/32 $8E MOV reg32,CRx $0F 20 MOV reg32,DRx $0F 21 MOV CRx,reg32 $0F 22 MOV DRx,reg32 $0F 23 MOVSB $A4 MOVSD $A5 MOVSW $A5 MOVSX reg16/32,rm8 $0F BE MOVSX reg32,rm16 $0F BF MOVZX reg16/32,rm8 $0F B6 MOVZX reg32,rm16 $0F B7 MUL rm8 $F6 /4 MUL rm16/32 $F7 /4 NEG rm8 $F6 /3 NEG rm16/32 $F7 /3 NOP $90 NOT rm8 $F6 /2 NOT rm16/32 $F7 /2 OR AL,imm8 $0C OR AX/EAX,imm16/32 $0D OR rm8,reg8 $08 OR rm16/32,reg16/32 $09 OR reg8,rm8 $0A OR reg16/32,rm16/32 $0B OR rm8,imm8 $80 /1 OR rm16/32,imm16/32 $81 /1 OR rm16/32,imm8 $83 /1 OUT imm8,AL $E6 OUT imm8,AX/EAX $E7 OUT DX,AL $EE OUT DX,AX/EAX $EF OUTSB $6E OUTSD $6F OUTSW $6F POP reg16/32 $58+r POP rm16/32 $8F /0 POP DS $1F POP ES $07 POP SS $17 POP FS $0F A1 POP GS $0F A9 POPA $61 POPF $9D PUSH reg16/32 $50+r PUSH rm16/32 $FF /6 PUSH imm8 $6A PUSH imm16/32 $68 PUSH CS $0E PUSH DS $1E PUSH ES $06 PUSH SS $16 PUSH FS $0F A0 PUSH GS $0F A8 PUSHA $60 PUSHF $9C RCL rm8 $D0 /2 RCL rm8,CL $D2 /2 RCL rm8,imm8 $C0 /2 RCL rm16/32 $D1 /2 RCL rm16/32,CL $D3 /2 RCL rm16/32,imm8 $C1 /2 RCR rm8 $D0 /3 RCR rm8,CL $D2 /3 RCR rm8,imm8 $C0 /3 RCR rm16/32 $D1 /3 RCR rm16/32,CL $D3 
/3 RCR rm16/32,imm8 $C1 /3 RDMSR $0F 32 RDTSC $0F 31 RET $C3 RET imm16 $C2 RETF $CB RETF imm16 $CA ROL rm8 $D0 /0 ROL rm8,CL $D2 /0 ROL rm8,imm8 $C0 /0 ROL rm16/32 $D1 /0 ROL rm16/32,CL $D3 /0 ROL rm16/32,imm8 $C1 /0 ROR rm8 $D0 /1 ROR rm8,CL $D2 /1 ROR rm8,imm8 $C0 /1 ROR rm16/32 $D1 /1 ROR rm16/32,CL $D3 /1 ROR rm16/32,imm8 $C1 /1 SAL rm8 $D0 /4 SAL rm8,CL $D2 /4 SAL rm8,imm8 $C0 /4 SAL rm16/32 $D1 /4 SAL rm16/32,CL $D3 /4 SAL rm16/32,imm8 $C1 /4 SAHF $9E SAR rm8 $D0 /7 SAR rm8,CL $D2 /7 SAR rm8,imm8 $C0 /7 SAR rm16/32 $D1 /7 SAR rm16/32,CL $D3 /7 SAR rm16/32,imm8 $C1 /7 SBB AL,imm8 $1C SBB AX/EAX,imm16/32 $1D SBB rm8,reg8 $18 SBB rm16/32,reg16/32 $19 SBB reg8,rm8 $1A SBB reg16/32,rm16/32 $1B SBB rm8,imm8 $80 /3 SBB rm16/32,imm16/32 $81 /3 SBB rm16/32,imm8 $83 /3 SCASB $AE SCASD $AF SCASW $AF SETcc rm8 $0F 90+cc /2 SGDT mem48 $0F 01 /0 SIDT mem48 $0F 01 /1 SLDT rm16 $0F 00 /0 SHL rm8 $D0 /4 SHL rm8,CL $D2 /4 SHL rm8,imm8 $C0 /4 SHL rm16/32 $D1 /4 SHL rm16/32,CL $D3 /4 SHL rm16/32,imm8 $C1 /4 SHR rm8 $D0 /5 SHR rm8,CL $D2 /5 SHR rm8,imm8 $C0 /5 SHR rm16/32 $D1 /5 SHR rm16/32,CL $D3 /5 SHR rm16/32,imm8 $C1 /5 SHLD rm16/32,reg16/32,imm8 $0F A4 SHLD rm16/32,reg16/32,CL $0F A5 SHRD rm16/32,reg16/32,imm8 $0F AC SHRD rm16/32,reg16/32,CL $0F AD SJcc imm8 or address $70+cc SJMP imm8 or address $EB SMSW rm16 $0F 01 /4 STC $F9 STD $FD STI $FB STOSB $AA STOSD $AB STOSW $AB STR rm16 $0F 00 /1 SUB AL,imm8 $2C SUB AX/EAX,imm16/32 $2D SUB rm8,reg8 $28 SUB rm16/32,reg16/32 $29 SUB reg8,rm8 $2A SUB reg16/32,rm16/32 $2B SUB rm8,imm8 $80 /5 SUB rm16/32,imm16/32 $81 /5 SUB rm16/32,imm8 $83 /5 TEST AL,imm8 $A8 TEST AX/EAX,imm16/32 $A9 TEST rm8,reg8 $84 TEST rm16/32,reg16/32 $85 TEST rm8,imm8 $F6 /0 TEST rm16/32,imm16/32 $F7 /0 VERR rm16 $0F 00 /4 VERW rm16 $0F 00 /5 WAIT $9B WBINVD $0F 09 WRMSR $0F 30 XADD rm8,reg8 $0F C0 XADD rm16/32,reg16/32 $0F C1 XCHG AX/EAX,reg16/32 $90+r XCHG reg16/32,AX/EAX $90+r XCHG reg8,rm8 $86 XCHG rm8,reg8 $86 XCHG reg16/32,rm16/32 $87 XCHG 
rm16/32,reg16/32 $87 XOR AL,imm8 $34 XOR AX/EAX,imm16/32 $35 XOR rm8,reg8 $30 XOR rm16/32,reg16/32 $31 XOR reg8,rm8 $32 XOR reg16/32,rm16/32 $33 XOR rm8,imm8 $80 /6 XOR rm16/32,imm16/32 $81 /6 XOR rm16/32,imm8 $83 /6 XLATB $D7 The following prefixes are supported: ASIZE $67 (address size override) CS $2E DS $3E ES $26 FS $64 GS $65 SS $36 LOCK $F0 REPNZ/NE $F2 REP/E/Z $F3 The following x87 instructions are supported: F2XM1 $D9 $F0 FABS $D9 $E1 FADD mem32 $D8 /0 FADD mem64 $DC /0 FADD freg $D8 $C0+r FADD freg,ST0 $DC $C0+r FADDP freg,ST0 $DE $C0+r FBLD mem80 $DF /4 FBSTP mem80 $DF /6 FCHS $D9 $E0 FCLEX $9B $DB $E2 FNCLEX $DB $E2 FCMOVB freg $DA $C0+r FCMOVBE freg $DA $D0+r FCMOVE freg $DA $C8+r \ FCMOVNB freg $DB $C0+r \ P6 instructions FCMOVNBE freg $DB $D0+r / FCMOVNE freg $DB $C8+r / FCMOVNU freg $DB $D8+r FCMOVU freg $DA $D8+r FCOM mem32 $D8 /2 FCOM mem64 $DC /2 FCOM freg $D8 $D0+r FCOMP mem32 $D8 /3 FCOMP mem64 $DC /3 FCOMP freg $D8 $D8+r FCOMPP $DE $D9 FCOMI freg $DB $F0+r \ P6 FCOMIP freg $DF $F0+r / FCOS $D9 $FF FDECSTP $D9 $F6 FDISI $9B $DB $E1 FNDISI $DB $E1 \ 8087 only FENI $9B $DB $E0 / FNENI $DB $E0 FDIV mem32 $D8 /6 FDIV mem64 $DC /6 FDIV freg $D8 $F0+r FDIV freg,ST0 $DC $F8+r FDIVR mem32 $D8 /7 FDIVR mem64 $DC /7 FDIVR freg $D8 $F8+r FDIVR freg,ST0 $DC $F0+r FDIVP freg,ST0 $DE $F8+r FDIVRP freg,ST0 $DE $F0+r FFREE freg $DD C0+r FIADD mem16 $DE /0 FIADD mem32 $DA /0 FICOM mem16 $DE /2 FICOM mem32 $DA /2 FICOMP mem16 $DE /3 FICOMP mem32 $DA /3 FIDIV mem16 $DE /6 FIDIV mem32 $DA /6 FIDIVR mem16 $DE /7 FIDIVR mem32 $DA /7 FILD mem16 $DF /0 FILD mem32 $DB /0 FILD mem64 $DF /5 FIST mem16 $DF /2 FIST mem32 $DB /2 FISTP mem16 $DF /3 FISTP mem32 $DB /3 FISTP mem64 $DF /7 FIMUL mem16 $DE /1 FIMUL mem32 $DA /1 FINCSTP $D9 $F7 FINIT $9B $DB $E3 FNINIT $DB $E3 FISUB mem16 $DE /4 FISUB mem32 $DA /4 FISUBR mem16 $DE /5 FISUBR mem32 $DA /5 FLD mem32 $D9 /0 FLD mem64 $DD /0 FLD mem80 $DB /5 FLD freg $D9 $C0+r FLD1 $D9 $E8 FLDL2E $D9 $EA FLDL2T $D9 $E9 FLDLG2 $D9 $EC 
FLDLN2                  $D9 $ED
FLDP                    $D9 $EB
FLDZ                    $D9 $EE
FLDCW mem16             $D9 /5
FLDENV mem              $D9 /4
FMUL mem32              $D8 /1
FMUL mem64              $DC /1
FMUL freg               $D8 $C8+r
FMUL freg,ST0           $DC $C8+r
FMULP freg,ST0          $DE $C8+r
FNOP                    $D9 $D0
FPATAN                  $D9 $F3
FPTAN                   $D9 $F2
FPREM                   $D9 $F8
FPREM1                  $D9 $F5
FRNDINT                 $D9 $FC
FSAVE mem               $9B $DD /6
FNSAVE mem              $DD /6
FRSTOR mem              $DD /4
FSCALE                  $D9 $FD
FSETPM                  $DB $E4
FSIN                    $D9 $FE
FSINCOS                 $D9 $FB
FSQRT                   $D9 $FA
FST mem32               $D9 /2
FST mem64               $DD /2
FST freg                $DD $D0+r
FSTP mem32              $D9 /3
FSTP mem64              $DD /3
FSTP mem80              $DB /0
FSTP freg               $DD $D8+r
FSTCW mem16             $9B $D9 /0
FNSTCW mem16            $D9 /0
FSTENV mem              $9B $D9 /6
FNSTENV mem             $D9 /6
FSTSW mem16             $9B $DD /7
FSTSW AX                $9B $DF $E0
FNSTSW mem16            $DD /7
FNSTSW AX               $DF $E0
FSUB mem32              $D8 /4
FSUB mem64              $DC /4
FSUB freg               $D8 $E0+r
FSUB freg,ST0           $DC $E8+r
FSUBR mem32             $D8 /5
FSUBR mem64             $DC /5
FSUBR freg              $D8 $E8+r
FSUBR freg,ST0          $DC $E0+r
FSUBP freg,ST0          $DE $E8+r
FSUBRP freg,ST0         $DE $E0+r
FTST                    $D9 $E4
FUCOM freg              $DD $E0+r
FUCOMP freg             $DD $E8+r
FUCOMPP                 $DA $E9
FUCOMI freg             $DB $E8+r   \ P6
FUCOMIP freg            $DF $E8+r   /
FXAM                    $D9 $E5
FXCH freg               $D9 $C8+r
FXTRACT                 $D9 $F4
FYL2X                   $D9 $F1
FYL2XP1                 $D9 $F9


8086 mode peculiarities:

There are two pseudo-instructions used in 8086 mode (ignored in 8086tiny or 386 mode):

 SEGC reg16 ; causes DS to be reloaded if the next memory reference is NOT on the stack.
            ; the register specified is used as an intermediary to hold the value
            ; (since there is no move-immediate instruction for segment registers)
 SEGR       ; causes the next SEGC to be ignored, in case DS has already been set up

Hence the method of accessing global (non-stack) variables in 8086 assembly:

 SEGC AX          ; choose a register whose contents aren't needed
 MOV AX,[symbol]  ; (for instance, the one we are about to reload)

Don't assume that two different symbols have the same segment!
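As a rough illustration of why two symbols can end up in different segments, recall that 8086 mode maps a 32-bit linear address onto a segment/offset pair through a 32-entry table (one word per 64KB block, 2MB in total), using the high word of the address as the table index. The Python sketch below models only that lookup. The table contents are hypothetical: the load segment is made up, the layout loosely imitates the PIODOS arrangement (first 640KB relative to the program, 640K-1M absolute), and the 64-byte adjustment is ignored.

```python
SEGMENTS = 32              # 32 table entries, one word per 64KB block
PROGRAM_SEGMENT = 0x1000   # hypothetical load segment, for illustration only


def build_table(program_segment):
    """Build a made-up segment table loosely imitating the PIODOS layout."""
    table = [0] * SEGMENTS
    for block in range(10):          # first 640KB: relative to the program
        table[block] = program_segment + block * 0x1000
    for block in range(10, 16):      # 640KB-1MB: mapped to absolute addresses
        table[block] = block * 0x1000
    return table                     # entries 16-31 are left unused (0)


def translate(linear, table):
    """Split a 32-bit linear address into (segment, offset) via the table."""
    if not 0 <= linear < len(table) << 16:
        raise ValueError("address is beyond the 2MB covered by the table")
    return table[linear >> 16], linear & 0xFFFF


def physical(segment, offset):
    """Real-mode physical address: segment * 16 + offset."""
    return (segment << 4) + offset


table = build_table(PROGRAM_SEGMENT)
print([hex(v) for v in translate(0x00000010, table)])  # ['0x1000', '0x10']
print([hex(v) for v in translate(0x00012345, table)])  # ['0x2000', '0x2345']
print(hex(physical(*translate(0xA0000, table))))       # 0xa0000
```

The first two addresses fall in different 64KB blocks, so they get different segment values, which is why DS has to be reloaded per reference.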
Symbol addresses are handled a few different ways:

 DD symbol        ; results in a 32-bit offset that is relative to beginning of program
 MOV AX,symbol    ; loads the low word of the 32-bit value
 MOV DX,symbol.h  ; loads the high word of the 32-bit value
 SEGC SI          ;
 LEA SI,[symbol]  ; this sequence will load a valid segment/offset pair into DS:SI

The way that the compiler translates a 32-bit address to a segment/offset pair is by
using the high word to look up a segment value from a table at CS:0000 (the table is
populated by INITPLATFORM in PIODOS). This happens whenever indexed or indirect
addressing is used in NOWUT code. However, the resulting values are not identical to
the ones used by direct references in code after it has been linked and executed. The
linker tries to keep offsets under 32K so that there is room for an index to be added
on, as would occur in this example:

 SEGC SI
 MOV SI,[symbol(448)]

This mechanism does not allow for a negative index:

 SEGC SI
 MOV SI,[symbol(-12)] ; does NOT work


The 68000 instruction set:

All 68000/68010 instructions and addressing modes are now supported. Only plain
68000 opcodes are used by the compiler.

Normal branch instructions use the 16-bit displacement; the SBxx version should be
used for the shorter 8-bit displacement form. Variations on a single mnemonic such as
ADDA or ADDI seen in other assemblers have been eliminated in favor of using one
mnemonic for all forms. 32-bit words are referred to as dwords and use the .d tag,
just as they do in x86 NOWUT. Likewise, memory operands use square brackets, and
operands receive a size tag rather than the instruction. Destination operands go on
the right, source operands on the left.

The assembler will accept operands without a size tag, however in some cases the
size ambiguity will mean that more than one opcode would be valid. Shorter opcodes
are generally favored.
example:
 MOVE 7,d0              ; assembled as 8-bit immediate instead of 16 or 32-bit
 MOVE [address],d0.d    ; \
 MOVE [address].d,d0    ;  these all do the same thing
 MOVE [address].d,d0.d  ; /

ea (effective address) operands can be any of the following:

 imm8/16/32   ; immediate
 [address]    ; memory reference (32-bit or signed 16-bit)
 [ax]         ; address register indirect
 [ax+xxxx]    ; address register indirect with displacement
 [ax+]        ; address register indirect with post-increment
 [ax-]        ; address register indirect with pre-decrement
 dx           ; data register
 ax           ; address register
 [ax+ry+xx]   ; extension word: ry can be an address or data register, optionally with .w or .d tag
 [PC+xxxx]    ; PC relative
 [PC+symbol]  ; PC relative that refers to a symbol
 [PC+ry+xx]   ; PC plus extension word

Not all modes are valid for all instructions (eg. immediate can't be a destination).

Also note that on the 68K, displacements can be negative:

 MOVE [a6-40].d,d0

Hand-written assembly code should not modify (or should save and restore) the A6
register as the compiler uses it to address stack variables.
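To make the negative-displacement case concrete: on the 68000, the displacement in the [ax+xxxx] mode is a signed 16-bit value stored as one extension word. The Python sketch below shows the two's complement word that a displacement such as the -40 in MOVE [a6-40].d,d0 corresponds to, and how it is sign-extended again. The function names are made up for illustration, and this is not a description of the assembler's internals.

```python
def disp16_word(displacement):
    """Return the 16-bit extension word holding a signed displacement."""
    if not -32768 <= displacement <= 32767:
        raise ValueError("displacement does not fit in a signed 16-bit word")
    return displacement & 0xFFFF


def sign_extend16(word):
    """Recover the signed displacement from a 16-bit extension word."""
    return word - 0x10000 if word & 0x8000 else word


print(hex(disp16_word(-40)))   # 0xffd8
print(sign_extend16(0xFFD8))   # -40
print(hex(disp16_word(40)))    # 0x28
```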
ABCD dx,dy ABCD [ax-],[ay-] ; byte only ADD imm,ea /imm8/16/32 ADD imm3,ea ADD dy,ea ADD ea,dy ADD ea,ay ADDX dx,dy ADDX [ax-],[ay-] AND imm,ea /imm8/16/32 AND imm,ccr /imm8 AND imm,sr /imm16 AND dy,ea AND ea,dy ASL ea ; word only ASL imm3,dx ASL dy,dx ASR ea ; word only ASR imm3,dx ASR dy,dx BKPT imm3 ; 68010 Bcc label BCHG imm,ea BCHG dn,ea ; byte only BCLR imm,ea /imm8 BCLR dn,ea ; byte only BRA label BSET imm,ea /imm8 BSET dn,ea ; byte only BSR label BTST imm,ea /imm8 BTST dn,ea ; byte only CHK ea,dy ; word only CLR ea CMP imm,ea /imm8/16/32 CMP [an+],[an+] CMP ea,dy CMP ea,ay DBcc dx,label ; word only DBRA dx,label ; word only DIVS ea,dy ; divide 32/16, remainder in high word DIVU ea,dy EOR imm,ea /imm8/16/32 EOR imm,ccr /imm8 EOR imm,sr /imm16 EOR dy,ea EXG dx,dy EXG ax,ay EXG ax,dy EXT dx ; byte->word EXT dx ; word->dword ILLEGAL JMP ea JSR ea LEA ea,ay LINK ax,imm16 LSL ea ; word only LSL imm3,dx LSL dy,dx LSR ea ; word only LSR imm3,dx LSR dy,dx MOVE ea,ea MOVE ea,an MOVE sr,ea ; word only MOVE ea,sr ; word only MOVE ccr,ea ; word only, 68010 MOVE ea,ccr ; word only MOVE imm8,reg ; sign extended MOVEM imm,ea /imm16 ; register bit mask MOVEM ea,imm /imm16 ; register bit mask MOVEP dy,[ax+disp16] /disp16 MOVEP [ax+disp16],dy /disp16 MULS ea,dy ; 16x16->32 MULU ea,dy NBCD ea ; byte only NEG ea NEGX ea NOP NOT ea OR imm,ea /imm8/16/32 OR imm,ccr /imm8 OR imm,sr /imm16 OR dy,ea OR ea,dy PEA ea RESET ROL ea ROL imm3,dx ROL dy,dx ROR ea ROR imm3,dx ROR dy,dx ROXL ea ROXL imm3,dx ROXL dy,dx ROXR ea ROXR imm3,dx ROXR dy,dx RTD imm16 ; 68010 RTE RTR RTS SBcc label SBRA label SBSR label SBCD dx,dy ; byte only SBCD [ax-],[ay-] ; byte only Scc ea ; byte only STOP imm /imm16 SUB imm,ea /imm8/16/32 SUB imm3,ea SUB dy,ea SUB ea,dy SUB ea,ay SUBX dx,dy SUBX [ax-],[ay-] SWAP dx TAS ea ; byte only TRAP imm4 TRAPV TST ea UNLK ax XDOS imm4 ; this pseudo-instruction generates "F-line" opcodes ; for making system calls on the Sharp X68000 The SH2 instruction set: As with the 
68K, the SH2 instruction set has undergone some cosmetic changes to bring it in line
with NOWUT norms. A few mnemonics were tweaked, memory operands use square brackets
and a data size tag, and long words (32-bit) are referred to as dwords. Destination
operands go on the right, source operands on the left.

Since the SH2 doesn't allow dword immediate data or memory access using absolute
addresses, the assembler accepts a "fake" form of the MOV instruction and
transparently inserts an extra instruction as needed. It also adds immediate data to
a buffer that is periodically flushed to the output file, with a BRA opcode added to
jump over the data. There are currently two issues with this:

 1) In assembly mode the buffer is not flushed. Using more than 16 "fake" MOVs in a
    single section of assembly code may cause failure (using 32 definitely will)
 2) The buffer is flushed before an INCBIN but any label immediately prior to the
    INCBIN will no longer point to the data as expected. Therefore, INCBIN should
    not be used in the same section as code.

The SH2 doesn't do divides in a single instruction. When division is needed, the
compiler generates a call to a subroutine. The program source should include these
division subroutines (or some variation thereof):

sh2divideu:        ' unsigned 32/16 r1/r2
 asm
 shll16 r2
 div0u
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 rotcl r1
 extuw r1,r1
 endasm
 return

sh2divides:        ' signed 32/16 r1/r2
 asm
 shll16 r2
 mov r1,r4
 rotcl r4
 mov 0,r4
 subc r4,r1
 div0s r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 div1 r2,r1
 extsw r1,r1
 rotcl r1
 addc r4,r1
 extsw r1,r1
 endasm
 return

Note: delayed branch instructions cause the instruction following the branch to be
executed before the branch takes place.
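For anyone wondering why sh2divideu shifts the divisor left by 16 and then repeats one step sixteen times, the Python sketch below shows the same overall shape using plain restoring (shift-and-subtract) division. It is an arithmetic illustration only: the internal behaviour of DIV1 is not modelled here, the function name is hypothetical, and returning the remainder is a convenience of the sketch rather than something the routine above is documented to do.

```python
def divide_u32_by_u16(dividend, divisor):
    """Unsigned 32/16 division in sixteen one-bit steps.

    Returns (quotient, remainder). The quotient must fit in 16 bits, which
    requires the high word of the dividend to be smaller than the divisor.
    """
    if not 0 < divisor <= 0xFFFF:
        raise ValueError("divisor must be a non-zero 16-bit value")
    if not 0 <= dividend <= 0xFFFFFFFF:
        raise ValueError("dividend must be a 32-bit value")
    if (dividend >> 16) >= divisor:
        raise OverflowError("quotient would not fit in 16 bits")

    shifted_divisor = divisor << 16   # same idea as 'shll16 r2'
    remainder = dividend
    quotient = 0
    for _ in range(16):               # one step per quotient bit
        remainder <<= 1
        quotient <<= 1
        if remainder >= shifted_divisor:
            remainder -= shifted_divisor
            quotient |= 1
    return quotient, remainder >> 16


print(divide_u32_by_u16(100, 7))           # (14, 2)
print(divide_u32_by_u16(1_000_000, 1000))  # (1000, 0)
```

Sixteen steps yield exactly sixteen quotient bits, which is why the result is limited to a 16-bit quotient.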
Hand-written assembly code should not modify (or should save and restore) R11, R13, R14 as the compiler uses them to address stack variables. R12 is used by "fake" MOVs. ADD Rm,Rn ADD imm,Rn (immediate is 8-bit sign-extended) ADDC Rm,Rn ADDV Rm,Rn AND Rm,Rn AND imm,R0 (immediate is 8-bit zero-extended) AND imm,[R0+GBR].b (immediate is 8-bit zero-extended) BF label/imm (8-bit displacement) BFS label/imm (delayed branch)(8-bit displacement) BRA label/imm (delayed branch)(12-bit displacement) BRAF Rm (delayed branch) BSR label/imm (delayed branch)(12-bit displacement) BSRF Rm (delayed branch) BT label/imm (8-bit displacement) BTS label/imm (delayed branch)(8-bit displacement) CLRMAC CLRT CMPEQ imm,R0 (immediate is 8-bit sign-extended) CMPEQ Rm,Rn CMPGE Rm,Rn rn>=rm, signed CMPGT Rm,Rn rn>rm, signed CMPHI Rm,Rn rn>rm, unsigned CMPHS Rm,Rn rn>=rm, unsigned CMPPL Rn rn>0 CMPPZ Rn rn>=0 CMPSTR Rm,Rn DIV0S Rm,Rn DIV0U DIV1 Rm,Rn DMULS Rm,Rn 32x32->64 (MAC) DMULU Rm,Rn DT Rn EXTSB Rm,Rn EXTSW Rm,Rn EXTUB Rm,Rn EXTUW Rm,Rn JMP Rm (delayed branch) JSR Rm (delayed branch) LDC Rm,SR LDC Rm,GBR LDC Rm,VBR LDC [Rm+],SR LDC [Rm+],GBR LDC [Rm+],VBR LDS Rm,MACH LDS Rm,MACL LDS Rm,PR LDS [Rm+],MACH LDS [Rm+],MACL LDS [Rm+],PR MAC [Rm+],[Rn+].d MAC [Rm+],[Rn+].w MOV imm/address,Rn ; this pseudo-instruction uses [PC+label] address mode to load a dword ; from an automatically-created data dump MOV [symbol/address].b,Rn ; these pseudo-instructions cause another instruction to be inserted MOV [symbol/address].w,Rn ; which loads the address, then memory is accessed using register- MOV [symbol/address].d,Rn ; indirect mode. 
; stack variables are an exception, the extra instruction isn't needed MOV Rm,Rn MOV imm,Rn (immediate is 8-bit sign-extended) MOV Rm,[Rn].b MOV Rm,[Rn].w MOV Rm,[Rn].d MOV [Rm].b,Rn MOV [Rm].w,Rn MOV [Rm].d,Rn MOV [Rm+].b,Rn MOV [Rm+].w,Rn MOV [Rm+].d,Rn MOV Rm,[Rn-].b MOV Rm,[Rn-].w MOV Rm,[Rn-].d MOV Rm,[R0+Rn].b MOV Rm,[R0+Rn].w MOV Rm,[R0+Rn].d MOV [R0+Rm].b,Rn MOV [R0+Rm].w,Rn MOV [R0+Rm].d,Rn MOV R0,[GBR+disp].b (8-bit displacement, zero-extended) MOV R0,[GBR+disp].w (8-bit displacement, zero-extended) MOV R0,[GBR+disp].d (8-bit displacement, zero-extended) MOV [GBR+disp].b,R0 (8-bit displacement, zero-extended) MOV [GBR+disp].w,R0 (8-bit displacement, zero-extended) MOV [GBR+disp].d,R0 (8-bit displacement, zero-extended) MOV R0,[Rn+disp].b (4-bit displacement, zero-extended) MOV R0,[Rn+disp].w (4-bit displacement, zero-extended, doubled) MOV Rm,[Rn+disp].d (4-bit displacement, zero-extended, quadrupled) MOV [Rn+disp].b,R0 (4-bit displacement, zero-extended) MOV [Rn+disp].w,R0 (4-bit displacement, zero-extended, doubled) MOV [Rn+disp].d,Rm (4-bit displacement, zero-extended, quadrupled) MOV [PC+label],Rn (8-bit displacement, zero-extended) MOVA [PC+label],R0 MOVT Rn MUL Rm,Rn MULS Rm,Rn MULU Rm,Rn NEG Rm,Rn NEGC Rm,Rn NOP NOT Rm,Rn OR Rm,Rn OR imm,R0 (immediate is 8-bit zero-extended) OR imm,[R0+GBR].b (immediate is 8-bit zero-extended) ROTL Rn ROTR Rn ROTCL Rn ROTCR Rn RTE (delayed branch) RTS (delayed branch) SETT SHAL Rn SHAR Rn SHLL Rn SHLR Rn SHLL2 Rn SHLR2 Rn SHLL8 Rn SHLR8 Rn SHLL16 Rn SHLR16 Rn SLEEP STC SR,Rn STC GBR,Rn STC VBR,Rn STC SR,[Rn-] STC GBR,[Rn-] STC VBR,[Rn-] STS MACH,Rn STS MACL,Rn STS PR,Rn STS MACH,[Rn-] STS MACL,[Rn-] STS PR,[Rn-] SUB Rm,Rn SUBC Rm,Rn SUBV Rm,Rn SWAPB Rm,Rn SWAPW Rm,Rn TAS [Rn].b TRAPA imm (immediate is 8-bit) TST Rm,Rn TST imm,R0 (immediate is 8-bit zero-extended) TST imm,[R0+GBR].b (immediate is 8-bit zero-extended) XOR Rm,Rn XOR imm,R0 (immediate is 8-bit zero-extended) XOR imm,[R0+GBR].b (immediate is 8-bit 
zero-extended)
XTRCT Rm,Rn


----Known bugs and limitations----

OBJ files to be linked together must have different source file names or else
automatically-generated labels will conflict. (Specifically, they must differ by
alphabetic characters because other characters are not used for name generation.)

When MULTINO built for 68K or 8086 is used to compile something, arguments to
RESB/RESW/RESD statements may not surpass 65535.

When using two-pass compilation on 68000, a GOTO, GOSUB, or IF statement that
targets the next instruction will cause an error (because SBRA 0 is not valid on the
68K).

Absolute addresses as memory references don't work correctly in 8086 long mode:

 [$B8000].w=$0841                      ; does NOT work
 tempvar.d=$B8000 > [tempvar].w=$0841  ; use this instead

NOWUT 68K uses the processor's multiply and divide instructions despite the fact
that they are not fully 32-bit. Multiplicands or quotients that would not fit in 16
bits will cause incorrect results. Signed vs. unsigned multiplicands will also affect
the result, whereas they would not with 32bit*32bit=32bit.

NOWUT in 8086 mode uses the processor's multiply and divide instructions despite the
fact that they are not fully 32-bit. Multiplicands or quotients that would not fit in
16 bits will cause incorrect results. Signed vs. unsigned multiplicands will also
affect the result, whereas they would not with 32bit*32bit=32bit.

NOWUT SH2 only allows 12 dword parameters on the stack and 32 dword local variables
at a time. (bytes and words are not allowed.)

Local variables in different functions that have the same name must have the same
default size.

The address of a stack variable is not valid. Likewise, it can't be indexed.

When a calculation is used as an operand for a NOWUT instruction that involves
comparison (ie. IFGREATER, IFLESS, WHILEGREATER, WHILELESS) the value is assumed to
be unsigned, regardless of any components of that calculation.
If signed comparison is desired, the second operand should be marked as signed.

Right-shifts do not maintain the sign bit unless the shift count is marked as
signed. Eg.:

 signedvar=_ shr 3.sb ; maintains the sign bit

The maximum number of symbols supported by the compiler (and the corresponding
memory allocation) is specified in the source code of the compiler. Modify this line
and recompile if needed (same for fixups):

 const maxsymbols,2048
 const maxfixups,8192

Maximum line length is 255 characters, maximum number of labels+statements+operands
on a line is defined by the "maxargs" constant (default: 32).

BITS16/BITS32 shouldn't be used to switch modes within a BEGINFUNC...ENDFUNC
structure as variables on the stack would no longer be addressed properly.

COPYBYTES does not work in 8086 tiny mode.


----Changes from the last version----

(First release)

Release 0.11:

When I attempted to link a program with two OBJ files (using GoLink) I received a
pile of errors about duplicate symbol definitions. Some were irrelevant symbol names
used by local variables and constants. These have been given a null section/class in
the OBJ so that they don't cause trouble. But I also received errors about program
labels that had been automatically generated by the compiler, even though they were
not marked as exports. I was not expecting this. As a workaround, the compiler now
uses the name of the source file in its automatically generated labels so that they
won't be the same in different OBJs. (No more 0NOWUT0000)

The DEF statement was added, for manually setting a default type. This only required
a change to a data table, and no changes to the code ;)

Introduced the LINKBIN utility, which in addition to supporting two input files,
also correctly generates X68000 executables that have relocations that are separated
by more than 64K.

Fixed some bugs in NO68 that caused subtraction and division to (sometimes) produce
a wrong result.
Added some code to NOSH2 to optimize shifts (when the value is known at compile
time) instead of always using a loop.

Corrected a few typos and made a few additions to this document.

Included a DOS example program.

Release 0.11b (2019/1/19):

An endianness issue with indexed symbols in initialized data was fixed in NO68 and NOSH2.
LINKBIN can now build an Amiga program from two OBJ files (it was all screwed up before).

Release 0.12 (2019/2/24):

x87 FPU instructions were added to NOWUT x86.
NOWUT x86 parser can accept numbers with a decimal point and convert them to 32-bit floats.
Fixed the NOWUT x86 assembler bug pertaining to AX/EAX ambiguity.
The offending filename is displayed when a file specified by INCBIN fails to open.
Fixed shifts when value to be shifted was on the stack (NOSH2).
Fixed large negative immediates (NOSH2).
CALLEX function address can be a calculation (NOWUT x86 only).

Release 0.13 (2019/3/23):

Bug fix in NO68 for exclusive-or, and for shifts of values on the stack
Changed the sh2divides routine, since the old one didn't work and could also corrupt a needed register
Added experimental 8086 support, including 16-bit style MODRM addressing and BITS16/BITS32 commands
Added genesis and doscom platform options to LINKBIN, as well as Genesis and 8086 example programs
Added a "maxarg" constant to NOWUT x86 to increase the number of allowed arguments on one line
Added single-operand versions of IMUL to the x86 assembler
Did some reorganization of x86 NOWUT source and tweaking of the generated code (now
slightly more compact). In the future I will probably roll 68000 support back into
NOWUT x86 to take advantage of some potential optimizations.

Release 0.14 (2019/5/1):

Combined NOWUT, NO68, and NOSH2 into the new MULTINO.
8086 support is now fully functional (minus any bugs/limitations)
68K and SH2 generated code has received modest improvements. In particular, the SH2
compiler does deduplication of immediate data/addresses.
Fixed the RETURNEX 0 and ADD 0 problems for 68K.
Compilation is slightly faster.
Removed some stuff from the archive.

Release 0.14b (2019/5/11):

Fixed byte order of ASCII words/dwords on big-endian.
Fixed shifts with word/dword memory operands on big-endian.
Upon a duplicate label error, the label name will be displayed.
Fixed pushing a byte from memory on 8086.
Fixed division on 8086.
Added islist and &, reducing stack operations in generated code when handling indexed symbols.
Fixed mojibake in the auto-generated labels for string constants. These were not
mentioned in the documentation until now, but the way they work is "random string".a
can be provided as an operand and the address of the string will be passed, while the
text itself will get dumped at the end of the section. Currently there is no way to
put a carriage return in the string. (but they are terminated with 0)
Signed shift-right (with immediate operand) can be replaced with an unrolled loop on SH2.

Release 0.20 (2019/8/12):

Made MULTINO and LINKBIN buildable on Win32, DOS, X68000, and Amiga, using platform modules.
Reorganized and tweaked a lot of internal compiler code.
Compiler does two passes by default, allowing some long jumps to be replaced with short ones.
Tweaked some other generated code (8086 shifts, address calculation, and compares, 68K 3-bit quick form).
Compiler does not generate a .LST file by default, but can show the offending line of
code when aborting due to an error.
Replaced x87-specific FP conversion routine with a portable integer-based one (but it
is badly written and has range/precision limits).
Added LOADBIG and LOADLITTLE statements to deal with endianness problems in a portable
way (although the implementation seems less than ideal).
Fixed typo in SH2 code for pushing a word on the stack.
Added CALLAM statement for making Amiga OS system calls in a slightly less messy way.
Boosted file I/O buffers to 4KB (doesn't help much except maybe on a network file system).
If compilation aborts with an error, the .OBJ header is left incomplete and LINKBIN
won't attempt to link it.
Placing one (or multiple) letter "r" after a string inserts a CR/LF. String can also be continued.
x86 assembler should now generate the correct modrm byte when ASIZE prefix is used.
Fixed parsing of MOV rm,reg instructions with displacement.
Fixed MOV instructions that use x86 control/debug registers.
Added IFCPUxxx, IFCPUNOTxxx statements.
Added x86 far calls (CALLF).
Tested LINKBIN with three input files.
Changed the amount of space reserved by LINKBIN in DOS executables for a segment
lookup table to 64 bytes.

Release 0.21 (2019/11/23):

Reworked parser/assembler to handle some indexed symbols the same as plain symbols,
resulting in smaller generated code. As a side effect, this led to some changes in
how 8086 segment/offset addresses are handled.
Added some optimization logic that causes some redundant load instructions to be omitted.
Fixed 8086 signed right shifts, and improved the other ones.
Fixed the problem where the last line of the source file could be ignored.
Fixed an x86 assembler bug for MOV [disp],imm instructions.
When compilation fails due to an out-of-range jump, the target label will be displayed.
Added far jump instructions to the x86 assembler and fixed the documentation pertaining to CALLF.
Cleaned up some compiler code and made it so the assembler is invoked for each
statement on a line with multiple statements, instead of buffering the code until the
entire line has been parsed/compiled.
Added the optrom platform option to LINKBIN.
Made the compiler store the number of segment relocs in the time stamp field of 8086
big OBJs so that LINKBIN can allocate the correct amount of space in the MZ header.

Release 0.22 (2020/2/25):

Fixed the problem where the last line of the source file could be ignored (for real this time).
Added missing 68000 addressing modes and fixed which addressing modes were allowed for JMP/JSR.
Also made it so 68000 branch instructions can accept a number.
Fixed the parser so that [-$1234] can work.
Fixed an extra byte being inserted in some data statements, e.g. DW "blah",$1234
Revised internal compiler code and added more optimization possibilities, plus the -opt switch.
Updated several sections in this document.

Release 0.23 (2020/4/21):
Added a fix to allow compiling source files outside the current directory.
Made an improvement to the optimizer and fixed a potential problem where, for instance, 'x' and '[x].d' could have been interchanged.
Rearranged the selecttables routine inside the compiler for smaller code.
On 386 only, it is now possible to get the address of a stack variable.
PIOWIN was changed to make most routines (except printhex and such) safe for multithreading.
The 68K assembler outputs LSL instead of ASL, which shouldn't matter (but I was messing around with an emulator which wouldn't show ASL in its debugger...)

Release 0.24 (2020/10/21):
Added minimal Linux support via the elf386 LINKBIN option and PIOLNX module.
Faster compilation.
More tweaks to generate smaller compiled code.
Added the COPYBYTES statement for all CPUs (correction: except 8086 tiny mode).
(386) Fixed a bug where loading the address of a stack variable was optimized incorrectly.
(SH2) Fixed a problem where loading unsigned bytes generated unintentionally inefficient code.
(SH2) The assembler now accepts the PC-relative form of MOV with a hard-coded displacement, i.e. mov [PC+$20],r1
(SH2) The address in R11 is only set up when needed by LOCALVAR, instead of at every BEGINFUNC.
Swapped BSS and data sections in the source code...

Release 0.25 (2021/3/13):
Added a check and error message for code that attempts to index a stack symbol.
Made further changes to the LINKBIN ELF386 code toward eventually supporting dynamic linking.
Fixed a bug where indexed, signed memory references were treated as unsigned.
Fixed incorrect code optimizations after assignment to a byte or word variable.
Added IFNOTGREATER and IFNOTLESS.
Fixed an x86 assembler bug: mov [ebx+somelabel],eax now works and is used by the compiler.
Fixed a parsing bug that affected x86 assembly prefixes before some instructions.

Release 0.26 (2021/7/2):
Added LINKLIBFILE.
OBJ files now contain a .drectve section header, for a total of 4 section headers. This works with the new version of LINKBIN and with GoLink.
LINKBIN can output PRG files.
ELF files with dynamic linking appear to work...
Changed the SH2 RETURN statement so it generates rts>mov 0,r2 instead of rts>nop. Mixing up RETURN with RETURNEX 0 won't cause a crash now.
Subtractions on SH2 can generate an add instruction with a negative operand for efficiency.
Optimizer code no longer assumes dword and signed dword are different things (acctype).
Added ALIGN16 (section alignment is also configurable inside LINKBIN).
PIOWIN fileskread/fileskwrite return the number of bytes read/written.