This document is supposed to describe as11, the PDP-11 cross-assembler.

Command line:

        as11 [options] assembly-file

Options:

        -l listing-file
                Writes a listing, including generated code, into the
                named file.

        -o output-file
                Writes the generated code into the named file.

No filename extensions are enforced; the convention I have been using
is .s11 for assembly source, .l11 for listings, and .o11 for output
files.

Either (or both, if you feel weird) of the options can take the name -,
which means to write the corresponding thing to as11's standard output.
This is really useful only with the -l option; looking at output files
is not very interesting.  They are not intended to be read by anything
but the simulator.

The assembly-file must be a real file.  (Actually, it must be something
on which rewind(3) works.)

The assembly language tries to stay moderately close to the DEC
assembler syntax; in particular, it uses # and @ where the UNIX
assemblers typically use $ and *.

The assembler processes each line of the source file in turn.  An
assembly source line takes one of the following forms.  Whitespace is
ignored except where noted, though it does serve to separate tokens
from one another.

        conditional
        [label] [comment]
        [label] pseudo-op
        [label] opcode [operand[,operand]] [comment]
        assignment [comment]

where []s indicate that the thing enclosed is optional, and the names
represent things as follows.  Keywords, which appear in upper case in
the descriptions below, are literal values; other things are terms to
be described further.  Alphabetic case is ignored for such keywords,
and also for opcode and register names, but not for symbol names or in
character constants or strings.

comment
        A ; together with everything following it, up to the end of the
        line.

conditional
        One of

                .IF condition expression
                .IF DF symbol
                .IF NDF symbol
                .IFT
                .IFF
                .IFTF
                .ENDC
                .ENDIF

        where "condition" is one of EQ, NE, LT, GT, LE, or GE.  These
        provide conditional assembly.
        When a .IF directive is encountered, if the condition is true,
        assembly continues as normal up to the matching .ENDIF; if
        false, all lines in between are ignored, except for further
        conditionals.  This can be further modified by the .IFF and
        .IFT directives, which assemble code if the original condition
        was false or true, respectively, or .IFTF, which assembles code
        unconditionally (but does not terminate the conditional).
        Conditionals can be nested to an arbitrary depth and take no
        notice of include file boundaries (though a .INCLUDE directive
        is not executed if code is not being assembled at the time it
        is seen).  .ENDC and .ENDIF are synonyms.

        The condition of an if is considered true for the first form if
        the expression bears the indicated relation to 0; for example,

                .IF GE foo-bar
                ...
                .ENDC

        will assemble the contained code if foo-bar >= 0.  .IF DF and
        .IF NDF test whether a symbol is defined and assemble code if
        it is or is not defined, respectively.

        Note that conditionals look like pseudo-ops but, unlike
        pseudo-ops, cannot carry labels.

        Code inside a false conditional is not inspected, except to
        determine whether it is a conditional, and thus can include
        illegal opcode names, invalid expression syntax, or indeed
        anything that cannot be parsed as a conditional.  This can be
        dangerous, because a conditional with a syntax error will not
        be taken as a conditional if it occurs inside (another) false
        conditional.

        When inside a false conditional, the normal octal number giving
        the value of dot at the left hand side of the listing is
        replaced with six dashes.

symbol
        Any sequence of characters drawn from the set !$%&.?^_~ plus
        the digits and letters, but not beginning with a digit or a $.
        Any character can be part of a symbol name if it is preceded
        with a backslash; except for this, whitespace cannot be part of
        a symbol name.

        Some symbol names beginning with a . have reserved meanings.
        Currently, these are

        .       The current address.
                This is updated only between lines; within any given
                line it is constant.  Assigning to it performs the
                function of the .org pseudo-op in some other
                assemblers.

        .base   The default base, used to interpret numbers where no
                other base can be determined.  Can range from 2 through
                36.  See the description of an expression for more
                details.

assignment
                symbol = expression [comment]
        or
                symbol == expression [comment]

        Both forms cause the symbol to be given a value equal to the
        value of the expression.  The difference is that the form with
        the double equal sign produces a warning if you change the
        value of a symbol; it is intended for manifest constants, such
        as device register addresses.  The first form produces no
        message if you change the value of the symbol with it; it can
        be used like an assignment in a programming language.  (Note,
        though, that the assembler is not intended as a general-purpose
        programming language!)

        Symbol values can have any type; they are not restricted to
        16-bit integers.  Any expression's value can be stored in a
        symbol with an assignment and will reappear, type and value
        unchanged, when that symbol is used in an expression.

label
        A symbol followed by a colon.  There may be whitespace between
        the symbol and the colon.  Doing this causes the symbol to be
        given a value equal to the value of . for the line on which it
        was seen.
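
As an illustration only, the difference between = and == in the
assignment entry can be modelled in a few lines of Python.  This is a
sketch, not as11's source: the SymbolTable class and its assign method
are made up for the example, and it assumes that == warns only when it
replaces an existing value with a different one (the first definition
and a repeat of the same value are assumed to be silent).

```python
class SymbolTable:
    """Toy model of as11 assignments (hypothetical, for illustration)."""

    def __init__(self):
        self.values = {}      # symbol name -> value of any type
        self.warnings = []    # messages a real assembler would print

    def assign(self, name, value, manifest=False):
        """Model 'name = value' (manifest=False) or 'name == value' (True)."""
        changed = name in self.values and self.values[name] != value
        if manifest and changed:
            # Assumption: == complains only when an existing value changes.
            self.warnings.append(f"warning: value of {name} changed")
        self.values[name] = value


table = SymbolTable()
table.assign("count", 1)
table.assign("count", 2)                          # plain =: silent
table.assign("ttycsr", 0o177560, manifest=True)   # first definition
table.assign("ttycsr", 0o177564, manifest=True)   # ==: records a warning
table.assign("scale", 2.5)                        # values need not be integers
print(table.warnings)
```

Storing values in an ordinary dictionary mirrors the statement that a
symbol can hold a value of any type and hands it back unchanged.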

pseudo-op
        One of the following:

                .ASCII string
                .ASCIZ string
                .EVEN
                .ODD
                .ALIGN expression [comment]
                .SPACE expression [comment]
                .BLKB expression [comment]
                .BLKW expression [comment]
                .BYTE [expression[,expression[,expression...]]] [comment]
                .WORD [expression[,expression[,expression...]]] [comment]
                .LONG [expression[,expression[,expression...]]] [comment]
                .QUAD [expression[,expression[,expression...]]] [comment]
                .FLOAT [expression[,expression[,expression...]]] [comment]
                .DOUBLE [expression[,expression[,expression...]]] [comment]
                .LIST {ON|OFF|PUSH[{ON|OFF}]|POP}
                .INCLUDE filename [comment]
                .END [expression]
                .DB [expression[,expression[,expression...]]] [comment]
                .DW [expression[,expression[,expression...]]] [comment]
                .DL [expression[,expression[,expression...]]] [comment]
                .DQ [expression[,expression[,expression...]]] [comment]
                .DF [expression[,expression[,expression...]]] [comment]
                .DD [expression[,expression[,expression...]]] [comment]

        These have semantics as follows:

        .ASCII
        .ASCIZ
                These assemble a string of ASCII characters.  See the
                entry for "string" for a description of what they
                accept.  The difference between .ASCII and .ASCIZ is
                that .ASCIZ automatically appends a null byte, as if
                the string ended with <0>.

        .EVEN
        .ODD
                These test whether . is even or odd, respectively, and
                skip forward by one if not.  (.EVEN is, thus,
                equivalent to .ALIGN 2.)

        .ALIGN
                This advances . until it's an exact multiple of the
                expression.  This is not needed often; .EVEN is all
                that's usually required.

        .SPACE
                This advances . by as many bytes as the value of the
                expression.  Thus, these two are equivalent:

                        .SPACE expression
                        . = . + expression

        .BLKB
        .BLKW
                These reserve space for the specified number of bytes
                or words, respectively.  (.BLKB is thus equivalent to
                .SPACE.)

        .BYTE
        .DB
                These reserve bytes of memory and initialize them to
                the listed expressions.  One byte is reserved for each
                expression listed.  Both forms are completely
                equivalent.

        .WORD
        .DW
                These reserve words of memory and initialize them to
                the listed expressions.  One word is reserved for each
                expression listed.  Both forms are completely
                equivalent.

        .LONG
        .DL
                These reserve longwords of memory (32 bits each) and
                initialize them to the listed expressions.  One
                longword is reserved for each expression listed.  Both
                forms are completely equivalent.

        .QUAD
        .DQ
                These reserve quadwords of memory (64 bits each) and
                initialize them to the listed expressions.  One
                quadword is reserved for each expression listed.  Both
                forms are completely equivalent.

        .FLOAT
        .DF
                These reserve and initialize single-floats (32 bits
                each) in memory.  One float is reserved for each listed
                expression.  Both forms are completely equivalent.

        .DOUBLE
        .DD
                These reserve and initialize double-floats (64 bits
                each) in memory.  One double is reserved for each
                listed expression.  Both forms are completely
                equivalent.

        .LIST
                This controls the listing of lines.  If the argument is
                ON or OFF, listing of lines is turned on or off,
                respectively.  A stack of listing state is maintained;
                the arguments PUSH and POP control this stack.  POP
                pops it, setting the listing state to the state thus
                uncovered; PUSH alone duplicates the current state and
                pushes it on the stack, whereas PUSH ON and PUSH OFF
                push the stack and set the new state as specified.

        .INCLUDE
                Includes a file.  When a .INCLUDE directive is read,
                further lines are taken from the named file until
                end-of-file or a .END directive is reached, at which
                point assembly continues with the line after the
                .INCLUDE directive.  Includes can be nested arbitrarily
                deeply (or at least until the assembler runs out of
                open files; typically this limit will be at least
                fifteen or so, and often as much as 50 or 60).

        .END
                No further lines are read from the current input file.
                If it is being read because it was named in a .INCLUDE
                directive, processing resumes with the line after the
                .INCLUDE; if it is being processed because it was named
                on the command line, assembly terminates as if
                end-of-file had been reached.  Nothing after a .END
                directive is examined by the assembler, except in one
                circumstance: when the .END is inside a false
                conditional, in which case it is ignored like all other
                pseudo-ops.

string
        A sequence of quoted-strings and ASCII-codes.  A quoted-string
        is any string of characters surrounded by matching delimiters,
        provided the delimiter is not a less-than sign.  Such a string
        produces one byte in memory for each character between
        delimiters.  An ASCII-code is an expression inside <>, and
        assembles as one byte, whose value is that of the expression.
        Whitespace is significant inside quoted-strings, but not
        elsewhere (except possibly for character constants in the
        expression - see the description of expressions).

        Note that it is not possible to put a trailing comment after a
        string, because the leading semicolon would be taken as the
        leading delimiter of a quoted-string.

filename
        Any string of characters between matching delimiters.  The
        delimiter is taken to be the first non-whitespace character
        after the directive; the filename is everything from there up
        to the matching delimiter.  Unlike a string (above), a comment
        can appear after a filename.

opcode
        One of the recognized opcode names.
        As of this writing, they are:

        ABSD    BICB    CLNV    CSM     LDF     NOP     SETF    STST
        ABSF    BIS     CLNVC   DEC     LDFPS   RESET   SETI    SUB
        ADC     BISB    CLNZ    DECB    MARK    ROL     SETL    SUBD
        ADCB    BIT     CLNZC   DIV     MFPD    ROLB    SEV     SUBF
        ADD     BITB    CLNZV   DIVD    MFPI    ROR     SEVC    SWAB
        ADDD    BLE     CLR     DIVF    MFPS    RORB    SEZ     SXT
        ADDF    BLO     CLRB    EMT     MFPT    RTI     SEZC    TRAP
        ASH     BLOS    CLRD    HALT    MODD    RTS     SEZV    TST
        ASHC    BLT     CLRF    INC     MODF    RTT     SEZVC   TSTB
        ASL     BMI     CLV     INCB    MOV     SBC     SOB     TSTD
        ASLB    BNE     CLVC    IOT     MOVB    SBCB    SPL     TSTF
        ASR     BPL     CLZ     JMP     MTPD    SCC     STCDF   TSTSET
        ASRB    BPT     CLZC    JSR     MTPI    SEC     STCDI   WAIT
        BCC     BR      CLZV    LDCDF   MTPS    SEN     STCDL   WRTLCK
        BCS     BVC     CLZVC   LDCFD   MUL     SENC    STCFD   XOR
        BEQ     BVS     CMP     LDCID   MULD    SENV    STCFI
        BGE     CCC     CMPB    LDCIF   MULF    SENVC   STCFL
        BGT     CFCC    CMPD    LDCLD   NEG     SENZ    STD
        BHI     CLC     CMPF    LDCLF   NEGB    SENZC   STEXP
        BHIS    CLN     COM     LDD     NEGD    SENZV   STF
        BIC     CLNC    COMB    LDEXP   NEGF    SETD    STFPS

        The convention some assemblers support, allowing what looks
        like bitwise or applied to opcodes, to combine multiple
        condition-code manipulation instructions into one, does not
        work.  Instead, each of the 15 possible combinations has its
        own opcode, formed by listing the affected bits in the order
        NZVC, or SCC/CCC for the instructions affecting all four bits.

operand
        Precisely what is legal in the "operand" field depends on the
        instruction.  Some take no operands at all; some take only one,
        and some take two.  Each instruction must be given the correct
        number of operands, and what sorts of operands an instruction
        can take depends on the instruction.  Each instruction falls
        into one of 19 patterns, though.  A description of the various
        types of operands referred to will follow.

        1)  No operands.  Instructions falling into this class are:

                BPT     CLNVC   CLZC    RESET   SENV    SETI    SEZVC
                CCC     CLNZ    CLZV    RTI     SENVC   SETL    WAIT
                CFCC    CLNZC   CLZVC   RTT     SENZ    SEV
                CLC     CLNZV   HALT    SCC     SENZC   SEVC
                CLN     CLV     IOT     SEC     SENZV   SEZ
                CLNC    CLVC    MFPT    SEN     SETD    SEZC
                CLNV    CLZ     NOP     SENC    SETF    SEZV

        2)  One general-addressing operand.
            Instructions falling into this pattern are:

                ADC     CLR     DECB    MFPI    NEGB    SBCB    TSTB
                ADCB    CLRB    INC     MFPS    ROL     STFPS   TSTSET
                ASL     COM     INCB    MTPD    ROLB    STST    WRTLCK
                ASLB    COMB    JMP     MTPI    ROR     SWAB
                ASR     CSM     LDFPS   MTPS    RORB    SXT
                ASRB    DEC     MFPD    NEG     SBC     TST

            Note, though, that the operand of JMP is not allowed to be
            a register.

        3)  One operand, which must be a general-purpose register.  The
            only instruction in this class is RTS.

        4)  One operand, which must be a constant from 0 through 7.
            The only instruction in this class is SPL.

        5)  One operand, which is a target to branch to; this target
            must be within approximately 128 words of the branch.
            Instructions falling into this pattern are:

                BCC     BGE     BHIS    BLOS    BNE     BVC
                BCS     BGT     BLE     BLT     BPL     BVS
                BEQ     BHI     BLO     BMI     BR

        6)  Two operands, the first being a general-purpose register
            and the second a general-addressing operand.  Instructions
            falling into this pattern are JSR and XOR, but note that
            the second operand for JSR is not allowed to be a register.

        7)  Two operands, the first being a general-addressing operand
            and the second a general-purpose register.  Instructions
            falling into this pattern are ASH, ASHC, DIV, and MUL.

        8)  Two operands, both being general-addressing operands.
            Instructions falling into this pattern are:

                ADD     BICB    BISB    BITB    CMPB    MOVB
                BIC     BIS     BIT     CMP     MOV     SUB

        9)  Two operands, the first being a general-purpose register
            and the second being a target to branch to; the target must
            be before the instruction and cannot be farther than
            approximately 64 words back.  The only instruction in this
            class is SOB.

        10) One operand, which must be a constant from 0 through 377
            (255.).  The instructions in this class are EMT and TRAP.

        11) One operand, which must be a constant from 0 through 77
            (63.).  The only instruction in this class is MARK.

        12) One operand, which must be a single-precision
            floating-point general-addressing operand.  The
            instructions in this class are ABSF, CLRF, NEGF, and TSTF.

        13) Two operands, the first being a floating-point register
            from F0 to F3 and the second being a single-precision
            floating-point general-addressing operand.  The
            instructions in this class are STCDF and STF.

        14) Two operands, the first being a single-precision
            floating-point general-addressing operand and the second
            being a floating-point register from F0 to F3.  The
            instructions in this class are:

                ADDF    DIVF    LDF     MULF
                CMPF    LDCFD   MODF    SUBF

        15) Two operands, the first being a floating-point register
            from F0 to F3 and the second being a (non-floating-point)
            general-addressing operand.  The instructions in this class
            are STCDI, STCDL, STCFI, STCFL, and STEXP.

        16) Two operands, the first being a (non-floating-point)
            general-addressing operand and the second being a
            floating-point register from F0 to F3.  The instructions in
            this class are LDCID, LDCIF, LDCLD, LDCLF, and LDEXP.

        17) One operand, which must be a double-precision
            floating-point general-addressing operand.  The
            instructions in this class are ABSD, CLRD, NEGD, and TSTD.

        18) Two operands, the first being a floating-point register
            from F0 to F3 and the second being a double-precision
            floating-point general-addressing operand.  The
            instructions in this class are STCFD and STD.

        19) Two operands, the first being a double-precision
            floating-point general-addressing operand and the second
            being a floating-point register from F0 to F3.  The
            instructions in this class are:

                ADDD    DIVD    LDD     MULD
                CMPD    LDCDF   MODD    SUBD

        where the following terms have the following meanings:

        general-addressing operand
                One of the following:

                        gpr                     (gpr)
                        (gpr)+                  @(gpr)+
                        -(gpr)                  @-(gpr)
                        expression(gpr)         @expression(gpr)
                        #expression             @#expression
                        expression              @expression

                where gpr means general-purpose register.

        general-purpose register
                One of R0, R1, R2, R3, R4, R5, R6, R7, SP, or PC.  No
                whitespace is allowed in a register name.

        constant
                An expression whose value is within the indicated
                range.

        target to branch to
                An expression designating an address to branch to.
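
To make the twelve general-addressing forms listed above more concrete,
here is a rough Python sketch that maps each form onto the standard
PDP-11 mode and register numbers (the 3-bit mode and 3-bit register of
an operand field).  The mode numbers are the usual PDP-11 encoding, not
something taken from as11's source, and the function name
addressing_field is made up for the example.  Expressions are returned
as unevaluated text, and anything that is not one of the listed
register forms is assumed to be an expression.

```python
import re

# Register names from the list above; SP and PC are R6 and R7.
REGISTERS = {f"R{n}": n for n in range(8)}
REGISTERS.update({"SP": 6, "PC": 7})


def register_number(name):
    """Return 0-7 for a register name (case ignored), else None."""
    return REGISTERS.get(name.upper())


def match_register(pattern, text):
    """Return (match, register number) if pattern matches and names a register."""
    m = re.fullmatch(pattern, text)
    if m:
        reg = register_number(m.group("reg"))
        if reg is not None:
            return m, reg
    return None, None


def addressing_field(operand):
    """Map an operand to (mode, register, expression text or None)."""
    text = operand.strip()
    deferred = 1 if text.startswith("@") else 0
    if deferred:
        text = text[1:].strip()

    if not deferred and register_number(text) is not None:
        return (0, register_number(text), None)          # gpr
    m, reg = match_register(r"\(\s*(?P<reg>\w+)\s*\)\s*\+", text)
    if m:
        return (2 + deferred, reg, None)                 # (gpr)+  @(gpr)+
    m, reg = match_register(r"-\s*\(\s*(?P<reg>\w+)\s*\)", text)
    if m:
        return (4 + deferred, reg, None)                 # -(gpr)  @-(gpr)
    m, reg = match_register(r"\(\s*(?P<reg>\w+)\s*\)", text)
    if m and not deferred:
        return (1, reg, None)                            # (gpr)
    if text.startswith("#"):
        # Immediate and absolute are autoincrement modes on the PC.
        return (2 + deferred, 7, text[1:].strip())       # #expr  @#expr
    m, reg = match_register(r"(?P<expr>.+?)\s*\(\s*(?P<reg>\w+)\s*\)", text)
    if m:
        return (6 + deferred, reg, m.group("expr"))      # expr(gpr)  @expr(gpr)
    # A bare expression is PC-relative, i.e. indexed by the PC.
    return (6 + deferred, 7, text)                       # expr  @expr


for form in ["R3", "(R2)+", "@-(SP)", "4(R5)", "#177", "@#177560", "foo"]:
    print(form, addressing_field(form))
```

The pairing of each plain form with its @ form in the list shows up in
the sketch as the deferred bit added to the mode number.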

        single-precision floating-point general-addressing operand
        double-precision floating-point general-addressing operand
                One of the following:

                        fpr                     (gpr)
                        (gpr)+                  @(gpr)+
                        -(gpr)                  @-(gpr)
                        expression(gpr)         @expression(gpr)
                        #expression             @#expression
                        expression              @expression

                where gpr means general-purpose register and fpr means
                floating-point register.

        floating-point register
                One of F0, F1, F2, F3, F4, or F5.  In some cases, F4
                and F5 are not allowed.

expression
        An expression formed from numeric constants and symbols (which
        produce their values when used in expressions), combined in the
        usual way with the following set of operators:

        +       (binary) addition
        -       (binary) subtraction
        *       (binary) multiplication
        /       (binary) division
        +       (unary) no-op
        -       (unary) negation
        ~       (unary) bitwise complement
        %FASI   (unary) convert floating-point to integer by
                reinterpreting the bits, not by converting the value
        %IASF   (unary) convert integer to floating-point by
                reinterpreting the bits, not by converting the value
        %FIX    (unary) convert floating-point to integer by converting
                the value and (currently) rounding
        %ROUND  (unary) convert floating-point to integer by converting
                the value and rounding
        %TRUNC  (unary) convert floating-point to integer by converting
                the value and truncating

        where parentheses ( ) or < > can be used to override the
        precedence rules of the binary operators.  All unary operators
        come before their operand, and all of them bind tighter than
        any binary operator.

        Numeric constants are of two types.  Floating-point constants
        consist of a mantissa part, containing one or more digits and
        possibly a decimal point, optionally followed by an exponent
        consisting of one of the letters e, E, d, or D, an optional
        sign, and a decimal exponent, except that if there is no
        decimal point, or if there are no digits after the decimal
        point, the exponent portion must be present.

        Integer constants are basically strings of digits.
        The set of legal digits changes, depending on the base; there
        are four ways to specify the base:

        - Explicitly, by writing the base in decimal and enclosing it
          in { } and placing it before the rest of the number.  For
          example, {8}100 is the number sixty-four.

        - The traditional way of specifying decimal, with a trailing
          period.  Thus, 100. is one hundred.

        - With a prefix.  If the number begins with 0o or 0O, it is
          taken in base 8; if 0t or 0T, base 10; and if 0x or 0X, base
          16.  (But see the note in the next item.)

        - By default.  There is at all times a notion of the default
          base; this base is used for numbers that do not otherwise
          have a base specified.  It is the value of the reserved
          symbol .base and can be changed by assigning to .base with an
          assignment.  .base can take on any value from 2 through 36;
          if the value is greater than 16, the prefix notation for
          specifying bases is disabled.

        In bases above 10, the digits are alphabetic letters; case is
        ignored.  The default default base is 8.

The listing file optionally generated consists of each line of assembly
source code, prefixed with the value of . in six-digit octal and a
colon.  If the line came from a .INCLUDE file, a number is prefixed to
indicate the depth of nesting.  If anything is assembled into memory as
a result of the line, the assembled data follow.  (If the .LIST
directive is used to disable listing, an ellipsis is printed to
indicate the omitted lines.)

If you think the assembler does not conform to the above, I'd like to
hear about why, because if it doesn't it's a bug (though it may be a
bug in the description rather than the assembler :-).

                                        der Mouse

                        old: mcgill-vision!mouse
                        new: mouse@larry.mcrcim.mcgill.edu
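
As an aside on the integer-constant rules given under "expression", the
four ways of choosing a base can be sketched in Python roughly as
follows.  This is not as11's code: the function names are made up, the
order in which the forms are tried is an assumption, and floating-point
constants (which the real assembler must tell apart from a trailing
period) are ignored here.

```python
import string

# '0'-'9' then 'a'-'z' gives digit values 0 through 35.
DIGITS = string.digits + string.ascii_lowercase
PREFIXES = {"0o": 8, "0t": 10, "0x": 16}


def digits_to_int(digits, base):
    """Convert a digit string in the given base; letter case is ignored."""
    if not 2 <= base <= 36:
        raise ValueError(f"base {base} out of range 2..36")
    if not digits:
        raise ValueError("no digits")
    value = 0
    for ch in digits.lower():
        d = DIGITS.find(ch)
        if d < 0 or d >= base:
            raise ValueError(f"{ch!r} is not a digit in base {base}")
        value = value * base + d
    return value


def parse_integer(text, default_base=8):
    """Parse an integer constant; default_base plays the role of .base."""
    if text.startswith("{"):
        # Explicit base, written in decimal inside { }.
        close = text.index("}")
        return digits_to_int(text[close + 1:], int(text[1:close]))
    if text.endswith("."):
        # Trailing period: decimal.
        return digits_to_int(text[:-1], 10)
    prefix = text[:2].lower()
    if default_base <= 16 and prefix in PREFIXES:
        # Prefix notation is disabled when the default base exceeds 16.
        return digits_to_int(text[2:], PREFIXES[prefix])
    return digits_to_int(text, default_base)


for sample in ["{8}100", "100.", "0x1F", "100"]:
    print(sample, parse_integer(sample))
```

With a default base of 36, a call such as parse_integer("0x1", 36)
treats 0x1 as three base-36 digits, which is the effect of disabling
the prefix notation for large bases.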