Programs written in assembly language are machine dependent and can only be used on a specific processor. They have little to no structure and consist of far more lines of code than high-level language programs. To control a processor, you must speak its language. The language of the computer is its machine language, and its vocabulary is its instructions. While no two machine languages are the same, they are quite similar, since they are all constructed using similar underlying hardware principles.
Assembly Syntax
An assembly language program consists of a collection of basic instructions and assembler directives. There are no constructs like loops and selection statements commonly found in high-level language programs. Instead, you must use the primitive instructions of the processor to implement any high-level language constructs that are needed.
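As a rough illustration of building a high-level construct from primitive instructions, the sketch below sums the integers 1 through 5 with a hand-made loop. The label names loop and done are hypothetical, and the sketch assumes an assembler (such as MARS or SPIM) that accepts the bgt pseudo-instruction.

```asm
        .text
main:
        li   $t0, 0            # sum = 0
        li   $t1, 1            # i = 1
        li   $t2, 5            # upper limit of the loop
loop:
        bgt  $t1, $t2, done    # leave the loop once i > 5
        add  $t0, $t0, $t1     # sum = sum + i
        addi $t1, $t1, 1       # i = i + 1
        j    loop              # jump back to the loop test
done:
        li   $v0, 10           # terminate the program
        syscall
```

The loop test, the loop body, and the jump back to the test are all written out explicitly. A high-level for or while statement would hide these steps.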
All assembly languages follow a similar pattern in terms of their syntax and program structure. Some may have specific requirements based on the given architecture, but once you learn one assembly language, others are easy to understand. In this section, we begin an introduction to the MIPS assembly language by looking at the structure and syntax of a simple program.
Assembler Directives
Assembly languages use assembler directives to provide information to the assembler as the program is translated to machine code. In MIPS assembly, directives are specified using dot notation: an identifier that begins with a period.
Every MIPS program must contain a text segment, which contains the executable instructions. The text segment is indicated using the .text assembler directive.
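The following is a minimal sketch of directives in use. Besides .text, it assumes the .data and .word directives and the lw pseudo-instruction form provided by common MIPS assemblers such as MARS and SPIM. None of these is introduced in this section, and the label value is hypothetical.

```asm
        .data                  # data segment: storage for program data
value:  .word 42               # one 32-bit word initialized to 42

        .text                  # text segment: executable instructions
main:
        lw   $t0, value        # load the word stored at the label value
        li   $v0, 10           # terminate the program
        syscall
```

Directives such as .data and .text produce no machine instructions themselves. They only tell the assembler where the lines that follow belong.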
Statements
The statements within an assembly language program are written one per line, with each statement consisting of three parts:
<label> <operation> <operands>
The operation is the symbolic representation of the machine code operation to be performed, and the operands are the arguments to that operation. The number and type of operands depend on the specific instruction. At most, there will be three operands, each separated by a comma, as with the add instruction, while some instructions, such as j, take a single operand.
In MIPS, an operand can be a register, a label, or an immediate value. The actual type depends on the specific instruction. For example, the add operation requires three registers, while the j instruction requires either a label or an immediate integer value.
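A short sketch of the operand forms described above follows. The register choices and the label finish are illustrative assumptions, not taken from the original examples.

```asm
        .text
main:
        li   $t1, 7
        li   $t2, 5
        add  $t0, $t1, $t2     # three register operands: $t0 = $t1 + $t2
        addi $t0, $t0, 1       # two registers and an immediate value
        j    finish            # a single operand: the label to jump to
finish:
        li   $v0, 10           # terminate the program
        syscall
```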
Immediate Values
In assembly language, literal values are known as immediate values. Immediate values can be specified as arguments for some instructions, but it depends on the specific instruction. For example, the li instruction (load immediate) is used to store an immediate value into a register.
Immediate integer values can be specified in assembly using decimal, hexadecimal or octal notation.
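The three notations might look as follows. This sketch assumes the common convention of a 0x prefix for hexadecimal and a leading 0 for octal. Check your assembler's documentation, since support for octal literals varies.

```asm
        .text
main:
        li   $t0, 26           # decimal
        li   $t1, 0x1A         # hexadecimal (0x prefix), the same value as 26
        li   $t2, 032          # octal (leading 0), the same value as 26
        li   $v0, 10           # terminate the program
        syscall
```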
Labels
Assembly languages do not use variables or functions. Instead, labels are used to identify and name memory locations that hold instructions or data. The label part of a statement is optional. It is included only if the given location in memory needs to be referenced elsewhere in the program.
A label may also be provided alone on a line, in which case it refers to the location of the next statement or word in memory.
A label is a valid identifier followed by a colon (:). The rules for naming an identifier are similar to those for identifiers in a high-level language. A label can consist of alphanumeric characters and the underscore (_), but the first character cannot be a digit. The names of instructions and assembler directives are reserved and cannot be used as identifiers.
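The sketch below shows the label placements described above: a label on the same line as its statement, a label alone on a line, and a label referenced by a branch. The names count_down and _exit are hypothetical.

```asm
        .text
main:   li   $t0, 3            # label on the same line as its statement
count_down:
        # count_down names the location of the addi that follows
        addi $t0, $t0, -1      # $t0 = $t0 - 1
        bne  $t0, $zero, count_down   # branch back while $t0 is not zero
_exit:  li   $v0, 10           # an identifier may begin with an underscore
        syscall
```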
Comments
Comments can be specified in an assembly language program using the hash symbol (#). As in Python, everything from the # to the end of the line is considered a comment.
# This is an example that illustrates a comment in MIPS.
Sample Program
To illustrate the style and syntax of an assembly language program, consider the following simple program which adds two numbers, both of which are stored in registers, and stores the result in a third register:
# example1.asm
# This is an example that illustrates the various components of
# an assembly language program.
#
           .text
           # Program code goes in the text segment. This is an
           # example of an assembler directive.
main:
        # Initialize two registers and add their values
        li $t1, 50
        li $t2, 18
        add $t0, $t1, $t2
        # Terminate the program
        li $v0, 10
        syscall
An assembly language program is written in a text file in the same way that you would write a program in any high-level language. The .asm file extension is used for assembly language programs and modules.
Structure
MIPS assembly is a free-format language, which means that it does not require labels, instructions, or directives to be placed at any specific position on a line. By tradition, however, labels start at the leftmost position (column 1), and assembler directives and instructions are indented from the left. This allows you to quickly find specific labels as you scan the lines of code. In this example, the directive is indented 11 spaces and the add instruction 8 spaces.
Initiation
Every MIPS program must contain a main: label to indicate the starting point of execution. After the program is loaded into memory, execution begins with the first statement following the main: label. Technically, it can be placed anywhere, but good design calls for it to be placed at the top of the assembly file.
Termination
In a high-level language, program termination is handled automatically when the last statement in the module is executed (as with Python) or when the main function returns (as with C and Java). In assembly language, there are no magic or automatic steps. To end or exit a program, we must terminate it using a system call. In MIPS, this is done using two statements, as shown at the end of the sample program:
li $v0, 10 # These two lines serve as a halt statement.
syscall # More on this later - just use them for now.
These lines are needed in every MIPS assembly language program. We will cover the actions of these two statements in a later section. For now, simply use them at the end of each program in order to correctly terminate the program.