UMBC CMSC 211

Structure of an IBM PC Assembly Language Program

Let's look again at the first program. The different parts will be coded by color:

Labels
Operation
Operands
Comments

What we will see is that each line holds at most one label, one operation, and one comment. None of the four parts is required, except that operands may appear only on a line that has an operation. Each operation takes zero or more operands. Once a comment starts, the rest of the line is a comment.

;; FIRST.ASM -- Our first Assembly Language Program.  This program
;;    displays the line 'Hello, my name is Gary' on the CRT. 

        .MODEL  SMALL

        .STACK  100H

        .DATA

Message DB      'Hello, my name is Gary', 13, 10, '$'

        .CODE
Hello   PROC
        mov     ax, @data
        mov     ds, ax
        mov     dx, OFFSET Message
        mov     ah, 9h            ;Function code for 'display string'
        int     21h               ;Standard way to call MSDOS
        mov     al, 0             ;Return code of 0
        mov     ah, 4ch           ;Exit back to MSDOS
        int     21h
Hello   ENDP

        END     Hello             ;Tells where to start execution

OK, what else did you see? Some of the operations start with a period: .MODEL, .STACK, .DATA, and .CODE. These, along with DB, PROC, ENDP, and END, are not really computer instructions; they are known as pseudo-operations and are actually directives to the assembler. As you will see in this course, each of the other lines is converted to a single machine instruction.

Labels are identifiers. They are names invented by the programmer in an attempt to document what is happening, just as identifiers are in C. Labels may contain letters, digits, and the special characters @ $ _ ?. A label can have up to 31 characters, and the first character must not be a digit. Upper and lower case are normally equivalent. It is best to have the first character be a letter, and that is a requirement for this class.

The machine code for the program is only eighteen bytes long (eight instructions). The program also uses 25 bytes of data.

To help you put it into a proper frame of reference, here is the corresponding C program:

#include <stdio.h>
int main( void )
{
    printf("Hello, my name is Gary\n");
    return 0;
}
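
The byte counts quoted for the assembly version can be tallied by hand. The following C sketch does the arithmetic; the per-instruction sizes are an assumption based on the usual 8086 encodings (a mov of a 16-bit constant takes 3 bytes, the other instructions here take 2), and the function names are invented for this illustration.

```c
#include <stddef.h>

/* Assumed encoded size in bytes of each instruction in FIRST.ASM, in program order. */
static const size_t instruction_bytes[] = {
    3, /* mov ax, @data          : opcode + 16-bit constant */
    2, /* mov ds, ax             : opcode + register byte */
    3, /* mov dx, OFFSET Message : opcode + 16-bit constant */
    2, /* mov ah, 9h             : opcode + 8-bit constant */
    2, /* int 21h                : opcode + interrupt number */
    2, /* mov al, 0              : opcode + 8-bit constant */
    2, /* mov ah, 4ch            : opcode + 8-bit constant */
    2  /* int 21h                : opcode + interrupt number */
};

/* Number of machine instructions in the program. */
size_t instruction_count(void)
{
    return sizeof instruction_bytes / sizeof instruction_bytes[0];
}

/* Total size of the machine code in bytes. */
size_t code_bytes(void)
{
    size_t total = 0;
    for (size_t i = 0; i < instruction_count(); i++) {
        total += instruction_bytes[i];
    }
    return total;
}

/* Size of Message: 22 characters, then CR (13), LF (10), and '$'.
   sizeof also counts the trailing NUL that C adds, so subtract 1. */
size_t data_bytes(void)
{
    static const char message[] = "Hello, my name is Gary\r\n$";
    return sizeof message - 1;
}
```

Under these assumptions, code_bytes() gives 18 for the eight instructions and data_bytes() gives 25, which agrees with the figures above.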

Global Program Structure

There are three sections to an assembly language program:

.STACK -- program scratch pad
.DATA -- All programs have variables and constants
.CODE -- the machine instructions for the program

The .CODE segment is divided into PROCedures delimited by the PROC/ENDP pair. They are similar to the functions in C/C++. This segment is also terminated with the END pseudo-op with the program name (the procedure that will be executed first, like main() in C). (Note that the program name is not the filename of the program!) Additionally, there should be a program header block with comments explaining the purpose of the program, naming the author, and so on, as well as the .MODEL pseudo-op.

The author gives a boilerplate that I recommend you put into a file and that will become a template for each program that you write.

.STACK Segment

This specifies the number of bytes that should be reserved for the stack.

.DATA Segment

This is where you define all the constants and variables (with possible initial values). Remember that you must provide values for variables one way or another before you can use them! This is just like declaring variables in C, except that you must give each item either an initial value or '?' (which means the item is not initialized and that location holds garbage).

When you are declaring a multi-byte location, the value is stored in little-endian format:

    myInt     DW  1234h     ; two-byte value

will be stored in consecutive memory locations as:

34  12

A four-byte value looks like:

    myBigInt  DD  12345678h    ; four-byte value

becomes in memory:

78 56 34 12

This reversal only occurs with 16-, 32-, and 64-bit declarations (DW, DD, and DQ). It does not happen when you declare variables with DB!
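
To see the same byte order from the C side, here is a minimal sketch that writes a value out low byte first, the way DW and DD lay it out in memory. The function names store_dw and store_dd are invented for this illustration; they use shifts rather than casting pointers, so the result does not depend on the machine that runs the C code.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a 16-bit value the way DW stores it: low byte first. */
void store_dw(uint16_t value, uint8_t out[2])
{
    out[0] = (uint8_t)(value & 0xFF);   /* low byte goes at the lower address */
    out[1] = (uint8_t)(value >> 8);     /* high byte goes at the next address */
}

/* Write a 32-bit value the way DD stores it: lowest byte first. */
void store_dd(uint32_t value, uint8_t out[4])
{
    for (size_t i = 0; i < 4; i++) {
        out[i] = (uint8_t)(value >> (8 * i));   /* byte i of the value */
    }
}
```

Calling store_dw(0x1234, buf) leaves the bytes 34 12 in buf, and store_dd(0x12345678, buf) leaves 78 56 34 12, matching the layouts shown above.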

When reserving the data space, you can use several forms:

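Message DB 'Hello', 13, 10, '$'         ; a string followed by CR, LF, and '$'

Message DB 'H'                          ; the same bytes, one DB per line
        DB 'e'
        DB 'l'
        DB 'l'
        DB 'o'
        DB 13
        DB 10
        DB '$'

Message DB 'H', 'e', 'l', 'l', 'o', 13, 10, '$'   ; the same bytes, listed one at a time

Ones    DB 5 DUP (1)                    ; DUP needs a repeat count; 5 is only an illustrative value
Wow     DB 10 DUP (2, 3, ?)             ; the pattern 2, 3, ? repeated 10 times (30 bytes)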
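
The DUP form repeats its parenthesized pattern the given number of times. As a rough C sketch of what the assembler does for the Wow declaration, the hypothetical helper dup_fill below copies a pattern into a buffer repeatedly. C has no equivalent of '?', so a caller has to pick some placeholder byte (such as 0) for the uninitialized slot; that choice is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `pattern` into `dst` `count` times, as `count DUP (pattern)` would.
   Returns the number of bytes written; dst must hold count * pattern_len bytes. */
size_t dup_fill(uint8_t *dst, size_t count, const uint8_t *pattern, size_t pattern_len)
{
    for (size_t i = 0; i < count; i++) {
        memcpy(dst + i * pattern_len, pattern, pattern_len);
    }
    return count * pattern_len;
}
```

With the three-byte pattern {2, 3, 0} and a count of 10, dup_fill writes 30 bytes, which is the amount of space that Wow reserves.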

.CODE Segment

This is the area where you write the instructions to solve the problem. The layout parallels a C program, where you put your variable declarations first (the .DATA segment) and your code after them (the .CODE segment).

As we start our study of assembly language, there are a couple of things to clarify. We need to understand the internals of the CPU better than we do now. Also, some work is done by causing an interrupt after we have first set up the CPU.

Hopefully, you have already looked at the information on the registers. That leaves describing how to output a message to the screen and how to exit the program.

We can prepare for an interrupt by putting data into specified registers. Then we simply activate the interrupt that we want.

To move data around, there has to be a source and a destination. The assembly language instruction is mov destination, source. The destination can be memory or a register; the source can be memory, a register, or a constant. Note that the source and destination cannot both be memory references. This instruction can operate on either a word or a byte, but the source and destination must agree in size.

mov	AX, 345              ;Load the constant 345 into AX

To display a message on the screen, we have to use the interrupt instruction, as well as the function code and the address of the message. In our example we see:

mov	dx, OFFSET Message   ;This is where in the Data Segment to find Message
mov	ah, 9h               ;Function code:  Display String
int	21h			   ;Cause the right interrupt
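
The '$' at the end of Message matters here: DOS function 9h prints characters starting at the given address and stops when it reaches '$', which is not itself printed. The C sketch below imitates that behavior. The names dollar_length and print_dollar_string are invented for this illustration and are not part of DOS or the C library.

```c
#include <stddef.h>
#include <stdio.h>

/* Count the characters that come before the '$' terminator.
   The caller must make sure the string really contains a '$'. */
size_t dollar_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '$') {
        n++;
    }
    return n;
}

/* Write the string to `out`, stopping before '$', in the manner of
   DOS function 9h. Returns the number of characters written. */
size_t print_dollar_string(const char *s, FILE *out)
{
    size_t n = dollar_length(s);
    return fwrite(s, 1, n, out);
}
```

For example, print_dollar_string("Hello, my name is Gary\r\n$", stdout) writes the 22 characters of the greeting followed by the carriage return and line feed, and leaves out the '$'.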

We exit the program with a different function code for INT 21h:

mov	al, 0h               ;Return code
mov	ah, 4ch              ;Function code:  Exit to DOS
int	21h			   ;Cause the right interrupt
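
AH and AL are the high and low bytes of the 16-bit register AX, so the two mov instructions above together leave 4C00h in AX. A minimal C sketch of that relationship, using the invented helper name make_ax:

```c
#include <stdint.h>

/* Combine the high byte (AH) and the low byte (AL) into the 16-bit AX value. */
uint16_t make_ax(uint8_t ah, uint8_t al)
{
    return (uint16_t)(((uint16_t)ah << 8) | al);
}
```

Here make_ax(0x4C, 0x00) yields 0x4C00: the function code for "exit" sits in the high byte and the return code in the low byte.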

The return code serves the same purpose as the value returned by main in C:

int main()
{
    return 0;
}
