UMBC CMSC 211

UMBC | CSEE


Segments and Registers

Actual addresses on the IBM PC are given as a pair of 16-bit numbers, the segment and the offset, written in the form segment:offset. We have ignored the segment in our previous discussions because it was a constant in one of the four segment registers, specifically, the DS register, which we loaded with the statements:
 mov ax, @data
 mov ds, ax
The rest of the time, only the offset was contained in individual instructions.

There are several places where it is important to understand segment addressing.

Everything on the 80x86 is based on a 16-bit word, so addresses are built from 16-bit words as well. As we know, 2^16 is 65,536. It might seem that a 16-bit segment combined with a 16-bit offset would give a 32-bit address, or 2^32 bytes (approximately 4 GB) of memory.

Before the 8086 was introduced, microprocessors could address at most 2^16 bytes, or 64 KB, of memory. The designers knew that was insufficient, and they decided that 1 megabyte of memory would be more than would ever be needed. Part of the reason was to hold down costs: fewer address lines mean lower system costs. 1 MB is 2^20 bytes, which requires four more address bits than 64 KB, and that did not fit neatly into 16-bit registers. The solution was to treat the segment as the upper 16 bits of a 20-bit value. Four zero bits (one hexadecimal 0) are appended to the right side of the segment, and then the offset is added; in other words, physical address = segment * 16 + offset. Here is what happens when we convert the segment:offset pair 13a5:3327 to the physical address that the hardware actually uses:

 segment    1 3 a 5 0
 offset  +    3 3 2 7
 address    1 6 d 7 7
This means that when you add one to the value in the segment, you move forward by sixteen bytes. Memory is therefore divided into sixteen-byte blocks known as paragraphs, and a paragraph always starts at an address that is evenly divisible by sixteen. Every segment begins on a paragraph boundary, so the smallest segment is a single paragraph.
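The conversion above is easy to check mechanically. The following Python sketch is an illustration only. The names `physical_address` and `paragraph` are invented here, and the 20-bit mask is an assumption that models the 8086's 20 address lines, so a sum past fffffh wraps to the bottom of memory.

```python
# Hypothetical helpers (not from the course) modeling 8086 real-mode addressing.
ADDRESS_MASK = 0xFFFFF  # 20 address lines: 2**20 bytes = 1 MB

def physical_address(segment: int, offset: int) -> int:
    """Append a hex 0 to the segment (shift left 4 bits) and add the offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return ((segment << 4) + offset) & ADDRESS_MASK  # keep only 20 bits

def paragraph(address: int) -> int:
    """Number of the 16-byte paragraph that contains a physical address."""
    return address >> 4

addr = physical_address(0x13A5, 0x3327)          # the worked example 13a5:3327
print(f"{addr:05x}")                              # 16d77
print(physical_address(0x13A6, 0x3327) - addr)    # 16: one more segment = one paragraph further
print(f"{paragraph(addr):x}")                     # 16d7
print(physical_address(0xFFFF, 0x0010))           # 0: the sum past fffffh wraps on a 20-bit bus
```

The shift by 4 is exactly the "append a hex 0" step in the table. Adding 1 to the segment therefore always moves the result by 16 bytes.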

An 80x86 program is organized as a number of these segments. The linker combines all of the code (.CODE directive), all of the data (.DATA directive), and all of the stack (.STACK directive) into a single segment each.

The same physical address can be represented by as many as 4096 different segment:offset pairs. For example, the physical address 16d77h from the conversion above can be written as any of:

16d7h:0007h
1000h:6d77h
1116h:5c17h
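A segment step is 16 bytes and an offset spans 64 KB, so a physical address can be reached from at most 65,536 / 16 = 4096 different segments. This Python sketch uses a made-up `representations` helper and ignores the wraparound at the top of the 1 MB space. It enumerates every 16-bit segment:offset pair for an address and confirms that 16d7h:0007h, 1000h:6d77h and 1116h:5c17h all name 16d77h.

```python
def physical_address(segment: int, offset: int) -> int:
    """segment * 16 + offset; the 1 MB wraparound is deliberately ignored here."""
    return (segment << 4) + offset

def representations(address: int) -> list[tuple[int, int]]:
    """Every (segment, offset) pair of 16-bit values that names the given address."""
    # offset = address - 16 * segment must stay within 0000h..ffffh
    lowest = max(0, (address - 0xFFFF + 15) // 16)  # smallest segment whose offset still fits
    highest = min(0xFFFF, address // 16)           # largest segment with a non-negative offset
    return [(seg, address - (seg << 4)) for seg in range(lowest, highest + 1)]

pairs = representations(0x16D77)
print(len(pairs))  # 4096
for seg, off in [(0x16D7, 0x0007), (0x1000, 0x6D77), (0x1116, 0x5C17)]:
    print(f"{seg:04x}h:{off:04x}h", (seg, off) in pairs)  # each line ends in True
print(len(representations(0x00010)))  # 2
```

Addresses near the bottom of memory have fewer representations, which is why 4096 is a maximum. Ignoring wraparound, 00010h can only be written as 0000:0010 or 0001:0000.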
Any symbolic memory address label defined in a segment can have two operators applied to it, one of which we have already used.
 SEG label      the paragraph number at which the segment containing label is loaded
 OFFSET label   the offset of label from the beginning of that segment
It is important to remember that the OFFSET part of an address is determined by the linker, but the SEG part can be determined only when the program is finally loaded. Part of the problem is that different parts of the same segment can come from different source code files. This also means that the program can run at different memory locations on different executions without the programmer having to re-assemble or re-link it. This is also why we need the code:
 mov ax, @data
 mov ds, ax
The location of the data segment is not known until after the program is loaded into memory.
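The following toy Python model shows why SEG must wait until load time while OFFSET is settled by the linker. Everything in it is hypothetical: a label `count` at link-time offset 0004h in the data segment, and two made-up load paragraphs for that segment. The `load_ds` function stands in for `mov ax, @data` / `mov ds, ax`, copying the paragraph chosen by the loader into DS.

```python
# Toy model of load-time relocation; the label, offset, and load paragraphs are hypothetical.
OFFSET_COUNT = 0x0004  # OFFSET count: fixed once the linker has laid out the data segment

def load_ds(data_paragraph: int) -> int:
    """Mimic the two-instruction sequence that initializes DS."""
    ax = data_paragraph  # mov ax, @data  (the loader fills in @data)
    ds = ax              # mov ds, ax     (a segment register cannot be loaded with an immediate)
    return ds

def physical_address(segment: int, offset: int) -> int:
    return ((segment << 4) + offset) & 0xFFFFF

# The same program, loaded at two different places on two different runs.
for data_paragraph in (0x0A3C, 0x1F00):
    ds = load_ds(data_paragraph)  # SEG count is only known now
    print(f"SEG count = {ds:04x}h, OFFSET count = {OFFSET_COUNT:04x}h, "
          f"physical = {physical_address(ds, OFFSET_COUNT):05x}h")
```

On both runs OFFSET count is 0004h. Only SEG count changes, and with it the physical address.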

A new restriction on address arithmetic is that the starting address and the resulting address must be in the same segment, and the arithmetic can involve only the offset. This is because neither the assembler nor the linker knows what the segment values will be. The assembler will report an error if you try to add or subtract addresses from different segments.
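One way to picture this restriction is to treat an address the way the assembler must: as a segment whose value it cannot know plus an offset it can. This Python sketch is a simplified toy, not MASM's actual rule set. The `Address` class and the segment names `dseg` and `xseg` are hypothetical. In the model:

- adding a constant moves within the same segment;
- subtracting two addresses in the same segment yields a plain distance;
- mixing segments, or leaving the 64 KB segment, raises an error.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Address:
    """A label as the assembler sees it: a symbolic segment plus a known 16-bit offset."""
    segment: str  # a name only; its paragraph number is unknown until load time
    offset: int

    def __add__(self, n: int) -> "Address":
        # Only the offset takes part in the arithmetic; the segment is carried along unchanged.
        new_offset = self.offset + n
        if not 0 <= new_offset <= 0xFFFF:
            raise ValueError("result falls outside the segment")
        return Address(self.segment, new_offset)

    def __sub__(self, other: "Address | int") -> "Address | int":
        if isinstance(other, int):
            return self + (-other)  # address minus constant: still an address
        if self.segment != other.segment:
            raise ValueError("cannot subtract addresses in different segments")
        return self.offset - other.offset  # a plain number: the distance in bytes

table = Address("dseg", 0x0010)  # hypothetical labels in one segment
limit = Address("dseg", 0x0030)
print(limit - table)             # 32: a plain byte distance
print(table + 4)                 # Address(segment='dseg', offset=20)
other = Address("xseg", 0x0000)  # hypothetical label in a different segment
try:
    limit - other
except ValueError as err:
    print(err)                   # cannot subtract addresses in different segments
```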

