UMBC CMSC 313 -- Segments


Segments

Segments and Registers

Actual addresses on the IBM PC are given as a pair of 16-bit numbers, the segment and the offset, written in the form segment:offset. We have ignored the segment in our previous discussions because it was a constant in one of the four segment registers, specifically, the DS register, which we loaded with the statements:
 mov ax, @data
 mov ds, ax
The rest of the time, only the offset was contained in individual instructions.

There are some places where it is important to understand the segment addressing:

Things on the 80x86 are based on a 16-bit word, so addresses are built from 16-bit values. As we know, 2^16 is 65,536. It might seem that a 16-bit segment combined with a 16-bit offset would give us 2^32 bytes, or approximately 4GB of memory, but that is not how the two are combined. Before the 8086 was invented, microprocessors could address a maximum of 2^16 bytes, or 64KB, of memory. The designers knew that was insufficient, and the decision was made that 1 megabyte of memory would be more than would ever be needed. Part of the reason was to hold down costs: fewer address lines mean lower system costs. 1MB is 2^20 bytes, which takes four more address bits than a 16-bit register can hold. This did not fit into things easily. They decided to treat the segment as a 20-bit value of which only the upper 16 bits are specified, which means that four zero bits (one hexadecimal zero) are appended to the right side of the segment. The offset is then added to the result. For example, converting the segment:offset pair 13a5:3327 to the actual physical address that the hardware needs looks like this:

segment      1 3 a 5 0
offset     +   3 3 2 7
address      1 6 d 7 7
That means that when you add one to the value in the segment, you are moving forward by sixteen bytes. This makes for sixteen-byte blocks known as paragraphs, and paragraphs always start at addresses that are evenly divisible by sixteen. The smallest segment is a single paragraph.
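The conversion can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; the function name physical_address is made up for this example, and the 20-bit mask assumes the original 8086 behavior of wrapping around at 1MB.

```python
def physical_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shifting left by 4 bits appends one hex zero (multiplies by 16).
    # The mask models the 20 address lines (assumed 8086 wrap at 1MB).
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x13A5, 0x3327)))  # 0x16d77
# Adding one to the segment moves forward by one 16-byte paragraph.
print(physical_address(0x13A6, 0) - physical_address(0x13A5, 0))  # 16
```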

An 80x86 program is organized as a number of these segments. The linker gathers all of the code (.CODE directive), all of the data (.DATA directive), and all of the stack (.STACK directive) into a single segment for each of them.

The same physical address can be represented in as many as 4096 different segment:offset combinations. For example, each of the following refers to the physical address 16d77h:

16d7h:0007h
1000h:6d77h
1116h:5c17h
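To check the count, the following Python sketch enumerates every segment:offset pair that maps to one physical address. The function name aliases is hypothetical, and the sketch ignores the wraparound at 1MB for simplicity.

```python
def aliases(address: int):
    """Yield every (segment, offset) pair that maps to a physical address."""
    for segment in range(0x10000):
        offset = address - (segment << 4)
        # The offset must fit in 16 bits for the pair to be valid.
        if 0 <= offset <= 0xFFFF:
            yield segment, offset


pairs = list(aliases(0x16D77))
print(len(pairs))                 # 4096
print((0x1116, 0x5C17) in pairs)  # True
```

Addresses near the bottom of memory have fewer than 4096 representations, because there are not enough lower paragraphs to back the segment away from.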
Any symbolic memory address label defined in a segment can have two operators applied to it, one of which we have already used:

SEG label       the paragraph number at which the segment containing label is loaded
OFFSET label    the offset of label from the beginning of that segment
It is important to remember that the OFFSET part of the address is determined by the linker, but the SEG part can only be determined when the program is finally loaded. Part of the problem is that different parts of the same segment can be in different source code files. This also means that the program can be executed at different memory locations at different execution times without the programmer having to re-assemble or re-link the program. This is also why we have to have the code:
 mov ax, @data
 mov ds, ax
The location of the data segment is not known until after the program is loaded into memory.
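The effect of load-time relocation can be sketched in Python. This is only an illustration: the segment names, the link-time paragraph numbers, and the two load addresses below are made-up values, and the function name relocate_segments is hypothetical. It models the idea that the loader adds the paragraph where the program was placed to each segment value, while the offsets fixed by the linker are left untouched.

```python
def relocate_segments(link_time_paragraphs, load_paragraph):
    """Return run-time segment values for a program loaded at load_paragraph.

    link_time_paragraphs maps a segment name to its paragraph number
    relative to the start of the program, as fixed by the linker.
    """
    return {
        name: (relative + load_paragraph) & 0xFFFF
        for name, relative in link_time_paragraphs.items()
    }


# Hypothetical layout: paragraph numbers relative to the start of the program.
program = {"code": 0x0000, "data": 0x0012, "stack": 0x0020}

first_run = relocate_segments(program, 0x1A30)
second_run = relocate_segments(program, 0x2C00)
print(hex(first_run["data"]))   # 0x1a42
print(hex(second_run["data"]))  # 0x2c12
```

The same program gets a different data segment value on each run, which is why the value of @data cannot be baked in before load time.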

A new restriction on address arithmetic is that the starting address and the resulting address must be in the same segment, and the expression can only involve the offset. This is because neither the assembler nor the linker knows what the values for the segments are. The assembler will report an error if you try to add or subtract addresses from different segments.
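As a rough model of this rule (not how any particular assembler is implemented), the Python sketch below represents a label as a segment name plus an offset and refuses to subtract labels from different segments. The Label class, the difference function, and the segment names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    segment: str  # segment name; its paragraph number is unknown until load
    offset: int   # offset from the start of that segment


def difference(a: Label, b: Label) -> int:
    """Subtract two addresses, allowed only within a single segment."""
    if a.segment != b.segment:
        raise ValueError("cannot subtract addresses from different segments")
    # Only the offsets take part in the arithmetic.
    return a.offset - b.offset


buffer_start = Label("data", 0x0008)
buffer_end = Label("data", 0x0020)
print(difference(buffer_end, buffer_start))  # 24
```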

Segment Registers

The address that is in an instruction is actually the offset part of the address. Whether it is OFFSET label or just label in the source code, it is stored as the offset in the instruction in memory.

The actual registers are:

Register  Name           Used for addresses in...
cs        Code Segment   call and jmp instructions and instruction fetches (using ip)
ss        Stack Segment  instructions that address memory through sp or bp. Used implicitly in push, pop, and call and ret instructions for the return address.
ds        Data Segment   all other instructions.
es        Extra Segment  when an extra segment register is needed.
The operations on segment registers are extremely limited. For example, an immediate value cannot be moved directly into a segment register, which is why @data is loaded into ds by way of ax.

The default segment can be overridden by preceding the instruction with a segment override byte. The assembler will usually supply the override for you, but occasionally you must use an explicit segment override:
 mov WORD PTR es:[bx + 8], 100
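The choice of segment register for a memory operand can be summarized in a small Python sketch. It is a simplification that assumes 16-bit addressing and covers only ordinary memory operands (string instructions and instruction fetches are left out); the function name and its parameters are hypothetical.

```python
def segment_for_operand(base_registers, override=None):
    """Pick the segment register used for an ordinary memory operand."""
    if override is not None:
        # An explicit override such as es:[bx + 8] always wins.
        return override
    # Operands addressed through bp default to the stack segment;
    # everything else defaults to the data segment.
    return "ss" if "bp" in base_registers else "ds"


print(segment_for_operand(["bx"]))                 # ds
print(segment_for_operand(["bp", "si"]))           # ss
print(segment_for_operand(["bx"], override="es"))  # es
```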

You can call a procedure that is in a different segment. This is known as a far call, and both the original cs and ip are pushed by the call and popped by the return. Far jumps are also possible.
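The stack traffic of a far call and its matching far return can be modeled in Python. This is a sketch, not an emulator: the stack is a plain list, the function names are hypothetical, and the addresses are made-up values. It follows the 8086 order of pushing cs first and ip second.

```python
def far_call(stack, cs, next_ip, target_cs, target_ip):
    """Save the return address cs:next_ip and transfer to target_cs:target_ip."""
    stack.append(cs)       # cs is pushed first
    stack.append(next_ip)  # then the offset of the instruction after the call
    return target_cs, target_ip


def far_ret(stack):
    """Restore cs:ip in the reverse order of the pushes."""
    ip = stack.pop()
    cs = stack.pop()
    return cs, ip


stack = []
cs, ip = far_call(stack, 0x1000, 0x0105, 0x2000, 0x0010)
print(hex(cs), hex(ip))  # 0x2000 0x10
cs, ip = far_ret(stack)
print(hex(cs), hex(ip))  # 0x1000 0x105
```

A near call, by contrast, saves only ip, because cs does not change.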

When DOS loads a program, it gets the initial values of ss:sp and cs:ip from a short header record in the .EXE file. The stack starts at the first paragraph after the code and data segments, and its size is the sum of all the values in the .STACK directives (which is what is loaded into sp). ip is loaded with the offset of the label that you provided in the END label directive in one of the source code files.



©2005, Gary L. Burt