Segments and Offsets

UMBC CMSC 211

Segments and Registers

Actual addresses on the IBM PC are given as a pair of 16-bit numbers, the segment and the offset, written in the form segment:offset. We have ignored the segment in our previous discussions because it was a constant in one of the four segment registers, specifically, the DS register, which we loaded with the statements:

	mov	ax,@data
	mov	ds, ax

The rest of the time, only the offset was contained in individual instructions.

There are some places where it is important to understand the segment addressing:

Real-world application programs usually allow (or require) users to specify some or all of their run-time parameters on the command line (e.g., ml /Zi foo.asm). Access to this information in assembly language (here the switch /Zi and the filename foo.asm) requires use of the Program Segment Prefix (PSP).
Environment variables such as the PATH are in another segment whose address is found in the PSP.
Output to the CRT screen is ultimately performed by moving characters into the video memory, which is in yet another segment. Normally the characters are moved with DOS or BIOS calls, but for the fastest I/O, you can go directly to video memory. Also, if you want to save part of the screen so that it can be temporarily replaced by a pop-up window or menu, direct access to video memory is required.
The BIOS Data Area (starting at address 400h), which stores the location of the cursor, the keyboard input queue, and other items, is in low-addressed RAM.
Terminate and Stay Resident (TSR) programs and other interrupt handlers gain control from another program, which is using an unknown set of segments that must be saved and restored.
The programs we have examined so far have used only three segments (stack, code, and data), which can address only 3 times 65,536 bytes. At best that is 192 KBytes (196,608 bytes). By today's standards, that is not very much.
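
To make the first two items a little more concrete, here is a minimal sketch, written in Python as a model rather than as DOS code, of where the command line and the environment pointer sit inside the PSP. It relies on the standard DOS layout: the byte at offset 80h holds the length of the command tail, the text itself starts at offset 81h, and the word at offset 2Ch holds the segment of the environment block. The function names and the sample PSP contents are made up for illustration.

```python
import struct

PSP_SIZE = 0x100          # the PSP occupies 256 bytes
ENV_SEG_OFFSET = 0x2C     # word: segment of the environment block
TAIL_LEN_OFFSET = 0x80    # byte: length of the command tail
TAIL_TEXT_OFFSET = 0x81   # the command tail text starts here


def command_tail(psp: bytes) -> str:
    """Return the command tail (what was typed after the program name)."""
    length = psp[TAIL_LEN_OFFSET]
    raw = psp[TAIL_TEXT_OFFSET:TAIL_TEXT_OFFSET + length]
    return raw.decode("ascii")


def environment_segment(psp: bytes) -> int:
    """Return the segment (paragraph number) of the environment block."""
    (segment,) = struct.unpack_from("<H", psp, ENV_SEG_OFFSET)
    return segment


def make_sample_psp(tail: str, env_segment: int) -> bytes:
    """Build a fake PSP for demonstration; a real one is filled in by DOS."""
    psp = bytearray(PSP_SIZE)
    struct.pack_into("<H", psp, ENV_SEG_OFFSET, env_segment)
    encoded = tail.encode("ascii")
    psp[TAIL_LEN_OFFSET] = len(encoded)
    psp[TAIL_TEXT_OFFSET:TAIL_TEXT_OFFSET + len(encoded)] = encoded
    psp[TAIL_TEXT_OFFSET + len(encoded)] = 0x0D   # the tail ends with a carriage return
    return bytes(psp)


if __name__ == "__main__":
    # Hypothetical values: the tail from the ml example and an arbitrary segment.
    psp = make_sample_psp(" /Zi foo.asm", 0x0F8A)
    print(repr(command_tail(psp)))           # ' /Zi foo.asm'
    print(hex(environment_segment(psp)))     # 0xf8a
```

In a real program the PSP lives in its own segment, so reading these fields means pointing a segment register at the PSP first, which is exactly why segment addressing matters here.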

Things on the 80x86 are based on a 16-bit word, so addresses are built from 16-bit values. As we know, 2¹⁶ is 65,536. It might seem that a 16-bit segment and a 16-bit offset would give us 2³² bytes, or approximately 4 GB of memory, but that is not how the two are combined. Before the 8086 was invented, microprocessors could address a maximum of 2¹⁶ bytes, or 64 KB, of memory. The designers knew that was insufficient, and the decision was made that 1 megabyte of memory would be more than would ever be needed. Part of the reason was to hold down costs: fewer address lines mean lower system costs. 1 MB is 2²⁰ bytes, which takes four more address bits than a 16-bit register can hold. This did not fit into things easily. They decided to treat the segment as a 20-bit number of which only the upper 16 bits are specified; the lower four bits are always zero. In hexadecimal, that means a single zero digit is appended to the right side of the segment before the offset is added. For example, converting the segment:offset pair 13a5:3327 to the actual physical address that the memory hardware needs looks like this:

segment	1 3 a 5 0
offset	+ 3 3 2 7
address	1 6 d 7 7
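
The same calculation can be sketched in Python as a check on the arithmetic. The function name is ours, and the final mask assumes an original 8086 with 20 address lines, where a sum past FFFFFh wraps around to the bottom of memory.

```python
def physical_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit physical address."""
    if not 0 <= segment <= 0xFFFF or not 0 <= offset <= 0xFFFF:
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shifting left by 4 bits appends one hexadecimal zero to the segment.
    # The mask models the 20 address lines (assumption: 8086-style wraparound).
    return ((segment << 4) + offset) & 0xFFFFF


print(f"{physical_address(0x13A5, 0x3327):05x}")   # prints 16d77
```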

That means that when you add one to the value in the segment, you are moving forward by sixteen bytes. This makes for sixteen-byte blocks known as paragraphs, and paragraphs always start at addresses that are evenly divisible by sixteen. The smallest segment is a single paragraph.
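
A short Python sketch of the paragraph rule, reusing the segment 13a5 from the example above (the helper names are ours):

```python
PARAGRAPH = 16  # bytes in one paragraph


def physical_address(segment, offset):
    """20-bit physical address for a segment:offset pair."""
    return ((segment << 4) + offset) & 0xFFFFF


def is_paragraph_aligned(address):
    """True when the address is evenly divisible by sixteen."""
    return address % PARAGRAPH == 0


base = physical_address(0x13A5, 0x0000)
following = physical_address(0x13A6, 0x0000)

print(following - base)               # prints 16
print(is_paragraph_aligned(base))     # prints True

# Adding one to the segment and backing the offset up by sixteen
# lands on exactly the same byte.
print(physical_address(0x13A5, 0x3327) == physical_address(0x13A6, 0x3317))  # prints True
```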

An 80x86 program is organized as a number of these segments. The linker gathers all of the code (.CODE directive), all of the data (.DATA directive), and the stack (.STACK directive) into a single segment each.

The same physical address can be represented in as many as 4096 different ways. For example, the address 16d77h computed above can be written as any of

16d7h:0007h
1000h:6d77h
1116h:5c17h

Any symbolic memory address label defined in a segment can have two operators applied to it, one of which we have already used.

SEG label	the paragraph number at which the segment containing label is loaded
OFFSET label	the offset of label from the beginning of that segment
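
SEG and OFFSET give back one particular segment:offset pair for a label, but as the examples above show, that pair is only one of many names for the same byte. The following Python sketch enumerates all of them for a given physical address; it ignores wraparound past the top of memory, and the function names are ours.

```python
def physical_address(segment, offset):
    """20-bit physical address for a segment:offset pair."""
    return ((segment << 4) + offset) & 0xFFFFF


def aliases(address):
    """Yield every (segment, offset) pair that names the given physical address."""
    low_nibble = address & 0xF            # the offset must end in this hex digit
    for offset in range(low_nibble, 0x10000, 16):
        remainder = address - offset
        if remainder < 0:
            break                         # the segment would have to be negative
        segment = remainder >> 4
        if segment <= 0xFFFF:
            yield segment, offset


pairs = list(aliases(0x16D77))
print(len(pairs))                         # prints 4096
print((0x16D7, 0x0007) in pairs)          # prints True
print((0x1000, 0x6D77) in pairs)          # prints True
print((0x1116, 0x5C17) in pairs)          # prints True

# Addresses near the bottom of memory have fewer names.
print(len(list(aliases(0x00417))))        # prints 66
```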

It is important to remember that the OFFSET part of the address is determined by the linker, but the SEG part can only be determined when the program is finally loaded. The offset has to wait for the linker because different parts of the same segment can be in different source code files. Leaving the segment until load time also means that the program can be loaded at different memory locations on different runs without the programmer having to re-assemble or re-link the program. This is also why we have to have the code:

	mov	ax,@data
	mov	ds, ax

The location of the data segment is not known until after the program is loaded into memory.
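
The split between link time and load time can be modelled in a few lines of Python. The label name, its offset, and the two load segments below are hypothetical values chosen only to show that the offset stays fixed while the physical address moves with the segment.

```python
def physical_address(segment, offset):
    """20-bit physical address for a segment:offset pair."""
    return ((segment << 4) + offset) & 0xFFFFF


# Hypothetical value fixed by the linker: where 'message' sits in the data segment.
MESSAGE_OFFSET = 0x0012


def locate_message(data_segment):
    """Physical address of 'message' once the data segment has been loaded."""
    return physical_address(data_segment, MESSAGE_OFFSET)


# Two hypothetical runs of the same executable, loaded at different paragraphs.
for data_segment in (0x13A5, 0x2B10):
    print(f"DS={data_segment:04x}  message at {locate_message(data_segment):05x}")
# prints:
# DS=13a5  message at 13a62
# DS=2b10  message at 2b112
```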

A new restriction on address arithmetic is that the starting address and the resulting address must be in the same segment, and the expression can involve only the offset. This is because neither the assembler nor the linker knows what the values of the segments are. The assembler will report an error if you try to add or subtract addresses from different segments.
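
The rule can be illustrated with a small Python model. This is only a sketch of the restriction, not a description of how the assembler is implemented; the Label class and the segment names used in the example are ours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    segment_name: str   # which segment the label is in; its value is unknown until load time
    offset: int         # known once the program has been assembled and linked

    def __add__(self, displacement: int) -> "Label":
        # label + constant stays in the same segment; only the offset changes
        return Label(self.segment_name, (self.offset + displacement) & 0xFFFF)

    def __sub__(self, other):
        if isinstance(other, int):
            return self + (-other)
        if other.segment_name != self.segment_name:
            # mirrors the assembler's complaint about mixing segments
            raise ValueError("operands are in different segments")
        return self.offset - other.offset   # a plain number of bytes


# Hypothetical labels: two in a data segment, one in a code segment.
buffer = Label("_DATA", 0x0010)
buffer_end = Label("_DATA", 0x0050)
start = Label("_TEXT", 0x0000)

print(buffer_end - buffer)      # prints 64
print(buffer + 4)               # prints Label(segment_name='_DATA', offset=20)

try:
    buffer - start
except ValueError as error:
    print(error)                # prints operands are in different segments
```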
