Segments and Registers
Actual addresses on the IBM PC are given as a pair of 16-bit numbers,
the segment and the offset, written in the form segment:offset.
We have ignored the segment in our previous discussions because it was a
constant held in one of the four segment registers, specifically the
DS register, which our programs loaded explicitly when they started.
The rest of the time, only the offset was contained in individual instructions.
There are several situations where it is important to understand segment
addressing:
- Real-world application programs usually allow (or require) users to
specify some or all of their run-time parameters on the command line
(e.g., ml /Zi foo.asm). Accessing this information in
assembly language (here the switch /Zi and the filename foo.asm)
requires use of the Program Segment Prefix (PSP).
- Environment variables such as the PATH are in another segment
whose address is found in the PSP.
- Output to the CRT screen is ultimately performed by moving characters
into the video memory, which is in yet another segment.
Normally the characters are moved with DOS or BIOS calls, but
for the fastest I/O you can write directly to video memory. Also, if
you want to save part of the screen so that it can be temporarily
replaced by a pop-up window or menu, direct access to video
memory is required.
- The BIOS Data Area, in low-addressed RAM starting at address 400h,
stores the location of the cursor, the keyboard input queue, and
other items.
- Terminate and Stay Resident (TSR) programs and other interrupt
handlers take control from another program that is using an
unknown set of segments, so its segment registers must be saved and restored.
- The programs we have examined so far have only used three segments
(stack, code, and data), which together can address only 3 times 65,536
bytes, or 196,608 bytes. At best that is about 192 KBytes. By today's
standards, that is not very much.
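Two of the segments mentioned above sit at fixed places in memory, which makes them a convenient first look at segment:offset arithmetic. The sketch below is a minimal Python illustration, not DOS or BIOS code. It assumes the standard 80x25 colour text mode, in which video memory begins at segment B800h and each screen position takes two bytes (the character, then its attribute byte), and it uses the rule that a physical address is the segment multiplied by 16 plus the offset. The constant and function names are hypothetical.

```python
# Assumption: 80x25 colour text mode, video memory at segment B800h,
# two bytes per screen cell (character code, then attribute byte).
VIDEO_SEGMENT = 0xB800
BIOS_DATA_SEGMENT = 0x0040  # 0040h:0000h is physical address 400h
COLUMNS = 80
BYTES_PER_CELL = 2


def physical(segment: int, offset: int) -> int:
    """Physical address = segment * 16 + offset, kept to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


def cell_offset(row: int, col: int) -> int:
    """Offset, within the video segment, of the character byte at (row, col)."""
    return (row * COLUMNS + col) * BYTES_PER_CELL


print(f"BIOS data area starts at {physical(BIOS_DATA_SEGMENT, 0):05x}h")
print(f"top-left cell: {physical(VIDEO_SEGMENT, cell_offset(0, 0)):05x}h")
print(f"bottom-right cell: {physical(VIDEO_SEGMENT, cell_offset(24, 79)):05x}h")
```

Because an attribute byte follows each character, consecutive characters on a row are two bytes apart, which is why the cell index is doubled to get the offset.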
Things on the 80x86 are based on a 16-bit word, so addresses are also
based on a 16-bit word. As we know, 2^16 is 65,536. It might
seem that a 16-bit segment and a 16-bit offset together would
give us 2^32 bytes, or approximately 4GB of memory. Before the 8086
was invented, microprocessors had a maximum of 2^16 bytes, or 64KB,
of memory. The designers knew that was insufficient, and the decision
was made that 1 MegaByte of memory would be more than would ever be
needed. Part of the reason was to hold down costs: fewer address lines
mean lower system costs. 1MB is 2^20 bytes, which needs four more
address bits than 2^16.
This did not fit into things easily. The designers decided to treat the
segment register as the upper 16 bits of a 20-bit value, with only those
upper 16 bits actually specified. That means four zero bits (one hex 0)
are appended to the right side of the segment, which multiplies it by 16,
before the offset is added. Here is what really happens when we convert
the segment:offset 13a5:3327 to the physical address that the memory
hardware actually uses:
    segment    1 3 a 5 0
    offset   +   3 3 2 7
               ---------
    address    1 6 d 7 7
That means that when you add one to the value in the segment, you
are moving forward by sixteen bytes. This makes for sixteen-byte blocks
known as paragraphs, and paragraphs always start at addresses
that are evenly divisible by sixteen. The smallest segment is a
single paragraph.
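The conversion above is just "shift the segment left four bits, then add the offset." A minimal Python sketch of it follows; the function name is hypothetical. The final mask to 20 bits reflects the 8086's 20 address lines, so a sum past FFFFFh wraps around to the bottom of memory. The last lines check the paragraph property: adding one to the segment moves the segment base forward 16 bytes, and every segment base is a multiple of 16.

```python
def physical_address(segment: int, offset: int) -> int:
    """Convert a 16-bit segment:offset pair to a 20-bit 8086 physical address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Appending four zero bits to the segment is the same as multiplying by 16.
    base = segment << 4
    # Only 20 address lines: anything past FFFFFh wraps around to 00000h.
    return (base + offset) & 0xFFFFF


print(f"{physical_address(0x13A5, 0x3327):05x}")  # 16d77

# One more in the segment moves the segment base exactly one 16-byte paragraph.
assert physical_address(0x13A6, 0) - physical_address(0x13A5, 0) == 16
# Every segment starts on a paragraph boundary (an address divisible by 16).
assert all(physical_address(s, 0) % 16 == 0 for s in range(0x10000))
```

The wraparound shows up at the very top of the range: on an 8086, FFFFh:0010h maps to 00000h.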
An 80x86 program is organized as a number of these segments. The linker
combines all of the code (.CODE directive), all of the data (.DATA directive),
and the stack (.STACK directive) into a single segment for each of them.
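The same physical address can be written in up to 4096 different
segment:offset forms, because a 16-bit offset spans 65,536 / 16 = 4096
paragraphs, so 4096 different segment values can reach it. For example,
all of the following refer to physical address 16d77h:
    16d7h:0007h
    1000h:6d77h
    1116h:5c17h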
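The 4096 figure can be checked by brute force. This minimal Python sketch (the function name is hypothetical) tries every 16-bit segment and keeps those whose required offset also fits in 16 bits. It ignores the 8086's wraparound past FFFFFh, which cannot occur for this example address.

```python
def representations(physical: int) -> list[tuple[int, int]]:
    """Every (segment, offset) pair of 16-bit values that maps to `physical`.

    Ignores the 8086's wraparound past FFFFFh.
    """
    pairs = []
    for segment in range(0x10000):
        offset = physical - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            pairs.append((segment, offset))
    return pairs


pairs = representations(0x16D77)
print(len(pairs))  # 4096
for seg, off in [(0x16D7, 0x0007), (0x1000, 0x6D77), (0x1116, 0x5C17), (0x13A5, 0x3327)]:
    assert (seg, off) in pairs
```

Addresses near the bottom of memory have fewer forms: 00000h, for instance, can only be written 0000h:0000h, since no segment below zero exists.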
Any symbolic memory address label defined in a segment can have two operators
applied to it, one of which we have already used:
    SEG label       the paragraph number at which the segment containing
                    label is loaded
    OFFSET label    the offset of label from the beginning of that segment
It is important to remember that the OFFSET part of an address is determined
by the linker, but the SEG part can only be determined when the program
is finally loaded. Part of the problem is that different parts of the
same segment can come from different source code files. Deferring the
segment until load time also means that the program can be executed at
different memory locations on different runs without the programmer having
to re-assemble or re-link the program. This is also why every program must
load DS itself at run time: the location of the data segment is not known
until after the program is loaded into memory.
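A small model makes the division of labor concrete. In this hypothetical Python sketch the label's offset is a constant, standing in for the value the linker fixes, while the data segment is a parameter, standing in for the value known only once the program is in memory. The specific numbers are made up for illustration. The offset never changes between runs, but the physical address does, which is why DS cannot be filled in ahead of time.

```python
# Hypothetical values, for illustration only.
LABEL_OFFSET = 0x0042  # OFFSET of some data label: fixed when the program is linked


def label_physical(data_segment: int) -> int:
    """Physical address of the label once its segment is loaded at data_segment."""
    return ((data_segment << 4) + LABEL_OFFSET) & 0xFFFFF


# Two hypothetical runs in which the data segment lands at different paragraphs.
for run, data_segment in enumerate((0x0A3C, 0x1F20), start=1):
    print(f"run {run}: SEG={data_segment:04x}h OFFSET={LABEL_OFFSET:04x}h "
          f"-> physical {label_physical(data_segment):05x}h")
```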
A new restriction on address arithmetic is that the starting address
and the resulting address must be in the same segment, and the expression
can involve only offsets. This is because neither the assembler nor
the linker knows what the segment values are. The assembler
will report an error if you try to add or subtract addresses from
different segments.
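The rule can be modeled by treating an assembly-time address as a segment name plus an offset, since the segment's paragraph number is not yet known. The class and function names below are hypothetical, and this is only a sketch of the rule as stated above, not a description of how MASM is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """Hypothetical assembly-time address: segment name plus 16-bit offset."""
    segment: str
    offset: int


def add(addr: Address, n: int) -> Address:
    """addr + n: the result stays in the same segment; only the offset moves."""
    new_offset = addr.offset + n
    if not 0 <= new_offset <= 0xFFFF:
        raise ValueError("result falls outside the segment")
    return Address(addr.segment, new_offset)


def difference(a: Address, b: Address) -> int:
    """a - b is a plain number only when both addresses share a segment."""
    if a.segment != b.segment:
        raise ValueError(f"addresses in different segments: {a.segment} and {b.segment}")
    return a.offset - b.offset


msg = Address("DATA", 0x0010)   # hypothetical label in a data segment
msg_end = add(msg, 13)
print(difference(msg_end, msg))  # 13
```

Subtracting an address in a hypothetical "CODE" segment from one in "DATA" raises an error here, mirroring the assembler's complaint.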
UMBC CSEE