Segments
Segments and Registers
Actual addresses on the IBM PC are given as a pair of 16-bit numbers,
the segment and the offset, written in the form segment:offset.
We have ignored the segment in our previous discussions because it was a
constant in one of the four segment registers, specifically, the
DS register, which we loaded with the statements:
    mov ax, @data
    mov ds, ax
The rest of the time, only the offset was contained in individual instructions.
There are several situations where it is important to understand segment
addressing:
- Real-world application programs usually allow (or require) users to
specify some or all of their run-time parameters on the command line
(e.g., ml /Zi foo.asm). Access to this information in
assembly language (here the switch /Zi and the filename foo.asm)
requires use of the Program Segment Prefix (PSP).
- Environment variables such as the PATH are in another segment
whose address is found in the PSP.
- Output to the CRT screen is ultimately performed by moving characters
into the video memory, which is in yet another segment.
Normally the characters are moved with DOS or BIOS calls, but
for the fastest I/O, you can go directly to video memory. Also, if
you want to save part of the screen so that it can be temporarily
replaced by a pop-up window or menu, direct access to video
memory is required.
- The BIOS data area (starting at address 400h), which is in low-addressed
RAM, is used to store the location of the cursor, the keyboard input
queue, and other items.
- Terminate and Stay Resident (TSR) programs and other interrupt
handlers gain control from another program, which is using an
unknown set of segments, which must be saved and restored.
- The programs we have examined so far have only used three segments
(stack, code, and data), which can address at most 3 times 65,536 bytes.
At best that is 196,608 bytes, or 192 KBytes. By today's standards, that
is not very much.
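The first two items in the list above both start from the PSP. As a minimal sketch, the Python below models a PSP as a 256-byte image and pulls out the two fields involved. The offsets come from the standard DOS PSP layout rather than from the text above: the word at 2Ch holds the environment segment, the byte at 80h holds the length of the command tail, and the tail itself starts at 81h. The function name `parse_psp` and the sample values are made up for illustration.

```python
import struct

def parse_psp(psp: bytes):
    """Extract the environment segment and command tail from a 256-byte PSP image."""
    if len(psp) < 256:
        raise ValueError("a PSP is 256 bytes long")
    # Word at offset 2Ch: segment (paragraph number) of the environment block.
    (env_segment,) = struct.unpack_from("<H", psp, 0x2C)
    # Byte at offset 80h: length of the command tail, which starts at 81h.
    tail_length = psp[0x80]
    command_tail = psp[0x81:0x81 + tail_length].decode("ascii")
    return env_segment, command_tail

# Hypothetical PSP image for "ml /Zi foo.asm": DOS stores only the text
# after the program name, followed by a carriage return (0Dh).
image = bytearray(256)
struct.pack_into("<H", image, 0x2C, 0x0F3A)      # made-up environment segment
tail = b" /Zi foo.asm"
image[0x80] = len(tail)
image[0x81:0x81 + len(tail)] = tail
image[0x81 + len(tail)] = 0x0D

env_segment, command_tail = parse_psp(bytes(image))
print(f"{env_segment:04X}h", repr(command_tail))
```

In an assembly program the same fields would be read through the segment register that holds the PSP address, using these offsets.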
Things on the 80x86 are based on a 16-bit word. Therefore, addresses are
based on a 16-bit word. As we know, 2^16 is 65,536. It might
seem that a segment of 2^16 and an offset of 2^16 would
give us 2^32, or approximately 4GB of memory. Before the 8086
was invented, microprocessors had a maximum of 2^16 bytes, or 64KB,
of memory. The designers knew that was insufficient, and the decision
was made that 1 MegaByte of memory would be more than would ever be
needed. Part of the reason was to hold down costs: fewer address lines
mean lower system costs. 1MB is 2^20 bytes, which takes four more
address bits than a 16-bit word provides.
This did not fit into a 16-bit register easily. They decided to treat the
segment as a 20-bit value of which only the upper 16 bits are
specified. That means four zero bits (one hexadecimal zero) are appended
to the right side of the segment. To convert the segment:offset pair
13a5:3327 to the actual physical address that the hardware needs, the
addition looks like this:
segment |   1 3 a 5 0 |
offset  | +   3 3 2 7 |
address |   1 6 d 7 7 |
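The same addition can be checked outside the assembler. This is a minimal Python sketch, not anything the processor or DOS exposes: `physical_address` is a made-up helper name, and the final mask is an assumption that models the 20 address lines of the original 8086.

```python
def physical_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shifting left by 4 bits appends one hexadecimal zero to the segment.
    # The mask models the 20 address lines of the 8086 (wraps at 1 MB).
    return ((segment << 4) + offset) & 0xFFFFF

print(f"{physical_address(0x13A5, 0x3327):05x}")  # prints 16d77
```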
That means that when you add one to the value in the segment, you
are moving forward by sixteen bytes. This makes for sixteen-byte blocks
known as paragraphs, and paragraphs always start at addresses
that are evenly divisible by sixteen. The smallest segment is a
single paragraph.
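The paragraph arithmetic is simple enough to try out directly. The sketch below uses hypothetical helper names (`segment_start`, `paragraphs_needed`) to show the sixteen-byte step between consecutive segment values and how a byte count rounds up to whole paragraphs.

```python
PARAGRAPH = 16  # bytes per paragraph

def segment_start(segment: int) -> int:
    """Physical address of offset 0 in the given segment."""
    return segment * PARAGRAPH

def paragraphs_needed(byte_count: int) -> int:
    """Smallest number of whole paragraphs that can hold byte_count bytes."""
    return (byte_count + PARAGRAPH - 1) // PARAGRAPH

# Adding one to the segment value moves forward by one paragraph.
step = segment_start(0x13A6) - segment_start(0x13A5)
print(step)                                       # prints 16

# Every segment starts on an address evenly divisible by sixteen.
print(segment_start(0x13A5) % PARAGRAPH == 0)     # prints True

# Sizes round up to whole paragraphs: 1 and 16 bytes need one, 17 need two.
print(paragraphs_needed(1), paragraphs_needed(16), paragraphs_needed(17))
```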
An 80x86 program is organized as a number of these segments. The linker
gathers all of the code (.CODE directive), all of the data (.DATA directive),
and all of the stack (.STACK directive) portions into a single segment for
each of them.
The same physical address can be represented by up to 4096 different
segment:offset combinations. For example, all of the following refer to
physical address 16d77h:
16d7h:0007h
1000h:6d77h
1116h:5c17h
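To see where the figure of 4096 comes from, the combinations can simply be enumerated. This is a sketch under one stated assumption: it ignores wrap-around past the top of the 1 MB address space. The function name `aliases` is made up for illustration.

```python
def aliases(physical: int):
    """Return every (segment, offset) pair that names the given address."""
    pairs = []
    highest_segment = min(physical >> 4, 0xFFFF)
    for segment in range(highest_segment, -1, -1):
        offset = physical - (segment << 4)
        if offset > 0xFFFF:
            break  # lower segments would need an offset larger than 16 bits
        pairs.append((segment, offset))
    return pairs

pairs = aliases(0x16D77)
print(len(pairs))                                 # prints 4096
for segment, offset in ((0x16D7, 0x0007), (0x1116, 0x5C17), (0x1000, 0x6D77)):
    print(f"{segment:04x}h:{offset:04x}h", (segment, offset) in pairs)
```

Addresses near the bottom of memory have fewer combinations (address 0 has only 0000h:0000h under this model), which is why 4096 is an upper limit.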
Any symbolic memory address label defined in a segment can have two operators
applied to it, one of which we have already used.
SEG label | the paragraph number at which the segment containing label is loaded |
OFFSET label | the offset of label from the beginning of that segment |
It is important to remember that the OFFSET part of the address is determined
by the linker, but the SEG part can only be determined when the program
is finally loaded. Part of the problem is that different parts of the
same segment can be in different source code files. This also means that
the program can be executed at different memory locations at different
execution times without the programmer having to re-assemble or re-link
the program. This is also why we have to have the code:
    mov ax, @data
    mov ds, ax
The location of the data segment is not known until after
the program is loaded into memory.
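A toy model may make the split between link time and load time easier to picture. In the sketch below, every name and number is hypothetical: each label carries a paragraph number relative to the start of the program plus an offset, both assumed to be fixed by the linker, and the loader only adds the paragraph at which it happened to place the program.

```python
def load(symbols, load_paragraph):
    """Turn link-time (relative paragraph, offset) pairs into run-time addresses.

    symbols maps a label to (paragraph relative to the start of the program,
    offset within its segment), both assumed fixed by the linker.
    """
    return {
        label: (load_paragraph + relative_paragraph, offset)
        for label, (relative_paragraph, offset) in symbols.items()
    }

# Hypothetical program: code at relative paragraph 0, data 20h paragraphs later.
linked = {"main": (0x0000, 0x0010), "message": (0x0020, 0x0004)}

first_run = load(linked, 0x1A2B)    # made-up load addresses
second_run = load(linked, 0x2F00)

for label in linked:
    runs = (first_run[label], second_run[label])
    print(label, [f"{seg:04X}:{off:04X}" for seg, off in runs])
```

The offsets are identical in both runs; only the segment part changes, which is why it has to be fetched at run time.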
A new restriction on address arithmetic is that the starting address
and the resulting address must be in the same segment, and the expression
can only involve the offset. This is because neither the assembler nor
the linker knows what the values for the segments are. The assembler
will report an error if you try to add or subtract addresses from
different segments.
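The rule can be mimicked with a small model in which the segment of an address is only a name and the offset is the only number available. This is an illustrative sketch, not how any particular assembler is implemented; the class `Label` and the segment names are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Label:
    """A symbolic address: the segment is a name, only the offset is a number."""
    segment: str
    offset: int

    def __add__(self, displacement: int) -> "Label":
        # label + constant stays in the same segment.
        return Label(self.segment, self.offset + displacement)

    def __sub__(self, other: "Label") -> int:
        # label - label is a plain distance, but only within one segment.
        if self.segment != other.segment:
            raise ValueError("operands are in different segments")
        return self.offset - other.offset

table = Label("_DATA", 0x0010)      # hypothetical labels and offsets
table_end = table + 24
start = Label("_TEXT", 0x0000)

print(table_end - table)            # prints 24
try:
    table_end - start
except ValueError as error:
    print("error:", error)
```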
Segment Registers
The address that is in an instruction is actually the offset
part of the address. Whether it is OFFSET label or just
label in the source code, it is stored as the offset in the instruction
in memory.
The actual registers are:
Register | Name | Used for addresses in... |
cs | Code Segment | call and jmp instructions and instruction fetches (using ip) |
ss | Stack Segment | memory references based on sp or bp. Used implicitly in push and pop, and in call and ret for the return address. |
ds | Data Segment | all other instructions. |
es | Extra Segment | instructions that need an extra segment register. |
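The operations on segment registers are extremely limited:
- move to or from a general-purpose register or memory
- push
- pop
Notice this means that:
- mov segreg, constant is illegal.
(@data is considered a constant)
- mov segreg, segreg is illegal, so copy the value through a
general-purpose register (mov ax, es then mov ds, ax) or through the
stack (push es then pop ds).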
The default segment can be overridden by preceding the instruction by a
segment override byte. The assembler will usually supply the override
for you, but occasionally you must use an explicit segment override:
    mov WORD PTR es:[bx + 8], 100
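What the override changes is only which segment register supplies the paragraph number. The sketch below models that choice in Python under simplifying assumptions: a single base register plus a displacement, with bp defaulting to ss and everything else to ds. The register contents and the name `effective_address` are made up.

```python
def effective_address(registers, base, displacement=0, override=None):
    """Physical address of [base + displacement] with an optional override."""
    # Addresses formed with bp default to ss; other data references use ds.
    default = "ss" if base == "bp" else "ds"
    segment_register = override or default
    offset = (registers[base] + displacement) & 0xFFFF   # offsets wrap at 64 KB
    return ((registers[segment_register] << 4) + offset) & 0xFFFFF

# Made-up register contents for illustration.
registers = {"ds": 0x2000, "es": 0x3000, "ss": 0x4000, "bx": 0x0100, "bp": 0x0100}

print(f"{effective_address(registers, 'bx', 8):05X}")                 # ds:[bx + 8] -> 20108
print(f"{effective_address(registers, 'bx', 8, override='es'):05X}")  # es:[bx + 8] -> 30108
print(f"{effective_address(registers, 'bp', 8):05X}")                 # ss:[bp + 8] -> 40108
```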
You can call a procedure that is in a different segment. This is known
as a far call, and both the original cs and ip are pushed by the call and
popped by the return. Far jumps are also possible.
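The order of the pushes and pops matters, so here is a small stack model of a far call and its matching return. It is a sketch with a Python list standing in for the stack (the end of the list is the top) and a made-up return address; the function names are hypothetical.

```python
def far_call(stack, cs, ip_after_call):
    """A far call saves the caller's cs first, then the return offset."""
    stack.append(cs)
    stack.append(ip_after_call)

def far_return(stack):
    """A far return pops the offset, then the segment, in reverse order."""
    ip = stack.pop()
    cs = stack.pop()
    return cs, ip

stack = []                                          # end of list = top of stack
far_call(stack, cs=0x1234, ip_after_call=0x0105)    # made-up return address
print([f"{word:04X}" for word in stack])            # prints ['1234', '0105']

cs, ip = far_return(stack)
print(f"{cs:04X}:{ip:04X}")                         # prints 1234:0105
```

A near call, by contrast, would push and pop only the ip word.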
When DOS loads a program, it gets the initial values of ss:sp and cs:ip
from a short header record in the .EXE file. The stack starts at the first
paragraph after the code and data segments, and its size is the sum
of all the values in the .STACK directives (which is what is loaded into
sp). ip is loaded with the offset of the label that you provided
in the END Label directive in one of the source code files.
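For the curious, the four values can be read straight out of the header. The sketch below relies on the standard DOS .EXE ("MZ") header layout, which is not spelled out in these notes: ss at offset 0Eh, sp at 10h, ip at 14h and cs at 16h, with ss and cs stored relative to where the program is loaded. The function name, the sample header and the load segment are all made up.

```python
import struct

def initial_registers(exe_header: bytes, load_segment: int):
    """Read the start-up ss:sp and cs:ip from the first 28 bytes of an .EXE."""
    words = struct.unpack_from("<14H", exe_header, 0)
    if words[0] != 0x5A4D:                      # the bytes "MZ"
        raise ValueError("not an .EXE header")
    rel_ss, sp, ip, rel_cs = words[7], words[8], words[10], words[11]
    # The header stores ss and cs relative to the start of the loaded program,
    # so the loader adds the paragraph at which it placed the program.
    ss = (load_segment + rel_ss) & 0xFFFF
    cs = (load_segment + rel_cs) & 0xFFFF
    return (ss, sp), (cs, ip)

# Synthetic header: stack segment 12h paragraphs into the program with a
# 100h-byte stack, entry point at offset 10h of the first segment.
header = struct.pack("<14H", 0x5A4D, 0, 0, 0, 0, 0, 0,
                     0x0012, 0x0100, 0, 0x0010, 0x0000, 0, 0)

(ss, sp), (cs, ip) = initial_registers(header, load_segment=0x1A2B)
print(f"ss:sp = {ss:04X}:{sp:04X}  cs:ip = {cs:04X}:{ip:04X}")
```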
©2005, Gary L. Burt