UMBC CMSC 211

UMBC | CSEE


String Operations

There are five string operations:
movsb/movsw move string (memory to memory)
lodsb/lodsw load string (memory to register)
stosb/stosw store string (register to memory)
cmpsb/cmpsw compare string (memory to memory)
scasb/scasw compare string to byte (memory to register)
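There are common constraints on these instructions:
  1. A memory operand addressed through si is always at ds:si.
  2. A memory operand addressed through di is always at es:di.
  3. After each operation, si and/or di are adjusted by the operand size (1 for byte, 2 for word): incremented if the D (direction) flag is 0, decremented if it is 1.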
WARNING from the author: The D flag is normally 0 (auto-increment) and many library routines assume the D flag is 0, including the printf routine in some versions of C. The interruption process automatically saves and restores the D flag (among others), but that is no protection against an operating system routine that assumes that the D flag is clear while processing the interrupt. The upshot is that by setting the D flag, you may turn up a bug in someone else's program. The fact that it is someone else's bug is cold comfort if you have no control over the program and must use it. At the very least:
  1. If you are going to use string instructions with automatic incrementing, always precede the code with the cld instruction. You need not save and restore the old value of the flag since the normally assumed value is 0.
  2. If you are going to use string instructions with automatic decrementing, always push the flags (pushf) and then execute the std instruction. Then, as soon as possible, use the popf instruction to ensure that you have restored the system.

A detailed description of each instruction follows:
movsb/w
mov es:[di], BYTE/WORD PTR ds:[si]
inc / add si / si, 2
inc / add di / di, 2
  ; (of course an actual mov memory to memory is illegal)
lodsb/w
mov al / ax, ds:[si]
inc / add si / si, 2
stosb/w
mov es:[di], al / ax
inc / add di / di, 2
cmpsb/w
cmp BYTE/WORD PTR ds:[si], es:[di]
inc / add si / si, 2
inc / add di / di, 2
  ; (see remarks under movsb/w)
scasb/w
cmp al / ax, es:[di]
inc / add di / di, 2
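The pseudo inc/add operations that are part of these descriptions do not alter the flags; the cmp operations do. The descriptions assume that the D flag is 0. When the D flag is 1, each inc/add becomes a dec/sub.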
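The byte forms described above can be modeled in C to make the pointer updates concrete. This is only a sketch under stated assumptions: the Cpu struct, the advance helper, and the flat mem array are hypothetical names invented for illustration, ds and es are assumed to refer to the same memory, and only the zero flag is modeled.

```c
#include <stdint.h>

/* Hypothetical model, not real hardware: mem stands in for one flat
   segment, so ds and es are assumed to point at the same memory. */
typedef struct {
    uint8_t *mem;  /* simulated memory */
    uint16_t si;   /* source index */
    uint16_t di;   /* destination index */
    uint8_t  al;   /* low byte of the accumulator */
    int      df;   /* D flag: 0 = auto-increment, 1 = auto-decrement */
    int      zf;   /* zero flag, changed only by the compares */
} Cpu;

/* Move an index register one byte in the direction chosen by df. */
static uint16_t advance(const Cpu *c, uint16_t reg)
{
    return (uint16_t)(c->df ? reg - 1 : reg + 1);
}

/* movsb: copy ds:[si] to es:[di], then step both indexes. */
void movsb(Cpu *c)
{
    c->mem[c->di] = c->mem[c->si];
    c->si = advance(c, c->si);
    c->di = advance(c, c->di);
}

/* lodsb: load ds:[si] into al, then step si. */
void lodsb(Cpu *c)
{
    c->al = c->mem[c->si];
    c->si = advance(c, c->si);
}

/* stosb: store al at es:[di], then step di. */
void stosb(Cpu *c)
{
    c->mem[c->di] = c->al;
    c->di = advance(c, c->di);
}

/* cmpsb: compare ds:[si] with es:[di] (sets zf), then step both. */
void cmpsb(Cpu *c)
{
    c->zf = (c->mem[c->si] == c->mem[c->di]);
    c->si = advance(c, c->si);
    c->di = advance(c, c->di);
}

/* scasb: compare al with es:[di] (sets zf), then step di. */
void scasb(Cpu *c)
{
    c->zf = (c->al == c->mem[c->di]);
    c->di = advance(c, c->di);
}
```

The word forms would follow the same pattern with a step of 2 and ax in place of al. Note that only cmpsb and scasb touch zf in this model, which mirrors the remark that the index updates leave the flags alone.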

Copying an ASCIIZ string

Copy a C-style null-terminated string whose starting address is in ds:si. The address of the buffer that will receive the string is in es:di. Remember that the null has to be moved also!

Old way

CpyLoop:
mov al, [si]
mov es:[di], al
cmp al, 0 ; looking for the null terminator
je Done
inc si
inc di
jmp CpyLoop
Done:

New way

cld ; Don't assume the direction!
CpyLoop:
cmp BYTE PTR [si], 0 ; looking for the null terminator
movsb ; mov's do not affect flags!
jnz CpyLoop

Notice that the compare had to be done before the movsb, which changes the si register. When we are done, we do not have to restore the D flag, according to the author. Personally, I think it could be wise to pushf/popf here. Also notice that this assumes we are working with characters that are a byte in length. Please remember that not all characters are 8 bits long! If we were working with a 16-bit character set, then the cmp would need a WORD PTR, and the movsb would become movsw. As written, this routine will not work with Unicode text that is stored as 16-bit characters.
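The ordering of the compare and the move can be seen in a C rendering of the same loop. This is a sketch for illustration only: copy_asciiz is a hypothetical name, the local pointers si and di merely stand in for the registers, and the caller is assumed to supply a destination buffer that is large enough.

```c
#include <stddef.h>

/* Copies a null-terminated byte string, terminator included.
   Returns the number of bytes copied (string length + 1). */
size_t copy_asciiz(char *dst, const char *src)
{
    const char *si = src;   /* stands in for ds:si */
    char *di = dst;         /* stands in for es:di */
    int was_null;

    do {
        was_null = (*si == 0);  /* cmp BYTE PTR [si], 0 : test first   */
        *di++ = *si++;          /* movsb: copy, then advance si and di */
    } while (!was_null);        /* jnz CpyLoop: uses the earlier test  */

    return (size_t)(si - src);
}
```

Because the test result is saved before the copy, the terminating null is still copied on the final pass, which is what the assembly version achieves by relying on movsb leaving the flags untouched.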

Not Only Strings Could Exploit This!

Suppose we want to add the contents of two arrays and store the results in another array. In C we would do:
	for ( i = 0; i < 100; i++ )
	{
		A[i] = B[i] + C[i];
	}
  
The registers si and di are dedicated to the string instructions, as we know, and they will take care of two of our arrays. We will use bx for the third, because si and di will be automatically incremented. We can use the loop instruction, which needs the cx register. There is one trick with the bx register. The sub bx, si instruction subtracts the starting value of si from bx. That way, the add instruction inside the loop can add si back and point to the proper location in the third array. Also, the initial value loaded into bx is off by two, because the si register will have already been incremented by lodsw when we go to use it. This example assumes that the arrays are all defined as word arrays.
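mov si, OFFSET B      ; lodsw reads this array through ds:si
mov bx, OFFSET C - 2  ; off by two: si is already advanced when used
sub bx, si            ; bx = distance from array B to array C, minus 2
mov di, OFFSET A      ; stosw writes this array through es:di
push ds
pop es                ; es = ds, so es:di reaches the data segment
mov cx, 100           ; loop count
cld                   ; auto-increment
AddLP:
lodsw                 ; ax = next word of B, si = si + 2
add ax, [bx + si]     ; add the matching word of C
stosw                 ; store the sum in A, di = di + 2
loop AddLP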
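The bx arithmetic is easier to check with concrete numbers, so here is a C model of the loop above. It is a sketch under assumptions: add_arrays and its parameter names (first for the array read through si, second for the array reached through bx + si, dst for the array written through di) are hypothetical, mem stands in for a single data segment, and all offsets are byte offsets truncated to 16 bits as they would be in the registers.

```c
#include <stdint.h>
#include <string.h>

enum { COUNT = 100 };  /* number of words in each array */

/* mem models the data segment; first, second, and dst are the byte
   offsets of three word arrays inside it. */
void add_arrays(uint8_t *mem, uint16_t first, uint16_t second, uint16_t dst)
{
    uint16_t si = first;                    /* mov si, OFFSET first      */
    uint16_t bx = (uint16_t)(second - 2);   /* mov bx, OFFSET second - 2 */
    uint16_t di = dst;                      /* mov di, OFFSET dst        */
    int cx;

    bx = (uint16_t)(bx - si);               /* sub bx, si */

    for (cx = COUNT; cx != 0; cx--) {       /* loop */
        uint16_t ax, operand;

        memcpy(&ax, mem + si, 2);           /* lodsw: load word ...      */
        si = (uint16_t)(si + 2);            /* ... then si = si + 2      */

        /* bx + si wraps to 16 bits and lands on the matching word. */
        memcpy(&operand, mem + (uint16_t)(bx + si), 2);
        ax = (uint16_t)(ax + operand);      /* add ax, [bx + si]         */

        memcpy(mem + di, &ax, 2);           /* stosw: store word ...     */
        di = (uint16_t)(di + 2);            /* ... then di = di + 2      */
    }
}
```

For example, with first at offset 400 and second at offset 0, bx becomes 0FE6Eh after the subtraction; once lodsw has advanced si to 402, bx + si wraps around to 0, the first word of the second array. The 16-bit wraparound is what lets the trick work even when the second array sits below the first.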

