UMBC CMSC 211

UMBC | CSEE


C/C++ Strings

Strings in C/C++ are really arrays of characters with one special characteristic: every string is terminated with a null byte, that is, a byte whose binary value is 0. A string can be any length up to the size of its array, but the array must be declared with a maximum size in advance. Remember that the null counts toward that maximum, so if you declare char name[100]; the array name can hold at most 99 characters. Exactly the same rules apply in assembly language. In fact, the reason these rules apply in C/C++ is that those languages are ultimately translated into assembly language.

Length of a String

Now, if I write char name[100] = "Gary"; what I get is 'G', 'a', 'r', 'y', '\0', followed by 95 more bytes that are not part of the string (in C, this initializer sets them to zero as well). So how long is the string? Is it 100, because I declared the array to be 100? Is it 5, for the four characters plus the null? Is it 4, for the actual characters? Technically, it is all three. However, what an applications programmer usually cares about is how many characters the array holds at this moment. The library function strlen() answers that question: here it returns 4. What if we wrote our own? What would it look like?

;; LENSTR.ASM--return length of C string pointed to on entry by si
;;
;;   Calling Sequence:
;;	EXTRN	LenStr:NEAR
;;	mov	si, OFFSET theString
;;	call	LenStr
;;
;;   On exit, ax contains the string's length (excluding the terminating
;;    null byte) and si points to that byte.  All other registers are
;;    preserved.
;;
;;  Program text from "Assembly Language for the IBM PC Family" by
;;   William B. Jones, (c) Copyright 1992, 1997, Scott/Jones Inc.
;;
	.MODEL	SMALL

	.CODE
	PUBLIC	LenStr

LenStr	PROC

	mov	ax, si 				;	save si to subtract at end

LenLoop:
	cmp	BYTE PTR [si], 0
	je	Done
	inc	si
	jmp	LenLoop

Done:
	neg	ax
	add	ax, si 				;	ax := loc of null byte - loc of start
	ret

LenStr	ENDP
	END
  
Simply put, we step through the array until we reach a zero byte, then subtract the start address from the address where we stopped. (LenStr does the subtraction by negating the saved start address and adding the stop address.) Remember, subtracting one address from another yields an ordinary number: the distance between the two locations, which is exactly the length of the string, not counting the null.

Copying One String into Another

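When copying, we know we are done when the last character transferred is the null. That means the loop must move each byte first and test it afterwards, so that the terminating null gets copied too. Like the library's strcpy(), CpyStr does not check that the destination array is large enough to hold the copy.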
;; CPYSTR.ASM--copy C string from si to di
;;
;;   Calling Sequence:
;;	EXTRN	CpyStr:NEAR
;;	mov	si, OFFSET sourceString
;;	mov	di, OFFSET destinationString
;;	call	CpyStr
;;
;;   On exit, si points to the null byte terminating the source string,
;;    di to the null byte terminating the copied string, and all other
;;    registers are preserved.
;;
;;  Program text from "Assembly Language for the IBM PC Family" by
;;   William B. Jones, (c) Copyright 1992, 1997, Scott/Jones Inc.
;;
	.MODEL	SMALL

	.CODE
	PUBLIC	CpyStr

CpyStr	PROC
	push	ax 			;	We will use al for moving bytes

CpyLoop:
	mov	al, [si]
	mov	[di], al
	cmp	al, 0 			;	Compare after move so null byte gets
	je	Done			; 	 moved, too

	inc	si
	inc	di
	jmp	CpyLoop

Done:
	pop	ax
	ret

CpyStr	ENDP
	END  

Comparing Two Strings

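There is actually more than one way to decide whether two strings are equal. One is to ask whether the two string variables point to the same memory location; we don't need that. Another is a case-insensitive comparison; we don't need that either. We will do a lexicographic (dictionary-order) comparison: compare the strings character by character until two characters differ or the end of a string is reached, and let the first difference decide which string comes first. Like the library's strcmp(), CmpStr reports the result as a negative, zero, or positive number.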
;; CMPSTR.ASM--return ax < 0, = 0, > 0 according as string at si is
;;   <, =, or > string at di
;;
;;   Calling Sequence:
;;	EXTRN	CmpStr:NEAR
;;	mov	si, OFFSET leftString
;;	mov	di, OFFSET rightString
;;	call	CmpStr
;;
;;   On exit, ax contains the result and all other registers are
;;    preserved
;;
;;  Program text from "Assembly Language for the IBM PC Family" by
;;   William B. Jones, (c) Copyright 1992, 1997, Scott/Jones Inc.
;;
	.MODEL	SMALL

	.CODE
	PUBLIC	CmpStr

CmpStr	PROC
	push	si
	push	di

CharLoop:
	mov	al, [si]
	mov	ah, [di]
	cmp	al, ah 			;	Must compare bytes before checking for
	jne	Done 			;	  termination

	cmp	al, 0 			;	Now check for either string terminated
	je	Done
	cmp	ah, 0
	je	Done 			;	Anything ending loop treated the same

	inc	si
	inc	di
	jmp	CharLoop

Done:
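	sub	al, ah			;	al := al - ah; CF = 1 if al < ah (unsigned)
	sbb	ah, ah			;	ah := 0FFh on borrow, else 0, so ax holds
					;	  the exact difference (cbw gets the sign
					;	  wrong when the bytes differ by over 127)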
	pop	di
	pop	si
	ret

CmpStr	ENDP
	END
  

