CMSC 201
Programming Project Five
XHTML Checker
Out: Monday 11/20/06
Due Date: Sunday 12/10/06, before midnight
The design document for this project,
design5.txt,
is due: Before Midnight, Sunday 12/3/06
|
The Objective
The purpose of this assignment is to give you practice with linked lists,
queues, stacks and guarding header files. It will also act as an introduction
to HTML, the markup language of the web. You'll be getting some more
experience using command line arguments, and writing error messages to stderr.
As always, you should continue to practice using top-down design and good
implementation techniques like incremental programming.
The Background
Hypertext Markup Language (HTML) is a notation used to describe how to display
the contents of web pages and to focus on how the page looks. Extensible
Markup Language (XML) is a notation used to describe data on web pages and to
focus on what the data mean. The newest HTML standard is XHTML, which combines
both HTML and XML. Web browsers read HTML or XHTML to determine how web pages
should be displayed. Other software can be used to read the same webpages and
interpret the information contained there, if XML tags that program recognizes
have been used in the document. HTML tags are enclosed in angle brackets
(<>). In XHTML, tags generally appear in start-tag, end-tag combinations.
A start tag has the form: <name attributes>. The matching end tag
contains that name preceeded by a "/". For example, a paragraph of text might
be formatted as follows:
<p>This is a paragraph. </p>
XHTML also allows self-closing tags of the form: <name attributes />.
In a proper XHTML file, the tags will occur in properly nested pairs. Each
start tag is matched by a corresponding end tag, and one structure may be
embedded inside another, but they cannot overlap. This is like the matching
of curly braces in a C program.
For example:
<p>...<ol>...</ol>...</p> is OK, but
<p>...<ol>...</p>...</ol> is not.
A self-closing tag acts as a self-contained start-end pair.
The Task
You should become familiar with both HTML and XHTML first. You can use the
w3school.com website for both the
HTML and
XHTML tutorials, or search for
other tutorials that you like better. This would be appropriate to do over the
long Thanksgiving break. The lecture will cover stacks and queues when classes
begin again on Monday/Tuesday, 11/27,11/28.
Your task is to write a program that checks html files, (web pages) to
determine whether the embedded XHTML tags are balanced. Balanced tags are
necessary for the file to be a valid xhtml file as explained above.
Your program will read from the file and print an analysis of the file to the
screen. The sequence of tags in the file will be echoed to the output; any
other text in the file is to be ignored. If there is a tag balancing error or
the program reaches the end of the file while in the middle of a tag, the
program must quit and print an error message. If the end of file is reached
without any errors, print a message to that effect.
Program Requirements
- Your program must use command line arguments. At the command line the
user must enter
(in this order):
- the name of the executable file,
- the name of the html file to be checked
- You must store the tags read in from the file in a queue and then use
a stack to check the balancing. Both the queue and stack MUST
use the linked list implementation. You may use the linked list, stacks,
and queues code presented in the lecture notes, but all will require
some modification.
- The basic idea of using a stack to check for balancing is to push each
start tag onto the stack. When an end tag is encountered, a start tag is
popped off the stack and the two are compared. If the tags are matching
start/end tags, processing continues; otherwise, the program should issue
an error message and stop.
- You MUST close any files that you have opened and also free any
memory that you have dynamically allocated as soon as you have
finished using them.
- You must use separate compilation for this project.
Hint
You must use both a stack and a queue to solve this problem. To divide up
the effort of this problem, I suggest that you consider the program as
consisting of two phases.
- The first phase takes care of reading the file and finding all of
the tags from the file. The first phase will print out the tags as it
finds them and also save them into a queue of tags. If the input file
ends in the middle of a tag, the program will report the error and end.
- The second phase of the program takes the queue of tags generated in
phase one and analyzes the sequence of tags to make sure that they are
properly balanced. This phase will make use of both a queue and a
stack.
More Details
This section added 11/24/06
- Although xhtml tags may have attributes, your project does not
have to handle them. I guarantee there will be no attributes in
the tags in any of the files used to test your project.
- All tags used in this project will be 80 characters or less.
- Although strict xhtml does not allow uppercase characters in tags,
transitional tags do allow this and your program should accept them.
Copying the file
The data file used to create the sample output for this project is called
test1.html. You may get a copy of this file from my pub
directory.
Sample Run
Please note that the sample output only shows
what
is expected from your program for the test1.html file. This is not a test of
the program at all, but just a sample for your clarification.
Submitting the Program
To submit your program, type the following command at the Unix prompt
submit cs201 Proj5 proj5.c followed by all the .c and .h files necessary
for compilation
I would expect this list of files to be long. Probably:
proj5.c xhtml.c xhtml.h llist.c llist.h queue.c queue.h stack.c stack.h
To verify that your project was submitted, you can execute the
following command at the Unix prompt. It will show all files that
you submitted in a format similar to the Unix 'ls' command.
submitls cs201 Proj5