Sue Evans
Hit the space bar for next page
Hypertext Markup Language (HTML) is a notation used to describe how to display the contents of web pages and to focus on how the page looks. Extensible Markup Language (XML) is a notation used to describe data on web pages and to focus on what the data mean. A fairly recent HTML standard is XHTML, which combines both HTML and XML. Web browsers read HTML or XHTML to determine how web pages should be displayed. Other software can be used to read the same webpages and interpret the information contained there, if XML tags that program recognizes have been used in the document. HTML tags are enclosed in angle brackets (<>). In XHTML, tags generally appear in start-tag, end-tag combinations.
A start tag has the form: <name attributes>. The matching end tag contains that name preceeded by a "/". For example, a paragraph of text might be formatted as follows:
<p>This is a paragraph. </p>
XHTML also allows self-closing tags of the form: <name attributes />.
In a proper XHTML file, the tags will occur in properly nested pairs. Each start tag is matched by a corresponding end tag, and one structure may be embedded inside another, but they cannot overlap. This is like the matching of curly braces in a Java or C program.
For example:
<p>...<ol>...</ol>...</p> is OK, but
<p>...<ol>...</p>...</ol> is not.
A self-closing tag acts as a self-contained start-end pair.
You should become familiar with both HTML and XHTML first. You can use the w3schools.com website for both the HTML and XHTML tutorials, or search for other tutorials that you like better.
Your task is to write a program that checks html files, (web pages) to determine whether the embedded XHTML tags are balanced. Balanced tags are necessary for the file to be a valid xhtml file as explained above.
Your program will read from the file and print an analysis of the file to the screen. The sequence of tags in the file will be echoed to the output; any other text in the file is to be ignored. If there is a tag balancing error or the program reaches the end of the file while in the middle of a tag, the program must quit and print an error message. If the end of file is reached without any errors, print a message to that effect.
You must use both a stack and a queue to solve this problem. To divide up the effort of this problem, consider the program as consisting of two phases.
The data file used to create the sample output for this project is called xhtml.dat. You may get a copy of this file from my pub directory. Do NOT cut and paste from this link.
Here's the data file
This file doesn't really test your program very well, it's just to give you an idea of what the output should look like. Your program should properly handle files with mismatched tags, EOF before a complete tag, etc., as described above.
Here's the sample run
As always, your output need not look exactly like mine, but it should contain all of the same information.
You must be logged into your account and in the same directory as the file you are trying to submit.
To submit your design, type the following at the linux prompt:
submit cs201 Proj1 design1.txt
To submit your homework, type the following at the linux prompt:
submit cs201 Proj1 proj1.py
To verify that your file was submitted, you can execute the following command at the Unix prompt. It will show all files that you submitted in a format similar to the Unix 'ls' command.
submitls cs201 Proj1