Programming Projects: Linux Installation and Modifications
CMSC 421, Fall 2003
Project 2 Now
Assigned: Nov 3, 2003;
Design Document Due: Nov. 23, 2003; Design Document Due Date Extended
to Nov. 26th, 11.59PM
See Project Design Document
Guidelines;
Note: If working in a group, BOTH design and final code should be
submitted from the same GL account
Changes in Project 2 Requirements (For ALL Sections):
- Calling fork() within the system call for logging is NOT
REQUIRED. However, you still need to write information to a
log file.
- A simple encryption scheme such as substitution is
sufficient. If you choose to implement another more
sophisticated scheme, that will be counted as Extra Credit
worth up to 15%.
Final Code/Report Due: Dec. 8, 2003; Submission deadline
extended to Dec. 9th, Tuesday, 11.59PM
Follow SUBMIT
instructions completely and carefully (updated on Dec 8th, 2003).
Goals
The aim of this project is rather simple -- to make you work with a
real operating system (in the present instance, Linux), and to
understand how to modify and add functionality to it. The project will
have three phases, with the final phase due at the end of the
semester (just after Thanksgiving break).
Groups: This project can be done in groups of at most TWO
members (NO Exceptions). If working in a group, form your group no
later than Sep. 18, 2003. Under rare circumstances, your group member
may be enrolled in a different section than you are. In that case, you
must obtain approval from the instructors of both sections.
ITE 240 LAB: The ITE 240 Lab may be used by students
enrolled in CMSC 421 for their project work. The lab contains 24
high-end machines, each with dual Pentium 4 processors, 1 GB RAM,
and a CD-RW drive. The 3 TAs will be holding regular weekly Office
Hours in the ITE 240 Lab - plan to take full advantage of these. The
office hours schedule is available on the course website and is also
posted outside the lab door.
ITE 240 LAB USAGE Policies:
- No Food or Drinks Allowed in the Lab - Absolutely! Never!
- Do not log in to multiple machines, especially when there are
students waiting for access to the lab.
- Use the laboratory ONLY for CMSC 421 related activities - NOT
for general other coursework. There are designated OIT Labs
across campus to serve this purpose.
- DO NOT ABUSE the superuser privileges from your installed
kernel. This is really important. If you are caught engaging
in such activities, you will immediately lose access privileges
to the lab AND also face other disciplinary action. Repeated
offenses might result in shutting down the laboratory.
What you will need
Since you will be installing and modifying your own version of the
Linux kernel, you will need to:
- Obtain a USB External Hard Drive that can be used on your own
laptop/computer or one of the machines in ITE 240 Lab;
- OR
- Partition your existing Hard Drive to install a separate
version of the kernel. Make sure that you are fully comfortable
with doing this since there are real risks and chances of
spending long frustrating hours.
IMPORTANT: START EARLY. These are non-trivial tasks, and the later
you start, the lower your chances of finishing on time.
Basic Phase: Due Sep 28, 2003, 11.59 PM
This project is intended to help you gain experience with obtaining a
Linux kernel and installing it. Since most current releases of Linux
use the 2.4 kernel series, we will be using the RedHat 8.0 Linux
version (2.4.18). This is available via FTP from UMBC's Linux Mirror
Site at:
ftp://mirrors.umbc.edu/pub/linux/8.0/en/iso/i386/
Download the FIRST TWO ISO files (disc1 and disc2) and burn them on
to CD-Rs as ISO images (it might be easier to do this under
Windows).
The Linux kernel will be installed on the External USB Hard Drive
or your own computer/laptop internal hard-drive (suitably
partitioned). For the external USB hard drive option, you will need
to create a boot floppy (so, do not throw away those floppy disks
yet).
If you plan to use the same external hard drive in both the lab and
your personal computer, then you might need two separate boot floppy
disks and also have two partitions/Linux installations on the external
hard-drive: one each for the laboratory use and your PC use.
What to hand in
Submit via the UMBC submit program
(Run this from a GL machine or one of the lab machines).
You should submit a SINGLE FILE that contains the following
information:
- Name(s) of Students submitting this. Remember that groups of at
most TWO students are allowed.
- Where you installed the Kernel, i.e. USB drive or your personal
system
- The installed Linux version.
- Comments on this phase: Did you find the Lab and TA support
useful? What major difficulties did you encounter during this
phase?
The command to run is specific to your section:
- submit cs421_0101 proj0 FILE_NAME for Section 0101
- submit cs421_0201 proj0 FILE_NAME for Section 0201
- submit cs421_0301 proj0 FILE_NAME for Section 0301
Note: If you are working in a group, then ONLY one student (from any
one of the two sections) should submit.
No credit is assigned for this phase -- this is simply catch-up time
for those in the class not yet familiar with installing, partitioning,
dual booting, etc. You may want to use the local Linux Users Group as a
resource and perhaps join their installfest.
Project 1, Due Oct. 26, 2003, 11.59 PM
Assigned: 30 Sep. 2003
We assume that you have installed the required kernel. This document
will describe a new function that we want you to add to the kernel as
a system call.
GOALS
To get you comfortable with adding system calls to the kernel. At
the end of this project you will be comfortable with perusing the Linux
sources and modifying them. While the source code you produce will itself
not be voluminous, you will find that you will have to spend long hours
looking at various source and .h files.
DESCRIPTION
We assume that you have installed the 2.4.18 kernel. This document will
describe a new function that we want you to add to the kernel as a system
call. The exercise is fairly straightforward, and you'll add no more
than 50 lines of code, headers, etc. -- probably fewer. The idea is to make
sure you understand the mechanics of modifying the kernel.
We assume that you are already familiar with makefiles and debugging from
classes such as CMSC 341. If not, this will be a considerably more
difficult project because you will have to learn to use these tools as
well.
Helpful Hints
- By default, code for Linux exists in /usr/src/linux. If you have
multiple versions of the kernel, the code may exist in
/usr/src/linux-version instead.
- Recall that a system call is a software trap or interrupt. This
means that when adding a new system call, you will need to update the
system call table (which plays a role similar to an interrupt vector)
with a new entry, and generate a stub that will be used by user
programs. Look for files arch/i386/kernel/entry.S and
include/asm/unistd.h, and look for the macro _syscall2.
- In the critical section of the code you write (e.g. when you access
the kernel variable(s)), you should avoid race conditions, perhaps by
disabling and enabling interrupts. The functions you might find useful
are cli() and sti(); examples of their use can be found in
/usr/src/linux-2.4.20-18.9/kernel
- A kernel function that implements a system call is just like any
other function. The header declaration has a minor difference -- the
keyword asmlinkage precedes the declaration, e.g.
asmlinkage void sys_foo(void).
- You will need to dereference a user space pointer in the kernel
when passing a value back through an argument. Look at the functions
copy_to_user() and copy_from_user() to help with this. A good place to
obtain information on these functions is
/usr/src/linux-version/arch/i386/lib/usercopy.c. You could also use the
deprecated functions memcpy_fromfs() and memcpy_tofs(). You will also
need to use verify_area() to check a user space address before
referencing it.
- Comparing and Merging files:
A good tutorial on how to do this can be found
at http://www.cslab.vt.edu/manuals/diffutils-2.8.1/diff.html
- Here is a step-by-step instruction for creating the patch file
and the subsequent SINGLE tar file for submission:
Command: diff [options] from-file to-file
In order to create the patch, you should have a clean version of the
kernel (untarred without any changes); suppose it is in the directory
/usr/src/linux. Also, suppose your modified kernel is in the directory
/usr/src/mylinux.
IMPORTANT: Before using the diff command, go to /usr/src/mylinux,
run "make clean" and "make mrproper" to remove all the
executables. Then enter the command:
diff -rcP /usr/src/linux /usr/src/mylinux > /tmp/mypatch.diff
In the command, the -r option compares subdirectories recursively so
that the patch can update files in subdirectories, -c produces
context-format output (3 lines of context by default), and -P (when
comparing directories, if a file appears only in the second directory,
it is regarded as present but empty in the first directory) lets the
patch create new files in the original kernel.
Now the /tmp/mypatch.diff file contains all the information the patch
command needs to transform a clean version of the Linux kernel into
your modified version.
Other diff options that you may find useful:
-b: Ignore changes in the amount of white space.
-U num: Output num lines of unified context.
-d: Use a different algorithm that may find a smaller set of changes.
-N: In directory comparison, if a file is found in only one directory,
treat it as present but empty in the other directory.
-w: Ignore all white space when comparing lines.
-B: Ignore changes that just insert or delete blank lines.
After generating the patch file, you can bundle the patch file, your
project documents and the test program together into a single archive,
using tar -cvf destination-file-name source-file-name1
source-file-name2 source-file-name3...
Command: patch
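To validate whether your patch works or not, you can use the patch
command. You still need a clean version of the kernel; suppose it is
in the directory /usr/src/linux. Because the diff above was created
with absolute path names, patch most likely needs a -p option to strip
the leading directories (-p4 for the paths used here). Enter the
directory and apply the patch:
cd /usr/src/linux
patch -p4 < /tmp/mypatch.diff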
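As a user-space illustration of the system-call hints in this list: a
test program can reach a system call through a stub such as the one
_syscall2 generates, or through the C library's generic syscall()
wrapper, which takes the call number directly. The sketch below takes
the second route. It is only an example under stated assumptions, not
the required approach: SYS_getpid stands in for your own call (whose
number depends on your unistd.h changes), and the helper name
invoke_by_number is made up for this illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: invoke a system call that takes no arguments,
 * identified only by its number in the system call table. */
long invoke_by_number(long number)
{
    assert(number >= 0);   /* system call numbers are non-negative */
    return syscall(number);
}
```

On failure syscall() returns -1 and sets errno; a number that has no
entry in the system call table typically yields ENOSYS, which is a
quick way to check whether your new entry was actually installed.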
WHAT TO HAND IN
There are two steps to what you will hand in - chronologically
separated by one week:
1. Design documentation - due on or before 11:59 PM on 19 Oct 2003.
2. Source code and documentation - due on or before 11:59 PM on
26 Oct 2003.
Design documentation:
We are enforcing this deadline to ensure that people don't leave the
project until the last minute. You are, of course, welcome to visit
either the faculty or TA office hours for help; however, one of the
first things we'll ask for is your design documentation (unless you're
asking for help with that...). You may make changes to your
documentation before the due date for the source code (Oct
26th); however, the design portion of your grade will
depend heavily on the design document you hand in on Oct 19th.
Your design documentation, typically 1-2 pages for a project of this
size, should include the basic design of your software (what modules
you will write, where you will make changes to the kernel, etc.), a
timeline, as well as details on the testing that you plan to do to
ensure that your code works. The (section dependent) submit commands
are:
- submit cs421_0101 proj1 <Design Document FILE_NAME>: Section 0101
- submit cs421_0201 proj1 <Design Document FILE_NAME>: Section 0201
- submit cs421_0301 proj1 <Design Document FILE_NAME>: Section 0301
Source code and documentation:
You will need to hand in all of your code and documentation using the
submit program available on the GL cluster.
In particular, hand in a SINGLE tar file (run man tar for
information on the 'tar' command) that contains the items listed below.
Note: this list was written up after the due date of Project 1 and
does not apply to Fall 2003; it is intended for future offerings.
- The tar/gzipped version of a patch to your modified kernel
- The source code file(s) that implements the new system calls (this
could be a separate file or part of an existing kernel file)
- The updated entry.S, unistd.h and kernel Makefile
- The driver program (any .c, .cc and .h files) and any Makefile used
- Sample Output showing the execution of your driver program
(this can be obtained by running the 'script' command and
submitting the file 'typescript' generated as output by script).
- A README file (explaining how to compile/run your program) -
Include any "gotchas" with your program that you are aware
of. If the programs won't compile correctly, be honest and say
so here.
- A COMMENTS file (describing your experience with the project and
any suggestions/feedback to the instructors).
For submit, the commands differ by section as shown below:
- submit cs421_0101 proj1 <Code TAR FILE_NAME>: Section 0101
- submit cs421_0201 proj1 <Code TAR FILE_NAME>: Section 0201
- submit cs421_0301 proj1 <Code TAR FILE_NAME>: Section 0301
Final Phase
Assigned: 3 Nov 2003
Design Document Due: Nov 23, 2003 at 11:59PM; Design Document Due Date
Extended to Nov. 26th, 11.59PM
Design Document Guidelines
Final Code/Results Due: Dec 8, 2003 at 11:59 PM; Submission
deadline extended to Dec. 9th, Tuesday, 11.59PM
Goals
With project 1 completed, you should now be comfortable modifying
linux code in general, and adding system calls to linux in
particular. This document describes new functionality that you will
add to the linux filesystem.
Most present day filesystems store the raw data directly on disk. This
means that system administrators can see any data you store. In
addition, the security of your data is tied to the security of the
system as a whole. If miscreants can hack into the system as
superuser, or can defeat the protection mechanisms of the OS, or
physically steal the disk, then your data is compromised. One way to
avoid this is to store the data on the disk in an encrypted format,
with the decryption possible only with a key that you possess. This
project asks you to create such an encrypted filesystem by layering
the encryption/decryption process on top of the existing linux
filesystems.
Specifics
- For every file, there is a key that is used to encrypt
and decrypt the file. This key is provided by the user upon
file creation.
- Since users can find it annoying to remember all the file keys,
we define a File_of_Keys that stores each file's
key. Thus, each line in this file will contain at least two
entries: filename (absolute pathname) and the file's encryption
key.
- The above File_of_Keys is encrypted with a
Super_Key that the user provides. This super key can be
any string. To prevent the superuser from learning this key,
the Super_Key is stored in an encrypted form (using a
one-way function) in a file called File_SuperKey in the user's
home directory.
- When a user program wishes to read an encrypted file, the
kernel first asks for the user's Super_Key and applies the same
one-way function to it. If the output of the function matches the
value stored in File_SuperKey, the user program is
successfully authenticated. Then, the kernel decrypts the
File_of_Keys using the supplied Super_Key and obtains
the file-specific key, which is used for subsequent open,
write, read, etc. on the given file.
This authentication is assumed to be valid until the user
program exits.
The above can be compared to Netscape's (or other browsers')
Password Manager function that lets you store login/passwords for
various different sites and uses a master password to encrypt the
passwords file.
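The authentication and lookup steps above can be sketched in user-space
C as follows. This is a minimal illustration under assumptions, not a
prescribed design: the text does not fix a line format beyond "filename
and key", so the sketch assumes one whitespace-separated
"<absolute-path> <key>" pair per line (paths containing spaces would
need a different format), and it uses 64-bit FNV-1a purely as a
stand-in one-way function, which is NOT cryptographically secure. The
names toy_one_way, superkey_matches and parse_key_line are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_PATH_LEN 255   /* assumed limits, for this sketch only */
#define MAX_KEY_LEN   63

/* Toy one-way function: 64-bit FNV-1a. It only stands in for whatever
 * one-way function you choose; it is NOT cryptographically secure. */
uint64_t toy_one_way(const char *s)
{
    uint64_t h = 14695981039346656037ULL;      /* FNV offset basis */

    assert(s != NULL);
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 1099511628211ULL;                 /* FNV prime */
    }
    return h;
}

/* Authenticate: hash the supplied Super_Key and compare the result
 * with the value previously stored in File_SuperKey. 1 on match. */
int superkey_matches(const char *supplied, uint64_t stored)
{
    return toy_one_way(supplied) == stored;
}

/* Parse one decrypted File_of_Keys line of the assumed form
 * "<absolute-path> <key>". path needs MAX_PATH_LEN + 1 bytes and key
 * needs MAX_KEY_LEN + 1 bytes. Returns 0 on success, -1 on a bad line. */
int parse_key_line(const char *line, char *path, char *key)
{
    if (sscanf(line, "%255s %63s", path, key) != 2)
        return -1;
    return (path[0] == '/') ? 0 : -1;          /* must be absolute */
}
```

Storing only the one-way output means that reading File_SuperKey does
not directly reveal the Super_Key, which is the property the
description relies on.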
For the project, you have to implement several new system calls:
For each user, a Logfile is maintained in the user's home
directory. This file keeps track of all activities - file creation,
open, read, write, deletion. The log information for each access
should include the username, file name, the action (create, open,
read, write, close, delete), any key failures, and the timestamp. The
logging is accomplished as follows:
Within the implementation of each of the above system calls, invoke
fork() or pthread_create() and let the child process take care of
storing the log information. The parent process may or may not wait
until the child process terminates (i.e. successfully logs) - you
decide.
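One possible shape for a log record is sketched below in user-space C.
The text lists the fields a record should carry but not their layout,
so the one-line format, the field order, the numeric timestamp and the
function name format_log_entry are all assumptions made for
illustration; inside the kernel you would use the kernel's own
equivalents of these library calls.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Format one log record as a single text line:
 *   <timestamp> <user> <action> <file> key_ok|key_failure
 * Returns the number of characters written, or -1 if buf is too
 * small to hold the whole record. */
int format_log_entry(char *buf, size_t size, time_t when,
                     const char *user, const char *action,
                     const char *file, int key_failed)
{
    int n;

    assert(buf != NULL && size > 0);
    n = snprintf(buf, size, "%ld %s %s %s %s\n", (long)when, user,
                 action, file, key_failed ? "key_failure" : "key_ok");
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Keeping each record on one line with a fixed field order makes the
Logfile easy to append to and easy to inspect when testing.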
Once you implement the above system calls, you will write a set of
sample driver programs (at least 2) that invoke the above calls for a
different set of files. Assume that all the files you test with reside
in the user's home directory. Here are a set of suggested driver
programs:
- Sec_from_Text File_Name : This program will read a
regular text file and place the encrypted version in a new
file, say File_Name.enc
- Sec_Read_File File_Name : This program will output to
screen an encrypted text file.
- Sec_Copy_File File_Name_1 File_Name_2 : This program
will copy an encrypted File_Name_1 - via decryption and then
encryption - to File_Name_2. Note that the new file
will require a new encryption key.
- Sec_Line_Count File_Name : This program will output to
screen the number of unencrypted (original data) lines in an
encrypted file.
You can be more creative and write other/better driver programs.
Mechanics, and what to hand in
If a group works on a
project, then in general we will assign a common score to both
participants. If your group did not work out well for the Project 1
phase, then you are free to work independently. Please make sure that
the group is identified in the README you turn in, and that only one
member of the group submits the project!
There are two steps to what you will hand in:
- The Design Documentation is due by 11.59PM on Nov. 23, 2003. We
are enforcing this deadline to ensure that people don't leave the
project until the last minute. You are, of course, welcome to visit
either the faculty or TA office hours for help; however, one of the
first things we'll ask for is your design documentation (unless you're
asking for help with that...). You may make changes to your
documentation before the full Project handin; however, the design
portion of your grade will depend heavily on the design document you
hand in on November 23rd.
Your design documentation, typically 3-5 pages for a project
of this size, should include the basic design of your software (the
modules that you will write, their functionality and rough
pseudo-code, where you will make changes to the kernel, etc.), a
timeline, as well as details on the testing that you plan to do to
ensure that your code works.
Submit using the online submit
command (class section-specific as before) and name your file
Project2-Design.txt or Project2-Design.pdf - NO OTHER
FORMATS Please.
- The Final Code and Results are due by 11.59PM on Dec. 8 (MONDAY),
2003 (extended to Dec 9th, Tuesday, 11.59PM).
Submit a single tarred and gzipped file that contains:
- The patch file. Also, try the patch commands way
ahead of submission deadlines and let us know of any
problems. Make sure that you have a backup copy of
ALL your modified kernel source files before you use the
diff command.
Create a patch file using diff
command:
diff -crP /usr/src/linux /usr/src/mylinux > /tmp/mypatch.diff
where mylinux is *YOUR* project directory for the second project.
Please remember that the patch file should be generated
after you remove any object files by using a make
clean or make mrproper. Also, make sure you use the
right options to diff.
- Create a new folder and copy ALL the kernel-level .c and .h files
that you created/modified for this project, then tar and gzip the
folder to create Project2-Source.tar.gz. This file is a backup to
your patch file.
- The driver programs (any .c, .cc and .h files)
- The encrypted File_of_Keys and File_SuperKey files used
in your testing and the super password (listed in the
README file).
- The input files used for the driver programs.
- Sample Output showing the execution of your driver
program (this can be obtained by running the 'script'
command and submitting the file 'typescript' generated
as output by script.)
- Performance Study Report: A 1-2 page report reporting
your tests, including performance measurements. We
suggest you measure the time taken to read/write files
of different sizes (at least 4 sizes ranging from 10
Kbytes to 1 Mbyte) with and without encryption to figure
out how much overhead the encryption process
causes. Present the numerical results in neat graph or
tabular forms.
- A README file (explaining how to compile/run your
program) - Include any "gotchas" with your program that
you are aware of. If the programs won't compile
correctly, be honest and say so here.
- A COMMENTS file (describing your experience with the
project and any suggestions/feedback to the instructors).
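For the Performance Study Report, the timing measurements could be
taken along the lines of the user-space sketch below. It is only a
starting point under assumptions: it times a plain buffered write to a
temporary file with clock_gettime(), whereas your encrypted
measurements would go through your own system calls, and the function
names are hypothetical. Note that fflush() does not force data to
disk, so caching will influence the numbers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Seconds elapsed between two clock readings. */
double elapsed_seconds(struct timespec t0, struct timespec t1)
{
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Write `size` bytes to a temporary file and return the elapsed
 * wall-clock time in seconds, or -1.0 on error. */
double time_plain_write(size_t size)
{
    struct timespec t0, t1;
    char *data = malloc(size);
    FILE *fp = tmpfile();
    double result = -1.0;

    if (data != NULL && fp != NULL) {
        memset(data, 'x', size);
        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (fwrite(data, 1, size, fp) == size && fflush(fp) == 0) {
            clock_gettime(CLOCK_MONOTONIC, &t1);
            result = elapsed_seconds(t0, t1);
        }
    }
    if (fp != NULL)
        fclose(fp);
    free(data);
    return result;
}

/* Relative overhead of the encrypted path, in percent. */
double overhead_percent(double plain_secs, double encrypted_secs)
{
    assert(plain_secs > 0.0);
    return 100.0 * (encrypted_secs - plain_secs) / plain_secs;
}
```

Repeating each measurement several times per file size and reporting
an average is likely to give steadier numbers than a single run.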
Helpful Hints
- Start with a CLEAN Base Kernel, i.e. DO NOT use the modified
kernel that you developed for Project 1.
- There is a great deal of information available about encryption in
Linux. Things to check out include gpg, the Linux encryption howto,
and information about the international kernel patches (which add many
crypto algorithm implementations to the kernel, which you can then
use). Most of these are easily discovered using search engines like
Google or Yahoo.
- For encryption, we suggest you use either the Blowfish or 3DES
algorithm. Beware though that many of these operate on fixed-size
blocks, so you will need to figure out how to convert arbitrarily
sized reads and writes into the blocks needed by these algorithms.
- There are possibly similar implementations out there in
the public domain. They mostly do more than what we ask you to do
here, or do it in a different way. Some of these include the CryptFS,
CFS, TCFS, and the Encrypted Loopback filesystem. You are welcome (in
fact, encouraged) to read their design and documentation. When you do
so, please identify any sources you used in your own design document,
especially if you liked some approach they have taken and plan to use
it yourself. DO NOT borrow without acknowledgment, and DO NOT borrow
code at all. These will both be counted as plagiarism. The only code
you are allowed to borrow as is from the net is the code for the
encryption algorithms themselves.
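Regarding the fixed-size block hint in this list: one way to approach
it is to expand each requested byte range to whole cipher blocks before
encrypting or decrypting. The sketch below shows only that arithmetic,
assuming the 8-byte (64-bit) block size used by both Blowfish and
3DES; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CIPHER_BLOCK 8   /* Blowfish and 3DES both use 64-bit blocks */

struct block_span {
    size_t start;    /* byte offset of the first block touched */
    size_t length;   /* bytes to process; a multiple of CIPHER_BLOCK */
};

/* Expand an arbitrary byte range [offset, offset + len) to the
 * smallest range of whole cipher blocks that covers it. */
struct block_span covering_blocks(size_t offset, size_t len)
{
    struct block_span span;
    size_t end;

    assert(len > 0);
    span.start = offset - (offset % CIPHER_BLOCK);              /* round down */
    end = offset + len;
    end += (CIPHER_BLOCK - end % CIPHER_BLOCK) % CIPHER_BLOCK;  /* round up */
    span.length = end - span.start;
    return span;
}
```

For example, a 5-byte request at offset 10 maps to the single block at
bytes 8..15. A write that covers only part of a block would then need
that block read and decrypted first, modified, and re-encrypted before
being written back.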
Grading the Project
The grading for the project will be as follows: 40% design, 50%
implementation, 10% testing. We have structured the grading in this
way to encourage you to think through your solution before you start
coding, and realize that testing your implementation is an important
part of any software development process. If all you do is to work out
a detailed design for what you would do to address the assignment (and
if the design would work!), but you write no code, you will still get
almost half of the credit for the assignment. Conversely, if you
implement correctly, but do not prove that by testing your
code, you will still not be given complete credit. Tests should
convince us of two things -- firstly that your implementation works
and secondly how much overhead it adds to the file operations.
The implementation portion of the grade considers whether you
implemented your design and provided documentation that the TA could
understand. Part of being a good computer scientist is coming up with
simple designs and easy to understand code; a solution which works
isn't necessarily the best that you can do. Thus, part of the design
and implementation grade will be based on whether your solution is
elegant, simple, and easy to understand.
We suggest that you do the project in two phases. First, just add
in the new functions without doing any complex encryption. Use
something simple -- we suggest a substitution cipher, with the key
indicating a shift. So a key of 4 will mean that A becomes E, B
becomes F, ... Z becomes D and so on. Successfully completing this
phase will entitle you to 75% of the implementation credit. The
remaining 25% will come from using a "real" crypto algorithm such as
Blowfish or 3DES.
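The shift cipher suggested above can be sketched in a few lines of C.
This is an illustrative sketch, not a required implementation: the
text only spells out the mapping for upper-case letters, so shifting
lower-case letters the same way and leaving all other bytes untouched
are assumptions, and the function name shift_buffer is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shift every ASCII letter in buf by `key` positions, wrapping within
 * its own case; other bytes are left unchanged. Applying the negated
 * key undoes the shift. */
void shift_buffer(char *buf, size_t len, int key)
{
    size_t i;
    int k = ((key % 26) + 26) % 26;   /* normalise to 0..25 */

    assert(buf != NULL);
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];

        if (c >= 'A' && c <= 'Z')
            buf[i] = (char)('A' + (c - 'A' + k) % 26);
        else if (c >= 'a' && c <= 'z')
            buf[i] = (char)('a' + (c - 'a' + k) % 26);
    }
}
```

Because each byte is transformed independently, this scheme works for
reads and writes of any size and at any offset, which makes it a
convenient first phase before moving to a block cipher.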
There are several extra credit opportunities available, with the
extra credit varying from 5 to 25 percent of the total. For a small
amount of extra credit, encrypt not just the contents but even the
names of the files. For more extra credit, allow the user to specify
not just the encryption key but also the encryption algorithm on a per
file basis.
The intent of the grading for the project is not to
differentiate among those students who do a careful design and
implementation of the assignments. Rather, the grading helps us
identify those students who (i) don't do the assignments or (ii) don't
think carefully about the design, and therefore end up with a messy
and over-complicated solution. Remember that you can't pass this
course without at least making a serious attempt at each of the
assignments. Further, the grading is skewed so that you will get
substantial credit, even if your implementation doesn't completely
work, provided your design is logical and easy to understand. This
means that you should first strive to come up with a clean design of
your project on paper. Second, don't try to add fancy features
because some other group is doing so!
Rules for Collaboration
It is OK for you to discuss general
approaches with other groups. It is NOT OK to exchange solutions --
ideas or code. Please recall that academic dishonesty will be sternly
dealt with.