Due at the start of class on September 9
In this homework assignment you will gain familiarity with WEKA, the Waikato Environment for Knowledge Analysis. WEKA is widely used in the machine learning and data mining communities because, among other things, it provides both a convenient graphical user interface to a number of standard algorithms and a Java API.
First, download WEKA from the following URL: http://www.cs.waikato.ac.nz/ml/weka/. The "Getting Started" section of that page has links to information on system requirements, downloading the software, and documentation. WEKA is written in Java and should run on any platform with Java 1.5 or higher.
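WEKA's native input format is ARFF (Attribute-Relation File Format): a header of `@relation` and `@attribute` declarations, then an `@data` line followed by one comma-separated instance per line, with `%` starting a comment. As a quick sanity check on a downloaded file, the Java sketch below lists the attributes declared in a header. It is a hypothetical helper, not part of WEKA (which reads ARFF files itself), and it ignores quoted names, sparse data, and other details of the full format. The sample header in `main` is invented and abbreviated for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper (not part of WEKA): lists the attributes declared in
 * an ARFF header. Ignores quoted names, sparse data, and other details.
 */
public class ArffHeaderSketch {

    /** Returns "name:type" for each @attribute line that appears before @data. */
    public static List<String> attributes(String arffText) {
        List<String> result = new ArrayList<>();
        for (String raw : arffText.split("\\R")) {
            String line = raw.trim();
            // Skip blank lines and '%' comment lines.
            if (line.isEmpty() || line.startsWith("%")) {
                continue;
            }
            // ARFF declarations are case-insensitive.
            String lower = line.toLowerCase(Locale.ROOT);
            // The header ends at @data; the remaining lines are instances.
            if (lower.startsWith("@data")) {
                break;
            }
            if (lower.startsWith("@attribute")) {
                // "@attribute <name> <type>": first token is the name, the rest is the type.
                String[] parts = line.substring("@attribute".length()).trim().split("\\s+", 2);
                if (parts.length == 2) {
                    result.add(parts[0] + ":" + parts[1]);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented, abbreviated header; a real Adult ARFF file declares more attributes.
        String sample = String.join("\n",
                "% abbreviated example",
                "@relation adult",
                "@attribute age numeric",
                "@attribute sex {Female, Male}",
                "@attribute class {>50K, <=50K}",
                "@data",
                "39, Male, <=50K");
        for (String attribute : attributes(sample)) {
            System.out.println(attribute);
        }
    }
}
```

Running `main` should print one `name:type` line per declared attribute, for example `age:numeric`.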
Read about the Adult Census Income dataset and obtain it as an ARFF file. Then do the following:
Consider the following decision tree:
(A) (478 and 678 students) Draw the decision boundaries defined by this tree. Each leaf is labeled with a letter. Write this letter in the corresponding region of instance space.
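Because each internal node of a decision tree over numeric features tests a single feature against a threshold, the leaves partition instance space into axis-parallel rectangles. The Java sketch below illustrates this with a hypothetical two-feature tree (not the tree in this problem; the thresholds x1 < 3, x2 < 2, and x2 < 4 are invented) and prints a character map in which each unit cell shows the leaf letter assigned to its center point.

```java
/**
 * Sketch: a hypothetical decision tree over two numeric features (x1, x2).
 * Each internal node tests one feature against a threshold, so the leaves
 * carve the plane into axis-parallel rectangles.
 */
public class DecisionRegions {

    /** Hypothetical tree: tests x1 < 3 at the root, then a different x2 threshold in each branch. */
    public static char classify(double x1, double x2) {
        if (x1 < 3) {
            return (x2 < 2) ? 'A' : 'B';
        } else {
            return (x2 < 4) ? 'C' : 'D';
        }
    }

    /** Renders a width x height character map, labeling each unit cell by the leaf at its center. */
    public static String render(int width, int height) {
        StringBuilder sb = new StringBuilder();
        // Print the top row (largest x2) first so the map reads like a plot.
        for (int j = height - 1; j >= 0; j--) {
            for (int i = 0; i < width; i++) {
                sb.append(classify(i + 0.5, j + 0.5));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(6, 6));
    }
}
```

In the printed 6 x 6 map, the left three columns change from A to B at x2 = 2 and the right three change from C to D at x2 = 4, so the boundary is a vertical line at x1 = 3 plus one horizontal segment on each side.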
(B) (478 and 678 students) Give another decision tree that is syntactically different but defines the same decision boundaries. Your answer must be in the form of a decision tree. This demonstrates that the space of decision trees is syntactically redundant.
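To make the idea of syntactic redundancy concrete, the Java sketch below defines two hypothetical trees over features x1 and x2 (neither is the tree in this problem; thresholds and letters are invented) that test the features in a different order and have a different number of leaves, yet assign every point the same label. The grid check only samples points, so it is evidence rather than proof; here the equivalence also follows by checking the four rectangles formed by x1 < 3 or x1 ≥ 3 and x2 < 2 or x2 ≥ 2.

```java
/**
 * Sketch: two syntactically different hypothetical trees over (x1, x2)
 * that define the same decision boundaries.
 */
public class EquivalentTrees {

    /** Tree 1: splits on x1 first; both branches then split on x2 < 2. */
    public static char tree1(double x1, double x2) {
        if (x1 < 3) {
            return (x2 < 2) ? 'A' : 'B';
        } else {
            return (x2 < 2) ? 'A' : 'C';
        }
    }

    /** Tree 2: splits on x2 first, so tree 1's two 'A' leaves become a single leaf. */
    public static char tree2(double x1, double x2) {
        if (x2 < 2) {
            return 'A';
        } else {
            return (x1 < 3) ? 'B' : 'C';
        }
    }

    /** Counts grid points in [0, max) x [0, max), spaced by step, where the two trees disagree. */
    public static int disagreements(double max, double step) {
        int count = 0;
        int n = (int) Math.round(max / step);
        // Integer loop indices avoid accumulating floating-point error.
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double x1 = i * step;
                double x2 = j * step;
                if (tree1(x1, x2) != tree2(x1, x2)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("disagreements: " + disagreements(6.0, 0.25));
    }
}
```

Running `main` should print `disagreements: 0`; the sampled grid includes the threshold values x1 = 3 and x2 = 2 themselves.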
(C) (678 students only) Does this kind of redundancy in the hypothesis space of decision trees have any implications for learning decision trees? In particular, comment briefly on whether this redundancy makes it more difficult to find accurate trees (as opposed to inaccurate ones) and whether it makes decision tree learning more computationally expensive.