Practical 2, CSE454 CSSE Monash, Semester 1, 2005

Due CSSE general office, noon Thursday, week 12, 26 May 2005.

Do not use Excel or other software, particularly not to produce graphs and diagrams. This is not to be perverse; there are far fewer problems with readability when graphs and diagrams are drawn by hand.

This practical involves using the [C5 (on nexus)] decision (classification) tree program [tutorial (csse domain)].

  1. Take the [Anderson / Fisher] Iris data set and put it in a suitable form for use with C5 -- the 5th attribute, species, allows it to be used for 'supervised classification'. In one page, draw the tree (nicely!), or the top levels if the whole tree is too big, and describe what it means for the data set.
    If your use of Snob in practical 1 (unsupervised classification) found a set (or sets) of "species" different from the botanists', take the most probable class in Snob's "best" set as the attribute to be predicted by C5 and do the same analysis.
    [5 marks]
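
As a rough illustration of "a suitable form", the sketch below builds the text of the two input files that C5 reads: a names file describing the attributes and a data file with one comma-separated case per line. It assumes the See5/C5.0 convention in which the first entry of the names file identifies the target attribute. The attribute names and the helper functions `names_file` and `data_file` are illustrative choices, not part of the practical, so check the details against the C5 tutorial.

```python
import csv
import io

# Illustrative attribute names; the practical does not prescribe them.
ATTRIBUTES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def names_file(classes):
    """Return names-file text: the target attribute first, then every attribute.

    Assumes the See5/C5.0 layout in which each entry ends with a full stop.
    """
    lines = ["species.", ""]
    lines += [f"{name}: continuous." for name in ATTRIBUTES]
    lines.append("species: " + ", ".join(classes) + ".")
    return "\n".join(lines) + "\n"


def data_file(rows):
    """Return data-file text: one comma-separated case per line.

    Each row is (four measurements..., species label).
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for *measurements, species in rows:
        writer.writerow([*measurements, species])
    return out.getvalue()
```

Writing the two strings to files that share a stem (for example `iris.names` and `iris.data`) should give C5 what it expects. For the Snob variant, the same functions could be used with Snob's class labels in place of the species names.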

    In [../XD6/] is (i) a program to generate any amount of a certain kind of data and (ii) examples of this kind of data. The data have ten binary attributes, `a1' to `a9' and `class' where   class = (a1 & a2 & a3) or (a4 & a5 & a6) or (a7 & a8 & a9) +/- noise.
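
To make the definition concrete, here is a minimal Python sketch of data of this kind. It is not the generator supplied in [../XD6/]. It assumes that the attributes are independent fair coin flips and that "noise" means flipping the true class with a fixed probability; the function names and the default noise level of 0.1 are placeholders chosen for illustration.

```python
import random


def true_class(a):
    """Noise-free XD6 class for nine binary attributes a[0..8] (a1..a9)."""
    return int((a[0] and a[1] and a[2])
               or (a[3] and a[4] and a[5])
               or (a[6] and a[7] and a[8]))


def generate(n, noise=0.1, seed=None):
    """Return n cases, each a list [a1, ..., a9, class].

    Assumption: each attribute is 0 or 1 with equal probability, and the
    true class is flipped with probability `noise`.
    """
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        a = [rng.randint(0, 1) for _ in range(9)]
        c = true_class(a)
        if rng.random() < noise:
            c = 1 - c  # noise: flip the class label
        cases.append(a + [c])
    return cases
```

Under these assumptions, calling `generate(200, noise=0.0, seed=1)` gives 200 noise-free cases, and raising `noise` gives progressively harder training sets.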

  2. Draw the decision (classification) tree that describes the reduced XD':   class' = (a1' & a2') or (a3' & a4').   Identify the repeated sub-tree.
  3. How many nodes would the full tree for XD6 have? Why?
  4. Decision (classification) tree programs have difficulty learning disjunctive functions (decision graphs do better). Investigate the ability of C5 to learn a good model, or the true model, for XD6, as you vary
    1. the size of the training data and
    2. the noise level (you will need to modify the generator slightly).
    Write a short report on C5's performance. You might like to include some of the following:
    1. Optionally, results on a reduced XD' data set.
    2. Some example trees.
    3. Tables of results.
    4. Right/wrong scores on test data.
    5. The "closeness" of true and inferred trees.
    6. Other?
    [15 marks]
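
One possible way to quantify two of the items above is sketched here in Python. It assumes an inferred tree has been transcribed by hand into a function `predict` that takes the nine attribute values and returns 0 or 1; that function, and the names `right_wrong` and `agreement_with_truth`, are hypothetical. The "closeness" measure shown, agreement with the noise-free function over all 512 attribute settings, is only one interpretation and ignores the shape of the tree.

```python
from itertools import product


def true_class(a):
    """Noise-free XD6 class for nine binary attributes a[0..8] (a1..a9)."""
    return int((a[0] and a[1] and a[2])
               or (a[3] and a[4] and a[5])
               or (a[6] and a[7] and a[8]))


def right_wrong(predict, cases):
    """Count (right, wrong) predictions on cases of the form [a1..a9, class]."""
    right = sum(1 for case in cases if predict(case[:9]) == case[9])
    return right, len(cases) - right


def agreement_with_truth(predict):
    """Fraction of all 2**9 = 512 attribute settings on which `predict`
    gives the same answer as the noise-free function."""
    inputs = list(product((0, 1), repeat=9))
    agree = sum(1 for a in inputs if predict(a) == true_class(a))
    return agree / len(inputs)
```

For instance, a tree that always predicts class 0 agrees with the true function on 343 of the 512 settings, which gives a baseline against which inferred trees can be compared.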

© L. Allison, School of Computer Science and Software Engineering, Monash University, Australia 3800.