
Prac' 2 CSE454 CSSE Monash Semester 1, 2003

Due (CSSE office) noon, Friday, week 12, 30 May 2003.

This prac' involves using the C5 classification-tree (decision-tree) program, which is available on nexus; see the C5 tutorial.

Use C5 to analyse one of the data sets in the .../c5/Data/ directory:
- anneal - Mark, Chong, Owen(R)
- breast-cancer - Dhananjay, Shannon, David(C)
- credit - Simon, Owen(W), Stephen
- genetics - Alex, Andrew, David(W)
- hypothyroid - Timothy, Vesna, and anyone else not listed
- letter - Jonathan, Paul
- sonar - Sascha, Stewart
In one page, draw the tree (nicely!), or only its top levels if the whole tree is too big, and describe what the tree means for the data set.
[5%]
(a) Take your ``best''^[*] unsupervised classification of the cgi-bin data from prac 1. You need to choose a good value for `k' and you might use something from your ``... further analysis, e.g. (a) on different attributes, ...''
Take the most probable class, `C', output by Snob for each observation as the attribute to be predicted. (If your answer to prac 1 ``didn't go too well'', you should probably ``improve'' it now.)
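As a minimal sketch of the "most probable class" step: assume the Snob report has already been parsed into one dictionary of class-membership probabilities per observation. The parsing itself is not shown, and the class names `c1`, `c2`, `c3` and the probabilities are made up for illustration.

```python
def most_probable_class(memberships):
    """Return the class label with the highest membership probability."""
    return max(memberships, key=memberships.get)

# Hypothetical per-observation class probabilities (not real Snob output).
observations = [
    {"c1": 0.70, "c2": 0.25, "c3": 0.05},
    {"c1": 0.10, "c2": 0.15, "c3": 0.75},
    {"c1": 0.30, "c2": 0.60, "c3": 0.10},
]

# One label per observation; this becomes the attribute `C' to be predicted.
labels = [most_probable_class(m) for m in observations]
print(labels)
```

The resulting list has one entry per observation and can be appended as an extra column to the other attributes.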

(b) Use C5 to form a classification-tree to predict `C' from the other attributes.
If `C' has too many^[*] values, that is, too high an arity, you might need to reduce the number of values, i.e. force Snob to produce fewer classes, e.g. by stopping it after fewer adjust cycles than normal, or by merging some classes.
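One possible way to merge classes and prepare C5 input is sketched below. The merging rule (keep the most frequent classes, lump the rest into a class called `other`) is only an assumption, as are the attribute names `bytes` and `method`; the real cgi-bin attributes should be substituted. The layout follows the usual C5 convention of a `.names` file that names the target first and then defines each attribute in the order of the columns in the `.data` file; check it against the tutorial before relying on it.

```python
from collections import Counter

def merge_rare_classes(labels, max_classes, other="other"):
    """Keep the (max_classes - 1) most frequent labels; map the rest to `other`."""
    keep = {lab for lab, _ in Counter(labels).most_common(max_classes - 1)}
    return [lab if lab in keep else other for lab in labels]

def c5_names_text(target, target_values, attributes):
    """Build .names text: target name first, then attributes, target defined last."""
    lines = [f"{target}."]
    for name, kind in attributes:
        # kind is either the string "continuous" or a list of discrete values
        spec = kind if isinstance(kind, str) else ", ".join(kind)
        lines.append(f"{name}: {spec}.")
    # The target is the last column of each row in the .data file.
    lines.append(f"{target}: {', '.join(target_values)}.")
    return "\n".join(lines) + "\n"

# Hypothetical labels: four Snob classes reduced to at most three values.
labels = ["c1", "c1", "c1", "c2", "c2", "c3", "c4"]
merged = merge_rare_classes(labels, max_classes=3)

# Hypothetical attributes; replace with the real ones from prac 1.
attributes = [("bytes", "continuous"), ("method", ["GET", "POST"])]
names_text = c5_names_text("C", sorted(set(merged)), attributes)
print(names_text)
```

Each line of the matching `.data` file would then hold one observation as comma-separated values in the same order, e.g. `512,GET,c1`.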

(c) Draw the tree (or the top levels if large) and write a short(!) report describing what the tree means for the data set.

(d) If appropriate, perform other similar analyses on transformed attributes, or on extra attributes, e.g. whether `address' belongs to a `commercial / educational / other' ISP.
[15%]
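For part (d), an extra attribute of the `commercial / educational / other' kind could be derived from `address' roughly as follows. This is a sketch only: the rule of looking at the last two labels of the host name for `edu`, `ac`, `com` or `co` is an assumption, and the example addresses are invented.

```python
def isp_category(address):
    """Classify a host name as 'educational', 'commercial' or 'other'.

    Assumed rule: inspect only the last two dot-separated labels,
    so that both 'x.edu' and 'x.edu.au' count as educational.
    """
    tail = address.lower().rstrip(".").split(".")[-2:]
    if "edu" in tail or "ac" in tail:
        return "educational"
    if "com" in tail or "co" in tail:
        return "commercial"
    return "other"

# Invented example addresses, one per case.
examples = ["www.monash.edu.au", "proxy.example.com",
            "host.example.co.uk", "dialup.example.net", "10.0.0.1"]
categories = [isp_category(a) for a in examples]
print(list(zip(examples, categories)))
```

Numeric addresses fall into `other` under this rule; whether that is sensible for the data set is a matter of judgement.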

[*] Use judgement.
