Prac' 2 CSE454 CSSE Monash Semester 1, 2003
Due (CSSE office) noon, Friday, week 12, 30 May 2003.
This prac' involves using the C5 classification- (decision-) tree
program, available on nexus; see the C5 tutorial.
- Use C5 to analyse one of the data sets in the .../c5/Data/
directory:
- anneal - Mark, Chong, Owen(R)
- breast-cancer - Dhananjay, Shannon, David(C)
- credit - Simon, Owen(W), Stephen
- genetics - Alex, Andrew, David(W)
- hypothyroid - Timothy, Vesna, and anyone else not listed
- letter - Jonathan, Paul
- sonar - Sascha, Stewart
On one page, draw the tree (nicely!), or just its top levels if the
whole tree is too big, and
describe what the tree means for the data set.
[5%]
- (a)
Take your ``best''[*] unsupervised classification
of the cgi-bin data from
prac 1.
You need to choose a good value for `k' and
you might use something from your
``... further analysis, e.g. (a) on
different attributes, ...''
Take the most probable class, `C', output by Snob for
each observation as the attribute to be predicted.
(If your answer to prac 1 ``didn't go too well'', you
should probably ``improve'' it now.)
(b) Use C5 to form a classification-tree to predict `C'
from the other attributes.
If `C' has too many[*] values, i.e. too high an arity,
you might need to reduce the number of values,
either by forcing Snob to produce fewer classes,
e.g. by stopping it after fewer adjust cycles than normal,
or by merging some classes.
(c) Draw the tree (or the top levels if large)
and write a short(!) report describing
what the tree means for the data set.
(d) If appropriate, perform other similar analyses on
transformed attributes or on extra attributes,
e.g. whether `address' belongs to a `commercial/educational/other' ISP.
[15%]
[*] Use judgement.
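For part (a), taking the most probable class for each observation is an argmax over that observation's class-membership probabilities. The sketch below is a minimal illustration. It assumes you have already parsed Snob's output into one dictionary per observation, mapping a class label to its membership probability. That parsed form and the names `most_probable_class' and `label_observations' are hypothetical; they are not part of Snob.

```python
from typing import Dict, List


def most_probable_class(memberships: Dict[str, float]) -> str:
    """Return the class label with the highest membership probability.

    Ties are broken by the smaller label so the result is deterministic.
    (The dict-of-probabilities input is an assumed, pre-parsed form of
    Snob's output, not Snob's own format.)
    """
    if not memberships:
        raise ValueError("observation has no class memberships")
    return min(memberships, key=lambda c: (-memberships[c], c))


def label_observations(all_memberships: List[Dict[str, float]]) -> List[str]:
    """Give every observation its predicted attribute `C'."""
    return [most_probable_class(m) for m in all_memberships]


if __name__ == "__main__":
    # Hypothetical membership probabilities for three observations.
    example = [
        {"1": 0.70, "2": 0.20, "3": 0.10},
        {"1": 0.05, "2": 0.15, "3": 0.80},
        {"1": 0.45, "2": 0.45, "3": 0.10},  # tie between classes 1 and 2
    ]
    print(label_observations(example))  # ['1', '3', '1']
```

The resulting list becomes the `C' column that C5 is asked to predict in part (b).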
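One simple way to merge classes in part (b) is to fold every class that has only a few members into a single catch-all value. This size-based rule is only an illustration. It is not the only sensible merge; you might instead merge classes whose Snob descriptions are similar. The function `merge_small_classes' and its `min_count' threshold are hypothetical.

```python
from collections import Counter
from typing import List


def merge_small_classes(labels: List[str], min_count: int,
                        merged_label: str = "other") -> List[str]:
    """Relabel every class with fewer than min_count members as merged_label.

    This lowers the arity of `C' while leaving the larger classes untouched.
    """
    counts = Counter(labels)
    if merged_label in counts:
        # Avoid silently pooling small classes into an existing real class.
        raise ValueError(f"{merged_label!r} is already a class label")
    return [lab if counts[lab] >= min_count else merged_label for lab in labels]


if __name__ == "__main__":
    labels = ["1", "1", "1", "2", "2", "3", "4"]
    merged = merge_small_classes(labels, min_count=2)
    print(merged)                                    # ['1', '1', '1', '2', '2', 'other', 'other']
    print(len(set(labels)), "->", len(set(merged)))  # 4 -> 3
```

Choosing `min_count' is a judgement call. It trades a smaller, more readable tree against losing distinctions between genuine small classes.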
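To run C5 in part (b), the labelled observations must be written in C5's input form. The sketch below assumes the usual C5 layout; check the details against the C5 tutorial.

- A `<stem>.names' file. Its first entry names the target attribute, and its remaining entries declare each attribute either as `continuous' or as a list of discrete values. Each entry ends with a period, and `|' starts a comment.
- A `<stem>.data' file, with one comma-separated case per line and `?' for unknown values.

The function `write_c5_files', the file stem `cgi' and the attribute names `hour' and `domain' are hypothetical. Discrete values are assumed not to contain commas, colons, periods or `|'.

```python
from typing import Dict, Sequence, Tuple


def write_c5_files(stem: str,
                   attributes: Sequence[Tuple[str, bool]],
                   rows: Sequence[Dict[str, str]],
                   target: str = "C") -> None:
    """Write <stem>.names and <stem>.data with `target' as the class attribute.

    attributes: (name, is_continuous) pairs in column order, including target.
    rows: one dict per observation; absent values are written as '?'.
    (Assumes the usual C5 file layout; see the C5 tutorial.)
    """
    with open(stem + ".names", "w") as names:
        # First entry: the attribute to be predicted ('|' starts a comment).
        names.write(f"{target}.  | attribute to be predicted\n\n")
        for name, is_continuous in attributes:
            if is_continuous:
                names.write(f"{name}: continuous.\n")
                continue
            # Discrete attribute: declare every known value seen in the data.
            values = sorted({r[name] for r in rows if r.get(name, "?") != "?"})
            if not values:
                raise ValueError(f"discrete attribute {name!r} has no known values")
            names.write(f"{name}: {', '.join(values)}.\n")
    with open(stem + ".data", "w") as data:
        for r in rows:
            data.write(",".join(r.get(name, "?") for name, _ in attributes) + "\n")


if __name__ == "__main__":
    rows = [
        {"hour": "13", "domain": "edu", "C": "2"},
        {"hour": "2", "domain": "com", "C": "1"},
        {"hour": "?", "C": "2"},  # unknown hour and domain
    ]
    write_c5_files("cgi", [("hour", True), ("domain", False), ("C", False)], rows)
```

After this, run C5 on the `cgi' stem as described in the tutorial.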
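The example in part (d) turns `address' into a `commercial/educational/other' ISP attribute. It can be approximated from the host name's domain suffix. The suffix lists below are illustrative assumptions, not a complete rule set. A numeric IP address is simply mapped to `other', because this sketch does no reverse-DNS lookup. `isp_category' is a hypothetical helper.

```python
import ipaddress

# Illustrative suffix lists only (an assumption, not an official classification).
EDUCATIONAL_SUFFIXES = (".edu", ".edu.au", ".ac.uk", ".ac.nz")
COMMERCIAL_SUFFIXES = (".com", ".com.au", ".net", ".net.au", ".co.uk", ".co.nz")


def isp_category(address: str) -> str:
    """Map a client address to 'commercial', 'educational' or 'other'."""
    host = address.strip().lower().rstrip(".")
    try:
        ipaddress.ip_address(host)
        return "other"  # numeric address: no reverse-DNS lookup is attempted
    except ValueError:
        pass  # not an IP literal, so treat it as a host name
    if host.endswith(EDUCATIONAL_SUFFIXES):
        return "educational"
    if host.endswith(COMMERCIAL_SUFFIXES):
        return "commercial"
    return "other"


if __name__ == "__main__":
    for a in ["www.csse.monash.edu.au", "dialup7.example.net.au",
              "130.194.1.1", "example.org"]:
        print(a, "->", isp_category(a))
```

The new attribute can then be added as an extra discrete column alongside the original attributes before running C5 again.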
© L. Allison,
School of Computer Science and Software Engineering,
Monash University, Australia 3800.