Homework78 : FFT
Now you build a learner. An FFT tree which, as you recall contains nodes tiwh two kids:
- One is an exit node and has the form `if x > 7 then y'.
- Note that the "arity" of the exit test is one.
- That is, we test for
x>7
and notx>7 AND y<3
.
- The other is
other
. Any data that fails the exit test goes intoother
(and FFT trees extended by recursing on that data). - FFT's are built to a maximum depth of, say,
d=4
. - FTT's run over discrete and continuous attributes
- For discrete attributes, the exit test will be
x=value
- For contuous attributes, the exit test will be
x<=10
orx>10
.- And just to keep it simple, we'll only try to break numerics at median values
- For discrete attributes, the exit test will be
The core algorithm is very simple:
- Divide all independent colums into
cuts
. - Sort all
cuts
by the average score of the dependent variable seen in eachcut
. - Split the data into (a) that which falls into the best
cut
and (b)other
. - Print out the best
cut
as an exit rule - Recurse on
other
.
Here's what I get whet I take
auto and run it through
dom
and then `fft
. The numbers at the end of each line is the mean value of the depedent
value in each cut
. Note that the best cut we see gets us to examples with an average
dom
score of 74.
if origin is 3 ==> 74
else %cylinders is 4 ==> 70
else %cylinders is 5 ==> 65
else %cylinders is 6 ==> 39
Here's another run using the same data but this time I discretize the numerics using unsuper. Now we get much better ressults (note our best gets us to 92). So the lesson here is that median cuts (done above) can be much worse that unsupervised discretized cuts (done below).
if horsepower is ..65 ==> 92
else horsepower is 66..69 ==> 86
else displacement is 97.5..101 ==> 82
else displacement is 85..91 ==> 79
What to hand in
The usual drill. A new sub-directory. Python source coe. and Example out .txt
file.
Task7: FFT, part1 (due this week).
In task1, do not worry about unsupervised descritization or building the tree. That is next week.
Using the code at fft, generate the cuts and sort them by their mean goal score. Print out what you would use for the first cut of the first part of the FFT tree.
-
For weatherLong, the best cut I find is
outlook=overcast
which has a mean dom score of 53 -
For auto, the best cut I find (without unsuper.s
origin=3
with an average dom score of 74.
Don't worry if you do not exactly my best cuts. Near enough will do.
Task8: FFT, part2 (due this week).
Add in unsupervised descritization. And build the FFT trees recursively (where at each level, find the new best cuts).