Homework78 : FFT

Now you build a learner. An FFT tree which, as you recall contains nodes tiwh two kids:

One is an exit node and has the form `if x > 7 then y'.
- Note that the "arity" of the exit test is one.
- That is, we test for x>7 and not x>7 AND y<3.
The other is other. Any data that fails the exit test goes into other (and FFT trees extended by recursing on that data).
FFT's are built to a maximum depth of, say, d=4.
FTT's run over discrete and continuous attributes
- For discrete attributes, the exit test will be x=value
- For contuous attributes, the exit test will be x<=10 or x>10.
  - And just to keep it simple, we'll only try to break numerics at median values

The core algorithm is very simple:

Divide all independent colums into cuts.
Sort all cuts by the average score of the dependent variable seen in each cut.
Split the data into (a) that which falls into the best cut and (b) other.
Print out the best cut as an exit rule
Recurse on other.

Here's what I get whet I take auto and run it through dom and then `fft. The numbers at the end of each line is the mean value of the depedent value in each cut. Note that the best cut we see gets us to examples with an average dom score of 74.

if      origin  is  3   ==> 74
else    %cylinders  is  4   ==> 70
else    %cylinders  is  5   ==> 65
else    %cylinders  is  6   ==> 39

Here's another run using the same data but this time I discretize the numerics using unsuper. Now we get much better ressults (note our best gets us to 92). So the lesson here is that median cuts (done above) can be much worse that unsupervised discretized cuts (done below).

if      horsepower  is  ..65    ==> 92
else    horsepower  is  66..69  ==> 86
else    displacement    is  97.5..101   ==> 82
else    displacement    is  85..91  ==> 79

What to hand in

The usual drill. A new sub-directory. Python source coe. and Example out .txt file.

Task7: FFT, part1 (due this week).

In task1, do not worry about unsupervised descritization or building the tree. That is next week.

Using the code at fft, generate the cuts and sort them by their mean goal score. Print out what you would use for the first cut of the first part of the FFT tree.

For weatherLong, the best cut I find is outlook=overcast which has a mean dom score of 53
For auto, the best cut I find (without unsuper.s origin=3 with an average dom score of 74.

Don't worry if you do not exactly my best cuts. Near enough will do.

Task8: FFT, part2 (due this week).

Add in unsupervised descritization. And build the FFT trees recursively (where at each level, find the new best cuts).

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search