Homework6
What to hand in
The usual drill. A new sub-directory. Python source coe. and Example out .txt
file.
Task1: Supervised Domination
Add the dom score to a data set, then supervise discretize the independent numeric attribites
Test that code on 2 files: weatherLong and auto.
Code
- See super.lua
- See config.lua
Test1
Output, from weatherLong
, affter it is get a dom score (don't worry
if you don't get exactly this):
-- $temp ----------
|.. 64..85
|.. |.. 64..69 = 69
|.. |.. 70..85
|.. |.. |.. 70..72 = 9
|.. |.. |.. 75..85 = 48
%outlook temp <humid wind !play >dom
over ..69 65 TRUE yes 0.93
over ..69 65 TRUE yes 0.91
rainy ..69 70 TRUE no 0.75
rainy ..69 70 TRUE no 0.73
rainy ..69 80 FALSE yes 0.41
rainy ..69 80 FALSE yes 0.48
sunny ..69 70 FALSE yes 0.65
sunny ..69 70 FALSE yes 0.72
rainy 70..72 96 FALSE yes 0
rainy 70..72 96 FALSE yes 0
rainy 70..72 91 TRUE no 0.1
rainy 70..72 91 TRUE no 0.12
over 70..72 90 TRUE yes 0.23
sunny 70..72 95 FALSE no 0.04
sunny 70..72 95 FALSE no 0.07
over 70..72 90 TRUE yes 0.17
rainy 75.. 80 FALSE yes 0.53
rainy 75.. 80 FALSE yes 0.56
sunny 75.. 70 TRUE yes 0.74
sunny 75.. 70 TRUE yes 0.83
sunny 75.. 90 TRUE no 0.23
sunny 75.. 90 TRUE no 0.1
over 75.. 75 FALSE yes 0.67
over 75.. 75 FALSE yes 0.69
over 75.. 86 FALSE yes 0.3
over 75.. 86 FALSE yes 0.36
sunny 75.. 85 FALSE no 0.43
sunny 75.. 85 FALSE no 0.39
Test2
Output, from auto
, affter it is get a dom score (don't worry
if you don't get exactly this).
What I am showing here is just the ranges found by the recursive descent:
- $displacement ----------
|.. 68..455
|.. |.. 68..156
a |.. |.. |.. 68..105 = 83
b |.. |.. |.. 105..156 = 60
|.. |.. 156..455
c |.. |.. |.. 156..262 = 38
|.. |.. |.. 262..455
d |.. |.. |.. |.. 262..351 = 16
e |.. |.. |.. |.. 351..455 = 7
-- $horsepower ----------
|.. 46..230
|.. |.. 46..96
|.. |.. |.. 46..70
a |.. |.. |.. |.. 46..60 = 95
b |.. |.. |.. |.. 61..70 = 88
|.. |.. |.. 70..96
c |.. |.. |.. |.. 70..82 = 71
d |.. |.. |.. |.. 83..96 = 58
|.. |.. 96..230
e |.. |.. |.. 96..125 = 37
|.. |.. |.. 125..230
f |.. |.. |.. |.. 125..145 = 20
|.. |.. |.. |.. 145..230
g |.. |.. |.. |.. |.. 145..170 = 12
h |.. |.. |.. |.. |.. 170..230 = 7
-- $model ----------
|.. 70..82
a |.. |.. 70..76 = 39
|.. |.. 76..82
b |.. |.. |.. 76..79 = 50
c |.. |.. |.. 79..82 = 74
Task2: Find the Splitter
The splitter
is the attribute which:
- if we split on that attribute,
- the expected value of the standard deviation after the split is less.
As an example of expected value,
in the above for $model
, there are splits a,b,c
of sizes 39,50,74.
- Suppose the variance of the dom score in those splits is
Va, Vb, Vc
- 30/n * Va + 50/n * Vb + 74/n * Vc is the expected value of
$model
after the split is (where n = 39+50+74).