Homework 5
What to hand in
The usual drill: a new sub-directory containing your Python source code and an example output .txt file.
Part 1: Domination (Easy)
Port the code dom.lua to Python. Using that code, add the dom score to each row of a data file.
Test that code on 2 files: weatherLong and auto.
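The sketch below shows one way a dom score could be computed in Python. It is not a port of dom.lua, so check it against that file. It assumes that goal columns are the ones whose names start with `<` (minimize) or `>` (maximize), that goal values are normalized to 0..1, that two rows are compared with an exponential-loss ("continuous domination") test, and that a row's score is the fraction of randomly sampled other rows it beats. The names `goal_weights`, `dominates`, `dom_scores`, and the `samples` and `seed` parameters are all hypothetical.

```python
import random

def goal_weights(names):
    """Map column index -> weight: -1 for '<' (minimize), +1 for '>' (maximize)."""
    return {i: (-1 if name[0] == "<" else 1)
            for i, name in enumerate(names) if name[0] in "<>"}

def dominates(row1, row2, weights, lo, hi):
    """True if row1 beats row2 under an exponential-loss comparison (assumed)."""
    n = len(weights)
    s1 = s2 = 0.0
    for c, w in weights.items():
        span = (hi[c] - lo[c]) or 1e-32      # guard against a zero range
        a = (row1[c] - lo[c]) / span         # normalize both values to 0..1
        b = (row2[c] - lo[c]) / span
        s1 -= 10 ** (w * (a - b) / n)
        s2 -= 10 ** (w * (b - a) / n)
    return s1 / n < s2 / n

def dom_scores(names, rows, samples=100, seed=1):
    """Return one score per row: the fraction of sampled rivals that row beats."""
    rng = random.Random(seed)
    weights = goal_weights(names)
    lo = {c: min(r[c] for r in rows) for c in weights}
    hi = {c: max(r[c] for r in rows) for c in weights}
    scores = []
    for i, row in enumerate(rows):
        wins = 0
        for _ in range(samples):
            j = rng.randrange(len(rows) - 1)
            if j >= i:
                j += 1                       # pick any row other than row i
            wins += dominates(row, rows[j], weights, lo, hi)
        scores.append(round(wins / samples, 2))
    return scores
```

Because rivals are sampled at random, identical rows can receive slightly different scores, and scores change from run to run unless the seed is fixed.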
Test1
Input, from weatherLong:
%outlook $temp <humid wind !play
over 64 65 TRUE yes
over 64 65 TRUE yes
over 72 90 TRUE yes
over 72 90 TRUE yes
over 81 75 FALSE yes
over 81 75 FALSE yes
over 83 86 FALSE yes
over 83 86 FALSE yes
sunny 69 70 FALSE yes
sunny 69 70 FALSE yes
rainy 65 70 TRUE no
rainy 65 70 TRUE no
sunny 75 70 TRUE yes
sunny 75 70 TRUE yes
rainy 75 80 FALSE yes
rainy 75 80 FALSE yes
rainy 68 80 FALSE yes
rainy 68 80 FALSE yes
sunny 85 85 FALSE no
sunny 85 85 FALSE no
sunny 80 90 TRUE no
sunny 80 90 TRUE no
rainy 71 91 TRUE no
rainy 71 91 TRUE no
sunny 72 95 FALSE no
sunny 72 95 FALSE no
rainy 70 96 FALSE yes
rainy 70 96 FALSE yes
Output. Note that the maximum dom score is seen for the lowest humidity:
%outlook $temp <humid wind !play >dom
over 64 65 TRUE yes 0.93
over 64 65 TRUE yes 0.91
over 72 90 TRUE yes 0.23
over 72 90 TRUE yes 0.17
over 81 75 FALSE yes 0.67
over 81 75 FALSE yes 0.69
over 83 86 FALSE yes 0.3
over 83 86 FALSE yes 0.36
sunny 69 70 FALSE yes 0.65
sunny 69 70 FALSE yes 0.72
rainy 65 70 TRUE no 0.75
rainy 65 70 TRUE no 0.73
sunny 75 70 TRUE yes 0.74
sunny 75 70 TRUE yes 0.83
rainy 75 80 FALSE yes 0.53
rainy 75 80 FALSE yes 0.56
rainy 68 80 FALSE yes 0.41
rainy 68 80 FALSE yes 0.48
sunny 85 85 FALSE no 0.39
sunny 85 85 FALSE no 0.43
sunny 80 90 TRUE no 0.23
sunny 80 90 TRUE no 0.1
rainy 71 91 TRUE no 0.12
rainy 71 91 TRUE no 0.1
sunny 72 95 FALSE no 0.04
sunny 72 95 FALSE no 0.07
rainy 70 96 FALSE yes 0
rainy 70 96 FALSE yes 0
Test2
- Input: auto
- Output: Sort by the last column (the dom score) and show the first and last 10 lines.
The output should look something like the listing introduced by "Here's the same data, with dom score added. Shown here are the 5 best and worst rows." in the domination lecture.
If you do Test2 correctly, then the highest dom scores should be associated with rows with the least weight, most acceleration, and most mpg (and the lowest dom scores should be associated with the reverse).
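The reporting step for Test2 could be sketched as follows. This assumes each row is a list whose last cell is the numeric dom score, and it sorts from highest to lowest score (the sort direction is a guess, since the handout does not state one). The helper name `best_and_worst` is hypothetical.

```python
def best_and_worst(rows, n=10):
    """Sort rows by their last cell (the dom score), highest first,
    and return the first n and last n rows."""
    ordered = sorted(rows, key=lambda row: row[-1], reverse=True)
    return ordered[:n], ordered[-n:]
```

If the file has fewer than 2n rows, the two slices overlap, which is harmless for a report.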
Part 2: Unsupervised discretization (Tricky)
Port the code unsuper.lua to Python and test it on weatherLong. This code finds all numeric independent columns, then splits them to minimize the expected value of the standard deviation of those columns after the splits.
For example, input:
%outlook, $temp, <humid, wind, !play
over, 64, 65, TRUE, yes
over, 64, 65, TRUE, yes
over, 72, 90, TRUE, yes
over, 72, 90, TRUE, yes
over, 81, 75, FALSE, yes
over, 81, 75, FALSE, yes
over, 83, 86, FALSE, yes
over, 83, 86, FALSE, yes
sunny, 69, 70, FALSE, yes
sunny, 69, 70, FALSE, yes
rainy, 65, 70, TRUE, no
rainy, 65, 70, TRUE, no
sunny, 75, 70, TRUE, yes
sunny, 75, 70, TRUE, yes
rainy, 75, 80, FALSE, yes
rainy, 75, 80, FALSE, yes
rainy, 68, 80, FALSE, yes
rainy, 68, 80, FALSE, yes
sunny, 85, 85, FALSE, no
sunny, 85, 85, FALSE, no
sunny, 80, 90, TRUE, no
sunny, 80, 90, TRUE, no
rainy, 71, 91, TRUE, no
rainy, 71, 91, TRUE, no
sunny, 72, 95, FALSE, no
sunny, 72, 95, FALSE, no
rainy, 70, 96, FALSE, yes
rainy, 70, 96, FALSE, yes
Output (where x..y means "x to y", ..x means "up to x", and x.. means "x and above"):
%outlook temp <humid wind !play
over ..69 65 TRUE yes
over ..69 65 TRUE yes
rainy ..69 70 TRUE no
rainy ..69 70 TRUE no
rainy ..69 80 FALSE yes
rainy ..69 80 FALSE yes
sunny ..69 70 FALSE yes
sunny ..69 70 FALSE yes
rainy 70..72 96 FALSE yes
rainy 70..72 96 FALSE yes
rainy 70..72 91 TRUE no
rainy 70..72 91 TRUE no
over 70..72 90 TRUE yes
sunny 70..72 95 FALSE no
sunny 72..75 95 FALSE no
over 72..75 90 TRUE yes
rainy 72..75 80 FALSE yes
rainy 72..75 80 FALSE yes
sunny 72..75 70 TRUE yes
sunny 72..75 70 TRUE yes
sunny 80.. 90 TRUE no
sunny 80.. 90 TRUE no
over 80.. 75 FALSE yes
over 80.. 75 FALSE yes
over 80.. 86 FALSE yes
over 80.. 86 FALSE yes
sunny 80.. 85 FALSE no
sunny 80.. 85 FALSE no
Note: your results may differ somewhat from mine due to your different engineering decisions. That's cool.
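As a starting point, here is a rough sketch of splitting one numeric column so as to minimize the expected standard deviation. It is not a port of unsuper.lua and will not necessarily reproduce the bins shown above. Its engineering decisions are assumptions: a minimum bin size of the square root of the number of values, a `margin` factor that a split must beat before it is accepted, population standard deviation, and no cuts between equal values. All function and parameter names are hypothetical.

```python
import math
import statistics

def sd(xs):
    """Population standard deviation; 0 for fewer than two values."""
    return statistics.pstdev(xs) if len(xs) > 1 else 0.0

def expected_sd(left, right):
    """Size-weighted mean of the two halves' standard deviations."""
    n = len(left) + len(right)
    return len(left) / n * sd(left) + len(right) / n * sd(right)

def best_cut(xs, enough, margin):
    """Index i where xs[:i] / xs[i:] gives the lowest expected sd, or None
    if no split (with at least `enough` values per side) beats not splitting."""
    best, cut = sd(xs), None
    for i in range(enough, len(xs) - enough + 1):
        if xs[i - 1] == xs[i]:
            continue                          # never cut between equal values
        tmp = expected_sd(xs[:i], xs[i:]) * margin
        if tmp < best:
            best, cut = tmp, i
    return cut

def split(xs, enough=None, margin=1.05):
    """Recursively split the sorted values; return a list of (lo, hi) bins."""
    xs = sorted(xs)
    if enough is None:
        enough = max(1, int(math.sqrt(len(xs))))
    def recurse(part):
        cut = best_cut(part, enough, margin)
        if cut is None:
            return [(part[0], part[-1])]
        return recurse(part[:cut]) + recurse(part[cut:])
    return recurse(xs)

def labels(bins):
    """Name bins in the '..x', 'x..y', 'x..' style used in the output above."""
    last = len(bins) - 1
    out = []
    for k, (lo, hi) in enumerate(bins):
        if k == 0:
            out.append(f"..{hi}")
        elif k == last:
            out.append(f"{lo}..")
        else:
            out.append(f"{lo}..{hi}")
    return out
```

To discretize a table, one would run `split` on each numeric independent column and replace every value with the label of the bin that contains it. Rescanning the slices at every candidate cut keeps the sketch short but makes it quadratic; incremental updates of the standard deviation would be faster on large files.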