Homework 5

What to hand in

The usual drill. A new sub-directory. Python source code. An example output .txt file.

Part 1: Domination (Easy)

Port the code dom.lua to Python. Using that code, add the dom score to each row of a data file.

Test that code on 2 files: weatherLong and auto.
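
One way to compute the dom score is sketched below. dom.lua is not reproduced in this handout, so the sketch rests on assumptions: that goal columns are those marked < (minimize) or > (maximize), that one row is compared to another with a continuous domination test in the style of Zitzler's indicator, and that a row's dom score is the fraction of randomly sampled rows it dominates. The names normalize, dominates and dom_scores, and the samples and seed parameters, are hypothetical.

```python
import random

def normalize(x, lo, hi):
    """Map x into 0..1 using the column's min and max."""
    return (x - lo) / (hi - lo + 1e-32)

def dominates(row1, row2, goals, ranges):
    """True if row1 dominates row2 (assumed continuous domination test).
    goals maps column index -> weight: -1 to minimize, +1 to maximize.
    ranges maps column index -> (min, max) of that column."""
    n = len(goals)
    s1 = s2 = 0.0
    for col, w in goals.items():
        lo, hi = ranges[col]
        a = normalize(row1[col], lo, hi)
        b = normalize(row2[col], lo, hi)
        s1 -= 10 ** (w * (a - b) / n)
        s2 -= 10 ** (w * (b - a) / n)
    return s1 / n < s2 / n

def dom_scores(rows, goals, samples=100, seed=1):
    """For each row, the fraction of randomly sampled rows it dominates."""
    rng = random.Random(seed)
    ranges = {c: (min(r[c] for r in rows), max(r[c] for r in rows))
              for c in goals}
    scores = []
    for row in rows:
        wins = sum(dominates(row, rng.choice(rows), goals, ranges)
                   for _ in range(samples))
        scores.append(round(wins / samples, 2))
    return scores

# Tiny demo: column 2 plays the role of <humid (minimize).
rows = [["over", 64, 65], ["sunny", 69, 70], ["rainy", 70, 96]]
print(dom_scores(rows, goals={2: -1}))
```

With a single goal such as <humid, a row dominates exactly the rows with strictly higher humidity. That is consistent with the Test1 output: 26 of the 28 rows have humidity above 65, and 26/28 is about 0.93. Because the other rows are sampled at random, duplicate rows can get slightly different scores.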

Test1

Input, from weatherLong:

%outlook       $temp            <humid          wind   !play
over            64              65              TRUE    yes
over            64              65              TRUE    yes
over            72              90              TRUE    yes
over            72              90              TRUE    yes
over            81              75              FALSE   yes
over            81              75              FALSE   yes
over            83              86              FALSE   yes
over            83              86              FALSE   yes
sunny           69              70              FALSE   yes
sunny           69              70              FALSE   yes
rainy           65              70              TRUE    no
rainy           65              70              TRUE    no
sunny           75              70              TRUE    yes
sunny           75              70              TRUE    yes
rainy           75              80              FALSE   yes
rainy           75              80              FALSE   yes
rainy           68              80              FALSE   yes
rainy           68              80              FALSE   yes
sunny           85              85              FALSE   no
sunny           85              85              FALSE   no
sunny           80              90              TRUE    no
sunny           80              90              TRUE    no
rainy           71              91              TRUE    no
rainy           71              91              TRUE    no
sunny           72              95              FALSE   no
sunny           72              95              FALSE   no
rainy           70              96              FALSE   yes
rainy           70              96              FALSE   yes

Output. Note that max dom is seen for lowest humidity:

%outlook  $temp  <humid  wind   !play  >dom
over      64     65      TRUE   yes    0.93
over      64     65      TRUE   yes    0.91
over      72     90      TRUE   yes    0.23
over      72     90      TRUE   yes    0.17
over      81     75      FALSE  yes    0.67
over      81     75      FALSE  yes    0.69
over      83     86      FALSE  yes    0.3
over      83     86      FALSE  yes    0.36
sunny     69     70      FALSE  yes    0.65
sunny     69     70      FALSE  yes    0.72
rainy     65     70      TRUE   no     0.75
rainy     65     70      TRUE   no     0.73
sunny     75     70      TRUE   yes    0.74
sunny     75     70      TRUE   yes    0.83
rainy     75     80      FALSE  yes    0.53
rainy     75     80      FALSE  yes    0.56
rainy     68     80      FALSE  yes    0.41
rainy     68     80      FALSE  yes    0.48
sunny     85     85      FALSE  no     0.39
sunny     85     85      FALSE  no     0.43
sunny     80     90      TRUE   no     0.23
sunny     80     90      TRUE   no     0.1
rainy     71     91      TRUE   no     0.12
rainy     71     91      TRUE   no     0.1
sunny     72     95      FALSE  no     0.04
sunny     72     95      FALSE  no     0.07
rainy     70     96      FALSE  yes    0
rainy     70     96      FALSE  yes    0

Test2

The output should look something like the listing that follows the line "Here's the same data, with dom score added. Shown here are the 5 best and worst rows." in the domination lecture.

If you do Test2 correctly, then the highest dom scores should be associated with rows with least weight, most acceleration and most mpg (and the lowest dom scores with the reverse).

Part 2: Unsupervised discretization (Tricky)

Port the code unsuper.lua to Python and test it on weatherLong. This code finds all numeric independent columns, then splits them to minimize the expected value of the standard deviation of those columns after the splits.
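
The splitting idea can be sketched as a recursive binary cut on one sorted column. unsuper.lua is not shown here, so this is only an assumed reading of it: each candidate cut is scored by the expected standard deviation of the two halves (each half's standard deviation weighted by its share of the rows), and recursion stops when no cut beats the unsplit standard deviation. The names sd, best_cut and split, the min_size rule, and the choice never to cut between equal values are all assumptions of this sketch.

```python
import statistics

def sd(xs):
    """Population standard deviation; 0 for fewer than two values."""
    return statistics.pstdev(xs) if len(xs) > 1 else 0.0

def best_cut(xs, min_size):
    """Index i that minimizes the expected sd of xs[:i] and xs[i:].
    Returns None if no cut improves on the sd of xs itself."""
    n = len(xs)
    best, cut = sd(xs), None
    for i in range(min_size, n - min_size + 1):
        if xs[i] == xs[i - 1]:
            continue  # assumption: never cut between two equal values
        left, right = xs[:i], xs[i:]
        expected = len(left) / n * sd(left) + len(right) / n * sd(right)
        if expected < best:
            best, cut = expected, i
    return cut

def split(values, min_size=None):
    """Return a list of (lo, hi) bins covering the sorted values."""
    xs = sorted(values)
    if min_size is None:
        min_size = max(2, int(len(xs) ** 0.5))  # assumed minimum bin size

    def recurse(lo, hi, out):
        cut = best_cut(xs[lo:hi], min_size)
        if cut is None:
            out.append((xs[lo], xs[hi - 1]))
        else:
            recurse(lo, lo + cut, out)
            recurse(lo + cut, hi, out)
        return out

    return recurse(0, len(xs), [])

print(split([52, 1, 50, 2, 51, 1, 51, 2], min_size=4))  # [(1, 2), (50, 52)]
```

Different stopping rules or minimum bin sizes will give different bins, so output from a sketch like this need not match the listing in this section row for row.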

For example, input:

%outlook, $temp, <humid, wind, !play
over,   64, 65, TRUE,   yes
over,   64, 65, TRUE,   yes
over,   72, 90, TRUE,   yes
over,   72, 90, TRUE,   yes
over,   81, 75, FALSE,  yes
over,   81, 75, FALSE,  yes
over,   83, 86, FALSE,  yes
over,   83, 86, FALSE,  yes
sunny,  69, 70, FALSE,  yes
sunny,  69, 70, FALSE,  yes
rainy,  65, 70, TRUE,   no
rainy,  65, 70, TRUE,   no
sunny,  75, 70, TRUE,   yes
sunny,  75, 70, TRUE,   yes
rainy,  75, 80, FALSE,  yes
rainy,  75, 80, FALSE,  yes
rainy,  68, 80, FALSE,  yes
rainy,  68, 80, FALSE,  yes
sunny,  85, 85, FALSE,  no
sunny,  85, 85, FALSE,  no
sunny,  80, 90, TRUE,   no
sunny,  80, 90, TRUE,   no
rainy,  71, 91, TRUE,   no
rainy,  71, 91, TRUE,   no
sunny,  72, 95, FALSE,  no
sunny,  72, 95, FALSE,  no
rainy,  70, 96, FALSE,  yes
rainy,  70, 96, FALSE,  yes

Output (where x..y means "x to y", ..x means "up to x", and x.. means "x and above"):

%outlook   temp    <humid   wind   !play
over      ..69    65       TRUE   yes
over      ..69    65       TRUE   yes
rainy     ..69    70       TRUE   no
rainy     ..69    70       TRUE   no
rainy     ..69    80       FALSE  yes
rainy     ..69    80       FALSE  yes
sunny     ..69    70       FALSE  yes
sunny     ..69    70       FALSE  yes
rainy     70..72  96       FALSE  yes
rainy     70..72  96       FALSE  yes
rainy     70..72  91       TRUE   no
rainy     70..72  91       TRUE   no
over      70..72  90       TRUE   yes
sunny     70..72  95       FALSE  no
sunny     72..75  95       FALSE  no
over      72..75  90       TRUE   yes
rainy     72..75  80       FALSE  yes
rainy     72..75  80       FALSE  yes
sunny     72..75  70       TRUE   yes
sunny     72..75  70       TRUE   yes
sunny     80..    90       TRUE   no
sunny     80..    90       TRUE   no
over      80..    75       FALSE  yes
over      80..    75       FALSE  yes
over      80..    86       FALSE  yes
over      80..    86       FALSE  yes
sunny     80..    85       FALSE  no
sunny     80..    85       FALSE  no
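
The range labels in the temp column above could be produced by a small helper like the one below. This is a sketch, assuming each bin is a (lo, hi) pair of the smallest and largest values it holds. The names label and labels are hypothetical, as is the choice to print a lone bin as "..".

```python
def label(lo, hi, first=False, last=False):
    """Render one bin: open-ended for the first and last bins."""
    if first and last:
        return ".."          # assumption: a single bin covers everything
    if first:
        return f"..{hi}"     # "up to hi"
    if last:
        return f"{lo}.."     # "lo and above"
    return f"{lo}..{hi}"     # "lo to hi"

def labels(bins):
    """Render every bin in a sorted list of (lo, hi) pairs."""
    last = len(bins) - 1
    return [label(lo, hi, i == 0, i == last)
            for i, (lo, hi) in enumerate(bins)]

print(labels([(64, 69), (70, 72), (72, 75), (80, 85)]))
# ['..69', '70..72', '72..75', '80..']
```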

Note: your results may differ somewhat from mine due to your different engineering decisions. That's cool.