csc 591-024, (8290)
csc 791-024, (8291)
fall 2024, special topics in computer science
Tim Menzies, timm@ieee.org, com sci, nc state
Output from Part 8. Answers to Questions 9, 10, 11, 12, 13, 14, 15, 16. All in one PDF file.
The following works for Linux and Mac. Windows users, please join in a discussion at windows. Or, don’t use Windows and use Codespaces on GitHub.
(If using Codespaces on GitHub, carefully monitor your monthly costs using [this link](https://docs.github.com/en/billing/managing-billing-for-github-codespaces/viewing-your-github-codespaces-usage).)
git clone https://github.com/timm/ezr/
cd ezr
git checkout 24Aug14
Install the htop monitor so you can track executions (Windows users may have an equivalent tool).
Generate a run file for the “clusters2” action.
mkdir -p ~/tmp/clusters2
make Act=clusters2 actb4 > ~/tmp/clusters2.sh
Edit that file. Using the “#” character, comment out every line EXCEPT those that mention SS-*.csv. Also comment out the lines that mention SS-W, SS-X, SS-N (because these are slow to run).
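The hand edit described above can also be scripted. What follows is a minimal Python sketch, assuming that every runnable line names its data file as SS-something.csv and that the slow data sets appear on their lines as SS-W, SS-X, or SS-N. The helper names (`keep`, `comment_out`) and the sample lines are hypothetical and are not part of ezr.

```python
import re

# Data sets that the handout says are slow to run.
SLOW = ("SS-W", "SS-X", "SS-N")

def keep(line):
    """True if the line mentions an SS-*.csv file that is not in the slow list."""
    mentions_ss = re.search(r"SS-\w+\.csv", line) is not None
    is_slow = any(name in line for name in SLOW)
    return mentions_ss and not is_slow

def comment_out(lines):
    """Return the lines with '# ' prefixed to every line that should not run."""
    out = []
    for line in lines:
        already_inert = line.lstrip().startswith("#") or not line.strip()
        out.append(line if keep(line) or already_inert else "# " + line)
    return out

# Hypothetical sample lines; the real generated file will look different.
sample = [
    "some-command data/SS-A.csv",
    "some-command data/SS-W.csv",
    "some-command data/auto93.csv",
]
for line in comment_out(sample):
    print(line)
```

To apply this to the real file, read ~/tmp/clusters2.sh into a list of lines, pass them through `comment_out`, and write the result back. Check the edited file by eye before running it.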
Run that file. You will not see any output for a minute or two.
bash ~/tmp/clusters2.sh
% cd ~/tmp/clusters2
% grep k1 *.csv | cut -d, -f 3 | sort -n > /tmp/k1
% grep k2 *.csv | cut -d, -f 3 | sort -n > /tmp/k2
% grep k3 *.csv | cut -d, -f 3 | sort -n > /tmp/k3
% grep k5 *.csv | cut -d, -f 3 | sort -n > /tmp/k5
% grep mid *.csv | cut -d, -f 3 | sort -n > /tmp/mid
% paste /tmp/k1 /tmp/k2 /tmp/k3 /tmp/k5 /tmp/mid
The results should look like this. Here, we guess a goal value by clustering the data and then, for each test instance, (a) finding its relevant cluster and (b) using either the k=1,2,3,5 closest neighbors in that cluster or the mid-point of that cluster.
k=1 k=2 k=3 k=5 mid
==== ===== ==== ===== =====
-0.42 -0.21 -0.34 -0.28 -0.92
-0.07 -0.10 -0.16 -0.24 -0.62
-0.03 -0.03 -0.09 -0.19 -0.36
-0.02 -0.02 -0.05 -0.19 -0.22
-0.01 -0.01 -0.03 -0.18 -0.18
0.00 0.00 -0.02 -0.06 -0.17
0.00 0.00 -0.01 -0.05 -0.13
0.00 0.00 -0.01 -0.02 -0.12
0.00 0.00 -0.01 -0.01 -0.12
0.00 0.00 0.00 0.00 -0.11
0.00 -0.00 0.00 0.00 -0.11
0.00 -0.00 0.00 -0.00 -0.08
-0.00 -0.00 0.00 -0.00 -0.07
-0.00 -0.00 0.00 -0.00 -0.02
-0.00 -0.00 0.00 -0.00 -0.02
-0.00 0.01 0.00 0.01 -0.02
-0.00 0.01 -0.00 0.01 -0.01
0.01 0.01 -0.00 0.03 0.03
0.01 0.03 0.01 0.04 0.03
0.01 0.05 0.01 0.05 0.07
0.02 0.05 0.01 0.05 0.20
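The guessing scheme described before the table (find the relevant cluster, then use the k closest neighbors or the cluster mid-point) can be sketched as follows. This is an illustration only, not the code in ezr.py: it assumes numeric features, Euclidean distance, the "relevant" cluster being the one with the nearest centroid, and the "mid-point" being the mean goal of the cluster. The names `dist`, `centroid`, and `guess_goal` are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Column-wise mean of a list of numeric tuples."""
    return tuple(sum(col) / len(col) for col in zip(*points))

def guess_goal(clusters, x, k=None):
    """Guess the goal for features x.

    clusters: list of clusters, each a list of (features, goal) pairs.
    k=None  : return the mean goal of the relevant cluster (assumed mid-point).
    k=int   : return the mean goal of the k nearest rows in that cluster.
    """
    # (a) relevant cluster = the one whose centroid is closest to x (an assumption)
    cluster = min(clusters, key=lambda c: dist(x, centroid([f for f, _ in c])))
    if k is None:
        return sum(g for _, g in cluster) / len(cluster)
    # (b) average the goals of the k nearest neighbors inside that cluster
    near = sorted(cluster, key=lambda fg: dist(x, fg[0]))[:k]
    return sum(g for _, g in near) / len(near)

# Made-up data: two clusters of (features, goal) pairs.
clusters = [
    [((0.0, 0.0), 1.0), ((0.0, 1.0), 2.0), ((1.0, 0.0), 3.0)],
    [((10.0, 10.0), 20.0), ((10.0, 11.0), 22.0)],
]
x = (0.1, 0.2)
print(guess_goal(clusters, x, k=1))  # nearest single neighbor
print(guess_goal(clusters, x, k=2))  # mean of the two nearest neighbors
print(guess_goal(clusters, x))       # cluster mid-point
```

Each column of the table above corresponds to one of these settings: k=1, 2, 3, 5, or the mid-point rule.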
These are all “z” scores; i.e. (x - mid)/sd. Note that:
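As a small illustration of the (x - mid)/sd formula, here is a hedged Python sketch. It assumes that mid is the mean and sd is the sample standard deviation; ezr.py may compute these differently, and the function name `zscores` is hypothetical.

```python
import statistics

def zscores(xs):
    """Return (x - mid)/sd for each x, with mid = mean and sd = sample std dev."""
    mid = statistics.mean(xs)
    sd = statistics.stdev(xs)  # needs at least two values
    if sd == 0:
        # All values identical: every score is zero by convention here.
        return [0.0 for _ in xs]
    return [(x - mid) / sd for x in xs]

print(zscores([1, 2, 3]))  # [-1.0, 0.0, 1.0]
```

A score near zero means the value sits close to the mid-point, while a score of -1 or +1 means it is one standard deviation below or above it.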
[...] predict function in ezr.py make a prediction (predict is called inside the clusters2 function)?
[...] def chebyshev. Why do we normalize the goal values?
[...] yes or no for the following. Show all working.
outlook ,temperature ,humidity ,windy ,play!
======= ========== ========= ====== =====
sunny ,hot ,high ,FALSE ,no
sunny ,hot ,high ,TRUE ,no
rainy ,cool ,normal ,TRUE ,no
sunny ,mild ,high ,FALSE ,no
rainy ,mild ,high ,TRUE ,no
overcast ,hot ,high ,FALSE ,yes
rainy ,mild ,high ,FALSE ,yes
rainy ,cool ,normal ,FALSE ,yes
overcast ,cool ,normal ,TRUE ,yes
sunny ,cool ,normal ,FALSE ,yes
rainy ,mild ,normal ,FALSE ,yes
sunny ,mild ,normal ,TRUE ,yes
overcast ,mild ,high ,TRUE ,yes
overcast ,hot ,normal ,FALSE ,yes