csc 591-024, (8290)
csc 791-024, (8291)
fall 2024, special topics in computer science
Tim Menzies, timm@ieee.org, com sci, nc state
(IMPORTANT NOTE: the experimental runs for this one can take a while, especially if you find a mistake and have to start again. Do not make this a last-minute rush job!!!)
Three students from this class in the spring (Jacob, Joshua, and Rohan) claim that:
Use the extension from hw1 to find which data sets have fewer than 6 independent values.
Run the following twice (once for the low-dimensional data sets and once for the others). See what conclusions are found.
Note that the following is quickly written pseudocode. It may have mistakes. You fix them. Have fun!
First you must write an experiment function. That experiment function needs to loop through some options and write to a list of SOME instances (one per option). Note that one treatment must be asIs, which runs over the data and collects all the distances to heaven. This is the baseline result against which everything else will be compared.
d = DATA().adds(csv(the.train))
b4 = [d.chebyshev(row) for row in d.rows]
somes = [stats.SOME(b4, f"asIs,{len(d.rows)}")]
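To make "distance to heaven" concrete, here is a minimal standalone sketch. It assumes (this is not stated above) that each goal column is normalized to 0..1, that "heaven" is 0 for goals to minimize and 1 for goals to maximize, and that the Chebyshev distance is the largest gap between any goal and its heaven. The names norm, chebyshev, and goals are hypothetical and are not the ezr API.

```python
def norm(x, lo, hi):
    """Map x onto 0..1 given the column's observed range (assumed known)."""
    return (x - lo) / (hi - lo + 1e-32)

def chebyshev(row, goals):
    """Largest normalized gap between any goal value and its ideal ("heaven")."""
    return max(abs(g["heaven"] - norm(row[g["at"]], g["lo"], g["hi"]))
               for g in goals)

# Hypothetical goal columns: column 2 is minimized, column 3 is maximized.
goals = [
    {"at": 2, "lo": 0.0, "hi": 100.0, "heaven": 0},
    {"at": 3, "lo": 0.0, "hi": 10.0,  "heaven": 1},
]
rows = [[1, 2, 50.0, 10.0],
        [3, 4, 10.0, 2.0]]

# The asIs baseline: one distance per row, smaller is better.
b4 = [chebyshev(row, goals) for row in rows]
```

Under these assumptions the first row is held back by its minimized goal and the second by its maximized goal, so the baseline list records the worst goal of each row.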
Then you need to loop through some options to collect some numbers into a list. That list gets added to a SOME with a name that identifies the treatment. In the following, see somes +=:
from math import exp
from time import time

rnd = lambda z: z  # hook for rounding results; identity for now
scoring_policies = [
  ('exploit', lambda B, R: B - R),
  ('explore', lambda B, R: (exp(B) + exp(R)) / (1E-30 + abs(exp(B) - exp(R))))]

for what, how in scoring_policies:
  for the.Last in [0, 20, 30, 40]:
    for the.branch in [False, True]:
      start = time()
      result = []  # one distance to heaven per repeat
      runs = 0     # running total of len(tmp); averaged into the tag below
      for _ in range(repeats):
        tmp = d.shuffle().activeLearning(score=how)
        runs += len(tmp)
        result += [rnd(d.chebyshev(tmp[0]))]

      # the treatment name, then the mean number of rows per run
      pre = f"{what}/b={the.branch}" if the.Last > 0 else "rrp"
      tag = f"{pre},{int(runs/repeats)}"
      print(tag, f": {(time() - start) / repeats:.2f} secs")
      somes += [stats.SOME(result, tag)]
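The two scoring policies can be compared in isolation. The sketch below assumes that B and R are scores for how much a candidate looks like the "best" or the "rest" examples (the text above does not define them, so treat that reading as an assumption). The sample values are made up for illustration.

```python
from math import exp

def exploit(B, R):
    """Prefer candidates that look clearly better than the rest."""
    return B - R

def explore(B, R):
    """Prefer candidates where B and R are nearly tied (most uncertain)."""
    return (exp(B) + exp(R)) / (1e-30 + abs(exp(B) - exp(R)))

sure   = (0.90, 0.10)  # hypothetical candidate: clearly "best"
unsure = (0.50, 0.49)  # hypothetical candidate: nearly tied

exploit_pick = max([sure, unsure], key=lambda c: exploit(*c))
explore_pick = max([sure, unsure], key=lambda c: explore(*c))
```

With these values, exploit selects the confident candidate while explore selects the nearly tied one, because the small denominator in explore inflates the score when B and R are close.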
When all the looping is done, you have to print the result:
stats.report(somes, 0.01)
(In the above, “0.01” controls the size of the smallest difference we can print in the output.)
The scripts you write for these experiments are always quirky and complex. It is very easy to make mistakes and have to throw out days of compute. So experimental scripts have to be commissioned (tested) before you commit to long runs.
Also: add in tests to check that the expected stuff is actually happening. e.g.
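As one possible shape for such checks, here is a small sketch that inspects the collected numbers before any report is printed. It assumes results are available as a plain dict from tag to list of numbers, that every non-baseline treatment should hold exactly one number per repeat, and that distances fall in 0..1. The function check_treatments and these expectations are hypothetical, not part of ezr.

```python
def check_treatments(results, repeats, n_rows):
    """results maps tag -> list of numbers; raises AssertionError on surprises."""
    assert any(tag.startswith("asIs") for tag in results), "no asIs baseline"
    for tag, nums in results.items():
        name = tag.split(",")[0]
        # asIs scores every row; other treatments score once per repeat
        expected = n_rows if name == "asIs" else repeats
        assert len(nums) == expected, f"{tag}: {len(nums)} values, expected {expected}"
        assert all(0 <= x <= 1 for x in nums), f"{tag}: distance outside 0..1"
    return True

# Made-up numbers, only to show the call.
results = {"asIs,3": [0.2, 0.5, 0.9],
           "exploit/b=True,29": [0.1, 0.2]}
ok = check_treatments(results, repeats=2, n_rows=3)
```

Failing fast on a wrong count is much cheaper than discovering, days later, that a treatment silently ran zero times.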
The Makefile has a tool for generating a todo file for running multiple experiments. Let's say your experiment can be called from the command line with -e branch.
For example:
make Act=branch actb4 # this outputs
mkdir -p .../tmp/branch
rm .../tmp/branch/*
python3 .../ezr.py -D -t .../Apache_AllMeasurements.csv -e branch | tee .../tmp/branch/Apache_AllMeasurements.csv &
python3 .../ezr.py -D -t .../HSMGP_num.csv -e branch | tee .../tmp/branch/HSMGP_num.csv &
python3 .../ezr.py -D -t ../SQL_AllMeasurements.csv -e branch | tee .../tmp/branch/SQL_AllMeasurements.csv &
...
You can catch the output of actb4 into a todo file:
make Act=branch actb4 > ~/tmp/branch.sh
See here for a full example branch.sh.
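For intuition about what a todo file contains, the following sketch builds the same kind of lines in Python. It is not how the Makefile does it; the function todo_lines, the file names, and the directories are all hypothetical placeholders.

```python
from posixpath import basename

def todo_lines(act, csvs, script, outdir):
    """Return shell lines: make the output dir, clear it, then one job per csv."""
    out = f"{outdir}/{act}"
    lines = [f"mkdir -p {out}", f"rm {out}/*"]
    for path in csvs:
        # each job runs in the background and tees its output to a same-named file
        lines.append(
            f"python3 {script} -D -t {path} -e {act} | tee {out}/{basename(path)} &")
    return lines

# Hypothetical inputs, only to show the shape of the result.
lines = todo_lines("branch", ["data/a.csv", "data/b.csv"], "ezr.py", "tmp")
print("\n".join(lines))
```

Writing the jobs to a file first means you can inspect, edit, or split the list before spending any compute.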
You can now run all this to generate lots of output files. See here for a sample.
All those outputs can be summarized with the rq.sh script:
cd ~/tmp/branch ; bash ~/gits/timm/ezr/etc/rq.sh
RANK                 0     1     2     3
exploi/b=True       92     4     4
explore/b=True      80    16     2     2
exploi/b=False      71    24     4
explore/b=False     59    27    10     4
rrp                 10    16    14    37
asIs                 2     8    12    12
#
#EVALS
RANK                       0           1           2           3
exploi/b=True        29 ( 8)     35 ( 0)     20 ( 0)      0 ( 0)
explore/b=True       29 ( 8)     29 ( 0)     20 ( 0)     30 ( 0)
exploi/b=False       28 ( 8)     26 ( 4)     20 ( 0)      0 ( 0)
explore/b=False      28 ( 4)     31 ( 8)     30 ( 0)     30 ( 0)
rrp                   4 ( 0)      4 ( 0)      4 ( 0)      5 ( 0)
asIs               3840 ( 0)   6581 ( 0)  12835 ( 0)  16307 ( 0)
#
#DELTAS
RANK                       0           1           2           3
exploi/b=True       73 ( 23)     48 ( 0)     41 ( 0)      0 ( 0)
explore/b=True      74 ( 21)     61 ( 0)     24 ( 0)     24 ( 0)
exploi/b=False      73 ( 26)    59 ( 19)     46 ( 0)      0 ( 0)
explore/b=False     71 ( 22)    58 ( 15)     52 ( 0)     54 ( 0)
rrp                  61 ( 0)     50 ( 0)     22 ( 0)    22 ( 11)
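The DELTAS entries are percentage changes of the form 100*(asIs - now)/asIs. A minimal sketch of that arithmetic follows; the input values are made up, and the assumption that both numbers are distances to heaven (so smaller is better) is mine.

```python
def delta(as_is, now):
    """Percent improvement of `now` over the `as_is` baseline."""
    return 100 * (as_is - now) / as_is

# Hypothetical distances to heaven: baseline 0.50, after treatment 0.20.
improved = delta(0.50, 0.20)   # positive: the treatment moved closer to heaven
worse    = delta(0.50, 0.60)   # negative: the treatment moved further away
```

A value near 0 means the treatment barely changed the baseline, and a negative value means it made things worse.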
RANKS: how often treatments are in rank 0, 1, 2, ...
EVALS: the budgets used to achieve those ranks.
DELTAS: the 100*(asIs - now)/asIs change, relative to the asIs output.

Submit a URL link to Moodle with a repo link that has a /hw3 subdirectory.