csc 591-024, (8290)
csc 791-024, (8291)
fall 2024, special topics in computer science
Tim Menzies, timm@ieee.org, com sci, nc state
August 2024
A Bayes classifier is a simple statistics-based learning scheme.

Advantages: tiny memory footprint; fast, incremental training (its statistics update one row at a time).

Assumptions: attributes are equally important and statistically independent, given the class. Although these assumptions are almost never correct, this scheme works well in practice¹.
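In symbols (a standard statement of the rule, included here for reference): for a class $C$ and evidence $E_1,\dots,E_n$, the independence assumption lets the likelihood factor as

$$
like(C \mid E_1,\dots,E_n) \;\propto\; P(C)\prod_{i=1}^{n} P(E_i \mid C)
$$

i.e. a class prior times one term per attribute, which is exactly what the tables and code below compute.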
like(B)
  ^
6 |           /
  |         /
  |       /
3 |     /        <-- region where L(A) > L(B)
  |   /              lies below this diagonal
  | /
  |/
  .-----------> like(A)
  0     3     6
Aside: note the zone of confusion near the diagonal, where like(A) ≈ like(B) and the classifier has little basis for preferring either class.
outlook   temperature  humidity  windy  play
-------   -----------  --------  -----  ----
rainy     cool         normal    TRUE   no
rainy     mild         high      TRUE   no
sunny     hot          high      FALSE  no
sunny     hot          high      TRUE   no
sunny     mild         high      FALSE  no
overcast  cool         normal    TRUE   yes
overcast  hot          high      FALSE  yes
overcast  hot          normal    FALSE  yes
overcast  mild         high      TRUE   yes
rainy     cool         normal    FALSE  yes
rainy     mild         high      FALSE  yes
rainy     mild         normal    FALSE  yes
sunny     cool         normal    FALSE  yes
sunny     mild         normal    TRUE   yes
This data can be summarized as follows:
         Outlook               Temperature            Humidity
  ====================    =================    ===================
           Yes   No                Yes  No                Yes   No
Sunny       2    3        Hot       2   2      High        3    4
Overcast    4    0        Mild      4   2      Normal      6    1
Rainy       3    2        Cool      3   1
--------------------      -----------------    -------------------
Sunny      2/9  3/5       Hot      2/9 2/5     High       3/9  4/5
Overcast   4/9  0/5       Mild     4/9 2/5     Normal     6/9  1/5
Rainy      3/9  2/5       Cool     3/9 1/5

       Windy                 Play
  =================      ==========
         Yes   No         Yes   No
False     6    2           9    5
True      3    3
-----------------        ----------
False    6/9  2/5        9/14  5/14
True     3/9  3/5
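A sketch of where those counts come from: tally (value, class) pairs, then divide by the class totals. The list-of-lists layout here is an assumption for illustration, not the course's actual data structures.

from collections import Counter

rows = [line.split() for line in """\
rainy    cool normal TRUE  no
rainy    mild high   TRUE  no
sunny    hot  high   FALSE no
sunny    hot  high   TRUE  no
sunny    mild high   FALSE no
overcast cool normal TRUE  yes
overcast hot  high   FALSE yes
overcast hot  normal FALSE yes
overcast mild high   TRUE  yes
rainy    cool normal FALSE yes
rainy    mild high   FALSE yes
rainy    mild normal FALSE yes
sunny    cool normal FALSE yes
sunny    mild normal TRUE  yes""".splitlines()]

play    = Counter(r[-1] for r in rows)           # Counter({'yes': 9, 'no': 5})
outlook = Counter((r[0], r[-1]) for r in rows)   # e.g. ('sunny','yes') -> 2
print(outlook["sunny","yes"], "/", play["yes"])  # -> 2 / 9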
So, what happens on a new day with outlook=sunny, temperature=cool, humidity=high, windy=TRUE?
First find the likelihood of the two classes (read each fraction off the tables above):

  like(yes) = 2/9 * 3/9 * 3/9 * 3/9 * 9/14 = 0.0053
  like(no)  = 3/5 * 1/5 * 4/5 * 3/5 * 5/14 = 0.0206
Conversion into a probability by normalization:

  P(yes) = 0.0053 / (0.0053 + 0.0206) = 0.205
  P(no)  = 0.0206 / (0.0053 + 0.0206) = 0.795
So, we aren’t playing golf today.
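As a quick sanity check, the same arithmetic in Python:

like_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # 0.0053
like_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # 0.0206
print(like_no / (like_yes + like_no))               # 0.795, so "no" wins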
For numeric columns we cannot count discrete values; instead each column keeps running statistics (`mu`, `sd`, `lo`, `hi`), updated incrementally as each row arrives:
@of("If `x` is known, add this COL.")
def add(self:COL, x:any) -> any:
if x != "?":
self.n += 1
self.add1(x)
@of("add symbol counts.")
def add1(self:SYM, x:any) -> any:
self.has[x] = self.has.get(x,0) + 1
if self.has[x] > self.most: self.mode, self.most = x, self.has[x]
return x
@of("add `mu` and `sd` (and `lo` and `hi`). If `x` is a string, coerce to a number.")
def add1(self:NUM, x:any) -> number:
self.lo = min(x, self.lo)
self.hi = max(x, self.hi)
= x - self.mu
d self.mu += d / self.n
self.m2 += d * (x - self.mu)
self.sd = 0 if self.n <2 else (self.m2/(self.n-1))**.5
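A quick stand-alone check of that update rule (Welford's algorithm). The `Num` class here is a hypothetical stand-in for the NUM column above, not the course's actual class:

import statistics

class Num:
  def __init__(self):
    self.n = 0; self.mu = 0; self.m2 = 0; self.sd = 0
    self.lo, self.hi = float("inf"), float("-inf")
  def add(self, x):
    self.n  += 1
    self.lo  = min(x, self.lo)
    self.hi  = max(x, self.hi)
    d        = x - self.mu
    self.mu += d / self.n
    self.m2 += d * (x - self.mu)
    self.sd  = 0 if self.n < 2 else (self.m2/(self.n - 1))**0.5

num = Num()
xs  = [83, 70, 68, 64, 69, 65, 75, 75, 72, 81]
for x in xs: num.add(x)
assert abs(num.mu - statistics.mean(xs))  < 1e-9   # matches two-pass mean
assert abs(num.sd - statistics.stdev(xs)) < 1e-9   # matches two-pass sample sd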
So here’s the NB classifier:
@of("How much DATA likes a `row`.")
def loglike(self:DATA, r:row, nall:int, nh:int) -> float:
= (len(self.rows) + the.k) / (nall + the.k*nh)
prior = [c.like(r[c.at], prior) for c in self.cols.x if r[c.at] != "?"]
likes return sum(log(x) for x in likes + [prior] if x>0)
@of("How much a SYM likes a value `x`.")
def like(self:SYM, x:any, prior:float) -> float:
return (self.has.get(x,0) + the.m*prior) / (self.n + the.m)
@of("How much a NUM likes a value `x`.")
def like(self:NUM, x:number, _) -> float:
= self.sd**2 + 1E-30
v = exp(-1*(x - self.mu)**2/(2*v)) + 1E-30
nom = (2*pi*v) **0.5
denom return min(1, nom/(denom + 1E-30))
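To see those formulas in action, here is a self-contained sketch of the same math for symbolic attributes, reusing the `rows` list from the counting sketch above. `k=1` and `m=2` are assumed defaults for `the.k` and `the.m`, not confirmed by these notes:

from math import log

def loglike(row, h, rows, k=1, m=2):
  hrows = [r for r in rows if r[-1] == h]              # rows of class h
  nh    = len(set(r[-1] for r in rows))                # number of classes
  prior = (len(hrows) + k) / (len(rows) + k*nh)        # k-smoothed class prior
  out   = log(prior)
  for at, x in enumerate(row):                         # for each (independent) attribute:
    freq = sum(1 for r in hrows if r[at] == x)         #   count x within class h,
    out += log((freq + m*prior) / (len(hrows) + m))    #   add its m-estimate log-likelihood
  return out

day = ["sunny", "cool", "high", "TRUE"]
print(max(["yes", "no"], key=lambda h: loglike(day, h, rows)))   # -> no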
Pedro Domingos and Michael Pazzani. 1997. On the Optimality of the Simple Bayesian Classifier under Zero-One Loss. Machine Learning 29(2-3), 103-130.