csc 510-001, (1877)
fall 2024, software engineering
Tim Menzies, timm@ieee.org, com sci, nc state


home :: syllabus :: corpus :: groups :: moodle :: license

Abstraction

Historically, abstraction has enabled:

What is abstraction?

Formal definition

Between the programmer and the lower-level details (e.g. the hardware, the lower-level language constructs) there is a layers of abstraction

Concrete example of abstraction:

  1. Concrete: Countries have states. States have cities. Cities have suburbs. Suburbs have streets. Streets have houses.
  2. Abstract: Things, at level i have sub-things at level i+1.
    • Note that number 2, would need some tools for instance creation
    • Maybe something as simple as a yaml config file to make your country

If you are only building one application, then clearly number1 - And if you racing to get “it” out the door then there is clearly a case for using the first, most obvious design.

But - If you are testing “it” then - abstractions reduce the number of tests. - eg. in the above, one test, not four - Abstract: - if we delete things, do we also delete sub-things versus - Concrete: - if we delete suburbs, do we also delete streets? - if we delete cities, do we also delete suburbs? - if we delete states, do we also delete cities? - if we delete countries, do we also delete states? - If after doing “it”, - And if you want to support very large communities not just doing “it0” but also “it1, it2…” - And you want to build something unique that might be the basis for an entirely new industry - Supporting billions of dollars in IT acquisitions - Then you need abstraction.

Kinds of abstractions

Abstraction has been applied, successfully, dozens of times in the history of computing

The following list is sorted by their memory footprint, largest to smallest:

  1. Operating systems
  2. Virtual machines
  3. Containers
  4. Serverless systems
  5. Erlang’s trick
  6. Unix Pipes
  7. OO class hierarchies
  8. Macros
  9. Iterators and error handlers
    • These last two is not the same as the rest
      • Iterators and error handlers are abstraction tricks for programmers are smaller than
      • Implemented and popularized by Barbara Liskov
    • The rest allow you jump computation around one of more CPUs

Also, for other notes on abstraction that lead to different programming languages (LISP, Smalltalk, Java, Javascript, CoffeeScript) - see next lecture

Why abstraction?

How to abstract

Warning:

BTW, best abstractions needs really, really smart people

(*) You don’t know who Guy Steele is? Shame on you! Read it and weep.

Design Patters

Q: So what have programmers seen before that might repeat in future systems?
A: Patterns

Patterns example 1: 3-tiered architectures

Data : model : dialog

image
image

Patterns example 2: Composite design patterns

image
image
image
image

Patterns have review heuristics:

Patterns example 3: CRUD: low-level code idiom

CRUD = create, read, update, delete

BTW, CRUD can leap from idiom to architecture (web-scale RESTful applications)

Regardless - the main points here is that complex systems can be abstracted into reasonable chunks via Patterns - Such abstraction simplifies how we discuss even complex systems

Patterns have review heuristics:

Error handling

Throw a ball into a stadium, wait for what returns while holding a catchers mitt. Only catch things you know how to handle

def atom(token: str) -> Atom:
    "Numbers become numbers; every other token is a symbol."
    try: return int(token)
    except ValueError:
        try: return float(token)
        except ValueError:
            return Symbol(token)

Iterators

When you code recursive data structures, give programmers so way to way around that code.

e.g. What does this return?

def items(x):
  if isinstance(x,(str,list,tuple)):
    for y in x:
      for z in items(y):
        yield z
  else:
    yield x


for x in items([10,[[11,12,13],15], 20,30,[40,50,"tim"]]):
   print(x)

Another example (in awk). This example shows that iterators can be implemented in any language, it you think the right way.

# application code


while(csv(lst,file))
  doSOmething(lst)

Handles numerous details for reading CSV files. For example, the following CSV file has blank lines (to be skipped), white space and comments (that need deleting), strings containing numbers that need to be coerced into real numbers, and record that that split on “,” (which need to be combined with the next line).

name,   # who
age,    # in year`s
shoesize # posint
#########################


# family1
tim, # what
21,  # a 
10   # nice guy
jane  ,  6,  2


# family2
tim, 21, 10 
jane,6,2

The csv iterator reader is below. This iterator calls a sub-iterator getline that returns 0 on end of line:

# iterator code


function csv(a,file,                        # passed arguments     
             i,j,b4, ok,line,x,y) {         # locals
  file  = file ? file : "-"                 # [1] ... read from standard input or file       
  ok = getline < file
  if (ok <0) {                              # [2] ... complain if missing file
     print "missing "file > "/dev/stderr"; 
     exit 1 }
  if (ok==0) { close(file);return 0 }       # [3] ... signal that there is no data                                 
  line = b4 $0                         
  gsub(/([ \t]*|#.*$)/, "", line)           # [4] ... kill white space and comments      
  if (!line)       return csv(a,file, line) # [5] ... skip blanks lines
  if (line ~ /,$/) return csv(a,file, line) # [6] ... contact incomplete rows with next
  split(line, a, ",")                       # [7] ... split line into cells on comma
  for(i in a) {
    x=a[i]
    y=a[i]+0
    a[i] = x==y ? y : x                     # [8] ... coerce number strings, if any
  }
  return 1                                  # [9] ... signal that there is more to come
}

The point being made here is that “abstraction” is a tool you can use, in any language - You just need to think a little more… abstractly - Thing “producer” and “consumer” when the consumer does not want to, or need to, understand all the details - Which is very useful if I start changing low-level representations - e.g. I’m talking to a zip file and not a raw file - Then csv can change internally while the application code remains the same.

By the way, if we are talking abstractions about consumers and producers then that takes us naturally to pipes (see below)

Pipes

Doug McIlroy, Bell Labs’ “Computing Techniques Research Department” 1986

Pipes changed the whole idea of UNIX:

Pipes changed how we think about software

You’ve probably all used pipes already. Here we download an install shell script then pipe it to the sh command (so it runs)

curl -s http://example.com/install.sh | sh

(By the way, that is a security risk. Use with care!)

Examples

nroff files | col | lp


tbl file-1 file-2 . . . | eqn | nroff -ms

Think of pipes as produce, translate, filter, consume:

find /usr/bin/ |                #produce 
sed 's:.*/::'  |                #translate: strip directory part
grep -i '^z'   |                #filter   : select items starting with z
xargs -d '\n' aFinalConsumer    #consume 

Note that in the above, the filters inside the pipes could be written in any language at all, by different teams.

Also, operating systems love pipes since they can run each part of the process as a separate process, maybe even on different machines (*)

(*) That said, there is the data bus / network cost of streaming data between parts of the pipe.

Problems with pipes:

OO Class Hierachies

Smalltalk class hierarchy (somewhere in the 1980s)

Object
|    Behavior
|    |    ClassDescription
|    |    |    Class
|    |    |    Metaclass
|    BlockClosure
|    Boolean
|    |    False
|    |    True
|    Browser
|    Collection
|    |    Bag
|    |    MappedCollection
|    |    SequenceableCollection
|    |    |    ArrayedCollection
|    |    |    |    Array
|    |    |    |    ByteArray
|    |    |    |    WordArray
|    |    |    |    LargeArrayedCollection
|    |    |    |    |    LargeArray
|    |    |    |    |    LargeByteArray
|    |    |    |    |    LargeWordArray
|    |    |    |    CompiledCode
|    |    |    |    |    CompiledMethod
|    |    |    |    |    CompiledBlock
|    |    |    |    Interval
|    |    |    |    CharacterArray
|    |    |    |    |    String
|    |    |    |    |    |    Symbol
|    |    |    LinkedList
|    |    |    |    Semaphore
|    |    |    OrderedCollection
|    |    |    |    RunArray
|    |    |    |    SortedCollection
|    |    HashedCollection
|    |    |    Dictionary
|    |    |    |    IdentityDictionary
|    |    |    |    |    MethodDictionary
|    |    |    |    RootNamespace
|    |    |    |    |    Namespace
|    |    |    |    |    SystemDictionary
|    |    |    Set
|    |    |    |    IdentitySet
|    ContextPart
|    |    BlockContext
|    |    MethodContext
|    File
|    |    Directory
|    FileSegment
|    Link
|    |    Process
|    |    SymLink
|    Magnitude
|    |    Association
|    |    Character
|    |    Date
|    |    LargeArraySubpart
|    |    Number
|    |    |    Float
|    |    |    Fraction
|    |    |    Integer
|    |    |    |    LargeInteger
|    |    |    |    |    LargeNegativeInteger
|    |    |    |    |    LargePositiveInteger
|    |    |    |    |    |    LargeZeroInteger
|    |    |    |    SmallInteger
|    |    Time
|    Memory
|    Message
|    |    DirectedMessage
|    MethodInfo
|    Point
|    ProcessorScheduler
|    Rectangle
|    Signal
|    |    Exception
|    |    |    Error
|    |    |    |    Halt
|    |    |    |    |    ArithmeticError
|    |    |    |    |    |    ZeroDivide
|    |    |    |    |    MessageNotUnderstood
|    |    |    |    UserBreak
|    |    |    Notification
|    |    |    |    Warning
|    Stream
|    |    ObjectDumper
|    |    PositionableStream
|    |    |    ReadStream
|    |    |    WriteStream
|    |    |    |    ReadWriteStream
|    |    |    |    |    ByteStream
|    |    |    |    |    |    FileStream
|    |    Random
|    |    TextCollector
|    |    TokenStream
|    UndefinedObject
|    ValueAdaptor
|    |    NullValueHolder
|    |    PluggableAdaptor
|    |    |    DelayedAdaptor
|    |    ValueHolder

On the criteria to be used in decomposing systems into modules. David Parnas, 1971

Macros

The LISP dolist macro lets you do things to one item ata time from a lst (list).

e.g.

(dolist (one lst) 
   (print one))

What does this high level abstraction hide under-the-hood?

Lets find out:

[3]> (macroexpand-1 '(dolist (one lst) (print one)))


(DO* ((#:LIST-2990 LST (CDR #:LIST-2990))
      (ONE NIL))
     ((ENDP #:LIST-2990) NIL)
     (DECLARE (LIST #:LIST-2990))
     (SETQ ONE (CAR #:LIST-2990))
     (PRINT ONE)
)


[3]> (macroexpand-1 (macroexpand-1 '(dolist (one lst) (print one))))


(BLOCK NIL
       (LET* ((#:LIST-2991 LST)
              (ONE NIL))
             (DECLARE (LIST #:LIST-2991))
             (TAGBODY #:LOOP-2992
                      (IF (ENDP #:LIST-2991)
                          (GO #:END-2993))
                      (SETQ ONE (CAR #:LIST-2991))
                      (PRINT ONE)
                      (SETQ #:LIST-2991
                            (CDR #:LIST-2991))
                      (GO #:LOOP-2992)
                      #:END-2993
                      (RETURN-FROM NIL
                                   (PROGN NIL))))) ;

Note that defmacro is a LISP built in that lets you write your own macros. Note that a Python variant of this become so acrimonious that Python’s founder resigned from the community. LISP programmers would just shrug and say “whatever, use it, don’t use it, up to you”.

(defmacro aif (test-form then-form &optional else-form)
   `(let ((it ,test-form))
          (if it ,then-form ,else-form)))


 (aif (+ 2 7)
   (format nil "~A does not equal NIL." it)
   (format nil "~A does equal NIL." it))
 ;; ⇒ "9 does not equal NIL."

LISP is not so popular these days. For cool languages that support macros:

Queuing Theory

Before going on… this is going to seem a digression, but just stay with me here

The point of this little digression is that it is useful to jump around things between service providers.

N tellers/ checkout people. How many lines? One? N?

Answer: see 7 insights into Queue Theory

According to the RIOT paper Computers execute in highly dynamic environment.

Erlang’s Trick

Some languages are designed for concurrency. To do so, they use new abstractions.

Consider the following block-structured programs (where variables in a sub-block can access variables in parent blocks):

def aa(x):
    def bb(y): return aa(y-1)
    return x if x < 1 else bb(x)

Here, we can’t run bb by itself without dragging along everything we know about `aa.

So instead, what if all functions are in the global space and all variables can only access the local space of each function?

Here’s a small database language that can derive child info from a database on fathers and mothers. X and Y are variabes scoped to within a clause (and clauses end with a fulll stop).

child(X,Y) :- mother(X,Y).  # clause1
child(X,Y) :- father(X,Y).  # clause2


descendant(X,Y) :- child(X,Y).                    # clause3
descendant(X,Y) :- child(X,Z), descendant(Z,Y).   # clause4


father(tim,don).            # clause
father(janet,don).          # clause6


mother(tim,jane).           # clause7
mother(janet,don).          # clause8

Note now that the clauses can all be run seperately on different CPUs since everything we need to know about (eg) descendant is available locally when we call that function.

Now Erlang’s unit of swap is really, really small. From Erlang - Erlang is a concurrent language. - By that I mean that threads are part of the programming language, they do not belong to the operating system. - That’s really what’s wrong with programming languages like Java and C++. - It’s threads aren’t in the programming language, threads are something in the operating system s – and they inherit all the problems that they have in the operating system. - One of those problems is granularity of the memory management system. - The memory management in the operating system protects whole pages of memory, so the smallest size that a thread can be is the smallest size of a page. - That’s actually too big. - If you add more memory to your machine – you have the same number of bits that protects the memory so the granularity of the page tables goes up – you end up using say 64kB for a process you know running in a few hundred bytes.

Sounds like a crazy idea, right? Well, its actually a crazy 19 billion dollar idea.

And its not just one company:

See also Elixr (Erlang + prettier syntax, not sure it has the same systems advantage as Erlang).

Serverless

Servless is an abstraction that is smaller than containers

demo

Containers and Virtual Machines

If iterators is small abstraction

UNIX

If iterators and error handlers are “little abstractions”, then operating systems are “big abstractions”.

Programming Languages and Abstraction

LISP: The Ultimate Abstraction

Consider a minimal programming language with just a few primitives, plus a macro language for assembling those primitives into more complex reasoning. - Build mappings from primitives down to hardware - Port those small number of mappings to many platform - Let programmers work up from the primitives.

Enter LISP:

LISP, its the source

(Aside: Well, not really. - Turns out, code does more than “computes”. - It also “connects” (to GUIs, databases, networks) - It also “explores”; i.e. Cognitive support for humans
exploring the world with software. - But we’ll get back to that.)

Paul Graham - “In 1960, John McCarthy published a remarkable paper in which he did for programming something like what Euclid did for geometry. He showed how, given a handful of simple operators and a notation for functions, you can build a whole programming language. He called this language Lisp, for”List Processing,” because one of his key ideas was to use a simple data structure called a list for both code and data.” - Do not read MacCarthy, 1960, “Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I” - Here’s a Better read

Here’s a LISP interpreter written in LISP:

; The Lisp defined in McCarthy's 1960 paper, translated into CL.
; Assumes only quote, atom, eq, cons, car, cdr, cond.
; Bug reports to lispcode@paulgraham.com.


(defun null. (x)
  (eq x '()))


(defun and. (x y)
  (cond (x (cond (y 't) ('t '())))
        ('t '())))


(defun not. (x)
  (cond (x '())
        ('t 't)))


(defun append. (x y)
  (cond ((null. x) y)
        ('t (cons (car x) (append. (cdr x) y)))))


(defun list. (x y)
  (cons x (cons y '())))


(defun pair. (x y)
  (cond ((and. (null. x) (null. y)) '())
        ((and. (not. (atom x)) (not. (atom y)))
         (cons (list. (car x) (car y))
               (pair. (cdr x) (cdr y))))))


(defun assoc. (x y)
  (cond ((eq (caar y) x) (cadar y))
        ('t (assoc. x (cdr y)))))


(defun eval. (e a)
  (cond
    ((atom e) (assoc. e a))
    ((atom (car e))
     (cond
       ((eq (car e) 'quote) (cadr e))
       ((eq (car e) 'atom)  (atom   (eval. (cadr e) a)))
       ((eq (car e) 'eq)    (eq     (eval. (cadr e) a)
                                    (eval. (caddr e) a)))
       ((eq (car e) 'car)   (car    (eval. (cadr e) a)))
       ((eq (car e) 'cdr)   (cdr    (eval. (cadr e) a)))
       ((eq (car e) 'cons)  (cons   (eval. (cadr e) a)
                                    (eval. (caddr e) a)))
       ((eq (car e) 'cond)  (evcon. (cdr e) a))
       ('t (eval. (cons (assoc. (car e) a)
                        (cdr e))
                  a))))
    ((eq (caar e) 'label)
     (eval. (cons (caddar e) (cdr e))
            (cons (list. (cadar e) (car e)) a)))
    ((eq (caar e) 'lambda)
     (eval. (caddar e)
            (append. (pair. (cadar e) (evlis. (cdr e) a))
                     a)))))


(defun evcon. (c a)
  (cond ((eval. (caar c) a)
         (eval. (cadar c) a))
        ('t (evcon. (cdr c) a))))


(defun evlis. (m a)
  (cond ((null. m) '())
        ('t (cons (eval.  (car m) a)
                  (evlis. (cdr m) a)))))
is.py> (define circle-area (lambda (r) (* pi (* r r))))
lis.py> (circle-area 3)
28.274333877
lis.py> (define fact (lambda (n) 
              (if (<= n 1) 
            1 
      (* n 
         (fact (- n 1))))))
lis.py> (fact 10)
3628800
lis.py> (fact 100)
9332621544394415268169923885626670049071596826438162146859296389521759999322991
5608941463976156518286253697920827223758251185210916864000000000000000000000000
lis.py> (circle-area (fact 10))
4.1369087198e+13
lis.py> (define first car)
lis.py> (define rest cdr)
lis.py> (define count (lambda (item L) 
            (if L 
                (+ (equal? item (first L)) 
                   (count  item (rest L))) 
                0)))
lis.py> (count 0 (list 0 1 2 3 0 0))
3
lis.py> (count (quote the) (quote (the more the merrier the bigger the better)))
4
lis.py> (define twice (lambda (x) (* 2 x)))
lis.py> (twice 5)
10
lis.py> (define repeat (lambda (f) 
                          (lambda (x) 
                              (f (f x)))))
lis.py> ((repeat twice) 10)
40
lis.py> ((repeat (repeat twice)) 10)
160
lis.py> ((repeat (repeat (repeat twice))) 10)
2560
lis.py> ((repeat (repeat (repeat (repeat twice)))) 10)
655360
lis.py> (pow 2 16)
65536.0
lis.py> (define fib (lambda (n) 
               (if (< n 2)
                   1 
                  (+ (fib (- n 1)) 
                     (fib (- n 2))))))
lis.py> (define range (lambda (a b) 
                           (if (= a b) 
                               (quote ()) 
                               (cons a 
                                     (range (+ a 1) 
                                            b)))))
lis.py> (range 0 10)
(0 1 2 3 4 5 6 7 8 9)
lis.py> (map fib (range 0 10))
(1 1 2 3 5 8 13 21 34 55)
lis.py> (map fib (range 0 20))
(1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765)

LISP was crazy popular and remains very influential

;;; numeric columns -----------------------------
(defmethod add1 ((self num) (x string)) 
  "Handling strings from text files 
   where 'x' is not a number yet."
  (add1 self (read-from-string x)))


(defmethod add1 ((self num) (x number))
  "Knuth's algorithm for incrementally 
   updating lo, hi, mu, sd."
  (with-slots (n mu m2 hi lo sd) self
    (let ((d (- x mu)))
      (if (> x hi) (setf hi x))
      (if (< x lo) (setf lo x))
      (incf mu (float (/ d n)))
      (incf m2 (float (* d (- x mu))))
      (setf sd 
            (cond ((< m2 0) 0)
                  ((< n 2) 0)
                  (t (sqrt (/ m2 (- n 1))))))))
  x)

Vendor wars killed LISP

JavaScript

Transpilers

But problem with transpliers: where are the line numbers?

Smalltalk

Smalltalk class hierarchy (somewhere in the 1980s):

Object
|    Behavior
|    |    ClassDescription
|    |    |    Class
|    |    |    Metaclass
|    BlockClosure
|    Boolean
|    |    False
|    |    True
|    Browser
|    Collection
|    |    Bag
|    |    MappedCollection
|    |    SequenceableCollection
|    |    |    ArrayedCollection
|    |    |    |    Array
|    |    |    |    ByteArray
|    |    |    |    WordArray
|    |    |    |    LargeArrayedCollection
|    |    |    |    |    LargeArray
|    |    |    |    |    LargeByteArray
|    |    |    |    |    LargeWordArray
|    |    |    |    CompiledCode
|    |    |    |    |    CompiledMethod
|    |    |    |    |    CompiledBlock
|    |    |    |    Interval
|    |    |    |    CharacterArray
|    |    |    |    |    String
|    |    |    |    |    |    Symbol
|    |    |    LinkedList
|    |    |    |    Semaphore
|    |    |    OrderedCollection
|    |    |    |    RunArray
|    |    |    |    SortedCollection
|    |    HashedCollection
|    |    |    Dictionary
|    |    |    |    IdentityDictionary
|    |    |    |    |    MethodDictionary
|    |    |    |    RootNamespace
|    |    |    |    |    Namespace
|    |    |    |    |    SystemDictionary
|    |    |    Set
|    |    |    |    IdentitySet
|    ContextPart
|    |    BlockContext
|    |    MethodContext
|    File
|    |    Directory
|    FileSegment
|    Link
|    |    Process
|    |    SymLink
|    Magnitude
|    |    Association
|    |    Character
|    |    Date
|    |    LargeArraySubpart
|    |    Number
|    |    |    Float
|    |    |    Fraction
|    |    |    Integer
|    |    |    |    LargeInteger
|    |    |    |    |    LargeNegativeInteger
|    |    |    |    |    LargePositiveInteger
|    |    |    |    |    |    LargeZeroInteger
|    |    |    |    SmallInteger
|    |    Time
|    Memory
|    Message
|    |    DirectedMessage
|    MethodInfo
|    Point
|    ProcessorScheduler
|    Rectangle
|    Signal
|    |    Exception
|    |    |    Error
|    |    |    |    Halt
|    |    |    |    |    ArithmeticError
|    |    |    |    |    |    ZeroDivide
|    |    |    |    |    MessageNotUnderstood
|    |    |    |    UserBreak
|    |    |    Notification
|    |    |    |    Warning
|    Stream
|    |    ObjectDumper
|    |    PositionableStream
|    |    |    ReadStream
|    |    |    WriteStream
|    |    |    |    ReadWriteStream
|    |    |    |    |    ByteStream
|    |    |    |    |    |    FileStream
|    |    Random
|    |    TextCollector
|    |    TokenStream
|    UndefinedObject
|    ValueAdaptor
|    |    NullValueHolder
|    |    PluggableAdaptor
|    |    |    DelayedAdaptor
|    |    ValueHolder