csc 510-001, (1877)
fall 2024, software engineering
Tim Menzies, timm@ieee.org, com sci, nc state
Historically, abstraction has enabled:
Formal definition
Between the programmer and the lower-level details (e.g. the hardware, the lower-level language constructs) there is a layers of abstraction
Concrete example of abstraction:
i
have sub-things at level
i+1
.
If you are only building one application, then clearly number1 - And if you racing to get “it” out the door then there is clearly a case for using the first, most obvious design.
But - If you are testing “it” then - abstractions reduce the number of tests. - eg. in the above, one test, not four - Abstract: - if we delete things, do we also delete sub-things versus - Concrete: - if we delete suburbs, do we also delete streets? - if we delete cities, do we also delete suburbs? - if we delete states, do we also delete cities? - if we delete countries, do we also delete states? - If after doing “it”, - And if you want to support very large communities not just doing “it0” but also “it1, it2…” - And you want to build something unique that might be the basis for an entirely new industry - Supporting billions of dollars in IT acquisitions - Then you need abstraction.
Abstraction has been applied, successfully, dozens of times in the history of computing
The following list is sorted by their memory footprint, largest to smallest:
Also, for other notes on abstraction that lead to different programming languages (LISP, Smalltalk, Java, Javascript, CoffeeScript) - see next lecture
Warning:
BTW, best abstractions needs really, really smart people
(*) You don’t know who Guy Steele is? Shame on you! Read it and weep.
Q: So what have programmers seen before that might repeat in future
systems?
A: Patterns
e.g. idioms : small code-level patterns
e.g. design patterns: common collections of a few classes
e.g. archtirctures: recurring “big picture” structures seen in large systems
The power of patterns
Data : model : dialog
Patterns have review heuristics:
CRUD = create, read, update, delete
BTW, CRUD can leap from idiom to architecture (web-scale RESTful applications)
Regardless - the main points here is that complex systems can be abstracted into reasonable chunks via Patterns - Such abstraction simplifies how we discuss even complex systems
Patterns have review heuristics:
Throw a ball into a stadium, wait for what returns while holding a catchers mitt. Only catch things you know how to handle
def atom(token: str) -> Atom:
"Numbers become numbers; every other token is a symbol."
try: return int(token)
except ValueError:
try: return float(token)
except ValueError:
return Symbol(token)
When you code recursive data structures, give programmers so way to way around that code.
e.g. What does this return?
def items(x):
if isinstance(x,(str,list,tuple)):
for y in x:
for z in items(y):
yield z
else:
yield x
for x in items([10,[[11,12,13],15], 20,30,[40,50,"tim"]]):
print(x)
Another example (in awk). This example shows that iterators can be implemented in any language, it you think the right way.
# application code
while(csv(lst,file))
(lst) doSOmething
Handles numerous details for reading CSV files. For example, the following CSV file has blank lines (to be skipped), white space and comments (that need deleting), strings containing numbers that need to be coerced into real numbers, and record that that split on “,” (which need to be combined with the next line).
name, # who
age, # in year`s
shoesize # posint
#########################
# family1
tim, # what
21, # a
10 # nice guy
jane , 6, 2
# family2
tim, 21, 10
jane,6,2
The csv
iterator reader is below. This iterator calls a
sub-iterator getline
that returns 0 on end of line:
# iterator code
function csv(a,file, # passed arguments
,j,b4, ok,line,x,y) { # locals
i= file ? file : "-" # [1] ... read from standard input or file
file = getline < file
ok if (ok <0) { # [2] ... complain if missing file
print "missing "file > "/dev/stderr";
exit 1 }
if (ok==0) { close(file);return 0 } # [3] ... signal that there is no data
= b4 $0
line gsub(/([ \t]*|#.*$)/, "", line) # [4] ... kill white space and comments
if (!line) return csv(a,file, line) # [5] ... skip blanks lines
if (line ~ /,$/) return csv(a,file, line) # [6] ... contact incomplete rows with next
split(line, a, ",") # [7] ... split line into cells on comma
for(i in a) {
=a[i]
x=a[i]+0
y= x==y ? y : x # [8] ... coerce number strings, if any
a[i] }
return 1 # [9] ... signal that there is more to come
}
The point being made here is that “abstraction” is a tool you can
use, in any language - You just need to think a little more… abstractly
- Thing “producer” and “consumer” when the consumer does not want to, or
need to, understand all the details - Which is very useful if I start
changing low-level representations - e.g. I’m talking to a zip file and
not a raw file - Then csv
can change internally while the
application code remains the same.
By the way, if we are talking abstractions about consumers and producers then that takes us naturally to pipes (see below)
Doug McIlroy, Bell Labs’ “Computing Techniques Research Department” 1986
Pipes changed the whole idea of UNIX:
Pipes changed how we think about software
You’ve probably all used pipes already. Here we download an install
shell script then pipe it to the sh
command (so it
runs)
curl -s http://example.com/install.sh | sh
(By the way, that is a security risk. Use with care!)
Examples
nroff
is a text formatter (old version of latex)lp
is the line printercol
handles escape sequences that sets up text in 2
columnstbl
is an nroff pre-processor that expresses tables in
terms of lower-level nroff commandseqn
is an nroff pre-processor that expresses equations
in terms of lower-level nroff commandsnroff files | col | lp
tbl file-1 file-2 . . . | eqn | nroff -ms
Think of pipes as produce
, translate
,
filter
, consume
:
find /usr/bin/ | #produce
sed 's:.*/::' | #translate: strip directory part
grep -i '^z' | #filter : select items starting with z
xargs -d '\n' aFinalConsumer #consume
Note that in the above, the filters
inside the
pipes
could be written in any language at all, by different
teams.
Also, operating systems love pipes since they can run each part of the process as a separate process, maybe even on different machines (*)
(*) That said, there is the data bus / network cost of streaming data between parts of the pipe.
Problems with pipes:
Smalltalk class hierarchy (somewhere in the 1980s)
Object
| Behavior
| | ClassDescription
| | | Class
| | | Metaclass
| BlockClosure
| Boolean
| | False
| | True
| Browser
| Collection
| | Bag
| | MappedCollection
| | SequenceableCollection
| | | ArrayedCollection
| | | | Array
| | | | ByteArray
| | | | WordArray
| | | | LargeArrayedCollection
| | | | | LargeArray
| | | | | LargeByteArray
| | | | | LargeWordArray
| | | | CompiledCode
| | | | | CompiledMethod
| | | | | CompiledBlock
| | | | Interval
| | | | CharacterArray
| | | | | String
| | | | | | Symbol
| | | LinkedList
| | | | Semaphore
| | | OrderedCollection
| | | | RunArray
| | | | SortedCollection
| | HashedCollection
| | | Dictionary
| | | | IdentityDictionary
| | | | | MethodDictionary
| | | | RootNamespace
| | | | | Namespace
| | | | | SystemDictionary
| | | Set
| | | | IdentitySet
| ContextPart
| | BlockContext
| | MethodContext
| File
| | Directory
| FileSegment
| Link
| | Process
| | SymLink
| Magnitude
| | Association
| | Character
| | Date
| | LargeArraySubpart
| | Number
| | | Float
| | | Fraction
| | | Integer
| | | | LargeInteger
| | | | | LargeNegativeInteger
| | | | | LargePositiveInteger
| | | | | | LargeZeroInteger
| | | | SmallInteger
| | Time
| Memory
| Message
| | DirectedMessage
| MethodInfo
| Point
| ProcessorScheduler
| Rectangle
| Signal
| | Exception
| | | Error
| | | | Halt
| | | | | ArithmeticError
| | | | | | ZeroDivide
| | | | | MessageNotUnderstood
| | | | UserBreak
| | | Notification
| | | | Warning
| Stream
| | ObjectDumper
| | PositionableStream
| | | ReadStream
| | | WriteStream
| | | | ReadWriteStream
| | | | | ByteStream
| | | | | | FileStream
| | Random
| | TextCollector
| | TokenStream
| UndefinedObject
| ValueAdaptor
| | NullValueHolder
| | PluggableAdaptor
| | | DelayedAdaptor
| | ValueHolder
More on “Streams”
On the criteria to be used in decomposing systems into modules. David Parnas, 1971
The LISP dolist
macro lets you do things to
one
item ata time from a lst
(list).
e.g.
dolist (one lst)
(print one)) (
What does this high level abstraction hide under-the-hood?
Lets find out:
3]> (macroexpand-1 '(dolist (one lst) (print one)))
[
(DO* ((#:LIST-2990 LST (CDR #:LIST-2990))
(ONE NIL))
((ENDP #:LIST-2990) NIL)
(DECLARE (LIST #:LIST-2990))
(SETQ ONE (CAR #:LIST-2990))
(PRINT ONE)
)
3]> (macroexpand-1 (macroexpand-1 '(dolist (one lst) (print one))))
[
(BLOCK NIL
(LET* ((#:LIST-2991 LST)
(ONE NIL))
(DECLARE (LIST #:LIST-2991))
(TAGBODY #:LOOP-2992
(IF (ENDP #:LIST-2991)
(GO #:END-2993))
(SETQ ONE (CAR #:LIST-2991))
(PRINT ONE)
(SETQ #:LIST-2991
(CDR #:LIST-2991))
(GO #:LOOP-2992)
#:END-2993
(RETURN-FROM NIL; (PROGN NIL)))))
Note that defmacro
is a LISP built in that lets you
write your own macros. Note that a Python variant of this become so
acrimonious that Python’s
founder resigned from the community. LISP programmers would just
shrug and say “whatever, use it, don’t use it, up to you”.
defmacro aif (test-form then-form &optional else-form)
(let ((it ,test-form))
`(if it ,then-form ,else-form)))
(
+ 2 7)
(aif (format nil "~A does not equal NIL." it)
(format nil "~A does equal NIL." it))
(;; ⇒ "9 does not equal NIL."
LISP is not so popular these days. For cool languages that support macros:
Before going on… this is going to seem a digression, but just stay with me here
The point of this little digression is that it is useful to jump around things between service providers.
N tellers/ checkout people. How many lines? One? N?
Answer: see 7 insights into Queue Theory
According to the RIOT paper Computers execute in highly dynamic environment.
Some languages are designed for concurrency. To do so, they use new abstractions.
Consider the following block-structured programs (where variables in a sub-block can access variables in parent blocks):
def aa(x):
def bb(y): return aa(y-1)
return x if x < 1 else bb(x)
Here, we can’t run bb
by itself without dragging along
everything we know about `aa
.
So instead, what if all functions are in the global space and all variables can only access the local space of each function?
Here’s a small database language that can derive child info from a
database on father
s and mother
s.
X
and Y
are variabes scoped to within a
clause
(and clauses end with a fulll stop).
X,Y) :- mother(X,Y). # clause1
child(X,Y) :- father(X,Y). # clause2
child(
X,Y) :- child(X,Y). # clause3
descendant(X,Y) :- child(X,Z), descendant(Z,Y). # clause4
descendant(
,don). # clause
father(tim,don). # clause6
father(janet
,jane). # clause7
mother(tim,don). # clause8 mother(janet
Note now that the clauses can all be run seperately on different CPUs
since everything we need to know about (eg) descendant
is
available locally when we call that function.
Now Erlang’s unit of swap is really, really small. From Erlang - Erlang is a concurrent language. - By that I mean that threads are part of the programming language, they do not belong to the operating system. - That’s really what’s wrong with programming languages like Java and C++. - It’s threads aren’t in the programming language, threads are something in the operating system s – and they inherit all the problems that they have in the operating system. - One of those problems is granularity of the memory management system. - The memory management in the operating system protects whole pages of memory, so the smallest size that a thread can be is the smallest size of a page. - That’s actually too big. - If you add more memory to your machine – you have the same number of bits that protects the memory so the granularity of the page tables goes up – you end up using say 64kB for a process you know running in a few hundred bytes.
Sounds like a crazy idea, right? Well, its actually a crazy 19 billion dollar idea.
And its not just one company:
See also Elixr (Erlang + prettier syntax, not sure it has the same systems advantage as Erlang).
Servless is an abstraction that is smaller than containers
If iterators is small abstraction
If iterators and error handlers are “little abstractions”, then operating systems are “big abstractions”.
Consider a minimal programming language with just a few primitives, plus a macro language for assembling those primitives into more complex reasoning. - Build mappings from primitives down to hardware - Port those small number of mappings to many platform - Let programmers work up from the primitives.
Enter LISP:
LISP, its the source
(Aside: Well, not really. - Turns out, code does more than
“computes”. - It also “connects” (to GUIs, databases, networks) - It
also “explores”; i.e. Cognitive support for humans
exploring the world with software. - But we’ll get back to that.)
Paul Graham - “In 1960, John McCarthy published a remarkable paper in which he did for programming something like what Euclid did for geometry. He showed how, given a handful of simple operators and a notation for functions, you can build a whole programming language. He called this language Lisp, for”List Processing,” because one of his key ideas was to use a simple data structure called a list for both code and data.” - Do not read MacCarthy, 1960, “Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I” - Here’s a Better read
Here’s a LISP interpreter written in LISP:
; The Lisp defined in McCarthy's 1960 paper, translated into CL.
; Assumes only quote, atom, eq, cons, car, cdr, cond.
; Bug reports to lispcode@paulgraham.com.
defun null. (x)
(eq x '()))
(
defun and. (x y)
(cond (x (cond (y 't) ('t '())))
('t '())))
(
defun not. (x)
(cond (x '())
('t 't)))
(
defun append. (x y)
(cond ((null. x) y)
('t (cons (car x) (append. (cdr x) y)))))
(
defun list. (x y)
(cons x (cons y '())))
(
defun pair. (x y)
(cond ((and. (null. x) (null. y)) '())
(and. (not. (atom x)) (not. (atom y)))
((cons (list. (car x) (car y))
(cdr x) (cdr y))))))
(pair. (
defun assoc. (x y)
(cond ((eq (caar y) x) (cadar y))
('t (assoc. x (cdr y)))))
(
defun eval. (e a)
(cond
(atom e) (assoc. e a))
((atom (car e))
((cond
(eq (car e) 'quote) (cadr e))
((eq (car e) 'atom) (atom (eval. (cadr e) a)))
((eq (car e) 'eq) (eq (eval. (cadr e) a)
((eval. (caddr e) a)))
(eq (car e) 'car) (car (eval. (cadr e) a)))
((eq (car e) 'cdr) (cdr (eval. (cadr e) a)))
((eq (car e) 'cons) (cons (eval. (cadr e) a)
((eval. (caddr e) a)))
(eq (car e) 'cond) (evcon. (cdr e) a))
(('t (eval. (cons (assoc. (car e) a)
(cdr e))
(
a))))eq (caar e) 'label)
((eval. (cons (caddar e) (cdr e))
(cons (list. (cadar e) (car e)) a)))
(eq (caar e) 'lambda)
((eval. (caddar e)
(append. (pair. (cadar e) (evlis. (cdr e) a))
(
a)))))
defun evcon. (c a)
(cond ((eval. (caar c) a)
(eval. (cadar c) a))
('t (evcon. (cdr c) a))))
(
defun evlis. (m a)
(cond ((null. m) '())
('t (cons (eval. (car m) a)
(cdr m) a))))) (evlis. (
t
, aka
“true”)(eq x y)
is the same or both nil
(car '(a b c))
==> a
; i.e. pull the
head(cdr '(a b c))
==> (b c)
; i.e. pull
the rest(cons 'x '(y z))
==> (x y z)
;
i.e. assemble something(cond (if1 do1) (if2 do2) ..)
; i.e. branch the
reasoning((lambda (p1 p2 p3..) e) a1 a2 a3...)
p1 p2 p3...
to the values of the arguments
a1 a2 a3...
e
.car
cdr
cons
just like anything ese'((a 1) (b 2) ...)
((p1 a1) (p2 a2) (p3 a3) (a 1) (b 2) ...)
lambda (r) (* pi (* r r))))
is.py> (define circle-area (3)
lis.py> (circle-area 28.274333877
lambda (n)
lis.py> (define fact (if (<= n 1)
(1
* n
(- n 1))))))
(fact (10)
lis.py> (fact 3628800
100)
lis.py> (fact 9332621544394415268169923885626670049071596826438162146859296389521759999322991
5608941463976156518286253697920827223758251185210916864000000000000000000000000
10))
lis.py> (circle-area (fact 4.1369087198e+13
first car)
lis.py> (define rest cdr)
lis.py> (define count (lambda (item L)
lis.py> (define if L
(+ (equal? item (first L))
(count item (rest L)))
(0)))
count 0 (list 0 1 2 3 0 0))
lis.py> (3
count (quote the) (quote (the more the merrier the bigger the better)))
lis.py> (4
lambda (x) (* 2 x)))
lis.py> (define twice (5)
lis.py> (twice 10
lambda (f)
lis.py> (define repeat (lambda (x)
(
(f (f x)))))10)
lis.py> ((repeat twice) 40
10)
lis.py> ((repeat (repeat twice)) 160
10)
lis.py> ((repeat (repeat (repeat twice))) 2560
10)
lis.py> ((repeat (repeat (repeat (repeat twice)))) 655360
2 16)
lis.py> (pow 65536.0
lambda (n)
lis.py> (define fib (if (< n 2)
(1
+ (fib (- n 1))
(- n 2))))))
(fib (lambda (a b)
lis.py> (define range (if (= a b)
(quote ())
(cons a
(+ a 1)
(range (
b)))))0 10)
lis.py> (range 0 1 2 3 4 5 6 7 8 9)
(map fib (range 0 10))
lis.py> (1 1 2 3 5 8 13 21 34 55)
(map fib (range 0 20))
lis.py> (1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765) (
LISP was crazy popular and remains very influential
;;; numeric columns -----------------------------
defmethod add1 ((self num) (x string))
("Handling strings from text files
where 'x' is not a number yet."
read-from-string x)))
(add1 self (
defmethod add1 ((self num) (x number))
("Knuth's algorithm for incrementally
updating lo, hi, mu, sd."
with-slots (n mu m2 hi lo sd) self
(let ((d (- x mu)))
(if (> x hi) (setf hi x))
(if (< x lo) (setf lo x))
(incf mu (float (/ d n)))
(incf m2 (float (* d (- x mu))))
(setf sd
(cond ((< m2 0) 0)
(< n 2) 0)
((t (sqrt (/ m2 (- n 1))))))))
( x)
Vendor wars killed LISP
But problem with transpliers: where are the line numbers?
Smalltalk class hierarchy (somewhere in the 1980s):
Object
| Behavior
| | ClassDescription
| | | Class
| | | Metaclass
| BlockClosure
| Boolean
| | False
| | True
| Browser
| Collection
| | Bag
| | MappedCollection
| | SequenceableCollection
| | | ArrayedCollection
| | | | Array
| | | | ByteArray
| | | | WordArray
| | | | LargeArrayedCollection
| | | | | LargeArray
| | | | | LargeByteArray
| | | | | LargeWordArray
| | | | CompiledCode
| | | | | CompiledMethod
| | | | | CompiledBlock
| | | | Interval
| | | | CharacterArray
| | | | | String
| | | | | | Symbol
| | | LinkedList
| | | | Semaphore
| | | OrderedCollection
| | | | RunArray
| | | | SortedCollection
| | HashedCollection
| | | Dictionary
| | | | IdentityDictionary
| | | | | MethodDictionary
| | | | RootNamespace
| | | | | Namespace
| | | | | SystemDictionary
| | | Set
| | | | IdentitySet
| ContextPart
| | BlockContext
| | MethodContext
| File
| | Directory
| FileSegment
| Link
| | Process
| | SymLink
| Magnitude
| | Association
| | Character
| | Date
| | LargeArraySubpart
| | Number
| | | Float
| | | Fraction
| | | Integer
| | | | LargeInteger
| | | | | LargeNegativeInteger
| | | | | LargePositiveInteger
| | | | | | LargeZeroInteger
| | | | SmallInteger
| | Time
| Memory
| Message
| | DirectedMessage
| MethodInfo
| Point
| ProcessorScheduler
| Rectangle
| Signal
| | Exception
| | | Error
| | | | Halt
| | | | | ArithmeticError
| | | | | | ZeroDivide
| | | | | MessageNotUnderstood
| | | | UserBreak
| | | Notification
| | | | Warning
| Stream
| | ObjectDumper
| | PositionableStream
| | | ReadStream
| | | WriteStream
| | | | ReadWriteStream
| | | | | ByteStream
| | | | | | FileStream
| | Random
| | TextCollector
| | TokenStream
| UndefinedObject
| ValueAdaptor
| | NullValueHolder
| | PluggableAdaptor
| | | DelayedAdaptor
| | ValueHolder