Simpler. Please.

Enough Inference, But Not Too Much

"As complexity rises, precise statements lose meaning and meaningful statements lose precision."
-- Lotfi Zadeh

"Simplicity is the ultimate form of sophistication."
-- Leonardo da Vinci

"No! No! No!"
-- Business user running in fear from overly complex maths

"Less, But Better"
-- Dieter Rams

Are our AI tools examples of "good" design? Based on my experience with industrial developers, I would say perhaps not. According to Dieter Rams, good design:

is innovative – The possibilities for progression are not, by any means, exhausted. Technological development is always offering new opportunities for original designs. But imaginative design always develops in tandem with improving technology, and can never be an end in itself.
makes a product useful – A product is bought to be used. It has to satisfy not only functional, but also psychological and aesthetic criteria. Good design emphasizes the usefulness of a product whilst disregarding anything that could detract from it.
is aesthetic – The aesthetic quality of a product is integral to its usefulness because products are used every day and have an effect on people and their well-being. Only well-executed objects can be beautiful.
makes a product understandable – It clarifies the product’s structure. Better still, it can make the product clearly express its function by making use of the user's intuition. At best, it is self-explanatory.
is unobtrusive – Products fulfilling a purpose are like tools. They are neither decorative objects nor works of art. Their design should therefore be both neutral and restrained, to leave room for the user's self-expression.
is honest – It does not make a product appear more innovative, powerful or valuable than it really is. It does not attempt to manipulate the consumer with promises that cannot be kept.
is long-lasting – It avoids being fashionable and therefore never appears antiquated. Unlike fashionable design, it lasts many years – even in today's throwaway society.
is thorough down to the last detail – Nothing must be arbitrary or left to chance. Care and accuracy in the design process show respect towards the consumer.
is environmentally friendly – Design makes an important contribution to the preservation of the environment. It conserves resources and minimizes physical and visual pollution throughout the life cycle of the product.
is as little design as possible – Less, but better – because it concentrates on the essential aspects, and the products are not burdened with non-essentials. Back to purity, back to simplicity.

The Obvious Counter-argument Against "Keep it Simple"

Yes, sometimes, complexity is necessary

e.g. 2% optimizations of turbulent non-linear air-flows across a wing in the transonic range

But when it ain’t

Needless complexity is just silliness

Much industrial success has come from very complex image processing with deep learners: layers of neural nets that derive fascinating internal features

All good stuff
For "large ε problems", ever-increasing exact inference is... silly.

Example of "large ε":

Many SE inference results have a large ε; i.e. small changes to the training data or the AI model lead to large changes in performance. When the variance is large, 77 "equals" 81. And in SE, the variance is often large.
For example, here are 10×3 cross-validation results for software effort estimation (10 times: randomize the order of the data, divide it into three bins, train on 2 bins, test on the other):
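The 10×3 procedure above can be sketched in a few lines of Python. This is a minimal sketch, not the code behind the reported results: the median-effort "learner", the relative-error score and the synthetic (size, effort) rows are all hypothetical stand-ins. Printing the min and max of the 30 scores is a quick way to see how wide the spread (the ε) is on your own data.

```python
import random
import statistics


def cross_val(rows, learn, score, repeats=10, bins=3, seed=1):
    """repeats x bins cross-validation: each repeat shuffles the rows, splits
    them into `bins` bins, then trains on all-but-one bin and tests on the
    held-out bin. Returns repeats * bins scores."""
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        data = rows[:]
        rng.shuffle(data)
        folds = [data[i::bins] for i in range(bins)]
        for i, test in enumerate(folds):
            train = [r for j, fold in enumerate(folds) if j != i for r in fold]
            results.append(score(learn(train), test))
    return results


# Hypothetical learner, for illustration only:
# "learn" the median effort of the training rows ...
def learn_median(train):
    return statistics.median(effort for _, effort in train)


# ... and score it by the median magnitude of relative error on the test rows.
def median_mre(model, test):
    return statistics.median(abs(effort - model) / effort for _, effort in test)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic (size, effort) rows; substitute real effort data here.
    rows = [(s, s * rng.uniform(0.5, 2.0)) for s in range(10, 70)]
    scores = cross_val(rows, learn_median, median_mre)
    print(len(scores), "scores, from", round(min(scores), 2), "to", round(max(scores), 2))
```

Swapping in a real learner only means replacing `learn_median` and `median_mre`; the shuffling and binning stay the same.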

And there are many reasons to reflect on how to do "it" more simply.

Reasons for less

Your next 15 weeks

Fewer headaches
Higher marks

We would be foolish not to exploit inherent simplicities.

Because, at least in SE, there are inherent simplicities

Why is this so? Not clear. But:

Mathematically, models are either simple or poorly supported by the data:
- See here
Programming languages are a subset of natural language and both kinds of languages exhibit remarkably simple "language models"; i.e. given the last N symbols, you can pretty much guess the next symbol.
- Hindle et al.: "Programming languages, in theory, are complex, flexible and powerful, but the programs that real people actually write are mostly simple and rather repetitive, and thus they have usefully predictable statistical properties that can be captured in statistical language models and leveraged for software engineering tasks."
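To see what "given the last N symbols, guess the next symbol" means in practice, here is a minimal trigram sketch over code tokens. It only illustrates the counting idea behind n-gram language models; Hindle et al. trained on large real-world code corpora, while this toy counts tokens in one short snippet.

```python
from collections import Counter, defaultdict


def train_ngram(tokens, n=3):
    """Count which token follows each (n-1)-token context."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context][tokens[i + n - 1]] += 1
    return model


def predict_next(model, context):
    """Most frequent token seen after `context`, or None if the context is new."""
    counts = model.get(tuple(context))
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    code = "for i in range ( n ) : x += i for j in range ( m ) : y += j"
    model = train_ngram(code.split(), n=3)
    print(predict_next(model, ["in", "range"]))  # prints: (
```

Even this tiny model guesses the repetitive parts of code correctly, which is the point of the quote.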

Because in SE, More Complex is often superfluous

Data from Norman Fenton’s Bayes nets on software defects (class = yes, no)

Given classes x,y, let Fx, Fy be the frequency of some range in x,y
- Log Odds Ratio = log(Fx/Fy)
- Zero if there is no difference between x,y
The secret of big data is small data. The best thing to do with data is to ignore most of it.
Results:
- Most variables do not contribute to determination of defects
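A minimal sketch of that log odds ratio, assuming each row is a dict of already-discretized values and a hypothetical class column `defects` holding "yes"/"no". The small `eps` is our addition so that a range never seen in one class does not divide by zero; the `rank` helper (also ours) orders variables by their strongest range, so the ones that hardly contribute sink to the bottom.

```python
import math
from collections import Counter


def log_odds(rows, attr, klass="defects", x="yes", y="no", eps=1e-6):
    """For each range (value) of `attr`, return log(Fx/Fy), where Fx and Fy are
    the relative frequencies of that range among class-x and class-y rows.
    `eps` (our addition) avoids division by zero for ranges unseen in a class."""
    xs = [r[attr] for r in rows if r[klass] == x]
    ys = [r[attr] for r in rows if r[klass] == y]
    cx, cy = Counter(xs), Counter(ys)
    out = {}
    for v in set(cx) | set(cy):
        fx = cx[v] / len(xs) if xs else 0.0
        fy = cy[v] / len(ys) if ys else 0.0
        out[v] = math.log((fx + eps) / (fy + eps))  # 0 when Fx == Fy
    return out


def rank(rows, attrs, klass="defects"):
    """Order attributes by their strongest range (largest |log odds|)."""
    def strength(a):
        return max(abs(v) for v in log_odds(rows, a, klass).values())
    return sorted(attrs, key=strength, reverse=True)
```

Variables whose ranges all score near zero are the ones that "do not contribute to determination of defects".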

Data from Papakroni's master's thesis

TRAIN: Project 21 features onto the first 2 components of PCA
- Recursively divide the two dimensions (at the median)
- Stop at SQRT(N)
- In each leaf, replace the N projects with their median centroid
TEST: Estimate = interpolate between the 2 nearest centroids
Performs no worse, and sometimes better, than Random Forests and Naive Bayes
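Here is a minimal sketch of that TRAIN/TEST recipe in Python with NumPy. It is our reading of the bullets, not Papakroni's code: PCA is done with an SVD, the splits alternate between the two projected dimensions, a leaf stops splitting once its halves would hold fewer than sqrt(N) projects, and "interpolate" is assumed to mean inverse-distance weighting of the two nearest centroids' efforts.

```python
import numpy as np


def fit(X, y):
    """Project X onto its first 2 principal components, recursively split the
    projected space at the median, and keep one median centroid per leaf."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)  # PCA via SVD
    W = vt[:2].T                        # first 2 components, shape (features, 2)
    P = (X - mu) @ W                    # projected training data, shape (N, 2)
    min_leaf = max(2, int(np.sqrt(len(X))))
    centroids = []                      # (median position, median target) per leaf

    def split(idx, dim):
        if len(idx) < 2 * min_leaf:     # halves would fall below sqrt(N): stop
            centroids.append((np.median(P[idx], axis=0), np.median(y[idx])))
            return
        order = idx[np.argsort(P[idx, dim])]
        half = len(order) // 2          # cut at the median of this dimension
        split(order[:half], 1 - dim)    # alternate between the two dimensions
        split(order[half:], 1 - dim)

    split(np.arange(len(X)), 0)
    return mu, W, centroids


def predict(model, x):
    """Inverse-distance-weighted interpolation between the 2 nearest centroids."""
    mu, W, centroids = model
    p = (np.asarray(x, dtype=float) - mu) @ W
    near = sorted((float(np.linalg.norm(p - c)), float(v)) for c, v in centroids)
    if len(near) == 1 or near[0][0] == 0.0:
        return near[0][1]
    (d1, v1), (d2, v2) = near[:2]
    return (v1 / d1 + v2 / d2) / (1 / d1 + 1 / d2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 21))      # synthetic stand-in for 21 features
    y = X[:, 0] * 3 + 50                # synthetic effort values
    model = fit(X, y)
    print(len(model[2]), "centroids; estimate for row 0:", predict(model, X[0]))
```

The whole "model" is a handful of centroids, which is what makes the result comparable to far heavier learners so striking.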

For more, see Data Mining for very busy people

Because, Historically, Simpler is often Better

PCA: Pearson, 1901
Narrows: Amarel, 1960s
Prototypes: Chen, 1975
Frames: Minsky, 1975
Min environments: DeKleer, 1986
Saturation: Horgan & Mathur, 1980
Homogeneous propagation: Michael, 1981
Master variables: Crawford & Baker, 1995
Clumps: Druzdzel, 1997
Feature subset selection: Kohavi, 1997
Back doors: Williams, 2002
Active learning: many people (2000+)

From Section 2.2 of this paper

Simpler methods may build smaller models that use fewer attributes from the data. Such models are more likely to be robust against overfitting, especially on small and noisy data (and have been found to predict data at levels comparable with regression).
Very simple rule-based methods can perform comparably well to more complex models in a range of domains, e.g., public health, medical risk management, performance science, etc.
Neth and Gigerenzer argue that such rule-bases are tools that work well under conditions of uncertainty.
Brighton showed that rule-based models can perform better than complex nonlinear algorithms such as neural networks, exemplar models, and classification/regression trees.
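To make "very simple rule-based methods" concrete, here is a sketch of one classic example, Holte's 1R, which builds a rule from a single attribute. The studies cited above are not necessarily about 1R, so treat this only as an illustration of how little machinery such a learner needs.

```python
from collections import Counter, defaultdict


def one_r(rows, attrs, klass):
    """1R: for each attribute, map each of its values to the majority class;
    keep the one attribute whose rule makes the fewest training errors.
    Returns (attribute, {value: class}, errors)."""
    best = None
    for a in attrs:
        by_value = defaultdict(Counter)
        for r in rows:
            by_value[r[a]][r[klass]] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in by_value.items())
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best


def classify(model, row, default=None):
    """Apply the single rule; unseen values fall back to `default`."""
    attr, rule, _ = model
    return rule.get(row[attr], default)
```

The learned model is one attribute and a lookup table, which is about as understandable as a model can get.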

Speed speed speed

Analytics = a workflow to condense much low-value data down to a few diamonds.
Fisher et al. surveyed 16 industrial data scientists:
- Due to the computational cost of analytics, "the luxuries of interactivity, direct manipulation, and fast system response are gone".
- Modern cloud-based analytics is a throwback to 1960s batch processing
- "Fast iteration is key, but incompatible with the way jobs are submitted and processed in the cloud. It’s frustrating to wait for hours, only to realize you need a slight tweak to your feature set."

Less cost (local hardware, cloud services)

The following are somewhat extreme examples. But suppose we could do the following tasks orders of magnitude faster. Just imagine what else we could use all that saved CPU for.

15 years of CPU, one FSE 2013 analytics paper, Wang et al.
- Evaluate automatic tuning tools for code clone recognition tools
$1.2 million
- Cost of renting commercial cloud hardware
- One project, 5 students, 3 years
- FYI: that would EAT and SWALLOW the standard budgets NSF gives researchers like me
Three years of CPU/day
- Anonymous NCSU grad student tuning deep learning networks at XXXX for summer 2018

Support the edge

Edge computing
- Move some computation towards the edge of the network to harness computational capabilities that are currently untapped in edge nodes, such as base stations, routers and switches
Internet of Things
- Simple AI could make better use of the billions and billions of devices out there, many of which operate at very low power

Less Energy Consumption

Power off your phone

Stare at it
How long before you stop?
Phone - power = silly

From Green in Software Engineering:

Present-day primary energy footprints of three business software apps add up to as much as 373 petajoules/year.
People claim they will pay more for a "green" product.
Nearly half of surveyed organizations design their business models on the basis of sustainability.
Sustainability = a new source of innovation, a cost-cutting opportunity, and a mechanism for gaining competitive advantage.

Less pollution Creating that Energy

IT-related services now account for 2% of all global carbon emissions, roughly the same as the aviation sector
Simpler AI lets us breathe easier.
- The cloud runs on http://time.com/46777/your-data-is-dirty-the-carbon-price-of-cloud-computing/.

Simpler explanation

Less generation of solutions

Less confusion
Veerappa and Letier:
- "..for industrial problems, these algorithms generate (many) solutions (makes) understanding them and selecting one among them difficult and time consuming"

Simpler Customization

We are already delivering software more complex than what people can manage (see fig1 and fig3).

Many software systems have poorly chosen defaults. Hence, it is useful to seek better configurations.
- Van Aken et al. report that the default MySQL configurations in 2016 assume that it will be installed on a machine that has 160MB of RAM (which, at that time, was incorrect by, at least, an order of magnitude)
- Herodotou et al. show how standard settings for text mining applications in Hadoop result in worst-case execution times.
- In the same vein, Jamshidi et al. report that for text mining applications on Apache Storm, the throughput achieved using the worst configuration is 480 times lower than the throughput achieved by the best configuration.
Understanding the configuration space of software systems with large configuration spaces is challenging.
Exploring more than just a handful of configurations is usually infeasible due to long benchmarking times.
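When exhaustive benchmarking is out of the question, a common fallback is to benchmark a small random sample of configurations. A minimal sketch, where the option names, their values and `toy_benchmark` are all made up for illustration; in practice `benchmark` would run the real system, and each call could take minutes or hours.

```python
import math
import random

# Hypothetical configuration space: option names and values are made up.
SPACE = {
    "buffer_mb": [16, 64, 160, 512, 2048],
    "threads": [1, 2, 4, 8, 16],
    "cache": ["off", "lru", "lfu"],
    "batch": [10, 100, 1000],
}


def space_size(space):
    """Number of distinct configurations (grows multiplicatively per option)."""
    return math.prod(len(values) for values in space.values())


def random_search(space, benchmark, budget=20, seed=1):
    """Benchmark `budget` randomly sampled configurations and return the fastest.
    `benchmark(cfg)` returns a runtime, so lower is better."""
    rng = random.Random(seed)
    best, best_time = None, math.inf
    for _ in range(budget):
        cfg = {k: rng.choice(v) for k, v in space.items()}
        t = benchmark(cfg)
        if t < best_time:
            best, best_time = cfg, t
    return best, best_time


def toy_benchmark(cfg):
    """Made-up cost model standing in for a real, slow benchmark run."""
    cost = 1000 / cfg["threads"] + abs(cfg["buffer_mb"] - 512) / 10
    return cost + (50 if cfg["cache"] == "off" else 0) + cfg["batch"] / 100


if __name__ == "__main__":
    print(space_size(SPACE), "configurations in total")
    print(random_search(SPACE, toy_benchmark, budget=20))
```

Even four options give hundreds of configurations; real systems have dozens of options, which is why a fixed sampling budget matters.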

Because we need a baseline

Because better science needs better baselines:

Empirical methods in AI:
- Supposedly newer, more sophisticated methods should be baselined against seemingly simpler alternatives
- Warning: when I do that, I often find simpler is better.
  - Search-based SE: A Baseline Method For Search-Based Software Engineering
  - Data mining: Bellwethers: A Baseline Method For Transfer Learning
  - Optimizing: "Sampling" as a Baseline Optimizer for Search-based Software Engineering

Because better engineering needs better baselines:

So many AI tools, so many ways to tune them, so many ways to combine them
- So given a new problem, which learner/optimizer should we apply?
- Hard to say, a priori.
- When new data arrives, you need commissioning experiments; i.e. try a variety of techniques before you can find what works best for the local data.
Commissioning needs baselines
- i.e. an algorithm which can generate floor performance values.
- Lets a developer quickly rule out any method that falls “below the floor”.
- Lets us achieve fast early results, while also gaining some guidance in all their subsequent experimentation (specifically: "try to beat the baseline").
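A minimal sketch of such a floor, assuming classification data as (feature, label) pairs scored by accuracy. The "majority class" baseline and the two toy candidate methods are our illustrative choices, not methods from the papers above.

```python
from collections import Counter


def majority_baseline(train, test):
    """Floor: always predict the most common training label; return accuracy."""
    guess = Counter(label for _, label in train).most_common(1)[0][0]
    return sum(label == guess for _, label in test) / len(test)


def commission(methods, train, test, baseline=majority_baseline):
    """Score the baseline, then keep only the methods that beat it, best first.
    `methods` maps a name to a function (train, test) -> accuracy."""
    floor = baseline(train, test)
    scores = {name: m(train, test) for name, m in methods.items()}
    kept = sorted(((s, n) for n, s in scores.items() if s > floor), reverse=True)
    return floor, [(n, s) for s, n in kept]


def threshold_method(train, test):
    """Toy candidate: predict 1 when the feature exceeds 5."""
    return sum(int(x > 5) == label for x, label in test) / len(test)


def contrarian_method(train, test):
    """Toy candidate that falls below the floor."""
    return sum(int(x <= 5) == label for x, label in test) / len(test)
```

Anything that cannot beat "always guess the usual answer" is ruled out before any expensive tuning is spent on it.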

Other

Quicker more effective training, experimentation

Less to understand, faster to understand.
Less to twiddle, less to explore

Easier Reproducibility

Large config spaces are the bane of reproducibility.
You did not get what I got? Did you set the X to 2, Y to 3, and Z to 10,30,age/shoe size+temperature?

Solutions more trustable

More understandable, more explainable, more I want to use.

Solutions easier to apply

Less to do, faster to do

'Cause it's just good science

If we do not know what we are doing, we are very clumsy;
When we understand it better, we are more elegant;
So use this test to check if your thinking is confused, or clarified:
- are your current methods very complex/ time-consuming?

A Common Recipe (For Me, at Least)

Row reduction:
- Replace N rows of data with M≪N exemplars
- e.g. cluster and report just some items per cluster
Column reduction:
- Replace C columns of data with the sqrt(C) most influential columns
Range reduction (in columns):
- Discretize numerics, but only cut a range if the sub-ranges have less variance
Range pruning
- If there is a class variable:
  - only cut if the associated class ranges are less varied
- Else, cluster row/column/range data and only cut if associated ranges are less varied in different clusters.
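The column and range steps of that recipe can be sketched together, since both rest on the same test ("do the sub-ranges have less variance?"). A minimal sketch assuming a numeric class variable; the `min_size` and `gain` thresholds are our assumptions, and row reduction (clustering) is left out to keep it short.

```python
import math
import statistics


def sd(ys):
    return statistics.pstdev(ys) if len(ys) > 1 else 0.0


def cuts(xs, ys, min_size=4, gain=0.95):
    """Range reduction: recursively cut numeric column xs, but only where the
    size-weighted sd of the class ys in the two sub-ranges is below
    `gain` * (sd of the whole range). Returns the sorted cut points."""
    pairs = sorted(zip(xs, ys))
    found = []

    def split(part):
        n = len(part)
        best, best_i = gain * sd([y for _, y in part]), None
        for i in range(min_size, n - min_size + 1):
            if part[i - 1][0] == part[i][0]:
                continue                       # never cut between equal x values
            left = [y for _, y in part[:i]]
            right = [y for _, y in part[i:]]
            e = (len(left) * sd(left) + len(right) * sd(right)) / n
            if e < best:
                best, best_i = e, i
        if best_i is not None:                 # the cut really reduced variance
            found.append(part[best_i][0])
            split(part[:best_i])
            split(part[best_i:])

    split(pairs)
    return sorted(found)


def score(xs, ys):
    """How much the ranges found by `cuts` reduce the class sd."""
    cut_points = cuts(xs, ys)
    bins = {}
    for x, y in zip(xs, ys):
        bins.setdefault(sum(x >= c for c in cut_points), []).append(y)
    within = sum(len(b) * sd(b) for b in bins.values()) / len(ys)
    return sd(ys) - within


def keep_columns(cols, ys):
    """Column reduction: keep the sqrt(C) columns whose ranges best reduce
    class variance. `cols` maps column name -> list of numbers."""
    k = max(1, round(math.sqrt(len(cols))))
    return sorted(cols, key=lambda c: score(cols[c], ys), reverse=True)[:k]
```

A column that cannot be cut into less-varied ranges scores zero, so it is the first to be dropped.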

Cautions

So that’s it? Just find the few dimensions that matter, then stop?

Well...

The reduced data space exists... but how to find it?
- Feature selection on raw data?
- Or synthesize it (PCA, spectral methods, etc.)?

Also:

And the dimensions that matter NOW
- May not matter SOON
So an agent that assumes low dimensionality
- Will always be checking and revising its dimensions (but FYI, we should be doing that with all data mining approaches, anyway).
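One way to keep "checking and revising" is to re-run column selection on each new batch of data and compare the result with the dimensions currently in use. A minimal sketch: `select` stands for any column selector, and using Jaccard overlap with a 0.5 threshold as the test for "the dimensions changed" is our assumption.

```python
def monitor(batches, select, min_overlap=0.5):
    """Yield (batch_number, dimensions_in_use, revised?) as data arrives.
    `select` is any column selector: batch -> iterable of column names."""
    current = None
    for i, batch in enumerate(batches):
        chosen = set(select(batch))
        if current is None:
            current, revised = chosen, False
        else:
            union = current | chosen
            jaccard = len(current & chosen) / len(union) if union else 1.0
            revised = jaccard < min_overlap     # the dimensions that matter moved
            if revised:
                current = chosen
        yield i, sorted(current), revised
```

Small wobbles in the selected columns are tolerated; only a substantial change triggers a revision.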
