csc 591-024, (8290)
csc 791-024, (8291)
fall 2024, special topics in computer science
Tim Menzies, timm@ieee.org, com sci, nc state



How Much Data?


How Much Data Do We Need for Learning?

Before We Start…

Review questions?

  1. What is the standard line on “how much data is enough?”
    • From Peter Norvig
    • From regression theory
    • From semi-supervised learning
  2. Describe each of the following. What are their implications for human decision making?
    • Streaming over zero-diversity data
    • STM, LTM
    • Shrikanth’s early bird effect
    • Two results (Valerdi’s work; repertory grids) commenting on the rate at which we can extract considered opinions from humans
  3. In English, describe the following math results and their implications for data mining
    • Chessboard model
    • Probable correctness theory
  4. Few-shot learning:
    1. describe it
    2. In what sense does FSL mean we can look at fewer examples?
    3. In what sense does FSL require many, many examples?

A more informed position: The question is wrong


Another question: How much data can you handle?

For very fast decision making, there is a cognitive science case that we work from fewer than a dozen examples:

Although first proposed in 1981, this STM/LTM theory remains relevant [10]. It can explain both expert competence and incompetence in software engineering tasks such as understanding code [11].

Another question: How much data can you get?

How fast can we gather expert opinion?

Evidence from “cost estimation”

Evidence from “Repertory Grids”

Advice on how long to fill in a rep grid?

Overall, for reflective labels on data, we get:

Advice from Mathematics

One commonly cited rule of thumb [^call] is to have at least 10 times as many training instances as attributes [16][17].

Historically, how much data was enough?

Maths

Chessboard model

Data is spread out across a \(d\)-dimensional chessboard where each dimension is divided into \(b\) bins, giving \(b^d\) cells [21].

The target is some subset of the data that falls into some of the chessboard cells:
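A minimal sketch of the chessboard model, assuming every attribute has already been normalized to [0, 1] and cut into \(b\) equal-width bins (the notes do not say how bins are chosen). The hypothetical helper `cell_of` maps a row to its cell; the hypothetical `chessboard_summary` reports the total number of cells \(b^d\), how many cells the data actually occupies, and how many cells hold a (hypothetical) target class.

```python
from typing import Hashable, Sequence, Tuple

def cell_of(row: Sequence[float], bins: int) -> Tuple[int, ...]:
    """Map a row of values in [0, 1] to its chessboard cell: one bin index per dimension."""
    # Equal-width bins (an assumption); min() keeps x == 1.0 inside the last bin.
    return tuple(min(int(x * bins), bins - 1) for x in row)

def chessboard_summary(rows: Sequence[Sequence[float]], labels: Sequence[Hashable],
                       bins: int, target: Hashable) -> Tuple[int, int, int]:
    """Return (total cells b**d, cells holding any data, cells holding the target class)."""
    d = len(rows[0])
    occupied = {cell_of(r, bins) for r in rows}
    target_cells = {cell_of(r, bins) for r, y in zip(rows, labels) if y == target}
    return bins ** d, len(occupied), len(target_cells)

# Hypothetical toy data: d = 2 attributes, b = 5 bins per attribute.
rows = [(0.1, 0.3), (0.15, 0.35), (0.9, 0.7), (0.5, 0.5)]
labels = ["no", "no", "yes", "yes"]
print(chessboard_summary(rows, labels, bins=5, target="yes"))  # (25, 3, 2)

# The board grows as b**d, so a fixed amount of data covers less and less of it.
for d in (2, 5, 10):
    print(d, 5 ** d)
```

The design choice worth noting is that only occupied cells are stored (as a set), so the sketch never materializes all \(b^d\) cells.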

Probable Correctness Theory

Richard Hamlet, Probable correctness theory, 1987 [22].

Some what-ifs:

  • If we apply Cohen’s rule (things are indistinguishable if they are less than \(d{\times}\sigma\) apart),
  • and if variables are Gaussian, ranging over \(-3 \le x \le 3\) (a span of \(6\sigma\)),
  • then that space divides into regions of size \(p=\frac{d}{6}\).

| scenario | \(d\) | \(p\) | \(C\) | \(n(C,p)\) | \(\log_2(n(C,p))\) |
|---|---|---|---|---|---|
| medium effect, non-safety-critical | 0.35 | 0.06 | 0.95 | 50 | 6 |
| small effect, safety-critical | 0.2 | 0.03 | 0.9999 | 272 | 8 |
| tiny effects, ultra-safety-critical | n/a | one in a million | six sigma (0.999999) | 13,815,504 | 24 |
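The \(n(C,p)\) column can be reproduced from the usual probable-correctness argument, assuming independent, uniformly random tests: one test misses a region of size \(p\) with probability \(1-p\), so \(n\) tests all miss it with probability \((1-p)^n\). Requiring \(1-(1-p)^n \ge C\) gives \(n(C,p) = \frac{\log(1-C)}{\log(1-p)}\). The minimal sketch below (the helper name `n_tests` is hypothetical) uses the unrounded \(p=d/6\), which matches the table's 50 and 272; the rounded \(p\) values printed in the table give slightly different \(n\).

```python
import math

def n_tests(confidence: float, p: float) -> int:
    """Smallest n with 1 - (1 - p)**n >= confidence, i.e. ceil(log(1 - C) / log(1 - p))."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

# (scenario, region size p, confidence C), taken from the table above.
scenarios = [
    ("medium effect, non-safety-critical", 0.35 / 6, 0.95),
    ("small effect, safety-critical", 0.2 / 6, 0.9999),
    ("tiny effects, ultra-safety-critical", 1e-6, 0.999999),
]
for name, p, c in scenarios:
    n = n_tests(c, p)
    # Prints n = 50, 272, 13,815,504 with log2(n) ~ 6, 8, 24.
    print(f"{name}: n = {n:,}, log2(n) ~ {round(math.log2(n))}")
```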

Note the above table makes some very optimistic assumptions about the problem:

But it also tells us that the only way we can reason about safety-critical systems is via some sorting heuristic (so we can get the \(\log_2\) effect).

[^call]: Application of machine learning techniques in small sample clinical studies, from StackExchange.com https://stats.stackexchange.com/questions/1856/application-of-machine-learning-techniques-in-small-sample-clinical-studies
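One way to read the "log2 effect" in the table: if candidate cases can be sorted (for example, by some heuristic score), bisection inspects at most \(\lfloor\log_2 n\rfloor + 1\) of them rather than all \(n\). The sketch below counts probes in a plain binary search over the 13,815,504 cases from the ultra-safety-critical row; the helper `probes_to_find`, the use of `range(n)` as the sorted cases, and the chosen targets are illustrative assumptions, not a method taken from the source.

```python
import math
from typing import Sequence

def probes_to_find(sorted_items: Sequence[int], target: int) -> int:
    """Binary search over sorted_items; return how many items were inspected."""
    lo, hi = 0, len(sorted_items) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_items[mid] == target:
            break  # found it
        if sorted_items[mid] < target:
            lo = mid + 1  # discard the lower half
        else:
            hi = mid - 1  # discard the upper half
    return probes

n = 13_815_504
cases = range(n)  # a range is sorted and indexable, so no 13.8M-element list is built
worst = max(probes_to_find(cases, t) for t in (0, 1, n // 3, n // 2, n - 1))
print(f"{n:,} cases, at most {worst} probes; floor(log2 n) + 1 = {math.floor(math.log2(n)) + 1}")
```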

Few-Shot Learning

In the following, the author says LLMs are not learners, but given the results of this subject, I think an edit is in order:

Need another name

Generalize to new tasks via a sequence of prompts composed of natural language instructions.

Few-shot learning is a subfield of machine learning and deep learning that aims to teach AI models how to learn from only a small number of labeled training examples.
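For LLMs, a "k-shot" prompt is commonly just an instruction, k labeled examples, and the unlabeled query. The minimal sketch below only assembles such a prompt string; the layout (`Input:`/`Label:` lines), the helper name `few_shot_prompt`, and the issue-report examples are all hypothetical, and no model is called.

```python
from typing import Sequence, Tuple

def few_shot_prompt(instruction: str, shots: Sequence[Tuple[str, str]], query: str) -> str:
    """Assemble a k-shot prompt: instruction, k labeled examples, then the unlabeled query."""
    parts = [instruction.strip(), ""]
    for text, label in shots:  # k = len(shots); k = 0 gives a zero-shot prompt
        parts.append(f"Input: {text}\nLabel: {label}\n")
    parts.append(f"Input: {query}\nLabel:")  # the model is expected to complete this label
    return "\n".join(parts)

# Hypothetical 2-shot example for issue report classification.
shots = [("App crashes when I click save", "bug"), ("Please add a dark mode", "feature")]
prompt = few_shot_prompt("Classify each issue report as bug or feature.", shots,
                         "Login button does nothing")
print(prompt)
```

Here the "few shots" are the two labeled examples; the model itself is not retrained, which is one reason the notes question whether "learning" is the right name.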

More generally, few-shot learning belongs to “n-shot learning”, a category of artificial intelligence that also includes:

Applications:

Methods:

Few-Shot Learning in SE

March 2024: Google query: “few-shot learning and ‘software engineering’”

In the first 100 results, no published few-shot learning papers in SE appeared after result 70.

Among those first 70 results:

| year | citations | venue | type (j=journal; c=conf; w=workshop) | title | pdf | data |
|---|---|---|---|---|---|---|
| 2023 | 1 | ICSE NLBSE | w | Few-Shot Learning for Issue Report Classification | pdf | 200 + 200 |
| 2023 | 2 | SSBSE | c | Search-based Optimisation of LLM Learning Shots for Story Point Estimation | pdf | 6 to 10 |
| 2023 | 2 | ICSE | c | Log Parsing with Prompt-based Few-shot Learning | pdf | 4 to 128; most improvement before 16 |
| 2023 | 3 | AST | c | FlakyCat: Predicting Flaky Tests Categories using Few-Shot Learning | pdf | 400+ |
| 2023 | 5 | ICSE | c | Retrieval-Based Prompt Selection for Code-Related Few-Shot Learning | pdf | 6-7 (for code generation); 40 to 50 (for code repair) |
| 2022 | 7 | Soft. Lang. Eng. | c | Neural Language Models and Few Shot Learning for Systematic Requirements Processing in MDSE | pdf | 8 to 11 |
| 2023 | 12 | ICSE | c | Towards using Few-Shot Prompt Learning for Automating Model Completion | pdf | 212 classes |
| 2020 | 15 | IEEE Access | j | Few-Shot Learning Based Balanced Distribution Adaptation for Heterogeneous Defect Prediction | pdf | 100s to 1000s |
| 2019 | 21 | Big Data | j | Exploring the applicability of low-shot learning in mining software repositories | pdf | 100 => 70% accuracy; 100s => 90% accuracy |
| 2021 | 27 | ESEM | c | An Empirical Examination of the Impact of Bias on Just-in-time Defect Prediction | | 10^3 samples of defects |
| 2020 | 29 | ICSE | c | Unsuccessful Story about Few Shot Malware Family Classification and Siamese Network to the Rescue | pdf | 10,000s? |
| 2022 | 65 | ASE | c | Few-shot training LLMs for project-specific code-summarization | pdf | 10 samples |
| 2022 | 101 | FSE | c | Less Training, More Repairing Please: Revisiting Automated Program Repair via Zero-Shot Learning | pdf | ? |

  1. P. Norvig. (2011) The Unreasonable Effectiveness of Data. Youtube. https://www.youtube.com/watch?v=yvDCzhbjYWs

  2. F. Rahman, D. Posnett, I. Herraiz, and P. Devanbu, “Sample size vs. bias in defect prediction,” in Proceedings of the 2013 9th joint meeting on foundations of software engineering. ACM, 2013, pp. 147–157.

  3. S. Amasaki, “Cross-version defect prediction: use historical data, cross-project data, or both?” Empirical Software Engineering, pp. 1–23, 2020.

  4. S. McIntosh and Y. Kamei, “Are fix-inducing changes a moving target? a longitudinal case study of just-in-time defect prediction,” IEEE Transactions on Software Engineering, vol. 44, no. 5, pp. 412–428, 2017.

  5. S. N.C., S. Majumder and T. Menzies, “Early Life Cycle Software Defect Prediction. Why? How?,” 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), Madrid, ES, 2021, pp. 448-459, doi: 10.1109/ICSE43902.2021.00050.

  6. Jill Larkin, John McDermott, Dorothea P. Simon, and Herbert A. Simon. 1980. Expert and Novice Performance in Solving Physics Problems. Science 208, 4450 (1980), 1335–1342. DOI:http://dx.doi.org/10.1126/science.208.4450.1335 arXiv:http://science.sciencemag.org/content/208/4450/1335.full.pdf

  7. N. Cowan. 2001. The magical number 4 in short-term memory: a reconsideration of mental storage capacity. Behav Brain Sci 24, 1 (Feb 2001), 87–114.

  8. George A Miller. 1956. The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological review 63, 2 (1956), 81.

  9. Jill Larkin, John McDermott, Dorothea P. Simon, and Herbert A. Simon. 1980. Expert and Novice Performance in Solving Physics Problems. Science 208, 4450 (1980), 1335–1342. DOI:http://dx.doi.org/10.1126/science.208.4450.1335 arXiv:http://science.sciencemag.org/content/208/4450/1335.full.pdf

  10. Recently, Ma et al. [^wei14] used evidence from neuroscience and functional MRIs to argue that STM capacity might be better measured using other factors than “number of items”. But even they conceded that “the concept of a limited (STM) has considerable explanatory power for behavioral data”.

  11. Susan Wiedenbeck, Vikki Fix, and Jean Scholtz. 1993. Characteristics of the mental representations of novice and expert programmers: an empirical study. International Journal of Man-Machine Studies 39, 5 (1993), 793–812.

  12. Valerdi, Ricardo. “Heuristics for systems engineering cost estimation.” IEEE Systems Journal 5.1 (2010): 91-98.

  13. Kington, Alison. “Defining Teachers’ Classroom Relationships.” (2009). https://eprints.worc.ac.uk/1885/1/Kington%202009.pdf

  14. Easterby-Smith, Mark. “The Design, Analysis and Interpretation of Repertory Grids.” Int. J. Man Mach. Stud. 13 (1980): 3-24.

  15. Helen M. Edwards, Sharon McDonald, S. Michelle Young, The repertory grid technique: Its place in empirical software engineering research, Information and Software Technology, Volume 51, Issue 4, 2009, Pages 785-798, ISSN 0950-5849.

  16. Austin PC, Steyerberg EW. Events per variable (EPV) and the relative performance of different strategies for estimating the out-of-sample validity of logistic regression models. Stat Methods Med Res. 2017 Apr;26(2):796-808. doi: 10.1177/0962280214558972. Epub 2014 Nov 19. PMID: 25411322; PMCID: PMC5394463.

  17. Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol. 1996 Dec;49(12):1373-9. doi: 10.1016/s0895-4356(96)00236-3. PMID: 8970487.

  18. Alvarez, L., & Menzies, T. (2023). Don’t Lie to Me: Avoiding Malicious Explanations With STEALTH. IEEE Software, 40(3), 43-53.

  19. Zhu, X., Vondrick, C., Fowlkes, C.C. et al. Do We Need More Training Data?. Int J Comput Vis 119, 76–92 (2016). https://doi.org/10.1007/s11263-015-0812-2

  20. Menzies, T., Turhan, B., Bener, A., Gay, G., Cukic, B., & Jiang, Y. (2008). Implications of ceiling effects in defect predictors. In Proceedings of the 4th international workshop on Predictor models in software engineering (pp. 47-54).

  21. J. Nam, W. Fu, S. Kim, T. Menzies and L. Tan, “Heterogeneous Defect Prediction,” in IEEE Transactions on Software Engineering, vol. 44, no. 9, pp. 874-896, 1 Sept. 2018, doi: 10.1109/TSE.2017.2720603.

  22. Hamlet, Richard G. “Probable correctness theory.” Information processing letters 25.1 (1987): 17-25.