csc 591-024, (8290)
csc 791-024, (8291)
fall 2024, special topics in computer science
Tim Menzies, timm@ieee.org, com sci, nc state



Stats


  1. Problem of Comparing Samples: What are the two main parts of the problem when comparing samples, and how do significance tests and effect size tests address each part?

  2. Statistical Significance vs. Effect Size: How can the difference between two samples be statistically significant yet have only a small effect size? Why does this distinction matter in practice?

  3. Real-World SE Example (Distributed Development): Compare the findings of Bird et al. (2009) and Kocaguneli et al. (2013) regarding the impact of distributed development on software quality. What did each study conclude?

  4. Curve Overlap and Standard Deviation: What happens to the interpretability of sample differences when the standard deviations of the distributions increase, causing the curves to overlap?

  5. Assumptions in Parametric Statistical Tests: Why are parametric statistical tests used in some scenarios, and what assumptions about the data distribution must hold for these tests to be valid?

  6. Challenges of Parametric Tests: What are some limitations of parametric statistical tests, especially for real-world data that might not fit a simple distribution?

  7. Scott-Knott Method for Grouping Means: How does the Scott-Knott method cluster the means of a set of samples, and what is the connection between Scott-Knott and (for example) effect size tests?

  8. Cohen’s Delta and Small Effect Size: What role does Cohen’s delta play in determining whether one distribution is considered similar to another? What threshold value is used in the lecture, and what does it represent?

  9. Differences Between Parametric and Non-Parametric Tests: How do non-parametric tests like CliffsDelta and bootstrap differ from parametric tests, and why might they be preferred in some cases?

  10. Blurring in Results Interpretation: What is “blurring” when interpreting the results of statistical comparisons, and why does it matter when analyzing treatments with large variance?

  11. Runtime and Storage Considerations in Statistics: Why are non-parametric tests generally slower and more memory-intensive than parametric tests? When might it be appropriate to use quick approximations instead?
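
As a study aid for questions 8, 9 and 11, here is a minimal Python sketch of two effect size checks. It is an illustration under stated assumptions, not the lecture's own code. The function names are hypothetical, and the `threshold` argument is left for the reader to supply because the lecture's value is not restated on this page. The parametric check needs only the means and standard deviations, while the non-parametric check compares every pair of values, which hints at why it costs more time and memory.

```python
from statistics import mean, stdev

def cohen_small_effect(xs, ys, threshold):
    """Parametric check (assumed form): is the gap between the means
    smaller than `threshold` times the pooled standard deviation?"""
    n, m = len(xs), len(ys)
    # Pooled standard deviation, weighted by each sample's degrees of freedom.
    pooled = (((n - 1) * stdev(xs) ** 2 + (m - 1) * stdev(ys) ** 2)
              / (n + m - 2)) ** 0.5
    return abs(mean(xs) - mean(ys)) < threshold * pooled

def cliffs_delta(xs, ys):
    """Non-parametric effect size: (#pairs with x > y minus #pairs with x < y)
    divided by the total number of pairs. Ranges from -1 to 1."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

if __name__ == "__main__":
    a, b = [1, 2, 3], [4, 5, 6]
    print(cliffs_delta(a, b))              # every value in a is below every value in b
    print(cohen_small_effect(a, b, 0.5))   # 0.5 is an arbitrary illustrative threshold
```

Note that `cliffs_delta` makes no assumption about the shape of either distribution, whereas `cohen_small_effect` summarizes each sample by its mean and standard deviation, which is most meaningful when the data are roughly bell-shaped.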