Data & Science with Glen Wright Colopy
Jingyi Jessica Li | Statistical Hypothesis Testing vs Machine Learning Binary Classification

September 19, 2021

Jingyi Jessica Li | Statistical Hypothesis Testing versus Machine Learning Binary Classification

Jingyi Jessica Li (UCLA) discusses her paper "Statistical Hypothesis Testing versus Machine Learning Binary Classification". Jingyi noticed several high-impact cancer research papers using multiple hypothesis testing for binary classification problems. Concerned that these papers had no guarantee on their claimed false discovery rates, Jingyi wrote a perspective article clarifying the difference between hypothesis testing and binary classification for scientists.

#datascience #science #statistics

0:00 – Intro
1:50 – Motivation for Jingyi's article
3:22 – Jingyi's four concepts under hypothesis testing and binary classification
8:15 – Restatement of concepts
12:25 – Emulating methods from other publications
13:10 – Classification vs hypothesis test: features vs instances
21:55 – Single vs multiple instances
23:55 – Correlations vs causation
24:30 – Jingyi’s Second and Third Guidelines
30:35 – Jingyi’s Fourth Guideline
36:15 – Jingyi’s Fifth Guideline
39:15 – Logistic regression: An inference method & a classification method
42:15 – Utility for students
44:25 – Navigating the multiple comparisons problem (again!)
51:25 – Right side, show bio-arxiv paper


Gualtiero Piccinini | What Are First-Person Data? | Philosophy of Data Science

August 29, 2021

Gualtiero Piccinini | What Are First-Person Data?

First-person methods (and their associated data) have been scientifically and philosophically contentious. Are they pseudoscientific? Or simply pushing the bounds of scientific methodology? Obviously, I have no idea… so Prof. Gualtiero Piccinini (University of Missouri – St. Louis) provides a helpful introduction to the topic, covering the key points of its history and the philosophical/scientific debate.

0:00 Why cover first-person methods & data?
2:26 First-person methods vs first-person data?
7:10 Are first-person data legitimate at all?
11:50 Phenomenology
13:26 First-person data are extracted from human behavior
18:25 Skepticism & arguments against first-person data
25:40 Psychophysics, introspectionists, behaviorists, cognitivists, and the origins of first-person data
35:20 Using new instruments & methods in science
46:00 Is this where the philosophers roam?

#datascience #statistics #science


David Dunson | Advancing Statistical Science | Philosophy of Data Science

August 16, 2021

David Dunson | Advancing Statistical Science | Philosophy of Data Science Series

A fundamental question in the philosophy of science is "what does it mean to make scientific progress?" We will have a series of episodes centered around this question for statistics and data science. In our first episode in the series, David Dunson (Duke University) discusses important advances in Bayesian analysis, big data, uncertainty, and scientific discovery.

Topic Timestamps
0:00 Intro to David Dunson
1:54 What does it mean to advance data science and statistics? 
6:14 Industry & Optimization, Science & Uncertainty
8:14 Prediction & Discovery / Bayesian Modeling 
14:13 What is “complex” data?
22:49 Big Data, Bayes, and Nonparametrics
33:50 Ad hoc approaches vs principled methods
37:08 Should Machine Learning Publications Refocus on Scientific Discovery?
39:50 Mathematically principled data science & statistics
51:40 Do Bayesians just use priors as regularizers?
55:16 Bayesian Priors and Tuning Inference Methods
1:00:00 Prioritize the Most Important Work in Data Science 
1:07:07 Good Practices of Star Grad Students
1:13:17 The Science in Statistical *Science*

#datascience #science #statistics


Martin Kulldorff | Spatiotemporal Models of Disease Outbreaks

August 2, 2021

Note: This conversation was recorded June 25, 2021.

Martin Kulldorff | Spatiotemporal Models of Outbreaks
Martin Kulldorff (Harvard Medical School) talks about the integration of biological & demographic information (and general reality) in the spatiotemporal models used to detect disease outbreaks. He also discusses how these methods can be applied to non-infectious diseases like cancer.

0:00 - Spatio-temporal modeling of outbreaks
6:02 - Important features of spatio-temporal outbreak models
12:20 - Which diseases wouldn't you track for modeling?
19:02 - Multiple comparison adjustments of alarms
25:15 - Domain knowledge of outbreak features
29:30 - Competing hazards & risks
34:30 - Comparing hemispheres
37:00 - Bridging the gap for infectious diseases to cancer
45:10 - Retrospective data correction / changing monitoring 
57:00 - Competing risks & statistics
1:01:30 - Deducing risks & effects through knowledge of immunological mechanisms
1:09:00 - Future scientific convos

#datascience #science


Jason Costello | Data Science vs Software, Academia vs Industry

Jason Costello | Data Science vs Software, Academia vs Industry

July 19, 2021

Interested in Data Science? Learn Data Science and Statistics from experts as they cover key topics in the field. The Data & Science podcast focusses on teaching data scientists how to think critically in order to solve data analysis problems across various scientific domains.

 

Jason Costello | Data Science vs Software, Academia vs Industry

Jason Costello (Hypervector) describes his (non-trivial) transition from academic research into big tech and then the healthcare industry. He outlines a strategy to find the cool research problems that you get in academia while still delivering value to your company. We then talk about the interface of data science / machine learning and software.

0:00 Deploying Data Science into the Real World
8:24 Transitioning from Academic to Industrial Data Science
16:56 First step to delivering value to industry
21:38 Toy example of high value data science
25:28 Deep technical challenges are real and useful too!
29:59 Formalized logic in machine learning solutions
32:54 Data Science & Machine Learning Projects can fail.
38:50 Getting to the cool data science projects
47:21 Putting Machine Learning Models into Software
56:21 Software and Deduction, Machine Learning and Induction
1:06:06 Is Software A Deductive Complex System?


Eric Daza | N-of-1 Science & Causal Inference | Philosophy of Data Science

June 14, 2021

Eric Daza | N-of-1 Science & Causal Inference | Philosophy of Data Science

Much of our scientific inference revolves around the identification and replication of patterns in data. So what can be done when N=1? Eric Daza gives us a statistician's perspective on the ideas behind N-of-1 studies, their best examples, and their strongest critiques.

 

0:00 – The purpose of N-of-1 & generalizability

3:30 – Successes and challenges in N-of-1

9:30 – A lightbulb moment

18:00 – Anomalies, Compliance, & Recurring Patterns

23:00 – Best Critiques of N-of-1, Safety, Efficacy

41:20 – Causal Inference

54:30 – Increasing the number of data scientists

1:03:30 – Biostatistics’ changing place in data science / statistical thinking


Edward McFowland III | Anomalous Pattern Detection & Model Building

June 1, 2021

#datascience #statistics

Edward McFowland III | Anomalous Pattern Detection & Model Building

Edward McFowland III (Harvard Business School) describes the differences between "anomalies" and "anomalous patterns". Edward describes how this informs modeling strategies, in particular, when to use an off-the-shelf model versus building a bespoke model from scratch. He then covers how to draw inspiration from different scientific and technical fields.

0:00 Edward: Live in Conference

2:00 Outliers vs Anomalies vs Anomalous Patterns

9:30 Strategy to Identify Anomalous Data Patterns

19:15 Adding Complexity to Models

25:00 Building Blocks vs Comprehensive Models

39:05 New Pieces of Evidence

40:40 Deciding Data Science Strategies

52:30 Connecting the Technical Dots

58:40 Interdisciplinary Interests


Data Science Job Search | Advice + Q&A

May 26, 2021

#datascience #jobs #career #jobsearch #statistics

The Statistical Consulting Section of the ASA invited me to give a presentation on the data science job search followed by a Q&A.

They were kind enough to let me post it here (with minor edits).

My drawing of "cumulative cost" is wrong. It should intercept the "current cost" line at time = 0.

 

0:00 – Humility, Goals, & Human Data Points
5:00 – Play the Numbers Game
12:40 – Job vs Career
18:18 – Nonsensical Data Science Job Descriptions
25:40 – Technical Review & Presentation
30:00 – The Advantages of Early Career
37:25 – Save Job Descriptions / Industry vs Academia
46:10 – Career vs Job Clarification 
53:10 – Bachelor’s vs Master’s vs Doctorate?
56:10 – Delivering Value Over Time
1:08:10 – Product vs Service 
1:11:10 – Comments From an Academic Perspective
1:16:43 – Get Your Foot in the Door / Doing What You Love
1:25:50 – Future Q&A’s

 


Mike Evans | Statistical Reasoning & Evidence | Philosophy of Data Science Series

May 19, 2021

Mike Evans | Statistical Reasoning & Evidence | Philosophy of Data Science Series

Mike Evans (University of Toronto) describes his approach to statistical reasoning. Mike outlines how to recognize and address problems that are statistical in nature and why these approaches should be grounded in our ability to measure statistical evidence. 

 

Watch it on YouTube at: https://youtu.be/Q7JpGZxHxXU

 

0:00 Statistical Reasoning
2:30 The Basic Problem: Reasoning on Statistical Problems
13:00 Rules of Statistical Inference
19:30 Bias (The Controversial Bit?!?!)
24:10 Steps of Statistical Reasoning
25:50 Connection to Philosophy of Science
27:35 Measuring Evidence (Frequentist vs Bayesian vs Loss Function)
29:49 Problems with p-values
32:00 Choosing & Checking Priors
49:25 Idealism, Good Plans, Bad Plans
54:45 Describing Your Reasoning
59:20 Critiques of the Principle of Evidence
1:04:00 Data-Driven Science vs Hypothesis-Driven Science


Deborah Mayo | Statistics & Severe Testing vs Pseudoscience

May 13, 2021

Deborah Mayo | Statistics & Severe Testing vs Pseudoscience

In our fourth episode of the “science vs pseudoscience” mini-series, Deborah Mayo (Virginia Tech) specifies several criteria necessary for scientific rigor. She gives several examples of how statistical thinking is essential to scientific thinking and explains why she believes that the “I’ll know it when I see it” approach to delineating science from pseudoscience is not a good one.

 

Looking to catch up with the earlier “Science vs Pseudoscience” episodes?

You can watch them here: Intro, Episode 1, Episode 2, Episode 3

