Lesson 6.4: Debugging and Dataset Challenges

Today’s Goals

  • Practice debugging code that works with datasets
  • Identify common errors when handling data
  • Build problem-solving skills with hands-on challenges

Warm-Up Question

  • What is the most common bug you’ve hit when working with data so far?

Common Data Bugs

  • Missing values causing errors or misleading results in calculations
  • Wrong data types (e.g., strings instead of numbers)
  • Index out of range errors when looping through data
  • File not found errors with CSV files
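
The last two bugs in the list can be sketched without pandas. This is only an illustration: the list `scores`, the helper `load_rows`, and the file name `scores.csv` are made up for this example and are not part of the lesson's datasets.

```python
import csv

scores = [90, 72, 85]  # made-up data for illustration

# Index out of range: range(len(scores) + 1) runs one step too far.
try:
    for i in range(len(scores) + 1):
        print(scores[i])
except IndexError as err:
    print("IndexError:", err)

# Safer: loop over the values directly, so no index can run past the end.
total = 0
for s in scores:
    total += s

# File not found: catch the error and give a helpful message.
def load_rows(path):
    """Hypothetical helper: return CSV rows, or an empty list if the file is missing."""
    try:
        with open(path, newline="") as f:
            return list(csv.reader(f))
    except FileNotFoundError:
        print(f"Could not find {path}; check the file name and folder.")
        return []

rows = load_rows("scores.csv")  # hypothetical path; it may not exist on your machine
```

Looping over the values instead of over index numbers removes the off-by-one risk entirely, which is usually simpler than fixing the range.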

Bug Example 1: Missing Data

import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie'], 'score': [90, None, 85]}
df = pd.DataFrame(data)

print(df['score'].mean())   # ❌ No error: None becomes NaN and mean() silently skips it

Fixing Missing Data

print(df['score'].dropna().mean())   # ✅ Makes the choice explicit: drop missing values, then average
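
Before choosing a fix, it can help to count how many values are missing and to compare the options. The sketch below reuses the same small dataset; filling with 0 is an assumption made only for comparison, and it changes the result.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'], 'score': [90, None, 85]})

# Count the missing scores before doing any math.
missing = df['score'].isna().sum()
print(missing)

# Option 1: ignore rows with a missing score.
mean_dropped = df['score'].dropna().mean()

# Option 2: treat a missing score as 0 (an assumption; it lowers the average).
mean_filled = df['score'].fillna(0).mean()

print(mean_dropped, mean_filled)
```

The two averages differ, so the right choice depends on what a missing score means in your dataset.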

Bug Example 2: Wrong Data Type

data = {'age': ['25', '30', '35']}
df = pd.DataFrame(data)

print(df['age'].mean())  # ❌ Strings, not numbers: TypeError in recent pandas (older versions may return a wrong number)

Fixing Data Types

df['age'] = df['age'].astype(int)   # Convert the strings to integers
print(df['age'].mean())             # ✅ Works once the column holds numbers
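
astype(int) only works when every value looks like a whole number. For messier columns, one option is pd.to_numeric with errors='coerce', sketched below. The value 'thirty' is made-up data added for this example.

```python
import pandas as pd

df = pd.DataFrame({'age': ['25', 'thirty', '35']})  # 'thirty' is made-up messy data

# astype(int) would raise ValueError on 'thirty'.
# to_numeric with errors='coerce' turns unparseable values into NaN instead.
df['age'] = pd.to_numeric(df['age'], errors='coerce')

print(df['age'].isna().sum())   # how many values could not be converted
print(df['age'].mean())         # mean() skips the NaN
```

Coercing hides bad values as NaN, so it is worth checking the count of missing values afterward rather than trusting the average alone.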

Student Challenge

  • Each group gets a buggy dataset script
  • Debug it until it runs correctly
  • Write down what the bug was and how you fixed it

Class Activity

  • Share one bug + fix with the class
  • Compare debugging strategies

Wrap-Up

  • Debugging is about persistence and testing
  • Next time: Quiz on Unit 6 concepts