Section 1: Lists

Think of lists as Excel columns on steroids. A spreadsheet column typically holds one kind of value, but a single Python list can mix numbers, text, and even other lists. Want to track sales figures, customer names, and product categories in one place? Lists make it happen. They are the Swiss Army knife of data structures: simple to use, highly versatile, and indispensable for data analysis.

Introduction

Lists are ordered, mutable collections that can hold any number of items. They resemble Excel columns but are more flexible: a list grows and shrinks as needed, and its items can be of any type. For data science, lists let you work with collections of data, perform calculations across datasets, and organize information for analysis.

Lists vs Other Tools

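If you’re familiar with R, Python lists will feel similar to R vectors in everyday use. When you mix types, however, they behave more like R lists, because an R vector coerces all of its elements to a single type:

Feature | R Vector | Python List
--- | --- | ---
Indexing | vec[1] is the first element (1-based) | my_list[0] is the first element (0-based)
Mixed types | c(1, "text", TRUE) coerces everything to character; list(1, "text", TRUE) keeps each type | [1, "text", True] keeps each type
Length | length(vec) | len(my_list)
Append | c(vec, new_item) returns a new vector | my_list.append(new_item) changes the list in place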
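The Python column of the comparison can be checked directly. This minimal sketch (the name mixed is just an illustrative variable) confirms 0-based indexing, that each element keeps its own type, and that append() modifies the existing list rather than returning a new one:

```python
# Each element keeps its own type; nothing is coerced
mixed = [1, "text", True]

print(mixed[0])                            # 1 (index 0 is the first element)
print(len(mixed))                          # 3
print([type(x).__name__ for x in mixed])   # ['int', 'str', 'bool']

# append() changes this list in place (R's c() builds a new vector instead)
mixed.append(3.5)
print(mixed)                               # [1, 'text', True, 3.5]
```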

Creating and Accessing Lists

Lists can store any type of data and are created using square brackets.

Basic List Operations

# Creating lists
sales_data = [1200, 850, 1100, 950, 1300]
products = ["laptop", "mouse", "keyboard"]
mixed_data = ["Alice", 28, True, 45000.50]

# Accessing elements (zero-indexed)
print(sales_data[0])      # First element: 1200
print(sales_data[-1])     # Last element: 1300

# Slicing - getting portions
first_three = sales_data[0:3]    # [1200, 850, 1100]
last_two = sales_data[-2:]       # [950, 1300]

# List length and basic statistics
print(f"Length: {len(sales_data)}")    # 5
print(f"Max: {max(sales_data)}")       # 1300
print(f"Sum: {sum(sales_data)}")       # 5400
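Square brackets are the most common way to create a list, but not the only one. The sketch below shows an empty list that is filled later, the built-in list() constructor, which converts any iterable (here a range and a string) into a list, and the in operator for checking whether a value is present:

```python
# An empty list, ready to be filled later
orders = []
print(len(orders))          # 0

# list() converts any iterable into a list
print(list(range(5)))       # [0, 1, 2, 3, 4]
print(list("abc"))          # ['a', 'b', 'c']

# The in operator checks membership
sales_data = [1200, 850, 1100, 950, 1300]
print(1100 in sales_data)   # True
print(2000 in sales_data)   # False
```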

List Indexing and Slicing

# Indexing examples
temperatures = [72, 68, 75, 82, 79, 71, 85]

# Positive indexing (0-based)
print(temperatures[0])    # 72 (first element)
print(temperatures[3])    # 82 (fourth element)

# Negative indexing (from end)
print(temperatures[-1])   # 85 (last element)
print(temperatures[-3])   # 79 (third from end)

# Slicing [start:end:step]
print(temperatures[1:4])      # [68, 75, 82] (indices 1-3; the end index 4 is excluded)
print(temperatures[:3])       # [72, 68, 75] (first 3 elements)
print(temperatures[3:])       # [82, 79, 71, 85] (from the 4th element to the end)
print(temperatures[::2])      # [72, 75, 79, 85] (every 2nd element)
print(temperatures[::-1])     # [85, 71, 79, 82, 75, 68, 72] (reversed)
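Slices and single indexes handle out-of-range positions differently, and a slice always produces a new list. A minimal sketch using the same temperatures list (first_three is an illustrative name):

```python
temperatures = [72, 68, 75, 82, 79, 71, 85]

# Slices are clipped to the list bounds, so an out-of-range slice is just shorter or empty
print(temperatures[5:100])   # [71, 85]
print(temperatures[10:20])   # []

# A single out-of-range index, by contrast, raises IndexError
try:
    temperatures[10]
except IndexError as err:
    print(f"IndexError: {err}")   # IndexError: list index out of range

# A slice is a new list: changing it leaves the original untouched
first_three = temperatures[:3]
first_three[0] = 0
print(first_three)           # [0, 68, 75]
print(temperatures[0])       # 72
```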

Modifying Lists

Lists are mutable, meaning you can change their contents after creation.

Adding Elements

# Adding elements
inventory = ["apples", "bananas"]
inventory.append("oranges")        # Add to end
inventory.insert(1, "grapes")      # Insert at position

# Extending with another list
more_fruits = ["kiwi", "mango"]
inventory.extend(more_fruits)

print(inventory)  # ['apples', 'grapes', 'bananas', 'oranges', 'kiwi', 'mango']
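A frequent source of confusion is append() versus extend() when the argument is itself a list. This sketch (with_append and with_extend are illustrative names) shows that append() nests the whole list as one element, while extend() adds its items one by one; the + operator also concatenates but returns a new list:

```python
with_append = ["apples", "bananas"]
with_extend = ["apples", "bananas"]
more_fruits = ["kiwi", "mango"]

with_append.append(more_fruits)   # the whole list becomes ONE nested element
with_extend.extend(more_fruits)   # each item is added individually

print(with_append)       # ['apples', 'bananas', ['kiwi', 'mango']]
print(len(with_append))  # 3
print(with_extend)       # ['apples', 'bananas', 'kiwi', 'mango']
print(len(with_extend))  # 4

# + concatenates into a brand-new list; neither operand is modified
combined = ["apples"] + more_fruits
print(combined)          # ['apples', 'kiwi', 'mango']
```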

Removing Elements

# Removing elements
inventory = ["apples", "grapes", "bananas", "oranges", "kiwi", "mango"]

inventory.remove("bananas")        # Remove by value
last_item = inventory.pop()        # Remove and return last item
first_item = inventory.pop(0)      # Remove and return first item

print(f"Removed: {last_item}")     # mango
print(f"Removed: {first_item}")    # apples
print(f"Remaining: {inventory}")   # ['grapes', 'oranges', 'kiwi']
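Two details about removal are worth knowing: remove() raises ValueError when the value is missing and deletes only the first match, and the del statement removes by position without returning the item. A minimal sketch, starting from the remaining inventory above:

```python
inventory = ["grapes", "oranges", "kiwi"]

# remove() raises ValueError if the value is not in the list
try:
    inventory.remove("bananas")
except ValueError:
    print("bananas not found")

# Checking membership first avoids the exception
if "kiwi" in inventory:
    inventory.remove("kiwi")

# del removes by position and returns nothing
del inventory[0]
print(inventory)   # ['oranges']

# remove() deletes only the FIRST matching value
counts = [3, 1, 3, 2]
counts.remove(3)
print(counts)      # [1, 3, 2]
```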

Modifying Elements

# Modifying elements
prices = [19.99, 29.50, 15.25, 42.00, 8.99]

# Change individual elements
prices[0] = 24.99                  # Update first price
prices[1:3] = [35.00, 20.00]       # Update multiple elements

# Apply operations to all elements
discounted_prices = [price * 0.9 for price in prices]
print(f"Original: {prices}")
print(f"Discounted: {discounted_prices}")
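Slice assignment does not have to keep the list the same length: the replacement can contain more or fewer items than the slice it replaces. The sketch below starts from the prices list as it stands after the updates above, and also uses round() so that float arithmetic such as a 10% discount prints cleanly:

```python
prices = [24.99, 35.00, 20.00, 42.00, 8.99]

# Replacing a 2-element slice with 3 values grows the list
prices[1:3] = [30.00, 31.00, 32.00]
print(len(prices))   # 6

# Assigning an empty list to a slice deletes those elements
prices[1:4] = []
print(prices)        # [24.99, 42.0, 8.99]

# round() hides floating-point noise such as 37.800000000000004
discounted = [round(p * 0.9, 2) for p in prices]
print(discounted)
```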

Advanced List Operations for Data Science

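Combined with built-in functions such as sum(), len(), max(), and sorted(), lists handle everyday data-processing tasks: filtering, transforming, summarizing, and sorting.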

Data Filtering and Transformation

# Data filtering and transformation
temperatures = [72, 68, 75, 82, 79, 71, 85]

# Filter temperatures above 75
hot_days = [temp for temp in temperatures if temp > 75]
print(f"Hot days: {hot_days}")      # [82, 79, 85]

# Convert Fahrenheit to Celsius
celsius_temps = [(f - 32) * 5/9 for f in temperatures]
print(f"Celsius: {[round(c, 1) for c in celsius_temps]}")

# Statistical operations
average_temp = sum(temperatures) / len(temperatures)
print(f"Average: {average_temp:.1f}°F")

# Sorting and organizing data
customer_scores = [("Alice", 85), ("Bob", 92), ("Carol", 78)]
sorted_scores = sorted(customer_scores, key=lambda x: x[1], reverse=True)
print(f"Top performers: {sorted_scores}")
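Filtering often needs more than the values themselves. In this sketch, zip() pairs each reading with a label and enumerate() supplies positions; the weekday labels are hypothetical, added only for illustration. It also shows that min() and max() accept the same key= argument as sorted():

```python
temperatures = [72, 68, 75, 82, 79, 71, 85]
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]   # hypothetical labels

# zip() pairs each day with its reading; keep the names of hot days
hot_day_names = [day for day, temp in zip(days, temperatures) if temp > 75]
print(hot_day_names)   # ['Thu', 'Fri', 'Sun']

# enumerate() yields (index, value) pairs, handy for locating matches
hot_positions = [i for i, temp in enumerate(temperatures) if temp > 75]
print(hot_positions)   # [3, 4, 6]

# min() and max() take the same key= argument as sorted()
customer_scores = [("Alice", 85), ("Bob", 92), ("Carol", 78)]
lowest = min(customer_scores, key=lambda x: x[1])
print(lowest)          # ('Carol', 78)
```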

List Comprehensions

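List comprehensions provide a concise way to build a new list from any iterable, such as an existing list. The general form is [expression for item in iterable if condition], where the if clause is optional.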

# Traditional approach
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
squares = []
for num in numbers:
    squares.append(num ** 2)

# List comprehension approach
squares = [num ** 2 for num in numbers]
print(f"Squares: {squares}")

# Conditional list comprehension
even_squares = [num ** 2 for num in numbers if num % 2 == 0]
print(f"Even squares: {even_squares}")

# Filtering business data with a comprehension
sales_data = [1200, 800, 1500, 900, 2000, 1100]
high_sales = [sale for sale in sales_data if sale > 1000]
print(f"High sales: {high_sales}")
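The position of if inside a comprehension changes its meaning. An if at the end is a filter that drops items; an if ... else before for is a conditional expression that transforms every item, so the result keeps the original length. A minimal sketch with the same sales figures (the "high"/"low" labels are illustrative):

```python
sales_data = [1200, 800, 1500, 900, 2000, 1100]

# Filter: "if" at the END decides which items are kept
high_sales = [sale for sale in sales_data if sale > 1000]
print(high_sales)   # [1200, 1500, 2000, 1100]

# Conditional expression: "if ... else" BEFORE "for" maps every item
labels = ["high" if sale > 1000 else "low" for sale in sales_data]
print(labels)       # ['high', 'low', 'high', 'low', 'high', 'high']
```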

Working with Nested Lists

# Nested lists for structured data
sales_by_quarter = [
    [12000, 15000, 13500],  # Q1: Jan, Feb, Mar
    [18000, 16500, 19000],  # Q2: Apr, May, Jun
    [21000, 19500, 22000],  # Q3: Jul, Aug, Sep
    [25000, 23000, 24000]   # Q4: Oct, Nov, Dec
]

# Calculate quarterly totals
quarterly_totals = [sum(quarter) for quarter in sales_by_quarter]
print(f"Quarterly totals: {quarterly_totals}")

# Find the highest single-month sales figure in each quarter
best_months = [max(quarter) for quarter in sales_by_quarter]
print(f"Best month per quarter: {best_months}")   # [15000, 19000, 22000, 25000]

# Calculate year-over-year growth (treating the data above as 2023)
q4_2023 = quarterly_totals[3]  # 72000
q4_2022 = 65000  # Previous year's Q4 total (example figure)
growth_rate = ((q4_2023 - q4_2022) / q4_2022) * 100
print(f"Q4 growth: {growth_rate:.1f}%")
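Nested lists support a few more access patterns beyond per-quarter aggregation. This sketch shows two-level indexing, flattening with a nested comprehension, and zip(*rows) to regroup values by their position within each quarter:

```python
sales_by_quarter = [
    [12000, 15000, 13500],  # Q1
    [18000, 16500, 19000],  # Q2
    [21000, 19500, 22000],  # Q3
    [25000, 23000, 24000],  # Q4
]

# Two indices: [quarter][month within the quarter]
print(sales_by_quarter[1][2])   # 19000 (Q2, third month = Jun)

# Flatten into a single list of 12 monthly values
all_months = [sale for quarter in sales_by_quarter for sale in quarter]
print(len(all_months))          # 12
print(max(all_months))          # 25000

# zip(*rows) groups the first, second, and third month of every quarter
by_position = [list(col) for col in zip(*sales_by_quarter)]
print(by_position[0])           # [12000, 18000, 21000, 25000]
```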

Common List Methods

Lists come with built-in methods for sorting, counting, searching, reversing, and copying.
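Many of these methods, including sort(), reverse(), and append(), change the list in place and return None. The built-in function sorted() is the alternative when you need a sorted copy and want to keep the original order. A minimal sketch:

```python
scores = [85, 92, 78, 96, 88, 91, 83]

# sorted() returns a NEW list and leaves the original unchanged
ranked = sorted(scores)
print(ranked)   # [78, 83, 85, 88, 91, 92, 96]
print(scores)   # [85, 92, 78, 96, 88, 91, 83]

# list.sort() reorders the list itself and returns None
result = scores.sort()
print(result)   # None
print(scores)   # [78, 83, 85, 88, 91, 92, 96]
```

Because of this, writing scores = scores.sort() replaces the list with None.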

Core List Methods

# Sample data
scores = [85, 92, 78, 96, 88, 91, 83]

# Sorting
scores.sort()                    # Sort in place
print(f"Sorted: {scores}")      # [78, 83, 85, 88, 91, 92, 96]

scores.sort(reverse=True)        # Sort descending
print(f"Descending: {scores}")   # [96, 92, 91, 88, 85, 83, 78]

# Counting and searching
print(f"Count of 85: {scores.count(85)}")    # 1
print(f"Index of 91: {scores.index(91)}")    # 2 (scores is in descending order here)

# Reversing
scores.reverse()
print(f"Reversed: {scores}")     # [78, 83, 85, 88, 91, 92, 96]

# Copying lists
scores_copy = scores.copy()      # Shallow copy
scores_copy.append(100)
print(f"Original: {scores}")
print(f"Copy: {scores_copy}")
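The word "shallow" in the comment above matters. Plain assignment does not copy at all, and copy() duplicates only the outer list, so nested lists are still shared. This sketch contrasts the three cases using the standard-library copy module (grid, alias, shallow, and deep are illustrative names):

```python
import copy

scores = [78, 83, 85]

# Assignment gives the SAME list a second name; no copy is made
alias = scores
alias.append(100)
print(scores)    # [78, 83, 85, 100]

# A shallow copy duplicates the outer list, but inner lists are shared
grid = [[1, 2], [3, 4]]
shallow = grid.copy()
shallow[0].append(99)
print(grid)      # [[1, 2, 99], [3, 4]]

# copy.deepcopy() duplicates the nested lists as well
deep = copy.deepcopy(grid)
deep[0].append(-1)
print(grid)      # [[1, 2, 99], [3, 4]]
print(deep)      # [[1, 2, 99, -1], [3, 4]]
```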

Practice Exercise

Create a program that analyzes monthly sales data:

# Monthly sales analysis
monthly_sales = [45000, 52000, 48000, 61000, 55000, 67000, 59000, 72000, 65000, 58000, 63000, 71000]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

print("Monthly Sales Analysis")
print("=" * 40)

# Basic statistics
total_sales = sum(monthly_sales)
average_sales = total_sales / len(monthly_sales)
best_month = max(monthly_sales)
worst_month = min(monthly_sales)

print(f"Total Sales: ${total_sales:,}")
print(f"Average Sales: ${average_sales:,.0f}")
print(f"Highest Monthly Sales: ${best_month:,}")
print(f"Lowest Monthly Sales: ${worst_month:,}")

# Find best and worst months by name
best_month_name = months[monthly_sales.index(best_month)]
worst_month_name = months[monthly_sales.index(worst_month)]
print(f"Best Month: {best_month_name}")
print(f"Worst Month: {worst_month_name}")

# Quarterly analysis
q1_sales = monthly_sales[0:3]
q2_sales = monthly_sales[3:6]
q3_sales = monthly_sales[6:9]
q4_sales = monthly_sales[9:12]

quarterly_totals = [sum(q1_sales), sum(q2_sales), sum(q3_sales), sum(q4_sales)]
quarters = ["Q1", "Q2", "Q3", "Q4"]

print("\nQuarterly Breakdown:")
for quarter, total in zip(quarters, quarterly_totals):
    percentage = (total / total_sales) * 100
    print(f"{quarter}: ${total:,} ({percentage:.1f}%)")

# Growth analysis
print("\nMonth-over-Month Growth:")
for i in range(1, len(monthly_sales)):
    growth = ((monthly_sales[i] - monthly_sales[i-1]) / monthly_sales[i-1]) * 100
    print(f"{months[i]}: {growth:+.1f}%")
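Once the program works, parts of it can be condensed with techniques from this section. One possible rewrite, not the only one: a comprehension over range(0, 12, 3) replaces the four hand-written quarter slices, and zip() pairs each month with the one before it, so no i-1 indexing is needed:

```python
monthly_sales = [45000, 52000, 48000, 61000, 55000, 67000, 59000, 72000, 65000, 58000, 63000, 71000]

# range(0, 12, 3) yields 0, 3, 6, 9: the first month index of each quarter
quarterly_totals = [sum(monthly_sales[i:i + 3]) for i in range(0, len(monthly_sales), 3)]
print(quarterly_totals)   # [145000, 183000, 196000, 192000]

# Pair each month (curr) with the previous month (prev)
growth = [(curr - prev) / prev * 100 for prev, curr in zip(monthly_sales, monthly_sales[1:])]
print(len(growth))        # 11 (January has no previous month)
```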


Resources

  • Python list tutorial: https://docs.python.org/3/tutorial/datastructures.html#more-on-lists
  • List comprehensions guide: https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions
  • Data structures reference: https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range

Summary

Lists are dynamic collections that store multiple items and support flexible operations. They are important for data science tasks like filtering, transforming, and analyzing datasets. Key concepts include indexing, slicing, methods, and list comprehensions for efficient data processing.


© 2025 Prof. Tim Frenzel. All rights reserved. | Version 1.0.5