Section 4: Functions

Ever found yourself copying the same Excel formula across dozens of cells? Functions are like creating your own custom Excel formulas, but far more flexible. Instead of manually calculating average sales for each region, you write one function and use it everywhere. They’re the secret to writing code that doesn’t make you want to pull your hair out: clean, reusable, and maintainable.

Introduction

Functions are reusable blocks of code that perform specific tasks. They help organize your code, avoid repetition, and make programs easier to understand and maintain. In data science, functions are essential for building reusable analysis tools.

What are Functions?

Functions are like mini-programs within your program. They take inputs (parameters), process them, and return outputs (return values).

Real-World Analogy: Think of a calculator.

  • Input: Numbers and operation (2 + 3)
  • Process: Addition calculation
  • Output: Result (5)
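
The calculator analogy maps directly onto a function. Here is a minimal sketch; the name `add` is chosen purely for illustration:

```python
def add(a, b):
    """Return the sum of two numbers."""
    # Input: a and b | Process: addition | Output: the returned sum
    return a + b

result = add(2, 3)
print(result)  # 5
```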

Defining Functions

Basic Function Syntax

def function_name(parameters):
    """Docstring describing what the function does"""
    # Function body
    return result

# Example
def greet(name):
    """Greet a person by name"""
    return f"Hello, {name}!"

# Call the function
message = greet("Alice")
print(message)  # "Hello, Alice!"

Function Components

def calculate_tax(amount, tax_rate=0.08):
    """
    Calculate tax on an amount
    
    Parameters:
    amount (float): The amount to calculate tax on
    tax_rate (float): The tax rate (default 8%)
    
    Returns:
    float: The calculated tax amount
    """
    tax = amount * tax_rate
    return tax

# Use the function
price = 100
tax = calculate_tax(price)
print(f"Tax on ${price}: ${tax:.2f}")

# Use with custom tax rate
high_tax = calculate_tax(price, 0.15)
print(f"High tax on ${price}: ${high_tax:.2f}")
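
Two details of `calculate_tax` are worth seeing in action: arguments can be passed by name, and the docstring is stored on the function object, where `help()` and the `__doc__` attribute can read it. The sketch below repeats a shortened version of the function (with a one-line docstring) so that it runs on its own:

```python
def calculate_tax(amount, tax_rate=0.08):
    """Calculate tax on an amount."""
    return amount * tax_rate

# Keyword arguments make the call self-documenting
state_tax = calculate_tax(amount=250, tax_rate=0.05)
print(f"{state_tax:.2f}")  # 12.50

# The docstring is available at runtime
print(calculate_tax.__doc__)  # Calculate tax on an amount.
```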

Parameters and Arguments

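Parameters are the names listed in a function definition; arguments are the values you pass when calling it. Python supports required parameters, parameters with default values, and a variable number of arguments.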

Required Parameters

def add_numbers(a, b):
    """Add two numbers"""
    return a + b

result = add_numbers(5, 3)  # 8
print(result)

Default Parameters

def create_customer(name, email, is_vip=False, region="Unknown"):
    """Create a customer record with optional parameters"""
    return {
        "name": name,
        "email": email,
        "is_vip": is_vip,
        "region": region
    }

# Use with defaults
customer1 = create_customer("Alice", "alice@email.com")
print(customer1)

# Override defaults
customer2 = create_customer("Bob", "bob@email.com", is_vip=True, region="North")
print(customer2)
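
One pitfall with default parameters: a default value is created once, when the function is defined, not on every call. That is harmless for values like `False` or `"Unknown"`, but a mutable default such as a list is shared between calls. The sketch below uses two hypothetical helper functions, `add_tag_buggy` and `add_tag`, to show the problem and the common `None` workaround:

```python
def add_tag_buggy(tag, tags=[]):
    """The default list is created once and shared across calls."""
    tags.append(tag)
    return tags

def add_tag(tag, tags=None):
    """Use None as the default and create a fresh list per call."""
    if tags is None:
        tags = []
    tags.append(tag)
    return tags

print(add_tag_buggy("vip"))    # ['vip']
print(add_tag_buggy("north"))  # ['vip', 'north'] (the list persisted)

print(add_tag("vip"))    # ['vip']
print(add_tag("north"))  # ['north']
```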

Variable Number of Arguments

def calculate_total(*prices):
    """Calculate total of variable number of prices"""
    return sum(prices)

total1 = calculate_total(10, 20, 30)  # 60
total2 = calculate_total(5, 15, 25, 35)  # 80
print(f"Total 1: {total1}")
print(f"Total 2: {total2}")

def create_report(title, *data_points):
    """Create a report with title and multiple data points"""
    report = f"Report: {title}\n"
    report += "=" * 30 + "\n"
    for i, point in enumerate(data_points, 1):
        report += f"{i}. {point}\n"
    return report

report = create_report("Sales Summary", "Q1: $50K", "Q2: $60K", "Q3: $55K")
print(report)
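
Two related features round out `*args`. The same `*` can unpack an existing list into separate arguments at the call site, and `**` collects any extra keyword arguments into a dictionary. A short sketch follows; `create_profile` is a hypothetical function added here for illustration:

```python
def calculate_total(*prices):
    """Calculate total of variable number of prices"""
    return sum(prices)

# Unpack a list into separate positional arguments
price_list = [10, 20, 30]
print(calculate_total(*price_list))  # 60

def create_profile(name, **details):
    """Build a profile dict from a name plus any keyword arguments."""
    profile = {"name": name}
    profile.update(details)  # details is a regular dict of the extra keywords
    return profile

profile = create_profile("Alice", region="North", is_vip=True)
print(profile)  # {'name': 'Alice', 'region': 'North', 'is_vip': True}
```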

Return Values

Functions can return a single value, several values at once (Python packs them into a tuple), or nothing at all, in which case the call evaluates to None.

Single Return Value

def square(number):
    """Calculate square of a number"""
    return number ** 2

result = square(5)
print(result)  # 25

Multiple Return Values

def analyze_sales(sales_list):
    """Analyze sales data and return multiple statistics"""
    total = sum(sales_list)
    average = total / len(sales_list)
    maximum = max(sales_list)
    minimum = min(sales_list)
    
    return total, average, maximum, minimum

sales = [1000, 1500, 1200, 1800, 2000]
total, avg, max_sale, min_sale = analyze_sales(sales)

print(f"Total: ${total:,}")
print(f"Average: ${avg:.2f}")
print(f"Highest: ${max_sale:,}")
print(f"Lowest: ${min_sale:,}")
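
Returning several values is really returning one tuple, which the caller can keep whole or unpack into separate names. The sketch below shows this with a small hypothetical function, `min_max`:

```python
def min_max(values):
    """Return the smallest and largest value."""
    return min(values), max(values)  # packed into one tuple

result = min_max([1000, 1500, 1200])
print(result)        # (1000, 1500)
print(type(result))  # <class 'tuple'>

lowest, highest = result  # tuple unpacking
print(lowest, highest)    # 1000 1500
```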

No Return Value (None)

def print_customer_info(customer):
    """Print customer information (no return value)"""
    print(f"Name: {customer['name']}")
    print(f"Email: {customer['email']}")
    print(f"VIP Status: {'Yes' if customer['is_vip'] else 'No'}")

customer = {"name": "Alice", "email": "alice@email.com", "is_vip": True}
print_customer_info(customer)
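
A function without a `return` statement still hands something back: the special value `None`. This matters if you assign the result by accident. The sketch below uses a hypothetical function, `log_message`:

```python
def log_message(text):
    """Print a message; there is no return statement."""
    print(f"[LOG] {text}")

result = log_message("Report saved")
print(result)          # None
print(result is None)  # True
```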

Scope and Variable Visibility

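Scope determines where a variable name can be used. Variables created inside a function are local to that function, while variables defined at the top level of a file are global. Understanding the difference helps you avoid naming conflicts and hard-to-find bugs.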

Local vs Global Scope

# Global variable
company_name = "DataCorp"

def update_employee(employee_name, salary):
    """Update employee information (local scope)"""
    # Local variables
    new_salary = salary * 1.1
    department = "Analytics"
    
    print(f"Employee: {employee_name}")
    print(f"Company: {company_name}")  # Can access global
    print(f"New salary: ${new_salary:,.2f}")
    print(f"Department: {department}")
    
    return new_salary

# Call function
updated_salary = update_employee("Alice", 75000)

# Local variables such as department don't exist outside the function
# print(department)  # NameError: name 'department' is not defined
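
Assigning to a name inside a function creates a new local variable, even if a global with the same name exists. The sketch below uses hypothetical names (`rate`, `apply_discount`) to show both this shadowing and the error raised when a local name is used outside its function:

```python
rate = 0.10  # global

def apply_discount(price):
    rate = 0.25  # local: shadows the global inside this function only
    discounted = price * (1 - rate)
    return discounted

print(apply_discount(100))  # 75.0
print(rate)                 # 0.1 (the global is unchanged)

try:
    print(discounted)  # local to apply_discount, not visible here
except NameError as error:
    print(f"NameError: {error}")
```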

Modifying Global Variables

# Global counter
total_orders = 0

def process_order(order_amount):
    """Process an order and update global counter"""
    global total_orders  # Need to declare global to modify
    total_orders += 1
    
    print(f"Order #{total_orders}: ${order_amount}")
    return order_amount

# Process some orders
process_order(150)
process_order(200)
process_order(75)
print(f"Total orders processed: {total_orders}")
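
Global state works, but it makes functions harder to test because their result depends on something outside them. A common alternative is to pass the current value in and return the updated one. The sketch below is one possible variant of `process_order`; the extra `order_count` parameter is an assumption added for this example:

```python
def process_order(order_amount, order_count):
    """Process an order and return the updated order count."""
    order_count += 1
    print(f"Order #{order_count}: ${order_amount}")
    return order_count

total_orders = 0
for amount in [150, 200, 75]:
    total_orders = process_order(amount, total_orders)

print(f"Total orders processed: {total_orders}")  # Total orders processed: 3
```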

Data Science Functions

Functions are particularly useful for data analysis tasks.

Data Processing Functions

def clean_sales_data(raw_data):
    """Clean and validate sales data"""
    cleaned_data = []
    errors = []
    
    for i, record in enumerate(raw_data):
        try:
            # Validate and clean data
            cleaned_record = {
                "id": int(record["id"]),
                "amount": float(record["amount"]),
                "date": record["date"].strip(),
                "region": record["region"].title()
            }
            
            # Additional validation
            if cleaned_record["amount"] < 0:
                errors.append(f"Record {i}: Negative amount")
                continue
                
            cleaned_data.append(cleaned_record)
            
        except (ValueError, KeyError) as e:
            errors.append(f"Record {i}: {e}")
    
    return cleaned_data, errors

# Test with sample data
raw_sales = [
    {"id": "1", "amount": "1500.50", "date": "2024-01-15", "region": "north"},
    {"id": "2", "amount": "-200", "date": "2024-01-16", "region": "south"},
    {"id": "abc", "amount": "1000", "date": "2024-01-17", "region": "east"},
    {"id": "3", "amount": "2500.75", "date": "2024-01-18", "region": "west"}
]

cleaned_data, errors = clean_sales_data(raw_sales)
print(f"Cleaned records: {len(cleaned_data)}")
print(f"Errors: {len(errors)}")
for error in errors:
    print(f"  - {error}")

Analysis Functions

def calculate_metrics(data, metric_type="basic"):
    """Calculate various metrics for data analysis"""
    if not data:
        return None
    
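    amounts = [record["amount"] for record in data]
    count = len(amounts)
    total = sum(amounts)
    average = total / count  # computed once and reused below
    
    if metric_type == "basic":
        return {
            "count": count,
            "total": total,
            "average": average,
            "min": min(amounts),
            "max": max(amounts)
        }
    elif metric_type == "advanced":
        # Population variance: mean squared deviation from the average
        variance = sum((x - average) ** 2 for x in amounts) / count
        std_dev = variance ** 0.5
        
        # Median: the middle value, or the mean of the two middle values
        # when the number of records is even
        ordered = sorted(amounts)
        mid = count // 2
        if count % 2:
            median = ordered[mid]
        else:
            median = (ordered[mid - 1] + ordered[mid]) / 2
        
        return {
            "count": count,
            "total": total,
            "average": average,
            "median": median,
            "std_dev": std_dev,
            "variance": variance
        }
    else:
        raise ValueError(f"Unknown metric_type: {metric_type!r}")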

# Use the function
if cleaned_data:
    basic_metrics = calculate_metrics(cleaned_data, "basic")
    print("Basic Metrics:")
    for key, value in basic_metrics.items():
        print(f"  {key}: {value}")
    
    advanced_metrics = calculate_metrics(cleaned_data, "advanced")
    print("\nAdvanced Metrics:")
    for key, value in advanced_metrics.items():
        print(f"  {key}: {value:.2f}")
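
Python's standard library includes a `statistics` module with tested implementations of these calculations, which is handy for cross-checking hand-written formulas. The sketch below uses the two amounts that survive cleaning in the sample data above:

```python
import statistics

amounts = [1500.50, 2500.75]

print(statistics.mean(amounts))       # 2000.625
print(statistics.median(amounts))     # 2000.625 (mean of the two middle values)
print(statistics.pvariance(amounts))  # population variance (divides by n)
print(statistics.pstdev(amounts))     # population standard deviation
```

Note that `statistics.variance` and `statistics.stdev` (without the `p`) compute the sample versions, which divide by n - 1 instead of n.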

Practice Exercise

Create a comprehensive customer analysis system using functions:

def create_customer_database():
    """Create a sample customer database"""
    return [
        {"id": 1, "name": "Alice Johnson", "email": "alice@email.com", "purchases": [150, 200, 300], "region": "North"},
        {"id": 2, "name": "Bob Smith", "email": "bob@email.com", "purchases": [75, 125, 100], "region": "South"},
        {"id": 3, "name": "Carol Davis", "email": "carol@email.com", "purchases": [500, 600, 400], "region": "East"},
        {"id": 4, "name": "David Wilson", "email": "david@email.com", "purchases": [50, 75, 25], "region": "West"},
        {"id": 5, "name": "Eve Brown", "email": "eve@email.com", "purchases": [800, 900, 1000], "region": "North"}
    ]

def calculate_customer_metrics(customer):
    """Calculate metrics for a single customer"""
    total_spent = sum(customer["purchases"])
    avg_purchase = total_spent / len(customer["purchases"])
    max_purchase = max(customer["purchases"])
    min_purchase = min(customer["purchases"])
    
    return {
        "id": customer["id"],
        "name": customer["name"],
        "total_spent": total_spent,
        "avg_purchase": avg_purchase,
        "max_purchase": max_purchase,
        "min_purchase": min_purchase,
        "purchase_count": len(customer["purchases"])
    }

def categorize_customer(metrics):
    """Categorize customer based on spending metrics"""
    if metrics["total_spent"] > 1000:
        return "VIP"
    elif metrics["total_spent"] > 500:
        return "Premium"
    elif metrics["total_spent"] > 200:
        return "Regular"
    else:
        return "New"

def analyze_customer_database(customers):
    """Analyze entire customer database"""
    all_metrics = []
    category_counts = {"VIP": 0, "Premium": 0, "Regular": 0, "New": 0}
    
    for customer in customers:
        metrics = calculate_customer_metrics(customer)
        category = categorize_customer(metrics)
        metrics["category"] = category
        all_metrics.append(metrics)
        category_counts[category] += 1
    
    return all_metrics, category_counts

def generate_customer_report(metrics, category_counts):
    """Generate a comprehensive customer report"""
    total_customers = len(metrics)
    total_revenue = sum(customer["total_spent"] for customer in metrics)
    avg_revenue = total_revenue / total_customers
    
    report = f"""
CUSTOMER ANALYSIS REPORT
{'=' * 50}
Total Customers: {total_customers}
Total Revenue: ${total_revenue:,.2f}
Average Revenue per Customer: ${avg_revenue:,.2f}

Customer Categories:
"""
    
    for category, count in category_counts.items():
        percentage = (count / total_customers) * 100
        report += f"  {category}: {count} customers ({percentage:.1f}%)\n"
    
    report += "\nTop 3 Customers by Spending:\n"
    top_customers = sorted(metrics, key=lambda x: x["total_spent"], reverse=True)[:3]
    for i, customer in enumerate(top_customers, 1):
        report += f"  {i}. {customer['name']}: ${customer['total_spent']:,.2f}\n"
    
    return report

# Run the analysis
customers = create_customer_database()
metrics, category_counts = analyze_customer_database(customers)
report = generate_customer_report(metrics, category_counts)
print(report)
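
One way to extend the exercise is to add a function that totals revenue per region. The sketch below is only a suggestion: `revenue_by_region` is a hypothetical helper, and the data is a shortened copy of the sample database containing just the fields it needs.

```python
def revenue_by_region(customers):
    """Sum each customer's purchases into a total per region."""
    totals = {}
    for customer in customers:
        region = customer["region"]
        totals[region] = totals.get(region, 0) + sum(customer["purchases"])
    return totals

# Shortened copy of the sample database (only the fields used here)
customers = [
    {"name": "Alice Johnson", "purchases": [150, 200, 300], "region": "North"},
    {"name": "Bob Smith", "purchases": [75, 125, 100], "region": "South"},
    {"name": "Eve Brown", "purchases": [800, 900, 1000], "region": "North"},
]

print(revenue_by_region(customers))  # {'North': 3350, 'South': 300}
```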

Assets

Resources

  • Python functions tutorial: https://docs.python.org/3/tutorial/controlflow.html#defining-functions
  • Function best practices: https://realpython.com/defining-your-own-python-function/
  • Scope and namespaces: https://docs.python.org/3/tutorial/classes.html#python-scopes-and-namespaces

Summary

Functions are foundational for organizing and reusing code. Key concepts include defining functions, using parameters and return values, understanding scope, and applying functions to data science problems. Functions help create modular, maintainable programs that are easier to test and debug.


© 2025 Prof. Tim Frenzel. All rights reserved. | Version 1.0.5