Section 4: Functions
Ever found yourself copying the same Excel formula across dozens of cells? Functions are like creating your own custom Excel formulas, but far more flexible. Instead of manually calculating average sales for each region, you write one function and use it everywhere. They’re the secret to writing code that doesn’t make you want to pull your hair out: clean, reusable, and actually maintainable.
Introduction
Functions are reusable blocks of code that perform specific tasks. They help organize your code, avoid repetition, and make programs easier to understand and maintain. In data science, functions are essential for building reusable analysis tools.
What are Functions?
Functions are like mini-programs within your program. They take inputs (parameters), process them, and return outputs (return values).
Real-World Analogy: Think of a calculator.
- Input: Numbers and operation (2 + 3)
- Process: Addition calculation
- Output: Result (5)
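The calculator analogy translates almost directly into Python. The sketch below is illustrative only: `calculate` and the two operations it supports are hypothetical, and the `def` syntax it uses is explained in the next part.

```python
def calculate(a, b, operation):
    """Hypothetical calculator: take inputs, process them, return the output."""
    # Input: two numbers and an operation symbol
    if operation == "+":
        result = a + b      # Process: addition
    elif operation == "-":
        result = a - b      # Process: subtraction
    else:
        raise ValueError(f"Unsupported operation: {operation}")
    # Output: hand the result back to the caller
    return result

print(calculate(2, 3, "+"))  # 5
```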
Defining Functions
Basic Function Syntax
```python
def function_name(parameters):
    """Docstring describing what the function does"""
    # Function body
    return result

# Example
def greet(name):
    """Greet a person by name"""
    return f"Hello, {name}!"

# Call the function
message = greet("Alice")
print(message)  # Hello, Alice!
```
Function Components
```python
def calculate_tax(amount, tax_rate=0.08):
    """
    Calculate tax on an amount

    Parameters:
        amount (float): The amount to calculate tax on
        tax_rate (float): The tax rate (default 8%)

    Returns:
        float: The calculated tax amount
    """
    tax = amount * tax_rate
    return tax

# Use the function
price = 100
tax = calculate_tax(price)
print(f"Tax on ${price}: ${tax:.2f}")  # Tax on $100: $8.00

# Use with custom tax rate
high_tax = calculate_tax(price, 0.15)
print(f"High tax on ${price}: ${high_tax:.2f}")  # High tax on $100: $15.00
```
Parameters and Arguments
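Parameters are the names listed in a function's definition; arguments are the actual values you pass when you call it. Functions can accept several kinds of parameters: required, default, and variable-length.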
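In the sketch below, the names in the `def` line (`amount`, `currency`) are parameters, and the values supplied at each call are arguments. Arguments can be matched by position or by keyword name. `format_price` is a hypothetical helper used only for illustration.

```python
def format_price(amount, currency="USD"):
    """Hypothetical helper: format an amount with a currency code."""
    # amount and currency are the parameters
    return f"{amount:,.2f} {currency}"

# Positional arguments: matched to parameters by order
print(format_price(1500, "EUR"))                  # 1,500.00 EUR

# Keyword arguments: matched by name, so order does not matter
print(format_price(currency="GBP", amount=99.5))  # 99.50 GBP

# Mixing is allowed, but positional arguments must come first
print(format_price(250, currency="JPY"))          # 250.00 JPY
```

Writing `format_price(currency="GBP", 99.5)` would be rejected with a `SyntaxError`, because a positional argument cannot follow a keyword argument.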
Required Parameters
```python
def add_numbers(a, b):
    """Add two numbers"""
    return a + b

result = add_numbers(5, 3)  # 8
print(result)
```
Default Parameters
```python
def create_customer(name, email, is_vip=False, region="Unknown"):
    """Create a customer record with optional parameters"""
    return {
        "name": name,
        "email": email,
        "is_vip": is_vip,
        "region": region
    }

# Use with defaults
customer1 = create_customer("Alice", "alice@email.com")
print(customer1)

# Override defaults
customer2 = create_customer("Bob", "bob@email.com", is_vip=True, region="North")
print(customer2)
```
Variable Number of Arguments
```python
def calculate_total(*prices):
    """Calculate total of variable number of prices"""
    # prices arrives as a tuple holding every positional argument
    return sum(prices)

total1 = calculate_total(10, 20, 30)     # 60
total2 = calculate_total(5, 15, 25, 35)  # 80
print(f"Total 1: {total1}")
print(f"Total 2: {total2}")

def create_report(title, *data_points):
    """Create a report with title and multiple data points"""
    report = f"Report: {title}\n"
    report += "=" * 30 + "\n"
    for i, point in enumerate(data_points, 1):
        report += f"{i}. {point}\n"
    return report

report = create_report("Sales Summary", "Q1: $50K", "Q2: $60K", "Q3: $55K")
print(report)
```
Return Values
Functions can return a single value, multiple values, or nothing at all (in which case Python returns `None`).
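A common beginner mix-up is printing a result instead of returning it. Printing only displays the value; returning hands it back so the caller can store it or keep calculating with it. The two hypothetical functions below make the difference visible:

```python
def average_printed(values):
    """Displays the average but returns nothing (implicitly None)."""
    print(sum(values) / len(values))

def average_returned(values):
    """Returns the average so the caller can reuse it."""
    return sum(values) / len(values)

sales = [100, 200, 300]

shown = average_printed(sales)   # prints 200.0
kept = average_returned(sales)   # prints nothing

print(shown)     # None: there was no return statement
print(kept * 2)  # 400.0: the returned value can be used in calculations
```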
Single Return Value
```python
def square(number):
    """Calculate square of a number"""
    return number ** 2

result = square(5)
print(result)  # 25
```
Multiple Return Values
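Returning several values actually returns one tuple, which the caller can then unpack into separate names. A minimal sketch with a hypothetical `min_max` helper:

```python
def min_max(values):
    """Hypothetical helper returning two values at once."""
    return min(values), max(values)  # packed into one tuple

result = min_max([4, 9, 1])
print(result)        # (1, 9)
print(type(result))  # <class 'tuple'>

low, high = min_max([4, 9, 1])  # unpacked into two names
print(low, high)     # 1 9
```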
```python
def analyze_sales(sales_list):
    """Analyze sales data and return multiple statistics"""
    total = sum(sales_list)
    average = total / len(sales_list)
    maximum = max(sales_list)
    minimum = min(sales_list)
    return total, average, maximum, minimum

sales = [1000, 1500, 1200, 1800, 2000]
total, avg, max_sale, min_sale = analyze_sales(sales)
print(f"Total: ${total:,}")
print(f"Average: ${avg:.2f}")
print(f"Highest: ${max_sale:,}")
print(f"Lowest: ${min_sale:,}")
```
No Return Value (None)
```python
def print_customer_info(customer):
    """Print customer information (no return value)"""
    print(f"Name: {customer['name']}")
    print(f"Email: {customer['email']}")
    print(f"VIP Status: {'Yes' if customer['is_vip'] else 'No'}")

customer = {"name": "Alice", "email": "alice@email.com", "is_vip": True}
print_customer_info(customer)
```
Scope and Variable Visibility
Scope determines where in your program a variable can be seen and used. Understanding it helps you avoid accidental name conflicts between different parts of your code.
Local vs Global Scope
```python
# Global variable
company_name = "DataCorp"

def update_employee(employee_name, salary):
    """Update employee information (local scope)"""
    # Local variables: they exist only while the function runs
    new_salary = salary * 1.1
    department = "Analytics"

    print(f"Employee: {employee_name}")
    print(f"Company: {company_name}")  # Can read a global without declaring it
    print(f"New salary: ${new_salary:,.2f}")
    print(f"Department: {department}")
    return new_salary

# Call function
updated_salary = update_employee("Alice", 75000)

# These variables don't exist outside the function
# print(department)  # This would raise NameError: name 'department' is not defined
```
Modifying Global Variables
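Reading a global inside a function needs no declaration, but assigning to one does. Without `global`, any assignment makes the name local to the whole function, so an update like `+=` fails before the global value is ever read. A minimal sketch with a hypothetical `page_views` counter:

```python
page_views = 0  # hypothetical global counter

def record_view_broken():
    # The assignment below makes page_views local to this function,
    # so reading it for the += fails
    page_views += 1

try:
    record_view_broken()
except UnboundLocalError as e:
    print(f"Error: {e}")

print(page_views)  # still 0
```

The exact wording of the error message differs between Python versions, but the exception type is the same. Declaring `global page_views` inside the function is what makes the update work.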
```python
# Global counter
total_orders = 0

def process_order(order_amount):
    """Process an order and update global counter"""
    global total_orders  # Need to declare global to modify
    total_orders += 1
    print(f"Order #{total_orders}: ${order_amount}")
    return order_amount

# Process some orders
process_order(150)
process_order(200)
process_order(75)
print(f"Total orders processed: {total_orders}")  # Total orders processed: 3
```
Data Science Functions
Functions are particularly useful for data analysis, where the same cleaning and calculation steps often need to run on many records, regions, or datasets.
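For example, the idea from the start of this section, computing average sales for each region, becomes one function applied in a loop. The `regional_sales` figures below are made-up sample values for illustration:

```python
def average(values):
    """Return the mean of a non-empty list of numbers."""
    return sum(values) / len(values)

# Made-up sample data: sales amounts per region
regional_sales = {
    "North": [1200, 1500, 900],
    "South": [800, 950],
    "East": [2000, 1800, 2200],
}

# One function, reused for every region instead of repeating the formula
for region, amounts in regional_sales.items():
    print(f"{region}: {average(amounts):,.2f}")
```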
Data Processing Functions
```python
def clean_sales_data(raw_data):
    """Clean and validate sales data"""
    cleaned_data = []
    errors = []

    for i, record in enumerate(raw_data):
        try:
            # Validate and clean data
            cleaned_record = {
                "id": int(record["id"]),
                "amount": float(record["amount"]),
                "date": record["date"].strip(),
                "region": record["region"].title()
            }

            # Additional validation
            if cleaned_record["amount"] < 0:
                errors.append(f"Record {i}: Negative amount")
                continue

            cleaned_data.append(cleaned_record)

        except (ValueError, KeyError) as e:
            errors.append(f"Record {i}: {e}")

    return cleaned_data, errors

# Test with sample data
raw_sales = [
    {"id": "1", "amount": "1500.50", "date": "2024-01-15", "region": "north"},
    {"id": "2", "amount": "-200", "date": "2024-01-16", "region": "south"},
    {"id": "abc", "amount": "1000", "date": "2024-01-17", "region": "east"},
    {"id": "3", "amount": "2500.75", "date": "2024-01-18", "region": "west"}
]

cleaned_data, errors = clean_sales_data(raw_sales)
print(f"Cleaned records: {len(cleaned_data)}")  # Cleaned records: 2
print(f"Errors: {len(errors)}")                 # Errors: 2
for error in errors:
    print(f"  - {error}")
```
Analysis Functions
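Python's standard-library `statistics` module already implements several of these metrics, which makes it handy for double-checking hand-written calculations such as the ones in `calculate_metrics`. Note that `pvariance` and `pstdev` are the population versions (dividing by n), while `variance` and `stdev` divide by n - 1. The amounts below are the two records that survive cleaning in the sample data above.

```python
import statistics

# Amounts from the two valid records in the sample sales data
amounts = [1500.50, 2500.75]

print(statistics.mean(amounts))       # 2000.625
print(statistics.median(amounts))     # 2000.625 (mean of the two middle values)
print(statistics.pvariance(amounts))  # population variance, divides by n
print(statistics.pstdev(amounts))     # population standard deviation
```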
```python
def calculate_metrics(data, metric_type="basic"):
    """Calculate various metrics for data analysis"""
    if not data:
        return None

    amounts = [record["amount"] for record in data]
    count = len(amounts)
    total = sum(amounts)
    average = total / count  # computed once and reused below

    if metric_type == "basic":
        return {
            "count": count,
            "total": total,
            "average": average,
            "min": min(amounts),
            "max": max(amounts)
        }
    elif metric_type == "advanced":
        # Population variance: mean squared deviation from the average
        variance = sum((x - average) ** 2 for x in amounts) / count
        std_dev = variance ** 0.5

        # Median: middle value, or mean of the two middle values for an even count
        ordered = sorted(amounts)
        mid = count // 2
        if count % 2 == 1:
            median = ordered[mid]
        else:
            median = (ordered[mid - 1] + ordered[mid]) / 2

        return {
            "count": count,
            "total": total,
            "average": average,
            "median": median,
            "std_dev": std_dev,
            "variance": variance
        }
    else:
        raise ValueError(f"Unknown metric_type: {metric_type!r}")

# Use the function
if cleaned_data:
    basic_metrics = calculate_metrics(cleaned_data, "basic")
    print("Basic Metrics:")
    for key, value in basic_metrics.items():
        print(f"  {key}: {value}")

    advanced_metrics = calculate_metrics(cleaned_data, "advanced")
    print("\nAdvanced Metrics:")
    for key, value in advanced_metrics.items():
        print(f"  {key}: {value:.2f}")
```
Practice Exercise
Create a comprehensive customer analysis system using functions:
```python
def create_customer_database():
    """Create a sample customer database"""
    return [
        {"id": 1, "name": "Alice Johnson", "email": "alice@email.com", "purchases": [150, 200, 300], "region": "North"},
        {"id": 2, "name": "Bob Smith", "email": "bob@email.com", "purchases": [75, 125, 100], "region": "South"},
        {"id": 3, "name": "Carol Davis", "email": "carol@email.com", "purchases": [500, 600, 400], "region": "East"},
        {"id": 4, "name": "David Wilson", "email": "david@email.com", "purchases": [50, 75, 25], "region": "West"},
        {"id": 5, "name": "Eve Brown", "email": "eve@email.com", "purchases": [800, 900, 1000], "region": "North"}
    ]

def calculate_customer_metrics(customer):
    """Calculate metrics for a single customer"""
    total_spent = sum(customer["purchases"])
    avg_purchase = total_spent / len(customer["purchases"])
    max_purchase = max(customer["purchases"])
    min_purchase = min(customer["purchases"])

    return {
        "id": customer["id"],
        "name": customer["name"],
        "total_spent": total_spent,
        "avg_purchase": avg_purchase,
        "max_purchase": max_purchase,
        "min_purchase": min_purchase,
        "purchase_count": len(customer["purchases"])
    }

def categorize_customer(metrics):
    """Categorize customer based on spending metrics"""
    if metrics["total_spent"] > 1000:
        return "VIP"
    elif metrics["total_spent"] > 500:
        return "Premium"
    elif metrics["total_spent"] > 200:
        return "Regular"
    else:
        return "New"

def analyze_customer_database(customers):
    """Analyze entire customer database"""
    all_metrics = []
    category_counts = {"VIP": 0, "Premium": 0, "Regular": 0, "New": 0}

    for customer in customers:
        metrics = calculate_customer_metrics(customer)
        category = categorize_customer(metrics)
        metrics["category"] = category
        all_metrics.append(metrics)
        category_counts[category] += 1

    return all_metrics, category_counts

def generate_customer_report(metrics, category_counts):
    """Generate a comprehensive customer report"""
    total_customers = len(metrics)
    total_revenue = sum(customer["total_spent"] for customer in metrics)
    avg_revenue = total_revenue / total_customers

    report = f"""
CUSTOMER ANALYSIS REPORT
{'=' * 50}
Total Customers: {total_customers}
Total Revenue: ${total_revenue:,.2f}
Average Revenue per Customer: ${avg_revenue:,.2f}

Customer Categories:
"""
    for category, count in category_counts.items():
        percentage = (count / total_customers) * 100
        report += f"  {category}: {count} customers ({percentage:.1f}%)\n"

    report += "\nTop 3 Customers by Spending:\n"
    top_customers = sorted(metrics, key=lambda x: x["total_spent"], reverse=True)[:3]
    for i, customer in enumerate(top_customers, 1):
        report += f"  {i}. {customer['name']}: ${customer['total_spent']:,.2f}\n"

    return report

# Run the analysis
customers = create_customer_database()
metrics, category_counts = analyze_customer_database(customers)
report = generate_customer_report(metrics, category_counts)
print(report)
```
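Splitting the exercise into small functions pays off when testing: each piece can be checked on its own, without building a whole customer database. A minimal sketch using plain `assert` statements; `categorize_customer` is repeated here unchanged so the snippet runs by itself. The boundary cases are worth checking because the thresholds use `>`, so a customer who spent exactly 1000 is Premium, not VIP.

```python
def categorize_customer(metrics):
    """Categorize customer based on spending metrics (copied from the exercise)"""
    if metrics["total_spent"] > 1000:
        return "VIP"
    elif metrics["total_spent"] > 500:
        return "Premium"
    elif metrics["total_spent"] > 200:
        return "Regular"
    else:
        return "New"

# Each assert checks one expected category; a failure raises AssertionError
assert categorize_customer({"total_spent": 2700}) == "VIP"
assert categorize_customer({"total_spent": 1000}) == "Premium"  # boundary: not > 1000
assert categorize_customer({"total_spent": 650}) == "Premium"
assert categorize_customer({"total_spent": 500}) == "Regular"   # boundary: not > 500
assert categorize_customer({"total_spent": 200}) == "New"       # boundary: not > 200
print("All categorize_customer checks passed")
```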
Summary
Functions are foundational for organizing and reusing code. Key concepts include defining functions, using parameters and return values, understanding scope, and applying functions to data science problems. Functions help create modular, maintainable programs that are easier to test and debug.
© 2025 Prof. Tim Frenzel. All rights reserved. | Version 1.0.5