Section 1: Object-Oriented Programming
What is Object-Oriented Programming and why do data scientists use it? Object-Oriented Programming (OOP) is a way of organizing code that groups related information and actions together. Think of it like organizing your desk: instead of leaving papers scattered everywhere, you put related documents in labeled folders.
What is Object-Oriented Programming?
Object-Oriented Programming organizes code by creating “objects” that combine:
- Data (information stored)
- Actions (things you can do with that information)
Real-World Analogy: A Filing Cabinet
Imagine a customer filing cabinet:
- Data: Customer name, email, purchase history
- Actions: Add new purchase, calculate total spent, check VIP status
This is how OOP works: everything related to a customer goes in one “customer object.”
Advantages of OOP
Pros:
- Organization: Related code stays together (easier to find and fix)
- Reusability: Write code once, use it many times
- Collaboration: Team members can work on different objects with fewer conflicts
- Real-world modeling: Code structure matches how we think about business problems
Cons:
- Learning curve: Takes time to understand the concepts
- Complexity: Can be overkill for simple scripts
- Performance: Can add a small overhead compared with simple procedural code, though this rarely matters for analysis work
When to Use OOP vs Simple Scripts
| Use OOP When | Use Simple Scripts When |
|---|---|
| Building data analysis systems | Quick one-time calculations |
| Working with customer/product data | Simple data transformations |
| Team projects | Personal analysis tasks |
| Reusable analytics tools | Proof-of-concept work |
Classes vs Objects: The Blueprint Concept
Think of building houses:
- Class = House blueprint (shows where rooms go)
- Object = Actual house built from that blueprint
Each house follows the same blueprint but has different details (address, paint color, furniture).
Simple Example: Customer Data
Customer Class (Blueprint):
- Name
- Email
- Total purchases
Customer Objects (Actual customers):
- Alice Johnson, alice@email.com, $1,250
- Bob Smith, bob@email.com, $890
Your First Simple Class
Let’s create a Customer class to store customer information:
# Step 1: Define the class (blueprint)
class Customer:
    def __init__(self, name, email):
        self.name = name
        self.email = email
        self.total_spent = 0

# Step 2: Create customer objects
alice = Customer("Alice Johnson", "alice@email.com")
bob = Customer("Bob Smith", "bob@email.com")

# Step 3: Use the customer data
print(alice.name)  # Shows: Alice Johnson
print(alice.email)  # Shows: alice@email.com
print(alice.total_spent)  # Shows: 0

What each part means:
- class Customer: Creates the customer blueprint
- def __init__: Sets up each new customer (like filling out a form)
- self.name = name: Stores the customer’s name
- alice = Customer(...): Creates a specific customer named Alice
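To tie this back to the blueprint idea, here is a minimal sketch showing that two objects built from the same class hold independent data. Setting total_spent directly is done only for illustration, using Alice's figure from the simple example earlier in this section.

```python
class Customer:
    def __init__(self, name, email):
        self.name = name
        self.email = email
        self.total_spent = 0


alice = Customer("Alice Johnson", "alice@email.com")
bob = Customer("Bob Smith", "bob@email.com")

# Both objects come from the same blueprint...
print(type(alice) is type(bob))   # True
print(isinstance(bob, Customer))  # True

# ...but each one keeps its own data.
alice.total_spent = 1250
print(alice.total_spent)  # 1250
print(bob.total_spent)    # 0
```

Changing Alice's total leaves Bob's untouched, just as repainting one house does not change the others built from the same blueprint.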
Adding Actions (Methods)
Methods are actions that your objects can perform. Let’s add an action to track purchases:
class Customer:
    def __init__(self, name, email):
        self.name = name
        self.email = email
        self.total_spent = 0

    # Method to add a purchase
    def add_purchase(self, amount):
        self.total_spent = self.total_spent + amount
        print(f"{self.name} spent ${amount}. Total: ${self.total_spent}")

# Create a customer
alice = Customer("Alice Johnson", "alice@email.com")

# Use the method
alice.add_purchase(100)  # Shows: Alice Johnson spent $100. Total: $100
alice.add_purchase(50)  # Shows: Alice Johnson spent $50. Total: $150

What happened:
1. We created a method called add_purchase
2. This method takes an amount and adds it to the total
3. Each customer object can use this same action
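The other actions from the filing cabinet analogy (calculating total spent, checking VIP status) can be sketched the same way. In this sketch the purchases list, the method names total_spent and is_vip, and the $1,000 VIP cutoff are illustrative assumptions, not part of the course material.

```python
class Customer:
    # Hypothetical cutoff for VIP status; chosen only for illustration
    VIP_THRESHOLD = 1000

    def __init__(self, name, email):
        self.name = name
        self.email = email
        self.purchases = []  # purchase history: one amount per purchase

    def add_purchase(self, amount):
        # Record the purchase in the customer's history
        self.purchases.append(amount)

    def total_spent(self):
        # Compute the total from the history instead of storing it
        return sum(self.purchases)

    def is_vip(self):
        # A customer counts as VIP once the total reaches the cutoff
        return self.total_spent() >= self.VIP_THRESHOLD


alice = Customer("Alice Johnson", "alice@email.com")
alice.add_purchase(750)
alice.add_purchase(500)
print(alice.total_spent())  # 1250
print(alice.is_vip())       # True

bob = Customer("Bob Smith", "bob@email.com")
bob.add_purchase(890)
print(bob.is_vip())         # False
```

Note that total_spent is a method here (called with parentheses) rather than a stored attribute. Computing it from the purchase history is one possible design; it means the total cannot drift out of sync with the recorded purchases.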
When You’ll Use OOP in Data Science
Common scenarios:
- Customer analysis: Track customer behavior over time
- Product catalogs: Store product information and calculate metrics
- Data processing: Create reusable data cleaning tools
- Reporting systems: Generate consistent reports across different datasets
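As a rough illustration of the "reusable data cleaning tools" scenario, the sketch below wraps two cleaning steps in a class so the same settings can be applied to several datasets. The class name TextCleaner, its parameters, and the sample records are all hypothetical; a real project would more likely build this on a library such as pandas.

```python
class TextCleaner:
    """Hypothetical reusable cleaner for lists of record dictionaries."""

    def __init__(self, field, lowercase=True):
        self.field = field          # which key to clean in each record
        self.lowercase = lowercase  # whether to normalize case

    def clean_value(self, value):
        # Remove surrounding whitespace and optionally lowercase the text
        value = value.strip()
        if self.lowercase:
            value = value.lower()
        return value

    def clean(self, records):
        # Return new records; skip those where the field is missing or blank
        cleaned = []
        for record in records:
            value = record.get(self.field)
            if value is None or not value.strip():
                continue
            cleaned.append({**record, self.field: self.clean_value(value)})
        return cleaned


# One cleaner object, reused across two made-up datasets
email_cleaner = TextCleaner("email")

january = [{"name": "Alice Johnson", "email": "  Alice@Email.com "}]
february = [{"name": "Bob Smith", "email": "BOB@email.com"},
            {"name": "Unknown", "email": "   "}]

print(email_cleaner.clean(january))
print(email_cleaner.clean(february))
```

The cleaning rules live in one place, so changing them (for example, turning off lowercasing) affects every dataset that uses the same cleaner object.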
Summary
Object-Oriented Programming helps organize code by grouping related data and actions together. Think of it as creating digital filing cabinets for different types of information.
Key concepts:
- Classes: Blueprints that define what data and actions go together
- Objects: Specific instances created from those blueprints
- Methods: Actions that objects can perform
OOP is useful when building reusable data analysis tools and working with business data like customers, products, or transactions. Start simple: you don’t need to use every OOP feature right away.
© 2025 Prof. Tim Frenzel. All rights reserved. | Version 1.0.5