Section 2: Variables and Data Types
Ever wondered why Python is everywhere in data science? Excel runs into size limits and R can feel unfamiliar, while Python reads almost like English and scales to large datasets. This section looks at why Python became the standard for data science and ML engineers, then introduces its syntax, variables, data types, and basic operations.
Why Python Dominates Data Science
Python has become the standard for data science and machine learning. Understanding why helps you appreciate its design philosophy and capabilities.
Key Advantages
- Simple, readable syntax - Python code reads almost like English, making it accessible to beginners and maintainable for teams
- Massive ecosystem - Hundreds of thousands of packages available on PyPI, with specialized libraries for every data science need
- Industry standard - Used by Google, Netflix, Spotify, and virtually every major tech company for ML/AI
- Strong community - Active forums, extensive documentation, and continuous development
Usage Statistics
- 67% of data scientists use Python (Stack Overflow 2024)
- 8.2 million developers worldwide use Python
- Top choice at major technology companies for data science projects
Example: Python’s Readability
# This reads like English
if sales > target:
    print("Goal achieved!")
The syntax is intuitive and focuses on expressing ideas clearly rather than on complex programming constructs.
Python vs Your Familiar Tools
Understanding how Python relates to tools you may already know helps contextualize its role in data analysis.
Common Task Comparison
| Task | Excel | R | Python |
|---|---|---|---|
| Data Import | Click & drag | read.csv() | pd.read_csv() |
| Filter Data | AutoFilter | subset(data, condition) | df[df['col'] > 5] |
| Calculate Average | =AVERAGE() | mean(data$col) | df['col'].mean() |
| Create Charts | Insert Chart | ggplot() | plt.plot() |
Why Python Over Excel
- Handles larger datasets - Excel caps a worksheet at 1,048,576 rows, while Python is limited mainly by available memory and tooling
- Reproducible - Code can be rerun, Excel requires manual steps
- Version control - Track changes in code, not possible with Excel
- Automation - Run analysis automatically, Excel requires manual intervention
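The reproducibility and automation points can be made concrete with a short sketch. It assumes a small, made-up sales table held in a string (a real project would read a file from disk) and uses only the standard library; the column names `region` and `amount` and the function `average_amount` are hypothetical.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data; in practice this would come from a CSV file.
RAW_CSV = """region,amount
North,120.50
South,80.00
North,200.25
West,99.25
"""

def average_amount(csv_text, min_amount=0.0):
    """Return the mean of the 'amount' column for rows at or above min_amount."""
    reader = csv.DictReader(io.StringIO(csv_text))
    amounts = [float(row["amount"]) for row in reader]
    filtered = [a for a in amounts if a >= min_amount]
    return mean(filtered)

# Rerunning these lines repeats exactly the same steps every time.
overall = average_amount(RAW_CSV)
large_only = average_amount(RAW_CSV, min_amount=100)
print(f"Average of all rows: {overall:.2f}")
print(f"Average of rows >= 100: {large_only:.2f}")
```

Because the filter threshold is a parameter rather than a manual click, changing the analysis means editing one value and rerunning the script.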
Why Python Over R
- General purpose - R is statistics-focused, while Python also covers scripting, web services, and application code
- Better for production - R is research-focused, Python scales to production
- Machine learning - Python has the larger ML library ecosystem (for example scikit-learn, TensorFlow, and PyTorch)
- Integration - Python integrates better with web services and databases
Python Syntax & Structure Basics
Python uses simple, clean syntax that emphasizes readability.
Indentation is Important
Python uses indentation (spaces or tabs) to define code blocks instead of curly braces {}.
# Correct indentation
temperature = 85
if temperature > 80:
    print("It's hot!")
    print("Stay hydrated")
else:
    print("Nice weather")
    print("Enjoy the day")
# Wrong indentation (will cause error)
if temperature > 80:
print("It's hot!") # This will cause IndentationError
Variable Naming Rules
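A variable name may contain letters, digits, and underscores, must not start with a digit, and must not be a reserved keyword. As a rough sketch, these rules can be checked in code with `str.isidentifier()` and the standard `keyword` module; the helper name `is_valid_variable_name` is made up for this illustration.

```python
import keyword

def is_valid_variable_name(name):
    """Return True if name is a legal identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)

candidates = ["customer_name", "2customer", "customer-name", "class", "_total"]
for candidate in candidates:
    status = "valid" if is_valid_variable_name(candidate) else "invalid"
    print(f"{candidate!r}: {status}")
```

Note that `"class".isidentifier()` is true on its own, which is why the keyword check is needed as a second step.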
# Good variable names
customer_name = "Alice"
total_sales = 15000
is_vip_customer = True
# Bad variable names (avoid these)
# 2customer = "Bob" # Can't start with number
# customer-name = "Carol" # Can't use hyphens
# class = "Premium" # Can't use Python keywords
Variables & Data Types
Variables store data that can change during program execution.
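To illustrate what "can change" means in practice, here is a minimal sketch with a made-up variable `visits`. It shows that assignment rebinds a name to a new value, and that the new value may even have a different type.

```python
# A variable is a name bound to a value; assignment can rebind it at any time.
visits = 10
print(visits, type(visits).__name__)    # 10 int

visits = visits + 5                     # rebind the name to a new integer
print(visits)                           # 15

# The same name can later refer to a value of a different type.
visits = "fifteen"
print(visits, type(visits).__name__)    # fifteen str
```

Rebinding a name to a different type is legal, but keeping one type per variable usually makes analysis code easier to follow.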
Basic Data Types
# Numbers
age = 25 # Integer
salary = 75000.50 # Float
population = 1_000_000 # Integer with underscores for readability
# Text
name = "Alice Johnson" # String
company = 'DataCorp' # String (single quotes also work)
description = """Multi-line
string for longer text"""
# Boolean
is_employed = True # Boolean (True/False)
has_degree = False
# Check data types
print(type(age)) # <class 'int'>
print(type(salary)) # <class 'float'>
print(type(name)) # <class 'str'>
print(type(is_employed)) # <class 'bool'>
Type Conversion
Convert between different data types when needed.
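A few conversions behave in ways that tend to surprise newcomers. The sketch below uses made-up values to show three of them: truncation by `int()`, decimal strings, and the truthiness of non-empty strings.

```python
# int() on a float truncates toward zero; round() rounds to the nearest integer.
truncated = int(3.9)
rounded = round(3.9)
print(truncated, rounded)               # 3 4

# int() cannot parse a decimal string directly; go through float() first.
try:
    int("3.9")
    direct_ok = True
except ValueError:
    direct_ok = False
via_float = int(float("3.9"))
print(direct_ok, via_float)             # False 3

# Any non-empty string is truthy, including the text "False".
text_flag = bool("False")
print(text_flag)                        # True
```

When text such as "False" or "no" needs to become a boolean, compare the string explicitly rather than relying on `bool()`.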
# String to number
age_string = "25"
age_number = int(age_string) # Convert to integer
salary_string = "75000.50"
salary_number = float(salary_string) # Convert to float
# Number to string
age = 25
age_text = str(age) # Convert to string
# Boolean conversion
print(bool(1)) # True
print(bool(0)) # False
print(bool("")) # False (empty string)
print(bool("hello")) # True (non-empty string)
# Check if conversion is possible
try:
    number = int("not_a_number")
except ValueError:
    print("Cannot convert to integer")
Basic Operations
Python supports standard mathematical and logical operations.
Mathematical Operations
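Some details of the arithmetic operators are easy to miss: `/` always returns a float, `//` and `%` follow floor semantics for negative numbers, and floating-point sums are not always exact. A short sketch with illustrative values:

```python
import math

true_division = 10 / 2                  # a float even though the result is whole
floor_negative = -7 // 2                # floor division rounds toward negative infinity
mod_negative = -7 % 2                   # the remainder takes the sign of the divisor
quotient, remainder = divmod(17, 5)     # floor division and modulo in one call
print(true_division, floor_negative, mod_negative)   # 5.0 -4 1
print(quotient, remainder)                           # 3 2

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_sum = 0.1 + 0.2
exactly_equal = float_sum == 0.3
close_enough = math.isclose(float_sum, 0.3)
print(exactly_equal, close_enough)                   # False True
```

For this reason, comparing floats with `math.isclose()` is usually safer than using `==`.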
# Basic arithmetic
a = 10
b = 3
print(a + b) # Addition: 13
print(a - b) # Subtraction: 7
print(a * b) # Multiplication: 30
print(a / b) # Division: 3.333...
print(a // b) # Floor division: 3
print(a % b) # Modulo (remainder): 1
print(a ** b) # Exponentiation: 1000
# Order of operations (PEMDAS)
result = 2 + 3 * 4 # 14 (not 20)
result = (2 + 3) * 4 # 20
String Operations
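The `:,` that appears inside the f-string in this section is a format specifier. The following sketch shows a few common specifiers side by side; all values are made up for illustration.

```python
name = "Alice"
salary = 75000
rate = 0.125
average = 170.0

with_commas = f"{salary:,}"             # thousands separator
two_decimals = f"{average:.2f}"         # fixed number of decimal places
as_percent = f"{rate:.1%}"              # multiply by 100 and append %
left_padded = f"{name:<10}|"            # left-align in a 10-character field
right_padded = f"{salary:>10,}"         # right-align with separator

print(with_commas)      # 75,000
print(two_decimals)     # 170.00
print(as_percent)       # 12.5%
print(left_padded)      # Alice     |
print(right_padded)     #     75,000
```

Alignment specifiers such as `<10` and `>10` are handy for lining up columns in plain-text reports.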
# String concatenation
first_name = "Alice"
last_name = "Johnson"
full_name = first_name + " " + last_name
print(full_name) # "Alice Johnson"
# String formatting (f-strings - recommended)
age = 25
salary = 75000
print(f"{first_name} is {age} years old and earns ${salary:,}")
# String methods
text = " Hello World "
print(text.strip()) # "Hello World" (remove whitespace)
print(text.upper()) # " HELLO WORLD "
print(text.lower()) # " hello world "
print(text.replace("World", "Python")) # " Hello Python "
Comparison Operations
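Two further comparison behaviors are worth knowing: comparisons can be chained, and strings are ordered by Unicode code point, so uppercase letters sort before lowercase ones. A minimal sketch with made-up values:

```python
age = 30

# Chained comparison: equivalent to (18 <= age) and (age < 65).
in_working_range = 18 <= age < 65
print(in_working_range)                 # True

# Uppercase letters have lower code points than lowercase letters.
print("Zebra" < "apple")                # True

fruits = ["banana", "Cherry", "apple"]
default_order = sorted(fruits)
case_insensitive = sorted(fruits, key=str.lower)
print(default_order)                    # ['Cherry', 'apple', 'banana']
print(case_insensitive)                 # ['apple', 'banana', 'Cherry']
```

Passing `key=str.lower` compares lowercase copies of the strings while leaving the original values unchanged.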
# Comparison operators
x = 10
y = 5
print(x > y) # True
print(x < y) # False
print(x >= y) # True
print(x <= y) # False
print(x == y) # False (equality)
print(x != y) # True (inequality)
# String comparison
name1 = "Alice"
name2 = "alice"
print(name1 == name2) # False (case sensitive)
print(name1.lower() == name2.lower()) # True
Logical Operations
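Python's `and` and `or` short-circuit: the right-hand side is evaluated only when the left-hand side does not already decide the result. They also return one of their operands rather than always returning `True` or `False`. A small sketch, with variable names chosen for illustration:

```python
purchases = []

# The left side is False, so max(purchases) never runs
# (calling max() on an empty list would raise ValueError).
has_large_purchase = len(purchases) > 0 and max(purchases) > 100
print(has_large_purchase)               # False

# `or` returns the first truthy operand, which gives a compact default value.
nickname = ""
display_name = nickname or "Guest"
print(display_name)                     # Guest

# `and` returns the first falsy operand, or the last operand if all are truthy.
and_result = 5 and "yes"
print(and_result)                       # yes
```

Putting the cheap or safe check on the left side is a common way to guard an expression that could otherwise fail.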
# Logical operators
is_employed = True
has_degree = False
age = 25
# AND operator
print(is_employed and has_degree) # False
print(is_employed and age > 18) # True
# OR operator
print(is_employed or has_degree) # True
print(has_degree or age < 18) # False
# NOT operator
print(not is_employed) # False
print(not has_degree) # True
# Complex logical expressions
eligible = (age >= 18) and (is_employed or has_degree)
print(eligible) # True
Practice Exercise
Create a simple customer data analysis program:
# Customer data
customer_name = "Alice Johnson"
customer_age = 28
customer_salary = 75000
is_vip = False
purchases = [150, 200, 75, 300, 125]
# Calculate statistics
total_purchases = sum(purchases)
average_purchase = total_purchases / len(purchases)
max_purchase = max(purchases)
min_purchase = min(purchases)
# Determine VIP status
if total_purchases > 500 or customer_salary > 80000:
    is_vip = True
# Generate report
print("Customer Analysis Report")
print("=" * 30)
print(f"Name: {customer_name}")
print(f"Age: {customer_age}")
print(f"Salary: ${customer_salary:,}")
print(f"VIP Status: {'Yes' if is_vip else 'No'}")
print(f"Total Purchases: ${total_purchases}")
print(f"Average Purchase: ${average_purchase:.2f}")
print(f"Largest Purchase: ${max_purchase}")
print(f"Smallest Purchase: ${min_purchase}")
# Age-based recommendations
if customer_age < 25:
    recommendation = "Target young professional products"
elif customer_age < 40:
    recommendation = "Focus on career advancement products"
else:
    recommendation = "Emphasize retirement planning products"
print(f"Recommendation: {recommendation}")
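As a possible extension of the exercise, purchase amounts often arrive as text rather than numbers. The sketch below assumes a hypothetical list `raw_purchases` that contains some unusable entries, and combines type conversion, error handling, and f-string formatting.

```python
# Hypothetical raw input: amounts as text, some of them unusable.
raw_purchases = ["150", "200", "n/a", "75.50", ""]

valid_purchases = []
skipped = []
for raw in raw_purchases:
    try:
        valid_purchases.append(float(raw))
    except ValueError:
        # Keep track of entries that could not be converted.
        skipped.append(raw)

total = sum(valid_purchases)
# Guard against dividing by zero when no entry is valid.
average = total / len(valid_purchases) if valid_purchases else 0.0

print(f"Valid entries: {len(valid_purchases)}, skipped: {len(skipped)}")
print(f"Total: ${total:,.2f}")
print(f"Average: ${average:.2f}")
```

Collecting the skipped values instead of silently dropping them makes it easier to review data quality afterwards.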
Summary
Python’s simple syntax and flexible data types make it ideal for data science. Key concepts include proper indentation, variable naming, data type conversion, and basic operations. Understanding these fundamentals prepares you for more complex programming concepts.
© 2025 Prof. Tim Frenzel. All rights reserved. | Version 1.0.5
Comments
Use # for single-line comments and """ (a triple-quoted string) for multi-line comments.
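A brief sketch of both comment styles follows. Strictly speaking, a triple-quoted block is a string that Python evaluates and discards, and when it is the first statement in a function it becomes that function's docstring. The names `discount` and `apply_discount` are made up for this example.

```python
# A single-line comment explains the line or block that follows.
discount = 0.25  # An inline comment sits after the code on the same line.

"""
A bare triple-quoted string like this one is evaluated and discarded,
so it works as a multi-line comment even though it is a string.
"""

def apply_discount(price):
    """Return price after the discount. This string is the function's docstring."""
    return price * (1 - discount)

print(apply_discount(100))              # 75.0
print(apply_discount.__doc__)
```

Docstrings are available at runtime through `__doc__` and `help()`, whereas `#` comments are ignored entirely by the interpreter.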