Section 10: SQL Fundamentals

What Is a Database?

Imagine walking into your company’s real estate office and seeing filing cabinets stretching from floor to ceiling, each drawer stuffed with property listings, tenant records, and transaction histories. Now imagine trying to find all properties in Orange County priced between $500,000 and $750,000 that sold in the last six months. How many hours would that take?

A database transforms this nightmare into a few seconds of work. Think of it as a digital filing system where information lives in organized containers called tables. Each table holds related information—one for properties, another for agents, a third for transactions. Unlike those filing cabinets, a database lets you instantly search, sort, and combine information in ways that would take weeks to accomplish manually.

In real estate analytics, databases store everything from property characteristics (square footage, bedrooms, lot size) to market transactions (sale prices, listing dates, days on market) to geographic data (neighborhood boundaries, school districts, proximity to amenities). The relational model connects these tables through shared information. For instance, a property table might connect to a transaction table through a unique property ID, allowing you to track a building’s entire sales history without duplicating all its physical characteristics in every transaction record.

Meet SQL: The Language of Databases

SQL (Structured Query Language) gives you the vocabulary to talk to databases. Rather than clicking through menus or filling out forms, you write precise instructions that tell the database exactly what information you want. SQL reads almost like English—you SELECT data FROM tables WHERE certain conditions are true.

The beauty of SQL lies in its consistency. Whether you’re working with a small local MLS database or CoStar’s national property records, the same commands work everywhere. SQL is case-insensitive for commands (though SELECT, Select, and select all work the same way, we conventionally use uppercase for readability), and its syntax follows predictable patterns that quickly become second nature.

Consider this scenario: Your investment committee wants to see all office buildings over 50,000 square feet that traded in the past year. Instead of exporting data to Excel and manually filtering, you write a single SQL query that returns exactly this information in seconds.

Tables and Data Types

Database tables look like spreadsheets but with stricter rules. Each column has a specific data type that determines what kind of information it can hold. A property address must be text, the sale price must be a number, and the listing date must be a date. This schema (the formal structure of your database) prevents errors like accidentally entering “two million” instead of 2000000 in a price field.

Real estate databases typically use these data types: - TEXT or VARCHAR for addresses, property descriptions, and agent names - INTEGER or DECIMAL for prices, square footage, and unit counts
- DATE or DATETIME for listing dates, sale dates, and lease expirations - BOOLEAN for yes/no fields like “has_parking” or “allows_pets”

A properties table might have columns like property_id (INTEGER), address (TEXT), bedrooms (INTEGER), bathrooms (DECIMAL), listing_price (DECIMAL), and date_listed (DATE). Each row represents one property, with values filling each column according to its data type.

Working with Data

The SELECT statement forms the foundation of SQL querying. Start with the simplest possible query to view all properties in your database:

SELECT * FROM properties;

The asterisk means “all columns,” making this a quick way to explore what’s in a table. But returning everything rarely helps with analysis. Instead, specify exactly which columns you need:

SELECT address, listing_price, bedrooms, bathrooms 
FROM properties;

The WHERE clause filters your results to specific criteria. Need properties under $500,000?

SELECT address, listing_price 
FROM properties 
WHERE listing_price < 500000;

What about three-bedroom homes in a specific zip code?

SELECT address, listing_price, bedrooms
FROM properties
WHERE bedrooms = 3 
  AND zip_code = '92866';

The ORDER BY clause sorts your results. Properties often need sorting by price, size, or recency:

SELECT address, listing_price, square_feet
FROM properties
WHERE property_type = 'Single Family'
ORDER BY listing_price DESC;

DESC means descending (highest to lowest), while ASC means ascending. Without specifying, SQL defaults to ascending order.

The LIMIT clause restricts how many results you see—helpful when tables contain thousands of records:

SELECT address, listing_price
FROM properties
WHERE city = 'Irvine'
ORDER BY listing_price DESC
LIMIT 10;

This query finds the ten most expensive properties in Irvine, perfect for identifying luxury market comparables.

You can combine these clauses to answer specific business questions. Your client wants to see recently listed condos under $400,000, sorted by price:

SELECT address, listing_price, date_listed
FROM properties
WHERE property_type = 'Condo'
  AND listing_price < 400000
  AND date_listed > '2025-01-01'
ORDER BY listing_price ASC;

Making Sense of Data

Real power emerges when you connect multiple tables using JOIN operations. Properties exist in neighborhoods, transactions involve agents, and leases connect tenants to units. These relationships link through foreign keys—columns that reference the primary key of another table.

Consider a database with separate tables for properties and agents. Each property listing includes an agent_id that matches an id in the agents table:

SELECT 
    p.address,
    p.listing_price,
    a.agent_name,
    a.phone
FROM properties p
JOIN agents a ON p.agent_id = a.id
WHERE p.city = 'Newport Beach';

The JOIN connects the tables where agent_id matches, giving you property details alongside agent contact information. The letters ‘p’ and ‘a’ are aliases that make the query more readable than writing out full table names repeatedly.

Aggregate functions help summarize data across multiple records. COUNT tells you how many records match your criteria:

SELECT COUNT(*) as total_properties
FROM properties
WHERE property_type = 'Multi-Family';

SUM adds up numeric values across rows:

SELECT SUM(listing_price) as total_value
FROM properties
WHERE zip_code = '92868';

AVG calculates the mean value:

SELECT AVG(price_per_sqft) as avg_price_per_sqft
FROM properties
WHERE property_type = 'Office'
  AND year_built > 2010;

The GROUP BY clause creates summaries by category. Want to see average prices by property type?

SELECT 
    property_type,
    AVG(listing_price) as avg_price,
    COUNT(*) as property_count
FROM properties
GROUP BY property_type
ORDER BY avg_price DESC;

This groups all properties by type, then calculates the average price and count for each group. The result might look like:

property_type avg_price property_count
Office 2,500,000 45
Retail 1,800,000 23
Multi-Family 1,200,000 67
Single Family 750,000 234
Condo 425,000 189

You can combine GROUP BY with JOIN operations to analyze relationships. How many properties does each agent have listed, and what’s their average listing price?

SELECT 
    a.agent_name,
    COUNT(p.property_id) as listings,
    AVG(p.listing_price) as avg_listing_price
FROM agents a
JOIN properties p ON a.id = p.agent_id
GROUP BY a.agent_name
ORDER BY listings DESC;

Adding conditions after grouping requires the HAVING clause (not WHERE). Which agents have more than 10 listings?

SELECT 
    a.agent_name,
    COUNT(p.property_id) as listings
FROM agents a
JOIN properties p ON a.id = p.agent_id
GROUP BY a.agent_name
HAVING COUNT(p.property_id) > 10
ORDER BY listings DESC;

These fundamental SQL operations—SELECT, FROM, WHERE, JOIN, and GROUP BY—form the core toolkit for real estate data analysis. Practice with small queries first, then gradually combine techniques to answer more complex questions. Start by finding all properties in a specific neighborhood, then add price filters, then join with transaction data to see recent sales, and finally group by property type to understand market segments.

The transition from Excel to SQL might feel uncomfortable initially, but the payoff comes quickly. Tasks that took hours of manual filtering and VLOOKUP formulas now complete in seconds. More importantly, SQL queries document your analysis process—anyone can see exactly how you arrived at your numbers, making your work reproducible and auditable.

Modern real estate firms store vast amounts of data: property characteristics, transaction histories, tenant records, market comparables, and economic indicators. SQL transforms this data from an overwhelming mass into actionable intelligence. Whether you’re valuing a portfolio, identifying investment opportunities, or analyzing market trends, SQL provides the precision and speed that Excel simply cannot match at scale.

References


© 2025 Prof. Tim Frenzel. All rights reserved. | Version 1.0.5