Section 2: Data Sources and Data Types

Learning Objectives

By the end of this section, students will be able to:

  • Identify primary and secondary data sources for real estate analysis
  • Understand different data types and their analytical applications
  • Evaluate data quality and reliability for real estate projects
  • Access and integrate multiple data sources effectively
  • Handle geospatial and temporal data in real estate contexts

Introduction

Real estate analytics relies on diverse data sources, each offering unique insights into property markets. This section explores the data landscape available to real estate analysts and how to work with different data types.

The real estate industry generates massive amounts of data daily, yet most analysts access only a small fraction of available information when making investment decisions. This disconnect stems from a fundamental challenge: understanding which sources provide reliable information and how different data types require distinct analytical approaches.

Real estate data exists across hundreds of fragmented sources, each with its own access protocols, update frequencies, and quality standards. The same property might appear in multiple databases with conflicting square footage, varying sale prices, and mismatched addresses. Which source should you trust when CoStar shows 12,000 square feet but the county assessor records 11,450? This question directly impacts valuation models, investment decisions, and client recommendations.

Main Content

Primary Data Sources

Property Records:

  • County assessor databases
  • MLS (Multiple Listing Service) data
  • Property tax records
  • Deed transfers and ownership history

Market Data:

  • Recent sales transactions
  • Current listing prices
  • Rental rates and vacancy data
  • Market inventory levels

Economic Indicators:

  • Interest rates and mortgage data
  • Employment statistics
  • Population demographics
  • Income levels and growth

Secondary Data Sources

Geographic Data:

  • Census tract information
  • School district boundaries
  • Transportation networks
  • Environmental factors

Market Intelligence:

  • Real estate market reports
  • Industry publications
  • Economic forecasts
  • Regulatory changes

Alternative Data:

  • Satellite imagery
  • Social media sentiment
  • Foot traffic patterns
  • Crime statistics

Data Types in Real Estate

Structured Data:

  • Property characteristics (square footage, bedrooms, bathrooms)
  • Transaction records (price, date, location)
  • Financial metrics (rent, expenses, cap rates)

Unstructured Data:

  • Property descriptions and photos
  • Market commentary and reports
  • Social media posts about neighborhoods

Geospatial Data:

  • Property coordinates and boundaries
  • Neighborhood characteristics
  • Distance to amenities and services

Time Series Data:

  • Price trends over time
  • Market cycle indicators
  • Seasonal patterns

Data Quality Assessment

Real estate data quality varies significantly across sources. Analysts must evaluate data reliability before building models or making investment decisions.

Data Completeness Score

Completeness Score (%) = (Complete Records / Total Records) × 100

Where:

  • Complete Records = records with all required fields populated
  • Total Records = total number of records in dataset
  • Target threshold = 85% for reliable analysis
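The completeness formula can be computed directly. In this minimal Python sketch, a field counts as missing when it is `None` or an empty string. That is an assumption: real datasets may also use placeholders such as `0` or `"N/A"`. The field names and records are hypothetical.

```python
from typing import Iterable, Mapping, Sequence


def completeness_score(records: Sequence[Mapping[str, object]],
                       required_fields: Iterable[str]) -> float:
    """Return (complete records / total records) x 100.

    A record is complete only if every required field is populated.
    Assumption: None and "" mark a missing value.
    """
    fields = list(required_fields)
    if not records:
        return 0.0  # an empty dataset has no complete records
    complete = sum(
        1 for rec in records
        if all(rec.get(f) not in (None, "") for f in fields)
    )
    return complete / len(records) * 100


# Hypothetical records: the second lacks square footage, the third lacks a price
records = [
    {"sqft": 2400, "beds": 4, "sale_price": 485_000},
    {"sqft": None, "beds": 3, "sale_price": 430_000},
    {"sqft": 1850, "beds": 3, "sale_price": ""},
    {"sqft": 2100, "beds": 4, "sale_price": 510_000},
]
score = completeness_score(records, ["sqft", "beds", "sale_price"])
print(f"Completeness: {score:.1f}% (meets 85% target: {score >= 85})")
```

Only two of the four records are complete, so this dataset falls well short of the 85% target. Returning 0.0 for an empty dataset is a design choice; raising an error would be equally reasonable.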

Data Accuracy Validation

Accuracy Rate (%) = (Verified Records / Sample Size) × 100

Where:

  • Verified Records = records confirmed against authoritative sources
  • Sample Size = randomly selected records for verification
  • Target threshold = 95% for high-confidence analysis
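The accuracy formula depends on checking a random sample against an authoritative reference. This sketch treats the county assessor as the reference for square footage and counts a record as verified when it falls within a 2% relative difference. The tolerance, the `verify` callback, and the property IDs and values are all illustrative assumptions. Record P1 reuses the 12,000 vs. 11,450 sq ft conflict from the introduction.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def accuracy_rate(records: Sequence[T], sample_size: int,
                  verify: Callable[[T], bool],
                  seed: Optional[int] = None) -> float:
    """Return (verified records / sample size) x 100 for a random sample."""
    rng = random.Random(seed)  # seeded generator makes the sample reproducible
    k = min(sample_size, len(records))  # random.sample raises if k > population
    sample = rng.sample(list(records), k)
    if not sample:
        return 0.0
    verified = sum(1 for rec in sample if verify(rec))
    return verified / len(sample) * 100


# Hypothetical listing records and assessor reference values (sq ft)
listings = [
    {"id": "P1", "sqft": 12_000},
    {"id": "P2", "sqft": 2_400},
    {"id": "P3", "sqft": 1_850},
    {"id": "P4", "sqft": 2_100},
]
assessor_sqft = {"P1": 11_450, "P2": 2_400, "P3": 1_860, "P4": 2_100}


def matches_assessor(rec: dict, tolerance: float = 0.02) -> bool:
    """Verified if within `tolerance` (relative) of the assessor's value."""
    ref = assessor_sqft.get(rec["id"])
    return ref is not None and abs(rec["sqft"] - ref) / ref <= tolerance


rate = accuracy_rate(listings, sample_size=4, verify=matches_assessor, seed=42)
print(f"Accuracy: {rate:.1f}%")
```

With only four records, the "sample" here is the whole dataset. In practice the sample is a small fraction of a much larger file, so the rate is an estimate subject to sampling error.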

Common Data Quality Issues

Missing Values:

  • Incomplete property characteristics
  • Missing transaction dates
  • Unreported financial metrics

Data Inconsistencies:

  • Conflicting property measurements
  • Mismatched addresses across sources
  • Inconsistent date formats
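Date and address inconsistencies are usually resolved by normalizing every source to a single canonical form before matching records. This sketch makes three assumptions. Slash-separated dates use U.S. month/day order. `%b` parses English month abbreviations, which matches Python's default locale. The suffix table is a small hypothetical subset. Production pipelines typically apply a complete standard, such as the USPS Publication 28 suffix abbreviations, instead of a hand-written map.

```python
import re
from datetime import date, datetime

# Formats tried in order; "%m/%d/%Y" assumes U.S. month/day ordering
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%b %d, %Y", "%d %b %Y")

# Hypothetical subset of street-suffix and unit abbreviations
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD",
            "DRIVE": "DR", "BOULEVARD": "BLVD", "APARTMENT": "APT"}


def normalize_date(raw: str) -> date:
    """Parse a date written in any of DATE_FORMATS into a datetime.date."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"Unrecognized date format: {raw!r}")


def normalize_address(raw: str) -> str:
    """Uppercase, drop periods/commas, collapse spaces, abbreviate suffixes."""
    tokens = re.sub(r"[.,]", " ", raw.upper()).split()
    return " ".join(SUFFIXES.get(tok, tok) for tok in tokens)


print(normalize_date("03/05/2024"), normalize_date("Mar 5, 2024"))
print(normalize_address("123 Main Street, Apt. 4"))
```

Once normalized, "123 Main Street, Apt. 4" and "123 main st apt 4" compare equal, so the same property can be matched across sources.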

Outliers:

  • Unrealistic property values
  • Impossible square footage measurements
  • Anomalous market conditions
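A common way to flag the outliers listed above combines hard plausibility limits with Tukey's IQR fences on price per square foot. This sketch illustrates one such approach; it is not a prescribed method. The 300 to 20,000 sq ft limits, the 1.5 × IQR multiplier (Tukey's conventional value), and the comparables are assumptions chosen for illustration.

```python
import statistics
from typing import List, Set, Tuple


def iqr_bounds(values: List[float], k: float = 1.5) -> Tuple[float, float]:
    """Tukey fences: (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(records: List[dict],
                  min_sqft: float = 300, max_sqft: float = 20_000) -> Set[str]:
    """Return IDs with implausible square footage or extreme price per sq ft."""
    flagged: Set[str] = set()
    plausible = []
    for rec in records:
        if min_sqft <= rec["sqft"] <= max_sqft:
            plausible.append(rec)
        else:
            flagged.add(rec["id"])  # fails the hard plausibility rule
    if len(plausible) >= 4:  # too few points for meaningful quartiles otherwise
        ppsf = [r["price"] / r["sqft"] for r in plausible]
        low, high = iqr_bounds(ppsf)
        flagged.update(r["id"] for r, v in zip(plausible, ppsf)
                       if v < low or v > high)
    return flagged


# Hypothetical comparables: P6 looks like a dropped zero, P7 a sq ft typo
comps = [
    {"id": "P1", "price": 485_000, "sqft": 2_400},
    {"id": "P2", "price": 420_000, "sqft": 2_150},
    {"id": "P3", "price": 550_000, "sqft": 2_600},
    {"id": "P4", "price": 470_000, "sqft": 2_300},
    {"id": "P5", "price": 500_000, "sqft": 2_450},
    {"id": "P6", "price": 48_500, "sqft": 2_350},
    {"id": "P7", "price": 460_000, "sqft": 24},
]
print(sorted(flag_outliers(comps)))
```

A flag means "review this record," not "delete it." A genuine luxury sale can fall outside the fences, and so can a legitimate distressed sale.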

Example: Data Integration for Property Valuation

Building a comprehensive dataset for residential property valuation requires integrating multiple data sources to create a complete picture of property value drivers.

Data Integration Workflow

PROPERTY VALUATION DATASET
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│    PROPERTY     │ │    LOCATION     │ │     MARKET      │ │    ECONOMIC     │
│ CHARACTERISTICS │ │      DATA       │ │      DATA       │ │     CONTEXT     │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘
         │                   │                   │                   │
         └───────────────────┴─────────┬─────────┴───────────────────┘
                                       ▼
                          ┌─────────────────────────┐
                          │     TARGET VARIABLE     │
                          │   Property Sale Price   │
                          └─────────────────────────┘

Data Source Details

Property Characteristics (Internal Factors):

┌─────────────────────────────────────────┐
│ PROPERTY CHARACTERISTICS                │
├─────────────────────────────────────────┤
│ Physical Attributes:                    │
│ • Square footage: 2,400 sq ft           │
│ • Lot size: 0.25 acres                  │
│ • Year built: 2015                      │
│ • Bedrooms: 4, Bathrooms: 3             │
│ • Garage spaces: 2                      │
├─────────────────────────────────────────┤
│ Condition & Upgrades:                   │
│ • Property condition: Excellent         │
│ • Kitchen upgrades: Granite counters    │
│ • HVAC system: New (2020)               │
│ • Roof age: 8 years                     │
└─────────────────────────────────────────┘

Location Data (External Factors):

┌─────────────────────────────────────────┐
│ LOCATION DATA                           │
├─────────────────────────────────────────┤
│ Demographics:                           │
│ • Census tract: 1234.56                 │
│ • Median income: $85,000                │
│ • Population density: 2,100/sq mi       │
│ • Education level: 65% college+         │
├─────────────────────────────────────────┤
│ Amenities & Services:                   │
│ • School district rating: 8.5/10        │
│ • Distance to downtown: 3.2 miles       │
│ • Distance to airport: 12.5 miles       │
│ • Distance to park: 0.8 miles           │
└─────────────────────────────────────────┘

Market Data (Comparative Context):

┌─────────────────────────────────────────┐
│ MARKET DATA                             │
├─────────────────────────────────────────┤
│ Comparable Sales:                       │
│ • Recent sales (6 months): 12 properties│
│ • Average price: $485,000               │
│ • Price range: $420K - $550K            │
│ • Days on market: 28                    │
├─────────────────────────────────────────┤
│ Market Trends:                          │
│ • Price per sq ft: $202                 │
│ • Inventory level: 2.1 months           │
│ • Market trend: +3.2% (YoY)             │
│ • Absorption rate: 0.8 homes/month      │
└─────────────────────────────────────────┘

Economic Context (Macro Factors):

┌─────────────────────────────────────────┐
│ ECONOMIC CONTEXT                        │
├─────────────────────────────────────────┤
│ Local Economy:                          │
│ • Unemployment rate: 4.2%               │
│ • Job growth: +2.1% (YoY)               │
│ • Major employers: Tech, Healthcare     │
│ • Population growth: +1.8% (YoY)        │
├─────────────────────────────────────────┤
│ Financial Environment:                  │
│ • Interest rates: 6.8% (30-year fixed)  │
│ • Mortgage availability: High           │
│ • Credit conditions: Favorable          │
│ • Inflation rate: 3.1%                  │
└─────────────────────────────────────────┘

Integration Process

Step 1: Data Collection

  • Property data from MLS listings
  • Location data from the U.S. Census Bureau (demographics) and geographic sources (school districts, amenity locations)
  • Market data from local real estate boards
  • Economic data from Bureau of Labor Statistics

Step 2: Data Matching

  • Geocode addresses to coordinates
  • Match properties to census tracts
  • Align market data by geographic boundaries
  • Synchronize economic data by time periods
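Economic data is often synchronized with an "as-of" lookup. Each sale is paired with the most recent economic observation dated on or before the sale date, which prevents the model from using information that was not yet available at the time of the sale. This sketch assumes the series is sorted by date, and the monthly rate values are hypothetical. Geocoding and census-tract matching usually require an external geocoder and tract boundary files, so they are not covered here.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional, Tuple


def value_as_of(series: List[Tuple[date, float]], as_of: date) -> Optional[float]:
    """Most recent value dated on or before `as_of`; None if none exists.

    Assumes `series` is sorted by date in ascending order.
    """
    dates = [d for d, _ in series]
    idx = bisect_right(dates, as_of)  # number of observations dated <= as_of
    return series[idx - 1][1] if idx > 0 else None


# Hypothetical monthly 30-year mortgage rates (%), keyed by observation date
mortgage_rates = [
    (date(2024, 1, 1), 6.6),
    (date(2024, 2, 1), 6.8),
    (date(2024, 3, 1), 6.9),
]
sales = [("P1", date(2024, 2, 15)), ("P2", date(2024, 3, 1)),
         ("P3", date(2023, 12, 20))]
for prop_id, sale_date in sales:
    print(prop_id, sale_date, value_as_of(mortgage_rates, sale_date))
```

If a series is published with a lag (monthly employment figures, for example), key it by release date rather than reference period. Otherwise the lookup can still leak future information.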

Step 3: Feature Engineering

  • Calculate distance-based features
  • Create market comparison ratios
  • Derive economic indicators
  • Standardize measurement units
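A few small functions can sketch the distance, ratio, and unit-standardization steps. The distances are great-circle (haversine) distances, a common straight-line approximation. Driving distance or travel time would require a routing service. The 3,958.8-mile mean Earth radius and the 43,560 sq ft per acre conversion are standard constants. The sale used in the usage lines is hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius
SQFT_PER_ACRE = 43_560


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def acres_to_sqft(acres: float) -> float:
    """Standardize lot size to square feet."""
    return acres * SQFT_PER_ACRE


def price_per_sqft(price: float, sqft: float) -> float:
    """Comparison ratio that puts properties of different sizes on one scale."""
    return price / sqft


print(acres_to_sqft(0.25))                       # 0.25-acre lot in sq ft
print(round(price_per_sqft(485_000, 2_400), 2))  # hypothetical sale
print(round(haversine_miles(40.0, -105.0, 41.0, -105.0), 2))  # 1° of latitude
```

One degree of latitude is roughly 69 miles anywhere on Earth, which makes it a quick sanity check for the distance function.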

Result: A comprehensive dataset where each property has 50+ features across four data categories, enabling sophisticated valuation models that capture both property-specific and market-wide factors.

Data Source Reliability Matrix

Different data sources offer varying levels of reliability for different analytical purposes:

Data Source       Completeness  Accuracy  Timeliness  Cost    Best Use Case
County Assessor   High          High      Low         Free    Property characteristics
MLS Data          Medium        High      High        Medium  Market transactions
Census Data       High          High      Low         Free    Demographics
Alternative Data  Low           Medium    High        High    Market sentiment
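The matrix can be turned into a simple conflict-resolution rule. For each field, rank the sources, take the first one that has a value, and flag large disagreements for manual review. This sketch shows one possible policy, not a standard. The priority lists, the 2% review threshold, and the source keys (including `costar`) are assumptions. Assessor records score low on timeliness and may miss a recent addition or renovation, so large gaps are flagged rather than silently resolved.

```python
from typing import Dict, Optional, Tuple

# Hypothetical per-field source rankings, loosely following the matrix above
FIELD_PRIORITY: Dict[str, Tuple[str, ...]] = {
    "sqft": ("county_assessor", "mls", "costar"),
    "sale_price": ("mls", "county_assessor"),
}


def resolve_field(field: str, values: Dict[str, Optional[float]],
                  review_threshold: float = 0.02
                  ) -> Tuple[Optional[float], Optional[str], bool]:
    """Return (chosen value, chosen source, needs_review) for one field.

    needs_review is True when the available values disagree by more than
    `review_threshold` relative to the chosen value.
    """
    chosen, source = None, None
    for src in FIELD_PRIORITY.get(field, ()):
        if values.get(src) is not None:
            chosen, source = values[src], src
            break
    if chosen is None:
        return None, None, False  # no ranked source supplied a value
    present = [v for v in values.values() if v is not None]
    gap = max(present) - min(present)
    needs_review = (gap > review_threshold * abs(chosen)) if chosen else (gap > 0)
    return chosen, source, needs_review


# The conflict from the introduction: CoStar vs. county assessor square footage
print(resolve_field("sqft", {"costar": 12_000, "county_assessor": 11_450}))
```

For the introductory example, this policy keeps the assessor's 11,450 sq ft. It also flags the record because the 550 sq ft gap is roughly 4.8% of the chosen value.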

© 2025 Prof. Tim Frenzel. All rights reserved. | Version 1.0.6