Section 4: Descriptive Statistics

Learning Objectives

By the end of this section, students will be able to:

  • Calculate and interpret measures of central tendency for real estate data
  • Apply measures of variability to assess market volatility and risk
  • Use quartiles and percentiles to segment real estate markets
  • Detect outliers using Z-scores and statistical methods
  • Analyze correlations between real estate variables
  • Transform raw property data into actionable market intelligence

Introduction

When you first open a spreadsheet containing 10,000 property transactions, what patterns jump out at you? How do you know if that $2.3 million sale represents the market or an outlier? Real estate analysts face this challenge daily: transforming raw transaction data into actionable market intelligence. Descriptive statistics provide the framework for making sense of property data, whether you’re analyzing cap rates across a portfolio, comparing neighborhood price trends, or evaluating the performance of different asset classes.

The statistical measures we’ll explore form the backbone of every market report, investment committee presentation, and valuation model you’ll encounter in commercial real estate. These tools transform overwhelming datasets into clear insights that drive acquisition decisions, portfolio strategies, and market positioning.

Measures of Central Tendency

Understanding where the “center” of your data lies becomes critical when evaluating real estate markets. The three primary measures of central tendency—mean, median, and mode—each tell a different story about your property data, and choosing the right one can mean the difference between accurate market assessment and misleading conclusions.

Arithmetic Mean

Mean = Σxᵢ / n

Where:

  • xᵢ = each individual property value
  • n = total number of properties
  • Σ = sum of all values

Example: Chicago office building sales analysis. Consider a real estate analyst evaluating five recent office building sales in downtown Chicago: $12M, $15M, $18M, $22M, and $83M. The mean sale price would be ($12M + $15M + $18M + $22M + $83M) / 5 = $30M. Notice how that single $83M trophy asset pulls the average far above where most transactions occurred. This phenomenon appears frequently in real estate markets where a few luxury properties or institutional-grade assets can skew averages significantly upward.

The median often provides a more representative picture of typical market activity. By identifying the middle value when all transactions are arranged in order, the median resists the influence of extreme outliers that plague the mean.

Median

Median = Middle value when data is sorted

Where:

  • For odd n: position (n+1)/2
  • For even n: average of positions n/2 and (n/2)+1

Example: Median versus mean in Chicago office buildings. Using our Chicago office building example, the sorted values are $12M, $15M, $18M, $22M, $83M. The median is $18M—the middle value that better represents where most transactions actually occurred. Real estate professionals often prefer median sale prices when communicating with investors or lenders because it provides a more stable measure of market conditions.
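
The comparison above can be reproduced in a few lines. The following is a minimal Python sketch, assuming the five Chicago sale prices are entered in $ millions; the variable names are illustrative only. It also recomputes both measures without the $83M sale to show how differently each one reacts to a single extreme value.

```python
from statistics import mean, median

# Sale prices in $ millions, taken from the Chicago office example above.
sale_prices = [12, 15, 18, 22, 83]

mean_price = mean(sale_prices)      # sum of all values divided by n
median_price = median(sale_prices)  # middle value of the sorted list

# Drop the $83M trophy asset and recompute both measures.
without_trophy = [p for p in sale_prices if p != 83]
mean_without = mean(without_trophy)
median_without = median(without_trophy)

print(f"All five sales:       mean ${mean_price}M, median ${median_price}M")
print(f"Without trophy asset: mean ${mean_without}M, median ${median_without}M")
```

Removing the single trophy sale moves the mean from $30M to $16.75M, while the median only shifts from $18M to $16.5M.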

The mode identifies the most frequently occurring value in your dataset. While less common in continuous price data, the mode proves invaluable when analyzing categorical real estate data like property types, lease terms, or buyer profiles.

Example: REIT acquisition strategy modal property size. A REIT analyst studying 50 recent acquisitions might find that 5,000-square-foot retail spaces appear most frequently in their dataset, making this the modal property size for their acquisition strategy.
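
As a rough illustration of finding a mode, the Python sketch below counts how often each size appears in a short list of acquisitions. The seven sizes are hypothetical values invented for this example, not data from the REIT scenario above.

```python
from collections import Counter

# Hypothetical acquisition sizes in square feet (illustrative, not real data).
acquisition_sizes = [5000, 3200, 5000, 7500, 5000, 3200, 10000]

# Count how often each size appears, then take the most frequent one.
size_counts = Counter(acquisition_sizes)
modal_size, modal_count = size_counts.most_common(1)[0]

print(f"Modal size: {modal_size} sq ft ({modal_count} of {len(acquisition_sizes)} deals)")
```

The same counting approach works for categorical fields such as property type or lease term, where a mean or median would have no meaning.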

Measures of Variability

Central tendency tells only half the story. Understanding the spread of property values reveals market volatility, risk levels, and investment opportunities that averages alone would miss.

Range

Range = Maximum value - Minimum value

Where:

  • Maximum = highest property value in dataset
  • Minimum = lowest property value in dataset

Example: Single-family home price range analysis. In a neighborhood analysis of 100 single-family homes, prices might range from $450,000 to $1,250,000, giving a range of $800,000. While this provides a quick sense of market breadth, the range tells us nothing about how properties distribute within these extremes.

Variance quantifies how far individual properties deviate from the market average, providing a more nuanced view of market consistency.

Sample Variance

s² = Σ(xᵢ - x̄)² / (n-1)

Where:

  • xᵢ = individual property values
  • x̄ = mean property value
  • n = sample size
  • Division by (n-1) for sample variance (Bessel’s correction)

Because variance squares each deviation, it amplifies the impact of outliers and expresses the result in squared units (dollars squared), which are hard to interpret. Real estate analysts typically prefer standard deviation, the square root of variance, because it returns to the original units (dollars) and provides an intuitive measure of typical deviation from the mean.

Standard Deviation

s = √[Σ(xᵢ - x̄)² / (n-1)]

Where:

  • s = sample standard deviation
  • All other variables as defined for variance

Example: Apartment building portfolio variability comparison. A portfolio of apartment buildings with a mean value of $5M and a standard deviation of $500K suggests relatively consistent asset values: if values are roughly bell-shaped, about two-thirds of properties fall between $4.5M and $5.5M (within one standard deviation of the mean). Contrast this with a portfolio that has the same mean but a $2M standard deviation, where the one-standard-deviation band widens to $3M to $7M, indicating either diverse asset quality or inconsistent market conditions.
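
To make the variance and standard deviation formulas concrete, the following Python sketch applies them step by step to the five Chicago office sales used earlier in this section (in $ millions), then cross-checks the result against Python's standard library. It is a sketch under the assumption that the five sales are treated as a sample, so the divisor is n - 1.

```python
from math import sqrt
from statistics import stdev, variance

# Sale prices in $ millions, from the Chicago office example.
sale_prices = [12, 15, 18, 22, 83]

n = len(sale_prices)
x_bar = sum(sale_prices) / n

# Squared deviation of each sale from the mean.
squared_deviations = [(x - x_bar) ** 2 for x in sale_prices]

# Sample variance divides by n - 1 (Bessel's correction).
sample_variance = sum(squared_deviations) / (n - 1)
sample_std = sqrt(sample_variance)

# Cross-check the manual calculation against the standard library.
matches_stdlib = (
    abs(sample_variance - variance(sale_prices)) < 1e-9
    and abs(sample_std - stdev(sale_prices)) < 1e-9
)

print(f"Sample variance: {sample_variance:.1f} ($M squared)")
print(f"Sample standard deviation: ${sample_std:.2f}M")
print(f"Matches statistics module: {matches_stdlib}")
```

The variance works out to 891.5 (in $M squared) and the standard deviation to roughly $29.9M, almost as large as the $30M mean. That is a numeric signal of how strongly the $83M sale dominates this small sample.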

Quartiles and Market Segmentation

Real estate markets rarely distribute evenly. Quartiles and percentiles help analysts understand market stratification, from affordable housing segments to luxury properties.

Quartiles and Interquartile Range

Q1 = 25th percentile (lower quartile)

Q2 = 50th percentile (median)

Q3 = 75th percentile (upper quartile)

IQR = Q3 - Q1

Where:

  • Q1 = value below which 25% of data falls
  • Q3 = value below which 75% of data falls
  • IQR = width of the middle 50% of the distribution

Analyzing 200 retail property cap rates in a major metropolitan area:

  • Q1 = 5.5% (investment-grade properties)
  • Q2 = 6.8% (median market cap rate)
  • Q3 = 8.2% (value-add opportunities)
  • IQR = 2.7 percentage points (typical market spread)

The interquartile range proves particularly valuable for identifying outliers in property data. Any property priced below Q1 - 1.5×IQR or above Q3 + 1.5×IQR warrants investigation—it might represent a distressed sale, a data error, or an exceptional opportunity.
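
The 1.5×IQR rule can be sketched in Python as follows, using the retail cap rate quartiles from the example above (Q1 = 5.5%, Q3 = 8.2%). The function names and the four cap rates being screened are hypothetical, chosen only to illustrate the rule.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3, k=1.5):
    """True if value lies outside the fences."""
    lower, upper = tukey_fences(q1, q3, k)
    return value < lower or value > upper


# Quartiles from the retail cap rate example, in percent.
Q1, Q3 = 5.5, 8.2
lower_fence, upper_fence = tukey_fences(Q1, Q3)

# Hypothetical cap rates to screen (illustrative values, in percent).
cap_rates = [4.9, 7.1, 12.8, 1.2]
flagged = [rate for rate in cap_rates if is_outlier(rate, Q1, Q3)]

print(f"Fences: {lower_fence:.2f}% to {upper_fence:.2f}%")
print(f"Flagged for investigation: {flagged}")
```

With these quartiles the fences sit at about 1.45% and 12.25%, so a retail property trading at a 12.8% or 1.2% cap rate would be flagged for a closer look, while 4.9% and 7.1% would not.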

Percentiles extend this concept further, allowing analysts to position individual properties within the broader market context. When a client asks, “How does our building’s $450/sq ft rent compare to the market?” reporting that it falls at the 85th percentile immediately communicates premium positioning without requiring detailed explanation.
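
One simple way to compute such a position is a percentile rank: the share of market observations at or below the subject property's value. The Python sketch below uses twenty hypothetical market rents, constructed for illustration so that a $450/sq ft rent lands at the 85th percentile. Percentile conventions differ across tools (for example, counting values strictly below versus at or below), so treat this as one reasonable definition rather than the only one.

```python
def percentile_rank(value, market_values):
    """Percent of market observations at or below the given value."""
    if not market_values:
        raise ValueError("market_values must not be empty")
    at_or_below = sum(1 for v in market_values if v <= value)
    return 100 * at_or_below / len(market_values)


# Hypothetical market rents in $/sq ft (illustrative, not real data).
market_rents = [
    250, 270, 290, 300, 310, 320, 330, 340, 350, 360,
    370, 380, 390, 400, 410, 430, 450, 470, 500, 550,
]

subject_rent = 450
rank = percentile_rank(subject_rent, market_rents)

print(f"${subject_rent}/sq ft sits at the {rank:.0f}th percentile of this sample")
```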

Standardization and Outlier Detection

How do you compare a $2M industrial warehouse in Phoenix with a $15M office tower in Manhattan? Raw prices tell us little about relative value within their respective markets. Z-scores standardize data across different scales and markets, enabling meaningful comparisons.

Z-Score Standardization

Z = (x - μ) / σ

Where:

  • x = individual property value
  • μ = population mean
  • σ = population standard deviation
  • Z = number of standard deviations from mean

Example: Multifamily property Z-score interpretation. A multifamily property selling for $8M in a market with mean $6M and standard deviation $1M has a Z-score of 2.0, indicating it sold two standard deviations above typical market prices. This might flag the property as overpriced, or it could signal superior quality, location advantages, or recent renovations that justify the premium.
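
The Z-score calculation is a one-line formula, sketched here in Python. The first call reproduces the multifamily example above. The second pair of calls revisits the Phoenix warehouse and Manhattan office tower question using hypothetical market means and standard deviations, which are assumptions made up for illustration.

```python
def z_score(value, market_mean, market_std):
    """Number of standard deviations that value lies from the market mean."""
    if market_std <= 0:
        raise ValueError("market_std must be positive")
    return (value - market_mean) / market_std


# Multifamily example from the text: $8M sale, mean $6M, std dev $1M.
multifamily_z = z_score(8.0, 6.0, 1.0)

# Hypothetical market statistics in $ millions (assumed for illustration).
phoenix_warehouse_z = z_score(2.0, market_mean=1.5, market_std=0.25)
manhattan_office_z = z_score(15.0, market_mean=14.0, market_std=2.0)

print(f"Multifamily sale:    Z = {multifamily_z:.1f}")
print(f"Phoenix warehouse:   Z = {phoenix_warehouse_z:.1f}")
print(f"Manhattan office:    Z = {manhattan_office_z:.1f}")
```

Under these assumed market figures, the $2M Phoenix warehouse is the more unusual sale relative to its own market (Z = 2.0) even though the $15M Manhattan tower (Z = 0.5) has the far higher price.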

Real estate analysts typically investigate properties with Z-scores beyond ±2, and almost always scrutinize those beyond ±3. The following table summarizes how each range is commonly read:

Z-Score Range | Interpretation         | Action Required
-3 to -2      | Potentially distressed | Investigate for acquisition opportunity
-2 to 2       | Normal market range    | Standard analysis procedures
2 to 3        | Premium pricing        | Verify comparables and adjustments
Beyond ±3     | Statistical outlier    | Check for data error or unique circumstances
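
The table above can be turned into a simple screening function. The Python sketch below is one possible encoding; the table does not say which band the exact boundary values (±2 and ±3) belong to, so the choices made here are assumptions and are marked as such in the comments.

```python
def classify_z(z):
    """Map a Z-score to the interpretation bands in the table above.

    Boundary handling is an assumption: exactly ±2 is treated as normal
    market range, and exactly ±3 stays in the distressed/premium bands.
    """
    if abs(z) > 3:
        return "Statistical outlier"
    if z < -2:
        return "Potentially distressed"
    if z > 2:
        return "Premium pricing"
    return "Normal market range"


# Hypothetical Z-scores for a handful of sales (illustrative values).
sample_scores = [-3.4, -2.5, 0.3, 2.6, 3.1]
labels = [classify_z(z) for z in sample_scores]

for z, label in zip(sample_scores, labels):
    print(f"Z = {z:+.1f}: {label}")
```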

Z-score thresholds are usually interpreted under an assumption of approximate normality, where about 95% of values lie within ±2 and about 99.7% within ±3. Many real estate metrics are roughly bell-shaped, but property prices often show right skew due to luxury properties, and a larger sample does not remove that skew. Analysts should verify distribution shape before applying Z-score thresholds rigidly.
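
A quick way to check distribution shape is to compare the mean with the median and to compute a skewness coefficient. The Python sketch below uses the moment-based skewness (third central moment divided by the 1.5 power of the second), which is one common definition among several, and applies it to the five Chicago office sales from earlier in this section.

```python
from statistics import mean, median


def moment_skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (positive means right skew)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    x_bar = sum(values) / n
    m2 = sum((x - x_bar) ** 2 for x in values) / n
    m3 = sum((x - x_bar) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    return m3 / m2 ** 1.5


# Sale prices in $ millions, from the Chicago office example.
sale_prices = [12, 15, 18, 22, 83]

skew = moment_skewness(sale_prices)
mean_above_median = mean(sale_prices) > median(sale_prices)

print(f"Skewness: {skew:.2f}")
print(f"Mean above median: {mean_above_median}")
```

For this sample the skewness is roughly 1.4 and the mean sits well above the median, both signs of right skew. With data like this, a fixed ±2 or ±3 Z-score cutoff deserves extra caution.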

Correlation in Real Estate Markets

Real estate variables rarely operate in isolation. Understanding correlations between factors like location, size, age, and price enables more sophisticated analysis and better investment decisions.

Pearson Correlation Coefficient

r = Σ[(xᵢ - x̄)(yᵢ - ȳ)] / √[Σ(xᵢ - x̄)² × Σ(yᵢ - ȳ)²]

Where:

  • r = correlation coefficient (-1 to +1)
  • xᵢ, yᵢ = paired observations
  • x̄, ȳ = respective means

Example: Office building correlation analysis. Examining 100 office buildings, an analyst might find a correlation of r = 0.72 between building size (square feet) and sale price, indicating that larger buildings generally command higher prices. But what about the relationship between building age and rental income? Here, a correlation of r = -0.31 suggests newer buildings achieve somewhat higher rents, though the relationship isn’t particularly strong.
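
The Pearson formula can be implemented directly from its definition. The following Python sketch does so term by term; the five building sizes and sale prices are hypothetical values invented for illustration, not the 100-building dataset described above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from the defining formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical office buildings (illustrative, not real data).
building_sqft = [50_000, 80_000, 120_000, 150_000, 200_000]
sale_price_m = [14.0, 19.5, 31.0, 33.5, 52.0]  # $ millions

r_size_price = pearson_r(building_sqft, sale_price_m)
print(f"Size vs. price: r = {r_size_price:.2f}")
```

Because these made-up prices rise almost in step with size, r comes out close to +1. Real transaction data is noisier, which is why a value such as the r = 0.72 in the example above is more typical.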

When relationships aren’t linear—common with factors like distance from city center or building age—the Spearman rank correlation provides more accurate measurement by focusing on monotonic relationships rather than strict linearity.

Spearman Rank Correlation

ρ = 1 - (6Σdᵢ²) / [n(n² - 1)]

Where:

  • dᵢ = difference between paired ranks
  • n = number of observations
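  • ρ (rho) = Spearman coefficient
  • This shortcut formula assumes no tied ranks; when ties occur, assign average ranks and apply the Pearson formula to the ranks instead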

Example: Transit proximity and property value correlation. Consider the relationship between proximity to transit and property values. Properties might show increasing value as they get closer to transit stations, but the relationship might flatten very close to stations due to noise and congestion. Spearman correlation captures this non-linear but monotonic relationship that Pearson might miss.
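
The Spearman formula can be sketched in Python by ranking each variable and applying the rank-difference formula given above. The distances and values below are hypothetical figures invented to mimic the flattening pattern described in the example, and the ranking helper assumes no tied values, which the shortcut formula requires.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes there are no ties."""
    if len(set(values)) != len(values):
        raise ValueError("tied values are not supported by this sketch")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))


# Hypothetical data: value flattens very close to the station.
distance_km = [0.1, 0.2, 0.5, 1.0, 2.0, 4.0]
value_per_sqft = [500, 498, 470, 400, 310, 200]  # $/sq ft

rho = spearman_rho(distance_km, value_per_sqft)
print(f"Distance vs. value: rho = {rho:.2f}")
```

Because value falls every time distance rises, rho equals -1 here even though the drop is far from a straight line. Pearson's r on the same data would be smaller in magnitude, since it measures only the linear part of the relationship.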

Remember that correlation alone does not establish causation. A strong correlation between Starbucks locations and property values doesn’t mean Starbucks causes appreciation; both likely respond to underlying demographic and economic factors. This distinction becomes critical when building predictive models or making investment recommendations.

Excel Implementation

Basic Statistical Functions

Measures of Central Tendency:

  • Mean: =AVERAGE(data_range)
  • Median: =MEDIAN(data_range)
  • Mode: =MODE(data_range)

Measures of Variability:

  • Range: =MAX(data_range) - MIN(data_range)
  • Standard Deviation: =STDEV(data_range)
  • Variance: =VAR(data_range)

Quartiles and Percentiles:

  • Q1: =QUARTILE(data_range, 1)
  • Q2: =QUARTILE(data_range, 2)
  • Q3: =QUARTILE(data_range, 3)
  • IQR: =QUARTILE(data_range, 3) - QUARTILE(data_range, 1)

Z-Score Calculation

  • Step 1: Calculate mean and standard deviation
  • Step 2: Apply formula: =(value - mean) / stdev
  • Step 3: Use conditional formatting to highlight outliers

Correlation Analysis

Pearson Correlation:

  • =CORREL(array1, array2)

Spearman Correlation:

  • Rank both variables using =RANK.AVG(value, data_range), which gives tied values their average rank
  • Apply =CORREL to the two columns of ranks

Practical Implementation Framework

Descriptive statistics provide the foundation for every subsequent analysis in real estate. Before running complex models or making investment decisions, analysts should systematically examine their data through these lenses, asking: What’s typical? How much variation exists? Where do outliers appear? Which variables move together?

Example: Portfolio acquisition descriptive analysis. An acquisition analyst evaluating a potential portfolio purchase might discover through descriptive analysis that while the average cap rate looks attractive at 7.2%, the standard deviation of 2.1 percentage points reveals significant variation in asset quality. Further investigation using quartiles might show that the bottom 25% of properties have deferred maintenance issues, while correlation analysis might link transit proximity to occupancy, with properties near transit showing 15% lower vacancy rates.

These fundamental tools, when applied thoughtfully to real estate data, transform raw numbers into market intelligence. The patterns they reveal guide everything from individual asset valuations to portfolio-wide strategic decisions. As markets shift and data grows more complex, mastery of these statistical foundations becomes not just useful but indispensable for competitive advantage in real estate investment and analysis.


© 2025 Prof. Tim Frenzel. All rights reserved. | Version 1.1.0