Understanding Joint Distributions in Population Synthesis

Joint vs. Marginal Distributions

Joint distributions capture how variables co-occur in a population, while marginal distributions describe each variable on its own and carry no information about how the variables combine.

Example with Age, Gender, and Education

Marginal Distributions (Single Variables)

Joint Distributions (Combinations)

Age     Gender   Education   Probability
18-25   Female   Degree      12%
26-35   Male     School      8%
18-25   Male     Degree      5%
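
To make the link between the two kinds of table concrete, here is a minimal sketch that derives each marginal by summing a joint table over the other variables. Three cells reuse the probabilities from the table above; the other five are made-up values chosen only so that the table sums to 1. The function name `marginal` is likewise just an illustration.

```python
from collections import defaultdict

# Joint distribution over (age, gender, education).
# Three cells come from the table above; the rest are HYPOTHETICAL
# filler values chosen so the probabilities sum to 1.
joint = {
    ("18-25", "Female", "Degree"): 0.12,  # from the table
    ("18-25", "Female", "School"): 0.10,  # hypothetical
    ("18-25", "Male", "Degree"): 0.05,    # from the table
    ("18-25", "Male", "School"): 0.13,    # hypothetical
    ("26-35", "Female", "Degree"): 0.20,  # hypothetical
    ("26-35", "Female", "School"): 0.12,  # hypothetical
    ("26-35", "Male", "Degree"): 0.20,    # hypothetical
    ("26-35", "Male", "School"): 0.08,    # from the table
}


def marginal(joint, axis):
    """Sum joint probabilities over every variable except the one at `axis`."""
    totals = defaultdict(float)
    for combo, p in joint.items():
        totals[combo[axis]] += p
    return dict(totals)


age = marginal(joint, 0)
gender = marginal(joint, 1)
education = marginal(joint, 2)

# What the marginals alone would suggest if the variables were independent.
independent = age["18-25"] * gender["Male"] * education["Degree"]
actual = joint[("18-25", "Male", "Degree")]

print(age, gender, education)
print(f"independent guess: {independent:.3f}, actual joint: {actual:.3f}")
```

With these illustrative numbers, multiplying the marginals gives roughly twice the joint probability for young men with a degree, which is the kind of information a marginal-only view cannot recover.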

How Methods Handle Joint Distributions

Method: Deterministic Reweighting
Joint distribution preservation: Only matches single-variable marginals well; may distort natural relationships.
Visualization: ✅ Age, ✅ Gender, ✅ Education, ❌ Combinations

Method: Iterative Proportional Fitting (IPF)
Joint distribution preservation: Better at preserving 2-way relationships; 3+ way interactions might still be off.
Visualization: ✅ Age×Gender, ✅ Gender×Education, ⚠️ Age×Gender×Education

Method: Conditional Probabilities
Joint distribution preservation: Explicitly models multi-way dependencies; best preserves realistic combinations.
Visualization: ✅ Age×Gender×Education, ✅ Natural clustering
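
As a rough illustration of how IPF works, the sketch below fits a 2×2 Age×Education table to target marginals by alternately rescaling rows and columns. The targets, the seed tables, and the helper names `ipf` and `odds_ratio` are all assumptions made for this example, not part of any particular library. It shows one well-known property: IPF matches the marginals while keeping whatever association structure the seed table carries, so a flat seed yields an independent table and a correlated seed keeps its correlation.

```python
def ipf(seed, row_targets, col_targets, iterations=100, tol=1e-10):
    """Rescale a 2-D seed table until its row and column sums match the targets."""
    table = [row[:] for row in seed]  # work on a copy
    for _ in range(iterations):
        # Row step: scale each row so it sums to its target.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [v * target / total for v in table[i]]
        # Column step: scale each column so it sums to its target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns now match exactly; stop once the rows match too.
        if all(abs(sum(table[i]) - t) < tol for i, t in enumerate(row_targets)):
            break
    return table


def odds_ratio(t):
    """Cross-product ratio of a 2x2 table; 1.0 means no association."""
    return (t[0][0] * t[1][1]) / (t[0][1] * t[1][0])


# HYPOTHETICAL targets: rows are age (18-25, 26-35), columns are education (Degree, School).
age_targets = [0.40, 0.60]
edu_targets = [0.57, 0.43]

flat_seed = [[1.0, 1.0], [1.0, 1.0]]            # no information about combinations
survey_seed = [[30.0, 10.0], [10.0, 30.0]]      # hypothetical sample with strong association

flat_fit = ipf(flat_seed, age_targets, edu_targets)
survey_fit = ipf(survey_seed, age_targets, edu_targets)

print("flat seed odds ratio:  ", odds_ratio(flat_fit))
print("survey seed odds ratio:", odds_ratio(survey_fit))
```

Both fitted tables hit the same marginal targets, yet only the one started from the correlated seed retains an Age×Education relationship. Extending the idea to three or more dimensions follows the same pattern, with one rescaling step per set of target margins.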

Real-World Example

A real population might show:

Conditional probability methods will maintain these natural relationships, while simpler methods might artificially flatten them to hit marginal targets.
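
One way to picture the conditional-probability approach is as chained sampling: draw age first, then gender given age, then education given both. The sketch below does this with hand-written lookup tables. Every probability in it is a hypothetical value, and the names `draw`, `sample_person`, and `joint_probability` are illustrative rather than taken from any specific tool.

```python
import random

# HYPOTHETICAL conditional tables; each inner distribution sums to 1.
p_age = {"18-25": 0.4, "26-35": 0.6}
p_gender_given_age = {
    "18-25": {"Female": 0.55, "Male": 0.45},
    "26-35": {"Female": 0.53, "Male": 0.47},
}
p_edu_given_age_gender = {
    ("18-25", "Female"): {"Degree": 0.55, "School": 0.45},
    ("18-25", "Male"): {"Degree": 0.30, "School": 0.70},
    ("26-35", "Female"): {"Degree": 0.60, "School": 0.40},
    ("26-35", "Male"): {"Degree": 0.70, "School": 0.30},
}


def draw(dist, rng):
    """Draw one key from a {value: probability} mapping."""
    values = list(dist)
    weights = [dist[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def sample_person(rng):
    """Sample one synthetic person, conditioning each variable on the earlier ones."""
    age = draw(p_age, rng)
    gender = draw(p_gender_given_age[age], rng)
    education = draw(p_edu_given_age_gender[(age, gender)], rng)
    return age, gender, education


def joint_probability(age, gender, education):
    """Chain rule: P(age) * P(gender | age) * P(education | age, gender)."""
    return (
        p_age[age]
        * p_gender_given_age[age][gender]
        * p_edu_given_age_gender[(age, gender)][education]
    )


rng = random.Random(0)  # fixed seed so repeated runs give the same population
population = [sample_person(rng) for _ in range(10_000)]

combo = ("18-25", "Male", "School")
observed = population.count(combo) / len(population)
print(f"model: {joint_probability(*combo):.3f}, sampled share: {observed:.3f}")
```

Because each draw is conditioned on the variables already chosen, the three-way combinations in the synthetic population follow the modelled joint distribution rather than being rebuilt from marginals afterwards.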

Key Takeaways

  1. Joint distributions reflect real-world correlations between variables
  2. Simple reweighting can create unrealistic combinations even when marginals match
  3. Choose a method according to how much you need to preserve relationships between variables versus matching the marginals exactly