Synthetic Population Generation and Validation Proposal
Problem Statement
We currently lack a real population dataset against which to test the synthetic population (synth-pop) method effectively.
Axioms
- A synthetic population should be as close to a twin of the target population as possible.
- The synthetic population generation method should be applicable to all populations.
- We can create a precise artificial population using deterministic methods.
Hypothesis
The synthetic population generation method should be able to create a population that is a twin of an artificial population.
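To make "twin" testable, the hypothesis needs a distance between the two populations. The sketch below uses total absolute error over category counts as one possible measure. The function name and the (area, group) representation of an individual are illustrative assumptions, not part of the proposal.

```python
from collections import Counter

def total_absolute_error(reference, synthetic):
    """Sum of absolute differences in category counts between two populations."""
    ref_counts = Counter(reference)
    syn_counts = Counter(synthetic)
    categories = set(ref_counts) | set(syn_counts)
    # Counter returns 0 for categories missing from one side.
    return sum(abs(ref_counts[c] - syn_counts[c]) for c in categories)

# Hypothetical individuals described as (area, group) tuples.
artificial = [("A1", "g1"), ("A1", "g1"), ("A1", "g2"), ("A2", "g2")]
synthetic = [("A1", "g1"), ("A1", "g2"), ("A1", "g2"), ("A2", "g2")]

print(total_absolute_error(artificial, artificial))  # 0: a perfect twin
print(total_absolute_error(artificial, synthetic))   # 2: one individual in the wrong group
```

A score of 0 means the two populations have identical counts in every category. Other measures could be substituted without changing the overall design.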
Methodology
Deterministic Rules for Artificial Population Generation:
- Create sample LOAs with a mixed makeup.
- Take a subsection of the artificial population for survey purposes.
- Conduct a census on the full artificial population.
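The survey and census steps above could be sketched as follows. This is a minimal illustration that assumes each individual is a record with hypothetical `area` and `group` fields, and that the survey is a simple random sample. The proposal does not fix the sampling design or the sampling fraction.

```python
import random
from collections import Counter

def census(population):
    """Count every individual by (area, group): the full-population 'truth'."""
    return Counter((p["area"], p["group"]) for p in population)

def survey(population, fraction, seed=0):
    """Draw a simple random sample without replacement (assumed design)."""
    rng = random.Random(seed)
    k = round(len(population) * fraction)
    return rng.sample(population, k)

# Hypothetical artificial population: two areas with different group mixes.
population = (
    [{"area": "A1", "group": "g1"}] * 60
    + [{"area": "A1", "group": "g2"}] * 40
    + [{"area": "A2", "group": "g1"}] * 20
    + [{"area": "A2", "group": "g2"}] * 80
)

census_counts = census(population)
sample = survey(population, fraction=0.1)
print(census_counts[("A1", "g1")])  # 60
print(len(sample))                  # 20
```

Because the census is taken on the full artificial population, any gap between the survey and the census can be attributed to sampling rather than to unknown ground truth.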
Comparison Methods:
- Compare the results between:
  - IPF + Monte Carlo
  - Simulated Annealing only
  - GAN + Simulated Annealing (offer this to Daniel)
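For reference, the IPF half of the first option can be sketched as below. It is a minimal two-dimensional version that assumes a survey cross-tabulation as the seed and census totals as the target margins. The function name, the example numbers, and the stopping rule are illustrative assumptions. The Monte Carlo step, which would draw individuals from the fitted table, is not shown.

```python
def ipf(seed, row_targets, col_targets, iterations=100, tol=1e-9):
    """Iterative proportional fitting: scale a 2-D seed table to match target margins."""
    table = [list(row) for row in seed]
    for _ in range(iterations):
        # Row step: scale each row so it sums to its target.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [v * target / total for v in table[i]]
        # Column step: scale each column so it sums to its target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Converged once the row margins still hold after the column step.
        row_err = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if row_err < tol:
            break
    return table

# Hypothetical inputs: survey counts by area (rows) and group (columns),
# with census totals per area and per group as the targets.
survey_table = [[3.0, 1.0], [1.0, 3.0]]
fitted = ipf(survey_table, row_targets=[100, 100], col_targets=[80, 120])
for row in fitted:
    print([round(v, 2) for v in row])
```

The row and column targets must share the same grand total (200 here), otherwise the procedure cannot satisfy both sets of margins.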
Proposed Steps
Step 1: Create a National Artificial Population
- Decide on:
  - Size of the population
  - Granularity of the population
- Create rules for population generation:
  - Have population groupings
  - Have different proportions of groupings within an area
  - Add randomness
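The generation rules in Step 1 might look like the sketch below. The area names, group labels, national shares, and `jitter` parameter are all hypothetical placeholders. Randomness is seeded, so the same inputs always reproduce the same population.

```python
import random

def generate_population(area_sizes, base_shares, jitter=0.05, seed=42):
    """Build individuals area by area from fixed rules plus seeded noise."""
    rng = random.Random(seed)
    groups = sorted(base_shares)
    population = []
    for area, size in area_sizes.items():
        # Perturb the national shares so each area gets its own group mix.
        noisy = [max(base_shares[g] + rng.uniform(-jitter, jitter), 0.0) for g in groups]
        total = sum(noisy)
        weights = [w / total for w in noisy]
        # Assign each individual in the area to a group using the area's mix.
        for g in rng.choices(groups, weights=weights, k=size):
            population.append({"area": area, "group": g})
    return population

# Hypothetical settings: two areas and three population groupings.
area_sizes = {"A1": 100, "A2": 200}
base_shares = {"g1": 0.5, "g2": 0.3, "g3": 0.2}

population = generate_population(area_sizes, base_shares)
print(len(population))  # 300
```

Size is controlled by `area_sizes`, granularity by the number of areas and groupings, and the degree of between-area variation by `jitter`.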
Expected Outcomes
- A validated synthetic population generation method that can be applied to various populations.
- Insights into the effectiveness of different comparison methods (IPF + Monte Carlo with GAN + SA, just SA, and GAN + R version of SA).
- Identification of any selection biases in the survey and census data.
Conclusion
This proposal outlines an approach to generating and validating a synthetic population against a deterministically generated artificial population, using several comparison techniques. The results will indicate how accurate and how widely applicable the synthetic population generation method is.