Synthetic Population Generation Using Simulated Annealing
Objective
To create a synthetic population for each census area by combining individual-level survey data (Understanding Society) with aggregated census counts using Simulated Annealing (SA). For each area, SA selects survey respondents so that their category counts match the census distribution as closely as possible, while each selected record keeps its full individual-level attributes.
Methodology
1. Data Preparation
graph LR
A[Survey Data] -->|Individual Records| B(Encode into Categories)
C[Census Data] -->|Category Definitions| B
C -->|Aggregated Counts| D(SA Constraints)
B --> E[Shared NumPy Array]
Example Output (Area 1):
Census Requirements:
- Females (20-40) = 2
- Females (40-60) = 1
- Males (20-40) = 1
- Males (40-60) = 0
Survey Data Pool:
ID | Age | Gender
1 | 25 | Female
2 | 35 | Male
3 | 40 | Female
4 | 45 | Female
5 | 50 | Male
Encoded Population:
ID | Category
1 | Female (20-40)
2 | Male (20-40)
3 | Female (20-40)
4 | Female (40-60)
5 | Male (40-60)
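The encoding step can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: the names `CATEGORIES`, `encode`, `codes`, and `target` are hypothetical, and the sketch assumes age bands are closed at the upper edge (age 40 falls in 20-40), which is what the Area 1 solution in the Population Reconstruction example implies.

```python
import numpy as np

# Hypothetical category order; the list index is the integer code stored in the array.
CATEGORIES = ["Female (20-40)", "Female (40-60)", "Male (20-40)", "Male (40-60)"]


def encode(age, gender):
    """Return the category code for one survey record.

    Assumption: bands are [20, 40] and (40, 60], so age 40 maps to 20-40.
    """
    if 20 <= age <= 40:
        band = "20-40"
    elif 40 < age <= 60:
        band = "40-60"
    else:
        raise ValueError(f"age {age} is outside the census bands")
    return CATEGORIES.index(f"{gender} ({band})")


# Survey pool from the example: (ID, age, gender).
survey = [(1, 25, "Female"), (2, 35, "Male"), (3, 40, "Female"),
          (4, 45, "Female"), (5, 50, "Male")]

ids = np.array([rec[0] for rec in survey])
codes = np.array([encode(age, gender) for _, age, gender in survey])

# Census counts for Area 1 in the same category order (the SA constraints).
target = np.array([2, 1, 1, 0])

# How many pool records are available in each category.
pool_counts = np.bincount(codes, minlength=len(CATEGORIES))

print(codes.tolist())        # [0, 2, 0, 1, 3]
print(pool_counts.tolist())  # [2, 1, 1, 1]
```

Storing categories as small integer codes keeps the pool in one flat array, so counting categories for any candidate selection is a single `np.bincount` call.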
2. Parallel Simulated Annealing
graph TD
E[NumPy Array] -->|Shared Data| F[Parallel SA Threads]
D[SA Constraints] --> F
F -->|Best-Fit IDs| G[Synthetic Population per Area]
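A minimal sketch of this step is shown below. The source does not specify the fitness function, the move, or the cooling schedule, so everything here is an assumption made for illustration: the fitness is the total absolute error in category counts, a move swaps one selected record for a random pool record, records are sampled with replacement, and cooling is geometric. The names `fitness` and `anneal` and the "Area 2" counts are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def fitness(selection, codes, target):
    """Total absolute error between selected category counts and census counts."""
    counts = np.bincount(codes[selection], minlength=len(target))
    return int(np.abs(counts - target).sum())


def anneal(codes, target, seed, steps=2000, t_start=1.0, cooling=0.995):
    """Choose sum(target) pool indices (with replacement) that best fit target."""
    rng = random.Random(seed)
    n_pool = len(codes)
    size = int(target.sum())
    current = np.array([rng.randrange(n_pool) for _ in range(size)], dtype=int)
    current_cost = fitness(current, codes, target)
    best, best_cost = current.copy(), current_cost
    temperature = t_start
    for _ in range(steps):
        if best_cost == 0:  # exact match found, nothing left to improve
            break
        # Move: replace one selected record with a random record from the pool.
        candidate = current.copy()
        candidate[rng.randrange(size)] = rng.randrange(n_pool)
        cost = fitness(candidate, codes, target)
        delta = cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = candidate.copy(), cost
        temperature *= cooling
    return best, best_cost


# Shared, read-only inputs: category codes and IDs for the five-record example pool.
codes = np.array([0, 2, 0, 1, 3])
ids = np.array([1, 2, 3, 4, 5])
targets = {
    "Area 1": np.array([2, 1, 1, 0]),  # from the example
    "Area 2": np.array([1, 0, 1, 1]),  # hypothetical second area
}

# One SA run per area; every thread reads the same arrays.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = {area: pool.submit(anneal, codes, t, seed=i)
               for i, (area, t) in enumerate(targets.items())}
    results = {area: f.result() for area, f in futures.items()}

for area, (selection, cost) in results.items():
    print(area, "IDs:", sorted(ids[selection].tolist()), "error:", cost)
```

Threads can read the same array without copying it, but in CPython the global interpreter lock limits the speedup for a Python-level loop like this one, so a process pool or a compiled inner loop may be worth considering for large runs.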
3. Population Reconstruction
graph LR
G[Synthetic Population IDs] --> H[Link to Survey Data]
H --> I[Final Synthetic Dataset]
Example Output (Area 1):
Census Requirements:
- Females (20-40) = 2
- Males (20-40) = 1
SA Solution:
Selected IDs = [1, 3, 2]
Final Population:
ID | Age | Gender
1 | 25 | Female
3 | 40 | Female
2 | 35 | Male
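The linking step amounts to a lookup from selected IDs back to the survey records. The sketch below reproduces the Area 1 example with plain Python dictionaries; the names `survey`, `selected_ids`, and `final_population` are hypothetical, and a real pipeline would likely hold the survey in a table keyed by respondent ID.

```python
# Survey records keyed by ID, taken from the example pool.
survey = {
    1: {"Age": 25, "Gender": "Female"},
    2: {"Age": 35, "Gender": "Male"},
    3: {"Age": 40, "Gender": "Female"},
    4: {"Age": 45, "Gender": "Female"},
    5: {"Age": 50, "Gender": "Male"},
}

# SA solution for Area 1 from the example.
selected_ids = [1, 3, 2]

# Look up each selected ID; an ID that appears twice yields two rows.
final_population = [{"ID": i, **survey[i]} for i in selected_ids]

print("ID | Age | Gender")
for row in final_population:
    print(f"{row['ID']} | {row['Age']} | {row['Gender']}")
```

Because the lookup runs once per selected ID, a respondent chosen more than once (if sampling with replacement is allowed) appears once per selection in the final dataset.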
Full System Workflow
graph TD
A[Survey Data] --> B(Encode Categories)
C[Census Data] -->|Category Definitions| B
C --> D(SA Constraints)
B --> E[Shared NumPy Array]
E --> F[Parallel SA Threads]
D --> F
F --> G[Synthetic Populations]
G --> H[Link IDs to Survey Data]
H --> I[Final Synthetic Dataset]
Key Enhancements
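- Consistent categorization: The survey data is encoded into the same category groups that the census uses, so counts from the two sources are directly comparable.
- Parallel workflow: The encoded data sits in a single NumPy array that is easy to share among threads and easy to evaluate with the fitness function.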