Synthetic Population Generation Using Simulated Annealing

Objective

To create a synthetic population for each census area by combining individual-level survey data (Understanding Society) with aggregated census data using Simulated Annealing (SA). Each area's synthetic population is assembled only from real survey records, so individual-level characteristics are preserved. SA chooses which records to use so that the population's category counts match that area's census counts as closely as possible.

Methodology

1. Data Preparation

graph LR
    A[Survey Data] -->|Individual Records| B(Encode into Categories)
    C[Census Data] -->|Category Definitions| B
    C -->|Aggregated Counts| D(SA Constraints)
    B --> E[Shared NumPy Array]

Example Output (Area 1):
Age 40 falls under both the 20-40 and 40-60 labels; this example counts a 40-year-old in the 20-40 band.
Census Requirements:
- Females (20-40) = 2
- Females (40-60) = 1
- Males (20-40) = 1
- Males (40-60) = 0

Survey Data Pool:
ID | Age | Gender
1 | 25 | Female
2 | 35 | Male
3 | 40 | Female
4 | 45 | Female
5 | 50 | Male

Encoded Population:
ID | Category
1 | Female (20-40)
2 | Male (20-40)
3 | Female (20-40)
4 | Female (40-60)
5 | Male (40-60)
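The encoding step can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the project's schema: the category codes (gender-major, age-band-minor), the inclusive band limits with age 40 counted in 20-40, and the helper names `category_index` and `category_label` are all hypothetical.

```python
import numpy as np

# Hypothetical category scheme: labels follow the text, limits are inclusive,
# and age 40 is counted in the first band (an assumption).
AGE_BANDS = [("20-40", 20, 40), ("40-60", 41, 60)]  # (label, min age, max age)
GENDERS = ["Female", "Male"]


def category_index(age: int, gender: str) -> int:
    """Map one survey record to an integer code: gender-major, age-band-minor."""
    for band, (_, lo, hi) in enumerate(AGE_BANDS):
        if lo <= age <= hi:
            return GENDERS.index(gender) * len(AGE_BANDS) + band
    raise ValueError(f"age {age} is outside every band")


def category_label(code: int) -> str:
    """Inverse of category_index, used only for printing."""
    gender = GENDERS[code // len(AGE_BANDS)]
    label = AGE_BANDS[code % len(AGE_BANDS)][0]
    return f"{gender} ({label})"


# Survey pool from the example: (ID, age, gender)
survey = [(1, 25, "Female"), (2, 35, "Male"), (3, 40, "Female"),
          (4, 45, "Female"), (5, 50, "Male")]

ids = np.array([sid for sid, _, _ in survey])
# One small integer per record; this is the array the SA workers would read.
categories = np.array([category_index(age, g) for _, age, g in survey], dtype=np.int8)

# Area 1 census counts in the same code order:
# Female (20-40), Female (40-60), Male (20-40), Male (40-60)
targets = np.array([2, 1, 1, 0])

for sid, code in zip(ids, categories):
    print(sid, category_label(int(code)))
print("pool counts:", np.bincount(categories, minlength=len(targets)))
print("census targets:", targets)
```

Once records and census rows share one code order, the SA step only has to compare two integer vectors per area. Storing the codes as `int8` keeps the shared array small.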

2. Parallel Simulated Annealing

graph TD
    E[NumPy Array] -->|Shared Data| F[Parallel SA Threads]
    D[SA Constraints] --> F
    F -->|Best-Fit IDs| G[Synthetic Population per Area]
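The diagram does not specify the objective, move set, or cooling schedule. The sketch below fills these in with common choices, and every one of them is an assumption:

- The fit measure is the total absolute error (TAE) between an area's category counts and its census counts.
- A move swaps one member of the area for a random survey record, so records are sampled with replacement.
- Worse moves are accepted with the Metropolis probability exp(-delta/T).
- T decays geometrically.

`anneal_area` and `fit_all_areas` are hypothetical names, and the defaults are untuned.

```python
import math
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def total_absolute_error(counts: np.ndarray, targets: np.ndarray) -> int:
    """Fit measure: sum over categories of |synthetic count - census count|."""
    return int(np.abs(counts - targets).sum())


def anneal_area(categories, targets, n_iter=5000, t0=1.0, cooling=0.999, seed=0):
    """Choose targets.sum() pool indices whose category counts fit targets.

    Assumes sampling with replacement: a survey record may be picked twice.
    Returns (best selection as pool indices, its total absolute error).
    """
    rng = np.random.default_rng(seed)
    size = int(targets.sum())
    selected = rng.integers(0, len(categories), size=size)  # random starting population
    counts = np.bincount(categories[selected], minlength=len(targets))
    err = total_absolute_error(counts, targets)
    best, best_err = selected.copy(), err
    temp = t0
    for _ in range(n_iter):
        if best_err == 0:
            break  # exact fit found
        slot = rng.integers(size)             # member to swap out
        cand = rng.integers(len(categories))  # pool record to swap in
        old_cat, new_cat = categories[selected[slot]], categories[cand]
        counts[old_cat] -= 1
        counts[new_cat] += 1
        new_err = total_absolute_error(counts, targets)
        delta = new_err - err
        # Metropolis rule: accept improvements, accept worse moves with prob exp(-delta/T)
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            selected[slot] = cand
            err = new_err
            if err < best_err:
                best, best_err = selected.copy(), err
        else:  # reject: undo the tentative count change
            counts[new_cat] -= 1
            counts[old_cat] += 1
        temp = max(temp * cooling, 1e-9)  # geometric cooling with a floor
    return best, best_err


def fit_all_areas(categories, area_targets, workers=4):
    """One independent SA run per area; all workers read the same categories array."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(anneal_area, categories, t, seed=i)
                   for i, t in enumerate(area_targets)]
        return [f.result() for f in futures]


# Encoded pool for IDs 1-5; codes 0=F(20-40), 1=F(40-60), 2=M(20-40), 3=M(40-60),
# with age 40 counted in the 20-40 band (assumption).
ids = np.array([1, 2, 3, 4, 5])
categories = np.array([0, 2, 0, 1, 3])
area_targets = [np.array([2, 1, 1, 0])]  # Area 1

best, err = fit_all_areas(categories, area_targets)[0]
print("selected IDs:", ids[best].tolist(), "error:", err)
```

`ThreadPoolExecutor` mirrors the "Parallel SA Threads" box, and the threads only read the shared `categories` array. Because the annealing loop is pure Python, CPython's global interpreter lock prevents these threads from executing it in parallel. `ProcessPoolExecutor` is the usual way to get real CPU parallelism, optionally with `multiprocessing.shared_memory` holding the array.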

3. Population Reconstruction

graph LR
    G[Synthetic Population IDs] --> H[Link to Survey Data]
    H --> I[Final Synthetic Dataset]

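Example Output (Area 1):
Census Requirements:
- Females (20-40) = 2
- Females (40-60) = 1
- Males (20-40) = 1
- Males (40-60) = 0

SA Solution:
Selected IDs = [1, 3, 4, 2]
Record 5 (Male, 50) is not selected because the area needs no males aged 40-60.

Final Population:
ID | Age | Gender
1 | 25 | Female
3 | 40 | Female
4 | 45 | Female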
2 | 35 | Male
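The linking step can be sketched as a lookup from the selected IDs back to the survey records. This sketch assumes the records fit in a dictionary keyed by survey ID; a real pipeline would more likely join a data frame on the ID column. `reconstruct` and `band` are hypothetical helpers.

The example uses the selection [1, 3, 4, 2], which fills all four Area 1 census categories when age 40 is counted in the 20-40 band. Each output row gets its own person number, so a record picked twice under sampling with replacement still becomes two people.

```python
from collections import Counter

# Survey pool from the example, keyed by survey ID.
SURVEY = {
    1: {"age": 25, "gender": "Female"},
    2: {"age": 35, "gender": "Male"},
    3: {"age": 40, "gender": "Female"},
    4: {"age": 45, "gender": "Female"},
    5: {"age": 50, "gender": "Male"},
}


def reconstruct(area: str, selected_ids: list[int]) -> list[dict]:
    """Expand SA-selected survey IDs into full records for one area."""
    rows = []
    for person, sid in enumerate(selected_ids, start=1):
        record = SURVEY[sid]  # a KeyError means the SA returned an unknown ID
        rows.append({"area": area, "person": person, "survey_id": sid, **record})
    return rows


def band(age: int) -> str:
    """Age band label for ages 20-60; age 40 is counted in 20-40 (assumption)."""
    return "20-40" if age <= 40 else "40-60"


rows = reconstruct("Area 1", [1, 3, 4, 2])
for r in rows:
    print(r["survey_id"], r["age"], r["gender"])

# Check the rebuilt population against the census categories.
tally = Counter((r["gender"], band(r["age"])) for r in rows)
print(tally)
```

The final tally is a cheap check that the linked records still reproduce the census counts the SA was fitted to.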

Full System Workflow

graph TD
    A[Survey Data] --> B(Encode Categories)
    C[Census Data] -->|Category Definitions| B
    C --> D(SA Constraints)
    B --> E[Shared NumPy Array]
    E --> F[Parallel SA Threads]
    D --> F
    F --> G[Synthetic Populations]
    G --> H[Link IDs to Survey Data]
    H --> I[Final Synthetic Dataset]

Key Enhancements