We start with a cohort of 10,000 healthy adolescents and will follow them for 24 months.
Step 2: Case Identified @ 8 Months
An individual is hospitalized (Case 1). We stop time at this exact point.
Step 3: Risk-Set Sampling (1:4)
From everyone *still* healthy at 8 months (the "Risk Set"), we randomly select 4 matched controls.
Step 4: Process Repeats...
This sampling process repeats *each time* a new case (Case 2 @ 14m, Case 3 @ 19m...) occurs.
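The sampling loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual procedure: the function name `risk_set_sample`, the two input dictionaries, and the 20-person toy cohort are all assumptions made for the example. Matching on covariates is omitted, so controls are drawn from the whole risk set.

```python
import random

def risk_set_sample(event_times, follow_up_end, n_controls=4, seed=0):
    """Draw controls for each case from people still at risk at the case's event time.

    event_times:   dict of person ID -> month of hospitalization (cases only).
    follow_up_end: dict of person ID -> last month under observation.
    Both structures are hypothetical; the source does not specify a data layout.
    """
    rng = random.Random(seed)
    matched_sets = []
    # Handle cases in chronological order, "stopping time" at each event.
    for case_id, t in sorted(event_times.items(), key=lambda kv: kv[1]):
        risk_set = [
            pid for pid, end in follow_up_end.items()
            if pid != case_id
            and end >= t                                 # still under follow-up at t
            and event_times.get(pid, float("inf")) > t   # not yet a case at t
        ]
        controls = rng.sample(risk_set, n_controls)      # sample without replacement
        matched_sets.append({"case": case_id, "time": t, "controls": controls})
    return matched_sets

# Toy cohort: 20 people followed for 24 months (illustrative data only).
follow_up_end = {f"P-{i:04d}": 24 for i in range(20)}
event_times = {"P-0003": 8, "P-0011": 14, "P-0017": 19}
for pid, t in event_times.items():
    follow_up_end[pid] = t  # a case leaves the risk set at its event time

sets = risk_set_sample(event_times, follow_up_end)
for s in sets:
    print(s["time"], s["case"], s["controls"])
```

Note that a person who becomes a case later (for example at 14 months) is still eligible as a control at 8 months, because they were healthy at that point. This is a defining feature of risk-set sampling.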
Step 5: Fast Forward to 24 Months
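The process continues until all **300 target cases** are identified and matched with 1,200 controls (300 × 4). That leaves N = 8,500 cohort members never sampled (10,000 − 1,500), assuming no one is selected more than once.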
Step 6: Why Conditional Logit?
This 1:N matching design breaks the assumption of independence. We **must** use an analysis that respects these matched sets. **Conditional Logistic Regression** is the standard method to analyze this specific data structure.
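To make "respecting the matched sets" concrete, here is a minimal sketch of the conditional log-likelihood for a single covariate. Each matched set contributes the probability that the observed case, rather than any of its controls, is the case given the set's exposures. The function name and the toy exposure values are hypothetical; a real analysis would use an established routine rather than this hand-rolled version.

```python
import math

def conditional_log_likelihood(beta, matched_sets):
    """Conditional logit log-likelihood for one covariate.

    matched_sets: list of (x_case, [x_control, ...]) tuples.
    Each set contributes beta*x_case - log(sum over set members of exp(beta*x)).
    Any set-specific intercept cancels, so it never has to be estimated.
    """
    total = 0.0
    for x_case, x_controls in matched_sets:
        scores = [beta * x_case] + [beta * x for x in x_controls]
        m = max(scores)  # log-sum-exp shift for numerical stability
        log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
        total += beta * x_case - log_denominator
    return total

# Hypothetical binary exposure (1 = exposed) for three 1:4 matched sets.
toy_sets = [
    (1, [0, 0, 1, 0]),
    (1, [0, 1, 0, 0]),
    (0, [0, 0, 1, 0]),
]

ll_null = conditional_log_likelihood(0.0, toy_sets)  # beta = 0: each set contributes -log(5)
ll_alt = conditional_log_likelihood(0.5, toy_sets)
print(ll_null, ll_alt)
```

With `beta = 0` every member of a five-person set is equally likely to be the case, so each set contributes `-log(5)`. The estimate of `beta` is the value that maximizes this function across all matched sets.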
Resulting Data Structure (for Analysis)
| Matched Set | Case ID | Control IDs | Time |
|---|---|---|---|
| 1 | P-0752 | P-1432, P-0012, P-8345, P-3321 | 8m |
| 2 | P-4512 | P-2311, P-5543, P-0122, P-9876 | 14m |
| ... | ... | ... | ... |
| 300 | P-2319 | P-5409, P-1120, P-7768, P-0034 | 24m |
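Most conditional logistic regression routines expect one row per person with a set identifier and a case indicator, rather than one row per matched set. The sketch below reshapes the first two matched sets shown above into that long layout. The field names (`set`, `person_id`, `is_case`, `time_months`) are assumptions chosen for the example.

```python
# First two matched sets from the listing above, stored one record per set.
matched_sets = [
    {"set": 1, "case": "P-0752",
     "controls": ["P-1432", "P-0012", "P-8345", "P-3321"], "time_months": 8},
    {"set": 2, "case": "P-4512",
     "controls": ["P-2311", "P-5543", "P-0122", "P-9876"], "time_months": 14},
]

def to_long_format(matched_sets):
    """Return one row per person: the case first, then its controls."""
    rows = []
    for s in matched_sets:
        rows.append({"set": s["set"], "person_id": s["case"],
                     "is_case": 1, "time_months": s["time_months"]})
        for control_id in s["controls"]:
            rows.append({"set": s["set"], "person_id": control_id,
                         "is_case": 0, "time_months": s["time_months"]})
    return rows

rows = to_long_format(matched_sets)
for row in rows:
    print(row)
```

Each set yields five rows (one case plus four controls), and the `set` column is what the analysis conditions on.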
Sensitivity Analysis: SAP Threat Dashboard
1. Core Threat Simulation
3 Core Bias Threats for SAP design:
Clustering (Scenario 3): Students in the same school are not independent. Ignoring this understates standard errors and inflates the false-positive rate.
Data Quality:
Misclassification (Scenario 2): Self-report is unreliable, so exposure can be misclassified.
Informative loss to follow-up, LTFU (Scenario 4): Heavy users are more likely to drop out.
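As a rough gauge of how much the clustering threat matters, the standard design-effect approximation for equal-sized clusters, DEFF = 1 + (m - 1) × ICC, can be computed directly. The cluster size of 30 students per school and the intraclass correlation of 0.05 below are purely hypothetical values chosen for illustration; neither appears in the source.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n, cluster_size, icc):
    """Number of independent observations that carry the same information."""
    return n / design_effect(cluster_size, icc)

# Hypothetical inputs: 30 students per school, ICC = 0.05 (assumptions, not study values).
deff = design_effect(cluster_size=30, icc=0.05)
n_eff = effective_sample_size(n=10_000, cluster_size=30, icc=0.05)
print(round(deff, 2), round(n_eff))
```

Under these assumed values, an analysis that treats all students as independent would report variances that are too small by roughly the design effect, which is how ignoring clustering produces spurious significance.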
2. Simulation Parameters
Sensitivity: 0.95
Specificity: 0.95
Note: Parameters are locked based on the selected scenario.
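To see what sensitivity and specificity of 0.95 imply, the sketch below applies the standard misclassification identity, observed prevalence = Se × p + (1 - Sp) × (1 - p), to exposure in cases and controls. The true exposure prevalences (0.40 in cases, 0.20 in controls) are hypothetical inputs, and the misclassification is assumed to be nondifferential, meaning identical in cases and controls.

```python
def observed_prevalence(true_prevalence, sensitivity, specificity):
    """Prevalence of 'exposed' after imperfect classification."""
    true_positives = sensitivity * true_prevalence
    false_positives = (1 - specificity) * (1 - true_prevalence)
    return true_positives + false_positives

def odds_ratio(p_cases, p_controls):
    """Exposure odds ratio from exposure prevalence in cases and controls."""
    return (p_cases / (1 - p_cases)) / (p_controls / (1 - p_controls))

SENSITIVITY = 0.95  # dashboard parameter
SPECIFICITY = 0.95  # dashboard parameter

# Hypothetical true exposure prevalences (assumptions, not study values).
true_cases, true_controls = 0.40, 0.20

obs_cases = observed_prevalence(true_cases, SENSITIVITY, SPECIFICITY)
obs_controls = observed_prevalence(true_controls, SENSITIVITY, SPECIFICITY)

true_or = odds_ratio(true_cases, true_controls)
observed_or = odds_ratio(obs_cases, obs_controls)
print(round(true_or, 2), round(observed_or, 2))
```

Under these assumptions the observed odds ratio is pulled toward 1 relative to the true one, which is the usual direction of bias for nondifferential misclassification of a binary exposure.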