Apply Reference Period Crosswalk to PNADC Data
Source: R/pnadc-apply-periods.R
pnadc_apply_periods.Rd
This function takes a reference period crosswalk built by pnadc_identify_periods and applies it to any PNADC dataset (quarterly or annual). By default (calibrate = TRUE), it also calibrates the survey weights to match external population totals at the chosen temporal granularity (month, fortnight, or week).
Usage
pnadc_apply_periods(
  data,
  crosswalk,
  weight_var,
  anchor,
  calibrate = TRUE,
  calibration_unit = c("month", "fortnight", "week"),
  calibration_min_cell_size = 1,
  target_totals = NULL,
  smooth = FALSE,
  keep_all = TRUE,
  verbose = TRUE
)
Arguments
- data
A data.frame or data.table with PNADC microdata. Must contain the join keys Ano, Trimestre, UPA, V1008, and V1014, which are used to merge with the crosswalk.
- crosswalk
A data.table crosswalk returned by pnadc_identify_periods.
- weight_var
Character. Name of the survey weight column. Must be specified:
  - "V1028" for quarterly PNADC data
  - "V1032" for annual PNADC data (visit-specific releases, or annual releases organized by quarters)
- anchor
Character. How to anchor the weight redistribution. Must be specified:
  - "quarter" for quarterly data, or annual releases organized by quarters (preserves quarterly totals)
  - "year" for annual visit-specific data (preserves yearly totals)
- calibrate
Logical. If TRUE (default), calibrate weights to external population totals. If FALSE, only merge the crosswalk without calibration.
- calibration_unit
Character. Temporal unit for weight calibration. One of "month" (default), "fortnight", or "week".
- calibration_min_cell_size
Integer. Minimum sample size required in a cell for it to be used in hierarchical raking. Cells smaller than this threshold are collapsed to coarser levels. Default: 1 (use all cells).
- target_totals
Optional data.table with population targets. If NULL (default), the function fetches monthly population from SIDRA and derives the fortnight and week targets from it, so each time period (month, fortnight, or week) is calibrated to the FULL Brazilian population.
If you provide custom targets, the population column (m_populacao for months, f_populacao for fortnights, w_populacao for weeks) must be in thousands; the function multiplies it by 1000 internally.
- smooth
Logical. If TRUE, smooth the calibrated weights to remove quarterly artifacts. The smoothing window depends on the calibration unit: 3 periods for months, 7 periods for fortnights, and no smoothing for weeks. Default: FALSE.
- keep_all
Logical. If TRUE (default), keep all observations including those with undetermined reference periods. If FALSE, drop undetermined rows.
- verbose
Logical. If TRUE (default), print progress messages.
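The window rule for smooth can be pictured with a short base R sketch. It assumes a centered moving average that shrinks at the series edges and applies it to a per-period series (for example, per-period adjustment factors) rather than to individual weights; the helper smooth_period_series() is hypothetical, and the package's actual smoother may work differently.

```r
# Hypothetical sketch, not the package's code: a centered moving average whose
# window follows the documented rule (month = 3, fortnight = 7, week = none).
# The averaging method and the edge handling are assumptions.
smooth_period_series <- function(x, unit = c("month", "fortnight", "week")) {
  unit <- match.arg(unit)
  width <- switch(unit, month = 3L, fortnight = 7L, week = 1L)
  if (width == 1L) return(x)  # weekly: returned unchanged
  half <- (width - 1L) %/% 2L
  n <- length(x)
  vapply(seq_len(n), function(i) {
    # Shrink the window at the edges of the series instead of padding with NA
    idx <- max(1L, i - half):min(n, i + half)
    mean(x[idx])
  }, numeric(1))
}

# Illustrative per-month adjustment factors with a repeating quarterly pattern
factors <- c(1.00, 1.30, 0.70, 1.00, 1.30, 0.70)
smoothed <- smooth_period_series(factors, "month")
```

With the monthly window, each interior 3-month average spans one full quarterly cycle, so the repeating pattern flattens to 1.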
Value
A data.table with the input data plus crosswalk columns:
- ref_month_in_quarter, ref_month_in_year
Month position (1-3 in quarter, 1-12 in year)
- ref_fortnight_in_month, ref_fortnight_in_quarter
Fortnight position (1-2 in month, 1-6 in quarter)
- ref_week_in_month, ref_week_in_quarter
Week position (1-4 in month, 1-12 in quarter)
- ref_month_yyyymm, ref_fortnight_yyyyff, ref_week_yyyyww
Integer period codes
- determined_month, determined_fortnight, determined_week
Logical determination flags
- weight_monthly, weight_fortnight, or weight_weekly
Calibrated weights for the chosen calibration_unit (only when calibrate = TRUE)
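The position columns and the integer codes are related by simple arithmetic. The hypothetical helper below assumes that ref_month_in_year counts months from January and that ref_month_yyyymm concatenates year and month (for example, 202305); check the actual output before relying on either assumption.

```r
# Hypothetical helper, assuming ref_month_in_year = (Trimestre - 1) * 3 +
# ref_month_in_quarter and ref_month_yyyymm = Ano * 100 + ref_month_in_year.
month_codes <- function(ano, trimestre, ref_month_in_quarter) {
  ref_month_in_year <- (trimestre - 1L) * 3L + ref_month_in_quarter
  data.frame(
    ref_month_in_year = as.integer(ref_month_in_year),
    ref_month_yyyymm  = as.integer(ano * 100L + ref_month_in_year)
  )
}

# Second month of Q2 2023 is May, third month of Q4 is December;
# an undetermined month (NA) stays NA in both columns
codes <- month_codes(
  ano = c(2023L, 2023L, 2023L),
  trimestre = c(2L, 4L, 1L),
  ref_month_in_quarter = c(2L, 3L, NA)
)
```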
Details
Merges a reference period crosswalk with PNADC microdata and optionally calibrates survey weights for sub-quarterly analysis.
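The merge itself is an ordinary left join on the five documented keys. This base R sketch on made-up data illustrates the idea; the package works with data.table, and treating keep_all = FALSE as a filter on determined_month is an assumption (which determination flag the function actually uses is not stated here).

```r
# Join keys documented for pnadc_apply_periods()
keys <- c("Ano", "Trimestre", "UPA", "V1008", "V1014")

# Toy microdata (made-up values; V1028 is the quarterly weight)
pnadc <- data.frame(
  Ano = 2023L, Trimestre = 1L,
  UPA = c(110000016L, 110000016L, 110000017L),
  V1008 = c(1L, 2L, 1L), V1014 = 1L,
  V1028 = c(250, 310, 280)
)

# Toy crosswalk: the third household has no determined reference month
crosswalk <- data.frame(
  Ano = 2023L, Trimestre = 1L,
  UPA = c(110000016L, 110000016L, 110000017L),
  V1008 = c(1L, 2L, 1L), V1014 = 1L,
  ref_month_in_quarter = c(1L, 3L, NA),
  determined_month = c(TRUE, TRUE, FALSE)
)

# Left join: every microdata row is kept (analogous to keep_all = TRUE)
merged <- merge(pnadc, crosswalk, by = keys, all.x = TRUE, sort = FALSE)

# Dropping undetermined rows (analogous to keep_all = FALSE);
# %in% TRUE also drops rows that found no crosswalk match (NA flag)
merged_determined <- merged[merged$determined_month %in% TRUE, ]
```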
Weight Calibration
When calibrate = TRUE, the function performs hierarchical rake weighting:
1. Groups observations into nested demographic/geographic cells.
2. Iteratively adjusts the weights so that sub-period totals match anchor-period totals.
3. Calibrates the final weights against external population totals (by default, the FULL Brazilian population).
4. If smooth = TRUE, smooths the weights to remove quarterly artifacts.
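Steps 1 to 3 can be sketched in base R for a single quarter. This is a simplified illustration, not the package's implementation: the helper rake_subperiods(), the fixed number of iterations, and the rule of leaving cells below min_cell_size unadjusted are all assumptions. Within each sub-period, the weights are raked so that every cell matches its quarterly (anchor) total; each sub-period is then scaled to its external population total.

```r
# Simplified sketch of steps 1-3 for a single quarter (hypothetical helper).
# w: quarterly weights; period: sub-period id within the quarter;
# cells: list of cell vectors, one per raking level;
# pop: external population total per sub-period, named by period id.
rake_subperiods <- function(w, period, cells, pop, min_cell_size = 1, n_iter = 50) {
  w0 <- w
  for (iter in seq_len(n_iter)) {
    for (cell in cells) {
      # Anchor totals: original weights summed over the whole quarter, per cell
      anchor <- vapply(split(w0, cell), sum, numeric(1))
      group <- interaction(period, cell, drop = TRUE)
      for (g in levels(group)) {
        idx <- which(group == g)
        # Assumption: sparse cells are simply left unadjusted at this level
        if (length(idx) < min_cell_size) next
        target <- anchor[[as.character(cell[idx[1]])]]
        w[idx] <- w[idx] * target / sum(w[idx])
      }
    }
  }
  # Step 3: scale each sub-period to its external population total
  for (p in unique(period)) {
    idx <- which(period == p)
    w[idx] <- w[idx] * pop[[as.character(p)]] / sum(w[idx])
  }
  w
}

# Toy quarter: 60 respondents, 20 per month, two age groups and two regions
n <- 60
month  <- rep(1:3, each = 20)
age    <- rep(c("young", "old"), length.out = n)
region <- rep(c("N", "N", "S", "S"), length.out = n)
w      <- 100 + (seq_len(n) %% 7) * 30
pop    <- c(`1` = 10000, `2` = 10100, `3` = 10200)

w_month <- rake_subperiods(w, month, cells = list(age, region), pop = pop)
```

Because region is the last level raked, each month's regional shares match the quarter's exactly; the age shares approach theirs as n_iter grows.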
Population Targets
By default, all time periods (months, fortnights, and weeks) are calibrated to the FULL Brazilian population from SIDRA. This means:
- Monthly weights sum to the Brazilian population for that month.
- Fortnight weights sum to the Brazilian population for the containing month.
- Weekly weights sum to the Brazilian population for the containing month.
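The rule that fortnights and weeks inherit the population of their containing month can be illustrated with a custom target table. The sketch assumes a key column named ref_month_yyyymm and a hypothetical fortnight_id; the population values are invented, not SIDRA figures. Weeks would follow the same pattern with w_populacao.

```r
# Hypothetical custom monthly targets in thousands, as target_totals requires.
# The key column name ref_month_yyyymm and the values are illustrative only.
month_targets <- data.frame(
  ref_month_yyyymm = c(202301L, 202302L),
  m_populacao      = c(214000, 214100)
)

# Hypothetical fortnight index with the month that contains each fortnight
fortnights <- data.frame(
  fortnight_id     = 1:4,
  ref_month_yyyymm = c(202301L, 202301L, 202302L, 202302L)
)

# Every fortnight inherits the population of its containing month
fortnight_targets <- merge(fortnights, month_targets, by = "ref_month_yyyymm")
names(fortnight_targets)[names(fortnight_targets) == "m_populacao"] <- "f_populacao"

# Thousands -> persons, mirroring the documented internal multiplication by 1000
fortnight_targets$target_persons <- fortnight_targets$f_populacao * 1000
```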
Hierarchical Raking Levels
The number of hierarchical cell levels is adjusted automatically to the calibration unit to avoid sparse cells:
- "month": 4 levels (age, region, state, post-stratum), the full hierarchy
- "fortnight": 2 levels (age, region), simplified for the smaller sample per period
- "week": 1 level (age groups only), a minimal hierarchy for sparse data
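The level rule and the calibration_min_cell_size fallback might look like the following. Both helpers are hypothetical, the column names age_group, region, uf, and posest are placeholders, and the sketch assumes that level k is the interaction of the first k variables, so a row that is too sparse at one level falls back to the deepest coarser level that passes the threshold. A depth of 0 marks rows that pass no level.

```r
# Hypothetical mapping from calibration unit to raking variables (coarse -> fine)
raking_levels <- function(unit = c("month", "fortnight", "week")) {
  unit <- match.arg(unit)
  vars <- c("age_group", "region", "uf", "posest")  # placeholder column names
  vars[seq_len(switch(unit, month = 4L, fortnight = 2L, week = 1L))]
}

# Assumption: level k is the interaction of the first k variables, so levels nest.
# Returns, per row, the deepest level whose (period x cell) group has at least
# min_cell_size observations.
deepest_usable_level <- function(data, period_var, vars, min_cell_size = 1) {
  depth <- integer(nrow(data))
  for (k in seq_along(vars)) {
    key  <- interaction(c(data[period_var], data[vars[seq_len(k)]]), drop = TRUE)
    size <- ave(rep(1L, nrow(data)), key, FUN = length)
    depth[size >= min_cell_size & depth == k - 1L] <- k
  }
  depth
}

toy <- data.frame(
  month     = c(1, 1, 1, 1, 2),
  age_group = c("a", "a", "a", "b", "a"),
  region    = c("N", "N", "S", "N", "N")
)
depth <- deepest_usable_level(toy, "month", raking_levels("fortnight"), min_cell_size = 2)
```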
Anchor Period
The anchor parameter determines how weights are redistributed:
- "quarter": quarterly totals are preserved and redistributed to months/fortnights/weeks
- "year": yearly totals are preserved and redistributed to months/fortnights/weeks
Use anchor = "quarter" with quarterly V1028 weights, and
anchor = "year" with annual V1032 weights.
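The difference between the two anchors is which total a sub-period is rescaled to. This single-cell sketch uses a hypothetical helper, redistribute_to_months(), and ignores the demographic cells used in the real raking; it rescales each month's weights to the total of its quarter or of its year.

```r
# Hypothetical single-cell version of the redistribution step: each month's
# weights are rescaled so the month sums to its anchor-period total.
redistribute_to_months <- function(data, weight_var, anchor = c("quarter", "year")) {
  anchor <- match.arg(anchor)
  w <- data[[weight_var]]
  anchor_key <- if (anchor == "quarter") {
    interaction(data$Ano, data$Trimestre, drop = TRUE)
  } else {
    factor(data$Ano)
  }
  month_key <- interaction(data$Ano, data$ref_month_in_year, drop = TRUE)
  anchor_total <- ave(w, anchor_key, FUN = sum)  # quarter or year total, per row
  month_total  <- ave(w, month_key, FUN = sum)   # current month total, per row
  w * anchor_total / month_total
}

# Toy quarterly data: one quarter, three months, V1028 summing to 1000
q1 <- data.frame(
  Ano = 2023L, Trimestre = 1L,
  ref_month_in_year = c(1L, 1L, 2L, 3L, 3L),
  V1028 = c(100, 200, 300, 150, 250)
)
q1$w_month <- redistribute_to_months(q1, "V1028", anchor = "quarter")
```

Each of the three months now sums to the quarterly total of 1000. Under this sketch, anchor = "year" would instead scale each month to the yearly total of the weights (V1032 for annual data).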
See also
pnadc_identify_periods to build the crosswalk
Examples
if (FALSE) { # \dontrun{
# Build crosswalk
crosswalk <- pnadc_identify_periods(pnadc_stacked)

# Apply to quarterly data with monthly calibration
result <- pnadc_apply_periods(
  pnadc_2023,
  crosswalk,
  weight_var = "V1028",
  anchor = "quarter"
)

# Apply to annual data
result <- pnadc_apply_periods(
  pnadc_annual,
  crosswalk,
  weight_var = "V1032",
  anchor = "year"
)

# Weekly calibration
result <- pnadc_apply_periods(
  pnadc_2023,
  crosswalk,
  weight_var = "V1028",
  anchor = "quarter",
  calibration_unit = "week"
)

# No calibration (just merge crosswalk)
result <- pnadc_apply_periods(
  pnadc_2023,
  crosswalk,
  weight_var = "V1028",
  anchor = "quarter",
  calibrate = FALSE
)
} # }