Identify Reference Periods in PNADC Data
Source: R/pnadc-identify-periods.R
PNADC is a quarterly survey, but each interview refers to a specific period within the quarter. This function identifies which month, fortnight (quinzena), and week each observation belongs to, enabling sub-quarterly time series analysis.
The algorithm uses a nested identification approach:
Phase 1: Identify MONTHS for all observations using:
IBGE's reference week timing rules (first reference week – ending in a Saturday – with sufficient days)
Respondent birthdates to constrain possible interview dates
UPA-panel level aggregation across ALL quarters (panel design)
Dynamic exception detection (identifies quarters needing relaxed rules)
Phase 2: Identify FORTNIGHTS for month-determined observations:
Search space constrained to 2 fortnights within determined month
Household-level aggregation within each quarter
Phase 3: Identify WEEKS for fortnight-determined observations:
Search space constrained to ~2 weeks within determined fortnight
Household-level aggregation within each quarter
Arguments
- data
A data.frame or data.table with PNADC microdata. Required columns:
Ano: Survey year
Trimestre: Quarter (1-4)
UPA: Primary Sampling Unit
V1008: Household id/sequence within UPA
V1014: Panel identifier
V2008: Birth day (1-31)
V20081: Birth month (1-12)
V20082: Birth year
V2009: Age
Optional but recommended:
V2003: Person sequence within household
- verbose
Logical. If TRUE (default), display progress information.
- store_date_bounds
Logical. If TRUE, stores date bounds and exception flags in the crosswalk so that pnadc_experimental_periods() can reuse them. This enables a 10-20x speedup for the probabilistic strategy by avoiding redundant computation. Default FALSE.
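The required columns listed for data can be checked before the function is called. The following is a minimal sketch in base R; the helper name check_pnadc_columns and the toy data frame are hypothetical and not part of the package.

```r
# Required columns as listed under `data` (V2003 is optional and not checked here).
required_cols <- c("Ano", "Trimestre", "UPA", "V1008", "V1014",
                   "V2008", "V20081", "V20082", "V2009")

# Hypothetical helper: returns the names of required columns missing from `data`.
check_pnadc_columns <- function(data, required = required_cols) {
  setdiff(required, names(data))
}

# Toy input with made-up values that lacks the age column V2009.
toy <- data.frame(Ano = 2023L, Trimestre = 1L, UPA = 110000016L,
                  V1008 = 1L, V1014 = 8L,
                  V2008 = 15L, V20081 = 6L, V20082 = 1980L)

missing_cols <- check_pnadc_columns(toy)
print(missing_cols)
```

An empty result means every required column is present.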
Value
A data.table crosswalk with columns:
- Ano, Trimestre, UPA, V1008, V1014
Join keys (year, quarter, UPA, household, panel)
- ref_month_in_quarter
Integer. Month position in quarter (1, 2, 3) or NA
- ref_month_in_year
Integer. Month position in year (1-12) or NA
- ref_fortnight_in_month
Integer. Fortnight position in month (1 or 2) or NA
- ref_fortnight_in_quarter
Integer. Fortnight position in quarter (1-6) or NA
- ref_week_in_month
Integer. Week position in month (1-4) or NA
- ref_week_in_quarter
Integer. Week position in quarter (1-12) or NA
- date_min
Date. Lower bound of the interview reference date for the individual. Only returned if store_date_bounds = TRUE
- date_max
Date. Upper bound of the interview reference date for the individual. Only returned if store_date_bounds = TRUE
- week_1_start
Date. Sunday of the IBGE first reference week of the month. Only returned if store_date_bounds = TRUE
- week_1_end
Date. Saturday of the IBGE first reference week of the month. Only returned if store_date_bounds = TRUE
- week_2_start
Date. Sunday of the IBGE second reference week of the month. Only returned if store_date_bounds = TRUE
- week_2_end
Date. Saturday of the IBGE second reference week of the month. Only returned if store_date_bounds = TRUE
- week_3_start
Date. Sunday of the IBGE third reference week of the month. Only returned if store_date_bounds = TRUE
- week_3_end
Date. Saturday of the IBGE third reference week of the month. Only returned if store_date_bounds = TRUE
- week_4_start
Date. Sunday of the IBGE fourth reference week of the month. Only returned if store_date_bounds = TRUE
- week_4_end
Date. Saturday of the IBGE fourth reference week of the month. Only returned if store_date_bounds = TRUE
- month_max_upa
Integer. Maximum month position across UPA-V1014 group (for debugging). Only returned if store_date_bounds = TRUE
- month_min_upa
Integer. Minimum month position across UPA-V1014 group (for debugging). Only returned if store_date_bounds = TRUE
- fortnight_max_hh
Integer. Maximum fortnight position within household (for debugging). Only returned if store_date_bounds = TRUE
- fortnight_min_hh
Integer. Minimum fortnight position within household (for debugging). Only returned if store_date_bounds = TRUE
- week_min_hh
Integer. Minimum week position within household (for debugging). Only returned if store_date_bounds = TRUE
- week_max_hh
Integer. Maximum week position within household (for debugging). Only returned if store_date_bounds = TRUE
- ref_month_yyyymm
Integer. Identified reference month in the format YYYYMM, where MM follows the IBGE calendar. 1 <= MM <= 12
- ref_fortnight_yyyyff
Integer. Identified reference fortnight in the format YYYYFF, where FF follows the IBGE calendar. 1 <= FF <= 24
- ref_week_yyyyww
Integer. Identified reference week in the format YYYYWW, where WW follows the IBGE calendar. 1 <= WW <= 48
- determined_month
Logical. Flags if the month was determined.
- determined_fortnight
Logical. Flags if the fortnight was determined.
- determined_week
Logical. Flags if the week was determined.
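The position columns and the compact YYYYMM, YYYYFF, and YYYYWW codes are related by simple arithmetic. The sketch below shows one encoding that is consistent with the documented ranges (3 months, 6 fortnights, and 12 weeks per quarter; 12, 24, and 48 per year). It is an assumption for illustration, including the use of Ano as the year part, and not necessarily how the package computes these columns.

```r
# Toy crosswalk row: year 2023, quarter 2, month 3 of the quarter,
# fortnight 2 and week 4 within that month. Values are illustrative only.
Ano <- 2023L
Trimestre <- 2L
ref_month_in_quarter <- 3L
ref_fortnight_in_month <- 2L
ref_week_in_month <- 4L

# Positions within the year (month) and within the quarter (fortnight, week).
ref_month_in_year <- (Trimestre - 1L) * 3L + ref_month_in_quarter
ref_fortnight_in_quarter <- (ref_month_in_quarter - 1L) * 2L + ref_fortnight_in_month
ref_week_in_quarter <- (ref_month_in_quarter - 1L) * 4L + ref_week_in_month

# Compact integer codes (assumed encoding: year * 100 + position in year).
ref_month_yyyymm <- Ano * 100L + ref_month_in_year
ref_fortnight_yyyyff <- Ano * 100L + (ref_month_in_year - 1L) * 2L + ref_fortnight_in_month
ref_week_yyyyww <- Ano * 100L + (ref_month_in_year - 1L) * 4L + ref_week_in_month

print(c(ref_month_yyyymm, ref_fortnight_yyyyff, ref_week_yyyyww))
```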
Details
Builds a crosswalk containing reference periods (month, fortnight, and week) for PNADC survey data based on IBGE's interview timing rules.
Temporal Granularity
The crosswalk contains three levels of temporal granularity:
Month: 3 per quarter, ~97%
Fortnight (quinzena): 6 per quarter, ~9%
Week: 12 per quarter, ~3%
Cross-Quarter Aggregation (Important!)
For optimal month determination rates, input data should be stacked across multiple quarters (ideally 4+ years). The algorithm leverages PNADC's rotating panel design where the same UPA-V1014 is interviewed in the same relative position across quarterly visits.
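Stacking can be done with any row-binding tool. The sketch below uses base R on two made-up quarterly extracts and then counts how many quarters each UPA-V1014 group appears in; the object names and values are hypothetical.

```r
# Two toy quarterly extracts with identical columns (values are made up).
q1 <- data.frame(Ano = 2023L, Trimestre = 1L, UPA = c(1L, 2L), V1008 = 1L, V1014 = 8L)
q2 <- data.frame(Ano = 2023L, Trimestre = 2L, UPA = c(1L, 2L), V1008 = 1L, V1014 = 8L)

# Stack the quarters row-wise; with data.table, rbindlist() does the same job.
pnadc_stacked <- do.call(rbind, list(q1, q2))

# Count the distinct year-quarter combinations observed per UPA-V1014 group.
panel_quarters <- unique(pnadc_stacked[, c("UPA", "V1014", "Ano", "Trimestre")])
quarters_per_panel <- aggregate(Trimestre ~ UPA + V1014,
                                data = panel_quarters, FUN = length)
print(quarters_per_panel)
```

Groups that appear in only one quarter give the month-identification phase less information to aggregate across visits.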
Fortnight Definition
Fortnights are numbered 1-6 per quarter (2 per month), based on the IBGE reference week calendar (not calendar days). Each IBGE "month" consists of exactly 4 reference weeks (28 days), starting on a Sunday:
Fortnight 1 in month: IBGE weeks 1-2 (days 1-14 of the IBGE month)
Fortnight 2 in month: IBGE weeks 3-4 (days 15-28 of the IBGE month)
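The mapping from a day of the 28-day IBGE month to its week and fortnight can be written with integer division. The sketch below is illustrative only: the start date 2023-01-01 is a hypothetical first-week Sunday chosen for the example, not a value taken from the package.

```r
# Days 1-28 of an IBGE month (four 7-day reference weeks).
day_in_ibge_month <- 1:28

# Integer division maps each day to its week (1-4) and fortnight (1-2).
week_in_month <- (day_in_ibge_month - 1L) %/% 7L + 1L
fortnight_in_month <- (day_in_ibge_month - 1L) %/% 14L + 1L

# Equivalent mapping from week to fortnight: weeks 1-2 -> 1, weeks 3-4 -> 2.
fortnight_from_week <- (week_in_month - 1L) %/% 2L + 1L

# Week bounds from a hypothetical first-week start date (2023-01-01 is a Sunday).
week_1_start <- as.Date("2023-01-01")
week_start <- week_1_start + 7L * (0:3)  # Sundays of weeks 1-4
week_end <- week_start + 6L              # Saturdays of weeks 1-4

print(data.frame(week = 1:4, week_start, week_end))
```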
Note
Nested Identification Hierarchy
The algorithm enforces strict nesting by construction:
Fortnights can ONLY be identified for observations with determined months
Weeks can ONLY be identified for observations with determined fortnights
This guarantees: determined_week => determined_fortnight => determined_month
See also
pnadc_apply_periods to apply the crosswalk and calibrate weights
Examples
if (FALSE) { # \dontrun{
# Build crosswalk from stacked quarterly data
crosswalk <- pnadc_identify_periods(pnadc_stacked)

# Check determination rates
crosswalk[, .(
  month_rate = mean(determined_month),
  fortnight_rate = mean(determined_fortnight),
  week_rate = mean(determined_week)
)]

# Verify nesting (always TRUE by construction)
crosswalk[determined_fortnight, all(determined_month)]
crosswalk[determined_week, all(determined_fortnight)]

# Apply to a specific dataset
result <- pnadc_apply_periods(pnadc_2023, crosswalk,
                              weight_var = "V1028",
                              anchor = "quarter")
} # }