Experimental Period Identification Strategies
Source: R/pnadc-experimental-periods.R
Three experimental strategies are available, all properly nested by period:
probabilistic: For narrow ranges (2 possible periods), classifies based on where most of the date interval falls. Assigns only when confidence exceeds threshold.
upa_aggregation: Extends strictly identified periods to other observations in the same UPA-V1014 within the quarter, if a sufficient proportion already have strict identification.
both: Sequentially applies probabilistic strategy first, then UPA aggregation on top. Guarantees identification rate >= max of individual strategies.
Usage
pnadc_experimental_periods(
crosswalk,
strategy = c("probabilistic", "upa_aggregation", "both"),
confidence_threshold = 0.9,
upa_proportion_threshold = 0.5,
verbose = TRUE
)
Arguments
- crosswalk
A crosswalk data.table from pnadc_identify_periods()
- strategy
Character specifying which strategy to apply. Options: "probabilistic", "upa_aggregation", "both"
- confidence_threshold
Numeric (0-1). Minimum confidence required to assign a probabilistic period. Used by probabilistic and combined strategies. Default 0.9.
- upa_proportion_threshold
Numeric (0-1). Minimum proportion of UPA observations (within quarter) that must have strict identification with consensus for extending to unidentified observations. Default 0.5.
- verbose
Logical. If TRUE, print progress information.
Value
A modified crosswalk with additional columns. Output is directly compatible
with pnadc_apply_periods():
ref_month_in_quarter, ref_month_in_year, ref_month_yyyymm: Month position (combined strict + experimental, strict takes priority)
ref_fortnight_in_month, ref_fortnight_in_quarter, ref_fortnight_yyyyff: Fortnight position (combined strict + experimental)
ref_week_in_month, ref_week_in_quarter, ref_week_yyyyww: Week position (combined strict + experimental)
determined_month, determined_fortnight, determined_week: TRUE if period is assigned (strictly or experimentally)
determined_probable_month, determined_probable_fortnight, determined_probable_week: TRUE if period was assigned by probabilistic strategy
probabilistic_assignment: TRUE if any period was assigned experimentally (vs strictly deterministic)
week_1_start, week_1_end, ..., week_4_start, week_4_end: IBGE week boundaries for the assigned month
Details
Provides experimental strategies for improving period identification rates beyond the standard deterministic algorithm. All strategies respect the nested identification hierarchy: weeks require fortnights, fortnights require months.
Nesting Enforcement
All strategies enforce proper nesting:
Fortnights can only be assigned if month is identified (strictly OR experimentally)
Weeks can only be assigned if fortnight is identified (strictly OR experimentally)
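The nesting rules above can be illustrated with a minimal base R sketch. The helper `enforce_nesting()` is hypothetical and not part of the package; it only reuses column names from the Value section and assumes that an unidentified period is stored as NA.

```r
# Hypothetical helper (not part of the package API): blank out any child
# period whose parent period is not identified.
enforce_nesting <- function(df) {
  # A fortnight is kept only if the month is identified
  df$ref_fortnight_in_month[is.na(df$ref_month_in_quarter)] <- NA_integer_
  # A week is kept only if the (possibly just blanked) fortnight is identified
  df$ref_week_in_month[is.na(df$ref_fortnight_in_month)] <- NA_integer_
  df
}

# Toy data: row 2 has no month, row 3 has a month but no fortnight
toy <- data.frame(
  ref_month_in_quarter   = c(1L, NA, 2L),
  ref_fortnight_in_month = c(1L, 2L, NA),
  ref_week_in_month      = c(2L, 3L, 1L)
)
nested <- enforce_nesting(toy)
```

In this toy data, only the first row keeps its fortnight and week, because it is the only row where the full parent chain is identified.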
Probabilistic Strategy
For each period type (processed in order: months, then fortnights, then weeks):
Check that the required parent period is identified
If bounds are narrowed to exactly 2 sequential periods, calculate which period contains most of the date interval
Calculate confidence based on the proportion of interval in the likely period (0-1)
Only assign if confidence >= confidence_threshold
For months: aggregates at UPA-V1014 level across all quarters (like strict algorithm)
For fortnights and weeks: works at household level within quarter
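The confidence step can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the helper `classify_two_periods()` and its arguments are hypothetical, and confidence is assumed to be the share of calendar days of the candidate interval that fall in the more likely of two adjacent periods.

```r
# Hypothetical helper (not part of the package API).
# date_min, date_max: bounds of the candidate date interval (inclusive)
# boundary: first day of the second of the two candidate periods
classify_two_periods <- function(date_min, date_max, boundary, threshold = 0.9) {
  days <- seq(date_min, date_max, by = "day")
  share_first <- mean(days < boundary)           # share of days in period 1
  confidence <- max(share_first, 1 - share_first)
  likely <- if (share_first >= 0.5) 1L else 2L
  list(
    likely = likely,
    confidence = confidence,
    # Assign only when the confidence reaches the threshold
    assigned = if (confidence >= threshold) likely else NA_integer_
  )
}

# 1 day in January, 19 days in February: the second period is assigned
narrow <- classify_two_periods(
  date_min = as.Date("2023-01-31"),
  date_max = as.Date("2023-02-19"),
  boundary = as.Date("2023-02-01")
)

# 10 days on each side of the boundary: nothing is assigned
split <- classify_two_periods(
  date_min = as.Date("2023-01-22"),
  date_max = as.Date("2023-02-10"),
  boundary = as.Date("2023-02-01")
)
```

Raising `threshold` trades identification rate for fewer misclassified observations, which is the role `confidence_threshold` plays in the real function.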
UPA Aggregation Strategy
Extends strictly identified periods based on consensus within geographic groups:
Months: Uses UPA level within quarter
Fortnights/Weeks: Uses UPA level within quarter (all households in same UPA are interviewed in same fortnight/week within a quarter)
Calculate proportion of observations with strictly identified period
If proportion >= upa_proportion_threshold AND consensus exists, extend
Apply in nested order: months first, then fortnights, then weeks
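The proportion-and-consensus rule above might look like this for a single period type. The helper `extend_within_group()` is hypothetical and written in base R for self-containment (the package itself works on data.table); "consensus" is assumed to mean that all strictly identified observations in the group share the same value.

```r
# Hypothetical helper (not part of the package API).
# period: strictly identified period per observation (NA if unidentified)
# group:  grouping key, e.g. UPA within quarter
extend_within_group <- function(period, group, threshold = 0.5) {
  out <- period
  for (g in unique(group)) {
    idx <- group == g
    known <- period[idx][!is.na(period[idx])]
    share <- length(known) / sum(idx)            # proportion strictly identified
    consensus <- length(unique(known)) == 1L     # all known values agree
    if (share >= threshold && consensus) {
      # Extend the agreed value to the unidentified observations of the group
      out[idx & is.na(period)] <- known[1L]
    }
  }
  out
}

upa   <- rep(c("A", "B", "C"), each = 3)
month <- c(1L, 1L, NA,   # A: 2/3 identified and in agreement -> extended
           2L, 3L, NA,   # B: 2/3 identified but conflicting  -> not extended
           1L, NA, NA)   # C: only 1/3 identified             -> not extended
extended <- extend_within_group(month, upa, threshold = 0.5)
```

Only strictly identified values feed the proportion and the consensus check in this sketch, matching the statement that UPA aggregation considers only strict identifications.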
Combined Strategy ("both")
Sequentially applies both strategies to maximize identification:
First, apply the probabilistic strategy (captures observations with narrow date ranges and high confidence)
Then, apply UPA aggregation (extends based on strict consensus within UPA/UPA-V1014 groups)
This guarantees that "both" identifies at least as many observations as either individual strategy alone. The strategies operate independently (UPA aggregation considers only strict identifications), so the result is the union of both strategies.
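A toy illustration of the union argument, with made-up vectors standing in for the assignments each strategy would contribute. The helper `first_non_na()` is hypothetical, and the fill order (strict, then probabilistic, then UPA aggregation) is an assumption based on the sequential description above.

```r
# Hypothetical helper: take the first non-NA value across vectors, in order.
first_non_na <- function(...) {
  Reduce(function(a, b) ifelse(is.na(a), b, a), list(...))
}

# Made-up assignments for five observations (NA = not identified)
strict    <- c(1L, NA, NA, NA, NA)
prob_only <- c(NA, 2L, NA, 2L, NA)  # what the probabilistic step would add
upa_only  <- c(NA, NA, 3L, 2L, NA)  # what UPA aggregation would add

prob_result <- first_non_na(strict, prob_only)
upa_result  <- first_non_na(strict, upa_only)
both_result <- first_non_na(strict, prob_only, upa_only)

n_identified <- c(
  probabilistic   = sum(!is.na(prob_result)),
  upa_aggregation = sum(!is.na(upa_result)),
  both            = sum(!is.na(both_result))
)
```

Because the combined result is identified wherever either strategy identifies an observation, its count cannot fall below the larger of the two individual counts.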
Integration with Weight Calibration
The output can be passed directly to pnadc_apply_periods() for weight calibration.
The derived columns combine strict and experimental assignments, with strict taking priority. Use the
probabilistic_assignment flag to filter if you only want strict determinations.
Note
These strategies produce "experimental" assignments, not strict determinations.
The standard pnadc_identify_periods() function should be used for
rigorous analysis. Experimental outputs are useful for:
Sensitivity analysis
Robustness checks
Research into identification algorithm improvements
See also
pnadc_identify_periods to build the crosswalk that this function modifies.
pnadc_apply_periods to apply period crosswalk and calibrate weights.
Examples
if (FALSE) { # \dontrun{
  # Build standard crosswalk
  crosswalk <- pnadc_identify_periods(pnadc_data)

  # Apply experimental strategies
  crosswalk_exp <- pnadc_experimental_periods(
    crosswalk,
    strategy = "probabilistic",
    confidence_threshold = 0.9
  )

  # Check how many additional assignments we get
  crosswalk_exp[, .(
    strict = sum(!is.na(ref_month_in_quarter) & !probabilistic_assignment),
    experimental = sum(probabilistic_assignment, na.rm = TRUE),
    total = sum(determined_month)
  )]

  # Use directly with calibration (experimental output is compatible)
  result <- pnadc_apply_periods(pnadc_data, crosswalk_exp,
                                period = "month", calibrate = TRUE)

  # Or filter to only strict determinations
  strict_only <- crosswalk_exp[probabilistic_assignment == FALSE | is.na(probabilistic_assignment)]
} # }