Experimental Period Identification Strategies

Three experimental strategies are available. All of them respect the period nesting (weeks within fortnights, fortnights within months):

probabilistic: When the date range is narrowed to exactly 2 possible periods, classifies based on where most of the date interval falls. Assigns only when confidence is at least confidence_threshold.
upa_aggregation: Extends strictly identified periods to other observations in the same UPA-V1014 within the quarter, if a sufficient proportion already have strict identification.
both: Sequentially applies probabilistic strategy first, then UPA aggregation on top. Guarantees identification rate >= max of individual strategies.

Usage

pnadc_experimental_periods(
  crosswalk,
  strategy = c("probabilistic", "upa_aggregation", "both"),
  confidence_threshold = 0.9,
  upa_proportion_threshold = 0.5,
  verbose = TRUE
)

Arguments

crosswalk: A crosswalk data.table from pnadc_identify_periods()
strategy: Character specifying which strategy to apply. Options: "probabilistic", "upa_aggregation", "both"
confidence_threshold: Numeric (0-1). Minimum confidence required to assign a probabilistic period. Used by the "probabilistic" and "both" strategies. Default 0.9.
upa_proportion_threshold: Numeric (0-1). Minimum proportion of a UPA's observations (within the quarter) that must be strictly identified, and agree on the period, before that period is extended to the unidentified observations. Used by the "upa_aggregation" and "both" strategies. Default 0.5.
verbose: Logical. If TRUE, print progress information. Default TRUE.

Value

A modified crosswalk with additional columns. Output is directly compatible with pnadc_apply_periods():

ref_month_in_quarter, ref_month_in_year, ref_month_yyyymm: Month position (combined strict + experimental, strict takes priority)
ref_fortnight_in_month, ref_fortnight_in_quarter, ref_fortnight_yyyyff: Fortnight position (combined strict + experimental)
ref_week_in_month, ref_week_in_quarter, ref_week_yyyyww: Week position (combined strict + experimental)
determined_month, determined_fortnight, determined_week: TRUE if period is assigned (strictly or experimentally)
determined_probable_month, determined_probable_fortnight, determined_probable_week: TRUE if period was assigned by probabilistic strategy
probabilistic_assignment: TRUE if any period was assigned experimentally (vs strictly deterministic)
week_1_start, week_1_end, ..., week_4_start, week_4_end: IBGE week boundaries for the assigned month

Details

Provides experimental strategies for improving period identification rates beyond the standard deterministic algorithm. All strategies respect the nested identification hierarchy: weeks require fortnights, fortnights require months.

Nesting Enforcement

All strategies enforce proper nesting:

Fortnights can only be assigned if month is identified (strictly OR experimentally)
Weeks can only be assigned if fortnight is identified (strictly OR experimentally)
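
The two rules above can be sketched as a masking step on logical flags. This is a minimal illustration in base R, not the package's code: enforce_nesting() is a hypothetical helper, and treating NA as "not identified" is an assumption.

```r
# Hypothetical helper (not part of the package): clears child-period flags
# wherever the parent period is not identified.
enforce_nesting <- function(month, fortnight, week) {
  # Assumption: NA counts as "not identified"
  month     <- month %in% TRUE
  fortnight <- (fortnight %in% TRUE) & month      # fortnights require months
  week      <- (week %in% TRUE) & fortnight       # weeks require fortnights
  data.frame(
    determined_month     = month,
    determined_fortnight = fortnight,
    determined_week      = week
  )
}

flags <- enforce_nesting(
  month     = c(TRUE, TRUE,  FALSE, NA),
  fortnight = c(TRUE, FALSE, TRUE,  TRUE),
  week      = c(TRUE, TRUE,  TRUE,  TRUE)
)
print(flags)
```

Only the first row keeps its week flag: in the other rows a missing month or fortnight removes every flag below it.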

Probabilistic Strategy

For each period type (processed in order: months, then fortnights, then weeks):

Check that the required parent period is identified
If bounds are narrowed to exactly 2 sequential periods, calculate which period contains most of the date interval
Calculate confidence based on the proportion of interval in the likely period (0-1)
Only assign if confidence >= confidence_threshold

For months: aggregates at UPA-V1014 level across all quarters (like strict algorithm)
For fortnights and weeks: works at household level within quarter
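
The classification and threshold steps might look roughly like the sketch below for the month case. It is an illustration under stated assumptions, not the package's implementation: classify_two_months() is a hypothetical helper, periods are taken to be calendar months rather than IBGE reference periods, and confidence is assumed to be the share of candidate days falling in the more likely month.

```r
# Hypothetical helper (not part of the package). Assumes calendar months and
# a day-count confidence measure; the package's actual rule may differ.
classify_two_months <- function(date_min, date_max, confidence_threshold = 0.9) {
  days   <- seq(date_min, date_max, by = "day")   # every candidate date
  counts <- table(format(days, "%Y-%m"))          # candidate days per month
  if (length(counts) != 2L) {
    # Only ranges narrowed to exactly two periods are handled here
    return(list(month = NA_character_, confidence = NA_real_))
  }
  confidence <- as.numeric(max(counts) / sum(counts))
  likely     <- names(counts)[which.max(counts)]
  list(
    month      = if (confidence >= confidence_threshold) likely else NA_character_,
    confidence = confidence
  )
}

# 1 candidate day in January, 19 in February: confidence 0.95, assigned
high <- classify_two_months(as.Date("2023-01-31"), as.Date("2023-02-19"))

# 7 candidate days in January, 10 in February: below threshold, left unassigned
low <- classify_two_months(as.Date("2023-01-25"), as.Date("2023-02-10"))
```

Lowering confidence_threshold in this sketch would assign the second case as well, which is the trade-off the argument controls.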

UPA Aggregation Strategy

Extends strictly identified periods based on consensus within geographic groups:

Months: Uses UPA level within quarter
Fortnights/Weeks: Uses UPA level within quarter (all households in same UPA are interviewed in same fortnight/week within a quarter)

Calculate proportion of observations with strictly identified period
If proportion >= upa_proportion_threshold AND consensus exists, extend
Apply in nested order: months first, then fortnights, then weeks
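
The proportion and consensus checks can be sketched on toy data as follows. This is a base R illustration for a single quarter and a single period type. extend_by_upa() is a hypothetical helper, and reading "consensus" as "all strictly identified observations in the group agree" is an assumption.

```r
# Hypothetical helper (not part of the package). NA marks observations
# without a strict identification.
extend_by_upa <- function(upa, strict_month, upa_proportion_threshold = 0.5) {
  extended <- strict_month
  for (g in unique(upa)) {
    idx        <- which(upa == g)
    known      <- strict_month[idx][!is.na(strict_month[idx])]
    proportion <- length(known) / length(idx)
    consensus  <- length(unique(known)) == 1L    # assumed meaning of consensus
    if (proportion >= upa_proportion_threshold && consensus) {
      extended[idx] <- known[1L]                 # fill the unidentified rows
    }
  }
  extended
}

upa          <- c("A", "A", "A", "B", "B", "B", "C", "C")
strict_month <- c(1L,  1L,  NA,  2L,  NA,  NA,  1L,  3L)
extended     <- extend_by_upa(upa, strict_month)
```

In this toy input, group A is extended (2 of 3 identified and in agreement), group B is not (only 1 of 3 identified), and group C is left unchanged because its strict identifications disagree.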

Combined Strategy ("both")

Sequentially applies both strategies to maximize identification:

First, apply the probabilistic strategy (captures observations with narrow date ranges and high confidence)
Then, apply UPA aggregation (extends based on strict consensus within UPA/UPA-V1014 groups)

This guarantees that "both" identifies at least as many observations as either individual strategy alone. The strategies operate independently (UPA aggregation considers only strict identifications), so the result is the union of both strategies.
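
The union argument can be checked on toy flags. The vectors below are made-up illustrations, not package output. Each marks which of five observations a given source identifies.

```r
# Toy flags (illustrative only)
strict        <- c(TRUE,  FALSE, FALSE, FALSE, FALSE)
probabilistic <- c(FALSE, TRUE,  FALSE, TRUE,  FALSE)  # added by the probabilistic strategy
upa_extended  <- c(FALSE, FALSE, TRUE,  TRUE,  FALSE)  # added by UPA aggregation

identified_prob <- strict | probabilistic
identified_upa  <- strict | upa_extended
identified_both <- strict | probabilistic | upa_extended   # union of both

rates <- c(
  probabilistic   = mean(identified_prob),
  upa_aggregation = mean(identified_upa),
  both            = mean(identified_both)
)
print(rates)
```

Because the combined flag is a union, its rate can never fall below either individual rate.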

Integration with Weight Calibration

The output can be passed directly to pnadc_apply_periods() for weight calibration. The derived columns combine strict and experimental assignments, with strict taking priority. Use the probabilistic_assignment flag to filter if you only want strict determinations.

Note

These strategies produce "experimental" assignments, not strict determinations. The standard pnadc_identify_periods() function should be used for rigorous analysis. Experimental outputs are useful for:

Sensitivity analysis
Robustness checks
Research into identification algorithm improvements

Examples

if (FALSE) { # \dontrun{
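# Build the strict (deterministic) crosswalk first
crosswalk <- pnadc_identify_periods(pnadc_data)

# Add experimental assignments on top of the strict ones
crosswalk_exp <- pnadc_experimental_periods(
  crosswalk,
  strategy = "probabilistic",
  confidence_threshold = 0.9
)

# Count strict vs. experimental assignments; a missing flag is treated as
# strict, matching the strict_only filter below
crosswalk_exp[, .(
  strict = sum(!is.na(ref_month_in_quarter) & !(probabilistic_assignment %in% TRUE)),
  experimental = sum(probabilistic_assignment, na.rm = TRUE),
  total = sum(determined_month)
)]

# Calibrate weights using the combined (strict + experimental) crosswalk
result <- pnadc_apply_periods(pnadc_data, crosswalk_exp,
                              weight_var = "V1028", anchor = "quarter")

# Keep only strict determinations
strict_only <- crosswalk_exp[
  probabilistic_assignment == FALSE | is.na(probabilistic_assignment)
]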
} # }