Checks that input data has required columns for the specified processing.
Value
If stop_on_error = TRUE, returns invisibly when the data are valid, or stops with an error otherwise.
If stop_on_error = FALSE, returns a list with:
valid: Logical indicating if data passed all validations
issues: Named list of validation issues found (empty if none)
n_rows: Number of rows in input data
n_cols: Number of columns in input data
join_keys_available: Character vector of available join key columns
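As a rough illustration of how the list returned in non-stop mode might be consumed, the sketch below turns the issues element into one message per issue. The helper format_issues_sketch is hypothetical and not part of the package, and the result object is a hand-built stand-in that only mimics the documented shape; its values are placeholders, not real output.

```r
# Hand-built stand-in for the list described above (placeholder values,
# not produced by validate_pnadc).
result <- list(
  valid = FALSE,
  issues = list(missing_ref_month = c("UPA", "V1008")),
  n_rows = 1L,
  n_cols = 2L,
  join_keys_available = character(0)  # placeholder: left empty here
)

# Hypothetical helper: one "name: values" string per reported issue.
format_issues_sketch <- function(result) {
  if (isTRUE(result$valid)) {
    return(character(0))
  }
  vapply(
    names(result$issues),
    function(nm) paste0(nm, ": ", paste(result$issues[[nm]], collapse = ", ")),
    character(1),
    USE.NAMES = FALSE
  )
}

msgs <- format_issues_sketch(result)
if (length(msgs) > 0) message(paste(msgs, collapse = "\n"))
```

Collecting the issues first and reporting them together is one possible pattern when several files are checked in a loop and stopping at the first failure is not desirable.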
Details
The function performs the following validations:
Checks for required columns for reference period identification: Ano, Trimestre, UPA, V1008, V1014, V2008, V20081, V20082, V2009
Validates year range (2012-2100 for PNADC coverage)
Validates quarter values (must be 1-4)
Validates birth day values (must be 1-31 or 99 for unknown)
Validates birth month values (must be 1-12 or 99 for unknown)
Warns about unusual ages (outside 0-130 range)
If check_weights = TRUE, also validates weight-related columns: V1028, UF, posest, posest_sxi
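The checks listed above could be approximated in base R as follows. This is a minimal sketch, not the package's implementation: check_ranges_sketch and its issue names are hypothetical, it assumes V2008 is the birth day, V20081 the birth month and V2009 the age (matching the Examples), and it assumes missing values are skipped rather than flagged.

```r
# Hypothetical helper, for illustration only (not part of the package).
check_ranges_sketch <- function(data) {
  required <- c("Ano", "Trimestre", "UPA", "V1008", "V1014",
                "V2008", "V20081", "V20082", "V2009")
  issues <- list()

  # Column presence comes first: range checks need the columns to exist.
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    issues$missing_columns <- missing_cols
    return(issues)
  }

  # Count non-missing values that fall outside the allowed set
  # (assumption: NA is ignored here).
  n_bad <- function(x, allowed) sum(!is.na(x) & !(x %in% allowed))

  checks <- list(
    invalid_year        = n_bad(data$Ano, 2012:2100),
    invalid_quarter     = n_bad(data$Trimestre, 1:4),
    invalid_birth_day   = n_bad(data$V2008, c(1:31, 99)),
    invalid_birth_month = n_bad(data$V20081, c(1:12, 99))
  )
  # Keep only the checks that found at least one bad value.
  issues <- checks[vapply(checks, function(n) n > 0, logical(1))]

  # Unusual ages produce a warning rather than an issue entry.
  age <- data$V2009
  if (any(!is.na(age) & (age < 0 | age > 130))) {
    warning("Some ages fall outside the 0-130 range")
  }

  issues
}
```

An empty list means every check passed; otherwise each element holds the number of offending rows for that check, or the missing column names.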
See also
pnadc_identify_periods, which calls this function internally to validate input data.
Examples
# Minimal valid data (all 9 required columns)
sample_data <- data.frame(
Ano = 2023L, Trimestre = 1L, UPA = 110000001L,
V1008 = 1L, V1014 = 1L,
V2008 = 15L, V20081 = 3L, V20082 = 1990L, V2009 = 33L
)
validate_pnadc(sample_data)
# Data with missing columns returns issues (non-stop mode)
incomplete_data <- data.frame(Ano = 2023L, Trimestre = 1L)
result <- validate_pnadc(incomplete_data, stop_on_error = FALSE)
result$valid # FALSE
#> [1] FALSE
result$issues # lists missing columns
#> $missing_ref_month
#> [1] "UPA" "V1008" "V1014" "V2008" "V20081" "V20082" "V2009"
#>