Checks that the input data has the columns required for the requested processing and that key values fall within their expected ranges.

Usage

validate_pnadc(data, check_weights = FALSE, stop_on_error = TRUE)

Arguments

data

A data.frame or data.table with PNADC microdata

check_weights

Logical. If TRUE, also check for weight-related variables.

stop_on_error

Logical. If TRUE, stops with an error when validation fails. If FALSE, returns a validation report list instead.

Value

If stop_on_error = TRUE, returns invisibly when the data are valid and stops with an error otherwise. If stop_on_error = FALSE, returns a list with:

  • valid: Logical indicating if data passed all validations

  • issues: Named list of validation issues found (empty if none)

  • n_rows: Number of rows in input data

  • n_cols: Number of columns in input data

  • join_keys_available: Character vector of available join key columns
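
The report list can be inspected programmatically. The sketch below uses a hand-built stand-in for the returned list so that it runs without the package: the field names come from the list above, while the issue name, the join key values, and the helper summarise_report() are hypothetical and only illustrate one way to consume the report.

```r
# Hand-built stand-in for the list returned when stop_on_error = FALSE.
# Field names follow the Value section; the contents are invented for illustration.
report <- list(
  valid = FALSE,
  issues = list(missing_columns = c("V2008", "V20081")),  # hypothetical issue name
  n_rows = 1000L,
  n_cols = 7L,
  join_keys_available = c("Ano", "Trimestre", "UPA")      # hypothetical values
)

# Hypothetical helper: turn a report into a short human-readable message.
summarise_report <- function(report) {
  if (isTRUE(report$valid)) {
    return(sprintf("OK: %d rows, %d columns", report$n_rows, report$n_cols))
  }
  # One line per named issue, listing its contents.
  lines <- vapply(names(report$issues), function(nm) {
    paste0(nm, ": ", paste(report$issues[[nm]], collapse = ", "))
  }, character(1))
  paste(c("Validation failed", lines), collapse = "\n")
}

cat(summarise_report(report), "\n")
```

Branching on valid first keeps the success path cheap, and iterating over names(issues) avoids hard-coding issue names that the package may label differently.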

Details

The function performs the following validations:

  • Checks for required columns for reference period identification: Ano, Trimestre, UPA, V1008, V1014, V2008, V20081, V20082, V2009

  • Validates year range (2012-2100 for PNADC coverage)

  • Validates quarter values (must be 1-4)

  • Validates birth day values (must be 1-31 or 99 for unknown)

  • Validates birth month values (must be 1-12 or 99 for unknown)

  • Warns about unusual ages (outside 0-130 range)

  • If check_weights = TRUE, also validates weight-related columns: V1028, UF, posest, posest_sxi
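
The checks listed above can be approximated in base R. This is a minimal sketch, not the package's implementation: sketch_validate() and the issue names are hypothetical, the mapping of V2008, V20081 and V2009 to birth day, birth month and age is assumed from the PNADC variable dictionary, and the weight-related checks and join_keys_available are omitted.

```r
# Illustrative re-implementation of the documented checks; NOT the package source.
required_cols <- c("Ano", "Trimestre", "UPA", "V1008", "V1014",
                   "V2008", "V20081", "V20082", "V2009")

sketch_validate <- function(data) {  # hypothetical helper name
  issues <- list()

  # 1. Required columns for reference period identification.
  missing_cols <- setdiff(required_cols, names(data))
  if (length(missing_cols) > 0) issues$missing_columns <- missing_cols

  # Convert via character so factor columns use their labels, not level codes.
  num <- function(col) suppressWarnings(as.numeric(as.character(data[[col]])))
  # TRUE when the column exists and holds a value outside `allowed`
  # (missing values count as outside in this sketch).
  bad <- function(col, allowed) {
    col %in% names(data) && any(!(num(col) %in% allowed))
  }

  # 2-5. Range checks, skipped when the column is absent.
  if (bad("Ano", 2012:2100))         issues$invalid_years <- TRUE
  if (bad("Trimestre", 1:4))         issues$invalid_quarters <- TRUE
  if (bad("V2008", c(1:31, 99)))     issues$invalid_birth_days <- TRUE    # assumed: birth day
  if (bad("V20081", c(1:12, 99)))    issues$invalid_birth_months <- TRUE  # assumed: birth month

  # 6. Unusual ages produce a warning, not a validation failure.
  if ("V2009" %in% names(data)) {                                         # assumed: age
    age <- num("V2009")
    if (any(age < 0 | age > 130, na.rm = TRUE)) {
      warning("Ages outside the 0-130 range found")
    }
  }

  list(valid = length(issues) == 0, issues = issues,
       n_rows = nrow(data), n_cols = ncol(data))
}

# Toy one-row inputs, invented for illustration.
good <- data.frame(Ano = 2023, Trimestre = 1, UPA = 1, V1008 = 1, V1014 = 1,
                   V2008 = 15, V20081 = 6, V20082 = 1990, V2009 = 33)
bad_data <- transform(good, Trimestre = 5, V20081 = 13)

sketch_validate(good)$valid
names(sketch_validate(bad_data)$issues)
```

Collecting every issue before returning, rather than stopping at the first failure, mirrors the report-style output described for stop_on_error = FALSE.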

See also

pnadc_identify_periods(), which calls this function internally to validate input data.

Examples

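if (FALSE) { # \dontrun{
# my_data: PNADC microdata loaded as a data.frame or data.table
validate_pnadc(my_data)
validate_pnadc(my_data, check_weights = TRUE)
# Collect a validation report instead of stopping on failure
report <- validate_pnadc(my_data, stop_on_error = FALSE)
report$valid
} # }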