Find optimal decay parameters (alpha) for spatial interpolation

Optimizes the per-tract-per-bracket decay parameters that minimize the error between interpolated values and known population counts. Uses per-bracket SGD with column normalization and log-barrier penalty inside torch autograd. Each demographic bracket gets its own per-tract kernel; gradients flow through all computations. Works on both CPU and GPU (CUDA/MPS).

Usage

optimize_alpha(
  time_matrix,
  pop_matrix,
  source_matrix,
  row_targets = NULL,
  optim = optim_control(),
  offset = 1,
  verbose = TRUE
)

Arguments

time_matrix: Numeric matrix [n x m]. Raw travel times. Rows = target zones, columns = source points.
pop_matrix: Numeric matrix [n x k]. Known population per zone, with k demographic groups as columns.
source_matrix: Numeric matrix [m x k]. Known counts at source points (e.g., registered voters by age group).
row_targets: Numeric vector of length n, or NULL. Target row sums for balancing. Each element specifies how much weight a zone should attract, proportional to its share of total population. If NULL (default), auto-computed as rowSums(pop_matrix) / sum(pop_matrix) * m.
optim: An optim_control() object with optimization parameters (max_epochs, lr_init, convergence_tol, patience, barrier_mu, alpha_init, alpha_min, use_gpu, device, dtype). Default: optim_control().
offset: Numeric. Value added to travel times before exponentiation (power kernel only; ignored when kernel = "exponential"). Default: 1.
verbose: Logical. Print progress messages? Default: TRUE.

Value

A list of class "interpElections_optim" with components:

alpha: Numeric matrix [n x k]. Optimal per-tract-per-bracket decay parameters. Each row is a census tract, each column is a demographic bracket. Inactive brackets (zero population or voters) are filled with 1.
value: Numeric. Poisson deviance at optimum.
loss: Numeric. Full loss at optimum (deviance + barrier + entropy).
W: Numeric matrix [n x m]. Column-normalized weight matrix from the best-epoch. Use directly for interpolation via W \%*\% data. Column sums are approximately 1.
method: Character. Method used (e.g., "pb_sgd_colnorm_cpu", "pb_sgd_colnorm_cuda").
convergence: Integer. 0 = converged (gradient-based or window-based early stopping); 1 = stopped at max_epochs.
epochs: Integer. Number of epochs completed.
steps: Integer. Total number of SGD gradient steps.
elapsed: difftime object. Wall-clock time.
message: Character. Additional information.
history: Numeric vector. Full-dataset loss at each epoch.
grad_norm_final: Numeric. Final gradient norm (theta-space).
grad_history: Numeric vector. Gradient norm (theta-space) at each epoch.
lr_history: Numeric vector. Learning rate at each epoch.
kernel: Character. Kernel used ("power" or "exponential").

Details

The optimization requires the torch R package. Install it with setup_torch() if not already available.

Kernel: Two kernel functions are available, controlled via the kernel field in optim_control():

Power (default): \(K(t) = (t + \text{offset})^{-\alpha}\). Classic inverse distance weighting.
Exponential: \(K(t) = \exp(-\alpha \cdot t)\). Lighter tail; relative decay increases with distance. Does not use offset.

Parameterization: alpha[i,b] is reparameterized as alpha = alpha_min + softplus(theta) with theta unconstrained, where softplus(x) = log(1 + exp(x)). With the default alpha_min = 1 (power kernel), alpha is always at least 1 (inverse-distance decay or steeper). The exponential kernel defaults to alpha_min = 0. Set alpha_min = 0 for unconstrained optimization (similar to legacy exp(theta)).

Epoch structure: Each epoch is one full-data gradient step with exact gradients (column sums require all tracts). The loss reported at each epoch is the true loss evaluated on the full dataset.

Two execution paths:

CPU (default): use_gpu = FALSE or NULL. Uses torch on CPU device. Fast for small/medium problems (< 2000 tracts).
GPU: use_gpu = TRUE. Uses CUDA or MPS. Faster for large problems (> 2000 tracts).

GPU memory usage is bounded by 2 * ka * n * m * bytes_per_elem.

Examples

if (FALSE) { # \dontrun{
tt <- matrix(c(2, 5, 3, 4, 6, 2), nrow = 2)
pop <- matrix(c(100, 200), nrow = 2)
src <- matrix(c(80, 120, 100), nrow = 3)
result <- optimize_alpha(tt, pop, src, verbose = FALSE)
result$alpha

# With custom control
result <- optimize_alpha(tt, pop, src,
  optim = optim_control(use_gpu = TRUE, max_epochs = 5000),
  verbose = FALSE)
} # }