
Creates a control object for the SGD optimizer in optimize_alpha(). All parameters have sensible defaults; override only what you need.

Usage

optim_control(
  max_epochs = 50000L,
  lr_init = 0.05,
  convergence_tol = 1e-05,
  patience = 100L,
  barrier_mu = 1,
  entropy_mu = 0,
  target_eff_src = NULL,
  dual_eta = 1,
  alpha_init = 2,
  alpha_min = NULL,
  kernel = "power",
  use_gpu = NULL,
  device = NULL,
  dtype = "float32",
  force_chunked = NULL,
  chunk_size = NULL,
  report_every = NULL,
  perf_log = NULL
)

Arguments

max_epochs

Integer. Maximum number of epochs (full passes through all tracts). The optimizer may stop earlier if convergence is detected. Default: 50000.

lr_init

Numeric. Initial ADAM learning rate. In adaptive mode (when target_eff_src is set), the learning rate is held constant throughout: the dual update changes the loss landscape every epoch, making monotone LR decay counterproductive. In non-adaptive mode, the learning rate follows SGDR cosine annealing with warm restarts. Default: 0.05.
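The non-adaptive schedule can be sketched as plain cosine annealing with warm restarts. This is illustrative only: the restart period T0 and floor lr_min below are assumptions, not the package's actual values.

```r
# SGDR-style schedule sketch: cosine decay from lr_init to lr_min within
# each cycle of T0 epochs, then a warm restart back to lr_init.
sgdr_lr <- function(epoch, lr_init = 0.05, lr_min = 0, T0 = 1000L) {
  t <- (epoch - 1L) %% T0  # position within the current restart cycle
  lr_min + 0.5 * (lr_init - lr_min) * (1 + cos(pi * t / T0))
}

sgdr_lr(1)     # start of a cycle: 0.05
sgdr_lr(501)   # mid-cycle: 0.025
sgdr_lr(1001)  # warm restart: back to 0.05
```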

convergence_tol

Numeric. Minimum relative improvement over the patience window for the optimizer to keep running. If the deviance (or total loss in non-adaptive mode) improves by less than convergence_tol fraction over the last patience epochs, the optimizer converges. Default: 1e-5.

patience

Integer. Lookback window (in epochs) for the window-based convergence criterion. The optimizer compares the current deviance against the deviance patience epochs ago and converges when the relative improvement is less than convergence_tol. Works alongside a gradient-based criterion that triggers when the relative gradient norm is small for several consecutive epochs. Convergence is never declared before max(3 * patience, 500) epochs to allow sufficient exploration. Default: 100.
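The window-based criterion above can be sketched as follows. The function and its argument names are hypothetical, not the package's internals; it checks the relative improvement over the last patience epochs and respects the minimum-exploration floor.

```r
# Hypothetical sketch of the window-based convergence test.
converged <- function(dev_history, patience = 100L, tol = 1e-5) {
  epoch <- length(dev_history)
  if (epoch < max(3L * patience, 500L)) return(FALSE)  # minimum exploration
  prev <- dev_history[epoch - patience]
  curr <- dev_history[epoch]
  (prev - curr) / abs(prev) < tol  # relative improvement over the window
}

# A deviance trace that has flattened out triggers convergence:
dev <- c(seq(100, 10, length.out = 500), rep(10, 200))
converged(dev)  # TRUE

# A trace still improving steadily does not:
converged(seq(100, 10, length.out = 700))  # FALSE
```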

barrier_mu

Numeric. Strength of the log-barrier penalty that prevents any census tract from receiving zero predicted voters. Set to 0 to disable. Default: 1.
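A minimal sketch of a log-barrier term of this kind (the package's exact form may differ): the penalty rises sharply as any tract's predicted count approaches zero, which is what keeps predictions strictly positive.

```r
# Illustrative log-barrier penalty on predicted voter counts:
# -log(pred) diverges to +Inf as any entry approaches zero.
barrier <- function(pred, mu = 1) -mu * sum(log(pred))

barrier(c(100, 50, 10))     # all tracts well away from zero: small penalty
barrier(c(100, 50, 0.001))  # one tract near zero: much larger penalty
```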

entropy_mu

Numeric. Strength of the Shannon entropy penalty that discourages diffuse weight distributions (many effective sources per tract). Higher values push the optimizer to concentrate weights on fewer nearby stations, reducing effective sources at the cost of higher Poisson deviance. The penalty uses the sum of per-tract entropies, consistent with the deviance and barrier terms. Set to 0 to disable. Default: 0. Ignored when target_eff_src is set (dual ascent mode).
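To make "effective sources" concrete, here is one common definition (the perplexity exp(H) of a tract's normalized weight row); the package's exact formula may differ, so treat this as illustrative.

```r
# Effective number of sources for one tract, taken here as exp(H),
# where H is the Shannon entropy (in nats) of the normalized weights.
eff_sources <- function(w) {
  p <- w / sum(w)
  H <- -sum(p * log(p))
  exp(H)
}

eff_sources(c(1, 1, 1, 1))         # uniform over 4 stations -> 4
eff_sources(c(10, 0.1, 0.1, 0.1))  # concentrated -> close to 1
```

Raising entropy_mu pushes each tract's weight row toward the concentrated case, lowering exp(H) at the cost of higher Poisson deviance.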

target_eff_src

Numeric or NULL. Target number of effective sources per tract. When set (not NULL), enables dual ascent: the optimizer automatically adapts entropy_mu during training to reach this target. Must be > 1. Mutually exclusive with manual entropy_mu tuning. Default: NULL (disabled).

dual_eta

Numeric. Scaling factor for the per-epoch additive dual update of entropy_mu (augmented Lagrangian). Each epoch: entropy_mu += dual_eta * rho / T_damp * (mean_H - log(target)), where rho = m (number of source stations) and T_damp = 500 is a fixed dampening constant. The quadratic penalty (m/2) * n * (mean_H - log(target))^2 in the loss does the heavy lifting; this dual update ensures exactness. Default: 1.0.
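The per-epoch update above can be written out directly. The inputs (m, mean_H, target) are illustrative values; the formula follows the text.

```r
# One additive dual-ascent step for entropy_mu (augmented Lagrangian).
dual_step <- function(entropy_mu, mean_H, target, m,
                      dual_eta = 1, T_damp = 500) {
  rho <- m  # penalty weight = number of source stations
  entropy_mu + dual_eta * rho / T_damp * (mean_H - log(target))
}

# Mean entropy above log(target): entropy_mu grows, concentrating weights.
dual_step(entropy_mu = 0, mean_H = log(6), target = 4, m = 250)

# At the target (mean_H == log(target)) the update is exactly zero.
dual_step(entropy_mu = 1, mean_H = log(4), target = 4, m = 250)  # 1
```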

alpha_init

Numeric scalar, vector of length n, or matrix [n x k]. Initial guess for alpha. A scalar is recycled to all tracts and brackets. Default: 2.

alpha_min

Numeric or NULL. Lower bound for alpha values. The reparameterization becomes alpha = alpha_min + softplus(theta). If NULL (default), the bound is set based on kernel: 1 for "power" (linear-or-steeper decay), 0 for "exponential".
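The reparameterization can be sketched directly; softplus is written here in a numerically stable form (the package's internal implementation may differ).

```r
# Stable softplus: log(1 + exp(theta)) without overflow for large |theta|.
softplus <- function(theta) log1p(exp(-abs(theta))) + pmax(theta, 0)

# alpha = alpha_min + softplus(theta) stays strictly above alpha_min
# for every real theta, so the bound is enforced by construction.
alpha_from_theta <- function(theta, alpha_min = 1) alpha_min + softplus(theta)

alpha_from_theta(c(-10, 0, 3))  # ~1.0000454, ~1.693, ~4.049
```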

kernel

Character. Kernel function for spatial decay. "power" (default): K(t) = (t + offset)^(-alpha), the classic inverse-distance-weighting kernel. "exponential": K(t) = exp(-alpha * t), which has a lighter tail than the power kernel (it decays faster at large distances). The exponential kernel needs no offset and allows alpha_min = 0.
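Both kernels written out as plain R functions; the offset value below is illustrative, not the package's internal choice.

```r
# Spatial decay kernels as functions of travel distance t.
power_kernel <- function(t, alpha, offset = 1) (t + offset)^(-alpha)
exp_kernel   <- function(t, alpha) exp(-alpha * t)

# The power kernel's heavier tail: at large t it retains far more mass
# relative to its value at small t than the exponential does.
t <- c(1, 10, 100)
power_kernel(t, alpha = 2)
exp_kernel(t, alpha = 0.5)
```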

use_gpu

Logical or NULL. If TRUE, use GPU (CUDA or MPS). If FALSE, use CPU. If NULL (default), reads the package option interpElections.use_gpu (set via use_gpu()).

device

Character or NULL. Torch device: "cuda", "mps", or "cpu". Only used when GPU is enabled. Default: NULL (auto-detect).

dtype

Character. Torch dtype: "float32" or "float64". Default: "float32". Float32 halves memory usage with negligible precision loss.

report_every

Integer or NULL. How often to print epoch progress. NULL (default) prints roughly 20 progress lines over the run (every max_epochs %/% 20 epochs). Set a smaller value (e.g., 10) for detailed per-batch reporting with timing; with report_every = 1, every epoch is printed.

perf_log

Character or NULL. Path to a CSV file for saving per-epoch performance metrics (loss components, gradient norm, lr, alpha stats, effective sources, wall-clock time per epoch). Useful for benchmarking and studying convergence. NULL (default) disables the log. The file is written once after the epoch loop completes (or at convergence).

Value

A list of class "interpElections_optim_control" with one element per parameter.

Examples

# Default settings
optim_control()
#> interpElections optimization control:
#>   max_epochs: 50000 
#>   lr_init: 0.05 
#>   convergence_tol: 1e-05 
#>   patience: 100 
#>   barrier_mu: 1 
#>   alpha_init: 2 
#>   alpha_min: 1 
#>   kernel: power 
#>   use_gpu: NULL (auto) 
#>   device: auto 
#>   dtype: float32 

# Use GPU with more epochs
optim_control(use_gpu = TRUE, max_epochs = 5000)
#> interpElections optimization control:
#>   max_epochs: 5000 
#>   lr_init: 0.05 
#>   convergence_tol: 1e-05 
#>   patience: 100 
#>   barrier_mu: 1 
#>   alpha_init: 2 
#>   alpha_min: 1 
#>   kernel: power 
#>   use_gpu: TRUE 
#>   device: auto 
#>   dtype: float32 

# Stricter convergence
optim_control(convergence_tol = 1e-6, patience = 200)
#> interpElections optimization control:
#>   max_epochs: 50000 
#>   lr_init: 0.05 
#>   convergence_tol: 1e-06 
#>   patience: 200 
#>   barrier_mu: 1 
#>   alpha_init: 2 
#>   alpha_min: 1 
#>   kernel: power 
#>   use_gpu: NULL (auto) 
#>   device: auto 
#>   dtype: float32