Creates a control object for the SGD optimizer in optimize_alpha().
All parameters have sensible defaults; override only what you need.
Usage
optim_control(
max_epochs = 50000L,
lr_init = 0.05,
convergence_tol = 1e-05,
patience = 100L,
barrier_mu = 1,
entropy_mu = 0,
target_eff_src = NULL,
dual_eta = 1,
alpha_init = 2,
alpha_min = NULL,
kernel = "power",
use_gpu = NULL,
device = NULL,
dtype = "float32",
force_chunked = NULL,
chunk_size = NULL,
report_every = NULL,
perf_log = NULL
)
Arguments
- max_epochs
Integer. Maximum number of epochs (full passes through all tracts). The optimizer may stop earlier if convergence is detected. Default: 50000.
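The early stopping mentioned above can be pictured with a small sketch of a window-based convergence check. This is a simplified illustration under assumed names (`converged`, `loss_hist`), not the package's internals:

```r
# Simplified sketch of window-based convergence detection: stop when the
# relative improvement over the last `patience` epochs falls below `tol`.
converged <- function(loss_hist, patience = 100L, tol = 1e-5) {
  n <- length(loss_hist)
  if (n <= patience) return(FALSE)
  prev <- loss_hist[n - patience]
  rel_improvement <- (prev - loss_hist[n]) / abs(prev)
  rel_improvement < tol
}

converged(rep(1.0, 200), patience = 100L)                # flat loss: converged
converged(seq(2, 1, length.out = 200), patience = 100L)  # still improving: not yet
```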
- lr_init
Numeric. Initial Adam learning rate. In adaptive mode (`target_eff_src` set), a constant learning rate is used throughout (the dual update changes the loss landscape every epoch, making monotone LR decay counterproductive). In non-adaptive mode, the learning rate follows SGDR cosine annealing with warm restarts. Default: 0.05.
- convergence_tol
Numeric. Minimum relative improvement over the patience window for the optimizer to keep running. If the deviance (or total loss in non-adaptive mode) improves by less than
a `convergence_tol` fraction over the last `patience` epochs, the optimizer converges. Default: 1e-5.
- patience
Integer. Lookback window (in epochs) for the window-based convergence criterion. The optimizer compares the current deviance against the deviance
`patience` epochs ago and converges when the relative improvement is less than `convergence_tol`. Works alongside a gradient-based criterion that triggers when the relative gradient norm is small for several consecutive epochs. Convergence is never declared before `max(3 * patience, 500)` epochs to allow sufficient exploration. Default: 100.
- barrier_mu
Numeric. Strength of the log-barrier penalty that prevents any census tract from receiving zero predicted voters. Set to 0 to disable. Default: 1.
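To illustrate why this term keeps predicted counts away from zero, here is a hypothetical log-barrier sketch (not the package's loss code):

```r
# Hypothetical log-barrier sketch: the penalty grows without bound as
# any tract's predicted count approaches zero. Not the package's code.
log_barrier <- function(pred, barrier_mu = 1) {
  -barrier_mu * sum(log(pred))
}

log_barrier(c(10, 5, 1))     # modest penalty
log_barrier(c(10, 5, 1e-6))  # much larger: one tract near zero
```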
- entropy_mu
Numeric. Strength of the Shannon entropy penalty that discourages diffuse weight distributions (many effective sources per tract). Higher values push the optimizer to concentrate weights on fewer nearby stations, reducing effective sources at the cost of higher Poisson deviance. The penalty uses the sum of per-tract entropies, consistent with the deviance and barrier terms. Set to 0 to disable. Default: 0. Ignored when
`target_eff_src` is set (dual ascent mode).
- target_eff_src
Numeric or NULL. Target number of effective sources per tract. When set (not NULL), enables dual ascent: the optimizer automatically adapts
`entropy_mu` during training to reach this target. Must be > 1. Mutually exclusive with manual `entropy_mu` tuning. Default: NULL (disabled).
- dual_eta
Numeric. Scaling factor for the per-epoch additive dual update of
`entropy_mu` (augmented Lagrangian). Each epoch: `entropy_mu += dual_eta * rho / T_damp * (mean_H - log(target))`, where `rho = m` (the number of source stations) and `T_damp = 500` is a fixed dampening constant. The quadratic penalty `(m/2) * n * (mean_H - log(target))^2` in the loss does the heavy lifting; the dual update ensures exactness. Default: 1.0.
- alpha_init
Numeric scalar, vector of length n, or matrix [n x k]. Initial guess for alpha. A scalar is recycled to all tracts and brackets. Default: 2.
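The recycling rule can be pictured with a small sketch; the helper name `expand_alpha` is hypothetical, not part of the package:

```r
# Hypothetical sketch of how alpha_init could be recycled into an
# n x k matrix (n tracts, k brackets). Not the package's actual code.
expand_alpha <- function(alpha_init, n, k) {
  if (is.matrix(alpha_init)) return(alpha_init)
  # A scalar fills every cell; a length-n vector repeats across columns
  # (matrix() fills column-wise, recycling the data).
  matrix(alpha_init, nrow = n, ncol = k)
}

a <- expand_alpha(2, n = 4, k = 3)
dim(a)       # 4 x 3
all(a == 2)  # every cell gets the scalar
```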
- alpha_min
Numeric or NULL. Lower bound for alpha values. The reparameterization becomes
`alpha = alpha_min + softplus(theta)`. If NULL (default), the bound is set based on `kernel`: 1 for `"power"` (linear-or-steeper decay), 0 for `"exponential"`.
- kernel
Character. Kernel function for spatial decay.
"power"(default): \(K(t) = (t + \text{offset})^{-\alpha}\), the classic inverse distance weighting kernel."exponential": \(K(t) = \exp(-\alpha \cdot t)\), which has a light tail (relative decay increases with distance). The exponential kernel does not need an offset and allowsalpha_min = 0.- use_gpu
Logical or NULL. If
`TRUE`, use the GPU (CUDA or MPS). If `FALSE`, use the CPU. If `NULL` (default), reads the package option `interpElections.use_gpu` (set via `use_gpu()`).
- device
Character or NULL. Torch device:
"cuda","mps", or"cpu". Only used when GPU is enabled. Default: NULL (auto-detect).- dtype
Character. Torch dtype:
"float32"or"float64". Default:"float32". Float32 halves memory usage with negligible precision loss.- report_every
Integer or NULL. How often to print epoch progress.
`NULL` (default) prints roughly 20 progress lines in total (`max_epochs %/% 20`). Set a smaller value (e.g., 10) for detailed per-batch reporting with timing. With `report_every = 1`, every epoch is printed.
- perf_log
Character or NULL. Path to a CSV file for saving per-epoch performance metrics (loss components, gradient norm, lr, alpha stats, effective sources, wall-clock time per epoch). Useful for benchmarking and studying convergence.
`NULL` (default) disables the log. The file is written once after the epoch loop completes (or at convergence).
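The two kernels documented under the `kernel` argument can be sketched directly. The offset value of 1 is an assumed illustrative choice; the package may use a different one:

```r
# Sketch of the two decay kernels behind the `kernel` argument.
# The offset of 1 is an illustrative assumption.
power_kernel <- function(t, alpha, offset = 1) (t + offset)^(-alpha)
exp_kernel   <- function(t, alpha) exp(-alpha * t)

# The power kernel is heavy-tailed: at large distances it retains far
# more weight than the light-tailed exponential.
power_kernel(100, alpha = 2)  # ~1e-4
exp_kernel(100, alpha = 2)    # essentially 0 (exp(-200))
```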
Examples
# Default settings
optim_control()
#> interpElections optimization control:
#> max_epochs: 50000
#> lr_init: 0.05
#> convergence_tol: 1e-05
#> patience: 100
#> barrier_mu: 1
#> alpha_init: 2
#> alpha_min: 1
#> kernel: power
#> use_gpu: NULL (auto)
#> device: auto
#> dtype: float32
# Use GPU with more epochs
optim_control(use_gpu = TRUE, max_epochs = 5000)
#> interpElections optimization control:
#> max_epochs: 5000
#> lr_init: 0.05
#> convergence_tol: 1e-05
#> patience: 100
#> barrier_mu: 1
#> alpha_init: 2
#> alpha_min: 1
#> kernel: power
#> use_gpu: TRUE
#> device: auto
#> dtype: float32
# Stricter convergence
optim_control(convergence_tol = 1e-6, patience = 200)
#> interpElections optimization control:
#> max_epochs: 50000
#> lr_init: 0.05
#> convergence_tol: 1e-06
#> patience: 200
#> barrier_mu: 1
#> alpha_init: 2
#> alpha_min: 1
#> kernel: power
#> use_gpu: NULL (auto)
#> device: auto
#> dtype: float32