icml-2024-splotlight-posters-short.md from https://icml.cc/virtual/2024/events/2024SpotlightPosters
  • EfficientZero V2: A general sample-efficient RL framework excels in diverse control tasks (discrete/continuous, visual/low-dimensional), outperforming SoTA methods, including DreamerV3, in 50/66 benchmarks.
  • Gambling-Based Confidence Sequences: A novel gambling framework constructs tight, non-asymptotic confidence sequences for bounded random vectors, including categorical and probability-vector-valued observations, outperforming existing methods like the posterior-prior ratio martingale.
  • In-Context Learning Circuits: Mechanistic study reveals how induction heads, key to in-context learning in transformers, emerge through interactions of three identified sub-circuits during training.
  • Explaining Probabilistic Models with Distributional Values: This paper introduces distributional values, generalising cooperative game theory and value operators to provide fine-grained explanations of probabilistic models like vision and language models by tracking changes in model output rather than scalar probability.
  • MoleculeWalk: A data-efficient, interpretable model represents molecules as random walks over graph grammars, enabling superior molecule generation and property prediction for complex molecules by explicitly describing the hierarchical design space, with motifs as the design basis.
  • Second-Order Uncertainty Quantification: A distance-based approach using the Wasserstein distance satisfies desirable theoretical properties for uncertainty measures based on second-order probability distributions in ML classification.
  • Truly No-Regret Learning in CMDPs: This paper introduces the first primal-dual algorithm for CMDPs with provable sublinear regret without error cancellations, guaranteeing safety during learning, not just for the final policy.
  • Optimal Ridge Regularization for OOD Prediction: We identify conditions determining the sign of the optimal ridge regularization level under covariate and regression shifts for OOD prediction, revealing stark differences compared to the in-distribution setting.
  • Transolver: A fast Transformer PDE solver uses physics-aware attention to capture correlations in complex geometries, achieving state-of-the-art results on standard benchmarks and industrial simulations.
  • LLM-Modulo Frameworks: Auto-regressive LLMs cannot plan or self-verify, but can be effective knowledge sources in model-based planning/reasoning, enabling more flexible problem and preference specifications.
  • Finite Volume Features, Global Geometry Representations, and Residual Training for Deep Learning-based CFD: FVF, SV, and DID representations enable GNNs to leverage cell characteristics and global geometry for enhanced CFD simulations.
  • Optimal Acceleration for Minimax and Fixed-Point Problems is Not Unique: We present novel optimal algorithms, dual to existing anchor-based methods, that achieve the same worst-case rates using a distinct acceleration mechanism.
  • Pair-Align: Improves Graph Domain Adaptation (GDA) by mitigating conditional structure shift and label shift via edge and label weight adjustments, achieving superior performance on node classification tasks.
  • Counterfactual Simulatability of Natural Language Explanations: Evaluating whether LLM explanations allow humans to accurately predict model outputs on counterfactual inputs, revealing limitations in explanation precision.
  • Relaxing the Accurate Imputation Assumption: This work proposes novel doubly robust estimators for debiased collaborative filtering, which remain unbiased even with inaccurate pseudo-labelings or propensities, and shows superior performance on semi-synthetic and real-world datasets.
  • Fine-tuning RL: Forgetting pre-trained capabilities hinders RL fine-tuning, impacting transfer, but knowledge retention techniques mitigate this and improve performance, achieving SOTA in NetHack.
  • Navigating Scaling Laws: Adaptive models optimally traverse scaling laws, outperforming static models and reducing compute by changing shape during training.
  • Charms: Transfers expert tabular knowledge to enhance image classification by aligning image channels and tabular attributes using optimal transport and maximizing mutual information, improving both performance and interpretability.
  • TravelPlanner: A new benchmark for real-world travel planning that evaluates LLM agents against 4M data records and 1,225 curated intents/plans, revealing that even GPT-4 struggles (0.6% success rate).
  • QuRating: Selects high-quality pre-training data for LMs by training a QuRater model to learn scalar ratings from pairwise comparisons of texts based on human-defined qualities, leading to improved LM performance.
  • Tight Partial Identification of Causal Effects: Provides closed-form tight partial identification (PI) bounds for causal effects using only the marginal confounder distribution, without imposing additional assumptions like entropy or mutual information constraints.
  • Time Weaver: A novel diffusion-based model leverages heterogeneous metadata (categorical, continuous, time-variant) to improve time series generation and introduces a new evaluation metric for conditional generation specificity.
  • MoEs Unlock Parameter Scaling for Deep RL: Value-based networks with Soft MoEs exhibit improved parameter scaling, leading to better performance in various RL training regimes.
  • Agnostic Sample Compression Schemes for Regression: We introduce the first bounded sample compression schemes for agnostic regression with $\ell_p$ loss, achieving linear size for linear regression with $\ell_1$ and $\ell_\infty$ losses, while proving impossibility for other $p \in (1, \infty)$.
  • MFA: Minimal Frame Averaging constructs minimal, provably equivariant frames for diverse groups (Lorentz, unitary) to boost ML model efficiency in tasks like n-body simulation and top tagging.
  • MAgg (Metamorphic Aggregation): Extends Test-Time Augmentation (TTA) with metamorphic relations modeled by a GNN to improve combinatorial problem-solving across three tasks (SAT, Decision TSP, GED).
  • Asymptotics of Feature Learning in Two-Layer NNs: Leveraging spiked Random Features (sRFs) and Gaussian universality, we characterize the generalization error of two-layer NNs after one GD step, demonstrating the crucial role of data adaptation for learning non-linear functions.
  • Position: What Makes an Image Realistic? — A good generative model is insufficient to quantify realism; a universal critic can assess realism without adversarial training, guiding practical implementations and analysis.
  • Failures Are Fated, But Can Be Faded: This work uses deep reinforcement learning (DRL) with limited human feedback to characterize and mitigate failures (accuracy, bias, etc.) in large vision and language models (VLMs).
  • Imprecise Domain Generalisation: Enables flexible OOD generalization by optimizing a spectrum of strategies during training, allowing deployment-time specification of preferences.
  • Promoting External and Internal Equities: Two online resource allocation models promote proportional fairness based on agent demands (external equity) and group demographics (internal equity), considering ex-ante/ex-post fairness metrics.
  • Block Acceleration Without Momentum: Optimal stepsizes for block gradient descent (BGD) on least-squares achieve twice the asymptotic convergence rate of gradient descent (GD) with Polyak's momentum, without using momentum, when block data matrices are orthogonal.
  • Best Arm Identification for Stochastic Rising Bandits: This work introduces R-UCBE and R-SR algorithms for fixed-budget BAI in SRBs, offering theoretical guarantees and matching lower bounds on error probability with sufficient budget.
  • Neural Jump-Diffusion Temporal Point Processes: This work introduces NJDTPP, a novel TPP framework based on neural jump-diffusion SDEs, offering model flexibility and theoretical guarantees, outperforming SOTA models in prediction tasks.
  • Stereographic Spherical Sliced Wasserstein (S3W): S3W leverages stereographic projections and Radon transforms to efficiently compute Wasserstein distances between spherical probability distributions, offering high speed and parallelizability for tasks like gradient flows and self-supervised learning.
  • What Will My Model Forget?: Forecasting forgotten examples during LM refinement improves controllability and interpretability of the replay process, reducing catastrophic forgetting by selectively replaying examples predicted to be forgotten.
  • BBox-Adapter: A lightweight adapter for black-box LLMs uses ranking-based Noise Contrastive Estimation to improve performance on target domains by 6.77% while reducing costs by 31.30x.
  • QBMK: A novel quantum-based matching kernel for un-attributed graphs leverages quantum Shannon entropy and CTQW to capture both global and local structural characteristics, outperforming state-of-the-art graph kernels and deep learning methods.
  • Dynamic Correlation Clustering: We present an algorithm maintaining an O(1)-approximation in O(polylog n) amortized update time for correlation clustering in dynamic vertex streams, improving upon prior O(Δ) update time.
  • LIDAO: An information-theoretic framework for debiasing LLMs, minimizing fluency degradation while provably improving fairness, even under adversarial prompting.
  • TSBO: A novel semi-supervised BO algorithm integrates the teacher-student paradigm to minimize expensive labeled data queries by using selective regularization via student feedback.
  • CLIF: A novel, hyperparameter-free spiking neuron model that facilitates backpropagation through complementary leaky integration, achieving SNN accuracy exceeding comparable ANNs.
  • Levels of AGI: A framework classifying AGI model capabilities by performance and generality, offering a common language to compare models, assess risks, and track progress.
  • Decision Support Systems: A novel online learning methodology leverages counterfactual prediction sets and monotonicity for exponentially improved regret in human-in-the-loop classification, without stylized expert models.
  • Physics of Language Models: Knowledge extraction in LLMs depends on training data diversity, requiring data augmentation (e.g., paraphrasing) for reliable extraction, as revealed by probing internal knowledge encoding.
  • Individual Fairness in Graph Decomposition: This work introduces novel randomized planar graph decomposition algorithms achieving various trade-offs between individual fairness (comparable separation probabilities for nodes at comparable distances) and desirable properties like connectivity and optimal cluster count, with theoretical bounds potentially tied to a major open problem in metric embeddings and demonstrated efficacy on Congressional redistricting.
  • No Dimensional Sampling Coresets for Classification: We derive the first dimension-independent coresets for classification problems via sensitivity sampling, enabling efficient optimization on small data subsets with approximation guarantees for various loss functions and distributional inputs.
  • RIME: This robust preference-based RL algorithm learns effectively from noisy preferences via a sample selection-based discriminator, warm-starting the reward model to bridge the gap between pre-training and online learning.
  • Learning Optimal Deterministic Policies with Stochastic PG: This work provides a theoretical analysis of training stochastic policies via PG methods, for deploying their deterministic counterparts in continuous RL, studying convergence and exploring the action/parameter exploration trade-off.
  • Transformers, Parallel Computation, and Logarithmic Depth: We show that constant self-attention layers are equivalent to constant MPC rounds, implying logarithmic depth suffices for basic tasks, unlike other neural sequence models and some transformer approximations, establishing parallelism as key.
  • How Free is Parameter-Free Stochastic Optimization?: This work introduces the first fully parameter-free stochastic non-convex optimization method, outperforming SOTA algorithms, while proving a lower bound that limits the possibility of fully parameter-free stochastic convex optimization.
  • UEO: Realistically fine-tunes CLIP on unlabeled data with out-of-distribution samples by jointly optimizing textual prompts and visual features to enhance in- and out-of-distribution recognition.
  • Novel Spectral Algorithms for the PCM: This work introduces a fast and accurate spectral algorithm for PCM inference with optimal error guarantees, enabling efficient learning of mixture models and demonstrating its effectiveness on real-world datasets.
  • Sharp Rates in Dependent Learning Theory: We derive near mixing-free, instance-optimal generalization rates for ERM with dependent data and square loss in hypothesis classes with tail decay in Orlicz space.
  • Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping: Combining target networks and over-parameterized linear function approximation weakens the convergence condition for bootstrapped value estimation with off-policy data.
  • Replicable Learning of Large-Margin Halfspaces: We present the first dimension-independent, polynomial-time, proper, replicable algorithm for learning large-margin halfspaces with improved sample complexity, achieving optimal dependence on the accuracy parameter.
  • Beyond Implicit Bias: In online learning, unlike offline learning, SGD noise offers no implicit bias benefits; its advantages are purely computational, enabling cheaper gradient steps along a "golden path."
  • SLIPS: A novel stochastic localization technique, based on iterative posterior sampling, enabling efficient sampling from unnormalized densities, demonstrated on multimodal benchmarks.
  • Polytope-Bound Neural Architectures: This work derives upper and lower bounds for NN widths based on the polytope structure of the dataset, applying these principles to simplicial complexes and manifolds, and develops an algorithm to infer dataset polytope structure from trained NNs.
  • IVON: An improved variational optimizer that matches or surpasses Adam when training LLMs and ResNets, offering better predictive uncertainty at similar computational cost and enabling novel applications like finetuning, merging, and generalization error prediction.
  • SMM: Sample-specific multi-channel masks improve visual reprogramming (VR) by reducing approximation error via a lightweight ConvNet generating unique masks for each sample, boosting performance on ResNet and ViT.
  • A Distributional Analogue to the Successor Representation: This work introduces the distributional successor measure (SM), separating transition dynamics and reward in distributional RL, enabling zero-shot risk-sensitive policy evaluation.
  • Fundamental Benefit of Alternating Updates in Minimax Optimization: We prove that Alt-GDA converges faster than Sim-GDA for SC-SC objectives and introduce Alex-GDA, a faster variant with extrapolation, achieving linear convergence for bilinear problems.
  • Adaptive Online Experimental Design for Causal Discovery: An interventional-data-efficient online learning algorithm adaptively selects interventions and terminates with high probability after taking a problem-dependent number of interventional samples, outperforming existing causal discovery methods.
  • BASED: Combining linear and sliding window attention enables navigating the recall-memory tradeoff, matching strong sub-quadratic LMs in perplexity while outperforming them on recall-intensive tasks.
  • ZPDVR: A novel ZO method reduces both sampling and coordinate-wise variances for composite optimization, achieving optimal SZO complexity without approximating FO information.
  • Closing the Gap: We prove the first global convergence of the last iterate of actor-critic with neural networks, Markovian sampling, and continuous spaces, achieving a sample complexity of $\tilde{O}(\epsilon^{-3})$.
  • Leveraging Attractor Dynamics in Spatial Navigation for Better Language Parsing: A prefrontal-hippocampal-entorhinal model (PHE-trinity) uses attractor networks, like entorhinal grid networks, to represent syntactic structure for improved language command parsing, showing enhanced systematic generalization.
  • ANT: A novel diffusion model transfer learning method uses similarity-guided training and adversarial noise selection to address data limitations in few-shot image generation, improving quality and diversity.
  • SymC: Exploiting code symmetries with a novel equivariant self-attention, this model achieves superior performance on program analysis tasks, outperforming SOTA code models, including GPT-4, without pre-training.
  • CSIQA: A novel BIQA model integrates global context contrast and local sensitivity via contrastive learning and masked attention to outperform SOTA methods.
  • MC-ViT: By leveraging memory consolidation and redundancy reduction, this vision transformer scales to long video contexts, setting a new SOTA on EgoSchema, Perception Test, and Diving48.
  • Refined Coreset Selection: This novel method prioritizes both model performance and minimal coreset size, efficiently identifying smaller coresets for DL training while maintaining accuracy.
  • PLOT: A novel online tracking algorithm for unknown, non-stationary targets in linear control systems, using RLS and receding horizon control, achieving sublinear dynamic regret and demonstrated on a quadrotor.
  • NLHF: Aligns LLMs with human preferences by learning a pairwise preference model and finding its Nash equilibrium, unlike traditional RLHF that uses reward models.
  • On Stronger Computational Separations Between Multimodal and Unimodal ML: This work demonstrates an average-case computational separation where multimodal learning is easy, while unimodal learning is computationally hard for typical instances, but suggests such strong separations may be rare in practice due to connections with cryptography.
  • The Perception-Robustness Tradeoff in Deterministic Image Restoration: We prove a fundamental tradeoff between perceptual quality, measurement consistency, and adversarial robustness in deterministic image restoration, showing high-performing models must have large Lipschitz constants.
  • Estimating Unknown Population Sizes: We introduce a novel method using the hypergeometric likelihood within a variational autoencoder framework to estimate discrete distributions with unknown population and category sizes, outperforming existing count data models in accuracy and latent space informativeness, demonstrated on NLP and single-cell genomics data.
  • Memorization Through the Lens of Loss Curvature: We propose loss curvature around training samples, averaged over epochs, as a memorization metric for DNNs, finding it captures memorization quantitatively and qualitatively, correlates with existing metrics, and detects mislabeled/duplicate data.
  • On the Complexity of Finite-Sum Smooth Optimization under the PL Condition: This work establishes nearly tight lower bounds on IFO complexity for minimizing finite-sums under the PL condition, both in centralized and decentralized settings.
  • MetaFormer: A novel ViT-based few-shot learning framework uses masked sample attention and patch-grained task attention to capture sample and task relationships, achieving state-of-the-art results on multiple datasets.
  • MAG Listing Algorithm: First brute-force-free algorithm to list all MAGs in a MEC by recursively determining valid local vertex structures, improving efficiency and effectiveness.
  • Position: Amazing Things Come From Having Many Good Models: The Rashomon Effect, where many equally-performing ML models exist for a dataset, offers advantages for interpretability, fairness, and addressing user preferences in noisy, tabular data.
  • Faster Adaptive Decentralized Learning Algorithms: Novel AdaMDOS and AdaMDOF algorithms achieve near-optimal sample complexity for nonconvex stochastic and finite-sum optimization in decentralized settings.
  • TAT: Aggregating historical and current trajectories in a dynamic tree structure, TAT enhances diffusion planner reliability by marginalizing unreliable states, boosting performance, and enabling faster planning without retraining.
  • SNPSE: A score-based, sequential method using conditional score-based diffusion models for likelihood-free inference in simulator-based models, improving simulation efficiency.
  • Auto-Encoding Morph-Tokens for Multimodal LLMs: Morph-tokens enable multimodal LLMs to excel at both visual comprehension (text generation) and generation (image reconstruction) by serving as abstract prompts and reconstructable visual tokens, respectively.
  • Revisiting the Power of Prompt for Visual Tuning: This work introduces a novel prompt initialization strategy for Visual Prompt Tuning (VPT) using downstream token prototypes, significantly improving performance across various benchmarks, especially in self-supervised pre-training.
  • Improved Operator Learning by Orthogonal Attention: A novel neural operator leverages orthogonal attention, inspired by kernel integral operators and eigenfunction approximation via NNs, outperforming baselines on PDE benchmark datasets.
  • Prospective Side Information for LMDPs: An algorithm for LMDPs with weakly informative side information achieves an exponential sample complexity improvement over existing methods despite facing an $\Omega(K^{2/3})$ regret lower bound.
  • Stochastic Interpolants with Data-Dependent Couplings: We formalize data-dependent couplings for stochastic interpolants, enabling conditional generative models via dynamical transport maps learned by solving a simple regression problem, demonstrated on super-resolution and in-painting.
  • How Uniform Random Weights Induce Non-uniform Bias: Randomly initialized interpolating NNs generalize well due to a bias towards simpler functions, induced by parameter redundancy, resulting in sample complexity proportional to the teacher NN's complexity rather than the student's.
  • PruneX: A circuit domain generalization framework learns domain-invariant representations based on transformation-invariant domain-knowledge to reduce ineffective transformations in logic synthesis (LS) heuristics, achieving up to 3.1x speedup.
  • Code as Reward (VLM-CaR): Generates dense reward functions from pre-trained vision-language models (VLMs) through code generation, accelerating RL training by reducing the computational cost of VLM queries while maintaining reward accuracy.
  • Perturb-and-Project: Novel DP algorithms for cosine similarity and k-way marginals using input perturbation with tight sum-of-squares certificates, outperforming prior work, especially on t-sparse datasets.
  • Conformal Prediction for Multi-Dimensional Time Series by Ellipsoidal Sets: MultiDimSPCI constructs sequential prediction regions for multivariate time series with finite-sample conditional coverage guarantees, outperforming existing CP and non-CP baselines.
  • Size-invariance Matters: This work proposes size-invariant metrics and losses for multi-object SOD, addressing the bias of existing metrics towards larger objects by evaluating each object independently.
  • How Deep Networks Learn Sparse and Hierarchical Data: The Sparse Random Hierarchy Model (SRHM) demonstrates that CNNs learn hierarchical representations by becoming insensitive to discrete spatial transformations of sparse, hierarchically generated data, explaining the link between invariance and performance.
  • Concentration Inequalities for General Functions of Heavy-Tailed RVs: Novel unbounded analogues of bounded difference inequalities are derived for functions of independent heavy-tailed RVs with finite variance, enabling applications to ML and high-dimensional statistics.
  • FedLESAM: Locally estimating global perturbations improves federated sharpness-aware minimization (SAM) by aligning local updates with the global loss landscape, exceeding the performance of local perturbation methods.
  • Position: AR probabilistic models of LLMs are inherently non-identifiable—models with equivalent test loss can exhibit markedly different behaviors impacting zero-shot rule extrapolation, in-context learning, and fine-tunability.
  • Efficient Pareto Manifold Learning: A novel MTL approach using a main network and low-rank matrices efficiently learns the Pareto manifold, reducing parameters and improving performance, especially for many tasks.
  • SelfExtend: Extends LLM context windows without fine-tuning by constructing bi-level attention (grouped and neighbor) computed via the model's self-attention mechanism; a toy sketch of the grouped-position idea appears after this list.
  • Model Alignment as Prospect Theoretic Optimization: We show that aligning LLMs with human feedback implicitly incorporates human biases like loss-aversion, and propose a new human-aware loss (HALO) based on Kahneman-Tversky utility that matches or exceeds preference-based methods.
  • Craftax: A fast open-ended RL benchmark, extending Crafter with NetHack elements and enabling training with 1B environment interactions on a single GPU in under an hour, challenges existing exploration methods.
  • MATRIX: LLM self-alignment via simulated social scenes (Monopolylogue) improves value alignment vs. GPT-4, shown theoretically and across benchmarks.
  • Automating Proxy Variable Selection: This work introduces a novel method for automatically selecting proxy variables for multiple unmeasured confounders in linear causal models, improving causal effect estimation without prior knowledge of proxy validity.
  • Regression with Multi-Expert Deferral: This work introduces a novel regression with deferral framework, using multiple experts, applicable to various losses and costs, with theoretical consistency guarantees stronger than Bayes consistency.
  • Faithfulness Measurable Masked Language Models: A novel fine-tuning method incorporates masking during training, enabling efficient and accurate faithfulness measurement for nine importance measures across 16 NLP datasets.
  • GFMs: Graph foundation models (GFMs) leverage a novel "graph vocabulary" of transferable units encoding graph invariance to overcome transfer challenges inherent in traditional GNNs trained on specific datasets.
  • FiT: Flexible Vision Transformer for diffusion models generates images with unrestricted resolutions/aspect ratios by using dynamically-sized tokens, unlike static-resolution grids, improving resolution generalization.
  • Testing the Feasibility of Linear Programs with Bandit Feedback: A novel test using low-regret algorithms and a non-asymptotic law of iterated logarithms determines the feasibility of unknown LPs with bandit feedback, adapting to the signal level with near-optimal sample complexity.
  • PriorBoost: An adaptive algorithm for learning from aggregate responses constructs increasingly homogeneous bags via size-constrained k-means clustering, outperforming non-adaptive methods for event-level prediction.
  • SYFLOW: An end-to-end method using normalizing flows to find exceptional subgroups with arbitrary target distributions, scaling to large datasets and producing diverse, interpretable results.
  • A Theoretical Analysis of Backdoor Poisoning Attacks in CNNs: This work provides theoretical insights into the effectiveness of backdoor poisoning attacks (BPAs) on CNNs via analysis of a two-layer CNN trained on a poisoned dataset demonstrating successful attacks while maintaining clean input accuracy.
  • Learning Causal Relations from Subsampled Time Series: This paper introduces DHT-CIT, a novel algorithm using two time-slices and conditional independence tests to learn causal relations from subsampled time series without interventions.
  • Re-Dock: A novel diffusion bridge generative model for flexible molecular docking predicts ligand and pocket sidechain conformations simultaneously by modeling binding energy and 3D structures jointly, outperforming existing methods on apo and cross-docking benchmarks.
  • Allocation Requires Prediction Only If Inequality Is Low: Prediction-based resource allocation only outperforms simpler aggregate methods when between-group inequality is low and intervention budgets are high.
  • DAT: A unified, end-to-end OCR model using interactive attention and prompt-based segmentation for concurrent scene text detection, layout analysis, and document page detection.
  • Flash-Diffusion: A novel latent diffusion model with sample-adaptive inference times, using severity encoding to estimate degradation severity and fine-tune the reverse diffusion sampling trajectory.
  • ACM-MILP: Adaptively modifies constraints in groups based on latent probability estimations and community detection, preserving instance hardness for improved data generation for MILP solvers and hyperparameter tuning.
  • Multi-Track Message Passing: Tackles GNN oversmoothing and oversquashing by preventing heterophily mixing via a novel multi-track message passing scheme, achieving SOTA performance on benchmark datasets.
  • Convergence of Convex Message Passing Algorithms: Convex message passing algorithms, such as max-sum diffusion and TRW-S for MAP inference in graphical models, are proven to converge to a fixed point in $O(1/\epsilon)$ iterations.
  • Position: NFL theorems suggest specialized inductive biases are needed, but we show that neural networks, formalized with Kolmogorov Complexity, exploit the low complexity of real-world data, unifying diverse problems with a few models.
  • Tuning-Free Stochastic Optimization: We formalize "tuning-free" algorithms, proving their possibility in bounded domains, impossibility in unbounded ones for convex problems, and offering a tuning-free SGD variant for non-convex optimization.
  • Fault-Tolerant PAC Learning: A novel theoretical framework analyzes the sample complexity of learning in the presence of random or adversarial faults, revealing its connection to the number of perturbing functions induced by the faults.
  • Differentially Private Synthetic Text: Aug-PE generates high-utility, differentially private synthetic text data using only API access to LLMs, avoiding computationally expensive fine-tuning.
  • MALIBO: A novel meta-learning approach for likelihood-free BO directly learns the utility of queries across tasks, explicitly models task uncertainty, and enables robust adaptation to new tasks.
  • LatProtRL: Reinforcement learning navigates a protein language model's latent space to optimize protein fitness, escaping local optima and achieving high fitness from low-fitness starting sequences.
  • Triple Changes Estimator: Extends the triple difference estimator to the changes-in-changes framework, enabling estimation of counterfactual distributions for targeted policy evaluations, going beyond average treatment effects offered by DiD.
  • Convex Relaxations of ReLU NNs: This work shows polynomial-time global optimality, within a factor of $O(\sqrt{\log n})$, of convex relaxations for two-layer ReLU NNs with weight decay on random training data, greatly improving on prior work.
  • DISCRET: Synthesizes faithful, rule-based explanations for individual treatment effect (ITE) estimation, achieving accuracy comparable to black-box models while remaining self-interpretable.
  • Jetfire: An INT8 transformer pretraining method using an INT8 data flow and per-block quantization for faster training without accuracy loss.
  • Position: Offline RL for DTRs needs critical re-evaluation due to inconsistent metrics, lack of baselines, diverse RL formulations, and cases where random policies outperform RL algorithms.
  • Sparse and Structured Hopfield Networks: This work introduces Hopfield-Fenchel-Young energies, a family of sparse HNs with differentiable update rules, connecting loss margins, sparsity, and retrieval; extension via SparseMAP enables retrieving associations, validated on MIL and rationalization.
  • Minimax Optimality of Score-Based Diffusion Models: Kernel-based score estimators achieve nearly minimax optimal sample quality for sub-Gaussian distributions without requiring lower bound assumptions on the data distribution.
  • Generalization in Kernel Regression Under Realistic Assumptions: Provides a unified theory upper bounding excess risk in kernel regression under realistic settings, showing benign/tempered overfitting and new perturbation bounds revealing implicit self-regularization via heavy-tailed eigendecomposition.
  • BayOTIDE: A Bayesian online multivariate time series imputation method uses functional decomposition with GP priors and SDEs for scalable inference on irregularly sampled streaming data, capturing global trends and periodic patterns.
  • Practical Performance Guarantees for Pipelined DNN Inference: Novel MIP relaxations yield strong lower bounds for pipeline-parallel DNN inference, effectively closing the optimality gap and guiding practitioners toward near-optimal solutions.
  • End-to-End Neuro-Symbolic RL with Textual Explanations: This work introduces a novel NS-RL framework that jointly learns structured states and symbolic policies, refined by a distilled vision foundation model and explained with GPT-4 generated text.
  • Learning-Rate-Free Stochastic Optimization over Riemannian Manifolds: This work introduces novel learning-rate-free stochastic optimization algorithms for Riemannian manifolds, eliminating learning rate tuning and achieving optimal convergence rates.
  • Fast Sampling-Based Sketches for Tensors: We introduce fast sketches for two and three mode tensors using convolution-based sampling, enabling ℓ₀ sampling and ℓ₁ embeddings with runtime scaling linearly in the dimension d rather than quadratically ($d^2$) or cubically ($d^3$).
  • Transport of Algebraic Structure to Latent Embeddings: Learns bijections from latent spaces to "mirrored algebras" to transport algebraic structure (e.g., set union) to latent representations (e.g., INR) while provably respecting the algebraic laws.
  • Towards Theoretical Understanding of Learning Large-scale Dependent Data via Random Features: We prove minimax optimality of random features for kernel ridge regression on large-scale, dependent data under exponential $\tau$-mixing, but sub-optimality under polynomial decay.
  • InterpreTabNet: Improves TabNet for tabular data by using a sparse attention mechanism from a Gumbel-Softmax distribution, enabling clearer interpretations of feature importance while maintaining predictive accuracy.
  • T-RevSNN: A novel Temporal Reversible SNN architecture jointly addresses training and inference challenges by turning off most spiking neurons' temporal dynamics, achieving $O(L)$ training memory and $O(1)$ inference energy cost with high accuracy on ImageNet.
  • VQ-BeT: A versatile behavior generation model uses hierarchical vector quantization to tokenize actions, improving multimodal prediction and accelerating inference compared to Behavior Transformers and Diffusion Policies.
  • Dynamic Facility Location: We present the first fully dynamic algorithm for facility location in high-dimensional Euclidean spaces, achieving $O(c)$-approximation with $\tilde{O}(poly(d) n^{1/c + o(1)})$ amortized update time and $O(1)$ recourse.
  • CPRNN: Tensor decomposition applied to 2RNNs leverages second-order interactions while controlling parameter count via rank, outperforming RNNs, 2RNNs, and MIRNNs on PTB.
  • Local vs. Global Interpretability: Analyzing ML model interpretability through computational complexity reveals a duality between local and global explanations, with varying complexity across linear models, DTs, and NNs.
  • Optimal Kernel Quantile Learning with Random Features: This work introduces KQR-RF, achieving minimax optimal rates via a data-dependent sampling strategy, handling heterogeneous and heavy-tailed data robustly even in the agnostic setting.
  • DRCT: Diffusion Reconstruction Contrastive Training improves generalization of diffusion-generated image detection by 10%+ via high-quality reconstruction and contrastive learning of diffusion artifacts.
  • Improving Interpretation Faithfulness for Vision Transformers: This work introduces Faithful ViTs (FViTs), leveraging Denoised Diffusion Smoothing (DDS) to enhance the stability and robustness of self-attention explanations against input perturbations.
  • Batch and match (BaM): A new black-box variational inference (BBVI) approach using a score-based divergence with closed-form updates for Gaussian variational families, offering faster convergence than ELBO maximization.
  • A Geometric Decomposition of Finite Games: We introduce incompressible games based on the Shahshahani metric, showing that continuous-time EW dynamics in such games exhibit volume preservation, a constant of motion, and Poincaré recurrence, and are equivalent to harmonic games.
  • ToRES: A simple, effective incomplete multi-view clustering (IMVC) method uses prototype-sample affinity and cross-view prototypes to reduce memory, eliminate hyper-parameters, and directly optimize cluster indicators for stable results.
  • Handling Heterogeneous Curvatures in Bandit LQR Control: We study online LQR with bandit feedback and semi-adversarial disturbances under heterogeneous cost curvatures via reduction to bandit convex optimization with memory, achieving interpolated regret guarantees.
  • RCGP: Enables provably robust and conjugate Gaussian process regression with closed-form updates at virtually no additional cost using generalised Bayesian inference.
  • Memoria: A novel, human-inspired NN memory architecture effectively addresses the "fateful forgetting" problem in diverse tasks, exhibiting primacy, recency, and temporal contiguity effects.
  • ERQ: Error Reduction for Post-Training Quantization (PTQ) of Vision Transformers (ViTs) sequentially minimizes activation and weight quantization error via Ridge Regression, outperforming state-of-the-art methods like GPTQ.
  • Masked Face Recognition with Generative-to-Discriminative Representations: This work proposes a unified deep network with a greedy module-wise pretraining strategy to learn generative-to-discriminative representations for robust masked face recognition.
  • DeRa: Decoding-time realignment efficiently explores different regularization strengths in aligned LMs without retraining, enabling smooth transitions between unaligned and aligned models and simpler hyperparameter tuning.
  • clawNOs: Neural operators that automatically encode conservation laws, significantly improving learning efficacy, especially in small-data regimes for applications like constitutive modeling and fluid dynamics.
  • RICE: An innovative RL refining scheme uses explanation methods to identify critical states, creating a mixed initial state distribution that breaks through training bottlenecks and improves performance.
  • Classifier-Free Guidance Spotlight: Classifier-Free Guidance (CFG) improves language models (Pythia, GPT-2, LLaMA) across diverse tasks (Q&A, reasoning, code, translation), matching larger models' performance and stacking with other techniques like Chain-of-Thought; a minimal sketch of the guidance step appears after this list.
  • A Subquadratic Time Algorithm for Robust Sparse Mean Estimation: We present a subquadratic time algorithm for robust sparse mean estimation of a k-sparse mean from corrupted samples, overcoming the quadratic runtime barrier of previous approaches.
  • Pessimism Meets Risk: We introduce two provably sample-efficient risk-sensitive offline RL algorithms for linear MDPs using the entropic risk measure, improving sample complexity bounds via pessimistic value iteration and variance-based decomposition.
  • Learning Decision Trees and Forests with Algorithmic Recourse: This paper proposes a novel algorithm for learning accurate decision trees and random forests while ensuring the existence of algorithmic recourse actions by leveraging adversarial training.
  • Sparse is Enough in Fine-tuning Pre-trained LLMs: Sparse fine-tuning based on a gradient-based algorithm (SIFT) leverages PAC-Bayesian theory to demonstrate effective adaptation of PLMs by viewing pre-training as a prior distribution shift.
  • DLPL: Learns perspective-invariant features from single-view images using discretized perspectives, homography transformations, and attention for improved segmentation and detection.
  • Position: Intent-aligned AI Systems Must Optimize for Agency Preservation: Truthful AI aligned solely to human intent is insufficient; preserving long-term human agency, formally defined here, is a more robust standard requiring explicit optimization.
  • Distributed High-Dimensional Quantile Regression: A double-smoothing approach transforms quantile regression into least-squares optimization, enabling efficient distributed estimation with near-oracle convergence and accurate support recovery, even with dependent errors.
  • Tying Embeddings: Tying input/output embeddings in NLP models relies on the distributional hypothesis, where similar semantics imply similar contexts, impacting model size and training; a weight-tying sketch appears after this list.
  • Learning with Partial-Label and Unlabeled Data: A novel mutual information-based approach tackles both label redundancy and insufficiency in weakly supervised learning by dynamically exchanging labels within candidate sets.
  • Position: Foundation models lack data authenticity, consent, and provenance due to massive, under-documented training sets, hindering ethical development; we analyze this landscape and propose solutions for responsible AI.
  • Adaptive Proximal Gradient Methods: Linesearch-free APGMs converge for convex problems under local Hölder gradient continuity without approximations, covering semi-algebraic functions, and don't need Hölder constants a priori.
  • Unsupervised Zero-Shot RL via Functional Reward Encodings: FRE learns functional representations of rewards with a transformer-based VAE, enabling zero-shot transfer to new tasks by encoding reward samples from offline, unlabeled trajectories.
  • Beyond the Norms: This work introduces a novel data-driven score for detecting unreliable predictions in regression models by estimating discrepancy density and measuring its statistical diversity.
  • DsDm: Model-aware dataset selection, optimizing for target task performance, yields a 2x compute multiplier for LMs over standard data quality filtering.
  • On a Neural Implementation of Brenier's Polar Factorization: This work introduces a practical, neural implementation of Brenier's polar factorization theorem, leveraging ICNNs for the convex potential and exploring applications to non-convex optimization and density sampling.
  • FAFE: Improves antibody-antigen complex modeling by using a geodesic distance loss on noisy group frames, addressing FAPE's gradient vanishing issue with high rotational errors, achieving a correct rate of 52.3% (DockQ > 0.23).
  • ULAREF: A unified framework leverages label refinement via global reliability detection and local enhancement with consistency loss to address learning with inaccurate supervision (noisy labels, partial labels, etc.).
  • Leveraging (Biased) Information: MIN-UCB leverages offline data for improved online learning in stochastic MABs, outperforming UCB when given a bound on the difference between offline and online reward distributions.
  • Quasi-Monte Carlo (QMC) Features for Kernel Approximation: QMC features improve kernel approximation error from O(1/√M) to O(1/M) compared to Monte Carlo (MC), enabling faster kernel methods like kernel ridge regression; a QMC feature sketch appears after this list.
  • Position: Mission Critical: Satellite data is a distinct modality in ML, demanding new research to fully leverage its unique characteristics and societal impact.
  • eP&R: Efficient precision and recall metrics based on hubness-aware sampling match original metrics for generative models but are computationally cheaper.
  • PwP: A computationally efficient algorithm for online contextual dynamic pricing with feature-based elasticity, achieving optimal O(√dT logT) regret with heteroscedastic valuation.
  • Vocabulary for Universal Approximation: Constructively proves a finite vocabulary of mappings exists for universally approximating continuous functions, motivating a novel compositional model for regular languages.
  • Test-Time Degradation Adaptation (TDA): TDA adapts a pre-trained diffusion model to handle open-set image restoration by using a test-time degradation adapter, achieving performance comparable to task-specific methods.
  • WebLINX: A conversational web navigation benchmark with 100K interactions over 2300 expert demonstrations across 150 real-world websites, revealing that fine-tuned smaller models outperform even GPT-4V, though generalization remains a challenge.
  • StackSight: A neurosymbolic approach using LLMs and chain-of-thought prompting decompiles WebAssembly into C++ by visualizing and tracking virtual stack alterations via static analysis.
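For the SelfExtend item above: a toy, hypothetical sketch of the grouped-position idea, not the paper's exact formulation. Relative positions within a local neighbor window are kept exact, while more distant positions are floored into coarse groups so a long context reuses position values the model saw during pre-training; `window` and `group_size` are made-up illustrative values.

```python
# Toy illustration only: map a relative position so long contexts reuse small position values.
def grouped_relative_position(query_pos: int, key_pos: int,
                              window: int = 512, group_size: int = 8) -> int:
    rel = query_pos - key_pos
    if rel <= window:
        return rel                                   # neighbor tokens keep exact relative positions
    return window + (rel - window) // group_size     # distant tokens share coarse grouped positions

# A 9000-token lookback collapses to a much smaller effective relative position:
print(grouped_relative_position(9000, 0))            # 512 + 8488 // 8 = 1573
```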
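For the Classifier-Free Guidance item: the decoding step amounts to running the model with and without the prompt and extrapolating the logits. A minimal PyTorch sketch, with random tensors as stand-ins for the two forward passes and an illustrative guidance scale of 1.5:

```python
import torch
import torch.nn.functional as F

def cfg_logits(cond_logits, uncond_logits, guidance_scale=1.5):
    """Classifier-free guidance for text: extrapolate toward the prompt-conditioned logits.
    A scale of 1.0 recovers plain conditional decoding."""
    return uncond_logits + guidance_scale * (cond_logits - uncond_logits)

cond = torch.randn(1, 32000)    # stand-in for logits given prompt + generated tokens
uncond = torch.randn(1, 32000)  # stand-in for logits given generated tokens only
probs = F.softmax(cfg_logits(cond, uncond), dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
```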
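For the Tying Embeddings item: the mechanism under analysis is the standard weight-tying trick, sketched below with a deliberately tiny, hypothetical PyTorch model (the paper studies when the distributional hypothesis justifies tying, not this architecture).

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Minimal language model whose input embedding and output head share one weight matrix."""
    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        self.lm_head.weight = self.embed.weight  # tie: one (vocab_size, d_model) matrix for both roles

    def forward(self, tokens):                    # tokens: (batch, seq) of token ids
        hidden, _ = self.rnn(self.embed(tokens))
        return self.lm_head(hidden)               # logits over the shared vocabulary

model = TinyLM()
logits = model(torch.randint(0, 1000, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 1000])
```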
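For the QMC features item: a minimal sketch of quasi-Monte Carlo random Fourier features for an RBF kernel, assuming frequencies obtained by pushing a scrambled Halton sequence through the Gaussian inverse CDF; the paper's exact construction and error analysis are not reproduced here.

```python
import numpy as np
from scipy.stats import norm, qmc

def qmc_rff(X, n_features=256, lengthscale=1.0, seed=0):
    """QMC random Fourier features approximating k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2))."""
    _, d = X.shape
    halton = qmc.Halton(d=d, scramble=True, seed=seed)
    u = halton.random(n_features)              # low-discrepancy points in [0, 1)^d
    omega = norm.ppf(u) / lengthscale          # map to approximately Gaussian frequencies
    phase = np.random.default_rng(seed).uniform(0.0, 2.0 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ omega.T + phase)

X = np.random.default_rng(1).normal(size=(500, 5))
Z = qmc_rff(X)
K_approx = Z @ Z.T                             # approximates the exact RBF Gram matrix of X
```

Swapping the Halton points for i.i.d. Gaussian draws recovers ordinary Monte Carlo random features, the baseline the bullet compares against.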