Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-24

Latent Visual States for Efficient Multimodal Reasoning

The integration of visual evidence has significantly enhanced the capabilities of large multimodal models. However, this integration predominantly relies on generating discrete outputs (etc., code or box coordinates) to invoke external tools, a process that introduces rigid dependencies and substantial latency. To overcome these limitations, we propose {EVA} (LatEnt Visual StAtes), a novel framework that natively generates continuous latent visual representations. These internal representations manifest as an adaptive sequence of Latent\_slot tokens, serving as intermediate visual thoughts during the reasoning process. These Latent\_slot tokens are then trained end-to-end with the discrete text tokens. This co-optimization, notably, causes extreme policy deviation in the 'transition window' following the Latent\_slot tokens. We develop D-GSPO (Decouple-GSPO) to target this root cause by decoupling the optimization of latent and discrete components. To support SFT, we construct EVA-230K, a high-quality text-image interleaved CoT dataset encompassing a diverse range of real-world scenes, documents, charts and OCR tasks. Extensive experiments across multiple benchmarks confirm that EVA achieves significant performance gains while enhancing inference efficiency.

02.
arXiv (CS.LG) 2026-06-16

GauS: Differentiable Scheduling Optimization via Gaussian Reparameterization

arXiv:2602.20427v2 Announce Type: replace Abstract: Efficient operator scheduling is a fundamental challenge in software compilation and hardware synthesis. While recent differentiable approaches have sought to replace traditional ones like exact solvers or heuristics with gradient-based search, they typically rely on categorical distributions that fail to capture the ordinal nature of time and suffer from a parameter space that scales poorly. In this paper, we propose a novel differentiable framework, GauS, that models operator scheduling as a stochastic relaxation using Gaussian distributions, which fully utilize modern parallel computing devices like GPUs. By representing schedules as continuous Gaussian variables, we successfully capture the ordinal nature of time and reduce the optimization space by orders of magnitude. Our method is highly flexible to represent various objectives and constraints, which provides the first differentiable formulation for the complex pipelined scheduling problem. We evaluate our method on a range of benchmarks, demonstrating that Gaus achieves Pareto-optimal results.

03.
arXiv (CS.CL) 2026-06-25

Graph-Based Phonetic Error Correction of Noisy ASR

Automatic speech recognition (ASR) systems, despite low overall word error rates, produce residual lexical errors that disproportionately affect semantically critical tokens such as named entities, negations, and sentiment-bearing words. These errors are often structured, arising from phonetic similarity rather than random noise, making naive token-level correction insufficient. We propose a structured ASR correction framework, that we call G-SPIN, that combines phonetic graph modeling with contextual language understanding. A graph neural network (GNN) first constructs acoustically plausible candidate neighborhoods for flagged tokens, explicitly restricting the correction search space to phonetic alternatives. A masked language model (MLM) then provides local contextual scoring, and an instruction-tuned large language model (LLM) performs final context-aware re-ranking over this compact candidate set. By decoupling structured phonetic reasoning from contextual semantic selection, our method avoids unconstrained generation while improving correction accuracy. The framework is lightweight, modular, and operates entirely at inference time.

04.
PLOS Medicine 2026-05-11

Connected or chained by social media? Child and adolescent mental health in a digital era

作者:

by Silja Kosola Social media has evolved from connection to compulsion, disproportionately harming children and adolescents. Addictive designs together with developmental vulnerability fuel mental health risks and highlight the urgent need for stricter age limits and stronger protections. In this Perspective, Silja Kosola outlines how social media disproportionately harms child and adolescent mental health, and argues that while recent policy changes aimed at protecting youth from social media are welcome, stricter age limits and greater accountability of social media companies are needed.

05.
arXiv (quant-ph) 2026-06-15

Electromagnetic Wightman functions and vacuum densities for a brane intersecting the AdS boundary

arXiv:2604.17583v2 Announce Type: replace-cross Abstract: We investigate the combined effects of a brane intersecting the AdS boundary and background gravitational field on the local characteristics of the electromagnetic vacuum. Two types of boundary conditions on the brane are considered, which are higher-dimensional generalizations of the perfect electric (PEC) and perfect magnetic (PMC) boundary conditions in Maxwell's electrodynamics. The brane-induced contributions to the Wightman functions of the vector potential and field tensor are explicitly extracted. Simple expressions in terms of elementary functions are provided. The behavior of the vacuum expectation values (VEVs) is mimicked by a scalar field with a negative effective mass squared determined by the radius of the AdS spacetime. The expectation values of the electric and magnetic fields squares and of the energy-momentum tensor are investigated as local characteristics of the vacuum state. The brane-induced contributions to these VEVs have opposite signs for the PEC and PMC conditions. For the PMC condition, this contribution is negative for the electric field squared and positive for the magnetic field squared. The VEV of the energy-momentum tensor has a nonzero off-diagonal component. The brane-induced vacuum energy density is positive for PMC condition, whereas the normal and parallel stresses change sign as functions of the distance from the brane. Unlike the problem involving a planar boundary in the Minkowski bulk, the vacuum energy-momentum tensor does not vanish in (3+1)-dimensional AdS spacetime.

06.
medRxiv (Medicine) 2026-06-22

Study protocol: Feasibility and clinical implications of real-time cerebral autoregulation monitoring in major noncardiac surgery with the Medtronic Cotrending algorithm (AUTOREGULATE-NONCARDIAC-COTRENDING)

Background: Perioperative hypotension is associated with postoperative organ injury. However, trials of hypotension avoidance have not found meaningful improvements in postoperative cardiovascular, renal, neurological or functional outcomes. One possible explanation is that organ perfusion depends on patients individual autoregulatory ranges. Hence, technology enabling monitoring of the autoregulatory status of vital organs, e.g. the brain, could provide a physiologic basis for personalising of blood pressure targets. However, current established methodologies for monitoring cerebral autoregulation in noncardiac surgery, e.g. the cerebral oximetry index (COx), are limited by performance and usability. The Medtronic Cotrending algorithm has been developed to provide automated, near real-time assessment of cerebral autoregulation. While feasibility was demonstrated in cardiac surgery, its applicability in major noncardiac surgery remains unknown. This study aims to evaluate the technical feasibility and clinical implications of Cotrending-based cerebral autoregulation monitoring in major noncardiac surgery. Objectives: Primary objective: To evaluate the technical feasibility of using the Medtronic Cotrending algorithm to monitor intraoperative cerebral autoregulation in real-time during major noncardiac surgery, drawing comparisons to the COx algorithm. Secondary objectives: to investigate the potential clinical implications of Cotrending-based cerebral autoregulation monitoring. Design: Single-centre, prospective cohort study. Setting: Swiss tertiary care centre Patients: Patients enrolled in AUTOREGULATE-NONCARDIAC who were monitored intraoperatively with the Medtronic INVOS(TM) 5100 near-infrared spectroscopy (NIRS) system. Outcomes: Technical feasibility outcomes include success rate of determination of the lower limit of cerebral autoregulation, intraoperative uptime, time to first estimate of the lower limit of cerebral autoregulation, sensitivity to external factors and to data artefacts; agreement of Cotrending-derived lower limit of cerebral autoregulation with COx-derived lower limit of cerebral autoregulation. Conclusions: N/A Trial registration: Clinicaltrials.gov NCT07630129

07.
arXiv (CS.CL) 2026-06-24

Less is More: Quality-Aware Training Data Selection for Scientific Summarization

Scientific long-document summarization datasets commonly treat author-written abstracts as gold reference summaries, although their quality and alignment with the source article vary. At the same time, publicly available scientific summarization datasets remain limited in scale and structure for modern long-context models. In this work, we address both challenges by a) constructing and releasing one of the largest biomedical and life science datasets for long-document summarization, containing 1.88 million PMC articles, and b) analyzing the reference quality of author-written abstracts with source-grounded and model-based metrics. We show that author-written abstracts vary in their alignment with the full article and that these quality signals can guide training-data selection. Training on selected high-quality subsets outperforms random sampling at matched training sizes and can match or exceed larger random subsets on factuality-oriented metrics. Our findings suggest that reference quality is an important factor in scientific summarization and that quality-aware data selection can improve training efficiency.

08.
arXiv (CS.CL) 2026-06-19

CzechDocs: A Multiway Parallel Dataset of Formatted Documents for Minority Languages in Czechia

We present CzechDocs, a multiway parallel dataset of formatted documents (HTML, DOCX, and PDF) covering Czech and minority languages used in Czechia-primarily Ukrainian and English, with smaller portions of Vietnamese, Russian and other languages. The dataset is designed to support the evaluation of machine translation systems that aim to preserve document formatting during translation. We provide a comparison of the most common approaches to format-preserving machine translation on a validation subset of the dataset. This validation split, together with the evaluation toolkit, is publicly released for further research. A held-out test split will be reserved for a future shared task focused on document-level translation with formatting preservation.

09.
arXiv (CS.CL) 2026-06-16

WavSLM: Single-Stream Speech Language Modeling via WavLM Distillation

Large language models show that simple autoregressive training can yield scalable and coherent generation, but extending this paradigm to speech remains challenging due to the entanglement of semantic and acoustic information. Most existing speech language models rely on text supervision, hierarchical token streams, or complex hybrid architectures, departing from the single-stream generative pretraining paradigm that has proven effective in text. In this work, we introduce WavSLM, a speech language model trained by quantizing and distilling self-supervised WavLM representations into a single codebook and optimizing an autoregressive next-chunk prediction objective. WavSLM jointly models semantic and acoustic information within a single token stream without text supervision or text pretraining. Despite its simplicity, it achieves competitive performance on consistency benchmarks and speech generation while using fewer parameters, less training data, and supporting streaming inference.

10.
arXiv (CS.AI) 2026-06-17

STAR: SpatioTemporal Adaptive Reward Allocation for Text-to-Image RL Post-Training

arXiv:2606.17979v1 Announce Type: new Abstract: Existing RL post-training methods for text-to-image generation usually convert the final-image reward into a single scalar advantage and apply it with the same strength to the entire generative trajectory. However, text-to-image generation naturally has temporal and spatial structure: different denoising steps are responsible for different generation stages, and the content that truly determines text alignment often appears only in part of the image. This granularity mismatch makes it difficult for policy updates to focus on the generative components that actually affect the reward. To address this issue, we propose SpatioTemporal Adaptive Reward (STAR) Allocation for RL post-training of text-to-image diffusion and flow models. STAR uses text-image attention inside the generative model and starts from the core content that the user truly cares about in the prompt. It constructs spatial allocation maps that dynamically vary across denoising steps and rollouts, and allocates the same group-relative advantage to more relevant latent regions with almost no additional computational overhead. STAR then applies stronger policy updates to these regions through a spatially resolved policy objective. We use Stable Diffusion 3.5 Medium as the base model and evaluate on three tasks: GenEval, OCR text rendering, and PickScore. Experimental results show that STAR improves compositional semantic alignment, text rendering, and preference optimization without changing the external reward source, achieving $\mathbf{0.9759}$, $\mathbf{0.9757}$, and $\mathbf{23.60}$ on GenEval, OCR, and PickScore, respectively.

11.
arXiv (CS.LG) 2026-06-18

The Illusion of Improvement: Reject Inference Strategies in Credit Scoring

arXiv:2606.18479v1 Announce Type: new Abstract: Reject inference methods are widely used to mitigate survival bias in credit scoring, yet their effectiveness remains poorly understood. We systematically evaluate several such methods and uncover a structural failure mode: in a natural retraining cycle, models whose accuracy improves while recall collapses create an illusion of improvement that leads practitioners to believe the system is getting better when, in fact, its rejection quality – the ability to correctly screen out defaulters – is deteriorating. We then propose a controlled exploration strategy that breaks the feedback loop without statistical assumptions: the lender deliberately approves a fraction of rejected applicants and observes their true outcomes. We show that accuracy and rejection quality give opposite recommendations on whether to explore: accuracy favors no exploration, while rejection quality improves with it, confirming that standard evaluation metrics are misleading under selection bias. Even minimal exploration rates (2–5\%) prove sufficient in our experiments to diagnose the severity of the feedback loop at near-zero cost. Our findings are consistent across two machine learning methods and three real-world datasets, and suggest that standard evaluation protocols are inadequate for assessing models trained under survival bias.

12.
arXiv (CS.CV) 2026-06-12

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

Spatial reasoning, the ability to determine where objects are, how they relate, and how they move in 3D, remains a fundamental challenge for vision-language models (VLMs). Tool-augmented agents attempt to address this by augmenting VLMs with specialist perception modules, yet their effectiveness is bounded by the action interface through which those tools are invoked. In this work, we study how the design of this interface shapes the agent's capacity for open-ended spatial reasoning. Existing spatial agents either employ single-pass code execution, which commits to a full analysis strategy before any intermediate result is observed, or rely on a structured tool-call interface that often offers less flexibility for freely composing operations or tailoring the analysis to each task. Both designs offer limited flexibility for open-ended, complex 3D/4D spatial reasoning. We therefore propose SpatialClaw, a training-free framework for spatial reasoning that adopts code as the action interface. SpatialClaw maintains a stateful Python kernel pre-loaded with input frames and a suite of perception and geometry primitives, letting a VLM-backed agent write one executable cell per step conditioned on all prior outputs, enabling the agent to flexibly compose and manipulate perception results and adapt its analysis to both intermediate text and visual observations and the demands of each problem. Evaluated across 20 spatial reasoning benchmarks spanning a broad range of static and dynamic 3D/4D spatial reasoning tasks, SpatialClaw achieves 59.9% average accuracy, outperforming the recent spatial agent by +11.2 points, with consistent gains across six VLM backbones from two model families without any benchmark- or model-specific adaptation.

13.
arXiv (CS.CL) 2026-06-16

Human genetic evidence is associated with drug approval across therapeutic areas: an observational analysis of 26,278 target-disease pairs with temporal validation and feature ablation

Genetic evidence is enriched among approved drug targets: in an observational analysis of 26,278 target-disease pairs from Open Targets and ChEMBL, targets with any genetic association had a 3.25-fold higher approval rate than those without (OR = 3.25, 95% CI 2.79-3.79, p = 1.91e-42). A target-level analysis accounting for non-independence of pairs sharing the same gene gave OR = 2.79 (bootstrap 95% CI 2.22-3.53); the oncology pair-level OR of 6.72 attenuates to 2.71 at the target level, illustrating how non-independence inflates area-specific estimates. The enrichment replicated in post-2015 approvals (OR = 3.51, p = 1.72e-8). Feature ablation across six evidence types revealed that literature mining alone accounts for most classifier performance (AUPRC = 0.099 versus 0.109 for all features), consistent with temporal leakage from post-approval publications. Excluding literature, remaining evidence types retain above-baseline signal (AUPRC = 0.084, 1.63x baseline). Sensitivity analyses bracket the pair-level OR between 3.25 and 4.93. Genetic evidence alone yields only a 1.0-percentage-point absolute AUPRC gain and the best model has poor calibration; the classifier has limited practical predictive value. We catalogue 1,433 genetically supported Phase 1/2 pairs as a hypothesis-generating resource. All findings are observational.

14.
arXiv (CS.LG) 2026-06-16

Coercivity and Local Convergence of Physical Learning in Linear Circuits

arXiv:2606.15443v1 Announce Type: cross Abstract: Physical learning methods train physical networks to perform computational tasks using only local update rules, exploiting the physics of the system to handle the global transfer of information. We provide the first local convergence analysis of three such methods – Equilibrium Propagation (EP), Coupled Learning (CL), and a new method we call Adjoint Coupled Learning (AL) – for linear circuits, in the limit of small-nudging for both discrete and continuous time. EP and AL perform gradient descent on a natural loss function, while CL follows modified dynamics with an additional cubic correction. Assuming the existence of a solution, we identify a coercivity condition, expressed as a rank condition on a matrix built from the network's incidence structure, under which the training loss decays exponentially and the parameters converge to the solution manifold. We show that coercivity can fail by exhibiting a kite circuit in which a symmetry causes the coercivity constant to degenerate on the solution manifold, but prove using Sard's theorem that such degeneracies are non-generic: coercivity holds at every point of the solution manifold for almost every choice of desired output.

15.
bioRxiv (Bioinfo) 2026-06-19

Sanjeevani: A manually curated anti-cancerous phytochemical database integrated with downstream analysis tools.

Background: Cancer continues to pose a massive global health burden. While plant-derived phytochemicals offer promising therapeutic leads, existing natural product databases often lack cancer specificity, dataset downloadability, and integrated screening tools. Methods: We developed Sanjeevani, an integrative web platform cataloguing 4,823 curated anticancer phytochemicals. Using a balanced dataset of 9,646 molecules, we trained Support Vector Machine (SVM), Random Forest, and K-Nearest Neighbours classifiers using a hybrid feature representation of RDKit descriptors and 2048-bit ECFP4 fingerprints. The platform also integrates AutoDock Vina for web-based molecular docking for binding affinity, poses prediction and ADMET-AI for pharmacokinetics estimation. Results: The SVM model demonstrated the strongest predictive capability, achieving a top test accuracy of 0.966 and a ROC-AUC of 0.992. Benchmarking across five docking tools confirmed that AutoDock Vina successfully balanced computational automation with literature-consistent binding affinity replication. The final architecture provides rapid interactive 2D/3D visualizations integrated with downstream analysis tools. Conclusion: Sanjeevani provides an open-access, one-stop pipeline that bridges the gap between raw natural product data and actionable computational screening, accelerating natural product-based oncology drug discovery.

16.
arXiv (quant-ph) 2026-06-16

Experimental realization of the complete seven-phase Anderson-localization landscape

arXiv:2606.14825v1 Announce Type: cross Abstract: Anderson localization has evolved far beyond the conventional dichotomy between extended and localized states. Modern localization theory predicts a complete transport hierarchy comprising extended, critical, and localized phases together with all coexistence phases among them, forming a seven-phase Anderson-localization landscape. Despite its fundamental importance, this hierarchy has never been experimentally realized within a single system. Here we realize the complete seven-phase Anderson-localization landscape in a one-dimensional Floquet photonic lattice. By engineering quasiperiodic hopping profiles containing inhomogeneously distributed hopping zeros, we generate critical states and enable their coexistence with extended and localized sectors. The resulting transport regimes are directly resolved through their distinct spatiotemporal dynamics, including ballistic expansion, confined critical oscillations, and persistent localization. We observe all seven phases, including the elusive triply coexisting extended-critical-localized phase, and experimentally track the phase transitions connecting them. Our results establish the first complete experimental map of the Anderson-localization landscape and provide a unified platform for investigating mobility edges, multifractality, and programmable coherent transport.

17.
arXiv (CS.CL) 2026-06-24

CORE-BREW: LLR-Based Soft Decoding for Robust Multi-Bit LLM Watermarking

Reliable provenance for LLM outputs requires multi-bit watermarks that remain robust under editing while maintaining strict false-positive control. Existing ECC-based LLM watermarks rely largely on hard-decision decoding, discarding token-level reliability information. We propose CORE-BREW, a Constant-hit-Rate Embedding extension of block-wise BREW for robust multi-bit watermarking. CORE-BREW calibrates the watermark channel by targeting a fixed hit rate p-star, yielding closed-form per-token log-likelihood ratios (LLRs) for principled soft-decision decoding. It supports two detection modes: Strict-Safe, which preserves the bounded-distance designated-codeword acceptance region, and FPR-Calibrated, which uses likelihood-based scoring and lightweight list decoding to characterize the FPR-TPR trade-off. Experiments on open-source LLMs under token-level edits and paraphrasing demonstrate improved low-FPR discrimination and robustness over prior multi-bit watermarking baselines while maintaining comparable semantic quality.

18.
arXiv (CS.LG) 2026-06-25

Towards Scalable Multi-Task Reinforcement Learning with Large Decision Models

arXiv:2606.24962v1 Announce Type: new Abstract: Recent progress in large-scale sequence modeling has shown that a single model can learn useful representations across highly diverse data distributions. Inspired by these advances, we investigate whether a unified transformer policy can be trained across large collections of heterogeneous reinforcement learning environments. We introduce LDM-v0, a Large Decision Model trained offline on trajectories collected from thousands of environments spanning multiple domains and modalities. LDM-v0 is a multi-task, multi-modal transformer policy conditioned on histories of observations, actions, rewards, and termination signals, and trained through supervised next-action prediction over offline trajectories. We describe the environment infrastructure, automated data generation pipeline, model architecture, and training methodology used to build LDM-v0, and evaluate its performance across diverse environments. We show that a single pretrained model matches the performance of independently trained task-specific reference policies on approximately 1,000 environments including robotics, autonomous driving, inventory management, cybersecurity, trading, and video games. These results demonstrate the feasibility of large-scale offline pretraining across heterogeneous reinforcement learning environments using a single transformer policy.

19.
arXiv (quant-ph) 2026-06-16

Complete Relational Description of Spin in a Quantum Background

arXiv:2606.15873v1 Announce Type: new Abstract: The standard description of the state of a spin in quantum mechanics presupposes externally fixed directions – a classical background. Can a spin be fully described instead in relation to other quantum mechanical systems? Poulin suggested twenty years ago group averaging over rotations the joint state of a fundamental spin and a reference spin with large angular momentum which, however, yields a classical bit in a probabilistic mixture. We revisit this idea and show that when the quantum reference system is augmented to two large spins, the standard quantum mechanical description of a spin is recovered in the limit of large quantum numbers for the reference system.

20.
Nature Biotechnology 2026-06-08

Single-cell spatial pharmacobiology for imaging antibody-based therapies in solid tumors

作者: 未知作者

We have developed single-cell spatial pharmacobiology (SSP), which combines in situ imaging of a systemically infused fluorescent therapeutic antibody with high-plex spatial proteomics. Applied to head and neck and pancreatic tumors from patients treated in phase 1 trials, SSP revealed marked spatial heterogeneity in antibody delivery and target engagement, which was shaped by conserved stromal barriers.

21.
PLOS Computational Biology 2026-06-02

A comparative study of simulation-based inference methods for epidemic models with identifiability considerations

作者:

by Geunsoo Jang, K. Selçuk Candan, Gerardo Chowell Epidemic models play a critical role in understanding transmission dynamics, generating forecasts, and informing public health interventions when they are properly calibrated to epidemiological data. Traditional Bayesian inference methods rely on the likelihood function to update prior knowledge using observed data. However, for realistic epidemic models, likelihood functions are often analytically intractable or computationally prohibitive, which can limit the applicability of these methods. Simulation-based inference provides a promising alternative by approximating posterior distributions through forward simulations rather than an explicit likelihood evaluation. In this study, we present a systematic comparison of four approaches: Approximate Bayesian Computation (ABC), Neural Posterior Estimation (NPE), a neural method with temporal embedding, and Preconditioned Neural Posterior Estimation (PNPE), which integrates elements of both classical and neural techniques. These methods are evaluated across epidemic models of increasing complexity under fixed simulation budgets and varying levels of observational noise, with explicit attention to both structural and practical identifiability. Our results show that neural methods generally improve posterior fidelity and predictive accuracy compared with ABC under constrained simulation budgets. PNPE achieved strong performance in several simulation settings, whereas temporal embeddings improved inference in models with complex epidemic dynamics by capturing sequential dependencies. These gains come with important trade-offs: PNPE required substantially greater computational resources and, unlike fully amortized NPE-based methods, may require reconditioning for each new observation. In contrast, ABC remained computationally efficient and provided reasonable, though often more conservative, posterior estimates. Overall, our findings highlight trade-offs among computational efficiency, posterior accuracy, uncertainty calibration, and inference reusability, suggesting that method selection should depend on model complexity, data quality, identifiability, and available computational resources.

22.
arXiv (CS.CL) 2026-06-24

Meet UD_Czech-PDTC: A Large and Genre-Rich Treebank in Universal Dependencies

Czech has been part of Universal Dependencies since its first release in 2015. It has also been one of the best represented languages, with the Prague Dependency Treebank being order of magnitude larger than most other UD treebanks. More recently, three other datasets from the Prague family were added and the annotations thoroughly revisited, forming the "Prague Dependency Treebank-Consolidated" (PDT-C). In comparison to the original PDT, PDT-C is more than twice as large, but it is also much more diverse in terms of genres and domains. In this paper, we describe the conversion of the new resource to Universal Dependencies. While the two annotation schemes are relatively similar at the first sight, there are numerous small differences in topology of the dependency structures and in granularity of the POS and relation type inventories. We demonstrate a selection of such differences on examples, discuss the diverging motivations, as well as ways to overcome the differences during conversion. We argue that while PDT is less "universal" and more tightly bound to one language, its multi-layer annotation is rich and provides all information needed for basic UD trees, and much more.

23.
arXiv (CS.LG) 2026-06-17

Tight $L_\infty$ Sample Complexity for Low-Degree and Sparse Boolean Polynomials

arXiv:2606.17319v1 Announce Type: cross Abstract: Motivated by the optimization of bounded binary black-box functions, we study the problem of learning polynomial surrogates over the Boolean hypercube. To ensure that optimizing the surrogate yields good solutions for the underlying objective, we require uniform $L_\infty$-error guarantees rather than the usual $L_2$-type guarantees. We characterize the minimax sample complexity of uniform estimation under subgaussian noise for two classes of bounded polynomials. First, for polynomials of degree at most $d$ on $n$ variables, the sample complexity scales as $n^{d+1}$. Second, for $s$-sparse Fourier-Walsh polynomials with $s \leq n$, it scales as $ns^2$. These rates differ structurally from the noiseless setting, where uniform exact recovery scales as $n^d$ and $ns$, respectively. Our lower bounds hold even for arbitrary adaptive learners, showing that the additional factors are intrinsic to the noisy cases. Standard Fourier-analysis tools for the $L_2$-norm do not naturally extend to the $L_\infty$-setting in a way that yields uniform guarantees. Our proofs overcome this difficulty by relying on suitably chosen auxiliary norms that serve as proxies for controlling the $L_\infty$-error. Together, our results provide a tight characterization of the sample complexity of learning optimization-safe polynomial surrogates.

24.
arXiv (CS.CL) 2026-06-16

Beyond Retrieval: Learning Compact User Representations for Scalable LLM Personalization

Personalizing large language models requires adapting model behavior to individual users while preserving robustness and deployment-scale efficiency. Existing approaches typically personalize LLMs either at the input level, by retrieving user histories or constructing profile prompts, or at the parameter level, by maintaining user-specific parameter-efficient modules. The former makes personalization sensitive to retrieval quality and prompt design, whereas the latter incurs storage and maintenance costs that grow with the user population. To address these limitations, we propose TAP-PER (Temporal Attentive Prefix for PERsonalization), a prefix-based framework that encodes user preferences as learnable representations, eliminating explicit prompt construction and replacing heavy per-user adapters with lightweight user-state prefix embeddings. Inspired by personalized recommendation systems, TAP-PER decomposes user modeling into user-state and query-conditioned components, and incorporates temporal signals to capture the evolving nature of user interests. Experiments on six LaMP tasks show that TAP-PER consistently outperforms prompt-based and model-based baselines across classification, rating, and generation settings. Moreover, TAP-PER uses 130x fewer per-user parameters than OPPU and roughly half the total parameter footprint of PER-PCS at the 1,000-user scale, demonstrating that scalable LLM personalization can be achieved without explicit prompt construction or heavy per-user adapters.

25.
arXiv (CS.CL) 2026-06-16

Uncertainty Is Not a Safety Net for Clinical VQA, but Can It Anticipate Model Failure?

Safe deployment of clinical vision-language models (VLMs) requires reliable uncertainty estimation (UE): a signal indicating when predictions should be trusted or escalated to a clinician. We test whether current UE methods actually deliver this signal. Benchmarking 8 methods across 12 VLMs on clinical visual question-answering (VQA), we find that UE quality is not an intrinsic property of the UE method: it tracks model accuracy, degrading precisely where the model performance is weakest, and therefore where reliability is most needed. When we stress-test models by hiding the correct option among the multiple-choice answers (NOTA perturbations), accuracy collapses while uncertainty barely changes, leaving models systematically miscalibrated. Yet, we find that uncertainty on the unperturbed input reliably anticipates which predictions will collapse under NOTA, indicating that UE in current VLMs carries diagnostic information about model fragility. Our results position UE as a diagnostic tool for identifying fragile predictions and motivate perturbation-based evaluation as a path toward safe clinical deployment.