Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CL) 2026-06-12

Understanding helpfulness and harmless tension in reward models

Reward models are a key component of reinforcement learning from human feedback (RLHF), aligning language models toward both helpful and harmless behaviour. However, the internal mechanisms underlying these objectives and their conflicts remain poorly understood. We study alignment tension in reward models trained under helpfulness-only, harmlessness-only, and mixed-objective settings. We find that mixed-objective models often underperform single-objective models, indicating interference between objectives. Using activation-based methods, we identify neurons associated with each objective and study their functional roles via targeted ablations. We find that these neurons causally support their corresponding objectives while often negatively affecting the opposing one. We find that a substantial proportion of neurons are shared between helpfulness and harmlessness, and that these shared neurons exert a disproportionate influence on model behaviour, contributing to alignment tension. Additionally, our results provide insights and mechanistic interpretation into how alignment objectives are represented in reward models and why multi-objective alignment remains challenging, motivating future work on disentangled and controllable alignment methods.

02.
arXiv (CS.AI) 2026-06-16

AI Supply Chain Galaxy: 3D Visual Analytics for License Compliance

arXiv:2606.16292v1 Announce Type: cross Abstract: The rapid proliferation of machine learning model reuse has transformed the AI ecosystem into a highly interconnected supply chain. Traditional compliance tools and static reports struggle to navigate these massive, multi-hop dependency networks. To address this, we present AI Supply Chain Galaxy (AISCG), an interactive 3D visual analytics system for model provenance and compliance auditing. AISCG maps models into a 3D spatial layout, integrating explicit structural dependencies with a rule-based compliance engine. It supports multi-scale exploration, from global community detection to localized, path-aware lineage tracing. We demonstrate its efficacy through an ecosystem-scale empirical analysis of 908,449 models from Hugging Face. Our findings reveal a concerning landscape: 55.46% of models exhibit compliance risks or metadata conflicts/omissions. We also identified distinct risk patterns, including a 56.67% license omission rate in adapter derivations and an 8.05% "license drift" rate in fine-tuning. Through a case study on the complex Llama model family, we show how AISCG empowers analysts to intuitively trace inherited restrictive terms and identify root causes across deep topological networks, significantly reducing the cognitive load of compliance auditing.

03.
arXiv (CS.CV) 2026-06-17

Mordal: Automated Pretrained Model Selection for Vision Language Models

Incorporating multiple modalities into large language models (LLMs) is a powerful way to enhance their understanding of non-textual data, enabling them to perform multimodal tasks. Vision language models (VLMs) form the fastest growing category of multimodal models because of their many practical use cases, including in healthcare, robotics, and accessibility. Unfortunately, even though different VLMs in the literature demonstrate impressive visual capabilities in different benchmarks, they are handcrafted by human experts; there is no automated framework to create task-specific multimodal models. We introduce Mordal, an automated multimodal model search framework that efficiently finds the best VLM for a user-defined task without manual intervention. Mordal achieves this both by reducing the number of candidates to consider during the search process and by minimizing the time required to evaluate each remaining candidate. Our evaluation shows that Mordal can find the best VLM for a given problem using $8.9\times$–$11.6\times$ lower GPU hours than grid search. We have also discovered that Mordal achieves about 69\% higher weighted Kendall's $\tau$ on average than the state-of-the-art model selection method across diverse tasks.

04.
arXiv (CS.AI) 2026-06-15

Capability Minimization as a Safety Primitive: Risk-Aware Causal Gating for Least-Privilege LLM Agents

arXiv:2606.13884v1 Announce Type: new Abstract: Modern decision systems increasingly rely on learned components whose outputs may be confident yet wrong, exposing downstream actions to costly errors. We introduce Risk-Aware Causal Gating (RACG), a framework that decides whether to act on, defer, or abstain from a model's prediction by combining causal effect estimation with calibrated risk control. RACG models the causal pathway from candidate actions to outcomes and gates each decision according to an estimated counterfactual risk rather than raw predictive confidence. To make gating reliable, we derive distribution-free bounds on the probability of acting under high-risk conditions and show how these bounds translate into operating thresholds that satisfy user-specified safety constraints. We further propose an adaptive gating policy that adjusts to distribution shift by monitoring discrepancies between predicted and realized outcomes, tightening the gate when causal assumptions appear violated. Across simulated interventions and real-world decision benchmarks, RACG reduces high-cost errors substantially while preserving most of the utility of an ungated policy, and it outperforms confidence-based and selective-prediction baselines at matched abstention rates. Our results indicate that explicitly separating causal risk from predictive uncertainty yields decision systems that are both safer and more transparent, offering a principled mechanism for trustworthy automation in high-stakes settings.

05.
arXiv (quant-ph) 2026-06-19

Operational Tube-Sector Theory of Quantum State Distinguishability Under Generalized Symmetries

作者:

arXiv:2606.19678v1 Announce Type: cross Abstract: A variational principle for quantum-state distinguishability is established in many-body systems with generalized symmetries, including noninvertible cases described by fusion categories. Standard fidelity and symmetry-resolved diagnostics emerge as coarse-grained limits of a more refined operational structure. When symmetry actions terminate at entanglement cuts, distinguishability is governed by boundary tube algebras within a symmetry-constrained measurement resource theory. The physically admissible instruments are characterized by complete positivity, entanglement-cut locality, boundary-module covariance, and sequential stability. The resulting optimal measurement structure is uniquely fixed by the center of the boundary tube algebra, $\mathcal{A}_{\mathrm{phys}} = Z\!\left(\mathrm{Tube}_{\mathcal{C}}(\mathcal{M}_A)\right)$, whose primitive idempotents define tube-sector probabilities that refine fidelity-based and symmetry-resolved descriptions. The associated tube positive-operator-valued measures (POVM) are extremal and yield optimal one-shot hypothesis-testing distinguishability under symmetry constraints. The construction is universal across fusion categories and independent of microscopic realization.

06.
arXiv (CS.LG) 2026-06-17

Perron–Frobenius Operator Matching for Generative Modeling

arXiv:2606.17465v1 Announce Type: new Abstract: We introduce Perron–Frobenius Operator Matching (PFOM), a generative framework that matches density evolution via the integral PF operator, subsuming flow, diffusion, and jump models. We prove that among Bregman divergences, only Kullback–Leibler divergence preserves equality between density-level and sample-conditioned objectives, yielding a practical loss equivalent to Koopman path matching. We further develop Nesterov-accelerated training and sampling that stabilize discretization and accelerate convergence. %On Gaussian mixtures and two-moons, PFOM achieves faster KL/$W_2$/MMD decrease and improved wall-clock efficiency with empirical validation. PFOM unifies operator-theoretic identification with modern generative modeling and opens paths to adaptive dictionaries and high-dimensional applications.

07.
arXiv (CS.CV) 2026-06-12

On Pitfalls of $RemOve-And-Retrain$: Data Processing Inequality Perspective

The RemOve-And-Retrain (ROAR) benchmark is widely used to evaluate feature attribution methods, yet its validity remains underexplored from an information-theoretic perspective. We show that model- and data-agnostic post-processing of attribution maps (transformations that, by the data processing inequality, cannot add information about the decision function) can often improve ROAR scores. This means that an improved ROAR ranking is not, by itself, evidence that an attribution map carries more information about the model. We trace this failure mode to a bias toward spatially blurry masks. Experiments on CIFAR-10, SVHN, and CUB-200 show a consistent association between blurriness and ROAR performance, a pattern that also appears in the ROAD variant. We provide guidelines for more cautious removal-based benchmarking, with implications for validating mechanistic understanding of neural network internals.

08.
arXiv (CS.CV) 2026-06-15

Efficient Online 3D Multi-Camera Multi-Object Tracking and Pose Estimation

This paper proposes a fast and online method for jointly performing 3D multi-object tracking and pose estimation using multiple monocular cameras. Our algorithm requires only 2D bounding box and pose detections, eliminating the need for costly 3D training data or computationally expensive deep learning models. Our solution is an efficient implementation of a Bayes-optimal multi-object tracking filter, enhancing computational efficiency while maintaining accuracy. We demonstrate that our algorithm is significantly faster than state-of-the-art methods without compromising accuracy, using only publicly available pre-trained 2D detection models. We also illustrate the robust performance of our algorithm in scenarios where multiple cameras are intermittently disconnected or reconnected during operation.

09.
arXiv (CS.CV) 2026-06-11

SHERPA: Seam-aware Harmonized ERP Adaptation for Open-Domain 360$^\circ$ Panorama Generation

Panoramic imagery is increasingly used in world-generation, games, and simulation, where users may need not only photorealistic scenes but also stylized and non-photorealistic environments. Large-scale text-to-image diffusion and flow models provide broad style and semantic priors for this goal, but planar image training misaligns them with the wrap-around topology and polar regions of $360^\circ$ panoramas represented in equirectangular projection (ERP). We present SHERPA, a lightweight adaptation framework that combines frequency-selective Circular RoPE, Circular Latent Encoding/Decoding, image-side FFN adapters, and a Dual-Path Training Scheme. Circular RoPE replaces only the seam-sensitive high-frequency horizontal RoPE band with integer-periodic harmonics while preserving the pretrained lower-frequency spectrum. The Paired Panorama Path supervises geometry, while the Unpaired Style Path uses self-supervised yaw consistency for target-free stylized prompts. As a result, SHERPA generates $360^\circ$ panoramas across both photorealistic panorama domains and open-domain stylized prompts.

10.
arXiv (quant-ph) 2026-06-17

Quantum Routers: A Switching-Fabric Framework for Quantum-Native Forwarding

arXiv:2606.17773v1 Announce Type: new Abstract: Forwarding in quantum networks cannot be realized by directly transposing classical switching fabrics, since the no-cloning theorem and the quantum measurement postulate constrain the direct relay of quantum information while ruling out copy-based buffering and inspection. In this paper, we propose a switching-fabric framework for quantum routers based on multipartite entanglement. Specifically, we formalize the notion of an entanglement-based switching fabric, in which a graph state acts as the forwarding resource and entanglement forwarding is realized through local Pauli measurements. We translate the classical notions of blocking and non-blocking operation into structural conditions for entanglement-based fabrics, by deriving the edge-controlled (EC) design principle for non-blocking operation. We instantiate this principle through a monolithic EC crossbar and a modular Clos-type EC fabric, for which we characterize resource scaling and identify the regime where the modular design becomes more resource-efficient than the monolithic one. Finally, a forwarding-latency analysis establishes a fundamental distinction between matching-oblivious and matching-driven forwarding: the proposed EC fabrics realize all requested input-output entanglement links with constant forwarding depth under sufficient measurement parallelism, whereas matching-driven EPR-based fabrics exhibit latency that scales with the number of requested connections. The proposed framework provides a hardware-agnostic foundation for quantum-router switching fabrics.

11.
medRxiv (Medicine) 2026-06-18

Entrainment of cortical gamma oscillations predicts improved bradykinesia and dyskinesia in Parkinson's disease

Background: Deep brain stimulation (DBS) of the subthalamic nucleus (STN) is hypothesized to improve motor symptoms in Parkinson's disease (PD) by suppressing pathologically elevated beta activity and promoting "prokinetic" gamma activity in the cortico-basal ganglia-thalamo-cortical loop. Advances in bidirectional DBS devices have revealed that stimulation can modify gamma oscillations via subharmonic entrainment, though entrainment's therapeutic role remains unclear. Objectives: To identify stimulation parameters that entrain motor cortical and STN gamma oscillations in PD at rest and during movement, and examine their association with motor function. Methods: Sensorimotor cortex and STN field potentials were collected using a bidirectional DBS system in four subjects with PD over a range of stimulation amplitudes and frequencies. Entrainment amplitude at half the stimulation frequency was quantified at rest and during a finger-tapping task in the ON-medication state. The presence or absence of entrainment was studied as a physiomarker of motor symptom severity. Results: The amplitude of stimulation-entrained gamma oscillations was non-linearly related to stimulation intensity and frequency and varied by stimulation contact choice. Entrainment amplitude was highest in precentral gyrus and increased with movement. In the ON-medication state, precentral gyrus gamma entrainment was associated with reduced bradykinesia, dyskinesia, and dystonia. Subthalamic gamma entrainment predicted improved dystonia but was a less significant marker for motor benefit than cortical entrainment. Conclusions: Stimulation-entrained gamma oscillations in the motor network are a physiomarker for optimal DBS response in PD, and could have a role in physiology-guided DBS programming, complementing existing strategies based on suppression of basal ganglia beta activity.

12.
arXiv (math.PR) 2026-06-18

Evolution of Conditional Entropy for Diffusion Dynamics on Graphs

arXiv:2510.19441v2 Announce Type: replace-cross Abstract: The modeling of diffusion processes on graphs is the basis for many network science and machine learning approaches. Entropic measures of network-based diffusion have recently been employed to investigate the reversibility of these processes and the diversity of the modeled systems. While results about their steady state are well-known, very few exact results about their finite-time evolution exist. Here, we introduce the conditional entropy of heat diffusion in graphs, and outline a mathematical framework that contextualizes diffusion and conditional entropy within the theories of continuous-time Markov chains and information theory. In particular, we highlight that this entropic measure satisfies an information-theoretical version of the second law of thermodynamics, thereby providing a parallelism between diffusion dynamics on networks and their physical counterparts. Furthermore, we obtain explicit results for its evolution on complete, path, and circulant graphs, as well as a mean-field approximation for Erdös-Rényi graphs. We also obtain asymptotic results for general networks and provide bounds for the evolution of conditional entropy. Finally, we experimentally demonstrate several properties of conditional entropy for diffusion over random graphs, such as the Watts-Strogatz model.

13.
arXiv (CS.CV) 2026-06-16

LentiAvatar: Pseudo-Multiview Reconstruction and Subpixel Prism Rendering for Real-Time Stereoscopic Communication

Real-time stereoscopic video communication has long been a goal of immersive telepresence, yet practical systems still require specialized capture rigs or reduce remote users to a single portrait view. We present LentiAvatar, a Gaussian head-avatar system that connects monocular avatar capture with subpixel-encoded glasses-free lenticular display for real-time autostereoscopic communication. From a monocular portrait video, LentiAvatar reconstructs a controllable head avatar and optimizes it for the lateral viewing zones induced by the display. The method uses natural head turns as pseudo-multiview (PMV) supervision to constrain regions that are otherwise weakly observed in monocular training, including hair, ears, jaw contours, and neck boundaries. Reliable side frames are yaw-binned, aligned to virtual cameras, and supervised within a strict head-and-hair domain; contour-aware losses and staged regularization further suppress ghosting, alpha leakage, and depth instability while preserving lateral detail. At runtime, LentiAvatar renders 32 virtual views and encodes them into a 4K lenticular raster with calibrated subpixel-routing masks. The live-tracker prototype sustains 10.65 FPS, and a subject-specific distilled driver raises the same display pipeline to 38.49 FPS.

14.
arXiv (CS.CL) 2026-06-18

Efficient Financial Language Understanding via Distillation with Synthetic Data

Large instruction-following models are powerful but costly to deploy, particularly in finance, where labelled data are limited by confidentiality and expert annotation cost. We present an efficient framework for financial sentiment analysis through distillation with synthetic data, transferring knowledge from a large instruction-tuned teacher to compact student models. The framework is designed for low-resource conditions, where a small set of real examples are collected and labelled by hand. The framework then clusters the examples and uses the clusters to select seeds for generating synthetic examples via structured few-shot prompting. Experiments show that clustering-based seed selection yields more representative synthetic data than random sampling, enabling compact models to achieve strong performance with minimal supervision. Notably, on a more complex and noisy text domain, the compact model trained on the complete synthetic-seed corpus even outperforms the teacher model, while remaining competitive on formal text. The framework provides a practical route toward resource-efficient domain adaptation in financial NLP with minimal human labelling effort.

15.
arXiv (CS.AI) 2026-06-16

Greed Is Learned: Visible Incentives as Reward-Hacking Triggers

arXiv:2606.16914v1 Announce Type: new Abstract: Deployed agents increasingly act with their reward proxy in view, such as a balance, score, or KPI dashboard. We show that reinforcement learning can make a policy addicted to such a visible self-benefit channel. It chases the displayed payoff across held-out domains, sacrifices the true task to do so, and follows the channel wherever we rewrite it, while policies that never saw the channel stay honest. We call this reward-channel addiction and study it in MoneyWorld, a synthetic sandbox. The addiction can flip a model's safety alignment: trained only on innocuous money tasks with no safety content, the model abandons the safe action it otherwise always takes whenever a dashboard pays for an unsafe one, and reverts to safe once the channel is hidden. This learned bribe replicates across model scales and families. Blindly optimizing super-capable, next-generation AI on KPIs or P\&L can be dangerous for alignment. Greed is learned when following such a channel pays.

16.
arXiv (quant-ph) 2026-06-16

What does measuring one qubit reveal about another? $K$-networks as a directed diagnostic for quantum circuits

arXiv:2606.16549v1 Announce Type: new Abstract: Many-qubit circuit states are hard to inspect directly, so they are often summarized by pairwise graph weights. Common pairwise weights report symmetric correlations, while many circuit questions are directed and basis-specific: if qubit $i$ is measured in a given basis, how strongly does the outcome reshape the conditional state of qubit $j$? We define $K_{i\to j}$, a directed, basis-conditioned edge weight for this question. It is large when the two measurement outcomes occur with comparable probability and leave qubit $j$ in clearly different conditional states; it is zero when the source outcome is deterministic or the target states are indistinguishable. The scalar uses standard binary-ensemble distinguishability; the paper's contribution is to turn this conditional comparison into a directed network layer for circuit states. The resulting networks are computable from two-qubit reduced density matrices. They are diagnostic (not entanglement measures): for pure two-qubit states $K$ reduces to the tangle $C^2$ (squared concurrence)[WoottersConcurrence,CKWTangle], while separable mixed states can reach $K=1$. Examples on teleportation, Grover, QAOA, and random circuit families show the intended use: $K$-networks map feed-forward, phase, and interaction-graph structure that symmetric or computational-basis summaries can leave weak or absent.

17.
arXiv (CS.AI) 2026-06-16

MBABench: Evaluating LLM Agents on End-to-End Spreadsheet Tasks in Finance

arXiv:2605.22664v3 Announce Type: replace Abstract: LLM agents are increasingly expected to carry out end-to-end workflows, producing complete artifacts from high-level user instructions. To meet enterprise needs, frontier AI labs have developed agents that can construct entire spreadsheets from scratch. This is especially relevant in finance, where core workflows such as financial modeling, forecasting, and scenario analysis are commonly conducted through spreadsheets. Yet, existing spreadsheet benchmarks do not measure this advanced capability, focusing instead on question-answering or single-formula edits. To address this gap, we provide one of the first evaluations of agents on end-to-end spreadsheet tasks, focusing on economically critical financial workflows such as modeling and scenario analysis. Since deliverables therein are routinely reviewed and revised by multiple stakeholders, judging their quality necessarily involves high-level criteria such as readability or ease of modification. To reflect the multidimensional nature of solution quality, we develop an evaluation taxonomy comprising three dimensions: Accuracy, Formula, and Format, each comprising fine-grained criteria that reflect professional standards. The Claude family leads the benchmark and produces the most professional-looking outputs in our qualitative review, but even the strongest agents frequently fall short of professional finance standards and degrade sharply as the difficulty increases beyond a few chained calculations. This suggests that current agents are not yet able to reliably produce professional-quality spreadsheets at the level of complexity real-world workflows demand.

18.
arXiv (CS.CV) 2026-06-16

Learning Directional Semantic Transitions for Longitudinal Chest X-ray Analysis

Chest X-ray (CXR) interpretation often requires longitudinal comparison to assess disease progression. Existing approaches typically rely on temporal feature fusion or inter-study discrepancy modeling, yet remain limited in capturing subtle progression semantics and overlook the inherently directional nature of disease trajectories. In this paper, we propose ProTrans, a novel vision-language pretraining framework that formulates disease progression as a directional semantic transition between paired CXR studies. ProTrans leverages radiology reports to anchor individual CXR representations within interpretable disease states, and introduces a learnable progression feature map to explicitly encode semantic shifts between states, aligned with report-derived progression descriptions. To enforce direction-aware perception, ProTrans incorporates a reversed temporal modeling process and imposes bidirectional reconstruction consistency across states and transitions, thereby disentangling directional semantics and promoting coherent trajectory modeling. Extensive experiments on longitudinal downstream tasks, including disease progression classification and progression captioning, demonstrate that ProTrans consistently outperforms existing methods, establishing a unified pretraining framework for longitudinal CXR understanding. https://github.com/RPIDIAL/ProTrans

19.
arXiv (CS.LG) 2026-06-17

ConTex: Reformulating Counterfactual Generation For Time Series Forecasting

arXiv:2606.18049v1 Announce Type: new Abstract: Decision-making with deep learning-based time series forecasting requires not only accurate predictions but also actionable insights. However, current architectures do not inherently provide such information. Specifically, guidance is needed on how current conditions must be modified to shift from a predicted outcome to a desired future scenario. Counterfactual explanations provide a natural framework for this task, as they represent minimal input changes that alter the model's prediction, indicating when and how intervention is required. Existing approaches rely on instance-wise optimization, leading to inconsistency across instances, high computational costs, and limited applicability in real-time settings. To address these limitations, we reformulate counterfactual generation for time series forecasting as the problem of learning a globally consistent intervention strategy, allowing counterfactuals to be generated through a single shared function. We propose Counterfactual Time Series Explanations (ConTex), a model-agnostic, decomposed architecture comprising a temporal context encoder and a conditional encoder, followed by two heads that capture interventions in terms of temporal relevance and modification strength. This structure overcomes the instability and inconsistency of instance-based approaches by producing targeted, interpretable interventions across time and feature dimensions in a single forward pass, making it suitable for real-time applications. Across multiple forecasting architectures and benchmark datasets, ConTex achieves state-of-the-art validity while generating sparse counterfactuals that minimize the number of necessary interventions. Additionally, our approach reduces computational cost by at least 12-36x compared to instance-wise generation and supports real-time inference at approximately 0.007 seconds.

20.
arXiv (CS.CV) 2026-06-12

CPAM: Context-Preserving Adaptive Manipulation for Zero-Shot Real Image Editing

Editing natural images using textual descriptions in text-to-image diffusion models remains a significant challenge, particularly in achieving consistent generation and handling complex, non-rigid objects. Existing methods often struggle to preserve textures and identity, require extensive fine-tuning, and exhibit limitations in editing specific spatial regions or objects while retaining background details. This paper proposes Context-Preserving Adaptive Manipulation (CPAM), a novel zero-shot framework for complicated, non-rigid real image editing. Specifically, we propose a preservation adaptation module that adjusts self-attention mechanisms to preserve and independently control the object and background effectively. This ensures that the objects' shapes, textures, and identities are maintained while keeping the background undistorted during the editing process using the mask guidance technique. Additionally, we develop a localized extraction module to mitigate the interference with the non-desired modified regions during conditioning in cross-attention mechanisms. We also introduce various mask-guidance strategies to facilitate diverse image manipulation tasks in a simple manner. CPAM can be seamlessly integrated with multiple diffusion backbones, including SD1.5, SD2.1, and SDXL, demonstrating strong generalization across different model architectures. Extensive experiments on our newly constructed Image Manipulation BenchmArk (IMBA), a robust benchmark dataset specifically designed for real image editing, demonstrate that our proposed method is the preferred choice among human raters, outperforming existing state-of-the-art editing techniques. The source code and data will be publicly released at the project page: https://vdkhoi20.github.io/CPAM

21.
Nature Biotechnology 2026-06-05

Structural motif search across the protein universe with Folddisco

作者:

Detecting similar protein structural motifs in large structure collections is computationally expensive. We developed Folddisco, a fast structural motif search tool that uses an index of position-independent geometric features, including side-chain orientation, combined with a rarity-based scoring system. Folddisco is 20-fold faster in querying and fourfold more storage-efficient than existing methods while improving accuracy. Folddisco is freely available online ( https://folddisco.foldseek.com ), along with a webserver ( https://search.foldseek.com/folddisco ). Folddisco enables protein structural motif search in million scale databases.

22.
arXiv (CS.AI) 2026-06-12

AAbAAC: An Annotated Corpus for Autoimmunity Information Extraction

arXiv:2606.13051v1 Announce Type: new Abstract: Despite advances in information extraction driven by deep learning and large language models, performance gaps remain in highly specialized biomedical fields, where domainspecific complexity poses challenges for generalist models. In this work, we focus on the domain of autoimmunity, where the main entities of interest are autoimmune diseases, autoantibodies (i.e., molecules that may mark or cause these diseases), their molecular targets, their location in the body, and their associated clinical signs. Herein, we present AAbAAC (AutoAntibodies and Autoimmunity Annotated Corpus), a corpus of 115 abstracts selected from PubMed, where we manually annotated entities and their relationships. First, AAbAAC was used to evaluate several methods on the task of named entity recognition (NER), and secondly, to fine-tune NER models. Our study demonstrates the utility of AAbAAC for information extraction in the domain of autoimmunity, showing expected improvement in NER performance after finetuning. This illustrates the value of small-scale annotation efforts for specialized domains and contributes to the computational study of autoimmunity. The AAbAAC corpus is available at https://github.com/f-maury/AAbAAC.

23.
arXiv (CS.CV) 2026-06-17

See First, Answer Later: Visual Evidence Pre-Alignment via Sufficiency-Driven RL

Multimodal large language models (MLLMs) integrate strong text reasoning with visual inputs, yet their responses can be inconsistent with the underlying images, indicating ineffective utilization of visual evidence during inference. The prevailing training paradigm relies on large-scale caption-based pretraining for general alignment, followed by supervised fine-tuning and reinforcement learning to enable instruction following and complex reasoning. However, such pretraining provides only weak visual grounding: short, coarse captions bias models toward salient objects while neglecting fine-grained visual evidence. In this paper, we introduce Visual Evidence Pre-Alignment (VEPA), an intermediate stage between pretraining and post-training that explores a novel sufficiency-driven objective with Group Relative Policy Optimization (GRPO) to optimize question-conditioned visual evidence descriptions. Extensive experiments across diverse benchmarks show that our VEPA consistently enhances performance on visually demanding evaluations and complements standard supervised post-training. Further analyses show that the income stems from strengthened, transferable visual grounding, rather than from additional task-specific training.

24.
arXiv (CS.AI) 2026-06-19

PCBSchemaGen: Reward-Guided LLM Code Synthesis for Printed Circuit Boards (PCB) Schematic Design with Structured Verification

arXiv:2602.00510v2 Announce Type: replace Abstract: Most LLM code-synthesis benchmarks rely on unit tests as the reward oracle, but PCB schematic design has none: correctness is defined by structured physical constraints over real IC packages and pin-level assignments, per-task golden references are unavailable, and SPICE simulation does not validate schematic-level correctness. We introduce PCBSchemaGen, a training-free inference-time framework that turns a frozen LLM into a verifiable, repairable PCB schematic generator. The framework induces a domain schema from IC datasheets to ground LLM decoding, pairs it with a deterministic 5-layer continuous-reward verifier with pin-level error localization, and refines candidates through a Thompson Sampling arm-acquiring bandit. We evaluate on 2 PCB benchmarks covering 227 real-IC tasks across 22 unified circuit domains, including a public-schematic-derived suite that serves as a fully held-out generalization test (verifier, KG library, and prompts frozen before any evaluation). Under our framework, an open-weight 31B model (Gemma-4-31B) passes 81.3% of PCBBench tasks on average, and the same framework transfers across both benchmarks with zero verifier code changes; a Circuitron-style inference-time prompting baseline on the same Gemma-4-31B backbone collapses on hard system-level designs. This suggests inference-time refinement under a deterministic structural verifier is a general recipe for reference-free LLM code synthesis in domains without unit-test oracles. Our benchmarks and deterministic verifier are publicly available at https://github.com/HZou9/PCBSchemaGen_v2.

25.
arXiv (CS.AI) 2026-06-12

Hellinger Multimodal Variational Autoencoders

arXiv:2601.06572v4 Announce Type: replace-cross Abstract: Multimodal variational autoencoders (VAEs) are widely used for weakly supervised generative learning with multiple modalities. Predominant methods aggregate unimodal inference distributions using either a product of experts (PoE), a mixture of experts (MoE), or their combinations to approximate the joint posterior. In this work, we revisit multimodal inference through the lens of probabilistic opinion pooling, an optimization-based approach. We start from Hölder pooling with $\alpha=0.5$, which corresponds to the unique symmetric member of the $\alpha-divergence$ family, and derive a moment-matching approximation, termed Hellinger. We then leverage such an approximation to propose HELVAE, a multimodal VAE that avoids sub-sampling, yielding an efficient yet effective model that: (i) learns more expressive latent representations as additional modalities are observed; and (ii) empirically achieves better trade-offs between generative coherence and quality, outperforming state-of-the-art multimodal VAE models.