Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
bioRxiv (Bioinfo) 2026-06-12

The Geometry of Allostery: A Laplacian Minor Hierarchy for Many-Body Protein Communication

Quantifying how cooperative, many-body relationships drive allostery in protein networks remains a major challenge. To address this, we develop the Laplacian minor hierarchy, a mathematical framework that characterizes the geometric invariants of a protein network. Lower-order minors yield standard metrics including the partition function and effective distances, whereas higher-order minors define novel topological measures: cooperation indices, each bounded between zero and one, that characterize pathway correlations at increasing levels of complexity, the third-order minor determines whether allosteric pathways are correlated or uncorrelated, and the fourth-order minor quantifies how distinct pathways communicate through intermediary residues. We apply this framework to analyze the evolutionary adaptation of the PSD95pdz3 domain from Class I to Class II ligand specificity via mutations G330T and H372A. The cooperation index demonstrates a distinct evolutionary hierarchy: the G330T mutation establishes distributed pathway couplings that the H372A mutation subsequently exploits, whereas H372A alone produces minimal global changes. Furthermore, the fourth-order analysis identifies His317 as a critical intermediary node bridging the class-switching (330-372) and class-bridging (330-400) allosteric pathways. These results demonstrate that allosteric dependencies emerge only when mutations accumulate in specific combinations, with a hierarchical organization of pathways structured around position 330 and intermediary nodes His317 and Phe400. Rather than predicting allosteric mechanisms, this framework provides a mechanistic explanation for why and how allostery emerges during protein evolution.

02.
arXiv (CS.AI) 2026-06-16

CAP: Towards PPG Universal Representation Learning with Patient-level Supervision

arXiv:2606.15284v1 Announce Type: cross Abstract: Photoplethysmography (PPG) plays a central role in wearable health monitoring and clinical decision support. Yet existing approaches to universal PPG representation learning largely focus on signal-level objectives and often overlook patient-level health context, which limits generalization to complex clinical tasks and heterogeneous cohorts. To address this gap, we construct a large-scale paired PPG-EHR multimodal dataset by distilling fragmented medical histories and clinical records into cohesive, patient-level electronic health records (EHR). Building on this resource, we propose Clinical Anchored Pretraining for PPG (CAP). During pretraining, CAP performs cross-modal contrastive alignment that anchors PPG representations to patient-level clinical semantics, guiding the encoder beyond waveform fitting toward modeling consistency in a patient's overall physiological state. During downstream adaptation, the pretrained PPG encoder provides clinically grounded representations that strengthen inductive bias and improve robustness and transferability. Experiments demonstrate that CAP consistently outperforms strong baselines on four diverse downstream tasks. CAP achieves a particularly large gain on respiratory rate prediction (up to +87.6% relative improvement over the state-of-the-art baseline) and delivers an average relative +26.7% across all tasks. We further enhance the interpretability of our approach through comprehensive analyses, including ablations and multiple complementary visualizations of the learned representations. The code for our experiments is available at: https://github.com/gody123gody/CAP .

03.
arXiv (CS.AI) 2026-06-12

TWLA: Achieving Ternary Weights and Low-Bit Activations for LLMs via Post-Training Quantization

arXiv:2606.13054v1 Announce Type: cross Abstract: Large language models (LLMs) exhibit exceptional general language processing capabilities, but their memory and compute costs hinder deployment. Ternarization has emerged as a promising compression technique, offering significant reductions in model size and inference complexity. However, existing methods struggle with heavy-tailed activation distributions and therefore keep activations in high precision, fundamentally limiting end-to-end inference acceleration. To overcome this limitation, we propose TWLA, a post-training quantization (PTQ) framework that achieves 1.58-bit weight compression and 4-bit activation quantization while maintaining high accuracy. TWLA comprises three components: (1) Euclidean-to-Manifold Asymmetric Ternary Quantizer (E2M-ATQ) minimizes layer-output error under weight ternarization via a two-stage optimization from Euclidean initialization to manifold relocation; (2) Kronecker Orthogonal Tri-Modal Shaping (KOTMS) applies a Kronecker-structured orthogonal rotation to reshape weights into ternary-friendly tri-modal distributions, while the shared rotation statistically suppresses activation outliers; and (3) Inter-Layer Aware Activation Mixed Precision (ILA-AMP) explicitly introduces adjacent-layer second-order interaction costs in bit allocation and jointly optimizes for the layer-wise disparity of activation quantization gains induced by the shared orthogonal transform, preventing cascades triggered by a few weak layers. Extensive experiments demonstrate that TWLA maintains high accuracy under W1.58A4, while delivering significant inference acceleration. The code is available at .

04.
arXiv (CS.LG) 2026-06-18

A Human-in-the-Loop Bayesian Optimization Framework for Constraint-Aware Bioprocess Development

arXiv:2606.19230v1 Announce Type: new Abstract: This work presents an extension to Pareto Front Guided Sampling (PFGS), a Human-in-the-Loop (HitL) Bayesian Optimization (BO) framework in which Gaussian process (GP) surrogate-derived quantities are reformulated as objectives of a multi-objective optimization problem, and the resulting Pareto front is exposed to a domain expert for interactive candidate selection rather than returning a single automated recommendation. The framework is extended in two directions: constrained optimization is addressed by incorporating the posterior probability of satisfying output specification limits as an explicit Pareto objective, computed analytically from the GP posterior distribution; robust optimization is addressed by a Monte Carlo sampling strategy that estimates expected lower-confidence performance over a user-defined variability of input perturbations, capturing performance degradation under likely implementation deviations. The resulting multi-dimensional Pareto representation renders trade-offs between predicted performance, model uncertainty, probabilistic constraint satisfaction, and input robustness simultaneously visible through pairwise two-dimensional projections on an interactive dashboard, enabling selection criteria to be iteratively refined as the surrogate model improves and development objectives evolve. The framework is showcased on an eight-dimensional fed-batch Chinese Hamster Ovary (CHO) cell culture simulator demonstrating systematic identification of high-performing, feasibility-compliant, and perturbation-resilient operating conditions, and illustrating how expert-defined requirements provide a principled stopping criterion and support informed allocation of experimental resources.

05.
arXiv (CS.LG) 2026-06-15

Adaptive Nucleus Truncation for Long-Form Reasoning

arXiv:2606.13982v1 Announce Type: cross Abstract: Sampling plays an important role in long-form language-model reasoning. Over thousands of decoding steps, small changes in the candidate token set can compound into different reasoning trajectories, stability profiles, and final answers. Existing truncation methods such as top-$p$, min-$p$, and fixed top-$n\sigma$ sampling improve over unrestricted sampling, but they rely on fixed thresholds that cannot adapt to changes in entropy, task difficulty, training stage, or generation budget. We introduce Adaptive Nucleus Truncation Sampling (ANTS), which extends top-\(n\sigma\) sampling from a fixed decoding rule into an adaptive rollout-control mechanism for long-form generation. ANTS selects standardized neighborhoods around the maximum logit before temperature scaling, adapts the truncation width using an entropy-conditioned controller, and retains a no-truncation fallback arm to stabilize training when truncation becomes unsafe. On a 33B-total / 4B-active sparse Mixture-of-Experts reasoning model, ANTS improves average performance over percentage-based benchmarks by +1.9, +3.8, and +5.2 points at 8K, 16K, and 32K generation budgets, respectively. The strongest gains appear on instruction following and mathematical reasoning, with IFBench improving by more than 10 points at 32K and AIME 2025 improving by 7 points. Code generation reveals an important budget interaction. On Codeforces, ANTS trails the baseline at 8K, but reverses this gap and substantially improves ELO at 16K and 32K. These results suggest that sampler design should be treated not just as a decoding hyperparameter, but as part of how we stabilize and scale long-budget reasoning.

06.
arXiv (CS.CL) 2026-06-18

Fair Cognitive Impairment Detection Through Unlearning

Mild Cognitive Impairment (MCI) is a medical condition characterized by a noticeable decline in memory, language, or thinking abilities. MCI detection from spontaneous speech is promising for scalable screening. However, learned models often exploit demographic cues correlated with labels, resulting in a large performance gap across subgroups. We present a multimodal framework that combines (i) cross-model fusion between modalities (speech, text, and image), and (ii) unlearning using gradient reversal that discourages the shared embedding from encoding task-irrelevant demographic attributes. Evaluated on the multilingual benchmarks TAUKADIAL and PREPARE, our method outperforms the state-of-the-art multilingual and multimodal baseline in MCI classification while substantially reducing the performance gap across patient subgroups (sex and language). We further analyze transfer across datasets, showing that demographic unlearning helps learn more robust representations for MCI detection.

07.
medRxiv (Medicine) 2026-06-15

Repurposing cardiovascular disease risk models to predict incident and co-occurring cardiovascular, cardiometabolic and neurocognitive outcomes.

Background: Cardiovascular disease (CVD), cardiometabolic and neurocognitive conditions share risk factors and frequently co-occur. We evaluated whether four established CVD risk prediction models (QRISK3, PCE, SCORE2, SCORE2-OP) can be repurposed to predict 10-year risk of these conditions and their co-occurrence with CVD. Methods: The models were recalibrated using 20% of the UK Biobank (UKB) and evaluated in the remaining 80%. We performed external validation using data from Clinical Practice Research Datalink (CPRD) Aurum, assessing model discrimination (c-statistics) and calibration (intercept and slope). We used permuted feature importance to determine the influence of each individual predictor in the models. Results: Depending on the model, the c-statistics for incident CVD ranged from 0.71 to 0.74 in the UKB test set (16,137 events). Discrimination was equal to or higher than CVD when evaluated against non-traditional CVD outcomes: 0.74 to 0.77 for heart failure (3,471 events), 0.72 to 0.73 for atrial fibrillation (9,213 events), 0.73 to 0.75 for peripheral arterial disease (1,927 events) and 0.80 to 0.82 for abdominal aortic aneurysm (595 events). For the multimorbidity endpoints, model discrimination ranged from 0.74 for the composite of CVD and T2DM (SCORE2-OP) to 0.83 for the composite of CVD and dementia or Parkinson's disease (QRISK3). When considering the onset of any cardiovascular, cardiometabolic, or neurocognitive outcome discrimination ranged from 0.71 to 0.72. The repurposed models slightly underestimated the predicted risk in the CPRD compared to the UKB: average difference in calibration intercept was at most -0.64. After age and sex, smoking status and systolic blood pressure contributed most to model predictions. Conclusions: Repurposed CVD models can be used to identify 10-year risk of many CVD-related conditions and their multimorbidity. These may be used to support risk-based approaches to prevention and screening. The repurposed models have been made available at: https://repurposed-cvd-risk-models.shinyapps.io/cvd_cmd_dementia_app/ Keywords: Risk prediction; cardiovascular disease; cardiometabolic disease; dementia; disease prevention.

08.
arXiv (CS.LG) 2026-06-16

Scalar-Stepsize Nonuniform Monte Carlo Optimistic Policy Iteration: A Certified Counterexample

arXiv:2606.15978v1 Announce Type: new Abstract: Tsitsiklis proved convergence of Monte Carlo optimistic policy iteration under a uniform update structure and identified nonuniform update frequencies as a delicate obstruction. We give a certified negative answer for the natural scalar-stepsize, unnormalized asynchronous state-value recursion with fixed nonuniform state-selection probabilities. In a three-state, two-action discounted MDP, the nonuniform update frequencies induce a diagonally scaled greedy-policy mean field with a certified nonconstant attracting hybrid periodic orbit. With a bounded unbiased geometric-horizon estimator and Robbins–Monro stepsizes, the original stochastic recursion remains trapped near the cycle with positive probability and therefore fails to converge. The example pinpoints a geometric obstruction: uniform sampling gives radial residual contraction, whereas scalar nonuniform sampling anisotropically distorts the residual dynamics and can generate switched attracting cycles.

09.
arXiv (CS.AI) 2026-06-18

User as Engram: Internalizing Per-User Memory as Local Parametric Edits

作者:

arXiv:2606.19172v1 Announce Type: new Abstract: Personal memory in a language model is two problems: content and reasoning skill. The brain keeps the two apart (a sparse, local engram in the hippocampus for each episode, a slow neocortex for the shared skills that interpret it), so a new fact need not overwrite everything else. Most personalization today keeps a user's facts outside the weights, in a natural-language memory file or a retrieval index. When facts are written into the model instead, the standard recipe is the per-user LoRA adapter, which does the opposite of the brain, folding content and skill into one global weight delta. Writing a user's facts as a LoRA contaminates text unrelated to them; writing the same facts as local Engram rows leaves it mathematically untouched, resulting in a roughly 33,000x smaller memory footprint. We therefore propose User as Engram: store a user's content as surgical edits to the hash-keyed memory table of an Engram model, and carry the reasoning skill in one shared adapter. This layered design matches per-user LoRA's direct recall while delivering 5.6x higher indirect-reasoning accuracy on average, and never makes a single user worse at reasoning than the untouched base. The edit is a glass box: writing a fact switches on its lookup at exactly the trigger, adds the value the answer needs, leaves every other position unchanged to the last bit, and fails if written into the wrong layer. Because different users' facts land in disjoint hash slots, their edits compose: many users live in one shared table at once, stacking additively and losslessly, where a per-user LoRA, a single global weight delta, admits only one. Upon retrieval, a per-user Engram table does not grow with the population the retriever must search, so past ~100 facts it overtakes a retrieval pipeline on a 2.5x larger model.

10.
arXiv (CS.LG) 2026-06-18

ThousandWorlds: A benchmark for climate emulation of potentially habitable exoplanets

arXiv:2606.18338v1 Announce Type: new Abstract: The search for life beyond Earth will depend on detecting faint signatures in the atmospheres of potentially habitable exoplanets. Interpreting those signatures requires understanding the host planet's climate: the same molecule may signal life on one planet and abiotic chemistry on another. Global climate models (GCMs) provide this understanding, but individual runs can require up to millions of core-hours and substantial domain expert time. Machine-learning emulators could remove this bottleneck, but progress has been limited by the absence of a curated, multi-model exoclimate dataset. We introduce ThousandWorlds, an ML-ready benchmark for exoclimate emulation and for the broader regime of low-data, multi-simulator, parameter-to-field regression. The dataset contains approximately 1800 simulations from five GCMs, mapping eight planet parameters to 3D atmospheric fields including temperature, humidity, winds, clouds, and radiation. Three nested subsets define progressively harder challenges: single-simulator regression, multi-simulator regression with complete observations, and multi-simulator regression with structured missingness. We propose two evaluation protocols: one for ranking methods, and one that measures performance relative to the disagreement between GCMs themselves. We evaluate seven baselines spanning simple methods, deep learning, and Gaussian processes. GP-based methods perform best, suggesting that ThousandWorlds exposes a regime where off-the-shelf deep learning does not yet succeed. Data: https://doi.org/10.57967/hf/8695. Code: https://github.com/edstevenson/ThousandWorlds.

11.
arXiv (CS.CV) 2026-06-18

Reference-Driven Multi-Speaker Audio Scene Generation from In-the-Wild Priors

Existing multi-speaker dialogue systems bind speakers to utterances through structured supervision: per-turn tags, multi-stream transcriptions, or learnable speaker embeddings. These systems operate within speech-only pipelines that produce clean vocal sequences without the ambient texture of real conversations. We take a different approach. Our method, ScenA, conditions a text-to-audio flow-matching foundation model, pretrained on large-scale in-the-wild data, directly on multiple reference voices and a free-form natural language prompt that describes an entire multi-speaker audio scene. Leveraging such a foundational model allows us to inherit its capacity for natural, non-studio audio: background noise, room acoustics, overlapping dialogue, and spontaneous paralinguistic events, while adding multi-speaker control without any per-turn structure. Concretely, reference latents are concatenated into the model's token sequence and distinguished by lightweight identity-aware positional encodings. However, we identify a critical obstacle to this approach: the Reference Shortcut. During training under standard noise schedules, the model can identify the matching reference by acoustic similarity to the noisy target, bypassing the text prompt entirely. We address this with a high-noise-biased timestep distribution that forces the model to rely on the text prompt for speaker assignment. We evaluate ScenA on the CoVoMix2-Dialogue benchmark, showing that it outperforms existing multi-speaker systems on speaker-binding metrics while generating rich conversational audio with overlapping speech, emotional vocalizations, and ambient sound. Our results demonstrate the advantage of using a general-purpose audio model conditioned on a free-form scene description, rather than passing structured dialog scripts through a speech-only pipeline.

12.
arXiv (CS.CV) 2026-06-18

AMALIA-VL: A Native European Portuguese Open-Source Vision and Language Model

Large Vision and Language Models (LVLMs) have advanced rapidly, yet European Portuguese (pt-PT) remains systematically underserved by existing open-source multimodal models, which either conflate it with Brazilian Portuguese or severely under-represent it in their training data mixes. We introduce AMALIA-VL, the first open-source instruction-tuned LVLM built natively for pt-PT, pairing a high-resolution vision encoder with dynamic image tiling and a fully open pt-PT-optimized language model via a learned connector. We contribute with a purposefully designed three-stage training process - vision-language alignment, general visual instruction tuning, and preference optimization - together with a pt-PT-centric multimodal data mix combining curated and translated public datasets with novel datasets that address the near-total absence of European Portuguese multimodal resources. Our evaluation shows that AMALIA-VL establishes a strong baseline for open-source pt-PT LVLMs.We will release model weights, training data, and construction pipelines along with machine-translated pt-PT evaluation benchmarks to help democratize pt-PT LVLM development.

13.
arXiv (CS.CV) 2026-06-16

Sub-Semantic Image Segmentation

Images can be segmented based on visual cues (i.e., texture segmentation) or into objects (i.e., semantic segmentation). We propose a new category of sub-semantic image segmentation that blurs the line between the two. In sub-semantic image segmentation, language is not used to name whole objects. Instead, it is used to partition an image into stable appearance patterns that can be described by language. To do that, we couple a general-purpose vision-language model to SAM 3, a promptable segmentation backbone whose native text pathway can ground rich descriptions into masks. Simple coupling fails for a number of reasons that we identify in the paper, and we overcome them by introducing DETECTURE that resolves three concrete failure modes – language leakage between texture regions, prompt competition inside the segmentation backbone, and semantic distortion at the language-to-mask interface. Since there is no dataset of sub-semantic image segmentation, we introduce one, termed TextureADE. The new dataset is derived from the ADE20K dataset using a system we designed. We compare DETECTURE to a number of baselines and find that it achieves the strongest performance on several datasets using different metrics. Code is available at https://github.com/Scientific-Computing-Lab/TextureDetecture.

14.
arXiv (CS.AI) 2026-06-12

Modern analog computing for solving differential and matrix equations

arXiv:2606.13179v1 Announce Type: cross Abstract: In recent years, driven by the computational demands of data-intensive applications such as artificial intelligence and scientific computing, analog computing has gained renewed interest. Given the diversity of computational tasks and recent advancements in analog CMOS circuits and resistive memory technologies, we refer to the evolving landscape as modern analog computing. In this context, we identify three core computational primitives: solving differential equations, solving matrix equations, and performing matrix-vector multiplications, and we explore the connections among them. We also examine various hardware implementations of these analog computing operators, including those built with discrete components, integrated circuits, and resistive memory devices. Among these, resistive memory arrays emerge as particularly promising due to their implementation efficiency. The paper then surveys recent progress in leveraging modern analog computing to solve differential and matrix equations using both advanced analog CMOS circuits and resistive memory arrays. Finally, we discuss the applications of these circuits, the precision and scalability issues and their potential solutions, the relationship with in-memory computing, and the unique computational complexity of analog computing. This paper provides a unified perspective on analog computing, highlighting its strengths, current developments, and challenges, and positioning it as a pivotal enabler of next-generation computational frontiers.

15.
arXiv (CS.AI) 2026-06-15

STREAM: Multi-Tier LLM Inference Middleware with Dual-Channel HPC Token Streaming

arXiv:2606.13968v1 Announce Type: cross Abstract: Researchers and practitioners working with large language models face a fragmented landscape: local models are free and private but hardware limits the model size and context windows a researcher can use; institutional HPC centers offer powerful GPU resources at no marginal cost and keep data within institutional boundaries, but operate behind firewalls and are designed for batch jobs rather than interactive use; commercial cloud APIs provide frontier-model quality on demand but impose significant cost and data retention policies unsuitable for sensitive research data. No existing system unifies all three. STREAM (Smart Tiered Routing Engine for AI Models) addresses this gap with four contributions: (1) a three-tier routing architecture combining local, HPC, and cloud inference with a local LLM-based complexity judge; (2) a dual-channel HPC streaming architecture that separates the Globus Compute control plane (authentication and job dispatch) from a WebSocket relay data plane (token delivery), enabling sub-second TTFT (0.54 s median, 21.1x over batch mode's 11.40 s) through institutional firewalls without VPN or firewall rule changes, with end-to-end AES-256-GCM encryption ensuring the relay operator cannot read token payloads; (3) tier-aware context summarization that prevents long conversations from forcing simple queries onto expensive tiers; and (4) an HPC-as-API proxy mode that exposes HPC inference as an OpenAI-compatible endpoint callable from any standard client with no HPC expertise, a deployment pattern made practical only by the sub-second TTFT of contribution (2). Llama 3.2 3B achieves 85.1% free-tier retention on a 1,200-query benchmark spanning ten domains. Measured TTFT: 0.26 s local, 0.54 s HPC (relay), 1.68 s cloud.

16.
arXiv (CS.CV) 2026-06-11

Traits Run Deeper: Trait-Specific Asymmetric Fusion for Personality Assessment

Personality assessment aims to infer stable personality traits from dynamic behaviors across language, voice, and facial cues. Since different personality dimensions are revealed through distinct behavioral perspectives, modeling trait-specific evidence is challenging. However, most existing approaches adopt a uniform multimodal fusion strategy across all dimensions, assuming identical modality contributions. This overlooks trait-specific modality preferences and introduces cross-modal interference. To address this issue, we propose a novel personality assessment framework called Traits Run Deeper, which consists of three components. Specifically, the Multimodal Foundation Representation (MFR) module constructs personality-oriented multimodal inputs and leverages psychology-informed semantic templates as anchors, enabling foundation models to capture trait-relevant information. Building upon MFR, the Trait-Specific Modality Fusion (TSMF) module acts as an asymmetric fusion mechanism, allowing each dimension to selectively exploit different modality pathways from modality-specific modeling to complementary fusion. Thus, TSMF captures heterogeneous modality preferences while reducing cross-modal contamination. Furthermore, the Distribution-Calibrated Personality Regression (DCPR) module mitigates label imbalance and central tendency bias through target distribution calibration, improving robustness and stability. Experimental results on the AVI Challenge 2026 validation set demonstrate the effectiveness of the proposed framework, reducing mean squared error (MSE) by approximately 25% compared with the baseline. Consistent improvements are observed on the official test set, where our method achieves the best performance and ranks first in the Personality Assessment Track. The source code will be made available at https://github.com/MSA-LMC/AVI2026.

17.
arXiv (CS.CV) 2026-06-16

Assessing Reliability of Symbol Detection in Concept Bottleneck Models

Concept Bottleneck Models (CBMs) are a relevant tool for explainable Artificial Intelligence because they make their predictions through human-interpretable symbols. However, high task accuracy does not guarantee that these symbols are detected faithfully: jointly trained CBMs may encode task-specific shortcuts in the bottleneck, making their explanations unreliable. In this paper, we study concept-detection reliability by swapping independently trained concept detectors and classification heads that share the same symbolic vocabulary. We use the resulting performance degradation, concept-level metrics, and symbol-wise uncertainty estimates to identify concepts that are especially prone to spurious firing. Finally, we propose a reliability-aware training strategy in which a shared concept detector is optimized with multiple classification heads and penalized for relying on globally or instance-wise unreliable symbols. On CUB-200-2011 with full concept supervision, detectors and heads are almost freely interchangeable (swap drop below one accuracy point, relative retention above $99\%$, and no concept detected below chance), whereas on a controlled synthetic task we show that, as the concept-supervision weight is reduced, models keep near-perfect task accuracy while swapped accuracy and agreement with the ground-truth concepts collapse to chance. Our reliability-aware training substantially mitigates this leakage, roughly doubling swap accuracy in the leaky regime.

18.
arXiv (CS.LG) 2026-06-18

Multi-Agent Systems are Mixtures of Experts: Who Becomes an Influencer?

arXiv:2605.25929v2 Announce Type: replace-cross Abstract: The effectiveness of multi-agent LLM deliberation depends not only on the agents' individual predictions, but also on how they communicate and collaborate. We study this mechanism through the lens of Friedkin-Johnsen (FJ) opinion dynamics, a tractable model for analyzing stubbornness, influence, and opinion change in multi-agent systems that captures empirically observed deliberation patterns. We show that the FJ parameters are input-dependent, turning multi-agent deliberation into a mixture of experts. This perspective implies that multi-agent systems can outperform single agents and static ensembles when routing reflects agent competence. Since competence is latent in practice, we analyze how influence is established through observable proxies: agents' self-assessed confidence, their perceived confidence, and initial alignment with other agents' views.

19.
arXiv (CS.LG) 2026-06-19

An Information Theoretic Framework for Graph Novelty Generation via Latent Mixture Modeling

arXiv:2606.19770v1 Announce Type: new Abstract: We propose an information-theoretic framework for graph novelty generation, which aims to generate data that are distinct from existing patterns while preserving global structural consistency. Our approach embeds data into a latent space, models the latent distribution using finite mixture models, and generates novel samples by imposing explicit novelty and reliability conditions formulated in terms of description length. Specifically, novelty is enforced by requiring generated samples to be poorly explained by all existing mixture components, while reliability constrains their impact on the overall mixture structure under the Minimum Description Length (MDL) principle. We provide a theoretical analysis showing that, with appropriate threshold choices, the probabilities of misclassifying non-novel or unreliable samples converge to zero with explicit rates. Experiments on synthetic and benchmark graph datasets demonstrate that the proposed method enables principled novelty generation with quantifiable risk.

20.
arXiv (CS.CV) 2026-06-16

A Text Recognition Dataset from Sahidic Coptic Ancient Manuscripts

In this work, we target Handwritten Text Recognition (HTR) in low-resource scenarios, which arise from underrepresented languages, rare scripts, and degraded visual conditions typical of historical documents. We introduce SCAM (Sahidic Coptic Ancient Manuscripts), a new line-level dataset built from digitized ancient manuscripts written in the extinct Sahidic Coptic dialect. The dataset reflects a realistic and challenging setting, as it combines heterogeneous acquisition conditions across libraries with typical manuscript degradations such as ink fading, bleed-through, and material deterioration. In addition to visual complexity, SCAM poses significant linguistic challenges due to the scarcity of resources for Sahidic Coptic, its uncommon alphabet, and dialect-specific diacritics. To support research in low-resource HTR, we benchmark several state-of-the-art approaches based on different paradigms, highlighting their limitations and strengths in this setting. Our results underline the gap between current HTR performance on well-resourced modern scripts and historically grounded, low-resource scenarios, thus providing a reference point for future developments.

21.
arXiv (math.PR) 2026-06-12

Non-commutative Law of iterated logarithm

arXiv:2509.22037v2 Announce Type: replace-cross Abstract: We prove optimal non-commutative analogues of the classical Law of Iterated Logarithm (LIL) for both martingales and sequences of independent (non-commutative) random variables. The classical martingale version was established by Stout [Sto70b] and the independent case by Hartman-Wintner [HW41]. Our approach relies on a key exponential inequality essentially due to Randrianantoanina [Ran24] that improves that from Junge and Zeng [JZ15]. It allows to derive an optimal non-commutative Stout-type LIL just as in [Zen15], from that martingale result we then deduce a non-commutative Hartman-Wintner type LIL for independent sequences of random variables.

22.
arXiv (CS.CV) 2026-06-11

Multi-View In-Cabin Monitoring System for Public Transport Vehicles

We introduce a multi-view in-cabin monitoring dataset for public transportation with synchronized RGB and depth images from four inward-facing cameras and a rotating LiDAR covering the vehicle interior of a digitalized and partly automated German city bus. The dataset contains 9.136 synchronized samples with annotations and is accompanied by a calibration and pseudo-labeling pipeline that generates 3D human pose estimates and oriented 3D bounding boxes for occupants. We further provide a nuScenes-format conversion and benchmark representative multi-view 3D detection models (e.g., Lift-Splat-Shoot and BEVFusion), supporting comparative evaluation and small-scale training of multi-view in-cabin perception models. The dataset and tools are available at https://github.com/EvgenyGorelik/multiview_incabin_dataset.

23.
arXiv (math.PR) 2026-06-16

Layerwise Terminal Discrepancy in Chen's Reverse-Heat Coupling on the Boolean Cube

arXiv:2606.04573v2 Announce Type: replace-cross Abstract: Recently, Chen [Chen2026] proved that Talagrand's Boolean convolution conjecture holds up to the dimension-free factor \((\log\log\eta)^{3/2}\), namely for every fixed \(\tau>0\), \[ \mu\{P_\tau f>\eta\|f\|_1\} \le C_\tau \frac{(\log\log\eta)^{3/2}}{\eta\sqrt{\log\eta}}, \qquad \eta>e^3. \] We revisit the terminal testing-discrepancy step in Chen's perturbed reverse-heat coupling. Chen estimates this discrepancy globally in terms of the remaining gap to the terminal level. We keep the same coupling and the same reverse-heat formulations, but localize the terminal discrepancy on each remaining-gap layer before summing the layers. This changes the fixed-time anti-concentration cost from order \((\log L)^{3/2}/\sqrt L\) to order \((\log L)/\sqrt L\), where \(L=\log\eta\). Consequently, we obtain a \((\log\log\eta)^{1/2}\) improvement as \[ \mu\{P_\tau f>\eta\|f\|_1\} \le C_\tau \frac{\log\log\eta}{\eta\sqrt{\log\eta}}, \qquad \eta>e^3. \]

24.
arXiv (CS.CL) 2026-06-12

From Passive Generation to Investigation: A Proactive Scientific Peer Review Agent

Large language models (LLMs) have shown promise in automating scientific peer review. However, existing approaches often struggle to generate in-depth reviews supported by concrete evidence. We argue that a key limitation is the lack of flexibility to proactively investigate suspicious parts of a paper based on accumulated evidence, as human reviewers do. In this paper, we explore how to enable an LLM-based review agent to perform such proactive investigation. We find that this can be naturally formulated as a Markov Decision Process (MDP), and propose ProReviewer, a scientific peer review agent that proactively reviews a paper guided by a maintained, structured review log. The structured review log serves as a workspace for the agent to track evidence and intermediate findings collected during review. Experiments show that ProReviewer with an 8B backbone, trained by supervised fine-tuning and optimized by reinforcement learning, achieves the highest average score across five quality dimensions, outperforming prompt-based methods with much larger frontier LLMs by up to 39% and the strongest fine-tuned baseline by 16% relatively. It also attains the highest win rates against baselines in human evaluation.

25.
arXiv (CS.CL) 2026-06-19

The Almost Intelligent Revolution: Options for Scaling Up Deliberation and Empowering People with AI

The increasing prominence of Large Language Models (LLMs) in public discourse presents both opportunities and challenges for democratic deliberation. While red teaming strategies help mitigate specific risks, broader concerns persist regarding linguistic constraints, biases, and the sycophantic tendencies of LLMs. This chapter explores how LLMs can be used to significantly scale up and democratise deliberation, particularly in fostering inclusivity and empowering traditionally marginalised groups. Drawing on concepts from Systemic-Functional Linguistics, the chapter examines how variations across language users (for example, with respect to socio-demographic groups) and across language use (for example, with respect to communicative functions) shape participation in AI-supported deliberation. The chapter presents AI-driven deliberation studies and assesses their potential to scaffold argumentation, enhance access, and reduce the influence of exclusionary linguistic norms and biases which are embedded in prestigious registers. At the same time, the chapter cautions against both overclaiming, which leads to unrealistic expectations, and underclaiming, which risks missed opportunities for AI-assisted engagement. The chapter concludes by identifying future research directions to maximise the democratic potential of AI-assisted participation while embedding ethical safeguards to counteract the reproduction of linguistic inequalities.