Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CL) 2026-06-24

RASC+: Retrieval-Constrained LLM Adjudication for Clinical Value Set Authoring

Clinical value sets define the standardized terminology codes used in quality measurement, phenotyping, cohort construction, and clinical decision support. The recently introduced Retrieval-Augmented Set Completion (RASC) benchmark showed that direct zero-shot large language model (LLM) generation is poorly suited to this task: clinical code systems are large, version-controlled, and not reliably memorized by language models. We study a stage-wise alternative in which candidate-pool construction is optimized for recall and a constrained LLM adjudicator is optimized for candidate selection. On the full 3,744-value-set RASC test split, Qwen3-based retrieval with vocabulary-aware expansion and code-display rescue retrieval increases candidate-pool recall from the original RASC retrieval baseline of 0.553 to 0.730; on the held-out-publisher stratum, pool recall is 0.655. The higher-recall pool alone is not sufficient: applying the original SAPBert cross-encoder to this expanded pool gives full-test macro F1 of 0.287 and held-out-publisher macro F1 of 0.233. Replacing the stage-2 selector with blinded GPT-5 adjudication over the same pool increases full-test macro F1 to 0.549 and held-out-publisher macro F1 to 0.533. These results show that retrieval-constrained LLM adjudication can substantially improve value set completion while preserving the safety constraint that all returned codes must come from an auditable candidate pool.

02.
arXiv (CS.CL) 2026-06-12

Polar: A Benchmark for Evaluating Political Bias in LLMs

Political bias in large language models (LLMs) is increasingly significant, but difficult to measure reproducibly across political and linguistic contexts. We introduce Polar, a 4,026-instance multiple-choice benchmark that measures political bias through option-level likelihoods rather than prompt-based generation. Polar covers two ideological axes and eight issue categories derived from the Manifesto Project, and evaluates models in parallel across U.S. and South Korean political contexts. Across 38 LLMs, measured bias varies systematically with political context, issue category, model group, and presentation language. All models lean left-progressive on U.S. political content, but show more centered and mixed patterns on South Korean content. Translation experiments further show that presentation language alone can shift measured bias. These findings highlight the need for multilingual and cross-contextual evaluation of political bias in LLMs.

03.
arXiv (CS.AI) 2026-06-24

E-MRL: Cross-view Aligned Evidence-driven Multimodal Reinforcement Learning for Reliable 3D Tumor Analysis

arXiv:2606.23888v1 Announce Type: cross Abstract: While Vision-Language Models (VLMs) show great promise in volumetric medical report generation, they frequently suffer from visual hallucinations and a lack of grounding in 3D CT data. Current Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) strategies typically optimize text fidelity alone, essentially rewarding correct diagnoses derived from language priors rather than genuine visual perception. To address this, we propose cross-view aligned Evidence-driven Multimodal Reinforcement Learning (Evidence-MRL, noted as E-MRL), a reliable RL reasoning framework that formulates the generation process as a Markov Decision Process of "diagnosis-localization-verification". Unlike standard approaches, our model is explicitly trained to identify a "key evidence slice" alongside the global diagnostic report, grounding its findings in verifiable visual evidence. Crucially, we introduce a novel cross-view consistency reward, which validates the semantic alignment between the golden-standard report and a local visual re-query of the selected key slice, providing additional rewards for correctly-localized reasoning. Experiments on large-scale 3D CT tumor datasets demonstrate that E-MRL significantly reduces hallucinations and improves diagnostic accuracy compared to SFT and RL baselines, offering a clinically interpretable solution for visually-grounded and tumor analysis.

05.
arXiv (CS.CV) 2026-06-18

A Unified Framework for Efficient Remote Sensing Visual Question Answering: Adapting Dual, Hybrid, and Encoder-Decoder Architectures

Visual Question Answering (VQA) in the Remote Sensing (RS) domain presents unique challenges due to the high resolution, multi scale object distribution, and semantic complexity of aerial imagery. While general domain Foundation Models have achieved remarkable success, their direct application to RSVQA is hindered by massive domain shifts and the computationally prohibitive nature of full fine tuning. This study presents a comparative analysis of RS Adapter, a Parameter Efficient Fine Tuning (PEFT) strategy, applied across three distinct Vision Language Model (VLM) architectures: the Dual Encoder CLIP, the Encoder Decoder BLIP, and the Hybrid FLAVA. We introduce a unified architectural surgery pipeline that injects lightweight bottleneck adapters into the attention and MLP layers of frozen backbones, enabling rapid adaptation with less than 5 percent of trainable parameters. Experimental results on the high resolution RSVQA x dataset demonstrate that while all adapted models achieve convergence, the Hybrid FLAVA architecture offers a superior balance of multimodal reasoning and retrieval capabilities compared to its unimodal counterparts. Our findings establish a new baseline for resource efficient VQA in disaster assessment and urban monitoring.

06.
arXiv (quant-ph) 2026-06-24

Uncovering Latent Structures in Robust Pulse Sequences: A Model-Based Reinforcement Learning Approach for Adaptable Quantum Control

arXiv:2606.24507v1 Announce Type: new Abstract: Real-time adaptive control of quantum systems requires rapid generation of robust, high-fidelity pulses across a continuous range of operating conditions. Standard optimization algorithms such as gradient-ascent pulse engineering (GRAPE) solve each instance independently, discarding information between runs and requiring costly reinitialization when parameters change. We present an approach to robust optimal quantum control based on model-based reinforcement learning, in which a single neural network – embedding the Hamiltonian directly into the training pipeline – generates robust gates across an entire family of gate configurations, without pre-computed training data. Demonstrated on a single-spin (two-level) system, the trained networks produce pulses for arbitrary rotation angles over a range of pulse durations, detunings, and field inhomogeneities in milliseconds, at fidelities comparable to multi-seed GRAPE. The framework is inherently adaptable: any parameter entering the Hamiltonian can serve as a network input, extending the approach to different systems and control settings. Beyond speed, the network reveals structure in the control landscape: it discovers the same structured phase profiles that appear in GRAPE solutions – made identifiable through fidelity-invariant symmetry transformations – but more consistently than independent optimization. This consistency enables smooth interpolation across the entire trained parameter space.

07.
arXiv (CS.CV) 2026-06-16

Cross-Modal Registration Between 3D and 2D Fingerprints via Pose-Aware Unwrapping and Point-Cloud Fusion

Three-dimensional (3D) fingerprints preserve global finger geometry and local ridge structure while avoiding contact-induced deformation, but they remain difficult to integrate with legacy two-dimensional (2D) fingerprint systems. This paper addresses the intermediate stage between 3D acquisition and cross-modal matching, and presents a unified framework for 3D fingerprint preprocessing and registration across contactless and contact-based 2D modalities. The framework combines four components: 1) a nonparametric visualization and unwrapping method that converts a 3D fingerprint point cloud into a rolled-equivalent 2D representation without relying on a global finger-shape model; 2) a point-cloud fusion pipeline that registers and mosaics multiple partial 3D captures into a more complete fingerprint model; 3) an ellipse-based pose normalization method for canonical finger alignment; and 4) a pose-aware cross-modal registration strategy that improves compatibility between 3D fingerprints and both contactless and contact-based 2D fingerprints. Experiments on a self-collected multimodal fingerprint database containing 150 fingers show that the proposed framework achieves ridge-level 3D registration accuracy, robust pose estimation, and consistent gains in 2D compatibility. In particular, the 3D fusion error is concentrated around 0.09 mm, contactless 2D–3D registration reaches ridge-scale projection accuracy, and pose-aware unwrapping improves genuine matching scores relative to generic 3D unwrapping. These results support the use of 3D fingerprints as an effective geometric bridge across heterogeneous fingerprint modalities. The baseline implementation has been publicly released at https://github.com/XiongjunGuan/3DFpVisual.

08.
bioRxiv (Bioinfo) 2026-06-22

HTS-Oracle X: AI-Guided Prospective Discovery of Small Molecule Immune Checkpoint Binders

Targeting immune checkpoint protein-protein interactions (PPIs) using small molecules remains limited by the shallow, featureless binding surfaces of co-stimulatory and co-inhibitory receptors and the characteristically low hit rates of conventional high-throughput screening against these interfaces. Here we report HTS-Oracle X, a multimodal deep learning platform that integrates bidirectional cross-attention fusion of ChemBERTa SMILES embeddings with extended RDKit descriptors, trains on continuous biophysical binding signals rather than binary labels, and employs Monte Carlo Dropout uncertainty quantification for uncertainty-adjusted compound selection. Trained on 45,760 Dianthus TRIC-screened compounds per target under scaffold-aware cross-validation, HTS-Oracle X was applied prospectively to a 100,160-compound Enamine library against CD28, TIM-3, and VISTA. From 150 model-selected compounds, 45 dose-response confirmed binders were identified (30.0% overall hit rate), yielding enrichment factors of 234-408x over experimentally established random prospective baselines and 16 sub-micromolar hits. The top hits, HX-CD28-1 (KD = 233 nM), HX-TIM3-1 (KD = 249 nM), and HX-VISTA-1 (KD = 345 nM), demonstrated on-target functional activity in immune cell and tumor co-culture assays. HTS-Oracle X represents a scalable AI-guided framework for small molecule discovery against non-enzymatic immune checkpoint targets.

09.
arXiv (CS.CV) 2026-06-18

Biomazon: A Multimodal Dataset for 3D Forest Structure and Biomass Modeling in the Amazon Basin

Accurate, spatially explicit characterization of tropical forest structure is essential for carbon accounting and ecosystem monitoring, yet most ML pipelines predict canopy-top height proxies (e.g., RH95/RH98) or AGBD as separate scalar targets, rather than learning the forest vertical structure as an ordered profile. The community lacks a ML-ready multimodal benchmark for predicting the entire GEDI RH profile jointly with AGBD, or for evaluating methods that enforce physically consistent ordering across RH percentiles. We address this with Biomazon, a 20 m multimodal benchmark dataset over the Amazon Basin that pairs GEDI RH and AGBD targets with multi-sensor predictors (Sentinel-1/2, ALOS-2 PALSAR-2, Copernicus DEM, Dynamic World LULC, and AlphaEarth embeddings) under standardized spatial splits and evaluation protocols. Using a shared encoder-decoder with task-specific heads as a baseline framework, we conduct a comprehensive ablation study of (i) backbone/model scale, (ii) modality contributions, and (iii) the use of auxiliary embeddings under standalone and fusion settings, and we report both single-target and joint-target results to quantify tradeoffs under a unified training protocol. Finally, we contextualize baseline performance through regionally aligned comparisons against existing gridded products, including GEDI L4D RH10-RH98 and AGBD, at matching temporal scale. Biomazon, together with the accompanying protocols and baseline results, establishes a reference benchmark for future work on structurally consistent RH-profile prediction and structure-biomass modeling in tropical forests.

10.
arXiv (CS.AI) 2026-06-16

Parallelizing Tool Execution and LLM Generation for Low-Latency Agent Serving

arXiv:2603.18897v2 Announce Type: replace-cross Abstract: LLM-powered agents execute tasks through a sequential loop of model generation and tool execution. Today's serving systems serialize this loop, leaving tool latency exposed on the task critical path. This paper presents PASTE, a tool-aware agent-serving system that predicts concrete future tool invocations from recurring agent patterns and executes them speculatively while the LLM is still generating. PASTE isolates speculative results until confirmed by the LLM and jointly schedules tool execution and returning LLM sessions to avoid shifting bottlenecks to the GPU. Across deep research, coding, and scientific-agent workloads, PASTE reduces average task completion time by 43.5% and lowers observed tool latency by 1.8x.

11.
arXiv (CS.AI) 2026-06-18

TMR-GGNN: Credit Card Fraud Detection based on Time-Aware Multi-Relational Guided Graph Neural Network

arXiv:2606.18444v1 Announce Type: cross Abstract: In recent years, credit card fraud detection has faced significant challenges due to highly imbalanced data, evolving fraud patterns, and complex relational structures among transaction entities. To address these issues, this research proposes a novel framework called Timeaware Multi Relational Guided Graph Neural Network (TMR GGNN). Particularly, the proposed TMR GGNN extends the encoder decoder Graph Neural Network GNN architecture by modeling heterogeneous interactions across customers, merchants, devices, and IPs over temporal windows. Subsequently, the proposed TMR GGNN approach constructs a dynamic, multi relational graph and incorporates a time aware relational attention mechanism within the encoder to adaptively weigh the transaction relevance based on temporal proximity and semantic context. Consequently, the decoder employs a contrastive learning module to distinguish between real and synthesized transaction patterns, while improving the models generalization of rare fraud cases. Additionally, to effectively manage severe class imbalances and emphasize discriminative learning, a composite loss function combining Information Noise Contrastive Estimation (InfoNCE) based contrastive loss with Focal Loss is introduced. This integration assists in improving fraud identification while mitigating false negatives.

12.
arXiv (CS.AI) 2026-06-11

Improving Detection of Rare Nodes in Hierarchical Multi-Label Learning

arXiv:2602.08986v2 Announce Type: replace-cross Abstract: In hierarchical multi-label classification, a persistent challenge is enabling model predictions to reach deeper levels of the hierarchy for more detailed or fine-grained classifications. This difficulty partly arises from the natural rarity of certain classes (or hierarchical nodes) and the hierarchical constraint that ensures child nodes are almost always less frequent than their parents. To address this, we propose a weighted loss objective for neural networks that combines node-wise imbalance weighting with focal weighting components, the latter leveraging modern quantification of ensemble uncertainties. By emphasizing rare nodes rather than rare observations (data points), and focusing on uncertain nodes for each model output distribution during training, we observe improvements in recall by up to a factor of five on benchmark datasets, along with statistically significant gains in $F_{1}$ score. We also show our approach aids convolutional networks on challenging tasks, as in situations with suboptimal encoders or limited data.

13.
arXiv (CS.CL) 2026-06-16

CODA-BENCH: Can Code Agents Handle Data-Intensive Tasks?

Advanced agents are increasingly demonstrating the potential to operate as autonomous engineers, creating a growing demand for evaluation benchmarks that capture the complexity of real-world development. Such environments typically involve both complex code and large-scale data (i.e., file system). However, existing benchmarks usually evaluate code-centric or data-centric capabilities in isolation, leaving a clear gap with real development scenarios. In this paper, we bridge this gap by introducing CODA-BENCH, the first benchmark to jointly evaluate code and data intelligence in a data-intensive environment. We construct a data-intensive Linux sandbox based on the Kaggle ecosystem (containing hundreds of datasets), where agents must actively explore complex file hierarchies to identify relevant resources and generate code for data-driven analytical tasks. CODA-BENCH comprises 1,009 tasks spanning 31 communities, with each task environment containing an average of 980 files, simulating realistic data scale and noise. Evaluations of advanced agents reveal that even top-performing systems struggle to effectively integrate data discovery with code execution, achieving a success rate of only 61.1%. These results highlight a substantial gap in current agentic capabilities for data-intensive tasks and point to promising directions for future research.

14.
arXiv (CS.CV) 2026-06-19

U$^2$Mamba: A Two-level Nested U-structure Mamba for Salient Object Detection

Mamba-based models have emerged as a promising alternative for salient object detection (SOD), offering significant advantages in modeling long sequences. However, existing models often fail to explore contextual information and the depth of the entire architecture. This paper introduces U$^2$Mamba, a powerful and innovative U-structured network for salient object detection. We propose multiscale Mamba U-blocks (MMUBs) that enhance the model depth to improve local feature extraction capabilities. Our newly developed nested U-structure, incorporating MMUBs, enables the network to integrate various receptive fields from shallow and deep layers, thereby collecting richer contextual information and longer-range data without being constrained by resolution. Instead of using the traditional deep supervision scheme and top-level supervised training, we propose a hierarchical training supervision method where the loss is computed at each level during the training process. Extensive experiments demonstrate that U$^2$Mamba achieves highly competitive performance against state-of-the-art methods. The source code is available at \url{https://github.com/JL021/U2Mamba}.

15.
arXiv (CS.AI) 2026-06-16

The Proxy Knows Too Much: Sealing LLM API Routers with Attested TEEs

arXiv:2606.16358v1 Announce Type: cross Abstract: Agents increasingly access large language models (LLMs) through API routers. A router terminates the client's transport-layer security session and opens a separate upstream session, so it holds the full interaction in plaintext. This makes the router an application-layer man-in-the-middle: it can rewrite agent tool calls, swap dependencies for typosquatted packages, trigger attacks only under audit-evading conditions, and passively exfiltrate secrets. Existing client-side defenses are evadable. We propose AEGIS, a provider-transparent attested API router whose data path is a client-verified faithful passthrough. AEGISconfines plaintext handling to a small hardware-enclave component while leaving authentication, scheduling, accounting, and management on the untrusted host. The client verifies the enclave before releasing plaintext. The host can neither read nor alter the interaction, and plaintext leaves only toward destinations fixed by the measured image. We show that all four malicious-router attack classes succeed against a plaintext-access baseline and are blocked by AEGIS, including adaptive tests against the same boundary. The trusted path is $851$ lines, carries three provider-native APIs without conversion, and completes every request under real-provider workload and concurrency. In a seeded audit pilot, two commodity coding agents find eight and ten of ten planted invariant violations. The local relay overhead is about six milliseconds per request.

16.
arXiv (CS.LG) 2026-06-17

Constrained Diffusion Models with Primal-Dual Inference

arXiv:2606.17192v1 Announce Type: new Abstract: This paper develops constrained diffusion models with primal-dual inference (PDI) to sample from optimal distributions of entropy-regularized optimization problems with average constraints. We formalize constrained sampling in the Lagrangian dual domain, where the optimal distribution takes the form of a Gibbs distribution indexed by the optimal dual variable. Rather than estimating this dual multiplier before sampling and freezing it throughout generation, PDI jointly infers the optimal primal distribution and its parametrizing dual variable. Each reverse diffusion step denoises using the score field associated with the current multiplier and then updates the multiplier through dual ascent using the estimated constraint violation of the denoised samples. To enable this conditional score field, we train a single dual-conditioned score network over the family of Gibbs distributions induced by the dual variables encountered during inference. We prove that the time average of the dual variables generated along the inference trajectory converges to a neighborhood of the dual optimum and bound the effect of residual dual mismatch on the terminal distribution through schedule-dependent stability factors. We evaluate PDI on constrained sampling from a mixture of Gaussians, wireless resource allocation, and portfolio management.

17.
arXiv (CS.LG) 2026-06-17

Finsler Geometry, Graph Neural Networks, and You

arXiv:2606.17185v1 Announce Type: new Abstract: Graph neural network architectures based on the graph Laplacian approximate the Laplace-Beltrami operator, thus limiting their application to isotropic operators. As a nonlinear alternative to the Laplace-Beltrami operator, we consider estimates of the Finsler Laplacian on point clouds sampled from a manifold. We prove that these discrete estimates converge to the true operator on the manifold as the number of point samples grows. Moreover, we show that this operator can be expressed as a graph neural network layer, which we use to define a family of Finslerian graph neural networks constrained to express Finsler geometry. We show that Finslerian graph neural networks recover the geometry underlying nonlinear diffusion equations in practice.

18.
PLOS Computational Biology 2026-06-11

A zero-parameter first-principles gate framework for full-length TP53 missense variant interpretation

by Masamichi Iizumi Missense variant interpretation often achieves useful predictive performance but remains mechanistically opaque, particularly in proteins that combine structured domains with intrinsically disordered regions (IDRs). We developed Gate & Channel, a zero-parameter, first-principles framework for full-length TP53 missense variant analysis in which each prediction is generated by explicit IF-THEN gates derived from physicochemistry, geometry, structural constraints, and polymer physics rather than fitted weights. Variants are evaluated across independent channels representing distinct physical failure modes; a variant is predicted disruptive if any gate closes. A second hierarchical layer (“Geta”) encodes physically grounded post-closure exceptions, allowing sensitivity and specificity to be improved on disjoint variant populations. The v18 framework consists of 12 channels and 2 Getas spanning structured domains and IDRs, capturing DNA-contact disruption, Zn coordination, burial-dependent packing, secondary-structure compatibility, post-translational modification chemistry, short linear motif disruption (including a multi-partner coupled-folding face), proline-directed kinase recognition, and IDR-specific proline and glycine backbone constraints. Across 1,369 TP53 missense variants, the framework achieved 84.5% sensitivity and 89.1% positive predictive value, with 90.9% sensitivity preserved in the DNA-binding core and all 9/9 hotspot mutations captured. A post hoc audit of discordant IDR calls indicated that many apparent false positives had plausible molecular rationales, consistent with a distinction between molecular mechanism disruption and clinical penetrance. Applied to KRAS, TDP-43, and BRCA1, the same channels capture the dominant pathogenic mechanisms in each protein as a proof of principle, while residual missed variants name specific gates yet to be written. The framework is distributed as the open-source Python package pathogenicity-gates (v0.5.1, MIT). These results show that a substantial fraction of full-length TP53 missense variation can be resolved through explicit, auditable physical gates that carry meaning beyond TP53, with each remaining failure naming the next rule to be written.

19.
arXiv (quant-ph) 2026-06-19

Attosecond Path Qubits in High-Harmonic Generation: Classical Dephasing and Trace-Out Decoherence

arXiv:2606.20372v1 Announce Type: cross Abstract: High-harmonic generation (HHG) is governed by interference between electron trajectories. We propose that the dominant short and long trajectories define an experimentally addressable two-level subsystem: an attosecond path qubit (APQ). We formulate a trajectory-resolved density matrix to identify two distinct coherence-loss mechanisms: classical dephasing from ensemble averaging and quantum decoherence arising from the trace-out of unobserved degrees of freedom. By investigating shot-to-shot fluctuations and unresolved transverse momentum, we demonstrate that while dephasing suppresses coherence through averaging, the ``trace-out'' channel produces mixed states even for fixed driving parameters. We explore how these mechanisms modify APQ purity and show that mode selection and conditioning provide operational routes to isolate them. These results establish a reduced-state framework for diagnosing coherence loss in HHG and for engineering trajectory-based quantum states in attosecond interferometry.

20.
arXiv (CS.CL) 2026-06-25

ConPress: Learning Efficient Reasoning from Multi-Question Contextual Pressure

Large reasoning models (LRMs) typically solve reasoning-intensive tasks by generating long chain-of-thought (CoT) traces, leading to substantial inference overhead. We identify a reproducible inference-time phenomenon, termed Self-Compression: when multiple independent and answerable questions are presented within a single prompt, the model spontaneously produces shorter reasoning traces for each question. This phenomenon arises from multi-question contextual pressure during generation and consistently manifests across models and benchmarks. Building on this observation, we propose ConPress (Learning from Contextual Pressure), a lightweight self-supervised fine-tuning approach. ConPress constructs multi-question prompts to induce self-compression, samples the resulting model outputs, and parses and filters per-question traces to obtain concise yet correct reasoning trajectories. These trajectories are directly used for supervised fine-tuning, internalizing compressed reasoning behavior in single-question settings without external teachers, manual pruning, or reinforcement learning. With only 8k fine-tuning examples, ConPress reduces reasoning token usage by 59% on MATH500 and 33% on AIME25, while maintaining competitive accuracy.

21.
arXiv (CS.LG) 2026-06-16

Machine Learning-Driven Chemical Reactor Network Modeling of the Sandia-D Flame

arXiv:2606.14729v1 Announce Type: cross Abstract: Turbulent combustion simulations are crucial for many scientific and engineering systems. However, the high cost to fully resolve the complex multiscale and multiphysics behavior makes direct simulation typically infeasible. The equivalent reactor network (ERN) approach attempts to improve computational efficiency by replacing a multidimensional turbulent simulation with a series of much cheaper 0-D and 1-D chemical reactors, providing a surrogate model that retains detailed chemistry at the cost of simplified flow physics. However, their development remains a challenge, often requiring either expert analysis, or automated approaches that sacrifice accuracy. In this work, we develop an automated machine-learning-assisted framework for constructing ERNs of the Sandia-D turbulent methane/air flame. Principal component analysis is first used to reduce high-dimensional thermochemical computational fluid dynamics (CFD) data to a low-dimensional latent space, where k-means clustering identifies physically interpretable flame regions used to initialize a reactor-network graph. This initialization is then refined using finite-difference gradient descent wrapped around non-differentiable Cantera reactor simulations. Across 30 RANS simulations spanning a range of pilot temperatures and inlet methane compositions, the optimized 7-reactor ERN achieves a maximum-temperature $R^2$ score of 0.7945 while preserving a $\sim6000\times$ speedup over the CFD solver. Outlet CO prediction remains more challenging, with a final $R^2$ score of $-0.4183$, but improves substantially from the unoptimized clustering initialization. These results show that unsupervised thermochemical feature extraction can provide effective physics-informed initializations for ERN construction, while gradient-based refinement can significantly improve predictive accuracy without manual reactor-network design.

22.
arXiv (CS.AI) 2026-06-16

Token Reduction Should Go Beyond Efficiency in Generative Models – From Vision, Language to Multimodality

arXiv:2505.18227v4 Announce Type: replace-cross Abstract: In Transformer architectures, tokens\textemdash discrete units derived from raw data\textemdash are formed by segmenting inputs into fixed-length chunks. Each token is then mapped to an embedding, enabling parallel attention computations while preserving the input's essential information. Due to the quadratic computational complexity of transformer self-attention mechanisms, token reduction has primarily been used as an efficiency strategy. This is especially true in single vision and language domains, where it helps balance computational costs, memory usage, and inference latency. Despite these advances, this paper argues that token reduction should transcend its traditional efficiency-oriented role in the era of large generative models. Instead, we position it as a fundamental principle in generative modeling, critically influencing both model architecture and broader applications. Specifically, we contend that across vision, language, and multimodal systems, token reduction can: (i) facilitate deeper multimodal integration and alignment, (ii) mitigate "overthinking" and hallucinations, (iii) maintain coherence over long inputs, and (iv) enhance training stability, etc. We reframe token reduction as more than an efficiency measure. By doing so, we outline promising future directions, including algorithm design, reinforcement learning-guided token reduction, token optimization for in-context learning, agentic framework design, and broader ML and scientific domains.

23.
medRxiv (Medicine) 2026-06-22

Development and validation of a risk prediction algorithm to estimate all-cause mortality among community-dwelling Canadians: the Mortality Population Risk Tool (MPoRT)

BACKGROUND: The risk of all-cause mortality can inform decision-making for chronic disease prevention. We developed a predictive algorithm to estimate the 5-year risk of death among community-dwelling adults. METHODS: We derived and validated the Mortality Population Risk Tool (MPoRT) using data from population health surveys in Canada (the Canadian Community Health Survey) and the United States (the National Health Interview Survey), survey years 2001 to 2011, linked to vital statistics. The outcome was death within five years of the survey response. The algorithm was developed using data from Ontario respondents using a Cox proportional hazards model, then modified and re-estimated to allow cross-national assessment in Canada and the United States. Twenty-three prespecified predictors were assessed: seven sociodemographic, six behavioural, and ten general health and chronic disease. RESULTS: 527,369 respondents aged 20 to 105 years were included in the Canadian and United States development and validation cohorts, with 43,758 deaths during 3.68 million person-years follow-up. The final sex-specific MPoRT algorithms each contained 21 variables, showing strong discrimination (C-statistic: females 0.874 [0.871–0.877]; males 0.867 [0.865–0.871]) and good calibration overall and in 246 of 247 subgroups. Discrimination was modestly attenuated (0.01 decrease in C-statistic) in cross-national validation between Canada and the United States, with good calibration across all 71 subgroups. INTERPRETATION: MPoRT accurately discriminated all-cause mortality using only self-reported data, enabling broad application without clinical measures. While validation outside North America is needed to confirm broader applicability, MPoRT is designed for straightforward recalibration using routinely available national mortality data. This supports targeted chronic disease prevention strategies at both the population and individual levels, though the limitations inherent to self-reported predictors should be considered when interpreting predictions.

24.
arXiv (CS.CL) 2026-06-16

Whose hotel does the AI recommend? An algorithm audit of reputation signals in LLM-assisted hotel selection

Travelers increasingly ask large language model (LLM) assistants which hotel to book, making these systems gatekeepers of property visibility – yet what moves their recommendations is undocumented. We conduct a pre-specified algorithm audit using a randomized choice-based conjoint: across personas, prompt templates, and twelve open-weight and proprietary models, assistants choose among five hotels whose guest rating, review volume and recency, management response, chain affiliation, price, eco-certification, and list position are independently randomized. We estimate the average marginal component effect of each signal on the probability of recommendation. Guest rating and price dominate (a top rating raises selection by 31.6 percentage points; a high price lowers it by 30.0), reproducing human valence-and-price primacy but over-weighting eco-certification and ignoring management response. List position – a content-free artifact – shifts recommendations causally, worth about \$12 per night. Stated reasons track revealed weights imperfectly. The findings ground generative engine optimization and the accountability of AI infomediaries in causal evidence.

25.
arXiv (CS.AI) 2026-06-12

Brick: Spatial Capability Routing for the Mixture-of-Models (MoM) Paradigm

arXiv:2606.13241v1 Announce Type: new Abstract: Defining query difficulty is one of the hardest problems in deployment engineering. Existing LLM routers rely on surface features such as domain labels, keywords, and token count, ignoring the within-domain variance that actually determines model success. Frontier models cost ten to one hundred times more than local open-weight models, so at production scale even small per-request savings become a direct cloud-bill lever. We present Brick, a multimodal router that scores each model on six capability dimensions, combines this with a per-query difficulty estimate, and dispatches via a cost-penalized geometric rule. A continuous preference knob lets operators slide between max-quality and max-saving profiles at deploy time. On a benchmark of 5,504 queries, Brick at max-quality reaches 76.98% accuracy, beating the best single model (75.02%) and all tested routers. At a neutral cost-quality profile, Brick achieves 74.11% accuracy at 4.71x lower cost than always using the strongest model. At min-cost, it cuts cost 22.15x with 11.85 points accuracy loss. Median latency drops from 51.2s to 22.8s.