Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
medRxiv (Medicine) 2026-06-12

Deconvolution-based cell-type specific DNA methylation-wide and transcriptome-wide association studies identify risk CpG sites and genes associated with colorectal cancer risk

Bulk tissue-based DNA methylation-wide (MWAS) and transcriptome-wide association studies (TWAS) have identified CpG sites and genes associated with colorectal cancer (CRC) risk, but do not account for cellular heterogeneity. To address this, we developed a deconvolution-informed framework to infer cell-type specific DNA methylation and gene expression profiles from bulk normal colon tissues using reference single-cell epigenomic and transcriptomic datasets. We performed cell-type specific MWAS (ctMWAS) using deconvoluted DNA methylation data from 293 normal colon samples and conducted cell-type specific TWAS (ctTWAS) using deconvoluted gene expression data from 707 normal colon samples. Genetically predicted methylation and expression models were integrated with CRC GWAS summary statistics (78,473 cases and 107,143 controls) to identify risk-associated CpG sites and genes. Through ctMWAS, ctTWAS, and colocalization analyses, we identified 178 significant cell-type-specific CpG sites in 106 loci and 68 risk genes in 40 loci, including 26 previously unreported loci. Through additional integrative methylation-gene analysis, we prioritized 132 candidate risk genes, the majority of which were supported by multi-omics evidence and stage-specific dysregulation across the adenoma-carcinoma and serrated-carcinoma progression pathways. Pathway enrichment analyses implicated pathways involved in DNA double-strand break repair, TP53 regulation, TGF-{beta} signaling, and innate immune responses. Among prioritized genes, 14 were identified as putative druggable targets linked to 90 FDA-approved or clinical-stage drugs. Experimental validation supports an oncogenic role for SF3A3. These findings demonstrate that deconvolution-informed integrative analyses enable cell-type-resolved identification of epigenetic and transcriptional mechanisms underlying CRC susceptibility and provide insights into disease biology, prevention, and therapeutic target discovery.

02.
arXiv (CS.AI) 2026-06-19

SafeSpec: Fast and Safe LLM via Dynamic Reflective Sampling

arXiv:2606.19755v1 Announce Type: cross Abstract: Speculative inference accelerates large language model (LLM) decoding but provides no inherent safety guarantees. Existing safety defenses are largely incompatible with speculative inference: they either introduce additional computation or disrupt the draft-verify mechanism, negating acceleration benefits. This reveals a fundamental incompatibility between current safety methods and speculative decoding. We propose SafeSpec, a safety-aware speculative inference framework that integrates risk estimation directly into the verification process. SafeSpec attaches a lightweight latent safety head to the target model to jointly evaluate semantic validity and safety in a single forward pass. When unsafe generations are detected, SafeSpec applies rollback and safety-guided reflective multi-sampling to recover safe continuations rather than terminating generation. We model jailbreak attacks as distributional shifts over generative trajectories, where adversarial prompts increase the probability of harmful continuations without eliminating safe ones. Under this model, SafeSpec performs risk-aware trajectory recovery within the speculative decoding process. Across multiple models and adversarial benchmarks, SafeSpec achieves a substantially improved safety-efficiency trade-off. On Qwen3-32B, SafeSpec reduces attack success rates by 15% while preserving a 2.06x inference speedup on benign workloads, demonstrating that speculative acceleration and inference-time safety can be jointly optimized.

03.
arXiv (CS.CV) 2026-06-16

MMLongEmbed: Benchmarking Multimodal Embedding Models in Long-Context Scenarios

Recent advancements have significantly expanded the theoretical context windows of Multimodal Embedding Models (MEMs). However, larger context windows do not necessarily translate into effective comprehension and representation of long-context multimodal inputs, which remains a critical bottleneck for real-world deployment. To address the lack of systematic evaluation in this setting, we introduce MMLongEmbed, the first comprehensive benchmark for evaluating MEMs in long-context scenarios. MMLongEmbed comprises four retrieval tasks spanning multiple context-length ranges, covering text, document, and video modalities. Through extensive evaluation of state-of-the-art models, we find that current architectures rely heavily on superficial feature matching and struggle to capture deep semantic and structural dependencies. We further observe that performance degradation varies systematically with context length and key information placement. Moreover, models exhibit substantially different robustness to redundant contextual information across modalities. For reproducibility, the benchmark and code are publicly available.

04.
arXiv (math.PR) 2026-06-19

Model-independent upper bounds for the prices of Bermudan options with convex payoffs

arXiv:2503.13328v3 Announce Type: replace-cross Abstract: Suppose $\mu$ and $\nu$ are probability measures on $\mathbb{R}$ satisfying $\mu \leq_{cx} \nu$. Let $a$ and $b$ be convex functions on $\mathbb{R}$ with $a \geq b \geq 0$. We are interested in finding $$\sup_{\mathbf{M}} \sup_{\tau} \mathbb{E}^{\mathbf{M}} \left[ a(X) I_{ \{ \tau = 1 \} } + b(Y) I_{ \{ \tau = 2 \} } \right] $$ where the first supremum is taken over consistent models $\mathbf{M}$ (i.e., filtered probability spaces $(\Omega, \mathbf{F}, \mathbb{F}, \mathbb{P})$ such that $Z=(z,Z_1,Z_2)=(\int_{\mathbb{R}} x \mu(dx) = \int_{\mathbb{R}} y \nu(dy), X, Y)$ is a $(\mathbb{F},\mathbb{P})$ martingale, where $X$ has law $\mu$ and $Y$ has law $\nu$ under $\mathbb{P}$) and $\tau$ in the second supremum is a $(\mathbb{F},\mathbb{P})$-stopping time taking values in $\{1,2\}$. Our contributions are first to characterise and simplify the dual problem, and second to completely solve the problem under some structural assumptions on the measures $\mu$ and $\nu$ (namely that $\mu$ and $\nu$ are absolutely continuous probability measures that satisfy the Dispersion Assumption). A key finding is that the canonical set-up in which the filtration is that generated by $Z$ is not rich enough to define an optimal model and additional randomisation is required. This holds even though the marginal laws $\mu$ and $\nu$ are atom-free. The problem has an interpretation of finding the robust, or model-free, no-arbitrage bound on the price of a Bermudan option with two possible exercise dates, given the prices of co-maturing European options.

05.
arXiv (CS.LG) 2026-06-16

Repeated Bilateral Trade: The Quest for Fairness

arXiv:2606.15369v1 Announce Type: new Abstract: We study repeated bilateral trade from a fairness perspective. At each round, a fresh seller-buyer pair arrives, and the platform posts a price before observing the traders' valuations. Trade occurs only if both agents accept the price. Rather than maximizing only the gain from trade, we consider platforms that seek balanced divisions of the generated surplus. We show that natural fairness desiderata lead to a one-parameter Rawls-to-Nash family of fair-gain objectives, obtained by aggregating the seller's and buyer's net gains through nonpositive Hölder means. Unlike the standard gain-from-trade objective and the Rawlsian fair-gain objective studied in prior work, our proposed objectives induce a new statistical structure in which expected rewards are recovered from threshold feedback through a two-dimensional singular-kernel integral identity. This leads to a nonstandard pure-exploration problem whose natural estimators are rectangular double sums with row-column dependence and singular weights. Assuming independent i.i.d. seller and buyer valuation sequences with arbitrary unknown marginals, we characterize the optimal learning rates for the whole Rawls-to-Nash family of fair-gain objectives, giving matching fixed-confidence sample-complexity and regret bounds up to polylogarithmic factors.

06.
arXiv (CS.CV) 2026-06-16

City landscape in sight: A crowdsourced framework for unlocking urban-scale window view perceptions from real estate imagery

City landscapes viewed through home windows influence quality of life, yet perceptions of actual window views at the urban scale remain understudied. This study presents an approach for large-scale mapping of perceptions using 12,334 window view images (WVIs) collected from actual residential properties listed on real estate platforms in Wuhan, China, representing a rarely explored form of urban view imagery that offers advantages over the rendered or simulated window views commonly examined in previous studies. Through a non-immersive virtual reality platform, we collected 27,477 pairwise comparisons across six perceptual dimensions (e.g.\ Vivid) from 304 participants based on 499 WVIs. A hybrid neural network model was trained to predict human perceptions of all crowdsourced WVIs and map their spatial distribution. Results reveal significant spatial autocorrelation with distinct hot and cold spots across the whole city. Floor level strongly influences human perceptions: while higher floors offer more preferred and extensive window views, lower-floor windows provide residents with quiet and vivid views. An inference model further shows that window view composition matters considerably: high ratios of sky, trees, and low-rise buildings enhance people's preferences and perceptions of vividness, whereas high ratios of high-rise buildings increase perceptions of monotony and oppression. Importantly, these effects are non-linear: the excessive presence of certain elements can alter their impact on human perception. This work advances urban-scale understanding of residents' visual experiences and provides evidence-based guidance for human-centric urban planning and real estate to optimise visual landscapes from windows.

07.
bioRxiv (Bioinfo) 2026-06-12

Generalisable tissue-wide molecular reconstruction from histology

Spatial transcriptomics technologies measure gene expression within intact tissues but remain difficult to scale across large tissue sections and patient cohorts. Consequently, many studies rely on tissue microarrays (TMAs) or sparse spatial profiling designs, where molecular measurements are available for only limited tissue regions and are often generated using heterogeneous gene panels. Existing H&E to spatial gene expression prediction methods remain challenged by sparse molecular measurements, partially overlapping gene panels and tissue-wide reconstruction across heterogeneous spatial datasets. Here, we present GHIST+, a framework for tissue-wide reconstruction of single-cell molecular states from H&E histology. GHIST+ integrates cellular morphology, local tissue context and shared tissue representations to extend sparse molecular measurements into tissue-wide molecular maps across heterogeneous spatial datasets. Across multiple cancer types and GTEx breast tissues, GHIST+ reconstructs biologically meaningful tissue-wide molecular organisation from sparse TMA-derived measurements while preserving spatial tissue structure, cell-type organisation and age-associated tissue states across cancer and non-cancer settings. GHIST+ establishes a scalable framework for transforming sparse spatial profiling experiments into tissue-wide molecular maps, enabling cohort-scale molecular reconstruction from routine histology under heterogeneous spatial transcriptomic settings.

08.
arXiv (quant-ph) 2026-06-16

Programmable Gauge-Field Textures with Ultracold Atoms in Momentum Space

arXiv:2606.15124v1 Announce Type: cross Abstract: Synthetic gauge fields with ultracold atoms offer a route to quantum matter in which electromagnetic environments can be designed rather than merely imposed. While the Harper-Hofstadter model has been realized in several cold-atom systems, existing implementations are largely limited to spatially uniform magnetic fluxes. Here we experimentally realize a highly programmable two-dimensional momentum-state lattice of ultracold atoms with local control over the Peierls phase pattern, enabling direct implementation of Harper-Hofstadter Hamiltonians with tunable and spatially structured synthetic gauge fields. We observe a crossover from ballistic to strongly flux-modified bulk dynamics with suppressed transport. By introducing a synthetic electric field through site-dependent energy gradients, we further demonstrate Hall-type transverse drift arising from the interplay between electric and magnetic fields. In addition, we engineer a synthetic flux domain wall separating regions with opposite magnetic fluxes and observe anisotropic propagation guided along the interface. These results move cold-atom gauge-field engineering from uniform magnetic backgrounds toward designer gauge textures, providing an experimental setting for transport across programmable topological interfaces.

09.
arXiv (CS.AI) 2026-06-18

SciRisk-Bench: A Risk-Dimension-Aware Benchmark for AI4Science Safety

arXiv:2606.18936v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly embedded in AI for Science (AI4Science) workflows, from scientific question answering and literature analysis to laboratory planning and autonomous discovery. This progress creates an urgent need for safety benchmarks that evaluate not only scientific competence, but also whether models recognize and avoid risks in high-stakes scientific contexts. Existing AI4Science safety datasets cover several disciplines and task formats, leaving the underlying risk dimensions underspecified. We introduce SciRisk-Bench, a benchmark designed to evaluate AI4Science safety from two complementary perspectives: explicit risk dimensions and scientific disciplines. SciRisk-Bench covers 7 disciplines, 31 subdisciplines and 10 risk dimensions. In the experimental section, we evaluate both mainstream LLMs and science-oriented LLMs across risk dimensions, disciplines, and sub-disciplines, enabling fine-grained diagnosis of where scientific models remain unsafe.

10.
bioRxiv (Bioinfo) 2026-06-18

segSHAPE: RNA secondary structure prediction from nanopore direct RNA sequencing

RNAs adopt complex structures that regulate key biological processes, making accurate structure prediction essential. Chemical probing coupled with Nanopore direct RNA sequencing (DRS) offers a route to single-molecule structural inference, but current tools are limited by inaccurate signal-to-sequence alignment, which degrades modification-rate estimation and downstream structure prediction. Here we introduce segSHAPE for RNA secondary structure prediction from Nanopore DRS data (both RNA002 and RNA004 chemistries), a probe-agnostic framework that improves signal alignment using prior information of basecalling and per-read signal baseline shift correction, learns position-specific k-mer raw signal parameters, and estimates per-nucleotide modification rates with an unsupervised anomaly detector. On three public RNA002 DRS datasets spanning different chemical probes (AcIm, NAI-N3) and RNAs from 421 to 1552 nt, segSHAPE achieves the highest F1 score and Matthews correlation coefficient (MCC) on all RNAs, exceeding the strongest baseline by 3.4 to 5.8 percentage points in MCC. It additionally captures the ligand-induced conformational change of the thiamine pyrophosphate (TPP) riboswitch RNA directly from RNA002 DRS data using the DEPC probe. On a public RNA004 DRS dataset, segSHAPE improves over the sm-PORE-cupine baseline by 17 ROC-AUC points in modification rate estimation and by 6.7 MCC points in structure prediction. These results establish segSHAPE as a unified, probe-agnostic pipeline for RNA structure prediction from Nanopore DRS data.

11.
arXiv (CS.CV) 2026-06-17

HLS-GPT: A Generative Pretrained Transformer (GPT) for Continental-Scale NASA Harmonized Landsat and Sentinel-2 (HLS) Reflectance Reconstruction Across All Bands on Arbitrary Dates

Recent deep learning methods for Landsat and Sentinel-2 reflectance time series reconstruction remain limited by restricted spectral coverage, limited geographic scalability, or patch-based designs with short temporal contexts. We present HLS-GPT, a large-scale generative pretrained Transformer model for reconstructing NASA Harmonized Landsat Sentinel-2 30 m surface reflectance for all bands, any date, and any pixel location. HLS-GPT uses a hierarchical Transformer architecture to handle the different spectral band configurations of Landsat and Sentinel-2 and operates on single-pixel 12-month time series. To capture geographic and seasonal variability, the model was trained with nine years of HLS time series from more than 0.25 million training pixels across the conterminous United States. A random cropping and masking strategy extracts 12-month periods with varying start dates across epochs, masks 50% of valid observations, and trains the model to reconstruct the masked reflectance values from the remaining observations. Evaluation using more than 62,000 independent test pixels shows robust reconstruction under diverse land surface conditions, including complex crop phenology and sparse, irregular observations. Leave-one-observation-out evaluation achieved reconstruction RMSE below 0.026 for all HLS spectral bands, with relative RMSE below 35% for visible bands and below 13% for other bands. Red-edge band errors were comparable to red and near-infrared errors despite the absence of red-edge bands on Landsat. Sensitivity analyses that randomly masked 10% to 90% of test observations showed only modest degradation when 10% to 50% of observations were masked, with all-band RMSE below 0.028. Image reconstruction over nine independent 109 by 109 km CONUS HLS tiles further demonstrates that HLS-GPT outperforms two conventional methods and the NASA-IBM Prithvi model.

12.
arXiv (CS.CV) 2026-06-11

Frozen Multimodal Embeddings for Personality and Cognitive Ability Assessment in Asynchronous Video Interviews

Predicting psychological traits from asynchronous video interviews (AVIs) is a challenging multimodal learning problem because labeled datasets are limited while each response contains high-dimensional visual, acoustic, and verbal signals. This paper presents our solution for the ACM Multimedia AVI Challenge 2026, which evaluates two tasks: Track~1 predicts self-reported HEXACO personality traits from personality-related interview responses, and Track~2 classifies cognitive ability levels from structured AVI responses. We treat the problem as a small-sample representation learning task. Instead of fine-tuning large pretrained models, we use frozen multimodal encoders, including CLIP for visual features, Whisper for acoustic features and transcripts, and RoBERTa, E5, and DeBERTaV3 for textual representations, followed by low-capacity downstream models. For Track~1, our trait-specific regression and late-fusion system achieves an average validation MSE of 0.2696, improving over the official baseline of 0.3334. Ablation results show a three-step improvement from a global model (0.3189), to per-trait modeling (0.2871), to per-trait late fusion (0.2696), corresponding to a 19.1\% relative MSE reduction over the official baseline. For Track~2, a compact subject-attribute baseline reaches 0.5781 accuracy, while our multimodal ensemble reaches 0.5313, both above the official baseline of 0.4062. We interpret this result as evidence of possible subject-attribute shortcuts in the validation split rather than robust cognitive inference from AVI content. Overall, our findings suggest that AVI-based psychological assessment benefits from trait-specific multimodal modeling, but cognitive ability prediction requires careful control of dataset shortcuts.

13.
arXiv (CS.AI) 2026-06-12

Valid Inference with Synthetic Data via Task Exchangeability

arXiv:2606.13629v1 Announce Type: cross Abstract: There is a proliferation of work arguing for the use of synthetic data in scientific research. For example, social scientists are arguing for the use of LLM-generated "silicon samples" in pilot studies; AI evaluations increasingly rely on "LLM-as-a-judge" outputs; and proteomics research is accelerated by generative models that produce synthetic protein structures. These developments raise an intriguing possibility: synthetic data may help researchers ask more questions, run more studies, and accelerate discovery. But they also raise a fundamental concern: synthetic data can be biased, noisy, and misspecified. In this work, we propose statistical principles for using synthetic data in scientific research with provable validity guarantees. The key insight is a new technical condition that we call task exchangeability. Informally, this is a requirement that the researcher can identify historical tasks, for which real data is available, such that their current task of interest is exchangeable with the historical tasks in an appropriate mathematical sense. We develop methods for valid inference under task exchangeability, together with extensions that provide guarantees even beyond exchangeability. We demonstrate the framework on public opinion surveys with silicon samples and AI evaluation with autoraters.

14.
arXiv (math.PR) 2026-06-11

Matrix Discrepancy for Representations of Finite Groups

arXiv:2606.12181v1 Announce Type: new Abstract: Given a finite group $G$, we prove that there exist signs $\varepsilon\in\{\pm1\}^G$ such that $$\left\| \sum_{g\in G} \varepsilon_g\rho(g) \right\|\leq C\, \sqrt{|G|},$$ where $\rho$ is the left regular representation of $G$, and $C$ is a universal constant. This special case of the Matrix Spencer conjecture was posed in [BKMZ24], where it was established for simple groups.

15.
arXiv (CS.LG) 2026-06-19

A Unified Perspective on the Dynamics of Deep Transformers

arXiv:2501.18322v2 Announce Type: replace Abstract: Transformers, which are state-of-the-art in most machine learning tasks, represent the data as sequences of vectors called tokens. This representation is then exploited by the attention function, which learns dependencies between tokens and is key to the success of Transformers. However, the iterative application of attention across layers induces complex dynamics that remain to be fully understood. To analyze these dynamics, we identify each input sequence with a probability measure and model its evolution as a Vlasov equation called Transformer PDE, whose velocity field is non-linear in the probability measure. Our first set of contributions focuses on compactly supported initial data. We show the Transformer PDE is well-posed and is the mean-field limit of an interacting particle system, thus generalizing and extending previous analysis to several variants of self-attention: multi-head attention, L2 attention, Sinkhorn attention, Sigmoid attention, and masked attention–leveraging a conditional Wasserstein framework. In a second set of contributions, we are the first to study non-compactly supported initial conditions, by focusing on Gaussian initial data. Again for different types of attention, we show that the Transformer PDE preserves the space of Gaussian measures, which allows us to analyze the Gaussian case theoretically and numerically to identify typical behaviors. This Gaussian analysis captures the evolution of data anisotropy through a deep Transformer. In particular, we highlight a clustering phenomenon that parallels previous results in the non-normalized discrete case.

16.
arXiv (CS.LG) 2026-06-19

Judging to Improve: A De-biased VLM-as-3D-Judge Protocol for Single-Image 3D Generation

arXiv:2606.20364v1 Announce Type: new Abstract: A companion study established a de-biased, cross-model VLM-as-3D-judge that reliably ranks single-image-to-3D mesh quality where cheap geometry and CLIP proxies fall short. This paper asks: can that judge's preferences specialize a strong open generator, TRELLIS, on one asset class (furniture), cheaply and without human labels? Taking the judge from ranking to optimization is where the work lives. Pushing a VLM judge into the training and evaluation loop exposes failure modes ranking never triggered, so our contribution is an optimization-grade hardening of the judge: a training judge (Qwen2.5-VL-7B) held distinct from an evaluation judge (InternVL3-8B) to break circularity; position-bias correction; and fixes for three failure modes (image overload, geometry-hiding splat renders, and reference-free judging that rewards clean-but-wrong outputs), with calibration evidence (clear-gap win-rate 0.83-1.0; base-vs-base ~0.5). Using this protocol as an independent evaluator, and working only from public models and data with lightweight parameter-efficient adaptation, we find our methods match the strong base rather than exceed it. Independent base samples carry essentially no learnable preference (0.94 order-flip rate), so signal must be engineered by quality-contrastive construction. Across six adaptation methods, two input regimes, and a severity sweep, the most targeted - conditioner repair under severe degradation - reaches parity (0.50) with the base, while no method clears the >=65% win-rate target. The result is mechanistic: clean inputs saturate the judge, flow-DIT fine-tuning washes out through the sampler, and conditioning repair is the locus that moves geometry. Win-rates are directional at n=8 objects. Matching a strong public-data base with cheap adaptation is itself informative: exceeding it needs more than lightweight PEFT on public data, and the judge protocol is reusable.

17.
arXiv (CS.AI) 2026-06-15

Distributional Biases in Post-Training: A Markovian Analysis of Reasoning Trajectories

arXiv:2511.07368v3 Announce Type: replace-cross Abstract: Foundation models exhibit broad knowledge but limited task-specific reasoning, motivating post-training strategies such as RL with verifiable rewards (RLVR) and test-time scaling (TTS). While recent work highlights the role of exploration in improving pass@K, empirical evidence points to a paradox: RLVR and ORM/PRM typically reinforce existing paths rather than expanding the reasoning scope, raising the question of why exploration helps if no new patterns emerge. To reconcile this paradox, we adopt the perspective of Kim et al. (2025), viewing easy (e.g., simplifying a fraction) versus hard (e.g., discovering the some symmetry) reasoning steps as low versus high probability Markov transitions. In this tractable model, pretraining corresponds to tree-graph discovering, while post-training corresponds to CoT reweighting. We provably show that, both RLVR and ORM/PRM would favor heavily to several high-probability paths, and thereby forget rare-but-crucial CoTs. Building on this, we further prove that exploration strategies such as rejecting easy instances and KL regularization help preserve rare CoTs. Empirical simulations corroborate our theoretical results.

18.
arXiv (CS.LG) 2026-06-18

Everywhere Valid Bounds on False Discovery Proportions in Conformal Inference

arXiv:2605.20726v2 Announce Type: replace-cross Abstract: Modern applications of conformal inference to multiple testing problems, such as outlier detection and candidate selection, often involve selecting test samples whose conformal p-values fall below a threshold. The quality of such methods is often measured by the false discovery proportion (FDP), defined as the fraction of incorrect selections. Existing approaches typically control the expected value of the FDP, using methods such as the Benjamini-Hochberg procedure. This approach fails to provide high-probability bounds on the realized false discovery proportion and invalidates statistical guarantees if the rejection threshold is selected after inspecting the data. This paper establishes finite-sample, distribution-free upper bounds on the FDP that hold simultaneously over all possible rejection thresholds, enabling arbitrary post hoc selection of the threshold. Simultaneous validity is achieved by constructing a high-probability envelope for the empirical distribution function of null conformal p-values by sampling from their joint distribution. Furthermore, our framework allows practitioners to modulate the envelope's shape, thereby producing tight bounds in rejection regions of primary interest. We use this flexible approach to derive simultaneous FDP upper bounds for both outlier detection and conformal selection. We demonstrate through synthetic and real-data experiments that the resulting bounds are both valid and substantially less conservative than those derived from existing approaches.

19.
arXiv (CS.CL) 2026-06-15

AdaSR: Adaptive Streaming Reasoning with Hierarchical Relative Policy Optimization

Large reasoning models typically follow a read-then-think paradigm: they observe the complete input, reason over a static context, and then produce the answer. Yet many real-world scenarios are inherently dynamic, such as audio and video stream, where information arrives as a continuous stream and models must reason, update, and respond under partial observations. Recent streaming reasoning methods allow models to think while reading, but they largely rely on supervised imitation of pre-constructed trajectories, which limits their flexibility. In this paper, we propose AdaSR, an adaptive streaming reasoning framework that enables models to reason during input streaming and perform final deliberation once the stream is complete, learning when to think, and how much computation to allocate across different stages. To optimize this hierarchical reasoning process, we introduce Hierarchical Relative Policy Optimization (HRPO), which decomposes policy optimization into streaming reasoning and deep reasoning phases, providing more fine-grained advantage assignment instead of uniformly distributing a single sequence-level advantage over all tokens. HRPO integrates format, accuracy, and adaptive thinking rewards to enforce valid reasoning protocols, preserve final task performance, and encourage latency-aware computation allocation. Experiments show that AdaSR achieves a better balance among reasoning accuracy, computational efficiency, and streaming latency compared with supervised fine-tuning baseline. We release our code at https://github.com/EIT-NLP/StreamingLLM/tree/main/AdaSR.

20.
arXiv (CS.AI) 2026-06-11

MoCA-Agent: A Market-of-Claims Code Agent for Financial and Numerical Reasoning

arXiv:2606.11537v1 Announce Type: new Abstract: Financial and tabular question answering requires more than fluent reasoning: answers must be grounded in the exact facts, formulas, units, signs, and scales that support them. A single misread cell or incorrect operation can silently produce a plausible but wrong result. We introduce \textsc{MOCA-Agent}, a market-of-claims code agent that replaces free-form multi-agent debate with claim-level verification. The system decomposes each question into typed atomic claims, asks specialist trader agents to buy or sell those claims, clears their orders into confidence-weighted accept/reject decisions, and synthesizes an executable Python program from market-supported evidence. A code-aware verifier then checks the program for execution, structural consistency, and common financial reasoning errors, with at most one market-aware repair round. Across ten public benchmarks spanning financial numerical reasoning, general tabular reasoning, ESG question answering, and multimodal chart reasoning, \textsc{MOCA-Agent} achieves strong performance using a fixed Qwen3.6-27B backbone, including $78.3\%$ on FinQA, $76.0\%$ on FinanceMath, $71.2\%$ on MultiHiertt, $86.9\%$ on ESGenius, and $85.6\%$ average on FinChart-Bench. These results show that aggregating evidence at the level of atomic claims, rather than whole answers, improves robustness in high-stakes numerical reasoning.\footnote{The code and data are available: https://github.com/UBC-NLP/MoCA-Agent.

21.
arXiv (CS.AI) 2026-06-11

Power Term Polynomial Algebra for Boolean Logic

arXiv:2603.13854v2 Announce Type: replace-cross Abstract: We introduce power term polynomial algebra, a representation language for Boolean formulae designed to bridge conjunctive normal form (CNF) and algebraic normal form (ANF). The language is motivated by the tiling mismatch between these representations: direct CNFANF conversion may cause exponential blowup unless formulas are decomposed into smaller fragments, typically through auxiliary variables and side constraints. In contrast, our framework addresses this mismatch within the representation itself, compactly encoding structured families of monomials while representing CNF clauses directly, thereby avoiding auxiliary variables and constraints at the abstraction level. We formalize the language through power terms and power term polynomials, define their semantics, and show that they admit algebraic operations corresponding to Boolean polynomial addition and multiplication. We prove several key properties of the language: disjunctive clauses admit compact canonical representations; power terms support local shortening and expansion rewrite rules; and products of atomic terms can be systematically rewritten within the language. Together, these results yield a symbolic calculus that enables direct manipulation of formulas without expanding them into ordinary ANF. The resulting framework provides a new intermediate representation and rewriting calculus that bridges clause-based and algebraic reasoning and suggests new directions for structure-aware CNFANF conversion and hybrid reasoning methods.

22.
bioRxiv (Bioinfo) 2026-06-16

MetaPilot: genome-aware adaptive search-space refinement for unified DDA and DIA metaproteomics

Metaproteomic peptide identification is constrained by the structure and size of the protein search space. Pooled gene catalogues provide coverage but obscure genome-level evidence, and current workflows for data-dependent (DDA) and data-independent (DIA) acquisition diverge in their database strategies. We present MetaPilot, a genome-aware workflow that uses conserved marker-protein evidence to rank candidate genomes from MGnify catalogues and construct adaptive, sample-specific search spaces. Applied to paired DDA/DIA datasets of defined mixtures and fecal samples, MetaPilot adapted genome selection to community complexity and reproduced published peptide evidence while expanding the detectable peptide space. In DDA-independent reanalysis of Orbitrap human gut DIA data, MetaPilot identified 24.4% more peptides than the published DDA-derived library and 2.06-fold more than the matched DDA-assisted DIA search. On timsTOF DIA-PASEF mouse intestinal data, it outperformed uMetaP by 41.8~119.7%, enabling genome-resolved functional interpretation without DDA-PASEF input.

24.
arXiv (CS.AI) 2026-06-11

Sovereign Assurance Boundary: Certificate-Bound Admission for Agentic Infrastructure

arXiv:2606.11632v1 Announce Type: cross Abstract: Agentic infrastructure introduces a critical control-plane authorization problem: non-deterministic reasoning systems can propose high-stakes mutations to production resources, yet existing security mechanisms – such as identity and access management (IAM), policy engines, consensus protocols, and audit logs – either enforce static, context-unaware permissions or merely record actions post-execution. This paper introduces the Sovereign Assurance Boundary (SAB), a certificate-bound runtime admission layer for autonomous execution authority. SAB intercepts agent proposals at an assurance airlock, compiles them into typed execution contracts $C$, and binds these contracts to cryptographic evidence digests $H(E)$ and policy versions. The contracts are then routed through consequence-aware certification paths. Upon successful admission, the system emits a signed Sovereign Assurance Certificate ($\Omega$) that is strictly scoped to a specific execution identity, revocation epoch, and validity window. Finally, a sovereign execution broker verifies $\Omega$ and performs fresh pre-execution revocation and drift checks before invoking infrastructure APIs. We detail the airlock-broker architecture, formalize its admission and revocation invariants, and report preliminary feasibility measurements from a Go prototype evaluated over 2,500 admission attempts. Ultimately, this broker-enforced model prevents autonomous reasoning from directly mutating state, transforming delegated execution authority into a cryptographically verifiable, evidence-bound, revocable, and replayable runtime artifact.

25.
arXiv (quant-ph) 2026-06-17

Creating squeezed and non-classical collective motional many-body states through stroboscopic Rydberg dressing

arXiv:2606.17849v1 Announce Type: cross Abstract: Realizing conditional quantum operations, e.g., quantum gates, for quantum computing and simulation requires controlled interactions between particles. Often, these interactions depend on the interparticle distance, and accordingly, an uncertainty of the relative particle position may translate into gate infidelities. We consider here a quantum computing platform based on an array of neutral atoms and present a method that allows to reduce the uncertainty of all interatomic distances. Our approach exploits the coupling between atomic motion and stroboscopically excited atomic Rydberg states. It allows to collectively squeeze the modes corresponding to interatomic displacements, thereby reducing distance fluctuations down to a fraction of the motional vacuum state. Furthermore, the method permits the creation of non-classical states with substantial Wigner negativity. These correlated states may allow reducing motional decoherence, increasing gate fidelity, and potentially yield a resource for quantum-enhanced metrology.