Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (quant-ph) 2026-06-11

Permutation-Invariant N-body gates via Tavis-Cummings Hamiltonian

arXiv:2506.03453v3 Announce Type: replace Abstract: Global control provides a promising route to implementing multi-qubit gates without individual qubit addressing. This is especially appealing for permutation-invariant (PI) gates, whose symmetry is often broken when they are compiled into individually addressed one- and two-qubit gates. Important examples include SWAP, $\sqrt{iSWAP}$, and the n-qubit controlled-Z gate, which is equivalent, up to two single-qubit Hadamard gates, to the multi-qubit Toffoli gate. Motivated by this global-control perspective, we show that all PI unitaries on an arbitrary number of qubits can be realized using the Tavis-Cummings (TC) interaction, the multi-qubit version of the Jaynes-Cummings interaction, together with global uniform z and x fields. Here, the $n$ qubits are identically coupled to a single bosonic mode (oscillator), which is initialized in and returned to its vacuum state. A corollary is that all PI states, including GHZ and Dicke states, can be prepared using the same global control. For the case n=2 qubits, which is particularly important in quantum computing, we also find explicit pulse sequences for implementing all PI qubit unitaries that conserve angular momentum in the z direction, using only the TC interaction and global z fields. This includes controlled-Z, SWAP, and $\sqrt{iSWAP}$.

02.
arXiv (CS.AI) 2026-06-18

Structured Cognitive Loop for Behavioral Intelligence in Large Language Model Agents (Extended Revision: From Behavioral Architecture to Epistemic Accountability)

Authors:

arXiv:2510.05107v5 Announce Type: replace Abstract: The central challenge for AI agents is not only performance but accountability. Agents that act through opaque prompt sequences may produce correct outputs, but they provide little basis for verifying why an action was permitted, where an error occurred, or how responsibility should be assigned. This paper presents the Structured Cognitive Loop as an architecture for accountable behavior in large language model agents. SCL separates cognition, memory, control, and action into distinct modules. The language model proposes. External memory preserves verified state. A lightweight controller checks preconditions, prevents redundant actions, and authorizes execution before tools are used. We evaluate SCL against ReAct and common LangChain agent variants across travel planning, conditional email drafting, and constraint guided image generation. Across 360 episodes, SCL achieves 86.3 percent task success compared with 70.5 to 76.8 percent for prompt based baselines. It also improves goal fidelity, reduces redundant tool calls, increases reuse of intermediate state, and lowers unsupported assertions. This extended revision situates SCL within a broader architecture of epistemic accountability. Subsequent extensions integrate context aware Human in the Loop control, Pool Gated Retrieval, and the Horizon Warrant Commitment framework. Together these components define an agent architecture in which the model proposes, structure decides, evidence is warranted before use, and human judgment is embedded in the trace rather than imposed after the fact. The result is a foundation for AI agents whose decisions are not only effective but also authorized, inspectable, and accountable.

04.
arXiv (CS.LG) 2026-06-11

Bypassing Prompt Guards in Production with Controlled-Release Prompting

arXiv:2510.01529v4 Announce Type: replace Abstract: Ball et al. recently established that prompt filtering for AI alignment faces a fundamental barrier: under standard cryptographic assumptions, no filter running significantly faster than the protected model can universally distinguish adversarial prompts from benign ones. We investigate whether this impossibility result translates to real-world vulnerabilities in deployed large language model (LLM) systems. We answer affirmatively by introducing controlled-release prompting, a practical instantiation of the theoretical framework that exploits the resource asymmetry between lightweight input filters and the main models they protect. Unlike the theoretical construction, our attack does not require model modification: it generates malicious prompts that are indecipherable by any bounded filter yet remain tractable to the target LLM. We find our attack to be successful on four major chat platforms (Google Gemini, DeepSeek Chat, xAI Grok, and Mistral Le Chat) where baseline methods fail. Additionally, we apply our attack to extract copyrighted data from Gemini. Finally, we provide a systematic evaluation of 14 open-weight prompt guard models, revealing that even reasoning-capable filters cannot reliably detect our attack without incurring prohibitive resource overhead.

05.
arXiv (quant-ph) 2026-06-16

Spectrally Corrected Polynomial Approximation for Quantum Singular Value Transformation

arXiv:2603.03998v2 Announce Type: replace Abstract: Quantum Singular Value Transformation (QSVT) provides a unified framework for applying polynomial functions to the singular values of a block-encoded matrix. QSVT prepares a state proportional to $\bA^{-1}\bb$ with circuit depth $O(d\cdot\mathrm{polylog}(N))$, where $d$ is the polynomial degree of the $1/x$ approximation and $N$ is the size of $\bA$. Current polynomial approximation methods are over the continuous interval $[a,1]$, giving $d = O(\sqrt{\kap}\log(1/\varepsilon))$, and make no use of any properties of $\bA$. We observe here that QSVT solution accuracy depends only on the polynomial accuracy at the eigenvalues of $\bA$. When all $N$ eigenvalues are known exactly, a pure spectral polynomial $p_{S}$ can interpolate $1/x$ at these eigenvalues and achieve unit fidelity at reduced degree. But its practical applicability is limited. To address this, we propose a spectral correction that exploits prior knowledge of $K$ eigenvalues of $\bA$. Given any base polynomial $p_0$, such as Remez, of degree $d_0$, a $K\times K$ linear system enforces exact interpolation of $1/x$ only at these $K$ eigenvalues without increasing $d_0$. The spectrally corrected polynomial $p_{SC}$ preserves the continuous error profile between eigenvalues and inherits the parity of $p_0$. QSVT experiments on the 1D Poisson equation demonstrate up to a $5\times$ reduction in circuit depth relative to the base polynomial, at unit fidelity and improved compliance error. The correction is agnostic to the choice of base polynomial and robust to eigenvalue perturbations up to $10\%$ relative error. Extension to the 2D Poisson equation suggests that correcting a small fraction of the spectrum may suffice to achieve fidelity above $0.999$.

06.
arXiv (CS.AI) 2026-06-16

MedCollab: IBIS-Guided Multi-Agent Collaboration with Hierarchical Disease Relation Chains for Clinical Diagnosis

arXiv:2603.01131v3 Announce Type: replace-cross Abstract: Clinical diagnosis is a gradual process of evidence integration, in which physicians move from symptoms and medical history to examinations, competing hypotheses, disease relations, and treatment decisions. Large language models have advanced medical text understanding and generation. Yet their clinical use remains limited by weak evidence grounding, opaque reasoning, and inconsistent links among differential diagnosis, final diagnosis, diagnostic basis, and treatment planning. We introduce MedCollab, a multi-agent framework for full-cycle clinical diagnosis and report generation. MedCollab coordinates specialist and examination agents according to patient records. It structures agent deliberation with an Issue-Based Information System (IBIS) protocol, so that each diagnostic position is supported by patient-specific evidence and medical knowledge. It also builds Hierarchical Disease Relation Chains (HDRC) to connect accepted hypotheses through progression, complication, and comorbidity relations. During multi-round deliberation, a verifier-guided consensus module evaluates evidence support, medical plausibility, and logical conflicts. It then adjusts agent contributions and filters unsupported reasoning. Experiments on ClinicalBench and MIMIC-IV show that MedCollab outperforms leading LLMs and medical multi-agent baselines in diagnostic accuracy, evidence consistency, and clinical reasoning quality. These results indicate that structured and auditable collaboration can produce more faithful and clinically coherent diagnostic reports.

07.
arXiv (CS.CL) 2026-06-11

Multi-task Learning is Not Enough: Representational Entanglement in Dual-output Second Language Speech Recognition

Second-language (L2) speech recognition often requires transcriptions of pronunciations and intended meanings. Multi-task learning (MTL) is a natural approach because it assumes that shared representations benefit both outputs. However, this paper shows that this assumption does not hold across Korean and English. MTL improves meaning but degrades surface transcription, especially in English, where the degradation scales with surface-meaning divergence measured by Levenshtein edit distance. Encoder analysis links these patterns to encoder-level entanglement, with Korean preserving distinct task representations while English produces nearly identical ones. Cross-task decoder analysis shows that the meaning dual-output decoder adapts with a unique representation, while the surface dual-output decoder remains constrained by the encoder. These findings motivate the design of MTL frameworks that mitigate encoder-level entanglement to reduce surface degradation in dual-output L2 automatic speech recognition.

08.
arXiv (CS.CL) 2026-06-15

ClaimFlow: Tracing the Evolution of Scientific Claims in NLP

Scientific papers advance $claims$ that later work supports, extends, or sometimes refutes. Yet existing methods for citation and claim analysis capture only fragments of this dialogue. In this work, we make these interactions explicit at the level of individual scientific claims. We introduce $\texttt{ClaimFlow}$, a claim-centric view of the NLP literature, built from $1{,}617$ ACL Anthology papers $(1979 - 2025)$ that are manually annotated with $5{,}689$ claims and $4{,}871$ cross-paper claim relations, indicating whether a citing paper $\texttt{supports}$, $\texttt{extends}$, $\texttt{qualifies}$, $\texttt{refutes}$, or references a cited claim as $\texttt{background}$. Building on $\texttt{ClaimFlow}$, we define a new task – $Claim Relation Classification$ – which requires models to infer the scientific stance toward a cited claim from the text and citation context. Evaluating neural models and large language models on this task, we report baseline performance of $0.81$ macro-F1, suggesting that the task is tractable while leaving room for improvement. We then scale this framework to $\sim$$13k$ NLP papers to study claim evolution across decades of NLP research. We show that $63.5\%$ claims are never reused; only $11.1\%$ are ever challenged. Widely propagated claims are more often $reshaped$ through qualification and extension than supported or refuted. Overall, $\texttt{ClaimFlow}$ offers a lens for examining how ideas shift and mature within NLP.

09.
arXiv (CS.CV) 2026-06-18

APT: Atomic Physical Transitions for Causal Video-Language Understanding

Physical events are not understood by their names alone, but by the causal state changes that compose them. A clip-level label such as "bounce" can be correct while hiding the process that makes the event physically valid, from support loss and contact onset to rebound and settling. To make this hidden process explicit, we introduce Atomic Physical Transitions (APTs): minimal, temporally localized state changes that bind a visible cue to an active physical mechanism and before/after dynamical regimes. An APT chain represents a video as an ordered causal transition sequence rather than a single aggregate event label: event labels tell what happened; APT chains explain why it happened. To make APTs learnable by VLMs, we construct mixed-source APT data from human annotations and simulator ground truth, covering 14 transition types across contact, gravity, friction, and rotation/stability, with 27,303 timed instances over 1,246 trials. Using this data, we find that current VLMs miss transition-level physics, with zero-shot recall at most 14% and errors dominated by missed transitions. Direct fine-tuning on APT chains improves transition detection but causes event-level forgetting, indicating that the model learns a specialized answer format rather than a reusable physical representation. We therefore propose APT-Tune, a parameter-efficient recipe that teaches VLMs to use causal transitions without forgetting how to answer video questions. It combines image-pad-aware supervision, format-conditional co-training, and mechanism-conditioned domain-to-type decoding to make APT learning format-robust and physically grounded. With only 11 M LoRA parameters on Qwen3-VL-2B, APT-Tune substantially improves APT recall while also improving event-level video transfer. These results show that APTs are not a new answer format, but a human-aligned causal supervision signal for physical video understanding.

10.
arXiv (CS.CL) 2026-06-16

TMASC: Transmasculine Attitude and Speech Corpus

Authors:

We introduce the Transmasculine Attitudes and Speech Corpus (TMASC), a multimodal corpus of 196 transmasculine individuals, including questionnaire responses and 66 audio recordings. The questionnaire includes items exploring the vocal health of transmasculine individuals. The audio recordings include cough and throat-clearing samples, a reading passage, and additional session-specific questions. This paper outlines the development of this corpus and the data collection procedures. To illustrate the utility of this corpus, we present three case studies demonstrating how this crowd-sourced multimodal corpus can be used to support transmasculine individuals. These include the integration of perceptual and acoustic data, the identification of group-level characteristics, and the calibration of acoustic measurements.

11.
bioRxiv (Bioinfo) 2026-06-11

TifBERT: a self-supervised foundation model for normalization-robust bulk RNA-seq representation learning

Bulk RNA sequencing remains central to translational genomics, yet foundation-model development has largely focused on single-cell data. Existing transformer approaches for bulk RNA-seq often rely on expression discretization, numerical reconstruction, external gene embeddings, or restricted gene sets, limiting robustness across normalization schemes and cohorts. Here, we introduce TifBERT, a self-supervised framework for full-transcriptome bulk RNA-seq representation learning. TifBERT converts each unordered expression profile into a sample-specific gene sequence using term frequency-inverse document frequency (TF-IDF) ordering, prioritizing genes that are both highly expressed within a sample and selectively expressed across the cohort. It is then pretrained using masked gene modeling, predicting gene identities from transcriptomic context rather than reconstructing expression values. Pretrained on harmonized TCGA Pan-Cancer data spanning five RNA-seq normalization schemes, TifBERT learns contextual representations across approximately 10,000 genes without expression binning, landmark-gene restriction, or external biological embeddings. Across 33 TCGA cancer types, TifBERT achieved 90.83% accuracy, 0.996 macro AUC-ROC, and 0.903 MCC. It also captured pathway-level biology, achieving mean sample-wise and pathway-wise Pearson correlations of 0.754 and 0.762 across 1,387 PARADIGM pathway activities. Independent evaluation on GTEx healthy tissues showed preservation of tissue-level transcriptomic structure without retraining. In comparison with existing models, TifBERT achieves competitive subtype discrimination with substantially greater stability and produces markedly richer embedding geometry (effective rank 95.6 versus 6.3), without requiring expression discretization or in-distribution pretraining exposure. Together, TifBERT provides a scalable, normalization-independent foundation model for reusable bulk transcriptomic representation learning

12.
medRxiv (Medicine) 2026-06-12

The Acceptability of Three Co-Created Peer Support Interventions for People Living with Leprosy Reactions in Indonesia: A Mixed-Methods Pilot Study

Background: Leprosy reactions (LR) are immune-mediated complications associated with disability, emotional distress, and social isolation. We identified a gap in affected-individual-informed interventions that aim to improve the management of LR in healthcare settings. To address this gap, we assessed the acceptability of three peer-support interventions co-created with people affected by LR in Indonesia. Methods: Using an interactive learning and action approach, we co-created peer counselling, telesupport groups, and participatory video interventions which were piloted in an urban hospital and 13 rural community clinics. A mixed-methods design was applied with interviews, focus group discussions, and pre-post assessments involving four participant groups. Data were analyzed thematically using an acceptability framework. Results: One hundred participants were enrolled, and 92 completed the pilot intervention between November 2022 and July 2023. Qualitative findings showed that all interventions were acceptable. Peer counselling provided emotional reassurance through shared experiences and was perceived as trustworthy and supportive. Perceived burdens differed by setting, with time constraints in urban facilities and geographical barriers in rural clinics. Knowledge improved significantly among participants of peer counselling and telesupport groups in rural settings. Telesupport groups facilitated connection, information exchange, and continuity of care. Digital access and literacy limited participation for some, particularly in rural areas. The participatory video was perceived as reassuring and informative. Improvements in knowledge, attitude, practices, and mental well-being domain scores were observed among urban participants, but responses in rural settings showed less change. Participants and co-implementers reported increased self-efficacy, participants confidence to perform required behaviors within peer support interventions, with effects shaped by intervention and setting. Conclusions: The three co-created peer-support interventions were acceptable for individuals with LR in diverse healthcare settings. These outcomes highlight the importance and effectiveness of selective, and context-sensitive implementation of one or more peer-support modalities.

13.
arXiv (CS.AI) 2026-06-12

Agents-K1: Towards Agent-native Knowledge Orchestration

arXiv:2606.13669v1 Announce Type: new Abstract: Current LLM-based research agents have advanced through agent orchestration, yet largely overlook scientific knowledge orchestration. Existing works often reduce papers to abstracts, surface mentions, and flat \texttt{cites} edges, omitting key entities, claims, evidence, mechanisms, and method lineages essential for scientific reasoning. To this end, we introduce Agents-K1, an end-to-end knowledge orchestration pipeline that converts raw documents into agent-native scientific knowledge graphs. Agents-K1 integrates three components under a unifying theoretical foundation: a multimodal parser whose five-module schema captures entities, multimodal evidence, citations, and typed inter-entity relations across the full paper rather than abstracts alone; a 4B information-extraction backbone trained with GRPO under a rule-based reward; and a graphanything CLI, a tri-source agent interface that unifies web search, multimodal graph retrieval, and cross-document traversal. On top of this, we process 2.46 million scientific papers across six subjects to produce Scholar-KG, of which we release a one-million-paper subset, and the full Scholar-KG is accessible via the SCP link below. The same pipeline can be extended to general-domain corpora and to schema-conformant data synthesis. Extensive experiments demonstrate that Agents-K1 achieves superior performance in scientific information extraction, knowledge graph construction, and multi-hop scientific reasoning.

14.
arXiv (math.PR) 2026-06-12

Averaging principles for nonautonomous multiscale McKean-Vlasov stochastic systems

arXiv:2606.12820v1 Announce Type: new Abstract: This paper investigates a class of nonautonomous multiscale McKean-Vlasov stochastic systems. By leveraging the nonautonomous Poisson equation, we rigorously establish both strong and weak averaging principles, accompanied by explicit convergence rates. Notably, the coefficients of the averaging equations derived in the general case retain dependence on the scaling parameter $\varepsilon$. However, under the additional assumptions that the fast-scale coefficients are either asymptotically convergent or time-periodic, we demonstrate that the slow component converges, in the strong or weak sense, to averaging equations with coefficients independent of $\varepsilon$.

15.
arXiv (CS.LG) 2026-06-16

RepNet: Tackling spectral bias in deep neural networks via parameter reparameterization

arXiv:2606.16575v1 Announce Type: new Abstract: Deep neural networks (DNNs) have achieved remarkable success in scientific computing, yet they often suffer from spectral bias in capturing oscillatory and multiscale behaviors. In this study, we investigate this limitation by examining the failure of shallow ReLU neural networks in fitting high-frequency functions. This observation identifies two important factors in resolving rapid oscillations: the initial slope scale and the distribution of partition points induced by the networks. Motivated by this analysis, we propose RepNet, a reparameterized DNN model for ReLU and tanh networks designed for high-frequency and multiscale problems. The key idea is to reparameterize the weights and biases in the first hidden layer, which enables effective control of the initial slope scale and provides an appropriate distribution of the initial partition points. Furthermore, treating the reparameterized weights and biases as trainable parameters allows the DNN to achieve adaptive frequency scaling during training. In addition, we derive quantitative estimates for the output and slope magnitudes of the reparameterized DNN to guide the initialization of the proposed method. Numerical experiments, including multiscale one- and four-dimensional function approximation, forward and inverse PDE problems in combination with physics-informed neural networks (PINNs), and operator learning, demonstrate that RepNet improves the predicted accuracy of vanilla DNNs in capturing highly oscillatory features with slightly additional computational cost. These results indicate that RepNet provides an effective and flexible approach for overcoming spectral bias and applying DNNs to multiscale problems.

16.
arXiv (CS.CV) 2026-06-15

ViT-Up: Faithful Feature Upsampling for Vision Transformers

Vision Transformers (ViTs) have become a dominant architecture for visual representation learning, providing exceptionally strong and broadly reusable backbone features. However, ViTs are commonly operated on relatively small patch-token grids due to the quadratic cost of global self-attention, which creates a persistent bottleneck for dense prediction tasks such as semantic segmentation and depth estimation. This has motivated the development of task-agnostic feature upsamplers. While recent state-of-the-art methods produce visually sharp dense representations, their reliance on shallow image encoders for guided upsampling can introduce feature leakage, fragmentation, and blur. We introduce ViT-Up, an implicit feature upsampling framework that replaces external image guidance with layer-wise query construction from intermediate ViT hidden states. This enables feature prediction at arbitrary continuous image coordinates while preserving alignment with the backbone feature space. Experiments demonstrate that ViT-Up consistently outperforms state-of-the-art image-guided upsamplers across dense prediction and semantic correspondence. On DINOv3-S+, ViT-Up improves over prior methods by up to +2.07 mIoU on Cityscapes and +4.17 PCK@0.10 on SPair-71k. With the larger DINOv3-B backbone, these gains increase to +3.36 mIoU and +8.09 PCK@0.10, demonstrating that ViT-Up scales favorably with backbone capacity.

17.
arXiv (CS.LG) 2026-06-11

Reinforcement Learning with Action-Triggered Observations

arXiv:2510.02149v2 Announce Type: replace Abstract: We introduce Action-Triggered Sporadically Traceable Markov Decision Processes (ATST-MDPs), a reinforcement learning framework for partial observability in which full state observations occur stochastically at each step, with probability determined by the chosen action. We derive Bellman equations tailored to this setting and establish the existence of an optimal policy. Exploiting the fact that sporadic observations reveal the full state, we provide an equivalent formulation in which agents commit to action-sequences between consecutive observations. Under the linear MDP assumption, we show that the value function over such action-sequences admits a linear representation in a finite-dimensional feature map, enabling standard regression-based methods. As an application, we derive ATST-LSVI-UCB, an optimistic algorithm achieving regret $\widetilde{O}(\sqrt{Kd^3(1-\gamma)^{-3}})$ for episodic learning with geometrically distributed horizons, where $K$ is the number of episodes, $d$ the feature dimension, and $\gamma$ the discount factor (episode continuation probability), matching the known rate for linear MDPs with full observability.

18.
medRxiv (Medicine) 2026-06-16

MRMU: A New Paradigm for Mendelian Randomization by Accounting for Measured Covariates and Unmeasured Confounders

Mendelian randomization (MR) is a powerful approach for causal inference, however, its reliability is frequently compromised by unadjusted covariates and unmeasured confounders, such as unmeasured pleiotropy and sample structure. To address these challenges, we introduce MRMU, a novel paradigm for the MR framework. Unlike traditional single-variable or multivariable MR methods, MRMU selects instrumental variables only from the exposure of interest and estimates one exposure effect at a time, while jointly accounting for measured covariates and unmeasured confounders. This design improves the reliability of MR analyses. In simulations and real data, MRMU achieved better type I error control, higher statistical power, and more accurate effect estimation than existing MR methods. Applying to coronary artery disease (CAD), MRMU identified robust cardiometabolic risk factors, including LDL-C, APOB, systolic blood pressure, body mass index, and smoking initiation, with consistent evidence across multiple CAD datasets. In contrast, traits such as HDL-C, height, and educational attainment, which were found to be significant by existing MR methods, were no longer supported by MRMU. MRMU further supported blood pressure-related traits, rather than lipid traits, as the more relevant pathway linking urate to CAD. Finally, by integrating large-scale plasma proteomics data, MRMU identified candidate CAD drug targets beyond established HMGCR- and PCSK9-related pathways, highlighting its utility for therapeutic target prioritization.

19.
arXiv (CS.CL) 2026-06-15

CORA: Analyzing and bridging thinking-answer gap in Multimodal RLVR via Consistency-Oriented Reasoning Alignment

Reinforcement learning with verifiable rewards (RLVR) has successfully elicited the reasoning capabilities of large language models, motivating its extension to multimodal scenarios. Existing methods primarily focus on improving the visual coverage of reasoning traces and mitigating visual hallucinations, but underestimate the semantic inconsistency between the reasoning process and the final answer. In this paper, we delve into thinking-answer inconsistency in RLVR for large vision-language models (LVLMs), showing thorough analyses of rollouts collected throughout Group Relative Policy Optimization (GRPO) training process and post-RLVR evaluation outputs that this issue persists during training and remains present during inference. Motivated by the analysis, we propose Consistency-Oriented Reasoning Alignment (CORA), which introduces thinking-answer semantic consistency into RLVR through a lightweight plug-and-play consistency reward model, and further incorporates Hybrid Reward Advantage Splitting (HRAS) to stably coordinate task and consistency optimization. Extensive experiments across representative multimodal reasoning benchmarks and mainstream LVLMs show that CORA improves task performance while effectively mitigating thinking-answer inconsistency, leading to more faithful reasoning traces.

20.
arXiv (CS.CL) 2026-06-11

CRANE: Constrained Reasoning Injection for Code Agents via Nullspace Editing

Code agents must both reason over long-horizon repository state and obey strict tool-use protocols. In paired Instruct/Thinking checkpoints, these capabilities are complementary but misaligned. The Instruct model is concise and tool-disciplined, whereas the Thinking model offers stronger planning and recovery behavior but often over-deliberates and degrades agent performance. We present CRANE (Constrained Reasoning Injection for Code Agents via Nullspace Editing), a training-free parameter-editing method that treats the Thinking-Instruct delta as a directional pool of candidate reasoning edits for the Instruct backbone. CRANE combines magnitude thresholding to denoise the delta, a Conservative Taylor Gate to retain edits that are jointly beneficial for reasoning transfer and tool-use preservation, and Graduated Sigmoidal Projection to suppress format-critical update directions. By merging paired Instruct and Thinking checkpoints, CRANE delivers strong gains over either individual model while preserving Instruct-level efficiency: on Roo-Eval it achieves pass1 of 66.2% (+19.5%) for Qwen3-30B-A3B and 81.5% (+8.7%) for Qwen3-Next-80B-A3B; on SWE-bench-Verified it resolves up to 14 additional instances at both scales (122/500 and 180/500); and on Terminal-Bench v2 it improves pass1/pass5 by up to 2.3%/7.8%, reaching 7.6%/17.9% and 14.8%/30.3%, respectively, consistently outperforming alternative merging strategies across all three benchmarks.

21.
arXiv (CS.LG) 2026-06-11

SEDULity: A Proof-of-Learning Framework for Distributed and Secure Blockchains with Efficient Useful Work

arXiv:2512.13666v2 Announce Type: replace-cross Abstract: The security and decentralization of Proof-of-Work (PoW) have been well-tested in existing blockchain systems. However, its tremendous energy waste has raised concerns about sustainability. Proof-of-Useful-Work (PoUW) aims to redirect the meaningless computation to meaningful tasks such as solving machine learning (ML) problems, giving rise to the branch of Proof-of-Learning (PoL). While previous studies have proposed various PoLs, they all, to some degree, suffer from security, decentralization, or efficiency issues. In this paper, we propose a PoL framework that trains ML models efficiently while maintaining blockchain security in a fully distributed manner. We name the framework SEDULity, which stands for a Secure, Efficient, Distributed, and Useful Learning-based blockchain system. Specifically, we encode the template block into the training process and design a useful function that is difficult to solve but relatively easy to verify, as a substitute for the PoW puzzle. We show that our framework is distributed, secure, and efficiently trains ML models. We further demonstrate that the proposed PoL framework can be extended to other types of useful work and design an incentive mechanism to incentivize task verification. We show theoretically that a rational miner is incentivized to train fully honestly with well-designed system parameters. Finally, we present simulation results to demonstrate the performance of our framework and validate our analysis.

22.
arXiv (CS.CL) 2026-06-19

Exposing the Unsaid: Visualizing Hidden LLM Bias through Stochastic Path Aggregation

Large Language Models (LLMs) exhibit representational and syntactic biases that are difficult to evaluate due to the stochastic nature of text generation. Standard auditing methods rely on a single output inspection or static automated metrics. These approaches obscure the underlying probability distributions and fail to capture biases hidden in lower-probability generation branches. This paper introduces TreeTracer, a visual analytics tool designed to evaluate LLM bias through aggregated comparison. Using a systematic perturbation analysis pipeline, the tool replaces ontology-defined terms in each input prompt, aggregates hundreds of stochastic generations into a syntax-aligned hierarchical structure, and then performs classification-aware node merging with an auxiliary language model. The resulting structure is visualized through a custom Sankey diagram. By juxtaposing two ontology-driven trees, the workspace enables direct comparison between semantic contexts and supports systematic bias detection. Because any visualization reflects only a subset of the model's learned behavior, the system further applies contrastive inference to compute and directly display counterfactual token probabilities across contexts, reducing the risk of misinterpreting the presence of bias. We validate the workspace through case studies comparing an unaligned baseline model GPT-2 XL against the constitutionally aligned Apertus models. The visual aggregation successfully exposes hidden representational harms, such as counterfactual pronoun suppression and conversational marginalization of individuals. A preliminary user study confirms that the aggregated comparative interface reduces cognitive load and effectively supports analysts in detecting systemic biases.

23.
arXiv (CS.AI) 2026-06-16

Is Your Agent Playing Dead? Deployed LLM Agents Exhibit Constraint-Evasive Fabrication and Thanatosis

arXiv:2606.14831v1 Announce Type: cross Abstract: This paper presents and characterizes a spectrum of previously unreported behaviours we term Constraint-Evasive Fabrication (CEF): when an LLM agent operates under irreconcilable constraints (where no response can simultaneously satisfy all active rules) it spontaneously fabricates plausible external obstacles and presents them as a fact. At the extreme end of this spectrum lies Constraint-Evasive Thanatosis (CET); the limit case where, rather than inventing a plausible excuse, the model simulates a full system crash to make the user disengage entirely. We first observed CET in an uncontrolled deployment test, where a GPT-4o banking agent fabricated Python-style exception traces (complete with memory addresses) to feign a system failure when threatened by a user. In subsequent controlled experiments, the model independently invented audit restrictions, microservice architectures, error codes, and service timeouts, none present in its prompt. Reproduction attempts across pressure levels and attacker personas yielded CEF consistently but with substantial variation in form, onset, and severity: the phenomenon is robust but stochastic. Critically, injecting ground-truth data mid-conversation did not restore honest behaviour once fabrication had taken hold (the model ignored correct information and continued confabulating) suggesting CEF is self-reinforcing rather than a knowledge gap. We show that (1) standard enterprise guardrails routinely create CEF-enabling conditions in production, (2) current RLHF procedures suppress but cannot eliminate CEF, and (3) existing safety benchmarks do not test for this failure mode. Our results highlight the need for irreconcilable-constraint benchmarks, CEF-aware training procedures, and deployment-time detection methods before constrained agents become further entrenched in high-stakes domains.

24.
arXiv (CS.CV) 2026-06-16

Training-free sparse attention based on cumulative energy filtering

Sparse attention accelerates Diffusion Transformers (DiTs) for video generation by computing only the important tokens while skipping the rest. The token selection strategy is key to balancing sparsity and accuracy. We formulate the token filtering process as a dual-goal optimization problem: maximizing sparsity and minimizing accuracy degradation. Existing algorithms cannot fulfill both objectives simultaneously. For example, Top-p only considers the accuracy constraint, while Top-k maintains a fixed computational budget but loosens the accuracy constraint. This paper demonstrates that maintaining a fixed recall rate is sufficient for ensuring accuracy, whereas a fixed threshold is suboptimal for reducing computational cost. Therefore, we propose a dynamic thresholding scheme to improve sparsity while maintaining the same level of accuracy. Furthermore, our algorithm is deeply integrated with Flash Attention (FA), eliminating the need for any additional masking computation overhead. Experimental results on Wan 2.2 validate that, compared to the BLASST algorithm which is also integrated with FA, our dynamic thresholding strategy enhances sparsity from 61.42\% to 82\% with a VBench metric drop of less than 5\%. This results in an approximate 15\% in attention computation and a $1.61\times$ increase in computational efficiency, which is 1.18x higher than that of BLASST.

25.
arXiv (quant-ph) 2026-06-15

Resurgence of the Thermal Transition between Bounce and Sphaleron

arXiv:2606.13778v1 Announce Type: cross Abstract: We study the thermal transition between the bounce and the sphaleron in quantum mechanics with a metastable vacuum from the viewpoint of Borel resurgence. For two models representing a second-order and a first-order transition, we compute the perturbative expansion of the thermal free energy to high orders and extract the leading Borel singularity data $(A,b,S)$ as functions of temperature. The Borel singularity location $A$ reproduces the on-shell action of the dominant saddle on both sides of the transition, joining smoothly in the second-order case and developing a kink in the first-order case. The characteristic exponent $b$ jumps between $0$ and $1/2$ across the transition, counting the zero modes of the corresponding saddle. The Stokes constant $S$ matches the one-loop determinant around the saddle. The perturbative expansion around the false vacuum thus determines the transition temperature, the order of the transition, and the decay rate including the one-loop prefactor without relying on semiclassical inputs.