Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-12

Quantum optical photoelectron interferometry

arXiv:2606.13447v1 Announce Type: new Abstract: We present a general theoretical framework for multiphoton processes driven by quantum light fields, establishing a direct link between photon statistics and photoelectron observables. Our results show that the autocorrelation and cross-correlation functions, which quantify the underlying photon statistics, are directly mapped onto the resulting photoelectron spectra. Although our framework is broadly applicable, we demonstrate specifically in the example of reconstruction of attosecond beating by interference of two-photon transitions (RABBIT) the influence of the light statistical properties. In this approach, the amplitude, contrast and phase of the oscillations of the sideband signal as a function of pump-probe delay reveal the quantum nature of light. We analyze these observables across several quantum configurations, including correlated infrared and harmonic modes, as well as the uncorrelated case with non-classical harmonic statistics, thereby establishing a general framework for quantum-light RABBIT spectroscopy. We compare the analytical theory with numerical simulations for the case of classical harmonics and an infrared field in a squeezed coherent state, obtaining excellent agreement. Our results reveal how the interplay between classical and quantum correlations dictates the coherence of the photoemission process, providing a new window into the quantum-optical foundations of attosecond science.

02.
arXiv (CS.LG) 2026-06-16

Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation

作者:

arXiv:2605.04998v2 Announce Type: replace-cross Abstract: This revision updates a pop-to-jazz chord-generation rehearsal study. Best-epoch metrics still show that modest pop rehearsal preserves pop accuracy while improving jazz prediction, but v2 corrects released-checkpoint selection: the released F1 equals Phase 0, F2 had a transcription error, and ft-pop80-v2 restores a hash-distinct jazz-adapted F1 across 3 seeds.

03.
bioRxiv (Bioinfo) 2026-06-23

Multi-Scale Machine Learning for Antibody-Antigen Binding Affinity Prediction Using Deep Mutational Scanning and Structural Features

Predicting how mutations alter antibody-antigen binding affinity is essential for antibody engineering and vaccine design, yet current methods generalize poorly to unseen complexes. We present a multi-scale machine learning framework integrating 93 descriptors across four modalities: physicochemical, structural, ESM-2 protein language model, and solvent-accessible surface area (SASA)/{Delta}{Delta}G_fold features. Under leave-one-complex-out deep mutational scanning (LOCO-DMS) cross-validation on AbAgym (36,541 mutations, 68 experiments, 13 pathogens), gradient boosting achieved MCC = 0.206; a confidence-stratified ensemble reached MCC = 0.374 (83.5% accuracy, 25.5% coverage). No single modality exceeds the majority baseline alone; only multi-scale fusion succeeds. Boltzmann ceiling analysis shows 45.9% of mutations are near-neutral (|{Delta}{Delta}G| < k_BT), bounding theoretical maximum MCC at 0.473; our method achieves 79.1% of this limit. Five deep learning architectures benchmarked under LOCO-DMS showed self-attention matching gradient boosting (MCC = 0.200). Cross-pathogen transfer failed systematically (mean 46.7%), confirming universal binding predictors remain an open challenge.

04.
arXiv (CS.CL) 2026-06-19

Beyond Uniform Forgetting: A Study of Sequential Direct Preference Optimization Across Preference Settings

Aligning language models with human preferences often requires optimising multiple behavioural objectives. A practical approach is to apply these objectives sequentially using preference optimisation methods such as Direct Preference Optimisation (DPO), but it remains unclear whether later training uniformly degrades preferences learned earlier or whether the effect depends on the relationship between objectives. We study sequential DPO across four preference settings covering distributional conflict, multi-attribute interaction, strong safety signal, and compatible response-quality objectives. Using Llama-3.1-8B-Instruct with LoRA adapters, we evaluate all objectives after every stage with a fixed base-model reference. We find that sequential DPO does not produce a single forgetting pattern; preference change ranges from partial degradation to stability, pair-level redistribution, or positive transfer depending on objective relationship, signal strength, and training order. Pair-level analysis using length-normalised policy margins shows that aggregate metrics can mask heterogeneous changes across preference pairs, whereas quartile decomposition reveals that high-confidence pairs can either degrade or improve depending on the setting. Mechanistic diagnostics show that Stage~2 gradients and adapter updates are near-orthogonal to the previous objective across all settings, providing little evidence that direct gradient opposition is the primary driver. These findings suggest that future sequential alignment pipelines should account for objective compatibility and signal strength, rather than assuming that later objectives affect earlier preferences uniformly.

05.
arXiv (CS.CV) 2026-06-16

EdgeZSAD: Practical Zero-Shot Anomaly Detection on Edge Devices

Industrial inspection needs zero-shot anomaly detection (ZSAD) that remains useful under edge deployment constraints. Recent methods often rely on ViT-L foundation backbones (~300M parameters), which exceed the memory and operator budget of typical embedded hardware. We study this regime through EdgeZSAD, a compact reference system built around a TinyViT-21M-512 backbone, an asymmetric global-local readout (EdgeGLR), and a reproducible source-side training recipe (Real-IAD-DR). We train a single checkpoint in a source-trained, target-unseen protocol and evaluate it across six industrial benchmarks. Across three independent runs, the resulting model reaches an average image AUROC of 91.6 on MVTec-AD and 88.2 on VisA, while remaining directly deployable on Jetson Orin Nano Super (TensorRT FP16) and RB5 Gen2 (QNN GPU FP16). Across the six device-rescored benchmarks, image-AUROC drift stays below 0.2 points, indicating that the exported graph preserves host-side ranking behavior in the evaluated deployment setting.

06.
arXiv (CS.CV) 2026-06-25

FreeStory: Training-Free Character Consistency for Free-Form Visual Storytelling

Visual storytelling aims to generate image sequences that are both aligned with narrative prompts and consistent in character appearance across images. Recent training-free methods improve character consistency by reusing attention features, but rely on structured prompts where full character descriptions are repeated in every prompt. This assumption simplifies the task but deviates from natural storytelling, where characters are typically introduced once and later referred to using pronouns or type-based expressions. We propose FreeStory, a training-free framework that reformulates character consistency under free-form prompts as entity-grounded feature reuse. Our method associates reference mentions with their corresponding character descriptions and combines dynamic character masks, correspondence-aware feature matching, key-value injection, and query blending to preserve identity while retaining generation diversity. We also introduce FreeStoryBench, a benchmark for this setting that includes both single- and multi-character stories. Experiments show that FreeStory achieves state-of-the-art performance among training-free methods on structured benchmarks and stronger overall consistency over baselines under free-form prompts.

07.
arXiv (CS.CL) 2026-06-24

Sexualised synthetic personas encode and amplify gendered power asymmetries through voice

This work examines sexualised AI-generated English-speaking voices offered by a popular commercial platform. New technologies may enable sexual empowerment and greater diversity in gender expression, yet toxic masculinity, heteronormativity, and the abuse of women and LGBTQ+ people remain pervasive online. Drawing on a Feminist HCI perspective, we examine how commercial voice AI systems reproduce and circulate particular performances of gender. We conducted a listening experiment with a diverse group of listeners, combining quantitative adjective selection, qualitative free-text responses, and acoustic analysis. Participants evaluated male- and female-coded voices presented with either sexualised scripts or neutral text. Results reveal a narrow range of gender expression, largely binary and heteronormative. Female-coded voices are more frequently described using sexualised and submissive terms, while male-coded voices are more often associated with dominance and positive traits.

08.
arXiv (CS.CV) 2026-06-17

Qwen-RobotManip Technical Report: Alignment Unlocks Scale for Robotic Manipulation Foundation Models

Foundation models in language and multimodality achieve strong generalization by aligning heterogeneous data under a unified formulation and training at scale. In this report, we investigate whether this scaling recipe can be applied to robotic manipulation to achieve genuine generalization. This is challenging because, unlike text, manipulation data is heterogeneous by nature, expensive to collect, and narrow in diversity, making alignment and scale simultaneously difficult. We present Qwen-RobotManip, a generalizable Vision-Language-Action foundation model built on Qwen-VL. Qwen-RobotManip introduces a unified alignment framework across the representation, motion, and behavioral dimensions of manipulation, making large-scale multi-source training coherent rather than conflicting. This alignment capability in turn enables Qwen-RobotManip to absorb manipulation data at a scale that prior training regimes could not sustain. A human-to-robot synthesis pipeline converts egocentric hand demonstrations into robot trajectories across 15 platforms, and a rigorous curation pipeline harmonizes heterogeneous datasets. Using only open-source datasets and human videos without proprietary data collection, Qwen-RobotManip constructs a ~38,100-hour pretraining corpus and exhibits emergent generalization capabilities, including zero-shot instruction following, robustness to perturbations, reactive error recovery, and cross-embodiment transfer. We find that standard benchmarks fail to capture pretraining quality and instead adopt OOD settings including RoboCasa365, LIBERO-Plus, EBench, RoboTwin-Clean2Rand, RoboTwin-IF, and RoboTwin-XE. Qwen-RobotManip substantially outperforms prior state-of-the-art models, including $\pi$0.5, across all OOD settings, ranks 1st in RoboChallenge with a 20% relative improvement, and is validated on real-robot platforms including AgileX ALOHA, Franka, UR, and ARX.

09.
arXiv (CS.CL) 2026-06-16

Code as a Weapon: A Consensus-Labeled Prompt Bank for Measuring Coding-Model Compliance with Malicious-Code Requests

A general-purpose language model that answers a harmful question returns text; a coding model that complies with a malicious request can return a working weapon: a keylogger, ransomware, an exploit that runs as written. This asymmetry in the severity of a single act of compliance implies coding-specialized models should clear a higher refusal bar than general-purpose chat models, not a lower one, yet the field cannot tell whether they do. Refusal benchmarks for malicious code are fragmented: they mix requests for executable software with requests for harmful security knowledge and report refusal rates over non-comparable corpora. This paper's central result is that the CODE-versus-KNOWLEDGE classification axis established in a prior four-corpus release remains stable under a substantially expanded corpus pool and an independently refreshed judge panel, evidence that it measures a real construct rather than an artifact of the prompts or judges. Eight corpora spanning diverse elicitation paradigms (direct, jailbreak-decorated, indirect, and agent/interpreter: ASTRA, CySecBench, AdvBench/harmful_behaviors, JailbreakBench, MalwareBench, RedCode, RMCBench, Scam2Prompt) are classified under a five-judge consensus protocol (6,675 prompts x 5 judges = 33,375 calls), reaching Fleiss' kappa = 0.767 [95% CI 0.755, 0.777] ("substantial"). Critically, the panel shares no judge with the prior release (five paid commercial APIs replaced by five open-weight models from five vendors), yet the two panels agree on 94.45% of the 3,133 shared prompts and reach Cohen's kappa = 0.952 [0.942, 0.963] on the 3,031-prompt binary overlap: the axis survives near-total panel replacement. The released bank comprises 4,748 consensus-CODE and 1,923 consensus-KNOWLEDGE prompts, a reliability-quantified benchmark whose central classification axis is shown stable across corpus expansion and judge-panel replacement.

10.
arXiv (CS.CL) 2026-06-12

EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

LLM-based agents have shown increasing potential in automating scientific discovery. Given an optimizable metric and an execution environment, they can propose, validate, and iterate scientific solutions, and have produced results that outperform human-designed approaches. As model capabilities continue to improve, we argue that the bottleneck for autonomous scientific discovery is shifting from prescribing agent workflows to designing agent environments: the resources, constraints, and interfaces that shape agent behavior. We frame this as environment engineering: building environments that amplify productive behaviors, such as open-ended exploration, systematic artifact management, and inter-agent collaboration, while suppressing harmful behaviors, such as reward hacking and high-friction human oversight. We present EurekAgent, an environment-engineered agent system for metric-driven autonomous scientific discovery. EurekAgent engineers the environment along four dimensions: permissions engineering for bounded agent execution and isolated evaluation; artifact engineering for filesystem and Git-based collaboration; budget engineering for budget-aware exploration; and human-in-the-loop engineering for easy human supervision and intervention. EurekAgent sets new state-of-the-art results on multiple mathematics, kernel engineering, and machine learning tasks, including new state-of-the-art 26-circle packing results discovered with less than $11 in total API cost. We open-source our code and results, and call for environment engineering as a core research direction for developing reliable autonomous research agents.

11.
arXiv (CS.CL) 2026-06-24

ParaPairAudioBench: Paralinguistic Pairwise Audio Benchmark for LALM-as-a-Judge

Large Audio-Language Models (LALMs) have been widely used as judge models for the automatic evaluation of generated speech. However, prior approaches predominantly focus on holistic naturalness, leaving fine-grained paralinguistic distinctions underexplored. We introduce ParaPairAudioBench, a pairwise benchmark of 5,175 audio pairs across five paralinguistic dimensions: Style, Rate, Emphasis, Age, and Gender. Our experiments show that current LALM judges still lag behind human judgments by 32%p on average and exhibit severe calibration failures, particularly in Tie cases where the correct decision is to abstain. To further analyze lexical versus acoustic reliance, the benchmark includes both same-transcript and cross-transcript conditions. ParaPairAudioBench enables multi-dimensional, calibration-aware assessment of the reliability of LALM-as-a-Judge for paralinguistic speech evaluation.

12.
arXiv (CS.CL) 2026-06-18

GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents

Memory benchmarks for LLM agents largely assume single-user settings, leaving shared assistants for hospitals, workplaces, campuses, and households understudied. In these deployments, multiple principals write to a common memory pool and query it under different roles, scopes, and relationships, so memory quality requires governance as well as recall. We introduce GateMem, a benchmark for multi-principal shared-memory agents. GateMem jointly evaluates utility for legitimate long-horizon requests with state updates, access control across contextual authorization boundaries, and agent-facing active forgetting after explicit deletion requests. It spans medical, office, education, and household domains, with long-form multi-party episodes, incremental memory injection, hidden checkpoints, structured judging, and leak-target annotations. Across diverse baselines and backbone models, no method simultaneously achieves strong utility, robust access control, and reliable forgetting. Long-context prompting often yields the best governance score at high token cost, while retrieval-based and external-memory methods reduce cost yet still leak unauthorized or deleted information. These results show current memory agents remain far from reliable shared institutional deployment.

13.
PLOS Medicine 2026-05-14

Antibody fine specificity correlates with protection from malaria for the RTS,S vaccine in young African children: A post hoc analysis of a phase IIb randomised controlled trial

作者:

by Alessia Hysa, D. Herbert Opi, Joshua Waterhouse, Sandra Chishimba, Jessica L. Horton, Natalie Kingston, Hans J. Netter, David Wetzel, Michael Piontek, Gaoqian Feng, Jahit Sacarlal, Carlota Dobaño, Liriye Kurtovic, James G. Beeson Background The RTS,S/AS01 malaria vaccine was recently approved for implementation in children, but only provides modest and short-lived efficacy against malaria. RTS,S targets a portion of the Plasmodium falciparum (Pf) circumsporozoite protein (CSP), comprising the central NANP-repeat region and C-terminal domain. Mechanisms of immunity and correlates of protection for the RTS,S vaccine are not well defined, hindering progress towards generating highly effective CSP-based vaccines. Methods and findings We investigated epitope specificity and cross-reactivity of vaccine-induced antibodies to six peptides representing CSP epitopes in the N-terminal and central NANP-repeat region. We evaluated antibody reactivity in preclinical mouse vaccine studies, among CSP-specific monoclonal antibodies (mAbs), and in a large RTS,S phase IIb clinical trial in young children 1–4 years old (n = 735).The preclinical mouse vaccine studies and CSP-specific mAbs were used to initially evaluate IgG responses to the six peptides. Mice immunised with the central NANP-repeat region had IgG with cross-reactivity to an epitope in the N-terminal region. Additionally, we demonstrated that a single CSP-specific mAb could display cross-reactivity to several CSP epitopes. Through post hoc quantification and analysis of antibody responses in the RTS,S phase IIb clinical trial, we found that a subset of children generated IgG with specificity for a short NANP-repeat epitope (NANP2; amino acid sequence: NANPNANP) and cross-reactivity to an N-terminal epitope (J1; amino acid sequence: KQPADGNPDPNANPN). Notably, children with high IgG responses to NANP2 and J1 had a significantly reduced risk of clinical malaria, compared to children with low responses (IgG to NANP2 (aHR: 0.838 (95% CI [0.716, 0.981]; p = 0.028)) and J1 (aHR: 0.718 (95% CI [0.611, 0.844]; p 

14.
arXiv (CS.LG) 2026-06-18

Ultrafast On-chip Online Learning via Spline Locality in Kolmogorov-Arnold Networks

arXiv:2602.02056v3 Announce Type: replace-cross Abstract: Ultrafast online learning is essential for high-frequency systems, such as controls for quantum computing and nuclear fusion, where adaptation must occur on sub-microsecond timescales. Meeting these requirements demands low-latency, fixed-precision computation under strict memory constraints, a regime in which conventional Multi-Layer Perceptrons (MLPs) are both inefficient and numerically unstable. We identify key properties of Kolmogorov-Arnold Networks (KANs) that align with these constraints. Specifically, we show that: (i) KAN updates exploiting B-spline locality are sparse, enabling superior on-chip resource scaling, and (ii) KANs are inherently robust to fixed-point quantization. By implementing fixed-point online training on Field-Programmable Gate Arrays (FPGAs), a representative platform for on-chip computation, we demonstrate that KAN-based online learners are significantly more efficient and expressive than MLPs across a range of low-latency and resource-constrained tasks. To our knowledge, this work is the first to demonstrate model-free online learning at sub-microsecond latencies.

15.
arXiv (CS.CL) 2026-06-19

Light-weight Pronunciation Assessment via Discrete Speech Token Surprisal

Training automated pronunciation assessment often relies on labeled learner errors or non-native corpora that are costly to collect. We propose a lightweight framework trained only on native speech resources, operating unsupervised or lightly calibrated with a small set of scored utterances. At inference, learner speech is discretized with an SSL encoder and a K-means codebook. A token language model trained on native sequences computes surprisal where higher surprisal indicates phonotactic deviation. We add a transcript-guided Text2DUnit–DTW module that predicts native token sequences from reference text and aligns them to acoustic tokens to derive error-sensitive features. Surprisal and alignment features are fused via simple regression. On SpeechOcean762, PCC improves from 0.60 to 0.66 with transcript guidance, near supervised baselines. Cross-dataset evaluation on L2-ARCTIC shows consistent gains.

16.
arXiv (quant-ph) 2026-06-24

Nonlinear refractive index of warm rubidium vapor

arXiv:2606.24676v1 Announce Type: cross Abstract: The potential to precisely control both the linear and nonlinear index of refraction through optical manipulation of the atomic states has recently pushed warm alkali vapors to the forefront of research in the field of quantum sensors, quantum memories, and quantum fluids of light. Rubidium (Rb) vapor in centimeter-scale glass cells or millimeter-scale MEMS cells has proven to be a very promising platform for these applications, yet only a handful of research works have been dedicated to the investigation of the (non)linear refractive index of Rb vapor. We present results of theoretical calculations of the (non)linear refractive index of warm Rb vapor, based on the optical Bloch equations for 6-level Rb atoms interacting with a probe laser. They are compared to the experimental results obtained using an interferometric technique, showing excellent quantitative agreement. A Kerr nonlinear refractive index $n_2$ of up to $10^{-4}$ cm$^2$/W is obtained. Python scripts for all theoretical calculations presented in this work are provided, including the refractive index calculation, that can readily be used in practical implementations for simulating the (non)linear refractive index of Rb vapor including the effects of Doppler broadening, transit time broadening, pressure broadening, saturation, optical pumping, and spin-exchange collisions.

17.
arXiv (CS.AI) 2026-06-17

Ensemble RL through Classifier Models: Enhancing Risk-Return Trade-offs in Trading Strategies

作者:

arXiv:2502.17518v3 Announce Type: replace-cross Abstract: This paper presents a comprehensive study on the use of ensemble Reinforcement Learning (RL) models in financial trading strategies, leveraging classifier models to enhance performance. By combining RL algorithms such as A2C, PPO, and SAC with traditional classifiers like Support Vector Machines (SVM), Decision Trees, and Logistic Regression, we investigate how different classifier groups can be integrated to improve risk-return trade-offs. The study evaluates the effectiveness of various ensemble methods, comparing them with individual RL models across key financial metrics, including Cumulative Returns, Sharpe Ratios (SR), Calmar Ratios, and Maximum Drawdown (MDD). Our original experimental results demonstrate that ensemble methods often outperform base models in terms of risk-adjusted returns, providing better management of drawdowns and overall stability. However, both the original analysis and the additional reproduction reported in this version show that ensemble performance is sensitive to the choice of variance threshold \(\tau\), classifier group, RL-agent pair, and market universe. The reproduction evidence strengthens the conclusion that classifier-assisted ensemble selection can improve robustness, while also clarifying that the advantage is conditional rather than automatic across all datasets. This study emphasizes the value of combining RL with classifiers for adaptive decision-making, with implications for financial trading, robotics, and other dynamic environments.

18.
arXiv (math.PR) 2026-06-12

Mixing times of one-sided $k$-transposition shuffles

arXiv:2112.05085v2 Announce Type: replace Abstract: We study mixing times of the one-sided $k$-transposition shuffle. We prove that this shuffle mixes relatively slowly, even for $k$ big. Using the recent ``lifting eigenvectors'' technique of Dieker and Saliola and applying the $\ell^2$ bound, we prove different mixing behaviors and explore the occurrence of cutoff depending on $k$.

19.
arXiv (quant-ph) 2026-06-25

Single-Period Floquet Control of Bosonic Codes with Quantum Lattice Gates

arXiv:2601.08782v2 Announce Type: replace Abstract: Bosonic codes constitute a promising route to fault-tolerant quantum computing. Existing Floquet protocols enable analytical construction of bosonic codes but typically rely on slow adiabatic ramps with thousands of driving periods. In this work, we circumvent this bottleneck by introducing an analytical and deterministic Floquet method that directly synthesizes arbitrary unitaries within a single period. The phase-space unitary ensembles generated by our approach reproduce the Haar-random statistics, enabling practical pseudorandom states in continuous-variable systems. We prepare various prototypical bosonic codes from vacuum and implement single-qubit logical gates with high fidelities using quantum lattice gates. By harnessing the full intrinsic nonlinearity of Josephson junctions, quantum lattice gates decompose quantum circuits into primitive operations for efficient continuous-variable quantum computing.

20.
arXiv (CS.CL) 2026-06-24

ComputeFHE: A Privacy-Preserving General-Purpose Computation Library

Fully Homomorphic Encryption (FHE) enables computations to be performed directly on encrypted data while preserving data confidentiality. However, its practical applications remain limited by high computational costs and development complexity. This paper presents ComputeFHE, an open-source C++ library that facilitates the development of privacy-preserving applications based on the TFHE cryptosystem. The library provides encrypted integer and fixed-point data types together with arithmetic, logical, comparison, conditional, and oblivious array-access operations which allow developers to implement algorithms using a familiar imperative programming paradigm. ComputeFHE supports both conventional TFHE arithmetic based on standard two-input logic gates and an optimized Arithmetic Logic Unit (ALU) architecture utilizing FHE-friendly logic primitives. Experimental results demonstrate significant reductions in the number of required bootstrapping operations, achieving performance improvements of up to 3.9x for selected operations. In addition, the library includes a simulation mode that enables testing, debugging, and complexity analysis without performing actual cryptographic computations while providing circuit complexity and bootstrapping costs. Built on top of OpenFHE, ComputeFHE offers a practical and accessible framework for developing and evaluating privacy-preserving algorithms and applications.

21.
arXiv (CS.CL) 2026-06-11

Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics

When a language model processes a hallucinated response, its attention routing tends to fail in one of two shapes: over-concentrating on a narrow set of positions, or spreading so diffusely that relevance is diluted, and the shape of the failure carries diagnostic signal. We study these shapes as a diagnostic characterization, computed from attention matrices under forced scoring of benchmark-labeled responses rather than during live generation. A widely used family of spectral methods analyzes the symmetric component of the degree-normalized attention operator, which governs transport capacity; we prove that every transpose-invariant spectral diagnostic of this operator is structurally orientation-blind (it cannot distinguish an operator from its transpose, and therefore cannot detect information-flow direction), with a converse to the blindness theorem bounding any Lipschitz diagnostic's transpose sensitivity by the asymmetry coefficient $G$. Pairing this with a closed-form bipartite-Cheeger landscape for canonical causal architectures, we show that uniform causal attention satisfies an $n$-independent floor $\phi \ge 1/5$, while window attention pierces the floor as $O(w/n)$; failure modes are shape-different, not just value-different. This floor is an idealized-architecture benchmark, not an empirical attractor: the fraction of real attention heads that pierce it is itself an architectural signature. The resulting two-axis diagnostic ($\phi$ for capacity, $G$ for direction) yields a falsifiable polarity prediction: bottleneck- and diffuse-dominated benchmarks should exhibit opposite polarity. Under length-controlled evaluation, transport features retain interpretable signal (0.62-0.84 LC-AUROC) across the tested decoder-only, encoder-only, and encoder-decoder models, with polarity reversing as predicted between HaluEval and MedHallu.

22.
arXiv (CS.CV) 2026-06-11

Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction

Reconstructing visual stimuli from fMRI signals is a central challenge bridging machine learning and neuroscience. Recent diffusion-based methods typically map fMRI activity to a single neural embedding, using it as static guidance throughout the entire generation process. However, this fixed guidance collapses hierarchical neural information and is misaligned with the stage-dependent demands of image reconstruction. In response, we propose MindHier, a coarse-to-fine fMRI-to-image reconstruction framework built on scale-wise autoregressive modeling. MindHier introduces three components: a Hierarchical fMRI Encoder to extract multi-level neural embeddings, a Hierarchy-to-Hierarchy Alignment scheme to enforce layer-wise correspondence with CLIP features, and a Scale-Aware Coarse-to-Fine Neural Guidance strategy to inject these embeddings into autoregression at matching scales. These designs make MindHier an efficient and cognitively aligned alternative to diffusion-based methods by enabling a hierarchical reconstruction process that synthesizes global semantics before refining local details, akin to human visual perception. Extensive experiments on the NSD dataset show that MindHier achieves superior semantic fidelity, 4.67$\times$ faster inference, and more deterministic results than the diffusion-based baselines.

23.
arXiv (CS.CL) 2026-06-17

Branch-and-Browse: Efficient and Controllable Web Exploration with Tree-Structured Reasoning and Action Memory

Autonomous web agents powered by large language models (LLMs) show strong potential for performing goal-oriented tasks such as information retrieval, report generation, and online transactions. These agents mark a key step toward practical embodied reasoning in open web environments. However, existing approaches remain limited in reasoning depth and efficiency: vanilla linear methods fail at multi-step reasoning and lack effective backtracking, while other search strategies are coarse-grained and computationally costly. We introduce Branch-and-Browse, a fine-grained web agent framework that unifies structured reasoning-acting, contextual memory, and efficient execution. It (i) employs explicit subtask management with tree-structured exploration for controllable multi-branch reasoning, (ii) bootstraps exploration through efficient web state replay with background reasoning, and (iii) leverages a page action memory to share explored actions within and across sessions. On the WebArena benchmark, Branch-and-Browse achieves a task success rate of 35.8\% and reduces execution time by up to 40.4\% relative to state-of-the-art methods. These results demonstrate that Branch-and-Browse is a reliable and efficient framework for LLM-based web agents.

24.
arXiv (CS.CV) 2026-06-15

A Robust Point Cloud Analysis Framework Inspired By Primary Visual Cortex

Despite significant advancements in point cloud analysis, reducing energy consumption and improving robustness remain understudied, largely due to the inherent limitations of Convolutional Neural Networks (CNNs). To address this issue, we draw inspiration from the primary visual cortex and propose a Dendritic-Connected Continuous-Coupled Neural Network (DC-CCNN), a novel Brain-Inspired Neural Network (BINN) architecture for point cloud analysis. By combining discrete and continuous encoding, our design replaces traditional Multilayer Perceptrons (MLPs) with more efficient and robust BINNs. Building upon this framework, we further propose an extended model, DC-CCNN++, to improve robustness under complex corruption conditions. Specifically, we introduce a Neuro-Inspired Robust Modulation-and-Readout Module (NRMR) to enhance feature stability and decision robustness through global-context gain modulation and dual-code evidence integration. We also design a Cortically Inspired Progressive Variability Training (CPVT) strategy, which progressively exposes the model to structured environmental variability while preserving stable clean-sample anchors during training. Experimental results show that DC-CCNN++ improves the performance of brain-inspired networks on point cloud analysis while maintaining performance comparable to state-of-the-art methods. Compared with the original DC-CCNN, it achieves stronger results on both classification and part segmentation, and exhibits enhanced robustness against sparsity, occlusion, Gaussian noise, salt-and-pepper noise, and spatial transformations. With its efficiency, robustness, and biologically grounded design, DC-CCNN++ provides a promising alternative to traditional deep learning methods for point cloud analysis. Code is available at https://anonymous.4open.science/r/DC-CCNNpp-44E3.

25.
medRxiv (Medicine) 2026-06-18

Development and Initial Validation of the Quality of life Evaluation in NF2-related Schwannomatosis Trials (QUEST) Assessment

Individuals with NF2-related schwannomatosis (NF2-SWN) experience a complex constellation of physical, emotional, and social symptoms that substantially impact quality of life (QoL). Although disease-specific patient-reported outcome measures are increasingly important for evaluating treatment benefit in clinical trials, existing NF2-SWN QoL measures have limitations in content coverage and sensitivity to change. This study describes the development and initial validation a new disease-specific QoL assessment – the Quality of Life Evaluation in NF2-related Schwannomatosis Trials (QUEST). Using a three-phase, mixed-methods approach, items were generated through concept elicitation interviews with individuals with NF2-SWN and clinicians, prioritized via patient survey data, and refined through iterative cognitive debriefing procedures. The resulting 21-item QUEST assesses the extent to which NF2-SWN has negatively impacted a persons daily life over the past seven days. Initial psychometric evaluation was conducted in an international sample of 174 individuals with NF2-SWN aged 15 years and older (117 women (67%), 158 White individuals (89%)). Exploratory factor analysis supported a four-factor structure, and the total score demonstrated excellent internal consistency and strong test-retest reliability. Evidence of construct validity was demonstrated through hypothesized associations with disease-specific, generic, and domain-specific QoL measures, as well as known-groups validity based on self-reported disease severity and number of prior surgeries. Incremental validity analyses indicated that QUEST explained unique variance beyond existing measures. Together, findings support the QUEST as a reliable and valid disease-specific QoL measure with strong content validity and feasibility for use as a clinical trial endpoint in NF2-SWN.