Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-11

Anatomically Conditioned Recurrent Refinement for Topology-Aware Circle of Willis Segmentation

Segmenting the Circle of Willis (CoW) from Magnetic Resonance Angiography (MRA) is challenging due to complex topology and thin vascular structures that are prone to fragmentation. Standard Convolutional Neural Networks (CNNs) often fail to capture these topological constraints, resulting in "broken vessel" artifacts. To address this, we propose the Anatomically Conditioned Recurrent Refinement U-Net (AC2RUNet). Our architecture decouples segmentation into two streams: a Static Stream that extracts invariant anatomical features and a lightweight Dynamic Stream that iteratively refines topological errors over time. We further introduce a dynamic curriculum learning strategy that transitions from high-recall geometric supervision to topology-aware constraints. Validated on the TopCoW dataset, AC2RUNet substantially reduces Hausdorff Distance (4.72 mm vs 9.17 mm) and Betti number errors (0.19 vs 0.40), improving topological connectivity over the nnU-Net baseline while maintaining comparable volumetric Dice.

02.
arXiv (CS.AI) 2026-06-18

CaVe-VLM-CoT: An Interpretable Vision-Language Model Framework

arXiv:2606.18385v1 Announce Type: new Abstract: Vision-Language Models (VLMs) remain prone to hallucinations, producing fluent but visually unfaithful outputs. Existing chain-of-thought and retrieval-augmented methods only partially address this, as they neither enforce step-level citation grounding nor route verification failures back to retrieval for correction. We present CaVe-VLM-CoT, a modular reflection-based agentic-RAG framework that enforces evidence-grounded reasoning through a five-stage closed-loop pipeline: Extractor, Retriever, Solver, Citation Injector, and Verifier, in which detected ungrounded claims trigger structured feedback to the Extractor for targeted re-retrieval. Since no existing framework jointly measures retrieval quality, step-wise citation faithfulness, and cross-modal grounding, we propose a suite of 23 component-wise metrics across all stages, anchored by CaVeScore, a composite metric weighting accuracy, citation precision and recall, attribution, and evidence grounding. Without any architectural or prompt modifications, CaVe-VLM-CoT achieves 87.1\% accuracy and 56.6\% CaVeScore on ScienceQA , and 55.2\% accuracy and 35.7\% CaVeScore on MMMU (30 subjects).

03.
arXiv (CS.AI) 2026-06-16

Service-Induced Congestion in Memory-Constrained LLM Serving

arXiv:2606.15555v1 Announce Type: cross Abstract: In large language model (LLM) serving, each request accumulates persistent graphics processing unit (GPU) memory during service as its key-value cache grows with every generated token. Under high concurrency, aggregate memory usage therefore increases endogenously over time: the service process itself creates future capacity pressure. When memory capacity is exceeded, systems evict active requests, discarding cached state and restarting them later, which wastes computation and reduces throughput. We develop a discrete-time dynamical model of memory-constrained LLM inference that captures admission, memory growth, and eviction under continuous batching. In the saturated-input regime, the system admits both eviction-free fixed points and limit cycles with evictions. For homogeneous workloads, we show that the eviction-free equilibrium is unstable and that, except for a Lebesgue-measure-zero exact-capture set, the system converges to a unique worst-case limit cycle that is asymptotically stable outside this exceptional set, with throughput losses as large as 50%. For heterogeneous workloads, we prove a stability criterion in the two-class common-input setting and explain how the survival-polynomial mechanism generalizes to multiple classes and heterogeneous-input lengths. Under an input-dominated scaling regime, coprime decoding lengths stabilize the eviction-free equilibrium, while non-coprime lengths create synchronized modes that drive instability. These results characterize when workload heterogeneity desynchronizes completions and helps stabilize memory-constrained serving. More broadly, we identify service-induced congestion as a structural instability mechanism and derive scheduling design principles for sustaining high throughput.

04.
arXiv (CS.CV) 2026-06-16

Multimodal LLM-Empowered Re-Ranking for Generalizable Person Re-Identification

Domain Generalizable (DG) person re-identification (Re-ID) has attracted growing research interest due to its potential for deployment in unseen real-world scenarios. Most existing approaches address DG Re-ID by focusing on training domain-generalizable encoders but ignore the possible refinements in inference stage. In contrast, this work explores an alternative direction which improves inference re-ranking to enhance DG Re-ID. Conventional re-ranking methods typically rely on neighborhood-based distances to refine the initial ranking list, inherently depending on features produced by the Re-ID encoder. However, they deteriorate on target domains since the encoder lacks sufficient generalizability to produce reliable feature distances on unseen scenarios. Inspired by the remarkable generalization capabilities of recent Multimodal Large Language Models (MLLMs), we propose an MLLM-empowered distance metric to improve re-ranking in DG Re-ID. Specifically, we first adapt an MLLM to Re-ID data through supervised fine-tuning, which incorporates a domain-agnostic prompt and a query-candidate hard mining scheme. Then, the adapted MLLM is employed to compute a $\mu$-distance during inference, which is robust to domain gap and significantly enhances subsequent re-ranking performance. Our approach is model-agnostic and can be seamlessly integrated into previous re-ranking frameworks. Extensive experiments demonstrate that our approach consistently yields substantial performance improvements across multiple DG Re-ID benchmarks. The code of this work will be released at https://github.com/RikoLi/MUSE soon.

05.
arXiv (CS.CV) 2026-06-11

From Correspondence to Actions: Human-Like Multi-Image Spatial Reasoning in Multi-modal Large Language Models

While multimodal large language models (MLLMs) have made substantial progress in single-image spatial reasoning, multi-image spatial reasoning, which requires integration of information from multiple viewpoints, remains challenging. Cognitive studies suggest that humans address such tasks through two mechanisms: cross-view correspondence, which identifies regions across different views that correspond to the same physical locations, and stepwise viewpoint transformation, which composes relative viewpoint changes sequentially. However, existing studies incorporate these mechanisms only partially and often implicitly, without explicit supervision for both. We propose Human-Aware Training for Cross-view correspondence and viewpoint cHange (HATCH), a training framework with two complementary objectives: (1) Patch-Level Spatial Alignment, which encourages patch representations to align across views for spatially corresponding regions, and (2) Action-then-Answer Reasoning, which requires the model to generate explicit viewpoint transition actions before predicting the final answer. Experiments on three benchmarks demonstrate that HATCH consistently outperforms baselines of comparable size by a clear margin and achieves competitive results against much larger models, while preserving single-image reasoning capabilities.

06.
arXiv (CS.AI) 2026-06-15

Aligning Quantum Operators with Large Language Models

arXiv:2606.13811v1 Announce Type: cross Abstract: Can Large Language Models (LLMs) understand and reason about quantum operators? Despite their remarkable capabilities in mathematics and symbolic reasoning, LLMs remain inherently blind to quantum representations such as unitary matrices. In this work, we take a step toward bridging this gap by introducing an approach that maps unitary operators into the latent space of an LLM, enabling unified modeling over quantum and linguistic inputs. We instantiate this idea on Clifford+T circuit synthesis over a Pauli rotation gate set, where our model achieves results competitive with state-of-the-art methods and scales consistently with training data, with no signs of saturation. Our approach further enables language-conditioned synthesis, allowing gate constraints unseen during training to be specified directly in natural language. This work suggests a path toward quantum–aware foundation models that can natively interpret and reason about quantum operations, which could have broader implications reaching across quantum compilation and algorithm discovery.

07.
arXiv (CS.AI) 2026-06-11

A Survey of Reasoning and Agentic Systems in Time Series with Large Language Models

arXiv:2509.11575v3 Announce Type: replace Abstract: Time series reasoning treats time as a first-class axis and incorporates intermediate evidence directly into the answer. This survey defines the problem and organizes the literature by reasoning topology with three families: direct reasoning in one step, linear chain reasoning with explicit intermediates, and branch-structured reasoning that explores, revises, and aggregates. The topology is crossed with the main objectives of the field, including traditional time series analysis, explanation and understanding, causal inference and decision making, and time series generation, while a compact tag set spans these axes and captures decomposition and verification, ensembling, tool use, knowledge access, multimodality, agent loops, and LLM alignment regimes. Methods and systems are reviewed across domains, showing what each topology enables and where it breaks down in faithfulness or robustness, along with curated datasets, benchmarks, and resources that support study and deployment (https://github.com/blacksnail789521/Time-Series-Reasoning-Survey). Evaluation practices that keep evidence visible and temporally aligned are highlighted, and guidance is distilled on matching topology to uncertainty, grounding with observable artifacts, planning for shift and streaming, and treating cost and latency as design budgets. We emphasize that reasoning structures must balance capacity for grounding and self-correction against computational cost and reproducibility, while future progress will likely depend on benchmarks that tie reasoning quality to utility and on closed-loop testbeds that trade off cost and risk under shift-aware, streaming, and long-horizon settings. Taken together, these directions mark a shift from narrow accuracy toward reliability at scale, enabling systems that not only analyze but also understand, explain, and act on dynamic worlds with traceable evidence and credible outcomes.

08.
arXiv (CS.CL) 2026-06-11

Measuring language complexity from hierarchical reuse of recurring patterns

We introduce the ladderpath index as a measure of language complexity grounded in algorithmic information theory. It counts the minimum steps needed to reconstruct a sequence through hierarchical reuse of repeated substructures, capturing an exactly computable but constrained form of algorithmic compressibility related to, but distinct from, Kolmogorov complexity. We apply the ladderpath approach to 21 parallel corpora from the Parallel Universal Dependencies dataset. The ladderpath index is approximately invariant across the languages, and varies much less than the corpus length. This is more pronounced when all corpora are mapped to a unified binary representation, providing evidence for the equi-complexity hypothesis from a representation-independent perspective. We also observe trade-offs between character inventory size and corpus length, and between vocabulary-level and corpus-level reconstruction complexity, supporting the trade-off hypothesis that total complexity is conserved and redistributed across linguistic levels. The reusable substructures identified by the ladderpath approach, without any linguistic input, overlap with words and morphological components attested in the natural vocabulary. The hierarchical reuse captured by the ladderpath approach parallels the chunking mechanisms proposed in cognitive science, where the human cognitive system compresses linguistic input into nested, reusable units under shared memory and processing constraints. This connection between cognitive chunking and the ladderpath approach provides a new interpretation for the equi-complexity and trade-off hypotheses, grounding both in the shared cognitive architecture that underlies language processing across human languages.

09.
PLOS Medicine 2026-05-29

Characterization of the VHH-Fc construct rimteravimab in healthy adults and patients hospitalized for mild-to-moderate COVID-19: Two Phase 1 randomized clinical trials

作者:

by Ellen Jansen, Viki Bockstal, Florence Herschke, Per Olsson Gisleskog, Manuela Rinaldi, Angélique Boerboom, Salah Hadi, Natalia Gaibu, Michel Moutschen, Dominique Tersago Background Variable Heavy domain of Heavy chains (VHH) are innovative tools to target unique epitopes, yet few have been developed as heavy chain-only antibodies for clinical use. Rimteravimab (referred to here as XVR011) is a humanized antibody developed for the treatment of mild-to-moderate coronavirus disease 2019 (COVID-19), consisting of two identical VHHs targeting the receptor binding domain (RBD) of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike, with a human immunoglobulin (Ig) G1 fragment constant of antibody (Fc), silenced for Fc effector functions. We conducted two Phase 1 studies in healthy volunteers or hospitalized COVID-19 patients to evaluate its safety, tolerability, pharmacokinetics and immunogenicity. Methods and findings A randomized, double-blinded, single-center, placebo-controlled, single ascending dose study was performed in healthy volunteers (Phase 1a, EXEVIR0102, EudraCT 2021-003707-17), in parallel to an open-label, multi-center, single ascending dose study in patients hospitalized for mild to moderate COVID-19 (Phase 1b, EXEVIR0101, EudraCT 2020-005299-36, NCT04884295). Participants received a single intravenous infusion of 250, 500 or 1,000 mg of XVR011. The primary objective for both trials was the safety and tolerability of XVR011. Pharmacokinetics were evaluated as a secondary objective in Phase 1a and as an exploratory objective in Phase 1b. Efficacy (evaluated as respiratory parameters and COVID-19 clinical status) and antiviral activity in patients were evaluated as a secondary objective in Phase 1b. Immunogenicity was evaluated as an exploratory objective. Part 2 of the EXEVIR0101 study (initially a phase 1b/2 study) was not conducted due to the loss of XVR011 potency against SARS-CoV-2 Omicron BA.2. Demographics, safety, efficacy, and immunogenicity were analyzed using descriptive statistics, while pharmacokinetics were analyzed with noncompartmental pharmacokinetics (PK) modeling.In the Phase 1a study, there were no infusion-related reactions, serious treatment-emergent adverse events (TEAEs) or TEAEs grade ≥3. 22/30 volunteers (73.3%) reported 53 TEAEs (49 Grade 1, 4 Grade 2) with none being related to XVR011. The most common TEAE was headache (n = 8, 26.7%) in various treatment groups. In the Phase 1b study, 27 hospitalized patients were enrolled, and followed up to 30 days. Seven patients (25.9%) reported a total of 15 TEAEs, the majority (80%) being mild to moderate (Grade 1–2). There were no treatment-related serious TEAEs. All TEAEs resolved by the end of the study. Peak exposure (maximal concentration, Cmax) and systemic exposure (area under the curve, AUC0-t, and AUC0-inf) for XVR011 increased dose-proportionally. Geomean half-life ranged from 15.4 to 17.0 days in Phase 1a, while individual half-life ranged from 11.4 to 15.6 days in Phase 1b. SARS-CoV-2 viral load, as detected in nasopharyngeal samples by reverse transcription and quantitative polymerase chain reaction (RT-qPCR), decreased similarly in all cohorts compared to baseline. No treatment-induced anti-drug antibodies (ADA) were detected in Phase 1a. In Phase 1b, higher XVR011 concentrations increased the likelihood of ADA formation, without impacting pharmacokinetics and pharmacodynamics. No obvious dose-response in COVID-19 clinical status or respiratory parameters was observed.Technological limitations included study size, absence of placebo for the Phase 1b, absence of repeated dosing, evolving SARS-CoV-2 variants and standard-of-care. Conclusions XVR011 displayed a favourable safety, tolerability, pharmacokinetics, and immunogenicity profile, both in healthy volunteers and in patients hospitalized for mild to moderate COVID-19. These data pave the way for the design and clinical development of VHH-Fc constructs.

10.
arXiv (CS.CV) 2026-06-15

A Robust Point Cloud Analysis Framework Inspired By Primary Visual Cortex

Despite significant advancements in point cloud analysis, reducing energy consumption and improving robustness remain understudied, largely due to the inherent limitations of Convolutional Neural Networks (CNNs). To address this issue, we draw inspiration from the primary visual cortex and propose a Dendritic-Connected Continuous-Coupled Neural Network (DC-CCNN), a novel Brain-Inspired Neural Network (BINN) architecture for point cloud analysis. By combining discrete and continuous encoding, our design replaces traditional Multilayer Perceptrons (MLPs) with more efficient and robust BINNs. Building upon this framework, we further propose an extended model, DC-CCNN++, to improve robustness under complex corruption conditions. Specifically, we introduce a Neuro-Inspired Robust Modulation-and-Readout Module (NRMR) to enhance feature stability and decision robustness through global-context gain modulation and dual-code evidence integration. We also design a Cortically Inspired Progressive Variability Training (CPVT) strategy, which progressively exposes the model to structured environmental variability while preserving stable clean-sample anchors during training. Experimental results show that DC-CCNN++ improves the performance of brain-inspired networks on point cloud analysis while maintaining performance comparable to state-of-the-art methods. Compared with the original DC-CCNN, it achieves stronger results on both classification and part segmentation, and exhibits enhanced robustness against sparsity, occlusion, Gaussian noise, salt-and-pepper noise, and spatial transformations. With its efficiency, robustness, and biologically grounded design, DC-CCNN++ provides a promising alternative to traditional deep learning methods for point cloud analysis. Code is available at https://anonymous.4open.science/r/DC-CCNNpp-44E3.

11.
arXiv (CS.CV) 2026-06-19

MeshPad: Interactive Sketch-Conditioned Artist-Reminiscent Mesh Generation and Editing

We introduce MeshPad, a generative approach that creates 3D meshes from sketch inputs. Building on recent advances in artist-reminiscent triangle mesh generation, our approach addresses the need for interactive mesh creation. To this end, we focus on enabling consistent edits by decomposing editing into 'deletion' of regions of a mesh, followed by 'addition' of new mesh geometry. Both operations are invoked by simple user edits of a sketch image, facilitating an iterative content creation process and enabling the construction of complex 3D meshes. Our approach is based on a triangle sequence-based mesh representation, exploiting a large Transformer model for mesh triangle addition and deletion. In order to perform edits interactively, we introduce a vertex-aligned speculative prediction strategy on top of our additive mesh generator. This speculator predicts multiple output tokens corresponding to a vertex, thus significantly reducing the computational cost of inference and accelerating the editing process, making it possible to execute each editing step in only a few seconds. Comprehensive experiments demonstrate that MeshPad outperforms state-of-the-art sketch-conditioned mesh generation methods, achieving more than 22% mesh quality improvement in Chamfer distance, and being preferred by 90% of participants in perceptual evaluations.

12.
arXiv (CS.AI) 2026-06-16

Gender Differences in AI Literacy Workshop Outcomes and Deepfake Engagement

arXiv:2606.14718v1 Announce Type: cross Abstract: As Artificial Intelligence (AI) literacy initiatives expand in K-12 settings, understanding how gender shapes student baseline perceptions, tool-use, and responsiveness to interventions is essential for equitable curriculum design. This study examines gender differences in AI literacy, safety awareness, and STEM career aspirations among Australian secondary students (Years 7, 8, and 10; N(pre) = 199, n(post) = 136) from two co-educational government schools who participated in a one-day AI literacy workshop. Using statistical regression methods controlling for year level and school, we found that pre-workshop, male students reported significantly higher STEM career interest across all three domains (AI, computer science, and engineering), while female students were significantly more likely to use AI for schoolwork and to seek advice from AI tools. Gender-differentiated patterns also emerged in deepfake behaviours: males were significantly more likely to have created or shared deepfake content. Both genders improved in AI knowledge post-intervention, yet females showed a richer profile of gains: wider conceptual understanding, greater confidence, and meaningful increases in AI and CS career interest that partially narrowed the gender STEM gap. These findings highlight the need for gender-responsive AI curricula, particularly deepfake safety education for male students, and demonstrate that even single-day workshops can narrow gender gaps in STEM aspirations and AI confidence.

13.
arXiv (CS.LG) 2026-06-15

Adaptive Oscillatory-State Alignment for Time Series Forecasting

arXiv:2606.06010v2 Announce Type: replace Abstract: Long-term time series forecasting benefits from inductive biases that expose recurring temporal structure. Existing periodic forecasting methods typically model recurrence through predefined periods, global spectral components, or fixed learnable templates. However, real-world temporal dynamics are rarely rigidly periodic: around a nominal cycle, oscillatory behavior often exhibits non-rigid periodicity (NRP), where cycle magnitude, cycle alignment, and local cycle duration vary over time. Under these conditions, fixed-template periodic modeling can become fundamentally mismatched to the underlying temporal states. We propose AOSNet, a Hilbert-guided forecasting framework that reformulates periodic forecasting from fixed template matching to adaptive oscillatory-state alignment. AOSNet extracts analytic-signal descriptors from both the observed sequence and a learnable global oscillatory prior, then adaptively aligns local states through a descriptor-conditioned gate that selectively preserves reliable observations while softly correcting mismatched regions. The learned prior serves not as a rigid repeated template but as a flexible oscillatory reference interpreted through local state dynamics. Experiments on eight public benchmarks and two cloud workload traces demonstrate leading or highly competitive accuracy with a compact model size and low inference latency, supporting repeated forecasting settings such as capacity planning and autoscaling. Controlled synthetic studies that isolate cycle-magnitude and cycle-alignment variation and combine them with cycle-duration changes show that the advantage of oscillatory-state alignment increases as NRP intensifies.

14.
arXiv (quant-ph) 2026-06-19

Quantum models with the Yang-Lee phase transition

arXiv:2606.19732v1 Announce Type: cross Abstract: In this article, we present four different $1+1$D quantum models that realize the Yang-Lee (YL) phase transition under a deformation that preserves $PT$ symmetry. These are the antiferromagnetic Ising spin chain in transverse and longitudinal magnetic fields, the massive Schwinger model, the Blume-Capel model, and the three-state quantum clock model. Using the state-operator correspondence, we identify the YL critical point, compute the scaling dimensions of the lowest operators in each model, and find perfect agreement with the exact results for the YL criticality in two dimensions. Using bosonization for the Schwinger model and the Polyakov-Hubbard transformation for the other models, we show that in all of these quantum models the YL critical point is described, as expected, by a massless bosonic field with an $i \phi^3$ interaction. In the quantum clock model, this critical field interacts with a massive bosonic field, and we identify the massless and massive states in the Hamiltonian spectrum. In addition, we numerically compute the two-point function of $\phi$ at the Yang-Lee critical point and show that it grows with distance, in agreement with theoretical expectations.

15.
arXiv (quant-ph) 2026-06-12

Kubo-Martin-Schwinger conditions for non-Hermitian systems

arXiv:2606.13251v1 Announce Type: new Abstract: We investigate the extension of the Kubo–Martin–Schwinger (KMS) thermal equilibrium condition to non-Hermitian Hamiltonians with real spectra and biorthogonal eigensystems, providing a systematic analysis through three complementary routes. Our central result is a thermodynamic characterisation of quasi-Hermiticity: for $H \in M_d(\mathbb{C})$ diagonalisable with real spectrum, the biorthogonal Gibbs functional $\omega_{\rm{bi}}(A) = Z_{\rm{bi}}^{-1} \sum_n e^{-\beta E_n}\langle\phi_n|A|\psi_n\rangle$ satisfies $\omega_{\rm{bi}}(A^\dag A) \geq 0$ for all $A$ if and only if $H$ is quasi-Hermitian. The proof constructs the metric $\eta$ directly from the eigenprojectors of $\omega_{\rm{bi}}$ via the Riesz representation theorem, with no prior choice of $\eta$, providing a metric-free certificate of quasi-Hermiticity outside the Mostafazadeh–Scholtz framework. Under the full quasi-Hermitian hypothesis, we prove that the $\eta$-Gibbs state $\omega_\eta(A) = Z_\eta^{-1}\, \rm{Tr}[\eta e^{-\beta H}A]$ satisfies all three analytic KMS conditions, using the Hadamard three-line theorem and Bari's theorem on Riesz bases. The result is non-trivial: the transported state $\hat\omega(X) = \rm{Tr}[e^{-\beta h}X\eta]/Z_\eta$ differs from the Gibbs state of the isospectral Hermitian partner $h = \eta^{1/2}H\eta^{-1/2}$ whenever $[\eta,h]\neq 0$, so the KMS property cannot be deduced from the Hermitian theory by similarity. The gap between this result and the full Haag–Hugenholtz–Winnink $C^*$-algebraic framework is identified. Failure modes at exceptional points and for complex spectra are analysed, and the relation to the Fagnola–Umanità quantum detailed balance condition for open systems is discussed.

16.
arXiv (CS.AI) 2026-06-18

Short-Term-to-Long-Term Memory Transfer for Knowledge Graphs under Partial Observability

arXiv:2605.22142v2 Announce Type: replace-cross Abstract: Reinforcement learning under partial observability requires deciding what information to retain, yet most memory-based approaches do not explicitly model short-term-to-long-term transfer of symbolic observations. We study this transfer process in a temporal knowledge-graph memory setting and cast it as a neuro-symbolic value-based decision problem: for each observed triple, the agent chooses whether to keep or drop it before long-term insertion. To handle variable-sized short-term buffers, we use a per-item Q-learning design with shared parameters and a practical temporal-difference update over matched items across consecutive steps. On the RoomKG benchmark at long-term memory capacity 128, learned transfer decisions outperform symbolic and neural baselines, including symbolic baselines with temporal annotations and history-based LSTM/Transformer baselines. Across transfer-policy ablations, a lightweight local short-term-only variant performs best, and step-level behavior shows that the policy keeps navigation- and query-relevant facts while discarding lower-value candidate facts, supporting explicit and interpretable memory decisions under memory constraints.

17.
arXiv (CS.AI) 2026-06-15

Beyond LoRA: Is Sparsity-Induced Adaptation Better?

arXiv:2606.13767v1 Announce Type: cross Abstract: Low-rank adaptation (LoRA) and its variants provide a memory- and compute-efficient alternative to full fine-tuning of pre-trained models. However, questions remain about the comparative generalizability of these approaches and how the structural restrictions on low-rank updates preserve effective adaptation performance. We present a historical framing, covering the past (full fine-tuning and original LoRA), the present (different variants of LoRA), and propose simpler, cheaper, parameter-efficient extensions by inducing sparsity within existing LoRA variants: Cheap LoRA (cLA), training a single low-rank factor with the other fixed (deterministically or, in its randomized variant, stochastically), and the chained circulant variant, ${c}^3$LA. We frame cLA as a structured instance of asymmetric LoRA, serving as a controlled column-subspace restriction of full fine-tuning. We derive information-theoretic generalization error bounds for these variants, marking one of the first endeavors in this area. Empirically, we evaluate 11 fine-tuning methods across 10 pre-trained models and 14 datasets, analyzing the fine-tuned models' performance and generalization using tools such as loss landscapes and spectral analysis. Despite the sensitivity of fine-tuned models to the pre-trained model, datasets, and other factors, our study suggests that restricting LoRA-based PEFT methods' adaptation to a sparse, structured column space remains competitive across tasks with their parameter-matched baselines while reducing up to 10% training time and peak GPU memory up to 15%, even with a naïve, non-optimized, sparse implementation. Our theoretical and empirical generalization measures provide a more consistent and principled approach to their cost-effective adaptation than commonly used analytical tools. Overview and code are available at: https://elicaden.github.io/Beyond_LoRA/.

18.
arXiv (CS.AI) 2026-06-12

SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models

arXiv:2602.04208v2 Announce Type: replace-cross Abstract: Vision-Language-Action (VLA) models have emerged as a promising paradigm for general-purpose robotic control, with test-time scaling (TTS) gaining attention to enhance robustness beyond training. However, existing TTS methods for VLAs require additional training, verifiers, and multiple forward passes, making them impractical for deployment. Moreover, they intervene only at action decoding while keeping visual representations fixed-insufficient under perceptual ambiguity, where reconsidering how to perceive is as important as deciding what to do. To address these limitations, we propose SCALE, a simple inference strategy that jointly modulates visual perception and action based on 'self-uncertainty', inspired by uncertainty-driven exploration in Active Inference theory-requiring no additional training, no verifier, and only a single forward pass. SCALE broadens exploration in both perception and action under high uncertainty, while focusing on exploitation when confident-enabling adaptive execution across varying conditions. Experiments on simulated and real-world benchmarks demonstrate that SCALE improves state-of-the-art VLAs and outperforms existing TTS methods while maintaining single-pass efficiency.

19.
arXiv (CS.CV) 2026-06-16

The Importance of Phase in Neural Representations: An Internal Oppenheim-Lim Test of Image Classifiers

Oppenheim and Lim (1981) showed that natural images stay recognizable when reconstructed from their Fourier phase alone, while the magnitude carries little of their identity. We ask whether trained image classifiers reproduce this asymmetry inside their hidden layers, and we test it causally: given two images, we transplant the phase of one onto the magnitude of the other at a chosen layer and record which image the prediction follows. In PRISM2D, GFNet, and ViT-B/16 the prediction follows the phase or sign donor, and deleting all image-specific magnitude barely moves accuracy, so identity rides on phase while image-specific magnitude is largely dispensable to the readout. ResNet-50 at first seems to break the pattern, because transplanting sign after its ReLUs does nothing; a fair intervention before the ReLU reveals a strong latent sign code in the late blocks, and a DC-only control shows the readout consumes a channel-wise spatial average. Controls rule out the trivial case in which magnitude simply stops depending on the image. The architectures therefore share a phase/sign identity code but expose it in different bases, set by rectification and readout geometry, which gives a mechanistic account of the texture–shape gap between CNNs and attention models.

20.
arXiv (CS.LG) 2026-06-15

Classification of Astronomical Spectra Using PCA-Compressed Flux and Inverse-Variance Features

arXiv:2606.13978v1 Announce Type: cross Abstract: This paper evaluates a signal-processing and supervised-learning pipeline for classifying SDSS DR17 astronomical spectra into stars, galaxies, and quasars. Each spectrum is represented by its measured flux and inverse-variance information, combining spectral shape with a wavelength-dependent reliability profile. After resampling onto a common logarithmic wavelength grid, the flux and inverse-variance vectors are standardized and separately compressed using principal component analysis. The resulting components are concatenated and used to train several classifiers. The best performance was obtained with the LightGBM gradient-boosting classifier, reaching $94.6\%$ accuracy and $92.1\%$ balanced accuracy on the test set.

21.
arXiv (CS.LG) 2026-06-15

Federated Learning for Feature Generalization with Convex Constraints

arXiv:2606.14416v1 Announce Type: new Abstract: Federated learning (FL) often struggles with generalization due to heterogeneous client data. Local models are prone to overfitting their local data distributions, and even transferable features can be distorted during aggregation. To address these challenges, we propose FedCONST, an approach that adaptively modulates update magnitudes based on the parameter strength of the global model. This prevents over-emphasizing well-learned parameters while reinforcing underdeveloped ones. Specifically, FedCONST employs linear convex constraints to ensure training stability and preserve locally learned generalization capabilities during aggregation. A Gradient Signal to Noise Ratio (GSNR) analysis further validates the effectiveness of FedCONST in enhancing feature transferability and robustness. As a result, FedCONST effectively aligns local and global objectives, mitigating overfitting and promoting stronger generalization across diverse FL environments, achieving state-of-the-art performance.

22.
arXiv (CS.AI) 2026-06-18

What Must Generalist Agents Remember?

arXiv:2606.18746v1 Announce Type: new Abstract: This paper develops a formal account of what generalist agents must store in memory in order to act near-optimally across multiple environments and goals. It shows that when two domains share an observational bottleneck but require incompatible optimal actions, any uniformly near-optimal policy must induce distinct memory distributions at that bottleneck. The result yields a separation theorem: sufficiently successful agents cannot rely only on current state observations, but must preserve domain-relevant information in memory. The paper further shows that if an agent's memory contains enough information to estimate values for related goals, then that memory can be used to approximately reconstruct the agent's local transition dynamics. Together, these results characterize memory as the substrate that supports domain disambiguation, transition-model reconstruction, and planning for generalist agents.

23.
arXiv (CS.LG) 2026-06-16

Continual Backdoor Training in IoT/CPS

arXiv:2606.14987v1 Announce Type: cross Abstract: Internet of Things (IoT) and Cyber-physical systems (CPS) increasingly rely on continual learning (CL) to adapt to evolving environments, device heterogeneity, and concept drift, thereby improving overall utility. While continual adaptation is essential for long-lived IoT deployments where data patterns evolve, it also introduces new security vulnerabilities. In particular, backdoor attacks can exploit incremental updates, replay buffers, and representation reuse to implant persistent malicious behaviors that remain dormant during normal operation but activate upon specific triggers. In this paper, we present a backdoor attack in continual learning used in IoT/CPS systems. To this end, we formalize an IoT/CPS-specific threat model, analyze why continual learning amplifies backdoor persistence in IoT pipelines, and evaluate our technique under varying conditions. Our analysis highlights critical open challenges in securing lifelong learning in IoT/CPS and industrial IoT (IIoT) environments, as well as the need for heightened security controls.

24.
arXiv (CS.CV) 2026-06-11

Diffusion-based Cumulative Adversarial Purification for Vision Language Models

Vision Language Models (VLMs) have shown remarkable capabilities in multimodal understanding, yet their susceptibility to adversarial perturbations poses a significant threat to their reliability in real-world applications. Despite often being imperceptible to humans, these perturbations can drastically alter model outputs, leading to erroneous interpretations and decisions. This paper introduces DiffCAP, a novel diffusion-based purification strategy that can effectively neutralize adversarial corruptions in VLMs. We theoretically establish a provable recovery region in the forward diffusion process and meanwhile quantify the convergence rate of semantic variation with respect to VLMs. These findings manifest that adversarial effects monotonically fade as diffusion unfolds. Guided by this principle, DiffCAP leverages noise injection with a similarity threshold of VLM embeddings as an adaptive criterion, before reverse diffusion restores a clean and reliable representation for VLM inference. Through extensive experiments across six datasets with three VLMs under varying attack strengths in three task scenarios, we show that DiffCAP outperforms existing defense techniques by a substantial margin. Notably, DiffCAP significantly reduces both hyperparameter tuning complexity and the required diffusion time, thereby accelerating the denoising process. Equipped with theorems and empirical support, DiffCAP provides a robust and practical solution for securely deploying VLMs in adversarial environments. The source code is available at https://github.com/JasonFu1998/DiffCAP.

25.
arXiv (CS.CL) 2026-06-11

On The Effectiveness-Fluency Trade-Off In LLM Conditioning: A Systematic Study

Controlling the output of Large Language Models (LLMs) is a central challenge for their reliable deployment, yet a clear understanding of the involved trade-offs remains elusive. Current approaches to conditioning are often evaluated with a narrow focus on their effectiveness at injecting or removing a target concept, neglecting generation quality. We systematically investigate a range of conditioning methods in both injection and removal scenarios. We find that efficient steering methods frequently achieve conditioning at a steep cost to fluency. Furthermore, we identify a critical yet previously overlooked interaction with the training paradigm: activation steering methods are far less effective on instruction-tuned models than on their base counterparts. Simple prompting and full-fledged supervised fine-tuning, on the other hand, are viable options for concept injection, but are not as good at concept removal. Finally, cheaply computed textual metrics highly correlate to costly LLM-as-judge scores, and provide insights on the behavior of conditioning methods.