Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CL) 2026-06-18

IndicContextEval: A Benchmark for Evaluating Context Utilisation in Audio Large Language Models Across 8 Indic Languages

AudioLLMs enable speech recognition conditioned on textual prompts such as domain descriptions or entity lists. However, it remains unclear whether these models genuinely utilise such context or rely on parametric knowledge learned during pretraining. Existing benchmarks cannot answer this question because they evaluate transcription under fixed prompting conditions and rarely include explicit contextual inputs. We introduce IndicContextEval, a 56-hour multilingual benchmark of natural speech from 555 speakers across 8 Indian languages and 23 professional domains. We design a 7-level prompting framework that progressively introduces contextual signals, including metadata, natural-language descriptions, entity lists in English and native script, and adversarial prompts with incorrect entities. Evaluating five models reveals substantial differences in context utilisation behaviour, highlighting the need for explicit evaluation of contextual grounding in AudioLLMs.

02.
bioRxiv (Bioinfo) 2026-06-20

A network approach to DNA methylation clocks

Biological age predicts health and lifespan better than chronological age, but remains difficult to measure. One leading molecular proxy for biological age is DNA methylation, which underlies age predictors known as "clocks". These clocks use penalized linear regression to predict chronological age from methylation levels using selected cytosine–guanine pairs (CpGs) along DNA. Although they predict chronological age within a few years and track mortality risk, there are several issues. Different clocks share a vanishingly small number of CpG sites, many of which show weak associations with age. Also, the clocks often do not transfer across methylation array platforms. This paper takes a network approach to better understand these issues. By using 12 public datasets from human blood, we build a co-methylation network of the sites that show the strongest age correlation. After pruning weak links, we find that it has a small number of large modules of covarying CpGs surrounded by many small modules and singleton sites. These modules are biologically interpretable, as they are associated with CpG island contexts and enriched for distinct Gene Ontology functions. We also map five established clocks onto this network (Horvath, Hannum, AltumAge, Skin & Blood, and Han) and find that they select some CpGs from the same module. This suggests that they are more similar than they appear. The network structure also suggests new ways to build clocks. A simple clock that retains one CpG per module matches the performance of established clocks. A second one, built from module-level principal components, outperforms all five established clocks in three validation cohorts and is transferable across array platforms (Illumina Infinium Methylation 450K or EPIC arrays). Overall, the network perspective shifts attention from individual CpG sites to modules of covarying sites. This perspective helps explain why DNA methylation clocks perform so well despite their differences and provides a more systematic approach for developing the next generation of aging biomarkers.

03.
arXiv (CS.CV) 2026-06-18

Rethinking Air-Ground Collaboration: A Progressive Cross-Task Benchmark and Socialized Learning Framework

Air-ground collaborative perception is crucial for robust visual understanding in real-world dynamic environments. However, existing studies typically formulate collaboration as single-task cross-view fusion, overlooking the functional dependencies among localization, target association, and fine-grained parsing. In addition, the heterogeneous nature of aerial and ground views introduces substantial geometric, scale, and occlusion discrepancies, making uniform feature sharing vulnerable to negative transfer. To tackle these issues, we model air-ground perception as a progressive cross-task collaboration task and construct the Air-Ground Progressive Collaboration (AGPC) benchmark, a spatio-temporally aligned benchmark comprising more than 745K raw video frames. Built upon this benchmark, we propose Socialized Co-Perception (SCP), a coarse-to-fine framework that organizes collaboration progressively from aerial global localization to ground target association and identity-aware parsing. Its core module, the Dual-Layer Router (DLR), decouples input-side multi-scale expert selection from output-side task-conditioned modulation, enabling selective cross-view and cross-task interaction while suppressing harmful interference. Extensive experiments demonstrate the effectiveness of SCP. It achieves a 3.73\% coevolutionary gain and a 7.86\% improvement in average downstream performance. These results show that task-conditioned collaboration is more effective than uniform fusion for heterogeneous air-ground perception. The code is available at https://github.com/g1136639260-spec/AGSCP.

04.
arXiv (CS.CV) 2026-06-17

Neural Tree Reconstruction for the Open Forest Observatory

The Open Forest Observatory (OFO) is a collaboration across universities and other partners to make low-cost forest mapping accessible to ecologists, land managers, and the general public. The OFO is building both a database of geospatial forest data as well as open-source methods and tools for forest mapping by uncrewed aerial vehicle. Such data are useful for a variety of climate applications including prioritizing reforestation efforts, informing wildfire hazard reduction, and monitoring carbon sequestration. In the current iteration of the OFO's forest map database, 3D tree maps are created using classical structure-from-motion techniques. This approach is prone to artifacts, lacks detail, and has particular difficulty on the forest floor where the input data (overhead imagery) has limited visibility. These reconstruction errors can potentially propagate to the downstream scientific tasks (e.g. a wildfire simulation.) Advances in 3D reconstruction, including methods like Neural Radiance Fields (NeRF), produce higher quality results that are more robust to sparse views and support data-driven priors. We explore ways to incorporate NeRFs into the OFO dataset, outline future work to support even more state-of-the-art 3D vision models, and describe the importance of high-quality 3D reconstructions for forestry applications.

05.
arXiv (CS.AI) 2026-06-17

Skill-Constrained Model Predictive Control for Resilient Manufacturing Supply Chains

arXiv:2606.17269v1 Announce Type: new Abstract: In skill-constrained production-inventory systems, the qualified human capacity available tomorrow depends on training decisions made today: production requires certified workers, certifications decay unless maintained, and training consumes the same scarce worker hours that production needs now. We study a closed-loop skill-constrained model predictive controller that, at every shift, solves a finite-horizon mixed-integer program over production, inventory, backlog, and training, with binary predicted certification, hard production eligibility, and an interpretable terminal value that prices certified-capacity gaps at the horizon boundary; only the first-period action is applied before replanning. On synthetic, seed-controlled SkillChain-Gym scenarios - announced and surprise new-skill shocks, demand shocks, absenteeism, forecast- and availability-quality modes, capacity-boundary and training-rate sweeps, and negative controls - we evaluate the controller against production-only and maintenance-only ablations, static cross-training insurance plans, and a strong reactive heuristic, under an ex-ante locked configuration and paired statistics. The result is regime dependence, not superiority: no policy class dominates. Predictive control helps when skill or labor bottlenecks are forecastable early enough for training to complete; lean static insurance remains hard to beat under surprise shocks, near the demand-capacity boundary, and wherever pre-shock slack makes insurance cheap. Attribution ablations separate certification maintenance, re-acquisition of lapsed certifications, and greenfield skill acquisition. Forecastability, not adaptivity per se, decides when predictive control pays.

06.
arXiv (CS.CV) 2026-06-16

FrameOracle: Learning What to See and How Much to See in Videos

Vision-language models (VLMs) advance video understanding but operate under tight computational budgets, making performance dependent on selecting a small, high-quality subset of frames. Existing frame sampling strategies, such as uniform or fixed-budget selection, fail to adapt to variations in content density or task complexity. To address this, we present FrameOracle, a lightweight, plug-and-play module that predicts both (1) which frames are most relevant to a given query and (2) how many frames are needed. FrameOracle is trained via a curriculum that progresses from weak proxy signals, such as cross-modal similarity, to stronger supervision with FrameOracle-41K, the first large-scale VideoQA dataset with validated keyframe annotations specifying minimal sufficient frames per question. Extensive experiments across five VLMs and six benchmarks show that FrameOracle reduces 16-frame inputs to an average of 10.4 frames without accuracy loss. When starting from 64-frame candidates, it reduces inputs to 13.9 frames on average while improving accuracy by 1.5%, achieving state-of-the-art efficiency-accuracy trade-offs for scalable video understanding.

07.
arXiv (CS.AI) 2026-06-18

From Memorization to Parameter Interference: How Overtraining Experts Harms Model Merging

arXiv:2506.14126v2 Announce Type: replace-cross Abstract: Modern deep learning is increasingly characterized by the use of open-weight foundation models that can be fine-tuned on specialized datasets. This has led to a proliferation of expert models and adapters, often shared via platforms like HuggingFace and AdapterHub. Model merging has recently emerged as an effective way to leverage these existing resources, enabling the composition of capabilities from different model checkpoints. A natural pipeline has thus formed to harness the benefits of transfer learning and amortize sunk training costs: models are pre-trained on general data, fine-tuned on specific tasks, and then multiple checkpoints are merged to obtain a more capable model. A prevailing assumption is that improvements at one stage of this pipeline propagate downstream, leading to gains at subsequent steps. In this work, we challenge that assumption by examining how expert fine-tuning affects model merging. We show that long fine-tuning of experts that optimizes for their individual performance leads to degraded merging performance across vision and language modalities, multiple model scales, and both fully fine-tuned and LoRA-adapted models. We trace this degradation to the memorization of a small set of difficult examples that dominate late fine-tuning steps. This causes negative parameter interference and encodes knowledge that is forgotten during merging. Finally, we demonstrate that task-dependent aggressive early stopping strategies can significantly improve model merging performance.

08.
arXiv (CS.LG) 2026-06-15

Machine Learning for Biomedical Raman Spectroscopy: From Spectral Acquisition to Clinical Translation

arXiv:2606.14169v1 Announce Type: new Abstract: Raman spectroscopy provides label-free, chemically specific characterization of biological systems and has become an important tool for cancer diagnosis, molecular subtyping, microbiological identification, and intraoperative decision support. Biomedical Raman spectra are, however, high-dimensional, noisy, and affected by fluorescence background, acquisition variability, and biological heterogeneity, making robust computational analysis essential. This review examines the role of machine learning across the biomedical Raman spectroscopy pipeline, from preprocessing and signal correction to unsupervised structure discovery, supervised diagnosis and molecular stratification, representation and transfer learning, explainability, biomarker discovery, and multimodal integration with imaging, pathology, and molecular profiling. Emphasis is placed on the use of machine learning not only for diagnostic classification, but also for biologically interpretable and clinically actionable analysis. We also discuss the main barriers to clinical translation, including limited dataset sizes, inter-instrument variability, inconsistent preprocessing, insufficient external validation, reproducibility concerns, and limited sharing of software, data, and metadata. We argue that progress will require methodological advances together with standardization, robust validation, explainability, and deployment-ready analytical frameworks. By integrating methodological, biomedical, and translational perspectives, this review outlines key directions for developing reliable and clinically deployable Raman-AI systems.

09.
arXiv (quant-ph) 2026-06-19

Quantum-Accelerated Self-Consistent Field: A Hybrid Algorithm

arXiv:2606.20176v1 Announce Type: new Abstract: We present the Grover adaptive search self-consistent field (GAS-SCF) algorithm. GAS-SCF leverages quantum arithmetic to construct an efficient oracle that marks target states (Fock states) which improve upon some initial classical energy estimate. Amplitude amplification then increases the probability of measuring these states. This approach offers a theoretical quadratic speed-up for the optimization problem encountered in SCF quantum chemistry and establishes a baseline against which structured optimization algorithms, such as QAOA and DQI may be compared. In this work, we classically simulate three examples as proofs of concept of the algorithm, the largest consisting of 26 qubits. We then extend our analysis to two larger systems, with O3 representing the largest case at 330 qubits. These examples are chosen to probe classically challenging SCF regimes. Achieving chemically relevant applications of GAS-SCF will require large-scale, fault-tolerant quantum hardware.

10.
arXiv (CS.LG) 2026-06-16

How Controlling the Variance can Improve Training Stability of Sparsely Activated DNNs and CNNs

arXiv:2602.05779v2 Announce Type: replace Abstract: The Edge-of-Chaos (EoC) theory developed for the random initialization of deep networks allows more efficient training by both preserving information in the initial outputs of the network and minimising exploding or vanishing gradients through characterisation of the intermediate layers as Gaussian processes. This EoC theory provides formulae for the choice of the initialisation distribution variances of the weights and biases. For activations which are approximately linear around the origin, the EoC theory typically encourages the Gaussian process variance to converge towards zero with increasing depth. Here we consider the less studied setting of highly sparsity inducing activations where a large region of values near the origin are set to zero. In this setting we prove a new phenomenon whereby initialisations leading to larger fixed Gaussian processes are beneficial to training stability. This theory informs a new, yet simple, initialisation strategy that allows training DNNs and CNNs with as large as 90\% sparsity in the hidden layers.

11.
arXiv (CS.LG) 2026-06-18

Neural Network Implementation of the Renormalization Group for Fault Diagnosis with Class Imbalance

arXiv:2606.18326v1 Announce Type: new Abstract: The application of machine learning models in practical tasks faces challenges such as class imbalance and multidimensional noise. This paper proposes RGNet, a neural network architecture based on the concept of the renormalization group (RG), for hierarchical coarse-graining of the feature space. The model sequentially compresses the input dimensionality and concatenates all scales before classification, allowing it to capture both local details and global patterns. The notion of RG-flows is introduced - interpretable low-dimensional representations whose visualization via t-SNE reveals a discrete curvilinear structure confirming the effectiveness of coarse-graining. Experimental results are presented on the imbalanced AI4I dataset. The obtained results demonstrate that RGNet is a universal, interpretable, and competitive solution for fault prediction in applications with imbalanced classes.

12.
Nature (Science) 2026-06-11

Daily briefing: Deep-sea whale graveyard is a treasure trove of fossils

作者:

Researchers have uncovered more than 400 fossilized whale bones in an ocean-floor chasm. Plus, the working lives of scientists, in pictures, and how AI could slow the pace of research publication for the better. Researchers have uncovered more than 400 fossilized whale bones in an ocean-floor chasm. Plus, the working lives of scientists, in pictures, and how AI could slow the pace of research publication for the better.

13.
arXiv (math.PR) 2026-06-17

Large deviation principle for friendship-biases in Galton–Watson trees

arXiv:2606.17381v1 Announce Type: new Abstract: In this paper we consider the friendship-bias of the vertices in an infinite rooted Galton–Watson tree. The friendship-bias of a vertex is the difference between the average degree of the neighbours of the vertex and the degree of the vertex itself. A vertex is said to be of type $\chi \in S$, with $S = \{-,0,+\}$, when its friendship-bias is, respectively, strictly negative, zero or strictly positive. We consider the fractions $f_l^\chi$ of vertices of type $\chi \in S$ along a random downward path up to branching depth $l \in \mathbb{N}$ and derive a large deviation principle (LDP) for the triple $(f_l^\chi)_{\chi \in S}$ as $l\to\infty$. The branching depth of a vertex counts the number of branchings that occur along the path that connects the vertex to the root of the tree. The rate in the LDP is $l$, while the rate function in the LDP is identified in terms of a variational formula minimising a relative entropy under a linear constraint. We focus on the case of binary branching, for which the rate function is already quite involved. We identify the qualitative properties of the rate function and show how it can be computed numerically. We briefly indicate how to proceed for more general branching and for vertex types along a tree consisting of a finite number of random downward paths. Our paper is the first to consider large deviations of vertex types.

14.
arXiv (quant-ph) 2026-06-24

Enhanced Tantalum Superconducting Resonator Performance via All-Surface Organic Monolayer Passivation

arXiv:2604.22112v2 Announce Type: replace-cross Abstract: Tantalum is a promising platform for superconducting quantum circuits, yet coherence times remain limited by dielectric losses from interfacial two-level systems (TLS), exacerbated by native oxide regrowth. Here, we implement molecular surface passivation using self-assembled organic monolayers on freshly etched tantalum and silicon in coplanar waveguide resonators. Surface characterization by contact angle, XPS, FTIR and TEM confirm the formation of ordered, nanometer-thick films that suppress oxide formation. Microwave measurements in the ~5-9 GHz range reveal internal quality factors up to 1.8x10^6 in the single-photon regime at 100 mK, representing a ~140% improvement over untreated devices with native oxide. Power and temperature dependent measurements attribute this enhancement to reduced TLS-induced losses. These results demonstrate that molecular passivation effectively engineers low-loss interfaces and provides a scalable route toward high-coherence superconducting quantum devices.

15.
arXiv (CS.CL) 2026-06-18

The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs

When an AI assistant remembers that Sarah is a single mother working two jobs, does it interpret her stress differently than if she were a wealthy executive? As personalized AI systems increasingly incorporate long-term user memory, understanding how this memory shapes emotional reasoning is critical. We investigate how user memory affects emotional intelligence in large language models (LLMs) by evaluating 15 models on human-validated emotional intelligence tests. We find that identical scenarios paired with different user profiles produce systematically divergent emotional interpretations. Across validated user-independent emotional scenarios and diverse user profiles, systematic biases emerged in several high-performing LLMs where advantaged profiles received more accurate emotional interpretations. Moreover, LLMs demonstrate significant disparities across demographic factors in emotion reasoning and supportive recommendations tasks, indicating that personalization mechanisms can embed social hierarchies into models' emotional reasoning. These results highlight a key challenge for memory-enhanced AI: systems designed for personalization may reinforce social inequalities. To mitigate these disparities, we curate a general-purpose preference dataset designed to reduce demographic profiles' influence on emotional understanding.

16.
arXiv (CS.CV) 2026-06-16

HanDyVQA: A Video QA Benchmark for Fine-Grained Hand-Object Interaction Dynamics

Hand-object interaction (HOI) inherently involves dynamics where human manipulations produce distinct spatio-temporal effects on objects. However, existing semantic HOI benchmarks focused either on manipulation or on the resulting effects at a coarse level, lacking fine-grained spatio-temporal reasoning to capture the underlying dynamics in HOI. We introduce HanDyVQA, a fine-grained video question-answering benchmark that comprehensively covers both the manipulation and effect aspects of HOI. HanDyVQA comprises six complementary question types (Action, Process, Objects, Location, State Change, and Object Parts), totalling 11.1K multiple-choice QA pairs. Collected QA pairs recognizing manipulation styles, hand/object motions, and part-level state changes. HanDyVQA also includes 10.3K segmentation masks for Objects and Object Parts questions, enabling the evaluation of object/part-level reasoning in video object segmentation. We evaluated recent video foundation models on our benchmark and found that even the best-performing model, Gemini-2.5-Pro, reached only 73% average accuracy, which is far from human performance (97%). Further analysis shows the remaining challenges in spatial relationship, motion, and part-level geometric understanding. We also found that integrating explicit HOI-related cues into visual features improves performance, offering insights for developing future models with a deeper understanding of HOI dynamics.

17.
arXiv (CS.AI) 2026-06-11

Agentic Software: How AI Agents Are Restructuring the Software Paradigm

作者:

arXiv:2606.05608v2 Announce Type: replace-cross Abstract: For over half a century, software engineering has operated on a foundational premise: human engineers decompose problems, encode decision logic into static code, and manually adapt that code as requirements evolve. This paper argues that the emergence of AI agents – systems where large language models serve as the primary reasoning engine, dynamically generating and discarding code as an instrumental resource – constitutes a fundamental restructuring of what software is, not an incremental tool improvement. We formalize the distinction between traditional deterministic software and agentic software: in the former, code is the carrier of pre-written decision logic; in the latter, the agent itself is the software, and its decision logic is generated at runtime. We trace the historical arc from licensed software to SaaS to Agent-as-a-Service (AaaS), showing that each shift transferred additional complexity away from end-users – with the agentic shift transferring not just operational complexity but decision-making complexity itself. We introduce Agentic Engineering as an expansion of the software engineering discipline into a new paradigm, distinct in its core object of study (agent systems rather than static source code), its control model (LLM-driven rather than human-predefined), and its human role (intent architect rather than code author). Through analysis of recent benchmark evidence including SWE-bench Verified, EvoClaw, and LangChain's multi-agent coordination studies, we demonstrate both the transformative potential of the agentic paradigm and its current limitations. We conclude with a four-stage roadmap toward self-evolving agent ecosystems and concrete recommendations for practitioners navigating this transition.

18.
arXiv (CS.CL) 2026-06-15

Rethinking the Trust Region in LLM Reinforcement Learning

Reinforcement learning (RL) has become a cornerstone for fine-tuning Large Language Models (LLMs), with Proximal Policy Optimization (PPO) serving as the de facto standard algorithm. Despite its ubiquity, we argue that the core ratio clipping mechanism in PPO is structurally ill-suited for the large vocabularies inherent to LLMs. PPO constrains policy updates based on the probability ratio of sampled tokens, which serves as a noisy single-sample Monte Carlo estimate of the true policy divergence. This creates a sub-optimal learning dynamic: updates to low-probability tokens are aggressively over-penalized, while potentially catastrophic shifts in high-probability tokens are under-constrained, leading to training inefficiency and instability. To address this, we propose Divergence Proximal Policy Optimization (DPPO), which substitutes heuristic clipping with a more principled constraint based on a direct estimate of policy divergence (e.g., Total Variation or KL). To avoid huge memory footprint, we introduce the efficient Binary and Top-K approximations to capture the essential divergence with negligible overhead. Extensive empirical evaluations demonstrate that DPPO achieves superior training stability and efficiency compared to existing methods, offering a more robust foundation for RL-based LLM fine-tuning. Our code is available at https://github.com/sail-sg/Stable-RL.

19.
arXiv (CS.LG) 2026-06-16

Exact Federated Continual Unlearning for Ridge Heads on Frozen Foundation Models

arXiv:2603.12977v3 Announce Type: replace Abstract: Foundation models are commonly deployed as frozen feature extractors with a small trainable head to adapt to private, user-generated data in federated settings. The ``right to be forgotten'' requires removing the influence of specific samples or users from the trained model on demand. Existing federated unlearning methods target general deep models and rely on approximate reconstruction or selective retraining, making exactness costly or elusive. We study this problem in a practically relevant but under-explored regime: a frozen foundation model with a ridge-regression head. The exact optimum depends on the data only through two additive sufficient statistics, which we turn into a communication protocol supporting an arbitrary stream of add and delete requests via fixed-size messages. The server maintains a head that is, in exact arithmetic, pointwise identical to centralized retraining after every request. We provide deterministic retrain-equivalence guarantees, order and partition invariance, two server-side variants, and a Bayesian certificate of zero KL divergence. Experiments on four benchmarks confirm the guarantees: both variants match centralized ridge retraining to within $10^{-9}$ relative Frobenius error and complete each request at orders-of-magnitude lower cost than federated retraining baselines.

20.
arXiv (CS.CL) 2026-06-15

Small LLMs: Pruning vs. Training from Scratch

Pruning promises a shortcut to strong small language models. In this work, we examine this promise by pruning Llama-3.1-8B at pruning ratios of 0.5–0.8 with six methods spanning depth, width, and sparse granularities, under two controlled token-matched settings. (1) With the same training token budget, pruned initialization consistently outperforms random initialization. This shows that the parent model provides a strong starting point, although the advantage narrows as the training token budget grows and as the pruning ratio rises, nearly vanishing at the highest pruning ratio we study. (2) When training from scratch is instead given the full token budget consumed by the whole pipeline, pruning at finer granularities still retains an advantage, while coarser structured pruning can be matched or surpassed. This suggests that the parent model transfers knowledge that additional training tokens alone cannot fully recover, but only at fine granularity. Taken together, our results yield a clear recommendation: with a large pretrained model in hand and a limited training token budget, pruning is better than training from scratch; when the training budget is not limited, training from scratch can be competitive for coarser pruning, so a large pretrained parent is not always necessary.

21.
arXiv (quant-ph) 2026-06-16

Benchmarking Quantum Computers via Protocols, Comparing IBM's Heron vs IBM's Eagle

arXiv:2603.04377v3 Announce Type: replace Abstract: As quantum computing hardware rapidly advances, objectively evaluating the capabilities and error rates of new processors remains a critical challenge for the field. A clear and realistic understanding of current quantum performance is essential for guiding research priorities and driving meaningful progress. In this work, we apply and extend a protocol-based benchmarking methodology (Meirom, Mor, Weinstein Arxiv 2505.12441) that utilizes well-defined \underline{quantumness} thresholds. By evaluating performance at protocol level rather than the gate level, this approach provides a transparent and intuitive assessment of whether specific quantum processors, or isolated sub-chips within them, can demonstrate a practical quantum advantage. To illustrate the utility of this method, we compare two generations of IBM quantum computers: the older Eagle architecture and the newer Heron architecture. Our findings reveal the genuine operational strengths and limitations of these devices, demonstrating substantial performance improvements in the newer Heron generation. This work was made possible by IBM Quantum policies that enable independent and objective assessment of its quantum computers and sub-chips. We strongly encourage other companies to emulate the independent qubit availability and the fair pricing that allow researchers to perform such assessments.

22.
arXiv (CS.LG) 2026-06-15

A Low-Rank Subspace Analysis of LLM Interventions

arXiv:2606.14388v1 Announce Type: new Abstract: Interventions designed to modify a particular behavior in LLMs, such as refusal or sycophancy, often produce unintended changes in other behaviors. This lack of targeted control makes it difficult to design and implement reliable safety controls. To understand these side-effects, we introduce a diagnostic framework for analyzing interacting behaviors in LLMs. We model behaviors as low-rank subspaces in activation space, and study how interventions influence across behaviors. Across multiple instruction-tuned models (7B-70B) and across refusal, jailbreak, and sycophancy settings, we find that different behaviors share internal representations, and intervening on one behavior alters others in asymmetric ways. Some behaviors act as upstream control points whose interventions propagate broadly across other behaviors, while others remain more isolated. We relate these effects to two geometric quantities: (i) the overlap between behavior subspaces, measured as the average squared cosine of principal angles, and (ii) the angle between each behavior subspace and the decision subspace (capturing the model's final decision e.g., refuse vs. comply). Empirically, intervention effects on other behaviors tend to be larger for behavior pairs with higher subspace overlap, and for source behaviors whose subspaces lie closer (smaller angle) to the decision subspace. These findings highlight a challenge for targeted behavior control: behaviors are difficult to modify independently, as interventions can propagate through shared representations and asymmetric interactions.

23.
arXiv (CS.AI) 2026-06-11

Toward Trustworthy AI: Multi-Target Adversarial Attacks and Robust Defenses for Continuous Data Summarization

arXiv:2606.11804v1 Announce Type: new Abstract: Trustworthy AI requires reliable data-processing pipelines, not only robust downstream predictive models. As an upstream component, data summarization determines which information is retained and passed to subsequent learning or decision modules. Therefore, adversarial perturbations to the summarization process can compromise trustworthy AI in an upstream manner: they may alter the selected summary, reduce its representativeness, and further degrade the utility of subsequent learning tasks. In this paper, we study adversarial attacks on continuous data summarization under similarity-level perturbations through DR-submodular optimization. We show that a class of multi-resolution image summarization objectives can be formulated as multilinear extensions of non-negative submodular set functions and satisfy DR-submodularity with $m$-weak monotonicity. We then formulate multi-target attack generation as a min-max problem, where one admissible perturbation of the similarity structure is optimized to degrade multiple target summarization models. To mitigate such perturbations, we formulate robust defense against mixed attack types as a regularized max-min problem. For both problems, we develop approximation algorithms with theoretical guarantees. Experiments on real-data and controlled clustered benchmarks show that the proposed attack is effective in representative low-to-moderate budget regimes and can induce downstream task-performance loss. The proposed defense improves the robustness–mitigation trade-off in structured settings, while also revealing the parameter sensitivity of robust protection on real data.

24.
arXiv (CS.LG) 2026-06-15

CANN-EUCLID: unsupervised constitutive artificial neural network model discovery from full-field data

arXiv:2606.14565v1 Announce Type: cross Abstract: Constitutive artificial neural networks (CANNs) provide interpretable material model discovery, but have so far been used in stress-supervised settings based on apparent stress-strain data from homogeneous tests. Because each test samples only a narrow loading path and provides homogenized rather than local stress information, robust discovery typically requires multiple loading modes to constrain the multidimensional response. This is challenging for soft biological tissues, where repeated testing, damage, and sample variability limit reliable information from a single specimen. Here, we combine CANNs with the stress-unsupervised full-field discovery framework EUCLID to identify sparse hyperelastic laws directly from displacement fields and reaction forces in one heterogeneity-inducing loading case. CANN-EUCLID minimizes equilibrium imbalance with sparsity-promoting regularization selecting compact active terms, without local stress measurements or a prescribed law. We evaluate the approach on isotropic and anisotropic benchmarks with prescribed ground-truth laws. When the ground truth is representable by the chosen CANN basis, our method recovers the correct terms with near-exact accuracy, including exponential terms with embedded parameters. When it is not contained in the basis, the method retains shared terms and approximates missing contributions using available basis functions. Generalization depends strongly on sampled deformation states: exponential strain-stiffening terms can be recovered accurately when sufficiently probed, but can produce large extrapolation errors when the stiffening regime lies outside the sampled domain. Forward FE validation simulations show that the discovered behavior accurately replicates the ground truth. These results establish stress-unsupervised CANN discovery as a promising framework for interpretable full-field constitutive model identification.

25.
arXiv (CS.AI) 2026-06-16

Cognitive Debt: AI as Intellectual Leverage and the Dynamics of Systemic Fragility

作者:

arXiv:2606.15078v1 Announce Type: new Abstract: We develop a formal theory of cognitive debt: the stock of unverified reasoning obligations that accumulates when individuals use AI as a substitute rather than a complement for first-principles cognition. The model features two state variables per agent, cognitive capital and cognitive debt, and a multiplicative production technology in which cognitive capital functions as collateral that determines the return to AI adoption. We establish six propositions. Rational agents incur positive cognitive debt because the costs are deferred, partially external, and masked by short-run productivity gains. Tranquil periods lower subjective risk assessments, raise AI substitution intensity, and compound leverage, generating a cognitive Minsky moment in which subjective risk falls while true systemic fragility rises. Expected crisis losses are convex in aggregate leverage. Post-crisis, output-target pressure can produce a false-correction loop in which agents patch AI failures with more AI. The decentralised equilibrium over-adopts substitutive AI relative to the social optimum because of systemic risk, cognitive public goods, and arms-race externalities. In a two-type heterogeneous-agent economy, high-cognitive-capital agents adopt AI more intensively and may eventually erode their unaided cognitive capital below that of initially lower-skilled agents.