Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-12

Adjusted Cup-Product Neural Layer

arXiv:2606.13568v1 Announce Type: new Abstract: Many important observables in physics and geometry are cup products of cochains. The adjusted cup product neural layer has been introduced in this paper. It is a neural primitive that hard wires the cup product with an adjustment term from higher gauge theory. This creates a readout that is gauge invariant by design. Their main theoretical result shows that on a closed cycle the output relies entirely on the adjustment coefficient. Setting this coefficient to zero removes the output completely regardless of other parameters. Thus the adjustment is the only source of gauge invariant signal. They prove this observable is a nonzero quadratic form and is exactly invariant under one and two gauge transformations.

02.
arXiv (CS.LG) 2026-06-18

Fair Online Resource Allocation

arXiv:2606.18679v1 Announce Type: cross Abstract: We study the problem of fair online resource allocation, motivated by applications such as refugee resettlement and airline scheduling, where agents arrive sequentially and must be assigned to facilities with limited capacities. We introduce a model that maximizes the overall welfare subject to resource constraints and a Lipschitz fairness requirement, which ensures that similar agents arriving in the same batch receive similar expected outcomes. We first analyze the offline problem, proving that the value of the optimal fair allocation is at least an $\Omega(1/\gamma)$ fraction of the optimal unfair allocation, where $\gamma$ is the fairness coefficient, thereby bounding the price of fairness. For the online setting, we propose an algorithm based on dual mirror descent that enforces fairness constraints within batches while estimating optimal dual variables. We prove that this algorithm achieves sublinear regret relative to the optimal offline fluid benchmark. Finally, we validate our theoretical results using real-world data from the Refugee Economies Programme, demonstrating the algorithm's performance and examining the trade-offs between welfare maximization and fairness enforcement.

03.
arXiv (math.PR) 2026-06-16

Balanced affine Motzkin paths: Pearson geometry and global endpoint asymptotics

arXiv:2601.17634v2 Announce Type: replace Abstract: We study endpoint distributions of balanced affine weighted Motzkin paths. In the balanced case, the generating-function equation has Pearson-type characteristic geometry. We show that this geometry controls the terminal-height law globally: the characteristic escape time determines the limiting cumulant generating function, the large-deviation rate function, and the ray-scale asymptotics. Thus the usual Gaussian window is only the local quadratic approximation to a global Pearson-driven profile. For finite sizes, we prove a uniform Daniels saddlepoint approximation in the one-dominant-singularity regimes and identify the exceptional antipodal case requiring a lattice/interference correction.

04.
arXiv (CS.CL) 2026-06-19

What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis

Latent Chain-of-Thought (CoT) internalizes reasoning within continuous hidden states, offering a promising alternative to verbose discrete reasoning traces. However, robust latent reasoning remains difficult because outcome supervision provides weak learning signals and leaves latent trajectories prone to semantic drift. In this work, we analyze Latent CoT from an information-theoretic perspective and identify this failure as a dual collapse: gradient attenuation along the optimization path and representational drift in the latent space. We further decompose process supervision into two complementary dimensions: Trajectory Supervision, which injects dense stepwise reasoning signals, and Space Supervision, which preserves the semantic structure of the latent manifold. Our analysis shows that rigid geometric compression can collapse the reasoning space, whereas generative reconstruction provides a more flexible semantic anchor that better preserves information capacity. To measure these effects, we introduce the Unified Latent Probe (ULP), which quantifies the mutual information between latent trajectories and explicit reasoning steps. Experiments reveal a clear Information-Performance Binding: reasoning accuracy depends on the information fidelity preserved in the latent chain. These findings provide a principled framework for latent reasoning supervision and suggest shifting from geometric imitation toward mutual information maximization. Our code is available at \href{https://github.com/EIT-NLP/Supervision-in-Latent-CoT}{this repository}.

05.
arXiv (math.PR) 2026-06-15

Hierarchical symmetry selects log-Poisson cascades: classification, uniqueness, and stability

arXiv:2604.01632v2 Announce Type: replace Abstract: Within i.i.d. multiplicative cascades, a single axiom – the hierarchical symmetry, a linear contraction on incremental scaling exponents – is shown to be necessary and sufficient for the cascade multiplier to be log-Poisson. We prove: (1) a characterization theorem determining the log-Poisson law with explicit parameters, within the class of all multipliers with finite lattice moments; (2) a classification theorem locating the log-Poisson class inside the log-infinitely-divisible family and identifying the mechanism by which every rival sub-family fails the symmetry; (3) a stability theorem with sharp constants – $(1+\beta)^{1/2}$ when the limiting increment is known, $\sqrt{2}$ when it is fitted – and (4) an unconditional propagation theorem transferring the bound to the multiplier distribution at the sharp rate $\Theta(\sqrt{\varepsilon})$, with a matching lower bound. Beyond independence, the classification extends exactly at the level of asymptotic statistics (limiting cumulant generating function, large deviations, multifractal spectrum) and provably not at the level of laws: an explicit stationary ergodic Markov multiplier satisfies the symmetry exactly with a non-log-Poisson marginal, while exchangeable multipliers collapse to the i.i.d. log-Poisson cascade and finite-state Markov multipliers cannot satisfy the symmetry at all. In the continuous category of exactly scale-invariant log-infinitely-divisible multifractal random measures, no finite moment window of structure-function exponents identifies the cascade class, whereas at the level of the scale-invariance generator the symmetry selects exactly the Barral-Mandelbrot compound Poisson cascade, with scale-ratio-free stability constants. The proofs reduce to second-moment identities on [0,1] via the change of variables $u = e^{kx}$, boundedness of the multiplier, and multiplicative couplings.

06.
medRxiv (Medicine) 2026-06-22

Genetic modifiers of psychiatric, motor, and cognitive symptoms in Huntington's disease

The Enroll HD natural history platform provides rich longitudinal phenotypes enabling genome wide analyses across diverse clinical domains. Psychiatric symptoms are a major source of morbidity in Huntington's disease (HD), yet the genetic architecture underlying their onset is poorly understood. We analyzed ~18,000 people with HD (PwHD) to define genetic determinants of ages at psychiatric, motor, and cognitive symptom onset, and HD diagnosis. GWAS meta analysis recapitulated 11 established modifiers of motor onset and identified a novel locus spanning RAB3B/ZFYVE9 associated with age at violent/aggressive behavior onset. Exome wide analyses in Enroll HD participants implicated rare variants in FAN1, PMS1, POLD1, and HTT. Several HD modifiers of motor and cognitive symptom onset (MSH3, FAN1, HTT) also influenced psychiatric symptom onset, whereas PMS1 and POLD1 showed significant association with motor symptom onset. Psychiatric polygenic scores predicted psychiatric symptom onset, revealing a hybrid architecture combining psychiatric liability in general population with HD- or repeat expansion disease (RED) specific pathways.

07.
arXiv (CS.LG) 2026-06-12

Geometry of Lightning Self-Attention: Identifiability and Dimension

arXiv:2408.17221v3 Announce Type: replace Abstract: We consider function spaces defined by self-attention networks without normalization, and theoretically analyze their geometry. Since these networks are polynomial, we rely on tools from algebraic geometry. In particular, we study the identifiability of deep attention by providing a description of the generic fibers of the parametrization for an arbitrary number of layers and, as a consequence, compute the dimension of the function space. Additionally, for a single-layer model, we characterize the singular and boundary points. Finally, we formulate a conjectural extension of our results to normalized self-attention networks, prove it for a single layer, and numerically verify it in the deep case.

08.
arXiv (CS.CV) 2026-06-19

Contour-Constrained Deformable Registration with Parameter Characterization for Head and Neck Surgical Guidance

With 890,000 annual new cases globally, head and neck squamous cell carcinoma has one of the highest recurrence rates among solid malignancies. Although frozen section analysis is the standard of care for intraoperative margin assessment, accurately relocating detected positive margins on the resection bed remains challenging due to imprecise alignment between resected specimens and their resection bed, compounded by post-resection mucosal tissue shrinkage. We present a biomechanics-driven deformable registration framework that corrects post-resection tissue deformation to provide intraoperative guidance. Our approach registers 3D specimen meshes to intraoperative resection bed point clouds using a deformable registration approach based on regularized Kelvinlet basis functions. The registration matches surface point clouds, fiducial landmarks, and boundary contour constraints that directly penalize perpendicular distance-to-agreement between specimen and resection bed boundaries. Across nine specimens from skin, buccal mucosa, and tongue sites, the overall mean target registration error was $11.11 \pm 4.07$ mm using rigid registration, which decreased to $8.20 \pm 2.68$ mm (26.19\% reduction) using deformable registration without contour constraint. The proposed contour-constrained deformable registration further reduced the error to $5.62 \pm 2.28$ mm, a 49.41\% reduction relative to rigid registration. We observed the largest reduction in the most clinically challenging tongue specimens. We also performed a systematic two-stage parameter search to characterize the relative importance of surface alignment, fiducial correspondences, contour constraint, and strain energy regularization. This search revealed that contour weighting dominates registration accuracy for tissue types with large lateral deformation, while the algorithm operates over a broad range of parameter combinations.

09.
arXiv (CS.CL) 2026-06-11

LLMpedia: A Transparent Framework to Materialize an LLM's Encyclopedic Knowledge at Scale

Benchmarks like MMLU suggest flagship language models approach factuality saturation above 90\%. LLMpedia shows this picture is incomplete. We materialize ${\sim}$1.3M encyclopedia articles entirely from parametric memory across three model families, then audit every claim against Wikipedia and curated web evidence. For \texttt{gpt-5-mini}, the verifiable true rate is 68.4\% on Wikipedia-covered subjects - more than 21\,pp below MMLU - and the gap is driven by unverifiability (30.5\%), not refutation (1.2\%). Beyond Wikipedia, frontier articles audited against curated web evidence reach 57.6\%; Wikipedia covers only 56.7\% of model-surfaced subjects, and three model families overlap in just 7.3\% of subject choices. In a retrieval-trap benchmark inspired by prior analysis of Grokipedia, LLMpedia is more factual at roughly half the textual similarity to Wikipedia. Every prompt, article, and verdict is released. Data, code, interface: https://llmpedia.net.

10.
arXiv (CS.AI) 2026-06-12

Competition and Diversity in Generative AI

arXiv:2412.08610v3 Announce Type: replace-cross Abstract: Recent evidence, both in the lab and in the wild, suggests that the use of generative artificial intelligence reduces the diversity of content produced. The use of the same or similar AI models appears to lead to more homogeneous behavior. Our work begins with the observation that there is a force pushing in the opposite direction: competition. When producers compete with one another (e.g., for customers or attention), they are incentivized to create novel or unique content. We explore the impact competition has on both content diversity and overall social welfare. Through a formal game-theoretic model, we show that competitive markets select for diverse AI models, mitigating monoculture. We further show that a generative AI model that performs well in isolation (i.e., according to a benchmark) may fail to provide value in a competitive market. Our results highlight the importance of evaluating generative AI models across the breadth of their output distributions, particularly when they will be deployed in competitive environments. We validate our results empirically by using language models to play Scattergories, a word game in which players are rewarded for answers that are both correct and unique. Overall, our results suggest that homogenization due to generative AI is unlikely to persist in competitive markets, and instead, competition in downstream markets may drive diversification in AI model development.

11.
arXiv (CS.CL) 2026-06-17

Evidence of Layered Positional and Directional Constraints in the Voynich Manuscript: Implications for Cipher-Like Structure

The Voynich Manuscript (VMS) exhibits a script of uncertain origin whose grapheme sequences have resisted linguistic analysis. We present a systematic analysis of its grapheme sequences, revealing two complementary structural layers: a character-level right-to-left optimization in word-internal sequences and a left-to-right dependency at word boundaries, a directional dissociation not observed in any of our four comparison languages (English, French, Hebrew, Arabic). We further evaluate two classes of structured generator against a four-signature joint criterion: a parametric slot-based generator and a Cardan grille implementing Rugg's (2004) gibberish hypothesis. Across their full tested parameter spaces, neither class reproduces all four signatures simultaneously. While these results do not rule out generator classes we have not tested, they provide the first quantitative benchmarks against which any future generative or cryptanalytic model of the VMS can be evaluated, and they suggest that the VMS exhibits cipher-like structural constraints that are difficult to reproduce from simple positional or frequency-based mechanisms alone.

12.
arXiv (CS.AI) 2026-06-16

Consensus-based Agentic Large Language Model Framework for Harmonized Tariff Schedule Code Classification

arXiv:2606.16987v1 Announce Type: new Abstract: Accurate Harmonized Tariff Schedule (HTS) code classification is essential for customs clearance, duty assessment, trade statistics, and regulatory compliance in maritime logistics. However, exact HTS classification remains challenging because product descriptions are often short, incomplete, or ambiguous, while correct classification depends on hierarchical tariff structures, legal notes, and jurisdiction-specific rules. This paper proposes an agentic large language model (LLM) framework for Canadian 10-digit HTS code classification in smart-port and maritime logistics environments. The framework integrates multi-agent information retrieval, semantic retrieval over official tariff documents, evidence-grounded reasoning, consensus-based validation, element-wise voting across hierarchical code components, confidence estimation, and human-in-the-loop escalation. We evaluate the framework on a private dataset of 3,300 domain-expert-labeled product records collected from logistics and delivery contexts. Experimental results show that exact 10-digit classification remains difficult even for advanced LLMs, with performance decreasing from coarse chapter-level prediction to fine-grained tariff and statistical suffix assignment. These findings demonstrate the need for evidence-grounded, uncertainty-aware, and human-centered classification workflows rather than fully autonomous single-step prediction. The proposed framework supports more interpretable, accountable, and compliance-oriented HTS classification for maritime logistics and smart-port operations. Our code is available at https://github.com/Analytics-Everywhere-Lab/hts.

13.
arXiv (CS.CL) 2026-06-19

NEST: Narrative Event Structures in Time for Long Video Understanding

Recent progress in vision-language models has enabled the processing of increasingly long video sequences, but the ability to handle extended token streams does not translate to understanding of narrative structure in long videos. Existing long video benchmarks focus on needle-in-a-haystack retrieval rather than evaluating how low-level actions form events, how events interact across time, and how narratives progress, for example, whether a model can connect an early setback, such as a job loss to a later relationship breakup, despite long gaps, intervening scenes, or flashbacks that reframe what occurred. We introduce NEST (Narrative Event Structures in Time for Long Video Understanding), a dataset of 1005 full-length movies (avg. 98 minutes), each annotated with 102 multimodal narrative events grounded in visual content, dialogue, and audio. NEST captures multimodal narrative events with structured annotations grounded in visual content, dialogue, and audio, and links them through relations that reflect narrative structure, including temporal ordering, hierarchical composition, and long-range dependencies. We introduce baselines for event trigger detection (ETD), event localization (EL), event argument extraction (EAE), and event relation extraction (ERE). The benchmark is highly challenging for grounded event discovery, with ETD below 8%, EL under 6%, and EAE below 11%. In contrast, ERE is more tractable once events are given, reaching 35.45% F1 zero-shot and 44.42% F1 after fine-tuning.

14.
arXiv (CS.LG) 2026-06-16

IBAD: Interpretable Behavioral Anomaly Detection on Human Mobility Data

arXiv:2606.16023v1 Announce Type: new Abstract: Human mobility appears highly diverse, yet much of a person's daily mobility can be explained by a small set of recurring behavioral templates, such as commuting, school-centered activities, caregiving, nightlife, or errand patterns. We present \texttt{IBAD} (\underline{I}nterpretable \underline{B}ehavioral \underline{A}nomaly \underline{D}etection), a framework that learns interpretable daily mobility templates and represents each individual as a distribution over mixtures of these templates. Rather than focusing on specific locations, IBAD characterizes activities that individuals perform across locations. This approach first discovers global behavioral templates using Latent Dirichlet Allocation (LDA), then employs a hierarchical self-supervised model to learn normal behavior of individuals from their soft behavioral templates. We also introduce a splicing benchmark that creates controlled behavioral mismatches between an individual's historical profile and injected mobility patterns. Experiments on real-world and synthetic datasets show that daily behavior can be effectively decomposed into a small number of interpretable templates. Crucially, we show that the learned behavioral archetypes transfer across distinct geographic and demographic contexts. Furthermore, IBAD maintains a robust competitive performance across all settings. For reproducibility purposes, the code is accessible at ~\href{https://github.com/USC-InfoLab/IBAD}{https://github.com/USC-InfoLab/IBAD}.

15.
arXiv (CS.AI) 2026-06-16

Haiku to Opus in Just 10 bits: LLMs Unlock Large Compression Gains

arXiv:2604.02343v2 Announce Type: replace-cross Abstract: We study the compression of LLM-generated text across lossless and lossy regimes, characterizing a compression-compute frontier where more compression is possible at the cost of more compute. For lossless compression, domain-adapted LoRA adapters can improve LLM-based arithmetic coding by 2x over compression with the base LLM alone. For lossy compression, prompting a model for a succinct rewrite then applying arithmetic coding can achieve compression ratios of approximately 0.03, a 2x improvement over compressing the original response. We further introduce Question-Asking compression (QA), an interactive lossy protocol inspired by the game 'Twenty Questions'. A small model iteratively refines its response by asking yes/no questions to a stronger model, transferring exactly one bit per answer. On 8 benchmarks spanning math, science, and code, 10 binary questions recover 23% to 72% of the capability gap between a small and large model on standard benchmarks and 7% to 38% on harder benchmarks, achieving compression ratios of 0.0006 to 0.004. This is over 100x smaller than prior LLM-based compression (Deletang et al., 2024), suggesting that interactive protocols can transfer knowledge far more efficiently than transmitting full responses.

16.
arXiv (quant-ph) 2026-06-17

Cumulant expansion approach to the decay dynamics of interacting Mössbauer nuclei after strong impulsive excitation

arXiv:2510.00970v2 Announce Type: replace Abstract: Recent progress in accelerator-based x-ray sources brings higher excitation of ensembles of Mössbauer nuclei closer to experimental feasibility. Yet, a theoretical modeling of the decay dynamics of the interacting nuclear ensemble after the impulsive excitation is still an open challenge. Here, we derive a set of nonlinear equations which is capable of efficiently modeling large nuclear ensembles for arbitrary degrees of excitation. As key signature for higher excitation, we identify a non-linear time-evolution of the nuclear dipole phase, which can be tuned via the scattering geometry, and interferometrically be measured. Furthermore, we identify interesting finite-size effects in the nuclear dynamics of small ensembles. Our results provide important guidance for future experiments aiming at the non-linear excitation of nuclei. We further envision the exploration of finite size-effects in Mössbauer spectroscopy with highest spatial resolution, i.e., small sample volumes.

17.
arXiv (CS.LG) 2026-06-12

DiffCoord: Differentiable Coordination for Distributed Multi-Agent Trajectory Optimization

arXiv:2509.01630v3 Announce Type: replace Abstract: Integrating the Alternating Direction Method of Multipliers (ADMM) with Differential Dynamic Programming (DDP) provides a scalable framework for distributed multi-agent trajectory optimization. In practice, ADMM is typically truncated for computational efficiency, tightly coupling parameters that would otherwise separately govern coordination quality and task performance. In this paper, we propose Differentiable Coordination (DiffCoord), a unified framework that jointly meta-learns these coupled parameters for the truncated ADMM-DDP pipeline. These parameters are generated by agent-wise neural networks for task adaptation, and the same networks are shared among isomorphic agents to enable scalability to varying agent counts. We achieve efficient meta-learning by differentiating the ADMM-DDP pipeline end-to-end. Notably, this yields an auxiliary ADMM-LQR distributed gradient solver that computes and coordinates meta-gradients with respect to these parameters. This solver inherits the computational structure of the pipeline, enabling reuse of key computation results and efficient parallelization over agents and along trajectory horizons. We validate DiffCoord through numerical and physical experiments on a cooperative aerial transport system, where it reconfigures quadrotor formations for safe 6-DoF load manipulation in tight spaces. It adapts robustly to varying team sizes and load dynamics, while reducing per-agent gradient computation time by up to 70% compared with state-of-the-art trajectory-gradient methods.

18.
arXiv (CS.LG) 2026-06-15

Approximating Whittle-Matern Fields over Discretized Manifolds

arXiv:2606.13827v1 Announce Type: cross Abstract: Markovian Whittle-Matérn fields have been convergently approximated by discrete Gauss Markov Random Fields (GMRFs) with sparse precision matrices using a Finite Element approximation of the two-parameter family, \[ (\kappa^2 - \Delta)^{\alpha/2} u = \mathcal{W}, \;\; \kappa \in \mathbb{R}, \; \alpha \in \mathbb{N}. \] of SPDEs. Using recent developements in the analysis of Discrete Exterior Calculus (DEC), we present a different, yet closely related, convergent GMRF approximation to these Matérn fields over complete, boundaryless Riemannian manifolds discretized as well-centered simplicial complexes. This convergent method (i) is agnostic to $\alpha, \kappa$ and thus allows a universal approximation scheme for the precision and covariance matrices of the entire $(\alpha, \kappa)$-family of GMRFs, so they may be inferred rather than guessed. (ii) inherently models pointwise and piecewise-smoothed measurements of a random field and approximates both equally well (iii) is computationally independent of the interpolants used - it suffers no overhead if one convergent interpolant were replaced with another suitable interpolant over the same mesh. Furthermore, we show that, on discretizations that are well-connected in a precise sense, and volume-concentrated, the precision matrices are spectral functions of a graph-laplacian. We provide a low rank approximator to the family of such Matérn GMRFs and mention a use case: reducing the number of measurements needed to model the GMRF by compressed-sensing.

19.
arXiv (CS.AI) 2026-06-12

Bag of Dims: Training-Free Mechanistic Interpretability via Dimension-Level Sign Patterns

arXiv:2606.12629v1 Announce Type: cross Abstract: We show that the standard basis of transformer hidden states already provides a training-free, architecture-general feature basis. Individual dimensions encode semantic content via their signs and confidence via their magnitudes, functioning as independent binary registers. We validate this Bag of Dims framework across three model families (Qwen 3.5-4B, Gemma 3-4B, Mistral 7B) through four progressive experiments. Sign patterns alone carry predictive content: replacing all magnitudes with unity achieves 72-93% top-5 next-token accuracy through the LM head, and pure Hamming scoring without any decoder reaches 80-90% top-4096. These sign patterns organize into semantic features: using a single-token type cache (one forward pass per vocabulary token, no context), we discover 175 categories via per-dimension sign consistency (mean AUC 0.80) from 50 anchors with zero training. A trained probe adds only +0.018 AUC and converges to axis-aligned weights, confirming negligible cross-dimension structure. This structure extends to attention: all 175 categories remain discoverable in K and V projections. On the write side, static FFN weight inspection links 20% of features to individual writer neurons (>0.70 agreement; random controls: 0%), with top-200 neuron coalitions achieving >0.70 agreement on 99.9% of prototypes via majority vote. Fully unsupervised discovery (random seeds, no labels) scales to 1500 features at 100% yield and 99% sparsity across all three models, with pairwise MI of 0.0014 bits confirming low inter-dimension coupling. These results establish that the standard basis already suffices for feature reading throughout the transformer compute pathway, requiring no training, no optimization, and no GPU-days beyond a single forward pass per vocabulary token.

20.
arXiv (CS.LG) 2026-06-15

High-Frequency Pricing at Scale for E-Commerce

arXiv:2606.13741v1 Announce Type: new Abstract: This paper presents the design, development, and implementation of a specialized forecast-then-optimize algorithmic pricing tool for sales campaigns in fashion e-commerce. Sales events present unique challenges for pricing including volatile demand patterns, rapid pricing decisions, and the need to balance short-term revenue with long-term profitability. We describe our approach combining daily-resolution demand forecasting using gradient-boosted trees with a multi-objective optimization framework that maximizes both long-term profit and net merchandise value for more than 5 million articles. Our solution addresses key limitations of existing weekly-granularity systems by implementing a forecast-then-optimize architecture that reduces pricing decision time from hours to minutes. We validate our approach through 23 A/B tests across 12 markets during 2023-2024 sales campaigns at Zalando, one of Europe's leading online fashion retailers. Experimental results demonstrate that the new pricing system achieves approximately 6% higher profit while maintaining equivalent performance on sales and revenue compared to the previous manual-algorithmic hybrid approach. Based on these results, the algorithm was successfully deployed to production and now handles the majority of algorithmic pricing decisions for sales campaigns at the company.

21.
arXiv (CS.CL) 2026-06-17

Rift: A Conflict Signature for Deception in Language Models

作者:

A model that lies while knowing the truth is the central case ELK cannot handle with behavioral evaluation alone. We ask whether such deception leaves an internal signature distinguishing it from honest error. Our key move is a control for wrongness: we contrast a sleeper agent (knows the truth, lies on trigger) against a naive liar (fine-tuned to emit the same wrong answers with no honest training). Both produce identical wrong outputs; any difference is about knowledge conflict, not incorrectness. We find deceptive forward passes carry a conflict signature - 2.1-2.3x higher residual rank than naive-liar passes on the same wrong answer - strong enough to identify which of two responses is the lie with 100% accuracy and no labels, across GPT-2 small/medium (three seeds) and three instruct models. Across Qwen2.5-1.5B/7B and Phi-3-mini, instructed deception raises residual rank on every tested fact (18/18, 40/40, 34/34); on Phi-3, lies separate perfectly from both honest answers and hallucinations (AUC 1.0, Wilcoxon p~6e-11). The signature survives strategic self-constructed deception (model invents its own lie, AUC 1.0), active concealment attempts (AUC 1.0), and length-controlled replication (20/20, AUC 1.0, p~1e-6). Using basis-free relative representations, a probe trained on one model family detects deception in two other families zero-shot (mean AUC 0.933), surviving simultaneous architecture and format change (AUC 0.821), and transfers across five languages (AUC 1.000, length-controlled). The signature is read-only: detectable but not injectable (0/8 both directions). Honest limitations and six negative experiments are documented in full.

22.
arXiv (CS.AI) 2026-06-16

Task-guided cross-subject latent alignment: a multi-encoder-decoder VAE

arXiv:2606.15989v1 Announce Type: cross Abstract: Aligning neural activity across subjects offers the promise of discovering shared computational principles and generalizable decoders. However, traditional alignment methods require shared stimuli across subjects, a constraint that limits applicability to naturalistic paradigms with limited or non-overlapping data. We introduce a Multi-Encoder-Decoder Variational Autoencoder (MED-VAE) that achieves cross-subject alignment without shared stimuli by anchoring representations to a common scaffold provided by a pretrained ANN. Using the Natural Scenes Dataset, we show that MED-VAE creates common latent spaces with superior semantic organisation, achieving higher cross-subject alignment than common methods while maintaining robust generalisation to held-out stimuli where traditional methods degrade. Reconstructing from these common spaces back to each subject's original neural space, MED-VAE preserves equal stimulus-driven signal in its cross-subject latent space. Finally, we show that this superior alignment directly enables cross-subject neural prediction, as demonstrated via cross-subject image decoding. In summary, we introduce a framework to identify generalisable common subspaces for cross-subject predictions and downstream tasks, demonstrated here for visual cortex responses to static images.

23.
arXiv (CS.AI) 2026-06-17

The Discrete-Log Clock: How a Transformer Learns Modular Multiplication

arXiv:2606.17399v1 Announce Type: cross Abstract: When small transformers grok modular multiplication, prior work reports that the learned embedding has a "dense" Fourier spectrum requiring all frequencies. This contrasts with modular addition, where only a sparse set of key frequencies suffices. We show this density is an artifact of analyzing in the wrong basis. The natural Fourier transform for multiplication is not the standard additive DFT but the multiplicative character transform, which decomposes functions on the multiplicative group $(\mathbb{Z}/p\mathbb{Z})^*$ into its irreducible representations. Applying this transform to a grokked transformer trained on $a \cdot b \bmod 113$, we find the embedding spectrum becomes highly sparse (Gini coefficient 0.58 vs. 0.07 in the additive basis) with only 4 key frequencies carrying significant energy. Furthermore, 96.9% of MLP neurons are cleanly tuned to a single multiplicative frequency, and neuron activation heatmaps reveal 2D-periodic structure when reordered by the discrete logarithm. These results demonstrate the transformer reduces multiplication to addition in discrete-log space, implementing a "Discrete-Log Clock" algorithm analogous to Nanda et al.'s Clock algorithm for addition. The methodology generalizes: matching the analysis basis to the algebraic structure of the task reveals interpretable structure where standard tools see noise.

24.
arXiv (CS.CL) 2026-06-12

The Illusion of Multi-Agent Advantage

Prevailing wisdom posits that Multi-Agent Systems (MAS) are superior to Single-Agent Systems (SAS), citing advantages like context protection, parallel processing and distributed decision-making. However, empirical support for this claim relies primarily on comparisons with SAS baselines using benchmarks that prioritize isolated reasoning tasks, which do not adequately assess these advantages. Focusing on automatically generated MAS that are designed for enhanced generalizability over manually-designed counterparts, we perform a rigorous, systematic evaluation against SAS, specifically Chain-of-Thought with Self-Consistency (CoT-SC). Across traditional reasoning datasets and tasks with interactive multi-step workflows (e.g., BrowseComp-Plus), we demonstrate that automatic MAS consistently underperform CoT-SC despite being up to 10x more expensive. To isolate these failures from limitations inherent to task structure, we introduce a diagnostic synthetic dataset tailored for MAS featuring explicit task decomposition, context separation and parallelization potential. We show that expert-architected MAS consistently outperforms automatically generated architectures in both raw performance and cost-efficiency on this dataset, demonstrating that existing evaluation frameworks mask critical architectural gaps and inefficiencies of complex MAS by failing to account for the marginal utility of increased computational cost. Critically, systematic deconstruction of the generated MAS architectures reveals that current automated design paradigms produce architectural bloat that prioritizes superficial complexity which does not translate into functional utility, exposing a fundamental misalignment with multi-agent principles.

25.
arXiv (CS.AI) 2026-06-16

Rescaling Confidence: What Scale Design Reveals About LLM Metacognition

arXiv:2603.09309v2 Announce Type: replace Abstract: Verbalized confidence, in which LLMs report a numerical certainty score, is widely used to estimate uncertainty in black-box settings, yet the confidence scale itself (typically 0–100) is rarely examined. We show that this design choice is not neutral. Across six LLMs and three datasets, verbalized confidence is heavily discretized, with more than 78\% of responses concentrating on just three round-number values. To investigate this phenomenon, we systematically manipulate confidence scales along three dimensions: granularity, boundary placement, and range regularity, and evaluate metacognitive sensitivity using $meta-d'$. We find that a 0–20 scale consistently improves metacognitive efficiency over the standard 0–100 format, while boundary compression degrades performance and round-number preferences persist even under irregular ranges. These results demonstrate that confidence scale design directly affects the quality of verbalized uncertainty and should be treated as a first-class experimental variable in LLM evaluation.