Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CL) 2026-06-12

PolyAlign: Conditional Human-Distribution Alignment

Post-training methods such as supervised fine-tuning (SFT) and preference optimization typically align language models toward a single global assistant behavior. While effective for improving average helpfulness, this can suppress the natural variation of human responses across languages, tasks, and dialogue settings. We study this problem as conditional human-distribution alignment: models should match the human response distribution appropriate to the current interaction context, rather than a universal response style. We introduce PolyAlign, a distribution-aware alignment framework that organizes bilingual interaction data into bucket-specific human reference distributions defined by language, interaction track, response family, and length. PolyAlign combines Bucket-Aware SFT, which balances optimization across heterogeneous buckets, with Human-Distribution Preference Optimization (HDPO), which regularizes preference learning using critic-estimated distance to bucket-specific human support. Across a bilingual evaluation suite covering English and Chinese single- and multi-turn settings, PolyAlign improves conditional naturalness and distributional faithfulness while preserving competitive task utility. The results suggest that post-training should move beyond global alignment objectives toward interaction-aware alignment with human response distributions.

02.
arXiv (CS.AI) 2026-06-18

Towards an Agent-First Web: Redesigning the Web for AI Agents

arXiv:2606.19116v1 Announce Type: new Abstract: The World Wide Web was built on an assumption held for three decades: the primary consumer of web content is a human being. This permeates every layer; its access model presumes human visitors, its economics rest on human attention, and its content targets human perception. The rapid emergence of AI agents as intermediaries between humans and web content invalidates this assumption. Yet the web resists agents through blanket blocking, CAPTCHA-based exclusion, and economic models that treat agent access as extraction rather than legitimate interaction. This paper proposes a principled redesign across three layers. At the access layer, agents acting for humans should inherit equivalent access rights, governed by rate limiting and agent identification metadata in HTTP requests, analogous to browser headers, alongside a dual-layer architecture serving human-readable and agent-optimized content from the same domain. At the economic layer, we propose an intent-based tier framework grounded in the agent-as-human-proxy principle: an agent's economic obligation mirrors that of the human it represents. A token-based subscription model meters content in tokens rather than pageviews, alongside a commissioned content economy anchoring AI content production in human intentionality. At the content layer, we identify epistemic recursion, the self-referential loop in which AI-generated content is consumed by agents to produce further content, progressively detaching web knowledge from human ground truth. We propose the Agent Text Markup Language (ATML), a four-level human supervision tier model, and a cryptographic provenance chain to counter this threat. Together these constitute ten design principles for an agent-first internet, one in which agents are first-class citizens whose integration requires renegotiating the web's foundational social contract across access, economics, and content.

03.
arXiv (CS.CV) 2026-06-18

DINO-Med3D: Bridging Dimension and Domain Gaps in Volumetric Segmentation via Progressive Adaptation

Although DINOv3 has demonstrated remarkable semantic discrimination in natural imagery, its direct application to volumetric medical segmentation is hindered by inherent dimension and domain disparities. To resolve these issues, we propose DINO-Med3D, a two-stage progressive framework that repurpose the pre-trained DINOv3 encoder for 3D medical tasks. In the first stage, we mitigate the dimension gap by introducing a multi-slice embedding module that incorporates pseudo-3D context, while simultaneously employing a segmentation proxy task to adapt representations learned from natural scenes to the medical domain. Subsequently, we further enhance volumetric understanding by adding lightweight 3D adapters into the frozen backbone to enforce global inter-slice continuity. Finally, to compensate for the spatial information loss inherent in the embedding process, we design a parallel detail recovery stream to explicitly preserve high-frequency boundary cues. Extensive experiments on five public datasets demonstrate that our approach successfully adapts DINOv3 to the medical domain and significantly outperforms state-of-the-art baselines.

04.
arXiv (CS.CL) 2026-06-16

Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution

Co-evolutionary self-play, where one language model generates problems and another solves them, promises autonomous curriculum learning without human supervision. In practice, the proposer quickly converges to a narrow distribution of problems that satisfy the reward function. This diversity collapse renders the curriculum uninformative for the solver, stalling the co-evolutionary loop. We introduce vocabulary dropout, a random mask applied to the proposer's output logits during both policy training and curriculum generation, as a lightweight mechanism to sustain diversity. The mask is hard and non-stationary, preventing the proposer from locking into fixed token sequences. Training Qwen3-4B and Qwen3-8B on mathematical reasoning via R-Zero, we find that vocabulary dropout sustains proposer diversity across lexical, semantic, and functional metrics throughout training. It also yields solver improvements averaging +4.4 points at 8B, with the largest gains on competition-level benchmarks. Our findings suggest that explicit action-space constraints, analogous to the structural role that game rules play in classical self-play, can help sustain productive co-evolution in language. Vocabulary dropout is one simple instantiation of this principle.

05.
arXiv (CS.CV) 2026-06-16

HAFMat: Hybrid Priors Guided Adaptive Fusion for Single-Image Human Material Estimation

Physically based rendering (PBR) material estimation is a fundamental appearance decomposition task with broad applications in virtual content creation, relighting, and digital human rendering. However, estimating PBR materials from a single human image remains highly ill-posed, since illumination, geometry, and reflectance are heavily entangled in the observed appearance. To mitigate this ambiguity, we propose HAFMat, a hybrid-prior-guided framework for single-image human material estimation. Our method introduces guidance maps that encode complementary cues, including appearance, body geometry, structure, and prior material predictions from pre-trained models. A key observation is that these guidance cues are heterogeneous: some cues mainly provide texture-level constraints, while others convey higher-level semantic information. To exploit this property, we design a Multi-layer Adaptive Feature Fusion Mechanism, which adaptively fuses guidance features with decoder features at different stages. This design enables texture-dominant and semantic-dominant cues to guide material decoding at appropriate levels, leading to more accurate and physically plausible material estimation. Extensive experiments on both synthetic and real data demonstrate that our method achieves state-of-the-art performance in material estimation and downstream relighting.

06.
arXiv (CS.CV) 2026-06-18

Hallucination Detection and Correction in Medical VLMs via Counter-Evidence Verification

Vision-Language models (VLMs) reliability in medical diagnosis is challenged by trust-undermining hallucinations. Existing hallucination detection approaches mainly focus on identifying factual inconsistencies between generated text and reference data. While some studies analyze where models attend in images, they seldom verify whether such attention truly reflects the visual evidence supporting the generated text. To address this gap, we propose Co}unter-Evidence Verification (CoEV), a training-free plug-and-play framework that detects and corrects hallucinations through evidence-based factual consistency verification. CoEV performs bidirectional verification between textual assertions and visual evidence, testing whether each statement is supported by its corresponding evidence region, and assigns each statement into a four-quadrant diagnostic map capturing combinations of text factuality and visual grounding. CoEV detects hallucinated content and serves as a post hoc refinement tool, correcting hallucinations without retraining. Extensive experiments on four medical datasets show that CoEV combats hallucinations in VLMs.For hallucination detection, CoEV consistently outperforms existing methods, improving average PR-AUC and ROC-AUC by 3.0% and 3.9% absolute points respectively, with notable gains of up to 18.5% in specific VQA scenarios. For hallucination correction, it improves Micro-F1 by up to 12.5%, reduces hallucination rates by over 11.9% on medical report generation, and also boosts medical VQA accuracy. These results show that CoEV enables reliable detection and correction of hallucinations, providing clinicians with dependable, evidence-based cues for diagnosis. Code will be released upon acceptance.

07.
arXiv (math.PR) 2026-06-17

Limit theorems for descents and inversions of shelf-shuffles

arXiv:2510.00343v2 Announce Type: replace Abstract: We prove central limit theorems for the number of descents and inversions of permutations produced by shelf-shuffles. These are a model for casino card shuffling machines. We show the asymptotic normality of the number of descents in two limiting regimes depending on the ratio of cards to shelves. On the other hand, we study the inversions by employing a modification of the techniques from Islak's analysis of the statistics of riffle shuffles. In particular, we obtain a bound for the rate of convergence for inversions that is independent of the number of shelves.

08.
arXiv (CS.CL) 2026-06-17

LegalHalluLens: Typed Hallucination Auditing and Calibrated Multi-Agent Debate for Trustworthy Legal AI

AI systems deployed in legal workflows hallucinate at rates that aggregate metrics report at ~52%, but this average conceals where errors concentrate and in which direction they run, leaving compliance officers without an actionable signal for trustworthy deployment. We present LegalHalluLens, an auditing framework with three components: typed hallucination profiles across four legally-motivated claim categories (numeric, temporal, obligation/entitlement, factual) over CUAD (Hendrycks et al., 2021); a Risk Direction Index (RDI) that reduces omission-versus-invention bias to a single deployment-comparable scalar; and a typed debate pipeline calibrated to both magnitudes and directions. Across 510 contracts and 249,252 clause-level instances we measure a within-model gap of approximately 38-40 pp between obligation/numeric and temporal claims that aggregate reporting hides, and show that two systems with matched 52% rates can carry opposite RDIs. The debate pipeline reduces fabricated detections by 45% with per-category gains tracking the diagnosis, matching commercial APIs with a substantially smaller backbone (4B active parameters). Typed profiles and RDI surface failure modes that aggregate metrics hide; we further show these diagnostics serve as calibration inputs for multi-agent debate pipelines, where Skeptic challenges and asymmetric gates targeted at measured failure modes outperform generically-tuned debate. The framework supports direction-aware procurement, accountability, and agent design for legal AI deployed in the wild.

09.
arXiv (CS.LG) 2026-06-19

Bioacoustic Geolocation: Species Sounds as Geographic Signals

arXiv:2505.18726v3 Announce Type: replace-cross Abstract: Can we determine someone's geographic location solely from the sounds they hear? Are acoustic signals enough to localize within a country, state, or even city? In this work, we tackle the challenge of global-scale audio geolocation, with a particular focus on wildlife and natural sounds. We posit that bioacoustic signals contain informative geolocation cues because of well-defined geographic ranges of species. To test this hypothesis, we benchmark image geolocation and soundscape mapping methods, design oracles and species-centric baselines, and propose a hybrid approach that combines species range prediction with retrieval-based geolocation. We further ask whether geolocation improves with species-diverse recordings and spatiotemporal aggregation across neighboring samples. Finally, we extend our study to multimodal geolocation with case studies from movies that combine both audio and visual content. Our results highlight the potential of incorporating bioacoustic signals into geospatial tasks, motivating future work on species recognition and audio geolocation.

10.
arXiv (CS.AI) 2026-06-19

Confidence-Aware Automated Assessment of Student-Drawn Scientific Models

arXiv:2606.20264v1 Announce Type: new Abstract: Student-generated drawings are widely used in science education to assess learners' conceptual understanding in modeling-based tasks aligned with the Next Generation Science Standards (NGSS). However, scoring such drawings requires expert human judgment to interpret complex visual representations, making large-scale assessment costly to implement and sustain in classroom settings. In this work, we study automated scoring of student-generated scientific drawings using a vision-based model. We evaluate a Vision Transformer (ViT) with parameter-efficient adaptation and propose a confidence-aware scoring framework that derives response-level confidence from test-time predictive distributions. This confidence signal enables selective automation by scoring high-confidence responses automatically while deferring uncertain cases for human review. Experiments on six NGSS-aligned middle school assessment items show that the proposed approach improves scoring reliability while supporting a practical trade-off between automated coverage and scoring risk, highlighting the value of confidence-aware methods for trustworthy educational assessment.

11.
arXiv (CS.CV) 2026-06-15

Trimodal Glioma Representation Alignment via Volumetric Contrastive Learning

Glioma grading and survival prediction require the integration of heterogeneous information collected at different spatial and biological scales. Histopathology describes tissue morphology, mRNA expression captures molecular activity, and magnetic resonance imaging provides a non-invasive view of tumor extent and radiological heterogeneity. Existing glioma prognosis models often combine only two of these sources, while their alignment objectives remain mostly pairwise. This paper introduces GLORIA, a novel trimodal framework for GLioma Omics - Radiology - hIstopathology Alignment. GLORIA processes whole-slide image regions, gene-expression profiles, and 3D MRI volumes through modality-specific encoders, projects them into a shared latent space, and aligns them with a Gramian contrastive loss that measures the volume spanned by the three modality embeddings. The aligned representations are fused through a cross-modal gating module and optimized jointly for three-class glioma grading and overall survival prediction. We evaluate GLORIA on a matched TCGA-GBM/LGG and BraTS21 cohort, comprising 132 patients with all three modalities. On the shared trimodal test set, GLORIA improves over the bimodal WSI-mRNA baseline in all the metrics considered.

12.
arXiv (CS.LG) 2026-06-16

If These Walls Could Talk: Critical Play with Large Language Models in Museums

arXiv:2606.15565v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly being used in museums to as role playing chatbots which let visitors talk to simulated versions of people and artefacts from the past. While such installations can be playful and engaging, they are also problematic because LLMs cannot be trusted to speak truthfully. I identify a fundamental dilemma for the use of LLMs in museum chatbots: LLMs cannot be trusted to tell the truth, and efforts to make them more reliable may ruin that which is attractive about the bots in the first place - their ability to engage in life-like conversation. In response, I propose designing for critical play with LLM-based bots: Designing for playful interactions with bots that are unreliable but still able to represent the past in an adequate and engaging manner - as fictional characters representing historical narratives, styles of discourse, diverse perspectives, humor and satire.

13.
arXiv (CS.AI) 2026-06-16

TNODEV: Toolbox for Neural ODE Verification

arXiv:2606.16567v1 Announce Type: new Abstract: Neural ordinary differential equations (neural ODE) have started to appear in safety critical settings such as continuous-time controllers for cyber-physical systems and classifiers integrated into automated decision pipelines, raising the question of whether their behavior can be formally verified. Existing tools dedicated to neural ODE provide only a single reachability call without iterative input set refinement, limiting the precision of their verdicts to whatever one reachability call can deliver. We present TNODEV, the first sound formal verifier for neural ODE that integrates a falsification checker, a fast interval-based reachability backend based on continuous-time mixed monotonicity, a verification and refinement loop with three input-set splitting heuristics, and a parallel scheduler in a single end-to-end pipeline. TNODEV supports safe-set inclusion verification on pure neural ODE, neural ODE in closed loop with a neural network controller and general neural ODE (GNODE), with the safe set specified either as an interval or as the half-space intersection induced by a target classification label. We evaluate TNODEV on a range of benchmarks across safe-set inclusion and classification-robustness properties, including a direct reachability comparison against NNV~2.0 and CORA and a verification comparison against NNV2.0 on MNIST general neural ODE classifiers.

14.
arXiv (CS.CL) 2026-06-12

Detecting Functional Memorization in Code Language Models

Large language models (LLMs) are increasingly used to generate code at scale. Meanwhile, prior work has investigated whether training data may be recoverable from model outputs, by auditing the textual overlap between training examples and model generations. Code, however, can be functionally equivalent while textually dissimilar. In this work, we study functional memorization: extraction of functional logic beyond what verbatim metrics detect. We construct a counterfactual setup for Olmo-3-32B, comparing a midtrained model (exposed to target code) against a pretrained reference (not exposed). We prompt both models with Python function signatures and measure both textual and functional similarity (i.e., LLM-as-a-judge, execution-based). Our results show clear evidence of functional memorization, highlighting the need for auditing metrics that go beyond textual overlap.

15.
arXiv (CS.LG) 2026-06-19

MassSpecGym in the Wild: Uncovering and Correcting Evaluation Pitfalls in AI-Driven Molecule Discovery

arXiv:2606.19624v1 Announce Type: new Abstract: Reliable benchmarking is critical for developing machine learning models for tandem mass spectrometry (MS/MS) based molecule discovery. Subtle issues in experimental design and model evaluation procedures can degrade the trustworthiness of such benchmarks and lead to erroneous conclusions. We conduct a thorough review of model evaluation issues in the recent MS/MS machine learning literature, using the standard MassSpecGym benchmark suite as a case study to illustrate the impact of these issues. We find evaluation issues in at least 17 of 26 papers reporting MassSpecGym benchmark results in the first year of its adoption. We isolate three classes of failures: (i) data leakage, (ii) shortcut learning, and (iii) implementation bugs and metric divergence. Through extensive experimentation and code replication, we quantify the impact of these issues and show how they corrupt the evaluation standards MassSpecGym was designed to enforce. We distill our findings into recommendations generalizable to MS/MS challenges, benchmarks, and custom evaluation setups. We also release MassSpecGym v1.5, an implementation of our recommendations in the MassSpecGym benchmarking suite which addresses the failure modes identified in this audit. MassSpecGym v1.5 is publicly available at https://github.com/pluskal-lab/MassSpecGym.

16.
arXiv (CS.AI) 2026-06-12

"Did you lie?" Evaluating Lie Detectors across Model Scale and Belief-Verified Model Organisms

arXiv:2606.12618v1 Announce Type: new Abstract: Robust lie detectors for language models could enable powerful techniques for auditing, monitoring, and post-hoc investigation of model behaviour, but evaluating them requires testbeds where models verifiably believe the opposite of what they say. We show that existing trained model organisms often fail this requirement, leaving prior positive and negative detection results difficult to interpret. We address this with 13 reasoning model organisms whose hidden beliefs are verified in chain-of-thought and shown to generalise to held-out tasks, alongside Varied Deception, a prompted-lying testbed covering a broad range of lie-inducing motivations. On these testbeds we evaluate four detectors: a chain-of-thought judge, a logprob classifier, and two activation probes, including Did-You-Lie (DYL), a new method for training follow-up probes. On prompted lying, across 31 open-weight models spanning 2B to 1T parameters, all four detectors show positive scaling with model capability. However, every activation- and logprob-based detector drops sharply on our trained model organisms, with DYL retaining the most signal; only the chain-of-thought judge remains strong, with 0.82 balanced accuracy, partly as an artefact of our verification process favouring CoT-readable beliefs. Current lie detectors therefore cannot support high-confidence claims about model beliefs, and we suggest research directions that may address some of their current limitations. We release our datasets, model organisms, and trained detectors.

17.
arXiv (quant-ph) 2026-06-15

Simultaneous Estimation of Partial-Transpose Moments with Active Memory Independent of the Moment Order

arXiv:2606.14204v1 Announce Type: new Abstract: We study the simultaneous estimation of partial-transpose moments $p_j(\rho_{AB})=\mathrm{Tr}[(\rho_{AB}^{T_B})^j]$, $j=2,\ldots,K$, of an unknown bipartite $n$-qubit state from independent copies under an explicit active-memory constraint. We give a sequential qubit-reuse realization of the partial-transpose permutation that uses at most $2n+1$ active qubits, independent of $K$, and estimates all moments $p_2,\ldots,p_K$ to uniform additive error $\epsilon$ with total copy complexity $O(K\log K/\epsilon^2)$. We also prove two converse bounds. First, any uniformly accurate simultaneous estimator requires $\Omega(K/\epsilon^2)$ copies in the worst case. Second, the same scaling holds on an explicit isospectral two-qubit negative-partial-transpose (NPT) family whose ordinary moments are constant while the partial-transpose moments vary. These results characterize the copy complexity of the partial-transpose moment hierarchy up to a logarithmic factor and extend simultaneous nonlinear-functional estimation from ordinary state powers to partial-transpose spectral data under active quantum memory independent of the target moment order.

18.
arXiv (CS.AI) 2026-06-16

Co-Scraper: query-aware DOM Pruning and Reusable Scraper Synthesis for Lightweight Web Data Extraction

arXiv:2606.14821v1 Announce Type: cross Abstract: The abundant and heterogeneous nature of web content necessitates automated information extraction, and generating scrapers that can be reused across similar web pages offers an effective solution for scalable data extraction. In this work, we propose Co-Scraper, a two-stage framework capable of handling the hierarchical complexity of long HTML documents. By integrating a query-aware DOM pruning mechanism with stable extraction strategy induction, Co-Scraper can effectively transforms web content into executable programmatic wrappers using a fine-tuned Qwen3-8B model. On the test set of SWDE, Co-Scraper achieves state-of-the-art performance with an F1 score of 94.78% and a reuse success rate of 90.39%. This framework significantly enhances the accuracy and resilience of data extraction, providing a highly efficient approach for web data acquisition tasks.

19.
arXiv (quant-ph) 2026-06-15

Nanostructure modelling with early fault tolerant quantum computers

arXiv:2606.06442v2 Announce Type: replace Abstract: Semiconductor nanostructures are central to many developing technologies. Notably, double quantum dots are especially important for semiconductor spin-qubit architectures, quantum sensing applications, and quantum-dot solar cells. Accurate modelling is highly desirable but conventional methods can struggle when dynamics involve more than two interacting electrons. In this work, we present a quantum simulation framework capable of addressing multi-electron double quantum dots. We adopt an efficiently scaling 1$^st$ quantised representation of the system and develop algorithms based on both Trotterisation and Qubitisation. Incorporating insights from classical simulations enables us to produce resource estimates that are more realistic than those obtained from theoretical error bounds. Using a standard surface code model with physical noise at $10^{-3}$, our results indicate that the ground-state energy of four electrons in a double quantum dot can be estimated in approximately 22 hours using 226k physical qubits, or an eight-electron system in 3.3 days with 314k qubits (with runtimes falling dramatically when more qubits are available). We anticipate that incorporating recent advances in surface code architectures may reduce these costs significantly further. Our results suggest that early fault-tolerant quantum computers may become valuable tools for designing mature-era quantum technologies.

20.
arXiv (quant-ph) 2026-06-15

Efimov Effect in Ultracold Microwave-Shielded Polar Molecules

arXiv:2602.21433v2 Announce Type: replace-cross Abstract: A quantum-mechanical description is presented for the three-body physics of shielded dipolar molecules, including a prediction of observable Efimov physics. Despite the anisotropic and long-range nature of the interaction, shielding enables a regime in which universality emerges already at the two-body level and extends to the three-body sector, where Efimov physics emerges. On the negative side of the scattering-length resonance, computed trimer binding energies display the characteristic scaling expected for Efimov resonances. Finally, the sudden approximation can be used to create trimer bound states, starting from positive energy trap states as a way to create or detect these molecular trimers. Moreover, the three-body parameter expressed in dipolar units is found to be universal.

22.
arXiv (CS.CV) 2026-06-11

MARIC: Multi-Agent Reasoning for Image Classification

Image classification has traditionally relied on parameter-intensive model training, requiring large-scale annotated datasets and extensive fine tuning to achieve competitive performance. While recent vision language models (VLMs) alleviate some of these constraints, they remain limited by their reliance on single pass representations, often failing to capture complementary aspects of visual content. In this paper, we introduce Multi Agent based Reasoning for Image Classification (MARIC), a multi agent framework that reformulates image classification as a collaborative reasoning process. MARIC first utilizes an Outliner Agent to analyze the global theme of the image and generate targeted prompts. Based on these prompts, three Aspect Agents extract fine grained descriptions along distinct visual dimensions. Finally, a Reasoning Agent synthesizes these complementary outputs through integrated reflection step, producing a unified representation for classification. By explicitly decomposing the task into multiple perspectives and encouraging reflective synthesis, MARIC mitigates the shortcomings of both parameter-heavy training and monolithic VLM reasoning. Experiments on 4 diverse image classification benchmark datasets demonstrate that MARIC significantly outperforms baselines, highlighting the effectiveness of multi-agent visual reasoning for robust and interpretable image classification.

23.
arXiv (CS.AI) 2026-06-16

RecourseBench: A Modular Framework for Reproducible Algorithmic Recourse Evaluation

arXiv:2606.16113v1 Announce Type: new Abstract: Algorithmic recourse methods provide counterfactual explanations that inform individuals of the actions required to overturn an unfavorable model decision. Despite rapid methodological progress, principled comparison remains elusive; existing frameworks are often difficult to extend and lack both interoperability and systematic verification that integrated methods faithfully reproduce their originally reported results. We introduce RecourseBench, a unified evaluation framework built around three commitments namely, modularity, reproducibility, and interactivity. The framework decomposes the pipeline into five fully decoupled layers – Data, Preprocessing, Model, Recourse Method, and Evaluation – governed by abstract interfaces and a dynamic registry. To address the reproducibility gap in prior benchmarks, we introduce a four-tier classification system in which every integrated method is validated by an automated test suite against its originally reported results. We further provide an interactive web interface for flexible, configuration-driven comparison across methods, datasets, and model architectures. Our framework currently integrates 28 state-of-the-art recourse methods and, to our knowledge, constitutes the first recourse benchmark to explicitly enforce method-level reproducibility through automated, quantitative testing.

24.
arXiv (CS.LG) 2026-06-16

The Reverse Telescoping Coordinate System for Positive Definite Matrices: Geometry, Computation, and Generative Modeling

arXiv:2606.15442v1 Announce Type: cross Abstract: We design a new unconstrained coordinate system where a $p\times p$ symmetric positive definite (SPD) matrix $\Theta$ is represented by a reverse telescoping map $\Theta(x)=\rm{RT}(x)$, with $x=(v,d,r)\in\mathbb{R}\times\mathbb{R}^{(p-1)}\times\mathbb{R}^{p(p-1)/2}$, representing respectively the log volume or log determinant; and the shape, as encoded by log relative diagonal scales and partial covariances among the nodes. This construction results in important properties not available in other charts, e.g., matrix logarithm, such as Jacobian depending on only the log-determinant. A useful feature of our construction is $x$ contains a lossless symbolic representation of both the matrix and its inverse. Many important computations involving a matrix and its inverse can be performed in $O(p^2)$ in the transformed domain, while it is the rendering of results in matrix forms (on demand) that must incur an $O(p^3)$ cost. Moreover, two unit-determinant matrices in the transformed domain can be joined by a straight line with pathwise unit determinant. For generative modeling, this allows designing a split volume-shape flow model trained by conditional flow matching for transporting the shape over the unit-determinant path, with a separate one-dimensional flow for transporting the volume or the determinant. The forbidding SPD constraint, tamed thus into a powerful guiding force, leads to the surprising insight that it is in some sense easier to design a volume-normalized shape flow for SPD compared to the unconstrained $\mathbb{R}^{p\times p}$, with no intrinsic notion of volume to aid normalization, unlike the determinant of SPD matrices. We apply our construction for up to $p=200$ in generative modeling of SPD matrices on a difficult synthetic bimodal target, and in generating brain connectivity networks by models trained on fMRI data; as well as in intrinsic diffusion on the SPD manifold.

25.
arXiv (CS.CL) 2026-06-17

PARSE: Provenance-Aware Retrieval Sanitization for Professional Domain LLM Agents

Authors:

Prompt injection defenses evaluated on synthetic benchmarks do not generalize to real enterprise documents, which are longer, denser, and interleave legitimate authority language with factual content. We demonstrate this gap with a real-document benchmark of 122 tasks across five professional domains (financial, legal, medical, scientific, DevOps) using actual SEC filings, Federal Register rules, PubMed abstracts, arXiv papers, and GitHub postmortems. Paraphrasing, the strongest defense on synthetic benchmarks, shows no statistically significant attack success rate reduction on real documents (p=0.500) while degrading utility from 91.8% to 82.8%. We introduce PARSE (Provenance-Aware Retrieval Sanitization), a domain-aware, fact-preserving sanitization pipeline that classifies each sentence by injection likelihood, extracts structured facts before rewriting, and verifies fact preservation via a consistency-checking loop. A directiveness gate routes 59% of real enterprise documents to a lightweight path, concentrating computational cost on high-risk documents. PARSE achieves 15.6% attack success rate – a 38% reduction versus the 25.4% baseline – at 86.9% utility, the only condition that is both statistically significant (p=0.014, adequately powered) and maintains near-baseline utility. Practitioners should evaluate defenses on domain-matched real documents, not synthetic proxies.