Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-16

RecourseBench: A Modular Framework for Reproducible Algorithmic Recourse Evaluation

arXiv:2606.16113v1 Announce Type: new Abstract: Algorithmic recourse methods provide counterfactual explanations that inform individuals of the actions required to overturn an unfavorable model decision. Despite rapid methodological progress, principled comparison remains elusive; existing frameworks are often difficult to extend and lack both interoperability and systematic verification that integrated methods faithfully reproduce their originally reported results. We introduce RecourseBench, a unified evaluation framework built around three commitments namely, modularity, reproducibility, and interactivity. The framework decomposes the pipeline into five fully decoupled layers – Data, Preprocessing, Model, Recourse Method, and Evaluation – governed by abstract interfaces and a dynamic registry. To address the reproducibility gap in prior benchmarks, we introduce a four-tier classification system in which every integrated method is validated by an automated test suite against its originally reported results. We further provide an interactive web interface for flexible, configuration-driven comparison across methods, datasets, and model architectures. Our framework currently integrates 28 state-of-the-art recourse methods and, to our knowledge, constitutes the first recourse benchmark to explicitly enforce method-level reproducibility through automated, quantitative testing.

02.
arXiv (CS.AI) 2026-06-19

Creativity Reconsidered: Generative AI and the Problem of Intentional Agency

arXiv:2601.15797v2 Announce Type: replace Abstract: Many theorists maintain that conscious intentional agency is a necessary condition of creativity. We argue that this requirement, which we call the Intentional Agency Condition (IAC), should be abandoned. We motivate this by highlighting the problems this criterion encounters in the face of recent advances in generative AI, which is ostensibly creative despite being incapable of intentional agency. We present two corpus analyses to illustrate the rapidly increasing tendency of people to predicate creativity to generative AI. In response to this predicament, theorists of creativity have proposed a range of conflicting solutions, which we critically evaluate. We find that none of these satisfyingly resolves the initial predicament, and we therefore propose a novel approach. Our claim is that ascriptions of creativity are dependent on what we call creative ability. This solution explains why intentional agency is important for judgements of creativity, without being a necessary condition. Our approach thereby accommodates AI creativity without dismissing the intuition that perceived intentions are of key importance for ascriptions of creativity.

03.
arXiv (CS.LG) 2026-06-16

Amortized mean-shift interacting particles

arXiv:2606.15871v1 Announce Type: cross Abstract: Bayesian inference for inverse problems is run to evaluate integrals – posterior expectations, tail probabilities, and risks – across a stream of observations. The standard estimate averages the integrand over posterior samples, a Monte-Carlo average whose error decays only as the square root of the sample size, so accuracy demands many samples – prohibitive when each one calls a partial-differential-equation forward model. Mean-shift interacting particles need far fewer: they return a small set of signed-weight nodes – a deterministic quadrature whose weighted averages estimate those integrals. Finding the nodes, however, is a per-observation optimization that, in its most accurate form, reads the posterior score at every step – returning the cost it meant to save. We introduce amortized mean-shift interacting particles, a learned map that emits the weighted nodes from an observation and a few posterior samples in a single forward pass. Training asks only for joint parameter-observation samples and a posterior to draw from – a conditional normalizing flow, an empirical conditional, or any reference the user can sample – and the map learns to integrate that posterior from samples alone, evaluating neither its density nor its score. Once trained, it generalizes to unseen observations and integrands at any node budget and improves on independent samples in two ways: by reweighting them, provably no worse than the equal weights of Monte-Carlo; and by moving them, which empirically lowers it further. Across closed-form, sampled, learned, and physics-based posteriors – up to a thousand-coefficient groundwater field – it integrates more accurately than the same number of samples at every budget, and a posterior-whitened, dimension-aware kernel removes the high-dimensional wall. The result is a Pareto improvement on Monte-Carlo integration, not a competitor to drawing more samples.

04.
arXiv (quant-ph) 2026-06-17

Closest Accessible Symmetry reduction: a tool for Hamiltonian interpolation analysis

arXiv:2606.18161v1 Announce Type: new Abstract: We introduce a framework for analysing the spectrum of Hamiltonian interpolations without heavily relying on discretising the interpolation parameter. The method is based on the concept of accessible symmetries: a problem-class-dependent family of certifiable reflections that induce bipartitions of the Hilbert space. At each step, the interpolation Hamiltonian is projected onto the sectors of the accessible symmetry that is closest to being satisfied, yielding a hierarchy of weakly coupled pseudo-eigenspaces together with explicit residual couplings between them. We show that this representation captures qualitative signatures of quantum phase transitions, provides estimates of their location, and offers insights into their nature. The quality of the approximation is controlled by the compatibility between the accessible symmetry family and the problem instance. Although motivated in spirit by adiabatic quantum computation, our approach applies more broadly to the study of Hamiltonian phase diagrams, providing a new perspective on the spectral reorganisation of many-body quantum systems.

05.
arXiv (CS.AI) 2026-06-19

DataMagic: Transforming Tabular Data into Data Insight Video

arXiv:2606.20388v1 Announce Type: cross Abstract: Data videos integrate dynamic charts, voice narration, and synchronized animations to communicate data insights as temporal narratives, making them an effective medium for improving data consumption efficiency in the data management lifecycle. However, producing high-quality data videos requires expertise spanning data analysis, narrative design, and video production. Existing approaches fall short: static visualization tools (e.g., BI dashboards) lack narrative logic and animation; authoring tools require users to pre-prepare visualizations rather than working from raw data; pixel-level video generation models cannot guarantee data fidelity or provenance. We demonstrate DataMagic, an end-to-end interactive system that transforms raw tabular data and natural language queries into narrative data-insight videos. To ensure data fidelity, DataMagic introduces the declarative specification DVSpec, which binds visual and animation elements to underlying data fields through data-driven semantic references. To address the combinatorial explosion of the design space, DataMagic adopts a Generate-then-Orchestrate multi-agent architecture that generates candidate scenes in parallel and then optimizes narrative coherence through global orchestration. Leveraging DVSpec's decoupling of logic and rendering, the system further supports three interaction modes and structured provenance-based data Q&A, transforming one-way videos into explorable interactive data interfaces. Evaluation on 109 real-world samples validates the effectiveness of the DataMagic. Homepage: https://datamagic-home.github.io/

06.
arXiv (CS.AI) 2026-06-17

Model Validation of Agentic AI Systems: A POMDP-Based Framework for Belief-State, Forecast, and Policy Validation

arXiv:2606.17383v1 Announce Type: cross Abstract: Agentic artificial intelligence systems introduce a new class of model risk. Unlike traditional predictive models, autonomous agents continuously acquire information, form beliefs regarding latent states of the environment, generate forecasts, select actions, and adapt their behavior over time. Existing validation methodologies focus primarily on predictive accuracy and therefore provide limited insight into the quality of the underlying decision process. This paper proposes a model validation framework for agentic AI based on Partially Observable Markov Decision Processes (POMDPs). The framework decomposes autonomous decision making into information, beliefs, forecasts, actions, and utility, allowing each component to be validated independently. Large language models (LLMs) are formalized as approximate Bayesian filtering operators, and a model-risk taxonomy is developed encompassing state-space, filtering, forecast, policy, utility-specification, and parameter risks. The model risk validation methodology is demonstrated through a portfolio-management case study in which an agent infers latent market regimes from market and macroeconomic information, generates belief-conditioned forecasts, and constructs portfolios using a Black–Litterman framework. Empirical validation combines performance analysis, belief calibration diagnostics, coverage tests, ablation studies, and parameter-sensitivity analysis. The results indicate that latent-state inference contributes independently to decision quality and that the principal conclusions remain robust across a broad range of parameter values. The principal contribution of the paper is a practical framework for extending established model risk management concepts to autonomous AI systems and providing a rigorous foundation for their validation, governance, and monitoring.

07.
arXiv (CS.LG) 2026-06-11

Program Evaluation with Remotely Sensed Outcomes

arXiv:2411.10959v5 Announce Type: replace-cross Abstract: We study causal inference in experiments and quasi-experiments, where the economic outcome is imperfectly measured by a remotely sensed variable. The remotely sensed variable is low-cost, scalable, and predictive of the economic outcome in observational data; examples include satellite imagery and mobile phone activity. We model the remotely sensed variable as post-outcome: variation in the economic outcome causes variation in the remotely sensed variable. For example, changes in environmental quality cause changes in satellite imagery, not vice versa. Under this assumption, we propose a formula to nonparametrically identify the causal parameter by combining experimental and observational data. We develop a method for n^{-1/2} inference that is robust to misspecification and that does not restrict the algorithms used to process remotely sensed variables.

08.
arXiv (CS.CV) 2026-06-17

Bayesian Magnetic Resonance Joint Image Reconstruction and Uncertainty Quantification using Sparsity Prior Models and Markov Chain Monte Carlo Sampling

We propose a novel framework for uncertainty quantification using compressed sensing magnetic resonance image reconstruction. The problem is formulated within a Bayesian framework as a linear inverse problem, with prior distributions assigned to the unknown model parameters. Specifically, the image to be reconstructed is assumed to be sparse in a given basis. We develop a general framework applicable to any basis and as examples, we test the sparsity of the image in its (1) spatial gradients using a total variation prior model, and in its (2) wavelet transform. A Markov chain Monte Carlo (MCMC) method, based on a split-and-augmented Gibbs sampler, is then employed to sample from the posterior distribution of the unknown parameters. The non-differentiable conditional distributions are efficiently sampled using a proximal MCMC method. The proposed algorithms are validated on both single-coil and multi-coil datasets using various k-space sub-sampling patterns and ratios. The results demonstrate the superior performance of each proposed approach in reconstructing images compared to its counterpart optimisation-based method. Moreover, our framework effectively quantifies uncertainty, showing a notable correlation between estimated uncertainty maps and error maps computed using ground truth and reconstructed images, compared with existing deep learning-based methods.

09.
medRxiv (Medicine) 2026-06-15

Poly-Social Risk for Hypertension Among Black and Latina Women

Background: Hypertension is a leading modifiable cardiovascular risk factor prominently influenced by health-related social needs (HRSN). Whether detailed information on HRSN can improve identification of hypertension among minoritized women is unknown. Methods: Black and Latina women aged 18-65 years completed the Centers for Medicare and Medicaid Services Accountable Health Communities Screening Tool, assessing 13 HRSN domains. Hypertension was ascertained by a validated EHR-based algorithm or self-report of hypertension. Logistic regression tested associations of HRSN with hypertension. LASSO regression with 10-fold cross-validation was used to derive a poly-social risk score in the training set (random 70%) and tested in the validation set (30%) against a sociodemographic model (age, race, income, education). Results: Among 1302 participants (mean [SD] age 40.1 [11.3] years, 70.4% Black, 44.3% Latina), higher cumulative burden of HRSN was associated with increased odds of hypertension (adjusted odds ratio [aOR] for each additional domain of HRSN: 1.07 [95% CI 1.01-1.14], P=0.02). Food insecurity (aOR 2.30 [1.37-3.87], P= 0.002), lapse in utilities (aOR 1.44 [1.04-1.96], P=0.02), poor concentration (aOR 1.57 [1.13-2.17], P=0.007), and social isolation (aOR 1.77 [1.14-2.73], P=0.01) were associated with hypertension. In the validation set, the poly-social risk score did not improve discrimination for hypertension vs. the sociodemographic model (AUC 0.76 [95% CI 0.71-0.81] vs. AUC 0.80 [0.75-0.85]). Conclusion: In this cross-sectional analysis of Black and Latina women, greater cumulative social disadvantage was associated with hypertension. While inclusion of HRSN did not improve hypertension prediction beyond conventional sociodemographic indices, findings may inform targeted interventions among minorities at cardiometabolic risk.

10.
arXiv (CS.CV) 2026-06-18

Rethinking Text-to-Image as Semantic-Aware Data Augmentation for Indoor Scene Recognition

In the realm of computer vision, indoor image recognition presents challenges due to the intricate interplay of lighting conditions, occlusions, and diverse object arrangements within confined spaces. To address the lacks of training indoor images, we introduce a novel approach leveraging Stable Diffusion (SD) for the generation of synthetic images, which serve as a powerful data augmentation tool. The utilization of SD offers a principled framework for synthesizing diverse and realistic indoor scenes, thereby enriching the training data pool for robust indoor image recognition models. Experimental findings on the MIT Indoor Scene dataset reveal the potential of our proposed approach in enhancing the training of deep models when authentic data is limited. Furthermore, to prevent the misuse of SD synthetic images, we introduce a counter measure based on DIffusion Reconstruction Error (DIRE). The powerful DIRE presentation enables training robust classifiers only using lightweight deep models. Experiments show that our approach can perfectly recognize SD generated images with the accuracy of 100% using MobilenetV3.

11.
arXiv (CS.LG) 2026-06-16

Contextual Bandits for Maximizing Stimulated Word-of-Mouth Rewards

arXiv:2606.15146v1 Announce Type: new Abstract: Stimulated word-of-mouth is a strategy that promotes information sharing through prompts or incentives. Optimizing stimulated word-of-mouth through social networks requires identifying and targeting connected users who are most susceptible to spillover, a phenomenon where the influence of recommendations extends beyond the immediate audience to impact their connected users. The probability of spillover varies across individuals, and their connections, leading to heterogeneity. Understanding and accurately estimating the spillover probabilities among users in social networks is crucial for improving the effectiveness of stimulated word-of-mouth. To address this, we present a novel contextual multi-armed bandit framework that learns individual spillover probabilities and ranks connected users to maximize rewards from stimulated word-of-mouth. Experiments on real-world network datasets demonstrate that accounting for spillover heterogeneity enhances the targeting precision of top-$k$ connected users, boosting rewards and outperforming baseline methods that do not learn individual spillover effects.

12.
arXiv (CS.CL) 2026-06-16

Retrievable Gradients: Continual Post-Training Without Cumulative Weight Drift

Continual post-training enables models to absorb emerging knowledge after deployment, but repeatedly updating shared parameters can accumulate weight drift, potentially causing catastrophic forgetting and degrading general capabilities. Retrieval-augmented generation avoids such parameter drift, yet often lacks the depth of parametric knowledge integration. In this paper, we propose ReGrad (Retrievable Gradients), a new paradigm that treats gradients as retrievable units of knowledge. ReGrad pre-computes document-specific gradients offline, stores them in an indexed Gradient Bank, and retrieves only query-relevant gradients at inference time for temporary weight adaptation. However, raw language-modeling gradients are optimized for token-level document reconstruction rather than for query-driven knowledge use. We therefore introduce a bi-level meta-learning objective that reshapes document-derived gradients into generalizable adaptation signals for downstream tasks. Experiments across general and domain-specific settings show that \textsc{ReGrad} outperforms CPT and RAG baselines, enabling scalable and reversible parametric knowledge injection without accumulating weight drift.

13.
arXiv (CS.LG) 2026-06-12

Towards Provably Fair Machine Learning: Bayesian Approaches For Consistent and Transparent Predictions

arXiv:2606.12615v1 Announce Type: new Abstract: ML classifiers deployed in high-stakes domains produce predictions whose quality varies systematically across subgroups. For granular subgroups defined by intersections of multiple features, predictions are often inconsistent with the observed data: the model's outputs contradict the evidence available for that subgroup. This problem is exacerbated by regularisation, which improves aggregate performance by collapsing small subgroups into larger groups, disproportionately affecting demographic minorities. We define two requirements for consistent prediction: determinism (identical individuals receive identical predictions) and statistical consistency (we cannot reject, at significance level alpha, the hypothesis that the predictions for a subgroup were drawn from the Bayesian optimal target distribution inferred for that subgroup). From these requirements we derive the Fair Bayesian classifier, which enforces both across every group and subgroup simultaneously and abstains whenever no consistent deterministic prediction is possible. On three benchmark datasets (Adult, COMPAS, and Bank Marketing), standard classifiers produce statistically inconsistent predictions for a substantial proportion of subgroups. Our classifier achieves zero consistency error by construction while exceeding baseline accuracy and multicalibration on every dataset tested. Statistical consistency provides a principled foundation for prediction quality with direct implications for algorithmic fairness. Minority demographics are disproportionately concentrated in small subgroups, precisely where frequentist inference is least reliable; addressing this inference problem is therefore a necessary step toward fair ML. By enforcing Bayesian consistency at the finest resolution the data supports, the our classifier demonstrates that exhaustive subgroup fairness with principled abstention is achievable in practice.

14.
Nature (Science) 2026-06-10

Mutation-dependent responses to sleep and exercise in clonal haematopoiesis

Clonal haematopoiesis (CH) activates inflammation and increases the risk of atherosclerosis1,2. Whether lifestyle alters CH clone expansion or the phenotypic programming of CH mutant cells, thereby affecting atherosclerosis, is unknown. Here, in humans and mice and across mutations in Jak2, Tet2, Trp53 and Dnmt3a, we demonstrate mutation-dependent responses to sleep and exercise in CH and show that mutant cells are uniquely sensitive to lifestyle. In two human datasets, moderate-to-vigorous physical activity was associated with lower prevalence of non-DNMT3A-driven CH. In atherogenic mice with Jak2V617F or Tet2 loss of function (LOF), but not Trp53 LOF or Dnmt3aR878H CH, uninterrupted sleep or exercise curtails clone expansion. In CH with the Jak2V617F mutation, sleep and exercise reduces clone expansion by selectively reprogramming mutant, but not cohabitant wild type, haematopoietic progenitor cells towards antiproliferative and metabolically healthy phenotypes by tempering bone marrow macrophage–haematopoietic progenitor cell IL-1β signalling. Sleep or exercise also lessens Jak2V617F-driven, Tet2 LOF-driven and Trp53 LOF-driven, but not Dnmt3aR878H-driven, atherosclerosis by locally reprogramming mutant vascular macrophages, independent of peripheral clone dynamics. In Jak2V617F, but not adjacent wild type, aortic macrophages, uninterrupted sleep blunts CLEC4E-dependent inflammasome activation, consequently diminishing lesions. Exercise, meanwhile, activates PAC1+ neurons in the locus coeruleus, raising the levels of peripheral noradrenaline, which signals through adrenergic receptor β2 (ADRβ2) whose expression is preserved by exercise in Jak2V617F, but not cohabitant wild type, aortic macrophages, selectively repressing their inflammatory programming and atherosclerosis. Our findings establish that healthy lifestyles gene-specifically diminish CH and selectively reprogram mutant haematopoietic progenitor cells and macrophages to maintain cardiovascular health. Sleep and exercise can slow clonal haematopoiesis and limit mutant cell-driven atherosclerosis.

15.
arXiv (CS.CL) 2026-06-17

Evidence of Layered Positional and Directional Constraints in the Voynich Manuscript: Implications for Cipher-Like Structure

The Voynich Manuscript (VMS) exhibits a script of uncertain origin whose grapheme sequences have resisted linguistic analysis. We present a systematic analysis of its grapheme sequences, revealing two complementary structural layers: a character-level right-to-left optimization in word-internal sequences and a left-to-right dependency at word boundaries, a directional dissociation not observed in any of our four comparison languages (English, French, Hebrew, Arabic). We further evaluate two classes of structured generator against a four-signature joint criterion: a parametric slot-based generator and a Cardan grille implementing Rugg's (2004) gibberish hypothesis. Across their full tested parameter spaces, neither class reproduces all four signatures simultaneously. While these results do not rule out generator classes we have not tested, they provide the first quantitative benchmarks against which any future generative or cryptanalytic model of the VMS can be evaluated, and they suggest that the VMS exhibits cipher-like structural constraints that are difficult to reproduce from simple positional or frequency-based mechanisms alone.

16.
arXiv (CS.AI) 2026-06-12

Examining the Usage of Generative AI Models in Student Learning Activities for Software Programming

arXiv:2511.13271v2 Announce Type: replace-cross Abstract: The rise of Generative AI (GenAI) tools like ChatGPT has created new opportunities and challenges for computing education. Existing research has primarily focused on GenAI's ability to complete educational tasks and its impact on student performance, often overlooking its effects on knowledge gains. In this study, we investigate how GenAI assistance compares to conventional online resources in supporting knowledge gains across different proficiency levels. We conducted a controlled user experiment with 24 undergraduate students of two different levels of programming experience (beginner, intermediate) to examine how students interact with ChatGPT while solving programming tasks. We analyzed task performance, conceptual understanding, and interaction behaviors. Our findings reveal that generating complete solutions with GenAI significantly improves task performance, especially for beginners, but does not consistently result in knowledge gains. Importantly, usage strategies differ by experience: beginners tend to rely heavily on GenAI toward task completion often without knowledge gain in the process, while intermediates adopt more selective approaches. We find that both over-reliance and minimal use result in weaker knowledge gains overall. Based on our results, we call on students and educators to adopt GenAI as a learning rather than a problem solving tool. Our study highlights the urgent need for guidance when integrating GenAI into programming education to foster deeper understanding.

17.
arXiv (CS.LG) 2026-06-16

NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models

arXiv:2602.06694v3 Announce Type: replace Abstract: Weight-only quantization has become a standard approach for efficiently serving large language models (LLMs). However, existing methods fail to efficiently compress models to binary (1-bit) levels, as they either require large amounts of data and compute or incur additional storage. In this work, we propose NanoQuant, the first post-training quantization (PTQ) method to compress LLMs to both binary and sub-1-bit levels. NanoQuant formulates quantization as a low-rank binary factorization problem, and compresses full-precision weights to low-rank binary matrices and scales. Specifically, it utilizes an efficient alternating direction method of multipliers (ADMM) solver to precisely initialize latent binary matrices and scales, and then tunes the initialized parameters through a block and model reconstruction process. Consequently, NanoQuant establishes a new Pareto frontier in low-memory post-training quantization, and enables sub-1-bit compression. NanoQuant makes large-scale deployment feasible on consumer hardware. For example, it compresses Llama2-70B by 25.8$\times$ in just 13 hours on a single H100, enabling a 70B model to operate on a consumer 8 GB GPU. Code is available at https://github.com/SamsungLabs/NanoQuant.

18.
arXiv (CS.LG) 2026-06-15

Deep Spectral Learning of Embedded Latent Transfer Operators for Stochastic Dynamical Systems

arXiv:2606.14079v1 Announce Type: new Abstract: We propose a spectral learning method for stochastic nonlinear dynamical systems represented with embedded latent transfer operators in deep feature spaces. We instantiate the method as Deep Spectral Encoder (DSE), an operator-based latent state-space model in which a time-invariant neural encoder implements learnable nonlinear feature maps from observations, and these features define Markovian latent states whose temporal evolution and observation mapping are described by the transfer and observation operators, respectively. Functional canonical correlation analysis in a learnable Galerkin-projected feature space provides state coordinates from past and future observations, and the two linear operators are estimated on the state coordinates as ridge-regularized closed-form solutions that coincide with Galerkin projections of the associated covariance operators. On this representation, we generalize sequential Bayesian filtering and Koopman spectral mode decomposition in feature space. Experiments on several scenarios show stable and superior performance with sequential Bayesian filtering and dynamic mode decomposition baselines even under noise and partial observability.

19.
arXiv (CS.LG) 2026-06-19

Adaptive Distance-Aware Trunk Deep Operator Learning for Long-Span Roadway Bridges

arXiv:2606.20015v1 Announce Type: new Abstract: Long-span roadway bridges exhibit highly localized structural responses under vehicular loading, making repeated FE analysis computationally expensive for applications such as influence surface generation and structural digital twins. Existing SciML approaches struggle to accurately capture these localized responses. To address this challenge, this study proposes an adaptive-trunk DeepONet for localized structural response prediction in large-scale bridge systems. The framework dynamically constructs a load-dependent learning domain using a KNN strategy, allowing the network to focus on structural influence zones. The trunk network is further enhanced using distance-aware features that encode the geometric relationship between the load and structural nodes. A physics-based full-field reconstruction is incorporated through a stiffness-informed Schur complement formulation, enabling predictions at adaptive nodes to be extended to the entire structural domain. To enable scalable training, response data are generated using a reduced-order equivalent shell model that preserves the dominant global behavior while significantly reducing computational cost. The proposed framework is validated on both a benchmark bridge model and the real-world Mussafah Bridge. Results show that the method achieves FEM-level accuracy with relative errors below 5%, while reducing the total response evaluation time (including full-field reconstruction) by approximately 60x; excluding the post-processing reconstruction step, the AD-DeepONet inference is up to four orders of magnitude faster than FEM. In addition, the framework enables rapid generation of full-field responses, influence lines, and influence surfaces under arbitrary vehicular loading configurations, demonstrating strong potential for large-scale bridge analysis and digital twin applications.

20.
arXiv (CS.CV) 2026-06-18

Aerial-ground LiDAR place recognition with patch-level self-supervised learning and expanded reciprocal re-ranking

LiDAR place recognition determines one's position on a prior point cloud map. The most studied ground-level LiDAR place recognition suffers from pre-visit requirements, incomplete coverage, and limited perspectives. Using pre-acquired, full-coverage Airborne Laser Scanning (ALS) data as an aerial prior map overcomes these drawbacks, making cross-view place recognition necessary and advantageous. However, aerial-ground LiDAR place recognition faces significant challenges, including the domain gap between aerial and ground point clouds, and false positives during initial retrieval. To address these challenges, we present a novel retrieval and re-ranking framework for aerial-ground LiDAR place recognition. Based on the priors that neighboring point cloud patches share similar semantics with anchor patch, our retrieval network introduces patch-level self-supervised learning modules at multiple scales and integrates with scene-level learning to improve global feature discriminativeness between aerial and ground point clouds. Furthermore, leveraging the structured spatial distribution of ALS point clouds, we introduce an Expanded Reciprocal (ER) re-ranking algorithm to exploit neighborhood information maximally and refine each feature based on neighbor features, which are then used to update the similarity matrix for final ranking. Extensive experiments demonstrate that our retrieval network outperforms existing state-of-the-art (SOTA) methods, achieving a 9.8\% improvement in average Recall@1 and a 3.2\% improvement in average Recall@1\% on the CS-Urban-Scenes, while also showing the best performance on the CS-Campus3D dataset. Additionally, our ER re-ranking algorithm further boosts the average Recall@1 by 4.9\% on CS-Campus3D and 10.2\% on CS-Urban-Scenes without additional training.

21.
arXiv (CS.CV) 2026-06-15

LiAuto-GeoX: Efficient Grounded Driving Transformer

Dense 3D reconstruction has demonstrated immense potential for spatial understanding, yet its viability as a real-time, onboard representation for autonomous driving remains an open challenge. Existing large-scale visual geometry models typically require substantial computational resources and lack the long-range geometric fidelity, surround-view consistency, and real-time efficiency demanded by dynamic driving environments. To bridge this gap, we present LiAuto-GeoX, an efficient grounded driving transformer designed for deployable, ego-centric 3D scene understanding. Our approach begins by learning a high-capacity driving geometry model from large-scale surround-view data, utilizing sparse LiDAR priors to provide robust geometric grounding in distant, ambiguous, or structure-sparse regions. We then instantiate this capability into a highly compact 155M-parameter onboard model through a novel geometry-preserving distillation framework. This framework employs mask-guided depth-aware distillation to retain fine-grained metric structures by emphasizing geometrically informative regions, and relative-pose relational distillation to enforce cross-view spatial consistency through pose-induced geometric relations. Extensive evaluations reveal that LiAuto-GeoX runs at 220 FPS on KITTI while maintaining high-fidelity dense reconstruction, enabling real-time deployment. The learned geometry transfers seamlessly to downstream autonomy tasks, achieving 90.6 PDMS in trajectory prediction, 24.63 mIoU in occupancy prediction, and 47.67 IoU in future-frame prediction. These all demonstrate that efficient dense 3D reconstruction can transcend its traditional role as a perception target to serve as a scalable, foundational geometric representation for next-generation autonomous driving.

22.
arXiv (CS.CV) 2026-06-15

FoleyGenEx: Unified Video-to-Audio Generation with Multi-Modal Control, Temporal Alignment, and Semantic Precision

We present FoleyGenEx, a unified video-to-audio (VTA) framework integrating multi-modal control, frame-level temporal alignment, and fine-grained semantics, enabling synchronized, versatile audio synthesis for diverse tasks. Existing VTA methods either have multi-modal control but weak temporal alignment or strong alignment but lack reference audio conditioning and semantic precision. FoleyGenEx fills this gap via three core innovations: a conditional injection mechanism for audio-controlled VTA and Foley extension, a multi-modal dynamic masking strategy preserving training synchronization, and an adverb-based data augmentation algorithm leveraging signal processing and large language models to enhance textual supervision with nuanced semantics. Experiments on AudioCaps, VGGSound, and Greatest Hits demonstrate its competitive controllable VTA performance against existing methods. Demo samples are available at https://foleygenex.github.io/FoleyGenEx.

23.
arXiv (CS.LG) 2026-06-19

Algebraic Dead Directions in LayerNorm Transformers: A Forward-Pass-Only Diagnostic at LLM Scale

arXiv:2606.19491v1 Announce Type: new Abstract: Pretrained transformers sit near singular minima of the loss, where the Fisher information metric degenerates along dead directions: directions in parameter space along which the directional Fisher vanishes. Locating such a direction normally needs a forward pass and an eigendecomposition of activations, or a sampling-based complexity estimate; none returns a direction computable from the network's parameters alone. We give one, for LayerNorm transformers. The inverse-scale direction $\gamma^{-1}/\|\gamma^{-1}\|$ of the LayerNorm affine is an exact algebraic kernel of the post-final-norm centred activation covariance, for any input distribution, and induces a corresponding dead direction in parameter space. It is read from the LN scale parameter alone, with no forward or backward pass and no eigensolve: the cheapest dead-direction read, specific to LayerNorm. We test it on $14$ pretrained transformers ($9$ LayerNorm, $5$ RMSNorm; $160$M-$35$B; language and vision objectives). At random initialisation the predicted direction matches the measured bottom singular direction (one forward pass, direct SVD) to four decimal places on $9/9$ LayerNorm models, and is correctly absent on $5/5$ RMSNorm models, which lack the mean-subtraction projector that creates it. On the trained checkpoint the covariance eigenvalue along this direction deepens by ${\sim}10^3\times$ and further dead directions open; the random-init-to-trained gap is a one-forward-pass, per-checkpoint readout of singular structure along the predicted coordinate. Two consequences follow in closed form: the residual stream's smallest singular value is preserved block-to-block on $13/14$ transformers measured on their own input distribution, the one exception (Gemma$4$-$31$B) a genuine dead direction the same read pinpoints; and the kernel direction's presence classifies a transformer's normalisation from the parameters alone.

24.
arXiv (CS.CL) 2026-06-16

PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions

Phone agents are increasingly expected to complete real mobile workflows rather than merely predict the next screen action. However, much of the current mobile-agent literature still evaluates agents primarily as GUI controllers that observe a screen, emit taps and swipes, and are scored by target app state. Real phone-use tasks are broader: they require deciding when to use app GUIs, device-side commands, or structured tools, while leaving evidence that the intended side effect actually occurred. We introduce PhoneHarness, a mixed-action benchmark and execution harness for studying phone-use agents on verifiable mobile workflows. PhoneHarness runs a device-side agent loop over GUI, CLI, and host-side tool actions, combining deterministic action routing with bounded GUI delegation and auditable execution traces. Its benchmark, PhoneHarness Bench, evaluates whether agents complete tasks with observable side effects, not only whether they produce plausible final answers. On the annotated evaluation split, PhoneHarness reaches a 75.0% pass rate, outperforming the strongest non-PhoneHarness settings by 12.9 percentage points. PhoneHarness and PhoneHarness Bench therefore play distinct but mutually dependent roles: the harness makes mixed phone workflows executable, while the benchmark measures whether agents can use that harness reliably and safely. Our findings suggest that reliable phone automation depends on action-surface routing and verifiable execution, not only visual GUI control.

25.
bioRxiv (Bioinfo) 2026-06-16

Physics-Driven Zero-Shot Reconstruction of Isotropic 3D Fluorescence Microscopy under Undersampled Acquisition

Three-dimensional (3D) imaging represents the development of next generation of fluorescence microscopy. However, routine axial down-sampling makes isotropic resolution unrealistic. Here, we propose DeepUI, a physical zero-shot framework designed to achieve isotropic 3D fluorescence images from a low axial sampling rate. DeepUI fully leverages the intrinsic characteristics of 3D images through physics-guided degradation, which incorporates spatial-frequency joint learning to generate a scaled optical transfer function, combined with noise degradation and an up-sampling branch. Typically requiring just 5 minutes for training and 0.5 minutes for high-throughput and fast prediction, we demonstrate the superior performance of DeepUI to get isotropic results, and the exclusivity to axial down-sampling conditions, even in more challenging conditions, including defocused background, noise, and resolution blur.