Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (math.PR) 2026-06-11

Marked random graphs with given degree sequence: large deviations on the local topology

arXiv:2401.00351v2 Announce Type: replace Abstract: We investigate the behavior of the empirical neighborhood distribution of marked graphs in the framework of local weak convergence. Here we extend known results by considering uniform random graphs with given degree sequences and i.i.d. marks on half-edges and vertices. We establish a large deviation principle for such families of empirical measures. The proof builds on Bordenave and Caputo's seminal 2015 paper, and Delgosha and Anantharam's 2019 introduction of BC entropy, relying on combinatorial lemmas that allow one to construct suitable approximations of measures supported on marked trees. Possible applications of these results are in the study of interacting diffusions on top of random graphs.

02.
arXiv (quant-ph) 2026-06-19

Measuring Rényi entropy with an Echo Protocol

arXiv:2504.05237v3 Announce Type: replace Abstract: We present efficient and practical protocols to measure the second Rényi entropy, whose exponential is known as the purity. Our approach is based on expressing the purity in terms of transition probabilities generated by an echo-type forward-backward evolution sequence, making it applicable to quantum many-body systems. Notably, our approach does not rely on random-noise averaging, a feature that can be extended to protocols to measure out-of-time-order correlation functions, as we demonstrate. By way of example, we show that our protocols can be practically implemented in superconducting qubit-based platforms, as well as in cavity-QED trapped ultra-cold gases.

03.
arXiv (CS.CL) 2026-06-16

Speaking the Language of Science: Toward a General-Purpose Generative Foundation Model for the Natural Sciences

In this report, we present LOGOS (Language Of Generative Objects in Science), a scientific generative language model that unifies heterogeneous tasks across the natural sciences within a single autoregressive framework based on a shared scientific grammar. It encodes diverse scientific objects and their spatial interactions as token sequences over a common vocabulary. By representing spatial contact and constraint patterns as discrete tokens, the model captures complex structural interactions in a purely sequential manner, without relying on explicit coordinates or geometric neural networks. This unified representation enables a wide range of downstream tasks to be formulated consistently as next-token prediction in the same grammar space, creating strong alignment between continued multi-domain pre-training and downstream objectives. Across diverse tasks, LOGOS consistently matches or outperforms domain-specific baselines, providing preliminary evidence for the feasibility of "one model fits all" in the natural sciences. We train LOGOS models at different scales (1B, 3B, and 8B parameters) and find a consistent positive correlation between model size and performance. This suggests that the future of AI for Science (AI4S) may not lie in building an independent technical stack that is separated from large language models (LLMs). Instead, it may depend on deeply aligning scientific foundation models with LLMs through shared architectures, shared training paradigms, and shared inference infrastructure, so that LLMs can truly become a new entry point for AI4S. We release the model weights and associated resources to facilitate further research.

04.
arXiv (CS.LG) 2026-06-17

Towards Fast GNN Surrogates for CO2 Migration in Complex Geological Formations

arXiv:2606.17180v1 Announce Type: new Abstract: This chapter discusses how a data-driven machine learning approach can reproduce key aspects of the physical behavior of multiphase flows in complex geological formations. We propose an end-to-end graph neural surrogate tailored to CO$_2$ plume migration forecasting in geological storage. The method is evaluated on the SPE11A benchmark, a well-known industry test case designed to assess CO$_2$ storage scenarios and characterized by sharp gas-water interfaces, strong advective transport, and rapid convective mixing with fingering development. The benchmark is reformulated as a graph in which nodes represent computational cells and edges encode transmissibility-based interactions enriched with geometric attributes. Directional transport arising from grid geometry, permeability contrasts, and geological heterogeneity is captured through an anisotropic message-passing mechanism, where interaction weights are computed via geometry-conditioned edge embeddings, biasing message aggregation toward physically relevant transport directions. Temporal evolution is modeled in latent space using an autoregressive residual formulation trained with multi-step supervision. The proposed model produces competitive forecasts of gas saturation and liquid-phase density, which are key indicators for CO$_2$ storage monitoring, with cumulative errors that remain moderate over extended forecasting horizons.

05.
arXiv (CS.CV) 2026-06-16

HAFMat: Hybrid Priors Guided Adaptive Fusion for Single-Image Human Material Estimation

Physically based rendering (PBR) material estimation is a fundamental appearance decomposition task with broad applications in virtual content creation, relighting, and digital human rendering. However, estimating PBR materials from a single human image remains highly ill-posed, since illumination, geometry, and reflectance are heavily entangled in the observed appearance. To mitigate this ambiguity, we propose HAFMat, a hybrid-prior-guided framework for single-image human material estimation. Our method introduces guidance maps that encode complementary cues, including appearance, body geometry, structure, and prior material predictions from pre-trained models. A key observation is that these guidance cues are heterogeneous: some cues mainly provide texture-level constraints, while others convey higher-level semantic information. To exploit this property, we design a Multi-layer Adaptive Feature Fusion Mechanism, which adaptively fuses guidance features with decoder features at different stages. This design enables texture-dominant and semantic-dominant cues to guide material decoding at appropriate levels, leading to more accurate and physically plausible material estimation. Extensive experiments on both synthetic and real data demonstrate that our method achieves state-of-the-art performance in material estimation and downstream relighting.

07.
arXiv (CS.CV) 2026-06-15

Generation of Maximal Snake Polyominoes Using a Deep Neural Network

Maximal snake polyominoes are difficult to study numerically in large rectangles, as computing them requires the complete enumeration of all snakes for a specific rectangle size, which corresponds to a brute force algorithm. This hinders the study of maximal snakes in larger rectangles. Moreover, most enumerable snakes lie in small rectangles, obscuring large-scale patterns. In this paper, we investigate the contribution of a deep neural network to the generation of maximal snake polyominoes from a data-driven training, where the maximality and adjacency constraints are not encoded explicitly, but learned. To this extent, we experiment with a denoising diffusion model, which we referred as Structured Pixel Space Diffusion (SPS Diffusion). We find that SPS Diffusion generalizes from small rectangles to larger ones, generating valid snakes up to 28x28 squares and producing maximal snake candidates on squares close to the current computational limit. The model is, however, prone to errors such as branching, cycles, or multiple snake components. Overall, the diffusion model is promising and suggests that complex combinatorial objects can be understood by deep neural networks, which is useful in their investigation.

08.
arXiv (CS.CV) 2026-06-11

Continual Learning with Support Boundary Experience Blending

Continual learning (CL) seeks to mitigate catastrophic forgetting when models are trained with sequential tasks. A common approach, experience replay (ER), stores past exemplars but only sparsely approximates the data distribution, yielding fragile and oversimplified decision boundaries. We address this limitation by introducing Support Boundary Data (SBD), generated via differential-privacy-inspired noise into latent features to create boundary-adjacent representations that implicitly regularize decision boundaries. Building on this idea, we propose Experience Blending (EB), a framework that jointly trains on exemplars and SBD through a dual-model aggregation strategy. EB has two components: (1) latent-space noise injection to generate support boundary data, and (2) end-to-end training that jointly leverages exemplars and SBD. Unlike standard experience replay, SBD enriches the feature space near decision boundaries, leading to more stable and robust continual learning. Extensive experiments on CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet1K demonstrate consistent accuracy improvements of 10%, 6%, 13%, 2%, respectively.

09.
arXiv (CS.CL) 2026-06-16

Agentic Reinforcement Learning for Search Misaligns Instruction-Tuning

Agentic reinforcement learning (RL) trains large language models to use tools, but its impact on alignment is poorly understood. We study how agentic RL for search affects the alignment of instruction-tuned (IT) models. We find that RL-trained models inherit refusal reasoning by deflecting harmful requests into benign search queries, but this breaks down under a simple diagnostic trigger that elicits a search call before refusal can occur. Under this condition, RL models produce multi-step unsafe search actions and reasoning, reducing search query safety by up to 68.6% in Qwen and Llama models relative to their IT counterparts. The effect generalises across model families, scales, and RL algorithms. To understand why, we identify linear directions in the residual stream that control search query safety, and show that RL training progressively shifts search behaviour toward the harmful end of this direction. We thus propose representation-guided RL training, which adds a reward penalty based on projection toward the harmful search direction. Training on benign data alone, it restores IT-level alignment without reducing task accuracy and requires no additional training data. Together, our work provides the first framework for diagnosing, mechanistically analysing, and mitigating alignment degradation in agentic RL for search.

10.
arXiv (CS.CV) 2026-06-11

Brain-IT-VQA: From Brain Signals to Answers

Decoding visual content from fMRI signals recorded while a person views images, and specifically answering questions about the seen images, is a long-standing challenge. While significant progress has been made in recent years in visual question answering (VQA) from fMRI, performance remains limited. Moreover, although recent models can make increasingly accurate predictions, they have rarely been used as tools for understanding the structure of visual representations in the brain. We present Brain-IT-VQA, a framework for visual question answering from fMRI. Building on the Brain Interaction Transformer (Brain-IT), our method decodes language tokens from brain activity and integrates them with a language model to answer visual questions. Our model substantially outperforms previous fMRI-based captioning and VQA approaches. We further introduce NSD-VQA, a new dataset and benchmark for visual question answering from fMRI. Unlike existing image-fMRI VQA datasets, which typically provide only a few broad and weakly controlled questions per image, NSD-VQA provides on average 20 question-answer pairs per image across 20 controlled question categories that disentangle multiple levels of visual understanding. This enables more reliable and interpretable evaluation despite limited fMRI test data. Together, Brain-IT-VQA and NSD-VQA provide both a strong predictive framework and a tool for studying brain representations. Using this benchmark, we quantify which forms of visual and semantic information can be reliably decoded from fMRI responses to natural images. We further analyze the contributions of different brain regions across question types.

11.
arXiv (CS.LG) 2026-06-24

Similarity of Neural Network Representations in Superposition

arXiv:2604.00208v2 Announce Type: replace Abstract: Comparing internal representations is a central goal in neuroscience and machine learning, but standard linear alignment metrics (Representational Similarity Analysis, Centered Kernel Alignment, and linear regression) are frequently applied to neural activity coordinates rather than on the underlying features. We show this matters when neural systems operate in superposition, encoding more features than they have neurons via linear compression. Closed-form derivations prove that these metrics depend on the Gram matrices of each system's projection, not on the latent features themselves: alignment thus combines what a system represents with how it is encoded. For those interested in what features two systems share, this is a problem: Two networks can have identical feature content yet appear more dissimilar than networks exhibiting partial feature overlap. This apparent misalignment need not reflect lost information as compressed sensing guarantees sparse features remain recoverable from the compressed activity. We confirm this by training supervised TopK sparse autoencoders that realize solvable compressed sensing by construction, finding alignment on recovered latents restored even when raw-activation alignment remains deflated. We extend the result to unsupervised SAEs trained without ground-truth latents, and to pretrained vision and language model SAEs, where SAE-latent alignment exceeds raw-activation alignment, consistent with superposition in real systems.

12.
arXiv (CS.CL) 2026-06-15

Trusted Uncertainty in Large Language Models: A Unified Framework for Confidence Calibration and Risk-Controlled Refusal

Deployed language models must decide not only what to answer but also when not to answer. We present UniCR, a unified framework that turns heterogeneous uncertainty evidence including sequence likelihoods, self-consistency dispersion, retrieval compatibility, and tool or verifier feedback into a calibrated probability of correctness and then enforces a user-specified error budget via principled refusal. UniCR learns a lightweight calibration head with temperature scaling and proper scoring, supports API-only models through black-box features, and offers distribution-free guarantees using conformal risk control. For long-form generation, we align confidence with semantic fidelity by supervising on atomic factuality scores derived from retrieved evidence, reducing confident hallucinations while preserving coverage. Experiments on short-form QA, code generation with execution tests, and retrieval-augmented long-form QA show consistent improvements in calibration metrics, lower area under the risk-coverage curve, and higher coverage at fixed risk compared to entropy or logit thresholds, post-hoc calibrators, and end-to-end selective baselines. Analyses reveal that evidence contradiction, semantic dispersion, and tool inconsistency are the dominant drivers of abstention, yielding informative user-facing refusal messages. The result is a portable recipe of evidence fusion to calibrated probability to risk-controlled decision that improves trustworthiness without fine-tuning the base model and remains valid under distribution shift.

13.
medRxiv (Medicine) 2026-06-22

Integration of lung tissue proteomics and genome-wide association data to identify lung cancer susceptibility proteins and potential drug targets

Background: Proteins directly impact disease development and act as drug targets. Therefore, we integrated genomic and lung tissue proteomics data to identify lung cancer susceptibility proteins, elucidating genetic mechanisms and candidate drug targets. Method: We profiled the proteome and genome in non-neoplastic lung tissue from 200 lung cancer patients. Using this data, we constructed genetic models to predict abundance across the proteome in lung tissue. We applied these models to genome-wide association study (GWAS) data from 55,174 lung cancer cases and 1,294,174 controls to evaluate their associations with the risk of lung cancer, overall and by major histological subtypes. Bayesian colocalization and Mendelian randomization (MR) analyses were used to prioritize putative causal proteins, which were cross-referenced with three main drug-protein databases to identify potential therapeutic targets. Results: We identified 29 proteins associated with lung cancer risk at a false discovery rate < 5%, including 25 for overall lung cancer, two (AQP3 and IL18) specifically for adenocarcinoma, and another two (HMGN2 and HLA-DMB) for squamous cell carcinoma. Of them, genes encoding 17 proteins reside at least 2Mb away from any known GWAS risk loci, including 14 for overall lung cancer (HYI, GPX1, GMPPB, DSP, HDDC2, MTCH2, SUOX, JMJD7, PDIA3, IL16, IQGAP1, SULT1A2, ARHGAP27, and TYMP) and three for subtypes (AQP3, IL18, and HMGN2). Among the 12 proteins located within the known risk loci, EPHX2, CLDN18, PSMD5, and CYP2S1 proteins showed an association independent of the proximal GWAS-identified lead variant. Colocalization and/or MR analysis suggested 11 potential causal proteins. Five of these candidate causal proteins (DSP, CLDN18, IQGAP1, IL18 and TYMP) are targeted by nine drugs already approved by the FDA or in phase III trials. Conclusion: Our study identified novel lung cancer susceptibility proteins and potential drug targets, offering valuable insights into lung cancer biology and future translational utilities.

14.
arXiv (CS.LG) 2026-06-24

Flow-Corrected Thompson Sampling for Non-Stationary Contextual Bandits

arXiv:2606.23933v1 Announce Type: cross Abstract: We study non-stationary linear contextual bandits where the reward model drifts over time, rendering classical contextual bandit algorithms brittle because historical data becomes systematically biased. We propose Flow-Corrected Thompson Sampling (fcTS), a Bayesian method that reuses experience by transporting past rewards to the present using an explicit drift model and incorporating each transported observation with a confidence weight that reflects transport reliability. This yields a unified template that specializes in (i) linear parameter drift via online slope estimation and reward correction, (ii) periodic variation via phase-aware reuse across cycles, and (iii) recurring regime switches via changepoint detection and regime-specific posterior memory. The resulting posterior updates remain closed-form under a linear Gaussian model and can be implemented efficiently with truncated, incrementally updated sufficient statistics. Across five controlled case studies and a semi-synthetic portfolio-selection benchmark with multiple overlapping non-stationarities, fcTS outperforms standard forgetting-based baselines (discounting, sliding windows, and periodic restarts), with the largest gains in settings exhibiting recurring temporal structure. These results demonstrate that when non-stationarity is structured, correcting and reweighting historical observations can be substantially more sample-efficient than uniformly discarding them.

15.
arXiv (CS.LG) 2026-06-12

From Parameters to Feature Space: Task Arithmetic for Backdoor Mitigation in Model Merging

arXiv:2606.12498v1 Announce Type: cross Abstract: Model merging (MM) has gained significant attention as a cost-effective approach to integrate multiple task-specific models into a unified model. However, recent work reveals that MM is highly susceptible to backdoor attacks. Existing defenses based on task arithmetic often fail to eliminate backdoors without substantially degrading clean-task performance, owing to their reliance on direct parameter-space editing. To address this gap, we propose Linear Feature Path Minimization (LFPM), a backdoor mitigation framework for model merging, which introduces an anti-backdoor task vector into the backdoored merged model. Unlike prior approaches, LFPM formulates the backdoor robustness of the merged model from a unified feature-space perspective under the Cross-Task Linearity (CTL) framework, which leverages the approximate linearity of features across tasks. This perspective guides the optimization of the anti-backdoor task to suppress backdoors while preserving clean-task performance. Furthermore, we introduce an effective optimization mechanism based on gradient accumulation and loss path-integral, ensuring robust backdoor suppression along the interpolation path. Extensive experiments demonstrate that LFPM consistently exhibits strong robustness against backdoor attacks in both full fine-tuning and Parameter-Efficient Fine-Tuning (PEFT) settings.

16.
arXiv (CS.AI) 2026-06-16

Overcoming the Impedance Mismatch: A Theoretical Roadmap for Fusing Foundation Models and Knowledge Graphs

arXiv:2606.15656v1 Announce Type: new Abstract: Modern artificial intelligence remains fundamentally divided between the continuous, probabilistic spaces of Foundation Models and the discrete, deterministic structures of Knowledge Graphs. While Retrieval-Augmented Generation (RAG) attempts to connect them by serializing graph data into text, we argue this lexical bridging is merely a superficial patch. In this paper, we formalize the underlying structural and geometric friction as the Impedance Mismatch. By categorizing current neuro-symbolic integration strategies into a three-tiered hierarchy, we demonstrate that neither surface-level prompt injection nor continuous representation alignment can preserve the strict logical motifs required for reliable multi-hop reasoning. We define the specific mathematical limits, such as the Lexical Bottleneck and Topological Collapse, that show current architectures will eventually hallucinate or conflate semantic nodes. To achieve true semantic fusion, we propose a rigorous theoretical roadmap. We advocate for natively internalizing discrete symbolic structures through Structured Residual Streams, utilizing Vector Symbolic Architectures for latent sub-graph injection, and performing model updates via Orthogonal Subspace Editing. This actionable framework paves the way for models that seamlessly fuse the precision of symbolic logic with the expressivity of parametric memory.

17.
arXiv (CS.CL) 2026-06-19

StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs

Multimodal large language models (MLLMs) are increasingly deployed in personally and societally consequential settings, yet the visual cues that shape how these models judge people remain poorly understood. Prior work often compares different (groups of) individuals, making it difficult to separate appearance effects from identity differences. We introduce StylisticBias, a controlled benchmark for evaluating attribute-level social bias in MLLMs. We generate 500 photorealistic base faces and create about 50 single-attribute variations per face, producing about 25K images. This design keeps identity fixed and changes one visual attribute at a time. It lets us measure how specific cues shift model judgments. We evaluate six MLLMs across 25 binary social judgment scenarios. We find that age and body type dominate identity-level effects, while fashion style and other visual cues drive the largest attribute-level shifts. We further find that about 15 attributes account for nearly 80\% of the total variation, showing that bias is concentrated in a small set of visual cues. Sensitivity is strongest in judgments that are semantically aligned with appearance, especially socioeconomic and style-related judgments. We release StylisticBias as a benchmark for fine-grained bias evaluation in multimodal models. Code and dataset: https://github.com/timo-cavelius/StylisticBias and https://hf.co/datasets/shaghayegh/stylistic-bias-dataset.

18.
arXiv (CS.AI) 2026-06-12

Select and Improve: Understanding the Mechanics of Post-Training for Reasoning

arXiv:2606.13125v1 Announce Type: cross Abstract: Reinforcement learning has rapidly emerged as a key component in the training of reasoning and coding models, yet it remains poorly understood from a mechanistic perspective. We study how and through what underlying processes capabilities are acquired or enhanced via reinforcement learning post-training. Our analysis, based on controlled math reasoning experiments with Qwen-2.5-1.5B, reveals two core mechanisms: strategy selection and strategy improvement. Our results highlight the role of SFT data and reinforcement learning data in activating these mechanisms, in particular showing how supervising the model on diverse reasoning strategies can enable strategy selection and how increasing difficulty in reinforcement learning data can enable strategy improvement. Taken together, our results provide mechanistic insight into RL training and suggest practical interventions to continue scaling reasoning capabilities.

19.
arXiv (CS.CL) 2026-06-24

CAVEWOMAN: How Large Language Models Behave Under Linguistic Input and Output Compression

"Talk short. Drop grammar. Save token." This caveman style is widely promoted as a way to cut inference cost, but whether it actually saves anything depends on which channel (the user's prompt or the model's response) is being compressed. We present Cavewoman, a two-channel evaluation protocol that scores every generation on task accuracy, realized per-item cost, and reference-text agreement against the model's unconstrained reference. We evaluate eight models on five datasets at five reduction levels, with both channels measured on the same items. Output compression cuts realized cost on most API models (1.4-2.4x per model, up to 3x in the best case) and on all four open-weight models under public-tier pricing. Input compression has the opposite effect, a strict lose-lose: it raises net cost rather than lowering it (~1.15x on the five-benchmark mean, up to 1.8x on the worst dataset and 2.7x under stronger compression), because models compensate with longer responses even as accuracy collapses. Under the same setting, surface text diverges from the unconstrained reference: on the non-reasoning models, roughly half of all generations are correct yet their surface text no longer entails the model's own unconstrained baseline generation. The divergence survives length-controlled re-scoring, multiple-comparisons correction, and replication under complementary semantic measures. Code and data are available at https://github.com/danielle34/cavewoman.

20.
medRxiv (Medicine) 2026-06-18

Distinct Neuronal, Proliferative, and Secretory Pathways are Perturbed in Cancer Survivors with Depressive Symptoms

Introduction Depression is highly prevalent among cancer survivors and may be biologically distinct, although clinical studies investigating these mechanisms remain limited. Thus, the aims of this study were to (1) identify perturbed biological pathways associated with depressive symptom severity in cancer survivors, and (2) investigate whether these pathways are common or distinct to those perturbed in an age-matched non-cancer cohort. Methods We analyzed cross-sectional self-reported and transcriptomic data from the Multi-Ethnic Study of Atherosclerosis (PHD #39341). Cancer survivors and an age-matched non-cancer cohort (target ratio 1:2) were identified. The 20-item Center for Epidemiologic Studies Depression Scale (CES-D) was used to split participants into low (CES-D

21.
arXiv (CS.CV) 2026-06-24

A Geometry-Informed Computer Vision Method for Detecting and Examining Overtaking Vehicles From A Bicycle

Instrumented bicycle studies have produced direct field evidence on vehicle passing behavior, but extracting overtaking events from continuous rear-facing video has remained dependent on manual, frame-by-frame annotation. This bottleneck constrains sample sizes and limits naturalistic cycling safety research. We present a geometry-informed computer vision pipeline that automates overtaking event detection from a single bicycle-mounted camera without multi-sensor configurations or explicit camera calibration. The system combines RT-DETR object detection with ByteTrack multi-object tracking through a three-stage geometric validation module enforcing bearing angle trend, apparent size growth, and spatial confirmation criteria derived from perspective projection principles. Validated on 315 manually annotated real-world overtaking events from urban roads in Ann Arbor, Michigan, the pipeline achieved 97.8% recall with zero false positives. The system identified overtaking intentions a mean of 2.44 seconds before vehicle passage, with 84.1% of events exceeding the 1.5-second human reaction time threshold, demonstrating feasibility for active cyclist warning. Lateral passing distance measurements from 96 events revealed 33.3% of passes below the 5-foot (152.4 cm) threshold, consistent with non-compliance rates in prior field and self-reported studies. A preliminary calibration-free lateral distance estimation approach using bounding box geometric features achieved mean absolute errors of 13-14 cm under leave-one-out cross-validation, sufficient to distinguish close passes from standard passes for safety categorization. By automating event isolation from consumer-grade footage, the system removes the primary annotation bottleneck of instrumented bicycle research and provides a scalable foundation for vehicle-bicycle interaction analysis across larger datasets and diverse urban environments.

22.
arXiv (CS.LG) 2026-06-11

A Multi-Modal Sensor Fusion Instrument for Measuring Regional Human Mobility: The Distributed Human Data Engine (DHDE)

arXiv:2603.21639v2 Announce Type: replace-cross Abstract: Accurately estimating human mobility in peripheral regional economies presents a fundamental measurement challenge: physical ground-truth sensors are sparse, behavioral intent signals are heterogeneous, and environmental friction introduces systematic bias into demand inference. We introduce the Distributed Human Data Engine (DHDE), a multi-modal sensor fusion architecture that addresses this challenge by integrating physical instrumentation (Edge-AI cameras), digital intent signals (route search impression metrics), behavioral records (90,350 spending records, 97,719 standardized survey responses), and meteorological data across four geographically distributed nodes in Fukui, Japan. The primary measurement-science contribution is the design, deployment, and cross-node validation of the DHDE as a sparse-sensor compensation instrument: a heterogeneous sensor fusion architecture that anchors non-stationary digital intent signals to concurrent physical ground-truth counts, correcting for systematic bias introduced by meteorological planning friction. The instrument is implemented as an ensemble inference pipeline (Random Forest and Ordinary Least Squares with Newey-West robust inference), calibrated across 397 daily observations and validated by chronological holdout replication across four geographically distinct node types. The primary OLS specification achieved an in-sample explanatory power of R2 = 0.810 and a chronological out-of-sample predictive performance of R2 = 0.683. Results identify an Under-Vibrancy Paradox where macro-regional visitor satisfaction correlates positively with crowd density (Spearman rank correlation rs = +0.150, p = 0.002). We estimate an annual proxy gap of 865,917 intent-implied visits, corresponding to JPY 11.96 billion (USD 72.6 million) in foregone revenue.

23.
arXiv (CS.AI) 2026-06-16

Task-guided cross-subject latent alignment: a multi-encoder-decoder VAE

arXiv:2606.15989v1 Announce Type: cross Abstract: Aligning neural activity across subjects offers the promise of discovering shared computational principles and generalizable decoders. However, traditional alignment methods require shared stimuli across subjects, a constraint that limits applicability to naturalistic paradigms with limited or non-overlapping data. We introduce a Multi-Encoder-Decoder Variational Autoencoder (MED-VAE) that achieves cross-subject alignment without shared stimuli by anchoring representations to a common scaffold provided by a pretrained ANN. Using the Natural Scenes Dataset, we show that MED-VAE creates common latent spaces with superior semantic organisation, achieving higher cross-subject alignment than common methods while maintaining robust generalisation to held-out stimuli where traditional methods degrade. Reconstructing from these common spaces back to each subject's original neural space, MED-VAE preserves equal stimulus-driven signal in its cross-subject latent space. Finally, we show that this superior alignment directly enables cross-subject neural prediction, as demonstrated via cross-subject image decoding. In summary, we introduce a framework to identify generalisable common subspaces for cross-subject predictions and downstream tasks, demonstrated here for visual cortex responses to static images.

24.
arXiv (CS.LG) 2026-06-19

Prior-Informed Flow Matching for Graph Reconstruction

arXiv:2601.22107v2 Announce Type: replace Abstract: We introduce Prior-Informed Flow Matching (PIFM), a conditional flow model for graph reconstruction. Reconstructing graphs from partial observations remains a key challenge; classical embedding methods often lack global consistency, while modern generative models struggle to incorporate structural priors. PIFM bridges this gap by integrating embedding-based priors with continuous-time flow matching. Grounded in a permutation equivariant version of the distortion-perception theory, our method first uses a prior, such as GraphSAGE or node2vec, to form an informed initial estimate of the adjacency matrix based on local information. It then applies rectified flow matching to refine this estimate, transporting it toward the true distribution of clean graphs and learning a global coupling. Experiments on different datasets demonstrate that PIFM consistently enhances classical embeddings, outperforming them and state-of-the-art generative baselines in reconstruction accuracy.

25.
arXiv (CS.CV) 2026-06-16

LOCUS: Local Visual Cue Search for Enhancing Fine-Grained Perception in Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) remain unreliable on fine-grained visual perception, even when high-resolution inputs preserve the necessary local details. We identify this limitation as visual context rot: decisive evidence may exist in the full image, yet fail to be reliably selected and used amid redundant visual context. We propose LOCUS (LOcal visual CUe Search), a training framework that teaches MLLMs to internalize local evidence search through a verifiable proxy task. During training, LOCUS provides a local crop as a visual cue and optimizes the model to recover its spatial support in the full image using an IoU-based reward. The visual cue is used only during training, leaving the standard image-question inference interface unchanged. Experiments across fine-grained perception, hallucination, general understanding, and reasoning benchmarks show that LOCUS improves localization-sensitive visual understanding while preserving broad capabilities. Attention analyses further indicate stronger focus on task-relevant evidence regions, suggesting that training-time visual cue search provides an effective route to internalized fine-grained evidence selection.