Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-15

Deep Learning and Elicitability for McKean-Vlasov FBSDEs With Common Noise

arXiv:2512.14967v2 Announce Type: replace Abstract: We present a novel numerical method for solving McKean–Vlasov forward–backward stochastic differential equations (MV–FBSDEs) with common noise, combining Picard iterations, elicitability and deep learning. The key innovation involves elicitability to derive a pathwise loss function, enabling efficient training of neural networks to approximate both the backward process and the conditional expectations arising from common noise, without requiring computationally expensive nested Monte Carlo simulations. The mean-field interaction term is parameterized via a recurrent neural network trained to minimize an elicitable score, while the backward process is approximated through a hybrid feedforward and recurrent network representing the decoupling field. We validate the algorithm on a systemic-risk inter-bank borrowing and lending model, where analytical solutions exist, demonstrating accurate recovery of the true solution. We further extend the model to quantile-mediated interactions, showcasing the flexibility of the elicitability framework beyond conditional means or moments. Finally, we apply the method to a non-stationary Aiyagari–Bewley–Huggett economic growth model with endogenous interest rates, illustrating its applicability to complex mean-field games without closed-form solutions.

02.
arXiv (CS.LG) 2026-06-16

Bayesian Tensor Decomposition with Diffusion Model Prior

arXiv:2606.03212v2 Announce Type: replace Abstract: Low-rank tensor decomposition (TD) is usually effective on clean, fully observed data, but it often degrades under severe missingness or noise. Low-rankness is itself a useful but limited structural prior, and additional handcrafted priors (e.g., sparsity or smoothness) still fall short of capturing the rich statistics of real-world data. To compensate for this weak inductive bias under heavy corruption, one would like to inject a learned, data-driven prior; however, the state-of-the-art diffusion models are not readily compatible with current TD and tractable posterior inference. To address these challenges, we introduce DiffBCP, a hybrid-prior Bayesian CP decomposition framework that couples a cumulative shrinkage process prior over the CP factors for automatic rank selection with an off-the-shelf pre-trained diffusion model as an implicit data prior on the reconstructed tensor. To make posterior inference tractable despite the coupling among the likelihood, low-rank constraint, and diffusion prior, we develop a split Gibbs sampler: CP factors admit conjugate updates, while the diffusion block is sampled via low-rank-guided denoising. A noise-adaptive coupling schedule further reduces sensitivity to hand-tuned annealing. Experiments on image inpainting and denoising, including high-resolution out-of-distribution images, show consistent gains over Bayesian, nonlinear, and plug-and-play TD baselines.

03.
arXiv (CS.LG) 2026-06-18

KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction

arXiv:2506.13196v5 Announce Type: replace Abstract: Accurate prediction of protein-ligand binding affinity is critical for drug discovery. While recent deep learning approaches have demonstrated promising results, they often rely solely on structural features of proteins and ligands, overlooking their valuable biochemical knowledge associated with binding affinity. To address this limitation, we propose KEPLA, a novel deep learning framework that explicitly integrates prior knowledge from Gene Ontology and ligand properties to enhance prediction performance. KEPLA takes protein sequences and ligand molecular graphs as input and optimizes two complementary objectives: (1) aligning global representations with knowledge graph relations to capture domain-specific biochemical insights, and (2) leveraging cross attention between local representations to construct fine-grained joint embeddings for prediction. Experiments on two benchmark datasets across both in-domain and cross-domain scenarios demonstrate that KEPLA consistently outperforms state-of-the-art baselines. Furthermore, interpretability analyses based on knowledge graph relations and cross attention maps provide valuable insights into the underlying predictive mechanisms.

04.
arXiv (CS.LG) 2026-06-15

Closed-loop discovery of out-of-distribution processing protocols by evolutionary search and uncertainty-aware learning

arXiv:2606.13859v1 Announce Type: cross Abstract: Many materials and chemical systems exhibit history-dependent responses, where functional outcomes are governed not only by final-state variables but by the time-dependent sequence of fields, temperatures, or chemical potentials applied during operation. Discovering new processing protocols is therefore a high-dimensional search problem in which the control variable is an entire waveform or sample history, and conventional strategies either remain confined to conservative interpolative families or become prohibitively measurement intensive. Here, a closed-loop workflow is introduced that couples evolutionary search over a compact waveform representation with uncertainty-aware deep kernel learning to generate, rank, and experimentally validate candidate protocols. Applied to ferroelectric thin films, with the scanning-probe tip-bias waveform as the protocol and the nonlinear electromechanical response as the reward, the workflow discovers waveform families that enhance nonlinearity by de-aging the film. Spatially resolved before/after measurements show that the best-performing waveforms selectively activate pre-existing, weakly pinned domain-wall segments, whereas the worst drive long-range irreversible switching. This framework reframes protocol tuning as out-of-distribution discovery, generalizable to synthesis and annealing trajectories, battery formation protocols, and other high-dimensional control problems.

05.
Nature (Science) 2026-06-23

Europe must seize the moment to lead on free and open science

作者: 未知作者

An under-appreciated research powerhouse, Europe has a responsibility to champion democratic science that is accessible to all the world’s research talent. An under-appreciated research powerhouse, Europe has a responsibility to champion democratic science that is accessible to all the world’s research talent.

06.
arXiv (CS.AI) 2026-06-16

Fine-Tuning a 7B Advisor on Free-Tier GPUs: An Adapter-Handoff Recipe and a Synthetic-Data Reliability Caution

arXiv:2504.15610v4 Announce Type: replace Abstract: Fine-tuning a 7B language model for specialized advising is attractive in resource-constrained settings, but multi-epoch runs routinely exceed the wall-clock limits of the free-tier GPUs (Kaggle, Colab) such users rely on. We report two things. First, a practical recipe: a three-epoch QLoRA fine-tune of Mistral-7B-Instruct-v0.3 (4-bit NF4, LoRA rank 16, via Unsloth) completed across two free-tier 16 GB GPUs (Tesla P100 then T4) by checkpointing only the small LoRA adapter (41.9M parameters) and resuming on the second machine. Adapter-only handoff is sufficient – optimizer and scheduler state need not be transferred – so the binding constraint is per-step VRAM and per-session wall-clock, not aggregate compute. Second, and more importantly, an honest evaluation that returns a cautionary result. On a blind held-out comparison against the un-fine-tuned base model, the fine-tuned model scored higher on similarity to the synthetic training distribution (BERTScore F1 +0.063, a fidelity not quality signal) but lower on advising quality: a blind LLM-as-judge preferred the base model on 46% of prompts versus 18%, and a source-verified factuality audit found four confident errors from the fine-tuned model on policy-sensitive topics against zero for the base. Auditing the training data with the same method, we find this is not a fine-tuning artifact: each audited error is already present in the Gemini-generated training answers, and a random-sample audit finds verifiable errors in a sizable fraction of responses (28-40%; single-judge, n=40). The data is therefore sufficient to account for the errors, which we attribute to the synthetic-data pipeline rather than the adapter-handoff method. We release the dataset, adapter, cross-GPU notebooks, and full evaluation harness so every result reproduces on a single 16 GB GPU.

07.
arXiv (CS.CL) 2026-06-12

NOVA: NOise-aware Verbal Confidence CAlibration for Robust Large Language Models in RAG Systems

Accurately assessing model confidence is essential for deploying large language models (LLMs) in mission-critical factual domains. While retrieval-augmented generation (RAG) is widely adopted to improve grounding, confidence calibration in RAG settings remains poorly understood. We conduct a systematic study across four benchmarks, revealing that LLMs exhibit poor calibration performance especially when noisy contexts are retrieved. Specifically, contradictory or irrelevant evidence tends to exacerbate the model's overconfidence issue. To address this, we propose NOVA Rules (NOise-Aware Verbal Confidence CAlibration Rules) to provide a principled foundation for resolving overconfidence under noise. We further design NOVA, a noise-aware calibration framework that synthesizes supervision from ~2K HotpotQA examples guided by these rules. By performing supervised fine-tuning (SFT) with this data, NOVA equips models with intrinsic noise awareness without relying on stronger teacher models. Empirical results show that NOVA yields substantial gains, improving ECE scores by 10.9% in-domain and 8.0% out-of-domain. By bridging the gap between retrieval noise and verbal calibration, NOVA paves the way for both accurate and epistemically reliable LLMs.

08.
arXiv (CS.CV) 2026-06-11

Continual Learning with Support Boundary Experience Blending

Continual learning (CL) seeks to mitigate catastrophic forgetting when models are trained with sequential tasks. A common approach, experience replay (ER), stores past exemplars but only sparsely approximates the data distribution, yielding fragile and oversimplified decision boundaries. We address this limitation by introducing Support Boundary Data (SBD), generated via differential-privacy-inspired noise into latent features to create boundary-adjacent representations that implicitly regularize decision boundaries. Building on this idea, we propose Experience Blending (EB), a framework that jointly trains on exemplars and SBD through a dual-model aggregation strategy. EB has two components: (1) latent-space noise injection to generate support boundary data, and (2) end-to-end training that jointly leverages exemplars and SBD. Unlike standard experience replay, SBD enriches the feature space near decision boundaries, leading to more stable and robust continual learning. Extensive experiments on CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet1K demonstrate consistent accuracy improvements of 10%, 6%, 13%, 2%, respectively.

09.
medRxiv (Medicine) 2026-06-18

Intra-arterial recombinant human TNK tissue-type plasminogen activator (rhTNK-tPA) thrombolysis for acute medium vessel occlusion (MeVO-TNK): Study rationale and design

Background The optimal management of acute ischemic stroke caused by medium vessel occlusion (MeVO) remains uncertain. Recent randomized trials have failed to demonstrate a clear benefit of endovascular therapy in this population, whereas intra-arterial thrombolysis (IAT) has emerged as a biologically plausible alternative. However, prospective evidence supporting IAT in MeVO is lacking, and the optimal dosing strategy for stand-alone IAT remains undefined. Aim To preliminarily evaluate the efficacy and safety of intra-arterial tenecteplase (IA-TNK) plus standard medical therapy (SMT) compared with SMT alone in patients with acute MeVO stroke, and to explore a stepwise IA-TNK dosing strategy. Design The MeVO-TNK trial is a multicenter, prospective, randomized, open-label, blinded-endpoint (PROBE), exploratory phase II study. A total of 60 participants with imaging-confirmed MeVO will be randomized 1:1 to receive either IA-TNK plus SMT or SMT alone. Participants presenting beyond 6 hours from symptom onset must demonstrate salvageable penumbral tissue on advanced imaging. Those assigned to the intervention group will receive up to two intra-arterial boluses of tenecteplase (0.0625 mg/kg per bolus), with the second bolus administered based on angiographic assessment of reperfusion and safety. Outcomes The primary efficacy outcome is final infarct volume measured at 72{+/-}24 hours after randomization. Secondary efficacy outcomes include the proportions of patients achieving modified Rankin Scale (mRS) scores of 0-1, 0-2 and 0-3 at 90 days, a shift analysis of the mRS distribution at 90 days, early neurological deterioration, and National Institutes of Health Stroke Scale score at 7 days or discharge. The primary safety outcome is symptomatic intracranial hemorrhage within 24 hours. Conclusions This trial will provide preliminary evidence on the biological efficacy, reperfusion potential and safety of stand-alone IA-TNK for acute MeVO stroke, helping to address an important evidence gap and inform the design of future confirmatory studies.

10.
arXiv (CS.AI) 2026-06-16

Z-Plane Neural Networks: Bounded Geometric Activation Replaces ReLU and LayerNorm

arXiv:2606.15669v1 Announce Type: cross Abstract: Modern deep neural networks rely on Euclidean scalar activations (e.g., ReLU) and global normalization techniques (e.g., LayerNorm) to prevent gradient instability in deep architectures. However, these mechanisms inherently cause dead neurons, discard critical directional information, and destroy the orthogonality of feature representations. Inspired by the frequency-modulation transmission of biological axons, we propose the Z-Plane Neural Network, which maps hidden states into 2D phasor bundles on a hypersphere. We introduce a novel geometric activation function, Radial Bounding($\mathbf{x} / \max(1, \|\mathbf{x}\|_2)$), which limits the energy magnitude while preserving the phase (direction). We demonstrate mathematically that this isotropic activation maintains 1-Lipschitz continuity and prevents gradient vanishing by preserving tangential gradients. Empirically, a 100-layer Z-Plane Multi-Layer Perceptron (MLP)-entirely devoid of ReLU and LayerNorm-successfully converges on the MNIST dataset with 98.34% accuracy and absolute numerical stability, proving that bounded geometric activation alone is sufficient for stable deep learning.

11.
arXiv (CS.CV) 2026-06-11

Beyond Dark Knowledge: Mixup-Based Distillation for Reliable Predictions

Knowledge Distillation (KD) and mixup have proven effective at inducing smoothness in class boundaries; KD captures inherent class relationships in probability distributions, and mixup enforces them through convex combinations of inputs. Their interaction, however, remains poorly understood, particularly when mixup is applied only during student training. In this setting, the teacher is queried on inputs drawn from a vicinal distribution it never saw during training, a controlled mismatch whose effect on knowledge transfer has not been characterised. We show that this mismatch causes the teacher's supervisory signal to be dominated by distributional confusion rather than inter-class structure. Despite it, the student does not merely imitate the teacher: it independently acquires greater linearity in the vicinal region, a structural property that the teacher lacks, and goes beyond dark-knowledge transfer. KD with mixup consistently improves student accuracy and reduces overconfidence by an order of magnitude relative to the baseline, across CIFAR and ImageNet with varying-capacity teachers. Crucially, calibration propagates from teacher to student independently of accuracy transfer, and temperature scaling governs a measurable accuracy-calibration trade-off that becomes more pronounced under vicinal training. These results reframe mixup distillation not as a degraded version of standard KD, but as a richer transfer channel that simultaneously shapes discriminative performance, uncertainty estimation, and representational geometry.

12.
arXiv (CS.CV) 2026-06-15

Fusion of Pervasive RF Data with Spatial Images via Vision Transformers for Enhanced Mapping in Smart Cities

In this paper, we present a deep learning-based approach that integrates the DINOv2 architecture to improve building mapping by combining (possibly erroneous) maps from open-source platforms with pervasive radio frequency (RF) data collected from multiple wireless user equipments and base stations. Unlike prior methods, our approach leverages a vision transformer-based architecture to jointly process both RF and map modalities within a unified framework, effectively capturing spatial dependencies and structural priors for enhanced mapping accuracy. For the evaluation purposes, we employ a synthetic dataset co-produced by Huawei. To address the challenges associated with real-world data imperfections, we introduce controlled noise to its RF data so as to simulate real-world conditions. Additionally, we develop and train a model that leverages only aggregated path loss information to tackle the mapping problem. We measure the results according to three performance metrics: the Jaccard index (intersection over union, IoU), the Hausdorff distance, and the Chamfer distance. Our design achieves a macro IoU of 65.3%, significantly surpassing (i) the erroneous maps baseline, which yields 40.1%, (ii) an RF-only method from the literature, which yields 37.3%, and (iii) a non-AI fusion baseline that we designed which yields 42.2%. The comparative evaluation highlights the limitations of relying solely on RF data or on spatial data, as well as the effectiveness that AI can have on fusing data towards enhancing smart city mapping accuracy. We further validate our method on real-world data from the Oslo region, complementing the synthetic evaluation with a real deployment setting, where our best fusion model reaches 64.9% macro IoU. We additionally outline a strategy for deploying the model over larger areas by tiling the region with overlapping windows.

13.
arXiv (CS.LG) 2026-06-18

Seed-Guided Semi-Supervised Clustering by A-Contrario Anomaly Detection

arXiv:2606.18833v1 Announce Type: new Abstract: This paper introduces a semi-supervised clustering framework grounded in the statistical duality between grouping principles and anomaly detection. We address the challenge of robust cluster definition in noisy environments – a task where partitioning algorithms often over-assign outliers and density-based methods remain sensitive to heuristic global parameters. Drawing on a-contrario statistical reasoning and Gestalt proximity principles, we define a cluster as a maximal subset of data points containing no anomalies relative to a null hypothesis of uniform randomness. Central to this approach is the Perception algorithm, which utilises a principled expectation-based threshold ($\mathbb{E} < 1$) to identify outliers without manual parameter tuning. By treating clustering as the dual of anomaly detection, we employ an iterative ``clustering-by-exclusion'' mechanism. The algorithm is seed-guided, leveraging minimal user-provided labels to initialise robust cluster medians and form initial groups, which are subsequently expanded by admitting non-anomalous points. This approach naturally isolates fringe points, isolated noise, and emerging unknown clusters. We evaluate the method on synthetic and real-world benchmarks, including image and text datasets represented through raw, linear-reduced, and neighbourhood-preserving embeddings. Results demonstrate that with as few as 10–30 seeds per cluster, the proposed method achieves competitive and often very strong performance under a practical low-tuning benchmarking protocol, while maintaining linear scalability with respect to both observations and dimensionality for a fixed number of seeded clusters and iterations.

14.
arXiv (CS.LG) 2026-06-12

Aerial Wildfire Suppression Planning with a Hybrid CNN-Cellular Automata Fire Model

arXiv:2606.13633v1 Announce Type: cross Abstract: Aerial wildfire suppression requires not only predicting fire spread, but also designing effective intervention strategies under operational and environmental uncertainty. We present a modeling and optimization framework for aerial wildfire suppression that combines a hybrid neural-cellular automaton wildfire model with gradient-based design of targeted aerial drops. The wildfire model predicts spatially varying spread behavior from terrain, fuel, and wind data, while the intervention module determines binary drop actions with continuous-valued location and orientation parameters mapped to the simulation grid. Water and retardant are represented with distinct suppression effects, corresponding to immediate reduction of active burning and persistent reduction of future spread. To evaluate the robustness of the resulting suppression plans, we quantify both aleatoric uncertainty through Monte Carlo sampling of daily fire-state realizations and epistemic uncertainty through spatially correlated prediction-error perturbations. A case study based on the 2020 Bear Fire shows that the framework can generate coherent aerial suppression schedules for reducing total fire-affected area and can support uncertainty-aware analysis of wildfire intervention strategies.

15.
arXiv (CS.CL) 2026-06-16

When Cognitive Graphs Meet LLMs: BDEI Cognitive Pathways for Panic Emotional Arousal Prediction

Predicting individual panic emotional arousal timing before manifestation is essential for proactive emergency intervention. Existing methods incorporate cognitive elements but none explicitly model the emotional arousal process, making them ill-suited for emotional arousal timing prediction. We argue that grounding prediction in appraisal emotion theory is necessary because it explicitly models this process, but three problems must be solved. (1) Appraisal theory posits that emotion arises from simultaneous evaluation across multiple threat dimensions, yet no prior work fuses these inputs into risk perception. (2) Existing cognitive models lack an Emotion node, decoupling threat appraisal from emotional arousal and forcing emotions to be inferred indirectly from behaviors. (3) Given their generalizable cognitive reasoning, current approaches adopt LLMs as the primary decision-maker, yet overlook the fragility and hallucination-proneness of their outputs. To address these issues, we introduce PanicCognitivePath (PCP), a framework that addresses all three. A Psychological Safety Distance (PSD) model, grounded in psychological distance theory, maps four-domain signals into a unified risk metric as the entry condition for subsequent cognitive reasoning. An explicit Emotion node grounded in appraisal emotion theory is introduced into BDI, forming a Belief-Desire-Emotion-Intention (BDEI) pathway. Agents whose risk metric exceeds the PSD threshold enter this pathway, coupling threat appraisal directly to emotional arousal. The BDEI pathway governs all state transitions while the LLM is confined to parameter estimation for the Belief-to-Desire transition, confining hallucinations to a single step and preventing error propagation. Experiments on Hurricane Sandy show PCP improves arousal timing accuracy by 10.68% over baselines, reduces peak count error to 7.07%.

16.
arXiv (CS.AI) 2026-06-17

Quantifying Consistency in LLM Logical Reasoning via Structural Uncertainty

arXiv:2606.17312v1 Announce Type: new Abstract: Large language models can arrive at the same answer through reasoning paths that are unstable, contradictory, or difficult to rank consistently – a failure mode especially prevalent in multi-step deductive reasoning. Existing methods assess reliability primarily through output dispersion – measuring how much sampled answers differ – but this discards a complementary signal: whether the model can consistently rank competing reasoning candidates. We propose structural uncertainty, a consistency-aware framework derived from the stability of self-preference-induced rankings over sampled reasoning solutions. Given a query, we generate multiple candidate solutions and ask the model to judge pairwise preferences among its own outputs. We aggregate self-preferences into ranking distributions via Bradley-Terry modeling with PageRank, and decompose the signal into two entropy-based components: across-trial ranking instability and within-trial candidate ambiguity. Across five LLMs and eight benchmarks, structural signals provide information complementary to answer dispersion: on logical and mathematical reasoning tasks, the combination improves identification of unreliable instances, while on factual retrieval the structural signal collapses toward uniformity, diagnosing a regime boundary where reasoning-level consistency evaluation is uninformative. The two components relate differently to accuracy: within-trial ambiguity correlates positively with correctness – consistent with settings where multiple plausible solution paths remain competitive – while across-trial instability correlates negatively, signaling unreliable reasoning. Structural uncertainty is best understood not as a universal confidence estimator, but as a regime-sensitive evaluator of logical reasoning consistency.

17.
arXiv (CS.LG) 2026-06-15

BigPower: Hierarchical Source-Level Module Power Estimation for CPUs with Large Language Models

arXiv:2606.13747v1 Announce Type: cross Abstract: Accurate power estimation is important for understanding and optimizing CPU power behavior, yet practical workflows often rely on simulation-derived information or post-silicon analysis. In this work, we present BigPower, a hierarchical source-level surrogate model for fine-grained module-level power estimation during CPU design. BigPower leverages large language model-based representations together with architectural hierarchy, module connectivity, configuration parameters, and workload context to estimate module-level power consumption directly from source-level design information, without requiring additional simulation during inference. Experimental results in the open-source XiangShan processor family demonstrate practical fine-grained power estimation across diverse configurations and workloads, offering an efficient alternative to conventional simulation-based workflows.

18.
arXiv (CS.CL) 2026-06-17

PARSE: Provenance-Aware Retrieval Sanitization for Professional Domain LLM Agents

作者:

Prompt injection defenses evaluated on synthetic benchmarks do not generalize to real enterprise documents, which are longer, denser, and interleave legitimate authority language with factual content. We demonstrate this gap with a real-document benchmark of 122 tasks across five professional domains (financial, legal, medical, scientific, DevOps) using actual SEC filings, Federal Register rules, PubMed abstracts, arXiv papers, and GitHub postmortems. Paraphrasing, the strongest defense on synthetic benchmarks, shows no statistically significant attack success rate reduction on real documents (p=0.500) while degrading utility from 91.8% to 82.8%. We introduce PARSE (Provenance-Aware Retrieval Sanitization), a domain-aware, fact-preserving sanitization pipeline that classifies each sentence by injection likelihood, extracts structured facts before rewriting, and verifies fact preservation via a consistency-checking loop. A directiveness gate routes 59% of real enterprise documents to a lightweight path, concentrating computational cost on high-risk documents. PARSE achieves 15.6% attack success rate – a 38% reduction versus the 25.4% baseline – at 86.9% utility, the only condition that is both statistically significant (p=0.014, adequately powered) and maintains near-baseline utility. Practitioners should evaluate defenses on domain-matched real documents, not synthetic proxies.

19.
arXiv (CS.CV) 2026-06-11

SHERPA: Seam-aware Harmonized ERP Adaptation for Open-Domain 360$^\circ$ Panorama Generation

Panoramic imagery is increasingly used in world-generation, games, and simulation, where users may need not only photorealistic scenes but also stylized and non-photorealistic environments. Large-scale text-to-image diffusion and flow models provide broad style and semantic priors for this goal, but planar image training misaligns them with the wrap-around topology and polar regions of $360^\circ$ panoramas represented in equirectangular projection (ERP). We present SHERPA, a lightweight adaptation framework that combines frequency-selective Circular RoPE, Circular Latent Encoding/Decoding, image-side FFN adapters, and a Dual-Path Training Scheme. Circular RoPE replaces only the seam-sensitive high-frequency horizontal RoPE band with integer-periodic harmonics while preserving the pretrained lower-frequency spectrum. The Paired Panorama Path supervises geometry, while the Unpaired Style Path uses self-supervised yaw consistency for target-free stylized prompts. As a result, SHERPA generates $360^\circ$ panoramas across both photorealistic panorama domains and open-domain stylized prompts.

20.
arXiv (CS.LG) 2026-06-15

DTVEM-RE: A Hierarchical Random-Effects Extension of the Differential Time-Varying Effect Model for Person-Specific Multi-Lag Estimation in Intensive Longitudinal Data

arXiv:2606.14116v1 Announce Type: new Abstract: The Differential Time-Varying Effect Model (DTVEM) of Jacobson et al. (2019) is a popular tool for finding the best time lag in intensive longitudinal data, but it assumes everyone shares the same lag structure. The original authors named fixing this as future work, and it clashes with the premise of modern clinical research, which is that people differ. We present DTVEM-RE, an extension that lets each person have their own lag coefficients, with two versions of the confirmatory step: a discrete-time hierarchical Bayesian VAR in Stan, which pools across people and gives calibrated uncertainty, and a continuous-time per-person Ornstein-Uhlenbeck model in ctsem, which handles unevenly spaced beeps directly. We report four results. A simulation shows the Bayesian version recovers the between-person spread tau_a with bias below 0.01 and coverage of 90 to 93 percent. On the Fisher et al. (2017) EMA dataset (N=40), person-specific lag-1 effects vary by an order of magnitude across three mood items, the Bayesian and GAMM estimates agree closely (r=0.87 to 0.92), and DTVEM-RE gives the best one-step-ahead prediction among four discrete-time methods. A multi-lag version shows all nine tau_k values have credible intervals excluding zero, and the lag where people differ most changes across items, something lag-1-only methods like mlVAR cannot detect. Finally, the two versions agree almost exactly on person-specific lag-1 estimates (r >= 0.995), differing only as shrinkage predicts. DTVEM-RE is, to our knowledge, the first person-specific implementation of DTVEM-style lag detection, and it contains standard DTVEM as a special case.

21.
arXiv (CS.CV) 2026-06-18

On-Manifold Variational Learning with Heat-Kernel Priors

Learning unsupervised representations of medical imaging cohorts can reveal clinically meaningful prototypes without expert labels, which are often noisy and fail to capture true pathological heterogeneity. However, existing deep latent-variable models estimate Gaussian mixture priors via Euclidean averaging, producing prototypes that drift off the curved data manifold and degenerate as the number of sub-populations grows. We propose a manifold-anchored variational framework built on a geometry-aware Expectation-Maximization (EM) algorithm, whose M-step selects each sub-population prototype as the graph medoid with the highest diffusion centrality on a heat-kernel-weighted latent graph, ensuring that every prototype remains on-manifold. A Dirichlet energy regularizer enforces geometric smoothness of the latent space, and a per-sub-population uncertainty score enables label-free quality assessment. \rev{The manifold-anchored EM is a general-purpose geometric tool that extends standard EM and applies readily to other latent-variable models beyond this setting.} On cardiac scar and brain MRI benchmarks, our framework attains the highest accuracy among all compared methods, produces the sharpest prototypes reported to date, and remains stable at large sub-population counts where all baselines degenerate.

22.
medRxiv (Medicine) 2026-06-11

Wealth-Related Inequalities in Cesarean Section Utilization Among Facility-Based Births in Bangladesh: Evidence from Public and Private Healthcare Facilities

作者:

Background Bangladesh has experienced a rapid increase in cesarean section (CS) utilization over the past two decades. While previous studies have documented socioeconomic disparities in CS use, evidence on how wealth-related inequalities differ between public and private healthcare facilities remains limited. This study assessed the magnitude and drivers of socioeconomic inequality in CS utilization among facility-based births in Bangladesh. Methods We analyzed data from 3,008 facility-based births reported in the 2022 Bangladesh Demographic and Health Survey (BDHS). Survey-weighted multivariable logistic regression was used to identify factors associated with CS utilization. Wealth-related inequality was assessed using concentration curves and the Erreygers-corrected concentration index (ECCI). Regression-based decomposition of the standard concentration index was performed to quantify the contribution of socioeconomic, demographic, and healthcare-related factors to observed inequalities overall and separately for public and private facilities. Results Overall, 71.2% of facility-based births were delivered by CS, with substantially higher prevalence in private facilities (84.2%) than in public facilities (35.9%). Women delivering in private facilities had markedly higher odds of CS than those delivering in public facilities (adjusted odds ratio [AOR]: 9.07; 95% confidence interval [CI]: 7.17-11.47). Significant pro-rich inequality was observed overall (ECCI: 0.154; 95% CI: 0.117-0.191), with inequality substantially greater in public facilities (ECCI: 0.189; 95% CI: 0.114-0.264) than in private facilities (ECCI: 0.049; 95% CI: 0.014-0.084). Decomposition analysis showed that household wealth was the dominant contributor to inequality, particularly the richest wealth quintile, accounting for 81.5% of overall inequality, 63.8% in public facilities, and 109.7% in private facilities. Conclusions Wealth-related inequalities in CS utilization remain substantial in Bangladesh despite widespread use of the procedure. Although pro-rich inequality exists across both sectors, inequality is considerably greater in public facilities and is driven by different mechanisms across facility types. Policies should simultaneously improve equitable access to medically necessary CS and reduce unnecessary procedures, particularly within the private sector.

23.
arXiv (quant-ph) 2026-06-17

Frequency upconversion of infrared signals via molecular cavity optomechanical systems with gain

arXiv:2606.17877v1 Announce Type: new Abstract: Molecular cavity optomechanical systems have recently emerged as a promising platform for enhancing infrared detection sensitivity, owing to their ability to up-convert low-frequency infrared (IR) photons to visible frequency range. Generally, under red-detuned pumping in such systems, the ideal conversion efficiency of the IR signal approaches 1. To overcome this efficiency constraint, we propose a scheme that incorporates gain into the infrared cavity of a molecular cavity optomechanical system comprising two cavities and an ensemble of N molecules. The upconversion process, which relies on IR absorption and Raman scattering associated with specific vibrational modes, is significantly amplified by the incorporation of gain under the red-detuned conditions. Moreover, our analysis demonstrates that the added noise is maintained near 0.5.

24.
arXiv (CS.AI) 2026-06-18

Structured Cognitive Loop for Behavioral Intelligence in Large Language Model Agents (Extended Revision: From Behavioral Architecture to Epistemic Accountability)

作者:

arXiv:2510.05107v5 Announce Type: replace Abstract: The central challenge for AI agents is not only performance but accountability. Agents that act through opaque prompt sequences may produce correct outputs, but they provide little basis for verifying why an action was permitted, where an error occurred, or how responsibility should be assigned. This paper presents the Structured Cognitive Loop as an architecture for accountable behavior in large language model agents. SCL separates cognition, memory, control, and action into distinct modules. The language model proposes. External memory preserves verified state. A lightweight controller checks preconditions, prevents redundant actions, and authorizes execution before tools are used. We evaluate SCL against ReAct and common LangChain agent variants across travel planning, conditional email drafting, and constraint guided image generation. Across 360 episodes, SCL achieves 86.3 percent task success compared with 70.5 to 76.8 percent for prompt based baselines. It also improves goal fidelity, reduces redundant tool calls, increases reuse of intermediate state, and lowers unsupported assertions. This extended revision situates SCL within a broader architecture of epistemic accountability. Subsequent extensions integrate context aware Human in the Loop control, Pool Gated Retrieval, and the Horizon Warrant Commitment framework. Together these components define an agent architecture in which the model proposes, structure decides, evidence is warranted before use, and human judgment is embedded in the trace rather than imposed after the fact. The result is a foundation for AI agents whose decisions are not only effective but also authorized, inspectable, and accountable.

25.
arXiv (CS.CV) 2026-06-16

MeshLoom: Feed-Forward Non-Rigid Registration of Mesh Sequences

We present MeshLoom, a feed-forward registration network that directly reconstructs vertex deformations across mesh sequences. Our approach advances non-rigid registration beyond existing models, which are typically constrained by costly per-instance optimization, narrow object categories, pairwise-only inputs, or merely intermediate outputs. The network is simple and efficient, registering multiple meshes within seconds. At its core lies a topology-aware encoder–decoder design. Specifically, we first introduce a topology-aware point representation that encodes the anchor (reference) mesh's topology into its per-vertex features. This representation strengthens the network's understanding of the anchor-mesh geometry and disambiguates points that are Euclidean-close yet geodesically distant. We then propose a multi-modal encoder that fuses this anchor-mesh representation with complementary cues from each frame, such as shape latents and image features. These multi-source signals are compressed into a compact global motion embedding that captures dense inter-frame correspondence. A lightweight decoder then queries this global embedding with the anchor-mesh point representation, retrieving per-vertex deformations at target timestamps. Through extensive experiments across diverse motions and object categories, we show that MeshLoom achieves state-of-the-art results on non-rigid registration. In addition, we find that our global embedding-then-query paradigm naturally enables the network to generate deformations at intermediate timestamps, which extends MeshLoom to motion interpolation and mesh morphing. Project page: https://meshloom.github.io/ .