Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-19

Towards Modality-imbalanced Federated Graph Learning: A Data Synthesis-based Approach

arXiv:2606.20382v1 Announce Type: new Abstract: MultiModal Federated Graph Learning (MM-FGL) offers a natural collaborative training paradigm, but its practical deployment is challenged by two granularities of modality imbalance. Client-level imbalance occurs when certain clients lack entire modalities, while node-level imbalance occurs when individual nodes exhibit missing visual or textual attributes. While several relevant studies exist, our investigation reveals that they predominantly target graph-agnostic or centralized scenarios, rendering them difficult to adapt directly. To address these challenges, we formalize modality-imbalanced MM-FGL as an implicit graph-aware latent semantic representation synthesis problem. This paradigm recovers missing modal semantics directly within the representation space, thereby maximizing alignment with the original data's semantic distribution and mitigating the high variance induced by missing modalities. To this end, we propose FedMGS (Federated Modality-aware Graph Synthesis), which integrates three core components. The availability-aware graph encoder prevents missing modalities from contaminating local structural propagation. The prototype-guided latent semantic synthesizer establishes cross-client semantic anchors for unavailable modalities. The reliability-calibrated semantic fusion mechanism regulates the impact of recovered latent representations prior to predictive readout. Extensive experiments on four tasks show that FedMGS consistently outperforms competitive baselines with gains up to 17.41% with best efficiency-performance tradeoff.

02.
arXiv (CS.AI) 2026-06-19

GLARE: A Natural Language Interface for Querying Global Explanations

arXiv:2606.19735v1 Announce Type: new Abstract: While global explanations are crucial for understanding vision models across datasets, classes, and decision contexts, their complex and monolithic nature often hinders practical exploration. Because users typically seek targeted answers to specific questions rather than static artifacts, we present an LLM-based interactive interface that provides natural language access to global explanations for black-box image classifiers. The system's core LLM acts as a mediator, translating natural language questions into structured SQL queries over local explanation data. This enables flexible aggregation without exposing users to low-level representations. For each query, the interface outputs statistics-augmented natural language responses, supporting local explanations, and intent-aligned visualizations. We evaluate the system on intent interpretation, query mapping accuracy, generalization to novel queries and datasets, and robustness to linguistic errors. Our results demonstrate that LLM-mediated querying substantially improves the accessibility and usability of global explanations for human-centered XAI.

03.
arXiv (CS.LG) 2026-06-15

Direct/adaptive-mixture phase-gradient learning for neural-network quantum states with complex phase structure

arXiv:2606.13912v1 Announce Type: cross Abstract: Neural-network quantum states (NQS) are a leading variational tool for quantum many-body physics, yet their optimization is fragile whenever the ground state carries a non-trivial sign or complex phase structure, a situation generic to gauge fields, broken time-reversal symmetry, and fermionic statistics. We trace this fragility to the stochastic estimator of the phase gradient rather than to network expressiveness. The phase sector of the Monte Carlo energy gradient is a noisy score-function estimator; differentiating the local energy instead yields a direct estimator that is unbiased for the same phase force, has far lower variance, and requires only a separated amplitude–phase ansatz. Demonstrated on a 100-site flux ladder, a small network trained this way reaches $0.89\%$ median error, where tuned standard baselines plateau at $1.8\%$ and wider or deeper standard-gradient networks degrade from $8.4\%$ to $24.6\%$. The advantage carries over to chiral XXX chains: the direct estimator again converges to a markedly lower error than the standard one, across $\alpha$ and size; it grows with flux and vanishes in zero-flux controls. An adaptive-mixture of the two estimators is provably never worse in variance than the better endpoint at the optimal mixing coefficient, with seed-resolved diagnostics tracing much of the gain to eliminating failed runs. Estimator design thus emerges as a first-class lever for complex-valued neural quantum states.

04.
arXiv (CS.LG) 2026-06-16

Factorized Neural Operators Decompose Dynamic and Persistent Responses

arXiv:2606.16900v1 Announce Type: new Abstract: Physical systems often exhibit heterogeneous mechanisms, where rapidly evolving dynamics coexist with persistent structures. Capturing such multiscale physical behavior remains challenging for existing neural operators, which typically rely on single dominant inductive bias and therefore couple distinct physical responses into a shared representation. We introduce the Unified Green's Function Framework across domains and propose the Factorized Neural Operators (FaNO), which decompose spectral representations into equivariant dynamic responses and invariant persistent responses, leading to better interpretability and generalization. Mechanistically, we show that the two operator branches spontaneously specialize into distinct physical roles that remain consistent across scales and domains: the equivariant branch captures rapidly varying transient dynamics, whereas the invariant branch extracts coherent persistent structures. This factorized mechanism of FaNO improves prediction accuracy, parameter efficiency and cross-scale generalization across physical systems and domains. In particular, it maintains consistent predictions under long-horizon autoregressive rollout, cross-resolution extrapolation and physical-regime shifts. These findings suggest that scalable physical modeling may benefit from moving beyond single-inductive-bias formulations toward factorized operator representations that better reflect the heterogeneous organization of physical systems, accelerating the reliable deployment of machine learning for scientific computing and discovery.

05.
medRxiv (Medicine) 2026-06-12

Association of circulating endothelial progenitor cell count and functional outcome in patients with acute ischemic stroke due to intracranial large vessel occlusion

Background: Circulating endothelial progenitor cells (cEPCs) contribute to vascular repair following an ischemic stroke. The aim of the study was to evaluate the association between cEPCs and functional outcomes in patients with acute ischemic stroke (AIS) due to large vessel occlusion (LVO) who received endovascular therapy (EVT). Methods: Prospective study of patients with LVO-AIS who received EVT. Blood samples were obtained within 24 +- 12 hours and on day 7+-1 from stroke onset. cEPCs were detected using flow cytometry (CD34+/VEGFR2+/CD133+). The primary endpoint was a favourable functional outcome (modified Rankin Scale 0-2) at three months of follow-up. Secondary endpoints include baseline to 24 hours/day 7 changes in the National Institutes of Health Stroke Scale (NIHSS) score and collateral circulation (CC) status. Bivariate and multivariable logistic regression analyses were performed. Results: Included were 90 patients (73.2+-12.7 years, 41.1% women) in 42 of whom (46.7%) cEPCs were detected at 24 hours. On day 7, cEPCs were detected in 27 (43.6%) of 62 patients for which this information was available. Atrial fibrillation, prior anticoagulant treatment and stroke onset-to-door time

06.
bioRxiv (Bioinfo) 2026-06-13

Testing the reliability of AI-generated protein structures

Authors:

Although AlphaFold2 and its competitors have demonstrated remarkable abilities to predict protein structure, more work is needed to explore the limitations of these methods. Here we investigated the reliability of AlphaFold2 and ColabFold by creating a set of realistic but false protein sequences, using ColabFold to predict their structure, and then asking how often the program produces a high-scoring structure for a sequence that does not represent a protein. We determined that AlphaFold2 has a very small but non-zero false positive rate, estimated here at approximately 1 in 435 if one uses a threshold pLDDT score of 70 to define positive predictions. We also discovered, serendipitously, that some high-scoring sequences in the human genome were not false positives, but instead were previously unknown and un-annotated pseudogenes. These latter findings indicate that some well-established human annotations of protein-coding genes may have incorrectly extended the 5-prime untranslated regions too far. They also suggest that the false positive rate of AlphaFold2 is low enough that almost any high-scoring structure, even in a noncoding region, is worthy of further investigation.

07.
arXiv (CS.CV) 2026-06-16

SurroundNEXO: Ego-Centric Metric Bridging for Spatially Consistent Geometry in Autonomous Driving

Modern autonomous driving depends on accurate metric 3D understanding for perception, reconstruction, and planning, which in turn requires reliable multi-camera depth prediction. However, the outward-facing nature of vehicle-mounted surround-view camera rigs inherently limits visual overlap across views, challenging the correspondence-based assumptions that underpin conventional multi-view geometry. To bridge this gap, we present SurroundNEXO, named after the Spanish word nexo for a geometric link, a low-overlap multi-camera metric depth framework that grounds cross-view reasoning in ego-centric geometry rather than dense visual correspondences. Instead of directly enforcing early global fusion, SurroundNEXO first assigns image tokens globally comparable ego-frame viewing directions through Ego-Ray Positional Encoding, then uses sparse LiDAR measurements as metric anchors to propagate absolute scale cues, and finally expands feature interaction progressively from view-local modeling to decomposed spatio-temporal reasoning and global integration. This design enables metric-scale depth prediction with improved spatial consistency across weakly overlapping cameras. Across low-overlap autonomous driving benchmarks, including NuScenes, Waymo and DDAD, SurroundNEXO reduces single-view error by 33.2%, improves cross-view consistency by 10.5%, and enhances metric reconstruction quality by 25.6% compared with SOTA methods. It further remains robust under extremely sparse depth prompts and exhibits strong zero-shot generalization to unseen camera layouts.

08.
arXiv (math.PR) 2026-06-18

Evolution of Conditional Entropy for Diffusion Dynamics on Graphs

arXiv:2510.19441v2 Announce Type: replace-cross Abstract: The modeling of diffusion processes on graphs is the basis for many network science and machine learning approaches. Entropic measures of network-based diffusion have recently been employed to investigate the reversibility of these processes and the diversity of the modeled systems. While results about their steady state are well-known, very few exact results about their finite-time evolution exist. Here, we introduce the conditional entropy of heat diffusion in graphs, and outline a mathematical framework that contextualizes diffusion and conditional entropy within the theories of continuous-time Markov chains and information theory. In particular, we highlight that this entropic measure satisfies an information-theoretical version of the second law of thermodynamics, thereby providing a parallelism between diffusion dynamics on networks and their physical counterparts. Furthermore, we obtain explicit results for its evolution on complete, path, and circulant graphs, as well as a mean-field approximation for Erdös-Rényi graphs. We also obtain asymptotic results for general networks and provide bounds for the evolution of conditional entropy. Finally, we experimentally demonstrate several properties of conditional entropy for diffusion over random graphs, such as the Watts-Strogatz model.

09.
arXiv (CS.CL) 2026-06-15

DLawBench: Evaluating LLMs Through Multi-Turn Legal Consultation

Lawyer-client consultation is a critical starting point for legal services. Effective legal assistance hinges on eliciting sufficient and truthful information from clients in order to devise strategies that best protect their interests. This task requires Large Language Models (LLMs) not only to perform robust legal reasoning, but also to strategically elicit material facts through multi-turn interactions and effectively guide clients with diverse personalities. Yet existing legal benchmarks overlook this interactive capability. To fill this gap, we introduce DLawBench, a diagnostic benchmark for real-world legal consultation. Drawing on realistic client behavior, we characterize lawyer-client interactions into four types: Cooperative, Dependent, Withdrawn, and Adversarial. Using dialogues grounded in real cases, DLawBench evaluates whether LLMs can effectively conduct legal consultation under realistic conditions. DLawBench comprises 461 cases from Chinese and U.S. law, 5,532 paired fact entries, 3,411 inquiry rubrics, and 3,348 issue-resolution rubrics, and evaluates 26 representative LLMs. Systematic experiments show substantial headroom: the best-performing model, GPT-5.5, achieves only 0.562 on consultation-grounded legal reasoning. More importantly, DLawBench exposes both sycophancy in legal consultation and a paradox: models perform worse when clients need guidance most.

10.
medRxiv (Medicine) 2026-06-15

Mucosal and Systemic Antibodies Associated with Clinical Protection in a Pertussis Controlled Human Infection Model

Background The engagement of mucosal and systemic immunity in preventing Bordetella pertussis colonization and infection in humans, the impact of prior vaccination on host immunity and protective outcomes, and the dynamics of the host response following exposure remain poorly understood. Methods Healthy adults were challenged with increasing colony-forming units (CFUs) doses, 106-108, of B. pertussis D420 intranasally (NCT05136599). Shedding (PCR and culturing) and symptom development were monitored up to 21 days post-challenge. Serum and nasal wash IgA and IgG were measured before challenge (baseline) and up to 6 months post-challenge. Findings Antibodies increased post-challenge only in infected individuals, primarily nasal IgA. Participants who remained uninfected had higher baseline levels of filamentous hemagglutinin (FHA)- specific mucosal IgA and IgG, and higher serum IgA against fimbriae 2/3 (FIM). FHA was negatively associated with bacterial load and was a key discriminator between shedders and non-shedders, up to one week post-challenge. By day 14 post-challenge, pertussis toxin (PT) IgG and FIM IgA in both serum and mucosal samples were negatively associated with bacterial colonization. The majority (96.7%) of acellular pertussis (aP) vaccine recipients (n=23, median age 2.0 years) became infected, compared to 69.4% of those who received whole-cell pertussis vaccine (n=36; median age 32.0 years), and their antibody responses remained distinct following infection. Interpretation Nasal FHA antibodies emerged as early predictors of protection against pertussis infection, while PT IgG and FIM IgA antibodies may reflect clearance after infection. aP-primed individuals were more susceptible to infection, despite their younger age and more recent vaccination. Funding CDC Contract #75D30122C15467 and CDC IPA Agreement #24IPA2417512 Disclaimer: The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention, US Department of Health and Human Services.

11.
arXiv (CS.CL) 2026-06-16

Your "Pro" LLM Subscription May Actually Be "Free": Exposing Fingerprint Spoofing Risks in LLM Inference Services

As Large Language Model (LLM) APIs become ubiquitous, users increasingly rely on black-box fingerprinting to verify that providers are serving the advertised premium models. However, these methods may overlook adversarial providers who manipulate model weights to cheat the fingerprint process. We introduce a novel threat termed fingerprint spoofing, where a malicious provider stealthily serves a weaker model that has been parameter-efficiently fine-tuned to mimic a stronger model, thereby evading user-side fingerprinting. We first formally prove that user-side resource constraints (i.e., finite query budgets and weak fingerprinting classifiers) make current fingerprinting vulnerable to fingerprint spoofing. Guided by this theoretical analysis, we propose GhostPrint, a cost-effective attack framework leveraging surrogate modeling, reward-ranked fine-tuning, and knowledge distillation. Extensive evaluations in both static and continual fingerprinting settings demonstrate that GhostPrint allows weak models to consistently bypass representative fingerprint methods while maintaining utility at a low fine-tuning cost, exposing a critical vulnerability in current LLM fingerprinting pipelines.

12.
arXiv (CS.CL) 2026-06-12

Reasoning Models Know What's Important, and Encode It in Their Activations

Language models often solve complex tasks by generating long reasoning chains, consisting of many steps with varying importance. While some steps are crucial for generating the final answer, others are removable. Determining which steps matter most, and why, remains an open question central to understanding how models process reasoning. We investigate if this question is best approached through model internals or through tokens of the reasoning chain itself. We find that model activations contain more information than tokens for identifying important reasoning steps. Crucially, by training probes on model activations to predict importance, we show that models encode an internal representation of step importance, even prior to the generation of subsequent steps. The internal representations of importance in different models yield high agreement on which steps are important. The representation is distributed across layers, and does not correlate with surface-level features, such as a step's relative position or its length. Our findings suggest that analyzing activations can reveal aspects of reasoning that surface-level approaches fundamentally miss, indicating that reasoning analyses should look into model internals.

13.
arXiv (CS.AI) 2026-06-16

Where Did It Go Wrong? Process-Level Evaluation of Web Agents with Semantic State Tracking

arXiv:2606.15673v1 Announce Type: new Abstract: Web agents act through long interaction sequences, yet existing benchmarks evaluate only terminal success, discarding all process information and offering little guidance on improvement. In this work, we conduct a process-level analysis of web agents. We introduce WebStep, a benchmark of 1,800 task instances with controlled difficulty and automatic semantic state tracking. Each website exposes a deterministic semantic MDP alongside the GUI: the agent operates on the interface, while the environment records high-level states and transitions in the background, enabling fine-grained analysis without manual annotation. Based on the semantic trajectory, we first show that process metrics reveal differences invisible to outcome evaluation: three agents whose success rates cluster within 31-33% diverge in exploration reach versus execution accuracy. Then, decomposing by skill characterizes the nature of these differences, exposing opposite per-skill rankings hidden within the same website: e.g., on Housing, OpenAI CUA outperforms Qwen3.5 by 23.7% on commit actions yet underperforms it by 15.6% on filtering, pinpointing a concrete skill to improve even within a domain. Bifurcation analysis further localizes the decisive error that loses the task and shows that this error is agent-specific rather than shared. Finally, these differences widen as tasks grow harder: success rate is similar on easy tasks but separates sharply as exploration becomes more demanding. Our process-level analysis opens a new avenue in web agent evaluation, providing fine-grained and actionable insight into where and how each agent should be improved.

14.
arXiv (CS.CL) 2026-06-16

Re-feeding Is Not Replaying: Measuring Replay Noise in Counterfactual Token-Credit Estimation

Authors:

Per-token counterfactual credit estimation asks which token in a language-model rollout caused the final answer to be right or wrong: cut the transcript at a pivot, substitute an alternative token, replay continuations, and compare outcomes. Published methods re-feed the transcript prefix as a fresh prompt, assuming this reproduces the state the model passed through during generation. We measure what that assumption costs on a stock inference engine, with a three-pass design: continuations resumed from the verified decode-time KV state, an identical second exact pass (a replica noise floor), and a re-feed pass. Across six configurations and three models (including a GRPO-trained checkpoint), at low-margin decision tokens, re-feeding changes the credit estimate at rates 14-28 percentage points above the replica floor (7-21pp under a treatment-independent conditioning; problem-clustered t = 2.9-6.4). Most changes are zero-boundary crossings of the quantized estimator rather than polarity reversals, and the perturbation is consistent with mean-zero, so averaged quantities are largely safe; but selection is not: a critical-token set chosen by thresholding $|\hat{A}_t|$ under re-feed overlaps the exact-resume selection at Jaccard 0.34-0.90, versus a 0.63-0.96 replica ceiling. A causal confirmation closes the loop: under vLLM's batch-invariant kernels all three passes are identical on every measured channel, with both disagreement rates exactly zero. Replica passes themselves disagree on 9-23% of eligible estimates: single-sample credit measurements at decision tokens are unreliable under any replay. Settings were fixed in advance; exact-pass cache hits in the second campaign are instrumented (100% hit rate, 3,434 pivots); total compute was under 10 USD. We recommend that counterfactual credit studies resume decoder state or use batch-invariant kernels, and report a replica floor.

15.
arXiv (CS.AI) 2026-06-11

LSTM based IoT Device Identification

arXiv:2304.13905v2 Announce Type: replace-cross Abstract: While the use of the Internet of Things is becoming more and more popular, many security vulnerabilities are emerging with the large number of devices being introduced to the market. In this environment, IoT device identification methods provide a preventive security measure as an important factor in identifying these devices and detecting the vulnerabilities they suffer from. In this study, we present an end-to-end machine learning pipeline that identifies IoT devices in the Aalto university dataset (IoT devices captures) using Long Short-Term Memory (LSTM) networks. Raw network packet captures (PCAP) are processed into 25 engineered features, which are then arranged as sliding-window time-series sequences. We systematically evaluate sequence lengths from 2 to 20, reporting that performance improves approximately linearly up to length 6 and thereafter in a wave-like pattern, reaching its peak at length 18. On the final held-out test set with the optimal configuration, the model achieves an accuracy of 79.85% and a macro-averaged F1-score of 75.70% across 27 device classes.

16.
arXiv (CS.AI) 2026-06-18

Learning-Based Decision Making for Combustion Phasing Control in Multi-Fuel CI Engines with Latent Fuel Reactivity Estimation

arXiv:2606.18393v1 Announce Type: cross Abstract: Multi-fuel compression-ignition engines offer fuel flexibility but introduce uncertain, time-varying fuel reactivity, represented by cetane number (CN), which complicates cycle-to-cycle combustion-phasing control. This work formulates CA50 regulation under latent CN variation as a partially observable sequential decision problem and systematically evaluates controllers with increasing temporal and representational capacity, including LinUCB, history-augmented contextual bandits, observation-only DDPG, recurrent DDPG, and a proposed GRU-guided RL framework. A Gaussian-process surrogate trained on experimental multi-fuel engine data provides a controlled and reproducible evaluation environment. Results show that myopic and fixed-history bandit methods degrade under CN variation, observation-only RL suffers from latent-state aliasing, and generic recurrence is insufficient when CN evolves rapidly. The proposed framework learns a compact GRU-based representation of fuel reactivity from combustion history and conditions both actor and critic on this estimated signal rather than oracle CN. By training the policy on the same imperfect fuel-reactivity information available at deployment, the controller avoids train-deploy inconsistency in conventional online estimate-then-control pipelines. Across unseen CN trajectories, the policy achieves stable CA50 regulation with mean absolute tracking error below 0.25{\deg} CA at the training setpoint, while producing smooth, physically consistent SOI and glow-plug-power actuation. These results show that combustion control under latent, continuously evolving fuel dynamics requires more than standalone estimation or generic recurrence. By aligning fuel-reactivity inference with control policy learning, the proposed framework enables reactivity-aware decision-making using the same estimated state available during deployment.

17.
arXiv (math.PR) 2026-06-18

Extrema of microscopically slowed-down Gaussian fields

Authors:

arXiv:2606.19207v1 Announce Type: new Abstract: We introduce a family of Gaussian fields whose covariance structure exhibits an inhomogeneous, microscopic slowdown and it interpolates between a $\log$ profile (for a certain interpolation parameter $\alpha=0$) and a $\log\log$ profile (when the interpolation parameter is $\alpha=1/2$). We consider both one dimensional such objects (which we call {\it Branching Brownian Motions in a cooling environment}) as well as higher dimensional, spatial fields. We identify the correct centering of the maximum at time $T$ and prove tightness of the recentered maximum. While the exponent in the first-order growth varies linearly with $\alpha$, giving a leading order of $T^{1-\alpha}$, the second-order correction exhibits a phase transition at $\alpha=1/3$.

18.
arXiv (CS.CV) 2026-06-16

XMedFusion: A Knowledge-Guided Multimodal Perception and Reasoning Framework for Autonomous Medical Systems

Autonomous medical and robotic systems increasingly rely on intelligent perception and reasoning capabilities to interpret visual data and support clinical decision making. Radiology report generation represents a critical component of such automated diagnostic workflows, yet existing end-to-end multimodal models often suffer from weak visual grounding, resulting in unreliable interpretations and omission of subtle clinical findings. This paper presents XMedFusion, a modular AI framework designed as an intelligent perception and reasoning module for autonomous medical systems. The proposed framework decomposes visual information into coordinated functional components that emulate expert-driven analysis, including a visual perception agent that extracts image-grounded evidence, a knowledge graph construction agent that structures clinically relevant findings, and a retrieval-guided drafting process that ensures a consistent reporting structure. A synthesis agent iteratively integrates visual and structured evidence through reasoning-driven verification to produce reliable and interpretable diagnostic outputs. Experimental evaluation on a public chest radiograph dataset demonstrates significant improvements over baseline vision-language models, achieving gains from 0.0493 to 0.3359 in BLEU-1, 0.0863 to 0.2440 in ROUGE-L, and 0.0829 to 0.1708 in METEOR, along with substantial improvements in semantic evaluation metrics such as Consistency (2.38 to 7.80) and Accuracy (2.34 to 6.93). The results highlight the effectiveness of structured multi-agent perception and reasoning for enhancing robustness, transparency, and automation in intelligent medical imaging systems, enabling integration into autonomous healthcare and robotic diagnostic workflows.

19.
arXiv (CS.CL) 2026-06-17

A Two-Phase Stability Study of LLM Judges and Bar Council Examiners on Thai Bar-Exam Free-Form Essays

Free-form legal essay evaluation in NLP treats expert inter-rater stability as a single ceiling number, and treats LLM-judge agreement with that ceiling as evidence of judge stability. We test both assumptions on the Thai bar examination through an identical-inputs protocol: three Bar Council-trained examiners (A, B, C) and a 26-LLM judge panel score the same 15 cross-graded answers from the same four inputs (question, official Bar Council grading regulation, gold answer, candidate answer). The headline finding is asymmetric. On 10 of 15 cells where the rubric prescribes both axes, all 29 raters converge in a tight band: panel agreement is universal. On the remaining 5 cells where the rubric does not prescribe how to grade a correct final answer that omits a decisive statutory citation, the human panel splits between two coherent readings (B/C majority at the upper rubric band, score 6-8; A minority at the lower band, score 1-2). The LLM judge population does not split symmetrically: 22 of 26 LLMs score in or near B/C's contested band, 3 sit in the regulation-silent middle gap, and only 1 (GPT-5.4 Nano) approaches A's band without consistently scoring within it. Zero LLMs in our 26-judge panel reproduce the minority human reading on the contested cells. The B/C-direction cluster spans every model size, vendor, and price tier we tested. An instrumented three-LLM anchor sub-panel (Claude 4.6 Opus, Gemini 3.1 Pro, GPT-5.4 Pro) carries determinism probes, input ablations, and bootstrap CIs, and reaches anchor panel $\alpha = 0.77$ on the 15 cells against human-panel $\alpha = 0.36$. The high LLM-panel $\alpha$ reflects systematic convergence on the majority reading rather than balanced reproduction of both readings; a benchmark that selects its LLM judge by maximising agreement with a human reference panel will inherit this asymmetry by construction.

20.
medRxiv (Medicine) 2026-06-18

Empirical Validation and Predictive Utility of the Perinatal Grief Scale in Men after Perinatal Loss

Background. The Perinatal Grief Scale (PGS) is a widely used instrument for assessing grief following pregnancy loss, yet no study has validated it specifically in men despite documented use in several studies. This gap is critical given fathers' persistent underrepresentation in perinatal bereavement research and the absence of empirically supported screening thresholds for this population. Methods. This cross-sectional validation study used data from the OPALE project (Observatory on PerinatAL hEalth) conducted by the CiaoLapo Foundation in Italy. Among 276 fathers who experienced stillbirth or miscarriage, we examined criterion validity by testing the association between PGS scores and trauma-related symptomatology assessed via three validated instruments: the Revised Impact of Event Scale (RIES, n=103), National Stressful Events Survey Short Scale (NSESSS, n=95), and SCL-90 (n=173). We systematically tested multiple threshold combinations to identify optimal discriminative performance. Results. The PGS demonstrated excellent criterion validity. The optimal threshold (PGS >=92) showed sensitivity 81.0%, specificity 81.8%, and Youden's J index 0.628. Fathers scoring >=92 had 19.12 times the odds of high trauma symptoms (95% CI: 9.35 to 39.14, p

21.
arXiv (CS.CV) 2026-06-18

DreamReg: Belief-Driven World Model for 2D-3D Ultrasound Registration

Ultrasound (US) is widely used for surgical navigation, yet real-time registration between intraoperative 2D slices and preoperative 3D volumes remains challenging due to partial observability, speckle noise, and the action-dependent US acquisition. Existing methods are one-shot or short-horizon, making it hard for them to gather evidence over time or capture how surgeons adjust probe motion based on on-screen feedback. We propose DreamReg, a belief-driven world-model framework that formulates 2D-3D registration as belief updating over rigid transformations. DreamReg maintains a latent belief state that summarizes past observations and poses information, and continuously refines the transformation through learned dynamics as new slices arrive. During training, DreamReg is exposed to probe-motion trajectories that mimic clinical scanning behavior and learns to update its belief by conditioning pose refinement on the current US observation. During inference, DreamReg refines registration via internal imagination: it rolls out the learned world model to simulate candidate probe motions and their predicted observations, and integrates these imagined outcomes to converge to an accurate rigid transformation. Experiments on CAMUS and u-RegPro datasets demonstrate improved robustness and competitive registration accuracy for real-time guidance compared with state-of-the-art methods.

22.
arXiv (CS.LG) 2026-06-18

Graph Instance Landscapes: When Structural Similarity Does (Not) Reflect Shortest-Path Performance

arXiv:2606.18267v1 Announce Type: cross Abstract: Benchmarking shortest-path algorithms is commonly based on aggregate performance over heterogeneous graph sets, which limits insight into how different search paradigms react to instance structure. We adopt an instance-landscape view of graph benchmarking by embedding graphs into a low-cost structural feature space and clustering them into regions of similar structure. Three benchmark suites are studied: weighted Erdős–Rényi graphs, random geometric (wireless) graphs, and real-world road networks. We evaluate four representative shortest-path solvers spanning uninformed exact search (Dijkstra), bidirectional exact search (bidirectional Dijkstra), heuristic-guided exact search (A$^{*}$), and deque-based strategies (DEQ). Clustering robustness is analyzed under multiple feature-selection schemes, and runtime distributions are compared across landscape regions using non-parametric tests. While generator parameters induce stable structural regions, we find that feature-space similarity does not necessarily imply performance similarity: significant runtime shifts are frequently observed even within the same landscape region. A merged-suite analysis further shows that different benchmark families occupy largely disjoint regions. These results highlight both the potential and the limits of structural landscapes for the structure-aware benchmarking of shortest-path algorithms.

23.
arXiv (CS.LG) 2026-06-16

Drivers, Receivers, and Dynamic Linkages: The Directed Structure of SDG Interdependence, 2000–2024

arXiv:2601.20875v2 Announce Type: replace-cross Abstract: Governments with limited fiscal and administrative capacity need to know which Sustainable Development Goals (SDGs) propagate progress through the goal system and how quickly. We map the directed interdependence structure of all seventeen goals using a balanced panel of 114 countries observed annually from 2000 to 2024. The goal series are persistent, trending, and cross-sectionally dependent, so we apply two estimators matched to this regime: a Dumitrescu-Hurlin panel Granger non-causality test, run on first-differenced series, to recover the directed interaction network, and panel local projections with Driscoll-Kraay standard errors to measure the dynamic magnitude of 31 theory-derived indicator linkages. Of 272 directed goal pairs, 84 linkages survive false-discovery control (40 synergies, 44 trade-offs; network density 0.31). Synergies and trade-offs occur at comparable strength, so no single goal behaves as a universal accelerator, and the goal-level hierarchy itself is fragile. Driver-receiver rankings correlate weakly across lag orders and centrality metrics, and under a country bootstrap only two roles are distinguishable from zero: peace and strong institutions as the clearest net receiver, and poverty reduction as the most probable effect-size-weighted driver. The supported linkages are dynamic, accruing over four to five years: sanitation and poverty improvements are the strongest predictors of lower child mortality, and the education-child-health association is corroborated in independent World Development Indicators data across 183 countries. These results caution against rankings-based accelerator policy and support adaptive portfolios built on supported, time-lagged linkages monitored through constituent indicators.

24.
arXiv (CS.LG) 2026-06-16

Near-Optimal Stochastic Linear Bandits with Delay

arXiv:2606.16656v1 Announce Type: new Abstract: We study stochastic linear bandits with delayed feedback under several delay models and establish near-optimal regret guarantees. Our results identify when delayed linear bandits exhibit the same qualitative behavior as multi-armed bandits (MAB), and when the linear structure creates fundamentally new challenges. Specifically, (1) for loss-independent delays, where the delay does not depend on the realized loss (but potentially depends on the arm), we show that delays incur only an additive regret penalty. Under stochastic delays, this penalty scales with the expected delay, while under adversarial delays, it scales with the maximum number of outstanding observations. Notably, both delay penalties are dimension-free, improving upon the state-of-the-art results; (2) for loss-dependent delays, we show that linear bandits are substantially harder than MAB: unlike in MAB, we prove matching (up to log factors) upper and lower bounds in linear bandits, whose delay penalty depends on the square root of the dimension. (3) for the delay-as-payoff model, a special case of loss-dependent delay, we show that the optimal MAB guarantee, which depends only on the delay of the optimal arm, is also unattainable in linear bandits. Together, these results provide a sharp characterization of how delayed feedback interacts with linear generalization.

25.
arXiv (CS.CV) 2026-06-12

PROBE: Probabilistic Occupancy BEV Encoding with Analytical Translation Robustness for 3D Place Recognition

We present PROBE (PRobabilistic Occupancy BEV Encoding), a learning-free LiDAR place recognition descriptor that models each BEV cell's occupancy as a Bernoulli random variable. Rather than relying on discrete point-cloud perturbations, PROBE analytically marginalizes over continuous Cartesian translations via the polar Jacobian, yielding a distance-adaptive angular uncertainty $\sigma_\theta = \sigma_t / r$ in $\mathcal{O}(R{\cdot}S)$ time. The primary parameter $\sigma_t$ represents the expected translational uncertainty in meters, a sensor-independent physical quantity that enhances cross-sensor generalization while reducing the need for extensive per-dataset tuning. Pairwise similarity combines a Bernoulli-KL Jaccard with exponential uncertainty gating and FFT-based height cosine similarity for rotation alignment. Evaluated on four datasets spanning four diverse LiDAR types, PROBE achieves the highest accuracy among handcrafted descriptors in multi-session evaluation and competitive single-session performance relative to both handcrafted and supervised baselines. The source code and supplementary materials are available at https://sites.google.com/view/probe-pr.