Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CL) 2026-06-12

WildIFEval: Instruction Following in the Wild

Recent LLMs have shown remarkable success in following user instructions, yet handling instructions with multiple constraints remains a significant challenge. In this work, we introduce WildIFEval - a large-scale dataset of 7K real user instructions with diverse, multi-constraint conditions. Unlike prior datasets, our collection spans a broad lexical and topical spectrum of constraints, extracted from natural user instructions. We categorize these constraints into eight high-level classes to capture their distribution and dynamics in real-world scenarios. Leveraging WildIFEval, we conduct extensive experiments to benchmark the instruction-following capabilities of leading LLMs. WildIFEval clearly differentiates between small and large models, and demonstrates that all models have a large room for improvement on such tasks. We analyze the effects of the number and type of constraints on performance, revealing interesting patterns of model constraint-following behavior. We release our dataset to promote further research on instruction-following under complex, realistic conditions.

02.
arXiv (CS.CL) 2026-06-24

Aspect-Based Sentiment Evolution and its Correlation with Review Rounds in Multi-Round Peer Reviews: A Deep Learning Approach

Mining sentiment information from the textual content of peer review comments offers valuable insights into the scientific evaluation process. However, previous studies are often constrained by coarse-grained analysis and the lack of differentiation across review rounds. Notably, the dynamic shifts in reviewers' focus and sentiment tendencies throughout multiple review stages remain underexplored. To address this gap, the present study investigates the distribution and evolution of aspect-level sentiments and examines their correlation with the number of review rounds. We begin by segmenting the multi-round review comments of 11,063 accepted papers from Nature Communications and identifying fine-grained review aspect clusters. A manually annotated corpus of approximately 5,000 review sentences is then constructed. Using this dataset, we train a series of deep learning-based aspect sentiment classification models. Among them, the LCF-BERT-CDM model achieves the best performance, with a Macro-F1 score of 82.65%. Subsequent statistical analysis reveals a consistent trend: as the number of review rounds increases, the proportion of positive sentiments rises, while negative sentiments decline. Correlation analysis further indicates that aspect sentiment scores are negatively associated with the total number of review rounds. Key aspects exhibiting stronger correlations include "experiments", "research significance" and "result analysis".

03.
arXiv (CS.LG) 2026-06-16

DAL: A Practical Prior-Free Black-Box Framework for Piecewise Stationary Bandits

arXiv:2501.19401v5 Announce Type: replace Abstract: We introduce a practical, black-box framework termed Detection Augmented Learning (DAL) for the problem of piecewise stationary bandits without knowledge of the underlying non-stationarity. DAL accepts any stationary bandit algorithm with order-optimal regret as input and augments it with a change detector, enabling applicability to all common bandit variants. Extensive experimentation demonstrates that DAL consistently surpasses all state-of-the-art methods across diverse non-stationary scenarios, including synthetic benchmarks and real-world datasets, underscoring its versatility and scalability. We provide theoretical insights into DAL's strong empirical performance, complemented by thorough empirical validation.

04.
arXiv (CS.AI) 2026-06-16

Boosting Knowledge Graph Foundation Models via Enhanced Negative Sampling

arXiv:2605.27023v2 Announce Type: replace Abstract: Knowledge graphs (KGs) have become the core backbone of numerous downstream tasks such as question answering and recommender systems. However, despite all this, KGs are often very incomplete. To perform zero-shot knowledge graph completion in unseen KGs, which have different relational vocabularies from those used for pre-training, KG foundation models (KGFMs) receive a wide range of attention. Existing KGFMs often perform training using random negative triples, which are constructed by replacing the head or tail entity of a positive triple with a random entity. However, these negative triples are often constructed with limited quality, providing weak supervision for KGFM training. In this paper, we propose a simple yet effective adaptive negative sampling approach, KMAS, to enhance existing KGFMs. KMAS constructs hard negative triples through the updated relation embeddings generated from the existing KGFM's relation encoder. To further adaptively align with the evolving capability of the KGFM during the training process, KMAS adjusts the ratio of hard negative triples dynamically throughout the whole training process: after a warmup phrase, it increases the ratio linearly and then decreases linearly. Extensive experiments are conducted over 44 data sets. Experimental results demonstrate that our proposed negative sampling method can enhance many SOTA KGFMs without requiring excessive additional time or memory consumption.

05.
arXiv (CS.CL) 2026-06-24

Age of LLM: A Strategic 1v1 Benchmark for Reasoning, Diplomacy and Reliability of Large Language Models under Fog of War

作者:

We introduce Age of LLM, a turn-based 1v1 benchmark in which two LLMs face off on a 13x7 grid to destroy the enemy base. Three stressors are deliberate: fog of war, full diplomacy (messages, ceasefires, ultimatums; uranium kept secret), and a reliability dimension where every turn must follow a strict JSON schema and an illegal action is silently discarded. The engine is private and each match uses a fresh random map seed and opponent, mitigating the data contamination that affects public benchmarks. Models receive a (near) rule-only prompt with no build-order advice (two tactical seed phrases were present during data collection; see Section 2.7). We benchmark 15 reasoning models across 54 matches and 5,258 actions. Findings: (1) the nuclear rush dominates (78% on the rules-coherent v0.11+ sub-corpus; 85% corpus-wide) with a sole-launcher signature that is largely mechanical under secret-simultaneous launch rules, not a cognitive deterrence failure; (2) military conquest is rare but faster (12.3 vs 18.9 turns); (3) diplomacy is prolific yet almost never consummated; (4) ~58% of illegal actions are fog/state errors, making the illegal-action rate a measure of belief-tracking; (5) – the least established, and the only one we label exploratory – a weak link associates reliability with winning. The corpus is small, unbalanced and not side-swapped, so the ranking is a preliminary descriptive view, not a contribution. Beyond ranking, the turn-by-turn traces of actions and messages make the corpus a lens on how LLMs reason under adversarial uncertainty – their belief-tracking, spontaneous deception, and per-model cognitive "personas" – which we frame as a future research direction. We release the replay format, an isometric viewer and all replays; engine source on request.

06.
arXiv (CS.CL) 2026-06-12

ProPlay: Procedural World Models for Self-Evolving LLM Agents

Self-evolving agents are expected to improve through interaction without external supervision, but this remains difficult in partially observable environments where agents must explore actively, learn from limited feedback, and decide when to trust prior experience. Existing LLM-agent methods often rely on memory or planning modules, yet they rarely close the loop between them to continually refine an internal understanding of environment dynamics. We introduce ProPlay, a procedural world model that supports procedure-level preplay, where agents can rehearse future procedural paths using the learned world knowledge. Rather than representing experience as isolated rules or low-level action constraints, ProPlay abstracts successful trajectories into procedures and organizes them in a procedure graph that captures causal transitions among task stages. Each transition is associated with a reliability record embedding to estimate its task-specific contribution from past outcomes. Before each episode, ProPlay simulates future procedural trajectories over known graph structures as structured soft guidance; after execution, it refines the graph using environment feedback. Experiments on public benchmarks show that ProPlay consistently improves environment understanding and self-evolution capability over strong baselines. Our code has been released in https://github.com/antman9914/proplay.

07.
arXiv (CS.AI) 2026-06-12

Creating and Evaluating K-12 GenAI Assessment Graders Through Context Engineering

arXiv:2606.12422v1 Announce Type: cross Abstract: The integration of large language models (LLMs) into educational assessment represents a transformative shift in classroom grading practices. While automated scoring systems and machine learning techniques have existed for decades, generative AI (GenAI) now enables educators to implement standards-based grading (SBG) with unprecedented efficiency and scale. This paper examines the theoretical foundations and evaluates an LLM grader that uses commercially available foundation models with context and prompt engineering to score student work against a rubric. Drawing on an empirical interrater agreement study using Massachusetts Comprehensive Assessment System (MCAS) data, we observed the Quadratic Weighted Kappa (QWK) and Proportional Reduction in Mean-Squared Error (PRMSE) across mathematics, science, and ELA, using Claude Sonnet 4, Haiku 4.5, GPT-5, and GPT-5 Mini. The results demonstrate that LLM graders, especially when based on foundational models with more parameters, achieve substantial agreement with human raters in mathematics and science assessments, while the performances vary in ELA, suggesting generic foundation models can be effective at scoring in given contexts. Additional analysis of teacher and student feedback reveals strong acceptance of AI-generated narrative feedback but skepticism toward numerical scores, suggesting that LLMs function most effectively as formative tools rather than summative evaluators. Our findings indicate that thoughtfully designed hybrid models that combine AI efficiency with teacher judgment can reduce workload, enhance feedback quality, and support equitable assessment practices without displacing professional expertise.

08.
arXiv (CS.CV) 2026-06-12

Edit the Bits, Diff the Codes: Bitwise Residual Editing for Visual Autoregressive Models

Text-guided image editing with visual autoregressive (VAR) generators requires controlling both what the model samples and where the sampled change is written back into the image code. Existing VAR editors mainly operate on token streams, features, or flat next-token logits, leaving two native structures of bitwise-residual VAR models underused: the per-bit Bernoulli prediction head and the additive multi-scale residual code field from which the image is assembled. We propose BitResEdit, a training-free editor for bitwise-residual VAR generators such as Infinity. BitEdit performs source-negative guidance by tilting the post-CFG per-bit log-odds along a source–target contrast computed on a shared edited prefix, then projects each update into a closed-form Bernoulli-KL trust region around the clean CFG sampler. ResEdit converts the sampled bits into per-scale continuous-code residuals, gates them with a localization mask, and re-injects them through the generator's native sum-of-scales. Together they couple decision-time bit guidance with combination-time code composition, so masked-out latent features are preserved exactly by code arithmetic while localized, scale-aware edits are applied inside the target region. On PIE-Bench with Infinity-2B, BitResEdit attains the strongest text alignment among same-backbone VAR editors, improving CLIP on the edited region by +1.07 over the strongest prior editor while keeping background preservation competitive with it. Ablations show BitEdit and ResEdit play complementary roles in target alignment and background preservation.

09.
arXiv (CS.LG) 2026-06-24

Hybrid Sequence Modeling and Reinforced Verification for Controllable Target-Conditioned Decision Making

arXiv:2508.16420v3 Announce Type: replace Abstract: Target-conditioned sequence models provide a simple interface for controllable offline decision making, but the requested target return can be an unreliable control signal, especially when the target return lies in underrepresented regions of the dataset. This paper proposes Doctor, a hybrid sequence modeling and reinforced verification framework for controllable target-conditioned offline decision making. Doctor trains a shared masked trajectory Transformer with two complementary objectives: masked trajectory reconstruction for candidate generation and in-sample value learning for action-value verification. At inference time, the model samples multiple nearby target returns, generates candidate actions in parallel, and selects the action whose verified value is closest to the requested target return. We analyze this verifier-guided selection rule and show that its value-level alignment error is bounded by candidate-value coverage around the target return and verifier accuracy. Experiments on D4RL and EpiCare show that Doctor improves target-return alignment under reduced high-return coverage, remains competitive on standard offline return-maximization benchmarks, and enables a single policy to modulate between conservative and aggressive operating points in a simulated clinical decision-making task. These results suggest that reinforced verification can improve the controllability of target-conditioned policies.

10.
arXiv (CS.AI) 2026-06-12

Rarity-Gated Context Conditioning for Offline Imitation Learning-Based Maritime Anomaly Detection

arXiv:2606.13311v1 Announce Type: cross Abstract: Contextual anomaly detection aims to identify abnormal behavior conditional on context variables, but practical deployments often face highly imbalanced context distributions where rare regimes can be critical information. Under such frequency bias, context-conditioned models can produce unstable decisions and excessive false alarms in rare contexts. We propose Rarity-Gated Feature-wise Linear Modulation (RGFiLM), a rarity-aware conditioning module that combines feature-wise modulation (i.e., context-conditioned scaling and shifting of hidden features) with a gate controlled by a data-driven rarity score. The rarity score is estimated from the empirical distribution of context variables and regulates how strongly context modulates intermediate representations: the gate becomes more decisive under rare contexts while remaining conservative under frequent contexts. We evaluate RGFiLM on maritime trajectory anomaly detection using AIS motion sequences with ERA5 environmental context in an environment-sensitive detour scenario. When instantiated in a sequential anomaly scoring pipeline, RGFiLM achieves the best mean F1–False Positive Rate (FPR) trade-off among the compared context-agnostic and context-conditioned methods. These results suggest that explicitly accounting for context rarity is an effective approach for reducing false alarms in context-sensitive anomaly detection.

11.
arXiv (CS.LG) 2026-06-17

Amortized Probabilistic Retrieval of Atmospheric CO2 from OCO-2 Spectra Using Deep Learning with Laplace Approximations and Normalizing Flows

arXiv:2606.17413v1 Announce Type: new Abstract: Space-based monitoring of atmospheric carbon dioxide (CO2) is essential for constraining the global carbon budget. NASA's Orbiting Carbon Observatory-2 (OCO-2) estimates column-averaged dry-air mole fractions of CO2 (XCO2) using high-resolution spectra. However, current operational retrieval algorithms are computationally expensive and do not properly quantify uncertainties. We present a novel deep learning framework that addresses these challenges. Due to the difficulties of ground-truth data for real satellite observations, we develop and validate our approach using a high-fidelity simulation dataset. This dataset, created to support OCO-2 uncertainty quantification (UQ), incorporates realistic forward model errors. Our architecture encodes spectral bands using a multi-branch neural network and estimates posteriors of the full CO2 column or desired summaries thereof using two scalable UQ methods: Laplace approximations and normalizing flows. Our approach has five key advantages relative to operational "full-physics" solvers: (1) Amortization: Inference is orders of magnitude faster, enabling real-time processing of massive data streams; (2) Model error robustness: By training on simulations that explicitly include model discrepancies, our method accounts for systematic errors often neglected by standard inversions; (3) Point estimate accuracy: We achieve superior predictive accuracy compared to baseline methods; (4) Improved UQ: The probabilistic outputs yield better-calibrated uncertainty estimates; and (5) Non-Gaussian posteriors: When utilizing normalizing flows, our framework successfully models complex, asymmetric posterior distributions, overcoming the limitations of the Gaussian assumption. These results suggest that simulation-based deep learning is a viable path toward next-generation operational processing systems.

12.
arXiv (CS.AI) 2026-06-17

LLM-Powered Multi-Agent System for Automated Crypto Portfolio Management

arXiv:2501.00826v3 Announce Type: replace-cross Abstract: Cryptocurrency portfolio management requires the fusion of heterogeneous multi-modal signals, including structured price and on-chain time series, unstructured news text, and technical indicators, under high-volatility and real-time constraints. While deep learning approaches show predictive capability, their opacity limits practical adoption, and single large language model (LLM) agents struggle to process the breadth of modality-specific inputs needed for robust decision-making. We propose a multi-agent system (MAS) framework in which three modality-specialised agents, a Crypto Agent for market dynamics, a News Agent for weekly news sentiment, and a Trading Agent for signal fusion and portfolio execution, decompose the task across three communication architectures: hierarchical, collaborative, and debate. We evaluate four capability configurations: zero-shot, chain-of-thought (CoT), retrieval-augmented generation (RAG), and skill-augmented. In a 52-week backtest over calendar year 2025 across the top 15 L1 blockchain native cryptocurrencies by market capitalisation as of January 2025, the best configuration, Hierarchical (Skill), achieves a cumulative return of 133.52% and a Sharpe ratio of 1.502, outperforming single-agent variants, passive benchmarks, and deep learning baselines. An ablation study identifies the Crypto Agent as the most critical component, with its removal reducing cumulative return by 42.57 percentage points. A cross-model comparison further shows that MAS outperforms the single-agent baseline under GPT-4o, GPT-5, and Claude Sonnet 4.5, suggesting that the benefit of multi-agent coordination is model-agnostic. Unlike black-box deep learning models, every portfolio decision is traceable to explicit agent reasoning, offering an interpretable and effective approach to multi-modal cryptocurrency portfolio management.

13.
arXiv (CS.AI) 2026-06-15

ANSR-DT: A Neuro-Symbolic Framework for Adaptive and Explainable Digital Twins

arXiv:2501.08561v4 Announce Type: replace Abstract: Digital twins are increasingly used to monitor and optimize industrial systems, yet many existing frameworks remain difficult to interpret, slow to adapt, and limited in their ability to incorporate explicit domain knowledge. This paper presents ANSR-DT, an adaptive neuro-symbolic framework that unifies temporal anomaly detection, symbolic reasoning, and reinforcement-learning-based decision support within a single digital twin pipeline. ANSR-DT combines a CNN-LSTM model for multivariate pattern recognition with Prolog-based reasoning that converts learned signals into explicit rules, enabling transparent diagnoses and traceable decision paths. A PPO-based adaptation layer further refines operational responses under changing conditions while preserving interpretability. Experiments against 8 baselines show that ANSR-DT delivers competitive predictive performance together with stable rule extraction, scalable symbolic reasoning, and actionable explanations. Additional validation on the Skoltech Anomaly Benchmark (SKAB) further indicates that the framework transfers beyond synthetic settings. These findings position ANSR-DT as a practical foundation for trustworthy, adaptive, and explainable industrial digital twins.

14.
arXiv (CS.CL) 2026-06-16

Transfer Learning for FHIR Questionnaire Terminology Binding

Electronic prior authorization workflows require FHIR Questionnaire items to carry LOINC codes, yet most items in the HL7 Da Vinci CDS-Library lack these bindings. We treat this as a retrieval problem: given a Questionnaire item's text, find the correct LOINC code in a pool of 97,314 active codes. We compare six methods (TF-IDF, frozen MiniLM, BioBERT, BioLORD, contrastively fine-tuned MiniLM, and a TF-IDF+GPT reranker) on a 54-item evaluation set spanning three query styles (natural question, medium, and terse). No single method wins on every metric. BioLORD, a frozen encoder pre-trained on biomedical ontology definitions, has the best top-rank accuracy (R@1 = 0.185, MRR = 0.246) despite seeing no task-specific data, while a contrastive fine-tune on raw LHC-Forms pairs takes R@5 (0.389) and R@10 (0.426). A distribution-shift ablation shows why the fine-tune in our main table is not the strongest one: adding GPT-generated paraphrases to the raw pairs drops R@5 from 0.389 to 0.296, so the augmented union underperforms raw-only training on every metric except R@1. Performance peaks at 5k training pairs. Error analysis on BioLORD's R@1 failures shows that wrong-specificity and ambiguous-text cases together account for 59% of errors.

15.
arXiv (CS.CL) 2026-06-11

Doc-to-Atom: Learning to Compile and Compose Memory Atoms

Long input sequences are central to document understanding and multi-step reasoning in Large Language Models, yet the quadratic cost of attention makes inference both memory-intensive and slow. Context distillation mitigates this by compressing contextual information into model parameters, and recent work such as Doc-to-LoRA amortizes context distillation into a single forward pass that generates one LoRA adapter per document. However, producing a single monolithic adapter for all queries leads to irrelevant-query interference, limited compositional recall, and poor scalability to long-document reasoning. To address these challenges, we propose Doc-to-Atom (Doc2Atom), a compositional parametric memory framework that decomposes each document into semantically typed knowledge atoms. Each atom is compiled into an independent micro-LoRA adapter and a provenance retrieval key. At inference time, a lightweight query router selects and assembles only the relevant atoms into a query-specific adapter, which is then injected into a frozen base model. The entire system is trained end-to-end through a multi-objective distillation framework. Experiments on six diverse QA benchmarks demonstrate that Doc2Atom outperforms Doc-to-LoRA baselines while reducing the memory cost of document internalization.

16.
arXiv (CS.CV) 2026-06-16

HadBalance: A Plug-and-Play Unified Global Geometric Prior Framework for Generalizable Biomedical Segmentation

Precise biomedical image segmentation is crucial for clinical diagnosis. Geometric cues (e.g., boundary, shape, and topology) can improve structural consistency, yet most are task-specific and lack a unified geometric foundation that generalizes across organs and modalities. We are motivated by the observation that several medical segmentation targets can be approximated as globally near-convex shapes. A convex region is one in which any two interior points can be connected by a line segment entirely contained within the region. In practice, medical targets may exhibit small local concavities or boundary irregularities; we refer to such globally convex-like shapes as near-convex. Motivated by this, we derive Hadwiger Shape Priors from Hadwiger's theorem as an interpretable global regularizer using three 2D measures: area A, perimeter P, and Euler characteristic chi, enabling transfer across organs and modalities. However, because medical datasets are shape-heterogeneous, enforcing near-convex priors uniformly can over-regularize non-convex anatomy with significant concavities, washing out concavities and fine details and degrading segmentation accuracy. To address this challenge, we propose Conflict-Aware Objective Balancing (CAOB), which integrates shape priors with segmentation in a gradient-aware manner. For each prior, CAOB removes only the gradient component that conflicts with segmentation while preserving the remaining aligned component, and adaptively regulates objective influences to prevent prior dominance. This enables stable use of shape priors on shape-heterogeneous data without erasing genuine concavities or fine structural details. We call this plug-and-play framework HadBalance.

17.
arXiv (quant-ph) 2026-06-15

Improved delta-kick cooling with multiple nonideal kicks

arXiv:2505.08413v2 Announce Type: replace Abstract: Delta-kick cooling is a technique employed to achieve low kinetic temperatures by decreasing momentum width at the cost of increased position width. In an ideal implementation, this method uses a harmonic potential to deliver a single near-instantaneous momentum kick. In practice, potentials that are approximately harmonic near their center are commonly used. As a result, the breakdown of the harmonic approximation far from the center limits the cooling performance. Inspired by aberration cancellation in optics, we propose to use compound matter-wave lens systems for $\delta-$kick cooling with Gaussian potentials. By strategically combining attractive and repulsive kicks, we show that it is possible to mimic the effect of a harmonic potential. For a test case with reasonable experimental parameters, our method suggests a reduction in kinetic temperature by a factor of $2.5$ using a 2-pulse sequence and by a factor of $3.2$ using a 3-pulse sequence.

18.
arXiv (CS.CV) 2026-06-24

LoT-Pass: Long-term-robust Image Watermarking for Image to Video Generation

The rapid progress of image-guided video generation (I2V) has raised concerns about its potential misuse in misinformation and fraud, underscoring the urgent need for effective digital watermarking. While existing watermarking methods demonstrate robustness within a single modality, they fail to trace source images in I2V settings. To address this gap, we introduce the concept of Robust Diffusion Distance, which measures the temporal persistence of watermark signals in generated videos. Building on this, we propose I2VWM, a cross-modal watermarking framework designed to enhance watermark robustness across time. I2VWM leverages a video-simulation noise layer during training and employs an optical-flow-based alignment module during inference. Experiments on both open-source and commercial I2V models demonstrate that I2VWM significantly improves robustness while maintaining imperceptibility, establishing a new paradigm for cross-modal watermarking in the era of generative video. \href{https://github.com/MrCrims/I2VWM-Robust-Watermarking-for-Image-to-Video-Generation}{Code Released.}

19.
arXiv (CS.CV) 2026-06-18

Investigation of Neural Network Methods for Reconstruction and Classification of Texture Images Under Conditions of Incomplete Information

The automated analysis of heterogeneous natural textures is frequently hindered by physical damage and data loss, presenting a significant challenge to computer vision. While deep learning has shown success in controlled environments, its application to complex geological materials under conditions of incomplete information remains underexplored. This study presents an integrated framework for the inpainting and classification of high-resolution core sample images. We propose an end-to-end pipeline that utilizes object detection for sample segmentation, followed by image inpainting using Generative Adversarial Networks (GANs) with Contextual Residual Aggregation (CRA) to reconstruct missing high-frequency details. Subsequently, we evaluate the performance of modern Transformer-based (Swin, ViT) and CNN architectures on the reconstructed data. Our experiments revealed a critical divergence between reconstruction quality and downstream utility: despite high structural fidelity (PSNR 28.7~dB, FID 74.01), classification accuracy plateaued at 53\%. To improve minority-class detection, we propose a confidence-based hybrid ensemble that raises MCA from 48\% to 58\%. These results highlight the limitations of current state-of-the-art generative models, which may produce visually plausible but semantically ambiguous features ("hallucinations") that confound classifiers. This work provides insights into the dependencies between image reconstruction quality and classification performance, offering a reproducible baseline for future research in non-destructive testing and material science. Given that cross-well accuracy remains in the 49–53\% range, we position the resulting system as a decision-support and screening tool for lithofacies interpretation rather than as a fully autonomous classifier. The code is available at https://github.com/GalymzhanAbdimanap/Lithology_recognition

20.
medRxiv (Medicine) 2026-06-17

Frequency-dependent cognitive effects of Deep Brain Stimulation in Parkinson's Disease: A Systematic Review and Meta-Analysis

Background: Subthalamic nucleus deep brain stimulation (STN-DBS) improves levodopa-induced motor complications and cardinal motor symptoms of Parkinson's disease (PD), but stimulation frequency may differentially shape outcomes. This is evident for axial and gait symptoms, which may respond differently to lower-frequency stimulation. Whether frequency-dependent effects extend to cognition remains unclear. Objective: To investigate the cognitive effects of DBS at distinct frequencies in PD. Methods: We conducted a systematic review and meta-analysis (PROSPERO - CRD42024618253). PubMed, Web of Science, and EMBASE were searched for studies assessing cognitive outcomes under different stimulation frequencies. Eight cognitive domains were defined: verbal fluency, cognitive flexibility, executive control, working memory, attention, processing speed, episodic memory, and time processing. Multilevel random-effects meta-analyses were performed, with effect sizes expressed as Hedges' g. Results: Forty-three studies met the inclusion criteria, the majority (n = 31) involving STN-DBS. Twenty-one STN-DBS studies, including 355 patients, were included in the meta-analysis. Compared with HFS ([≥] 130 Hz), lower frequencies (4-80 Hz) were associated with better verbal fluency (g = 0.27) and cognitive flexibility (g = 0.38), with consistent effects across sensitivity and leave-one-out analyses. Accuracy-based executive control measures also favored lower-frequency stimulation. OFF-stimulation comparisons showed a concordant pattern. Evidence for other targets (PPN and NBM) was limited. Conclusions: Lower-frequency STN-DBS was associated with modest benefits in specific cognitive domains compared with HFS. These findings highlight the need for future research to determine how frequency interacts with stimulation location and symptom-specific networks to shape cognitive and cognitive-motor outcomes in PD.

21.
arXiv (CS.CV) 2026-06-16

YTClickbait21K: Human-Annotated Multimodal Dataset for YouTube Clickbait Detection Across Diverse Channels and Content Categories

Clickbait content on video-sharing platforms poses a significant challenge to information reliability, yet progress in automated detection has been constrained by the lack of large-scale, high-quality multimodal datasets. We present YTClickbait21K, a human-annotated YouTube clickbait dataset comprising 21,238 videos collected from 40 channels across 29 countries, covering diverse content categories such as news, entertainment, education, and gaming. Each sample includes structured metadata (title, description, engagement statistics) along with associated thumbnail images, enabling comprehensive multimodal analysis. To ensure annotation quality, every video was independently labeled by three annotators using a standardized decision framework that incorporates textual, visual, and cross-modal consistency cues, with final labels determined through majority voting. The dataset exhibits substantial inter-annotator agreement (k=0.65), confirming reliable labeling despite the inherent subjectivity of clickbait detection. By combining scale, annotation rigor, and multimodal richness, this dataset provides a robust benchmark for developing and evaluating machine learning models, facilitating research in cross-modal semantic understanding, and advancing automated content moderation systems.

22.
arXiv (CS.CV) 2026-06-16

Near–Real-Time Conflict-Related Fire Detection in Sudan Using Unsupervised Deep Learning

Ongoing armed conflict in Sudan highlights the need for rapid monitoring of conflict-related fire-affected areas. Recent advances in deep learning and high-frequency satellite imagery enable near–real-time assessment of active fires and burn scars in war zones. This study presents a near–real-time monitoring approach using a lightweight Variational Auto-Encoder (VAE)–based model integrated with 4-band Planet Labs imagery at 3 m spatial resolution. We demonstrate that these impacted regions can be detected within approximately 24 to 30 hours under favorable observational conditions using accessible, commercially available satellite data. To achieve this, we adapt a VAE–based model, originally designed for 10-band imagery, to operate effectively on high-resolution 4-band inputs. The model is trained in an unsupervised manner to learn compact latent representations of nominal land-surface conditions and identify burn signatures by quantifying changes between temporally paired latent embeddings. Performance is evaluated across five case studies in Sudan and compared against cosine distance, CVA, and IR-MAD using precision, recall, F1-score, and the area under the precision-recall curve (AUPRC) computed between temporally paired image tiles. Results show that the proposed approach consistently outperforms the other methods, achieving higher recall and F1-scores while maintaining viable precision in highly imbalanced fire-detection scenarios. Experiments with 8-band imagery and temporal image sequences yield only marginal performance gains over single 4-band inputs, underscoring the effectiveness of the proposed lightweight approach for scalable, near–real-time conflict monitoring.

23.
arXiv (quant-ph) 2026-06-15

Fulde-Ferrell superfluids in an asymmetric three-component Fermi Gas

arXiv:2602.24006v2 Announce Type: replace-cross Abstract: An asymmetric three-component Fermi gas, featuring Raman-induced spin-orbit coupling between the first and second components and contact interaction only between the first and third components, introduces both spin-orbit coupling and population imbalance-two mechanisms known to stabilize the Fulde-Ferrell superfluids.We systematically study Fulde-Ferrell superfluids in an asymmetric three-component Fermi gas { in two dimensions and at zero temperature} by finding the global minima of the thermodynamic potential. We reveal a new class of composite Fulde-Ferrell superfluids that emerges when strong spin-orbit coupling generates a double-well structure in momentum space within the lower spin-orbit-coupled band. The key features of these composite superfluids are identified.

24.
arXiv (CS.CV) 2026-06-24

BenchX: Benchmarking AI Models for Cancer Detection and Localization with Demographic and Protocol Biases

Artificial intelligence (AI) has achieved remarkable success in medical imaging, but it is widely recognized that these models often perform inconsistently across real-world clinical settings. Such inconsistencies occur when patient demographics and imaging protocols vary, for example, in detecting small tumors, analyzing scans from different contrast phases, or evaluating patients of different ages or sexes. To quantify these inconsistencies, we develop a large-scale, open benchmark of 85,355 CT scans that systematically evaluates 12 tumor-detection AI models across tumor size, location, patient subgroup, and imaging protocol. We leverage large language models (LLMs) to extract and organize subgroup information from clinical data, which makes the analysis both scalable and reproducible. Our benchmark reveals that current state-of-the-art AI models, optimized for average accuracy, perform poorly in rare or underrepresented subgroups, such as young, female African Americans. However, collecting sufficient annotated data for these rare cases is often impractical. The benchmark provides a foundation for building more reliable and robust AI models for tumor detection and highlighting the need for rigorous, subgroup-level evaluation in medical imaging and computer vision. Datasets, code

25.
arXiv (CS.AI) 2026-06-16

JetParticle-JEPA: An Efficient Self-Supervised Representation Learning method for Jet Tagging in High-Energy Physics

arXiv:2606.14813v1 Announce Type: cross Abstract: Jet tagging at the Large Hadron Collider increasingly relies on deep learning models trained on massive simulated datasets, leading to high computational costs and limited robustness to detector mismodeling. We introduce JetParticle-JEPA (JP-JEPA), a self-supervised Joint-Embedding Predictive Architecture that learns physically meaningful jet representations directly from continuous particle clouds without tokenization or reconstruction of raw inputs. Built on a Particle Transformer backbone, JP-JEPA predicts latent representations of masked particles while preserving fine-grained kinematic correlations. On the JetClass benchmark, JP-JEPA achieves performance comparable to fully supervised state-of-the-art methods on the full dataset, surpasses supervised baselines in low-label regimes, and significantly outperforms existing SSL approaches. On Top Quark and Quark-Gluon Tagging benchmarks, it remains on par with supervised methods. The learned representations also exhibit strong robustness to missing detector information and improved uncertainty behavior, highlighting JP-JEPA as a promising foundation-model framework for robust and data-efficient jet physics at the LHC.