Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-16

IoT-Zoo: A Container-Based Framework for Heterogeneous IoT Device Profiles and Reproducible Traffic Capture

arXiv:2606.15653v1 Announce Type: cross Abstract: The validation of networking and security solutions for the Internet of Things (IoT) requires realistic and reproducible experimental data. However, existing platforms often achieve scalability by replicating a limited set of device types, which restricts profile diversity and fails to capture the heterogeneity of real-world IoT environments. In this paper, we present IoT-Zoo, a container-based testbed designed to support reproducible experimentation through heterogeneous, dataset-driven IoT device profiles. Built upon Containernet, IoT-Zoo automates the deployment of multi-domain scenarios and supports real application protocols such as MQTT and RTSP. The platform provides a single-command interface for environment provisioning and automated traffic capture (PCAP), enabling the generation of consistent traffic baselines and reducing the operational effort required to evaluate networking and security solutions.

02.
arXiv (CS.LG) 2026-06-16

Sobolev Approximation by Fixed-Size Neural Networks with Arbitrary Accuracy

arXiv:2606.16975v1 Announce Type: cross Abstract: In this work, we investigate new activation functions for achieving arbitrary-accuracy Sobolev approximation by fixed-size neural networks. We first show that any function in $W^{2,\infty}((a,b)^d)$ can be approximated with arbitrary accuracy, measured in the $W^{1,\infty}$-norm, by a fixed-size neural network using the Elementary Universal Activation Function ($\mathrm{EUAF}$). To extend this result to $W^{s,\infty}((a,b)^d)$ for $s\in\mathbb{N}$, we introduce a smooth activation $\mathrm{DUAF}_{\infty}$ from the family of Differentiable Universal Activation Functions ($\mathrm{DUAF}_n$). We prove that any function in $W^{s,\infty}((a,b)^d)$ can be approximated with arbitrary accuracy in the $W^{s-1,\infty}$-norm by a fixed-size $\mathrm{DUAF}_{\infty}$-activated network. We further construct sigmoidal variants $\widetilde{\mathrm{DUAF}}_n$ and show that, for every $1\leq s\leq n$, fixed-size $\widetilde{\mathrm{DUAF}}_n$-activated networks still approximate any $f\in W^{s,\infty}((a,b)^d)$ with arbitrary accuracy in the $W^{s-1,\infty}$-norm. In all these results, the width and depth bounds are computed explicitly, and the proposed activations are elementary.

03.
medRxiv (Medicine) 2026-06-12

Mathematical analysis of the overall survival after chemoradiotherapy of limited-stage small cell lung cancer and the effect of dose/fractionation

The purpose of this work is to analyze the 2-year overall survival (OS2y) of limited-stage small cell lung cancer (LS-SCLC) treated with chemoradiotherapy (CRT), aiming at characterizing the response of LS-SCLC, and in particular the /{beta} value and proliferation parameters. Through a systematic analysis of the literature, we collated a dataset containing 57 entries (3363 patients) of response of LS-SCLC treated with CRT. Radiotherapy schedules ranged from hyper- to hypofractionation. Four radiobiological models to describe the OS2y were investigated, with progressive levels of complexity including the effect of radiotherapy, chemotherapy, treatment year and toxicity. The Akaike Information Criterion (AIC) was used to compare models, and the profile likelihood methodology to compute confidence intervals. Model 4, which includes the effect of radiotherapy, chemotherapy, treatment year and dose-dependent toxicity, provided the best fits of the experimental data (lowest AIC value). While being the best model, model 4 still fails to provide a good prediction of the OS2y, in particular failing to predict the survival of the schedules achieving the lower/higher survivals. The radiobiological analysis of the dose-response of LS-SCLC to CRT does not allow to narrowly constrain the value of response parameters. We attribute this limitation to the large heterogeneity of this disease. Nonetheless, our analysis shows a large /{beta} value (>9 Gy, 95% CI), which implies a low fractionation effect in the radiotherapy of LS-SCLC. and an accelerated proliferation of tumor cells, {lambda}' > 1.6 Gy/day (95% CI), after a kick-off time of ~4-5 weeks, which supports the use of accelerated protocols to avoid the effect of tumor proliferation on the clinical outcome.

04.
arXiv (CS.AI) 2026-06-12

AAbAAC: An Annotated Corpus for Autoimmunity Information Extraction

arXiv:2606.13051v1 Announce Type: new Abstract: Despite advances in information extraction driven by deep learning and large language models, performance gaps remain in highly specialized biomedical fields, where domainspecific complexity poses challenges for generalist models. In this work, we focus on the domain of autoimmunity, where the main entities of interest are autoimmune diseases, autoantibodies (i.e., molecules that may mark or cause these diseases), their molecular targets, their location in the body, and their associated clinical signs. Herein, we present AAbAAC (AutoAntibodies and Autoimmunity Annotated Corpus), a corpus of 115 abstracts selected from PubMed, where we manually annotated entities and their relationships. First, AAbAAC was used to evaluate several methods on the task of named entity recognition (NER), and secondly, to fine-tune NER models. Our study demonstrates the utility of AAbAAC for information extraction in the domain of autoimmunity, showing expected improvement in NER performance after finetuning. This illustrates the value of small-scale annotation efforts for specialized domains and contributes to the computational study of autoimmunity. The AAbAAC corpus is available at https://github.com/f-maury/AAbAAC.

05.
arXiv (CS.CV) 2026-06-15

FoleyGenEx: Unified Video-to-Audio Generation with Multi-Modal Control, Temporal Alignment, and Semantic Precision

We present FoleyGenEx, a unified video-to-audio (VTA) framework integrating multi-modal control, frame-level temporal alignment, and fine-grained semantics, enabling synchronized, versatile audio synthesis for diverse tasks. Existing VTA methods either have multi-modal control but weak temporal alignment or strong alignment but lack reference audio conditioning and semantic precision. FoleyGenEx fills this gap via three core innovations: a conditional injection mechanism for audio-controlled VTA and Foley extension, a multi-modal dynamic masking strategy preserving training synchronization, and an adverb-based data augmentation algorithm leveraging signal processing and large language models to enhance textual supervision with nuanced semantics. Experiments on AudioCaps, VGGSound, and Greatest Hits demonstrate its competitive controllable VTA performance against existing methods. Demo samples are available at https://foleygenex.github.io/FoleyGenEx.

06.
arXiv (CS.CV) 2026-06-16

Navigating Distribution Shifts in Medical Image Analysis: A Survey

Medical Image Analysis (MedIA) has become indispensable in modern healthcare, enhancing clinical diagnostics and personalized treatment. Despite the remarkable advancements supported by deep learning (DL) technologies, their practical deployment faces challenges posed by distribution shifts, where models trained on specific datasets underperform on others from varying hospitals, or patient populations. To address this issue, researchers have been actively developing strategies to increase the adaptability of DL models, enabling their effective use in unfamiliar environments. This paper systematically reviews approaches that apply DL techniques to MedIA systems affected by distribution shifts. Rather than organizing existing methods by technical characteristics, we explicitly bridge real-world clinical constraints – such as limited data accessibility, strict privacy requirements, and heterogeneous collaboration protocols – with the technical paradigms able to address them. By establishing this connection between operational constraints and methodological evolution, we categorize existing works into Joint Training, Federated Learning, Fine-tuning, and Domain Generalization, each aligned with specific healthcare scenarios. Beyond this taxonomy, our empirical analysis suggests that, as domain information becomes progressively less accessible across these paradigms, performance improvements become increasingly constrained, and further uncovers a gradual shift in methodological focus from explicit distribution alignment toward uncertainty-aware modeling, ultimately pointing to the need for more deployability-aware design in real-world MedIA.

07.
arXiv (CS.AI) 2026-06-15

Where's the Plan? Locating Latent Planning in Language Models with Lightweight Mechanistic Interventions

arXiv:2605.07984v2 Announce Type: replace-cross Abstract: We study planning site formation in language models – where internal representations of structurally-constrained future tokens form during the forward pass, and whether they causally drive generation. Using rhyming-couplet completion as a clean test of forward-looking constraint, we apply two lightweight methods (linear probing and activation patching) across Qwen3, Gemma-3, and Llama-3 at more than ten scales. Probing shows that future-rhyme information is linearly decodable at the line boundary, with signal that strengthens with scale in all three families. Activation patching reveals that only Gemma-3-27B causally relies on this encoding, exhibiting a handoff in which the causal driver migrates from the rhyme word to the line boundary around layer 30. Every other model we test conditions on the rhyme word throughout generation, with near-zero causal effect at the line boundary despite strong probe signal. We localize the Gemma-3-27B handoff to five attention heads through two-stage path patching that recover ~90% of the rhyme-routing capacity at the newline.

08.
arXiv (CS.CL) 2026-06-17

The Benchmark Illusion: Pruned LLMs Can Pass Multiple Choice but Fail to Answer

Compressing large language models reduces memory use and inference cost, but it can also create failures that standard benchmarks miss. A pruned model may still perform well on multiple-choice evaluations, yet fail to answer the same question in open generation. We ask what pruning changes: does it erase the correct answer, or does it make the answer harder to produce as the top output? We study this question with multilingual question answering, tracking the same questions before and after pruning. We find a benchmark illusion. Under high-sparsity pruning, especially Wanda, models often fail in greedy open generation while still selecting the correct answer under multiple-choice scoring. In these recognition-only errors, the answer is usually not gone, but demoted: it often reappears with beam search, sampling, or one in-context example. Overall, multiple-choice benchmarks can overstate the usability of compressed LLMs, creating an evaluation blind spot. Compressed models should be tested on what they can produce, not only on what they can recognize.

09.
Nature (Science) 2026-06-11

Daily briefing: Deep-sea whale graveyard is a treasure trove of fossils

作者:

Researchers have uncovered more than 400 fossilized whale bones in an ocean-floor chasm. Plus, the working lives of scientists, in pictures, and how AI could slow the pace of research publication for the better. Researchers have uncovered more than 400 fossilized whale bones in an ocean-floor chasm. Plus, the working lives of scientists, in pictures, and how AI could slow the pace of research publication for the better.

10.
arXiv (CS.CV) 2026-06-18

Semantic Robustness Certification for Vision-Language Models

Vision-language models (VLMs) are now widely used in downstream tasks. However, real-world applications often expose VLMs to distribution shifts induced by semantic variation (e.g., shape, size, and style). Robustness certification determines if a model's prediction changes when transformations are applied to its input. While most certification frameworks study geometric or pixel-level transformations over inputs, this work proposes a novel framework that enables certifying VLM robustness under semantic-level transformations. Leveraging the open-vocabulary capability of VLMs, we use text prompts as semantic proxies to construct transformations parameterized by an extent that controls the degree of semantic variation. By characterizing the VLM decision boundary in closed form, our framework quantitatively certifies extent intervals for which the predicted class remains unchanged under the semantic transformation. Our framework is the first to certify VLM robustness under semantic-level variations without requiring additional data for each variation, making it practical to apply. Experiments on both synthetic and real-world data show that our framework enables certifying robustness under diverse semantic variations across scenarios.

11.
arXiv (CS.LG) 2026-06-19

Variational Consensus Monte Carlo for Bayesian Mixture

arXiv:2606.19643v1 Announce Type: cross Abstract: Motivated by the privacy, sensitivity and sharing limitations of health data, we present a comprehensive pipeline for inference of Bayesian mixture models within a federated learning setting, i.e. when data cannot be fully shared or pooled across compute nodes. We adopt a Consensus Monte Carlo (CMC) approach, in which an MCMC algorithm is run independently within each data silo to estimate local posterior distributions, which are then aggregated to approximate the posterior over the full data. The variational CMC approach of Rabinovich, Angelino and Jordan (2015) [1] frames the aggregation step as a variational inference problem, but their application to mixtures assumes the number of clusters and key mixture parameters to be known. Our main methodological contributions are: (i) an extension of variational CMC to over-fitted Bayesian mixture models that infer the number of clusters and all model parameters, without requiring conjugacy; (ii) novel cluster-matching algorithms suitable for cross-silo settings in which not every cluster appears in each local dataset; (iii) a number of inference strategies for the aggregation step, matched to different federated learning constraints; and (iv) guidelines for choosing among these in practice. A comprehensive simulation study validates the framework and allows us to compare to state-of-the-art federated learning alternatives. Notably, we show that when the composition of local datasets reflects the underlying clustering structure in the data, our approach can recover small clusters with greater accuracy than standard MCMC applied to the pooled data. We illustrate the framework on large-scale electronic health record data, identifying multi-morbidity patterns in a British geriatric population.

12.
medRxiv (Medicine) 2026-06-16

Adherence to Red Reflex and Vision Screening Recommendations: A Deep Dive into Primary Care Implementation Gaps

Introduction: Early childhood vision screening is critical for detecting amblyopia and other vision-threatening conditions. Despite screening recommendations during well-child visits, rates remain low. Red reflex assessment is recommended to identify serious ocular pathology, yet its use in primary care is not well described. We examined rates and drivers of vision screening in pediatric primary care. Methods: We conducted a retrospective review of electronic health records for children 3 to 5 years attending well-child visits in 2022 in one of three representative primary care clinics within a university health system. Outcomes were documented red reflex and functional vision tests. We evaluated associations with patient demographics and clinic site using multivariable logistic regression Results: Among 1,003 visits, 21.1% (n=212) had a documented red reflex assessment, and 60.8% (n=610) a functional vision test. Younger children (ages 3 and 4 vs. 5 years) had higher odds of red reflex assessment [adjusted odds ratio (aOR) 9.00 and 8.64], and lower odds of a functional vision (aOR 0.47 and 0.59) test. Females had higher odds of red reflex assessment (aOR 1.53). Other/Multiracial children had lower odds of red reflex assessment than Non-Hispanic White children (aOR 0.48). Screening rates varied significantly by clinic site Conclusions: Visual function and red reflex assessment are inconsistently performed in pediatric primary care, with particularly low rates of red reflex documentation. Screening rates varied between clinics and were affected by age. These findings highlight missed opportunities for early detection of vision-threatening conditions and identify targets for improving adherence to pediatric vision screening recommendations

13.
medRxiv (Medicine) 2026-06-17

Long-term mortality and cause-specific death after non-cardiac chest pain: a multicentre cohort study of 160,245 patients in China

Abstract Background Non-cardiac chest pain (NCCP) is commonly regarded as a low-risk condition. However, long-term mortality, cause-specific death, and high-risk subgroup characteristics remain poorly defined. Methods In this multicentre registry-linked cohort study, we linked the Chest Pain Center Registry from 101 hospitals in Hunan, China, with the Mortality and Cause of Death Registry. Adults diagnosed with NCCP from Jan 1, 2017, to Dec 31, 2021, were included. We assessed 3-year all-cause, cardiovascular, and non-cardiovascular mortality using Cox, restricted cubic spline, and Fine-Gray models. Findings Among 160,245 patients, 4674 deaths occurred within 3 years (2.9%). Mortality increased sharply after 60.5 years. Age [≥] 60.5 years (adjusted hazard ratio [aHR] 7.49 [95% CI 6.89-8.14]), rural residence (time-varying aHR 1.46 [1.35-1.57] in year 1 and 1.66 [1.46-1.89] in years 1-3), and male sex (aHR 1.47 [1.38-1.57]) independently predicted death. Three-year mortality ranged from 0.3% in younger urban women to 8.4% in older rural men. Cardiovascular diseases accounted for 56.4% of deaths among older patients, whereas other non-cardiovascular causes (22.8%) and malignancy (20.8%) were the largest categories among younger decedents. Interpretation NCCP is not uniformly benign. Age, rural residence, and sex identify patients who could benefit from risk-stratified follow-up, with cardiovascular prevention prioritised for older rural men and broader non-cardiovascular assessment considered for younger patients.

14.
arXiv (CS.CL) 2026-06-16

VeriGraph: Towards Verifiable Data-Analytic Agents

LLM-based agents have demonstrated strong capabilities in data-intensive analytical tasks, yet their outputs are rarely verifiable: a reliance on linear text trajectories makes their reasoning difficult to audit. In particular, deterministic computations over raw data and semantic deductions over natural-language claims are often entangled in an unstructured stream, leaving numerical conclusions hard to reproduce and qualitative judgments hard to inspect. To address this, we propose VeriGraph, a traceable neuro-symbolic reasoning framework that enables agents to construct an explicit heterogeneous evidence directed acyclic graph (DAG) during execution. VeriGraph introduces three evidence-expansion primitives, namely computational, grounding, and derivational expansion, to connect raw data, interpreter variables, computed results, and natural-language claims in a unified graph. Under this formulation, structural traceability is reduced to graph reachability from raw data sources to terminal claims, while semantic support is measured by claim-level evidence evaluation. To improve graph construction, we further design a graph-based policy optimization strategy with a composite reward that jointly supervises answer correctness, computational integrity, and derivational coherence. Experiments on four benchmarks show that VeriGraph-8B achieves the highest overall score among all baselines. More importantly, VeriGraph produces auditable evidence graphs with substantially stronger claim grounding, achieving a 87.61\% Grounding Rate under our claim-level evidence support evaluation. These results suggest that explicit evidence-graph construction is a promising path toward verifiable data-analytic agents. Our code is available at https://github.com/ignorejjj/VeriGraph.

15.
arXiv (CS.AI) 2026-06-12

PRISMR: Overcoming Parse Collapse in Multimodal Listwise Ranking via Parameterized Representation Internalization

arXiv:2606.12942v1 Announce Type: new Abstract: Generative listwise ranking with Large Multimodal Models (LMMs) aims to capture global list context in a single forward pass, but its effectiveness degrades in long-context multimodal scenarios. We identify a recurring failure mode, parse collapse, where the autoregressive decoder produces fluent yet incomplete rankings by silently omitting candidates and terminating early. This failure stems from limited context utilization rather than simple formatting mistakes, making prompt engineering and constrained decoding insufficient. We propose PRISMR (Parameterized Representation Internalization for Semantic Multimodal Ranking), a framework that replaces transient in-context list processing with parametric structural conditioning. PRISMR uses a lightweight hypernetwork to encode multimodal candidates in parallel and generate item-specific LoRA weights, which are synthesized into an instance-specific adapter for a LMM. This paradigm enables more robust internalization of list structure while preserving the base model. We further introduce a large-scale multimodal review-ranking benchmark for evaluation. Experiments demonstrate that PRISMR substantially reduces parse collapse, improves listwise ranking performance, and transfers effectively across domains and instruction-tuned backbones.

16.
arXiv (CS.LG) 2026-06-16

An RRAM-based Hardware Implementation of a Radial Basis Function Neuron for Edge Classifiers

arXiv:2606.14739v1 Announce Type: cross Abstract: The deployment of modern machine learning (ML) solutions on resource-constrained edge devices highlights implementation challenges. This is especially true for extreme edge applications that include safety-critical components, such as autonomous navigation tasks. This paper demonstrates an artificial neural network (ANN) design leveraging Metal-Oxide Resistive RAM (RRAM) -based Analogue Content Addressable Memory (ACAM) as an efficient hardware substrate for performing metric-based classification and online adaptation on the edge. The proposed design is based on a custom Template piXeL (TXL) cell used for building the ACAM module, where each TXL cell acts as a configurable receptive field neuron. These cells employ a Radial Basis activation function to calculate the distance of an input from the programmed receptive field. The TXL can be organised into dense arrays for calculating the distance of a high-dimensional input against all stored prototypes, effectively performing fast and energy efficient similarity search. This hardware engine enables on-the-fly learning, where the receptive field parameters can be tuned to track domain shift. Through simulation of the proposed TXL-RBF classifier we can achieve 89.1\% accuracy on the MNIST dataset while consuming 185fJ per cell per operation when operating at 100MHz.

17.
arXiv (CS.CV) 2026-06-15

Compressing Image Style Training into a Single Model Forward

Diffusion-based style transfer must balance inference efficiency with stylization fidelity. Adapter-based methods are efficient, but they inject style as an external condition and can either weaken reference-specific appearance or copy reference semantics into the generated image. Optimization-based personalization methods such as LoRA internalize style more effectively, but require a separate training process for every new style. We introduce i2L (image-to-LoRA), a framework that amortizes style LoRA training into a single forward pass. Given one or more reference images, i2L predicts LoRA weights for a text-to-image model, enabling immediate style instantiation without per-style optimization. The architecture combines an image encoder, learnable LoRA queries, and compressed decoding heads that generate adapted matrices. Training on semantically diverse style pairs encourages the predictor to preserve appearance cues while suppressing reference-content copying. Experiments on Z-Image, FLUX.2, and Hidream-O1 show that i2L improves style fidelity, prompt alignment, and perceptual quality over existing baselines. Because i2L produces explicit LoRA weights, it also supports asymmetric classifier-free guidance, multi-reference style fusion, and composition with controllable-generation modules.

19.
bioRxiv (Bioinfo) 2026-06-18

Structure-Based Immunoinformatics Design of a CTB-Adjuvanted Multi-Epitope Mucosal Vaccine Against Helicobacter pylori

Background: Helicobacter pylori coloniz the gastric mucosa of nearly half of the global population and is classified as a Group I carcinogen by the World Health Organization due to its strong association with gastric cancer. The growing prevalence of antibiotic-resistant H. pylori strains significantly compromises current therapeutic strategies, emphasizing the urgent need for effective prophylactic approaches. Research design and methods; In this study, a novel multi-epitope vaccine was designed targeting H. pylori, incorporating epitopes from four key virulence proteins: BabB, SabB, SabA, and VacA. Using an immunoinformatics-guided structural vaccinology approach, B- and T-cell epitopes were predicted, prioritized based on immunogenicity, conservation, population coverage, and non-homology to human proteins, and assembled into the final vaccine construct. To enhance immunogenicity and specifically stimulate mucosal immune responses, the cholera toxin B subunit (CTB) was fused at the N-terminal via an EAAAK linker, a novel application in H. pylori multi-epitope vaccines. The PADRE universal epitope and additional linkers were incorporated to optimize epitope presentation and helper T-cell activation. Results: Comprehensive evaluations of physicochemical, antigenic, allergenic, and toxic properties were conducted, followed by secondary and tertiary structure modeling, refinement, and validation. Conformational B-cell epitopes were mapped, and molecular docking, binding affinity analysis, energy minimization, and molecular dynamics simulations confirmed structural stability and receptor interactions. Codon optimization and in silico cloning predicted efficient expression in Escherichia coli, while immune simulations suggested robust humoral and cellular responses. Conclusions: This study presents a promising multi-epitope vaccine candidate against H. pylori, offering a rational framework for future experimental validation and potential clinical application.

20.
arXiv (CS.CL) 2026-06-11

Modeling Complex Behaviors: Multi-Personality Composition and Dynamic Switching in Vision-Language Models

With the widespread deployment of Multimodal Large Language Models (MLLMs) in social interaction, understanding and controlling their behavior under complex personality conditions is essential. This paper introduces explicit personality conditioning and establishes a systematic evaluation framework encompassing single-personality induction, multi-personality induction, and personality switching. Experiments show that personality induction improves image captioning performance but can impair performance on tasks requiring precise reasoning, such as visual question answering (VQA). Balancing and residual effects are observed during multi-trait composition and dynamic switching, indicating that model behavior is co-modulated by both previous and current personality constraints. Existing prompt-based personality induction methods show limited transferability to multimodal settings. Our work reveals the dynamic and complex nature of personality modeling in MLLMs and underscores the need for robust, tailored methods for personality induction and evaluation. The code will be released when the paper is accepted.

21.
arXiv (CS.AI) 2026-06-11

Signed Compression Progress on a Sealed Audit is Goodhart-Resistant

arXiv:2606.11417v1 Announce Type: cross Abstract: Compression progress is a long-standing proposal for intrinsic motivation: reward an agent when its world model becomes better at predicting or compressing experience. The folk claim is that this reward is "credible" because it is paid only for learning. We make this precise and prove it. If intrinsic reward is the signed decrease of a fixed sealed-audit loss, r_t = E(theta_{t-1}) - E(theta_t), then cumulative reward telescopes exactly to endpoint audit improvement, so no policy can push reward up indefinitely while true audit performance stagnates or degrades. For finite audit panels the same result holds with a sharp false-positive budget: cumulative empirical reward is at most true audit improvement plus 2 Delta_n(F, delta), the uniform audit deviation of the model class. This is horizon-free: adaptivity over time costs nothing once the sealed panel uniformly controls the class. The theorem also identifies the failure modes: the guarantee disappears if progress is clipped, scored on the agent's own stream, exposed to a high-capacity model on a reusable panel, or applied to a neural class that makes Delta_n vacuous. We give a Lean 4 mechanization of the structural core (telescoping, the finite-audit bound, finite Gibbs, and the entropy floor) and an experiment suite on ARC-TGI grid-transformation generators with adaptive holdout attacks. Experiments confirm the theory: finite-audit deviation scales as n^{-0.527}; signed progress resists clip-farming, stream leakage, and noisy-TV curiosity; naive reusable audits are exploitable by black-box scalar feedback, while standard release defenses keep the attack below the 2 Delta_n threshold. Signed compression progress on a sealed audit is an accounting signal of genuine improvement.

22.
arXiv (CS.CL) 2026-06-15

Knowledge Graph Enhanced Memory-Augmented Retrieval for Long Context Modeling

Long-context language modeling requires not only extending context windows but maintaining coherent understanding of entity states and relationships across thousands of tokens – a challenge that semantic similarity alone cannot address. KGERMAR addresses this by constructing dynamic, context-specific knowledge graphs from input text during inference, enabling domain-adaptive retrieval that leverages both semantic similarity and explicit entity relationships. The framework performs real-time entity and relation extraction to build contextual knowledge graphs, then integrates graph-structural embeddings with textual semantics through a multi-component memory architecture. Three memory banks – contextual, semantic, and structural – are maintained with retrieval signals fused via learned weights to capture both surface-level semantics and deeper relational patterns. Evaluated on SlimPajama (84.7K training examples), WikiText-103 (4,358 examples), PG-19 (100 examples), and Proof-pile (46.3K examples), KGERMAR achieves up to 8.5\% lower perplexity and 2–2.5x better memory efficiency than memory-augmented baselines across context lengths from 1K to 32K tokens, with superior in-context learning performance across five NLU tasks. The dynamic knowledge graph construction approach advances memory-augmented language modeling by enabling domain-specific knowledge representation that adapts to input contexts rather than relying on fixed knowledge bases.

23.
PLOS Medicine 2026-05-21

Novel symptoms associated with eclampsia could improve detection and save lives

by Alice Beardmore-Gray, Andrew Shennan Eclampsia is a life-threatening complication of pre-eclampsia, yet remains difficult to predict. In this Perspective, Alice Beardmore-Gray and Andrew Shennan highlight a recent study that identifies 10 novel prodromal symptoms of eclampsia, with potential to better predict which women are at risk and therefore reduce delays in intervention.

24.
arXiv (math.PR) 2026-06-18

Law of the Iterated Logarithm for $p$-Walks on $\mathbb{Z}$

作者:

arXiv:2606.19131v1 Announce Type: new Abstract: The $p$-rotor walk on $\mathbb{Z}$ is a self-interacting walk that interpolates between the simple random walk and the deterministic rotor walk. While the weak convergence of this model to a perturbed Brownian motion is known, its almost sure asymptotic boundaries have not been characterized. In this paper, we establish the exact Law of the Iterated Logarithm (LIL) for the $p$-rotor walk. Utilizing the decomposition of the walk into a martingale perturbed by its running extrema, we obtain first a functional Law of the Iterated Logarithm for the linearly interpolated paths of the $p$-walk. We then obtain the classical LIL constants by solving a calculus of variations problem over the perturbed Strassen set.

25.
arXiv (CS.LG) 2026-06-18

TS-Fault: Benchmarking Time Series Forecasters Against Structural Faults

arXiv:2606.18539v1 Announce Type: new Abstract: Time series forecasting (TSF) underpins consequential decisions in energy, transportation, finance, and healthcare, yet TSF models are almost universally ranked by a single number (e.g., average error) on clean held-out data, under the implicit assumption that it predicts deployed reliability. However, real faults are not i.i.d noise but structured events with temporal shape, broken cross-variable dependencies, regime change coupled with missingness, and causal propagation across a sensing pipeline. Treating TSF robustness as a data-quality problem, we present TS-Fault, a benchmark that evaluates forecasting models under explicit, parameterized fault scenarios with controllable semantic difficulty. TS-Fault organizes recurring failures into four modes along two orthogonal axes (observation- vs mechanism-level; univariate vs multivariate) and injects each fault into the most prediction-critical window via a unified importance score. This design enables robustness to be tested against the structures models actually rely on, rather than reduced to generic noise sensitivity. We evaluate 21 models across 6 datasets, 4 modes, and 5 difficulty levels under a paired clean/corrupt protocol. The results reveal three findings that contradict common leaderboard intuition: (i) clean-data accuracy anti-correlates with robustness; (ii) clean rankings are preserved under observation-level faults but reshuffled under mechanism-level faults; and (iii) all catastrophic failures occur under mechanism-level faults, with foundation models achieving the highest clean-data accuracy yet exhibiting the greatest fragility. The code is publicly available at https://github.com/Ray-zyy/TS-Fault.