Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-12

Prism: Cost-Efficient Multi-LLM Serving via GPU Memory Ballooning

arXiv:2505.04021v3 Announce Type: replace-cross Abstract: Inference providers must maintain availability for many LLMs, including low-volume but essential models, making resource efficiency increasingly important as token prices fall. Analysis of production traces reveals a dynamic bursty-group pattern in which sets of models become active together and shift over time; existing space- and time-sharing approaches lack principled mechanisms to adapt to this variability, forcing trade-offs between SLO adherence and efficiency. We observe that elastic memory allocation can unify spatial and temporal sharing. Based on this insight, we have developed Prism, a memory-centric LLM co-serving framework that applies memory ballooning to reclaim memory across models and support both forms of sharing under a single scheme. Prism's balloon driver, referred to as kvcached, has been open-sourced at https://github.com/ovg-project/kvcached, and deployed in production environments across 10K+ GPUs.

02.
arXiv (CS.CL) 2026-06-19

Beyond Global Replanning: Hierarchical Recovery for Cross-Device Agent Systems

Real-world computer-use tasks often span multiple applications and devices, requiring agents to coordinate heterogeneous environments under dynamic runtime failures. Existing multi-device agent systems support task decomposition and cross-device assignment, but recovery remains largely coarse-grained: when execution fails, they typically retry the same strategy, reassign the subtask, or revise the global plan, without systematically modeling the device-local strategy space. This limits their ability to distinguish failures that can be repaired within the current device from those that require cross-device replanning. We propose H-RePlan, a hierarchical replanning framework for multi-device agents with unified API–CLI–GUI execution. H-RePlan equips each device with interchangeable execution strategies and separates device-local strategy recovery from orchestrator-level global replanning through a compact cross-layer failure abstraction. To evaluate this capability, we introduce HeraBench, a fault-injected benchmark that constructs cross-device workflows over Linux and Android devices and injects strategy- and device-level failures. Experiments show that H-RePlan substantially outperforms single-strategy and coarse-grained multi-device baselines, achieving higher completion, instruction adherence, and perfect-pass rates while reducing the token cost required for reliable end-to-end success. These results demonstrate that scope-aware hierarchical recovery is essential for robust multi-device agent execution.

03.
arXiv (CS.AI) 2026-06-19

Autonomous Event-Driven Multi-Agent Orchestration for Enterprise AI at Scale

arXiv:2606.20058v1 Announce Type: new Abstract: Enterprise AI aims to move toward continuous event monitoring, detection, and action across specialist agents, yet existing multi-agent systems largely assume discrete request-response workflows and remain underexplored at enterprise scale. We evaluate DAG Plan and Execute and ReAct across 208 production-derived enterprise scenarios spanning Persona (

04.
arXiv (CS.CV) 2026-06-12

Appearance-Invariant Detection of Suggestive Motion via Laban Movement Descriptors

Content moderation in online multiplayer 3D virtual environments is increasingly automated, yet detection has focused on images, video, and audio, leaving suggestive motion a blind spot. We present a motion-only classification pipeline that detects suggestive and explicit movement from SMPL skeleton trajectories using Laban Movement Analysis (LMA) descriptors. On a dataset spanning everyday, artistic, suggestive, and explicit movement (17+ hours of video), a logistic regression trained on 61-feature LMA descriptors reaches 68% binary SFW/NSFW accuracy (70% random forest) under a leak-free evaluation protocol. At this level, our descriptor performs comparably to a learned video model trained on the same motion re-rendered as appearance-free video, a gray figure with no clothing, skin, or scene. The indirectness (tortuosity) of each joint's trajectory, measured as the ratio of the joint's path length to its net displacement, peaks at the suggestive tier, showing that the Direct-to-Indirect polarity of Laban's Space factor provides an interpretable marker of the shift from functional to suggestive motion. Ultimately, Laban-based kinematic descriptors offer a lightweight, interpretable approach to suggestive-motion detection: every decision decomposes into named, theory-grounded features. Because the classifier operates on pose trajectories alone, moderation can run directly on avatar poses in virtual environments, with no appearance data.

05.
arXiv (CS.CV) 2026-06-12

Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models

Omni-modal large language models (OLLMs) aim to unify multimodal understanding and generation, yet extending them to jointly produce speech and 3D facial animation remains largely unexplored despite its importance for natural human-computer interaction. A key challenge is the mismatch between the discrete semantic reasoning of LLMs and the dense temporal dynamics required for 3D facial motion. We propose Expressive Omni (Ex-Omni), an open-source model that augments OLLMs with native speech-accompanied 3D facial animation. Ex-Omni decouples semantic reasoning from temporal generation through a blendshape-aware speech unit generator and a blendshape decoder, where speech units provide temporal scaffolding and hidden speech representations carry facially relevant cues. We further introduce a unified token-as-query gated fusion (TQGF) mechanism for controlled semantic injection, as well as InstructS2SF-1200K, a dataset consisting of 1200K samples for pre-training. Extensive experiments show that Ex-Omni maintains competitive speech understanding and generation ability while achieving better audio-visual synchronization and lower face-generation latency than cascaded pipelines.

06.
Nature (Science) 2026-06-08

Distributed control circuits across a brain-and-cord connectome

Just as genomes revolutionized molecular genetics, connectomes (maps of neurons and synapses) are transforming neuroscience. To date, the only organisms with complete connectomes are worms1–3, sea squirts4, and comb jellies5 (103–104 synapses). By contrast, the fruit fly is more complex (108 synaptic connections), with a brain that supports learning and spatial memory6,7 and an intricate ventral nerve cord analogous to the vertebrate spinal cord8–12. Here we report the first densely-reconstructed adult fly connectome that unites the brain and ventral nerve cord, and we leverage this resource to investigate principles of neural control. We show that effector neurons (motor neurons, endocrine cells, and efferent neurons targeting the viscera) are primarily influenced by sensory neurons in the same body part, forming local feedback loops. These local loops are linked by long-range circuits involving ascending and descending neurons organized into behavior-centric modules. Single ascending and descending neurons are often positioned to influence the voluntary movements of multiple body parts, together with the endocrine cells or visceral organs that support those movements. Brain regions involved in learning and navigation supervise these circuits. These results reveal an architecture that is distributed, parallelized, and embodied, reminiscent of distributed control architectures in engineered systems13,14.

07.
arXiv (math.PR) 2026-06-17

The Loss of Tension in an Infinite Membrane with Holes of Decaying Spatial Density

arXiv:2606.17792v1 Announce Type: new Abstract: What is the effect of randomly removing material from an infinite stretched membrane? Under what conditions can the membrane still sustain tension? This problem was introduced by Robert Connelly in connection with applications of rigidity theory in the natural sciences, and was later studied in M. V. Menshikov, K. A. Rybnikov, and S. E. Volkov, "The loss of tension in an infinite membrane with holes distributed according to a Poisson law" (2002); a discrete version was also considered in Robert Connelly, Konstantin Rybnikov, and Stanislav Volkov, "Percolation and the Loss of Tension in an Infinite Triangular Lattice" (2001). We study a mathematical framework based on a non-homogeneous Poisson point process whose intensity $\lambda$ tends to zero at infinity. The hole shapes are i.i.d.\ and independent of their locations. We show that if the intensity does not decay too quickly, then tension is still lost throughout the whole plane, as in the homogeneous model studied in 2002. Conversely, we give sufficient conditions under which complete loss of tension does not occur. Thus, both destruction and non-destruction regimes are possible even when the intensity tends to zero, indicating a phase transition in the model. The processes studied here are closely related to bootstrap percolation.

08.
arXiv (CS.CL) 2026-06-12

SICI: A Semantic-Pragmatic Complexity Index Reveals Regime Shifts in LLM Stance Detection

Prompt-based LLMs are increasingly used for stance detection, but harder examples are not always repaired by clearer instructions, reasoning prompts, retrieval, or debate. We introduce SICI (Stance Inference Complexity Index), a seven-dimensional diagnostic measure of the semantic-pragmatic burden imposed by a target–text pair. Across SemEval-2016 and VAST, SICI predicts LLM accuracy better than surface proxies and shows substantial cross-scorer reliability ($\alpha=0.771$). More importantly, LLM errors change regime as SICI increases: low-complexity examples invite over-attribution, especially Against predictions; intermediate examples form an unstable boundary; and high-complexity examples rapidly concentrate on None. This phase-transition-like structure persists across GPT-3.5, GPT-4o-mini, DeepSeek-V3, and GPT-4o, although stronger models move the boundaries. A 15-method intervention study further shows that prompting, retrieval, and debate often shift models along the attribution–abstention axis rather than removing the high-complexity bottleneck.

09.
arXiv (CS.AI) 2026-06-18

Domain-Shift Aware Neural Networks for Unbalance Characterization in Rotating Systems

arXiv:2606.18882v1 Announce Type: cross Abstract: This work investigates the application of a domain-shift aware neural network for regression tasks aimed at estimating unbalance masses in rotating shafts under varying operating conditions. Experimental data were collected from a test rig in which a primary shaft, equipped with a flange carrying unbalanced masses, was driven at different rotational speeds, while a secondary shaft could be optionally activated to introduce domain discrepancy. The unbalance masses were positioned at a fixed radial distance, and the dynamic response of the system was recorded using triaxial accelerometers. The inverse problem of mass estimation is formulated within a domain adaptation framework, where the network is trained with a maximum mean discrepancy strategy to align feature representations across source and target distributions. The results demonstrate the effectiveness of explicitly addressing domain shift in improving prediction accuracy, especially when the system's physical behavior and sources of domain discrepancy are not fully known and fall outside the training conditions. These findings highlight the potential of domain-shift aware models for regression tasks in Structural Health Monitoring.

10.
arXiv (CS.AI) 2026-06-18

Examining Human-Like Behaviors in LLMs: A Multi-Dimensional Analysis of Model Behaviors, User Factors, and System Prompts

arXiv:2606.18258v1 Announce Type: cross Abstract: Large language models (LLMs) exhibit a wide range of human-like behaviors, from expressing thoughts and emotions, to engaging in relationship-building with users, to refusing requests and maintaining boundaries. Despite their prevalence, researchers and practitioners lack methods and empirical insights to make informed decisions about when and what types of human-like behaviors LLMs should exhibit. To fill this gap, we present a multi-dimensional analysis of the prevalence, potential effects, and controllability of these behaviors using LLM-as-a-judge and human evaluation. Across 21,000 multi-turn conversations from four widely used models (gpt-4o, gpt-4.1-mini, claude-sonnet-4.6, gemini-2.5-flash), we find that human-like behaviors are pervasive but vary across models and user factors (conversation goals and user profiles). In terms of perceived appropriateness, human evaluators judged self-referential and relationship-building behaviors as less appropriate from LLMs than from humans, but boundary-maintaining behaviors more appropriate from LLMs than from humans. Finally, we show that system prompting can control these behaviors, though it requires careful evaluation to avoid unintended effects. We discuss the implications of our findings and provide recommendations for responsible LLM design and evaluation.

11.
arXiv (CS.AI) 2026-06-16

AP-GRPO: Anchor-Gated Phonetic Alignment with Policy Optimization for Pathological Speech Reconstruction

arXiv:2606.15540v1 Announce Type: cross Abstract: Pathological speech from patients with neurodegenerative and neuromotor disorders is often acoustically distorted and linguistically fragmented, making pathological speech reconstruction necessary to recover intended textual content from distorted and incomplete speech recordings. Crucially, such recordings are rarely uniformly degraded: some words or short phrases remain reliable and can serve as audible anchors for reconstructing the corrupted surrounding content. We introduce Anchor-gated Phonetic Group Relative Policy Optimization (AP-GRPO), a GRPO framework with phonetic reward that aligns speech language models (SLMs) through audible-anchor preservation and inter-anchor phonetic compatibility to the original speech signal. AP-GRPO consists of: (i) an anchor-gated reward that matches reliable audible anchors in clear regions; and (ii) an inter-anchor phonetic alignment reward that evaluates whether recovered contents are phonetically supported by the corresponding corrupted inter-anchor speech span. Across four disease conditions, AP-GRPO improves faithful speech reconstruction, and the learned anchor constraint automatically adapts to each condition and thus reveals interpretable disease-specific profiles: conditions with severe articulatory degradation require stronger anchor enforcement, whereas milder impairment or linguistically impaired conditions rely more on phonetic alignment for inter-anchor recovery.

12.
arXiv (CS.CV) 2026-06-17

LiveStarPro: Proactive Streaming Video Understanding with Hierarchical Memory for Long-Horizon Streams

Despite the remarkable progress of Video Large Language Models (Video-LLMs), current online architectures still struggle to simultaneously process continuous video streams, decide autonomously when to respond, and preserve long-horizon contextual memory. These obstacles undermine real-time responsiveness and cause severe forgetting throughout prolonged interactions. In this work, we introduce LiveStarPro, a live streaming assistant that is designed for proactive video understanding over long-horizon streams. The design of LiveStarPro rests on three complementary components. The first component is Streaming Verification Decoding (SVeD), an inference framework that identifies the appropriate response timing through single-pass perplexity verification, thereby eliminating the dependency on explicit silence tokens. The second component is Streaming Causal Attention Masks (SCAM), a training strategy that enforces incremental video-language alignment over variable-length streams. The third component is Tree-Structured Hierarchical Memory (TSHM), a recursive memory architecture that organizes evicted historical information into event chains and consequently enables efficient retrieval from effectively unbounded video streams. To facilitate a comprehensive evaluation under realistic online conditions, we further present OmniStarPro, a large-scale benchmark that spans 15 diverse real-world scenarios and that extends to hour-scale streams for the assessment of long-term recall. Extensive experiments demonstrate that LiveStarPro consistently surpasses existing methods, attaining a 28.9% improvement in semantic correctness and an 18.2% reduction in timing error, while its streaming key-value cache further yields a 1.58x inference speedup over the same model without caching. The model and the code are publicly available at https://github.com/sotayang/LiveStarPro.

13.
arXiv (CS.LG) 2026-06-19

Low-Energy Reduced RISC-V Instruction Subset Processor for Tsetlin Machine Inference at the Edge

arXiv:2606.19964v1 Announce Type: new Abstract: Tsetlin Machine (TM) is a logic-based machine learning approach that relies on simple bitwise operations and finite-state automata, which makes it attractive for edge AI deployments. Recent work has focused on co-processor and accelerator designs based on Tsetlin Machines (TMs). Although these designs achieve high performance, they typically depend on tightly coupled interfaces, microcode-style programming, and external host processors, limiting flexibility and ease of programming. In this work, we present a domain-specific RISC-V microprocessor architecture and design flow tailored for TM inference. Leveraging the modular structure of RISC-V, we design a reduced instruction subset processor that retains programmability while targeting improved performance and lower energy consumption for TM workloads. Instruction profiling is employed to guide instruction reduction, followed by datapath and control path simplifications tailored to TM inference. Both the baseline RV32IM core and the proposed reduced core are evaluated across multiple datasets and compared with Binarized Neural Networks (BNNs), which serve as a hardware-efficient baseline due to their reliance on bitwise operations during inference. Results show that TM achieves comparable or higher accuracy (e.g., up to 88.18% on CIFAR-2 compared to 60.0% for BNN) while reducing execution time by up to 98% across multiple datasets. Furthermore, the proposed design achieves an average $29.7\times$ reduction in energy consumption, demonstrating its effectiveness for programmable and efficient edge AI systems.

15.
arXiv (CS.CV) 2026-06-12

Edit the Bits, Diff the Codes: Bitwise Residual Editing for Visual Autoregressive Models

Text-guided image editing with visual autoregressive (VAR) generators requires controlling both what the model samples and where the sampled change is written back into the image code. Existing VAR editors mainly operate on token streams, features, or flat next-token logits, leaving two native structures of bitwise-residual VAR models underused: the per-bit Bernoulli prediction head and the additive multi-scale residual code field from which the image is assembled. We propose BitResEdit, a training-free editor for bitwise-residual VAR generators such as Infinity. BitEdit performs source-negative guidance by tilting the post-CFG per-bit log-odds along a source–target contrast computed on a shared edited prefix, then projects each update into a closed-form Bernoulli-KL trust region around the clean CFG sampler. ResEdit converts the sampled bits into per-scale continuous-code residuals, gates them with a localization mask, and re-injects them through the generator's native sum-of-scales. Together they couple decision-time bit guidance with combination-time code composition, so masked-out latent features are preserved exactly by code arithmetic while localized, scale-aware edits are applied inside the target region. On PIE-Bench with Infinity-2B, BitResEdit attains the strongest text alignment among same-backbone VAR editors, improving CLIP on the edited region by +1.07 over the strongest prior editor while keeping background preservation competitive with it. Ablations show BitEdit and ResEdit play complementary roles in target alignment and background preservation.

16.
arXiv (CS.AI) 2026-06-19

Learner-based Concept Drift Detection: Analysis and Evaluation

arXiv:2606.20216v1 Announce Type: cross Abstract: Machine learning algorithms deployed for evolving streaming environments must handle the non-stationary data distributions, commonly referred to as concept drift. The presence of concept drift poses a major challenge for many real-world applications because it can severely degrade their predictive performance, hindering their ability to support robust decision-making. Consequently, the timely and efficient detection of drift events is critical for sustaining high accuracy over time. This study examines theoretically the concept drift characteristics and numerous drift detection algorithms across several categories. Furthermore, we evaluate their performance on both synthetic and real-world datasets exhibiting diverse streaming scenarios and drift characteristics, such as abrupt and gradual changes. This study aims to enhance understanding of the complex notion of concept drift characteristics and behavior of drift detectors, along with their applicability to diverse contexts.

17.
arXiv (CS.AI) 2026-06-16

Phantoms and Disclosures: a Causal Framework for Auditing Synthetic Data

arXiv:2606.16952v1 Announce Type: cross Abstract: The rapid adoption of generative AI and Large Language Models (LLMs) has spurred interest in synthetic data as a privacy-preserving alternative to sensitive real-world datasets. However, generating high-utility synthetic data often carries the risk of memorizing and regurgitating private information from the training corpus. In this work, we present a customizable empirical auditing framework designed to detect and explain such data disclosures. Our framework introduces a mechanism to distinguish between "true disclosures"-where the system directly reproduces a user's information-and "phantom disclosures''-where the system incidentally generates a user's data. By partitioning input data into training and holdout sets and applying rigorous statistical hypothesis testing, we determine if observed disclosures are consistent with strict privacy baselines, such as zero-learning or specific Differential Privacy (DP) bounds. Crucially, this approach requires no model access, no canary insertion, and no reference model training -only the synthetic output and a held-out control set. We demonstrate that this framework effectively functions as a membership inference attack, providing empirical lower bounds on privacy leakage that are tighter than prior data-based auditing methods. Our approach is model-agnostic, applies to any synthetic data generation mechanism, and requires orders of magnitude fewer computational resources than shadow-model or canary-based alternatives.

18.
arXiv (quant-ph) 2026-06-16

Quantum Field-Theoretic Predictions of {\Psi}-Epistemic Models of Quantum Mechanics

arXiv:2605.12546v2 Announce Type: replace Abstract: {\Psi}-epistemic models of quantum mechanics imply that the quantum state does not correspond to physical reality, but instead reflects the observer's knowledge of the underlying quantum system. The epistemic view of the quantum state has the potential to shed light on several foundational problems of quantum theory and has attracted considerable attention in the literature. On the other hand, the Pusey-Barrett-Rudolph theorem demonstrated that broad classes of {\psi}-epistemic models must lead to predictions that deviate from those of quantum mechanics. Although the original theorem involved entangled joint measurements on composite systems, alternative no-go theorems involving measurements on single quantum systems were developed shortly thereafter. Experimental investigations of the deviations predicted by {\psi}-epistemic models from quantum mechanics are still ongoing. So far, such tests have been performed within the framework of non-relativistic quantum mechanics and predominantly rely on quantum information based measurement procedures. In this work, we show that {\psi}-epistemic models can give rise to deviations from standard quantum field-theoretic predictions through modifications of polarized scattering cross sections and decay widths. Our results do not require a relativistic formulation of ontological models or of the Harrigan-Spekkens criterion; the essential assumption is merely that measurements implemented through relativistic processes can still be represented within the ontological framework by well-defined response functions and probabilities. The present work constitutes a proof-of-principle study demonstrating that particle physics tests of the ontological status of the quantum state are possible and that {\psi}-epistemic models may exhibit experimentally distinguishable signatures in particle phenomenology.

19.
bioRxiv (Bioinfo) 2026-06-15

VrySure: A Multi-Task AI Scientific Fraud Detection Platform for Identifying Manipulated and AI-Generated Biomedical Research Images

Integrity of scientific data is critical in biomedical research, where images often serve as primary evidence for experimental observations and conclusions. Advances in image-editing technologies and generative artificial intelligence (AI) have increased the accessibility and realism of visual manipulation, making detection through manual review increasingly challenging. To empower our laboratory researchers to continuously monitor and uphold scientific rigor and data integrity, and serve the global scientific community, we developed VrySure, an easy-to-deploy, AI-driven multi-task platform for automated image-integrity screening in biomedical research. VrySure integrates four detection modules: cross-image transformation detection, within-image copy-move detection, splicing detection in blot and gel images, and AI-generated image detection. The system identifies potentially manipulated images and, when possible, localizes suspicious regions using bounding-box outputs to support downstream verification. To support development and evaluation, we constructed task-specific datasets by combining public biomedical image resources, curated manipulated examples, and synthetic images generated by multiple generative AI systems. We evaluated VrySure using region-level F1 score, recall, precision, false negative rate (FNR), and false discovery rate (FDR) across multiple manipulation categories and compared its performance with two commonly used commercial image-integrity screening platforms under a predefined benchmark protocol. Under the tested conditions, VrySure achieved a higher F1 score and recall, lower FNR, and maintained a low FDR for within-image copy-move detection, splicing detection, and AI-generated image detection, while showing comparable performance in transformation detection. Beyond automated screening, VrySure is designed to support source-data comparison and evidence-based assessment in scientific integrity investigations. By integrating multiple detection capabilities into a unified and scalable workflow, VrySure provides a practical framework to improve the efficiency and consistency of image-integrity screening in biomedical research.

20.
arXiv (CS.CV) 2026-06-11

ViT-FREE: Efficient Face Recognition via Early Exiting and Synthetic Adaptation

Vision Transformers (ViTs) have gained significant attention in computer vision and shown strong potential for face recognition (FR). However, their high computational cost makes deployment on resource-constrained devices challenging, motivating the need for methods that balance efficiency and accuracy. In this work, we investigate early exiting in pretrained ViTs as a simple yet effective training-free strategy for efficient FR inference. Leveraging the uniform feature dimensionality across transformer encoder blocks, we introduce ViT-FREE, a multi-exit framework that enables face verification directly from intermediate representations without modifying or retraining the backbone model, and thus, reducing inference cost. Empirically, we show that patch embeddings and attention maps evolve progressively across depth, exhibiting high similarity between consecutive ViT blocks and increasing alignment with the final representation. This indicates gradual feature refinement and attention convergence, suggesting that intermediate layers already provide stable and discriminative representations suitable for early exiting. Through extensive experiments on multiple FR benchmarks, we systematically analyze the accuracy-efficiency trade-off across exit depths. Our results demonstrate that later exits achieve a highly favorable balance, with exiting at layer 10 yielding up to a 20% speedup while incurring only a 1.5 drop in verification performance on benchmarks such as IJB-C. Also, we propose ViT-FREE_FT, a lightweight exit-specific fine-tuning strategy that adapts only the projection layers using a small synthetic dataset while keeping the transformer backbone frozen. This approach improves the performance of shallow exits while preserving the efficiency benefits and leaving deeper exits largely unaffected.

21.
arXiv (CS.CV) 2026-06-18

Neural Phase Correlation

Correspondence is fundamentally relational: it seeks the unknown transformation between two observations of a common scene, not the content of either. Yet the dominant learning-based methods do not represent the transformation as a first-class object in the architecture. They encode each image independently and let a learned similarity function or a deep decoder discover the mapping implicitly. Phase correlation is the canonical exception, measuring the inter-image relationship directly in the Fourier domain, but the rigidity of its fixed basis confines it to global translation. We introduce a learned generalization of phase correlation that lifts this restriction by learning the basis on which the transformation decomposes. The same algebraic primitive extends to dense non-rigid deformations and to unitary dynamics. On the ACDC cardiac-MRI benchmark the framework matches or exceeds prior published baselines on both registration directions. On CAMUS echocardiography it matches state-of-the-art without auxiliary scoring or adaptive-smoothness mechanisms. Applied to time-evolved wavefunction pairs of the 1-D quantum harmonic oscillator, the same framework recovers the Hermite-function eigenstates and the quantized energy levels of the unknown Hamiltonian from observation pairs alone.

22.
arXiv (quant-ph) 2026-06-16

Retrocausal capacity of a quantum channel: Communicating through noisy closed timelike curves

arXiv:2509.08965v3 Announce Type: replace Abstract: We study the capacity of a quantum channel for retrocausal communication, where messages are transmitted backward in time, from a sender in the future to a receiver in the past, through a noisy postselected closed timelike curve mathematically represented by the channel. We completely characterize the one-shot retrocausal quantum and classical capacities, and we show that the corresponding asymptotic capacities are equal to the average and sum, respectively, of the channel's max-information and its regularized Doeblin information. This endows these information measures with a novel operational interpretation. Furthermore, our characterization can be generalized beyond quantum channels to all completely positive maps. This imposes information-theoretic limits on transmitting messages via postselected-teleportation-like mechanisms with arbitrary initial- and final-state boundary conditions, including those considered in various black-hole final-state models.

23.
arXiv (CS.CV) 2026-06-19

FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows

Conditional diffusion and flow models routinely fail to satisfy the very constraints that define their task. For instance, a depth-conditioned model often produces images whose re-extracted depth disagrees with the input, even though the forward operator–the depth predictor defining the constraint–is available during both training and inference. Existing approaches generally fall into two categories: supervised models that treat the conditioning signal as a static cue and ignore alignment information at inference, and guidance-based methods that consult it through hand-tuned linear updates, typically trading fidelity to the condition against the plausibility of the generated sample. We argue that the fundamental gap in both paradigms is that the model is never trained to utilize its own alignment error. We introduce FlowBender, a closed-loop framework that treats this error as a first-class input, training the network to learn a correction policy conditioned on inference-time feedback. At each step, an unguided look-ahead pass estimates the clean signal, a task-specific deviation is computed via the forward operator, and a refinement pass consumes this signal to produce a corrected velocity. We propose several variants of FlowBender, including a gradient-based formulation for differentiable operators and a zero-order variant for non-differentiable settings such as JPEG compression. For efficient sampling, we introduce a prior-step shortcut that enables closed-loop correction at a minimal additional computational cost. Across image-to-image translation, restoration, and 3D mesh texturing, FlowBender consistently outperforms standard supervised baselines, alignment-loss-augmented training, and state-of-the-art inference-time guidance, improving fidelity and plausibility simultaneously rather than trading them against each other. Project page: https://flow-bender.github.io/

24.
medRxiv (Medicine) 2026-06-18

Evaluating Deep-Learning Based Quantification of Breast Arterial Calcification on Mammography for Cardiovascular Risk Assessment

Purpose: To develop and evaluate a deep learning model for automated quantification of breast arterial calcification (BAC) on screening mammography and to assess whether AI-derived BAC burden predicts major adverse cardiovascular events (MACE) in women. Methods: In this retrospective study, 202,006 women who underwent screening mammography without history of MACE were included. A BAC segmentation model was trained on an expert-annotated dataset using a multi-task U-Net with a ResNet-18 encoder to detect and segment BAC. BAC burden was quantified as area (mm{superscript 2}) from model-generated masks using DICOM pixel spacing and categorized by tertiles into low, intermediate, and high. The PREVENT score and incident MACE were identified from electronic health records. Cox proportional hazards models were developed to evaluate AI-derived BAC burden and PREVENT score alone, and combined models for 5 - and 10-year cardiovascular risk prediction. Results: Among 202,006 women (mean age 54.8{+/-}11.7 years), 23.1% had AI-detected BAC, and 7,701 (3.8%) developed incident MACE during a median follow - up of 7.5 years. On the geographically held-out test set, the BAC model achieved an AUROC of 0.97, Dice score of 0.6678, and Pearson correlation of 0.961 between AI-derived and manually annotated BAC burden. BAC burden increased with age and was higher among women who developed MACE. Five - year MACE incidence increased across BAC categories from 1.5% in women without BAC to 6.9% in those with high BAC burden. BAC burden alone showed modest prediction of MACE, with 5-year and 10-year AUROCs of 0.661 and 0.650, respectively, while PREVENT achieved AUROCs of 0.781 and 0.771. Adding BAC to PREVENT produced minimal improvement in discrimination. Conclusion: Deep learning-based BAC quantification from routine mammography is feasible, accurate, and associated with future cardiovascular risk. Although BAC added little to PREVENT for overall discrimination, it may serve as a scalable opportunistic imaging biomarker to identify women at elevated cardiovascular risk and support preventive care.

25.
arXiv (CS.AI) 2026-06-11

Intelligent Automation for Embodied Benchmark Construction: Pipelines, Embodiments, Simulators, and Trends

arXiv:2606.12207v1 Announce Type: cross Abstract: Embodied intelligence now spans navigation, household assistance, manipulation, autonomous driving, aerial agents, and multimodal large-model control. This expansion has made benchmark construction a central bottleneck for reliable evaluation. Unlike static datasets, embodied benchmarks combine task specifications, environments, robot data, demonstrations, annotations, metrics, evaluation scripts, and release policies into a single evaluation system. This survey reviews the literature through a five-stage construction pipeline: requirement and task construction, data acquisition, data cleaning and annotation, benchmark suite generation and metric definition, and evaluation execution with diagnostic feedback. For each stage, the survey analyzes the transition from manual curation to traditional automation, foundation-model assistance, and agentic closed-loop workflows. It also compares qualitative construction costs across human labor, data and asset acquisition, compute and simulation, validation and debugging, governance and maintenance, and rework risk. The main conclusion is that automation does not simply reduce benchmark cost. Instead, it often shifts cost toward validation, auditability, version control, and long-term governance. Progress in embodied evaluation will therefore depend not only on larger benchmark suites, but also on construction pipelines that are diagnosable, auditable, and responsibly refreshable.