Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-16

MimicIK: Real-Time Generative Inverse Kinematics from Teleoperation with FK Consistency

arXiv:2606.15148v1 Announce Type: cross Abstract: Inverse kinematics (IK) remains a critical bottleneck for real-time robot manipulation. Classical numerical solvers achieve high geometric precision but often suffer from discontinuous branch switching and unstable behavior near kinematic singularities during closed-loop deployment. Meanwhile, learned IK approaches frequently struggle to balance spatial accuracy, motion smoothness, and real-time efficiency, particularly when trained on noisy human teleoperation data. We present MimicIK, a real-time generative inverse kinematics framework that learns smooth and robust joint-space motion priors from teleoperation demonstrations through conditional flow matching. Given the current joint configuration and a target end-effector pose, MimicIK predicts continuous delta-joint commands using an efficient two-step iterative refinement process based on a Minimal Iterative Policy (MIP) backbone. To enforce physical consistency, we further introduce an FK consistency loss, a differentiable forward-kinematics regularization that penalizes task-space deviations from the target pose during training. We evaluate MimicIK on a real-world 6-DOF robot dataset containing 8,848 teleoperation demonstrations. MimicIK achieves a mean position error of 4.65 mm, a 10 mm success rate of 92.01\%, and a trajectory spike rate of only 7.99\%. Compared with a UNet diffusion baseline, our method improves both spatial accuracy and motion smoothness while reducing inference latency from 21.66 ms to 6.74 ms. Furthermore, unlike deterministic MLP baselines that catastrophically diverge under out-of-distribution deployment, MimicIK remains stable near singular configurations and enables robust 20 Hz real-time control on deployment hardware.

02.
arXiv (quant-ph) 2026-06-11

Super-Link Fragility in Asymmetric W-Class States under Quantum Noise

arXiv:2606.12307v1 Announce Type: new Abstract: The asymmetric three-qubit W-class state $|\overline{W_3^L}\rangle$ defines an isosceles entanglement-network geometry, (a) two vertex-base (VB) links form stronger bipartite connections, (b) while the base-base (BB) link is weaker. This suggests that concentrating entanglement into a super-link may be advantageous for quantum-network tasks. Here, we show that this intuition is incomplete. We analytically compare the bipartite concurrence dynamics of the symmetric |W> state and the asymmetric $|\overline{W_3^L}\rangle$ state, which differ both in entanglement-network geometry and excitation sector under standard noise models. In the absence of noise, the concurrence hierarchy is C_{VB} > C_W > C_{BB}$. Under phase damping, this hierarchy is preserved for all noise strengths and no entanglement sudden death occurs. Under amplitude damping, however, the hierarchy is reordered. The symmetric |W> state becomes the most robust, while the base-base concurrence of $|\overline{W_3^L}\rangle$ vanishes at the finite threshold of parameter $\gamma$. We term this reordering as the Super-Link Fragility Effect. The same structural asymmetry that produces a stronger vertex-base link also makes it more vulnerable to energy dissipation when coupled with multi-excitation amplitudes. Under depolarization, the asymmetry advantage is erased, with $C_W$ and $C_{VB}$ sharing the same sudden-death threshold for some value of the parameter p, while $C_{BB}$ disappears earlier at some other value of the parameter p. The generalized amplitude damping channel continuously connects the damping-dominated regime to the pure-excitation limit, where the initial hierarchy is restored. These results show that entanglement robustness in $W$-class resources is controlled not by initial concurrence alone, but by the joint structure of entanglement-network geometry, excitation sector, and noise symmetry.

03.
Nature (Science) 2026-06-17

Towards Conversational AI for Disease Management

While large language models (LLMs) have shown promise in diagnostic dialogue1, their capabilities for effective management reasoning—including disease progression, therapeutic response, and safe medication prescription—remain under-explored. We advance the previously demonstrated diagnostic capabilities of the Articulate Medical Intelligence Explorer (AMIE)1−3 through a new LLM-based agentic system optimized for multi-visit clinical management and dialogue. To ground its reasoning in authoritative clinical knowledge, AMIE leverages Gemini’s long-context capabilities4, combining in-context retrieval with structured reasoning to align its output with up-to-date clinical practice guidelines and drug formularies. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) study, AMIE was compared to 21 primary care physicians (PCPs) across 100 multi-visit case scenarios designed to reflect UK NICE Guidance and BMJ Best Practice guidelines. AMIE was non-inferior to PCPs in management reasoning as assessed by specialists and scored better in both preciseness of treatments and investigations, and in its alignment with and grounding in clinical guidelines. To benchmark medication reasoning, we developed RxQA, a multiple-choice question benchmark derived from two national drug formularies (US, UK) and validated by board-certified pharmacists. Though AMIE and PCPs both benefited from the ability to access external drug information, AMIE outperformed PCPs on higher difficulty questions. While further research would be needed before real-world translation, AMIE’s strong performance across evaluations marks a significant step towards conversational AI as a tool in disease management.

04.
arXiv (CS.LG) 2026-06-15

BigPower: Hierarchical Source-Level Module Power Estimation for CPUs with Large Language Models

arXiv:2606.13747v1 Announce Type: cross Abstract: Accurate power estimation is important for understanding and optimizing CPU power behavior, yet practical workflows often rely on simulation-derived information or post-silicon analysis. In this work, we present BigPower, a hierarchical source-level surrogate model for fine-grained module-level power estimation during CPU design. BigPower leverages large language model-based representations together with architectural hierarchy, module connectivity, configuration parameters, and workload context to estimate module-level power consumption directly from source-level design information, without requiring additional simulation during inference. Experimental results in the open-source XiangShan processor family demonstrate practical fine-grained power estimation across diverse configurations and workloads, offering an efficient alternative to conventional simulation-based workflows.

05.
arXiv (CS.CV) 2026-06-16

VigilFormer: Deformable Attention for Video Anomaly Detection with Causal Risk Inference

Authors:

Video anomaly detection in surveillance settings must balance detection accuracy against real-time throughput, a tension that existing methods address either through stronger feature extractors or more efficient architectures, but rarely both. We present VigilFormer, a unified framework that combines deformable spatio-temporal attention with causal temporal modeling to detect anomalies in untrimmed surveillance video. The proposed Deformable Spatio-Temporal Encoder (DSTE) attends to a sparse set of informative locations across frames, avoiding the quadratic cost of dense attention while retaining the ability to capture irregular motion patterns. A Causal Anomaly Classifier (CAC) applies dilated causal convolutions over snippet-level features and optimizes a contrastive multiple-instance learning objective that separates anomalous and normal representations without frame-level labels. To meet deployment constraints, an Adaptive Confidence Scheduler (ACS) dynamically skips low-information frames at inference time, reducing redundant computation in static scenes. Evaluated on UCF-Crime, ShanghaiTech, and CUHK Avenue, VigilFormer achieves AUC scores of 87.83%, 97.21%, and 89.74% respectively, at 41.5 FPS on a single GPU, outperforming recent weakly-supervised methods in both accuracy and speed.

06.
arXiv (CS.CL) 2026-06-16

Your "Pro" LLM Subscription May Actually Be "Free": Exposing Fingerprint Spoofing Risks in LLM Inference Services

As Large Language Model (LLM) APIs become ubiquitous, users increasingly rely on black-box fingerprinting to verify that providers are serving the advertised premium models. However, these methods may overlook adversarial providers who manipulate model weights to cheat the fingerprint process. We introduce a novel threat termed fingerprint spoofing, where a malicious provider stealthily serves a weaker model that has been parameter-efficiently fine-tuned to mimic a stronger model, thereby evading user-side fingerprinting. We first formally prove that user-side resource constraints (i.e., finite query budgets and weak fingerprinting classifiers) make current fingerprinting vulnerable to fingerprint spoofing. Guided by this theoretical analysis, we propose GhostPrint, a cost-effective attack framework leveraging surrogate modeling, reward-ranked fine-tuning, and knowledge distillation. Extensive evaluations in both static and continual fingerprinting settings demonstrate that GhostPrint allows weak models to consistently bypass representative fingerprint methods while maintaining utility at a low fine-tuning cost, exposing a critical vulnerability in current LLM fingerprinting pipelines.

07.
arXiv (quant-ph) 2026-06-15

Dealing with locality in QAOA

arXiv:2606.14447v1 Announce Type: new Abstract: Shallow-depth QAOA on sparse, high-diameter MaxCut instances faces a locality bottleneck: at depth \(p\), local observables can depend only on a bounded neighborhood of the circuit interaction graph. We propose a transport-augmented QAOA that keeps the MaxCut cost Hamiltonian unchanged but enriches the mixer with optimized, unweighted shortcut couplings (scheduled \(XX+YY\)) to collapse the effective interaction-graph diameter. Using exact finite-depth support recursions, we relate optimal shortcut placement to bounded-diameter graph augmentation, and show in benchmarks that (unlike ma-QAOA) performance becomes effectively size-invariant once the diameter is reduced. For bipartite families (base diameter 4), reducing the interaction path to \(d=1\) raises the ensemble-averaged approximation ratio from 0.7378 (ma-QAOA) to 0.9767 at \(p=1\) (\(\sigma=0.0251\), nine system sizes); on random trees (base diameter 10), at \(p=2\) it improves from 0.9226 to 0.9997 (\(\sigma=0.0001\)).

08.
arXiv (CS.LG) 2026-06-16

Temporal Validation Changes the Apparent Public-Health Utility of Under-Five Mortality Prediction in Bangladesh: A Four-Round DHS Machine-Learning Study

arXiv:2602.03957v2 Announce Type: replace Abstract: Background: Under-five mortality in Bangladesh remains uneven despite national progress. DHS-based prediction models may guide targeted follow-up, but only if validation reflects future use. We examined how validation design changes apparent prediction performance. Methods: Four BDHS rounds (2011-2022; 33,962 children; 1,290 deaths) were analysed with a 26-feature pipeline and three model classes under four validation regimes, including cross-survey temporal validation (train 2011+2014, calibrate 2017, test 2022). A 32-unit ELU multilayer perceptron was selected via genetic-algorithm neural architecture search. AUROC used 2,000 bootstrap resamples; screening utility used sensitivity, PPV, and number needed to screen (NNS) at fixed capacity. Results: Validation regime altered public-health interpretation more than model class. NAS MLP AUROC ranged from 0.669 (2022-only random) to 0.775 (pooled random), with temporal AUROC 0.730. At the top-10% temporal threshold, NAS identified 152/355 deaths in 2022 (sensitivity 42.8%, PPV 13.2%, NNS 7.6). NNS across designs ranged from 5.6 to 11.0. Conclusions: Validation-regime choice changed screening workload and apparent policy value more than architecture. Temporal validation supports defensible estimates of follow-up and referral demand; DHS child-mortality studies should report sensitivity, PPV, and NNS before programmatic use.

09.
medRxiv (Medicine) 2026-06-16

Preventing postpartum depression through mitigating breastfeeding grief: A convergent parallel mixed methods study

Background: Women who did not meet their breastfeeding goals often experience breastfeeding grief (BG) and may be likely to have postpartum depression (PD). Furthermore, PD is nearly twice as common in African American (AA) women as in Non-Hispanic White women. No research exists on BG and its role in PD. This study examined the BG experiences of AA women and its possible contributions to PD symptoms. Methods: A convergent parallel mixed methods design was used. A purposive sample of 16 AA women with children aged 6 months to 2 years with BG participated in individual semi-structured interviews about their experiences of BG and completed an online survey including the Edinburgh Postnatal Depression Scale (EPDS). Qualitative and quantitative data were analyzed using reflexive thematic analysis and descriptive statistics, respectively. Both data were integrated using joint display of data and side-by-side comparison. Results: The mean age of participants was 29.5 years. Four meaning-based themes about BG were generated including: We looked forward to breastfeeding, But it did not go as expected, So we grieve, and These would have helped. From quantitative results, 87.5% of participants reported a history of PD symptoms and almost 44% had EPDS scores >11. All participants reported that experiencing BG contributed to their PD symptoms. Findings suggest that BG influenced PD symptoms in AA women without prior diagnosis of depression. Conclusions: Qualitative and quantitative findings from this novel exploratory study revealed an overlap that AA women with BG report PD symptoms. Clinicians should support women to achieve their breastfeeding goals to prevent BG and PD. Keywords: African American; Breastfeeding grief; Mental health; Mixed methods; Postpartum depression

10.
medRxiv (Medicine) 2026-06-16

Reporting patterns of adverse drug withdrawal events using individual case safety reports in United States and European databases

Introduction: Adverse drug withdrawal events (ADWEs) are a key safety concern with deprescribing but are infrequently reported in trials. Although pharmacovigilance systems have advanced our understanding of medication-related harms, it is unclear how extensively these systems have been used for ADWEs. Objectives: To examine the reporting patterns of ADWEs for all drugs recorded in United States and European pharmacovigilance databases between 2004 and 2023. Methods: A retrospective study was conducted using two pharmacovigilance databases, the publicly available FDA-FAERS dataset and EMA-EV Level 2A (individual-level) dataset. ADWE cases were identified using relevant MedDRA preferred terms. Data on patient characteristics, reporter type, drugs, indication, ADWE outcomes, dechallenge/rechallenge, seriousness criteria, time to onset, duration, and causality were summarised. Results: A total of 158,505 ADWE reports were analysed (FDA-FAERS: 145,514; EMA-EV: 12,987), with mean ages of 46.1 (FDA; 55.3% female) and 45.5 years (EMA; 57.1% female). The frequently reported drug classes were opioids (FDA: oxycodone, 29.8%; EMA: buprenorphine, 19%), antidepressants (FDA: duloxetine, 32%; EMA: venlafaxine, 25.9%) and gabapentinoids (FDA: pregabalin, 6.7%; EMA: pregabalin, 6.0%). The most common adverse outcomes were other serious medical conditions (FDA=63.9%; EMA=46.0%), hospitalisation (FDA=15.9%; EMA=28.3%), and disability (FDA=13.3%; EMA=6.2%) and these outcomes varied significantly based on sex and age group (p

11.
arXiv (CS.CL) 2026-06-11

FlowBank: Query-Adaptive Agentic Workflows Optimization through Precompute-and-Reuse

Large Language Model (LLM)-based multi-agent systems are increasingly powerful, but current agentic workflow optimization paradigms make an unsatisfying trade-off. Task-level methods spend substantial offline compute yet deploy only a single workflow, leaving complementary candidates unused, while query-level methods synthesize a new workflow per query at substantial inference cost. Our motivating analysis shows these paradigms are more complementary than competing: workflows discovered during offline search often solve different subsets of queries, and many queries handled by expensive query-level generation can already be solved by cheaper precomputed workflows. This suggests a different objective: rather than searching for one universally best workflow or regenerating one per instance, we should build a compact bank of reusable, complementary workflows and select among them adaptively at inference time. Doing so requires solving three coupled problems: generating complementary rather than redundant candidates, compressing them into a small deployable portfolio, and assigning each query to the right workflow under a performance-cost trade-off. To this end, we present FlowBank, a three-stage framework for portfolio-based agentic workflow optimization. Diversifying proposes DiverseFlow to steer search toward under-covered queries and produce a high-coverage candidate pool. Curating proposes CuraFlow to compress this pool into a compact portfolio with minimal redundancy. Matching casts deployment as edge-value prediction on a query-workflow bipartite graph and routes each incoming query to the portfolio member with the best predicted utility. Across five benchmarks, FlowBank achieves the highest average score among the evaluated methods while remaining cost-competitive, improving over the strongest automated and handcrafted baselines by 4.26% and 14.92% relative, respectively.

12.
bioRxiv (Bioinfo) 2026-06-16

Evidence for recombination in dengue virus genomes

Recombination is a key driver of RNA virus evolution, yet its extent and evolutionary implications in dengue virus (DENV) remain incompletely understood. We conducted a comprehensive, genome-wide recombination screen across 6,905 complete DENV genomes representing all four serotypes, 82 countries, and eight decades of sampling (1944-2023) retrieved from the Bacterial and Viral Bioinformatics Resource Center. Using seven complementary recombination detection methods implemented in RDP5, we identified 66 recombination events across 53 unique recombinant sequences, of which 29 are newly described. Events included intra-genotypic (n = 18), inter-genotypic (n = 32), and inter-serotypic (n = 16) exchanges spanning 14 genotypes and four continents, with no meaningful serotype-level enrichment (Cramer's V = 0.054). Recombination was concentrated in non-structural genes, most frequently NS3 (19 events), NS5 (17), and NS2 (12), while the capsid gene contained no recombination events, consistent with strong functional constraint. Single-nucleotide polymorphism analyses confirmed low divergence between recombinants and their inferred parents in both recombinant and non-recombinant regions. Phylogenomic analysis of 6,642 sequences revealed that recombinants cluster significantly closer to their major parents (p = 8.9 x 10-6 ) and that their removal does not significantly alter tree topology (p = 0.898), suggesting that the short length of recombinant regions limits phylogenetic conflict. We also introduce RECOSIM, an unsupervised machine-learning tool for recombination detection that achieved higher precision than RDP5 on both simulated (93.4% vs. 80.0%) and empirical (98.1% vs. 39.3%) datasets. Collectively, these results establish recombination as a widespread, pan-serotypic phenomenon in DENV with implications for genomic surveillance, vaccine evaluation, and evolutionary inference.

13.
arXiv (CS.AI) 2026-06-18

Short-Term-to-Long-Term Memory Transfer for Knowledge Graphs under Partial Observability

arXiv:2605.22142v2 Announce Type: replace-cross Abstract: Reinforcement learning under partial observability requires deciding what information to retain, yet most memory-based approaches do not explicitly model short-term-to-long-term transfer of symbolic observations. We study this transfer process in a temporal knowledge-graph memory setting and cast it as a neuro-symbolic value-based decision problem: for each observed triple, the agent chooses whether to keep or drop it before long-term insertion. To handle variable-sized short-term buffers, we use a per-item Q-learning design with shared parameters and a practical temporal-difference update over matched items across consecutive steps. On the RoomKG benchmark at long-term memory capacity 128, learned transfer decisions outperform symbolic and neural baselines, including symbolic baselines with temporal annotations and history-based LSTM/Transformer baselines. Across transfer-policy ablations, a lightweight local short-term-only variant performs best, and step-level behavior shows that the policy keeps navigation- and query-relevant facts while discarding lower-value candidate facts, supporting explicit and interpretable memory decisions under memory constraints.

14.
arXiv (quant-ph) 2026-06-11

Recirculating Quantum Photonic Networks for Fast Deterministic Quantum Information Processing

arXiv:2602.11033v2 Announce Type: replace Abstract: A fundamental challenge in photonics-based deterministic quantum information processing is to realize key transformations on time scales shorter than those of detrimental decoherence and loss mechanisms. This challenge has been addressed through device-focused approaches that aim to increase nonlinear interactions relative to decoherence rates. In this work, we adopt a complementary architecture-focused approach by proposing a recirculating quantum photonic network (RQPN) that minimizes the duration of quantum information processing tasks, thereby reducing the requirements on nonlinear interaction rates. The RQPN consists of a network of all-to-all connected nonlinear cavities with dynamically controlled waveguide couplings, and it processes information by capturing a photonic input state, recirculating photons between the cavities, and releasing a photonic output state. We demonstrate the RQPN's architectural advantage through two examples: first, we show that processing all qubits simultaneously yields faster operations than single- and two-qubit decompositions of the three-qubit Toffoli gate. Second, we demonstrate implementations of a measurement-free correction for single-photon loss, achieving up to seven-fold speedups and significantly improved hardware efficiency relative to state-of-the-art architecture proposals. Our work shows that a single hardware-efficient recirculating architecture substantially reduces the temporal overhead of multi-qubit gates and quantum error correction, thereby lowering the barrier to experimental realizations of deterministic photonic quantum information processing.

15.
arXiv (quant-ph) 2026-06-16

Optical Creation of Synthetic Microgravity for Quantum Degenerate Gases

arXiv:2606.14985v1 Announce Type: cross Abstract: Microgravity environments provide unique opportunities for ultracold-atom experiments by enabling long interrogation times and reduced acceleration-induced dynamics. However, their realization has largely been restricted to specialized facilities such as drop towers, sounding rockets, and space-based laboratories. Here we realize synthetic microgravity for quantum degenerate gases using optically engineered force landscapes that compensate Earth's gravity to the milli-g level while maintaining continuous confinement of the atomic ensemble. These force landscapes are generated by dynamically painted optical dipole potentials and calibrated in situ through Bloch oscillations in a vertical optical lattice, enabling precise control of the residual acceleration. We use this capability to demonstrate matter-wave beam splitting with arm separations of several hundred microns. We further implement a Bloch-band atom interferometer in which interaction-induced dephasing is strongly suppressed through controlled three-dimensional expansion in the synthetic microgravity potential. This reduction of mean-field effects restores near-$\sqrt{N}$ scaling of interferometric sensitivity for large quantum degenerate ensembles. Our results establish a versatile platform for realizing synthetic microgravity with trapped quantum gases in terrestrial laboratories, bringing the advantages of microgravity experiments to continuously operating systems and opening new opportunities for quantum sensing, matter-wave interferometry, and precision measurements.

16.
arXiv (CS.CV) 2026-06-15

HULFSynth : An INR based Super-Resolution and Ultra Low-Field MRI Synthesis via Contrast factor estimation

We present an unsupervised single image bidirectional Magnetic Resonance Image (MRI) synthesizer that synthesizes an Ultra-Low Field (ULF) like image from a High-Field (HF) magnitude image and vice-versa. Unlike existing MRI synthesis models, our approach is inspired by the physics that drives contrast changes between HF and ULF MRIs. Our forward model simulates a HF to ULF transformation by estimating the tissue-type Signal-to-Noise ratio (SNR) values based on target contrast values. For the Super-Resolution task, we used an Implicit Neural Representation (INR) network to synthesize HF image by simultaneously predicting tissue-type segmentations and image intensity without observed HF data. The proposed method is evaluated using synthetic ULF-like data from generated from standard 3T T$_1$-weighted images for qualitative assessments and paired 3T-64mT T$_1$-weighted images for validation experiments. WM-GM contrast improved by 52% in synthetic ULF-like images and 37% in 64mT images. Sensitivity experiments demonstrated the robustness of our forward model to variations in target contrast, noise and initial seeding.

17.
Nature Medicine 2026-06-17

General-purpose chatbots outperform clinical AI tools on physicians’ real-world questions

Authors: Unknown Author

Specialized clinical AI tools are entering medical practice with little independent testing. In a head-to-head evaluation across two public benchmarks and real questions from physicians, three general-purpose frontier large language models outperformed two leading clinical AI tools, which performed no better than Google search AI overview.

18.
arXiv (math.PR) 2026-06-11

Improved Amenability Bounds for Local Coordination Games

arXiv:2606.01963v2 Announce Type: replace-cross Abstract: We study local pure coordination games on finite social networks, continuing the framework of Hutchcroft, Rospuskova, and Tamuz. They showed that low inefficiency in local coordination forces the underlying graph to be amenable, with a square-root loss in the amenability parameter. We improve this loss in the binary unbiased setting. Using Shapley values of a mutual-information game associated with the players' local outputs, we prove that if the average disagreement is at most $\varepsilon$, then the graph is $(O(\varepsilon\log(1/\varepsilon)),r)$-amenable. This gives a sharper quantitative converse between local coordination and graph amenability.

19.
arXiv (CS.CV) 2026-06-16

Power Battery Detection

Power batteries are essential components in electric vehicles, where internal structural defects can pose serious safety risks. We conduct a comprehensive study on a new task, power battery detection (PBD), which aims to localize the dense endpoints of cathode and anode plates from industrial X-ray images for quality inspection. Manual inspection is inefficient and error-prone, while traditional vision algorithms struggle with densely packed plates, low contrast, scale variation, and imaging artifacts. To address this issue and drive more attention into this meaningful task, we present PBD5K, the first large-scale benchmark for this task, consisting of 5,000 X-ray images from nine battery types with fine-grained annotations and eight types of real-world visual interference. To support scalable and consistent labeling, we develop an intelligent annotation pipeline that combines image filtering, model-assisted pre-labeling, cross-verification, and layered quality evaluation. We formulate PBD as a point-level segmentation problem and propose MDCNeXt, a model designed to extract and integrate multi-dimensional structure clues including point, line, and count information from the plate itself. To improve discrimination between plates and suppress visual interference, MDCNeXt incorporates two state space modules. The first is a prompt-filtered module that learns contrastive relationships guided by task-specific prompts. The second is a density-aware reordering module that refines segmentation in regions with high plate density. In addition, we propose a distance-adaptive mask generation strategy to provide robust supervision under varying spatial distributions of anode and cathode positions. The source code and datasets will be publicly available at \href{https://github.com/Xiaoqi-Zhao-DLUT/X-ray-PBD}{PBD5K}.

20.
arXiv (CS.LG) 2026-06-19

Influence-Guided Concolic Testing of Transformer Robustness

arXiv:2509.23806v2 Announce Type: replace-cross Abstract: Concolic testing for neural networks alternates concrete execution with constraint solving to search for inputs that flip model decisions. We present a concolic tester for Transformer classifiers that uses SHAP estimates to rank pending path predicates by their impact on the current prediction. To support self-attention with multiple heads in execution backed by SMT solving, we implement attention semantics in pure Python that are compatible with the solver and make the softmax boundary explicit by concretizing exponentiation arguments. We evaluate our method on CIFAR-10 across three compact Transformer classifiers, ResNet18, and VGG16 under a one-pixel budget and a 900s horizon. Across the 500 model–input pairs in this matched comparison, our method achieves 60% success, compared with 15% for a differential evolution baseline that treats the model as a black box. In the primary two-layer Transformer branch-ordering study, SHAP-based predicate prioritization raises success from 56% to 60% and reduces median attack time by 51%. These results show that influence-guided path exploration can make concolic testing a practical way to find adversarial examples in Transformer models.

21.
arXiv (quant-ph) 2026-06-11

Scaling-optimal purification of noisy qubit unitary channels

arXiv:2606.12394v1 Announce Type: new Abstract: We consider the problem of purifying noisy qubit unitary channels. Given the ability to apply an unknown qubit unitary channel followed by depolarizing noise, we aim to construct a superchannel that purifies the noisy unitary back to the original unknown unitary. We first provide numerical evidence that sequential strategies can strictly outperform parallel strategies when the number of channel uses is finite, highlighting the fundamental distinction from state purification. We then provide a concrete $\mathrm{U}(2)$-covariant parallel protocol based on a novel entanglement-assisted quantum error-correcting code that suppresses the first-order noise strength as $O(1/n)$ with $n$ channel uses and show this scaling is asymptotically optimal in the low-noise regime, even when sequential strategies are allowed.

22.
arXiv (CS.CV) 2026-06-18

Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation

On-policy self-distillation (OPSD) trains a model on its own rollouts and uses a frozen copy to provide dense token-level targets conditioned on a reference target. This works well for LLM reasoning, but a direct extension to multimodal large language models (MLLMs) can create a shortcut: the privileged target may guide tokens mainly based on the text reference target rather than the image. We propose ViGOS, a visually grounded OPSD framework for MLLM post-training. The student first writes a visual description and then reasons toward the final answer. For valid rollouts, an image-only perception teacher supervises the description, while a privileged reasoning teacher supervises the reasoning and final answer on the same student prefix. A reference teacher is used only for invalid rollouts to recover the output format. Across general vision-language, expert reasoning, visual math, spatial grounding, and visual-language-prior benchmarks, ViGOS keeps the main benefits of OPSD and improves image-grounded behavior in shortcut-prone settings.

23.
arXiv (CS.CV) 2026-06-15

Towards Physically Realizable Adversarial Attenuation Patch against SAR Object Detection

Deep neural networks have demonstrated excellent performance in SAR target detection tasks but remain susceptible to adversarial attacks. Existing SAR-specific attack methods can effectively deceive detectors; however, they often introduce noticeable perturbations and are largely confined to digital domain, neglecting physical implementation constrains for attacking SAR systems. In this paper, a novel Adversarial Attenuation Patch (AAP) method is proposed that employs energy-constrained optimization strategy coupled with an attenuation-based deployment framework to achieve a seamless balance between attack effectiveness and stealthiness. More importantly, AAP exhibits strong potential for physical realization by aligning with signal-level electronic jamming mechanisms. Experimental results show that AAP effectively degrades detection performance while preserving high imperceptibility, and shows favorable transferability across different models. This study provides a physical grounded perspective for adversarial attacks on SAR target detection systems and facilitates the design of more covert and practically deployable attack strategies. The source code is made available at https://github.com/boremycin/SAAP.

24.
arXiv (CS.CV) 2026-06-16

Geometric Action Model for Robot Policy Learning

Generalist robot policies must follow user instructions while reasoning about how objects, cameras, and robot actions interact in the 3D physical world. Recent vision-language-action models (VLAs) and video world-action models (WAMs) inherit strong semantic or temporal priors from large-scale foundation models, but they still operate primarily on 2D image frames or 2D-derived latent spaces, leaving implicit the 3D geometry required for contact-rich manipulation. We propose the Geometric Action Model (GAM), a language-conditioned manipulation policy that directly repurposes a pretrained geometric foundation model (GFM) as a shared substrate for perception, temporal prediction, and action decoding. GAM splits the GFM at an intermediate layer: the shallow layers serve as an observation encoder, and a causal future predictor inserted at the split layer forecasts future latent tokens conditioned on language, proprioception, and action history. The predicted future tokens are then routed through the remaining GFM blocks for feature propagation and decoding, allowing a single backbone to produce both future geometry and actions. This design equips the GFM with language-conditioned temporal world modeling through minimal architectural modification while preserving its rich geometric priors. Across a broad suite of simulation and real-robot manipulation benchmarks, GAM is more accurate, more robust, faster, and lighter than current foundation-model-scale baselines.

25.
arXiv (CS.CV) 2026-06-15

Vanishing Depth: Training Generalized Depth Adapters with Sinusoidal Depth Preprocessing for Pretrained RGB Encoders

Generalized metric depth understanding is critical for precise vision-guided robotics, which current state-of-the-art (SOTA) vision-encoders do not support. To address this, we propose a self-supervised training approach that extends pretrained RGB encoders with a depth adapter to incorporate and align metric depth into a combined latent space without interfering with the pretrained RGB feature extraction. In combination with our sinusoidal depth encoding, the depth adapter enables generalized and robust depth density and distribution invariant feature extraction. Our depth adapters improve a wide set of generalized RGB baselines across a spectrum of relevant RGBD downstream tasks in segmentation, pose estimation, and depth completion – without the necessity of finetuning. Most importantly, we achieve 56.05 mIoU in the SUN-RGBD segmentation, while outperforming SOTA depth-aware and multi-modal encoders in our experiments. When no depth is present, one can activate our depth adapter with an empty map, use single pixel depth clues, or monocular depth estimation to include the depth aware feature extraction into subsequent downstream tasks.