Paper Plaza - AcademicHub

01.

arXiv (CS.CV) 2026-06-11 DOI: arXiv:2606.11320

Semantic Segmentation of Node and Edge Diagrams for Assistive Technology

Authors:

Michael Cormier, Yichun Zhao, Laura Paul, Cameron Swift, Duc Tri Dang, Miguel Nacenta

In this paper, we present a novel set of related models for semantic segmentation of node-link diagrams. These diagrams are frequently used to represent mathematical graphs, relationships between concepts, and flowcharts. Such diagrams are difficult to access non-visually; while some assistive interfaces have been designed for node-link diagrams, they rely upon a machine-readable representation of the diagram, whereas such diagrams will generally be made available as bitmap images. Our compact deep learning models show excellent quantitative and qualitative performance on a large synthetic dataset of node-link diagrams, reaching per-pixel accuracy over 93%.

02.

arXiv (CS.AI) 2026-06-15 DOI: arXiv:2606.14306

Thinking Outside the [Chat]Box: Bridging Computer Science and Industrial Design for Cognitive-Inclusive Generative AI

Authors:

Virginia Francisco, Daniel Guasch, Raquel Hervás

Current Generative AI (GenAI) interfaces remain largely constrained to chatbox interaction, which can impose high cognitive demands on users and create substantial barriers for people with intellectual disabilities (ID), including prompt formulation difficulties, response overload, and limited mechanisms to assess information reliability. To explore alternative interaction models for cognitive accessibility, we conducted a cross-disciplinary co-design challenge in which two student cohorts (Computer Science and Industrial Design) developed interface concepts from the same set of functional requirements (e.g., prompt scaffolding, structured output, GUI-based refinement, transparency, and personalization). Comparing the resulting proposals reveals both convergence on foundational requirements (notably initial calibration, proactive prompting, and direct manipulation of response fragments) and complementary contributions that outline a multi-layered support system. Computer Science teams primarily produced structural scaffolding, emphasizing predictability, navigability, and trust through mechanisms such as reliability indicators, explicit sources, and context management for long conversations. Industrial Design teams emphasized experiential scaffolding, focusing on pacing, attention guidance, multimodality, and proactive agency, including step-by-step response flows, focus modes, and assistant-like integrations. We synthesize these findings into a dual-layer scaffolding framework that expands the design space for cognitively accessible GenAI interaction beyond chat-centric models and motivates future work on expert refinement, technical feasibility, and empirical validation with users with ID.

03.

medRxiv (Medicine) 2026-06-23 DOI: HASH:f6edf63a97193e465783b694bee94187

Attention and memory in Parkinson's disease: a discriminant analysis approach

Authors:

Calabria, Guallar, Garcia-Sanchez, Pascual Sedano, Kulisevsky

Background. Cognitive impairment in Parkinson's disease (PD) is highly prevalent and heterogeneous. Assessing multiple cognitive domains is challenging and risks redundancy. This study evaluated whether a discriminant analysis approach could optimize the selection of specific tasks and measures for identifying attention and memory deficits in PD. Methods. Thirty PD patients and 25 cognitively unimpaired (CU) controls completed four experimental tasks: two assessing attention (flanker and spatial Stroop), one assessing recognition memory, and one assessing working memory (n-back). Following group-level difference analyses, a discriminant analysis was performed to identify which tasks and performance metrics possessed the highest sensitivity for distinguishing PD patients from CU individuals. Results. At the group level, PD patients exhibited significantly worse conflict costs in both attention tasks and lower sensitivity scores (d') in the recognition memory task compared to CU controls. The discriminant analysis revealed that time-based measures from the spatial Stroop task and the sensitivity score from the recognition memory task provided the highest discriminating power to differentiate between the two groups. Conclusion. These findings suggest that cognitive deficits in PD can be identified with high diagnostic accuracy using a targeted subset of metrics, without the need for extensive and redundant neuropsychological testing batteries for attention and memory.

04.

arXiv (quant-ph) 2026-06-19 DOI: arXiv:2606.20123

QPU-scale randomized benchmarking via Bell-pair injection

Authors:

Haripriya Pettugani, María Aguado-Yáñez, Astryd Park, Daniel Bultrini, James R. Wootton

Mirror randomized benchmarking (MRB) is an established technique that provides a global error metric at the scale of a whole QPU. To expand upon this, we introduce Mirror Quantum Awesomeness (MQA), a hybrid protocol that adds a structured entangling layer to MRB circuits. This enables per-edge correlation dynamics to be tracked via mutual information while preserving the MRB infidelity estimate. The resulting analysis of the injected entangled pairs locates a critical circuit depth, beyond which rudimentary error mitigation techniques can be expected to fail. A topological variant, Topological MQA, supplies a second critical depth via a decoder based on the surface-code decoding problem. Both are validated in simulation and demonstrated on the 156-qubit ibm_fez and ibm_kingston processors, where MQA closely agrees with MRB on the entanglement infidelity and the critical depth for ibm_fez is found to be $\sim 50$.
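
The abstract mentions tracking per-edge correlations via mutual information. As a rough illustration only, the sketch below computes the classical mutual information between the two measured bits of a single pair from shot counts. Whether the paper uses this classical quantity or a quantum one is not stated in the abstract, and the function name and the example counts are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(counts):
    """Classical mutual information (in bits) between two measured bits.

    `counts` maps two-character bitstrings such as "01" to shot counts.
    """
    total = sum(counts.values())
    joint = {bits: n / total for bits, n in counts.items() if n > 0}

    # Marginal distributions of the first and second bit.
    p_a = Counter()
    p_b = Counter()
    for bits, p in joint.items():
        p_a[bits[0]] += p
        p_b[bits[1]] += p

    # I(A;B) = sum over (a,b) of p(a,b) * log2( p(a,b) / (p(a) * p(b)) )
    return sum(
        p * log2(p / (p_a[bits[0]] * p_b[bits[1]]))
        for bits, p in joint.items()
    )


# Hypothetical shot counts, not data from the paper.
ideal_bell = {"00": 500, "11": 500}                         # perfectly correlated
fully_mixed = {"00": 250, "01": 250, "10": 250, "11": 250}  # no correlation left

print(mutual_information(ideal_bell))   # one bit for an ideal pair
print(mutual_information(fully_mixed))  # zero once correlations are lost
```

Under this reading, a pair whose mutual information decays toward zero as circuit depth grows has lost its injected correlations.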

05.

arXiv (CS.LG) 2026-06-16 DOI: arXiv:2606.15986

Learning the generating functional for variance reduction in lattice QCD

Authors:

Ryan Abbott, Yang Fu, Daniel C. Hackett, Gurtej Kanwar, Fernando Romero-López, Phiala E. Shanahan

The generating functional in quantum field theory provides the natural framework for constructing correlation functions as derivatives with respect to source operators. We present a methodology that leverages machine-learned normalizing flows to reduce the variance of arbitrary $N$-point correlation functions of bosonic operators in lattice gauge field theory calculations by encoding a representation of the generating functional. We show that it is possible to systematically approach noiseless estimators of correlation functions in this framework. We demonstrate this methodology with applications to calculations of glueball correlation functions and Wilson loops in Quantum Chromodynamics and Yang-Mills theory. The results show up to three orders of magnitude variance reduction.

06.

arXiv (CS.LG) 2026-06-12 DOI: arXiv:2606.13444

Clustering Node Attributed Networks with Graph Neural Networks and Self Learning

Authors:

Rodrigo de Sapienza Luna, Daniel Ratton Figueiredo

Graph clustering - partitioning the node set of a graph into disjoint subsets that reflect some latent information - is a fundamental problem as it finds applications in a myriad of different scenarios. While this classic problem has been tackled for decades by different communities, a recent variation of the problem driven by real data considers the scenario where nodes have attributes that are also informative. This has triggered novel methods that simultaneously leverage network information (edges) and node information (attributes) in the design of novel clustering algorithms. This work proposes a novel framework that builds on prior works that have applied graph neural networks (GNNs) to graph clustering. The proposed framework operates in rounds of self learning in a fully unsupervised setting. In each round, a GNN generates representations for nodes that are used to cluster the nodes. This clustering influences the graph used to generate the node representation in the next round. Moreover, a context graph built in each round using the original graph is used to generate the node representations. Empirical results show that the proposed methodology extracts information from both network edges and node attributes in synthetic data, outperforming algorithms focused solely on the network or attributes when neither is very informative. Multiple rounds of learning also improve the performance and always outperform a long single round of training (i.e., classic GNN graph clustering). When considering real datasets, empirical results indicate that the proposed methodology is competitive with state-of-the-art methods when cluster sizes are balanced.

07.

arXiv (CS.CV) 2026-06-12 DOI: arXiv:2512.14648

Adaptable Segmentation Pipeline for Diverse Brain Tumors with Radiomic-Guided Subtyping and Lesion-Wise Model Ensemble

Authors:

Daniel Capellán-Martín, Abhijeet Parida, Zhifan Jiang, Nishad Kulkarni, Krithika Iyer, Austin Tapp, Syed Muhammad Anwar, María J. Ledesma-Carbayo, Marius George Linguraru

Robust and generalizable segmentation of brain tumors on multi-parametric magnetic resonance imaging (MRI) remains difficult because tumor types differ widely. The BraTS 2025 Lighthouse Challenge benchmarks segmentation methods on diverse high-quality datasets of adult and pediatric tumors: multi-consortium international pediatric brain tumor segmentation (PED), preoperative meningioma tumor segmentation (MEN), meningioma radiotherapy segmentation (MEN-RT), and segmentation of pre- and post-treatment brain metastases (MET). We present a flexible, modular, and adaptable pipeline that improves segmentation performance by selecting and combining state-of-the-art models and applying tumor- and lesion-specific processing before and after training. Radiomic features extracted from MRI help detect tumor subtype, ensuring more balanced training. Custom lesion-level performance metrics determine the influence of each model in the ensemble and optimize post-processing that further refines the predictions, enabling the workflow to tailor every step to each case. On the BraTS testing sets, our pipeline achieved performance comparable to top-ranked algorithms across multiple challenges. These findings confirm that custom lesion-aware processing and model selection yield robust segmentations without locking the method to a specific network architecture. Our method has the potential for quantitative tumor measurement in clinical practice, supporting diagnosis and prognosis.

08.

arXiv (CS.LG) 2026-06-11 DOI: arXiv:2510.22397

NetBurst: Event-Centric Forecasting of Bursty, Intermittent Time Series

Authors:

Satyandra Guthula, Jaber Daneshamooz, Charles Fleming, Kesheng Wu, Walter Willinger, Arpit Gupta

Network operators monitor their infrastructure by collecting telemetry data such as packet counts, byte rates, or flow volumes, yet answering the questions that effective operations demand – forecasting future load, diagnosing and characterizing anomalies, and searching for and retrieving historical precedents – requires more than raw measurements. Bridging this gap calls for learned representations: compact per-entity summaries that capture temporal dynamics from each entity's univariate time series. Time-series foundation models are the natural starting point, but they are designed for dense, periodic benchmark datasets – the mild statistical regime. However, network telemetry data inhabits the wild regime: operationally relevant events are rare, separated by variable-length stretches of low or no activity ("ebbs"), with intermittent bursts of heavy-tailed extremes ("tides"). We present NetBurst, an event-centric pipeline that collapses ebbs, separates each time series into a stream of burst timings and a stream of burst magnitudes, and learns a single representation serving all three operational tasks. Compared to the strongest competitors among eight baselines – including Amazon's Chronos-2 and Datadog's Toto – and across nine production telemetry configurations, NetBurst reduces median forecasting error by $1.3$–$116\times$ on wild-regime data with a $1.0$–$7.5\times$ better match to the true burst distribution, and matches baselines on mild-regime benchmarks. For characterizing anomalies, NetBurst produces balanced, well-spread clusters that are $16\times$ more describable in operator-familiar terms under a novel interpretability score, and cluster-filtered search delivers $7.5\times$ faster end-to-end retrieval.
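
The event-centric decomposition described above (collapsing ebbs and splitting a series into burst timings and burst magnitudes) can be illustrated with a small sketch. This is not NetBurst's implementation: the burst definition (a maximal run of samples above a threshold), the choice of the run's sum as its magnitude, and the names `split_bursts` and `telemetry` are all assumptions made here for illustration.

```python
def split_bursts(series, threshold=0.0):
    """Split a series into (gaps, magnitudes).

    Assumed definitions (not taken from the paper):
    a burst is a maximal run of consecutive samples above `threshold`;
    gaps[i] is the number of quiet samples immediately before burst i;
    magnitudes[i] is the sum of the samples inside burst i.
    """
    gaps, magnitudes = [], []
    quiet = 0        # length of the current quiet stretch (the "ebb")
    current = None   # running sum of the burst in progress, if any

    for value in series:
        if value > threshold:
            if current is None:      # a new burst starts here
                gaps.append(quiet)
                quiet = 0
                current = 0.0
            current += value
        else:
            if current is not None:  # the burst just ended
                magnitudes.append(current)
                current = None
            quiet += 1

    if current is not None:          # the series ended mid-burst
        magnitudes.append(current)
    return gaps, magnitudes


# Hypothetical packet counts: long quiet stretches with a few bursts.
telemetry = [0, 0, 0, 5, 7, 0, 0, 0, 0, 120, 0, 3]
gaps, magnitudes = split_bursts(telemetry)
print(gaps)        # quiet samples before each burst
print(magnitudes)  # total volume of each burst
```

The two output lists have one entry per burst, so a model consuming them sees a short, dense sequence of events rather than a long, mostly empty series.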

09.

arXiv (quant-ph) 2026-06-11 DOI: arXiv:2606.11843

Quantum iterative approach to the Traveling Salesman Problem

Authors:

Arturo Rodríguez-Almazán, Guillermo Rivas, Ricardo S. Alonso, Daniela Falcó, Mir Amir Hosseini

The Traveling Salesman Problem (TSP) is a classical NP-hard problem in combinatorial optimization, where determining the shortest route among a set of cities becomes computationally prohibitive as the problem size increases. This work explores quantum computing as an alternative approach to address this complexity. Unlike existing methods that primarily rely on quantum annealing, we propose a quantum iterative framework integrating Quantum Phase Estimation (QPE) and Grover's search algorithm. Route costs are encoded as quantum phases, enabling QPE to efficiently evaluate them, while Amplitude Amplification, implemented via the Grover-Long algorithm, iteratively refines the solution space toward the optimal route. A proof-of-concept case study on a small-scale TSP instance demonstrates the feasibility of this approach and its potential for scaling to larger optimization problems. Furthermore, under an expectation-based analysis, the algorithm exhibits an expected computational complexity of $O(\frac{m^2\log_2(m)\log_2(1/\epsilon)}{\sqrt{\epsilon}})$, which depends on the error tolerance parameter $\epsilon$. This estimation omits the initialization term, which we expect future refinements to render subdominant to Phase Estimation.

10.

arXiv (CS.LG) 2026-06-25 DOI: arXiv:2505.15437

Adaptive Cumulative Mass Calibration with Conformal Prediction

Authors:

Daniil Kazantsev, Eric Moulines, Maxim Panov, Nikita Kotelevskii, Mohsen Guizani

Reliable probability estimates by classifiers are essential in high-risk applications. In practice, however, predicted probabilities are often miscalibrated, and many existing post-hoc calibration methods lack guarantees that a specific notion of calibration is achieved after the correction procedure is applied. We introduce a set-based perspective on calibration through the notion of cumulative mass calibration and the corresponding error measures. We propose a new calibration procedure based on conformal prediction that forms cumulative probabilities with guaranteed marginal coverage. We introduce an adaptive temperature scaling algorithm, with the temperature tuned for each input to satisfy the conformal coverage constraint. As we show, this procedure can be efficiently implemented. Across image classification tasks, particularly in settings with many classes, our method improves newly introduced calibration error measures (CMCE and $\alpha$-CMCE) and standard metrics (such as ECE, cw-ECE, MCE) over the existing baselines.
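
For readers unfamiliar with the conformal coverage constraint mentioned above, the sketch below shows standard split conformal prediction for classification, which is the generic building block and not the paper's adaptive temperature scaling procedure. The score function (one minus the true-class probability), the function names, and the calibration data are assumptions chosen for illustration.

```python
from math import ceil


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Return the score threshold q_hat targeting coverage 1 - alpha."""
    # Nonconformity score: one minus the probability of the true class.
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = ceil((n + 1) * (1 - alpha))
    if rank > n:
        return 1.0  # too few calibration points: every class is included
    return scores[rank - 1]


def prediction_set(probs, q_hat):
    """All classes whose score 1 - p does not exceed the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= q_hat]


# Hypothetical calibration data: softmax outputs and true labels.
cal_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
cal_labels = [0, 1, 2, 1]

q_hat = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
print(prediction_set([0.5, 0.35, 0.15], q_hat))  # classes kept for a new input
```

Under exchangeability of calibration and test points, sets built this way contain the true label with probability at least 1 - alpha on average; the abstract's contribution is to tune a per-input temperature so that the resulting cumulative probabilities satisfy such a constraint.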

11.

arXiv (CS.AI) 2026-06-24 DOI: arXiv:2510.04033

A global log for medical AI

Authors:

Modern computer systems rely on syslog, a universal protocol that records critical events across heterogeneous infrastructure. Medicine's rapidly growing AI stack has no equivalent. As medicine deploys AI tools at scale, there is no standard way to record how, when, by whom, and for whom these models are used. Without such records, it is difficult to measure real-world performance and outcomes, detect adverse events, or identify bias and dataset drift. Here we introduce MedLog, a protocol for event-level logging of medical AI. Each time an AI model interacts with a human, another algorithm, or an automated workflow, MedLog creates a record. Each record contains nine core fields: header, model, user, target, inputs, artifacts, outputs, outcomes, and feedback. We apply MedLog across four deployments in the US, Switzerland, and Vietnam: ICU deterioration prediction, tetanus progression monitoring from wearable signals, automated sepsis quality reporting, and patient attendance prediction. MedLog records capture model behavior, workflow interactions, and downstream outcomes, including AI performance degradation during severe weather events in patient attendance prediction and increased laboratory testing after ICU deterioration alerts. MedLog limits the data footprint through risk-based sampling, lifecycle-aware retention policies, and write-behind caching, enabling deployment in low-resource settings. It also supports detailed traces for complex, agentic, or multi-stage workflows, creating a foundation for continuous monitoring, auditing, and improvement of medical AI.
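
To make the nine-field record concrete, here is a minimal sketch of how such a record might be represented. Only the nine field names come from the abstract; the class name, the use of free-form dictionaries, the defaults, and the example values are assumptions and do not reflect the actual MedLog schema.

```python
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class MedLogRecord:
    # Field names follow the abstract; all types and defaults are
    # illustrative assumptions, not the published schema.
    header: dict[str, Any] = field(default_factory=dict)
    model: dict[str, Any] = field(default_factory=dict)
    user: dict[str, Any] = field(default_factory=dict)
    target: dict[str, Any] = field(default_factory=dict)
    inputs: dict[str, Any] = field(default_factory=dict)
    artifacts: dict[str, Any] = field(default_factory=dict)
    outputs: dict[str, Any] = field(default_factory=dict)
    outcomes: dict[str, Any] = field(default_factory=dict)
    feedback: dict[str, Any] = field(default_factory=dict)


# Hypothetical event: one model invocation with a single output.
record = MedLogRecord(
    header={"event_id": "example-001"},
    model={"name": "hypothetical-deterioration-model"},
    outputs={"risk_score": 0.42},
)

print([f.name for f in fields(record)])  # the nine core field names
```

Fields such as outcomes and feedback start empty in this sketch because they would typically be known only after the event, which is one reason to give every field a default.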

12.

arXiv (CS.AI) 2026-06-15 DOI: arXiv:2606.13989

Mask, Sample, Revise: A Revisable CTMC Inference Stack for Guided Discrete Flow Matching Text-to-Speech

Authors:

Alef Iury Siqueira Ferreira, Lucas Rafael Stefanel Gris, Luiz Fernando de Araújo Vidal, Frederico Santos de Oliveira, Christopher Dane Shulby, Anderson da Silva Soares, Arlindo Rodrigues Galvão Filho

Recent alignment-free non-autoregressive (NAR) text-to-speech (TTS) models formulate synthesis as a conditional infilling task, bypassing explicit duration predictors and external aligners. When speech is represented with neural codec tokens, the infilling problem becomes discrete, making Discrete Flow Matching (DFM), a Continuous-Time Markov Chain (CTMC) framework for discrete generation, a natural fit. However, inference-time control for stable low-step conditional infilling remains underexplored. We propose Mask, Sample, Revise, an inference-time CTMC stack for alignment-free DFM-TTS. The stack combines predictor-free guidance to strengthen text conditioning, prompt-matched conditional coupling to align the probability path with the acoustic prompt, and SC-ReMask, a schedule-constrained remasking mechanism that introduces token-to-mask transitions so early de-masking decisions can be revised. These components require no post-hoc fine-tuning and operate in a single tau-leaping sampler. Controlled ablations show that this stack improves intelligibility and robustness in the low-NFE prompted setting, outperforming unguided and guidance-only samplers with substantially more steps.

13.

arXiv (CS.AI) 2026-06-17 DOI: arXiv:2606.17399

The Discrete-Log Clock: How a Transformer Learns Modular Multiplication

Authors:

Huu Danh Nguyen

When small transformers grok modular multiplication, prior work reports that the learned embedding has a "dense" Fourier spectrum requiring all frequencies. This contrasts with modular addition, where only a sparse set of key frequencies suffices. We show this density is an artifact of analyzing in the wrong basis. The natural Fourier transform for multiplication is not the standard additive DFT but the multiplicative character transform, which decomposes functions on the multiplicative group $(\mathbb{Z}/p\mathbb{Z})^*$ into its irreducible representations. Applying this transform to a grokked transformer trained on $a \cdot b \bmod 113$, we find the embedding spectrum becomes highly sparse (Gini coefficient 0.58 vs. 0.07 in the additive basis) with only 4 key frequencies carrying significant energy. Furthermore, 96.9% of MLP neurons are cleanly tuned to a single multiplicative frequency, and neuron activation heatmaps reveal 2D-periodic structure when reordered by the discrete logarithm. These results demonstrate the transformer reduces multiplication to addition in discrete-log space, implementing a "Discrete-Log Clock" algorithm analogous to Nanda et al.'s Clock algorithm for addition. The methodology generalizes: matching the analysis basis to the algebraic structure of the task reveals interpretable structure where standard tools see noise.
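
The reduction of multiplication to addition in discrete-log space is a standard number-theoretic fact and can be checked directly. The sketch below is not the paper's analysis code: it only builds a discrete-log table modulo 113 by brute force and multiplies by adding logarithms. The helper names are hypothetical, and the primitive root is found by search rather than assumed.

```python
P = 113  # prime modulus from the abstract


def find_primitive_root(p):
    """Smallest g whose powers hit every nonzero residue mod prime p."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found")


g = find_primitive_root(P)

# Discrete-log table: dlog[x] = k such that g**k % P == x, for x in 1..P-1.
dlog = {pow(g, k, P): k for k in range(P - 1)}


def multiply_via_logs(a, b):
    """Compute a*b mod P by adding discrete logs modulo P - 1."""
    return pow(g, (dlog[a] + dlog[b]) % (P - 1), P)


# Both printed values agree: addition of logs reproduces multiplication.
print(multiply_via_logs(17, 45), (17 * 45) % P)
```

Because the nonzero residues modulo a prime form a cyclic group of order 112, reordering inputs by `dlog` turns the multiplication table into an addition table modulo 112, which is consistent with the periodic structure the abstract reports after reordering.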

14.

arXiv (CS.LG) 2026-06-11 DOI: arXiv:2606.12360

Anatomy of Post-Training: Using Interpretability to Characterize Data and Shape the Learning Signal

Authors:

Leon Bergen, Usha Bhalla, Sidharth Baskaran, Max Loeffler, Raphael Sarfati, Dhruvil Gala, Ryan Panwar, Santiago Aranguri, Thomas Fel, Atticus Geiger, Matthew Kowal, Siddharth Boppana, …

Language-model post-training is the main stage at which model behavior is shaped, yet it still largely involves optimization of scalar rewards that summarize diverse desiderata. This abstraction gives practitioners little visibility into what their data actually teaches models, allowing spurious correlations to be learned by a model and inducing undesirable behaviors such as over-stylization and sycophancy. To address this problem, we ask: can we inspect a preference dataset before optimization and decide, at the level of concepts, which behaviors a model should be allowed to learn? Motivated by this, we introduce a data-centric post-training pipeline that uses interpretability protocols to develop statistical hypotheses for the latent concepts separating preferred from dispreferred generations, making them explicit for fine-grained user feedback. Building on this view, we unify several interpretability-based training protocols as ways of shaping rewards via feature or data interventions. Empirically, we show that our pipeline diagnoses undesirable signals in existing preference data, mitigates off-target learning, and can also help amplify or shape desired properties such as safeguards and model personality. More broadly, our results suggest that interpretability can turn post-training from optimizing opaque proxy rewards into a process of auditing and sculpting the learning signal itself.

15.

arXiv (quant-ph) 2026-06-12 DOI: arXiv:2503.20387

Electric Field Distortions in Surface Ion Traps with Integrated Nanophotonics

Authors:

Guochun Du, Elena Jordan, Tanja E. Mehlstäubler

The integration of photonic components into surface ion traps provides a scalable approach for trapped-ion quantum computing, sensing, and metrology, enabling compact systems with enhanced stability and precision. However, the introduction of optical apertures in the trap electrodes can distort the trapping electric field. This can lead to excess micromotion (EMM) and ion displacement, which degrade the performance of quantum logic operations and optical clocks. In this work, we systematically investigate the electric field distortion in a surface ion trap with integrated waveguides and grating couplers using Finite Element Method (FEM) simulations. We analyze methods to reduce these distortions by exploiting symmetries and transparent conductive oxide materials.

16.

arXiv (CS.CV) 2026-06-11 DOI: arXiv:2606.12066

Performance Analysis of YOLOv11 and YOLOv8 for Mixed Traffic Object Detection under Adverse Weather Conditions in Developing Countries

Authors:

Quoc Thuan Nguyen, Ha Anh Vu, Ngo Dang Thanh Ngan, Minh Phuc Hoang Ngoc

In modern vehicular systems, robust performance under harsh conditions has become a critical problem for autonomous driving. Our study delivers a comprehensive evaluation of the newest iteration of the YOLO series, the YOLOv11 Nano architecture, benchmarked against the widely adopted YOLOv8 Nano baseline on a custom fused dataset that combines the Indian Driving Dataset (IDD) [1] and the Berkeley Deep Drive Dataset (BDD100K) [2]. We analyze the trade-offs among detection accuracy, inference speed, and computational efficiency in high-entropy scenarios involving dense mixed traffic, rain, and low-light conditions. Specifically, YOLOv11n achieves a mean Average Precision (mAP@50) of 46.6%, with a notable 3.2% improvement in Precision over the baseline, effectively reducing false positives in cluttered scenes. Furthermore, the proposed model exhibits enhanced energy efficiency, requiring 22% fewer FLOPs (6.3G vs. 8.1G) while maintaining a real-time inference speed of 70.9 FPS on a Tesla T4 GPU, offering an optimal trade-off for safety-critical edge deployment.

17.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2606.14748

Is My Vision-Language Data in Your AI? Membership Inference Test (MINT) Demo 2

Authors:

Daniel DeAlcala, Gonzalo Mancera, Julian Fierrez, Aythami Morales, Ruben Tolosana, Ruben Vera-Rodriguez

We present the Membership Inference Test (MINT) Demo 2, a framework designed to improve transparency in machine learning training processes. MINT is a technique for experimentally determining whether specific data were used during machine learning model training. We establish the theoretical framework and propose multiple architectures for MINT depending on the amount of information known about the models that are being audited. Experimental results using a popular face recognition model, 4 state-of-the-art LLMs, and multiple, diverse, and large-scale public image and text databases achieve promising accuracy levels in the detection of training data of up to 90%. Building on these results, we introduce a comprehensive web platform that expands these capabilities to image and text modalities. The platform integrates a diverse technological stack, including MINT, aMINT, and gMINT, allowing users to audit a wide range of models. This demonstrator aims to promote AI transparency and provides a practical tool to foster compliance with emerging AI regulations.

18.

arXiv (CS.CV) 2026-06-24 DOI: arXiv:2606.24057

EPEdit: Redefining Image Editing with Generative AI and User-Centric Design

Authors:

Hoang-Phuc Nguyen, Dinh-Khoi Vo, Trong-Le Do, Hai-Dang Nguyen, Tan-Cong Nguyen, Vinh-Tiep Nguyen, Tam V. Nguyen, Khanh-Duy Le, Minh-Triet Tran, Trung-Nghia Le

The demand for image manipulation has seen a significant increase recently. Traditional tools like Photoshop and Capture One, while powerful, require considerable expertise to use effectively. Generative AI has introduced alternative platforms, such as Luminar Neo, Pixlr X, and Canva. However, many of these solutions, including resource-heavy models like Stable Diffusion, often require substantial retraining and fine-tuning, leading to high costs for users. To address these challenges, we introduce Efficient Photo Editor (EPEdit), an application that integrates a robust backend framework with a user-friendly front-end interface. EPEdit supports a wide range of creative image editing tasks, including image generation, object replacement, object removal, background modification, changes in object pose or perspective, region-specific editing, and thematic collection design, all guided by masks and prompts. Users can interact with the system through simple text commands or by marking areas for precise adjustments, making it accessible even to those without technical expertise. At its core, EPEdit leverages zero-shot image editing algorithms based on the Stable Diffusion model, removing the need for additional fine-tuning. This approach enables efficient image manipulation and thematic collection creation. User evaluations for tasks of image editing, thematic design, and overall system performance demonstrate that EPEdit outperforms existing solutions, offering a user-friendly, cost-effective solution for comprehensive image editing.

19.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2606.15417

From Frames to Temporal Graphs: In-Context Egocentric Action Recognition with Vision-Language Models

Authors:

Bessie Dominguez-Dager, Francisco Gomez-Donoso, Miguel Cazorla, Marc Pollefeys, Daniel Barath, Zuria Bauer

Action reasoning in egocentric video requires capturing fine-grained transitions of hand-object interactions, a task where general-purpose Vision-Language Models (VLMs) often struggle when operating directly on raw pixels. We propose to decouple visual perception from symbolic reasoning by converting videos into Temporal Action Graphs. In a multi-stage prompting pipeline, we first generate dense natural language narratives over short temporal windows as a semantic bottleneck, then formalize them into structured, open-vocabulary graph representations. On the EGTEA and Epic-Kitchens-100 datasets, the symbolic representation unlocks efficient in-context learning: few-shot graph demonstrations yield substantial accuracy gains over zero-shot frame and graph-based inference alike. Even in the zero-shot setting, graph-based reasoning remains competitive with pixel-based inference despite potential pretraining contamination favoring the latter. Across 11 open-weight VLMs from 6 model families ranging from 2B to 235B parameters, our findings indicate that current VLMs are more effective as symbolic reasoners than as direct visual observers. By projecting video into the language domain, we provide a scalable, fine-tuning-free alternative to end-to-end approaches that better leverages these models' latent reasoning strengths. The code will be made public.

20.

arXiv (quant-ph) 2026-06-19 DOI: arXiv:2512.04801

Hybrid VQE-CVQE algorithm using diabatic state preparation

Authors:

John P. T. Stenger, C. Stephen Hellberg, Daniel Gunlycke

We propose a hybrid variational quantum algorithm that has variational parameters used by both the quantum circuit and the subsequent classical optimization. Similar to the Variational Quantum Eigensolver (VQE), this algorithm applies a parameterized unitary operator to the qubit register. We generate this operator using diabatic state preparation. The quantum measurement results then inform the classical optimization procedure used by the Cascaded Variational Quantum Eigensolver (CVQE). We demonstrate the algorithm on a system of interacting electrons and show how it can be used on long-term error-corrected as well as short-term intermediate-scale quantum computers. Our simulations performed on IBM Brisbane produced energies well within chemical accuracy.

21.

arXiv (CS.CV) 2026-06-17 DOI: arXiv:2606.17406

Graph Neural Networks for Semi-Supervised Image Classification with Multi-Feature Aggregation

Authors:

Marina Chagas Bulach Gapski, Vinicius Atsushi Sato Kawai, Gustavo Rosseto Leticio, Lucas Pascotti Valem, Daniel Carlos Guimarães Pedronette, Mohand Said Allili

Feature extraction involves the identification and extraction of salient characteristics or patterns, including edges, textures, shapes, and color attributes. Contemporary feature extractors predominantly leverage deep learning architectures, such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). The availability of diverse feature extractors in the literature provides a wide range of feature representations. Features extracted from an image depend on the specific application, the chosen extractor, and its configuration. Therefore, integrating complementary information by combining distinct extractors offers a promising way to enhance performance. Graph Neural Networks (GNNs), particularly Graph Convolutional Networks (GCNs), have emerged as powerful and widely adopted approaches for semi-supervised image classification, as they effectively leverage both labeled and unlabeled data while exploiting the underlying graph structures that capture relationships among samples. This study proposes a novel approach for GNNs in scenarios where labeled data is scarce, by integrating diverse sets of feature and graph representations derived from various extractors in classification scenarios. Experimental investigations were conducted, encompassing combinations of distinct feature and graph extractors, as well as rank aggregation strategies. The primary contributions of this work are underscored by the experimental findings, which demonstrate that the strategic combination of feature and graph representations, coupled with the application of manifold learning for graph processing, leads to significant improvements in classification accuracy across the majority of experimental conditions. Furthermore, the utilization of rank aggregation techniques to integrate features from different extractors was shown to enhance classification accuracy.

Read & Discuss → View Source →

22.

arXiv (CS.CL) 2026-06-16 DOI: arXiv:2410.00812

Generative causal testing to bridge data-driven models and scientific theories in language neuroscience

Authors:

Richard Antonello ↗Chandan Singh ↗Shailee Jain ↗Aliyah Hsu ↗Sihang Guo ↗Jianfeng Gao ↗Bin Yu ↗Alexander Huth ↗

Representations from large language models are highly effective at predicting BOLD fMRI responses to language stimuli. However, these representations are largely opaque: it is unclear what features of the language stimulus drive the response in each brain area. We present generative causal testing (GCT), a framework for generating concise explanations of language selectivity in the brain from predictive models and then testing those explanations in follow-up experiments using LLM-generated stimuli. This approach is successful at explaining selectivity both in individual voxels and cortical regions of interest (ROIs), including newly identified microROIs in prefrontal cortex. We show that explanatory accuracy is closely related to the predictive power and stability of the underlying predictive models. Finally, we show that GCT can dissect fine-grained differences between brain areas with similar functional selectivity. These results demonstrate that LLMs can be used to bridge the widening gap between data-driven models and formal scientific theories.

Read & Discuss → View Source →

23.

arXiv (quant-ph) 2026-06-11 DOI: arXiv:2601.09115

A saturation-absorption rubidium magnetometer with multilevel optical Bloch-equation modeling for intermediate-to-high fields

Authors:

Mayand Dangi ↗Prateek Rajan Gupta ↗Joseph Kasti ↗Nivedan Vishwanath ↗Michael Zepp ↗David Smith ↗Benedikt Geiger ↗Jennifer T. Choy ↗

We present SASHMAG (Saturated Absorption Spectroscopy High-field MAGnetometer), an atomic sensor designed for precision magnetic-field measurements in the intermediate-to-high field regime ($>0.2\,T$) using Rubidium-87 ($^{87}Rb$). The sensor operates in the hyperfine Paschen-Back regime, where the hyperfine and Zeeman interactions decouple, and utilizes a counter-propagating pump-probe configuration in Faraday geometry to resolve isolated, Doppler-free Zeeman transitions. To interpret the resulting spectra in this strongly field-dependent regime, we developed a comprehensive multilevel optical Bloch-equation model solved explicitly in the uncoupled $\ket{m_I, m_J}$ basis, capturing state mixing and nonlinear saturation dynamics. This model reproduces measured spectra at sub-Doppler resolution and is consistent with analytical expectations for power broadening and thermal Doppler scaling. Magnetic field estimation is performed using a physics-constrained optimization routine that infers the magnetic field by minimizing the residual between experimentally extracted line centers and calculated transition frequencies from the field-dependent Hamiltonian. We demonstrate magnetic field retrieval from $0.2\,T$ to $0.4\,T$ with a precision of $\pm 0.0017\,T$. Furthermore, the validated simulation establishes a foundation for generating synthetic training datasets, paving the way for autonomous, Machine Learning-enhanced magnetometry in applications ranging from MRI to fusion reactors.

Read & Discuss → View Source →

24.

arXiv (CS.LG) 2026-06-19 DOI: arXiv:2606.20504

Entropy Estimation in Multi-Qutrit Systems via Variational and Classical Neural Networks

Authors:

Sai Sakunthala Guddanti ↗Anil Prabhakar ↗Ria Rushin Joseph ↗

We present a systematic study of von Neumann entropy estimation in multi-qutrit quantum systems using two complementary approaches: variational quantum algorithms (VQAs) and classical convolutional neural networks (CNNs), evaluated using an ideal (noise-free) quantum simulator. For systems up to three qutrits, we construct and evaluate 11 hardware-efficient SU(3)-inspired ansatzes. A parameter sweep shows that estimation accuracy is primarily determined by the number of trainable parameters, provided sufficient entanglement is present. Based on this study, we fix the parameter count to approximately 120 for subsequent experiments, observing that increasing entangling-gate counts beyond a threshold yields only marginal improvements. For larger systems (two to five qutrits), we use a CNN trained on measurement outcomes from tensor-product mutually unbiased bases. The model achieves accurate and stable predictions and exhibits a systematic improvement in performance with system size, with the highest errors for two-qutrit systems and the lowest for five-qutrit systems. Notably, using only 12.5% of the measurements required for full state tomography is sufficient to reach 90th-percentile absolute errors of approximately 0.13-0.16 nats for both four- and five-qutrit systems. The CNN model is also robust to shot noise and generalizes well to out-of-distribution states. Overall, within the simulated settings studied here, our results indicate a transition in practical methods: VQAs are effective for small systems, while CNN-based estimators offer improved scalability and robustness for larger qutrit systems.
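For readers unfamiliar with the quantity being estimated, the von Neumann entropy of a state with density-matrix eigenvalues $p_i$ is $S = -\sum_i p_i \ln p_i$, measured in nats when the natural logarithm is used. The sketch below is illustrative only and is not the paper's estimator: it assumes the eigenvalues are already known, and the helper names `von_neumann_entropy` and `max_entropy_nats` are hypothetical. It also computes the upper bound $n \ln 3$ for $n$ qutrits, which gives a sense of scale for errors quoted in nats.

```python
import math


def von_neumann_entropy(eigenvalues, tol=1e-12):
    """Entropy in nats from the eigenvalues of a density matrix.

    Assumption: the caller has already diagonalized the state.
    Eigenvalues at or below `tol` are skipped, using 0 * ln(0) = 0.
    """
    if abs(sum(eigenvalues) - 1.0) > 1e-9:
        raise ValueError("eigenvalues of a density matrix must sum to 1")
    return -sum(p * math.log(p) for p in eigenvalues if p > tol)


def max_entropy_nats(num_qutrits):
    """Upper bound on the entropy of an n-qutrit state: ln(3**n) = n * ln(3)."""
    return num_qutrits * math.log(3)


# A maximally mixed single qutrit attains the bound ln(3).
mixed_entropy = von_neumann_entropy([1 / 3, 1 / 3, 1 / 3])

# A pure state has a single nonzero eigenvalue and zero entropy.
pure_entropy = von_neumann_entropy([1.0, 0.0, 0.0])

print(mixed_entropy, pure_entropy, max_entropy_nats(5))
```

The bound for five qutrits is $5 \ln 3 \approx 5.49$ nats; the states evaluated in the paper need not come close to this maximum.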

Read & Discuss → View Source →

25.

arXiv (CS.LG) 2026-06-16 DOI: arXiv:2604.26963

MARS: Efficient, Adaptive Co-Scheduling for Heterogeneous Agentic Systems

Authors:

Yifei Wang ↗Hancheng Ye ↗Yechen Xu ↗Cong Guo ↗Chiyue Wei ↗Qinsi Wang ↗Dongting Li ↗Tingjun Chen ↗Hai "Helen" Li ↗Danyang Zhuo ↗Yiran Chen ↗

Large language models (LLMs) are increasingly deployed as the execution core of autonomous agents rather than as standalone text generators. Agentic workloads induce a temporal shift from single-turn inference to multi-turn LLM-tool loops, and a spatial shift from chat-scale, GPU-only execution to repository-scale, GPU-CPU co-located execution. Consequently, coordinating heterogeneous resource demands of agentic execution has emerged as a critical system challenge. We design and implement MARS, an efficient and adaptive co-scheduling system that globally coordinates heterogeneous agentic workloads under coupled GPU-CPU resource pressure. By establishing holistic visibility across GPU inference and CPU tool execution via a unified information stream, an external control plane in MARS decouples admission from execution to prevent heterogeneous resource oversubscription. An internal agent-centric scheduler further minimizes the end-to-end critical path by prioritizing latency-sensitive continuations and adaptively retaining KV cache state only when warm resumption yields a latency benefit. Our evaluations show that MARS reduces end-to-end latency by up to 5.94x while maintaining nearly maximal system throughput. We further integrate MARS as the serving backend for the OpenHands coding agent framework, demonstrating its real-world effectiveness by accelerating end-to-end task completion time by up to 1.87x. Our source code is publicly available at https://github.com/Afterglow231/MARS_preview .

Read & Discuss → View Source →
