Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-16

CropTrack: A Tracking with Re-Identification Framework for Precision Agriculture

Multiple-object tracking (MOT) in agricultural environments presents major challenges due to repetitive patterns, similar object appearances, sudden illumination changes, and frequent occlusions. Contemporary trackers in this domain rely on the motion of objects rather than appearance for association. Nevertheless, they struggle to maintain object identities when targets undergo frequent and strong occlusions. The high similarity of object appearances makes integrating appearance-based association nontrivial for agricultural scenarios. To solve this problem we propose CropTrack, a novel MOT framework based on the combination of appearance and motion information. CropTrack integrates a reranking-enhanced appearance association, a one-to-many association with appearance-based conflict resolution strategy, and an exponential moving average prototype feature bank to improve appearance-based association. Evaluated on publicly available agricultural MOT datasets, CropTrack demonstrates consistent identity preservation, outperforming traditional motion-based tracking methods. Compared to the state of the art, CropTrack achieves significant gains in association accuracy and identification precision scores with a lower number of identity switches.

02.
arXiv (CS.CL) 2026-06-19

Actionable Activation Directions for Detecting and Mitigating Emergent Misalignment Across Language Model Families

Fine-tuning language models on insecure code induces emergent misalignment with poorly understood internal structure. We investigate whether this misalignment corresponds to a causally actionable activation-space direction shared across architectures. Across four instruction-tuned model families (Qwen2.5-1.5B, Gemma-2-2B, Llama-3.2-1B, Ministral-3-3B) finetuned identically, a difference-in-means direction achieves 99.6% separation of aligned and misaligned activations at each model's final layer. Causal steering by subtracting this direction reduces code spillover by 21-51 points, while a secure-code control confirms content specificity. Cross-architecture transfer via ridge regression maps yields large behavioral suppression (up to 46 points) but fails specificity controls as random and orthogonal directions perform comparably. We identify a two-tier specificity structure: within-model directions are causally specific and actionable; cross-model directions are causally real but non-specific. An asymmetric transfer topology emerges, with Gemma and Qwen acting as geometric donors and Llama as a receiver. These findings define the limits of linear cross-architecture correction and recommend within-model probing for auditing.

03.
arXiv (CS.CV) 2026-06-11

RSTR: Reducing SpatioTemporal Redundancy in Diffusion Transformers

Diffusion Transformers (DiTs) have achieved remarkable success in image generation, yet their deployment is hindered by high computational costs. We identify two sources of redundancy. First, temporal redundancy: Classifier-Free Guidance (CFG) applies costly dual forward passes at every timestep, yet guidance matters only at specific steps, and variable scales at critical steps can compensate for skipping others. Second, spatial redundancy: under variable guidance, different transformer blocks exhibit heterogeneous sensitivity, yet uniform calibration across all blocks wastes computation while failing to address their varying requirements. We present RSTR, the first framework to jointly reduce spatiotemporal redundancy in diffusion transformers. Stage-1 addresses temporal redundancy through evolutionary search, discovering sparse guidance schedules with variable scales. Stage-2 addresses spatial redundancy through adaptive rank allocation, assigning calibration capacities to transformer regions based on their sensitivity. Experiments on DiT-XL/2, PixArt-$\alpha$, FLUX, and state-of-the-art Qwen-Image demonstrate 50%-70% compute savings while maintaining or improving quality. On DiT-XL/2, RSTR achieves 57% savings with 15% FID improvement; on Qwen-Image, 3.43$\times$ speedup with preserved quality.

04.
arXiv (CS.LG) 2026-06-16

A Decision-Theoretic View of Test-Time Training: When, How Far, and Which Directions to Adapt

arXiv:2606.15569v1 Announce Type: new Abstract: Test-time training (TTT) adapts a pretrained model to each prompt via parameter updates, improving accuracy under pretraining-to-test distribution shifts. Yet, its performance often suffers from instability and sensitivity to hyperparameters such as update steps and subspace. We explain this behavior through a decision-theoretic lens, treating TTT as implicit Bayesian inference in the kernel regime. Under a Gaussian process benchmark, we show that TTT reduces prediction error when updates are spectrally matched to the prompt's signal-to-noise ratio and aligned with query-relevant eigen-directions. This perspective underpins the following results: (1) we show when fixed update steps and subspaces fail under distribution shifts, motivating adaptive strategies; (2) we prove that selecting update steps via prompt evidence admits a PAC-Bayes guarantee against overfitting; and (3) we characterize the Bayes-optimal update subspace under a linear-Gaussian correction model, yielding a scoring rule for selecting Transformer blocks and heads. Our theory helps explain the empirical instability of TTT, taking a step toward principled guidance for when, how far, and which directions to adapt.

05.
PLOS Medicine 2026-06-18

Association between initial benzodiazepine prescribing patterns and time to benzodiazepine discontinuation: A population-based retrospective cohort study

by Nikki Bozinoff, Tanya S. Hauck, Robert A. Kleinman, Matthew E. Sloan, Beth A. Sproule, Simone N. Vigod, Jennifer Wyman, Priscila Pequeno, Tara Gomes Background Long-term benzodiazepine use has been associated with increased risk of morbidity and mortality. Preventing long-term use through safer prescribing practices has received little attention to date. We sought to better understand associations between initial prescription characteristics and duration of benzodiazepine use. Methods and findings This was a retrospective population-based cohort study of 1,820,808 adults in Ontario with incident benzodiazepine prescriptions between January 1, 2013 and December 31, 2020, with follow-up to December 31, 2021. The primary exposure was duration of the index prescription (≤7 days—referent group, 8–14 days, 15–30 days, or >30 days). Secondary exposures were: (a) duration of action of index benzodiazepine(s) prescription (short-acting, long-acting or both); (b) number of benzodiazepine dispensed on index (1 or 2+); and (c) mean daily dose of the index prescription in Diazepam Milligram Equivalents (DMEs). The primary outcome was time to benzodiazepine discontinuation in days. Multivariable models were adjusted for age, sex, anxiety, insomnia, and substance use disorders as well as other important comorbidities and socio-demographic characteristics. The median age at index was 53 years (Interquartile Range (IQR) 38–67), and 62.6% were women. The median time to discontinuation in women was 16 days (IQR: 6–29) while the median time to discontinuation in men was 19 days (IQR: 6–29). Lorazepam was the most commonly prescribed benzodiazepine on index (63.9%), followed by clonazepam (17.3%) and diazepam (5.8%). In multivariable Cox Proportional Hazards Models, longer index prescriptions were associated with a lower likelihood of benzodiazepine discontinuation (adjusted Hazard Ratio (aHR) 0.54 (95% Confidence Interval (CI) [0.54,0.54]) for 8–14 days; aHR 0.26 (95% CI [0.25,0.26] for 15–30 days and aHR 0.14 (95% CI [0.14,0.14]) for >30 days, compared to ≤7 days, respectively). Being prescribed two or more benzodiazepines versus 1 was also associated with a reduced likelihood of discontinuation (aHR 0.59 (95% CI [0.57,0.61])), as was being prescribed long-acting benzodiazepines (aHR 0.80 (95% CI [0.80,0.80])) or a combination of short and long acting benzodiazepine (aHR 0.84 (95% CI [0.80,0.88])) versus short-acting benzodiazepines alone. Mean daily doses of >5 to ≤10 DME and >10 to ≤20 DME were associated with an increased likelihood of discontinuation (aHR 1.03 (95% CI [1.03,1.03]); aHR: 1.03 (95% CI [1.03,1.04])), whereas doses >20 DME were associated with a reduced likelihood of discontinuation (aHR 0.98 (95% CI [0.97,0.98])) compared with ≤5 DME. Findings may be subject to bias from unmeasured confounding. Conclusion This large population-based cohort study found that prescribing shorter courses of benzodiazepines, use of a single benzodiazepine, use of a short-acting agent, were associated with reduced likelihood of long-term benzodiazepine use. Findings suggest that simple changes to prescribing practices could reduce prolonged benzodiazepine use and the morbidity and mortality associated with long-term use of these medications.

06.
arXiv (CS.CV) 2026-06-18

Semantic Robustness Certification for Vision-Language Models

Vision-language models (VLMs) are now widely used in downstream tasks. However, real-world applications often expose VLMs to distribution shifts induced by semantic variation (e.g., shape, size, and style). Robustness certification determines if a model's prediction changes when transformations are applied to its input. While most certification frameworks study geometric or pixel-level transformations over inputs, this work proposes a novel framework that enables certifying VLM robustness under semantic-level transformations. Leveraging the open-vocabulary capability of VLMs, we use text prompts as semantic proxies to construct transformations parameterized by an extent that controls the degree of semantic variation. By characterizing the VLM decision boundary in closed form, our framework quantitatively certifies extent intervals for which the predicted class remains unchanged under the semantic transformation. Our framework is the first to certify VLM robustness under semantic-level variations without requiring additional data for each variation, making it practical to apply. Experiments on both synthetic and real-world data show that our framework enables certifying robustness under diverse semantic variations across scenarios.

08.
PLOS Computational Biology 2026-06-23

CARGO: A cytometry analysis framework via Regularized graph optimal-transport

by Abida Sanjana Shemonti, Grzegorz B. Gmyrek, Katrien L. A. Quintelier, Sofie Van Gassen, Yvan Saeys, Marcella Willemsen, Joachim G. J. V. Aerts, Eva V. E. Madsen, J. Paul Robinson, Alex Pothen, Bartek Rajwa Conventional data visualization techniques in single-cell analysis (such as two-dimensional dot plots, SPADE, PCA, t-SNE, or UMAP) often fall short in enabling an intuitive understanding of high-parameter flow cytometry data. These methods tend to oversimplify complex biological relationships, lack biologically meaningful interpretations, and offer no principled framework for downstream quantitative analysis. To address these limitations, we present a graph-based (network-based) visualization framework grounded in optimal transport theory. In this framework, cell populations are defined by their marker-expression profiles, and inter-population similarity is quantified using an efficiently computable optimal transport formulation known as the Sinkhorn distance. Our approach produces biologically consistent two-dimensional graph layouts using a phenotype-aware Hamming distance. Structural differences between sample graphs are characterized through a customized graph-edit distance that captures changes in population size, marker expression, and relationships between populations. We demonstrate our methods on two flow cytometry datasets: one from a clinical trial of dendritic cell-based immunotherapy in malignant peritoneal mesothelioma, involving 14 patients sampled at three time points with 14-color panels, and another from FlowCAP-II, which involved 43 acute myeloid leukemia patient samples analyzed with 7-color panels. Our framework produces robust, quantitative visual summaries of cell populations and supports statistical analysis based on graph edit distances, thereby offering new insights into disease progression and treatment response. Ultimately, our method bridges the gap between flow cytometry data visualization and biological interpretation.

09.
arXiv (quant-ph) 2026-06-16

Scalable Graph State Generation with O(1) Local Feedforward in Quantum Networks

arXiv:2606.16375v1 Announce Type: new Abstract: The development of quantum networks faces a key challenge: the contradiction between probabilistic long-range entanglement generation and finite coherence time. Existing routing protocols typically focus on global state computation or path optimization. As the network scales up, classical delays accumulate and exacerbate decoherence, leading to a decrease in entanglement fidelity. To reduce routing decision delays to levels far below the coherence time of qubits, we propose a protocol based on local measurement and classical feedforward. This protocol reduces the local decision complexity to amortized O(1) level, ensuring that the decision delay is always much smaller than the coherence time of qubits. We map this protocol onto a dual-species trapped-ion platform and perform hybrid simulations. The results show that the proposed protocol performs well in terms of both resource efficiency and time feasibility. Noise analysis indicates that readout fidelity is the main bottleneck of this protocol, but noise suppression can be achieved by employing an erasure transformation in the dual-species architecture, combined with spatial multiplexing and branch independence, thereby ensuring the generation of high-fidelity star subgraphs. This protocol provides a clear path to achieving high-fidelity star subgraphs. These subgraphs can serve as general modules, merging to construct arbitrary subgraphs, providing a feasible solution for future fault-tolerant distributed quantum computing.

10.
PLOS Medicine 2026-05-06

Pathways of emergency care for severely ill children in Nigerian and Ugandan hospitals: A process mapping study

作者:

by Rami Subhi, Abiodun Sogbesan, Dan Muramuzi, Mikael Burhin, Ayobami A. Bakare, Adegoke G. Falade, Freddy E. Kitutu, Freddie Ssengooba, Carina King, Sumit Kane, Belinda Dawson-McClaren, Hamish R. Graham, the MOXY-Implementation Research Collaboration Background Child mortality remains high in countries with weak emergency care systems. Facility organisation for paediatric emergency care is heterogeneous and under-described. We examined how hospitals in Uganda and Nigeria are organised to deliver emergency care for neonates and children. Methods and findings We conducted a qualitative, multi-method study in 26 purposively selected secondary and tertiary facilities in Uganda and Nigeria from October 2023 to December 2024. Embedded researchers documented patient pathways, resources for care, and care processes for severely ill children (

11.
medRxiv (Medicine) 2026-06-22

Virtual Responsive Neurostimulation Implantation: From Intracranial Connectivity to Optimized Lead Placement

Responsive neurostimulation (RNS) is an implanted device that delivers direct brain stimulation for drug-resistant focal epilepsy. Individual responses are highly variable, and no validated framework exists to predict outcome or guide lead placement before implantation. We hypothesized that this variability is partly explained by lead placement in relation to patterns of functional connectivity in brain networks. Fourty-nine patients with drug-resistant focal epilepsy who underwent pre-implantation intracranial EEG (iEEG) and RNS implantation across three independent epilepsy centers were retrospectively studied. We developed a composite functional connectivity score, based on simple Spearman correlation, combining the standard deviation and kurtosis of interictal iEEG connectivity distributions to predict the response outcome in a training cohort (HUP, n=18) and validated in two independent cohorts (NYU, n=17; UCSF, n=14). We accounted for a spatial mismatch between iEEG and RNS electrodes with a distance-based correction. The score was extended to generate patient-specific 3D maps of predicted RNS efficacy across 200 simulated, or virtual RNS, lead configurations. Accuracy of the score in predicting clinical outcome was 72% at the group level, 61% at the individual patient level, and, after distance-based optimization, 100% in patients with RNS electrodes placed close to location of iEEG electrodes. Applied to the validation cohort, the same score reached 68% accuracy (71% balanced accuracy, 55% sensitivity, 88% specificity). The spatial combination of the scores at different SEEG contacts localization gives a spatial score for each patient. Responders showed significantly higher spatial scores than non-responders, supporting that actual RNS lead placement in responders was located in map-identified favorable regions. Interictal iEEG functional connectivity predicts individual RNS response across independent epilepsy centers, and patient-specific 3D maps derived from this biomarker could prospectively guide lead implantation toward favorable network regions, opening a promising avenue toward network-informed RNS surgical planning.

12.
arXiv (CS.CV) 2026-06-11

Right Regions, Wrong Labels: Semantic Label Flips in Segmentation under Correlation Shift

The robustness of machine learning models can be compromised by spurious correlations between non-causal features in the input data and target labels. A common way to test for such correlations is to train on data where the label is strongly tied to some non-causal cue, then evaluate on examples where that tie no longer holds. This idea is well established for classification tasks, but for semantic segmentation the specific failure modes are not well understood. We show that a model may achieve reasonable overlap while assigning the wrong semantic label, swapping one plausible foreground class for another, even when object boundaries are largely correct. We focus on this semantic label-flip behaviour and quantify it with a simple diagnostic (Flip) that counts how often ground truth foreground pixels are assigned the wrong foreground identity while remaining predicted as foreground. In a setting where category and scene are correlated during training, increasing the correlation consistently widens the gap between common and rare test conditions and increases these within-object label swaps on counterfactual groups. Overall, our results motivate assessing segmentation robustness under distribution shift beyond overlap by decomposing foreground errors into correct pixels, flipped-identity pixels, and missed-to-background pixels. We also propose an entropy-based, ground truth label-free `flip-risk' score, which is computed from foreground identity uncertainty, and show that it can flag flip-prone cases at inference time. Code is available at https://github.com/acharaakshit/label-flips.

13.
arXiv (CS.AI) 2026-06-16

Exploiting Search in Symbolic Numeric Planning with Patterns

arXiv:2606.16329v1 Announce Type: new Abstract: In this paper, we present a procedure for numeric planning based on Symbolic Pattern Planning (SPP). Given a numeric planning problem $\Pi$, a pattern $\prec$ is a sequence of actions used to define a formula encoding the subsequences of $\prec$ executable from a starting state $S$. Cardellini, Giunchiglia, and Maratea (2024a) follow the Planning as Satisfiability approach by defining, at each step $n \ge 0$, a formula $\Pi^\prec_n$ in which $(i)$ the pattern $\prec$ is computed only for $n=0$ in the initial state $I$ of $\Pi$, and then exploited at each step $n$, $(ii)$ the starting state $S$ is set to $I$, and $(iii)$ the set $G$ of goals is required to hold in the last state that can be reached by one of the subsequences of $\prec$ concatenated $n$ times. The procedure begins with $n=0$, terminates as soon as $\Pi^\prec_n$ is satisfiable, and otherwise proceeds by incrementing $n$. In this paper, possibly at each step, $(i)$ we symbolically search for an intermediate state $P$ reachable from $I$, closer to a goal state, $(ii)$ dynamically recompute the pattern $\prec_h$ – to be used in the next step – in $P$, $(iii)$ refine the pattern $\prec_g$ used to reach $P$, and $(iv)$ start the new search from the state $S$ which can be either the initial state $I$ or the last computed intermediate state $P$, exploiting the computed patterns $\prec_g$ and $\prec_h$ to define the pattern $\prec$ to be used in the search. In particular, at each step, we define a formula $\Pi^{\prec}_{S,P}$ encoding the existence of a state $P'$ closer than $P$ to a goal state, with $P'$ reachable from the starting state $S$ when using the pattern $\prec$. We present different techniques for producing such formulas, each corresponding to a different strategy for exploring the search space. We prove their correctness and completeness, the latter under certain conditions.

14.
arXiv (CS.CV) 2026-06-11

SceneMiner: Identity-Preserving Multi-Task Fine-Tuning for Unified BEV Scene Mining

Mining hard, safety-critical scenes from driving logs is bottlenecked by the absence of difficulty labels, and no single proxy, collision risk, trajectory ambiguity, or semantic rarity suffices to find such scenes on its own. We present SceneMiner, a unified, camera-only bird's-eye-view pipeline that emits complementary mining signals from a frozen vision-language backbone in a single forward pass, with no LiDAR or radar: a retrieval embedding for text-prompted scenario search, a multi-label scene-tag distribution, and a continuous physics-based risk score (a motion forecast is a byproduct, not a contribution). Building such a multi-head model exposes our central finding, a failure mode we term cross-task interference: adding or upgrading one head shifts a shared activation stream and degrades weight-frozen sibling heads, so freezing parameters alone is insufficient. Our contribution, identity-preserving multi-task fine-tuning, removes this interference by zero-initializing every new sub-module and freezing every parameter that feeds the shared stream. The mining heads are thereby preserved bit-identically while training only ~102k parameters. The tagging head reaches mAP 0.4614 (micro-F1 0.5557) on 20 scene tags by pooling each scene into 32 visual tokens, and the embedding head supports text-prompted retrieval, validated qualitatively. Code is available at: https://anonymous.4open.science/r/sceneminer_anonymous-64E5

15.
arXiv (CS.LG) 2026-06-18

OpenAnt: LLM-Powered Vulnerability Discovery Through Code Decomposition, Adversarial Verification, and Dynamic Testing

arXiv:2606.19149v1 Announce Type: cross Abstract: Automated vulnerability discovery in large codebases remains challenging: traditional static analysis produces high false-positive rates, while dynamic approaches such as fuzzing require substantial infrastructure and often target narrow classes of bugs. Recent advances in large language models (LLMs) enable semantic reasoning about program behavior, but applying LLMs to repository-scale security analysis introduces challenges related to context management, cost, and verification. We present OpenAnt, an open-source vulnerability discovery system that integrates static program analysis with LLM-based reasoning in a multi-stage pipeline. OpenAnt introduces three key techniques. First, codebases are decomposed into self-contained analysis units filtered by reachability from external entry points, reducing the analysis surface by up to 97% while preserving attack-relevant code. Second, candidate vulnerabilities undergo adversarial verification through constrained attacker simulation, where the model evaluates exploitability under realistic attacker capabilities. Third, findings are validated through dynamic verification, in which exploit environments are generated automatically, executed in sandboxed containers, and discarded after use. Evaluation on widely used open-source projects including OpenSSL, WordPress, and Flowise shows that this architecture can identify previously unknown vulnerabilities while maintaining manageable analysis cost and substantially reducing false positives. Our results suggest that closed-loop vulnerability discovery pipelines, combining semantic reasoning with exploit validation, provide a practical path toward scalable automated security analysis. OpenAnt is released as open source under the Apache 2.0 license at https://github.com/knostic/OpenAnt.

16.
arXiv (math.PR) 2026-06-18

Geometric obstructions to Lipschitz transport between weighted Hessian $\mathrm{CD}(\kappa,\infty)$ manifolds

arXiv:2606.11085v2 Announce Type: replace Abstract: We construct a weighted Riemannian manifold $(\mathbb R^2,g,\mu)$ satisfying $\mathrm{CD}(1/2,\infty)$, the curvature-dimension condition, with the following property: if $\gamma$ denotes a centered Gaussian measure on $\mathbb R^2$, then there is no Lipschitz map $T:(\mathbb R^2,\|\cdot\|) \to (\mathbb R^2,g)$ satisfying $T_\#\gamma=\mu$. Building on this, we prove a Weyl-type asymptotic law for the eigenvalues of the weighted Laplacian $-\Delta_{g,\mu}$ and show that they are asymptotically negligible when compared to the eigenvalues of $-\Delta_{\gamma}$. These results give strong counterexamples to two questions of E. Milman and complement the recent counterexample of Aryan.

17.
arXiv (CS.CL) 2026-06-11

Neuron-based Personality Trait Induction in Large Language Models

Large language models (LLMs) have become increasingly proficient at simulating various personality traits, an important capability for supporting related applications (e.g., role-playing). To further improve this capacity, in this paper, we present a neuron-based approach for personality trait induction in LLMs, with three major technical contributions. First, we construct PersonalityBench, a large-scale dataset for identifying and evaluating personality traits in LLMs. This dataset is grounded in the Big Five personality traits from psychology and is designed to assess the generative capabilities of LLMs towards specific personality traits. Second, by leveraging PersonalityBench, we propose an efficient method for identifying personality-related neurons within LLMs by examining the opposite aspects of a given trait. Third, we develop a simple yet effective induction method that manipulates the values of these identified personality-related neurons. This method enables fine-grained control over the traits exhibited by LLMs without training and modifying model parameters. Extensive experiments validate the efficacy of our neuron identification and trait induction methods. Notably, our approach achieves comparable performance as fine-tuned models, offering a more efficient and flexible solution for personality trait induction in LLMs. We provide access to all the mentioned resources at https://github.com/RUCAIBox/NPTI.

18.
arXiv (CS.LG) 2026-06-16

FEnc$^2$: Unifying Data Packing for Efficient Private Inference via Convolution and Architecture-Aware Fragment Encoding

arXiv:2606.16359v1 Announce Type: cross Abstract: Fully Homomorphic Encryption (FHE) enables privacy-preserving machine learning but incurs extreme computational and memory overhead. These costs come not only from expensive low-level primitives, including Number Theoretic Transform (NTT), rotation, and key-switching, but also from inefficient ciphertext packing at the application level. Existing packing strategies typically preserve either neighboring data elements or feature grouping, but not both, leading to wasted ciphertext slots, excessive rotations, and inflated ciphertext counts. We propose FEnc2, a unified and principled fragment-based encoding framework for CKKS-based private convolutional neural network inference. FEnc2 optimizes slot utilization, rotation complexity, and ciphertext density through two components: 1)Conv-aware Encoding, which analytically selects an optimal fragment size to decouple spatial dependencies and jointly minimize inner-outer rotations across layers, and 2)Arch-aware Ct Compression, which restores ciphertext density after feature- or channel-reduction layers. Together, these transformations reshape encrypted workload structure and reduce homomorphic operations by one to two orders of magnitude. With full memory capacity utilized, i.e., at maximum batch size, FEnc2 achieves end-to-end latency speedups over the state-of-the-art Orion of up to 228.83x on GPU and 226.06x on CPU for LeNet on MNIST, and up to 4.55x on GPU and 9.43x on CPU for MobileNet on ImageNet. FEnc2 is hardware-agnostic yet architecturally transformative: by optimizing encrypted tensor layout before execution, it reduces ciphertext count and workload pressure on hardware, complementing primitive-level optimizations such as NTT and keyswitch accelerators. These results show that application-level data layout is a first-order architectural design dimension for encrypted inference and an important enabler for next-generation FHE systems.

19.
arXiv (CS.CV) 2026-06-12

Stereo Vision-Based Fall Prediction and Detection using Human Pose Estimation on the AMD Kria K26 SOM

Background and Objective: Falls among elderly people can cause serious injury and reduce quality of life. Timely prediction and detection are essential to prevent harm and support well-being. We propose a portable, low-power, battery-operated, vision-based fall prediction and detection system using HPE on an AMD Kria K26 System-on-Module (SOM). The objective is a non-intrusive, privacy-preserving system for real-time fall detection. Methods: The system uses an Intel RealSense D455 range-sensing camera connected to the K26 SOM by USB. It captures synchronized RGB and depth frames, 640 x 480 x 3 and 640 x 480 pixels, at 60 FPS. The SOM runs a three-stage pipeline with quantized YOLOX, Anchor-to-Joint (A2J), and fall-detection models. YOLOX identifies human bounding boxes from RGB frames, then discards the RGB frames to preserve privacy. A2J uses depth frames to estimate 15 joint keypoints per person. A CNN uses selected joint coordinates (x, y, z) to classify fall activity. YOLOX was trained on CrowdHuman; A2J on ITOP, MP-3DHP, UR Fall Detection, and a custom SDSU PSG dataset; and the CNN on UR Fall Detection and SDSU PSG. The design used a single-core DPU with a serial pipeline and a dual-core DPU running YOLOX and A2J with multiple threads. Results: Quantized accuracy was evaluated using IoU >= 50% for YOLOX, mAP with a 10-cm rule for A2J, and classification accuracy, (TP + TN)/(TP + TN + FP + FN), for the CNN. Accuracies were 74%, 84.13%, and 75.85%. Throughput improved from 2.5 FPS for the single-threaded pipeline to 4.5 FPS for the multi-threaded version. Conclusion: Results demonstrate the feasibility of privacy-preserving fall detection on an AMD Kria K26 edge device. On-device HPE and fall classification runs without cloud dependency, supporting elderly monitoring and assistive healthcare. Future work will improve model accuracy and speed.

20.
arXiv (CS.AI) 2026-06-12

The Query Channel: Information-Theoretic Limits of Masking-Based Explanations

arXiv:2604.16689v2 Announce Type: replace Abstract: Masking-based post-hoc explanation methods, such as KernelSHAP and LIME, estimate local feature importance by querying a black-box model under randomized perturbations. This paper formulates this procedure as communication over a query channel, where the latent explanation acts as a message and each masked evaluation is a channel use. Within this framework, the complexity of the explanation is captured by the entropy of the hypothesis class, while the query interface supplies information at a rate determined by an identification capacity per query. We derive a strong converse showing that, if the explanation rate exceeds this capacity, the probability of exact recovery necessarily converges to one in error for any sequence of explainers and decoders. We also prove an achievability result establishing that a sparse maximum-likelihood decoder attains reliable recovery when the rate lies below capacity. A Monte Carlo estimator of mutual information yields a non-asymptotic query benchmark that we use to compare optimal decoding with Lasso- and OLS-based procedures that mirror LIME and KernelSHAP. Experiments reveal a range of query budgets where information theory permits reliable explanations but standard convex surrogates still fail. Finally, we interpret super-pixel resolution and tokenization for neural language models as a source-coding choice that sets the entropy of the explanation and show how Gaussian noise and nonlinear curvature degrade the query channel, induce waterfall and error-floor behavior, and render high-resolution explanations unattainable.

21.
medRxiv (Medicine) 2026-06-22

Associations of Chemical Exposures with Psychological Distress and Depression Diagnosis among Waste Pickers in Brasilia, Brazil: A Cross-Sectional Study

Introduction: Waste pickers face chemical exposures. We evaluated whether chemical exposure is associated with psychological distress and depression. Methods: A 2017 cross-sectional survey included 1,141 waste pickers working in the Estrutural open dump in Brasilia, Brazil. Participants self-reported occupational exposure to 11 chemical categories, 17 psychological distress symptoms, and depression diagnoses. Associations of chemical exposure with mean psychological distress scores and depression prevalence were assessed, adjusted for age, sex, marital status, and income. Results: Mean psychological distress score was higher among those exposed to any chemical (mean of 8.1 vs 6.1; adjusted mean difference [aMD]: 1.8 [0.9, 2.7]) and higher among those exposed to each of 11 chemical categories, for example, smoke (aMD: 1.2 [0.6, 1.7]), batteries (aMD: 1.5 [1.0, 1.9], and oils (aMD: 1.3 [0.9, 1.8]). Depression was more prevalent among those exposed to oils (16.6% vs 10.6%; adjusted prevalence difference [aPD]: 6.3% [95% CI: 2.3, 10.2]), cleaning products (aPD: 5.4% [1.2, 9.5]), medications (aPD: 4.7% [0.6, 8.8]), and aerosols (aPD: 5.3% [1.3, 9.3]) but, not smoke, batteries, greases, insecticides, solvents, paints, chemical containers, or any chemical. Conclusion: These associations highlight the need to consider policy level protections for waste pickers to reduce chemical exposure and guard against psychological distress. Further research is necessary to explore which specific chemicals, within broad chemical categories, are associated with psychological distress and depression.

22.
arXiv (quant-ph) 2026-06-16

High-fidelity two-qubit gates in a 7-qubit register for quantum networks

arXiv:2606.14847v1 Announce Type: new Abstract: Quantum networks based on optically active solid-state spins may enable quantum technologies including long-range quantum communication and distributed quantum computing. Network nodes containing multiple high-fidelity qubits can facilitate large-scale fault-tolerant operation. However, the stringent error thresholds remain out of reach for multi-qubit registers. In this work, we demonstrate high-fidelity two-qubit gates in a 7-qubit register, based on nuclear spins coupled to a nitrogen-vacancy (NV) center in diamond. We analyze crosstalk in highly connected spin systems, develop an efficient optimization procedure, and characterize the gates using gate set tomography. The two-qubit gate fidelities (best: 99.61(5)%, average: 99.18(2)%) demonstrate a multi-qubit register at the threshold for distributed quantum computation. Finally, as an example application, we perform a variational quantum eigensolver (VQE) simulation of the ground-state energy of H2 and LiH molecules. These results demonstrate one of the key prerequisites for scalable quantum networks based on solid-state spins.

23.
bioRxiv (Bioinfo) 2026-06-18

Benchmarking attention-based methods for vision transformers' interpretability in retinal fundus imaging

Deep learning models based on Vision Transformers (ViTs) have shown strong performance in retinal fundus imaging, but their interpretability remains poorly understood. In particular, attention-based attribution methods are widely used to explain ViT predictions, despite limited evaluation of their faithfulness and biological relevance in medical imaging. Here, we systematically benchmark four attention-based interpretability methods for RETFound, a retinal ViT-based foundation model, that we previously fine-tuned to predict 17 retinal vascular phenotypes from UK Biobank fundus images1. We compare raw attention, attention rollout, gradient-weighted attention rollout, and Chefer's hybrid relevance-based method using both qualitative visualisation and quantitative evaluation frameworks. To assess attribution faithfulness, we perform perturbation-based deletion and insertion experiments, quantifying changes in model predictions as highly attended image regions are progressively removed or restored. To evaluate biological specificity, we run structure-aware analyses combining attribution maps with vessel segmentation and artery-vein labels through the Relative ratio of Attention Intensity (RAI) metric. Across models, attribution maps differed substantially depending on the selected interpretability method, highlighting the need for rigorous quantitative evaluation. Among the evaluated approaches, gradient-weighted attention rollout consistently achieved the strongest perturbation performance and produced attribution maps most closely aligned with the anatomical definition of the predicted retinal traits. Furthermore, vessel-type specific models systematically concentrate attention on the corresponding vascular structures despite being trained using only a single scalar value per image as supervision. These findings demonstrate that attention-based attribution methods capture biologically meaningful vascular representations, while also revealing method-dependent variability in attribution behaviour. This work provides a quantitative framework for evaluating interpretability methods in medical imaging with annotated segmentation and contributes toward more transparent and biologically grounded medical AI systems.

24.
arXiv (CS.AI) 2026-06-17

No-Free-Fairness: Fundamental Limits and Trade-offs in Learning Systems

作者:

arXiv:2606.17810v1 Announce Type: cross Abstract: In this paper, we establish a set of theoretical impossibility results, termed the No-Free-Fairness theorems, that identify three fundamental sources of disparity in learning systems. First, we show that when a task exhibits irreducible cost on a subgroup, any decision rule must trade off overall performance with disparity, yielding an inherent fairness–cost frontier. Second, we prove that even in ideal, noise-free settings where a perfectly fair and accurate solution exists, finite-sample learning alone induces nontrivial subgroup disparity, ruling out distribution-free fairness guarantees. More seriously, enforcing strict relative fairness creates a statistical bottleneck: achieving low cost may require exponentially many samples. Third, we show that limitations of the model class can independently induce disparity: if the model cannot represent accurate solutions for a subgroup, fairness remains unattainable regardless of data or training procedure. Overall, these results demonstrate that unfairness is not solely a consequence of biased data or suboptimal optimization, but arises from the intrinsic structure of decision problems, the constraints of finite data, and the expressivity of models. Our framework applies broadly beyond standard supervised learning, and suggests that achieving fairness requires explicit trade-offs and should be treated as a core design consideration.

25.
arXiv (CS.CV) 2026-06-16

Self-Questioning Vision-Language Models: Reinforcement Learning for Compositional Visual Reasoning

Vision-Language Models (VLMs) are AI systems that process both images and text, yet they often struggle with compositional visual reasoning questions that require chaining multiple steps together, such as identifying objects, counting them, and comparing the results. Existing approaches improve this reasoning by training models on human-written step-by-step explanations, but creating these annotations is expensive and difficult to scale. We propose a self-questioning framework that trains a VLM to break visual questions into smaller sub-questions and answer each one before producing a final response, using a reinforcement learning algorithm called Group Relative Policy Optimization (GRPO). The model is never shown examples of how to decompose questions, it discovers this behavior on its own, guided by a reward signal that scores whether the output contains sub-questions and whether the final answer is correct. We apply this framework to a 3-billion-parameter model, training on both synthetic scenes of geometric shapes (CLEVR) and real-world photographs (A-OKVQA). On A-OKVQA, both self-questioning and standard reinforcement learning substantially improve accuracy over the untrained model (52.2% and 51.6% vs. 46.8%). We introduce the first self-questioning VLM by rewarding not only the final answer like standard RL but additionally for generating intermediate sub-questions, enabling it to discover compositional decomposition strategies. These results suggest that teaching AI systems to ask themselves intermediate questions is a promising strategy for complex visual reasoning, particularly when the difficulty of a question warrants explicit step-by-step decomposition.