Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-17

HeteRo-Select: Informativeness as the Participation Driver in Heterogeneous Federated Learning

arXiv:2508.06692v2 Announce Type: replace Abstract: Federated learning systems typically allocate gradient compression by link speed. This is sensible when bandwidth and data informativeness align. However, under non-IID data, these signals often decorrelate or invert. A bandwidth-driven allocator then risks compressing the most informative gradients hardest. We propose HeteRo-Select, a framework that replaces bandwidth with a per-client informativeness score as the primary driver of compression. The score jointly governs three decisions per round: client selection, compression ratio, and server aggregation weight, with bandwidth retained only as a hard ceiling. Score-proportional selection provably reduces the effective heterogeneity of the chosen subset; score-proportional compression provably lowers aggregate top-$k$ error at fixed traffic. Under the exact FedCG simulation protocol, HeteRo-Select delivers a $1.78\times$ speedup and an $18.2\%$ reduction in traffic on CIFAR-10. The same configuration, unchanged, scales from a $7{,}850$-parameter logistic regression to an $11.27$M-parameter ResNet-18, hitting the accuracy target on three of four benchmarks. When bandwidth and informativeness are deliberately anti-correlated, the method still achieves the target accuracy with less traffic than the normal-bandwidth run.

02.
medRxiv (Medicine) 2026-06-16

Daily Healthy Eating Index (HEI-2020) scoring reveals diet quality patterns masked by aggregation

The Healthy Eating Index (HEI-2020) is conventionally computed by aggregating intake across days before scoring. Digital food logging enables an alternative: scoring each day and averaging daily scores. These methods are not equivalent. The HEI's density-based structure and component caps cause aggregation to inflate adequacy scores when intake is irregular. Using Food & You data, we show daily HEI correlates more strongly with microbiome diversity, and recommend co-reporting both metrics.

03.
arXiv (CS.AI) 2026-06-24

Sol Video Inference Engine: Agent-Native Full-Stack Acceleration Framework for Efficient Video Generation

arXiv:2606.23743v1 Announce Type: cross Abstract: Modern video diffusion models achieve higher generation quality through scaling, but this also increases inference cost. Although many acceleration methods have been proposed, a central challenge is that the most effective acceleration strategy is highly instance-specific: a recipe that works well for one combination of model, hardware, and inference configuration often does not transfer to another. Different models vary in architecture, numerical sensitivity, and attention concentration patterns. Inference settings differ in spatial and temporal resolution and video duration, while hardware platforms differ in memory hierarchy, supported numerical formats, and kernel throughput. These factors create a large tuning space, making manual performance engineering costly. We present Sol Video Inference Engine, an agentic, native, training-free acceleration framework for video diffusion models. It organizes five broadly applicable techniques, cache, sparse attention, token pruning, quantization, and kernel fusion, into an agentic acceleration stack for instance-specific optimization. For a concrete deployment target defined by a model, hardware platform, and serving configuration, parallel skill agents optimize the implementation of each technique, an agent integrator composes them into a global acceleration stack, and a human validator provides feedback on generation quality. We instantiate this workflow on three video models with different sizes and architectures: 64B Cosmos3-Super, 22B LTX-2.3, and 2B SANA-Video. With little human effort, the full stack achieves more than 2x end-to-end acceleration while maintaining near-lossless VBench quality, demonstrating the effectiveness of the agent framework for video diffusion acceleration.

04.
arXiv (CS.AI) 2026-06-12

An Explainable AI Assistant for Introductory Programming Education: Improving Feedback Reliability with Instructor-AI Collaboration

arXiv:2606.12425v1 Announce Type: cross Abstract: Active learning is widely recognized as an effective approach for improving learning outcomes in introductory programming courses. However, insufficient instructional support often limits students' access to timely, personalized feedback, which is crucial for mastering foundational programming concepts. Although recent advances in AI, particularly large language models, offer scalable opportunities for feedback, concerns about explainability and reliability remain. In this paper, we present an AI-driven classroom assistant that leverages an explainable AI model to analyze student code, map logical errors to instructor-identified misconceptions, and deliver instructor-authored feedback, thereby grounding reliability in instructor-defined pedagogical knowledge. To evaluate the effectiveness of our framework, we conducted an expert evaluation to examine its alignment with instructor-verified feedback and deployed the system in a classroom setting to assess students' perceptions of its usability. Results indicate that the assistant can provide accurate, instructor-verified feedback to students while fostering a positive experience.

05.
arXiv (CS.AI) 2026-06-11

FACTR 2: Learning External Force Sensing for Commodity Robot Arms Improves Policy Learning

arXiv:2606.12406v1 Announce Type: cross Abstract: Contact-rich manipulation requires force sensitivity, but many robot arms lack dedicated force sensors due to their high cost. We present Neural External Torque Estimation (NEXT), a data-driven method that estimates external joint torques without needing any dedicated force sensors. NEXT trains in 1 minute from only 10 minutes of free-motion data, yet achieves estimates comparable to dedicated joint-torque sensors. NEXT enables force-feedback teleoperation on low-cost arms and improves policy learning through Force-Informed Re-Sampling Training (FIRST), which up-samples pre-contact and contact segments during behavior cloning. Across five long-horizon tasks, FIRST outperforms prior force-aware policies by over 17% in task progress. Together, NEXT and FIRST bring force-aware teleoperation and policy learning to off-the-shelf robots without additional sensing hardware. Video results and code are available at https://jasonjzliu.com/factr2

06.
arXiv (CS.CL) 2026-06-12

Order Is Not Control

AI alignment, interpretability, steering, and neural perturbation studies identify order-inducing objects. We argue that order is not control. Control requires a receiver-gated response law: a denominator-indexed operator mapping material state, action/drive, bath, and receiver state to response displacement, sinks, effort, and basin projection. We identify it across biological, LLM, adapter, and stochastic-operator panels. The laws are local: an intervention can be admitted, saturated, sign-changing, leaky, or overdriven depending on medium, bath, receiver state, action port, and comparator. Control is assigned when finite effort moves a target or outcome-readout class under the same denominator while damage, null/evasive, invalid format, overdrive, and unnecessary effort stay bounded. Mouse ALM, C. elegans, and zebrafish panels provide physical response-operator evidence while excluding coordinate identity and controller conclusions. LLM panels show generated-output response laws: across four material conditions, response vectors are predictable at 72.8-73.7% component-sign accuracy, rising to 84.3-84.8% on nonzero components; held-out observers predict system-effect and target/oracle families at 93.6% and 91.7% accuracy. Constitution-conditioned adapters reshape susceptibility as prepared media, and stochastic-operator panels separate measured opportunity from deployable action policies. This gives a driven-dissipative response-system account at the mesoscopic control level: drives act through prepared media, baths, and receivers, producing admitted movement, impedance, sinks, or overdrive. The evidence supports local admitted control and measurable stochastic response operators, while leaving deployable pre-generation control, hidden/logit causal sufficiency, biological-to-LLM coordinate identity, and literal thermodynamic quantities outside scope.

07.
arXiv (CS.AI) 2026-06-19

FlowMaps: Modeling Long-Term Multimodal Object Dynamics with Flow Matching

arXiv:2606.20209v1 Announce Type: cross Abstract: Joint spatial and temporal understanding of 3D scenes is a crucial requirement for robots deployed in everyday household environments. Such agents must not only comprehend and navigate spatial layouts, but also reason about how these spaces evolve over time. In particular, humans interact with objects daily, causing them to change position throughout the environment and making it difficult for robots to reliably associate current observations with previously seen objects. However, these interactions are not random: human habits and routines induce spatio-temporally consistent patterns in object locations, which robotic agents can potentially learn and then exploit for downstream tasks such as navigation. To this end, we introduce FlowMaps, a latent flow matching model for estimating multimodal distributions over the future locations of dynamic objects in a continuous 3D space. By learning the implicit dependencies among objects and their temporal evolution, FlowMaps predicts likely changes in object locations conditioned on past human interactions, while supporting generalization across previously unseen environments that share similar object routines. To demonstrate the utility of this method, we deploy FlowMaps in a downstream dynamic Object Navigation task in both simulated and real-world environments. Across more than 600 episodes, FlowMaps outperforms state-of-the-art approaches, showing that modeling object dynamics through continuous, multimodal spatio-temporal distributions improves robotic search and navigation in changing household environments. Code and additional material is available at https://fra-tsuna.github.io/flowmaps/.

08.
medRxiv (Medicine) 2026-06-19

Rumination as a cognitive vulnerability factor in perinatal bereavement: evidence from the CARING study

Purpose. Perinatal loss is associated with a high risk of persistent psychological distress, including prolonged grief, depression, anxiety, and post-traumatic stress symptoms. Cognitive processes such as rumination may play a crucial role in maintaining and amplifying distress following loss, yet their specific contribution in perinatal bereavement remains underexplored. Methods. The CARING (Cognitive Analysis and Rumination INvestigation in perinatal Grief) study employed a cross-sectional design involving 298 parents who experienced perinatal loss within the previous five years. Participants completed an anonymous online survey including measures of depressive rumination (Ruminative Response Scale, RRS), angry rumination (Anger Rumination Scale, ARS), perinatal grief (Perinatal Grief Scale, PGS), general psychopathology (SCL-90), and post-traumatic stress symptoms (NSESSS). Non-parametric analyses were conducted to examine associations between rumination patterns and psychological outcomes. Results. Higher levels of rumination were significantly associated with greater perinatal grief, depressive and anxiety symptoms, and post-traumatic stress. Depressive rumination showed consistently stronger associations with all outcomes compared to angry rumination. Participants presenting both depressive and angry rumination exhibited the highest levels of grief intensity, psychological distress, and PTSD symptoms, suggesting a graded relationship between rumination patterns and severity of distress. Rumination levels were not significantly associated with gestational age at loss or with having received psychological support. Conclusions. Rumination, particularly in its depressive form, appears to function as a transdiagnostic cognitive vulnerability factor in perinatal bereavement. These findings highlight rumination as a potential target for early screening and tailored psychological interventions aimed at reducing long-term distress following perinatal loss.

09.
arXiv (CS.AI) 2026-06-18

UBP2: Uncertainty-Balanced Preference Planning for Efficient Preference-based Reinforcement Learning

arXiv:2606.19328v1 Announce Type: cross Abstract: Preference-based RL provides an approach to learning reward models from pairwise comparisons of behaviors, bypassing the need for explicit reward design. However, existing methods typically rely on passive data collection and suffer from poor sample efficiency, especially during the early stages of learning. We introduce a model-based approach that actively directs exploration by jointly reasoning over uncertainties in the reward, dynamics, and value functions. Our method, Uncertainty-Balanced Preference Planning (UBP2), uses ensembles of reward, dynamics, and value function models to evaluate candidate trajectories according to a unified score that combines expected reward, terminal value, and epistemic uncertainty. Planning under this objective yields an explicit tradeoff between exploitation and information acquisition without requiring ad hoc exploration heuristics. Under standard regularity assumptions, we establish sublinear regret guarantees for both finite-horizon and infinite-horizon settings. Empirically, experiments on the Meta-World benchmark show UBP2 achieves substantially higher sample efficiency than model-free preference-based methods and non-optimistic model-based baselines.

10.
arXiv (CS.AI) 2026-06-25

Discovering New Theorems via LLMs with In-Context Proof Learning in Lean

arXiv:2509.14274v3 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have demonstrated significant promise in formal theorem proving. In this study, we investigate the ability of LLMs to discover novel theorems and produce verified proofs. We propose a pipeline called Conjecturing-Proving Loop (CPL), which iteratively generates mathematical conjectures and attempts to prove them in Lean 4. A key feature of CPL is that each iteration conditions the LLM on previously generated theorems and their formal proofs, enabling parameter-free improvement of proof strategies via in-context learning. We provide both theoretical and experimental evidence that CPL increases the discovery rate of hard-to-prove theorems compared to frameworks that generate statements and proofs simultaneously. Moreover, our experiments show that reusing the LLM's own formally verified outputs as context consistently improves subsequent proof success, demonstrating the effectiveness of self-generated in-context learning for neural theorem proving. The source code is available at https://github.com/auto-res/ConjecturingProvingLoop.

11.
arXiv (CS.LG) 2026-06-16

BRICKS-WM: Building Reusability via Interface Composition Kinetics for Structured World Models

arXiv:2606.16489v1 Announce Type: new Abstract: Model-based Reinforcement Learning (MBRL) has achieved remarkable success in continuous control by leveraging latent world models. However, prevailing approaches typically rely on monolithic latent dynamics, entangling environment dynamics into a coupled process. This coupling severely limits reusability: altering the agent necessitates retraining the entire world from scratch, even if the environment remains constant. To address this, we introduce BRICKS-WM (Building Reusability via Interface Composition Kinetics for Structured World Models), a framework for the modular assembly of structured world models. Driven by the insight that the physical world is composed of independent entities, we posit that global dynamics can be modeled as a composition of distinct dynamical modules interacting via latent interfaces. As a minimal instantiation, we factorize the latent state space into an actuated Agent module and an external Background module, bridged by a learned latent interface. Unlike prior object-centric methods that prioritize visual segmentation, BRICKS-WM enforces a functional separation in transition dynamics, ensuring that background dynamics remains agnostic to the agent's dynamics. Empirically, BRICKS-WM achieves control performance comparable to strong monolithic baselines when trained from scratch, and enables the reuse of frozen background dynamics across agents.

12.
arXiv (CS.CL) 2026-06-24

The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs

Sparse attention offers a promising strategy to extend long-context capabilities in Transformer LLMs, yet its efficiency-accuracy trade-offs remain unclear due to the lack of comprehensive evaluation. We address this gap with the largest-scale empirical analysis to date of training-free sparse attention, evaluating six methods across multiple model families and sizes, sequences up to 128K tokens, and sparsity levels up to 0.95 (i.e., $1/20$ attention budget) on nine diverse tasks. We first organise the rapidly evolving landscape of sparse attention methods into a taxonomy along four design axes. Our analysis then yields actionable insights: 1) sparse attention is effective: larger sparse models outperform smaller dense ones at equivalent cost, improving the Pareto frontier; 2) for the training-free methods we study, fine-grained per-query importance estimation during prefilling remains impractical-due to both the cost of estimation and the lack of sparse kernels that translate fine-grained sparsity into wall-clock gains-forcing a task-dependent choice between global-to-token and block-to-block selection. Instead, during decoding, token-to-page selection becomes feasible, enabling better generalisation and higher sparsity tolerance; 3) longer sequences tolerate higher sparsity, suggesting that fixed-budget methods in production are suboptimal. Together, these findings provide practical guidance for deploying sparse attention and methodological recommendations for future evaluations. Our code is available at https://github.com/PiotrNawrot/sparse-frontier.

13.
arXiv (CS.CV) 2026-06-25

Are We There Yet? Exploring the Capabilities of MLLMs in Assistive AI Applications

Multimodal Large Language Models (MLLMs) have redefined visual understanding by combining vision encoders with large-scale language models. This unified architecture enables strong performance on tasks like image captioning, visual question answering, and multimodal dialogue, often in zero- and few-shot settings. Their general-purpose capabilities and flexible interfaces make MLLMs a promising foundation for real-world vision-language applications. Assistive AI aims to help users interact with their environments through natural language. These scenarios demand robust visual recognition, contextual reasoning, and multilingual comprehension-capabilities that MLLMs are believed to offer. However, their effectiveness in assistive settings remains to be fully understood. In this work, we explore whether MLLMs can support Assistive AI by evaluating state-of-the-art models on real-world tasks: recognizing everyday objects like currency, answering questions based on scene text, and reading visually presented content across multiple languages. To this end, we developed a system, NetraLink, using a head-mounted GoPro to capture real-world egocentric data, and collected a benchmark covering these assistive scenarios. Our findings provide a comprehensive diagnostic of current MLLMs, highlighting their strengths and limitations in enabling assistive technologies grounded in visual perception and language interaction.

14.
arXiv (CS.LG) 2026-06-18

Signature filtering: a lightweight enhancement for statistical watermark detection in large language models

arXiv:2606.18430v1 Announce Type: new Abstract: Statistical watermarks help organizations attribute large language model (LLM) outputs, yet existing detectors often struggle when watermark signals are weak, texts are repetitive, or watermarks are edited. We propose signature filtering, a detection-time module that enhances watermark detection without modifying watermark embedding and text generation. It learns a small set of ``signature'' tokens whose presence makes watermark tests unreliable, and removes these tokens before detection. The signatures are obtained by solving a mixed-integer linear program on a small training set, with constraints that maximize the true positive rate. We additionally derive finite-sample and asymptotic bounds under several attacker models (color-blind, color-adaptive, and distributionally correlated). On four well-known watermark families (Kgw, Sweet, Unigram, Exp), four benchmark corpora (C4, MBPP, HumanEval, Code-Search-Net), and six LLMs (Opt-1.3b, Opt-6.7b, Llama2-13b, Llama3.1-8b, Qwen2.5-14b, Phi-3-medium-14b), 2- and 3-gram signatures raise detection rates in weak-signal and low-entropy settings from 8~31% without filtering to 78~99% with filtering, while keeping false positives controllable and often negligible. In stress tests where we scramble sentences and perturb 25~50% of tokens by dilution, deletions, and substitutions, 2-gram filters for Kgw-style watermarks preserve most of the clean-text detection gains, often matching or outperforming the advanced WinMax watermark detector. Signature filtering thus provides a simple, scalable, and model-agnostic add-on to strengthen watermark-based provenance checks for LLM text in information processing workflows.

15.
arXiv (CS.LG) 2026-06-12

Optical Implementation of Equilibrium Propagation Using Spatial Photonic Ising Machines

arXiv:2606.13454v1 Announce Type: cross Abstract: Equilibrium Propagation offers a compelling alternative to traditional machine learning for training energy-based networks. Here we demonstrate a hybrid optical-digital implementation of EP using a Spatial Photonic Ising Machine (SPIM). The SPIM exploits the gauge transformation method to optically encode both continuous neuron states and rank-1 binary trainable patterns as phase modulations via a spatial light modulator, with inference realized using a finite difference scheme. The experimental system is evaluated on the Wine classification dataset. The potential of this approach, including the use of continuous couplings and structured coupling matrices, is evaluated numerically on the more complex MNIST dataset. Our work provides a concrete pathway toward energy-efficient physical implementations of Equilibrium Propagation.

16.
arXiv (quant-ph) 2026-06-25

Rounding Almost Commuting Hamiltonians

arXiv:2605.26096v2 Announce Type: replace Abstract: Commuting Hamiltonians lie at the boundary between classical constraint satisfaction and quantum many-body physics, exhibiting rich quantum structure while remaining more tractable than general noncommuting models. In contrast, physical Hamiltonians are rarely exactly commuting, which naturally motivates the study of almost commuting Hamiltonians. Despite their relevance, the implications of approximate commutation are only poorly understood. In this work, we show how to efficiently approximate any almost commuting $2$-local qubit Hamiltonian by a commuting one: we give a new locality-preserving algorithmic rounding technique that maps any $2$-local Hamiltonian $H=\sum_{i=1}^m h_i$ with $\|[h_i,h_j]\| \leq \epsilon$ to a nearby Hamiltonian $\hat{H}$ whose terms pair-wise commute, and which is within overall distance $\|H-\hat{H}\| = O(m\,\epsilon^{1/6})$. As a consequence, we show that $\delta$-approximations to the ground energy for $\epsilon$-almost commuting $2$-local qubit Hamiltonians lie in $\mathsf{NP}$ when $\delta \gg m\epsilon^{1/6}$, extending the classical containment well beyond the commuting setting. Finally, we present two applications of our rounding framework: Gibbs sampling and fast Hamiltonian simulation for almost commuting systems.

17.
arXiv (quant-ph) 2026-06-15

Nanostructure modelling with early fault tolerant quantum computers

arXiv:2606.06442v2 Announce Type: replace Abstract: Semiconductor nanostructures are central to many developing technologies. Notably, double quantum dots are especially important for semiconductor spin-qubit architectures, quantum sensing applications, and quantum-dot solar cells. Accurate modelling is highly desirable but conventional methods can struggle when dynamics involve more than two interacting electrons. In this work, we present a quantum simulation framework capable of addressing multi-electron double quantum dots. We adopt an efficiently scaling 1$^st$ quantised representation of the system and develop algorithms based on both Trotterisation and Qubitisation. Incorporating insights from classical simulations enables us to produce resource estimates that are more realistic than those obtained from theoretical error bounds. Using a standard surface code model with physical noise at $10^{-3}$, our results indicate that the ground-state energy of four electrons in a double quantum dot can be estimated in approximately 22 hours using 226k physical qubits, or an eight-electron system in 3.3 days with 314k qubits (with runtimes falling dramatically when more qubits are available). We anticipate that incorporating recent advances in surface code architectures may reduce these costs significantly further. Our results suggest that early fault-tolerant quantum computers may become valuable tools for designing mature-era quantum technologies.

18.
arXiv (CS.AI) 2026-06-11

Towards an Inferentialist Account of Information Through Proof-theoretic Semantics

arXiv:2605.05368v5 Announce Type: replace-cross Abstract: Information is one of the most widely-discussed concepts of the current era. However, a great deal of insightful work notwithstanding, it is yet to be given wholly convincing logical or mathematical foundations. Without them, we lack adequate reasoning tools for understanding the complex ecosystems of systems upon which the society depends. We seek to rectify this by taking a first step towards developing an inferentialist semantic theory of information. There are three key interacting components. First, conceptual analysis: the metaphysics of information. Dretske expressed the key concepts of information in terms of intentionality, truth, and transmissibility. We replace truth with inferability, and trace the consequences of this replacement. Second, logic: proof-theoretic semantics (P-tS) provides a mathematical-logical realization of inferentialist reasoning. Using P-tS, we develop the first steps towards a mathematical-logical theory of an inferentialist primitive unit of information, the 'inferon'. This proof-theoretic approach counterpoints the model-theoretic view of information articulated in situation theory. Furthermore, we argue that it facilitates addressing all three components of van Benthem and Martinez's categorization of the understandings of information, as range, as correlation, and as code. Our focus is on information-as-correlation. Third, systems: the P-tS tools we develop provide the basis for a mathematical account of distributed systems modelling – a key tool from informatics for understanding the organization of information processing systems. This yields a reasoning-based theory of information flow in models of distributed systems. Overall, we seek to give a conceptually rigorous mathematical-logical account of information and its role within informatics, grounded in inference and reasoning.

19.
arXiv (CS.CL) 2026-06-25

Fully Differentiable Neural Forced Alignment via Soft Dynamic Programming

Recent advances in sequence modeling have significantly improved ASR systems, bringing them close to human-level recognition accuracy and enhancing robustness across diverse acoustic conditions and languages. In contrast, Forced Alignment has not experienced comparable progress, and traditional HMM-GMM frameworks remain widely adopted and highly competitive. To address this gap, we propose an end-to-end, fully differentiable neural architecture specifically designed for phoneme alignment. The model consists of an encoder that processes the input signal and a decoder that produces alignment decisions. The encoder is structured into two complementary branches: one dedicated to phoneme identity verification and the other to phoneme boundary detection. The decoder is implemented as a trainable module based on differentiable soft dynamic programming. The entire system is optimized end-to-end using a novel contrastive loss that encourages clear separation between steady-state phoneme regions and transition boundaries. The proposed approach outperforms the current state of the art in phoneme alignment on hand-annotated English benchmarks, achieves strong word-level generalization results, and demonstrates generalization on unseen languages.

20.
arXiv (CS.LG) 2026-06-17

Geodesic Calculus on Implicitly Defined Latent Manifolds

arXiv:2510.09468v3 Announce Type: replace Abstract: Latent manifolds of autoencoders provide low-dimensional representations of data, which can be studied from a geometric perspective. We propose to describe these latent manifolds as implicit submanifolds of some ambient latent space. Based on this, we develop tools for a discrete Riemannian calculus approximating classical geometric operators. These tools are robust against inaccuracies of the implicit representation often occurring in practical examples. To obtain a suitable implicit representation, we propose to learn an approximate projection onto the latent manifold by minimizing a denoising objective. This approach is independent of the underlying autoencoder and supports the use of different Riemannian geometries on the latent manifolds. The framework in particular enables the computation of geodesic paths connecting given end points and shooting geodesics via the Riemannian exponential maps on latent manifolds. We evaluate our approach on various autoencoders trained on synthetic and real data.

22.
arXiv (CS.CL) 2026-06-17

Reading between the Lines: Leveraging Large Language Models for Global Dementia and Depression Assessment from Clinical Interviews

Dementia and depression are the most prevalent neuropsychiatric disorders in geriatric populations, and their overlapping symptoms pose major challenges for differential diagnosis. In this study, we investigate open-weights Large Language Models (LLMs) for predicting dementia and depression severity from speech samples collected during standardized history taking interviews with 154 German-speaking subjects. We introduce an observer-based Global Depression Scale (GDS-D) aligned with the established Global Deterioration Scale (GDS), enabling parallel global staging of affective and cognitive symptoms. We compare three LLMs (Mistral 3.1, DeepHermes, Qwen3) in two settings: (1) zero-shot prediction and (2) LLM-based feature extraction for Support Vector Regression, using human and pause-enriched transcripts. Results show that LLMs effectively predict depression severity in zero-shot settings (best MAE of 0.60), while dementia assessment benefits substantially from structured feature extraction (best MAE of 0.78), reducing errors by up to 35% over zero-shot baselines. Pause-enriched transcripts achieve competitive performance with human transcriptions, demonstrating the viability of fully automatic screening pipelines for differential neuropsychiatric assessment.

23.
arXiv (CS.CL) 2026-06-24

ComputeFHE: A Privacy-Preserving General-Purpose Computation Library

Fully Homomorphic Encryption (FHE) enables computations to be performed directly on encrypted data while preserving data confidentiality. However, its practical applications remain limited by high computational costs and development complexity. This paper presents ComputeFHE, an open-source C++ library that facilitates the development of privacy-preserving applications based on the TFHE cryptosystem. The library provides encrypted integer and fixed-point data types together with arithmetic, logical, comparison, conditional, and oblivious array-access operations which allow developers to implement algorithms using a familiar imperative programming paradigm. ComputeFHE supports both conventional TFHE arithmetic based on standard two-input logic gates and an optimized Arithmetic Logic Unit (ALU) architecture utilizing FHE-friendly logic primitives. Experimental results demonstrate significant reductions in the number of required bootstrapping operations, achieving performance improvements of up to 3.9x for selected operations. In addition, the library includes a simulation mode that enables testing, debugging, and complexity analysis without performing actual cryptographic computations while providing circuit complexity and bootstrapping costs. Built on top of OpenFHE, ComputeFHE offers a practical and accessible framework for developing and evaluating privacy-preserving algorithms and applications.

24.
arXiv (CS.AI) 2026-06-11

Autoregressive Direct Preference Optimization

arXiv:2602.09533v2 Announce Type: replace Abstract: Direct preference optimization (DPO) has emerged as a promising approach for aligning large language models (LLMs) with human preferences. However, the widespread reliance on the response-level Bradley-Terry (BT) model may limit its full potential, as the reference and learnable models are assumed to be autoregressive only after deriving the objective function. Motivated by this limitation, we revisit the theoretical foundations of DPO and propose a novel formulation that explicitly introduces the autoregressive assumption prior to applying the BT model. By reformulating and extending DPO, we derive a novel variant, termed Autoregressive DPO (ADPO), that explicitly integrates autoregressive modeling into the preference optimization framework. Without violating the theoretical foundations, the derived loss takes an elegant form: it shifts the summation operation in the DPO objective outside the log-sigmoid function. Furthermore, through theoretical analysis of ADPO, we show that there exist two length measures to be considered when designing DPO-based algorithms: the token length $\mu$ and the feedback length $\mu'$. To the best of our knowledge, we are the first to explicitly distinguish these two measures and analyze their implications for preference optimization in LLMs.

25.
arXiv (CS.CV) 2026-06-11

Intelligent Skin Cancer Detection Using a Multispectral Metasurface and a Hybrid

Skin cancer is among the most prevalent malignancies worldwiAdbe satnradcitts early detection is essential for improving patient survival and reducing treatment costs Conventional dermoscopic and visual imaging techniques are primarily limited to the visible spectrum and often fail to capture subtle spectral signatures associated with early stage malignancies This study proposes an innovative framework that integrates a multispectral metasurface for imaging with a hybrid deep learning architecture based on Convolutional Neural Networks and Vision Transformers The designed metasurface enables noninvasive acquisition of rich spectral information highly sensitive to tissue alterations while the hybrid CNN ViT model simultaneously extracts local and global features to robustly classify skin lesions Simulation-based evaluations demonstrate that the proposed method achieves approximately 98 accuracy 95 percentages sensitivity and 99 perentage specificity surpassing conventional RGB-based and single-architecture approaches Qualitative analyses using attention maps reveal that the model focuses on clinically relevant lesion regions improving interpretability Overall the results indicate that combining metasurface based multispectral imaging with hybrid deep learning can introduce a new generation of diagnostic tools in dermatology and pave the way for portable fast and highly accurate clinical systems