Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-19

Improved Stochastic Optimization of LogSumExp

arXiv:2509.24894v4 Announce Type: replace-cross Abstract: The LogSumExp function, dual to the Kullback-Leibler (KL) divergence, plays a central role in many important optimization problems, including entropy-regularized optimal transport (OT) and distributionally robust optimization (DRO). In practice, when the number of exponential terms inside the logarithm is large or infinite, optimization becomes challenging since computing the gradient requires differentiating every term. We propose a novel convexity- and smoothness-preserving approximation to LogSumExp that can be efficiently optimized using stochastic gradient methods. This approximation is rooted in a sound modification of the KL divergence in the dual, resulting in a new $f$-divergence called the Safe KL divergence. Our experiments and theoretical analysis of the LogSumExp-based stochastic optimization, arising in DRO and continuous OT, demonstrate the advantages of our approach over existing baselines.

02.
arXiv (math.PR) 2026-06-11

Sure-almost-sure and Sure-limit-sure Window Mean Payoff in Markov Decision Processes

arXiv:2605.12191v2 Announce Type: replace-cross Abstract: Given rationals $\alpha$ and $\beta$, the sure-almost-sure problem for a threshold Boolean objective $\varphi$ in a Markov decision process (MDP) asks if one can simultaneously ensure that all outcomes of the MDP have $\varphi$-value at least $\alpha$ (i.e. sure $\alpha$ satisfaction) and with probability $1$ the outcome has $\varphi$-value at least $\beta$ (i.e. almost-sure $\beta$ satisfaction). The sure-limit-sure problem asks if for all $\varepsilon > 0$ one can simultaneously ensure that all outcomes have $\varphi$-value at least $\alpha$ and with probability at least $1 - \varepsilon$ the outcome has $\varphi$-value at least $\beta$. Moreover, if simultaneous satisfaction of objectives is possible, then one would also like to construct a strategy (for sure-almost-sure) or a family of strategies (for sure-limit-sure) that achieves this. In this paper, we solve the sure-almost-sure and sure-limit-sure problems for window mean-payoff objectives. The window mean-payoff objective strengthens the standard mean-payoff objective by requiring that eventually, from every point in the infinite run, the average payoff becomes greater than a given threshold within a finite window length. We study two variants of window mean payoff: in the fixed variant, the window length $\ell$ is given, while in the bounded variant, the length is not given but is required to be bounded throughout the run. We show that the sure-almost-sure problem and the sure-limit-sure problem are both in P for the fixed variant (if $\ell$ is given in unary) and are both in NP $\cap$ coNP for the bounded variant, matching the computational complexity of sure satisfaction and almost-sure satisfaction when considered separately for these objectives. We also give bounds for the memory requirement of winning strategies for all considered problems.

03.
arXiv (CS.CL) 2026-06-19

Actionable Activation Directions for Detecting and Mitigating Emergent Misalignment Across Language Model Families

Fine-tuning language models on insecure code induces emergent misalignment with poorly understood internal structure. We investigate whether this misalignment corresponds to a causally actionable activation-space direction shared across architectures. Across four instruction-tuned model families (Qwen2.5-1.5B, Gemma-2-2B, Llama-3.2-1B, Ministral-3-3B) finetuned identically, a difference-in-means direction achieves 99.6% separation of aligned and misaligned activations at each model's final layer. Causal steering by subtracting this direction reduces code spillover by 21-51 points, while a secure-code control confirms content specificity. Cross-architecture transfer via ridge regression maps yields large behavioral suppression (up to 46 points) but fails specificity controls as random and orthogonal directions perform comparably. We identify a two-tier specificity structure: within-model directions are causally specific and actionable; cross-model directions are causally real but non-specific. An asymmetric transfer topology emerges, with Gemma and Qwen acting as geometric donors and Llama as a receiver. These findings define the limits of linear cross-architecture correction and recommend within-model probing for auditing.

04.
arXiv (CS.LG) 2026-06-19

Tracking Representation Dynamics in Large Language Models with Persistent Homology

arXiv:2606.19542v1 Announce Type: new Abstract: Large language models are commonly aligned through supervised fine-tuning, yet little is known about how their internal representations evolve during this process. We study alignment dynamics using persistent homology by tracking the topology of activation spaces throughout fine-tuning. Across four transformer language models ranging from 1B to 7B parameters and three alignment objectives corresponding to helpful, harmless, and mixed training data, we find that the majority of topological reorganization occurs during the earliest stages of training. A dense checkpoint analysis reveals a transient peak in topological activity followed by rapid stabilization. We further show that different alignment objectives induce distinguishable topological trajectories, while instruction-tuned and pretrained models exhibit qualitatively different patterns of evolution. Our results suggest that persistent homology provides a complementary perspective on alignment, revealing representation-level changes that are not apparent from behavioral metrics alone.

05.
arXiv (quant-ph) 2026-06-12

Explicit Quantum Circuit Simulation of Nonlinear 1-Dimensional Fluid with Carleman-linearized Boltzmann Method

arXiv:2606.12770v1 Announce Type: new Abstract: Quantum computation of fluid dynamics has attracted growing attention as a key application of fault-tolerant quantum computers anticipated in the coming decade, with lattice Boltzmann methods emerging as a particularly promising approach. Explicit and efficient elementary-gate-level circuit simulations, however, have so far been demonstrated only in the linear case. Here we include the leading nonlinearity through second-order Carleman linearization of the one-dimensional Boltzmann equation, and demonstrate, via explicit quantum-circuit simulation, the preparation of the final-time state using a Taylor-expansion-based ODE solver based on the quantum singular value transformation. With this construction, we analyze the gate and qubit complexities, which scale logarithmically with the grid size, the nonlinearity captured by the higher-order Carleman linearization, and the practical utility of higher-order expansions in the Taylor ODE solver. The construction provides a concrete baseline for computational cost reduction and further developments such as extensions to higher dimensions, complex geometries, and the extraction of physical quantities, towards industrially useful quantum CFD.

06.
bioRxiv (Bioinfo) 2026-06-18

Structure Bioinformatics of Eight Human ATP Synthase Fo Subunits and Their AlphaFold3-Predicted Water-Soluble QTY Analogs

Human mitochondrial ATP synthase is an essential rotary motor enzyme that produces most of the cellular ATP through oxidative phosphorylation. Its membrane-embedded Fo sector contains highly hydrophobic transmembrane subunits that are challenging to study in aqueous environments without detergents. This study explores whether applying the QTY code can reduce the hydrophobicity of selected ATP synthase Fo subunits while preserving their overall molecular structures. We applied the QTY code to eight human ATP synthase Fo subunits: ATP6, ATP8, ATPK, ATP68, ATPMK, AT5G1, AT5G2, and AT5G3. Hydrophobic amino acids leucine (L), isoleucine (I), valine (V), and phenylalanine (F) in transmembrane regions were systematically replaced with hydrophilic glutamine (Q), threonine (T), and tyrosine (Y). Four native subunits with available CryoEM structures from human ATP synthase (PDB: 8H9S) were superposed with their AlphaFold3-predicted QTY analogs. The native ATP synthase Fo subunits superposed well with their respective QTY analogs. For the CryoEM-native comparisons, RMSD values ranged from 0.565[A] to 2.546[A]. For the AlphaFold3-native comparisons of subunits without CryoEM structures, RMSD values ranged from 0.204[A] to 0.297[A]. Despite substantial QTY substitutions in the transmembrane regions, ranging from 38.89% to 50.79%, the QTY analogs retained similar overall folds, molecular weights, and isoelectric points. Hydrophobic surface analysis showed that the QTY analogs had reduced hydrophobic patches compared with their native counterparts, with average hydrophobicity decreasing from 0.2959 in native proteins to -1.1023 in QTY analogs. These structural bioinformatics studies suggest that the QTY code can be applied to ATP synthase Fo subunits to generate more hydrophilic, potentially water-soluble analogs while preserving overall structural similarity. These results extend the application of the QTY code to the membrane-embedded Fo sector of ATP synthase and provide a foundation for future experimental studies testing whether these QTY analogs can be expressed, purified, and evaluated for assembly or proton-transfer-related functions.

07.
medRxiv (Medicine) 2026-06-22

Characteristics and Outcomes of Gene-Elusive Dilated Cardiomyopathy

Background and Aims Genetic testing in dilated cardiomyopathy (DCM) guides risk stratification and family screening. Likely pathogenic or pathogenic (LP/P) variants are identified in approximately one-third of patients, leaving many without a genetic diagnosis. Cohort studies suggest that "gene-elusive" patients have a lower risk of adverse events. This study aims to better characterise this group and identify factors associated with adverse outcomes. Methods Consecutive and unrelated DCM patients undergoing genetic testing and returning no LP/P variants were retrospectively recruited and compared to two control cohorts of DCM patients carrying LP/P variants in LMNA and TTN for a primary composite endpoint of end-stage heart failure (ESHF) or malignant ventricular arrhythmia (MVA). Results Among patients without prior MVA, the composite endpoint occurred in 36/423 (8.5%) gene-elusive, 14/39 (35.9%) LMNA and 11/100 (11%) TTN cardiomyopathy patients (log-rank p

08.
arXiv (CS.AI) 2026-06-15

Benchmarking Vision-Language-Action Models on SO-101: Failure and Recovery Analysis

arXiv:2606.08881v2 Announce Type: replace-cross Abstract: Vision-Language-Action (VLA) models have demonstrated strong generalization in robotic manipulation, yet existing evaluations are primarily conducted in simulation or on expensive robotic platforms, leaving their robustness on affordable real-world robots largely unexplored. We present a standardized real-world benchmark for evaluating representative VLA and imitation learning policies on the low-cost SO-101 robotic platform. The benchmark comprises four representative manipulation tasks together with unified evaluation protocols, enabling systematic comparison under embodiment uncertainty. Using real-world teleoperated demonstrations, we fine-tune and evaluate $\pi_{0.5}$, SmolVLA, Wall-X, and ACT directly on the physical platform. Beyond conventional task success rates, the benchmark incorporates a structured failure taxonomy, semantic- and execution-level failure decomposition, and recovery-aware evaluation metrics to characterize policy robustness. Experimental results show that stronger pretrained VLA policies generally outperform the imitation learning baseline, although performance remains highly task-dependent under low-cost robotic deployment conditions. Execution instability emerges as the dominant failure source, while recovery capability varies substantially across architectures. These results highlight the importance of failure and recovery analysis beyond binary task success and establish SO-101 as a practical benchmark for evaluating embodied AI systems under realistic low-cost robotic deployment conditions.

09.
arXiv (CS.CL) 2026-06-15

Detecting Historical Turning Points in Italian Media: A Complex Systems Approach to a Diachronic News Corpus

The increasing availability of large-scale textual corpora has opened new possibilities for data-driven, quantitative approaches to historical analysis using Natural Language Processing (NLP). However, diachronic corpora with historical relevance from the pre-digital era remain scarce and often incomplete. We present a quantitative approach to historical analysis based on the reconstruction and exploration of a diachronic corpus of around 600,000 articles from the Italian newspaper "La Repubblica", covering all the articles published from the 1st of January 1985 to the 31st of December 2000 - a period of major political, social, and geopolitical change in Italy and globally. Using NLP techniques, we analyze the text at both lexical and semantic levels; we then apply tools from complex systems and statistical physics to trace shifts in media discourse over time. This allows us to detect key transition periods, such as the transition from the First Republic to the Second Republic in Italy, or major international conflicts like the Gulf War or the Kosovo War, without relying on prior labeling. The results show how combining computational linguistics with ideas from complex systems can offer new quantitative insight into historical changes, opening up new paths for studying the dynamics of media and society through large-scale textual data.

10.
arXiv (CS.CV) 2026-06-16

EcoBin: A Two-Stage Deep Convolutional Neural Network for Contamination-Aware Waste Classification

Waste classification models have become highly accurate at sorting waste, often exceeding 95% on benchmark datasets. However, these models fail to account for contamination in recyclable waste. We present EcoBin, a two-stage deep convolutional neural network that classifies household waste by its disposal pathway and that explicitly accounts for contamination. The first stage is a base waste classifier built on an EfficientNetV2-S backbone that assigns each of the thirty waste categories in our dataset to one of four disposal pathways. The second stage is a contamination classifier that inspects any item routed toward recycling and overrides the decision to garbage when contamination is detected. Because no public dataset of contaminated recyclables exists, we synthesize one by segmenting images of clean recyclable objects with a U2-Net model and compositing realistic contamination textures onto their surfaces. The first stage achieves 87.42% test accuracy and a 96.13% pathway-adjusted accuracy. Meanwhile, the contamination stage distinguishes clean from contaminated items with a 0.99 ROC-AUC. On a test set of contaminated recyclables, the complete pipeline routes 24 of 25 items correctly, compared with only 1 of 25 for the base classifier alone. A McNemar's test confirms that the improvement contributed by the contamination stage is statistically significant (p < 0.001).

11.
arXiv (CS.AI) 2026-06-11

Internet of Everything in the 6G Era: Paradigms, Enablers, Potentials and Future Directions

arXiv:2604.25018v2 Announce Type: replace-cross Abstract: The Internet of Everything (IoE) represents an evolution of the Internet of Things (IoT) by integrating people, data, processes, and things into a unified intelligent ecosystem. IoE aims to enhance automation, decision-making, and service efficiency across multiple application domains such as smart cities, healthcare, industry, and next-generation wireless networks. This paper provides a structured overview of the IoE concept, its core components, architectural foundations, enabling technologies, and major research challenges. Finally, open research directions toward 6G-enabled intelligent IoE systems are discussed, with emphasis on scalability, security, privacy, and energy efficiency.

12.
arXiv (CS.CL) 2026-06-12

Structuring The Future: Diffusion LLM Speculative Decoding via Calibrated Draft Graphs

Diffusion LLMs (dLLMs) have recently emerged as a powerful alternative to autoregressive LLMs (AR-LLMs) with the potential to operate at significantly higher token-generation rates. To unlock this potential, we present Spiffy, a speculative decoding algorithm to accelerate dLLM inference while provably preserving the model's output distribution. This work addresses the unique challenges involved in applying ideas from speculative decoding of AR-LLMs to dLLMs. Spiffy performs auto-speculation to eliminate the overheads of an independent draft model, structuring draft states in the form of a novel directed draft graph to take advantage of the bidirectional, blockwise nature of dLLM generation. These draft graphs are calibrated offline to maximize acceptance rates and are dynamically pruned during inference for improved computational efficiency. We present a detailed formulation of Spiffy and demonstrate its ability to accelerate LLaDA, Dream, and SDAR models in combination with KV caching and threshold-based dynamic unmasking leading to up to $8.6\times$ reduction in model inferences and $6.3\times$ acceleration in token rate.

13.
arXiv (CS.CV) 2026-06-16

From Static Inference to Dynamic Interaction: A Survey of Streaming Large Language Models

Standard Large Language Models (LLMs) are predominantly designed for static inference with pre-defined inputs, which limits their applicability in dynamic, real-time scenarios. To address this gap, the streaming LLM paradigm has emerged. However, existing definitions of streaming LLMs remain fragmented, conflating streaming generation, streaming inputs, and interactive streaming architectures, while a systematic taxonomy is still lacking. This paper provides a comprehensive overview and analysis of streaming LLMs. First, we establish a unified definition of streaming LLMs based on data flow and dynamic interaction to clarify existing ambiguities. Building on this definition, we propose a systematic taxonomy of current streaming LLMs and conduct an in-depth discussion on their underlying methodologies. Furthermore, we explore the applications of streaming LLMs in real-world scenarios and outline promising research directions to support ongoing advances in streaming intelligence. We maintain a continuously updated repository of relevant papers at https://github.com/EIT-NLP/Awesome-Streaming-LLMs.

14.
arXiv (quant-ph) 2026-06-15

Spin-orbit coupling by design in quantum state engineering of atomically defined quantum dots

arXiv:2606.14487v1 Announce Type: cross Abstract: Tuning spin-orbit coupling is essential in controlling both spin and charge in confined semiconductor nanostructures, yet it is rarely a truly controllable parameter. Here, we show control over the spin-orbit Hamiltonian in quantum dots and the resulting quantum states by tailoring the confinement potential with atomic-scale precision. Using scanning tunnelling microscopy and spectroscopy, we pattern individual Cs ions into designer quantum dot structures on the surface of indium antimonide, in which electrons from a two-dimensional electron gas are confined with chosen in-plane electric-field gradients. We then quantify the atomic level structure, both spatially resolving the orbital character of the electronic states and their magnetic-field evolution. We demonstrate that the level structure, including the induced zero-field splitting, can be tailored by the designed geometry of the local electric fields. These effects can be described using a Hamiltonian that allows consistent treatment of the confinement-induced spin-orbit coupling beyond the conventional Bychkov-Rashba description. This Hamiltonian is derived from a multiband k.p model and takes the energy dependence of the relevant physical parameters into account. Such precise control of spin-orbit coupling in semiconductor quantum dots is relevant to quantum and spintronic technologies.

15.
arXiv (CS.CV) 2026-06-11

RSTR: Reducing SpatioTemporal Redundancy in Diffusion Transformers

Diffusion Transformers (DiTs) have achieved remarkable success in image generation, yet their deployment is hindered by high computational costs. We identify two sources of redundancy. First, temporal redundancy: Classifier-Free Guidance (CFG) applies costly dual forward passes at every timestep, yet guidance matters only at specific steps, and variable scales at critical steps can compensate for skipping others. Second, spatial redundancy: under variable guidance, different transformer blocks exhibit heterogeneous sensitivity, yet uniform calibration across all blocks wastes computation while failing to address their varying requirements. We present RSTR, the first framework to jointly reduce spatiotemporal redundancy in diffusion transformers. Stage-1 addresses temporal redundancy through evolutionary search, discovering sparse guidance schedules with variable scales. Stage-2 addresses spatial redundancy through adaptive rank allocation, assigning calibration capacities to transformer regions based on their sensitivity. Experiments on DiT-XL/2, PixArt-$\alpha$, FLUX, and state-of-the-art Qwen-Image demonstrate 50%-70% compute savings while maintaining or improving quality. On DiT-XL/2, RSTR achieves 57% savings with 15% FID improvement; on Qwen-Image, 3.43$\times$ speedup with preserved quality.

16.
arXiv (quant-ph) 2026-06-12

Information gain and measurement disturbance for quantum agents

arXiv:2402.08060v3 Announce Type: replace Abstract: The traditional formalism of quantum measurement (hereafter ``TQM'') describes processes where some properties of quantum states are extracted and stored as classical information. While TQM is a natural and appropriate description of how humans interact with quantum systems, it is silent on the question of how a more general, quantum, agent would do so. How do we describe the observation of a system by an observer with the ability to store not only classical information but quantum states in its memory? In this paper, we extend the idea of measurement to a more general class of sensors for quantum agents which interact with a system in such a way that the agent's memory stores information (classical or quantum) about the system under study. For appropriate sensory interactions, the quantum agent may ``learn'' more about the system than would be possible under any set of classical measurements – but as we show, this comes at the cost of additional measurement disturbance. We experimentally demonstrate such a system and characterize the tradeoffs by considering the channel capacity required to erase the effect of a measurement.

17.
arXiv (CS.AI) 2026-06-18

Ghost Attractor Networks: Basin-Structured Dynamical Decoders for Closed-Loop Sequential Generation

arXiv:2606.18315v1 Announce Type: cross Abstract: Sequential output generation with large-scale Transformer and diffusion decoders pays a memory cost that grows with sequence length, plus iterative per-step computation. Replacing them with small feed-forward decoders restores efficiency but produces unstructured latent representations that limit closed-loop control: phase-conditioned action generation and cross-step latent carry-over both require a latent geometry with stable basins. This article proposes Ghost Attractor Networks, a theoretically derived dynamical decoder whose latent evolves under a learned potential with drift and produces a basin-attractor structure by construction. Three desiderata (multi-modality, decoder-level single-pass switching, and constant memory) motivate the potential-drift form, and mode transitions arise as saddle-node bifurcations with ghost-attractor escape. A hierarchical phase-space decomposition separates first-order basin convergence from second-order proprioceptive refinement. Empirically, a Ghost trained end-to-end with a behavioral-cloning and contrastive objective exhibits the predicted gradient-flow contraction in its potential, with the gradient norm decaying by 67 percent across five integration steps on 1430 held-out samples. Ghost is evaluated as a robotic action decoder. A 2.3-million-parameter Ghost matches the offline accuracy of a 1.07-billion-parameter Diffusion Transformer at 462 times fewer parameters and 32 times lower latency, and beats five alternative 2M-parameter decoders (MLP, Neural ODE, CVAE, Transformer, 1-step Diffusion) on offline mean squared error by 5.9 to 29 percent. On the LIBERO-10 closed-loop benchmark, phase conditioning on Ghost's basin-structured latent yields a 13.5 percentage-point success-rate gain over a feed-forward MLP baseline, and persistent-latent ensembling reaches a 95.7 percent final success rate.

18.
PLOS Medicine 2026-05-21

U = U for all: Advancing equity in HIV prevention

by Thiago S. Torres, Paula M. Luz Suppression of HIV with antiretrovirals eliminates HIV transmission risk, summarized as Undetectable = Untransmittable (U = U). However, U = U literacy remains unevenly understood and shared, and stigmas persist. Equitable and accurate awareness of U = U requires culturally tailored interventions, improved provider education, and supportive policy environments beyond biomedical evidence alone. Suppression of HIV with antiretrovirals eliminates HIV transmission risk, summarized as Undetectable = Untransmittable (U=U). However, U=U literacy remains unevenly understood and shared, and stigmas persist. In this Perspective, Thiago Torres and Paula Luz outline what is needed to improve equity and accuracy in global awareness and education of U=U.

19.
medRxiv (Medicine) 2026-06-15

Primary care practitioners preconception health literacy and information-seeking: A cross-sectional survey.

Background Parental health before pregnancy influences maternal and child outcomes. Primary care professionals, including general practitioners [GPs], midwives, and naturopaths, can provide preconception care, yet many report limited knowledge and difficulty accessing relevant information. This study described Australian GPs, midwives, and naturopaths preconception health literacy, including knowledge and ability to access information. Methods Between July and September 2022, Australian GPs, midwives, and naturopaths completed a 32-item online cross-sectional survey. Participants were recruited through professional associations, and data were analysed using descriptive and inferential statistics Results Participants (N=373) included naturopaths (40.7%), GPs (32.4%), and midwives (26.8%). Reported barriers to clinician health literacy including lack of preconception care resources (25.5%), and limited clinician knowledge (23.6%). The proportion identifying limited clinician knowledge differed significantly between professions (GP: 31.4%; midwives: 23.0%; naturopaths: 17.8%; p=0.030). The highest level of accurate knowledge regarding preconception exposures was for pre-pregnancy obesity (82.7%), while low birth weight was the most accurately identified preconception outcomes (83.7%). Incorrect responses were most common for maternal multivitamin use as an exposure (28.3%) and childhood leukaemia as an outcome (26.3%). Differences between professions were strongest for infant outcomes, with moderate associations observed for shoulder dystocia (V=.2355), precipitous labour (V=.2173), macrosomia (V=.2060), labour dystocia (V=.2018) and cryptorchidism (V=.2018). Discussion Preconception health literacy varies across primary care professions. Clinicians require greater access to targeted resources and education tailored to their differing scopes of practice and experience. Improving clinician preconception health literacy may strengthen consistent evidence-based care and support better maternal, child, and long-term family health outcomes.

20.
arXiv (CS.CL) 2026-06-11

When Does Language Matter? Multilingual Instructions Reveal Step-wise Language Sensitivity in Vision-Language-Action Models

Vision-Language-Action (VLA) models have shown strong performance in language-conditioned robotic manipulation, yet their robustness to linguistic variation remains poorly understood. In this work, we present the first systematic multilingual evaluation of VLA models by translating the LIBERO benchmark into ten languages, revealing severe performance degradation under non-English instructions, with success rates dropping by 30-50%. Through fine-grained analysis of task executions, we find that language influence is highly non-uniform across steps: certain steps exhibit strong language dependence and dominate overall task failure, while others are largely language-agnostic. Based on this insight, we propose a step-wise inference-time intervention that aligns representations according to step language sensitivity, substantially improving performance under linguistic variation. Our results indicate that language robustness in VLA models is fundamentally a step-wise control problem, highlighting the importance of temporally structured analysis for reliable embodied agents.

21.
medRxiv (Medicine) 2026-06-22

Toward less intrusive pubertal assessment: longitudinal evaluation of tanner and non-tanner metrics in East African adolescents

Background: Accurate pubertal assessment is essential in pediatric endocrinology and adolescent health research. While Tanner staging remains the gold standard, its subjective nature and invasive genital examination limit feasibility and acceptability, especially in longitudinal studies and culturally sensitive settings. This study evaluated less intrusive pubertal assessment combinations that maintain discriminative accuracy. Methods: We conducted a longitudinal study among 200 uncircumcised, sexually naive males aged 15-17 years in Southwestern Uganda, with quarterly follow-up over three years. Clinicians assessed Tanner staging metrics (pubic hair, testicular volume, penile length, scrotal color), axillary hair, and serum testosterone. Markov transition models estimated Tanner stage progression. Ordinal logistic regression and area under the receiver operating characteristic curve (AUC) analyses quantified discriminative performance of individual and combined metrics. Results: At baseline, participants were distributed across Tanner stages II (6.0%), III (13.5%), IV (55.0%), and V (25.5%). Among individual metrics, pubic hair distribution best predicted overall Tanner stage (AUC=0.867), while penile length was least predictive (AUC=0.833). The full four-metric Tanner model achieved high discrimination (AUC=0.993). However, a less intrusive combination of pubic hair and scrotal color achieved comparable discrimination (AUC=0.942), improving to AUC=0.953 with axillary hair and age. Markov modeling demonstrated frequent bidirectional transitions between Tanner stages IV and V, reflecting variability in longitudinal staging. Conclusions: A minimally intrusive assessment combining pubic hair, scrotal color, axillary hair, and age reliably predicts pubertal stage, offering an acceptable alternative to traditional Tanner staging for research and surveillance contexts where genital manipulation is impractical or unethical.

22.
arXiv (CS.AI) 2026-06-18

InstructTime++: Time Series Classification with Multimodal Language Modeling via Implicit Feature Enhancement

arXiv:2601.14968v2 Announce Type: replace-cross Abstract: Most existing time series classification methods adopt a discriminative paradigm that maps input sequences directly to one-hot encoded class labels. While effective, this paradigm struggles to incorporate contextual features and fails to capture semantic relationships among classes. To address these limitations, we propose InstructTime, a novel framework that reformulates time series classification as a multimodal generative task. Specifically, continuous numerical sequences, contextual textual features, and task instructions are treated as multimodal inputs, while class labels are generated as textual outputs by tuned language models. To bridge the modality gap, InstructTime introduces a time series discretization module that converts continuous sequences into discrete temporal tokens, together with an alignment projection layer and a generative self-supervised pre-training strategy to enhance cross-modal representation alignment. Building upon this framework, we further propose InstructTime++, which extends InstructTime by incorporating implicit feature modeling to compensate for the limited inductive bias of language models. InstructTime++ leverages specialized toolkits to mine informative implicit patterns from raw time series and contextual inputs, including statistical feature extraction and vision-language-based image captioning, and translates them into textual descriptions for seamless integration. Extensive experiments on multiple benchmark datasets demonstrate the superior performance of InstructTime++.

23.
arXiv (CS.CV) 2026-06-17

Remote sensing data imputation using deep learning for multispectral imagery

Remote sensing techniques have been increasingly utilised in aquatic applications in recent years. A common challenge in using optical satellite data is the presence of missing observations due to cloud cover. These data gaps can lead to missed detection of critical events, such as algal blooms, in lakes of high interest to water authorities. As a result, enhancing the completeness of optical satellite datasets is crucial for improving the monitoring and prediction of algal blooms. In this study, we compared a traditional data imputation method (i.e., linear interpolation) with deep learning models for reconstructing missing spectral bands across four lakes with historical records of algal blooms. The deep learning models adopted include CNN-based architectures (i.e., CNN, Inception Resnet, and Autoencoder) and CNN-LSTM-based architectures (i.e., CNN-LSTM, Resnet-LSTM, and Autoencoder-LSTM). Our results demonstrated that deep learning models substantially outperformed the baseline linear interpolation method in imputing spectral band values within artificially masked regions. Among these models, CNN delivered the best performance across most lakes. Furthermore, we evaluated the performance of algal bloom indices (i.e., Green/Red and NDCI) derived from the imputed imagery by comparing them with the observed data. Our results demonstrate that deep learning models are effective for imputing missing data in PlanetScope SuperDove imagery, enabling more reliable applications in water monitoring.

24.
arXiv (CS.LG) 2026-06-11

GENERIC-FNO: Embedding Energy Conservation and Entropy Production into Fourier Neural Operators

arXiv:2606.08343v2 Announce Type: replace Abstract: We introduce GENERIC-FNO, the first neural operator to embed the full GENERIC (metriplectic) structure of nonequilibrium thermodynamics – reversible, energy-conserving dynamics and irreversible, entropy-producing dynamics coupled through the degeneracy conditions – directly in function space. Existing structure-preserving neural operators enforce at most a single conservation law or reversible (Hamiltonian) structure, while thermodynamically consistent learning has been confined to finite-dimensional, graph, or particle systems. GENERIC-FNO closes this gap: it learns the energy and entropy functionals as neural operators and parameterizes the Poisson and friction operators as diagonal Fourier multipliers sandwiched between rank-one projections that enforce the degeneracy conditions exactly, by construction, with no penalty term, update projection, or residual. The degeneracy identities hold to machine precision (residuals ~10^-13) for any initialization, dimension, or resolution, so the continuous-time dynamics conserve the learned energy and produce entropy exactly; the explicit time stepping adds only a small O(dt^2) drift (per-step residual ~10^-6). We further note that the (E,S,L,M) decomposition of a given flow is not unique, and introduce a gauge-invariant dissipation diagnostic separating reversible from dissipative dynamics independently of the learned functionals. Across three operator backbones (1D/2D FNOs and DeepONet) and four PDEs spanning reversible, dissipative, and mixed regimes, GENERIC-FNO preserves its exact structural guarantees zero-shot across a 4x super-resolution range (64 to 256), recovers the ground-truth ordering of physical dissipation, and is competitive with strong unconstrained and energy-penalized baselines, outperforming them on several dissipative and mixed problems at comparable or fewer parameters.

25.
arXiv (CS.CL) 2026-06-18

Output Vector Editing for Memorization Mitigation in Large Language Models

Large language models memorize and reproduce sequences from their training data, creating privacy, copyright, and security risks. Existing neuron-level mitigation methods equate editing with zeroing out neuron activations, but the activation only controls whether a neuron engages; the output vector is what writes to the residual stream and, through superposition, encodes multiple features. We propose output vector editing, a constrained-optimization weight edit that locates a small set of MLP neurons responsible for a memorized continuation and minimally modifies their output vectors to introduce a distractor in vocabulary space, redirecting their residual-stream contributions while leaving activations unchanged. Evaluating on four models from 360M to 7B parameters (SmolLM-360M, OLMo-1B, OLMo-7B, Llama2-7B), we center on OLMo-7B (whose open weights and pretraining corpus enable systematic mining) and mine 6831 memorized sequences, achieving up to 87.9% suppression. The 2.7$\times$ gap over zero ablation on the same located neurons shows the suppression comes from the output-vector edit, not localization alone. Four edit modes span a spectrum from aggressive suppression to minimal redirection; in ensemble they cover 96.5% of memorized sequences, while our recommended single-mode configuration reaches 81.5% with no catastrophic locality failures. We further identify a mechanistic boundary at ${\sim}14%$ of sequences unreachable by MLP-only editing; while these failures are not attention-driven overall, ablating the top contributing attention heads recovers 60–64% of them, with stronger recovery on continuations that copy tokens from the prefix, positioning attention as a complementary fallback rather than a primary mechanism. Edit mode ordering and the success-locality trade-off transfer across all four models, with success rates scaling with model size rather than family.