Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CL) 2026-06-11

FOCUS: DLLMs Know How to Tame Their Compute Bound

Diffusion Large Language Models (DLLMs) offer a compelling alternative to Auto-Regressive models, but their deployment is constrained by high decoding cost. In this work, we identify a key inefficiency in DLLM decoding: while computation is parallelized over token blocks, only a small subset of tokens is decodable at each diffusion step, causing most compute to be wasted on non-decodable tokens. We further observe a strong correlation between attention-derived token importance and token-wise decoding probability. Based on this insight, we propose FOCUS, an inference system designed for DLLMs. By dynamically focusing computation on decodable tokens and evicting non-decodable ones on-the-fly, FOCUS increases the effective batch size, alleviating compute limitations and enabling scalable throughput. Empirical evaluations demonstrate that FOCUS achieves up to 3.52$\times$ throughput improvement over the production-grade engine LMDeploy in large-batch settings, while preserving or improving generation quality across multiple benchmarks.

04.
medRxiv (Medicine) 2026-06-10

Human-centred design approaches to health facility design: Evidence from perinatal care settings in Ethiopia and Bangladesh

While significant progress has been made in perinatal outcomes over recent decades in low- and middle-income countries (LMICs), maternal and newborn quality improvement initiatives often fail to account for the spatial conditions in which they are implemented. Health systems are increasingly deploying evidence-based care models into built environments that are not optimally structured to meet the needs of its patient population. As the principal users, patients and health care workers can offer pragmatic insights about improving these structural designs. Our objective was to gather insights from patients, providers, and companions about how the physical design of their health facilities influenced their experience receiving or delivering perinatal care. We conducted a prospective observational study using a human-centred design (HCD) approach to analyse perceptions of the quality of perinatal care across two low resource settings: Ethiopia and Bangladesh. Using engagement and assessment tools, we conducted interviews, focus groups, facility walk-throughs, co-design workshops, and infrastructural assessments with patients, companions, providers, and Ministry of Health representatives. Descriptive statistics and thematic analysis were used to identify key learnings and develop recommendations. Across both countries, participants identified the need for facility layouts that better support privacy, mobility during labour, alternative birth positions, companion involvement, cultural and religious practices, sanitation, and provider visibility. Based on these insights, we developed six recommendations to better align health facility infrastructure with maternal and newborn care delivery needs. Our findings suggest that investments in health facility infrastructure may improve care experiences and help enable respectful, safe, and evidence-based maternal and newborn care. Alongside targeted spatial improvements, government authorities responsible for health facility planning should incorporate participatory design processes to ensure infrastructure reflects the needs of patients, companions, and providers and supports high-quality care delivery.

05.
arXiv (CS.CV) 2026-06-15

MirrorCheck: Efficient Adversarial Defense for Vision-Language Models

Vision-Language Models (VLMs) are increasingly susceptible to sophisticated adversarial attacks, including adaptive strategies specifically designed to bypass existing defenses. To address this vulnerability, we propose MirrorCheck, a robust and model-agnostic detection framework that operates effectively in both unimodal and multimodal settings. MirrorCheck leverages Text-to-Image (T2I) models to regenerate visual content from captions produced by the target model and assesses semantic consistency by comparing feature-space embeddings between the original and synthesized images. To enhance robustness against adaptive attacks, MirrorCheck introduces a stochastic defense strategy that randomly selects T2I generators and image encoders from a diverse model zoo. Additionally, we incorporate a novel One-Time-Use (OTU) perturbation applied to the selected encoder embeddings, regulated by a scaling factor, which decreases the effectiveness of adaptive attacks. Extensive experiments across multiple threat scenarios demonstrate that MirrorCheck consistently outperforms baseline methods, and maintains its utility even under strong adaptive adversarial conditions.

06.
arXiv (quant-ph) 2026-06-19

Smooth time-dependent control of dipolar Bose-Einstein condensates

arXiv:2606.20507v1 Announce Type: cross Abstract: We consider protocols for control of dipolar Bose-Einstein condensates where the critical role is played by the long-range anisotropic interatomic magnetic dipole-dipole interaction. The phase diagram of such a condensate has been explored theoretically and experimentally with certain values of the interatomic scattering length corresponding to superfluid and supersolid phases, where supersolidity appears as a modulation in the ground state density. Preparation of this modulated ground state is challenging, since excitations appear as a result of a finite-time evolution required to produce qualitative changes in the wavefunction density. To solve this problem we consider the time-dependent control of a dipolar Bose-Einstein condensate using shortcuts to adiabaticity techniques, concentrating on design of the time-dependent scattering length, a parameter of the system easily tunable by contemporary experiments. The first technique is the variational approach based on the Euler-Lagrange equations for a separable ansatz describing the evolution of the superfluid state. Secondly, we study the transition from superfluid to supersolid using a direct optimization protocol. We discuss the fidelity of the developed protocols in terms of the evolution time.

07.
PLOS Computational Biology 2026-06-05

Heuristic multi-site optimization for protein sequence design using Masked Protein Language Models

Authors:

by Lijuan Wang, Yuze Wang, Chen Qiu, Liwei Xiao, Xianliang Liu, Junjie Chen Protein sequence design for tailored functional properties is a fundamental task in protein engineering, with critical applications in drug discovery and therapeutic development. Efficient navigation of the combinatorial vastness of protein sequence space to identify functional variants remains a formidable challenge. Conventional approaches, which predominantly rely on template-based local search or single-residue mutagenesis, are constrained by their susceptibility to local optima and their potential risk of destabilizing native structural stability. In this study, we introduce ProtHMSO, a heuristic multi-site optimization framework leveraging masked protein language models (ProtLMs) for context-aware sequence exploration. ProtHMSO mimics natural evolutionary mechanisms by employing ProtLM-derived substitution probabilities to guide heuristic searches for synergistic mutations, thereby constraining combinatorial search spaces through evolutionary and biophysical priors. ProtHMSO is further applied to replace the exploration strategies in genetic algorithms (GAs) and Monte Carlo tree search (MCTS) for improving their convergence efficiency. Benchmark experiments demonstrate that protein sequences generated by ProtHMSO exhibit superior functional performance and closer alignment with natural sequence distribution, compared with state-of-the-art methods. These advancements highlight that ProtHMSO has strong potential and compatibility to accelerate functional protein discovery, offering a robust framework for efficient and context-aware exploration of protein sequence space.

08.
arXiv (CS.CL) 2026-06-17

A Multifaceted Analysis of Social Biases in Large Language Models

Large language models (LLMs) have rapidly become indispensable tools for acquiring information and supporting human decision-making. However, ensuring that these models uphold fairness across varied contexts is critical to their safe and responsible deployment. In this study, we undertake a comprehensive examination of four widely adopted LLMs, probing their underlying biases and inclinations across the dimensions of politics, ideology, alliance, language, and gender. Through a series of carefully designed experiments, we investigate their political neutrality using news summarization, ideological biases through news stance classification, tendencies toward specific geopolitical alliances via United Nations voting patterns, language bias in the context of multilingual story completion, and gender-related affinities as revealed by responses to the World Values Survey. Results indicate that while the LLMs are aligned to be neutral and impartial, they still show biases and affinities of different types.

09.
arXiv (quant-ph) 2026-06-12

Non-Hermitian skin effect induced by spatial noncommutativity

arXiv:2606.12961v1 Announce Type: new Abstract: In all known schemes for the non-Hermitian skin effect, the non-Hermitian ingredient that drives the skin localization, whether asymmetric hopping or gain and loss, is invariably introduced by hand as an independent model parameter along the skin direction. Here we show that when two spatial coordinates do not commute, the skin effect can break free of this paradigm: a gain-loss potential applied along one coordinate automatically generates non-reciprocity along the other through the coordinate noncommutativity, driving all eigenstates to pile up exponentially at a boundary. We term this phenomenon the noncommutative skin effect. The inverse skin length is proportional to the noncommutativity parameter and is given by an analytic formula, exact in the thermodynamic limit and verified by exact diagonalization of lattice models; the reflection symmetry of the imaginary potential furnishes an exact criterion for the presence or absence of the effect, valid rigorously for finite-size systems. For a sinusoidal imaginary potential, the skin direction of all eigenstates flips collectively at parameter points fixed purely by geometry. Because the flip point is independent of the potential strength, the reversal constitutes a zero-crossing measurement scheme intrinsically robust against systematic errors, from which the noncommutativity parameter can be extracted directly. The qualitative transition of the eigenstates from uniform to exponentially localized renders the effect a nonperturbative probe of spatial noncommutativity, and the Peierls-phase structure of its lattice model is in principle accessible to cold-atom synthetic dimensions, photonic resonators, and topolectrical circuits.

10.
arXiv (CS.CL) 2026-06-12

A Survey on Long-Term Memory Security in LLM Agents: Attacks, Defenses, and Governance Across the Memory Lifecycle

The emergence of writable, cross-session persistent memory in LLM agents introduces a qualitatively different threat landscape from conventional input-centric security concerns, characterized by three properties: persistence, statefulness, and propagation. To systematically characterize this landscape, we propose a Memory Lifecycle Framework that organizes attacks, defenses, and their cross-phase dependencies along two axes: six lifecycle phases (Write, Store, Retrieve, Execute, Share & Propagate, Forget & Rollback) and four security objectives (Integrity, Confidentiality, Availability, Governance). This analysis in turn exposes the need for formal security guarantees at the system level, motivating Verifiable Memory Governance(VMG), a framework of five architectural primitives that specifies what verifiable mechanisms a long-term-memory system must provide to maintain auditable, recoverable control over its memory state. Our analysis indicates that robust Long-Term Memory (LTM) security cannot be retrofitted at retrieval or execution time alone, but must be anchored in storage-time provenance, versioning, and policy-aware retention from the outset.

11.
arXiv (CS.CV) 2026-06-16

Landmark-free Assessment of Lower-limb Alignment with Implicit Neural Shape Functions from Knee Radiographs

Radiographic assessment of lower-limb alignment (LLA) is important for predicting joint health and surgical outcomes in total knee arthroplasty. Traditional measurement methods are manual and time-consuming, while recent machine learning approaches typically rely on locating a fixed set of anatomical landmarks. This dependence limits flexibility and may require re-annotation when clinical definitions change. To address this, we propose an automated workflow using Implicit Neural Shape Functions (INSF). Rather than relying on explicit landmark coordinates, we encode the anatomy into a compact latent space and regress clinical alignment measurements directly from these latent codes. This architecture allows for rapid extendability to new tasks without altering the backbone representation. We trained our method on an internal dataset of 566 knee radiographs, each annotated with the outline of the femur and tibia. We evaluated it on both an internal test dataset of 50 patients and a separate external set of 402 preoperative cases from the MRKR dataset. Manual clinical measurements are available for these data, and the MRKR measurements will be made publicly accessible. Performance was comparable to state-of-the-art landmark-based methods and manual agreement, while offering a flexible shape representation that can be extended to additional measurement tasks.

12.
arXiv (quant-ph) 2026-06-11

Logical error estimation from syndrome data of surface-code experiments

arXiv:2606.11496v1 Announce Type: new Abstract: Decoders for quantum error correction (QEC) experiments rely on detector error models (DEMs), which encode, for each error, its probability and the detectors and logical observables it flips. Here we show that estimating DEM event probabilities from experimental syndromes is feasible, avoids independent device benchmarking, and produces useful decoder priors for estimating and reducing decoded logical error probabilities. We evaluate our methods using open-source data from surface-code memory experiments performed on Google's Willow chip, and we carry out analogous surface-code experiments on IBM's \texttt{ibm\_miami} processor. Despite the different physical error scales of the Google and IBM devices, in both cases our estimated DEMs improve logical error probabilities relative to baseline device-informed DEMs, typically at the $5\%-10\%$ level and with larger gains in some IBM cases, without additional calibration circuits, decoder fine-tuning, or supervised fitting to logical outcomes.

13.
arXiv (CS.CV) 2026-06-16

Disentangling Hallucinations: Orthogonal Semantic Projection for Robust Interpretability

As Vision-Language Models are increasingly deployed in safety-critical applications, the trustworthiness of their explanations becomes crucial. Explainable AI (XAI) methods for Vision-Language Models often suffer from semantic hallucination, where attribution maps highlight prominent image regions even when prompted with incorrect text descriptions (e.g., highlighting a dog when prompted ``cat''). Although this problem is widespread, a formal mathematical analysis of XAI methods and CLIP embeddings is largely missing in the literature. We demonstrate that this phenomenon is not specific to a single architecture but is a fundamental consequence of Linear Semantic Leakage in high-dimensional embedding spaces. We propose a unified theoretical framework, Linear Semantic Attribution (LSA), which generalizes across discriminative methods. We introduce OSP, a geometric intervention that utilizes the residual property of OMP to disentangle unique semantic signals from shared concepts. We prove theoretically and demonstrate empirically that OSP minimizes hallucination by orthogonalizing the query vector against distractor concepts, rendering the attribution model blind to shared features while preserving fidelity for correct prompts. Our code is available at: https://github.com/emirhanbilgic/Orthogonal-Semantic-Projection

14.
arXiv (CS.AI) 2026-06-15

PRISM: Perception Reasoning Interleaved for Sequential Decision Making

arXiv:2605.05407v2 Announce Type: replace Abstract: Scaling LLM-based embodied agents from text-only environments to complex multimodal settings remains a major challenge. Recent work identifies a perception-reasoning-decision gap in standalone Vision-Language Models (VLMs), which often overlook task-critical information. In this paper, we introduce PRISM, a framework that tightly couples perception (VLM) and decision (LLM) through a dynamic question-answer (DQA) pipeline. Instead of passively accepting the VLM's description, the LLM critiques it, probes the VLM with goal-oriented questions, and synthesizes a compact image description. This closed-loop interaction yields a sharp, task-driven understanding of the scene. We evaluate PRISM on the ALFWorld and Room-to-Room (R2R) benchmarks. We show that: (1) PRISM significantly outperforms state-of-the-art image-based models, (2) our Interactive goal-oriented perception pipeline yields systematic and substantial gains, and (3) PRISM is fully automatic, eliminating the need for handcrafted questions or answers.

15.
arXiv (CS.LG) 2026-06-11

On the Stability of Growth in Structural Plasticity

arXiv:2605.15435v2 Announce Type: replace Abstract: Standard deep-learning pipelines usually choose the network architecture before training and keep it fixed throughout optimization. In contrast, a model can also be adapted by editing its structure during training, for example by pruning existing hidden-neuron units or growing new ones. Although growth is appealing for adaptive and continual systems, we show that it is not simply the inverse of pruning. Pruning selects among units that have participated in training from the start, whereas growth inserts new units into an already specialized optimization trajectory. We isolate this insertion problem and show that newborn units are often forward-active but backward-starved: they participate in the forward computation, yet receive much weaker gradient signal than incumbent units. This disadvantage is minor in small MLP benchmarks, but becomes clear in harder image-classification settings with a convolutional trunk. In these settings, \textsc{Grow} can achieve high final accuracy during the structural-editing procedure, while \textsc{Prune} is stronger when performance is averaged over the training trajectory or when the final sparse network is retrained from scratch. Interventions targeting optimizer state, insertion, selection, and trainability show that improving the integration of newborn units can improve adaptive performance, but does not automatically produce better final subnetworks. In continual-learning benchmarks stressing plasticity loss, \textsc{Grow} becomes competitive mainly when new units have enough time to integrate. Together, these results suggest that \textsc{Grow} should be evaluated not only as an architecture-search operator, but as a time-sensitive optimization process whose success depends on insertion stability.

16.
arXiv (CS.CL) 2026-06-18

Beyond Tokenization: Direct Timestep Embedding and Contrastive Alignment for Time-Series Question Answering

Recent advances in large language models (LLMs) have given rise to time-series question answering (TSQA), which formulates time-series analysis as natural-language question answering. However, directly feeding raw numerical series into LLMs suffers from a tokenization bottleneck: Byte Pair Encoding fragments continuous values into unstable tokens whose embeddings lack meaningful metric structure, resulting in the loss of magnitude, scale, and trend information. Prior methods use patch-based encoders that split the series into fixed windows, locking in one granularity that breaks patterns and hides exact timesteps, through a separate module that rarely transfers across datasets with different lengths or sampling rates. To address this challenge, we propose CADE (Contrastive Alignment with Direct Embedding), a novel framework for TSQA built upon two key components: direct timestep embedding and semantic alignment. The proposed framework maps each timestep directly into the LLM embedding space through a point-wise linear encoder and MLP projector, preserving exact index-level access while eliminating the need for patching and padding. To further bridge the semantic gap between time-series and language representations, we introduce a novel one-directional supervised contrastive loss that aligns time-series embeddings with frozen class-name text anchors. Experimental results on the public Time-MQA benchmark demonstrate that our framework consistently improves performance across six TSQA tasks, outperforming both open-source and proprietary LLM baselines.

17.
arXiv (CS.AI) 2026-06-15

MeEvo: Metacognitive Evolution Combined with Natural Evolution for Automatic Heuristic Design

arXiv:2606.14202v1 Announce Type: cross Abstract: Large Language Models (LLMs) have advanced Automatic Heuristic Design (AHD) by enabling heuristic generation through reasoning and code synthesis. Existing LLM-based AHD architectures mainly follow two paradigms: Natural Evolution, which uses crossover and mutation to explore heuristic programs, and Metacognitive Evolution, which refines reasoning through reflection. However, Natural Evolution discards reasoning traces, weakening knowledge inheritance and exploitation, while Metacognitive Evolution lacks population-level recombination, limiting exploration and increasing the risk of premature convergence. These limitations reduce search efficiency, stability, and solution quality on complex problems. To address this gap, we propose MeEvo, a dual-layer AHD framework that cyclically couples Natural Evolution and Metacognitive Evolution. Natural Evolution explores heuristic code while recording reasoning traces, fitness values, and errors into a shared history; Metacognitive Evolution then reflects on this history to generate improved heuristics that re-enter the parent pool for the next cycle. This design enables population-driven exploration and reflection-driven refinement to reinforce each other. Experiments on five optimization problems with two LLM backbones show that MeEvo achieves stronger and more stable performance than existing LLM-based AHD architectures, especially on complex constrained tasks.

18.
arXiv (CS.AI) 2026-06-16

Faster Completion, Less Learning: Generative AI Reduced Study Time on Math Problems and the Knowledge They Build

arXiv:2605.21629v2 Announce Type: replace-cross Abstract: How much have students' ordinary learning processes shifted in response to generative AI, and how does that affect their durable learning outcomes? Self-report surveys show little change, while small-scale behavioral studies report widespread AI use without the scale or duration to measure learning consequences. We address both questions using a ten-year panel of $3.2$ million ALEKS learning interactions for investigating time-on-task, complemented by ALEKS PPL placement-assessment data for examining proctoring and learning outcomes, with a quasi-experimental design exploiting variation in tasks that are more susceptible to AI (text-based word problems) and less susceptible to AI (interactive graph-based problems). Learning time on AI-susceptible problems declines $2.8\%$ per quarter among college students after ChatGPT's release, cumulating to $26.9\%$ over eleven quarters; high-schoolers show $31.3\%$, middle-schoolers $9.0\%$, and Grade 5 students no detectable change. Among college students, the post-ChatGPT divergence vanishes entirely under proctoring, ruling out broad efficiency gains as the likely explanation. Logistic fixed-effects models on randomly assigned proctored retention items yield a $25\%$ cumulative decline in odds of correct response; the same estimator on non-proctored assessment produces a large opposite-signed increase – inconsistent with any platform, cohort, or curriculum explanation. These results are among the first large-scale behavioral and outcome evidence that generative AI has altered how students study and the knowledge they build – the population-level indicator of cognitive surrender, with direct implications for educational research, assessment governance, and AI policy.

19.
arXiv (CS.LG) 2026-06-17

Informative Missingness to Generate Irregular Clinical Time Series

arXiv:2606.17106v1 Announce Type: new Abstract: Laboratory tests in electronic health records are collected irregularly, and the absence of a test order can be as informative as the measurement itself. Such missingness reflects clinicians' decisions and patient physiology, making it important to model it directly rather than treat it as a preprocessing artifact. Here we present a diffusion-based approach for generating clinical time series that jointly models laboratory values and their observation patterns using the public Data Analytics Challenge on Missing Data Imputation (DACMI) benchmark derived from MIMIC-III. To preserve realistic sampling, we align chart times into 4-hour intervals and segment admissions into 7-day windows, producing trajectories that pair each lab value with a corresponding observation indicator. Standard transformations and normalization are applied to stabilize training. Our method extends the TimeDiff framework to learn continuous lab values and discrete missingness patterns through complementary diffusion objectives. Experiments show that the generated data closely match real patient trajectories across individual lab distributions and joint value-missingness embeddings, demonstrating that diffusion models can capture clinically meaningful dependencies between patient physiology and clinicians' testing behavior under MNAR-like (missing-not-at-random) missingness. These preliminary results indicate that our model can serve as an initial component toward developing clinical foundation models. By producing synthetic priors that preserve key physiology-missingness relationships, this work motivates the subsequent training of Prior-Data Fitted Networks capable of leveraging informative missingness, which we will investigate in the extended work.

20.
arXiv (CS.AI) 2026-06-11

Generalizing Beyond Suboptimality: Offline Reinforcement Learning Learns Effective Scheduling through Random Solutions

arXiv:2509.10303v2 Announce Type: replace-cross Abstract: Online reinforcement learning (RL) approaches have demonstrated strong performance on Job Shop Scheduling (JSP) and Flexible JSP (FJSP) problems by learning scheduling policies through direct interaction with simulated environments. However, these methods often require extensive training interactions, limiting their sample efficiency and practical applicability. Motivated by this challenge, we introduce Conservative Discrete Quantile Actor-Critic (CDQAC), an offline RL algorithm that learns effective scheduling policies directly from static, suboptimal datasets. CDQAC couples a quantile-based critic with delayed policy updates to estimate the return distribution of machine-operation pairs. Extensive experiments on JSP and FJSP benchmarks demonstrate that CDQAC consistently outperforms the data-generating heuristics, surpasses state-of-the-art offline and online RL baselines, and is highly sample efficient, requiring only 1 to 5% of the original dataset to learn high-quality policies. Our analysis suggests that, in scheduling, offline RL performance is governed mainly by state-action coverage rather than the quality of individual trajectories. Scheduling couples a dense reward aligned with the makespan objective with equal-length trajectories across heuristics, enabling effective learning from a broad range of behaviors. Consistent with this observation, datasets generated by a simple random heuristic with broader coverage let it outperform policies trained on datasets produced by stronger heuristics such as Genetic Algorithms.

21.
medRxiv (Medicine) 2026-06-18

Effectiveness and Safety of Bempedoic Acid Across Clinically Relevant Subgroups: Insights from the CLEAR Taiwan Study

Background Despite available lipid-lowering therapies (LLT), many patients fail to achieve low-density lipoprotein cholesterol (LDL-C) targets. This gap persists across clinically relevant subgroups. Bempedoic acid has demonstrated effective LDL-C lowering with a favorable safety profile in the CLEAR Taiwan study; however, its effects across subgroups in Asian populations remains limited. Methods The phase IV CLEAR Taiwan study (NCT06925100) enrolled patients with inadequately controlled hypercholesterolemia who received bempedoic acid for 12 weeks in addition to background LLT. This analysis evaluated changes in lipid parameters, high-sensitivity C-reactive protein (hsCRP), and safety outcomes in clinically relevant subgroups, including cardiovascular risk, diabetes, age, statin tolerance, and sex. Results A total of 180 patients were included. Bempedoic acid achieved significant LDL-C reductions in all subgroups. Numerically greater LDL-C reductions were observed in primary prevention, statin-intolerant, younger (< 65 years), and female patients, while comparable reductions were observed across diabetes status. Reductions in non-high-density lipoprotein cholesterol, total cholesterol, and apolipoprotein B were consistent with LDL-C findings. Significant decreases in hsCRP were observed in all subgroups, with numerically greater reductions in patients aged < 65 years and those without diabetes. Bempedoic acid was well tolerated, with a low incidence of adverse events and no new safety signals identified. Changes in liver enzymes, renal function, and uric acid were minimal within subgroups. Conclusion Subgroup analyses from the CLEAR Taiwan study demonstrate consistent efficacy and safety of bempedoic acid across clinically relevant subgroups and support its use as a flexible option to address residual gaps in lipid management.

22.
arXiv (CS.CL) 2026-06-15

AdaSR: Adaptive Streaming Reasoning with Hierarchical Relative Policy Optimization

Large reasoning models typically follow a read-then-think paradigm: they observe the complete input, reason over a static context, and then produce the answer. Yet many real-world scenarios are inherently dynamic, such as audio and video stream, where information arrives as a continuous stream and models must reason, update, and respond under partial observations. Recent streaming reasoning methods allow models to think while reading, but they largely rely on supervised imitation of pre-constructed trajectories, which limits their flexibility. In this paper, we propose AdaSR, an adaptive streaming reasoning framework that enables models to reason during input streaming and perform final deliberation once the stream is complete, learning when to think, and how much computation to allocate across different stages. To optimize this hierarchical reasoning process, we introduce Hierarchical Relative Policy Optimization (HRPO), which decomposes policy optimization into streaming reasoning and deep reasoning phases, providing more fine-grained advantage assignment instead of uniformly distributing a single sequence-level advantage over all tokens. HRPO integrates format, accuracy, and adaptive thinking rewards to enforce valid reasoning protocols, preserve final task performance, and encourage latency-aware computation allocation. Experiments show that AdaSR achieves a better balance among reasoning accuracy, computational efficiency, and streaming latency compared with supervised fine-tuning baseline. We release our code at https://github.com/EIT-NLP/StreamingLLM/tree/main/AdaSR.

23.
arXiv (CS.AI) 2026-06-19

OnDeFog: Online Decision Transformer under Frame Dropping

arXiv:2606.19721v1 Announce Type: cross Abstract: In challenging real-world reinforcement learning applications, communication delays or sensor failures often cause frame dropping, in which the agent cannot receive the dropped states and associated rewards. To address the performance degradation caused by frame dropping, the Decision Transformer under Random Frame Dropping (DeFog) was developed by incorporating additional mechanisms into the decision transformer to tackle frame dropping. Although DeFog can mitigate performance degradation in frame-dropping environments, since DeFog is an offline learning method, it struggles to effectively generalize to novel states not adequately represented in the training dataset. In this study, we propose OnDeFog, which integrates the mechanisms in DeFog with the online decision transformer (ODT), an online reinforcement learning method that learns policies through direct environmental interaction. Comprehensive experimental evaluation demonstrates that our proposed OnDeFog achieves superior performance compared to ODT in environments characterized by high dropping frame rate and outperforms DeFog on datasets containing a large amount of low-reward data.

24.
arXiv (CS.CV) 2026-06-16

Decoupled Object-Centric Video Understanding for Generating Robotic Manipulation Commands

Translating video demonstrations into executable robot commands remains challenging because existing methods often fail to identify which objects are functionally involved in the demonstrated action. As a result, they may generate commands that are linguistically plausible but operationally ambiguous. We propose an object-centric video understanding framework that decouples action recognition from object identification to generate precise, grammar-free manipulation commands. Our approach integrates Temporal Shift Modules (TSM) for efficient spatio-temporal action classification with a novel Object Selection algorithm that identifies task-relevant objects through trajectory-based role classification, blur detection, and overlap minimization. The selected objects are then processed by Vision-Language Models (VLMs) for robust category recognition and zero-shot generalization. Evaluated on a modified Something-Something V2 dataset, our method achieves 86.79\% action classification accuracy and BLEU-4 scores of 0.337 on standard objects and 0.261 on novel objects. These results improve over the strongest task-specific baseline by 80.2\% and 143.9\%, respectively. Larger gains are observed in METEOR and CIDEr, reaching 157.9\% and 171.7\% on novel objects. Across all semantic metrics, our approach consistently outperforms task-specific methods and remains competitive with, or surpasses, large general-purpose VLMs while retaining a modular, object-centric design.

25.
medRxiv (Medicine) 2026-06-19

"Us with them": Co-designing a caesarean section consent and debriefing intervention in West Cameroon

Background Women-centred maternity care is a rights issue that determines the use of services. Such care ensures responsiveness to womens needs which is enacted through shared decision-making, review and response. In the West Region of Cameroon, informed consent (IC) and Debriefing for caesarean section (c-section) have been shown to be suboptimal or absent. This paper describes the participatory design of a quality-improvement hospital-based intervention. Methods From February to May 2025, we conducted a co-design process with three groups of stakeholders: 59 post c-section women and community representatives, 78 frontline c-section providers, and 29 directors of public and private hospitals. We followed four phases: planning, conducting, evaluating, and reporting. The conduct phase comprised five all-day workshops with post c-section women and community representatives, followed by five all-day workshops with the c-section providers. Finally, we held an 11th workshop with the hospital directors to scrutinize suggested interventions, evaluate their feasibility, and establish a consensus on their components. We described the intervention using the TIDieR (Template for Intervention Description and Replication) checklist. We documented the co-design process, using open-ended narratives to delineate interventions, and carried out real-time synthesis on visual aids (whiteboards and flipcharts). Intervention feasibility was quantified using a structured ad hoc matrix, while insights on facilitators and barriers were captured through qualitative free-text entries. We coupled data collection with constant comparison and triangulation through contemporaneous field notes, photographic documentation, and thematic mapping of stakeholders perceptions and interactive dynamics. Results Participants perspectives on the co-design were positive, and their motivation were very high although less than 50% reported previous involvement in co-design processes. More than 80% of participants found rated the co-design process as either good or very good. The final intervention comprised four components: (i) an in-service training; (ii) a standard operating procedure including a harmonised consent form and debriefing checklist; (ii) systematic supportive supervision, monitoring & evaluation; and (iv) a routine clinical audit. Each group of stakeholders upheld specific dimensions of the consent and debrief intervention. Post c-section women and community members emphasized emotional support, written discharge advice after debriefing, and zero tolerance of suboptimal consent and debriefing practices. Frontline c-section providers insisted on robust documentation for medico-legal protection. Hospitals Directors emphasized capacity-building and cultural friendliness. All the groups supported womans autonomous decision making. The intervention feasibility was rated high or very high by hospital directors except for the financial, infrastructural and technical domains. Conclusion This co-design process yielded a context-specific, multi-component intervention that was well accepted and deemed feasible across stakeholders. It provides a methodological approach to strengthening informed consent and debriefing as core elements of women-centred, accountable maternity care, and warrants implementation.