Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CV) 2026-06-18

VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation

Controllable image-to-video (I2V) generation transforms a reference image into a coherent video guided by user-specified control signals. While precise control over camera motion, object motion, and lighting is essential for high-fidelity creation, existing methods often treat these factors independently. This overlooks the physical coupling among viewpoint, geometry, and illumination in dynamic scenes, leading to visual inconsistencies such as mismatched shadows and perspective drift under simultaneous changes. We present VidCRAFT3, a unified and flexible I2V framework that explicitly models cross-factor interactions among geometry, motion, and illumination, enabling both independent and joint control over camera motion, object motion, and lighting direction. Image2Cloud provides explicit 3D geometric priors for accurate camera motion control. ObjMotionNet encodes sparse object trajectories into multi-scale motion features to guide realistic object motion. A Spatial Triple-Attention Transformer integrates lighting direction through lighting cross-attention for consistent relighting. To address the scarcity of jointly annotated data, we construct the VideoLightingDirection (VLD) dataset with accurate per-frame lighting direction annotations, and introduce a three-stage progressive training strategy that enables robust learning without fully joint annotations. Extensive experiments demonstrate that VidCRAFT3 achieves state-of-the-art performance in control precision and visual coherence across diverse scenarios.

02.
arXiv (quant-ph) 2026-06-11

Invariants of Sequential Circuits and Generalized Non-Abelian Statistics

arXiv:2606.11527v1 Announce Type: cross Abstract: Non-invertible symmetries in quantum many-body systems generally give rise to sequential unitary circuits that move symmetry defects. In this paper, we investigate invariants defined by sequences of such circuits, which move non-invertible defects and generate a Berry phase evaluated on quantum states with defects. We show that this Berry phase generally defines an invariant under local deformations, provided that the sequential circuits preserve the locality of those deformations. This invariant also rules out a short-range-entangled state that preserves the non-invertible symmetry, thereby signaling the 't Hooft anomaly of a non-invertible symmetry purely in terms of unitary operators acting on a state. We then apply this framework to loop excitations in three spatial dimensions and identify a new loop excitation in the (3+1)D $\mathbb{D}_4$ topological order, which we dub a non-Abelian fermionic loop. Using the invariant of sequential circuits, we characterize the statistics of non-Abelian fermionic loops. In addition, we find a new (3+1)D mixed topological order with a single non-Abelian fermionic loop, whose long-range entanglement is protected by an invariant of sequential circuits.

03.
arXiv (quant-ph) 2026-06-16

Influence of the Electron's Anomalous Magnetic Dipole Moment on High-Atomic-Number Atoms

arXiv:2606.15995v1 Announce Type: new Abstract: Super-heavy atoms ($Z > 100$) are usually studied in the context of the so-called ``Quantum Electrodynamics of Strong Fields''. In this theory the problem of the singularity in the electron energy whenever $Z > 137$ is overcome. This is done by considering the finite size of the nucleus and leads to interesting phenomena, such as the spontaneous production of positrons. Here, we show that taking into account the contribution from the Anomalous Magnetic Dipole Moment of the electron (by means of an effective theory), within a point-nucleus model, is a sufficient condition to obtain regular wave functions and physically acceptable energy values for $Z > 137$.

04.
arXiv (CS.CL) 2026-06-19

Toten: Knowledge-Based Ontological Tokenization Of Physical Quantities And Technical Notation In Brazilian Portuguese

Byte-Pair Encoding tokenization is statistically efficient for vocabulary compression, but semantically blind to structured technical entities, fragmenting physical quantities, numbers, units, and symbolic expressions into lexically arbitrary subwords. We present TOTEN, a knowledge-based ontological tokenization framework that replaces statistical derivation with declarative classification grounded in a formal ontology of engineering entities (OEE). We formalize TOTEN as the triple : the ontology gathers types, structural principles, composition relations, and preservable invariants; the classification function maps raw text into typed regions; and the instantiator family yields a self-descriptive structured representation. Robustness derives from deterministic coupling with three external oracles: Pint (dimensional), Unicode Character Database (typographic), and RSLP (Portuguese morphology). Intrinsic evaluation covers four properties verifiable by construction – ontological atomicity, dimensional equivalence, typographic robustness, and numerical reconstruction – over an internal, physically validated benchmark (EngQuant, N=800) and four Brazilian Portuguese external corpora (N=1771 eligible cases). We also report detection recall, distinguishing coverage from conditional atomicity. Against eight state-of-the-art baselines, TOTEN achieves unit ontological atomicity in all contrasts and numerical reconstruction of 0.775-0.904 on external corpora, vs. 0.627-0.703 for the best baseline (Quantulum3); on EngQuant, 0.780 vs. 0.340. Differences are statistically significant (McNemar with Holm correction). Spearman correlation between internal and external rankings confirms concurrent validity of the control benchmark. Dimensional equivalence shows statistical parity with Pint, the oracle from which the system inherits dimensional authority.

05.
arXiv (CS.CL) 2026-06-15

SuperThoughts: Reasoning Tokens in Superposition

Long Chain-of-Thought (CoT) reasoning improves LLM problem-solving but is computationally expensive due to sequential token generation. While recent works explore reasoning in continuous latent spaces to bypass discrete token generation, they often struggle with training stability and fail to scale to complex, long-horizon tasks due to lack of supervision signal. We propose SuperThoughts, which compresses pairs of consecutive CoT tokens into single latent representations and decodes two tokens per step via a lightweight Multi-Token Prediction (MTP) module. This preserves discrete token supervision at training time while doubling throughput at inference time. We finetune Qwen2.5-Math-1.5B-Instruct, Qwen2.5-Math-7B-Instruct, Qwen2.5-Math-14B-Instruct, and evaluate on MATH500, AMC, OlympiadBench, and GPQA-Diamond. With a confidence-based adaptive mechanism that falls back to standard decoding when uncertain, SuperThoughts achieves $\sim$20–30\% CoT length reduction while maintaining accuracy with minimal degradation (1-2 points accuracy drop on most tasks).

06.
arXiv (CS.CL) 2026-06-18

ScholaWrite: A Dataset of End-to-End Scholarly Writing Process

Writing is a cognitively demanding activity that requires constant decision-making, heavy reliance on working memory, and frequent shifts between tasks of different goals. To build writing assistants that truly align with writers' cognition, we must capture and decode the complete thought process behind how writers transform ideas into final texts. We present ScholaWrite, the first dataset of end-to-end scholarly writing, tracing the multi-month journey from initial drafts to final manuscripts. We contribute three key advances: (1) a Chrome extension that unobtrusively records keystrokes on Overleaf, enabling the collection of realistic, in-situ writing data; (2) a novel corpus of full scholarly manuscripts, enriched with fine-grained annotations of cognitive writing intentions. The dataset includes \LaTeX-based edits from five computer science preprints, capturing nearly 62K text changes over four months; and (3) analyses and insights into the micro-dynamics of scholarly writing, highlighting gaps between human writing processes and the current capabilities of large language models (LLMs) in providing meaningful assistance. ScholaWrite underscores the value of capturing end-to-end writing data to develop future writing assistants that support, not replace, the cognitive work of scientists.

07.
medRxiv (Medicine) 2026-06-22

MinderCare: protocol for a mixed-methods evaluation of a digitally enabled dementia care service.

Introduction and aims Dementia is a growing public health challenge affecting millions of people worldwide. It is a progressive condition that increases the risk of infections, falls, hospital admissions, dependence in activities of daily living, safety issues such as wandering, care home transfers, and death. New ways of supporting people living with dementia (PLWD) at home are urgently needed. We describe the MinderCare study which evaluates a digitally enabled care model that integrates low-burden sensor-based remote monitoring within a nurse-led clinical service. Methods and analysis In this mixed-methods study, we will recruit 100 people with confirmed or suspected dementia living at home and deploy the Minder remote monitoring system for at least 12 months. A detailed characterisation of the cohort will be obtained, including cognition, frailty, participant and carer wellbeing, functioning, and quality of life. The feasibility, acceptability, sustainability, and resource requirements of the service will also be assessed. Low-cost sensors provide information about behaviour, environment and physiology from the home. Machine-learning algorithms have been used to develop digital biomarkers of infection, sleep, night-time behaviours, daily activities and routines, and the effects of clinical events and treatment. These will be assessed through clinical reports of sensor-derived data that include anomaly alerts provided to the clinical teams. Algorithms will be assessed for their clinical utility and acceptability. The comparative-effectiveness component will be designed as a target trial emulation using linked electronic health-record data to construct a time-indexed external usual-care control cohort. The primary comparative outcome will be Days Alive and Out of Hospital (DAOH) over 12 months from the activation-index date, with healthcare utilisation, costs, institutionalisation and mortality assessed as secondary outcomes. DAOH and estimated MinderCare effects will also be examined across prespecified strata of baseline inpatient utilisation. Ethics and dissemination Ethical approval has been granted by the North East Newcastle and North Tyneside 2 Research Ethics Committee, and the study has received confirmation of capacity and capability by the Imperial College Healthcare NHS Trust. Study findings will be disseminated to patients, health and social care professionals, and policymakers through peer-reviewed publications and conference presentations. Study registration number: ISRCTN14997677 and NIHR portfolio CPMSID 63023.

08.
arXiv (CS.CL) 2026-06-24

Meet UD_Czech-PDTC: A Large and Genre-Rich Treebank in Universal Dependencies

Czech has been part of Universal Dependencies since its first release in 2015. It has also been one of the best represented languages, with the Prague Dependency Treebank being order of magnitude larger than most other UD treebanks. More recently, three other datasets from the Prague family were added and the annotations thoroughly revisited, forming the "Prague Dependency Treebank-Consolidated" (PDT-C). In comparison to the original PDT, PDT-C is more than twice as large, but it is also much more diverse in terms of genres and domains. In this paper, we describe the conversion of the new resource to Universal Dependencies. While the two annotation schemes are relatively similar at the first sight, there are numerous small differences in topology of the dependency structures and in granularity of the POS and relation type inventories. We demonstrate a selection of such differences on examples, discuss the diverging motivations, as well as ways to overcome the differences during conversion. We argue that while PDT is less "universal" and more tightly bound to one language, its multi-layer annotation is rich and provides all information needed for basic UD trees, and much more.

09.
medRxiv (Medicine) 2026-06-10

Longitudinal brain structural changes during clozapine treatment: associations with neuroreceptor architecture and clinical response

In treatment-resistant schizophrenia, clozapine treatment has been associated with longitudinal reductions in subcortical volumes, ventricular enlargement, and widespread cortical thinning. However, it is unknown how these structural changes relate to clozapines pharmacological profile and clinical efficacy. We combined five longitudinal datasets with MRI acquired before and on average 5 months after clozapine initiation in 143 individuals to quantify brain structural changes and their association with normative maps relating to neuroreceptor architecture and physiological systems, and improvement in symptom severity. Clozapine treatment was associated with grey matter volume reductions across multiple subcortical regions (including the amygdala, hippocampus, thalamus, caudate, putamen and nucleus accumbens), increases in pallidal volume, ventricular enlargement, and widespread cortical thinning. Cortical regions showing the greatest magnitude of thinning corresponded to areas with higher normative densities of serotonergic 5-HT1A, 5-HT2A and 5-HT4 receptors. Changes in subcortical volume or cortical thickness during clozapine treatment were not associated with changes in total or positive symptom severity. In addition, baseline subcortical volume, cortical thickness, or gyrification prior to starting clozapine did not predict subsequent symptom improvement. Cortical thinning may partly reflect clozapines activity at serotonergic receptors, which have been implicated in cortical network stabilisation and neuroplasticity, however structural remodelling during clozapine treatment may reflect a process independent from its clinical efficacy in improving core symptoms of psychosis.

10.
arXiv (CS.LG) 2026-06-17

MGUP: A Momentum-Gradient Alignment Update Policy for Stochastic Optimization

arXiv:2606.17526v1 Announce Type: new Abstract: Efficient optimization is essential for training large language models. Although intra-layer selective updates have been explored, a general mechanism that enables fine-grained control while ensuring convergence guarantees is still lacking. To bridge this gap, we propose MGUP, a novel mechanism for selective updates. MGUP augments standard momentum-based optimizers by applying larger step-sizes to a selected fixed proportion of parameters in each iteration, while applying smaller, non-zero step-sizes to the rest. As a nearly {plug-and-play} module, MGUP seamlessly integrates with optimizers such as AdamW, Lion, and Muon. This yields powerful variants such as MGUP-AdamW, MGUP-Lion, and MGUP-Muon. Under standard assumptions, we provide theoretical convergence guarantees for MGUP-AdamW (without weight decay) in stochastic optimization. Extensive experiments across diverse tasks, including MAE pretraining, LLM pretraining, and downstream fine-tuning, demonstrate that our MGUP-enhanced optimizers achieve superior or more stable performance compared to their original base optimizers. We offer a principled, versatile, and theoretically grounded strategy for efficient intra-layer selective updates, accelerating and stabilizing the training of large-scale models. The code is publicly available at https://github.com/MaeChd/MGUP.

11.
arXiv (CS.LG) 2026-06-12

Robustness Verification of Recurrent Neural Networks with Abstraction Refinement

arXiv:2606.12490v1 Announce Type: new Abstract: Certified local robustness verification for recurrent neural networks (RNNs) is challenging because approximation errors introduced by nonlinear relaxations can propagate through recurrent connections and accumulate over time. As a result, scalable linear bound propagation methods often become overly conservative and fail to certify inputs that are in fact robust, especially when many pre-activation intervals cross zero. We propose an abstraction-refinement framework for RNN verification that partitions such intervals to remove the dominant relaxation error: on each refined branch, ReLU becomes exact, and smooth activations such as tanh and sigmoid admit substantially tighter linear envelopes. To control the combinatorial cost of splitting in long sequences, we introduce a SHAP-guided timestep selection strategy that ranks hidden states by their contribution to the verification objective and refines only the most critical timesteps in temporal order. Experiments on CIFAR10 and MNIST stroke benchmarks demonstrate consistent improvements in verification success and robustness-margin tightness over abstraction-only baselines, while exposing clear runtime trade-offs between ReLU and tanh models.

12.
arXiv (quant-ph) 2026-06-16

Dressed Floquet scars from protected zero modes in a Rydberg chain

arXiv:2606.15605v1 Announce Type: cross Abstract: In this Letter, we present an approximate analytic construction of two zero quasienergy quantum many-body scars in a periodically driven model of Rydberg atoms on a ring, which persist over a range of driving amplitudes and frequencies for finite sizes. An index theorem protects an exponentially large number (in system size) of exact zero energy modes of the Floquet Hamiltonian in this setting. Unlike most of these zero modes which continuously change with drive parameters, these two quantum many-body scars retain the memory of particular states. They can be expressed as {\it dressed versions} of two contrasting states, the Rydberg vacuum and a unitarily rotated variant of a volume-law scar [Ivanov and Motrunich, Phys. Rev. Lett. {\bf 134}, 050403 (2025)], respectively. We provide an analytic understanding of their existence using a Floquet perturbation theory and show their resilience beyond the perturbative regime using exact diagonalization in finite systems. Our study provides insight into the structure of protected zero modes in interacting Floquet settings.

13.
arXiv (CS.CV) 2026-06-18

GUMP-Net: An interpretable model-data-driven intelligent algorithm for multi-class pelvic segmentation

Pelvic segmentation is one of the most important and fundamental research problems in precise and intelligent diagnosis and treatment, as well as surgical planning and navigation for pelvic fractures. By combining an improved geodesic active contour model with deep neural networks, we propose GUMP-Net, an interpretable model-data-driven intelligent algorithm for multi-class pelvic segmentation, in which three network modules are designed to constitute the overall segmentation framework together: the object detection module for automatic level set initialization, the edge detector module for learning an anatomy-aware edge detector function and the iteration module for deep level set evolution. Leveraging the advantages of level set representation and deep learning, GUMP-Net shows more accurate, robust and consistent segmentation performance, especially in small training data situation, compared to the state-of-the-art methods. Extensive experiments on pelvic datasets demonstrate the rationality and effectiveness of the proposed algorithm. Further experiments extended to ankle dataset indicate broader applications to other anatomies. The proposed algorithm not only provides an efficient segmentation method for complex fracture reduction, but also gives an interpretable geometric perspective for understanding deep learning segmentation.

14.
arXiv (CS.CV) 2026-06-19

Timage: A Generative Text-in-Image Paradigm for Fine-Tuning Vision-Language Models

Multimodal Large Language Models (MLLMs) often lose track of the right image regions during fine-grained spatial reasoning, because a textual query rarely carries any explicit geometric anchor into the pixel domain. Prevailing remedies either rewire the model's weights or pad the prompt with verbose instructions, yet neither reliably pins the language to the correct visual coordinates without eroding the backbone's general competence. We introduce Timage, a paradigm that recasts multimodal understanding as an alignment problem solved at the input: the query is drawn, as a typeset overlay, onto the image itself. The placement and appearance of this overlay are produced by a Constrained Schrödinger Bridge (cSB), an entropic optimal-transport sampler that factorizes layout synthesis into two coupled stochastic stages. The first stage, Region Search, transports noise toward query-aligned image zones while obeying a hard occlusion barrier that protects salient foreground content; the second stage, Appearance Shaping, sizes the glyphs through an ``ink-budget'' regularizer so that the rendered text stays legible and visually balanced. The resulting overlay behaves as an explicit attention beacon that channels the model's focus along spatial semantics. On the VMCBench suite, Timage paired with a modest 7B backbone clearly overtakes far larger proprietary systems as well as parameter-tuned baselines. The study positions deliberate input reconstruction as a powerful, architecture-neutral lever for strengthening multimodal reasoning.

15.
arXiv (quant-ph) 2026-06-12

Quantum Otto engine powered by an anisotropic Heisenberg XYZ model under independent local magnetic fields

arXiv:2606.12877v1 Announce Type: new Abstract: We study a quantum Otto heat engine whose working substance is an anisotropic two-qubit Heisenberg XYZ model. Independent local magnetic fields are used to control each spin individually. The influence of the longitudinal coupling, anisotropy, transverse coupling, and local fields on the net work output and efficiency is systematically examined. Reducing the longitudinal coupling is found to markedly improve both the maximum work and the peak efficiency. The engine performance reaches an optimum at a particular value of the anisotropy parameter. A local work analysis clarifies how work is produced during the cycle. Because of the asymmetric local fields and the intrinsic spin-spin interaction, the two qubits play markedly different thermodynamic roles; the interaction term itself contributes crucially to the total work. We further analyze the variation of quantum entanglement, quantified by concurrence, along the cycle. The results indicate that a pronounced change in entanglement between the hot and cold isomagnetic strokes is closely correlated with the efficiency enhancement. This work offers new insight into the operating principles and control of quantum Otto heat engines.

16.
arXiv (quant-ph) 2026-06-24

Quantum Adaptive Self-Attention for Quantum Transformer Models

arXiv:2504.05336v4 Announce Type: replace Abstract: A recurring weakness in quantum machine learning (QML) is that reported ``quantum advantages'' are seldom tested against a capacity-matched classical control, leaving it unclear whether a gain comes from the quantum substrate or from the architectural change that accompanies it. Our primary contribution is methodological: a protocol for attributing such gains honestly – a capacity-matched classical bottleneck of identical parameter budget, transparent reporting of where quantum does not help, and validation on real quantum hardware – which we develop and apply through a concrete case study. That case study is Quantum Adaptive Self-Attention (QASA), a hybrid Transformer that replaces the value projection of a single encoder layer with a 36-parameter parameterized quantum circuit (PQC), keeping all other layers classical. Across nine synthetic benchmarks and the real-world ETTh1 dataset, QASA improves on a full-capacity classical Transformer for chaotic and trend-dominated signals. To ask whether this is a genuinely quantum effect, we introduce a control rarely applied in quantum machine learning – a capacity-matched classical bottleneck with the same parameter budget – and find that it matches the PQC on the error metrics. The gain is therefore attributable to the low-rank value-projection bottleneck (an architectural parsimony principle), not to quantumness; adding further quantum layers only degrades performance and trainability. We accordingly position the quantum layer not as a source of accuracy advantage but as a competitive instantiation of this principle: its low-rank compression onto the signal's intrinsic dimensionality is matched by a classical bottleneck, so the gain is architectural rather than quantum.

17.
arXiv (CS.AI) 2026-06-12

Learning What to Remember: A Cognitively Grounded Multi-Factor Value Model for Agentic Memory

arXiv:2606.12945v1 Announce Type: new Abstract: Long-running LLM agents accumulate interaction histories far larger than any context window, forcing a standing decision: what to encode deeply, what to forget, and what to retrieve under a fixed memory budget. Production systems answer with semantic similarity or recency – both mis-specified for the forgetting decision, which is made at consolidation time before the future query is known. We propose a multi-factor memory value function V(m)=\sum_i w_i f_i(m) over seven interpretable factors (emotional intensity, goal relevance, value alignment, self/user relevance, task utility, reliability, and usage history) drawn from cognitive psychology, whose weights are learned from a downstream objective by a gradient-free optimiser, and whose single scalar uniformly controls encoding depth, forget risk, and retrieval rank. We make a methodological point: on LongMemEval, scoring goal relevance against the held-out evaluation question saturates gold-evidence retention at \approx 0.98 – this measures retrieval, not forgetting. In the realistic blind regime, a learned multi-factor value retains 0.770 \pm 0.011 of gold evidence across 479 usable cases, versus 0.657 for uniform weights, 0.518 for the best single factor, and 0.368 for recency; every paired gap's 95% bootstrap CI is above zero, and a neural network over the same factors ties the linear model. The learned weights are interpretable – reliability, emotional intensity, and self/user relevance dominate, while query-time goal similarity is correctly down-weighted for the forgetting decision. A controlled synthetic task with planted confounds confirms the learner recovers a separating weighting (1.00 retention) where uniform weighting fails (0.62). The substrate is open-source; all experiments run on a single CPU with no API calls.

18.
arXiv (CS.CV) 2026-06-11

Frozen Multimodal Embeddings for Personality and Cognitive Ability Assessment in Asynchronous Video Interviews

Predicting psychological traits from asynchronous video interviews (AVIs) is a challenging multimodal learning problem because labeled datasets are limited while each response contains high-dimensional visual, acoustic, and verbal signals. This paper presents our solution for the ACM Multimedia AVI Challenge 2026, which evaluates two tasks: Track~1 predicts self-reported HEXACO personality traits from personality-related interview responses, and Track~2 classifies cognitive ability levels from structured AVI responses. We treat the problem as a small-sample representation learning task. Instead of fine-tuning large pretrained models, we use frozen multimodal encoders, including CLIP for visual features, Whisper for acoustic features and transcripts, and RoBERTa, E5, and DeBERTaV3 for textual representations, followed by low-capacity downstream models. For Track~1, our trait-specific regression and late-fusion system achieves an average validation MSE of 0.2696, improving over the official baseline of 0.3334. Ablation results show a three-step improvement from a global model (0.3189), to per-trait modeling (0.2871), to per-trait late fusion (0.2696), corresponding to a 19.1\% relative MSE reduction over the official baseline. For Track~2, a compact subject-attribute baseline reaches 0.5781 accuracy, while our multimodal ensemble reaches 0.5313, both above the official baseline of 0.4062. We interpret this result as evidence of possible subject-attribute shortcuts in the validation split rather than robust cognitive inference from AVI content. Overall, our findings suggest that AVI-based psychological assessment benefits from trait-specific multimodal modeling, but cognitive ability prediction requires careful control of dataset shortcuts.

19.
arXiv (quant-ph) 2026-06-24

Ultra-Low-Rate Information Reconciliation: Repetition Coding or Dedicated Codes?

arXiv:2606.23726v1 Announce Type: new Abstract: We compare repetition-based ultra-low-rate information reconciliation with dedicated ultra-low-rate codes for CV-QKD. Repetition coding offers a favorable performance-complexity trade-off, incurring only a moderate error-rate penalty while reducing decoding complexity by $2\times$, making it attractive for implementation-constrained systems.

20.
arXiv (CS.AI) 2026-06-12

Agentic Large Language Models for Automated Structural Analysis of 3D Frame Systems

arXiv:2606.06525v2 Announce Type: replace-cross Abstract: Large language models (LLMs) have emerged as powerful foundation models with strong reasoning capabilities across domains. Beyond reactive text generation, agentic LLMs enable autonomous workflow execution through modular task decomposition and coordinated tool use. In structural engineering, recent efforts have developed agentic LLMs for automated analysis of plane frames. However, their extension to 3D frames remains underexplored due to challenges in irregular geometric representation, topological consistency, and long-horizon reasoning. This paper proposes an agentic LLM framework for automated structural analysis of 3D frames from natural language inputs. Irregular 3D frames are represented by projection onto a 2D plan, where orthogonal gridlines define spatial coordinates and a matrix of number of stories encodes vertical extrusion of each grid cell. Building on this representation, the framework establishes a multi-agent pipeline: a problem analysis agent parses input into structured JSON; a floor decomposition agent derives the spatial layout of each floor; the 3D geometry is assembled by node, girder, slab, and column agents; support and load agents assign boundary and loading conditions, and code translation agents generate executable SAP2000 script. Evaluated on ten representative 3D frames, the proposed framework achieves an average accuracy of 90% across repeated trials, demonstrating consistent and reliable performance.

21.
arXiv (CS.AI) 2026-06-19

CareTransition-Audit: A Benchmark to Audit Discharge Summaries for Efficient Care Transitions

arXiv:2604.05435v2 Announce Type: replace Abstract: Incomplete or inconsistent discharge documentation drives care fragmentation and avoidable readmissions. Despite its critical role in patient safety, auditing discharge summaries relies on manual review and does not scale. We propose an automated framework for auditing discharge summaries using large language models (LLMs). Our approach operationalizes the DISCHARGED framework into a checklist of 46 questions. Using 50 summaries from the MIMIC-IV database, with clinician ground-truth labels, we benchmark 11 LLMs. Model-assessed mean documentation completeness ranges from 54.9% to 74.2%, and the best-performing models achieve a Cohen's kappa values around 0.5 against clinician labels, indicating moderate agreement. All models struggle to identify ambiguous documentation (Unclear), highlighting a key gap in current automated auditing. This work provides a clinician-validated benchmark and zero-shot baselines for systematic quality improvement in clinical documentation.

22.
arXiv (CS.CV) 2026-06-11

Metadata-Aware Multi-Prompt Reasoning for Zero-Shot Accident Understanding

In this paper, we address the problem of zero-shot understanding of accidents from surveillance videos by identifying when an impact event occurs, what type of impact it is, and where in the frame it occurs using natural language. We propose a three-stage pipeline that decomposes the accident understanding into when, what, and where. The first stage extracts a short temporal window around the impact using vision-language similarity. In the second stage, we perform metadata-driven multi-prompt reasoning with five complementary views (baseline, motion, geometry, contrast, and tiebreaker) and resolve disagreement via an entropy-gated pairwise adjudicator. Finally, we localize the impact of an open-vocabulary detector queried on the predicted accident type and scene layout, and aggregate detections across keyframes using a score-weighted centroid. Our pipeline achieves a substantial improvement in the harmonic-mean score over a centre-of-frame baseline on the zero-shot ACCIDENT @ CVPR benchmark. We show that decomposing zero-shot video understanding into temporal localization, semantic classification, and spatial grounding enable more reliable reasoning with vision-language models than direct prompting alone.

23.
arXiv (quant-ph) 2026-06-16

Exact Many-body Quantum Dynamics in One-Dimensional Baths via Collective Spins

arXiv:2505.00588v2 Announce Type: replace Abstract: Computing the exact dynamics of many-body quantum systems becomes intractable as system size grows. Here, we present a symmetry-based method that provides an exponential reduction in the complexity of a broad class of such problems $\unicode{x2014}$ qubits coupled to one-dimensional electromagnetic baths. We identify conditions under which partial permutational symmetry emerges and exploit it to group qubits into collective multi-level degrees of freedom, which we term ''superspins.'' These superspins obey a generalized angular momentum algebra, reducing the relevant Hilbert space dimension from exponential to polynomial. Using this framework, we efficiently compute many-body superradiant dynamics in large arrays of qubits coupled to waveguides and ring resonators, showing that $\unicode{x2014}$ unlike in conventional Dicke superradiance $\unicode{x2014}$ the total spin length is not conserved. At long times, dark states become populated. We identify configurations where these states exhibit metrologically useful entanglement. Our approach enables exact treatment of complex dissipative dynamics beyond the fully symmetric limit and provides a rigorous benchmark for approximate numerical methods.

24.
arXiv (CS.LG) 2026-06-12

Scale Buys Interpolation, Structure Buys a Horizon: Certified Predictability for Equivariant World Models

Authors:

arXiv:2606.13092v1 Announce Type: new Abstract: Scale buys interpolation; structure buys a certified horizon. A world model's average error says nothing about whether a particular prediction can be trusted, or for how long. For equivariant latent world models we give a computable, multi-step certificate of the predictable horizon: $T$-step rollout error is provably constant over each symmetry orbit (Theorem A) and stratified channel-by-channel by the predictor's Lyapunov spectrum, $T_j(\epsilon)\sim\log(1/\epsilon)/\lambda_j$. The horizon is two-sided – a matching lower bound makes approximate equivariance provably horizon-limited – and the certificate is exclusive to structure: orbit-constant error characterizes equivariance, so no non-equivariant model has it at any scale. Empirically, on 40-D Lorenz-96 only a $\mathbb{Z}_N$-equivariant network recovers the full Lyapunov spectrum ($R^2{=}0.98$); dense and recurrent baselines fail. Because the spectrum is faithful, the certificate acts, a priori: under a fixed sensing budget a $c\times$-inflated certificate provably needs $c\times$ the budget, and the equivariant certificate meets a budget its inflated dense counterpart cannot – with zero calibration data. The same read-out, unchanged, audits public pretrained world models training-free: TD-MPC2 checkpoints land on the certificate's own scope taxonomy – calibrated where strongly expansive (ratio 0.94-1.02), optimistic where weakly expansive, correctly abstaining where contracting – a map a deployed monitor replicates cell-by-cell, out-of-sample. Across the official 1M-317M multitask ladder, calibration does not improve with parameters. On V-JEPA 2-AC (1B, real robot data) the measured cross-check correctly overrides an over-promising tangent spectrum – the cross-validated audit, not the raw number, is the deployable object. Scale buys interpolation, not a calibrated horizon.

25.
arXiv (CS.AI) 2026-06-15

Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals

arXiv:2601.19810v2 Announce Type: replace-cross Abstract: Unsupervised pre-training can equip reinforcement learning agents with prior knowledge and accelerate learning in downstream tasks. A promising direction, grounded in human development, investigates agents that learn by setting and pursuing their own goals. The core challenge lies in how to effectively generate, select, and learn from such goals. Our focus is on broad distributions of downstream tasks where solving every task zero-shot is infeasible. Such settings naturally arise when the target tasks lie outside of the pre-training distribution or when their identities are unknown to the agent. In this work, we (i) optimize for efficient multi-episode exploration and adaptation within a meta-learning framework, and (ii) guide the training curriculum with evolving estimates of the agent's post-adaptation performance. We present ULEE, an unsupervised meta-learning method that combines an in-context learner with an adversarial goal-generation strategy that maintains training at the frontier of the agent's capabilities. On XLand-MiniGrid benchmarks, ULEE pre-training yields improved exploration and adaptation abilities that generalize to novel objectives, environment dynamics, and map structures. The resulting policy attains improved zero-shot and few-shot performance, and provides a strong initialization for longer fine-tuning processes. It outperforms learning from scratch, DIAYN pre-training, and alternative curricula. Code is available at: https://github.com/Octavio-Pappalardo/ulee-jax