Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-16

Polynomial-Time Mistake-Bounded Language Generation

arXiv:2606.16077v1 Announce Type: cross Abstract: In this note, we introduce a polynomial-time version of the mistake-bounded language generation (MBLG) framework due to Kleinberg, Peale, and Reingold (2026). We observe that the family of parities of variables, and the family of conjunctions of literals, are polynomial-time MBLG. Our main result states that the family of monotone Boolean functions with polynomially-many maxterms is polynomial-time MBLG. This family includes all monotone Boolean functions, computable by polynomial-size decision trees. Our technique can be presented as a new combinatorial game about writing numbers on a board.

02.
arXiv (CS.CL) 2026-06-11

Detecting Sensitive Personal Information in Japanese Pre-Training Corpora for Large Language Models

Sensitive personal information can appear in large-scale pre-training corpora for large language models (LLMs). Detecting and filtering such information is therefore essential to ensure compliance with privacy regulations and prevent unintended information leakage. However, in contrast to English and other languages, research into sensitive personal information has been limited in the Japanese language. In this study, we focus on sensitive personal data defined as special care-required personal information (SCPI) under Japan's Act on the Protection of Personal Information (APPI). We construct an SCPI dataset using LLM-based annotation and train machine learning models to rapidly detect SCPI in text. As a result, our SCPI classifier can effectively identify information related to SCPI. This study is the first to explore SCPI detection in Japanese text corpora, highlighting the challenges of accurate detection.

03.
medRxiv (Medicine) 2026-06-15

Filum Terminale Diameter on Routine Pediatric MRI: A Large-Cohort Clinical Reference in 3,406 Children and the Age-Dependent Meaning of the 2-mm Thickened-Filum Threshold

Background. A filum diameter >2 mm is the conventional MRI threshold for a thickened filum, but it derives from small, mostly adult series showing no age dependence; whether one cutoff suits all of childhood is untested. Objective. To build an age-specific filum-diameter reference on routine pediatric MRI and test, adjusting for image resolution, whether the 2-mm threshold is age-stationary. Materials and methods. In this retrospective study an nnU-Net tracer measured the maximal filum diameter on consecutive lumbosacral MRI; versus manual tracing it showed negligible bias but moderate single-measure agreement. After excluding report-confirmed fatty filum, lipoma, or tethered cord, the proportion >2 mm was analysed within one acquisition protocol and by logistic regression adjusting for voxel size and slice thickness. Results. Of 7,245 examinations, 3,869 (53%) were traceable; untraced ones were younger (median 0.75 vs 2.0 years). The presumed-normal cohort had median diameter 1.48 mm. At matched resolution, 2 mm marked the 94th percentile in infants (5.6% exceeded it) but the 83rd by 3-6 years (17.4%); the age effect persisted after adjusting for voxel size and slice thickness (3-6 years vs infants, adjusted OR 4.7; P < .001). Conclusion. Filum diameter clusters near 1.5 mm, and the fixed 2-mm cutoff flags ~5% of infants but ~17% of preschoolers. Caliber should be judged against an age-specific clinical reference, not one fixed cutoff; a thick filum is not itself a diagnosis of tethered cord.

04.
arXiv (CS.CV) 2026-06-11

Cross-Modal Benchmarking for Robotic Perception in Natural Environments

Natural environments present a complex challenge to robotics perception systems. Current models, particularly vision foundation models, are largely trained on structured, urban environments leading to weaknesses in their perception for field robotics tasks. We showcase the limitations of current models using our recently released WildCross benchmark, a new cross-modal benchmark for place recognition and metric depth estimation in large-scale natural environments. WildCross comprises over 476K sequential RGB frames with semi-dense depth and surface normal annotations, each aligned with accurate 6DoF pose and synchronized dense lidar submaps. In this work, we provide an expanded analysis of the benchmark results from the recent WildCross benchmark, with particular emphasis on expanded metric depth estimation experiments. Access to the code repository and dataset for this work can be found at https://csiro-robotics.github.io/WildCross.

05.
medRxiv (Medicine) 2026-06-16

High-Risk Anti-Seizure Medication Use in Childbearing-Age People with Epilepsy in a Taenia solium Endemic Region

Background: People of childbearing potential with epilepsy in regions endemic for Taenia solium, where neurocysticercosis (NCC) is highly prevalent, represent a vulnerable population due to the elevated burden of epilepsy and resource limitations. Clinical practice in these settings remains poorly characterized. This study characterized anti-seizure medication (ASM) prescribing patterns by medication risk profiles among people of childbearing potential with epilepsy in Northern Peru, a region highly endemic for T. solium. Methods: Participants were drawn from a prospective, population-based epilepsy cohort in Tumbes, Peru (2006 to 2020). The analytic population included females with epilepsy aged 15 to 49 years. The primary outcome was pregnancy-associated ASM risk of congenital malformations and adverse neurodevelopmental outcomes. ASMs were classified as ''Established Low Risk'' (lamotrigine, levetiracetam), ''Possible Risk/Inadequate Data'' (carbamazepine, phenobarbital, phenytoin), and ''Established High Risk'' (valproic acid). Prescription patterns were examined in relation to demographic and clinical characteristics. Results: Among 1,975 individuals with epilepsy, 685 were people of childbearing potential. Approximately 34.9% met criteria for probable or definite NCC. Most ASM prescriptions were in the ''Possible Risk/Inadequate Data'' category (87.0%), and 12.8% received ''Established High Risk'' medications. In multivariable analysis, high-risk prescribing was associated with prior ASM use and polytherapy. Discussion: People of childbearing potential with epilepsy were predominantly treated with carbamazepine, phenytoin, phenobarbital, and valproate, reflecting local ASM availability. Despite evidence supporting lamotrigine and levetiracetam in pregnancy, prescribing patterns reflect local formulary constraints. These findings highlight a gap between guideline recommendations and real-world prescribing in resource-limited settings, underscoring the need for context-specific treatment strategies.

06.
arXiv (quant-ph) 2026-06-12

Proper and improper mixed states serve as different prior beliefs for quantum state retrodiction

arXiv:2502.10030v2 Announce Type: replace Abstract: A mixed quantum state can be taken as capturing an unspecified form of ignorance; or as describing the lack of knowledge about the true pure state of the system ("proper mixture"); or as arising from entanglement with another system that has been disregarded ("improper mixture"). These different views yield identical density matrices and therefore identical predictions for future measurements. But when used as prior beliefs for inferring the past state from later observations ("retrodiction"), they lead to different updated beliefs. This is a purely quantum feature of Bayesian agency. Based on this observation, we establish a framework for retrodicting on any quantum belief and we prove a necessary and sufficient condition for the equivalence of beliefs. We also illustrate how these differences have operational consequences in quantum state recovery.

07.
arXiv (CS.AI) 2026-06-19

The Scaffold Effect: How Prompt Framing Drives Apparent Multimodal Gains in Clinical VLM Evaluation

arXiv:2603.28387v2 Announce Type: replace Abstract: Trustworthy clinical AI requires that performance gains reflect genuine evidence integration rather than surface-level artifacts. We evaluate 12 open-weight vision-language models (VLMs) on binary classification across two clinical neuroimaging cohorts, \textsc{FOR2107} (affective disorders) and \textsc{OASIS-3} (cognitive decline). Both datasets come with structural MRI data that carries no reliable individual-level diagnostic signal. Under these conditions, smaller VLMs exhibit gains of up to 58\% F1 upon introduction of neuroimaging context, with distilled models becoming competitive with counterparts an order of magnitude larger. A contrastive confidence analysis reveals that merely mentioning MRI availability in the task prompt accounts for 70-80\% of this shift, independent of whether imaging data is present, a domain-specific instance of modality collapse we term the scaffold effect. Expert evaluation reveals fabrication of neuroimaging-grounded justifications across all conditions, and preference alignment, while eliminating MRI-referencing behavior, collapses both conditions toward random baseline. Our findings demonstrate that surface evaluations are inadequate indicators of multimodal reasoning, with direct implications for the deployment of VLMs in clinical settings.

08.
arXiv (CS.AI) 2026-06-11

A New Perspective on Precision and Recall for Generative Models

arXiv:2511.02414v3 Announce Type: replace Abstract: With the recent success of generative models in image and text, the question of their evaluation has recently gained a lot of attention. While most methods from the state of the art rely on scalar metrics, the introduction of Precision and Recall (PR) for generative model has opened up a new avenue of research. The associated PR curve allows for a richer analysis, but their estimation poses several challenges. In this paper, we present a new framework for estimating entire PR curves based on a binary classification standpoint. We conduct a thorough statistical analysis of the proposed estimates. As a byproduct, we obtain a minimax upper bound on the PR estimation risk. We also show that our framework extends several landmark PR metrics of the literature which by design are restrained to the extreme values of the curve. Finally, we study the different behaviors of the curves obtained experimentally in various settings.

09.
arXiv (CS.AI) 2026-06-19

MEAL: A Benchmark for Continual Multi-Agent Reinforcement Learning

arXiv:2506.14990v3 Announce Type: replace Abstract: Benchmarks play a central role in reinforcement learning (RL) research, yet their computational constraints often shape what is studied. Despite the motivation of lifelong learning, most continual RL papers consider only 3-10 sequential tasks, as CPU-bound environments make longer sequences impractical. Meanwhile, continual learning in cooperative multi-agent settings remains largely unexplored. To address these gaps, we introduce MEAL (Multi-agent Environments for Adaptive Learning), the first benchmark for continual multi-agent RL. By leveraging JAX and GPU acceleration, MEAL enables training on sequences of 100 tasks in a few hours on a single GPU. We find that long task sequences reveal failure modes that do not appear at smaller scales.

10.
arXiv (CS.CL) 2026-06-18

Improve Large Language Model Systems with User Logs

Scaling training data and model parameters has long driven progress in large language models (LLMs), but this paradigm is increasingly constrained by the scarcity of high-quality data and diminishing returns from rising computational costs. As a result, recent work is increasing the focus on continual learning from real-world deployment, where user interaction logs provide a rich source of authentic human feedback and procedural knowledge. However, learning from user logs is challenging due to their unstructured and noisy nature. Vanilla LLM systems often struggle to distinguish useful feedback signals from noisy user behavior, and the disparity between user log collection and model optimization (e.g., the off-policy optimization problem) further strengthens the problem. To this end, we propose UNO (User log-driveN Optimization), a unified framework for improving LLM systems (LLMsys) with user logs. UNO first distills logs into semi-structured rules and preference pairs, then employs query-and-feedback-driven clustering to manage data heterogeneity, and finally quantifies the cognitive gap between the model's prior knowledge and the log data. This assessment guides the LLMsys to adaptively filter out noisy feedback and construct different modules for primary and reflective experiences extracted from user logs, thereby improving future responses. Extensive experiments show that UNO achieves state-of-the-art effectiveness and efficiency, significantly outperforming Retrieval Augmented Generation (RAG) and memory-based baselines. We have open-sourced our code at https://github.com/bebr2/UNO .

11.
arXiv (CS.CL) 2026-06-15

Achieving Precise Text-To-Cypher Via Grounded Knowledge Graph Data Generation

Property Graphs are rapidly being adopted as database frameworks for representing heterogeneous data sources. To enable precise access to the information contained in them we need conversational interfaces based on Text-To-Cypher (Text2Cypher) parsers. This paper presents an automatic synthetic data generation method that can be leveraged to fine-tune small LLMs for this task. We conduct experiments on all the major Text-To-Cypher benchmarks, demonstrating that with our synthetic data generation approach we can significantly increase the performance of small LLMs, allowing them to compete with much larger proprietary models. This means that in settings in which models must be locally deployed we can ensure data-sovereignty without sacrificing accuracy and without costly annotation campaigns.

12.
arXiv (math.PR) 2026-06-18

Formation of clusters and coarsening in weakly interacting diffusions

arXiv:2510.17629v3 Announce Type: replace-cross Abstract: This paper studies the clustering behavior of weakly interacting diffusions under the influence of sufficiently localized attractive interaction potentials on the one-dimensional torus. We describe how this clustering behavior is closely related to the presence of discontinuous phase transitions in the mean-field PDE. For local attractive interactions, we employ a new variant of the strict Riesz rearrangement inequality to prove that all global minimizers of the free energy are either uniform or single-cluster states, in the sense that they are symmetrically decreasing. We analyze different timescales for the particle system and the mean-field (McKean-Vlasov) PDE, arguing that while the particle system can exhibit coarsening by both coalescence and diffusive mass exchange between clusters, the clusters in the mean-field PDE are unable to move and coarsening occurs via the mass exchange of clusters. By introducing a new model for this mass exchange, we argue that the PDE exhibits dynamical metastability. We conclude by presenting careful numerical experiments that demonstrate the validity of our model.

13.
arXiv (CS.CV) 2026-06-12

PROBE: Probabilistic Occupancy BEV Encoding with Analytical Translation Robustness for 3D Place Recognition

We present PROBE (PRobabilistic Occupancy BEV Encoding), a learning-free LiDAR place recognition descriptor that models each BEV cell's occupancy as a Bernoulli random variable. Rather than relying on discrete point-cloud perturbations, PROBE analytically marginalizes over continuous Cartesian translations via the polar Jacobian, yielding a distance-adaptive angular uncertainty $\sigma_\theta = \sigma_t / r$ in $\mathcal{O}(R{\cdot}S)$ time. The primary parameter $\sigma_t$ represents the expected translational uncertainty in meters, a sensor-independent physical quantity that enhances cross-sensor generalization while reducing the need for extensive per-dataset tuning. Pairwise similarity combines a Bernoulli-KL Jaccard with exponential uncertainty gating and FFT-based height cosine similarity for rotation alignment. Evaluated on four datasets spanning four diverse LiDAR types, PROBE achieves the highest accuracy among handcrafted descriptors in multi-session evaluation and competitive single-session performance relative to both handcrafted and supervised baselines. The source code and supplementary materials are available at https://sites.google.com/view/probe-pr.

14.
arXiv (math.PR) 2026-06-15

Uniform-in-time error estimates for McKean-Vlasov SDEs with common noise and stochastic algorithms

arXiv:2606.14170v1 Announce Type: new Abstract: In this work, by construct an asymptotic coupling by reflection, we first explore the uniform-in-time estimate on probability distance for two measure-valued processes induced by a McKean-Vlasov SDE with common noise and an interacting particle system, where the drift terms are dissipative merely in the long distance. As direct applications of this estimate, we establish the uniform-in-time error estimates for the numerical solutions derived via backward/tamed/adaptive Euler-Maruyama methods. Moreover, as another direct application, the uniform-in-time conditional propagation of chaos is quantified.

15.
arXiv (CS.AI) 2026-06-16

The Distributed Detectability Band Against Marginal-Preserving Attacks

arXiv:2606.10456v2 Announce Type: replace-cross Abstract: AI-control monitors score individual agent actions to detect misbehavior, but real harm can be distributed across many benign-looking steps, each individually below any per-step alarm. We construct a marginal-preserving, correlation-encoded distributed-sabotage attack using a Gaussian-copula AR(1) construction: the per-step monitor-score marginal is held exactly equal to benign, so mean, max, top-k tail, and threshold monitors (Monitor A) are defeated by construction, while harm is encoded in the temporal correlation structure. We sequence the paper around three reviewer-mandated gates. (1) Realizability gate: the stealthy attack achieves KS-distance to benign of 0.013 (effectively zero) at all tested harm levels up to 3.0, confirming that harm is fully decoupled from the per-step marginal and realizability is not harm-limited. (2) Monitor-A-vs-B reconciliation: we show formally that the attack, built against Monitor A's score marginal, remains marginal-preserving under a different-score Monitor B (the correlation/sequence family: CUSUM, SPRT, HMM-LR, runs test, autocorrelation, windowed logistic), and scope worst-case claims to score functions that admit a temporal signature. (3) Non-empty detectability band: Monitor A achieves AUC 0.52 (chance); Monitor B spans AUC 0.79-0.97 at the same 1% FPR target, and as harm is amortized over more steps Monitor A collapses to chance while Monitor B holds at AUC ~0.95. These results demonstrate a non-empty detectability band and characterize the sub-threshold sabotage frontier: distribution-shape monitors fail by construction; temporal-correlation monitors can detect but are not trivially optimal.

16.
medRxiv (Medicine) 2026-06-16

Infections and suicide and self-harm: a population-based matched cohort study

Background Infections have been associated with adverse mental health outcomes, including suicide, but evidence beyond severe or central nervous system infections is limited. We investigated associations between a range of acute infections and subsequent suicide/self-harm outcomes. Methods We conducted six infection-specific matched cohort studies using English primary care records from the Clinical Practice Research Datalink Aurum (2007-2024), linked to hospital admissions and mortality data. Adults ([&ge;]18 years) with a primary care record of infection (gastroenteritis, lower respiratory tract [LRTI], skin/soft-tissue [SSTI], urinary tract [UTI], sepsis, meningitis/encephalitis [positive control]) were matched (age, sex, practice, calendar period) to up to five comparators without infection. We estimated hazard ratios (HRs) for suicide/self-harm outcomes using Cox regression, stratified by matched set and implicitly adjusting for matching factors, with additional adjustment for deprivation, lifestyle factors, and comorbidities. We examined whether associations varied over time, by infection severity, antimicrobial treatment, sex, and prior mental health conditions. Findings Cohorts ranged from 18,192 individuals with meningitis/encephalitis (matched to 90,915 without) to 398,099 with SSTI (matched to 1,743,747). After adjustment, individuals with infection had a higher hazard of suicide/self-harm outcomes than comparators across all cohorts: sepsis (HR 1.79, 95% CI 1.65-1.93), gastroenteritis (1.62, 1.55-1.70), meningitis/encephalitis (1.56, 1.32-1.84), UTI (1.41, 1.33-1.50), SSTI (1.37, 1.31-1.43), and LRTI (1.37, 1.31-1.44). Risk was highest in the year post-infection, attenuating over time, and was higher among severe infections and those without prior mental health conditions. Interpretation Common acute infections recorded in primary care are associated with increased risk of suicide and self-harm, particularly following severe infections and in the year post-infection. Findings support suicide risk monitoring following acute infection, particularly among individuals without prior mental health conditions, and highlight infection prevention as a potentially modifiable strategy in vulnerable populations. Funding Wellcome and La Caixa. Copyright This work is licensed under a Creative Commons Attribution (CC BY) licence.

17.
arXiv (CS.AI) 2026-06-18

Clin-JEPA: A Multi-Phase Co-Training Framework for Joint-Embedding Predictive Pretraining on EHR Patient Trajectories

arXiv:2605.10840v3 Announce Type: replace-cross Abstract: We present Clin-JEPA, a multi-phase co-training framework for joint-embedding predictive (JEPA) pretraining on EHR patient trajectories. JEPA architectures have enabled latent-space planning in robotics and high-quality representation learning in vision, but extending the paradigm to EHR data – to obtain a single backbone that simultaneously forecasts patient trajectories and serves diverse downstream risk-prediction tasks without per-task fine-tuning – remains an open challenge. Existing JEPA frameworks either discard the predictor after pretraining (I-JEPA, V-JEPA) or train it on a frozen pretrained encoder (V-JEPA 2-AC), leaving the encoder unaware of the rollout signal that the retained predictor must use at inference; co-training the encoder and predictor under a shared JEPA prediction objective would supply this grounding, but naïve co-training is unstable, with representation collapse and online/target drift causing autoregressive rollout to diverge. Clin-JEPA's five-phase pretraining curriculum – predictor warmup, joint refinement, EMA target alignment, hard sync, and predictor finalization – addresses each failure mode by phase, stably co-training a Qwen3-8B-based encoder and a 92M-parameter latent trajectory predictor. On MIMIC-IV ICU data, three independent evaluations support the framework: (1) latent $\ell_1$ rollout drift uniquely converges ($-$15.7%) over 48-hour horizons while baselines and ablations diverge (+3% to +4951%); (2) the encoder learns a clinically discriminative latent geometry (deteriorating-patient cohorts displace 4.83$\times$ further than stable patients in latent space, vs $\leq$2.62$\times$ for baseline encoders); (3) a single backbone outperforms strong tabular and sequence baselines on multi-task downstream evaluation. Clin-JEPA achieves mean AUROC 0.851 on ICareFM EEP and 0.883 on 8 binary risk tasks (+0.038 and +0.041 vs baseline average).

18.
arXiv (quant-ph) 2026-06-17

Probes of chaos over the Clifford group and approach to Haar values

arXiv:2603.29695v3 Announce Type: replace Abstract: Chaotic behavior of quantum systems can be characterized by the adherence of the expectation values of given probes to moments of the Haar distribution. In this work, we analyze the behavior of several probes of chaos using a technique known as Isospectral Twirling [1]. This consists in fixing the spectrum of the Hamiltonian and picking its eigenvectors at random. Here, we study the transition from stabilizer bases to random bases according to the Haar measure by T-doped random quantum circuits. We then compute the average value of the probes over ensembles of random spectra from Random Matrix Theory, the Gaussian Diagonal Ensemble and the Gaussian Unitary Ensemble, associated with non-chaotic and chaotic behavior respectively. We also study the behavior of such probes over the Toric Code Hamiltonian.

19.
arXiv (CS.CV) 2026-06-16

DenseControl: Instance-Level Controllable Synthesis of Dense Crowd Image

In this paper, we introduce DenseControl, a novel pipeline for generating dense crowd images. Specifically, DenseControl meticulously positions and sizes each generated instance to align precisely with the predefined coordinates and scales. Based on this, we further allow for control over the background, style, and attributes of instances. The motivation behind DenseControl stems from the observation of two main challenges in synthesizing crowd images: controlling signal embedding and maintaining topological integrity when imparting instance scale guidance. To address these, we first introduce the Isolated Object Embedding (IOE) map, a novel representation that facilitates spatial location control while mitigating the difficulties associated with learning projections for model. Secondly, we propose an Implicit Scale Embedding (ISE) strategy that seamlessly integrates with the IOE map to encode precise scale information. To further enhance the efficacy of combining ISE with the IOE map, we incorporate a Position Shortcut mechanism that enhances cross-attention to alleviate projection challenges. We evaluate DenseControl through two lenses: synthesis quality and applicability in latent applications. Experiments across different control conditions demonstrate DenseControl achieves state-of-the-art results in dense crowd image synthesis. Furthermore, we showcase applications in augmenting crowd analysis under data scarcity, transfer learning, and weather generalization scenes, to highlight the practical utility of DenseControl. The codebase will be released.

20.
arXiv (math.PR) 2026-06-17

Optional Stopping for Superhedging Supermartingales

arXiv:2606.17452v1 Announce Type: new Abstract: Superhedging supermartingales, introduced by the authors in previous work, are non-probabilistic processes defined via subadditive outer integrals that carry a purely financial interpretation in terms of superhedging cost. Building on the Leinert-König theory of non-lattice integration, the present paper establishes several results that are classical in probability theory but whose non-probabilistic proofs require fundamentally new arguments: (i) a tower inequality for the conditional outer integral \overline{\sigma}_j applied at stopping times, reducing to equality when the integrand is conditionally integrable; (ii) three versions of Doob's optional stopping theorem, organised by the class of supermartingale and the range of the stopping times; and (iii) Dubins' upcrossing inequality in both finite- and infinite-time horizons. A key structural result, property (K)-a.e., identifies conditions under which the two superhedging operators \overline{\sigma}_j and \overline{I}_j coincide on non-negative functions, extending the scope of all preceding results to the positive operator \overline{I}_j. None of the proofs invoke classical measure-theoretic tools; in particular, (classical) integrability and measurability are not assumed. The analogues of classical stochastic results acquire a purely financial interpretation and, in this way, gain depth and generality by providing a context that is independent of any a priori probabilistic structure.

21.
arXiv (CS.CV) 2026-06-17

OpenTie: Open-vocabulary Sequential Rebar Tying System

Robotic practices on the construction site emerge as an attention-attracting manner owing to their capability of tackling complex challenges, especially in the rebar-involved scenarios. Most of existing products and research are mainly focused on the collection of large amounts of data with model training demands. To fulfill this gap, we propose OpenTie, a 3D training-free rebar tying framework utilizing a RGB-to-point-cloud generation and an open-vocabulary rebar detection on the real-world test. We implement the OpenTie via a robotic arm with a binocular camera and guarantee a high accuracy by applying the prompt-based object detection method on the image filtered by our proposed post-processing procedure for the image-to-point-cloud generation framework. Our pipeline requires no training efforts and outperforms the training-based object detection, i.e., YOLO-based method, with the verification on the real-world sequential rebar tying test. The system is flexible for horizontal and vertical rebar tying tasks and holds the potential application to the real construction site with possibility of commercialization.

22.
arXiv (CS.LG) 2026-06-11

Learning Patterns and Abstractions from Perceptual Sequences

作者:

arXiv:2503.10973v2 Announce Type: replace Abstract: Cognition swiftly breaks high-dimensional sensory streams into familiar parts and uncovers their relations. Why do structures emerge, and how do they enable learning, generalization, and prediction? What computational principles underlie this core aspect of perception and intelligence? A sensory stream, simplified, is a one-dimensional sequence. In learning such sequences, we naturally segment them into parts – a process known as chunking. In the first project, I investigated factors influencing chunking in a serial reaction time task and showed that humans adapt to underlying chunks while balancing speed and accuracy. Building on this, I developed models that learn chunks and parse sequences chunk by chunk. Normatively, I proposed chunking as a rational strategy for discovering recurring patterns and nested hierarchies, enabling efficient sequence factorization. Learned chunks serve as reusable primitives for transfer, composition, and mental simulation – letting the model compose the new from the known. I demonstrated this model's ability to learn hierarchies in single and multi-dimensional sequences and highlighted its utility for unsupervised pattern discovery. The second part moves from concrete to abstract sequences. I taxonomized abstract motifs and examined their role in sequence memory. Behavioral evidence suggests that humans exploit pattern redundancies for compression and transfer. I proposed a non-parametric hierarchical variable model that learns both chunks and abstract variables, uncovering invariant symbolic patterns. I showed its similarity to human learning and compared it to large language models. Taken together, this thesis suggests that chunking and abstraction as simple computational principles enable structured knowledge acquisition in hierarchically organized sequences, from simple to complex, concrete to abstract.

23.
medRxiv (Medicine) 2026-06-17

County Year Informatics Model for Annual and Cumulative Unique Lung Cancer Screening Eligibility in Maryland, 2026 to 2045

Purpose: Population-level lung cancer screening programs require denominators that reflect age, smoking history, geography, and changing eligibility over time. We estimated annual prevalent and 20-year cumulative unique low-dose computed tomography screening eligibility for Maryland residents under alternative screening criteria. Methods: We built a deterministic cohort-cell stock-flow simulation using Maryland county-equivalent jurisdiction projections by age, sex, and race/ethnicity, with ACS socioeconomic/nativity covariates and smoking-history priors for ever-smoked status, pack-years, and quit-years. Scenarios included USPSTF 2013 legacy, USPSTF 2021, ACS 2023/2024, a risk-model-expanded sensitivity, and ever-smoked-only capacity stress tests. Cumulative unique eligibility counted people once at first eligibility rather than summing annual prevalent person-years. Results: Under USPSTF 2021, an estimated 238,346 Maryland residents were eligible in 2026 and 245,326 in 2045. The 20-year cumulative unique denominator was 768,668, whereas naively summing annual prevalent counts produced 4,850,735 person-years, a 6.31-fold overcount. ACS 2023/2024 expanded annual eligibility to 314,616 in 2026 and cumulative unique eligibility to 902,796 by adding remote former smokers. Ever-smoked-only adult eligibility was 1,957,699 in 2026 and 3,383,683 cumulative unique over 20 years. Conclusion: A Maryland statewide screening initiative should plan from cumulative unique eligibility and county-equivalent jurisdiction-specific burden rather than annual prevalence alone. Explicit pack-year and quit-year modeling materially changes statewide and county allocation compared with current-smoking proxy models.

24.
arXiv (CS.AI) 2026-06-18

PosterForest: Hierarchical Multi-Agent Collaboration for Scientific Poster Generation

arXiv:2508.21720v3 Announce Type: replace Abstract: Automating scientific poster generation requires hierarchical document understanding and coherent content-layout planning. Existing methods often rely on flat summarization or optimize content and layout separately. As a result, they often suffer from information loss, weak logical flow, and poor visual balance. We present PosterForest, a training-free framework for scientific poster generation. Our method introduces the Poster Tree, a structured intermediate representation that captures document hierarchy and visual-textual semantics across multiple levels. Building on this representation, content and layout agents perform hierarchical reasoning and recursive refinement, progressively optimizing the poster from global organization to local composition. This joint optimization improves semantic coherence, logical flow, and visual harmony. Experiments show that PosterForest outperforms prior methods in both automatic and human evaluations, without additional training or domain-specific supervision.

25.
arXiv (CS.AI) 2026-06-12

Select and Improve: Understanding the Mechanics of Post-Training for Reasoning

arXiv:2606.13125v1 Announce Type: cross Abstract: Reinforcement learning has rapidly emerged as a key component in the training of reasoning and coding models, yet it remains poorly understood from a mechanistic perspective. We study how and through what underlying processes capabilities are acquired or enhanced via reinforcement learning post-training. Our analysis, based on controlled math reasoning experiments with Qwen-2.5-1.5B, reveals two core mechanisms: strategy selection and strategy improvement. Our results highlight the role of SFT data and reinforcement learning data in activating these mechanisms, in particular showing how supervising the model on diverse reasoning strategies can enable strategy selection and how increasing difficulty in reinforcement learning data can enable strategy improvement. Taken together, our results provide mechanistic insight into RL training and suggest practical interventions to continue scaling reasoning capabilities.