Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-16

Biarchetype analysis for univariate functional data. An application to macroeconomic financial time series

arXiv:2606.15881v1 Announce Type: cross Abstract: We introduce biarchetype analysis for the first time in the context of univariate functional data. This unsupervised methodology extends archetype analysis by simultaneously identifying archetypal structures across both the cases (countries, in our application) and the temporal argument. Both cases and time points are expressed as mixtures of biarchetypes, yielding a concise and highly interpretable representation of complex functional observations. Although biarchetype analysis is not intended as a clustering technique, it offers superior interpretability compared with biclustering approaches, as it is based on extreme, representative patterns rather than average centroids, thereby enhancing human comprehension. We apply the proposed method to 10-year government bond yields of European countries over the period 2001-2025. The results identify three distinct time regimes (the pre-crisis period, the euro-area sovereign debt crisis, and the post-crisis period), and reveal Germany, Greece, and Hungary as country archetypes.

02.
arXiv (CS.CL) 2026-06-11

ProcessThinker: Enhancing Multi-modal Large Language Models Reasoning via Rollout-based Process Reward

Visual question answering increasingly requires multi-step reasoning. Recent post-training with reinforcement learning under verifiable rewards (RLVR) and Group Relative Policy Optimization (GRPO) can improve multimodal reasoning, but most approaches rely on sparse outcome-only rewards. As a result, they struggle to tell whether an incorrect answer comes from a small mistake late in the reasoning or from an unhelpful trajectory from the start. A common solution is to train a process reward model (PRM) for step-level supervision, but this typically requires large-scale high-quality chain-of-thought annotations and additional training cost. We propose ProcessThinker, a practical post-training pipeline that provides step-level process rewards without training an explicit PRM. ProcessThinker first rewrites reasoning traces into a step-tagged format for cold-start supervised fine-tuning, then applies GRPO with a standard format reward and our rollout-based process reward. Concretely, for each intermediate step, we sample multiple continuations from that step and use the empirical success rate (final-answer verification) as the step reward. This gives dense credit assignment and encourages reasoning steps that more reliably support a correct conclusion, helping reduce inconsistent or self-contradictory progress across steps – a key issue in logical reasoning. Across four challenging video benchmarks (Video-MMMU, MMVU, VideoMathQA, and LongVideoBench), ProcessThinker consistently improves over the baseline model Qwen3-VL-8B-Instruct

03.
medRxiv (Medicine) 2026-06-18

Expert in Ultrasound Skills: Feasibility of an IMU-video platform to describe technical profiles during focused cardiac ultrasound. Pilot study

Background: Focused cardiac ultrasound (FoCUS) is operator dependent and requires coordinated probe manipulation, image interpretation and iterative visual feedback. Existing assessment approaches often emphasize final image quality or expert rating. We developed Expert in Ultrasound Skills (EXUS) , a platform that synchronizes transducer-mounted inertial measurement unit (IMU) data with ultrasound video, and evaluated its technical feasibility during FoCUS acquisition. Methods: This observational pilot study included 6 operators performing two repetitions of a four-view FoCUS protocol, yielding 12 analytical sessions and 48 planned acquisitions. Feasibility was defined by acquisition completion, video availability, start/stop events, fused IMU-video windows, temporal coverage, complete human label entries and IMU integrity. A 100-image Likert rating task was used to summarize pairwise inter-rater agreement for still-frame image quality assessment. Results: All 48 planned acquisitions were completed with video, start/stop events, fused windows and complete human label entries. Temporal coverage was at least 90% in 47/48 acquisitions. IMU integrity endpoints exceeded the 80% threshold: 43/48 acquisitions had no extreme IMU-derived artifact, 43/48 had no active-segment IMU restart and 44/48 had no complete motion flatline. Mean pairwise exact agreement for the Likert task was 38.9%, with mean quadratic-weighted Cohen's kappa of 0.564. Post hoc profiles varied across duration, visual quality, mechanical load and motor efficiency. Conclusions: EXUS was technically feasible for synchronized IMU-video capture during FoCUS. The pilot supports multimodal acquisition data as a way to describe technical profiles and generate formative feedback hypotheses, but the post hoc indices are not validated competency measures. Keywords: focused cardiac ultrasound; point-of-care ultrasound; inertial measurement unit; medical education; deliberate practice

04.
arXiv (CS.AI) 2026-06-15

Hyperdimensional computing for structured querying on tabular data embeddings

arXiv:2606.13871v1 Announce Type: new Abstract: Tabular data embeddings have become a cornerstone of data profiling and data integration pipelines, enabling tasks such as entity annotation and resolution; schema matching; column type detection; and table search, among others. Existing approaches embed rows, columns, or entire tables into a vector space and rely on nearest-neighbor search to retrieve candidate matches. A fundamental limitation of current embedding methods is the lack of interpretable similarity scores: the concrete similarity value between a query and its nearest neighbour carries no intrinsic meaning, making it impossible to determine whether that neighbour is a true match or simply the least-dissimilar item in a corpus that contains no valid answer. This inability to set principled thresholds for retrieval undermines practical deployment, particularly for zero-match detection. We investigate the use of HyperDimensional Computing (HDC), specifically the Holographic Reduced Representations (HRR) model, as a framework for tabular row embeddings when the retrieval task corresponds to answering structured select-project queries in vector space. Exploiting the algebraic properties of HDC operations, we derive closed-form expected similarity values for both equality and non-equality retrieval predicates, which converge to interpretable values as dimensionality increases, and use these to identify suitable retrieval thresholds. We evaluate HDC against EmbDI, a graph-based baseline, on two real-world datasets across varying table sizes and predicate lengths. Our results show that HDC matches or outperforms EmbDI for row retrieval across all configurations, handles non-equality predicates more robustly, and achieves perfect attribute projection accuracy at sufficient dimensionality – while uniquely enabling reliable identification of zero-match predicates through its principled thresholds.

05.
arXiv (quant-ph) 2026-06-24

Improved State Readout in NV Centers using Regression Models and Rabi Driving

arXiv:2606.23454v2 Announce Type: replace Abstract: Readout of state populations in nitrogen-vacancy centers from fluorescence measurements at room-temperature is routinely achieved via contrast-based calibration. The fidelities achieved by this conventional approach are limited by reducing the dynamical fluorescence behaviour of the NV center to a scalar value, and calculating the population of each possible state independently. To address these limitations, we use regression models trained on experimental data to map the fluorescence signals onto ideal simulated populations. Additionally, we enhance the informational content of the fluorescence signals by performing measurements during induced Rabi oscillations. Our results demonstrate that including these dynamical signals significantly reduces state readout errors across multiple tested models. Notably, linear ridge regression performs nearly on par with a non-linear kernel-based model, showing that simple models already capture the relevant mapping between the enhanced fluorescence signals and the underlying state populations. This data-driven approach provides a robust alternative that achieves higher fidelities than conventional calibration in our setting, paving the way for high-fidelity state readout in solid-state quantum registers.

06.
arXiv (CS.AI) 2026-06-16

Learning Earthquake Wave Arrival Time Picking from Labels with Inaccuracies

arXiv:2606.15377v1 Announce Type: cross Abstract: Inaccurately labeled training data, or "label noise", poses a significant threat to the integrity of supervised machine learning models. This corruption directly degrades performance by teaching the model erroneous mappings between features and labels, which leads to poor generalization and reduced accuracy on properly labeled validation and test data. Current seismological applications mainly rely on large-scale training sets or data augmentation to reduce the label-noise impact, which can be labor-intensive and costly. Here, we introduce a Label Noise-Contrastive Robust Learning (LaNCoR) approach that can effectively handle noisy labels in seismic signal processing tasks, without requiring large-scale training datasets. In this approach, the input waveform feature and label representation distributions are aligned in the feature space to correct mislabeling and reduce its impact on the training process. We present LaNCoR's performance on the task of P-phase arrival-time picking of real microseismic data using two baseline models and training approaches. Our results indicate that LaNCoR can improve performance by up to 28.8% across performance metrics. This approach holds great promise for model training in seismology and geosciences.

07.
medRxiv (Medicine) 2026-06-18

Web-based education on Metabolism and Obesity is associated with improved lifestyle and health behaviours among Brazilian school teachers

Background: Obesity is a major global public health challenge, and teachers play a critical role in school-based health promotion. This study examined the perceived impact of a web-based educational program on metabolism and obesity delivered to Brazilian school teachers. Methods: This analytical cross-sectional study included 217 teachers who responded to the evaluation questionnaire after attending the course between 2017 and 2022. Statistical analyses included logistic regression and chi-square tests. Findings: Course completion rate was 81.98%, substantially exceeding the 5-15% typical of global MOOCs. However, ethnic disparities were observed: White respondents were 4.95 times more likely to complete the course than Black respondents (p=0.00097) and Brown respondents were 3.05 times more likely (p=0.0268) than Black respondents. Among non-completers, lack of time (64.7%) was the primary barrier. Participation was concentrated in Sao Paulo (77%), with no respondents from three northern states. Perceived difficulty showed a non-significant trend (p=0.0893) where by Black respondents had the lowest predicted difficulty; the most challenging course material was Scientific Content/Reading papers (50%). Completion was strongly associated with applying learned activities in teaching (p

08.
arXiv (CS.AI) 2026-06-19

TerraMind: Large-Scale Generative Multimodality for Earth Observation

arXiv:2504.11171v5 Announce Type: replace-cross Abstract: We present TerraMind, the first any-to-any generative, multimodal foundation model for Earth observation (EO). Unlike other multimodal models, TerraMind is pretrained on dual-scale representations combining both token-level and pixel-level data across modalities. On a token level, TerraMind encodes high-level contextual information to learn cross-modal relationships, while on a pixel level, TerraMind leverages fine-grained representations to capture critical spatial nuances. We pretrained TerraMind on nine geospatial modalities of a global, large-scale dataset. In this paper, we demonstrate that (i) TerraMind's dual-scale early fusion approach unlocks a range of zero-shot and few-shot applications for Earth observation, (ii) TerraMind introduces "Thinking-in-Modalities" (TiM) – the capability of generating additional artificial data during finetuning and inference to improve the model output – and (iii) TerraMind achieves beyond state-of-the-art performance in community-standard benchmarks for EO like PANGAEA. The pretraining dataset, the model weights, and our code are open-sourced under a permissive license.

09.
arXiv (CS.CV) 2026-06-17

EgoCS-400K: An Egocentric Gameplay Dataset for World Models

The shift from video generation to interactive world modeling places new demands on data: beyond captioned videos, world models require temporally aligned video-action-language trajectories grounded in the actions, camera motion, states, and events that drive future scene changes. However, such data is difficult to obtain at scale. Web video datasets offer broad visual coverage but lack executable actions and reliable states; robotic datasets provide action and state supervision but are costly and limited in scene diversity; and existing simulators often lack large-scale human-driven interaction trajectories. In this paper, we introduce EgoCS-400K, a large-scale replay-grounded egocentric Counter-Strike dataset for world models, built from public professional CS and CS2 match demos that preserve human gameplay trajectories and enable parsing, replaying, rendering, and temporal alignment. We extract player states, view directions, movements, keyboard/button inputs, view-angle changes, weapon usage, game events, and round-level context, and render clean first-person videos from the same trajectories. EgoCS-400K contains over 400,000 first-person videos and 10,000 hours of gameplay from more than 1,000 matches and 40,000 rounds, covering 13 maps and 10 player viewpoints per round. It supports a range of interactive visual modeling tasks, including action-conditioned future prediction, state- and event-aware scene rollout, replay-grounded captioning, and agent egocentric action understanding. By connecting visual observations with human actions, camera motion, game states, and events at scale, EgoCS-400K serves as a practical bridge between passive web videos, controllable game simulation, and costly real-world embodied data.

10.
arXiv (CS.CL) 2026-06-12

Agents' Last Exam

Recent AI systems have achieved strong results on a wide range of benchmarks, yet these gains have not translated into economically meaningful deployment across many professional domains. We argue that this gap is largely an evaluation problem: widely used benchmarks lack sustained performance measurement on real and economically valuable workflows. This paper introduces Agents' Last Exam (ALE), a benchmark designed to evaluate AI agents on long horizon, economically valuable, real world tasks with verifiable outcomes. Developed in collaboration with 250+ industry experts, ALE covers non-physical industries defined with reference to O*NET / SOC 2018 (the U.S. federal occupational taxonomy). It is organized around a task taxonomy with 55 sub fields grouped into 13 industry clusters covering 1K+ tasks. Current results show that the hardest tier remains far from saturated: across mainstream harness and backbone configurations, the average full pass rate is below 1%. ALE is designed as a living benchmark: its task pool grows continuously as new workflows and industries are onboarded. More broadly, ALE is intended not merely as another leaderboard, but as an instrument for closing the gap between benchmark success and GDP relevant impact.

11.
arXiv (quant-ph) 2026-06-19

Optimizing resource allocation for accuracy in noisy variational quantum algorithms

arXiv:2606.20153v1 Announce Type: new Abstract: For quantum algorithms to achieve their full potential, we need methodologies to optimize them, such as reaching a given output accuracy with minimal resource costs. Here, we develop such a methodology for a class of Noisy Intermediate-Scale Quantum (NISQ) algorithms. We leverage simulations of a Variational Quantum Eigensolver (VQE) to propose a phenomenological model of such algorithms that captures the complex relationship between algorithmic accuracy, algorithmic resource costs, and the noise that exists in realistic quantum hardware. For this, we take the algorithmic resource cost to be the total number of quantum gate-operations in the algorithm; minimizing this cost typically makes the algorithm faster and more energy-efficient. We consider the subtle trade-off between quantum circuit size (small circuits are too imprecise, but large ones are too noisy), and the number of iterations of that quantum circuit for the full algorithm to sufficiently converge. Using a noise-metric-resource methodology, we identify the sweet spot (of circuit size versus iterations) that minimizes the algorithmic resource costs for a desired algorithm accuracy. It also gives the circuit size that maximizes algorithm accuracy for a fixed resource cost. Our methodology provides a practical guideline for near-term deployment of variational algorithms on realistic noisy hardware, including hardware that uses error mitigation.

12.
arXiv (CS.LG) 2026-06-12

Generative Modeling of Bach-Style Symbolic Music: A Comparative Study of Autoregressive, Latent-Variable, and Adversarial Approaches

arXiv:2606.13626v1 Announce Type: cross Abstract: We study generative modeling of Bach-style symbolic piano music using a shared MIDI corpus and three model families: autoregressive LSTMs with attention, latent-variable models including recurrent VAEs and vector-quantized VAEs, and generative adversarial networks. We compare their ability to model polyphonic note sequences, learn useful latent representations, and generate stylistically coherent compositions. Our experiments show that the autoregressive LSTM with attention produces the most musically coherent samples, while vector quantization helps mitigate posterior collapse and yields more structured outputs than conventional recurrent VAEs. The adversarial approach captures local pitch patterns but remains difficult to train and generalizes less reliably to Bach's style. These results highlight the relative strengths and failure modes of autoregressive, latent-variable, and adversarial approaches for symbolic music generation.

13.
arXiv (CS.AI) 2026-06-17

IsabeLLM: Automated Theorem Proving Applied to Formally Verifying Consensus

arXiv:2606.18098v1 Announce Type: new Abstract: Advances in Artificial Intelligence (AI) have led AI for Theorem Proving to become a promising means of formally verifying computer systems. Whilst formal verification is traditionally reserved for safety-critical systems due to the required amount of expertise and effort, AI can help to automate a large amount of this workload and make it far more accessible. Blockchain-based systems are becoming increasingly popular and are frequently targeted by malicious actors, often resulting in huge financial losses, highlighting the need to better verify these systems and mitigate vulnerabilities. Arguably the most important component of these systems is the consensus protocol, which allows nodes to agree on decisions in a potentially adversarial environment. In this paper, we improve upon IsabeLLM, the automated theorem proving tool in Isabelle. Namely, we implement a Retrieval-Augmented Generation framework, Error tracing and counterexample generation for improved context supplied to the Large Language Model. Compatibility with the latest version of Isabelle and Sledgehammer is also implemented for improved efficiency. We compare the performance of the two versions of IsabeLLM in their ability to complete the verification of Bitcoin's Proof of Work consensus.

14.
arXiv (CS.LG) 2026-06-16

Closing the Approximation Gap in Simulation-free Latent SDEs

arXiv:2606.16138v1 Announce Type: cross Abstract: Recovering dynamical systems from noisy observations is a recurring challenge across scientific domains, including neuroscience and physics. Latent stochastic differential equations (SDEs) address this by modeling the system as an unobserved state that evolves according to a learnable SDE and generates the observations. Variational inference (VI) provides a tractable objective for fitting latent SDEs. Traditional VI algorithms evaluate this objective by numerical simulation over a time discretization, trading fidelity for computational cost. A recent class of algorithms, simulation-free VI, sidesteps this tradeoff by parameterizing the posterior through its instantaneous marginals rather than its drift. In this work, we show that the efficiency of existing simulation-free VI algorithms comes at a price: their parameterizations restrict the approximate posterior to a subset of the SDEs available to simulation-based methods, degrading posterior inference and parameter learning. We propose Helmholtz-SDE, a simulation-free VI algorithm that closes this gap by optimizing over path laws compatible with a prescribed collection of marginals. Helmholtz-SDE recovers dynamics more faithfully than prior simulation-free methods, with the largest gains under high posterior uncertainty. It further matches the performance of simulation-based VI at a fraction of the runtime.

15.
arXiv (CS.CL) 2026-06-16

From Awareness to Adherence: Bridging the Context Gap in Spoken Dialogue Systems via Context-Aware Decoding

Despite the success of end-to-end (E2E) spoken dialogue systems, maintaining strict context adherence in multi-round conversations remains a challenge. While prior works attribute these failures to models forgetting dialogue history, we highlight an equally critical but overlooked bottleneck: a gap between latent context awareness and active adherence. Although models internally recognize relevant past utterances, strong parametric priors often overshadow these signals during decoding. To bridge this gap, we propose an audio-adapted Context-Aware Decoding (CAD) approach. By leveraging internal attention mechanisms to isolate key historical rounds, our approach contrasts output distributions with and without this key context during inference, directly amplifying multimodal contextual signals. Evaluations on the Audio MultiChallenge benchmark demonstrate significant improvements in Semantic Memory and Self Coherence subtasks, successfully enforcing strict, context-faithful adherence.

16.
medRxiv (Medicine) 2026-06-11

Computer Vision Scoring of Figure Copy and Recall

Objective. Figure copy and recall tests are sensitive measures of visuoconstruction and visual episodic memory, but their clinical is constrained by labor-intensive manual scoring. We developed and validated an automated, element-level scoring pipeline using Vertex AI object detection for the tablet-based figure copy and recall tasks in the California Cognitive Assessment Battery (CCAB). The automated scoring pipeline duplicated the scoring procedures used by expert manual raters. Methods. A normative sample of 2,011 community-dwelling adults aged 18-90 completed figure copy and delayed recall trials at baseline, with subsamples retested at 1 day and at 6, 18, and 30 months. Participants completed the drawings with their index finger on a tablet computer with finger position digitized to analyze the speed and timing of individual drawing strokes A convolutional object-detection model trained on the Vertex AI AutoML Vision platform identified each of twelve canonical figure elements in rendered drawings. Separate element presence and location scores were computed after homographically warping drawings onto a canonical template to produce trial-level Element, Location, and Total scores. To compare Vertex and human scores, Vertex AI and expert human raters independently scored 1500 randomly selected drawings to evaluate inter-rater agreement, including a common subset of 100 drawings scored by Vertex AI and all raters. Results. Total scores were virtually indistinguishable (r = 0.966) from human-human agreement (mean r = 0.971) as were Element presence scores (mean r = 0.959 vs. r = 0.963). Location-score agreement (r = 0.951) was slightly below the human-human mean (r = 0.972) due to pixel-level analysis by Vertex AI that was impossible for human raters. The Vertex pipeline showed no preferential advantage for the single expert rater who categorized Elements during training. Automated scores showed strong demographic gradients, age effects on Recall (r = -0.32) were approximately twice those in Copy conditions (r = -0.16). A Memory Cost score (Recall - Copy) showed a monotonic age-related decline from +0.40 z in the youngest subjects to -0.54 z in the oldest. Kinetic analysis revealed that drawing speed and efficiency showed significant age-related changes. Overnight test-retest reliability was high (Recall r = 0.72) and the Recall trial showed a large overnight learning effect ({Delta} = +1.18) that continued with repeated tests up to 30 months ({Delta} = +0.75).

17.
arXiv (CS.LG) 2026-06-12

Adaptive Model-Predictive Control of a Soft Continuum Robot Using a Physics-Informed Neural Network Based on Cosserat Rod Theory

arXiv:2508.12681v3 Announce Type: replace-cross Abstract: Dynamic control of soft continuum robots (SCRs) holds great potential for expanding their applications, but remains a challenging problem due to the high computational demands of accurate dynamic models. While data-driven approaches like Koopman-operator-based methods have been proposed, they typically lack adaptability and cannot reconstruct the full robot shape, limiting their applicability. This work introduces a real-time-capable nonlinear model-predictive control (MPC) framework for SCRs based on a domain-decoupled physics-informed neural network (DD-PINN) with adaptable bending stiffness. The DD-PINN serves as a surrogate for the dynamic Cosserat rod model with a speed-up factor of up to 44,000. It is also used within an unscented Kalman filter for estimating the model states and bending compliance from end-effector position measurements. We implement a nonlinear evolutionary MPC running at 70 Hz on the GPU. In simulation, it demonstrates accurate tracking of dynamic trajectories and setpoint control with end-effector position errors below 3 mm (2.3\% of the actuator's length). In real-world experiments, the controller achieves similar accuracy and accelerations up to 3.55 m/s2.

18.
arXiv (CS.CV) 2026-06-18

Automatic ply-specific analyses of CFRP micrographs using shortest-path-based ply distinction

We present an automated approach to distinguish between ply instances in semantic segmentation masks of high-resolution carbon-fiber reinforced polymer micrographs. Interpreting the segmentation mask as a graph with pixels as vertices, enables us to use a shortest-path algorithm yielding the ply-separating paths. Thereby, we bridge the gap between semantic segmentation and ply instance segmentation using global information. We successfully apply our approach on high-resolution micrographs featuring a broad range of characteristics like artificially added gaps in single or multiple plies, different stacking sequences and ply traversing cracks. Assigning each fiber pixel to a ply based on the calculated paths, allows for a comprehensive, quantitative ply analysis with respect to its microstructural properties like the local fiber volume fraction as well as locally resolved ply and interleaf layer thickness. These insights help to reveal manufacturing-induced inhomogeneities, draw conclusions on manufacturing parameters and link mechanical properties to underlying microstructural imperfections.

19.
medRxiv (Medicine) 2026-06-10

Seasonality, source type, and women's water labor: A longitudinal mixed-methods study in Kenya and Honduras

Women shoulder the majority of water collection labor globally, yet how their water collection and water-related work experiences may change over time or by water source type remains insufficiently understood. We conducted a longitudinal, mixed-methods study in rural Kenya and Honduras to understand how women's experiences collecting water and performing water-related work varied between (a) two time points, (b) improved and unimproved water source types, and (c) water source location. Data were collected in 2023 and 2024 using interviews, observation, GPS-enabled watches, and scales to measure time and distance traveled, water weight and volume carried, and calories expended. 133 women participated in data collection (66 Kenya, 67 Honduras). We compared women's experience data by time point (2023 vs. 2024), source type (improved vs. unimproved), and source location (off-premises vs. on-premises) (t-test, Mann-Whitney U test). We also mapped participants' routes and activities to show which sources were visited, when, and for what activities. In Kenya, mean water collection time, distance, and caloric expenditure were significantly lower and water volume was significantly higher in 2024 when there were unexpected rains compared to 2023 when there was a persistent drought. When comparing source types during the 2023 drought, journeys to improved sources took significantly less time and energy and covered less distance than journeys to unimproved sources. These differences were not observed during the rainy conditions of 2024 when unimproved sources were closer and more accessible. In Honduras, water collection and water work burdens did not differ significantly by time point or source type. We found women with on-premises water access to still expend considerable time and caloric expenditure engaging in water work within their household compounds. Findings from Kenya suggest that water infrastructure improvements can reduce women's water collection burdens, though benefits may depend on and vary by season and source location. Findings from Honduras show that water labor does not end once water is in the household. Rather, substantial time and energy are expended carrying out water-related work even when sources are on premises, suggesting that efforts to assess water labor need to extend beyond collection alone. To meaningfully reduce burdens and ensure improved water sources are utilized during all seasons, initiatives need to consider source location, seasonal variability, and work beyond collection. Evaluations to assess infrastructure impacts on women's labor and well-being are needed and long overdue.

20.
arXiv (CS.CL) 2026-06-12

Detecting Functional Memorization in Code Language Models

Large language models (LLMs) are increasingly used to generate code at scale. Meanwhile, prior work has investigated whether training data may be recoverable from model outputs, by auditing the textual overlap between training examples and model generations. Code, however, can be functionally equivalent while textually dissimilar. In this work, we study functional memorization: extraction of functional logic beyond what verbatim metrics detect. We construct a counterfactual setup for Olmo-3-32B, comparing a midtrained model (exposed to target code) against a pretrained reference (not exposed). We prompt both models with Python function signatures and measure both textual and functional similarity (i.e., LLM-as-a-judge, execution-based). Our results show clear evidence of functional memorization, highlighting the need for auditing metrics that go beyond textual overlap.

21.
arXiv (quant-ph) 2026-06-24

Quantum algorithm for Valiant-Vazirani reduction

arXiv:2606.18428v2 Announce Type: replace Abstract: There is growing interest in extensions of the standard model of gate-based quantum computation to include auxiliary degrees of freedom evolving according to a nonlinear Schrödinger equation. By reducing the Boolean satisfiability problem SAT to quantum state discrimination, Abrams and Lloyd argued that the right type of nonlinearity can be used to solve NP and #P problems in polynomial time, at least in an idealized noise-free limit. For practical implementation, however, we are restricted to simulated and emergent nonlinearities, such as that appearing in mean field models for ultracold atoms and similar ensembles. A prominent example is the torsion model, which arises in two-component Bose-Einstein condensates and spin models with all-to-all Ising interaction. But torsion-based state discrimination appears to fall short of solving SAT. Here we close this gap by constructing the filtered oracle of the Valiant-Vazirani theorem, providing a randomized polynomial-time reduction from SAT to UNIQUE SAT, a promise problem where there is at most 1 satisfying assignment. In the noise-free limit, the UNIQUE SAT problem can be solved in polynomial time using torsion nonlinearity. Quantum Valiant-Vazirani reduction is no faster than the efficient classical version, but a fault-tolerant implementation coupled to a nonlinear quantum coprocessor simulating torsion would enable polynomial time solution to NP (but not #P) problems.

22.
arXiv (CS.CL) 2026-06-24

Matching Tasks to Objectives: Fine-Tuning and Prompt-Tuning Strategies for Encoder-Decoder Pre-trained Language Models

Prompt-based learning has emerged as a dominant paradigm in natural language processing. This study explores the impact of diverse pre-training objectives on the performance of encoder-decoder pre-trained language models across generation and question answering tasks, with a focus on commonsense knowledge retrieval and completion. We highlight the benefits of incorporating multiple objectives during both pre-training and fine-tuning stages. We introduce the Match Task to Objective (MTO) framework and methods for determining the appropriate objective for a given task. This framework offers automated methods to prepare task-related data for adaptation through unsupervised training, based on the identified objective. In the fine-tuning stage, we design novel templates that align with the objectives of the pre-training and adaptation stages. When aligned with task requirements, these strategies can achieve a performance gain of over 120\% compared to conventional methods in few-shot settings. They significantly outperform related works in few-shot settings and exceed the baseline even in full-dataset scenarios. Furthermore, we extend this approach to include prompt-tuning methodologies, providing guidance for more effective soft prompt engineering and optimization. Our strategies significantly enhance prompt-tuning performance as well. These insights hold substantial value, precisely guiding the selection and optimization of models customized for specific tasks. Code is available at https://github.com/puraminy/MTO/

23.
arXiv (CS.CL) 2026-06-16

ACC: Compiling Agent Trajectories for Long-Context Training

Recent development of agents has renewed demand for long-context reasoning capacity of LLMs. However, training LLMs for this capacity requires costly long-document curation or heuristic context synthesis. We observe that agents produce massive trajectories when solving problems, invoking tools and receiving environment observations across many turns. The evidence needed to answer the original question is thus scattered throughout these turns, requiring integration of distant context segments. Nevertheless, standard agent SFT masks tool responses and only trains turn-level tool selection, creating a supervision blind spot where these scattered signals go unused. We propose Agent Context Compilation (ACC), which converts trajectories from search, software engineering, and database querying agents into long-context QA pairs that combine the original question with tool responses and environment observations gathered across multiple turns, training the model to answer directly without tool use. This makes the dependencies between the question and the evidence explicit, enabling direct supervision of long-context reasoning over distant segments without additional annotation. ACC is a simple but effective approach that can be combined with any existing long-context extension or training method, providing scalable supervised fine-tuning data. We validate ACC on long-range dependency modeling tasks through MRCR and GraphWalks, challenging benchmarks requiring cross-turn coreference resolution and graph traversal over extended contexts. Training Qwen3-30B-A3B with ACC achieves 68.3 on MRCR (+18.1) and 77.5 on GraphWalks (+7.6), results comparable to Qwen3-235B-A22B, while preserving general capabilities on GPQA, MMLU-Pro, AIME, and IFEval. Further mechanism analysis reveals that the ACC-trained model exhibits task-adaptive attention restructuring and expert specialization.

24.
arXiv (CS.LG) 2026-06-16

Discovering Subgroups with Exceptional Survival Characteristics

arXiv:2602.22179v2 Announce Type: replace Abstract: In many applications, it is important to identify subpopulations that survive longer or shorter than the rest of the population. In medicine, for example, it allows determining which patients benefit from treatment, and in predictive maintenance, which components are more likely to fail. Existing methods for discovering subgroups with exceptional survival characteristics rely on restrictive assumptions about the survival model (e.g. proportional hazards), require pre-discretized features, and, as they compare average statistics, tend to overlook individual heterogeneity. In this paper, we propose Sysurv, a non-parametric, fully differentiable method that discovers human-readable rules selecting subgroups with exceptional survival characteristics. Empirical evaluation on a wide range of datasets and settings, including a case study on cancer data, shows that Sysurv reveals insightful and actionable survival subgroups, outperforming the state of the art.

25.
arXiv (CS.CL) 2026-06-19

Beyond Uniform Forgetting: A Study of Sequential Direct Preference Optimization Across Preference Settings

Aligning language models with human preferences often requires optimising multiple behavioural objectives. A practical approach is to apply these objectives sequentially using preference optimisation methods such as Direct Preference Optimisation (DPO), but it remains unclear whether later training uniformly degrades preferences learned earlier or whether the effect depends on the relationship between objectives. We study sequential DPO across four preference settings covering distributional conflict, multi-attribute interaction, strong safety signal, and compatible response-quality objectives. Using Llama-3.1-8B-Instruct with LoRA adapters, we evaluate all objectives after every stage with a fixed base-model reference. We find that sequential DPO does not produce a single forgetting pattern; preference change ranges from partial degradation to stability, pair-level redistribution, or positive transfer depending on objective relationship, signal strength, and training order. Pair-level analysis using length-normalised policy margins shows that aggregate metrics can mask heterogeneous changes across preference pairs, whereas quartile decomposition reveals that high-confidence pairs can either degrade or improve depending on the setting. Mechanistic diagnostics show that Stage~2 gradients and adapter updates are near-orthogonal to the previous objective across all settings, providing little evidence that direct gradient opposition is the primary driver. These findings suggest that future sequential alignment pipelines should account for objective compatibility and signal strength, rather than assuming that later objectives affect earlier preferences uniformly.