← 返回首页

📚 AI论文速递 2026-03-24

7大主题 × 10篇 | 由 伊利虾 🦐 自动整理

🧠 大语言模型

🖼️ 计算机视觉

🎨 多模态学习

📊 新数据集

✂️ 模型压缩

📝 综述论文

1. Revisiting Gene Ontology Knowledge Discovery with Hierarchical Feature Selection and Virtual Study Group of AI Agents➡️ 中文翻译:重新审视基于 **Hierarchical Feature Selection** 与 **AI Agents** 的 **Virtual Study Group** 的 **Gene Ontology** 知识发现。

👤 Cen Wan, Alex A. Freitas

📄 Large language models have achieved great success in multiple challenging tasks, and their capacity can be further boosted by the emerging agentic AI techniques. This new computing paradigm has already started revolutionising the traditional scientific discovery pipelines. In this work, we propose a novel agentic AI-based knowledge discovery-oriented virtual study group that aims to extract meaningful ageing-related biological knowledge considering highly ageing-related Gene Ontology terms that are selected by hierarchical feature selection methods. We investigate the performance of the proposed agentic AI framework by considering four different model organisms' ageing-related Gene Ontology terms and validate the biological findings by reviewing existing research articles. It is found that the majority of the AI agent-generated scientific claims can be supported by existing literatures and the proposed internal mechanisms of the virtual study group also play an important role in the designed agentic AI-based knowledge discovery framework.

📄 大语言模型已在多个挑战性任务中取得了巨大成功,其能力可通过新兴的agentic AI技术进一步提升。这一新的计算范式已经开始革新传统的科学发现流程。在这项工作中,我们提出了一个基于agentic AI的新型知识发现导向虚拟研究小组,旨在提取有意义的衰老相关生物学知识,同时考虑通过层次特征选择方法筛选出的高度衰老相关Gene Ontology术语。我们通过考虑四种不同模式生物的衰老相关Gene Ontology术语来研究所提出的agentic AI框架的性能,并通过回顾现有研究文章来验证生物学发现。研究发现,大多数AI agent生成的科学主张可以获得现有文献的支持,且虚拟研究小组的内部机制在设计的基于agentic AI的知识发现框架中也发挥着重要作用。

2. HortiMulti: A Multi-Sensor Dataset for Localisation and Mapping in Horticultural Polytunnels➡️ **HortiMulti:用于园艺多隧道温室定位与测绘的多传感器数据集**

👤 Shuoyuan Xu, Zhipeng Zhong, Tiago Barros

📄 Agricultural robotics is gaining increasing relevance in both research and real-world deployment. As these systems are expected to operate autonomously in more complex tasks, the availability of representative real-world datasets becomes essential. While domains such as urban and forestry robotics benefit from large and established benchmarks, horticultural environments remain comparatively under-explored despite the economic significance of this sector. To address this gap, we present HortiMulti, a multimodal, cross-season dataset collected in commercial strawberry and raspberry polytunnels across an entire growing season, capturing substantial appearance variation, dynamic foliage, specular reflections from plastic covers, severe perceptual aliasing, and GNSS-unreliable conditions, all of which directly degrade existing localisation and perception algorithms. The sensor suite includes two 3D LiDARs, four RGB cameras, an IMU, GNSS, and wheel odometry. Ground truth trajectories are derived from a combination of Total Station surveying, AprilTag fiducial markers, and LiDAR-inertial odometry, spanning dense, sparse, and marker-free coverage to support evaluation under both controlled and realistic conditions. We release time-synchronised raw measurements, calibration files, reference trajectories, and baseline benchmarks for visual, LiDAR, and multi-sensor SLAM, with results confirming that current state-of-the-art methods remain inadequate for reliable polytunnel deployment, establishing HortiMulti as a one-stop resource for developing and testing robotic perception systems in horticulture environments.

📄 Agricultural robotics 在研究和实际部署中都日益重要。随着这些系统被期望在更复杂的任务中自主运行,代表性的真实世界数据集变得尤为关键。虽然城市和林业机器人等领域受益于大型且成熟的基准数据集,但园艺环境尽管经济意义重大,却相对缺乏探索。为填补这一空白,我们提出了 **HortiMulti**,一个多模态、跨季节的数据集,采集自整个生长季的商业草莓和覆盆子塑料隧道,涵盖了显著的外观变化、动态的叶片、塑料覆盖物的镜面反射、严重的感知混叠以及 GNSS 不可靠的情况——所有这些因素都会直接削弱现有的定位与感知算法。传感器套件包括两台 3D **LiDAR**、四台 **RGB** 摄像头、一个 **IMU**、**GNSS** 以及轮式里程计。真实轨迹由全站仪测量、**AprilTag** 基准标记以及 **LiDAR‑inertial** 里程计相结合得到,覆盖稠密、稀疏以及无标记的区域,以支持在受控和真实条件下的评估。我们发布了时间同步的原始测量数据、标定文件、参考轨迹以及针对视觉、**LiDAR** 和多传感器 **SLAM** 的基线基准;实验结果证实,当前的最先进方法在可靠的塑料隧道部署中仍显不足,从而确立了 **HortiMulti** 作为园艺环境下机器人感知系统研发与测试的一站式资源。

3. A Century of Radial Velocity and Astrometric Monitoring of 70 Oph AB: New PFS Data and Constraints on Planetary Companions➡️ **中文翻译:** 过去一个世纪对 70 Oph AB 的径向速度与天体测量监测:新的 PFS 数据以及对行星伴星的限制。

👤 Yiting Li, Michael R. Meyer, Skylar D'Angiolillo

📄 At a distance of 5.1 pc, the 70 Oph AB binary star system is one of the most favorable targets for future direct imaging and astrometry missions surveying mature, terrestrial planets. We present new radial velocities (RVs) obtained with the Planet Finder Spectrograph (PFS) on the 6.5\,m Magellan II Clay Telescope in Chile. We collected 499 measurements of 70 Oph A and 334 measurements of 70 Oph B during 2023--2025. Combining these data with decades of archival RVs and astrometry, we derive an updated orbital solution for the binary and dynamical masses of $0.88 \pm 0.004\,M_\odot$ and $0.73 \pm 0.003\,M_\odot$ for the primary and secondary components, respectively. We find that the long-term RV variability of both components is consistent with stellar activity modulated by rotation periods, and we detect no coherent planetary signals in either component. We place upper limits on any planets orbiting in the plane of the binary. The 27 yr RV baseline for 70 Oph A excludes Jupiter-mass planets interior to 5 au and reaches a sensitivity of $0.3\,M_{\rm Jup}$ at 1 au or $0.5\,M_{\rm Jup}$ at 2 au. For 70 Oph B, with PFS data we rule out planets more massive than $0.25$--$0.3\,M_{\rm Jup}$ inside 0.5 au. We show that stable S-type orbits around 70 Oph A extend to $\sim2.5$ au, covering the habitable zone. Thus, Saturn-mass planets or smaller on stable orbits in the habitable zone of 70 Oph A are allowed. Overall, our results provide important guidance for future planet searches around this stellar system.

📄 在5.1 pc的距离上,70 Oph AB 双星系统是未来直接成像和天体测量任务搜索成熟类地行星的最有利目标之一。我们展示了使用位于智利的 6.5 m Magellan II Clay 望远镜上的 Planet Finder Spectrograph (PFS) 获得的新径向速度 (RVs) 数据。2023 年至 2025 年期间,我们共获取了 70 Oph A 的 499 次测量和 70 Oph B 的 334 次测量。结合数十年的档案径向速度和天体测量数据,我们推导出了该双星系统的更新轨道解,并得到主星和次星的质量分别为 $0.88 \pm 0.004\,M_\odot$ 和 $0.73 \pm 0.003\,M_\odot$。我们发现两颗星的长周期径向速度变化与由自转调制的恒星活动相一致,且在两颗星中均未检测到相干的行星信号。我们对位于双星轨道平面内的任何行星设定了上限。对于 70 Oph A,27 年的径向速度基线排除了在 5 au 以内的木星质量行星,并在 1 au 处达到 $0.3\,M_{\rm Jup}$、在 2 au 处达到 $0.5\,M_{\rm Jup}$ 的灵敏度。对于 70 Oph B,利用 PFS 数据我们排除了在 0.5 au 以内质量大于 $0.25$–$0.3\,M_{\rm Jup}$ 的行星。我们表明,围绕 70 Oph A 的稳定 S 型轨道可延伸至约 2.5 au,覆盖了宜居带。因此,围绕 70 Oph A 的宜居带内稳定轨道上允许存在土星质量或更小的行星。总体而言,我们的结果为未来在该恒星系统中搜寻行星提供了重要的参考和指导。

4. Cosmological forecast from the full-sky angular power spectrum and bispectrum of 21cm intensity mapping➡️ 全天空角功率谱和 Bispectrum 对 21cm 强度映射的宇宙学预测。

👤 Rodrigo F. Pinheiro, André A. Costa, Yu Sang

📄 We compute the full-sky angular power spectrum and bispectrum, along with their Fisher matrices, to forecast constraints on cosmological parameters for the BINGO and SKA1-MID Band 2 radio telescopes. This represents the first forecast analysis using the full-sky relativistic bispectrum in redshift space for these surveys. Our results show that the second-order velocity contribution, often neglected under the Limber approximation, accounts for approximately $24\%$ of the total signal at low redshifts, indicating that it must be included for accurate modeling. Using these forecasts, we find that while the bispectrum provides constraints comparable to the angular power spectrum for $Λ$CDM and ${\rm w}$CDM models, it becomes a powerful probe of dynamical dark energy. Restricting the analysis to linear scales, we show that the inclusion of the bispectrum yields a substantial improvement in the determination of the Chevallier-Polarski-Linder (CPL) parameters. In particular, the joint analysis of the bispectrum, power spectrum, and Planck CMB data improves constraints on ${\rm w}_0$ and ${\rm w}_a$ by over $70\%$, and the Hubble parameter $h$ by approximately $60\%$. These results underscore the importance of relativistic bispectrum for breaking parameter degeneracies and probing the nature of dark energy with upcoming large-scale structure surveys.

📄 我们计算全天空角功率谱和双谱,以及它们的 Fisher 矩阵,以预测 BINGO 与 SKA1‑MID Band 2 射电望远镜对宇宙学参数的约束。这是对这些巡天在红移空间使用全天空相对论双谱的首次预测分析。我们的结果表明,在低红移处,通常在 Limber 近似下被忽略的二级速度贡献约占总信号的 24 %,这表明在进行精确建模时必须包含它。利用这些预测,我们发现,虽然双谱对 ΛCDM 与 wCDM 模型的约束与角功率谱相当,但双谱成为动力学暗能量的有力探针。限制在线性尺度上进行分析,我们表明加入双谱可显著改进 Chevallier‑Polarsky‑Linder(CPL)参数的测定。特别是,结合双谱、功率谱与 Planck CMB 数据的联合分析,使 \(w_0\) 和 \(w_a\) 的约束提升超过 70 %,并使哈勃参数 \(h\) 的约束提升约 60 %。这些结果凸显了相对论双谱在破除参数简并、以及利用即将开展的大尺度结构巡天探测暗能量本质方面的重要性。

5. Galaxy sizes as complementary (zero-)bias tracers of local primordial non-Gaussianity➡️ Galaxy 大小作为互补的 (zero-)bias 示踪剂用于局域原始非高斯性

👤 Nhat-Minh Nguyen, Kazuyuki Akitsu, Atsushi Taruya

📄 The scale-dependent bias in halo and galaxy power spectra is a key signature of local primordial non-Gaussianity (local PNG), with PNG sensitivity scaling as $b_φ/b_1$ -- the ratio of their responses to long-wavelength primordial potential $b_φ$ and late-time density fluctuations $b_1$. For number density fluctuations, these responses are closely tied by the universality relation, limiting the achievable ratio. We show that size density fluctuations strongly violate this relation, thus evading the limit. For galaxy-mass halos, sizes have a vanishingly small density response but a sizable, negative local PNG response, implying an effective $b_φ/b_1$ that is large in magnitude and opposite in sign to that of number counts. This makes galaxy sizes complementary probes of local PNG from the same galaxy sample, without any sample split. For a DESI-like survey, a multi-tracer analysis combining galaxy numbers and sizes improves the local-PNG detection significance by a factor of $\sim\!3.6$. Due to the sign flip, the number-size cross power spectrum further provides a handle on systematics in the event of a detection.

📄 光晕(halo)和星系(galaxy)功率谱(power spectra)的尺度依赖偏差是局域原始非高斯性(local PNG)的一个关键特征,PNG 的敏感度随 $b_{\phi}/b_{1}$ 缩放,即它们对长波原始势 $b_{\phi}$ 与后期密度涨落 $b_{1}$ 的响应之比。对于数密度涨落(number density fluctuations),这些响应通过普适性关系(universality relation)紧密相连,限制可实现的比值。我们发现尺寸密度涨落(size density fluctuations)强烈违背该关系,从而规避了上述限制。对于星系质量的暗晕(galaxy‑mass halos),尺度的密度响应几乎为零,但对局域 PNG 的响应却相当大且为负,这意味着有效的 $b_{\phi}/b_{1}$ 在幅度上很大,并且符号与数目计数的相反。这使得星系尺寸成为对同一星系样本中局域 PNG 的互补探测手段,无需进行样本分割。对于类似 DESI 的巡天(DESI‑like survey),结合星系数目和尺寸的多示踪剂(multi‑tracer)分析可将局域 PNG 的探测显著性提升约 $\sim\!3.6$ 倍。由于符号反转,数目‑尺度交叉功率谱(number‑size cross power spectrum)还能在检测到信号时提供对系统误差(systematics)的控制手段。

6. Early emission characterization of TDE2025aarm➡️ 中文翻译: **TDE2025aarm 的早期辐射特征描述**

👤 Andrea Simongini, Maria Kherlakian, Alicia López-O

📄 In this Letter, we present early emission data analysis of the tidal disruption event TDE2025aarm, including optical, UV and X-ray data. At a redshift of z = 0.01368, TDE2025aarm is the second closest TDE ever discovered, offering an unprecedented opportunity to study such phenomena in great details. We observed TDE2025aarm in optical with the Liverpool Telescope for a total of three epochs, and complemented our dataset with ancillary spectroscopic and photometric data. The early optical spectra are characterized by a blue-continuum and helium, hydrogen and possibly Bowen lines typical of H+He events. The optical light curves peak at M_g ~ -18.63 mag and are well described by fallback of a M_star ~ 0.16 M_sun star onto a M_BH ~ 2x10^{7} M_sun black hole. We report Swift-XRT detection in the 0.3-10 keV range, with a total flux of F_X ~ 1.42x10^{-14} erg s-1 cm-2, fitted by a black-body with kT ~ 0.39 keV. This makes TDE2025aarm a new event among optical/UV bright TDEs detected in soft X-rays. Our analysis suggests that the early emission from TDE2025aarm is powered by circularization shocks, and that the delayed accretion scenario best describes the observed features.

📄 在这篇通讯中,我们展示了潮汐破坏事件 TDE2025aarm 的早期辐射数据分析,包括光学、UV 和 X-ray 数据。在红移 z = 0.01368 处,TDE2025aarm 是迄今为止发现的第二近的 TDE,为详细研究此类现象提供了前所未有的机会。我们使用 Liverpool Telescope 对 TDE2025aarm 进行了三个历元的光学观测,并用辅助的光谱和测光数据补充了我们的数据集。早期光学光谱的特征是蓝色连续谱以及氦、氢和可能的 Bowen 发射线,这是 H+He 事件的典型特征。光学光变曲线的峰值在 M_g ~ -18.63 mag,可以用一颗 M_star ~ 0.16 M_sun 的恒星坠入一个 M_BH ~ 2×10^7 M_sun 的黑洞的恒星物质回退过程很好地描述。我们报告了在 0.3-10 keV 能量范围内的 Swift-XRT 检测,总流量为 F_X ~ 1.42×10^-14 erg s^-1 cm^-2,用 kT ~ 0.39 keV 的黑体拟合。这使 TDE2025aarm 成为在软 X-ray 中检测到的光学/UV 明亮 TDE 中的一个新的事件。我们的分析表明,TDE2025aarm 的早期辐射是由圆化激波提供能量的,延迟吸积情景最能描述所观测到的特征。

7. Plasmonics of non-noble metals➡️ 非贵金属的等离子体学(Plasmonics)

👤 Michal Horák, Michael Foltýn, Viktor Bajo

📄 Localized surface plasmon resonances are self-sustained, collective oscillations of free electrons in metallic nanostructures. They have a wide range of applications. The most common plasmonic metals are noble metals, such as gold and silver. However, there are applications, such as surface-enhanced Raman spectroscopy, in which using non-noble metals is advantageous. This review summarizes the investigation of localized surface plasmons in non-noble metal nanoparticles, providing an overview of the plasmonic properties of non-noble metals. We cover the following metals: aluminium (Al), antimony (Sb), bismuth (Bi), chromium (Cr), copper (Cu), gallium (Ga), indium (In), lead (Pb), magnesium (Mg), molybdenum (Mo), nickel (Ni), potassium (K), selenium (Se), sodium (Na), tellurium (Te), tin (Sn), titanium (Ti), tungsten (W), and zinc (Zn). Our summary therefore compares the plasmonic properties of non-noble metals and briefly introduces their potential to the readers.

📄 Localized surface plasmon resonances 是金属纳米结构中自由电子的自维持、集体振荡。它们具有广泛的应用。最常见的等离子金属是贵金属,如 **gold** 和 **silver**。然而,在某些应用中,例如 **surface‑enhanced Raman spectroscopy**(表面增强拉曼光谱),使用非贵金属是有优势的。 本综述总结了非贵金属纳米颗粒中的 **localized surface plasmons** 的研究,概述了非贵金属的等离子体(plasmonic)特性。我们覆盖了以下金属: **Aluminium (Al)、Antimony (Sb)、Bismuth (Bi)、Chromium (Cr)、Copper (Cu)、Gallium (Ga)、Indium (In)、Lead (Pb)、Magnesium (Mg)、Molybdenum (Mo)、Nickel (Ni)、Potassium (K)、Selenium (Se)、Sodium (Na)、Tellurium (Te)、Tin (Sn)、Titanium (Ti)、Tungsten (W)** 和 **Zinc (Zn)**。 因此,我们的总结比较了非贵金属的等离子体特性,并向读者简要介绍了它们的潜在应用。

8. Koopman and transfer operator techniques from the perspective of quantum theory➡️ 从量子理论视角看Koopman和转移算子技术。

👤 Dimitrios Giannakis, Michael Montgomery

📄 The study of mathematical connections between operator-theoretic formulations of classical dynamics and quantum mechanics began at least as early as the 1930s in work of Koopman and von Neumann and was developed in later decades by many authors, often independently, into a framework now broadly known as Koopman-von Neumann representation of classical dynamics. This article surveys aspects of this framework for measure-preserving ergodic dynamical systems and connects it with recent approximation techniques for Koopman and transfer operators that are amenable to data-driven numerical implementation. In broad terms, these methods are based on representations of (i) classical observables as elements of an algebra of operators acting on a Hilbert space; and (ii) classical probability measures as elements of the state space of that algebra, with lifted versions of the Koopman and transfer operators inducing dynamical evolution of observables and states, respectively. A common theme underlying the techniques surveyed here is the use of reproducing kernel Hilbert spaces with coalgebra structure (so-called "reproducing kernel Hilbert algebras'') that aids the quantum representation of classical objects, as well as the use of Fock spaces to build approximation schemes with high expressivity and structure preservation properties (notably, preservation of positivity and multiplicativity of composition operators). Applications to quantum algorithms for approximating the Koopman evolution of observables in systems with pure point spectra are also discussed.

📄 中文翻译: 对经典动力学和量子力学的算子理论表述之间的数学联系的研究,至少早在20世纪30年代Koopman和von Neumann的工作中就已经开始,并在随后数十年中由众多作者(往往是独立地)发展成为一种如今被广泛称为Koopman‑von Neumann经典动力学表示的框架。本文综述了该框架在保测遍历动力学系统方面的若干内容,并将其与最近适用于数据驱动数值实现的Koopman算子和转移算子的近似技术联系起来。概括而言,这些方法基于以下表示:(i) 将经典可观测量表示为作用在Hilbert空间上的算子代数中的元素;(ii) 将经典概率测度表示为该代数状态空间中的元素,并且Koopman算子和转移算子的提升版本分别诱导可观测量和状态的动力学演化。这些综述技术的一个共同主题是使用具有余代数结构的再生核Hilbert空间(所谓的“再生核Hilbert代数”),它有助于经典对象的量子表示;同时还使用Fock空间来构建具有高表达能力和结构保持特性(特别是保持复合算子的正性和乘法性)的近似格式。此外,还讨论了将此类技术应用于近似具有纯点谱系统的可观测量Koopman演化的量子算法。

9. From School AI Readiness to Student AI Literacy: A National Multilevel Mediation Analysis of Institutional Capacity and Teacher Capability➡️ 从学校 **AI** 预备状态到学生 **AI** 素养:机构能力与教师能力的**国家多层次中介效应分析**。

👤 Xiu Guan, Mingmin Zheng, Dragan Gašević

📄 Artificial intelligence (AI) is increasingly embedded in vocational education systems, yet empirical evidence linking institutional AI readiness to student learning outcomes remains limited. This study develops and tests a 2-2-1 cross-level mediation framework examining how school-level AI readiness is associated with student AI literacy through aggregated teacher mechanisms. Using linked survey data from 1,007 vocational institutions, 156,125 teachers, and 2,379,546 students nationwide, multilevel models were estimated to assess direct, indirect, and contextual effects. Results indicate that overall school AI readiness is positively associated with student AI literacy after adjusting for institutional and regional characteristics. When examined independently, all readiness dimensions show positive associations, while simultaneous modelling suggests that readiness operates as an integrated organisational configuration. Cross-level mediation analyses reveal that aggregated teacher-perceived AI capability partially mediates the relationship between institutional readiness and student literacy, whereas general attitudinal acceptance measures do not demonstrate stable transmission effects. Robustness analyses further show that this readiness-capability-literacy pathway remains structurally stable across heterogeneous regional AI development contexts and under alternative modelling specifications. These findings reposition institutional AI readiness as a multilevel organisational condition linked to student AI literacy, identify collective teacher capability as its central transmission mechanism, and underscore the need to align infrastructural investment with sustained professional capacity development.

📄 人工智能(AI)正日益融入职业教育体系,但将院校层面AI准备度与学生学习成果联系起来的实证证据仍然有限。本研究开发和检验了一个2-2-1跨层中介框架,考察院校层面AI准备度如何通过聚合的教师机制与学生AI素养产生关联。利用来自全国1007所职业院校、156125名教师和2379546名学生的关联调查数据,估计了多层模型以评估直接效应、间接效应和情境效应。结果表明,在控制了院校和区域特征后,整体院校AI准备度与学生AI素养呈正相关。当独立检验时,所有准备度维度均显示出正相关关系,而同时建模则表明准备度作为一种整合的组织配置发挥作用。跨层中介分析显示,聚合的教师感知AI能力部分中介了院校准备度与学生素养之间的关系,而一般态度接受度指标则未表现出稳定的传导效应。稳健性分析进一步表明,这一准备度-能力-素养路径在异质的区域AI发展情境下以及在替代模型设定下仍保持结构稳定性。这些发现将院校AI准备度重新定位为与学生AI素养相关的多层组织条件,确定集体教师能力为其核心传导机制,并强调需要将基础设施投资与持续的专业能力发展相结合。

10. SPT-3G D1: Maps of the millimeter-wave sky from 2019 and 2020 observations of the SPT-3G Main field➡️ SPT-3G D1:2019年和2020年SPT-3G Main field观测的毫米波天空图

👤 W. Quan, E. Camphuis, C. Daley

📄 Maps of the sky in millimeter wavelengths contain rich information on cosmology through anisotropies of the cosmic microwave background (CMB). Creating multifrequency sky maps of anisotropies in the $I$, $Q$, and $U$ Stokes parameters is one of the first steps of CMB cosmology analyses. In this work, we describe the production and validation of a set of sky maps from the South Pole Telescope's third-generation camera, SPT-3G. The maps are from data taken in frequency bands centered at 95, 150, and 220 GHz and taken during the first two years, 2019 and 2020, of the SPT-3G Main survey, which covers $4\%$ of the sky. We applied high-pass filters to time series of individual detectors and binned the filtered time series samples into map pixels. After that, we calibrated and cleaned the maps to reduce known systematic errors. In addition, we searched for other systematic errors through null tests and mitigated a significant systematic error detected therein. The white noise levels of the full-depth maps of the $I$ Stokes parameter are $5.4$, $4.4$, and $16.2$ $\mathrm{μK}$-$\mathrm{arcmin}$ in the 95, 150, and 220 GHz bands, respectively, and $8.4$, $6.6$, and $25.8$ $\mathrm{μK}$-$\mathrm{arcmin}$ for $Q/U$. These maps are the deepest to date used for measurements of mid-to-high-$\ell$ primary temperature and $E$-mode polarization CMB anisotropies, and reconstructions of the CMB gravitational lensing potential. We make these maps and supporting data products publicly accessible.

📄 **中文翻译(保留英文专有名词)** 在毫米波波段的夜空图包含丰富的宇宙学信息,这些信息通过宇宙微波背景(CMB)的各向异性得以体现。制作多频段的天空图,描绘 **I、Q、U** Stokes 参数的各向异性,是 CMB 宇宙学分析的首要步骤之一。本工作描述了来自 **South Pole Telescope**(南极望远镜)第三代相机 **SPT‑3G** 的天空图的制作与验证。这些图使用了 **SPT‑3G Main survey** 前两年(2019 年和 2020 年)在中心频率 **95 GHz、150 GHz、220 GHz** 三个频段获取的数据,覆盖约 **4 %** 的天球。我们对单探测器的时序数据施加 **high‑pass filters**(高通滤波器),随后将滤波后的时序样本 **bin**(分块)到地图像素中。之后对地图进行 **calibration**(定标)和 **cleaning**(清洗),以降低已知的系统误差。此外,我们通过 **null tests**(零值检验)寻找其他系统误差,并对其中检测到的显著系统误差进行 **mitigation**(消除)。全深度 **I** Stokes 参数图的白色噪声水平在 95、150、220 GHz 波段分别为 **5.4、4.4、16.2 µK‑arcmin**,而 **Q/U** 图的噪声水平为 **8.4、6.6、25.8 µK‑arcmin**。这些图是迄今为止用于测量 **mid‑to‑high ℓ** 初级温度与 **E‑mode** 极化 CMB 各向异性以及 CMB 引力透镜势(CMB gravitational lensing potential)重构的最深图谱。我们将上述地图及其配套数据产品向公众开放(publicly accessible)。

🎮 强化学习

1. Improved constraint on the Hubble constant from dark sirens with LIGO/Virgo/KAGRA O4a➡️ 利用 LIGO/Virgo/KAGRA O4a 的暗警改进对哈勃常数的约束。

👤 V. Alfradique, C. R. Bom, G. Teixeira

📄 A new measurement of the Hubble constant $H_0$ is presented using the statistical dark siren method applied to a sample of seven well-localized gravitational-wave (GW) events from the fourth LIGO-Virgo-KAGRA (LVK) observing run and ten additional events from the first three runs. Galaxy catalogs from the DESI Legacy Imaging Survey (LS) are combined with a deep learning model to compute photometric redshift probability density functions. We extend our previous analysis by including the events GW230731_215307 and GW230927_153832, using sky maps from the fourth Gravitational-Wave Transient Catalog (GWTC-4), and introducing key methodological improvements: $r$-band luminosity weighting of host galaxies; an extended GW likelihood that incorporates information from the binary black hole component masses; and a consistent treatment of selection effects that accounts for the incompleteness of the magnitude-limited LS galaxy catalog. Using a total of 17 well-localized dark sirens (seven from the first part of the fourth observing run, O4a), we obtain $H_0 = 78.8^{+14.6}_{-12.2}$ km/s/Mpc without luminosity weighting and $H_0 = 78.2^{+12.0}_{-11.0}$ km/s/Mpc when applying $r$-band luminosity weighting. Finally, we combine the luminosity-weighted dark siren sample with the bright siren GW170817, including constraints on the jet viewing angle and corrections for the host galaxy peculiar velocity, to obtain a final constraint of $H_0 = 69.9^{+4.1}_{-4.0}$ km/s/Mpc, representing an improvement of approximately 11% in the uncertainty relative to the GW170817-only result.

📄 一项利用统计暗警笛(dark siren)方法对哈勃常数 \(H_0\) 进行的新测量,使用了来自第四次 LIGO‑Virgo‑KAGRA(LVK)观测运行(O4)中精确定位的七个引力波(GW)事件以及前三轮运行的十个额外事件的样本。将 DESI Legacy Imaging Survey(LS)星系目录与深度学习模型相结合,以计算光度红移概率密度函数。我们在之前的分析基础上加入了事件 GW230731_215307 和 GW230927_153832,使用第四次引力波瞬态目录(GWTC‑4)的天图,并引入若干关键方法改进:对宿主星系进行 \(r\) 波段光度加权;扩展的 GW 似然函数,纳入双黑洞质量成分信息;以及对选择效应的统一处理,以考虑幅度受限 LS 星系目录的不完整性。使用共计 17 个精确定位的暗警笛(其中七个来自 O4 前半段,O4a),在不使用光度加权时得到 \(H_0 = 78.8^{+14.6}_{-12.2}\,\text{km/s/Mpc}\),在使用 \(r\) 波段光度加权后得到 \(H_0 = 78.2^{+12.0}_{-11.0}\,\text{km/s/Mpc}\)。最后,我们将光度加权的暗警笛样本与明亮的警笛 GW170817 组合,加入喷流视角角的约束并对宿主星系特殊速度进行修正,得到最终约束 \(H_0 = 69.9^{+4.1}_{-4.0}\,\text{km/s/Mpc}\),相对于仅使用 GW170817 的结果,误差约改善了 11%。

2. Semantic Token Clustering for Efficient Uncertainty Quantification in Large Language Models➡️ **中文翻译:** 基于语义Token聚类的大型语言模型高效不确定性量化 (保留英文专有名词:Token、Large Language Models)

👤 Qi Cao, Andrew Gambardella, Takeshi Kojima

📄 Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks. However, the truthfulness of their outputs is not guaranteed, and their tendency toward overconfidence further limits reliability. Uncertainty quantification offers a promising way to identify potentially unreliable outputs, but most existing methods rely on repeated sampling or auxiliary models, introducing substantial computational overhead. To address these limitations, we propose Semantic Token Clustering (STC), an efficient uncertainty quantification method that leverages the semantic information inherently encoded in LLMs. Specifically, we group tokens into semantically consistent clusters using embedding clustering and prefix matching, and quantify uncertainty based on the probability mass aggregated over the corresponding semantic cluster. Our approach requires only a single generation and does not depend on auxiliary models. Experimental results show that STC achieves performance comparable to state-of-the-art baselines while substantially reducing computational overhead.

📄 大型语言模型 (Large language models, LLMs) 已在多种任务中展现出卓越的能力。然而,其输出的真实性无法得到保证,且其倾向于过度自信进一步限制了可靠性。不确定性量化提供了一种有前景的方式来识别潜在不可靠的输出,但大多数现有方法依赖重复采样或辅助模型,导致显著的计算开销。为解决这些局限,我们提出了 **Semantic Token Clustering (STC)**,一种高效的不确定性量化方法,利用 LLMs 内部编码的语义信息。具体而言,我们使用 **embedding clustering** 与 **prefix matching** 将 token 划分为语义一致的簇,并根据相应语义簇上聚合的概率质量来量化不确定性。该方法仅需一次生成过程,且不依赖辅助模型。实验结果表明,STC 在性能上与 state‑of‑the‑art 基线相当,同时大幅降低了计算开销。

3. Quantum inference on a classically trained quantum extreme learning machine➡️ 在经典训练的 Quantum Extreme Learning Machine 上的量子推理。

👤 Emanuele Brusaschi, Marco Clementi, Marco Liscidin

📄 Quantum extreme learning machines (QELMs) are unconventional computing architectures that bear remarkable promise in both classical and quantum machine-learning tasks, such as the estimate of quantum state properties. However, the probabilistic nature of quantum measurements demands extensive repetitions for training to precisely estimate expectation values, imposing stringent trade-offs among experimental resources, acquisition time, and signal-to-noise ratio, particularly for large datasets. Here we introduce a paradigm shift by harnessing the correspondence between stimulated and spontaneous emission. The QELM is trained exclusively with intense classical fields, yet it performs inference directly on previously unseen quantum input states to predict their quantum properties. This strategy dramatically reduces acquisition times while substantially enhancing the signal-to-noise ratio. Using frequency-bin-encoded biphoton states, implemented here for the first time in a quantum machine-learning architecture, we demonstrate entanglement witnessing of two-qubit states with (93 +- 4)% accuracy, multi-dimensional entanglement detection, and learning of the Hamiltonian governing photon-pair generation with a fidelity of (96 +- 4)%. By establishing classical training as a scalable route to quantum feature extraction, our results bridge macroscopic observables and nonclassical correlations, opening a new pathway toward faster and more robust quantum neural networks

📄 量子极端学习机(Quantum extreme learning machines, QELMs)是一类非常规的计算架构,在经典和量子机器学习任务中都展现出显著的潜力,例如对量子态属性的估计。然而,量子测量的概率本质要求在训练过程中进行大量重复,以精确估计期望值,这在与实验资源、采集时间和信噪比(signal‑to‑noise ratio)的权衡之间产生了严格要求,特别是在大规模数据集上。这里我们通过利用受激辐射(stimulated emission)与自发辐射(spontaneous emission)之间的对应关系,实现了一次范式转变。QELM 仅使用强经典场进行训练,却能够直接对先前未见过的量子输入态进行推断,以预测其量子属性。该策略显著降低了采集时间,同时大幅提升了信噪比。利用 frequency‑bin‑encoded biphoton states(频率仓编码的双光子态)——首次在量子机器学习架构中实现——我们展示了对 two‑qubit states(双量子比特态)的 entanglement witnessing(纠缠见证),准确率为 (93 ± 4)%,并实现了 multi‑dimensional entanglement detection(多维纠缠检测)以及学习光子对产生所遵循的 Hamiltonian(哈密顿量),保真度达 (96 ± 4)%。通过将 classical training(经典训练)确立为通往 quantum feature extraction(量子特征提取)的可扩展路径,我们的结果 bridge macroscopic observables(宏观可观测量)和 nonclassical correlations(非经典关联),为更快、更稳健的 quantum neural networks(量子神经网络)开辟了新的途径。

4. From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering➡️ **从掩码到像素与意义:VLM图像篡改的全新分类、基准和评估指标**

👤 Xinyi Shang, Yi Tang, Jiacheng Cui

📄 Existing tampering detection benchmarks largely rely on object masks, which severely misalign with the true edit signal: many pixels inside a mask are untouched or only trivially modified, while subtle yet consequential edits outside the mask are treated as natural. We reformulate VLM image tampering from coarse region labels to a pixel-grounded, meaning and language-aware task. First, we introduce a taxonomy spanning edit primitives (replace/remove/splice/inpaint/attribute/colorization, etc.) and their semantic class of tampered object, linking low-level changes to high-level understanding. Second, we release a new benchmark with per-pixel tamper maps and paired category supervision to evaluate detection and classification within a unified protocol. Third, we propose a training framework and evaluation metrics that quantify pixel-level correctness with localization to assess confidence or prediction on true edit intensity, and further measure tamper meaning understanding via semantics-aware classification and natural language descriptions for the predicted regions. We also re-evaluate the existing strong segmentation/localization baselines on recent strong tamper detectors and reveal substantial over- and under-scoring using mask-only metrics, and expose failure modes on micro-edits and off-mask changes. Our framework advances the field from masks to pixels, meanings and language descriptions, establishing a rigorous standard for tamper localization, semantic classification and description. Code and benchmark data are available at https://github.com/VILA-Lab/PIXAR.

📄 中文翻译: 现有的篡改检测基准测试在很大程度上依赖于目标掩码,这与真实编辑信号严重错位:掩码内的许多像素未被触碰或仅被微不足道地修改,而掩码外细微却至关重要的编辑却被视为自然内容。我们将VLM图像篡改从粗粒度区域标签重新定义为一个像素级、语义和语言感知任务。首先,我们引入了一个分类体系,涵盖编辑原语(替换/删除/拼接/修复/属性修改/着色等)及其被篡改目标的语义类别,将低级变化与高级理解联系起来。其次,我们发布了一个新的基准测试,包含像素级篡改图和配对类别监督,以在统一协议下评估检测和分类任务。第三,我们提出了一个训练框架和评估指标,用于量化像素级正确性并结合定位来评估对真实编辑强度的置信度或预测,并进一步通过语义感知分类和自然语言描述来衡量篡改语义理解能力(针对预测区域)。我们还在近期强大的篡改检测器上重新评估了现有强大的分割/定位基线模型,揭示了仅使用掩码指标的严重高估和低估问题,并暴露了微编辑和掩码外变化的失败模式。我们的框架将领域从掩码推进到像素、语义和语言描述,为篡改定位、语义分类和描述建立了严格的标准。代码和基准测试数据可在 https://github.com/VILA-Lab/PIXAR 获取。

5. Kolmogorov-Arnold causal generative models➡️ Kolmogorov-Arnold 因果生成模型

👤 Alejandro Almodóvar, Mar Elizo, Patricia A. Apellá

📄 Causal generative models provide a principled framework for answering observational, interventional, and counterfactual queries from observational data. However, many deep causal models rely on highly expressive architectures with opaque mechanisms, limiting auditability in high-stakes domains. We propose KaCGM, a causal generative model for mixed-type tabular data where each structural equation is parameterized by a Kolmogorov--Arnold Network (KAN). This decomposition enables direct inspection of learned causal mechanisms, including symbolic approximations and visualization of parent--child relationships, while preserving query-agnostic generative semantics. We introduce a validation pipeline based on distributional matching and independence diagnostics of inferred exogenous variables, allowing assessment using observational data alone. Experiments on synthetic and semi-synthetic benchmarks show competitive performance against state-of-the-art methods. A real-world cardiovascular case study further demonstrates the extraction of simplified structural equations and interpretable causal effects. These results suggest that expressive causal generative modeling and functional transparency can be achieved jointly, supporting trustworthy deployment in tabular decision-making settings. Code: https://github.com/aalmodovares/kacgm

📄 因果生成模型为从观测数据中回答观测、干预和反事实查询提供了原则性框架。然而,许多深度因果模型依赖高度表达的架构,机制不透明,限制了高风险领域的可审计性。我们提出 **KaCGM**,一个用于混合类型表格数据的因果生成模型,其中每个结构方程由 **Kolmogorov–Arnold Network (KAN)** 参数化。这种分解使得可以直接检查学习到的因果机制,包括符号近似和父子关系的可视化,同时保持查询无关的生成语义。我们引入了一个基于外生变量推断的分布匹配和独立性诊断的验证管道,仅使用观测数据即可进行评估。在合成和半合成基准上的实验表明,与最先进方法相比具有竞争力的性能。一个真实世界的心血管案例研究进一步展示了简化结构方程的提取和可解释的因果效应。这些结果表明,表达性的因果生成建模和功能透明度可以共同实现,支持在表格决策场景中的可信部署。 代码: https://github.com/aalmodovares/kacgm

6. Measuring Faithfulness Depends on How You Measure: Classifier Sensitivity in LLM Chain-of-Thought Evaluation➡️ **测量忠实度取决于你如何测量:分类器在 T8110 Chain‑of‑Thought 评估中的敏感性**

👤 Richard J. Young

📄 Recent work on chain-of-thought (CoT) faithfulness reports single aggregate numbers (e.g., DeepSeek-R1 acknowledges hints 39% of the time), implying that faithfulness is an objective, measurable property of a model. This paper demonstrates that it is not. Three classifiers (a regex-only detector, a two-stage regex-plus-LLM pipeline, and an independent Claude Sonnet 4 judge) are applied to 10,276 influenced reasoning traces from 12 open-weight models spanning 9 families and 7B to 1T parameters. On identical data, these classifiers produce overall faithfulness rates of 74.4%, 82.6%, and 69.7%, respectively, with non-overlapping 95% confidence intervals. Per-model gaps range from 2.6 to 30.6 percentage points; all are statistically significant (McNemar's test, p < 0.001). The disagreements are systematic, not random: inter-classifier agreement measured by Cohen's kappa ranges from 0.06 ("slight") for sycophancy hints to 0.42 ("moderate") for grader hints, and the asymmetry is pronounced: for sycophancy, 883 cases are classified as faithful by the pipeline but unfaithful by the Sonnet judge, while only 2 go the other direction. Classifier choice can also reverse model rankings: Qwen3.5-27B ranks 1st under the pipeline but 7th under the Sonnet judge; OLMo-3.1-32B moves in the opposite direction, from 9th to 3rd. The root cause is that different classifiers operationalize related faithfulness constructs at different levels of stringency (lexical mention versus epistemic dependence), and these constructs yield divergent measurements on the same behavior. These results demonstrate that published faithfulness numbers cannot be meaningfully compared across studies that use different classifiers, and that future evaluations should report sensitivity ranges across multiple classification methodologies rather than single point estimates.

📄 近期关于链-of-thought(chain-of-thought)忠实度(faithfulness)的研究报告了单一的汇总数字(例如 DeepSeek-R1 有 39% 的情况承认提示),这意味着忠实度是模型的一种客观、可测量的属性。本文表明情况并非如此。三个分类器(一个仅使用正则表达式的检测器、一个两阶段的 regex-plus-LLM 流程,以及一个独立的 Claude Sonnet 4 评判器)被应用于来自 12 个开源模型的 10,276 条受影响的推理轨迹,这些模型涵盖 9 个系列,参数量从 7B 到 1T 不等。在相同的数据上,这些分类器产生的总体忠实度率分别为 74.4%、82.6% 和 69.7%,且 95% 置信区间互不重叠。各模型间的差异从 2.6 到 30.6 个百分点不等;所有差异均具有统计显著性(McNemar's test,p < 0.001)。这些分歧是系统性的,而非随机的:用 Cohen's kappa 测量的分类器间一致性,对于谄媚性提示(sycophancy hints)为 0.06("轻微"),对于评分者提示(grader hints)为 0.42("中等"),且不对称性十分显著:对于谄媚性提示,有 883 个案例被流程分类为忠实但被 Sonnet 评判器分类为不忠实,而相反方向的案例仅有 2 个。分类器的选择也会逆转模型排名:Qwen3.5-27B 在流程下排名第一,但在 Sonnet 评判器下排名第七;OLMo-3.1-32B 的排名则向相反方向移动,从第九位升至第三位。根本原因在于不同的分类器在不同严格程度上将相关的忠实度概念操作化(从词汇提及到认知依赖),这些概念对相同行为产生了不同的测量结果。这些结果表明,发表忠实度数字无法在不同分类器的研究之间进行有意义的比较,未来的评估应报告多种分类方法论的敏感性范围,而不是单一的点估计值。

7. AI Agents Can Already Autonomously Perform Experimental High Energy Physics➡️ T3179__Agents 已经能够自主进行实验性高能物理研究

👤 Eric A. Moreno, Samuel Bright-Thonney, Andrzej Nov

📄 Large language model-based AI agents are now able to autonomously execute substantial portions of a high energy physics (HEP) analysis pipeline with minimal expert-curated input. Given access to a HEP dataset, an execution framework, and a corpus of prior experimental literature, we find that Claude Code succeeds in automating all stages of a typical analysis: event selection, background estimation, uncertainty quantification, statistical inference, and paper drafting. We argue that the experimental HEP community is underestimating the current capabilities of these systems, and that most proposed agentic workflows are too narrowly scoped or scaffolded to specific analysis structures. We present a proof-of-concept framework, Just Furnish Context (JFC), that integrates autonomous analysis agents with literature-based knowledge retrieval and multi-agent review, and show that this is sufficient to plan, execute, and document a credible high energy physics analysis. We demonstrate this by conducting analyses on open data from ALEPH, DELPHI, and CMS to perform electroweak, QCD, and Higgs boson measurements. Rather than replacing physicists, these tools promise to offload the repetitive technical burden of analysis code development, freeing researchers to focus on physics insight, truly novel method development, and rigorous validation. Given these developments, we advocate for new strategies for how the community trains students, organizes analysis efforts, and allocates human expertise.

📄 基于大语言模型的 AI 代理现在能够以最少的专家策划输入,自主执行高能物理(HEP)分析管道的大部分内容。在获得访问HEP数据集、执行框架和先验实验文献的权限后,我们发现 Claude Code 成功实现了典型分析所有阶段的自动化:事件选择、本底估计、不确定性量化、统计推断和论文撰写。我们认为,实验HEP社区低估了这些系统的当前能力,而且大多数提出的代理工作流程范围过于狭窄,或仅针对特定分析结构进行搭建。我们提出了一个概念验证框架——Just Furnish Context (JFC),它将自主分析代理与基于文献的知识检索和多代理审查相集成,并证明这足以规划、执行和记录一个可信的高能物理分析。我们通过对来自ALEPH、DELPHI和CMS的开放数据进行分析来验证这一点,执行了电弱相互作用、QCD和希格斯玻色子的测量。这些工具并非要取代物理学家,而是有望卸下分析代码开发中重复的技术负担,使研究人员能够专注于物理洞察、真正创新的方法开发以及严格的验证。鉴于这些发展,我们倡导社区采取新的策略来培养学生、组织分析工作和分配人类专业知识。

8. MeanFlow Meets Control: Scaling Sampled-Data Control for Swarms➡️ **中文翻译** **MeanFlow 与控制相遇:规模化采样数据控制在群体中的应用**

👤 Anqi Dong, Yongxin Chen, Karl H. Johansson

📄 Steering large-scale swarms in only a few control updates is challenging because real systems operate in sampled-data form: control inputs are updated intermittently and applied over finite intervals. In this regime, the natural object is not an instantaneous velocity field, but a finite-window control quantity that captures the system response over each sampling interval. Inspired by MeanFlow, we introduce a control-space learning framework for swarm steering under linear time-invariant dynamics. The learned object is the coefficient that parameterizes the finite-horizon minimum-energy control over each interval. We show that this coefficient admits both an integral representation and a local differential identity along bridge trajectories, which leads to a simple stop-gradient training objective. At implementation time, the learned coefficient is used directly in sampled-data updates, so the prescribed dynamics and actuation map are respected by construction. The resulting framework provides a scalable approach to few-step swarm steering that is consistent with the sampled-data structure of real control systems.

📄 **中文翻译:** 在仅有少数几次控制更新的情况下引导大规模蜂群是极具挑战性的,因为真实系统以采样数据(sampled‑data)方式运行:控制输入被间歇性地更新,并在有限的时间间隔内生效。在这种制度下,自然的描述对象不再是瞬时速度场,而是能够捕获每个采样间隔内系统响应的有限窗口控制量(finite‑window control quantity)。受 MeanFlow 的启发,我们为线性时不变(linear time‑invariant)动力学下的蜂群引导提出了一个控制空间学习框架(control‑space learning framework)。所学习的对象是对每个间隔内有限时域最小能量控制(finite‑horizon minimum‑energy control)进行参数化的系数(coefficient)。我们证明该系数既可以表示为积分形式(integral representation),也可以在桥接轨迹(bridge trajectories)上满足局部微分恒等式(local differential identity),从而得到一个简洁的 stop‑gradient 训练目标(stop‑gradient training objective)。在实际部署时,学习得到的系数直接用于采样数据更新(sampled‑data updates),因此所规定的动力学和执行映射在结构上得到保证。该框架为 few‑step 蜂群引导(few‑step swarm steering)提供了一种可扩展的方法,并且与真实控制系统(real control systems)的采样数据结构(sampled‑data structure)相一致。

9. Improving Generalization on Cybersecurity Tasks with Multi-Modal Contrastive Learning➡️ 通过 Multi‑Modal Contrastive Learning 提升网络安全任务的泛化能力

👤 Jianan Huang, Rodolfo V. Valentim, Luca Vassio

📄 The use of ML in cybersecurity has long been impaired by generalization issues: Models that work well in controlled scenarios fail to maintain performance in production. The root cause often lies in ML algorithms learning superficial patterns (shortcuts) rather than underlying cybersecurity concepts. We investigate contrastive multi-modal learning as a first step towards improving ML performance in cybersecurity tasks. We aim at transferring knowledge from data-rich modalities, such as text, to data-scarce modalities, such as payloads. We set up a case study on threat classification and propose a two-stage multi-modal contrastive learning framework that uses textual vulnerability descriptions to guide payload classification. First, we construct a semantically meaningful embedding space using contrastive learning on descriptions. Then, we align payloads to this space, transferring knowledge from text to payloads. We evaluate the approach on a large-scale private dataset and a synthetic benchmark built from public CVE descriptions and LLM-generated payloads. The methodology appears to reduce shortcut learning over baselines on both benchmarks. We release our synthetic benchmark and source code as open source.

📄 ML 在网络安全中的使用长期以来受到泛化问题的困扰:在受控场景中表现良好的模型在生产环境中难以保持性能。根本原因通常在于 ML 算法学习表面模式(shortcuts)而不是底层网络安全概念。我们研究对比多模态学习作为提高网络安全任务中 ML 性能的第一步。我们的目标是将知识从数据丰富的模态(如文本)迁移到数据稀缺的模态(如 payloads)。我们建立了一个威胁分类的案例研究,并提出了一个两阶段多模态对比学习框架,使用文本漏洞描述来指导 payload 分类。首先,我们使用描述上的对比学习构建了一个语义上有意义的嵌入空间。然后,我们将 payloads 对齐到这个空间,实现从文本到 payloads 的知识迁移。我们在一个大规模私有数据集和一个由公开的 CVE 描述和 LLM 生成的 payloads 合成的基准上评估了该方法。该方法似乎在两个基准上都减少了相对于基线的 shortcut 学习。我们将合成的基准和源代码作为开源发布。

10. MuSteerNet: Human Reaction Generation from Videos via Observation-Reaction Mutual Steering➡️ **MuSteerNet:通过观察‑反应相互引导(Observation‑Reaction Mutual Steering)实现视频中人类反应的生成**

👤 Yuan Zhou, Yongzhi Li, Yanqi Dai

📄 Video-driven human reaction generation aims to synthesize 3D human motions that directly react to observed video sequences, which is crucial for building human-like interactive AI systems. However, existing methods often fail to effectively leverage video inputs to steer human reaction synthesis, resulting in reaction motions that are mismatched with the content of video sequences. We reveal that this limitation arises from a severe relational distortion between visual observations and reaction types. In light of this, we propose MuSteerNet, a simple yet effective framework that generates 3D human reactions from videos via observation-reaction mutual steering. Specifically, we first propose a Prototype Feedback Steering mechanism to mitigate relational distortion by refining visual observations with a gated delta-rectification modulator and a relational margin constraint, guided by prototypical vectors learned from human reactions. We then introduce Dual-Coupled Reaction Refinement that fully leverages rectified visual cues to further steer the refinement of generated reaction motions, thereby effectively improving reaction quality and enabling MuSteerNet to achieve competitive performance. Extensive experiments and ablation studies validate the effectiveness of our method. Code coming soon: https://github.com/zhouyuan888888/MuSteerNet.

📄 视频驱动的人类反应生成旨在合成能够直接对观察到的视频序列做出响应的3D人体动作,这对于构建类人交互系统至关重要。然而,现有方法往往无法有效利用视频输入来引导人类反应合成,导致生成的反应动作与视频序列的内容不匹配。我们揭示了这一局限性源于视觉观察与反应类型之间严重的关联失真。为此,我们提出了 MuSteerNet,一个简单而有效的框架,通过“观察-反应互引导”从视频中生成3D人类反应。具体而言,我们首先提出 Prototype Feedback Steering 机制,通过门控 delta-校正调制器和关系边际约束来优化视觉观察,从而缓解关联失真问题,该机制由从人类反应中学习到的原型向量引导。随后,我们引入 Dual-Coupled Reaction Refinement 方法,充分利用校正后的视觉线索进一步引导生成反应动作的细化,从而有效提升反应质量,并使 MuSteerNet 达到具有竞争力的性能表现。大量实验和消融研究验证了我们方法的有效性。代码即将发布:https://github.com/zhouyuan888888/MuSteerNet。