AI Paper Digest 2026-03-10

🧠 Large Language Models

1. BEVLM: Distilling Semantic Knowledge from LLMs into Bird's-Eye View Representations

👤 Thomas Monninger, Shaoyuan Xie, Qi Alfred Chen

📄 The integration of Large Language Models (LLMs) into autonomous driving has attracted growing interest for their strong reasoning and semantic understanding abilities, which are essential for handling complex decision-making and long-tail scenarios. However, existing methods typically feed LLMs with tokens from multi-view and multi-frame images independently, leading to redundant computation and limited spatial consistency. This separation in visual processing hinders accurate 3D spatial reasoni...

2. A class of d-dimensional directed polymers in a Gaussian environment

3. EgoReasoner: Learning Egocentric 4D Reasoning via Task-Adaptive Structured Thinking

4. SCOPE: Scene-Contextualized Incremental Few-Shot 3D Segmentation

5. Third-order mixed electroweak-QCD corrections to the W-boson mass prediction from the muon lifetime

6. SUREON: A Benchmark and Vision-Language-Model for Surgical Reasoning

7. Multimodal Large Language Models as Image Classifiers

8. Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion

9. Fly360: Omnidirectional Obstacle Avoidance within Drone View

10. A recipe for scalable attention-based MLIPs: unlocking long-range accuracy with all-to-all node attention

🖼️ Computer Vision

1. Causal Interpretation of Neural Network Computations with Contribution Decomposition

2. A recipe for scalable attention-based MLIPs: unlocking long-range accuracy with all-to-all node attention

3. EgoReasoner: Learning Egocentric 4D Reasoning via Task-Adaptive Structured Thinking

4. Fly360: Omnidirectional Obstacle Avoidance within Drone View

5. The Pen: Episodic Cognitive Assistance via an Ear-Worn Interface

6. Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

7. Sampling-based Continuous Optimization for Messenger RNA Design

8. An ode to instantons

9. BEVLM: Distilling Semantic Knowledge from LLMs into Bird's-Eye View Representations

10. Boosting deep Reinforcement Learning using pretraining with Logical Options

🎨 Multimodal Learning

1. Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

2. Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion

3. Beyond Rows to Reasoning: Agentic Retrieval for Multimodal Spreadsheet Understanding and Editing

4. Multimodal Large Language Models as Image Classifiers

5. Adapter-Augmented Bandits for Online Multi-Constrained Multi-Modal Inference Scheduling

6. Pinterest Canvas: Large-Scale Image Generation at Pinterest

7. When One Modality Rules Them All: Backdoor Modality Collapse in Multimodal Diffusion Models

8. The EpisTwin: A Knowledge Graph-Grounded Neuro-Symbolic Architecture for Personal AI

9. History-Conditioned Spatio-Temporal Visual Token Pruning for Efficient Vision-Language Navigation

10. Underactuated multimodal jumping robot for extraterrestrial exploration

📊 New Datasets

1. Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion

2. SUREON: A Benchmark and Vision-Language-Model for Surgical Reasoning

3. SurgFormer: Scalable Learning of Organ Deformation with Resection Support and Real-Time Inference

4. Causal Interpretation of Neural Network Computations with Contribution Decomposition

5. Modeling and Measuring Redundancy in Multisource Multimodal Data for Autonomous Driving

6. Fly360: Omnidirectional Obstacle Avoidance within Drone View

7. EgoReasoner: Learning Egocentric 4D Reasoning via Task-Adaptive Structured Thinking

8. Hierarchical Industrial Demand Forecasting with Temporal and Uncertainty Explanations

9. SG-DOR: Learning Scene Graphs with Direction-Conditioned Occlusion Reasoning for Pepper Plants

10. NEGATE: Constrained Semantic Guidance for Linguistic Negation in Text-to-Video Diffusion

✂️ Model Compression

1. SCOPE: Scene-Contextualized Incremental Few-Shot 3D Segmentation

2. SUREON: A Benchmark and Vision-Language-Model for Surgical Reasoning

3. A recipe for scalable attention-based MLIPs: unlocking long-range accuracy with all-to-all node attention

4. Data-Driven Trends and Subpopulations in the Gravitational Wave Binary Black Hole Merger Population with UMAP

5. Third-order mixed electroweak-QCD corrections to the W-boson mass prediction from the muon lifetime

6. BEVLM: Distilling Semantic Knowledge from LLMs into Bird's-Eye View Representations

7. Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

8. Causal Interpretation of Neural Network Computations with Contribution Decomposition

9. The Prevalence of Turbulence-Regulated Multiphase Galactic Winds in Star-Forming Galaxies

10. Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion

📝 Survey Papers

1. Estimating Residential Displacement in the Central Puget Sound Region using Household Survey Data

2. Unbiased Bayesian Inference of Peculiar Motions of Galaxies from Type Ia Supernovae Observations

3. Density of States Weighted Decoherence Probe Formalism for Charge Transport in DNA

4. KCLarity at SemEval-2026 Task 6: Encoder and Zero-Shot Approaches to Political Evasion Detection

5. Construction and Science of SURF

6. WALLABY pilot survey: Blinded by the light -- discovery of a fourth member in the ESO 179-013 system

7. The Collective Voice of Ly$α$ Emitters: Insights from JWST Stacked Spectroscopy

8. Talk Freely, Execute Strictly: Schema-Gated Agentic AI for Flexible and Reproducible Scientific Workflows

9. MIGHTEE: The dark matter haloes, duty cycle and mechanical feedback from radio-AGN up to $z \sim 2.5$

10. Doctor or Patient? Synergizing Diarization and ASR for Code-Switched Hinglish Medical Conditions Extraction

🎮 Reinforcement Learning

1. Convergence of Neural Network Policies for Risk--Reward Optimization

2. SCOPE: Scene-Contextualized Incremental Few-Shot 3D Segmentation

3. A recipe for scalable attention-based MLIPs: unlocking long-range accuracy with all-to-all node attention

4. An ode to instantons

5. Hierarchical Industrial Demand Forecasting with Temporal and Uncertainty Explanations

6. Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

7. Kinematically Coherent Multiphase Galactic Winds in Star-Forming Galaxies Revealed by Unified Radiative Transfer Modeling of UV Emission and Absorption Lines

8. Causal Interpretation of Neural Network Computations with Contribution Decomposition

9. Unified Learning of Temporal Task Structure and Action Timing for Bimanual Robot Manipulation

10. Boosting deep Reinforcement Learning using pretraining with Logical Options