7 major topics × 10 papers each | Auto-compiled by 伊利虾 🦐
📄 Vision Foundation Models (VFMs) have become the cornerstone of modern computer vision, offering robust representations across a wide array of tasks. While recent advances allow these models to handle varying input sizes during training, inference typically remains restricted to a single, fixed scale. This prevalent single-scale paradigm overlooks a fundamental property of visual perception: varying resolutions offer complementary inductive biases, where low-resolution views excel at global semantic recognition and high-resolution views are essential for fine-grained refinement. In this work, we propose Multi-Resolution Fusion (MuRF), a simple yet universally effective strategy to harness this synergy at inference time. Instead of relying on a single view, MuRF constructs a unified representation by processing an image at multiple resolutions through a frozen VFM and fusing the resulting features. The universality of MuRF is its most compelling attribute. It is not tied to a specific architecture, serving instead as a fundamental, training-free enhancement to visual representation. We empirically validate this by applying MuRF to a broad spectrum of critical computer vision tasks across multiple distinct VFM families - primarily DINOv2, but also demonstrating successful generalization to contrastive models like SigLIP2.
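The fusion step is simple enough to sketch. Below is a minimal, hedged illustration of inference-time multi-resolution fusion: the image is resized to several resolutions, passed through a frozen VFM, and the resulting feature maps are resampled to a common grid and averaged. The resolutions, patch size, feature-map interface, and averaging operator are illustrative assumptions; the paper's fusion rule may differ.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def murf_features(vfm, image, resolutions=(224, 448, 896), patch=14):
    """image: (B, 3, H, W); vfm(x) is assumed to return a (B, C, h, w)
    patch feature map for an input of spatial size (res, res)."""
    target = resolutions[-1] // patch          # finest feature grid
    fused = None
    for res in resolutions:
        x = F.interpolate(image, size=(res, res),
                          mode="bilinear", align_corners=False)
        feats = vfm(x)                         # frozen VFM forward pass
        feats = F.interpolate(feats, size=(target, target),
                              mode="bilinear", align_corners=False)
        fused = feats if fused is None else fused + feats
    return fused / len(resolutions)            # unified representation
```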
📄 We introduce a minimal two-parameter formulation of the dark energy (DE) density evolution normalized to its present-day value, $f_{\rm DE}(z) \equiv \rho_{\rm DE}(z)/\rho_{\rm DE,0}$, in terms of $f_p\equiv f_{\rm DE}(z_p)$ and the DE equation of state $w_p\equiv w(z_p)$, at a pivot redshift $z_p$. This provides an alternative framework for assessing the evidence for evolving DE, complementary to the established Chevallier-Polarski-Linder (CPL) parameterization. By parameterizing the DE density directly, the $(w_p,\,f_p)$ formulation avoids the approximate degeneracies intrinsic to the $(w_0,\,w_a)$ basis -- in particular the weak sensitivity of the expansion history to $w_a$ -- while reproducing the background evolution of representative quintessence models with equivalent accuracy. Confronting it with the latest baryon acoustic oscillation (BAO) measurements from DESI, a prior on early-universe parameters from Planck cosmic microwave background (CMB) observations, and Type Ia supernovae (SNe) data, we find that the $w_p$ and $f_p$ parameters are both tightly constrained and sensitive to distinct subsets of the data. Specifically, $w_p$ is measured to percent-level precision by BAO and CMB alone, while $f_p$ is pinned down by the independent matter density constraint that only SNe provide. Including the Pantheon+ SNe sample, we obtain $w_p = -1.04 \pm 0.04$ and $f_p = 1.07 \pm 0.04$, with similar results when using the DESY5 SNe sample. The preference for evolving DE over $\Lambda$CDM remains below $3\sigma$ across all dataset combinations, comparable to that obtained with CPL. Notably, the proximity of both $w_p$ and $f_p$ to their cosmological constant values of $(-1,1)$ -- precisely at the epoch where the data are most sensitive -- deepens the coincidence previously identified in the CPL framework, reinforcing the case for caution in interpreting the current evidence for dynamical DE.
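For orientation, the quantities in this abstract obey the standard background relations below (assuming flatness and neglecting radiation); the explicit two-parameter form of $f_{\rm DE}(z; w_p, f_p)$ is not given in the abstract, so only the generic link between $f_{\rm DE}$ and $w(z)$ is shown.

```latex
% Standard background relations; the explicit two-parameter form of
% f_DE(z; w_p, f_p) is not stated in the abstract and is omitted here.
\begin{align}
  f_{\rm DE}(z) &\equiv \frac{\rho_{\rm DE}(z)}{\rho_{\rm DE,0}}
    = \exp\!\left[\, 3 \int_0^z \frac{1 + w(z')}{1 + z'}\,\mathrm{d}z' \right],\\
  \frac{H^2(z)}{H_0^2} &= \Omega_m (1+z)^3 + \left(1 - \Omega_m\right) f_{\rm DE}(z),
\end{align}
```

with $w_p = w(z_p)$ and $f_p = f_{\rm DE}(z_p)$ evaluated at the pivot redshift; $\Lambda$CDM corresponds to $(w_p, f_p) = (-1, 1)$ at any choice of $z_p$.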
📄 Generating realistic human-object interaction (HOI) animations remains challenging because it requires jointly modeling dynamic human actions and diverse object geometries. Prior diffusion-based approaches often rely on hand-crafted contact priors or human-imposed kinematic constraints to improve contact quality. We propose LIGHT, a data-driven alternative in which guidance emerges from the denoising pace itself, reducing dependence on manually designed priors. Building on diffusion forcing, we factor the representation into modality-specific components and assign individualized noise levels with asynchronous denoising schedules. In this paradigm, cleaner components guide noisier ones through cross-attention, yielding guidance without auxiliary classifiers. We find that this data-driven guidance is inherently contact-aware, and can be enhanced when training is augmented with a broad spectrum of synthetic object geometries, encouraging invariance of contact semantics to geometric diversity. Extensive experiments show that pace-induced guidance more effectively mirrors the benefits of contact priors than conventional classifier-free guidance, while achieving higher contact fidelity, more realistic HOI generation, and stronger generalization to unseen objects and tasks.
📄 Recent advances in image generation models have expanded their applications beyond aesthetic imagery toward practical visual content creation. However, existing benchmarks mainly focus on natural image synthesis and fail to systematically evaluate models under the structured and multi-constraint requirements of real-world commercial design tasks. In this work, we introduce BizGenEval, a systematic benchmark for commercial visual content generation. The benchmark spans five representative document types: slides, charts, webpages, posters, and scientific figures, and evaluates four key capability dimensions: text rendering, layout control, attribute binding, and knowledge-based reasoning, forming 20 diverse evaluation tasks. BizGenEval contains 400 carefully curated prompts and 8000 human-verified checklist questions to rigorously assess whether generated images satisfy complex visual and semantic constraints. We conduct large-scale benchmarking on 26 popular image generation systems, including state-of-the-art commercial APIs and leading open-source models. The results reveal substantial capability gaps between current generative models and the requirements of professional visual content creation. We hope BizGenEval serves as a standardized benchmark for real-world commercial visual content generation.
📄 Vision-language-action models have reshaped autonomous driving by incorporating language into the decision-making process. However, most existing pipelines use the language modality only for scene descriptions or reasoning and lack the flexibility to follow diverse user instructions for personalized driving. To address this, we first construct a large-scale driving dataset (InstructScene) containing around 100,000 scenes annotated with diverse driving instructions and the corresponding trajectories. We then propose a unified Vision-Language-World-Action model, Vega, for instruction-based generation and planning. We employ the autoregressive paradigm to process visual inputs (vision) and language instructions (language), and the diffusion paradigm to generate future predictions (world modeling) and trajectories (action). We perform joint attention to enable interactions between the modalities and use individual projection layers for each modality to broaden its capabilities. Extensive experiments demonstrate that our method not only achieves superior planning performance but also exhibits strong instruction-following abilities, paving the way for more intelligent and personalized driving systems.
📄 We present profile-likelihood constraints on velocity-independent dark matter-proton scattering, including cases in which only a fraction of dark matter has such non-gravitational interactions. Frequentist profile-likelihood techniques provide prior-independent constraints, circumventing prior-volume effects that we show arise in Bayesian constraints on this model. In the limit where the scattering cross section or the fraction of interacting dark matter approaches zero, the other interacting dark matter model parameters become unconstrained, causing the posterior distribution to favor that region of parameter space. Using Planck 2018 cosmic microwave background anisotropy data, we find a clear impact of prior-volume effects on the posteriors used to place constraints on dark matter scattering. Compared to the frequentist analysis, the Bayesian method consistently overestimates the constraints on the cross section. Given the potentially biased upper limits on models subject to prior-volume effects, such as this one, we recommend supplementing Bayesian constraints with frequentist statistics to better assess the impact of priors.
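The frequentist construction referenced here is standard and easy to sketch: for each fixed value of the parameter of interest (the scattering cross section), the likelihood is maximized, i.e. $\chi^2$ minimized, over all remaining parameters, and limits are read off the resulting $\Delta\chi^2$ curve with no prior volume entering. The toy `chi2` below is a stand-in, not the Planck likelihood.

```python
import numpy as np
from scipy.optimize import minimize

def chi2(sigma, nuisance):
    # Hypothetical placeholder for -2 ln L(sigma, nuisance | data).
    a, b = nuisance
    return (sigma - 0.3 * a) ** 2 / 0.1 + (a - 1.0) ** 2 + (b - 0.5) ** 2

def profile_chi2(sigma_grid):
    # Minimize over nuisance parameters at each fixed sigma.
    prof = np.array([minimize(lambda nu: chi2(s, nu), x0=[1.0, 0.5]).fun
                     for s in sigma_grid])
    return prof - prof.min()          # Delta chi^2 relative to global best fit

# 95% CL one-sided upper limit: largest sigma with Delta chi^2 <= 2.71.
grid = np.linspace(0.0, 2.0, 201)
dchi2 = profile_chi2(grid)
upper_limit = grid[dchi2 <= 2.71].max()
```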
📄 Multi-shot video generation is crucial for long narrative storytelling, yet current bidirectional architectures suffer from limited interactivity and high latency. We propose ShotStream, a novel causal multi-shot architecture that enables interactive storytelling and efficient on-the-fly frame generation. By reformulating the task as next-shot generation conditioned on historical context, ShotStream allows users to dynamically instruct ongoing narratives via streaming prompts. We achieve this by first fine-tuning a text-to-video model into a bidirectional next-shot generator, which is then distilled into a causal student via Distribution Matching Distillation. To overcome the challenges of inter-shot consistency and error accumulation inherent in autoregressive generation, we introduce two key innovations. First, a dual-cache memory mechanism preserves visual coherence: a global context cache retains conditional frames for inter-shot consistency, while a local context cache holds generated frames within the current shot for intra-shot consistency; a RoPE discontinuity indicator explicitly distinguishes the two caches to eliminate ambiguity. Second, to mitigate error accumulation, we propose a two-stage distillation strategy. This begins with intra-shot self-forcing conditioned on ground-truth historical shots and progressively extends to inter-shot self-forcing using self-generated histories, effectively bridging the train-test gap. Extensive experiments demonstrate that ShotStream generates coherent multi-shot videos with sub-second latency, achieving 16 FPS on a single GPU. It matches or exceeds the quality of slower bidirectional models, paving the way for real-time interactive storytelling. Training and inference code, as well as the models, are available on our project page.
📄 Gauging an individual's skill level is crucial, as it inherently shapes their behavior. Quantifying skill, however, is challenging because it is latent to the observed actions. To explore skill understanding in human behavior, we focus on dyadic sports -- specifically table tennis -- where skill manifests not just in complex movements, but in the subtle nuances of execution conditioned on game context. Our key idea is to learn a generative model of each player's tactical racket strokes and jointly embed them in a common latent space that encodes individual characteristics, including those pertaining to skill levels. By training these player models on a large-scale dataset of 3D-reconstructed professional matches and conditioning them on comprehensive game context -- including player positioning and opponent behaviors -- the models capture individual tactical identities within their latent space. We probe this learned player space and find that it reflects distinct play styles and attributes that collectively represent skill. By training a simple relative ranking network on these embeddings, we demonstrate that both relative and absolute skill predictions can be achieved. These results demonstrate that the learned player space effectively quantifies skill levels, providing a foundation for automated skill assessment in complex, interactive behaviors.
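The "simple relative ranking network" admits a minimal sketch: score each player embedding with a small MLP and train with a pairwise margin ranking loss so that the higher-skilled player of each pair receives the higher score. The embedding dimension, architecture, and margin are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Small MLP scorer over player embeddings (dimensions are assumptions).
scorer = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MarginRankingLoss(margin=0.5)

def ranking_loss(emb_stronger, emb_weaker):
    s1 = scorer(emb_stronger).squeeze(-1)   # score of higher-skilled player
    s2 = scorer(emb_weaker).squeeze(-1)     # score of lower-skilled player
    target = torch.ones_like(s1)            # enforce s1 > s2 by the margin
    return loss_fn(s1, s2, target)
```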
📄 The knowledge base in a retrieval-augmented generation (RAG) system is typically assembled once and never revised, even though the facts a query requires are often fragmented across documents and buried in irrelevant content. We argue that the knowledge base should be treated as a trainable component and propose WriteBack-RAG, a framework that uses labeled examples to identify where retrieval succeeds, isolate the relevant documents, and distill them into compact knowledge units that are indexed alongside the original corpus. Because the method modifies only the corpus, it can be applied once as an offline preprocessing step and combined with any RAG pipeline. Across four RAG methods, six benchmarks, and two LLM backbones, WriteBack-RAG improves every evaluated setting, with gains averaging +2.14%. Cross-method transfer experiments further show that the distilled knowledge benefits RAG pipelines other than the one used to produce it, confirming that the improvement resides in the corpus itself.
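A hedged sketch of the write-back loop as described: use labeled examples to find retrieval successes, isolate the documents responsible, distill them into compact knowledge units, and index those units alongside the original corpus. All callables (`retriever`, `rag_answer`, `llm_distill`) are hypothetical stand-ins; the paper's exact procedure may differ.

```python
def write_back(corpus_index, labeled_examples, retriever, rag_answer,
               llm_distill):
    new_units = []
    for query, gold in labeled_examples:
        docs = retriever(query, corpus_index)
        if rag_answer(query, docs) != gold:
            continue                      # keep only retrieval successes
        # Isolate the documents that individually support the gold answer.
        relevant = [d for d in docs if rag_answer(query, [d]) == gold]
        if relevant:
            # Compress the relevant documents into a compact knowledge unit.
            new_units.append(llm_distill(query, relevant))
    corpus_index.add(new_units)           # indexed alongside the corpus
    return corpus_index
```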
📄 Accurate estimation of large displacement optical flow remains a critical challenge. Existing methods typically rely on iterative local search and/or domain-specific fine-tuning, which severely limits their performance in large displacement and zero-shot generalization scenarios. To overcome this, we introduce MegaFlow, a simple yet powerful model for zero-shot large displacement optical flow. Rather than relying on highly complex, task-specific architectural designs, MegaFlow adapts powerful pre-trained vision priors to produce temporally consistent motion fields. In particular, we formulate flow estimation as a global matching problem by leveraging pre-trained global Vision Transformer features, which naturally capture large displacements. This is followed by a few lightweight iterative refinements to further improve the sub-pixel accuracy. Extensive experiments demonstrate that MegaFlow achieves state-of-the-art zero-shot performance across multiple optical flow benchmarks. Moreover, our model also delivers highly competitive zero-shot performance on long-range point tracking benchmarks, demonstrating its robust transferability and suggesting a unified paradigm for generalizable motion estimation. Our project page is at: https://kristen-z.github.io/projects/megaflow.
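Global matching with ViT features is a known formulation and can be sketched: correlate every source patch with every target patch and take the soft-argmax of the correlation as the coarse flow. The temperature and soft-argmax head below are assumptions; MegaFlow's exact matching head may differ.

```python
import torch

def global_match_flow(f1, f2, h, w, tau=0.07):
    """f1, f2: (B, h*w, C) L2-normalized patch features of two frames."""
    corr = torch.einsum("bic,bjc->bij", f1, f2)          # (B, hw, hw)
    prob = torch.softmax(corr / tau, dim=-1)             # match distribution
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = (torch.stack([xs, ys], dim=-1).float()
            .view(1, h * w, 2).to(f1.device))            # patch coordinates
    target = prob @ grid                                 # expected match coords
    return (target - grid).view(-1, h, w, 2)             # coarse patch-level flow
```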
📄 Multimodal Large Language Models (MLLMs) have shown strong performance on Video Temporal Grounding (VTG). However, their coarse recognition capabilities are insufficient for fine-grained temporal understanding, making task-specific fine-tuning indispensable. This fine-tuning causes models to memorize dataset-specific shortcuts rather than faithfully grounding in the actual visual content, leading to poor Out-of-Domain (OOD) generalization. Object-centric learning offers a promising remedy by decomposing scenes into entity-level representations, but existing approaches require re-running the entire multi-stage training pipeline from scratch. We propose SlotVTG, a framework that steers MLLMs toward object-centric, input-grounded visual reasoning at minimal cost. SlotVTG introduces a lightweight slot adapter that decomposes visual tokens into abstract slots via slot attention and reconstructs the original sequence, where objectness priors from a self-supervised vision model encourage semantically coherent slot formation. Cross-domain evaluation on standard VTG benchmarks demonstrates that our approach significantly improves OOD robustness while maintaining competitive In-Domain (ID) performance with minimal overhead.
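The slot adapter builds on standard slot attention (Locatello et al.), which the sketch below follows: softmax over slots so slots compete for tokens, a weighted mean over tokens, and a GRU update. Slot count, dimensions, and iteration count are assumptions, and the reconstruction of the original token sequence is omitted.

```python
import torch
import torch.nn as nn

class SlotAdapter(nn.Module):
    def __init__(self, dim=768, n_slots=8, iters=3):
        super().__init__()
        self.slots0 = nn.Parameter(torch.randn(n_slots, dim) * 0.02)
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))
        self.gru = nn.GRUCell(dim, dim)
        self.iters, self.scale = iters, dim ** -0.5

    def forward(self, tokens):                        # tokens: (B, N, D)
        B = tokens.size(0)
        slots = self.slots0.unsqueeze(0).expand(B, -1, -1)
        k, v = self.k(tokens), self.v(tokens)
        for _ in range(self.iters):
            attn = torch.softmax(self.q(slots) @ k.transpose(1, 2)
                                 * self.scale, dim=1)  # compete over slots
            attn = attn / attn.sum(-1, keepdim=True)   # weighted mean weights
            updates = attn @ v                         # (B, S, D)
            slots = self.gru(updates.reshape(-1, updates.size(-1)),
                             slots.reshape(-1, slots.size(-1))).view_as(slots)
        return slots     # reconstruction back to the token sequence omitted
```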
📄 Graphic design is a creative and innovative process that plays a crucial role in applications such as e-commerce and advertising. However, developing an automated design system that can faithfully translate user intentions into editable design files remains an open challenge. Although recent studies have leveraged powerful text-to-image models and MLLMs to assist graphic design, they typically simplify professional workflows, resulting in limited flexibility and intuitiveness. To address these limitations, we propose PSDesigner, an automated graphic design system that emulates the creative workflow of human designers. Building upon multiple specialized components, PSDesigner collects theme-related assets based on user instructions, and autonomously infers and executes tool calls to manipulate design files, such as integrating new assets or refining inferior elements. To endow the system with strong tool-use capabilities, we construct a design dataset, CreativePSD, which contains a large amount of high-quality PSD design files annotated with operation traces across a wide range of design scenarios and artistic styles, enabling models to learn expert design procedures. Extensive experiments demonstrate that PSDesigner outperforms existing methods across diverse graphic design tasks, empowering non-specialists to conveniently create production-quality designs.
📄 Human driving behavior is inherently personal: it is shaped by long-term habits and influenced by short-term intentions. Individuals differ in how they accelerate, brake, merge, yield, and overtake across diverse situations. However, existing end-to-end autonomous driving systems either optimize for generic objectives or rely on fixed driving modes, lacking the ability to adapt to individual preferences or interpret natural language intent. To address this gap, we propose Drive My Way (DMW), a personalized Vision-Language-Action (VLA) driving framework that aligns with users' long-term driving habits and adapts to real-time user instructions. DMW learns a user embedding from our personalized driving dataset collected across multiple real drivers and conditions the policy on this embedding during planning, while natural language instructions provide additional short-term guidance. Closed-loop evaluation on the Bench2Drive benchmark demonstrates that DMW improves style instruction adaptation, and user studies show that its generated behaviors are recognizable as each driver's own style, highlighting personalization as a key capability for human-centered autonomous driving. Our data and code are available at https://dmw-cvpr.github.io/.
📄 Existing feed-forward 3D Gaussian Splatting methods predict pixel-aligned primitives, leading to a quadratic growth in primitive count as resolution increases. This fundamentally limits their scalability, making high-resolution synthesis such as 4K intractable. We introduce LGTM (Less Gaussians, Texture More), a feed-forward framework that overcomes this resolution scaling barrier. By predicting compact Gaussian primitives coupled with per-primitive textures, LGTM decouples geometric complexity from rendering resolution. This approach enables high-fidelity 4K novel view synthesis without per-scene optimization, a capability previously out of reach for feed-forward methods, all while using significantly fewer Gaussian primitives. Project page: https://yxlao.github.io/lgtm/
📄 The rise of micro-videos has reshaped how misinformation spreads, amplifying its speed, reach, and impact on public trust. Existing benchmarks typically focus on a single deception type, overlooking the diversity of real-world cases that involve multimodal manipulation, AI-generated content, cognitive bias, and out-of-context reuse. Meanwhile, most detection models lack fine-grained attribution, limiting interpretability and practical utility. To address these gaps, we introduce WildFakeBench, a large-scale benchmark of over 10,000 real-world micro-videos covering diverse misinformation types and sources, each annotated with expert-defined attribution labels. Building on this foundation, we develop FakeAgent, a Delphi-inspired multi-agent reasoning framework that integrates multimodal understanding with external evidence for attribution-grounded analysis. FakeAgent jointly analyzes content and retrieved evidence to identify manipulation, recognize cognitive and AI-generated patterns, and detect out-of-context misinformation. Extensive experiments show that FakeAgent consistently outperforms existing MLLMs across all misinformation types, while WildFakeBench provides a realistic and challenging testbed for advancing explainable micro-video misinformation detection. Data and code are available at: https://github.com/Aiyistan/FakeAgent.
📄 Understanding tactical dynamics in badminton requires analyzing entire matches rather than isolated clips. However, existing badminton datasets mainly focus on short clips or task-specific annotations and rarely provide full-match data with dense multimodal annotations. This limitation makes it difficult to generate accurate shot captions and perform match-level analysis. To address this limitation, we introduce the first Badminton Full Match Dense (BFMD) dataset, with 19 broadcast matches (including both singles and doubles) covering over 20 hours of play, comprising 1,687 rallies and 16,751 hit events, each annotated with a shot caption. The dataset provides hierarchical annotations including match segments, rally events, and dense rally-level multimodal annotations such as shot types, shuttle trajectories, player pose keypoints, and shot captions. We develop a VideoMAE-based multimodal captioning framework with a Semantic Feedback mechanism that leverages shot semantics to guide caption generation and improve semantic consistency. Experimental results demonstrate that multimodal modeling and semantic feedback improve shot caption quality over RGB-only baselines. We further showcase the potential of BFMD by analyzing the temporal evolution of tactical patterns across full matches.
📄 Recent progress in video-to-video (V2V) translation has enabled realistic resimulation of embodied AI demonstrations, a capability that allows pretrained robot policies to be transferable to new environments without additional data collection. However, prior works can only operate on a single view at a time, while embodied AI tasks are commonly captured from multiple synchronized cameras to support policy learning. Naively applying single-view models independently to each camera leads to inconsistent appearance across views, and standard transformer architectures do not scale to multi-view settings due to the quadratic cost of cross-view attention. We present VideoWeaver, the first multimodal multi-view V2V translation framework. VideoWeaver is initially trained as a single-view flow-based V2V model. To achieve an extension to the multi-view regime, we propose to ground all views in a shared 4D latent space derived from a feed-forward spatial foundation model, namely, Pi3. This encourages view-consistent appearance even under wide baselines and dynamic camera motion. To scale beyond a fixed number of cameras, we train views at distinct diffusion timesteps, enabling the model to learn both joint and conditional view distributions. This in turn allows autoregressive synthesis of new viewpoints conditioned on existing ones. Experiments show superior or similar performance to the state-of-the-art on the single-view translation benchmarks and, for the first time, physically and stylistically consistent multi-view translations, including challenging egocentric and heterogeneous-camera setups central to world randomization for robot learning.
📄 Multimodal Diffusion Large Language Models (MDLLMs) achieve high-concurrency generation through parallel masked decoding, yet the architectures remain prone to multimodal hallucinations. This structural vulnerability stems from an algorithmic flaw: the decoder ranks candidate tokens based on textual likelihood without verifying localized visual support. We establish that this language-only ranking induces an objective mismatch, where language probability mass acts as a misspecified proxy for the intended multimodal task. Consequently, we reinterpret hallucination as a localized optimization error, a phenomenon where the decoder exploits language shortcuts to maximize a proxy score at the expense of visual grounding. To address this objective mismatch, we introduce VISAGE, a training-free decoding framework that calibrates the objective at inference time. VISAGE estimates the proxy discrepancy by quantifying the spatial entropy of cross-attention distributions. By enforcing a localization consensus across attention heads, the method penalizes spatially uniform distributions and re-ranks token commitments to favor visually grounded outcomes. We provide an analytical stability guarantee establishing that VISAGE maintains a bounded objective loss under estimation error. Evaluations across hallucination-sensitive and general-purpose benchmarks demonstrate the robustness of the framework, yielding relative gains of 8.59% on MMMU-val and 7.75% on HallusionBench.
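The spatial-entropy signal can be sketched directly from the description: compute the entropy of each head's cross-attention distribution over image patches, average the normalized entropies as a cross-head consensus, and subtract the penalty from the token's language score before re-ranking. The penalty weight `alpha` and the averaging rule are assumptions, not the paper's exact calibration.

```python
import math
import torch

def spatial_entropy(attn, eps=1e-8):
    """attn: (heads, patches); each row a normalized attention distribution."""
    return -(attn * (attn + eps).log()).sum(-1)           # (heads,)

def calibrated_score(lang_logprob, attn, alpha=1.0):
    n_patches = attn.size(-1)
    h_max = math.log(n_patches)             # uniform map has maximal entropy
    penalty = (spatial_entropy(attn) / h_max).mean()      # cross-head consensus
    return lang_logprob - alpha * penalty   # re-rank toward grounded tokens
```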
📄 We study narrative coherence in visually grounded stories by comparing human-written narratives with those generated by vision-language models (VLMs) on the Visual Writing Prompts corpus. Using a set of metrics that capture different aspects of narrative coherence, including coreference, discourse relation types, topic continuity, character persistence, and multimodal character grounding, we compute a narrative coherence score. We find that VLMs show broadly similar coherence profiles that differ systematically from those of humans. In addition, differences for individual measures are often subtle, but they become clearer when considered jointly. Overall, our results indicate that, despite human-like surface fluency, model narratives exhibit systematic differences from those of humans in how they organise discourse across a visually grounded story. Our code is available at https://github.com/GU-CLASP/coherence-driven-humans.
📄 Accurate biodiversity identification from large-scale field data is a foundational problem with direct impact on ecology, conservation, and environmental monitoring. In practice, the core task is taxonomic prediction - inferring order, family, genus, or species from imperfect inputs such as specimen images, DNA barcodes, or both. Existing multimodal methods often treat taxonomy as a flat label space and therefore fail to encode the hierarchical structure of biological classification, which is critical for robustness under noise and missing modalities. We present two end-to-end variants for hierarchy-aware multimodal learning: CLiBD-HiR, which introduces Hierarchical Information Regularization (HiR) to shape embedding geometry across taxonomic levels, yielding structured and noise-robust representations; and CLiBD-HiR-Fuse, which additionally trains a lightweight fusion predictor that supports image-only, DNA-only, or joint inference and is resilient to modality corruption. Across large-scale biodiversity benchmarks, our approach improves taxonomic classification accuracy by over 14 percent compared to strong multimodal baselines, with particularly large gains under partial and corrupted DNA conditions. These results highlight that explicitly encoding biological hierarchy, together with flexible fusion, is key for practical biodiversity foundation models.
📄 Robust perception and reasoning require consistency across sensory modalities. Yet current multimodal models often violate this principle, yielding contradictory predictions for visual and textual representations of the same concept. Rather than masking these failures with standard voting mechanisms, which can amplify systematic biases, we show that cross-modal inconsistency provides a rich and natural signal for learning. We introduce RC2, a reinforcement learning framework that resolves internal conflicts by enforcing cross-modal cycle consistency. By requiring a model to perform backward inference, switch modalities, and reliably reconstruct the answer through forward inference, we obtain a dense, label-free reward. This cyclic constraint encourages the model to align its internal representations autonomously. Optimizing for this structure mitigates modality-specific errors and improves reasoning accuracy by up to 7.6 points. Our results suggest that advanced reasoning emerges not only from scaling data, but also from enforcing a structurally consistent understanding of the world.
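The cycle described admits a compact sketch: infer an answer forward, invert through the other modality, then check that forward inference recovers the original answer. All model calls below are hypothetical stand-ins, and an exact-match indicator is used where the paper's reward may be a softer agreement score.

```python
def cycle_reward(model, question, image):
    # Forward inference on the original (visual) input.
    answer = model.forward_infer(question, image=image)
    # Backward inference: reconstruct the queried content from the answer,
    # switching to the text modality.
    reconstruction = model.backward_infer(answer, question)
    # Forward inference again, now grounded in the reconstructed text.
    answer_again = model.forward_infer(question, text=reconstruction)
    # Label-free reward: consistency of the two forward passes.
    return float(answer_again == answer)
```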
📄 Purpose: The integration of multimodal imaging into operating rooms paves the way for comprehensive surgical scene understanding. In ophthalmic surgery, two complementary imaging modalities are now available: operating microscope (OPMI) imaging and real-time intraoperative optical coherence tomography (iOCT). This first work toward temporal OPMI and iOCT feature fusion demonstrates the potential of multimodal image processing for multi-head prediction through the example of precise instrument tracking in vitreoretinal surgery. Methods: We propose a multimodal, temporal, real-time capable network architecture to perform joint instrument detection, keypoint localization, and tool-tissue distance estimation. Our network design integrates a cross-attention fusion module to merge OPMI and iOCT image features, which are efficiently extracted via a YoloNAS and a CNN encoder, respectively. Furthermore, a region-based recurrent module leverages temporal coherence. Results: Our experiments demonstrate reliable instrument localization and keypoint detection (95.79% mAP50) and show that the incorporation of iOCT significantly improves tool-tissue distance estimation, while achieving real-time processing at 22.5 ms per frame. Especially for close distances to the retina (below 1 mm), the distance estimation accuracy improved from 284 $\mu$m (OPMI only) to 33 $\mu$m (multimodal). Conclusion: Feature fusion of multimodal imaging can enhance multi-task prediction accuracy compared to single-modality processing, and real-time performance can be achieved through tailored network design. While our results demonstrate the potential of multi-modal processing for image-guided vitreoretinal surgery, they also underline key challenges that motivate future research toward more reliable, consistent, and comprehensive surgical scene understanding.
📄 While language reasoning models excel in many tasks, visual reasoning remains challenging for current large multimodal models (LMMs). As a result, most LMMs default to verbalizing perceptual content into text, a strong limitation for tasks requiring fine-grained spatial and visual understanding. While recent approaches take steps toward thinking with images by invoking tools or generating intermediate images, they either rely on external modules, or incur unnecessary computation by reasoning directly in pixel space. In this paper, we introduce LanteRn, a framework that enables LMMs to interleave language with compact latent visual representations, allowing visual reasoning to occur directly in latent space. LanteRn augments a vision-language transformer with the ability to generate and attend to continuous visual thought embeddings during inference. We train the model in two stages: supervised fine-tuning to ground visual features in latent states, followed by reinforcement learning to align latent reasoning with task-level utility. We evaluate LanteRn on three perception-centric benchmarks (VisCoT, V*, and Blink), observing consistent improvements in visual grounding and fine-grained reasoning. These results suggest that internal latent representations provide a promising direction for more efficient multimodal reasoning.
📄 In this methods paper, we show how to tridiagonalize two families of bosonic multimode systems: optomechanical and Bose-Hubbard Hamiltonians. Using tools from number theory, we devise a rendering of these systems in the form of exact $D \times D$ tridiagonal symmetric matrices with real-valued entries. Such matrices can subsequently be exactly diagonalized using specialized sparse-matrix algorithms that need on the order of $D \ln(D)$ steps. This makes it possible to describe systems with much larger numbers of basis states than available to date. It also allows for efficient diagonal representation of large, accurate, symplectic split-operator propagators, for which we moreover show that the required basis changes can be implemented by simple re-indexing, at marginal computational cost.
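Once a Hamiltonian is in symmetric tridiagonal form, SciPy exposes a solver specialized for exactly this structure, which is one way to run the final diagonalization step. The diagonal and off-diagonal entries below are placeholders, not the paper's optomechanical or Bose-Hubbard matrix elements.

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

D = 10_000
diag = np.arange(D, dtype=float)        # main diagonal (placeholder values)
offdiag = np.sqrt(np.arange(1, D))      # sub/superdiagonal (placeholders)

# Eigenvalues of the D x D symmetric tridiagonal matrix; set
# eigvals_only=False to also obtain eigenvectors.
evals = eigh_tridiagonal(diag, offdiag, eigvals_only=True)
print(evals[:5])
```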
📄 We present AnyHand, a large-scale synthetic dataset designed to advance the state of the art in 3D hand pose estimation from both RGB-only and RGB-D inputs. While recent works with foundation approaches have shown that an increase in the quantity and diversity of training data can markedly improve performance and robustness in hand pose estimation, existing real-world-collected datasets on this task are limited in coverage, and prior synthetic datasets rarely provide occlusions, arm details, and aligned depth together at scale. To address this bottleneck, our AnyHand contains 2.5M single-hand and 4.1M hand-object interaction RGB-D images, with rich geometric annotations. In the RGB-only setting, we show that extending the original training sets of existing baselines with AnyHand yields significant gains on multiple benchmarks (FreiHAND and HO-3D), even when keeping the architecture and training scheme fixed. More impressively, the model trained with AnyHand shows stronger generalization to the out-of-domain HO-Cap dataset, without any fine-tuning. We also contribute a lightweight depth fusion module that can be easily integrated into existing RGB-based models. Trained with AnyHand, the resulting RGB-D model achieves superior performance on the HO-3D benchmark, showing the benefits of depth integration and the effectiveness of our synthetic data.
📄 Human driving behavior is inherently personal, which is shaped by long-term habits and influenced by short-term intentions. Individuals differ in how they accelerate, brake, merge, yield, and overtake across diverse situations. However, existing end-to-end autonomous driving systems either optimize for generic objectives or rely on fixed driving modes, lacking the ability to adapt to individual preferences or interpret natural language intent. To address this gap, we propose Drive My Way (DMW), a personalized Vision-Language-Action (VLA) driving framework that aligns with users' long-term driving habits and adapts to real-time user instructions. DMW learns a user embedding from our personalized driving dataset collected across multiple real drivers and conditions the policy on this embedding during planning, while natural language instructions provide additional short-term guidance. Closed-loop evaluation on the Bench2Drive benchmark demonstrates that DMW improves style instruction adaptation, and user studies show that its generated behaviors are recognizable as each driver's own style, highlighting personalization as a key capability for human-centered autonomous driving. Our data and code are available at https://dmw-cvpr.github.io/.
📄 [翻译失败] Human driving behavior is inherently personal, which is shaped by long-term habits and influenced by short-term intentions. Individuals differ in how they accelerate, brake, merge, yield, and overtake...
📄 Graphic design is a creative and innovative process that plays a crucial role in applications such as e-commerce and advertising. However, developing an automated design system that can faithfully translate user intentions into editable design files remains an open challenge. Although recent studies have leveraged powerful text-to-image models and MLLMs to assist graphic design, they typically simplify professional workflows, resulting in limited flexibility and intuitiveness. To address these limitations, we propose PSDesigner, an automated graphic design system that emulates the creative workflow of human designers. Building upon multiple specialized components, PSDesigner collects theme-related assets based on user instructions, and autonomously infers and executes tool calls to manipulate design files, such as integrating new assets or refining inferior elements. To endow the system with strong tool-use capabilities, we construct a design dataset, CreativePSD, which contains a large amount of high-quality PSD design files annotated with operation traces across a wide range of design scenarios and artistic styles, enabling models to learn expert design procedures. Extensive experiments demonstrate that PSDesigner outperforms existing methods across diverse graphic design tasks, empowering non-specialists to conveniently create production-quality designs.
📄 [翻译失败] Graphic design is a creative and innovative process that plays a crucial role in applications such as e-commerce and advertising. However, developing an automated design system that can faithfully tra...
📄 Accurate estimation of large displacement optical flow remains a critical challenge. Existing methods typically rely on iterative local search or/and domain-specific fine-tuning, which severely limits their performance in large displacement and zero-shot generalization scenarios. To overcome this, we introduce MegaFlow, a simple yet powerful model for zero-shot large displacement optical flow. Rather than relying on highly complex, task-specific architectural designs, MegaFlow adapts powerful pre-trained vision priors to produce temporally consistent motion fields. In particular, we formulate flow estimation as a global matching problem by leveraging pre-trained global Vision Transformer features, which naturally capture large displacements. This is followed by a few lightweight iterative refinements to further improve the sub-pixel accuracy. Extensive experiments demonstrate that MegaFlow achieves state-of-the-art zero-shot performance across multiple optical flow benchmarks. Moreover, our model also delivers highly competitive zero-shot performance on long-range point tracking benchmarks, demonstrating its robust transferability and suggesting a unified paradigm for generalizable motion estimation. Our project page is at: https://kristen-z.github.io/projects/megaflow.
📄 [翻译失败] Accurate estimation of large displacement optical flow remains a critical challenge. Existing methods typically rely on iterative local search or/and domain-specific fine-tuning, which severely limits...
📄 Vision-language-action models have reshaped autonomous driving to incorporate languages into the decision-making process. However, most existing pipelines only utilize the language modality for scene descriptions or reasoning and lack the flexibility to follow diverse user instructions for personalized driving. To address this, we first construct a large-scale driving dataset (InstructScene) containing around 100,000 scenes annotated with diverse driving instructions with the corresponding trajectories. We then propose a unified Vision-Language-World-Action model, Vega, for instruction-based generation and planning. We employ the autoregressive paradigm to process visual inputs (vision) and language instructions (language) and the diffusion paradigm to generate future predictions (world modeling) and trajectories (action). We perform joint attention to enable interactions between the modalities and use individual projection layers for different modalities for more capabilities. Extensive experiments demonstrate that our method not only achieves superior planning performance but also exhibits strong instruction-following abilities, paving the way for more intelligent and personalized driving systems.
📄 [翻译失败] Vision-language-action models have reshaped autonomous driving to incorporate languages into the decision-making process. However, most existing pipelines only utilize the language modality for scene ...
📄 Recent advances in image generation models have expanded their applications beyond aesthetic imagery toward practical visual content creation. However, existing benchmarks mainly focus on natural image synthesis and fail to systematically evaluate models under the structured and multi-constraint requirements of real-world commercial design tasks. In this work, we introduce BizGenEval, a systematic benchmark for commercial visual content generation. The benchmark spans five representative document types: slides, charts, webpages, posters, and scientific figures, and evaluates four key capability dimensions: text rendering, layout control, attribute binding, and knowledge-based reasoning, forming 20 diverse evaluation tasks. BizGenEval contains 400 carefully curated prompts and 8000 human-verified checklist questions to rigorously assess whether generated images satisfy complex visual and semantic constraints. We conduct large-scale benchmarking on 26 popular image generation systems, including state-of-the-art commercial APIs and leading open-source models. The results reveal substantial capability gaps between current generative models and the requirements of professional visual content creation. We hope BizGenEval serves as a standardized benchmark for real-world commercial visual content generation.
📄 Agent performance increasingly depends on harness engineering, yet harness design is usually buried in controller code and runtime-specific conventions, making it hard to transfer, compare, and study as a scientific object. We ask whether the high-level control logic of an agent harness can instead be externalized as a portable executable artifact. We introduce Natural-Language Agent Harnesses (NLAHs), which express harness behavior in editable natural language, and the Intelligent Harness Runtime (IHR), a shared runtime that executes these harnesses through explicit contracts, durable artifacts, and lightweight adapters. Across coding and computer-use benchmarks, we conduct controlled evaluations of operational viability, module ablation, and code-to-text harness migration.
📄 Multimodal Large Language Models (MLLMs) have shown strong performance on Video Temporal Grounding (VTG). However, their coarse recognition capabilities are insufficient for fine-grained temporal understanding, making task-specific fine-tuning indispensable. This fine-tuning causes models to memorize dataset-specific shortcuts rather than faithfully grounding in the actual visual content, leading to poor Out-of-Domain (OOD) generalization. Object-centric learning offers a promising remedy by decomposing scenes into entity-level representations, but existing approaches require re-running the entire multi-stage training pipeline from scratch. We propose SlotVTG, a framework that steers MLLMs toward object-centric, input-grounded visual reasoning at minimal cost. SlotVTG introduces a lightweight slot adapter that decomposes visual tokens into abstract slots via slot attention and reconstructs the original sequence, where objectness priors from a self-supervised vision model encourage semantically coherent slot formation. Cross-domain evaluation on standard VTG benchmarks demonstrates that our approach significantly improves OOD robustness while maintaining competitive In-Domain (ID) performance with minimal overhead.
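A compact slot-attention sketch (after Locatello et al., 2020) of the kind of adapter the abstract describes; the objectness-prior supervision and sequence reconstruction are omitted, and all dimensions are illustrative.

```python
# Minimal slot-attention sketch (after Locatello et al., 2020). Illustrative
# dimensions and iteration count; not SlotVTG's actual adapter.
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    def __init__(self, num_slots=8, dim=256, iters=3):
        super().__init__()
        self.iters, self.scale = iters, dim ** -0.5
        self.slots_mu = nn.Parameter(torch.randn(1, num_slots, dim))
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))
        self.gru = nn.GRUCell(dim, dim)

    def forward(self, x):                      # x: (B, N, dim) visual tokens
        B = x.size(0)
        slots = self.slots_mu.expand(B, -1, -1)
        k, v = self.k(x), self.v(x)
        for _ in range(self.iters):
            attn = (self.q(slots) @ k.transpose(1, 2)) * self.scale
            attn = attn.softmax(dim=1)         # slots compete for each token
            attn = attn / attn.sum(-1, keepdim=True)
            updates = attn @ v                 # (B, S, dim) weighted means
            slots = self.gru(updates.reshape(-1, updates.size(-1)),
                             slots.reshape(-1, slots.size(-1))).view_as(slots)
        return slots

slots = SlotAttention()(torch.randn(2, 196, 256))
```

The key property for the abstract's purpose is the softmax over the slot axis: tokens are softly partitioned into a small number of entity-level groups, which is what makes the resulting representation object-centric.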
📄 Reference-to-video (R2V) generation is a controllable video synthesis paradigm that constrains the generation process using both text prompts and reference images, enabling applications such as personalized advertising and virtual try-on. In practice, existing R2V methods typically introduce additional high-level semantic or cross-modal features alongside the VAE latent representation of the reference image and jointly feed them into the diffusion Transformer (DiT). These auxiliary representations provide semantic guidance and act as implicit alignment signals, which can partially alleviate pixel-level information leakage in the VAE latent space. However, they may still struggle to address copy-paste artifacts and multi-subject confusion caused by modality mismatch across heterogeneous encoder features. In this paper, we propose RefAlign, a representation alignment framework that explicitly aligns DiT reference-branch features to the semantic space of a visual foundation model (VFM). The core of RefAlign is a reference alignment loss that pulls the reference features and VFM features of the same subject closer to improve identity consistency, while pushing apart the corresponding features of different subjects to enhance semantic discriminability. This simple yet effective strategy is applied only during training, incurring no inference-time overhead, and achieves a better balance between text controllability and reference fidelity. Extensive experiments on the OpenS2V-Eval benchmark demonstrate that RefAlign outperforms current state-of-the-art methods in TotalScore, validating the effectiveness of explicit reference alignment for R2V tasks.
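The abstract's pull-together/push-apart objective is naturally expressed as a contrastive loss; the InfoNCE form below is our assumption, with one pooled feature per subject as a stand-in for the paper's exact feature extraction.

```python
# Hedged sketch of a subject-level alignment loss in the spirit the abstract
# describes: pull DiT reference features toward VFM features of the same
# subject, push apart different subjects. The InfoNCE form is our assumption.
import torch
import torch.nn.functional as F

def reference_alignment_loss(dit_feats, vfm_feats, tau=0.07):
    """dit_feats, vfm_feats: (S, D) one pooled feature per subject, row-aligned."""
    z1 = F.normalize(dit_feats, dim=-1)
    z2 = F.normalize(vfm_feats, dim=-1)
    logits = z1 @ z2.t() / tau            # (S, S); diagonal = same subject
    labels = torch.arange(z1.size(0))     # positive pairs on the diagonal
    return F.cross_entropy(logits, labels)

loss = reference_alignment_loss(torch.randn(4, 768), torch.randn(4, 768))
```

Since the loss only shapes the reference branch during training, nothing changes at sampling time, consistent with the claimed zero inference-time overhead.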
📄 Autoregressive video diffusion models have demonstrated remarkable progress, yet they remain bottlenecked by intractable linear KV-cache growth, temporal repetition, and compounding errors during long-video generation. To address these challenges, we present PackForcing, a unified framework that efficiently manages the generation history through a novel three-partition KV-cache strategy. Specifically, we categorize the historical context into three distinct types: (1) Sink tokens, which preserve early anchor frames at full resolution to maintain global semantics; (2) Mid tokens, which achieve a massive spatiotemporal compression (32x token reduction) via a dual-branch network fusing progressive 3D convolutions with low-resolution VAE re-encoding; and (3) Recent tokens, kept at full resolution to ensure local temporal coherence. To strictly bound the memory footprint without sacrificing quality, we introduce a dynamic top-$k$ context selection mechanism for the mid tokens, coupled with a continuous Temporal RoPE Adjustment that seamlessly re-aligns position gaps caused by dropped tokens with negligible overhead. Empowered by this principled hierarchical context compression, PackForcing can generate coherent 2-minute, 832x480 videos at 16 FPS on a single H200 GPU. It achieves a bounded KV cache of just 4 GB and enables a remarkable 24x temporal extrapolation (5s to 120s), operating effectively either zero-shot or trained on merely 5-second clips. Extensive results on VBench demonstrate state-of-the-art temporal consistency (26.07) and dynamic degree (56.25), proving that short-video supervision is sufficient for high-quality, long-video synthesis. https://github.com/ShandaAI/PackForcing
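A toy sketch of the three-partition history layout (sink / mid / recent). Mean-pooling stands in for the dual-branch 32x compression, and similarity to the latest frame stands in for the paper's top-$k$ criterion; both are placeholders.

```python
# Toy sketch of the three-partition history layout the abstract describes
# (sink / mid / recent); compression and top-k scoring here are stand-ins.
import torch

def partition_history(frames, n_sink=2, n_recent=4, k=8):
    """frames: list of (T_i, D) token tensors, one entry per generated frame."""
    sink   = frames[:n_sink]                    # early anchors, full resolution
    recent = frames[-n_recent:]                 # local context, full resolution
    mid    = frames[n_sink:-n_recent]
    # Stand-in for the dual-branch 32x compression: mean-pool each mid frame.
    mid = [f.mean(dim=0, keepdim=True) for f in mid]
    if mid:
        # Dynamic top-k selection: keep the k mid frames most similar to the
        # newest frame (a proxy score; the paper's criterion may differ).
        query = frames[-1].mean(dim=0)
        scores = torch.stack([m[0] @ query for m in mid])
        keep = scores.topk(min(k, len(mid))).indices.sort().values
        mid = [mid[i] for i in keep.tolist()]
    return sink + mid + recent

ctx = partition_history([torch.randn(16, 64) for _ in range(40)])
```

The memory bound follows directly: sink and recent are constant-size, and mid is capped at $k$ compressed entries, so the cache stops growing with video length.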
📄 We present profile-likelihood constraints on velocity-independent dark matter-proton scattering, including cases in which only a fraction of dark matter has such non-gravitational interactions. Frequentist profile-likelihood techniques provide prior-independent constraints, circumventing prior-volume effects that we show arise in Bayesian constraints on this model. In the limit where the scattering cross section or the fraction of interacting dark matter approaches zero, the other interacting dark matter model parameters become unconstrained, causing the posterior distribution to favor that region of parameter space. Using Planck 2018 cosmic microwave background anisotropy data, we find a clear impact of prior-volume effects on the posteriors used to place constraints on dark matter scattering. Compared to the frequentist analysis, the Bayesian method consistently overestimates the constraints on the cross section. Given the potentially biased upper limits on models subject to prior-volume effects, such as this one, we recommend supplementing Bayesian constraints with frequentist statistics to better assess the impact of priors.
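A minimal sketch of the profiling construction on an invented two-parameter toy likelihood: for each value of the parameter of interest, minimize the chi-square over the remaining parameter, then read off the confidence interval from the profiled curve. No prior enters, which is the point of contrast with the Bayesian posterior.

```python
# Toy profile likelihood: s is the parameter of interest (e.g. a cross
# section), f a nuisance fraction. The likelihood itself is invented.
import numpy as np
from scipy.optimize import minimize_scalar

def chi2(s, f):
    # Toy -2lnL: one channel measures s*f, another pins the fraction f.
    return ((0.2 - s * f) / 0.1) ** 2 + ((1.0 - f) / 0.1) ** 2

grid = np.linspace(0.0, 1.0, 41)
profile = np.array([minimize_scalar(lambda f: chi2(s, f),
                                    bounds=(0.0, 1.0), method="bounded").fun
                    for s in grid])
profile -= profile.min()
# 95% CL one-sided region: where the profiled chi2 stays below 2.71.
allowed = grid[profile < 2.71]
print(f"95% interval for s: [{allowed.min():.2f}, {allowed.max():.2f}]")
```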
📄 Graphic design is a creative and innovative process that plays a crucial role in applications such as e-commerce and advertising. However, developing an automated design system that can faithfully translate user intentions into editable design files remains an open challenge. Although recent studies have leveraged powerful text-to-image models and MLLMs to assist graphic design, they typically simplify professional workflows, resulting in limited flexibility and intuitiveness. To address these limitations, we propose PSDesigner, an automated graphic design system that emulates the creative workflow of human designers. Building upon multiple specialized components, PSDesigner collects theme-related assets based on user instructions, and autonomously infers and executes tool calls to manipulate design files, such as integrating new assets or refining inferior elements. To endow the system with strong tool-use capabilities, we construct a design dataset, CreativePSD, which contains a large amount of high-quality PSD design files annotated with operation traces across a wide range of design scenarios and artistic styles, enabling models to learn expert design procedures. Extensive experiments demonstrate that PSDesigner outperforms existing methods across diverse graphic design tasks, empowering non-specialists to conveniently create production-quality designs.
📄 Multi-shot video generation is crucial for long narrative storytelling, yet current bidirectional architectures suffer from limited interactivity and high latency. We propose ShotStream, a novel causal multi-shot architecture that enables interactive storytelling and efficient on-the-fly frame generation. By reformulating the task as next-shot generation conditioned on historical context, ShotStream allows users to dynamically instruct ongoing narratives via streaming prompts. We achieve this by first fine-tuning a text-to-video model into a bidirectional next-shot generator, which is then distilled into a causal student via Distribution Matching Distillation. To overcome the challenges of inter-shot consistency and error accumulation inherent in autoregressive generation, we introduce two key innovations. First, a dual-cache memory mechanism preserves visual coherence: a global context cache retains conditional frames for inter-shot consistency, while a local context cache holds generated frames within the current shot for intra-shot consistency. A RoPE discontinuity indicator is employed to explicitly distinguish the two caches and eliminate ambiguity. Second, to mitigate error accumulation, we propose a two-stage distillation strategy. This begins with intra-shot self-forcing conditioned on ground-truth historical shots and progressively extends to inter-shot self-forcing using self-generated histories, effectively bridging the train-test gap. Extensive experiments demonstrate that ShotStream generates coherent multi-shot videos with sub-second latency, achieving 16 FPS on a single GPU. It matches or exceeds the quality of slower bidirectional models, paving the way for real-time interactive storytelling. Training and inference code, as well as the models, are available on our
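A toy sketch of the dual-cache bookkeeping: a global cache persisting across shots and a local cache that resets at each shot boundary. Capacities and names are illustrative; the RoPE discontinuity indicator is only noted in a comment.

```python
# Toy sketch of the dual-cache idea from the abstract: a global cache that
# persists across shots and a local cache that resets at each shot boundary.
class DualCache:
    def __init__(self, global_cap=8, local_cap=32):
        self.global_cache, self.local_cache = [], []
        self.global_cap, self.local_cap = global_cap, local_cap

    def start_shot(self, conditional_frames):
        # Promote a few frames into the global cache for inter-shot consistency.
        self.global_cache = (self.global_cache + conditional_frames)[-self.global_cap:]
        self.local_cache = []          # intra-shot context starts fresh

    def add_frame(self, frame):
        self.local_cache = (self.local_cache + [frame])[-self.local_cap:]

    def context(self):
        # A RoPE discontinuity indicator would mark the boundary between the
        # two caches when positions are assigned; here we just concatenate.
        return self.global_cache + self.local_cache

cache = DualCache()
cache.start_shot(["shot1_last_frame"])
cache.add_frame("shot2_frame0")
print(cache.context())
```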
📄 Generating realistic human-object interaction (HOI) animations remains challenging because it requires jointly modeling dynamic human actions and diverse object geometries. Prior diffusion-based approaches often rely on hand-crafted contact priors or human-imposed kinematic constraints to improve contact quality. We propose LIGHT, a data-driven alternative in which guidance emerges from the denoising pace itself, reducing dependence on manually designed priors. Building on diffusion forcing, we factor the representation into modality-specific components and assign individualized noise levels with asynchronous denoising schedules. In this paradigm, cleaner components guide noisier ones through cross-attention, yielding guidance without auxiliary classifiers. We find that this data-driven guidance is inherently contact-aware, and can be enhanced when training is augmented with a broad spectrum of synthetic object geometries, encouraging invariance of contact semantics to geometric diversity. Extensive experiments show that pace-induced guidance more effectively mirrors the benefits of contact priors than conventional classifier-free guidance, while achieving higher contact fidelity, more realistic HOI generation, and stronger generalization to unseen objects and tasks.
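A hedged sketch of asynchronous, modality-specific noise levels in the diffusion-forcing style the abstract builds on; the modality names and timestep offsets are invented for illustration.

```python
# Hedged sketch of asynchronous denoising schedules: each modality gets its
# own noise level so cleaner streams can guide noisier ones via attention.
# The modality names and the offsets below are made up for illustration.
import torch

def sample_noise_levels(batch, n_steps=1000, lag={"human": 0, "object": 200}):
    t_base = torch.randint(0, n_steps, (batch,))
    # Modality-specific timesteps: 'object' is kept cleaner (denoised earlier)
    # so it can act as guidance for the human-motion stream.
    return {m: (t_base - off).clamp(min=0) for m, off in lag.items()}

levels = sample_noise_levels(4)
```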
📄 Motivated by recent experimental progress on high-temperature superconductivity in bilayer nickelates, we investigate the phase diagram of the normal state in a bilayer Kondo lattice model using single-site dynamical mean-field theory (DMFT). When the interlayer tunneling $t_\perp$ is absent, we identify a non-Fermi-liquid (NFL) critical point tuned by the interlayer spin coupling $J_\perp$ or hole doping $x$, which separates a standard Fermi liquid in the overdoped region from a distinct pseudogap (PG) metal in the underdoped regime. This PG phase, which we term the "second Fermi liquid" (sFL), exhibits small hole pockets and violates the perturbative Luttinger theorem despite the absence of symmetry breaking or fractionalization. The PG metal behaves like a heavy Fermi liquid, with small quasi-particle residue and large effective mass. We also provide an intuitive analytical description of the pseudogap and the ground-state wave function based on an ancilla-fermion framework. Inside the PG phase, we interpret the ancilla fermion as a spin-polaron and demonstrate a Kondo-like resonance peak in the spectral function of this composite fermion directly in DMFT calculation. Extending the analysis to finite $t_\perp$, we apply this framework to the bilayer nickelate $\mathrm{La}_3\mathrm{Ni}_2\mathrm{O}_7$. We propose that current experimental samples ($x \approx 0.5$) reside in the overdoped FL regime, suggesting that the pseudogap phase and the NFL criticality may be accessed via electron doping.
📄 Upcoming radio surveys will probe the sky with unprecedented depth and sky coverage, enabling a broad range of cosmological and astrophysical applications, as well as powerful synergies with experiments at other wavelengths. The preparation and scientific exploitation of these surveys require realistic mock catalogues that capture the complexity of the radio sky and the interplay of its emitting components. We present a modular and extensible algorithm for generating empirical simulations over the full radio sky, i.e. a solid angle of $4π$ steradians ($f_{\rm sky}=1$), down to redshift $z=5$, comprising both radio continuum and line emission. The framework combines a simulated dark-matter light-cone with empirically sampled galaxy populations and a probabilistic galaxy-halo assignment scheme, producing self-consistent mock catalogues including multiple radio populations on the same light-cone. We release two public catalogues: a shallow catalogue, fully constrained by existing observational data and limited to flux thresholds of $S_\text{1.4 GHz}^\text{lim} \sim 8\times10^{-5}\ \text{Jy}$ at $1.4\ \text{GHz}$ and $S_\text{21}^\text{lim} \sim 2\ \text{Jy}\cdot\text{Hz}$ for the HI 21 cm line; and a deep catalogue extending the calibrated empirical model to better sensitivities, broadly matching future SKAO surveys, with flux limits of $S_\text{1.4 GHz}^\text{lim} \sim 4\times10^{-5}\ \text{Jy}$ and $S_\text{21}^\text{lim} \sim 0.3\ \text{Jy}\cdot\text{Hz}$. The catalogues include radio continuum active galactic nuclei and star-forming galaxies, together with HI-emitting galaxies, for a total of more than 260 million sources in the shallow catalogue and more than 1 billion in the deep catalogue. We validate the simulations by analysing their statistical properties: the mocks reproduce the targeted clustering and population statistics while retaining minimal physical assumptions.
📄 Structured reinforcement learning and stochastic optimization often involve parameters evolving on matrix Lie groups such as rotations and rigid-body transformations. We establish a representation-optimization dichotomy for Lie-algebra-parameterized Gaussian policy objectives in the Lie Group MDP class: the gradient Lipschitz constant $L(R)$, governing step size, convergence, and sample complexity of first-order methods, depends only on the algebraic type of $\mathfrak{g}$, uniformly over all objectives, independent of reward or transition structure. Specifically, $L = O(1)$ for compact $\mathfrak{g}$ (e.g., $\mathfrak{so}(n)$, $\mathfrak{su}(n)$), and $L = \Theta(e^{2R})$ for $\mathfrak{g} = \mathfrak{gl}(n)$, with $O(e^{2R})$ for all algebras with a hyperbolic element. A key lower bound shows this exponential growth cannot be canceled by interaction between the exponential map and the objective, making the dichotomy intrinsic to the algebra. This yields an algorithmic consequence: for compact algebras, radius-independent smoothness enables $O(1/\sqrt{T})$ convergence using an $O(n^2 J)$ Lie-algebraic projection step instead of $O(d_{\mathfrak{g}}^3)$ Fisher inversion. A Kantorovich alignment bound $α \ge 2κ/(κ+1)$ provides a computable condition under which this projection approximates natural gradient updates. Experiments on $SO(3)^J$ and $SE(3)$ confirm the theory: constant smoothness for compact algebras, polynomial growth for $SE(3)$, and alignment across condition regimes. The projection step achieves a 1.1-1.7x speedup over Cholesky-based Fisher inversion, with increasing gains at larger scales.
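A toy stand-in for the $O(n^2 J)$ projection step in the compact case: skew-symmetrization is the exact orthogonal projection onto $\mathfrak{so}(n)$, applied blockwise per joint, avoiding any $d_{\mathfrak{g}} \times d_{\mathfrak{g}}$ Fisher inversion. Everything beyond the projection itself is not modeled here.

```python
# Sketch of the Lie-algebraic projection for so(3)^J: project each Euclidean
# gradient block onto the skew-symmetric subspace. (A - A^T)/2 is the exact
# orthogonal projection onto so(n) under the Frobenius inner product.
import numpy as np

def project_to_so_n(grad_blocks):
    """grad_blocks: (J, n, n) raw gradients, one per joint."""
    return 0.5 * (grad_blocks - np.swapaxes(grad_blocks, -1, -2))

G = np.random.randn(5, 3, 3)          # J = 5 rotational joints
xi = project_to_so_n(G)               # skew-symmetric updates in so(3)^J
assert np.allclose(xi, -np.swapaxes(xi, -1, -2))
```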
📄 We present joint measurements of the pre- and post-reconstruction power spectra, $P_{\rm pre}$ and $P_{\rm post}$, together with their cross-power spectrum, $P_{\rm cross}$, for the Luminous Red Galaxies (LRGs) in the DESI Data Release 1 (DR1). We jointly analyse these observables with an emulator-based full-shape modeling framework, thereby extracting, for the first time, complementary nonlinear information from the galaxy density field before and after reconstruction in real survey data. Specifically, including $P_{\rm post}$ and $P_{\rm cross}$ in addition to $P_{\rm pre}$ (hereafter $P_{\rm all}$) yields an improvement of approximately $18$-$27\%$ in the $σ_8$ constraint in both $Λ$CDM and $w$CDM, depending on the redshift bin, relative to the $P_{\rm pre}$-only analysis with the cosmic microwave background distance priors (hereafter CMB). In $w$CDM, the joint CMB+$P_{\rm all}$ analysis can tighten the constraints on $w$ by approximately $5$-$15\%$ across the two LRG redshift bins, compared to the CMB+$P_{\rm pre}$ combination. Further incorporating the Type Ia supernova dataset and comparing the cosmological constraints in $w$CDM from each individual power-spectrum component with those from the full combination, we find that $P_{\rm all}$ consistently provides the tightest constraints. From the joint CMB+$P_{\rm all}$+DES-Dovekie dataset, we obtain $Ω_m = 0.314 \pm 0.0048$ and $w = -0.988 \pm 0.023$ for the \texttt{LRG1} sample, and $Ω_m = 0.318 \pm 0.0046$ and $w = -0.988 \pm 0.025$ for \texttt{LRG2}. These results demonstrate that combining pre- and post-reconstruction power spectra with their cross-correlation enables DESI to harvest additional nonlinear information, leading to tighter constraints on cosmological parameters.
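A minimal numpy sketch of how auto- and cross-spectra of two fields (here, stand-ins for the pre- and post-reconstruction density fields) are measured on a periodic box; the binning and normalization are simplified relative to a survey analysis.

```python
# Minimal sketch: auto- and cross-power spectra of two fields on a periodic
# box via FFTs. Binning and normalization are simplified; no survey window.
import numpy as np

def power_spectra(d1, d2, box=1000.0, nbins=20):
    n = d1.shape[0]
    f1, f2 = np.fft.fftn(d1), np.fft.fftn(d2)
    k1d = np.fft.fftfreq(n, d=box / n) * 2 * np.pi
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    kmag = np.sqrt(kx**2 + ky**2 + kz**2).ravel()
    vol_fac = box**3 / n**6                      # P(k) = V |delta_k|^2 / N^2
    spectra = {"pre":   (np.abs(f1)**2).ravel() * vol_fac,
               "post":  (np.abs(f2)**2).ravel() * vol_fac,
               "cross": np.real(f1 * np.conj(f2)).ravel() * vol_fac}
    bins = np.linspace(kmag[kmag > 0].min(), kmag.max(), nbins + 1)
    idx = np.digitize(kmag, bins)
    return {name: np.array([p[idx == i].mean() for i in range(1, nbins + 1)])
            for name, p in spectra.items()}

ps = power_spectra(np.random.randn(32, 32, 32), np.random.randn(32, 32, 32))
```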
📄 We present HEALFormer, a transformer-based neural network architecture for weak gravitational lensing mass mapping that reconstructs convergence maps from incomplete and noisy shear observations on the celestial sphere. The model operates directly on the Hierarchical Equal Area isoLatitude Pixelization (HEALPix) and employs learnable mask tokens to handle arbitrary survey geometries without requiring preprocessing. Through a progressive training strategy, HEALFormer efficiently processes high-resolution maps up to $N_{\rm side} = 1024$ and demonstrates excellent performance across diverse survey footprints including KiDS, DES, DECaLS, and Planck. The model generalizes robustly to cosmological parameters beyond its training set, producing nearly unbiased reconstructions with superior noise suppression compared to traditional Kaiser-Squires and Wiener filter methods. Remarkably, HEALFormer exceeds the theoretical phase recovery limits of linear reconstruction methods at small scales, achieving a fundamental breakthrough in weak lensing analysis. The combination of computational efficiency, reconstruction accuracy, and adaptability to varying survey configurations makes HEALFormer well-suited for current and next-generation cosmological surveys. Code is available on GitHub.
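For context, a schematic of the classical Kaiser-Squires baseline on the sphere with healpy; sign and E/B conventions differ between shear pipelines, so treat this as illustrative only.

```python
# Hedged sketch of spherical Kaiser-Squires with healpy, the linear baseline
# the abstract compares against. Signs/conventions vary between pipelines.
import numpy as np
import healpy as hp

def kaiser_squires_sphere(g1, g2, lmax):
    e_alm, b_alm = hp.map2alm_spin([g1, g2], spin=2, lmax=lmax)
    ell = np.arange(lmax + 1)
    # kappa_lm = sqrt[ l(l+1) / ((l+2)(l-1)) ] * gammaE_lm   (ell >= 2)
    fl = np.zeros(lmax + 1)
    fl[2:] = np.sqrt(ell[2:] * (ell[2:] + 1) / ((ell[2:] + 2) * (ell[2:] - 1)))
    kappa_alm = hp.almxfl(e_alm, fl)
    return hp.alm2map(kappa_alm, nside=hp.npix2nside(len(g1)), lmax=lmax)

nside = 64
g1 = np.random.randn(hp.nside2npix(nside)) * 1e-3
g2 = np.random.randn(hp.nside2npix(nside)) * 1e-3
kappa = kaiser_squires_sphere(g1, g2, lmax=2 * nside)
```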
📄 Non-fixed flexible antenna architectures, such as fluid antenna system (FAS), movable antenna (MA), and pinching antenna, have garnered significant interest in recent years. Among them, rotatable antenna (RA) has emerged as a promising technology for enhancing wireless communication and sensing performance through flexible antenna orientation/boresight rotation. By enabling mechanical or electronic boresight adjustment without altering physical antenna positions, RA introduces additional spatial degrees of freedom (DoFs) beyond conventional beamforming. In this paper, we provide a comprehensive tutorial on the fundamentals, architectures, and applications of RA-empowered wireless networks. Specifically, we begin by reviewing the historical evolution of RA-related technologies and clarifying the distinctive role of RA among flexible antenna architectures. Then, we establish a unified mathematical framework for RA-enabled systems, including general antenna/array rotation models, as well as channel models that cover near- and far-field propagation characteristics, wideband frequency selectivity, and polarization effects. Building upon this foundation, we investigate antenna/array rotation optimization in representative communication and sensing scenarios. Furthermore, we examine RA channel estimation/acquisition strategies encompassing orientation scheduling mechanisms and signal processing methods that exploit multi-view channel observations. Beyond theoretical modeling and algorithmic design, we discuss practical RA configurations and deployment strategies. We also present recent RA prototypes and experimental results that validate the practical performance gains enabled by antenna rotation. Finally, we highlight promising extensions of RA to emerging wireless paradigms and outline open challenges to inspire future research.
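A toy sketch of the extra degree of freedom RA adds, boresight orientation: grid-search the orientation maximizing summed directional gain toward known users, under an invented cos^q gain pattern.

```python
# Toy sketch of rotatable-antenna boresight optimization. The cos^q gain
# pattern and the grid search are illustrative simplifications.
import numpy as np

def best_boresight(user_dirs, q=4, n_grid=60):
    thetas = np.linspace(0, np.pi, n_grid)
    phis = np.linspace(0, 2 * np.pi, 2 * n_grid)
    best, best_gain = None, -np.inf
    for th in thetas:
        for ph in phis:
            b = np.array([np.sin(th) * np.cos(ph),
                          np.sin(th) * np.sin(ph), np.cos(th)])
            cosang = np.clip(user_dirs @ b, 0.0, 1.0)   # no back-lobe gain
            gain = np.sum(cosang ** q)                  # cos^q boresight pattern
            if gain > best_gain:
                best, best_gain = (th, ph), gain
    return best, best_gain

users = np.array([[1.0, 0.0, 0.0], [0.8, 0.6, 0.0]])   # unit user directions
(theta, phi), g = best_boresight(users)
```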
📄 Characterizing the surface and atmosphere of Earth-like planets in reflected light is a key goal for upcoming direct imaging surveys. NASA's next flagship-class astrophysics mission concept, the Habitable Worlds Observatory (HWO), is a space-based Ultraviolet/Optical/Near-Infrared observatory with a mission design requirement to reach the $10^{-10}$ contrast necessary to characterize Earth-like planets around Sun-like stars. While reflected light from planetary surfaces provides a unique opportunity to constrain the coverage of surface materials and biopigments, detailed predictions of HWO's ability to retrieve surface fractions are necessary but have not been conducted. Here, we model photon-counting noise from astrophysical, instrumental, and post-processing sources for the HWO Exploratory Analytic Case 5 design equipped with a charge-6 vector-vortex coronagraph. By combining our photon-counting noise with five distinct modern Earth models at quadrature, we simulate single-visit HWO observations and perform spectral retrievals using the open-source code $\texttt{POSEIDON}$ to assess our ability to constrain both the surface and atmospheric composition. We find that degeneracies between planetary radius, surface pressure, surface material, and cloud coverage in reflected-light retrievals can significantly complicate the classification of surface features. These degeneracies can complicate the detection of surface biopigments, such as the chlorophyll-induced red edge on modern Earth. Our work shows that developing concrete strategies for detecting surface features and breaking degeneracies in reflected-light observations of Earth-like planets is a critical priority for mission design and data analysis.
📄 The method initiated by Wentzel, Kramers, and Brillouin to find approximate solutions to the Schrödinger equation lies at the origin of the spectacular development of microlocal and semiclassical analysis. When used naively, the approach appears to break down at caustics, but Maslov showed how a simple generalization could overcome this difficulty. In this paper, after a partial historical review, we take advantage of more recent advances in microlocal analysis to present a unified treatment of this generalized Maslov-WKB method, using a microlocal sheaf-theoretic approach. This framework provides a rigorous proof of the Bohr-Sommerfeld-Einstein-Brillouin-Keller quantization conditions for the eigenvalues of general semiclassical operators (pseudodifferential and Berezin-Toeplitz) in one degree of freedom. We also review some applications and extensions.
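For orientation, the one-degree-of-freedom quantization condition at stake takes the textbook form (a standard statement, not quoted from the paper):

```latex
% Bohr-Sommerfeld / EBK condition with Maslov index \mu(\gamma):
\oint_{\gamma} p \, dq \;=\; 2\pi\hbar \Bigl( n + \tfrac{\mu(\gamma)}{4} \Bigr),
\qquad n = 0, 1, 2, \dots
```

where $\mu(\gamma)$ is the Maslov index of the invariant curve $\gamma$; for a harmonic-oscillator-type well, $\mu = 2$, recovering the familiar $E_n = \hbar\omega\,(n + 1/2)$.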
📄 We focus on the automated classification of eclipsing binary stars using deep learning methods to handle the vast data generated by large-scale photometric sky surveys. These surveys produce extensive datasets that are impractical for manual analysis. By using machine learning to classify eclipsing binary stars based on light curve morphology, this study aims to contribute to the efforts to efficiently process and accurately interpret massive data from the Kepler, TESS, and Gaia missions.
📄 Safety monitoring is essential for Cyber-Physical Systems (CPSs). However, unsafe events are rare in real-world CPS operations, creating an extreme class imbalance that degrades safety predictors. Standard rebalancing techniques perform poorly on time-series CPS telemetry, either generating unrealistic synthetic samples or overfitting on the minority class. Meanwhile, behavioral uncertainty in CPS operations, defined as the degree of doubt or uncertainty in CPS decisions, is often correlated with safety outcomes but unexplored in safety monitoring. To that end, we propose U-Balance, a supervised approach that leverages behavioral uncertainty to rebalance imbalanced datasets prior to training a safety predictor. U-Balance first trains a GatedMLP-based uncertainty predictor that summarizes each telemetry window into distributional kinematic features and outputs an uncertainty score. It then applies an uncertainty-guided label rebalancing (uLNR) mechanism that probabilistically relabels safe-labeled windows with unusually high uncertainty as unsafe, thereby enriching the minority class with informative boundary samples without synthesizing new data. Finally, a safety predictor is trained on the rebalanced dataset for safety monitoring. We evaluate U-Balance on a large-scale UAV benchmark with a 46:1 safe-to-unsafe ratio. Results confirm a moderate but significant correlation between behavioral uncertainty and safety. We then identify uLNR as the most effective strategy to exploit uncertainty information, compared to direct early and late fusion. U-Balance achieves a 0.806 F1 score, outperforming the strongest baseline by 14.3 percentage points, while maintaining competitive inference efficiency. Ablation studies confirm that both the GatedMLP-based uncertainty predictor and the uLNR mechanism contribute significantly to U-Balance's effectiveness.
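A sketch of the uLNR step as the abstract describes it: probabilistically relabel safe-labeled windows whose predicted uncertainty is unusually high. The quantile threshold and flip probability are illustrative placeholders.

```python
# Sketch of uncertainty-guided label rebalancing (uLNR): relabel a random
# subset of the most uncertain "safe" windows as unsafe. Thresholds are
# illustrative, not the paper's tuned values.
import numpy as np

def ulnr(labels, uncertainty, quantile=0.95, p_flip=0.5, seed=0):
    """labels: (N,) 0=safe, 1=unsafe; uncertainty: (N,) predicted scores."""
    rng = np.random.default_rng(seed)
    labels = labels.copy()
    safe = labels == 0
    thresh = np.quantile(uncertainty[safe], quantile)
    candidates = safe & (uncertainty > thresh)        # unusually uncertain "safe"
    flip = candidates & (rng.random(len(labels)) < p_flip)
    labels[flip] = 1                                  # enrich the minority class
    return labels

y = ulnr(np.zeros(1000, dtype=int), np.random.rand(1000))
```

The appeal of this scheme over synthetic oversampling is that every added minority sample is a real telemetry window, just one sitting near the decision boundary.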
📄 High-resolution brain imaging can now capture not just synapse locations but their molecular composition, with the cost of such mapping falling exponentially. Yet such ultrastructural data has so far told us little about local neuronal physiology - specifically, the parameters (e.g., synaptic efficacies, local conductances) that govern neural dynamics. We propose to translate molecularly annotated ultrastructure into physiology, introducing the concept of an ultrastructure-to-dynamics compiler: a learned mapping from molecularly annotated ultrastructure to simulator-ready, uncertainty-aware physiological parameters. The requirement is paired training data, with jointly acquired ultrastructure from imaging, and dynamical responses to perturbations from physiological experiments. With this data we can train models that predict local physiology directly from structure. Such a compiler would support biophysical simulations by turning anatomical maps into models of circuit dynamics, shifting structure-to-function from a descriptive program to a predictive one and opening routes to understanding neural computation and forecasting intervention effects.
📄 Contrastive vision-language (V&L) models remain a popular choice for various applications. However, several limitations have emerged, most notably the limited ability of V&L models to learn compositional representations. Prior methods often addressed this limitation by generating custom training data to obtain hard negative samples. Hard negatives have been shown to improve performance on compositionality tasks, but are often specific to a single benchmark, do not generalize, and can cause substantial degradation of basic V&L capabilities such as zero-shot or retrieval performance, rendering them impractical. In this work we follow a different approach. We identify two root causes that limit the compositionality performance of V&L models: 1) Long training captions do not require a compositional representation; and 2) The final global pooling in the text and image encoders leads to a complete loss of the information necessary to learn binding in the first place. As a remedy, we propose two simple solutions: 1) We obtain short concept-centric caption parts using standard NLP software and align those with the image; and 2) We introduce a parameter-free cross-modal attention-pooling to obtain concept-centric visual embeddings from the image encoder. With these two changes and simple auxiliary contrastive losses, we obtain SOTA performance on standard compositionality benchmarks, while maintaining or improving strong zero-shot and retrieval capabilities. This is achieved without increasing inference cost. We release the code for this work at https://github.com/SamsungLabs/concept_centric_clip.
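The attention-pooling is simple enough to state directly: the text embedding queries the pre-pooling image tokens, with no learned parameters. A sketch under assumed shapes and an assumed temperature:

```python
# Parameter-free cross-modal attention pooling, sketched from the abstract:
# the text embedding queries the image patch tokens; no learned weights.
import torch

def attention_pool(patch_tokens, text_emb, tau=0.05):
    """patch_tokens: (B, N, D) pre-pooling image tokens; text_emb: (B, D)."""
    scores = torch.einsum("bnd,bd->bn", patch_tokens, text_emb) / tau
    weights = scores.softmax(dim=-1)              # concept-specific attention
    return torch.einsum("bn,bnd->bd", weights, patch_tokens)

pooled = attention_pool(torch.randn(2, 196, 512), torch.randn(2, 512))
```

Because each caption part produces its own query, the same image yields different pooled embeddings per concept, which is what restores the binding information a single global pool discards.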
📄 Large-scale robot datasets have facilitated the learning of a wide range of robot manipulation skills, but these datasets remain difficult to collect and scale further, owing to the intractable amount of human time, effort, and cost required. Simulation and synthetic data generation have proven to be an effective alternative to fuel this need for data, especially with the advent of recent work showing that such synthetic datasets can dramatically reduce real-world data requirements and facilitate generalization to novel scenarios unseen in real-world demonstrations. However, this paradigm has been limited to rigid-body tasks, which are easy to simulate. Deformable object manipulation encompasses a large portion of real-world manipulation and remains a crucial gap to address towards increasing adoption of the synthetic simulation data paradigm. In this paper, we introduce SoftMimicGen, an automated data generation pipeline for deformable object manipulation tasks. We introduce a suite of high-fidelity simulation environments that encompasses a wide range of deformable objects (stuffed animal, rope, tissue, towel) and manipulation behaviors (high-precision threading, dynamic whipping, folding, pick-and-place), across four robot embodiments: a single-arm manipulator, bimanual arms, a humanoid, and a surgical robot. We apply SoftMimicGen to generate datasets across the task suite, train high-performing policies from the data, and systematically analyze the data generation system. Project website: https://softmimicgen.github.io.
📄 We present an empirical study of how far general-purpose coding agents -- without hardware-specific training -- can optimize hardware designs from high-level algorithmic specifications. We introduce an agent factory, a two-stage pipeline that constructs and coordinates multiple autonomous optimization agents. In Stage 1, the pipeline decomposes a design into sub-kernels, independently optimizes each using pragma and code-level transformations, and formulates an Integer Linear Program (ILP) to assemble globally promising configurations under an area constraint. In Stage 2, it launches $N$ expert agents over the top ILP solutions, each exploring cross-function optimizations such as pragma recombination, loop fusion, and memory restructuring that are not captured by sub-kernel decomposition. We evaluate the approach on 12 kernels from HLS-Eval and Rodinia-HLS using Claude Code (Opus 4.5/4.6) with AMD Vitis HLS. Scaling from 1 to 10 agents yields a mean $8.27\times$ speedup over baseline, with larger gains on harder benchmarks: streamcluster exceeds $20\times$ and kmeans reaches approximately $10\times$. Across benchmarks, agents consistently rediscover known hardware optimization patterns without domain-specific training, and the best designs often do not originate from top-ranked ILP candidates, indicating that global optimization exposes improvements missed by sub-kernel search. These results establish agent scaling as a practical and effective axis for HLS optimization.
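A hedged sketch of the Stage 1 assembly ILP with PuLP: pick exactly one optimized configuration per sub-kernel to maximize estimated speedup under an area budget. The additive speedup objective and all data values are invented simplifications of whatever the pipeline actually formulates.

```python
# Hedged sketch of an assembly ILP: one configuration per sub-kernel,
# maximize estimated speedup under an area budget. Values are invented.
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary

configs = {  # sub-kernel -> [(speedup, area), ...] from independent optimization
    "dist":   [(1.0, 10), (2.5, 35), (4.0, 70)],
    "update": [(1.0, 8),  (1.8, 20), (3.1, 55)],
}
budget = 90

prob = LpProblem("assemble", LpMaximize)
x = {(k, j): LpVariable(f"x_{k}_{j}", cat=LpBinary)
     for k, opts in configs.items() for j in range(len(opts))}
# Objective: total estimated speedup (additivity is a toy simplification).
prob += lpSum(configs[k][j][0] * v for (k, j), v in x.items())
# Area budget constraint.
prob += lpSum(configs[k][j][1] * v for (k, j), v in x.items()) <= budget
for k, opts in configs.items():                  # exactly one config each
    prob += lpSum(x[(k, j)] for j in range(len(opts))) == 1
prob.solve()
chosen = {k: j for (k, j), v in x.items() if v.value() == 1}
print(chosen)
```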
📄 Transfer learning and knowledge distillation have recently gained a lot of attention in the deep learning community. One transfer approach, student-teacher learning, has been shown to successfully create "small" student neural networks that mimic the performance of a much bigger and more complex "teacher" network. In this paper, we investigate an extension to this approach and transfer from a non-neural-based machine learning pipeline as teacher to a neural network (NN) student, which would allow for joint optimization of the various pipeline components and a single unified inference engine for multiple ML tasks. In particular, we explore replacing the random forest classifier by transfer learning to a student NN. We experimented with various NN topologies on 100 OpenML tasks in which random forest has been one of the best solutions. Our results show that for the majority of the tasks, the student NN can indeed mimic the teacher if one can select the right NN hyper-parameters. We also investigated the use of random forest for selecting the right NN hyper-parameters.
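A minimal sklearn sketch of the transfer recipe: fit a random forest teacher, then fit an MLP student to the teacher's class probabilities (soft targets). Hyper-parameters are placeholders, not the paper's searched values.

```python
# Minimal RF-teacher -> NN-student distillation sketch with sklearn.
# Hyper-parameters and data are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPRegressor
import numpy as np

X, y = make_classification(n_samples=2000, n_features=20, n_classes=3,
                           n_informative=8, random_state=0)
teacher = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
soft = teacher.predict_proba(X)                 # soft targets carry dark knowledge

student = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500,
                       random_state=0).fit(X, soft)
pred = np.argmax(student.predict(X), axis=1)    # student's hard decisions
agreement = (pred == teacher.predict(X)).mean()
print(f"student-teacher agreement: {agreement:.3f}")
```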
📄 Gauging an individual's skill level is crucial, as it inherently shapes their behavior. Quantifying skill, however, is challenging because it is latent to the observed actions. To explore skill understanding in human behavior, we focus on dyadic sports -- specifically table tennis -- where skill manifests not just in complex movements, but in the subtle nuances of execution conditioned on game context. Our key idea is to learn a generative model of each player's tactical racket strokes and jointly embed them in a common latent space that encodes individual characteristics, including those pertaining to skill levels. By training these player models on a large-scale dataset of 3D-reconstructed professional matches and conditioning them on comprehensive game context -- including player positioning and opponent behaviors -- the models capture individual tactical identities within their latent space. We probe this learned player space and find that it reflects distinct play styles and attributes that collectively represent skill. By training a simple relative ranking network on these embeddings, we demonstrate that both relative and absolute skill predictions can be achieved. These results demonstrate that the learned player space effectively quantifies skill levels, providing a foundation for automated skill assessment in complex, interactive behaviors.
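A sketch of the relative-ranking idea on top of learned player embeddings, using a Bradley-Terry pairwise loss; the scorer architecture and data are invented for illustration.

```python
# Sketch of a relative skill ranker over player embeddings, trained with a
# Bradley-Terry pairwise loss. Shapes and data are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

scorer = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(scorer.parameters(), lr=1e-3)

emb_a, emb_b = torch.randn(256, 128), torch.randn(256, 128)  # player embeddings
stronger = torch.randint(0, 2, (256,)).float()  # 1 if player a is more skilled

for _ in range(100):
    s_a, s_b = scorer(emb_a).squeeze(-1), scorer(emb_b).squeeze(-1)
    # Bradley-Terry: P(a more skilled than b) = sigmoid(s_a - s_b)
    loss = F.binary_cross_entropy_with_logits(s_a - s_b, stronger)
    opt.zero_grad(); loss.backward(); opt.step()
```

Once trained, the scalar scores induce both relative orderings and, after calibration against labeled levels, absolute skill estimates, matching the two prediction modes the abstract reports.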