Philippe Weinzaepfel

I am currently a Principal Research Scientist at Naver Labs Europe in the Computer Vision group.

My research interests include (but are not limited to):


All publications

Listed by year, with some of them highlighted.

2025

DUNE: Distilling a Universal Encoder from Heterogeneous 2D and 3D Teachers
Mert Bulent Sariyildiz, Philippe Weinzaepfel, Thomas Lucas, Pau de Jorge, Diane Larlus, Yannis Kalantidis
CVPR 2025
A single encoder distilled from multiple teachers (DINOv2, MASt3R and Multi-HMR) that is versatile enough to perform heterogeneous tasks.
Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors
Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy, Lourdes Agapito, Jerome Revaud
CVPR 2025
Versatile integration of several camera and scene priors into DUSt3R-like approaches.
CondiMen: Conditional Multi-Person Mesh Recovery
Romain Brégier, Fabien Baradel, Thomas Lucas, Salma Galaaoui, Matthieu Armando, Philippe Weinzaepfel, Grégory Rogez
CVPR Workshop 2025
A multi-person human mesh recovery method that outputs a joint parametric distribution over likely poses, body shapes, intrinsics and distances to the camera, using a Bayesian network.
MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion
Bardienus Duisterhof, Lojze Zust, Philippe Weinzaepfel, Vincent Leroy, Yohann Cabon, Jerome Revaud
3DV 2025 (oral), Best Student Paper Award
Scaling MASt3R to large image collections by using its encoder features for retrieval and a sparse alignment formulation.

2024

Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot
Fabien Baradel, Matthieu Armando, Salma Galaaoui, Romain Brégier, Philippe Weinzaepfel, Grégory Rogez, Thomas Lucas
ECCV 2024
A simple yet effective model for multi-person whole-body human mesh recovery that runs in real time on a GPU and reaches state-of-the-art results.
PoseEmbroider: Towards a 3D, Visual, Semantic-Aware Human Pose Representation
Ginger Delmas, Philippe Weinzaepfel, Francesc Moreno-Noguer, Grégory Rogez
ECCV 2024
A multi-modal representation built from any combination of three modalities: human pose, text description and a picture of the person.
UNIC: Universal Classification Models via Multi-teacher Distillation
Mert Bulent Sariyildiz, Philippe Weinzaepfel, Thomas Lucas, Diane Larlus, Yannis Kalantidis
ECCV 2024
A universal classification model (UNIC) distilled from several strong pretrained models that performs on par with or better than each of them.
Cross-view and Cross-pose Completion for 3D Human Understanding
Matthieu Armando, Salma Galaaoui, Fabien Baradel, Thomas Lucas, Vincent Leroy, Romain Brégier, Philippe Weinzaepfel, Grégory Rogez
CVPR 2024
Applying the CroCo pre-training idea to human-centric data, sampling image pairs from multi-view or video datasets.
SACReg: Scene-agnostic coordinate regression for visual localization
Jerome Revaud, Yohann Cabon, Romain Brégier, JongMin Lee, Philippe Weinzaepfel
CVPR Workshop 2024
Making scene coordinate regression models scene-agnostic by treating the 2D-3D matches as an external database.
End-to-End (Instance)-Image Goal Navigation Through Correspondence As An Emerging Phenomenon
Guillaume Bono, Leonid Antsfeld, Boris Chidlovskii, Philippe Weinzaepfel, Christian Wolf
ICLR 2024
In an ImageGoal navigation context, we propose two pretext tasks that let correspondence emerge as a solution, and we train a dual visual encoder based on a binocular transformer.
Win-Win: Training High-Resolution Vision Transformers from Two Windows
Vincent Leroy, Jerome Revaud, Thomas Lucas, Philippe Weinzaepfel
ICLR 2024
Win-Win enables efficient training of vanilla ViTs for high-resolution dense pixelwise tasks.
Weatherproofing Retrieval for Localization with Generative AI and Geometric Consistency
Yannis Kalantidis, Mert Bulent Sariyildiz, Rafael S Rezende, Philippe Weinzaepfel, Diane Larlus, Gabriela Csurka
ICLR 2024
We make retrieval for localization models robust to weather, seasonal and time-of-day changes by augmenting the training set with synthetic variations generated using Generative AI, and by leveraging geometric consistency for sampling and filtering.
SHOWMe: Robust object-agnostic hand-object 3D reconstruction from RGB video
Anilkumar Swamy, Vincent Leroy, Philippe Weinzaepfel, Fabien Baradel, Salma Galaaoui, Romain Brégier, Matthieu Armando, Jean-Sebastien Franco, Grégory Rogez
CVIU 2024
Extension of SHOWMe (ICCVW 2023) that improves hand-object reconstruction by adding to the two-stage method an estimation of virtual camera poses based on a finetuned CroCo model.
Purposer: Putting Human Motion Generation in Context
Nicolás Ugrinovic, Thomas Lucas, Fabien Baradel, Philippe Weinzaepfel, Gregory Rogez, Francesc Moreno-Noguer
3DV 2024
A versatile method able to generate realistic-looking motions that interact with virtual scenes.
PoseScript: Linking 3D Human Poses and Natural Language
Ginger Delmas, Philippe Weinzaepfel, Thomas Lucas, Francesc Moreno-Noguer, Grégory Rogez
IEEE Trans. PAMI 2024
Extension of the ECCV 2022 paper on a dataset and tasks that relate 3D human poses to their description in natural language.

2023

CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow
Philippe Weinzaepfel, Thomas Lucas, Vincent Leroy, Yohann Cabon, Vaibhav Arora, Romain Brégier, Gabriela Csurka, Leonid Antsfeld, Boris Chidlovskii, Jérôme Revaud
ICCV 2023
Extending CroCo to real datasets leads to state-of-the-art results for binocular tasks like stereo and flow.
PoseFix: Correcting 3D Human Poses with Natural Language
Ginger Delmas, Philippe Weinzaepfel, Francesc Moreno-Noguer, Grégory Rogez
ICCV 2023
Text description of the difference between two human poses.
SHOWMe: Benchmarking object-agnostic hand-object 3D reconstruction
Anilkumar Swamy, Vincent Leroy, Philippe Weinzaepfel, Fabien Baradel, Salma Galaaoui, Romain Brégier, Matthieu Armando, Jean-Sebastien Franco, Grégory Rogez
ICCV Workshop 2023
A dataset and method for highly detailed object-agnostic hand-object reconstruction.

2022

CroCo: Self-Supervised Pre-Training for 3D Vision Tasks by Cross-view Completion
Philippe Weinzaepfel, Vincent Leroy, Thomas Lucas, Romain Brégier, Yohann Cabon, Vaibhav Arora, Leonid Antsfeld, Boris Chidlovskii, Gabriela Csurka, Jérôme Revaud
NeurIPS 2022
Masked image modeling with a second reference view implicitly learns correspondences and is thus well suited for geometric tasks.
PoseBERT: A Generic Transformer Module for Temporal 3D Human Modeling
Fabien Baradel, Romain Brégier, Thibault Groueix, Philippe Weinzaepfel, Yannis Kalantidis, Grégory Rogez
IEEE Trans. PAMI 2022
A generic transformer model for temporal modeling of human and hand shape, trained with masked modeling, that can be applied e.g. to pose estimation and future pose prediction (extension of our 3DV 2021 paper).
PoseScript: Linking 3D Human Poses and Natural Language
Ginger Delmas, Philippe Weinzaepfel, Thomas Lucas, Francesc Moreno-Noguer, Grégory Rogez
ECCV 2022
Dataset and tasks that relate 3D human poses and their description in natural language.
PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting
Thomas Lucas, Fabien Baradel, Philippe Weinzaepfel, Grégory Rogez
ECCV 2022
PoseGPT generates human motion conditioned on an action label, a duration and, optionally, an observed past motion, using a VQ-VAE.
Multi-Finger Grasping Like Humans
Yuming Du, Philippe Weinzaepfel, Vincent Lepetit, Romain Brégier
IROS 2022
An optimization-based approach to transform a human grasp into a multi-finger robot grasp.
Investigating the role of image retrieval for visual localization: An exhaustive benchmark
Martin Humenberger, Yohann Cabon, Noé Pion, Philippe Weinzaepfel, Donghwan Lee, Nicolas Guérin, Torsten Sattler, Gabriela Csurka
IJCV 2022
We analyze the role of image retrieval for three visual localization paradigms.
PUMP: Pyramidal and Uniqueness Matching Priors for Unsupervised Learning of Local Descriptors
Jérome Revaud, Vincent Leroy, Philippe Weinzaepfel, Boris Chidlovskii
CVPR 2022
Unsupervised learning of local descriptors thanks to a loss that encourages unique matches.
Learning Super-Features for Image Retrieval
Philippe Weinzaepfel, Thomas Lucas, Diane Larlus, Yannis Kalantidis
ICLR 2022
Extract mid-level features for image retrieval with ASMK.
Barely-supervised learning: Semi-supervised learning with very few labeled images
Thomas Lucas, Philippe Weinzaepfel, Gregory Rogez
AAAI 2022
Using self-supervised learning when the pseudo-label from the weak augmentation is not confident enough.

2021

Leveraging MoCap Data for Human Mesh Recovery
Fabien Baradel, Thibault Groueix, Philippe Weinzaepfel, Romain Brégier, Yannis Kalantidis, Grégory Rogez
3DV 2021
Motion capture data helps to improve image-based and video-based human mesh recovery.
Multi-FinGAN: Generative Coarse-To-Fine Sampling of Multi-Finger Grasps
Jens Lundell, Enric Corona, Tran Nguyen Le, Francesco Verdoja, Philippe Weinzaepfel, Grégory Rogez, Francesc Moreno-Noguer, Ville Kyrki
ICRA 2021
A fast generative multi-finger grasp sampling method that synthesizes high quality grasps directly from RGB-D images in about a second.
Large-scale localization datasets in crowded indoor spaces
Donghwan Lee, Soohyun Ryu, Suyong Yeon, Yonghan Lee, Deokhwa Kim, Cheolho Han, Yohann Cabon, Philippe Weinzaepfel, Nicolas Guérin, Gabriela Csurka, Martin Humenberger
CVPR 2021
Dataset and baseline for large-scale localization in crowded indoor spaces (metro station or shopping mall).
Mimetics: Towards Understanding Human Actions Out of Context
Philippe Weinzaepfel, Grégory Rogez
IJCV 2021
The Mimetics dataset contains 713 video clips of mimed actions for evaluating out-of-context human action recognition methods.

2020

Hard negative mixing for contrastive learning
Yannis Kalantidis, Mert Bulent Sariyildiz, Noe Pion, Philippe Weinzaepfel, Diane Larlus
NeurIPS 2020
Generating hard negatives in the feature space for improved self-supervised contrastive learning.
SuperLoss: A Generic Loss for Robust Curriculum Learning
Thibault Castells, Philippe Weinzaepfel, Jerome Revaud
NeurIPS 2020
Automatically downweighting samples with a high loss implicitly performs curriculum learning.
SMPLy Benchmarking 3D Human Pose Estimation in the Wild
Vincent Leroy, Philippe Weinzaepfel, Romain Brégier, Hadrien Combaluzier, Grégory Rogez
3DV 2020
Dataset for in-the-wild human mesh recovery evaluation, obtained by fitting pseudo-ground truth to Mannequin Challenge videos in which people are static.
DOPE: Distillation Of Part Experts for whole-body 3D pose estimation in the wild
Philippe Weinzaepfel, Romain Brégier, Hadrien Combaluzier, Vincent Leroy, Grégory Rogez
ECCV 2020
A novel, efficient model for whole-body 3D pose estimation (including bodies, hands and faces), trained by mimicking the output of hand-, body- and face-pose experts.
Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction
Anil Armagan, Guillermo Garcia-Hernando, Seungryul Baek, Shreyas Hampali, Mahdi Rad, Zhaohui Zhang, Shipeng Xie, MingXiu Chen, Boshen Zhang, Fu Xiong, Yang Xiao, Zhiguo Cao, Junsong Yuan, Pengfei Ren, Weiting Huang, Haifeng Sun, Marek Hroz, Jakub Kanis, Zdenek Krnoul, Qingfu Wan, Shile Li, Linlin Yang, Dongheui Lee, Angela Yao, Weiguo Zhou, Sijia Mei, Yunhui Liu, Adrian Spurr, Umar Iqbal, Pavlo Molchanov, Philippe Weinzaepfel, Romain Brégier, Grégory Rogez, Vincent Lepetit, Tae-Kyun Kim
ECCV 2020
Outcome of the HANDS'19 challenge.

2019

R2D2: Repeatable and Reliable Detector and Descriptor
Jerome Revaud, Cesar De Souza, Martin Humenberger, Philippe Weinzaepfel
NeurIPS 2019 (oral)
A neural network trained to detect and describe repeatable and reliable keypoints.
MARS: Motion-Augmented RGB Stream for Action Recognition
Nieves Crasto, Philippe Weinzaepfel, Karteek Alahari, Cordelia Schmid
CVPR 2019
Distilling an optical-flow-based action recognition network into an RGB-based network.
Visual Localization by Learning Objects-of-Interest Dense Match Regression
Philippe Weinzaepfel, Gabriela Csurka, Yohann Cabon, Martin Humenberger
CVPR 2019
Visual localization by predicting dense texture coordinates of a given list of planar objects.
LCR-Net++: Multi-person 2D and 3D Pose Detection in Natural Images
Gregory Rogez, Philippe Weinzaepfel, Cordelia Schmid
IEEE Trans. PAMI 2019
Journal extension of LCR-Net (CVPR 2017) for robust 2D-3D human pose estimation in the wild.

2018

PoTion: Pose MoTion Representation for Action Recognition
Vasileios Choutas, Philippe Weinzaepfel, Jérôme Revaud, Cordelia Schmid
CVPR 2018
Action recognition from human poses using body-joint heatmaps that are colored according to their motions.

2017

Action Tubelet Detector for Spatio-Temporal Action Localization
Vicky Kalogeiton, Philippe Weinzaepfel, Vittorio Ferrari, Cordelia Schmid
ICCV 2017
Spatio-temporal video action detector by regressing tubelets from anchor cuboids.
Joint Learning of Object and Action Detectors
Vicky Kalogeiton, Philippe Weinzaepfel, Vittorio Ferrari, Cordelia Schmid
ICCV 2017
Spatio-temporal detection of different action classes performed by various types of "objects" in videos.
LCR-Net: Localization-Classification-Regression for Human Pose
Gregory Rogez, Philippe Weinzaepfel, Cordelia Schmid
CVPR 2017 (spotlight)
Highly robust 2D-3D human pose estimation by classifying poses into a set of predefined clusters and regressing the offset.

2016

Human Action Localization with Sparse Spatial Supervision
Philippe Weinzaepfel, Xavier Martin, Cordelia Schmid
arXiv 2016
Spatio-temporal video action detection from temporal action annotations and one bounding box.
DeepMatching: Hierarchical Deformable Dense Matching
Jerome Revaud, Philippe Weinzaepfel, Zaid Harchaoui, Cordelia Schmid
IJCV 2016
A learning-free algorithm to compute dense correspondences with a hierarchical, multi-layer correlational architecture inspired by deep convolutional networks.
Motion in action : optical flow estimation and action localization in videos
Philippe Weinzaepfel
PhD Thesis, University Grenoble Alpes, 2016
My PhD thesis.

2015

Learning to Track for Spatio-Temporal Action Localization
Philippe Weinzaepfel, Zaid Harchaoui, Cordelia Schmid
ICCV 2015
Spatio-temporal video action detection by frame-level detection and scoring, tracking the best candidates through the video and scoring the obtained tracks.
Learning to Detect Motion Boundaries
Philippe Weinzaepfel, Jerome Revaud, Zaid Harchaoui, Cordelia Schmid
CVPR 2015
Dataset for motion boundary detection and structured random forest baseline.
EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow
Jerome Revaud, Philippe Weinzaepfel, Zaid Harchaoui, Cordelia Schmid
CVPR 2015 (oral)
See title.

2013

DeepFlow: Large Displacement Optical Flow with Deep Matching
Philippe Weinzaepfel, Jerome Revaud, Zaid Harchaoui, Cordelia Schmid
ICCV 2013 (oral)
Novel matching algorithm based on optimal hierarchical motion of moving quadrants tailored for large displacements and its application to optical flow.

2011

Reconstructing an Image from its Local Descriptors
Philippe Weinzaepfel, Hervé Jégou, Patrick Pérez
CVPR 2011
From the local features of an input image and their locations, the image content can be reconstructed by looking for similar regions in a database.