Philippe Weinzaepfel

What does really matter in image goal navigation?

Gianluca Monaci, Philippe Weinzaepfel, Christian Wolf

3DV 2026 (oral)

arXiv

Extensive evaluation of the visual module in ImageNav: in the realistic setting without the sliding option of the Habitat simulator, early/cross-attention fusion and pre-training are important.

Human Mesh Modeling for Anny Body

Romain Brégier, Guénolé Fiche, Laura Bravo-Sánchez, Thomas Lucas, Matthieu Armando, Philippe Weinzaepfel, Grégory Rogez, Fabien Baradel

arXiv 2025

arXiv github

Anny is a differentiable human body mesh model which can represent a large variety of human body shapes, from infants to elders, using a common topology and parameter space.

Kinaema: A recurrent sequence model for memory and pose in motion

Mert Bulent Sariyildiz, Philippe Weinzaepfel, Guillaume Bono, Gianluca Monaci, Christian Wolf

NeurIPS 2025

arXiv project

Recurrent model for robotics that builds a latent memory of a potentially large scene by integrating a stream of visual observations while moving.

HAMSt3R: Human-Aware Multi-view Stereo 3D Reconstruction

Sara Rojas, Matthieu Armando, Bernard Ghanem, Philippe Weinzaepfel, Vincent Leroy, Gregory Rogez

ICCV 2025

arXiv

Dense 3D pointmap regression with human semantic information: extension of DUSt3R with DUNE encoder and additional regression heads for human segmentation and DensePose.

HOSt3R: Keypoint-free Hand-Object 3D Reconstruction from RGB images

Anilkumar Swamy, Vincent Leroy, Philippe Weinzaepfel, Jean-Sébastien Franco, Grégory Rogez

ICCVW 2025

arXiv

Leveraging a DUSt3R-like method trained on hand-object data to better estimate rigid transformation for two-stage detailed hand-object reconstruction.

DUNE: Distilling a Universal Encoder from Heterogeneous 2D and 3D Teachers

Mert Bulent Sariyildiz, Philippe Weinzaepfel, Thomas Lucas, Pau de Jorge, Diane Larlus, Yannis Kalantidis

CVPR 2025

arXiv project

A single encoder distilled from multiple teachers: DINOv2, MASt3R and Multi-HMR, versatile enough to perform heterogeneous tasks.

Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors

Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy, Lourdes Agapito, Jerome Revaud

CVPR 2025

arXiv project

Versatile integration of several camera and scene priors into DUSt3R-like approaches.

CondiMen: Conditional Multi-Person Mesh Recovery

Romain Brégier, Fabien Baradel, Thomas Lucas, Salma Galaaoui, Matthieu Armando, Philippe Weinzaepfel, Grégory Rogez

CVPR Workshop 2025

arXiv

A multi-person human mesh recovery method that outputs a joint parametric distribution over likely poses, body shapes, intrinsics and distances to the camera, using a Bayesian network.

MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion

Bardienus Duisterhof, Lojze Zust, Philippe Weinzaepfel, Vincent Leroy, Yohann Cabon, Jerome Revaud

3DV 2025 (oral) Best Student Paper Award

arXiv github

Scaling MASt3R to large image collections by using the encoder features for retrieval and a sparse alignment formulation.

Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot

Fabien Baradel, Matthieu Armando, Salma Galaaoui, Romain Brégier, Philippe Weinzaepfel, Grégory Rogez, Thomas Lucas

ECCV 2024

arXiv github demo project winner of ROBIN challenge @CVPR'24

A simple yet effective model for multi-person whole-body human mesh recovery that runs in real time on a GPU and reaches state-of-the-art results.

PoseEmbroider: Towards a 3D, Visual, Semantic-Aware Human Pose Representation

Ginger Delmas, Philippe Weinzaepfel, Francesc Moreno-Noguer, Grégory Rogez

ECCV 2024

arXiv github project

A multi-modal representation that can be built from any combination of modalities among human pose, text description and a picture of the person.

UNIC: Universal Classification Models via Multi-teacher Distillation

Mert Bulent Sariyildiz, Philippe Weinzaepfel, Thomas Lucas, Diane Larlus, Yannis Kalantidis

ECCV 2024

arXiv github project

A UNIC classification model that distills from strong pretrained models and performs on par with or better than each of them.

Cross-view and Cross-pose Completion for 3D Human Understanding

Matthieu Armando, Salma Galaaoui, Fabien Baradel, Thomas Lucas, Vincent Leroy, Romain Brégier, Philippe Weinzaepfel, Grégory Rogez

CVPR 2024

arXiv project

Applying the CroCo pre-training idea on human-centric data, sampling image pairs from multi-view or video datasets.

SACReg: Scene-agnostic coordinate regression for visual localization

Jerome Revaud, Yohann Cabon, Romain Brégier, JongMin Lee, Philippe Weinzaepfel

CVPR Workshop 2024

arXiv

Making scene coordinate regression models scene-agnostic by considering the 2D-3D matches as an external database.

End-to-End (Instance)-Image Goal Navigation Through Correspondence As An Emerging Phenomenon

Guillaume Bono, Leonid Antsfeld, Boris Chidlovskii, Philippe Weinzaepfel, Christian Wolf

ICLR 2024

arXiv

In an ImageGoal navigation context, we propose two pretext tasks that let correspondence emerge as a solution, and we train a dual visual encoder based on a binocular transformer.

Win-Win: Training High-Resolution Vision Transformers from Two Windows

Vincent Leroy, Jerome Revaud, Thomas Lucas, Philippe Weinzaepfel

ICLR 2024

arXiv

Win-Win enables efficient training of vanilla ViTs for high-resolution dense pixelwise tasks.

Weatherproofing Retrieval for Localization with Generative AI and Geometric Consistency

Yannis Kalantidis, Mert Bulent Sariyildiz, Rafael S Rezende, Philippe Weinzaepfel, Diane Larlus, Gabriela Csurka

ICLR 2024

arXiv project

We make retrieval-for-localization models robust to weather, seasonal and time-of-day changes by augmenting the training set with synthetic variations generated using Generative AI and by leveraging geometric consistency for sampling and filtering.

SHOWMe: Robust object-agnostic hand-object 3D reconstruction from RGB video

Anilkumar Swamy, Vincent Leroy, Philippe Weinzaepfel, Fabien Baradel, Salma Galaaoui, Romain Brégier, Matthieu Armando, Jean-Sebastien Franco, Grégory Rogez

CVIU 2024

paper dataset

Extension of SHOWMe (ICCVW 2023) with improved hand-object reconstruction by extending the two-stage method with the estimation of virtual camera poses based on a finetuned CroCo model.

Purposer: Putting Human Motion Generation in Context

Nicolás Ugrinovic, Thomas Lucas, Fabien Baradel, Philippe Weinzaepfel, Gregory Rogez, Francesc Moreno-Noguer

3DV 2024

arXiv

A versatile method able to generate realistic-looking motions that interact with virtual scenes.

PoseScript: Linking 3D Human Poses and Natural Language

Ginger Delmas, Philippe Weinzaepfel, Thomas Lucas, Francesc Moreno-Noguer, Grégory Rogez

IEEE Trans. PAMI 2024

arXiv github project

Extension of the ECCV 2022 paper on dataset and tasks that relate 3D human poses and their description in natural language.

CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow

Philippe Weinzaepfel, Thomas Lucas, Vincent Leroy, Yohann Cabon, Vaibhav Arora, Romain Brégier, Gabriela Csurka, Leonid Antsfeld, Boris Chidlovskii, Jérôme Revaud

ICCV 2023

arXiv github

Extending CroCo to real datasets leads to state-of-the-art results for binocular tasks like stereo and flow.

PoseFix: Correcting 3D Human Poses with Natural Language

Ginger Delmas, Philippe Weinzaepfel, Francesc Moreno-Noguer, Grégory Rogez

ICCV 2023

arXiv github dataset

Text description of the difference between two human poses.

SHOWMe: Benchmarking object-agnostic hand-object 3D reconstruction

Anilkumar Swamy, Vincent Leroy, Philippe Weinzaepfel, Fabien Baradel, Salma Galaaoui, Romain Brégier, Matthieu Armando, Jean-Sebastien Franco, Grégory Rogez

ICCV Workshop 2023

arXiv github project

A dataset and method for highly-detailed object-agnostic hand-object reconstruction.

CroCo: Self-Supervised Pre-Training for 3D Vision Tasks by Cross-view Completion

Philippe Weinzaepfel, Vincent Leroy, Thomas Lucas, Romain Brégier, Yohann Cabon, Vaibhav Arora, Leonid Antsfeld, Boris Chidlovskii, Gabriela Csurka, Jérôme Revaud

NeurIPS 2022

arXiv github project

Masked image modeling with a second reference view implicitly learns correspondences and is thus well suited for geometric tasks.

PoseBERT: A Generic Transformer Module for Temporal 3D Human Modeling

Fabien Baradel, Romain Brégier, Thibault Groueix, Philippe Weinzaepfel, Yannis Kalantidis, Grégory Rogez

IEEE Trans. PAMI 2022

arXiv github

A generic transformer model for temporal modeling of human and hand shape, trained with masked modeling, that can be applied to tasks such as pose estimation and future pose prediction (extension of our 3DV 2021 paper).

PoseScript: Linking 3D Human Poses and Natural Language

Ginger Delmas, Philippe Weinzaepfel, Thomas Lucas, Francesc Moreno-Noguer, Grégory Rogez

ECCV 2022

arXiv github project

Dataset and tasks that relate 3D human poses and their description in natural language.

PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting

Thomas Lucas, Fabien Baradel, Philippe Weinzaepfel, Grégory Rogez

ECCV 2022

arXiv github

Using a VQ-VAE, PoseGPT generates a human motion conditioned on an action label, a duration and, optionally, an observed past human motion.

Multi-Finger Grasping Like Humans

Yuming Du, Philippe Weinzaepfel, Vincent Lepetit, Romain Brégier

IROS 2022

arXiv

An optimization-based approach to transform a human grasp into a multi-finger robot grasp.

Investigating the role of image retrieval for visual localization: An exhaustive benchmark

Martin Humenberger, Yohann Cabon, Noé Pion, Philippe Weinzaepfel, Donghwan Lee, Nicolas Guérin, Torsten Sattler, Gabriela Csurka

IJCV 2022

arXiv github

We analyze the role of image retrieval for three visual localization paradigms.

PUMP: Pyramidal and Uniqueness Matching Priors for Unsupervised Learning of Local Descriptors

Jérôme Revaud, Vincent Leroy, Philippe Weinzaepfel, Boris Chidlovskii

CVPR 2022

paper

Unsupervised learning of local descriptors thanks to a loss that encourages unique matches.

Learning Super-Features for Image Retrieval

Philippe Weinzaepfel, Thomas Lucas, Diane Larlus, Yannis Kalantidis

ICLR 2022

arXiv github

Extract mid-level features for image retrieval with ASMK.

Barely-supervised learning: Semi-supervised learning with very few labeled images

Thomas Lucas, Philippe Weinzaepfel, Gregory Rogez

AAAI 2022

arXiv

Use self-supervised learning if the pseudo-label from the weak augmentation is not confident enough.

Leveraging MoCap Data for Human Mesh Recovery

Fabien Baradel, Thibault Groueix, Philippe Weinzaepfel, Romain Brégier, Yannis Kalantidis, Grégory Rogez

3DV 2021

arXiv github

Motion capture data helps to improve image-based and video-based human mesh recovery.

Multi-FinGAN: Generative Coarse-To-Fine Sampling of Multi-Finger Grasps

Jens Lundell, Enric Corona, Tran Nguyen Le, Francesco Verdoja, Philippe Weinzaepfel, Grégory Rogez, Francesc Moreno-Noguer, Ville Kyrki

ICRA 2021

arXiv github

A fast generative multi-finger grasp sampling method that synthesizes high quality grasps directly from RGB-D images in about a second.

Large-scale localization datasets in crowded indoor spaces

Donghwan Lee, Soohyun Ryu, Suyong Yeon, Yonghan Lee, Deokhwa Kim, Cheolho Han, Yohann Cabon, Philippe Weinzaepfel, Nicolas Guérin, Gabriela Csurka, Martin Humenberger

CVPR 2021

arXiv dataset project

Dataset and baseline for large-scale localization in crowded indoor spaces (metro station or shopping mall).

Mimetics: Towards Understanding Human Actions Out of Context

Philippe Weinzaepfel, Grégory Rogez

IJCV 2021

arXiv dataset project

The Mimetics dataset contains 713 video clips of mimed actions for evaluating human action recognition methods out of context.

Hard negative mixing for contrastive learning

Yannis Kalantidis, Mert Bulent Sariyildiz, Noe Pion, Philippe Weinzaepfel, Diane Larlus

NeurIPS 2020

arXiv project

Generating hard negatives in the feature space for improved self-supervised contrastive learning.

SuperLoss: A Generic Loss for Robust Curriculum Learning

Thibault Castells, Philippe Weinzaepfel, Jerome Revaud

NeurIPS 2020

paper

Automatically downweighting samples with a high loss implicitly performs curriculum learning.

SMPLy Benchmarking 3D Human Pose Estimation in the Wild

Vincent Leroy, Philippe Weinzaepfel, Romain Brégier, Hadrien Combaluzier, Grégory Rogez

3DV 2020

arXiv dataset

Dataset for in-the-wild human mesh recovery evaluation by fitting pseudo-ground-truth on Mannequin Challenge videos where people are static.

DOPE: Distillation Of Part Experts for whole-body 3D pose estimation in the wild

Philippe Weinzaepfel, Romain Brégier, Hadrien Combaluzier, Vincent Leroy, Grégory Rogez

ECCV 2020

arXiv github

A novel, efficient model for whole-body 3D pose estimation (including bodies, hands and faces), trained by mimicking the output of hand-, body- and face-pose experts.

Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction

Anil Armagan, Guillermo Garcia-Hernando, Seungryul Baek, Shreyas Hampali, Mahdi Rad, Zhaohui Zhang, Shipeng Xie, MingXiu Chen, Boshen Zhang, Fu Xiong, Yang Xiao, Zhiguo Cao, Junsong Yuan, Pengfei Ren, Weiting Huang, Haifeng Sun, Marek Hroz, Jakub Kanis, Zdenek Krnoul, Qingfu Wan, Shile Li, Linlin Yang, Dongheui Lee, Angela Yao, Weiguo Zhou, Sijia Mei, Yunhui Liu, Adrian Spurr, Umar Iqbal, Pavlo Molchanov, Philippe Weinzaepfel, Romain Brégier, Grégory Rogez, Vincent Lepetit, Tae-Kyun Kim

ECCV 2020

arXiv project

Outcome of the HANDS'19 challenge.

R2D2: Repeatable and Reliable Detector and Descriptor

Jerome Revaud, Cesar De Souza, Martin Humenberger, Philippe Weinzaepfel

NeurIPS 2019 (oral)

arXiv github

A neural network trained to detect and describe repeatable and reliable keypoints.

MARS: Motion-Augmented RGB Stream for Action Recognition

Nieves Crasto, Philippe Weinzaepfel, Karteek Alahari, Cordelia Schmid

CVPR 2019

paper github

Distill an optical-flow-based action recognition network into an RGB-based network.

Visual Localization by Learning Objects-of-Interest Dense Match Regression

Philippe Weinzaepfel, Gabriela Csurka, Yohann Cabon, Martin Humenberger

CVPR 2019

paper dataset project

Visual localization by predicting dense texture coordinates of a given list of planar objects.

LCR-Net++: Multi-person 2D and 3D Pose Detection in Natural Images

Gregory Rogez, Philippe Weinzaepfel, Cordelia Schmid

IEEE Trans. PAMI 2019

arXiv github

Journal extension of LCR-Net (CVPR 2017) for robust 2D-3D human pose estimation in the wild.

PoTion: Pose MoTion Representation for Action Recognition

Vasileios Choutas, Philippe Weinzaepfel, Jérôme Revaud, Cordelia Schmid

CVPR 2018

paper

Action recognition from human poses using body-joint heatmaps that are colored according to their motions.

Action Tubelet Detector for Spatio-Temporal Action Localization

Vicky Kalogeiton, Philippe Weinzaepfel, Vittorio Ferrari, Cordelia Schmid

ICCV 2017

arXiv

Spatio-temporal video action detector by regressing tubelets from anchor cuboids.

Joint Learning of Object and Action Detectors

Vicky Kalogeiton, Philippe Weinzaepfel, Vittorio Ferrari, Cordelia Schmid

ICCV 2017

paper

Spatio-temporal detection of different action classes performed by various types of "objects" in videos.

LCR-Net: Localization-Classification-Regression for Human Pose

Gregory Rogez, Philippe Weinzaepfel, Cordelia Schmid

CVPR 2017 (spotlight)

paper

Highly-robust 2D-3D human pose estimation by classifying poses among some predefined clusters and regressing the offset.

Human Action Localization with Sparse Spatial Supervision

Philippe Weinzaepfel, Xavier Martin, Cordelia Schmid

arXiv 2016

arXiv dataset

Spatio-temporal video action detection from temporal action annotation and one bounding box.

DeepMatching: Hierarchical Deformable Dense Matching

Jerome Revaud, Philippe Weinzaepfel, Zaid Harchaoui, Cordelia Schmid

IJCV 2016

arXiv project

A learning-free algorithm to compute dense correspondences with a hierarchical, multi-layer correlational architecture inspired by deep convolutional networks.

Motion in action: optical flow estimation and action localization in videos

Philippe Weinzaepfel

PhD Thesis, University Grenoble Alpes, 2016

paper

My PhD thesis.

Learning to Track for Spatio-Temporal Action Localization

Philippe Weinzaepfel, Zaid Harchaoui, Cordelia Schmid

ICCV 2015

arXiv

Spatio-temporal video action detection by frame-level detection and scoring, tracking best candidates across videos and scoring the obtained tracks.

Learning to Detect Motion Boundaries

Philippe Weinzaepfel, Jerome Revaud, Zaid Harchaoui, Cordelia Schmid

CVPR 2015

paper dataset

Dataset for motion boundary detection and structured random forest baseline.

EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow

Jerome Revaud, Philippe Weinzaepfel, Zaid Harchaoui, Cordelia Schmid

CVPR 2015 (oral)

arXiv

See title.

DeepFlow: Large Displacement Optical Flow with Deep Matching

Philippe Weinzaepfel, Jerome Revaud, Zaid Harchaoui, Cordelia Schmid

ICCV 2013 (oral)

paper

Novel matching algorithm based on optimal hierarchical motion of moving quadrants tailored for large displacements and its application to optical flow.

Reconstructing an Image from its Local Descriptors

Philippe Weinzaepfel, Hervé Jégou, Patrick Pérez

CVPR 2011

paper

The content of an input image can be reconstructed from its local features and their locations by looking for similar regions in a database.
