facebookresearch / supervision-by-registration





Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors

By Xuanyi Dong, Shoou-I Yu, Xinshuo Weng, Shih-En Wei, Yi Yang, Yaser Sheikh

University of Technology Sydney, Facebook Reality Labs


We propose a method to localize facial landmarks (e.g. corners of the eyes, corners of the mouth, tip of the nose, etc.) more precisely. Our method exploits the fact that objects move smoothly in a video sequence (i.e. optical flow registration) to improve an existing facial landmark detector. The key novelty is that no additional human annotations are necessary to improve the detector; hence it is an “unsupervised approach”.
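The idea of using smooth motion as a training signal can be illustrated with a small conceptual sketch. It is not the repository's implementation: the function names are hypothetical, the flow vectors are taken as given rather than estimated from images, and the plain mean distance is an illustrative choice of error measure.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def propagate(landmarks: Sequence[Point], flow: Sequence[Point]) -> List[Point]:
    """Move each landmark from frame t to frame t+1 by its flow vector."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(landmarks, flow)]


def registration_error(detected_t: Sequence[Point],
                       flow_t: Sequence[Point],
                       detected_t1: Sequence[Point]) -> float:
    """Mean distance between flow-propagated landmarks and frame t+1 detections."""
    tracked = propagate(detected_t, flow_t)
    dists = [math.hypot(tx - x, ty - y)
             for (tx, ty), (x, y) in zip(tracked, detected_t1)]
    return sum(dists) / len(dists)


# Toy values (made up): two landmarks, both moving 2 px to the right.
detected_t = [(100.0, 120.0), (140.0, 118.0)]
flow_t = [(2.0, 0.0), (2.0, 0.0)]
detected_t1 = [(102.0, 120.0), (145.0, 122.0)]
print(registration_error(detected_t, flow_t, detected_t1))
```

In this toy case the first detection agrees with the motion and the second does not, so the error is non-zero. A disagreement of this kind can be measured without any human annotation on the video frames.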



If you find that Supervision-by-Registration helps your research, please cite the paper:

  title={{Supervision-by-Registration}: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors},
  author={Dong, Xuanyi and Yu, Shoou-I and Weng, Xinshuo and Wei, Shih-En and Yang, Yi and Sheikh, Yaser},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},


Requirements

  • PyTorch >= 0.4.0
  • Python 3.6

Data Preparation

See the README in cache_data.

Dataset Format

Each dataset is saved as one file, in which each row describes one face in one image or one video frame. Each line has the following format:

image_path annotation_path x1 y1 x2 y2 (face_size)
  • image_path: the image (video frame) file path of that face.
  • annotation_path: the annotation file path of that face (annotation is the coordinates of all landmarks)
  • x1, y1, x2, y2: the coordinates of the upper-left and lower-right corners of the face bounding box.
  • face_size: optional. If this value is set, face_size is used to compute the NME (normalized mean error); otherwise, the distance between two pre-defined points is used.
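As an illustration of the line format, here is a minimal parsing sketch. It assumes fields are separated by whitespace and that paths contain no spaces. `FaceRecord` and `parse_dataset_line` are hypothetical names, not part of the repository, and the example paths are made up.

```python
from typing import NamedTuple, Optional, Tuple


class FaceRecord(NamedTuple):
    """One row of a dataset list file (hypothetical container)."""
    image_path: str
    annotation_path: str
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    face_size: Optional[float]              # None when the column is absent


def parse_dataset_line(line: str) -> FaceRecord:
    """Parse 'image_path annotation_path x1 y1 x2 y2 (face_size)'."""
    fields = line.split()
    if len(fields) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(fields)}: {line!r}")
    x1, y1, x2, y2 = (float(v) for v in fields[2:6])
    # (x1, y1) must be the upper-left corner and (x2, y2) the lower-right one.
    if x2 <= x1 or y2 <= y1:
        raise ValueError(f"bounding box corners are out of order: {line!r}")
    face_size = float(fields[6]) if len(fields) == 7 else None
    return FaceRecord(fields[0], fields[1], (x1, y1, x2, y2), face_size)


record = parse_dataset_line("frames/0001.png annos/0001.pts 250 150 900 1100")
print(record.box, record.face_size)
```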


See the configs directory for some example configurations.

Basic Training

python ./exps/basic_main.py [<required arguments>]

The argument list is loaded by ./lib/config_utils/basic_args.py. An example script is ./scripts/300W-DET.sh; you can simply run it to train the base detector on the 300-W dataset.

sh scripts/300W-DET.sh

Improving the Detector by SBR

python ./exps/lk_main.py [<required arguments>]

The argument list is loaded by ./lib/config_utils/lk_args.py.

An example to train SBR on the unlabeled sequences

The init_model parameter is the path to the detector trained in the Basic Training section.

sh scripts/demo_sbr.sh

To see visualization results, use the commands in the Visualization section.

An example to train SBR on your own data

See the script ./scripts/sbr_example.sh, and replace the relevant parameters with those for your own data.


Evaluation

When using basic_main.py or lk_main.py, the testing datasets are evaluated automatically.
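The NME described under Dataset Format can be sketched as follows, assuming it is the mean point-to-point error divided by a normalizer. This is a generic formulation, not the repository's code. The function name and the `ref_pair` argument are hypothetical, and which two pre-defined points a dataset uses is left to the caller.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def normalized_mean_error(pred: Sequence[Point],
                          gt: Sequence[Point],
                          face_size: Optional[float] = None,
                          ref_pair: Tuple[int, int] = (0, 1)) -> float:
    """Mean landmark error divided by face_size or a reference distance."""
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    if face_size is not None:
        norm = float(face_size)
    else:
        # Fall back to the distance between two ground-truth landmarks.
        a, b = ref_pair
        norm = math.hypot(gt[a][0] - gt[b][0], gt[a][1] - gt[b][1])
    if norm <= 0:
        raise ValueError("normalizer must be positive")
    errors = [math.hypot(p[0] - g[0], p[1] - g[1]) for p, g in zip(pred, gt)]
    return sum(errors) / len(errors) / norm


# Toy values (made up): one of three landmarks is off by 1 px.
gt = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)]
pred = [(0.0, 1.0), (10.0, 0.0), (5.0, 5.0)]
print(normalized_mean_error(pred, gt))                  # reference distance
print(normalized_mean_error(pred, gt, face_size=20.0))  # explicit face size
```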

To evaluate a single image, you can use the following script to compute the coordinates of 68 facial landmarks of the target image:

python ./exps/eval.py --image ./cache_data/cache/self.jpeg --model ./snapshots/300W-CPM-DET/checkpoint/cpm_vgg16-epoch-049-050.pth --face 250 150 900 1100 --save ./cache_data/cache/test.jpeg
  • image : the input image path
  • model : the snapshot path
  • face : the face bounding box
  • save : save the visualized results
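When evaluating many images, it may be convenient to assemble the command above programmatically. The sketch below only builds the argument list from the four flags shown. `build_eval_command` is a hypothetical helper, and the bounding-box ordering check is an assumption based on the dataset format, not something eval.py is known to enforce.

```python
import shlex
from typing import List, Sequence


def build_eval_command(image: str, model: str, face: Sequence[int], save: str,
                       python: str = "python",
                       script: str = "./exps/eval.py") -> List[str]:
    """Return the argv list for evaluating a single image with eval.py."""
    if len(face) != 4:
        raise ValueError("face must be (x1, y1, x2, y2)")
    x1, y1, x2, y2 = face
    if x2 <= x1 or y2 <= y1:
        raise ValueError("face box must be upper-left then lower-right")
    return [python, script,
            "--image", image,
            "--model", model,
            "--face", str(x1), str(y1), str(x2), str(y2),
            "--save", save]


cmd = build_eval_command(
    image="./cache_data/cache/self.jpeg",
    model="./snapshots/300W-CPM-DET/checkpoint/cpm_vgg16-epoch-049-050.pth",
    face=(250, 150, 900, 1100),
    save="./cache_data/cache/test.jpeg",
)
# Print a shell-safe version of the command instead of running it.
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.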


Visualization

After training SBR on the demo video, or models on other datasets, you can use ./exps/vis.py to generate the visualization results.

python ./exps/vis.py --meta snapshots/CPM-SBR/metas/eval-start-eval-00-01.pth --save cache_data/cache/demo-detsbr-vis
ffmpeg -start_number 3 -i cache_data/cache/demo-detsbr-vis/image%04d.png -b:v 30000k -vf "fps=30" -pix_fmt yuv420p cache_data/cache/demo-detsbr-vis.mp4

python ./exps/vis.py --meta snapshots/CPM-SBR/metas/eval-epoch-049-050-00-01.pth --save cache_data/cache/demo-sbr-vis
ffmpeg -start_number 3 -i cache_data/cache/demo-sbr-vis/image%04d.png -b:v 30000k -vf "fps=30" -pix_fmt yuv420p cache_data/cache/demo-sbr-vis.mp4
  • meta : the saved prediction files
  • save : the directory path to save the visualization results
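The ffmpeg commands above read a numbered image sequence (`image%04d.png`, starting at 3). ffmpeg's image sequence input expects consecutive numbers, so a missing frame would typically cut the video short. The sketch below is a hypothetical helper, not part of the repository. It assumes the saved frames follow the `imageNNNN.png` naming implied by the commands and reports gaps in a list of file names.

```python
import re
from typing import Iterable, List

# Matches names produced by the printf-style pattern image%04d.png.
FRAME_RE = re.compile(r"^image(\d{4})\.png$")


def missing_frame_numbers(filenames: Iterable[str], start: int = 3) -> List[int]:
    """Return frame numbers absent between `start` and the last frame found."""
    matches = (FRAME_RE.match(name) for name in filenames)
    present = {int(m.group(1)) for m in matches if m}
    present = {n for n in present if n >= start}
    if not present:
        return []
    return [n for n in range(start, max(present) + 1) if n not in present]


# Toy listing (made up): frame 5 is missing, notes.txt is ignored.
names = ["image0003.png", "image0004.png", "image0006.png", "notes.txt"]
print(missing_frame_numbers(names))
```

In practice the file names could come from `os.listdir` on the directory passed to `--save`.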


supervision-by-registration is released under the CC-BY-NC license.

Useful information

1. Train on your own video data

Look at ./lib/datasets/VideoDataset.py and ./lib/datasets/parse_utils.py, and add the logic for finding the neighbouring frames given one image path. For more details, see the parse_basic function in lib/datasets/parse_utils.py.
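As a rough idea of what such neighbour lookup might involve, here is a minimal sketch. It is not the repository's parse_basic. It assumes frames of one video sit in the same directory and are named with a zero-padded frame index (the last run of digits in the file name). `neighbour_frames`, `radius`, and `exists` are hypothetical names.

```python
import os
import re
from typing import Callable, List


def neighbour_frames(image_path: str, radius: int = 1,
                     exists: Callable[[str], bool] = os.path.exists) -> List[str]:
    """Return paths of frames within `radius` of image_path, centre included."""
    name = os.path.basename(image_path)
    prefix = image_path[:len(image_path) - len(name)]  # directory part, kept verbatim
    match = re.search(r"(\d+)(?=\D*$)", name)          # last run of digits in the name
    if match is None:
        raise ValueError(f"no frame index found in {name!r}")
    index, width = int(match.group(1)), len(match.group(1))
    paths = []
    for offset in range(-radius, radius + 1):
        new_index = index + offset
        if new_index < 0:
            continue  # there is no frame before index 0
        new_name = (name[:match.start(1)] + str(new_index).zfill(width)
                    + name[match.end(1):])
        candidate = prefix + new_name
        # The centre frame is always kept; neighbours only if they exist.
        if offset == 0 or exists(candidate):
            paths.append(candidate)
    return paths


# The lambda stands in for a real file-system check in this demo.
print(neighbour_frames("video/frame-000012.png", radius=1, exists=lambda p: True))
```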

2. Warnings when training on the AFLW dataset

The following warnings are expected. Because some images in the AFLW dataset are incorrectly formatted, PIL raises warnings when loading them. These warnings do not affect the training performance.

TiffImagePlugin.py:756: UserWarning: Corrupt EXIF data.  Expecting to read 12 bytes but only got 6.
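If the warning clutters the training log, it can be filtered with Python's standard `warnings` module. This is an optional sketch, not something the repository does. `ignore_corrupt_exif_warnings` is a hypothetical helper, and the filter matches only messages that begin with "Corrupt EXIF data".

```python
import warnings


def ignore_corrupt_exif_warnings() -> None:
    """Silence UserWarnings whose message starts with 'Corrupt EXIF data'."""
    warnings.filterwarnings(
        "ignore",
        message="Corrupt EXIF data",  # regex matched against the start of the message
        category=UserWarning,
    )


# Call once, before the data loader starts reading images.
ignore_corrupt_exif_warnings()
```

Other warnings are left untouched, so unrelated problems with the data still show up.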


To ask questions or report issues, please open an issue on the issue tracker.