IMDb-Face Dataset

IMDb-Face is a new large-scale, noise-controlled dataset for face recognition research. The dataset contains about 1.7 million faces of 59k identities, manually cleaned from 2.0 million raw images. All images are obtained from the IMDb website. We used this dataset in our ECCV 2018 paper "The Devil of Face Recognition is in the Noise". We hope that the IMDb-Face dataset can shed light on the influence of data noise on the face recognition task and point to potential labelling strategies that mitigate some of the problems. It can serve as relatively clean data to facilitate future studies of noise in large-scale face recognition.

Details ...

Grouping Face in the Wild (GFW) Dataset

This is the largest real-world face clustering dataset. We used this dataset in our AAAI 2018 paper "Merge or Not? Learning to Group Faces via Imitation Learning". We collected 60 real users’ albums with permission from a Chinese social network portal. The size of an album varies from 120 to 3600 faces, with a maximum of 321 identities. In total, the dataset contains 84,200 images with 78,000 faces of 3,132 different identities. We annotate all detections with identity/noise labels. The images are unconstrained and taken in various indoor and outdoor scenes. Faces appear naturally, with varied poses and spontaneous expressions. In addition, faces can be severely occluded, blurred by motion, and differently illuminated across scenes.

Details ...

FashionGAN Dataset

New annotations (language descriptions and segmentation maps) on a subset of the DeepFashion dataset. The data is used in our ICCV 2017 paper "Be Your Own Prada: Fashion Synthesis with Structural Coherence".

Details ...

Visual Discriminative Question Generation (VDQG) Dataset

The dataset contains 11202 ambiguous image pairs collected from Visual Genome. Each image pair is annotated with 4.6 discriminative questions and 5.9 non-discriminative questions on average. The dataset is used in our ICCV 2017 paper "Learning to Disambiguate by Asking Discriminative Questions".

Details ...

MegaAge Dataset

We introduce a new large-scale MegaAge dataset that consists of 41,941 faces annotated with age posterior distributions. We also provide the MegaAge-Asian dataset that consists only of Asian faces (40,000 face images). The dataset is used in our BMVC 2017 paper "Quantifying Facial Age by Posterior of Age Comparisons".

Details ...

WildLife Documentary (WLD) Dataset

The dataset contains 15 documentary films that are downloaded from YouTube, whose durations vary from 9 minutes to as long as 50 minutes, and the total number of frames is more than 747,000. More than 4000 object tracklets of 65 categories are annotated. The dataset is used in our CVPR 2017 paper "Discover and Learn New Objects from Documentaries".

Details ...

Expression in-the-Wild (ExpW) Dataset

We built a new database, the Expression in-the-Wild (ExpW) dataset, that contains 91,793 faces manually labeled with expressions. Each face image was manually annotated with one of the seven basic expression categories: “angry”, “disgust”, “fear”, “happy”, “sad”, “surprise”, or “neutral”. ExpW contains more images and more diverse face variations than many existing databases. The dataset is used in our paper "From Facial Expression Recognition to Interpersonal Relation Prediction".

Details ...

Pedestrian Color Naming Dataset

To facilitate the learning and evaluation of pedestrian color naming, we built a new large-scale dataset, the Pedestrian Color Naming (PCN) dataset, which contains 14,213 images, each hand-labeled with a color label for every pixel. All images in the PCN dataset are obtained from the Market-1501 dataset.

Details ...

WIDER ATTRIBUTE Dataset

WIDER ATTRIBUTE dataset is a human attribute recognition benchmark dataset whose images are selected from the publicly available WIDER dataset. There are a total of 13789 images. We annotate a bounding box for each person in these images, but no more than 20 people (those with the highest resolution) in a crowd image, resulting in 57524 boxes in total and 4+ boxes per image on average. For each bounding box, we label 14 distinct human attributes, resulting in a total of 805336 labels.
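
As a quick sanity check, the totals quoted above follow directly from the box and attribute counts. The variable names in this small sketch are illustrative only and are not part of any released tooling.

```python
# Figures taken from the WIDER ATTRIBUTE description above.
num_images = 13789
num_boxes = 57524
num_attributes = 14

# One label per attribute per bounding box.
total_labels = num_boxes * num_attributes

# Average number of annotated people per image.
boxes_per_image = num_boxes / num_images

print(total_labels)               # 805336
print(round(boxes_per_image, 2))  # 4.17
```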

Details ...

General 100 Dataset

General-100 dataset contains 100 bmp-format images (with no compression). We used this dataset in our FSRCNN ECCV 2016 paper. The size of these 100 images ranges from 710 x 704 (large) to 131 x 112 (small). They are all of good quality, with clear edges and few smooth regions (e.g., sky and ocean), and are thus well suited to super-resolution training.

Details ...

WIDER FACE Dataset

WIDER FACE dataset is a face detection benchmark dataset whose images are selected from the publicly available WIDER dataset. We choose 32,203 images and label 393,703 faces with a high degree of variability in scale, pose and occlusion, as depicted in the sample images. WIDER FACE dataset is organized based on 61 event classes. For each event class, we randomly select 40%/10%/50% of the data as training, validation and testing sets. We adopt the same evaluation metric employed in the PASCAL VOC dataset. Similar to the MALF and Caltech datasets, we do not release bounding box ground truth for the test images. Users are required to submit final prediction files, which we then evaluate.
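
The per-event 40%/10%/50% split described above can be sketched as follows. This is a minimal illustration, not the procedure actually used to build the released splits: the function name `split_by_event`, the `(image_id, event_class)` input format, the fixed seed, and the rounding rule are all assumptions made for the example.

```python
import random
from collections import defaultdict


def split_by_event(image_events, ratios=(0.4, 0.1, 0.5), seed=0):
    """Randomly split images into train/val/test within each event class.

    image_events: iterable of (image_id, event_class) pairs (hypothetical format).
    ratios: train/val fractions; the test set takes whatever remains.
    """
    rng = random.Random(seed)

    # Group image ids by their event class.
    by_event = defaultdict(list)
    for image_id, event in image_events:
        by_event[event].append(image_id)

    splits = {"train": [], "val": [], "test": []}
    for event in sorted(by_event):
        ids = sorted(by_event[event])
        rng.shuffle(ids)  # random selection within this event class
        n_train = round(ratios[0] * len(ids))
        n_val = round(ratios[1] * len(ids))
        splits["train"] += ids[:n_train]
        splits["val"] += ids[n_train:n_train + n_val]
        splits["test"] += ids[n_train + n_val:]
    return splits


# Toy input: two made-up event classes with 10 images each.
demo = [(f"{event}_{i}", event) for event in ("event_a", "event_b") for i in range(10)]
demo_splits = split_by_event(demo)
print({name: len(ids) for name, ids in demo_splits.items()})
```

Splitting inside each event class, rather than over the pooled images, keeps every event represented in all three sets in the same proportions.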

Details ...

WWW Crowd Dataset

The dataset is used in our CVPR paper. The WWW dataset provides 10,000 videos with over 8 million frames from 8,257 diverse scenes, offering a comprehensive dataset for crowd understanding. The abundant sources of these videos also enrich its diversity and completeness.

Details ...

Social Relation Dataset

The dataset is used in our ICCV 2015 paper. We define the social relation traits based on the interpersonal circle proposed by Kiesler, where human relations are divided into 16 segments. Each segment has its opposite side in the circle, such as 'friendly and hostile'. To investigate the detectability of social relations from a pair of face images, we built a new dataset containing 8,306 images chosen from the web and movies. Each image is labelled with faces’ bounding boxes and their pairwise relations. This is the first face dataset measuring social relation traits, and it is challenging because of large face variations, including poses, occlusions, and illuminations.

Details ...

The Comprehensive Cars (CompCars) Dataset

The dataset is used in our CVPR paper. The Comprehensive Cars (CompCars) dataset contains data from two scenarios: web-nature images and surveillance-nature images. The web-nature data contains 163 car makes with 1,716 car models. There are a total of 136,726 images capturing entire cars and 27,618 images capturing car parts. The full car images are labeled with bounding boxes and viewpoints. Each car model is labeled with five attributes: maximum speed, displacement, number of doors, number of seats, and type of car. The surveillance-nature data contains 50,000 car images captured from the front view.

Details ...

Multi-Task Facial Landmark (MTFL) Dataset

The dataset is used in our ECCV paper for training a multi-task deep model of facial landmark detection. It consists of 12,995 face images, each of which is annotated with a bounding box and five landmarks, i.e. the centers of the eyes, the nose, and the corners of the mouth. In addition, it includes annotations for related tasks, including 'smiling', 'wearing glasses', 'gender', and 'head pose'.

Details ...

PEdesTrian Attribute (PETA) Dataset

The dataset is by far the largest of its kind, covering more than 60 attributes on 19000 images. In comparison with existing datasets, PETA is more diverse and challenging in terms of imagery variations and complexity.

Details ...

CUHK Image Cropping Dataset

This dataset was used in this paper, which presents an approach for automatic image cropping.

Details ...

CUHK Crowd Dataset

474 video clips from 215 crowded scenes, with ground truth on group detection and video classes.

Details ...

TImes Square Intersection (TISI) Dataset

A busy outdoor dataset for research on visual surveillance.

Details ...

Educational Resource Centre (ERCe) Dataset

An indoor dataset collected from a university campus for physical event understanding of long video streams.

Details ...

UnderGround Re-IDentification (GRID) Dataset

The QMUL underGround Re-IDentification (GRID) dataset contains 250 pedestrian image pairs. Each pair contains two images of the same individual seen from different camera views. In addition, there are 775 extra individual images that do not belong to any of the paired images. All images are captured from 8 disjoint camera views installed in a busy underground station. The dataset is challenging due to variations in pose, colour, and lighting, as well as poor image quality caused by low spatial resolution.

Details ...

Mall Dataset

A dataset collected from a publicly accessible webcam for crowd counting and profiling research. Over 60,000 pedestrians were labelled in 2000 video frames.

Details ...

QMUL Junction Dataset

A busy traffic dataset for research on activity analysis and behaviour understanding.

Details ...


Other datasets
  1. Keystroke100 Dataset