Overview:

Associating faces appearing in Web videos with names presented in the surrounding context is an important task in many applications. However, the problem has not been well investigated, particularly in large-scale realistic scenarios, mainly due to the scarcity of datasets constructed under such circumstances. In this work, we introduce and release a Web video dataset of celebrities, named WebV-Cele, for name-face association. The dataset consists of 75,073 Internet videos totaling over 4,000 hours, covering 2,427 celebrities and 649,001 faces. To our knowledge, this is the most comprehensive dataset for this problem. We describe the details of dataset construction, discuss several interesting findings from analyzing the dataset, such as celebrity community discovery, and provide experimental results of name-face association using five existing techniques.
The WebV-Cele dataset is created by mining names and faces from MCG-WEBV, a real-world Internet video dataset released a few years earlier [1]. The release includes celebrity names, faces, and features, namely:
(1) 2,427 celebrity names and their associated videos. Each name occurs at least ten times in the whole MCG-WEBV dataset, and the videos containing at least one celebrity name form a repository of 75,073 videos;
(2) 649,001 face responses on 570,931 keyframes from the 75,073 videos;
(3) a 1937-D pixel-wise signature and a 1664-D SIFT signature for each face. Both signatures are extracted from 13 facial regions: the left, middle, and right corners of each eye, a point between the eyes, the two nostrils and the tip of the nose, and the left, middle, and right corners of the mouth;
(4) six types of low-level visual features extracted from the head and upper-body regions (if present) associated with each face. The two regions are inferred from the size and location of the face. The six features are a 166-D color histogram, 166-D color correlogram, 225-D color moments, 96-D co-occurrence texture, 108-D wavelet texture grid, and 320-D edge histogram;
(5) ground-truth labels for a subset of 3,194 videos, which include 42,118 faces labeled against 144 celebrity names.
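The feature layout in items (3) and (4) can be illustrated with a short Python sketch. It assumes, without confirmation from the release, that each face signature is stored as a flat vector made of 13 equal-length per-region blocks in a fixed landmark order (which would give 149-D pixel blocks and 128-D SIFT blocks, since 1937 = 13 × 149 and 1664 = 13 × 128), and that the six low-level features are available as separate vectors. The function names and the per-feature L2 normalization are illustrative choices, not part of the dataset.

```python
import math

NUM_REGIONS = 13  # facial landmark regions listed in item (3)

# Dimensions of the six low-level features listed in item (4).
LOW_LEVEL_DIMS = {
    "color_histogram": 166,
    "color_correlogram": 166,
    "color_moments": 225,
    "cooccurrence_texture": 96,
    "wavelet_texture_grid": 108,
    "edge_histogram": 320,
}


def split_regions(signature, num_regions=NUM_REGIONS):
    """Split a flat face signature into equal per-region blocks.

    Assumes (hypothetically) that the blocks are stored contiguously,
    one per landmark region, in a fixed order."""
    if len(signature) % num_regions:
        raise ValueError("signature length is not divisible by the region count")
    size = len(signature) // num_regions
    return [signature[i * size:(i + 1) * size] for i in range(num_regions)]


def l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def concat_low_level(features):
    """Concatenate the six features of one region (head or upper body),
    L2-normalizing each so that no single feature type dominates."""
    out = []
    for name, dim in LOW_LEVEL_DIMS.items():
        vec = features[name]
        if len(vec) != dim:
            raise ValueError(f"{name}: expected {dim}-D, got {len(vec)}-D")
        out.extend(l2_normalize(vec))
    return out
```

Under these assumptions, split_regions on a 1664-D SIFT vector yields 13 blocks of 128 values that could be compared region by region between faces, and concat_low_level produces a 1081-D vector per region.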

Implementation Details:

The WebV-Cele dataset is created on top of MCG-WEBV, which is composed of 248,887 Internet videos crawled from YouTube. The videos in MCG-WEBV have been decomposed into shots, and more than five million keyframes were extracted to represent these shots. The Wikipedia-based named entity extraction method [2] is applied to the metadata (titles and tags) surrounding the videos to extract names, and the commercial frontal face detector developed by the ISVision company [3] is applied to the keyframes to detect faces. As a result, a total of 209,001 name occurrences and 1,556,265 face responses are extracted.
A total of 2,427 names appear at least ten times in MCG-WEBV; these are defined as celebrity names in this work. The celebrity names are associated with a total of 75,073 unique videos containing 649,001 faces. Thus, the WebV-Cele dataset consists of 75,073 videos, 2,427 celebrity names, and 649,001 faces.
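The selection rule described in the two paragraphs above can be sketched in Python. The input formats (a mapping from video ID to the list of name occurrences found in that video's metadata, and a mapping from video ID to its face count) are hypothetical stand-ins for the released files, and counting every occurrence, rather than every video, toward the ten-occurrence threshold is our reading of the text.

```python
from collections import Counter

MIN_OCCURRENCES = 10  # "at least ten times", as stated in the text


def select_celebrities(video_names, min_occurrences=MIN_OCCURRENCES):
    """Return (celebrity names, IDs of videos mentioning at least one).

    video_names: hypothetical dict mapping a video ID to the list of name
    occurrences extracted from its metadata (duplicates allowed)."""
    counts = Counter(name for names in video_names.values() for name in names)
    celebrities = {name for name, c in counts.items() if c >= min_occurrences}
    videos = {vid for vid, names in video_names.items()
              if celebrities.intersection(names)}
    return celebrities, videos


def faces_in(videos, video_faces):
    """Total faces detected in the selected videos.

    video_faces: hypothetical dict mapping a video ID to its face count."""
    return sum(video_faces.get(vid, 0) for vid in videos)
```

On a toy input where name "A" occurs ten times across videos v1 and v2 while "B" and "C" occur twice each, only "A" qualifies, and only v1 and v2 enter the repository.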
We select 3,194 representative videos from WebV-Cele for further analysis; these videos come from the CoreData of MCG-WEBV, while the remaining videos come from its ExpandedData. The selected videos cover 42,118 faces and 144 celebrity names. The figure below depicts the 144 celebrities, ranked in descending order of name frequency from left to right and top to bottom. The color of each bounding box indicates the profession: blue for Internet star, green for artist, red for politician, gray for sportsman, and dark red for journalist.



We notice that the celebrities are not only highly correlated with hot news events from Dec. 2008 to Nov. 2009 (the crawling period of MCG-WEBV), but can also be grouped by their social networks. Therefore, we attempt to discover communities among the celebrities. The mining starts by quantifying the pairwise relationships between the celebrities to form a sparse graph, and then employs the Walktrap algorithm [4] to discover communities by partitioning the graph. Twelve communities are found, ranging in size from 4 to 26 persons, as depicted in the figure below. From our analysis, these communities can be linked to both hot topics and the celebrities' professions. For example, as shown in Figure 8, the community highlighted by a black dotted circle is a set of famous football stars, while the communities highlighted by red and blue dotted circles consist of, respectively, the judges and contestants of "Britain's Got Talent" and the actors and original author of the movie "Twilight".
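The community mining step can be sketched in pure Python. Because the text does not say how the pairwise relationships are quantified, the sketch assumes, for illustration only, that the weight between two celebrities is the number of videos whose metadata mentions both, with edges below a hypothetical min_weight dropped to keep the graph sparse. The walktrap function is a minimal, unoptimized reimplementation of the idea in [4] (random walks of length t, the random-walk distance, Ward-style merging of adjacent communities, and selection of the partition with maximal modularity); it is not the authors' code, and for graphs of realistic size an optimized implementation, such as the one in the igraph library, would be preferable.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(video_names, min_weight=1):
    """Hypothetical relationship measure: edge weight = number of videos whose
    metadata mentions both celebrities; weaker edges are dropped."""
    names = sorted({n for ns in video_names.values() for n in ns})
    index = {n: i for i, n in enumerate(names)}
    counts = Counter()
    for ns in video_names.values():
        for a, b in combinations(sorted(set(ns)), 2):
            counts[(index[a], index[b])] += 1
    edges = {e: float(w) for e, w in counts.items() if w >= min_weight}
    return names, edges


def walktrap(n, edges, t=4):
    """Minimal Walktrap: agglomerative clustering with the random-walk
    distance, returning the partition of highest modularity."""
    adj = [{} for _ in range(n)]  # original graph, used for modularity
    for (i, j), w in edges.items():
        adj[i][j] = adj[i].get(j, 0.0) + w
        adj[j][i] = adj[j].get(i, 0.0) + w
    two_m = sum(sum(a.values()) for a in adj)
    if two_m == 0:
        return [[u] for u in range(n)], 0.0
    # Random walks use the graph with a unit self-loop on every vertex.
    walk_adj = [dict(a) for a in adj]
    for i in range(n):
        walk_adj[i][i] = walk_adj[i].get(i, 0.0) + 1.0
    deg = [sum(a.values()) for a in walk_adj]

    def walk(start):  # row `start` of P^t, with P = D^-1 A
        p = [0.0] * n
        p[start] = 1.0
        for _ in range(t):
            q = [0.0] * n
            for u in range(n):
                if p[u]:
                    for v, w in walk_adj[u].items():
                        q[v] += p[u] * w / deg[u]
            p = q
        return p

    def modularity(label):
        inside, total = Counter(), Counter()
        for u in range(n):
            total[label[u]] += sum(adj[u].values())
            for v, w in adj[u].items():
                if label[u] == label[v]:
                    inside[label[u]] += w
        return sum(inside[c] / two_m - (total[c] / two_m) ** 2 for c in total)

    comms = {i: ([i], walk(i)) for i in range(n)}  # id -> (members, mean walk)

    def delta_sigma(a, b):
        (ma, va), (mb, vb) = comms[a], comms[b]
        r2 = sum((x - y) ** 2 / deg[k] for k, (x, y) in enumerate(zip(va, vb)))
        return len(ma) * len(mb) / (len(ma) + len(mb)) * r2 / n

    label = list(range(n))
    best_q, best = modularity(label), list(label)
    next_id = n
    while len(comms) > 1:
        # Only communities joined by at least one edge may merge.
        pairs = {(min(label[u], label[v]), max(label[u], label[v]))
                 for u in range(n) for v in adj[u] if label[u] != label[v]}
        if not pairs:
            break
        a, b = min(pairs, key=lambda p: delta_sigma(*p))
        (ma, va), (mb, vb) = comms.pop(a), comms.pop(b)
        size = len(ma) + len(mb)
        mean = [(len(ma) * x + len(mb) * y) / size for x, y in zip(va, vb)]
        comms[next_id] = (ma + mb, mean)
        for u in ma + mb:
            label[u] = next_id
        next_id += 1
        q = modularity(label)
        if q > best_q:
            best_q, best = q, list(label)
    groups = {}
    for u, c in enumerate(best):
        groups.setdefault(c, []).append(u)
    return sorted(groups.values()), best_q
```

On two 4-cliques joined by a single edge (vertices 0-3 and 4-7, bridge 3-4), walktrap(8, edges) returns [[0, 1, 2, 3], [4, 5, 6, 7]] with modularity 11/26 ≈ 0.423. The naive pairwise search is fine for toy graphs but scales poorly, which is why the optimized version in [4] maintains the distances incrementally.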



To obtain accurate labels for part of the dataset, the 3,194 videos are manually labeled, assigning names to a total of 42,118 faces against 144 celebrity names. This process generates ground-truth labels for 75,817 name-face pairs, of which 19,216 pairs are labeled as correct name-face associations. Based on the annotation, we provide baseline results on name-face association using five techniques: Weak Association (WA), SVM Classification (SVM), Multiple Instance Learning (MIL), Graph-based Clustering (GC), and Image Matching (IM). The benchmark results show that the performance of name-face association in Web videos can be improved by analyzing the visual features of faces.
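As a sanity check for benchmarking, the sketch below scores a set of predicted name-face pairs against the labeled pairs. Precision, recall, and F1 are assumed metrics here, and the input formats are hypothetical. With the counts above, a trivial method that accepts all 75,817 labeled candidate pairs would reach a recall of 1.0 but a precision of only 19,216 / 75,817 ≈ 0.253, which gives a useful floor for reading any method's precision.

```python
def evaluate(predicted, ground_truth):
    """Score predicted name-face matches against the labeled pairs.

    predicted: set of (face_id, name) pairs that a method accepts as matches.
    ground_truth: dict mapping each labeled (face_id, name) pair to True
    (correct association) or False. Unlabeled predictions are ignored."""
    correct = {pair for pair, ok in ground_truth.items() if ok}
    scored = {pair for pair in predicted if pair in ground_truth}
    tp = len(scored & correct)
    precision = tp / len(scored) if scored else 0.0
    recall = tp / len(correct) if correct else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1
```

Restricting scoring to labeled pairs mirrors the fact that ground truth exists only for the 3,194-video subset; predictions on unlabeled faces cannot be judged either way.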

Download:

  1. Celebrity names, associated video IDs, face results, community results, and ground truth:
    These items are lightweight and can be downloaded directly from here.
  2. Keyframes and visual features:
    These items are large. If you are interested in downloading them, please email zhineng.chen@ia.ac.cn specifying the items you need, your name, and your affiliation. We will send you instructions by email.
For any questions regarding the WebV-Cele dataset, please contact Dr. Zhineng Chen (zhineng.chen@ia.ac.cn).

Citation:

Please cite the following paper when using WebV-Cele:
  • Zhineng Chen, Chong-Wah Ngo, Wei Zhang, Juan Cao, Yu-Gang Jiang
    Name-Face Association in Web Videos: A Large-Scale Dataset, Baselines, and Open Issues
    Journal of Computer Science and Technology, vol. 29, no. 5, pp. 785-798, 2014

References:

[1] Cao J, Zhang Y D, Song Y C, et al. MCG-WEBV: A benchmark dataset for Web video analysis. Technical Report, Institute of Computing Technology, CAS, 2009.
[2] Chen Z, Cao J, Xia T, Song Y C, et al. Web video retagging. Multimed. Tools Appl., 55(1): 53-82, 2011.
[3] ISVision company website: http://www.isvision.com/cn/index
[4] Pons P, Latapy M. Computing communities in large networks using random walks. In Proc. of the 20th ISCIS, Oct. 2005, pp. 284-293.