WebV-Cele: A Large-Scale Web Video Celebrities Dataset for Name-Face Association
Zhineng Chen1,2, Chong-Wah Ngo2, Wei Zhang2, Juan Cao3, and Yu-Gang Jiang4
1 Institute of Automation, Chinese Academy of Sciences
2 Department of Computer Science, City University of Hong Kong
3 Institute of Computing Technology, Chinese Academy of Sciences
4 School of Computer Science, Fudan University
zhineng.chen at ia.ac.cn; cscwngo at cityu.edu.hk; wzhang34-c at my.cityu.edu.hk; caojuan at ict.ac.cn; ygj at fudan.edu.cn
Overview: Associating faces that appear in Web videos with names presented in the surrounding context is an important task in many applications. However, the problem has not been well investigated, particularly under large-scale realistic scenarios, mainly because datasets constructed under such conditions are scarce. In this work, we introduce and release a Web video dataset of celebrities, named WebV-Cele, for name-face association. The dataset consists of 75,073 Internet videos totaling over 4,000 hours, covering 2,427 celebrities and 649,001 faces. To our knowledge, this is the most comprehensive dataset for this problem. We describe the details of dataset construction, discuss several interesting findings from analyzing the dataset, such as celebrity community discovery, and provide experimental results of name-face association using five existing techniques.
The WebV-Cele dataset is created by mining names and faces from MCG-WEBV, a real-world Internet video dataset released a few years earlier. It releases celebrity names, faces, and features, including:
(1) 2,427 celebrity names and their associated videos. Each name occurs at least ten times in the whole MCG-WEBV dataset, and the videos containing at least one celebrity name form a repository of 75,073 videos;
(2) 649,001 face responses on 570,931 keyframes from the 75,073 videos;
(3) a 1937-D pixel-wise signature and a 1664-D SIFT signature for each face. Both signatures are extracted from 13 facial regions: the left, middle, and right corners of each eye, a point between the eyes, the two nostrils and the tip of the nose, and the left, middle, and right corners of the mouth;
(4) six types of low-level visual properties extracted from the head and upper body (if present) associated with each face. The two regions are inferred from the size and location of the face. The six properties are a 166-D color histogram, a 166-D color correlogram, 225-D color moments, a 96-D co-occurrence texture, a 108-D wavelet texture grid, and a 320-D edge histogram;
(5) ground-truth labels on a subset of the dataset containing 3,194 videos, which include 42,118 faces labeled against 144 celebrity names.
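The feature dimensions listed above can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the constant and function names are hypothetical, and the idea that each signature splits evenly across the 13 facial regions, or that the six properties are concatenated in the listed order, is an assumption rather than something the dataset description states.

```python
# Dimensions quoted in the dataset description.
NUM_FACIAL_REGIONS = 13
PIXEL_SIGNATURE_DIM = 1937
SIFT_SIGNATURE_DIM = 1664

# Six low-level properties extracted from a head or upper-body region.
BODY_PROPERTY_DIMS = {
    "color_histogram": 166,
    "color_correlogram": 166,
    "color_moments": 225,
    "cooccurrence_texture": 96,
    "wavelet_texture_grid": 108,
    "edge_histogram": 320,
}


def per_region_dim(total_dim, num_regions=NUM_FACIAL_REGIONS):
    """Dimension per facial region, assuming an even split (an assumption)."""
    if total_dim % num_regions != 0:
        raise ValueError("signature does not split evenly across regions")
    return total_dim // num_regions


def property_offsets(dims):
    """Map each property to a (start, end) slice, assuming the properties
    are concatenated in the listed order (hypothetical layout)."""
    offsets = {}
    start = 0
    for name, dim in dims.items():
        offsets[name] = (start, start + dim)
        start += dim
    return offsets


pixel_per_region = per_region_dim(PIXEL_SIGNATURE_DIM)
sift_per_region = per_region_dim(SIFT_SIGNATURE_DIM)
offsets = property_offsets(BODY_PROPERTY_DIMS)
total_body_dim = sum(BODY_PROPERTY_DIMS.values())
```

Under these assumptions, both signatures divide evenly by 13, and the SIFT figure is consistent with one standard 128-D SIFT descriptor per facial region. The released files may use a different layout, so the offsets should be checked against the actual data before use.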
Implementation Details: The WebV-Cele dataset is built on top of MCG-WEBV, which is composed of 248,887 Internet videos crawled from YouTube. The videos in MCG-WEBV have been decomposed into shots, and more than five million keyframes were extracted to represent these shots. A Wikipedia-based named entity extraction method is applied to the metadata (titles and tags) surrounding the videos to extract names, and a commercial frontal face detector developed by the ISVision company is applied to the keyframes to detect faces. As a result, a total of 209,001 name occurrences and 1,556,265 face responses are extracted.
There are 2,427 names that appear at least ten times in MCG-WEBV. These names are defined as celebrity names in this work. The celebrity names are associated with a total of 75,073 unique videos containing 649,001 faces. Thus, the WebV-Cele dataset consists of 75,073 videos, 2,427 celebrity names, and 649,001 faces.
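The selection rule above (keep names with at least ten occurrences, then keep videos that mention at least one such name) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the input format (a mapping from video ID to the list of name occurrences found in its metadata), and the toy names are all hypothetical.

```python
from collections import Counter

MIN_OCCURRENCES = 10  # threshold stated in the text


def select_celebrities(video_names, min_occurrences=MIN_OCCURRENCES):
    """Return (celebrity_names, video_ids) for a {video_id: [names]} mapping.

    Each list entry counts as one name occurrence (an assumption about
    how occurrences are tallied).
    """
    counts = Counter(
        name for names in video_names.values() for name in names
    )
    celebrities = {
        name for name, count in counts.items() if count >= min_occurrences
    }
    videos = {
        video_id
        for video_id, names in video_names.items()
        if celebrities.intersection(names)
    }
    return celebrities, videos


# Toy data: "Alice Example" occurs 11 times, "Bob Sample" only twice.
toy = {f"v{i}": ["Alice Example"] for i in range(10)}
toy["v10"] = ["Bob Sample"]
toy["v11"] = ["Alice Example", "Bob Sample"]

celebrities, videos = select_celebrities(toy)
```

In the toy data, only "Alice Example" passes the threshold, so video `v10`, which mentions only "Bob Sample", is dropped from the repository.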
We extract 3,194 representative videos from WebV-Cele for further analysis (these videos come from the CoreData of MCG-WEBV, while the remaining videos come from its ExpandedData). They include 42,118 faces labeled against 144 celebrity names. The figure below depicts the 144 celebrities. In that figure, the celebrities are ranked in descending order of name frequency from left to right and top to bottom. The bounding box color indicates profession: blue: Internet star, green: artist, red: politician, gray: sportsman, dark red: journalist.
Notably, the celebrities are not only highly correlated with hot news events from Dec. 2008 to Nov. 2009 (the crawling period of MCG-WEBV), but can also be grouped by their social networks. We therefore attempt to discover communities among the celebrities. The mining starts by quantifying the pairwise relationships between celebrities to form a sparse graph, and then employs the Walktrap algorithm to discover communities by partitioning the graph. Twelve communities are found, ranging in size from 4 to 26 persons, as depicted in the figure below. Our analysis shows that these communities can be linked to both hot topics and the celebrities' professions. For example, as shown in Figure 8, the community highlighted by a black dotted circle is a set of famous football stars, while the communities highlighted by red and blue dotted circles contain, respectively, the judges and contestants of "Britain's Got Talent" and the actors and original author of the movie "Twilight".
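The first step of the community mining, building a sparse graph of pairwise relationships, might look like the sketch below. The text does not say how the relationships are quantified, so counting the videos in which two celebrities are both named is an assumption made here for illustration. The function name, the `min_weight` parameter, and the toy names are likewise hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(video_celebrities, min_weight=1):
    """Return {(name_a, name_b): weight} with name_a < name_b.

    The weight is the number of videos naming both celebrities (an assumed
    relationship measure). Edges below min_weight are dropped to keep the
    graph sparse.
    """
    weights = defaultdict(int)
    for names in video_celebrities.values():
        # De-duplicate names within a video and fix the pair ordering.
        for a, b in combinations(sorted(set(names)), 2):
            weights[(a, b)] += 1
    return {pair: w for pair, w in weights.items() if w >= min_weight}


toy = {
    "v1": ["A", "B"],
    "v2": ["A", "B", "C"],
    "v3": ["C", "D"],
    "v4": ["D"],
}

all_edges = build_cooccurrence_graph(toy)
strong_edges = build_cooccurrence_graph(toy, min_weight=2)
```

The resulting weighted edge list could then be passed to any Walktrap implementation for partitioning, for example the `community_walktrap` method of python-igraph's `Graph` class. That step is left out here so that the sketch stays free of third-party dependencies.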
To obtain accurate labels for part of the dataset, the 3,194 videos are manually labeled, naming a total of 42,118 faces against 144 celebrity names. This process generates ground-truth labels for 75,817 name-face pairs, of which 19,216 pairs are labeled as correct name-face associations. Based on the annotation, we provide baseline results on name-face association using five techniques: Weak Association (WA), SVM Classification (SVM), Multiple Instance Learning (MIL), Graph-based Clustering (GC), and Image Matching (IM). The benchmark results show that the performance of name-face association in Web videos can be boosted by analyzing the visual features of faces.
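The annotation counts above give a rough sense of how ambiguous the labeled subset is. The short calculation below uses only the figures quoted in the text; the variable names are hypothetical, and the reading of the ratio as a chance level is an illustration, not one of the reported baselines.

```python
# Annotation counts quoted in the text.
LABELED_PAIRS = 75_817   # name-face pairs with ground-truth labels
CORRECT_PAIRS = 19_216   # pairs labeled as correct associations
LABELED_FACES = 42_118   # faces in the labeled subset

# Fraction of candidate name-face pairs that are correct.
correct_fraction = CORRECT_PAIRS / LABELED_PAIRS

# Average number of candidate names per labeled face.
pairs_per_face = LABELED_PAIRS / LABELED_FACES

print(f"correct pairs: {correct_fraction:.1%}")
print(f"candidate names per face: {pairs_per_face:.2f}")
```

Roughly one candidate pair in four is correct, and each face has a little under two candidate names on average. A hypothetical rule that accepted every candidate pair would therefore be right only about a quarter of the time on this subset.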
For any questions regarding WebV-Cele dataset, please contact Dr. Zhineng Chen (email@example.com)
Citation: Please cite the following paper when using WebV-Cele:
References: Cao J, Zhang Y D, Song Y C, et al. MCG-WEBV: A benchmark dataset for Web video analysis. Technical Report, Institute of Computing Technology, CAS, 2009.
Chen Z, Cao J, Xia T, Song Y C, et al. Web video retagging. Multimed. Tools and Appl. 55(1): pp. 53-82, 2011.
Pons P, Latapy M. Computing communities in large networks using random walks. In Proc. of 20th Int. ISCIS, Oct. 2005, pp. 284-293.