CC_WEB_VIDEO: Near-Duplicate Web Video Dataset
Xiao Wu+, Chong-Wah Ngo+ and Alexander G. Hauptmann#
+Department of Computer Science, City University of Hong Kong
#School of Computer Science, Carnegie Mellon University
This page gives the download addresses of the dataset and describes its individual files.
The VIREO_WEB_VIDEO dataset includes the following files:
Description of individual files
1. Video_List.txt http://vireo.cs.cityu.edu.hk/webvideo/Info/Video_List.txt
Each line of Video_List.txt corresponds to one video, with fields separated by tabs:
<VideoID>: VideoID is in the range of 1~13129.
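Since the info files are described as tab-separated with one record per line, they can be read with a generic loop. The following is a minimal sketch that makes no assumption about the column layout; the function name `read_tab_separated` and the sample rows are hypothetical placeholders, not part of the dataset.

```python
import csv
import io


def read_tab_separated(stream):
    """Yield each non-empty line of a tab-separated text stream as a list of fields."""
    # QUOTE_NONE keeps quote characters in titles or tags as literal text.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:  # skip blank lines
            yield row


# Placeholder data only; the real files define their own columns.
sample = io.StringIO("a\tb\tc\n\nd\te\tf\n")
rows = list(read_tab_separated(sample))
```

For a downloaded file, the same function could be applied to an opened file object, for example `open("Video_List.txt", encoding="utf-8", newline="")`; the encoding is an assumption.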
2. Video_Complete.txt http://vireo.cs.cityu.edu.hk/webvideo/Info/Video_Complete.txt
Video_Complete.txt is the complete version of the video information, with fields separated by tabs. It also includes the title, tags, duration, and so on. Each line corresponds to one video:
<Duration>: The video duration, in the form of mm:ss.
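A duration given as mm:ss can be converted to seconds for sorting or filtering. This is a small sketch that assumes every value has exactly one colon, as the format above states; the helper name `duration_to_seconds` is hypothetical.

```python
def duration_to_seconds(duration):
    """Convert a duration string in mm:ss form to a number of seconds."""
    minutes, seconds = duration.strip().split(":")
    return int(minutes) * 60 + int(seconds)


example_seconds = duration_to_seconds("03:25")
```

A value that does not follow the mm:ss form raises `ValueError`, which makes malformed lines easy to spot.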
3. Shot_Info.txt http://vireo.cs.cityu.edu.hk/webvideo/Info/Shot_Info.txt
Each line of Shot_Info.txt corresponds to one keyframe, with fields separated by tabs:
<SerialID>: SerialID is the serial number of the keyframe in this video, starting from 1 for each video.
4. Seed.txt
Seed.txt lists the seed video for each query, one query per line. For near-duplicate video retrieval, the most popular video of each query was selected as its seed video.
5. Ground Truth Files http://vireo.cs.cityu.edu.hk/webvideo/Info/Ground.zip
The ground truth covers two tasks, novelty re-ranking and near-duplicate retrieval, so there are two types of ground truth files. For details, please refer to our ACM MM'07 paper.
Each file lists the top 30/50 novel videos that remain after the near-duplicate videos are removed.
Each line consists of two items: <VideoID> <Status>. <Status> denotes the similarity of the video to the seed video. The meaning of each <Status> value is listed as follows:
Table 1. The meaning of status
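The two-item lines of a ground truth file can be read as (VideoID, Status) pairs. The sketch below assumes the two items are separated by whitespace and treats the status as an opaque string; the function name `parse_ground_truth` and the status labels "X" and "Y" in the sample are placeholders, not actual values from Table 1.

```python
def parse_ground_truth(lines):
    """Parse lines of the form '<VideoID> <Status>' into (int, str) pairs."""
    entries = []
    for line in lines:
        parts = line.split()  # splits on spaces or tabs
        if len(parts) < 2:
            continue  # skip blank or incomplete lines
        entries.append((int(parts[0]), parts[1]))
    return entries


# Placeholder status labels, used only to show the shape of the result.
pairs = parse_ground_truth(["12 X", "", "345\tY"])
```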
Videos are stored under the directory /webvideo/videos/ of the VIREO server (http://vireo.cs.cityu.edu.hk). They are organized by QueryID, in the form QueryID/VideoName. Please follow this format to download the videos.
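The QueryID/VideoName layout can be turned into a download address with a short helper. This is a sketch only: the function name `video_url` and the file name "example.flv" are hypothetical, and percent-encoding the video name is a precaution assumed here rather than something the page specifies.

```python
from urllib.parse import quote

BASE = "http://vireo.cs.cityu.edu.hk"


def video_url(query_id, video_name):
    """Build the address of a video stored as /webvideo/videos/QueryID/VideoName."""
    # quote() escapes characters such as spaces; this is an assumed precaution.
    return f"{BASE}/webvideo/videos/{query_id}/{quote(video_name)}"


example_video_url = video_url(1, "example.flv")
```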
Keyframes are stored under the directory /webvideo/Keyframes/ of the VIREO server (http://vireo.cs.cityu.edu.hk). They are organized by VideoID divided by 100, rounded down. That is, the keyframes of every 100 videos form one directory, starting from 0. For example, the keyframes of video 3412 are stored in the directory Keyframes/34/ (3412/100 = 34). The highest directory number is 131, since 13129/100 = 131. Please follow this format to obtain the keyframes.
http://vireo.cs.cityu.edu.hk/webvideo/Keyframes/KID/KeyframeName where KID = VideoID/100, rounded down
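The directory rule amounts to an integer division of the VideoID by 100. A minimal sketch of the address construction follows; the function name `keyframe_url` and the keyframe file name "example.jpg" are hypothetical.

```python
BASE = "http://vireo.cs.cityu.edu.hk"


def keyframe_url(video_id, keyframe_name):
    """Build the address of a keyframe stored as /webvideo/Keyframes/KID/KeyframeName."""
    kid = video_id // 100  # integer division, e.g. 3412 // 100 == 34
    return f"{BASE}/webvideo/Keyframes/{kid}/{keyframe_name}"


example_keyframe_url = keyframe_url(3412, "example.jpg")
```

With VideoIDs from 1 to 13129, the same rule gives directory 0 for video 1 and directory 131 for video 13129.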
8. Thumbnail Images http://vireo.cs.cityu.edu.hk/webvideo/Info/Thumb.zip
Each video has one thumbnail image. They are organized by QueryID. Thumbnail images are zipped.