This page describes where to download the dataset and the content of its individual files.

Package List:

The VIREO_WEB_VIDEO dataset includes the following files:

Description of individual files

1. Video_List.txt

Video_List.txt is tab-separated, with one video per line, in the following form:

            <VideoID>      <QueryID>      <Source>      <VideoName>      <URL>
e.g., a typical entry looks like this:
                  37                    1                  YouTube         1_38_Y.flv

<VideoID>: The video ID, ranging from 1 to 13129.
<QueryID>: The query ID, ranging from 1 to 24 (there are 24 queries in total), corresponding to the IDs in Table 1.
<Source>: The site the video was collected from: "YouTube", "Google", or "Yahoo", abbreviated as Y, G, and H respectively.
<VideoName>: The file name of the video as stored on disk, in the form QueryID_SubVideoID_Source.format. SubVideoID is numbered per query, restarting from 1 for each query.
<URL>: The URL of the video on the web.
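As a minimal sketch, the Python snippet below reads one line of Video_List.txt and splits a <VideoName> into its parts. It assumes single tab separators and tolerates an empty or missing trailing <URL>; the names VideoEntry, parse_video_list_line, and parse_video_name are illustrative only and are not shipped with the dataset.

```python
from typing import NamedTuple


class VideoEntry(NamedTuple):
    video_id: int
    query_id: int
    source: str
    video_name: str
    url: str


def parse_video_list_line(line: str) -> VideoEntry:
    # Assumption: fields are separated by a single tab character.
    fields = line.rstrip("\r\n").split("\t")
    fields += [""] * (5 - len(fields))  # tolerate an empty or missing trailing URL
    video_id, query_id, source, video_name, url = fields[:5]
    return VideoEntry(int(video_id), int(query_id), source, video_name, url)


def parse_video_name(video_name: str) -> tuple[int, int, str, str]:
    # VideoName has the form QueryID_SubVideoID_Source.format, e.g. "1_38_Y.flv".
    stem, fmt = video_name.rsplit(".", 1)
    query_id, sub_id, source = stem.split("_")
    return int(query_id), int(sub_id), source, fmt
```

Comparing parse_video_name(entry.video_name)[0] with entry.query_id is a cheap consistency check when loading the list.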

2. Video_Complete.txt

Video_Complete.txt contains the complete video information, also tab-separated with one video per line. Besides the fields of Video_List.txt, it includes the duration, format, title, tags, and other metadata:

            <VideoID>  <QueryID>  <Source>  <VideoName>  <Duration>  <Format>  <Title>  <Duplicated>  <URL>  <Category>  <Tags>  <Description>  <Author>
e.g.:       37  1  "YouTube"  "1_38_Y.flv"  "01:03"  "flv"  "The Lion Sleeps Tonight"  0  ""  "Arts & Animation"  "lion sleeps tonight animation kids disney pixar dreamworks tekenfilm"  "The Lion Sleeps Tonight"  "howthewestwaswon2"

<Duration>: The video duration, in the form mm:ss.
<Format>: The video file format, e.g. flv.
<Title>: The title of the video.
<Duplicated>: Currently unused; always set to 0.
<Tags>: The user-provided tags.
<Category>: The category of the video.
<Description>: The text description of the video.
<Author>: The user name of the uploader.
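A possible way to load Video_Complete.txt in Python, assuming the quoted string fields follow standard double-quote conventions, so that the csv module can strip the quotes and cope with tabs inside titles or descriptions. The field order follows the header above; read_video_complete and duration_to_seconds are hypothetical helper names.

```python
import csv
import io

# Column order as given in the Video_Complete.txt header.
FIELDS = ["VideoID", "QueryID", "Source", "VideoName", "Duration", "Format",
          "Title", "Duplicated", "URL", "Category", "Tags", "Description", "Author"]


def duration_to_seconds(duration: str) -> int:
    # Duration is given as mm:ss, e.g. "01:03" is 63 seconds.
    minutes, seconds = duration.split(":")
    return int(minutes) * 60 + int(seconds)


def read_video_complete(text: str) -> list[dict[str, str]]:
    # Assumption: tab-delimited records with standard double-quote quoting.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quotechar='"')
    return [dict(zip(FIELDS, row)) for row in reader if row]
```

For a real file, pass the file contents (opened with newline="") instead of an in-memory string.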

3. Shot_Info.txt

Shot_Info.txt is tab-separated, with one keyframe per line, in the following form:

        <SerialID>      <KeyframeName>      <VideoID>      <VideoName>
e.g.        6               37_6_RKF             37              1_38_Y

<SerialID>: The serial number of the keyframe within its video, starting from 1 for each video.
<KeyframeName>: The keyframe name, in the form VideoID_SerialID_RKF.
<VideoID>, <VideoName>: Same meaning as in Video_List.txt (here <VideoName> is given without the file extension).
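The following sketch groups the keyframes listed in Shot_Info.txt by video, assuming single tab separators; keyframe_name and keyframes_by_video are illustrative names, not part of the dataset.

```python
from collections import defaultdict


def keyframe_name(video_id: int, serial_id: int) -> str:
    # KeyframeName has the form VideoID_SerialID_RKF, e.g. (37, 6) gives "37_6_RKF".
    return f"{video_id}_{serial_id}_RKF"


def keyframes_by_video(lines: list[str]) -> dict[int, list[str]]:
    # Each line: <SerialID> <tab> <KeyframeName> <tab> <VideoID> <tab> <VideoName>
    groups: dict[int, list[tuple[int, str]]] = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        serial_id, kf_name, video_id, _video_name = line.rstrip("\r\n").split("\t")
        groups[int(video_id)].append((int(serial_id), kf_name))
    # Sort by SerialID so each video's keyframes are returned in serial order.
    return {vid: [name for _, name in sorted(items)] for vid, items in groups.items()}
```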

4. Seed.txt

Seed.txt lists the seed video of each query. For near-duplicate video retrieval, the most popular video of each query was selected as its seed video. Each line corresponds to one query:

        <QueryID>      <SeedVideoID>
e.g.        2              815
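A small illustrative loader for Seed.txt, assuming the two columns are separated by whitespace (tabs or spaces); load_seeds is a hypothetical helper.

```python
def load_seeds(lines: list[str]) -> dict[int, int]:
    # Each line: <QueryID> <whitespace> <SeedVideoID>; returns {QueryID: SeedVideoID}.
    seeds: dict[int, int] = {}
    for line in lines:
        parts = line.split()  # split() accepts tabs or runs of spaces
        if len(parts) == 2:
            query_id, seed_id = parts
            seeds[int(query_id)] = int(seed_id)
    return seeds
```

With the example line above, load_seeds(["2\t815"]) returns {2: 815}, i.e. video 815 is the seed of query 2.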

5. Ground Truth Files

The ground truth covers two tasks, novelty re-ranking and near-duplicate retrieval, so there are two types of ground truth files. For details, please refer to our ACM MM'07 paper.

  • Novelty re-ranking:   GT\RANK_i.rst (i = 1~24, corresponding to the QueryID)

Each file lists the top 30/50 novel videos remaining after near-duplicate videos are removed.

  • Near-duplicate retrieval:   GT\GT_i.rst (i = 1~24, corresponding to the QueryID)

Each line consists of two items, <VideoID> and <Status>, where <Status> denotes how similar the video is to the seed video of the query. The meanings of <Status> are as follows:

Table 1. The meaning of status

  • Exactly duplicate
  • Similar video
  • Different version
  • Major change
  • Long version
  • Dissimilar video
  • Video does not exist
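As a sketch of how GT_i.rst might be used to evaluate near-duplicate retrieval, the Python code below loads a ground truth file and computes non-interpolated average precision for a ranked list of VideoIDs. The status codes are not listed on this page, so the set of statuses counted as relevant is left to the caller. This is one common definition of average precision and not necessarily the exact measure used in the ACM MM'07 paper; all function names are hypothetical.

```python
def load_ground_truth(lines: list[str]) -> dict[int, str]:
    # Each line of GT_i.rst: <VideoID> <whitespace> <Status>; returns {VideoID: Status}.
    gt: dict[int, str] = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            gt[int(parts[0])] = parts[1]
    return gt


def average_precision(ranked_ids: list[int], gt: dict[int, str], positive: set[str]) -> float:
    # Videos whose Status is in `positive` count as relevant; the caller picks the statuses.
    relevant = {vid for vid, status in gt.items() if status in positive}
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, vid in enumerate(ranked_ids, start=1):
        if vid in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant position
    # Normalize by all relevant videos, so relevant videos never retrieved lower the score.
    return precision_sum / len(relevant)
```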

6. Videos

Videos are stored under the directory /webvideo/videos/ on the VIREO server. They are organized by QueryID, in the form QueryID/VideoName. Please follow this format to download the videos.
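A minimal sketch of building a video's location from its QueryID and VideoName. Only the directory layout comes from this page; the host and protocol of the VIREO server are not given here, and video_path is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Directory given on this page; how the server itself is reached is not stated here.
VIDEO_ROOT = PurePosixPath("/webvideo/videos")


def video_path(query_id: int, video_name: str) -> PurePosixPath:
    # Videos are organized as QueryID/VideoName, e.g. (1, "1_38_Y.flv") gives 1/1_38_Y.flv.
    # VideoName itself starts with its QueryID, so a mismatch signals a bad record.
    if not video_name.startswith(f"{query_id}_"):
        raise ValueError(f"{video_name!r} does not belong to query {query_id}")
    return VIDEO_ROOT / str(query_id) / video_name
```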



7. Keyframes

Keyframes are stored under the directory /webvideo/Keyframes/ on the VIREO server. They are organized by VideoID divided by 100 (integer division): each directory holds the keyframes of up to 100 consecutive videos, and the directories are numbered starting from 0. For example, the keyframes of video 3412 are stored in the directory Keyframes/34/ (3412/100 = 34). Since 13129/100 = 131, the keyframes occupy directories 0 through 131, i.e. 132 directories in total. Please follow this format to obtain the keyframes, which are stored in Keyframes/KID/,

        where KID = VideoID/100 (integer division)
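The directory arithmetic can be sketched in Python as follows. KEYFRAME_ROOT and keyframe_dir are illustrative names; the keyframe file extension is not stated on this page, so the sketch returns only the directory.

```python
from pathlib import PurePosixPath

KEYFRAME_ROOT = PurePosixPath("/webvideo/Keyframes")


def keyframe_dir(video_id: int) -> PurePosixPath:
    # KID = VideoID // 100 (integer division), e.g. video 3412 maps to Keyframes/34.
    return KEYFRAME_ROOT / str(video_id // 100)
```

Joining keyframe_dir(video_id) with a KeyframeName from Shot_Info.txt (plus whatever file extension the distribution uses) gives the keyframe's location.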


8. Thumbnail Images

Each video has one thumbnail image. The thumbnail images are organized by QueryID and provided as zip archives.



References:

  • Xiao Wu, Alexander G. Hauptmann and Chong-Wah Ngo
    Practical Elimination of Near-Duplicates from Web Video Search
    ACM International Conference on Multimedia (ACM MM'07), Augsburg, Germany, Sep. 2007, pp. 218-227 (oral).
  • Xiao Wu, Chong-Wah Ngo, Alexander G. Hauptmann and Hung-Khoon Tan
    Real-Time Near-Duplicate Elimination for Web Video Search with Content and Context
    IEEE Transactions on Multimedia, vol. 11, no. 2, pp. 196-207, Feb. 2009.