Introduction:

With the exponential growth of social media in Web 2.0, the volume of videos transmitted and searched on the Internet has increased tremendously. Users can capture videos with mobile phones or camcorders, or obtain videos directly from the web, and then redistribute them with some modifications. For example, users uploaded 65,000 new videos each day to the video sharing website YouTube, and daily video views exceeded 100 million in July 2006 [2]. Among these huge volumes of videos, there are large numbers of duplicate and near-duplicate videos.

Based on a sample of 24 popular queries from YouTube [5], Google Video [1] and Yahoo! Video [4], on average 27% of the videos in the search results are redundant, that is, duplicates or near-duplicates of the most popular version of a video [3]. For certain queries, the redundancy can be as high as 93% (see Table 1). As a consequence, users are often frustrated when they have to spend a significant amount of time going through different versions of duplicate or near-duplicate videos streamed over the Internet before arriving at a video of interest. An ideal solution would return a list that maximizes not only precision with respect to the query, but also novelty (or diversity) within the query topic. To keep users from being overwhelmed by many repeated copies of the same video in a search, efficient near-duplicate video detection and elimination is essential for effective search, retrieval, and browsing.

This work is a collaboration between the VIREO group at City University of Hong Kong and the Informedia group at Carnegie Mellon University. The dataset is called CC_WEB_VIDEO, named after the initials of City University of Hong Kong and Carnegie Mellon University. It was collected from the video sharing website YouTube and the video search engines Google Video and Yahoo! Video.

Furthermore, the social web provides much more than a platform for users to interact and exchange information. As a result, rich sets of context information are associated with web videos. These context resources provide information complementary to the video content itself. In addition to the videos themselves, this dataset provides contextual information such as thumbnail images, tags, titles, and time durations.

Near-Duplicate Web Videos

Definition: Near-duplicate web videos are identical or approximately identical videos that are close to exact duplicates of each other but differ in file format, encoding parameters, photometric variations (color, lighting changes), editing operations (caption, logo, and border insertion), length, or certain modifications (frames added or removed). A user would clearly identify the videos as "essentially the same".

A video is a duplicate of another if it looks the same, corresponds to approximately the same scene, and does not contain new and important information. Two videos do not have to be pixel-identical to be considered duplicates. A user searching for entertaining video content on the web might care about the overall content and subjective impression when filtering near-duplicate videos for more effective search. Exact duplicate videos are a special case of near-duplicate videos. Examples of near-duplicate web videos are shown in Figures 1 and 2.

Near-duplicate web videos can be mainly categorized into two classes:

1. Formatting differences

  • Encoding format: flv, wmv, avi, mpg, mp4, ram ...
  • Frame rate: 15fps, 25fps, 29.97fps ...
  • Bit rate: 529kbps, 819kbps ...
  • Frame resolution: 174x144, 320x240, 240x320 ...

2. Content differences

  • Photometric variations: color / lighting change.
  • Editing: logo insertion, adding borders around frames, superposition of overlay text.
  • Content modification: adding unrelated frames with different content.
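
To illustrate why formatting differences alone leave two videos "essentially the same", the following minimal sketch compares toy keyframes by a normalized color histogram. It is an assumption made for illustration only, not a description of how this dataset was built or labeled: the frame representation (a list of RGB tuples), the helper names color_histogram and histogram_distance, and the bin count are all hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(frame: List[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Return a normalized joint RGB histogram for one keyframe."""
    if not frame:
        raise ValueError("frame must contain at least one pixel")
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in frame:
        # Map each channel value to a bin index in 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = float(len(frame))
    # Normalizing by pixel count makes the signature independent of resolution.
    return [count / total for count in hist]


def histogram_distance(h1: List[float], h2: List[float]) -> float:
    """L1 distance between two normalized histograms (0 = identical, 2 = disjoint)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


# Toy keyframes (hypothetical data): the second is the first at half the pixel count.
original = [(200, 30, 30)] * 6 + [(20, 20, 220)] * 2
rescaled = [(200, 30, 30)] * 3 + [(20, 20, 220)] * 1
different = [(30, 200, 30)] * 8

d_same = histogram_distance(color_histogram(original), color_histogram(rescaled))
d_diff = histogram_distance(color_histogram(original), color_histogram(different))
print(d_same, d_diff)
```

In this toy case the rescaled frame has the same color proportions as the original, so its distance is zero, while the unrelated frame shares no bins with it. A global signature of this kind would likely be less reliable for the content differences listed above (for example, strong lighting changes or inserted frames), which is part of what makes near-duplicate detection non-trivial.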

Figure 1. Keyframe sequences of near-duplicate videos with different variations (each row corresponds to one video): (a) the standard version; (b) brightness and resolution change; (c) frame rate change; (d) added overlay text, borders, and content modification at the end; (e, f) content modification at the beginning and end; (g) longer version with borders; (h) resolution differences.

Figure 2. Two videos for the complex-scene query "White and Nerdy" with complex transformations (only the first ten keyframes are displayed): logo insertion, geometric and photometric variations (lighting change, black border), and keyframes added or removed.

Dataset

We selected 24 queries designed to retrieve the most viewed and top favorite videos from YouTube. Each text query was issued to YouTube, Google Video, and Yahoo! Video. The videos were collected in November 2006. Videos longer than 10 minutes were removed, leaving a final dataset of 12,790 videos. The query information and the number of near-duplicates of the dominant version (the video appearing most frequently in the results) are listed in Table 1. For example, query 15, "White and Nerdy", returned 1,771 videos, of which 696 are near-duplicates of the most common version in the result lists. Shot boundaries were detected and each shot was represented by a keyframe. In total, the set contains 398,015 keyframes.

Table 1. 24 Video Queries Collected from YouTube, Google Video and Yahoo! Video (#: number of videos)

ID     Query                                          Videos (#)   Near-Duplicates (#)   Near-Duplicates (%)
1      The lion sleeps tonight                               792                   334                  42 %
2      Evolution of dance                                    483                   122                  25 %
3      Fold shirt                                            436                   183                  42 %
4      Cat massage                                           344                   161                  47 %
5      Ok go here it goes again                              396                    89                  22 %
6      Urban ninja                                           771                    45                   6 %
7      Real life Simpsons                                    365                   154                  42 %
8      Free hugs                                             539                    37                   7 %
9      Where the hell is Matt                                235                    23                  10 %
10     U2 and green day                                      297                    52                  18 %
11     Little superstar                                      377                    59                  16 %
12     Napoleon dynamite dance                               881                   146                  17 %
13     I will survive Jesus                                  416                   387                  93 %
14     Ronaldinho ping pong                                  107                    72                  67 %
15     White and Nerdy                                      1771                   696                  39 %
16     Korean karaoke                                        205                    20                  10 %
17     Panic at the disco I write sins not tragedies         647                   201                  31 %
18     Bus uncle (巴士阿叔)                                   488                    80                  16 %
19     Sony Bravia                                           566                   202                  36 %
20     Changes Tupac                                         194                    72                  37 %
21     Afternoon delight                                     449                    54                  12 %
22     Numa Gary                                             422                    32                   8 %
23     Shakira hips don’t lie                               1322                   234                  18 %
24     India driving                                         287                    26                   9 %
Total                                                      12790                  3481                  27 %
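
The percentage column can be reproduced from the two count columns. The short sketch below, using a few rows copied from Table 1, divides the number of near-duplicates by the number of videos returned for a query and rounds to the nearest whole percent. The function name redundancy_percent and the tuple layout are assumptions chosen for illustration; they are not part of the dataset package.

```python
def redundancy_percent(num_videos: int, num_near_duplicates: int) -> int:
    """Share of near-duplicates among the videos of a query, as a whole percent."""
    if num_videos <= 0:
        raise ValueError("num_videos must be positive")
    return round(100 * num_near_duplicates / num_videos)


# (query ID, query text, number of videos, number of near-duplicates), from Table 1
rows = [
    (6, "Urban ninja", 771, 45),
    (13, "I will survive Jesus", 416, 387),
    (15, "White and Nerdy", 1771, 696),
]

per_query = {qid: redundancy_percent(n, d) for qid, _, n, d in rows}
overall = redundancy_percent(12790, 3481)  # totals row of Table 1

print(per_query)  # {6: 6, 13: 93, 15: 39}
print(overall)    # 27
```

The overall figure of 27% matches the average redundancy quoted in the introduction, and query 13 gives the 93% maximum.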


Figure 3. Representative Video Samples for 24 Queries

Package:

The CC_WEB_VIDEO dataset includes the following files:

  • Video_List.txt
  • Video_Complete.txt
  • Shot_Info.txt
  • Seed.txt
  • Ground Truth Files
  • Videos Links (size: 85G)
  • Keyframes (size: 4.5G)
  • Thumbnail Images (size: 53M)

Users:

Over 60 universities/institutes from 16 countries/regions are using this dataset.

Agreement and Download:

Note: This dataset is only for non-commercial research and/or educational purposes. To obtain this dataset, you must fully understand and agree to the following terms and conditions:

  1. I understand that the copyright of the videos in the dataset belongs fully to their owners. In no event shall City University of Hong Kong and Carnegie Mellon University be liable for any incidents or damages caused by the direct or indirect usage of the dataset by requesting researchers.
  2. The dataset should be only used for non-commercial research and/or educational purposes.
  3. City University of Hong Kong and Carnegie Mellon University make no representations or warranties regarding the dataset, including but not limited to warranties of non-infringement, merchantability or fitness for a particular purpose.
  4. Researcher shall defend and indemnify City University of Hong Kong and Carnegie Mellon University, including its employees, trustees and officers, and agents, against any claims arising from Researcher's use of the dataset.
  5. Researcher may provide research associates and colleagues with access to the dataset provided that they have also agreed to be bound by the terms and conditions stated in this agreement.
  6. An electronic document, such as email, containing the signed form, from requesting researcher is regarded as an electronic signature on the form, which has the same legal effect as a hardcopy signature.
  7. City University of Hong Kong and Carnegie Mellon University reserve the right to terminate access to the dataset at any time.

The video dataset can be obtained by sending us a request email. Specifically, researchers interested in the dataset should download, fill out, sign, and scan the Agreement and Disclaimer Form, and send it back to us (mail to: wuxiaohk@gmail.com). At our discretion, we will then email you instructions for downloading the dataset.

Download CC_WEB_VIDEO Web Video Dataset:

 

Citation:

  • Xiao Wu, Alexander G. Hauptmann and Chong-Wah Ngo
    Practical Elimination of Near-Duplicates from Web Video Search
    ACM International Conference on Multimedia (ACM MM'07), Augsburg, Germany, Sep. 2007, pp. 218-227 (oral).
    Full Text: [PDF, 3.79M]
  • Xiao Wu, Chong-Wah Ngo, Alexander G. Hauptmann and Hung-Khoon Tan
    Real-Time Near-Duplicate Elimination for Web Video Search with Content and Context
    IEEE Transactions on Multimedia, volume 11, issue 2, pp. 196-207, February 2009.
    Full Text: [PDF, 1.3M]

References:

[1] Google Video. Available: http://video.google.com.
[2] Wikipedia. Available: http://en.wikipedia.org/wiki/Youtube.
[3] X. Wu, A. G. Hauptmann, C.-W. Ngo. Practical Elimination of Near-Duplicates from Web Video Search. ACM International Conference on Multimedia, Augsburg, Germany, Sep. 2007, pp. 218-227. (Oral)
[4] Yahoo! Video. Available: http://video.yahoo.com.
[5] YouTube. Available: http://www.youtube.com.