This dataset was used in our ACM Multimedia 2012 paper on Multimodal Question Answering (MQA). It includes 52 queries, 438 relevant images, and over 1 million distracting images with metadata.
- 438 images, grouped in 52 instances
- Variations include viewpoint changes, different backgrounds, and non-planar and non-rigid transformations
air_force_one arduino_ng balloon barbie_cake beef_broccoli bruce_lee_statue bumblebee carlsberg casio catching_fire chamomile chicken ck_billboard cmc coca_cola come_to_dark_side_shirt crabapple crop_circle_swirl dalmatian_dog dogwood eminem_tattoo fried_chips kony_2012 lee_kum_kee_sauce leopard_boots logitech_c910 lv_neverfull lv_speedy maotai monarch_butterfly nike_dunk_bears nike_twitter nikon_d700 panda pineapple pirate_ninja_alliance_shirt pisa_tower pizza porsche pyramid_gita rafflesia starbuck statue_of_liberty transformers_cheetor tshirt_che tshirt_superman us_flag v_mask wall_street_bull water_melon wii_mote zebra
- 52 instances, 8 categories
- Wide range of real-life instances
- Distributions of instances over categories:
- Over 1 million images crawled from Flickr by searching 140+ popular tags
- Local features (DoG + SIFT) and metadata (title, description, tags) are available
This dataset is for non-commercial research and/or educational purposes only. To obtain this dataset, you must fully agree to the following terms and conditions with complete understanding:
- I understand that the copyright of the images and corresponding metadata in the dataset belongs fully to their owners. In no event shall City University of Hong Kong be liable for any incidents or damages caused by the direct or indirect usage of the dataset by requesting researchers.
- The dataset should be used only for non-commercial research and/or educational purposes.
- City University of Hong Kong makes no representations or warranties regarding the dataset, including but not limited to warranties of non-infringement, merchantability or fitness for a particular purpose.
- Researcher shall defend and indemnify City University of Hong Kong, including its employees, trustees and officers, and agents, against any claims arising from Researcher's use of the dataset.
- Researcher may provide research associates and colleagues with access to the dataset provided that they have also agreed to be bound by the terms and conditions stated in this agreement.
- An electronic document, such as an email, containing the signed form from the requesting researcher is regarded as an electronic signature on the form, which has the same legal effect as a hardcopy signature.
- City University of Hong Kong reserves the right to terminate access to the dataset at any time.
Download
The dataset can be obtained by sending us a request email. Specifically, researchers interested in the dataset should sign the Agreement and Disclaimer Form and email it to us. At our discretion, we will then send download instructions by email.
- 52 queries and relevant images: 125 MB
- Distracting images: 183 GB
- Local feature files: 164 GB
- Metadata: 335 MB
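As a rough planning aid before requesting the download, the component sizes listed above can be summed to estimate the disk space needed. The sketch below is illustrative only: the dictionary keys and the `total_size_gb` helper are hypothetical names, and it assumes decimal units (1 GB = 1000 MB), which the page does not specify.

```python
# Component sizes as listed on this page, expressed in MB.
# Assumption: decimal units (1 GB = 1000 MB); the page does not say which convention is used.
COMPONENT_SIZES_MB = {
    "queries_and_relevant_images": 125,
    "distracting_images": 183 * 1000,
    "local_features": 164 * 1000,
    "metadata": 335,
}


def total_size_gb(sizes_mb, include=None):
    """Sum the selected components (all of them by default) and return the total in GB."""
    names = sizes_mb.keys() if include is None else include
    return sum(sizes_mb[name] for name in names) / 1000


# Everything versus only the two small components.
full = total_size_gb(COMPONENT_SIZES_MB)
small = total_size_gb(COMPONENT_SIZES_MB, ["queries_and_relevant_images", "metadata"])

print(f"Full dataset: about {full:.1f} GB")
print(f"Queries and metadata only: about {small:.2f} GB")
```

Under these assumptions the full dataset needs roughly 350 GB, almost all of it in the distracting images and their local feature files, while the queries, relevant images, and metadata together stay under 0.5 GB.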
W. Zhang, L. Pang and C. W. Ngo. Snap-and-Ask: Answering Multimodal Question by Naming Visual Instance. ACM Multimedia (ACM MM), 2012.