MQA dataset

Instance Search dataset for MQA

This dataset was used in our ACM Multimedia 2012 paper on Multimodal Question Answering (MQA). It includes 52 queries, 438 relevant images, and over 1 million distracting images with metadata.


- 438 images, grouped in 52 instances
- Covers viewpoint changes, different backgrounds, and non-planar and non-rigid transformations


- 52 instances, 8 categories
- Wide range of real-life instances
- Distribution of instances over categories: (figure not included)

Distracting images & features

- Over 1 million images crawled from Flickr by searching for 140+ popular tags
- Local features (DoG + SIFT) and metadata (title, description, tags) are available

Agreement and Download

This dataset is for non-commercial research and/or educational purposes only. To obtain it, you must fully understand and agree to the following terms and conditions:

  1. I understand that the copyright of the images and corresponding metadata in the dataset belongs fully to their owners. In no event shall City University of Hong Kong be liable for any incidents or damages caused by the direct or indirect use of the dataset by requesting researchers.
  2. The dataset should be used only for non-commercial research and/or educational purposes.
  3. City University of Hong Kong makes no representations or warranties regarding the dataset, including but not limited to warranties of non-infringement, merchantability or fitness for a particular purpose.
  4. Researcher shall defend and indemnify City University of Hong Kong, including its employees, trustees and officers, and agents, against any claims arising from Researcher's use of the dataset.
  5. Researcher may provide research associates and colleagues with access to the dataset provided that they have also agreed to be bound by the terms and conditions stated in this agreement.
  6. An electronic document, such as an email, containing the signed form from the requesting researcher is regarded as an electronic signature on the form, which has the same legal effect as a hardcopy signature.
  7. City University of Hong Kong reserves the right to terminate access to the dataset at any time.

Download: The dataset can be obtained by sending us a request email. Researchers interested in the dataset should sign the Agreement and Disclaimer Form and email it to us. We will then send download instructions by email at our discretion.

Package list
  • 52 queries and relevant images: 125 MB
  • Distracting images: 183 GB
  • Local features files: 164 GB
  • Metadata: 335 MB
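
Because the full download is several hundred gigabytes, it may help to check free disk space before requesting all packages. The sketch below sums the package sizes listed above and compares the total with the free space on a target path. It assumes decimal units (1 GB = 10**9 bytes), which the listing does not specify, and the names `PACKAGES`, `total_bytes`, and `has_room` are hypothetical helpers, not part of the dataset distribution. The listed sizes may refer to compressed archives, so unpacked data could need more room.

```python
import shutil

# Package sizes as listed above, in bytes. Decimal units are an assumption:
# the listing does not say whether "GB" means 10**9 or 2**30 bytes.
MB = 10**6
GB = 10**9

PACKAGES = {
    "52 queries and relevant images": 125 * MB,
    "Distracting images": 183 * GB,
    "Local features files": 164 * GB,
    "Metadata": 335 * MB,
}


def total_bytes(selected=None):
    """Sum the listed sizes of the selected packages (all packages if None)."""
    names = PACKAGES.keys() if selected is None else selected
    return sum(PACKAGES[name] for name in names)


def has_room(path, needed_bytes):
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= needed_bytes


if __name__ == "__main__":
    needed = total_bytes()
    print(f"Full dataset: about {needed / GB:.2f} GB")
    print("Enough free space here:", has_room(".", needed))
```

Passing a subset of package names to `total_bytes`, for example only the queries and the metadata, gives the footprint of a partial download.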


W. Zhang, L. Pang and C. W. Ngo. Snap-and-Ask: Answering Multimodal Question by Naming Visual Instance. ACM Multimedia (ACM MM), 2012.