Today's digital content is inherently multimedia (text, image, audio, video, etc.), owing to advances in multimodal sensors. Image and video content, in particular, has become a new way of communication among Internet users with the proliferation of sensor-rich mobile devices. Accelerated by a tremendous increase in Internet bandwidth and storage space, multimedia data has been generated, published and spread explosively, becoming an indispensable part of today's big data. Such large-scale multimedia data has opened up challenges and opportunities for intelligent multimedia analysis, e.g., management, retrieval, recognition, categorization and visualization. Meanwhile, with the recent advances in deep learning techniques, we are now able to boost the intelligence of multimedia analysis significantly and initiate new research directions to analyze multimedia content. For instance, convolutional neural networks have demonstrated high capability in image and video recognition, while recurrent neural networks are widely exploited in modeling temporal dynamics in videos. Therefore, deep learning for intelligent multimedia analysis is emerging as a research area in the fields of multimedia and computer vision.
The goal of this workshop is to call for a coordinated effort to understand the scenarios and challenges emerging in multimedia analysis with deep learning techniques, identify key tasks and evaluate the state of the art, showcase innovative methodologies and ideas, introduce large-scale real systems or applications, propose new real-world datasets, and discuss future directions. The multimedia data of interest covers a wide spectrum, ranging from text, audio, images and click-through logs to Web videos and surveillance videos. We solicit manuscripts in all fields of multimedia analysis that explore the synergy of multimedia understanding and deep learning techniques.