Owing to advances in multimodal sensors, today's digital content is inherently multimedia: text, images, audio, video, etc. Image and video content, in particular, has become a new means of communication among Internet users with the proliferation of sensor-rich mobile devices. Accelerated by tremendous increases in Internet bandwidth and storage space, multimedia data is being generated, published, and spread explosively, becoming an indispensable part of today's big data. Such large-scale multimedia data poses both challenges and opportunities for intelligent multimedia analysis, e.g., management, retrieval, recognition, categorization, and visualization. Meanwhile, recent advances in deep learning techniques now allow us to significantly boost the intelligence of multimedia analysis and to open new research directions in analyzing multimedia content. For instance, convolutional neural networks have demonstrated strong performance in image and video recognition, while recurrent neural networks are widely used to model temporal dynamics in videos. Deep learning for intelligent multimedia analysis is therefore emerging as an important research area in the fields of multimedia and computer vision.

The goal of this workshop is to call for a coordinated effort to understand the scenarios and challenges emerging in multimedia analysis with deep learning techniques, identify key tasks and evaluate the state of the art, showcase innovative methodologies and ideas, introduce large-scale real systems and applications, propose new real-world datasets, and discuss future directions. The multimedia data of interest cover a wide spectrum, ranging from text, audio, images, and click-through logs to Web videos and surveillance videos. We solicit manuscripts in all fields of multimedia analysis that explore the synergy between multimedia understanding and deep learning techniques.

Topics of Interest

The workshop will offer a timely collection of research updates to benefit researchers and practitioners working in fields ranging from computer vision and multimedia to machine learning. To this end, we solicit original research and survey papers addressing topics including, but not limited to, the following:

  • Multimedia Retrieval (image search, video search, speech/audio search, music search, retrieval models, learning to rank, hashing).
  • Web IR and Social Media (link analysis, click models, user behavioral mining, social tagging, social network analysis, community-based QA).
  • Deep image/video understanding (object detection and recognition, localization, summarization, highlight detection, action recognition, multimedia event detection and recounting, semantic segmentation, tracking).
  • Vision and language (image/video captioning, visual Q&A, image/video commenting, storytelling).
  • Multimedia data browsing, visualization, clustering and knowledge discovery.
  • Home/public video surveillance analysis (motion detection and classification, scene understanding, event detection and recognition, people analysis, object tracking and segmentation, human-computer/robot interaction, behavior recognition, crowd analysis).
  • Multimedia-based security and privacy analysis.
  • Data collections, benchmarking, and performance evaluation.
  • Other applications of large-scale multimedia data.

Important Dates

Paper submission: March 13, 2017 (extended from March 3, 2017)
Notification of acceptance: April 7, 2017
Camera-ready submission: April 19, 2017

Submission Guideline

Paper Format & Page Limit: 6 pages (see details)
Submission: CMT *

*: Please choose "Deep Learning for Intelligent Multimedia Analytics" when submitting.

DeLIMMA Technical Program

Date: July 14, 2017

Venue: TBD

Oral 1: Detection and Localization
8:30 - 8:50 AM PBG-NET: Object Detection with a Multi-feature and Iterative CNN Model
Yingxin Lou (Beijing University of Posts and Telecommunications, China) 
Guangtao Fu (Academy of Broadcasting Science, China) 
Zhuqing Jiang (Beijing University of Posts and Telecommunications, China)
Aidong Men (Beijing University of Posts and Telecommunications, China)
Yun Zhou (Academy of Broadcasting Science, China)
8:50 - 9:10 AM Locally Optimal Detection of Adversarial Inputs to Image Classifiers
Pierre Moulin (University of Illinois, USA)
Amish Goel (University of Illinois, USA)
9:10 - 9:30 AM Spatiotemporal Utilization of Deep Features for Video Saliency Detection
Trung-Nghia Le (National Institute of Informatics & SOKENDAI, Japan)
Akihiro Sugimoto (National Institute of Informatics, Japan) 
9:30 - 9:50 AM Hierarchical Pedestrian Attribute Recognition Based on Adaptive Region Localization
Chunfeng Yao (Huawei Technologies, China)
Bailan Feng (Huawei Technologies, China)
Defeng Li (Huawei Technologies, China)
Jian Li (Huawei Technologies, China)
Keynote Session 1
9:50 - 10:30 AM Unsupervised Incremental Learning of Deep Descriptors from Video Streams

Alberto Del Bimbo
University of Florence, Italy
Bio: Prof. Del Bimbo is a Full Professor in the Department of Information Engineering at the University of Florence, where he serves as Director of MICC (Media Integration and Communication Center). He was President of the Foundation for Research and Innovation, Deputy Rector for Research and Innovation, and Director of the Department of Systems and Computer Science. At MICC, Prof. Del Bimbo leads a research team investigating cutting-edge solutions in computer vision, multimedia content analysis, indexing and retrieval, and advanced multimedia and multimodal interactivity. He has authored over 350 publications in some of the most prestigious journals and conferences, and has coordinated many research and industrial projects at the national and international levels.
He has served the scientific community as, among other roles, Program Chair of the International Conference on Pattern Recognition (ICPR 2016 and ICPR 2012) and of ACM Multimedia 2008, and as General Chair of the European Conference on Computer Vision (ECCV 2012), the ACM International Conference on Multimedia Retrieval (ICMR 2011), ACM Multimedia 2010, and the IEEE International Conference on Multimedia Computing and Systems (ICMCS 1999).
He is currently Editor-in-Chief of ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) and an Associate Editor of the journals Multimedia Tools and Applications and Pattern Analysis and Applications. He was previously an Associate Editor of IEEE Transactions on Pattern Analysis and Machine Intelligence and IEEE Transactions on Multimedia, and has served as Guest Editor of many special issues in highly ranked journals.
Prof. Del Bimbo is an IAPR Fellow and the recipient of the 2016 ACM SIGMM Award for Outstanding Technical Contributions to Multimedia Computing, Communications and Applications.
Abstract: We present a novel unsupervised method for learning face identities from video sequences. The method uses a ResNet deep network for face detection and VGGFace fc7 face descriptors, together with a learning mechanism that exploits the temporal coherence of visual data in video streams. We present a novel feature-matching solution based on Reverse Nearest Neighbour and a feature-forgetting strategy that supports incremental learning with memory size control as time progresses. We show that the proposed learning procedure is asymptotically stable and can be applied effectively to applications such as multiple face tracking.
10:30 - 11:00 AM Coffee break
Oral 2: Search and Applications
11:00 - 11:20 AM Image Search Re-ranking with an Improved Visualrank Algorithm and Multi-layer DCNN Features
Ai Wei (University of Science and Technology of China, China)
Xinmei Tian (University of Science and Technology of China, China) 
11:20 - 11:40 AM Analysis and Prediction of "Yuru-chara" Mascot Popularity Using Visual and Auditory Features
Yuri Nakasato (Tokyo University of Agriculture and Technology, Japan) 
Toshihisa Tanaka (Tokyo University of Agriculture and Technology, Japan) 
11:40 AM - 12:00 PM Why My Photos Look Sideways or Upside Down? Detecting Canonical Orientation of Images using Convolutional Neural Networks
Kunal Swami (Samsung R&D Institute-Bangalore, India)
Pranav Deshpande (Samsung R&D Institute-Bangalore, India)
Gaurav Khandelwal (Samsung R&D Institute-Bangalore, India)
Ajay Vijayvargiya (Samsung R&D Institute-Bangalore, India)
12:00 - 12:20 PM Learning Spatial-temporal Consistent Correlation Filter for Visual Tracking
Han Lou (Beijing University of Posts and Telecommunications, China) 
Dongfei Wang (Academy of Broadcasting Science, China)
Zhuqing Jiang (Beijing University of Posts and Telecommunications, China)
Aidong Men (Beijing University of Posts and Telecommunications, China)
Yun Zhou (Academy of Broadcasting Science, China) 
12:20 - 12:40 PM Deep Hashing with Mixed Supervised Losses for Image Search
Dawei Liang (Peking University, China)
Ke Yan (Peking University, China)
Wei Zeng (Peking University, China)
Yaowei Wang (Beijing Institute of Technology, China)
Qingsheng Yuan
Xiuguo Bao
Yonghong Tian (Peking University, China) 
12:40 - 1:40 PM Lunch break
Keynote Session 2
1:40 - 2:20 PM Video Content Analysis with Deep Learning

Yu-Gang Jiang
Fudan University, China
Bio: Yu-Gang Jiang is a Full Professor in the School of Computer Science and Vice Director of the Shanghai Engineering Research Center for Video Technology and System at Fudan University, China. His Lab for Big Video Data Analytics conducts research on all aspects of extracting high-level information from big video data, such as video event recognition, object/scene recognition, and large-scale visual search. He is the lead architect of several top-performing video analytics systems in international competitions such as the annual U.S. NIST TRECVID evaluation. His work has led to many awards, including the "Emerging Leader in Multimedia" award from IBM T.J. Watson Research in 2009, the Early Career Faculty Award from Intel and the China Computer Federation in 2013, the 2014 ACM China Rising Star Award, and the 2015 ACM SIGMM Rising Star Award. He holds a PhD in Computer Science from City University of Hong Kong and spent three years at Columbia University before joining Fudan in 2011.
Abstract: People today produce huge numbers of videos, many of which are uploaded to social media sites. There is a strong need for automatic solutions that analyze the contents of these videos. Potential applications of such techniques include video content management and retrieval, open-source intelligence analysis, etc. In this talk, I will introduce our recent work on video content analysis. I will begin with a few recently constructed Internet video datasets and then present several approaches developed in my group, focusing on deep learning based methods tailored for video analysis.
2:20 - 3:00 PM Deep Prediction and Understanding of the Real World on Social Media

Wen-Huang Cheng
Academia Sinica, Taiwan
Bio: Wen-Huang Cheng received the B.S. and M.S. degrees in computer science and information engineering from National Taiwan University, Taipei, Taiwan, in 2002 and 2004, respectively, and the Ph.D. (Hons.) degree from its Graduate Institute of Networking and Multimedia in 2008. He is an Associate Research Fellow (Associate Professor) with the Research Center for Information Technology Innovation (CITI), Academia Sinica, Taipei, Taiwan, where he is the Founding Leader of the Multimedia Computing Laboratory (MCLab), CITI, and also holds a joint appointment as an Associate Research Fellow in the Institute of Information Science. Before joining Academia Sinica, he was a Principal Researcher with MagicLabs, HTC Corporation, Taoyuan, Taiwan, from 2009 to 2010. His current research interests include multimedia content analysis, multimedia big data, deep learning, computer vision, mobile multimedia computing, social media, and human-computer interaction. He has received numerous research awards, including the 2016 Y. Z. Hsu Scientific Paper Award, the 2015-2016 Presidential Achievement Award of Rotary International, the Outstanding Youth Electrical Engineer Award from the Chinese Institute of Electrical Engineering in 2015, the Top 10% Paper Award from the 2015 IEEE International Workshop on Multimedia Signal Processing, the Outstanding Reviewer Award from the 2015 ACM International Conference on Internet Multimedia Computing and Service, the Prize Award of Multimedia Grand Challenge from the 2014 ACM Multimedia Conference, the K. T. Li Young Researcher Award from the ACM Taipei/Taiwan Chapter in 2014, the Outstanding Young Scholar Awards from the Ministry of Science and Technology in 2014 and 2012, the Outstanding Social Youth of Taipei Municipal in 2014, the Best Reviewer Award from the 2013 Pacific-Rim Conference on Multimedia, and the Best Poster Paper Award from the 2012 International Conference on 3D Systems and Applications.
He is an APSIPA Distinguished Lecturer.
Abstract: People are interested in predicting the future: which films will flop, or who will win the upcoming Grammy Awards? Predicting the future is not only fun but can also bring real value to those who correctly anticipate the course of world events, such as which stocks are the best purchases for short-term gains. Predictive analytics has thus attracted major attention in both academia and industry. As social media has become an inseparable part of modern life, there has been growing research interest in leveraging social media as an information source for inferring rich social facts and knowledge. In this talk, we will start with an interesting and challenging problem in social media research: predicting social media popularity. We demonstrate the use of deep learning techniques to discover which image posts on social media are the "stars of tomorrow," i.e., those that will be the most engaging for social media audiences, for example by receiving the most likes. We will also present advanced applications, such as understanding the street fashion of a city.
3:00 - 3:30 PM Coffee break
Poster Session 3:30 - 4:30 PM
MFC: A Multi-scale Fully Convolutional Approach for Visual Instance Retrieval
Jiedong Hao* (Institute of Automation, Chinese Academy of Sciences)
Wei Wang (Institute of Automation, Chinese Academy of Sciences)
Jing Dong (Institute of Automation, Chinese Academy of Sciences)
Tieniu Tan (Institute of Automation, Chinese Academy of Sciences)
Solar Radio Spectrum Classification with LSTM
Xuexin Yu (Chinese Academy of Sciences)
Long Xu* (Chinese Academy of Sciences)
Lin Ma (Tencent AI Lab)
Zhuo Chen (Chinese Academy of Sciences)
Yihua Yan (Chinese Academy of Sciences)
Supervised Deep Quantization for Efficient Image Search
Dongbao Yang* (Shandong University, Weihai)
Hongtao Xie (Institute of Information Engineering, Chinese Academy of Sciences, China)
Jian Yin (Shandong University, Weihai)
Yizhi Liu (Hunan University of Science and Technology, Xiangtan)
Chenggang Yan (Institute of Information and Control, Hangzhou Dianzi University, Hangzhou, China)
Image Blur Classification and Blur Usefulness Assessment
Mingyuan Fan (Tianjin University)
Rui Huang* (Tianjin University)
Wei Feng (Tianjin University)
Jizhou Sun (Tianjin University)
PU-LP: A Novel Approach for Positive and Unlabeled Learning by Label Propagation
Shuangxun Ma (Lanzhou University)
Ruisheng Zhang* (Lanzhou University)
Center Contrastive loss regularized CNN for tracking
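Ningning Li* (Beijing University of Posts and Telecommunications)
Yun Zhou (Academy of Broadcasting Science)
Zhuqing Jiang (Beijing University of Posts and Telecommunications)
Xiaoqiang Guo (Academy of Broadcasting Science)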
Inception Single Shot MultiBox Detector for object detection
Chengcheng Ning (Nanjing University of Science and Technology)
Huajun Zhou (Nanjing University of Science and Technology)
Yan Song* (Nanjing University of Science and Technology)
Jinhui Tang (Nanjing University of Science and Technology)
Two-layer Video Fingerprinting Strategy for Near-duplicate Video Detection
Xiushan Nie* (Shandong University)
Weizhen Jing (Shandong University)
Linyuan Ma (Shandong University of Finance and Economics)
Chaoran Cui (Shandong University)
Yilong Yin (Shandong University)
CRF Estimation Based HDR Image Generation Method
Yongqing Huo* (University of Electronic Science and Technology of China)
Xudong Zhang (University of Electronic Science and Technology of China)
Deep Saliency Quality Assessment Network
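Liangzhi Tang* (University of Electronic Science and Technology of China)
Qingbo Wu (University of Electronic Science and Technology of China)
Wei Li (University of Electronic Science and Technology of China)
Yinan Liu (University of Electronic Science and Technology of China)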
Frame-Skip Convolutional Neural Networks for Action Recognition
Yinan Liu* (University of Electronic Science and Technology of China)
Qingbo Wu (University of Electronic Science and Technology of China)
Liangzhi Tang (University of Electronic Science and Technology of China)
Deep Hash Learning for Efficient Image Retrieval
Xuchao Lu (Shanghai Jiao Tong University)
Li Song* (Shanghai Jiao Tong University)
Rong Xie (Shanghai Jiao Tong University)
Xiaokang Yang (Shanghai Jiao Tong University)
Wenjun Zhang (Shanghai Jiao Tong University)