

Selected Recent Projects:

Domain Adaptive Semantic Diffusion for Context-Based Video Annotation


Learning to cope with domain change is known to be a challenging problem in many real-world applications. We propose a novel and efficient approach, named domain adaptive semantic diffusion (DASD), to exploit semantic context while considering the domain-shift-of-context for large-scale video concept annotation. Starting with a large set of concept detectors, DASD refines the initial annotation results using a graph diffusion technique, which preserves the consistency and smoothness of the annotation over a semantic graph. Unlike existing graph learning methods, which capture relations among data samples, the semantic graph treats concepts as nodes and concept affinities as edge weights. In particular, DASD simultaneously improves the annotation results and adapts the concept affinities to new test data. The adaptation provides a means to handle domain change between training and test data, which occurs very often in video annotation tasks. We conduct extensive experiments to improve annotation results of 374 concepts over 340 hours of video from the TRECVID 2005-2007 data sets. Results show consistent and significant performance gains over various baselines. In addition, DASD is very efficient, completing diffusion over 374 concepts within just 2 milliseconds for each video shot on a regular PC.
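As a rough illustration of diffusion over a concept graph, the sketch below smooths one shot's detector scores by gradient descent on a normalized-Laplacian energy. This is a simplified stand-in, not the published DASD formulation: the energy, the function name `diffuse_scores`, and the `step` and `iterations` parameters are assumptions, and the affinity adaptation step is omitted.

```python
import math

def diffuse_scores(scores, affinity, step=0.1, iterations=10):
    """Smooth one shot's concept scores over a concept graph (illustrative only).

    scores:   initial detector outputs, one per concept.
    affinity: symmetric matrix (list of lists) of non-negative concept affinities.
    """
    n = len(scores)
    degree = [sum(row) for row in affinity]
    # Symmetrically normalized affinity S = D^{-1/2} W D^{-1/2}.
    norm = [[affinity[i][j] / math.sqrt(degree[i] * degree[j])
             if degree[i] > 0 and degree[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]
    g = list(scores)
    for _ in range(iterations):
        # The gradient of 0.5 * g^T (I - S) g with respect to g is (I - S) g.
        smoothed = [sum(norm[i][j] * g[j] for j in range(n)) for i in range(n)]
        # Concepts with no neighbors (degree 0) keep their current score.
        g = [g[i] - step * (g[i] - smoothed[i]) if degree[i] > 0 else g[i]
             for i in range(n)]
    return g

# Two related concepts: the weak score is pulled up, the strong one down.
refined = diffuse_scores([0.9, 0.1], [[0.0, 1.0], [1.0, 0.0]])
```

Each iteration costs one matrix-vector product over the concept graph, which is why this kind of refinement can remain cheap even with hundreds of concepts.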

Project Page


Novelty and Redundancy Detection for Cross-Lingual News Stories


An overwhelming volume of news videos from different channels and languages is available today, which demands automatic management of this abundant information. To effectively search, retrieve, browse and track cross-lingual news stories, a news story similarity measure plays a critical role in assessing the novelty and redundancy among them. In this paper, we explore novelty and redundancy detection for cross-lingual news stories using visual duplicates and speech transcripts. News stories are represented by a sequence of keyframes in the visual track and a set of words extracted from the speech transcript in the audio track. The textual and visual features complement each other, and they can be combined to further boost performance.
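One way to picture a combined similarity measure is the sketch below, which scores a story's novelty as one minus its highest similarity to any earlier story. The details are assumptions for illustration, not the method of the paper: transcripts are taken to be already mapped into a common vocabulary, keyframes are represented by hypothetical near-duplicate group ids, Jaccard overlap is used for both modalities, and `text_weight` is an invented fusion parameter.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def story_similarity(story_a, story_b, text_weight=0.5):
    """Linear combination of textual and visual overlap between two stories."""
    text_sim = jaccard(story_a["words"], story_b["words"])
    visual_sim = jaccard(story_a["keyframe_groups"], story_b["keyframe_groups"])
    return text_weight * text_sim + (1.0 - text_weight) * visual_sim

def novelty(story, earlier_stories, text_weight=0.5):
    """1.0 means entirely new; 0.0 means fully redundant with an earlier story."""
    if not earlier_stories:
        return 1.0
    return 1.0 - max(story_similarity(story, s, text_weight)
                     for s in earlier_stories)

first = {"words": {"election", "vote", "result"}, "keyframe_groups": {1, 2}}
second = {"words": {"election", "vote", "count"}, "keyframe_groups": {2, 3}}
score = novelty(second, [first])
```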

Project Page


Near-Duplicate Web Video Detection


Current web video search results rely exclusively on text keywords or user-supplied tags. A search for a typical popular video often returns many duplicate and near-duplicate videos in the top results. This paper outlines ways to cluster and filter out near-duplicate videos using a hierarchical approach. Initial triage is performed using fast signatures derived from color histograms. Only when a video cannot be clearly classified as novel or near-duplicate using global signatures do we apply local-feature-based near-duplicate detection, which provides very accurate duplicate analysis at a higher computational cost. The results of 24 queries on a data set of 12,790 videos retrieved from Google, Yahoo! and YouTube show that this hierarchical approach can dramatically reduce the redundant videos displayed to the user in the top result set, at relatively small computational cost.
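The two-stage triage described above can be sketched as follows. The L1 histogram distance, the two thresholds, and the `local_check` callback (a placeholder for the local-feature-based detector) are assumptions for illustration, not values or interfaces from the paper.

```python
def histogram_distance(h1, h2):
    """L1 distance between two normalized color histograms (ranges from 0 to 2)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def classify_pair(sig_a, sig_b, local_check,
                  dup_threshold=0.2, novel_threshold=0.8):
    """Classify a video pair, calling the expensive check only when needed.

    local_check: zero-argument callable returning True for a near-duplicate;
                 it stands in for a costly local-feature comparison.
    """
    d = histogram_distance(sig_a, sig_b)
    if d <= dup_threshold:
        return "near-duplicate"   # global signatures clearly agree
    if d >= novel_threshold:
        return "novel"            # global signatures clearly differ
    # Ambiguous band: fall back to the expensive detector.
    return "near-duplicate" if local_check() else "novel"

label = classify_pair([0.7, 0.3], [0.5, 0.5], local_check=lambda: True)
```

Passing the expensive detector as a callable keeps it lazy, so clear-cut pairs never pay for it.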

Project Page

Near-Duplicate Keyframe Retrieval


Near-duplicate keyframes (NDK) play a unique role in large-scale video search and in news topic detection and tracking. In this paper, we propose a novel NDK retrieval approach that explores both visual and textual cues, drawn from the visual vocabulary and the semantic context respectively. The vocabulary, which provides entries for visual keywords, is formed by clustering local keypoints. The semantic context is inferred from the speech transcript surrounding a keyframe. We evaluate the usefulness of visual keywords and semantic context, separately and jointly, using cosine similarity and language models. Linearly fusing both modalities improves performance over techniques based on keypoint matching. While keypoint matching suffers from expensive computation due to the need for online nearest-neighbor search, our approach is effective and efficient enough for online video search.
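A minimal sketch of linear fusion over the two modalities is given below. It assumes each keyframe comes with a list of visual keyword ids (cluster indices of its keypoints) and a list of transcript words, and it uses cosine similarity for both modalities as a simplification; the language-model scoring mentioned above is not reproduced, and the weight `alpha` is a hypothetical parameter.

```python
import math
from collections import Counter

def cosine(bag_a, bag_b):
    """Cosine similarity between two bag-of-words count dictionaries."""
    dot = sum(count * bag_b[key] for key, count in bag_a.items() if key in bag_b)
    norm_a = math.sqrt(sum(v * v for v in bag_a.values()))
    norm_b = math.sqrt(sum(v * v for v in bag_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def fused_similarity(visual_a, visual_b, text_a, text_b, alpha=0.5):
    """Weighted sum of visual-keyword and transcript similarity."""
    visual_sim = cosine(Counter(visual_a), Counter(visual_b))
    text_sim = cosine(Counter(text_a), Counter(text_b))
    return alpha * visual_sim + (1.0 - alpha) * text_sim

# Keyframes sharing visual keywords 3 and 7, with unrelated transcripts.
score = fused_similarity([3, 3, 7], [3, 7, 7],
                         ["flood", "rescue"], ["storm", "damage"])
```

Because both representations are sparse count vectors, candidates can be compared without any online nearest-neighbor search over raw keypoints.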

Project Page


Threading and Autodocumenting News Videos


News videos constitute a huge volume of daily information. It has become necessary to provide viewers with a concise and chronological view of various news themes through story dependency threading and topical documentary. We aim to present techniques in threading and autodocumenting news stories according to topic themes. Initially, we perform story clustering by exploiting the duality between stories and textual-visual concepts through a coclustering algorithm. The dependency among stories of a topic is tracked by exploring the textual-visual novelty and redundancy of stories. A novel topic structure that chains the dependencies of stories is then presented to facilitate the fast navigation of the news topic. By pruning the peripheral and redundant news stories in the topic structure, a main thread is extracted for autodocumentary.

Project Page


Near-Duplicate Image Detection


Near-duplicate images (NDs) are groups of images that are similar to or nearly duplicates of each other but appear different due to variations in acquisition time, lens setting, lighting condition and editing operations. The identification of ND pairs is a useful task for a variety of applications such as news story threading, data-driven image/video annotation and image hyperlinking. In this project, we propose a novel approach for the discovery of ND pairs that is robust to a variety of transformations.

Project Page


Common Pattern Discovery


We propose a new approach for the discovery of common patterns in multiple images by region matching. Issues of feature robustness, matching robustness and noise artifacts are addressed to explore the potential of using regions as the basic matching unit. We employ a many-to-many (M2M) matching strategy, specifically with the Earth Mover's Distance (EMD), to increase resilience to the structural inconsistency that arises when a pattern is improperly segmented into regions under various geometric and photometric transformations. However, the matching pattern of M2M is dispersed and unregulated in nature, which makes it challenging to mine a common pattern while identifying the underlying transformation. To avoid analysis on unregulated matching, we propose monolithic matching for the collaborative mining of common patterns from multiple images. The patterns are refined iteratively using the Expectation-Maximization algorithm by taking advantage of the crowding phenomenon in the EMD flows.

Project Page