Selected Recent Projects:
Domain Adaptive Semantic Diffusion for Context-Based Video Annotation
 |
Coping with domain change is known to be a challenging problem in many real-world applications. We propose a novel and efficient approach, named domain adaptive semantic diffusion (DASD), to exploit semantic context while considering the domain-shift-of-context for large-scale video concept annotation. Starting with a large set of concept detectors, DASD refines the initial annotation results using a graph diffusion technique, which preserves the consistency and smoothness of the annotation over a semantic graph. Unlike existing graph learning methods, which capture relations among data samples, the semantic graph treats concepts as nodes and concept affinities as edge weights. In particular, DASD simultaneously improves the annotation results and adapts the concept affinities to new test data. This adaptation provides a means of handling domain change between training and test data, which occurs very often in video annotation tasks. We conduct extensive experiments to improve the annotation results of 374 concepts over 340 hours of video from the TRECVID 2005-2007 data sets. Results show consistent and significant performance gains over various baselines. In addition, DASD is very efficient, completing diffusion over 374 concepts within just 2 milliseconds per video shot on a regular PC.
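The smoothing idea behind diffusion over a concept graph can be illustrated with a minimal sketch. It is not the DASD algorithm itself: the function name `diffuse_scores`, the step size, the iteration count and the toy affinity matrix are all assumptions made for illustration, and the adaptation of concept affinities to test data is omitted.

```python
def diffuse_scores(scores, affinity, step=0.1, iterations=10):
    """Smooth one shot's concept scores over a concept affinity graph.

    scores:   initial detector scores, one per concept
    affinity: symmetric matrix (list of lists) with affinity[i][j] >= 0
    Both the step size and the iteration count are illustrative defaults.
    """
    g = list(scores)
    n = len(g)
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Descent direction of the smoothness energy
            # sum_ij W_ij * (g_i - g_j)^2, up to a constant factor.
            pull = sum(affinity[i][j] * (g[i] - g[j]) for j in range(n))
            updated.append(g[i] - step * pull)
        g = updated  # synchronous update: all scores move together
    return g


# Toy example: concepts 0 and 1 are related, concept 2 is isolated.
refined = diffuse_scores([0.9, 0.1, 0.5],
                         [[0, 1, 0],
                          [1, 0, 0],
                          [0, 0, 0]])
```

In this toy example the scores of the two related concepts move toward each other, while the isolated concept keeps its initial score.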
|
Novelty and Redundancy Detection for Cross-Lingual News Stories
 |
An overwhelming volume of news videos from different channels and
languages is available today, which demands automatic management of
this abundant information. To effectively search, retrieve, browse
and track cross-lingual news stories, a news story similarity
measure plays a critical role in assessing the novelty and
redundancy among them. In this paper, we explore novelty and
redundancy detection with visual duplicates and speech transcripts
for cross-lingual news stories. News stories are represented by a
sequence of keyframes in the visual track and a set of words
extracted from the speech transcript in the audio track. The
textual and visual features complement each other and can be
combined to further boost performance.
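One way to combine the two tracks is sketched below, under strong assumptions. Each story is a dictionary with a `words` list, assumed to be already translated into a common language, and an `ndk_ids` list of hypothetical near-duplicate keyframe cluster identifiers. The Jaccard measure, the fusion weight and all names are illustrative choices, not the measure used in the paper.

```python
def jaccard(a, b):
    """Overlap between two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def story_similarity(story_a, story_b, text_weight=0.5):
    """Linear combination of textual and visual similarity."""
    text_sim = jaccard(story_a["words"], story_b["words"])
    visual_sim = jaccard(story_a["ndk_ids"], story_b["ndk_ids"])
    return text_weight * text_sim + (1.0 - text_weight) * visual_sim


def novelty(story, earlier_stories, text_weight=0.5):
    """A story is novel to the extent that no earlier story resembles it."""
    if not earlier_stories:
        return 1.0
    best = max(story_similarity(story, e, text_weight) for e in earlier_stories)
    return 1.0 - best


first = {"words": ["election", "vote"], "ndk_ids": [1, 2]}
repeat = {"words": ["election", "vote"], "ndk_ids": [1, 2]}
other = {"words": ["storm"], "ndk_ids": [9]}
```

Here `novelty(repeat, [first])` is 0.0 because both tracks match fully, and `novelty(other, [first])` is 1.0 because neither track overlaps.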
|
Near-Duplicate Web Video Detection
|
Current web video search results rely exclusively on text keywords
or user-supplied tags. A search for a typical popular video often
returns many duplicate and near-duplicate videos in the top results.
This paper outlines ways to cluster and filter out
near-duplicate videos using a hierarchical approach. Initial triage
is performed using fast signatures derived from color histograms.
Only when a video cannot be clearly classified as novel or
near-duplicate using global signatures do we apply a more expensive
local-feature-based near-duplicate detection, which provides very
accurate duplicate analysis at higher computational cost. The
results of 24 queries in a data set of 12,790 videos retrieved from
Google, Yahoo! and YouTube show that this hierarchical approach can
dramatically reduce the redundant videos displayed to the user in the top
result set, at relatively small computational cost.
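The two-stage triage can be sketched as follows. The distance measure, the two thresholds and the `local_check` callback are assumptions chosen for illustration; the paper's actual signatures and decision rules may differ.

```python
def histogram_distance(h1, h2):
    """Half the L1 distance between two normalized color histograms.

    The result lies in [0, 1] when both histograms sum to 1.
    """
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def classify_pair(sig_a, sig_b, local_check,
                  dup_threshold=0.1, novel_threshold=0.5):
    """Cheap global test first; expensive local test only when unsure.

    local_check is a hypothetical zero-argument callable that runs the
    costly local-feature comparison and returns True for a near-duplicate.
    Both thresholds are illustrative values.
    """
    distance = histogram_distance(sig_a, sig_b)
    if distance <= dup_threshold:
        return "near-duplicate"
    if distance >= novel_threshold:
        return "novel"
    # Ambiguous zone: fall back to the expensive comparison.
    return "near-duplicate" if local_check() else "novel"
```

Because the callback is only invoked in the ambiguous zone, most pairs are resolved with the cheap histogram comparison alone.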
|
Near-Duplicate Keyframe Retrieval
|
Near-duplicate keyframes (NDK) play a unique role in large-scale
video search, news topic detection and tracking. In this paper, we
propose a novel NDK retrieval approach by exploring both visual and
textual cues from the visual vocabulary and semantic context
respectively. The vocabulary, which provides entries for visual
keywords, is formed by the clustering of local keypoints. The
semantic context is inferred from the speech transcript surrounding
a keyframe. We evaluate the usefulness of visual keywords and
semantic context, separately and jointly, using cosine similarity
and language models. By linearly fusing both modalities, we report
improved performance over techniques based on keypoint
matching. While keypoint matching suffers from expensive computation
due to the need for online nearest neighbor search, our approach is
effective and efficient enough for online video search.
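A minimal sketch of the cosine-similarity part with linear fusion is given below. It assumes each keyframe is described by two sparse count vectors stored as dictionaries, one over visual keywords and one over transcript terms. The key names `visual` and `text`, the weight `alpha` and the toy vectors are hypothetical, and the language-model variant is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fused_score(query, candidate, alpha=0.5):
    """Linear fusion of visual-keyword and transcript similarity."""
    visual = cosine(query["visual"], candidate["visual"])
    textual = cosine(query["text"], candidate["text"])
    return alpha * visual + (1.0 - alpha) * textual


query = {"visual": {"vw12": 3, "vw40": 1}, "text": {"summit": 2}}
same = {"visual": {"vw12": 3, "vw40": 1}, "text": {"summit": 2}}
unrelated = {"visual": {"vw7": 5}, "text": {"weather": 1}}
```

Scoring against histograms of visual keywords avoids the per-query nearest neighbor search that direct keypoint matching requires.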
|
Threading and Autodocumenting News Videos
|
News videos constitute a huge volume of daily information.
It has become necessary to provide viewers with a concise and chronological view of various news themes through story dependency threading and topical documentary.
We aim to present techniques in threading and autodocumenting news stories according to topic themes.
Initially, we perform story clustering by exploiting the duality between stories and textual-visual concepts through a coclustering algorithm.
The dependency among stories of a topic is tracked by exploring the textual-visual novelty and redundancy of stories.
A novel topic structure that chains the dependencies of stories is then presented to facilitate the fast navigation of the news topic.
By pruning the peripheral and redundant news stories in the topic structure, a main thread is extracted for autodocumentary.
|
Near-Duplicate Image Detection
|
Near-duplicate images (NDs) are a group of images that are similar to, or nearly duplicates of, each other,
but appear different due to variations introduced by acquisition time, lens setting, lighting condition and editing operation.
The identification of ND pairs is a useful task for a variety of applications such as news story threading, data-driven image/video annotation and image hyperlinking.
In this project, we propose a novel approach for the discovery of ND pairs that is robust to a variety of transformations.
|
Common Pattern Discovery
|
We propose a new approach for the discovery of common patterns in
multiple images by region matching. Issues of feature
robustness, matching robustness and noise artifacts are addressed to
explore the potential of using regions as the basic matching
unit. In a novel step, we employ the many-to-many (M2M) matching strategy,
specifically with the Earth Mover's Distance (EMD), to increase
resilience to the structural inconsistency caused by improper region
segmentation of a pattern under various geometric and
photometric transformations. However, the matching pattern of M2M is
dispersed and unregulated in nature, which makes it challenging to
mine a common pattern while identifying the underlying
transformation. To avoid analysis of unregulated matching, we
propose monolithic matching for the collaborative mining of common
patterns from multiple images. The patterns are refined iteratively
using the Expectation-Maximization algorithm by taking advantage of
the crowding phenomenon in the EMD flows.
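For reference, the standard definition of the EMD between two region sets makes the many-to-many character of the matching explicit. The notation is assumed here rather than taken from the project description: $P$ has $m$ regions with weights $w_{p_i}$, $Q$ has $n$ regions with weights $w_{q_j}$, $d_{ij}$ is the ground distance between region $i$ of $P$ and region $j$ of $Q$, and $f_{ij}$ is the flow between them.

```latex
\begin{aligned}
f^{*} &= \arg\min_{f} \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}\, d_{ij}
\quad \text{subject to} \\
& f_{ij} \ge 0, \qquad
\sum_{j=1}^{n} f_{ij} \le w_{p_i}, \qquad
\sum_{i=1}^{m} f_{ij} \le w_{q_j}, \\
& \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}
  = \min\!\Big( \sum_{i=1}^{m} w_{p_i},\; \sum_{j=1}^{n} w_{q_j} \Big), \\[4pt]
\mathrm{EMD}(P, Q) &=
\frac{\sum_{i=1}^{m} \sum_{j=1}^{n} f^{*}_{ij}\, d_{ij}}
     {\sum_{i=1}^{m} \sum_{j=1}^{n} f^{*}_{ij}}
\end{aligned}
```

Because one region $i$ may send nonzero flow $f_{ij}$ to several regions $j$, a pattern that is over- or under-segmented in one image can still be matched to its counterpart in another.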
|
|