We propose a novel approach to understanding activities from their partial observations monitored through multiple non-overlapping cameras separated by unknown time gaps. In our approach, each camera view is first decomposed automatically into regions based on the correlation of object dynamics across different spatial locations in all camera views. A new Cross Canonical Correlation Analysis (xCCA) is then formulated to discover and quantify the time delayed correlations of regional activities observed within and across multiple camera views in a single common reference space. We show that learning the time delayed activity correlations offers important contextual information for (i) spatial and temporal topology inference of a camera network; (ii) robust person re-identification; and (iii) global activity interpretation and video temporal segmentation. Crucially, in contrast to conventional methods, our approach does not rely on either intra-camera or inter-camera object tracking; it can therefore be applied to low-quality surveillance videos characterised by severe inter-object occlusions. The effectiveness and robustness of our approach are demonstrated through experiments on 330 hours of videos captured from 17 cameras installed at two busy underground stations with complex and diverse scenes.
A diagram illustrating our multi-camera time delayed activity correlation approach.
The station layout and camera topology of Station A and Station B datasets. Entry and exit points are highlighted in red bars.
Since a complex scene naturally consists of multiple local scene regions that encompass distinctive activities, each camera view is first decomposed automatically into regions, across which different spatio-temporal activity patterns are observed.
Regional activity affinity matrices and the associated time delay matrices obtained using xCCA, xCA and MCMC Bayesian network structure learning. (Left) Station A (Right) Station B.
xCCA yielded a topology closer to the actual one than the other methods did. M = missing edges, R = redundant edges. (Top) Station A (Bottom) Station B.
(a) Passengers leave the field of view of Cam 5 from a zone marked 'Exit' and (b) enter Cam 4 from a zone marked 'Entrance'. (c) The exit/entry transition time distribution for selected pairs of zones obtained using the tracking-based method proposed by Makris et al. (2004). Dotted lines labelled [i] at 9 frames and [ii] at 25 frames represent the time delays between the selected pairs of zones estimated using our method and the tracking-based method, respectively. The average time delay obtained from manual observations is 9.12 frames.
Example queries selected from the person re-identification experiment. The first image in each row is a probe image. It is followed by the top 20 results, sorted from left to right according to the ranking obtained using CH+xCCA, with the correct match highlighted by a green bounding box. The ranks returned by the evaluated methods are included in the rightmost columns for comparison. Note the visual ambiguity in the search space due to variations in pose, colour and lighting, as well as poor image quality caused by low spatial resolution.
MATLAB code for computing cross canonical correlation (xCCA) between time series.
Reference:
C. C. Loy, T. Xiang, and S. Gong, Time-Delayed Correlation Analysis for Multi-Camera Activity Understanding, IJCV 2010
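To make the idea of a time-delayed correlation concrete, here is a minimal Python sketch for two scalar activity time series. It is not the released MATLAB code: for one-dimensional series the canonical correlation reduces to the absolute Pearson correlation, so the sketch simply shifts one series against the other and keeps the delay with the strongest correlation. The function names `pearson` and `best_delay`, the `max_delay` parameter and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_delay(x, y, max_delay):
    """Return (delay, |correlation|) maximising |corr(x[t], y[t + delay])|
    over 0 <= delay <= max_delay."""
    best = (0, 0.0)
    for tau in range(max_delay + 1):
        xs = x[:len(x) - tau]   # x[t]
        ys = y[tau:]            # y[t + tau]
        if len(xs) < 3:         # too few overlapping samples to correlate
            break
        r = abs(pearson(xs, ys))
        if r > best[1]:
            best = (tau, r)
    return best


# Synthetic example (hypothetical data): region B repeats region A's
# activity 9 frames later.
rng = random.Random(0)
true_delay = 9
region_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
region_b = [0.0] * true_delay + region_a[:-true_delay]

delay, corr = best_delay(region_a, region_b, max_delay=30)
print(delay, round(corr, 3))  # the recovered delay is 9 for this input
```

In the multi-dimensional case each region would be described by a feature vector per frame, and the Pearson step would be replaced by a canonical correlation between the two shifted feature sequences.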
MATLAB code for segmenting a scene into regions based on activity patterns. C++ code for spectral clustering by L. Zelnik-Manor and P. Perona is included.
Reference:
C. C. Loy, T. Xiang, and S. Gong, Time-Delayed Correlation Analysis for Multi-Camera Activity Understanding, IJCV 2010
MATLAB and C++ code for background modelling and subtraction.
A mean-shift based method for robust background subtraction in video with sudden global intensity changes. A static background image is first generated based on minimum cut; the method then adapts the background image to the intensity level of the current frame prior to the actual background subtraction.
Reference:
C. C. Loy, T. Xiang, and S. Gong, Time-Delayed Correlation Analysis for Multi-Camera Activity Understanding, IJCV 2010
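The following Python sketch illustrates why adapting the background to the current frame's intensity level matters before subtraction. It is a simplified stand-in, not the released method: the minimum-cut background generation is not reproduced, and the mean-shift based adaptation is replaced by a plain median of per-pixel intensity ratios used as a single global gain. The function names, the threshold value and the toy 4x4 images are assumptions made for illustration.

```python
from statistics import median


def adapt_background(background, frame):
    """Scale the background by a global gain estimated as the median of
    per-pixel frame/background ratios (robust while foreground covers
    fewer than half of the pixels)."""
    ratios = [
        f / b
        for brow, frow in zip(background, frame)
        for b, f in zip(brow, frow)
        if b > 0
    ]
    gain = median(ratios) if ratios else 1.0
    return [[b * gain for b in row] for row in background]


def foreground_mask(background, frame, threshold=25.0, adapt=True):
    """Boolean mask of pixels whose absolute difference from the
    (optionally adapted) background exceeds the threshold."""
    reference = adapt_background(background, frame) if adapt else background
    return [
        [abs(f - r) > threshold for r, f in zip(rrow, frow)]
        for rrow, frow in zip(reference, frame)
    ]


# Toy example (hypothetical data): the lights dim from 100 to 60 and one
# bright foreground pixel appears at row 1, column 2.
background = [[100.0] * 4 for _ in range(4)]
frame = [[60.0] * 4 for _ in range(4)]
frame[1][2] = 200.0

adapted_mask = foreground_mask(background, frame)
naive_mask = foreground_mask(background, frame, adapt=False)

adapted_count = sum(sum(row) for row in adapted_mask)
naive_count = sum(sum(row) for row in naive_mask)
print(adapted_count, naive_count)
```

With adaptation only the genuine foreground pixel is flagged, whereas subtracting the unadapted background marks every pixel as foreground after the global intensity change.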