We propose a novel approach to understanding activities from partial observations captured by multiple non-overlapping cameras separated by unknown time gaps. In our approach, each camera view is first decomposed automatically into regions based on the correlation of object dynamics across different spatial locations in all camera views. A new Cross Canonical Correlation Analysis (xCCA) is then formulated to discover and quantify the time delayed correlations of regional activities observed within and across multiple camera views in a single common reference space. We show that learning the time delayed activity correlations offers important contextual information for (i) spatial and temporal topology inference of a camera network; (ii) robust person re-identification; and (iii) global activity interpretation and video temporal segmentation. Crucially, in contrast to conventional methods, our approach does not rely on either intra-camera or inter-camera object tracking; it can therefore be applied to low-quality surveillance videos with severe inter-object occlusions. The effectiveness and robustness of our approach are demonstrated through experiments on 330 hours of video captured from 17 cameras installed at two busy underground stations with complex and diverse scenes.

Contribution Highlights

  • The approach is capable of discovering and quantifying the correlation and temporal relationships of arbitrary order among local activities across different camera views. To the best of our knowledge, this study is the first attempt to model time delayed activity correlations among multiple cameras.
  • The approach does not rely on either inter-camera or intra-camera tracking. Therefore it is robust to occlusions and can be applied to crowded scenes of low spatial and temporal resolutions.


  1. Time-Delayed Correlation Analysis for Multi-Camera Activity Understanding
    C. C. Loy, T. Xiang, and S. Gong
    International Journal of Computer Vision, vol. 90, no. 1, pp. 106-129, 2010 (IJCV)
  2. Multi-Camera Activity Correlation Analysis
    C. C. Loy, T. Xiang, and S. Gong
    in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1988-1995, 2009 (CVPR, Oral)


Overview diagram:

A diagram illustrating our multi-camera time delayed activity correlation approach.


The station layout and camera topology of Station A and Station B datasets. Entry and exit points are highlighted in red bars.

Activity-based scene decomposition results:

Since a complex scene naturally consists of multiple local scene regions that encompass distinctive activities, each camera view is first decomposed automatically into regions, across which different spatio-temporal activity patterns are observed.

Regional activity affinity and time delay matrices:

Regional activity affinity matrices and the associated time delay matrices obtained using xCCA, xCA and MCMC Bayesian network structure learning. (Left) Station A (Right) Station B.
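
To make the idea of an affinity matrix paired with a time delay matrix concrete, here is a minimal Python sketch. It is not the released MATLAB code. It assumes that each region's activity is a scalar time series, in which case the first canonical correlation reduces to the absolute Pearson correlation; multi-dimensional regional features would need a full CCA at each candidate delay. It also assumes that the affinity of a region pair is the maximum correlation over candidate delays and that the delay is the shift at which that maximum occurs. The function names and the max_delay parameter are illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def time_delayed_correlation(x, y, max_delay):
    """Sweep delays tau in [-max_delay, max_delay], pairing x[t] with y[t + tau].

    Returns (best_corr, best_delay). Assumes len(x) == len(y).
    """
    best_corr, best_delay = 0.0, 0
    for tau in range(-max_delay, max_delay + 1):
        if tau >= 0:
            xs, ys = x[:len(x) - tau], y[tau:]
        else:
            xs, ys = x[-tau:], y[:len(y) + tau]
        if len(xs) < 2:
            continue
        corr = abs(pearson(xs, ys))
        if corr > best_corr:
            best_corr, best_delay = corr, tau
    return best_corr, best_delay

def affinity_and_delay(series, max_delay):
    """Build pairwise affinity and time delay matrices for a list of regional series."""
    n = len(series)
    affinity = [[0.0] * n for _ in range(n)]
    delay = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            affinity[i][j], delay[i][j] = time_delayed_correlation(
                series[i], series[j], max_delay)
    return affinity, delay

# Toy data (made up): region B repeats region A's activity 3 frames later.
region_a = [0, 1, 4, 2, 0, 0, 3, 5, 1, 0, 0, 2, 0, 0, 0]
region_b = [0, 0, 0] + region_a[:-3]
affinity, delay = affinity_and_delay([region_a, region_b], max_delay=5)
print(delay[0][1], delay[1][0])  # the recovered delays have opposite signs
```

In this toy case the sweep recovers a delay of 3 frames from region A to region B, and the delay matrix is antisymmetric because reversing the pair reverses the direction of the shift.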

Camera topology inference:

Of the methods compared, xCCA yielded the topology closest to the actual one. M = missing edges, R = redundant edges. (Top) Station A (Bottom) Station B.
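
The missing (M) and redundant (R) edge counts can be illustrated with a short Python sketch. It assumes, for illustration only, that a camera-level affinity matrix is already available and that two cameras are connected when their affinity reaches a fixed threshold; the thresholding rule, the function names and the toy numbers are hypothetical.

```python
def infer_edges(affinity, threshold):
    """Connect cameras i and j (i < j) when their affinity reaches the threshold."""
    n = len(affinity)
    return {(i, j)
            for i in range(n)
            for j in range(i + 1, n)
            if affinity[i][j] >= threshold}

def compare_topology(inferred, actual):
    """Return the edges missing from, and redundant in, the inferred topology."""
    def normalise(edges):
        # Treat edges as undirected: (2, 0) and (0, 2) are the same edge.
        return {tuple(sorted(edge)) for edge in edges}
    inferred, actual = normalise(inferred), normalise(actual)
    return {"missing": actual - inferred, "redundant": inferred - actual}

# Toy camera-level affinities for three cameras (made-up numbers).
camera_affinity = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.6],
    [0.1, 0.6, 1.0],
]
inferred_edges = infer_edges(camera_affinity, threshold=0.5)
actual_edges = {(0, 1), (0, 2)}
report = compare_topology(inferred_edges, actual_edges)
print(len(report["missing"]), len(report["redundant"]))
```

Here the inferred topology misses the true link between cameras 0 and 2 and adds a spurious link between cameras 1 and 2, so M = 1 and R = 1.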

Comparison with tracking-based method:

(a) Passengers leave the field of view of Cam 5 from a zone marked 'Exit' and (b) enter Cam 4 from a zone marked 'Entrance'. (c) The exit/entry transition time distribution for selected pairs of zones obtained using the tracking-based method proposed by Makris et al. (2004). Dotted lines labelled [i] at 9 frames and [ii] at 25 frames represent the time delays between the selected pairs of zones estimated using our method and the tracking-based method, respectively. The average time delay obtained from manual observations is 9.12 frames.

Our method discovers time delay information to facilitate more accurate person re-identification:

Example queries selected from the person re-identification experiment. The first image in each row is a probe image. It is followed by the top 20 results, sorted from left to right according to the ranking obtained using CH+xCCA, with the correct match highlighted using a green bounding box. The ranks returned by the evaluated methods are included in the rightmost columns for comparison. Note the visual ambiguity in the search space due to variations in pose, colour and lighting, as well as poor image quality caused by low spatial resolution.

More figures in the paper

Datasets and Codes

MATLAB code for computing cross canonical correlation (xCCA) between time series.

C. C. Loy, T. Xiang, and S. Gong, Time-Delayed Correlation Analysis for Multi-Camera Activity Understanding, IJCV 2010

Download Codes [15 KB]

MATLAB code for segmenting a scene into regions based on activity patterns. C++ codes for spectral clustering by L. Zelnik-Manor and P. Perona are included.

C. C. Loy, T. Xiang, and S. Gong, Time-Delayed Correlation Analysis for Multi-Camera Activity Understanding, IJCV 2010

Download Codes + Data [7.8 MB]

MATLAB and C++ codes for background modelling and subtraction.

A mean-shift based method for robust background subtraction in video with sudden global intensity changes. A static background image is first generated based on minimum cut; the method then adapts the background image to the intensity level of the current frame before the actual background subtraction.
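
The benefit of adapting the background to the current intensity level before subtraction can be seen in the following simplified Python sketch. It is not the released code: a median intensity offset stands in for the mean-shift step, the static background is taken as given rather than built by minimum cut, and the function names, threshold and pixel values are all hypothetical.

```python
from statistics import median

def adapt_background(background, frame):
    """Shift the background by the median per-pixel intensity difference.

    The median is a simple stand-in for a mode estimate: it follows the
    global intensity change while ignoring the minority of foreground pixels.
    """
    diffs = [f - b
             for b_row, f_row in zip(background, frame)
             for b, f in zip(b_row, f_row)]
    offset = median(diffs)
    return [[b + offset for b in row] for row in background]

def subtract(background, frame, threshold):
    """Return a boolean foreground mask after adapting the background."""
    adapted = adapt_background(background, frame)
    return [[abs(f - a) > threshold for a, f in zip(a_row, f_row)]
            for a_row, f_row in zip(adapted, frame)]

# Toy 3x3 greyscale images: the whole frame brightens by 40 levels,
# and the centre pixel is additionally covered by a foreground object.
background = [[100, 100, 100],
              [100, 100, 100],
              [100, 100, 100]]
frame = [[140, 140, 140],
         [140, 220, 140],
         [140, 140, 140]]
mask = subtract(background, frame, threshold=30)
foreground_pixels = sum(cell for row in mask for cell in row)
print(foreground_pixels)
```

Without the adaptation step, every pixel in this toy frame would differ from the background by more than the threshold and be flagged as foreground; with it, only the centre pixel is.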

C. C. Loy, T. Xiang, and S. Gong, Time-Delayed Correlation Analysis for Multi-Camera Activity Understanding, IJCV 2010

Download Codes [50 KB] Download Testing Data [13 MB]

Related Projects