
Activity recognition results on UCF Sports and Hollywood2

The table above (Table 1) shows the results obtained on the UCF Sports Action dataset (http://crcv.ucf.edu/data/UCF_Sports_Action.php). We report the recognition rate with respect to the number of layers used in the compositional scheme. To provide a fair comparison, we also performed tests using only the output of layer L1 (quantized motion), without any compositional structure. Our scheme does not benefit from additional layers on this dataset, so we limit ourselves to L1 and L2 in this case. Even in the L1+L2 configuration, our approach outperforms the state-of-the-art approach [19] (the implementation that relies exclusively on motion trajectories). Since the UCF Sports Action dataset contains well-structured motion (sports), these results are not surprising.

 

 

The second table (above) shows the results on the Hollywood2 dataset. These results are worse, although they are comparable to the initial results reported on the same dataset by its authors [14].

 

References:

 

[14] M. Marszalek, I. Laptev, and C. Schmid. Actions in context. In CVPR, pages 2929–2936, June

[19] H. Wang, A. Kläser, C. Schmid, and C.-L. Liu. Action recognition by dense trajectories. In CVPR, pages 3169–3176. IEEE, 2011.

 

Our results are taken from:

PERŠ, Janez, KRISTAN, Matej, MANDELJC, Rok, KOVAČIČ, Stanislav, LEONARDIS, Aleš. Hierarhična kompozicionalna arhitektura za detekcijo in razpoznavanje aktivnosti [Hierarchical compositional architecture for activity detection and recognition]. Elektrotehniški vestnik.

Computational efficiency and parallel implementation

The developed algorithms are computationally efficient, and the compositional processing pipeline is well suited for implementation on massively parallel architectures. Many of the tasks are implemented on the GPU (see tables), and the optimization step of the learning process runs in parallel on all four CPU cores.

 

The first table shows the time spent in each stage of the algorithm, per frame. The measurements were made on the largest video from the Hollywood2 dataset (actioncliptrain00863.avi, 2958 frames), resampled to a resolution of 521×288 pixels. Times do not include overhead such as reading from and writing to disk, image resampling and similar tasks. * denotes a task that ran at least partially on the GPU.
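Per-frame stage timings of this kind can be collected with a simple harness. The following is a minimal sketch, not the measurement code used for the table: the function name `time_stages` and the representation of stages as (name, function) pairs are assumptions made for illustration. Frames are expected to be loaded and resampled beforehand, so that disk access and resampling stay outside the measured intervals.

```python
import time

def time_stages(frames, stages):
    """Return the mean processing time per frame (in seconds) for each stage.

    frames: list of already loaded frames (any Python objects).
    stages: list of (name, function) pairs applied in order; each function
            receives the previous stage's output for one frame.
    """
    totals = {name: 0.0 for name, _ in stages}
    for frame in frames:
        data = frame
        for name, fn in stages:
            start = time.perf_counter()
            data = fn(data)
            # Only the stage call itself is inside the timed interval.
            totals[name] += time.perf_counter() - start
    n = max(len(frames), 1)  # avoid division by zero for an empty list
    return {name: total / n for name, total in totals.items()}
```

For example, `time_stages(frames, [("L0", extract_motion), ("L1", encode)])` would return one mean time per stage, where `extract_motion` and `encode` stand for hypothetical stage functions.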

 


 

The second table shows the approximate time spent on optimization during learning (480 frames, 19.2 s of video), the time needed for SVM training and testing on the whole Hollywood2 dataset (assuming all features are already sampled and arranged into feature vectors), and the feature dimensionality in the SVM tests. * denotes that the task ran on all four CPU cores in parallel.

Server crash

After experiencing a total server failure, we are back online. We apologize for the inconvenience. We are still in the process of reloading the server with the contents, and we expect that the site will be fully operational and up to date on May 12th, 2014.

Motion hierarchy structure

Our model comprises three processing stages, as shown in the Figure. The task of the lowest stage (layers L0 and L1) is the extraction of relatively basic motion features. The middle stage implements the hierarchical compositional structure and consists of multiple layers, L2 to LN. The layered structure decreases the complexity of the learning process, because the neighborhood of each element is observed only within a limited receptive field. In particular, such a scheme avoids the need to jointly estimate a rather large number of parameters for all layers by decomposing the learning into a sequence of layer-wise training epochs. The topmost stage provides discriminative capabilities using SVM classification on the outputs of the middle, compositional stage. Each layer Li is associated with its own dictionary. While the dictionary of layer L1 is fixed, the dictionaries of the higher layers are obtained through learning.
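The layer-wise learning described above can be illustrated with a toy sketch. It is not the learning algorithm of the model: here a composition is assumed to be a pair of lower-layer parts whose spatial offset lies within a square receptive field, and a layer's dictionary is assumed to be the most frequent such pairs. The function names, the `(code, x, y)` part representation and the parameters `radius` and `dict_size` are all hypothetical.

```python
from collections import Counter

def learn_layer(parts, radius, dict_size):
    """Learn one layer's dictionary from the parts of the layer below.

    parts: list of (code, x, y) tuples.
    Returns the dict_size most frequent compositions, each a tuple
    (code_a, code_b, dx, dy) with the offset limited by the receptive field.
    """
    counts = Counter()
    for i, (ca, xa, ya) in enumerate(parts):
        for cb, xb, yb in parts[i + 1:]:
            dx, dy = xb - xa, yb - ya
            if abs(dx) <= radius and abs(dy) <= radius:
                counts[(ca, cb, dx, dy)] += 1
    return [comp for comp, _ in counts.most_common(dict_size)]

def apply_layer(parts, dictionary):
    """Detect dictionary compositions and emit parts for the next layer."""
    index = {comp: k for k, comp in enumerate(dictionary)}
    out = []
    for i, (ca, xa, ya) in enumerate(parts):
        for cb, xb, yb in parts[i + 1:]:
            key = (ca, cb, xb - xa, yb - ya)
            if key in index:
                out.append((index[key], xa, ya))
    return out

def train_hierarchy(l1_parts, num_layers, radius=1, dict_size=4):
    """Train the layers one at a time; a learned layer is never revisited."""
    dictionaries, parts = [], l1_parts
    for _ in range(num_layers):
        dictionary = learn_layer(parts, radius, dict_size)
        dictionaries.append(dictionary)
        parts = apply_layer(parts, dictionary)
    return dictionaries, parts
```

Because each call to `learn_layer` sees only the output of the already fixed layer below, the parameters of different layers are never estimated jointly. In the full model, the outputs of the compositional stage would then be passed to the SVM classifier, which this sketch omits.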

L1: motion features

Layer L1 provides the input to the compositional hierarchy. Motion obtained in L0 is encoded using a small dictionary.
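One simple way to encode motion with a small dictionary is to quantize each motion vector by direction. The sketch below is only an illustration of that idea: the dictionary size (eight directions plus a "no motion" entry), the magnitude threshold and the function name `encode_motion` are assumptions, not the dictionary used in the model.

```python
import math

NUM_DIRECTIONS = 8    # assumed dictionary size (hypothetical)
MIN_MAGNITUDE = 0.5   # assumed threshold below which motion is ignored

def encode_motion(dx, dy):
    """Map one motion vector to a dictionary index.

    Returns 0 for 'no motion', or 1..NUM_DIRECTIONS for the nearest of
    NUM_DIRECTIONS evenly spaced directions (1 = positive x direction).
    """
    if math.hypot(dx, dy) < MIN_MAGNITUDE:
        return 0
    angle = math.atan2(dy, dx) % (2 * math.pi)   # wrap into [0, 2*pi)
    sector = 2 * math.pi / NUM_DIRECTIONS
    # Shift by half a sector so that each direction sits in the middle of its bin.
    return int((angle + sector / 2) // sector) % NUM_DIRECTIONS + 1
```

Applied to every motion vector produced by L0, such a function turns a dense motion field into a field of small integer codes that the compositional layers can combine.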
