
Task 1.1: Low-level features and invariance to time and scale

When a compositional model is used, only the lowest layer of the hierarchy comes into direct contact with the visually sensed data. The choice of features extracted at the lowest layer is therefore of utmost importance: low-level features should extract as much useful information as possible from the input images, so the choice depends on the particular task. Gabor filters have been used successfully in shape-based object categorization [Fidler2010], and there is neurological and psychophysical evidence that Gabor features correspond well to the basic shape-detection units of the human visual system. Spatial Gabor filtering has been extended into the spatio-temporal domain [Dollar2005], [Jhuang2007], resulting in spatio-temporal descriptors that capture both the spatial surroundings of detected motions (action primitives) and their short-term temporal surroundings.

Nevertheless, as noted in [Jhuang2007], this is not the only choice. Variants of optical flow, and even raw pixel values observed through time, could serve the same purpose. It is also quite possible that the features could be derived automatically, i.e. learned from an appropriate data set of image sequences, potentially yielding better performance than the approaches published so far. It remains to be determined whether these features need to be spatio-temporal in nature at all. Purely spatial features might perform just as well if the temporal dimension is introduced into the hierarchy at higher levels; conversely, the features could be purely temporal, i.e. represented by temporally sensitive single-pixel processing units, with the spatial dimension introduced at higher levels. Alternatively, we could apply more advanced local descriptors, as in [Ojala2002] or [Zhao2007], which are still simple on their own and have been reported to capture a variety of motion textures. In this task, we will test all of these hypotheses and choose the low-level features that perform best in our compositional hierarchical model.
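As a rough illustration of one candidate low-level feature discussed above, the sketch below applies a separable spatio-temporal Gabor filter to a short grayscale clip. It is a minimal Python/NumPy example, not the descriptor of [Dollar2005] or [Jhuang2007]; the filter parameters and function names are illustrative assumptions.

```python
# Minimal sketch of a separable spatio-temporal Gabor filter, assuming a
# grayscale clip stored as a (T, H, W) NumPy array. Parameters are illustrative.
import numpy as np
from scipy.ndimage import convolve


def spatial_gabor(size=15, wavelength=6.0, sigma=3.0, theta=0.0):
    """2D Gabor kernel (real part) with orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * xr / wavelength)
    return envelope * carrier


def temporal_gabor(length=9, period=4.0, sigma=2.0):
    """1D Gabor kernel acting along the time axis."""
    t = np.arange(length) - length // 2
    return np.exp(-t ** 2 / (2.0 * sigma ** 2)) * np.cos(2.0 * np.pi * t / period)


def spatiotemporal_response(clip, theta=0.0):
    """Filter a (T, H, W) clip with a separable space-time Gabor kernel
    and return the rectified response."""
    kernel = temporal_gabor()[:, None, None] * spatial_gabor(theta=theta)[None, :, :]
    return np.abs(convolve(clip, kernel, mode="nearest"))


if __name__ == "__main__":
    clip = np.random.rand(16, 64, 64)            # stand-in for a short grayscale clip
    responses = [spatiotemporal_response(clip, theta=th)
                 for th in (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)]
    print(np.stack(responses).shape)             # (4, 16, 64, 64)
```

A bank of such filters at several orientations and temporal periods yields one simple family of motion-sensitive low-level features against which the alternatives above (optical flow, raw pixels over time, learned features, local motion textures) can be compared.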

Introducing time into the compositional hierarchy is not straightforward, and the temporal dimension can be introduced in different ways. One common approach is to use spatio-temporal descriptors at the lowest stage of the hierarchy, as in [Jhuang2007] or [Zhao2007]. This may be sufficient even for tasks such as activity recognition, but results can be improved by explicitly modeling temporal relationships in the hierarchy [Jhuang2007]. It is likely that the temporal dimension has to be introduced at all levels of the compositional hierarchy to achieve adequate performance. In this task, we will test these approaches and choose the ones that perform best in our compositional hierarchical model.
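The sketch below illustrates the second option in the simplest possible form: keep the lowest layer purely spatial and introduce time one level up by composing per-frame responses over a short temporal window. The layer names, the gradient-magnitude stand-in for L1 features, and the window size are assumptions made for illustration, not the project's actual design.

```python
# Minimal sketch: purely spatial L1 features, with time introduced above L1
# by pooling per-frame feature maps over a sliding temporal window.
import numpy as np


def l1_spatial_features(frame):
    """Purely spatial per-frame features; gradient magnitude as a stand-in."""
    gy, gx = np.gradient(frame)
    return np.sqrt(gx ** 2 + gy ** 2)


def l2_temporal_composition(per_frame_feats, window=5):
    """Introduce time above L1 by max-pooling feature maps over a temporal window."""
    feats = np.stack(per_frame_feats)                  # (T, H, W)
    T = feats.shape[0]
    return np.stack([feats[t:t + window].max(axis=0)
                     for t in range(T - window + 1)])  # (T - window + 1, H, W)


if __name__ == "__main__":
    clip = np.random.rand(16, 64, 64)
    l1 = [l1_spatial_features(f) for f in clip]
    print(l2_temporal_composition(l1, window=5).shape)  # (12, 64, 64)
```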

Scale invariance is an important property when dealing with the recognition of visual phenomena. In shape-based object categorization [Fidler2010] we can simply resort to processing at multiple scales, even when compositional models are used. Such an approach may become less viable as the temporal dimension is introduced into the model, since we then have to deal with scale in both the temporal and the spatial dimension, as pointed out by Laptev and Lindeberg [Laptev2003, Laptev2005]. In human motion the two scales are interdependent to some extent, owing to physical and physiological constraints (people cannot perform activities of arbitrary intensity at arbitrary frequency because the available power is limited), but that still leaves open the problem of scale changes caused by viewpoint changes. In this task, an efficient mechanism for dealing with scale changes in the spatial and temporal domains will be investigated.
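For concreteness, the sketch below shows the brute-force baseline mentioned above: resampling a clip into a small spatio-temporal pyramid, coarser in space (viewpoint and zoom changes) and in time (faster or slower execution of the same action). The scale factors are arbitrary, and this is only a baseline against which a more efficient mechanism would be compared, not the mechanism to be developed in this task.

```python
# Minimal sketch of brute-force multi-scale processing in space and time:
# resample a (T, H, W) clip into a small spatio-temporal pyramid.
import numpy as np
from scipy.ndimage import zoom


def spatiotemporal_pyramid(clip, spatial_scales=(1.0, 0.5), temporal_scales=(1.0, 0.5)):
    """Return {(s_xy, s_t): resampled clip} for a (T, H, W) grayscale clip."""
    pyramid = {}
    for s_xy in spatial_scales:
        for s_t in temporal_scales:
            # zoom factors are ordered (time, height, width)
            pyramid[(s_xy, s_t)] = zoom(clip, (s_t, s_xy, s_xy), order=1)
    return pyramid


if __name__ == "__main__":
    clip = np.random.rand(16, 64, 64)
    for key, level in spatiotemporal_pyramid(clip).items():
        print(key, level.shape)
```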
