Activity recognition results on UCF Sports and Hollywood2

The table above shows the results obtained on the UCF Sports dataset (http://crcv.ucf.edu/data/UCF_Sports_Action.php). We report recognition rate with respect to the number...


Computational efficiency and parallel implementation

The developed algorithms are computationally efficient, and the compositional processing pipeline is well suited for implementation on massively parallel architectures. Many...


Motion hierarchy structure

Our model is comprised of three processing stages, as shown in the Figure. The task of the lowest stage (layers...


Server crash

After experiencing a total server failure, we are back online. We apologize for the inconvenience - we are still in...


L1: motion features

Layer L1 provides the input to the compositional hierarchy. Motion obtained in L0 is encoded using a small dictionary.


Problem identification

Action and activity recognition and categorisation under real-world conditions are of crucial importance for awareness of one's environment and for interaction with one's surroundings. Perception of motion plays a central role in biological visual systems. Sophisticated mechanisms for observing, extracting, and utilizing motion exist even in primitive animals [Ullman1981]. For humans, successful motion processing is a prerequisite for accomplishing many everyday tasks [Orban2008]. Given the crucial importance of motion in biological systems, there has been great interest in motion-related research in the computer vision and artificial intelligence communities, as they strive to bring their algorithms and applications closer to the real world and into everyday use [Aggarwal2011].

Current state-of-the-art computer vision methods work well for problems within limited domains and for specific tasks, and activity recognition and categorisation is no exception [Niebles2008]. However, when such methods are applied in more general settings, they become brittle, much less efficient, or even computationally intractable. In a nutshell, classic approaches are not general, and they do not scale well. Consequently, new paradigms that would alleviate those problems are constantly sought.

Scientific advances in recent years, especially in the field of neuroscience [Orban2008], have provided inspiration and insights that have given rise to novel approaches in computer vision [Pinto2009]. These methods do not aim to duplicate the functionality of the human brain exactly; rather, their goal is to improve the performance of computer vision methods using selected design principles inspired by human and primate visual systems. In some areas, those efforts have already begun to bear fruit, significantly improving the performance of computer vision methods, especially when applied to complex, real-life problems.

One of the most important design principles that appears to have potential for robust and scalable solutions is the concept of hierarchical compositionality [Bienenstock1994]. There are many reasons to favor hierarchical approaches in computer vision. The relation to human perception is only one of them and, in terms of computer vision, not the most important one. The most important reason is that the hierarchical compositionality approach allows much more efficient use of available resources than is possible with other state-of-the-art approaches [Fidler2010a]. These properties have been demonstrated primarily for modeling visual shape categories. What remains an open research issue is whether a similar approach can be applied to processing motion. Showing that motion information can be systematically learned in a number of hierarchical stages (from local to global, from specific to more abstract / invariant) and then inferred in an efficient process would provide new insights into building robust computer vision systems.

The success of hierarchical compositionality models can be explained as follows. Any computer vision method faces an inherent problem of knowledge representation. This problem is especially acute for complex problems, where the knowledge is correspondingly complex. A flat representation of knowledge retains this complexity. For example, in object recognition and object categorization tasks, we deal with many objects that may have shared properties but are otherwise distinct. If such a task is applied to large problems of a general nature, all variations of the objects have to be stored in the knowledge base. For real-world problems, the dimensionality of such a task may be prohibitive. With a hierarchical compositionality model, on the other hand, the knowledge is spread throughout the visual hierarchy, so properties shared between different observations need to be encoded only once. Such a knowledge representation is significantly more flexible, scales better, and generalizes well. The idea that knowledge is spread throughout the visual hierarchy is also consistent with the current understanding of human visual processing [Hawkins2004].
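The storage argument above can be illustrated with a toy example. The sketch below is not the project's model: the part names (`edge_h`, `corner`, `house`, and so on) and the cost measure (one unit per stored primitive or per reference to a child part) are assumptions chosen only to contrast a fully expanded, flat representation with one in which each shared part is stored once.

```python
# Toy compositions: a primitive is a string, a composition is a tuple of parts.
# All names here are hypothetical and serve only as an illustration.
corner = ("edge_h", "edge_v")
square = (corner, corner, corner, corner)
window = (square, square)
door = (square, corner)
house = (window, door, square)
objects = [square, window, door, house]


def flat_cost(node):
    """Primitive entries needed to store `node` fully expanded."""
    if isinstance(node, str):
        return 1
    return sum(flat_cost(child) for child in node)


def build_library(items):
    """Collect every distinct part (primitive or composition) exactly once."""
    library = set()

    def visit(node):
        if node in library:
            return  # shared part: already stored, nothing to add
        library.add(node)
        if not isinstance(node, str):
            for child in node:
                visit(child)

    for item in items:
        visit(item)
    return library


def hierarchical_cost(library):
    """One unit per primitive, one reference per child of each composition."""
    return sum(1 if isinstance(n, str) else len(n) for n in library)


flat_total = sum(flat_cost(obj) for obj in objects)
shared_total = hierarchical_cost(build_library(objects))
print(flat_total, shared_total)
```

In this toy setting the flat representation stores every object down to its primitives, while the shared library stores `corner` and `square` once and refers to them from the larger compositions, so its cost grows with the number of distinct parts rather than with the number of times they are reused.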

The inclusion of hierarchical compositionality models in computer vision algorithms has been gradual, and the analysis of motion is no exception. Currently, state-of-the-art approaches to motion analysis use hierarchy only in parts of the visual processing pipeline, complementing it with well-known and tested algorithms for low-level image processing and classification [Pinto2009]. When used this way, only a few levels of hierarchy are needed. Even with this approach, significant improvements in the performance of motion analysis methods, especially in activity recognition, have been reported.

Our aim is to take this research a step further and employ hierarchical compositionality across the whole motion processing pipeline. In contrast to state-of-the-art approaches, we plan to base our algorithm on extremely simple motion detection units and to employ learning even at the lowest stages of the hierarchy. We then plan to introduce additional hierarchy levels, which would offload a significant amount of complexity from the end-stage classifier to the hierarchical structure itself. We aim to develop a model that is general in nature, meaning that it will be useful for different motion-related tasks. Finally, we aim to combine this model with the already developed hierarchical compositional model for shape representation, resulting in a combined model that would outperform either of the two separate models.


To apply such a framework to motion perception successfully, several problems need to be tackled.

Hierarchical structure

The structure of the hierarchical compositional model itself needs to be devised. The following research questions have to be addressed.

  1. What is the most appropriate basic design for the initial elementary detectors on the lowest level of the hierarchy?
  2. How can the spatial and temporal aspects of motion be mapped into a hierarchical spatio-temporal structure?
  3. How can spatial and temporal scale be introduced into the hierarchical structure?
  4. How can appropriate invariance and abstraction be achieved on each level of the hierarchy?
  5. How can the connections between the elements of the hierarchy be learned, and how can inference be performed efficiently?

Parameters

The parameters will be obtained by testing the model with real-world visual data, obtaining the model output, and examining the output for conformance with the statistics of natural images. Among other things, the dimensionality and granularity of the motion hierarchy will be established.

Validation and adjustment

The primary goal of our research is a generative model. Nevertheless, for use in real computer vision problems, the model will have to exhibit appropriate discriminative properties as well. For this purpose, the model will have to be evaluated on real-world data and compared to state-of-the-art results. Based on those results, the model will be refined and its structure re-examined. The model will also be evaluated in terms of storage efficiency: it should store the learned information much more efficiently than is currently possible with state-of-the-art methods, at comparable recognition accuracy. Other performance evaluation criteria will be the efficiency of inference and learning, transfer of knowledge, and generalisation capabilities.

Integration with shape-based hierarchical compositionality model

In biological systems, motion is only one of the cues exploited for observing and understanding one's environment. It is known that the motion pathway is interconnected with the shape-processing pathway [Beck2010]. Therefore, we will also look at the computational reasons and advantages for combining shape and motion hierarchies. We expect that the newly developed hierarchical compositional motion model will be integrated with the existing shape-based hierarchical compositionality model. The nature and the appropriate extent of communication between the two models will be studied through further testing and adaptation of both models. The end result of this integration will be a composite model that can use both shape and motion information to understand real-world environments.

Efficient parallel implementation

To enable a productive pace of development, parallel versions of the algorithms will be developed through several phases of the project. This will allow real-time or near real-time implementations of the low-level algorithms on state-of-the-art, massively parallel architectures such as Nvidia CUDA, and significantly improved performance of the higher-level algorithms on modern multi-core processors. We expect that the developed algorithms can be ported to parallel architectures with relative ease, since modern massively parallel architectures correspond well to the lower levels of the human visual system, and the hierarchical compositional structure offers ample opportunities for task parallelization on the higher levels of the structure.
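As a rough illustration of why low-level motion processing decomposes well into independent tasks, the sketch below computes a per-pixel temporal difference between two frames, with one task per image row. Frame differencing is used here only as a stand-in for an elementary motion detector, and the function names and the `workers` parameter are hypothetical. Python threads are used purely to show the decomposition; an actual speed-up would come from a GPU or multi-core implementation of the kind described above.

```python
from concurrent.futures import ThreadPoolExecutor


def row_motion(rows):
    """Absolute intensity change for one image row; needs no other rows."""
    prev_row, curr_row = rows
    return [abs(c - p) for p, c in zip(prev_row, curr_row)]


def motion_map(prev_frame, curr_frame, workers=4):
    """Per-pixel temporal difference, computed as one independent task per row."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so rows come back in image order.
        return list(pool.map(row_motion, zip(prev_frame, curr_frame)))


# Two tiny grayscale frames (lists of rows) as made-up example data.
prev_frame = [[0, 0, 0], [10, 10, 10]]
curr_frame = [[0, 5, 0], [10, 10, 20]]
result = motion_map(prev_frame, curr_frame)
print(result)
```

Because each output pixel depends only on the same pixel in two frames, the work can be split by row, tile, or pixel without any communication between tasks, which is the property that makes such operations a natural fit for massively parallel hardware.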

Consortium

In line with the competences required of the project partners, we have composed our consortium from four groups: the Visual Cognitive Systems Laboratory (FRI VICOS) at the Faculty of Computer and Information Science and the Machine Vision Laboratory (FE MVL) at the Faculty of Electrical Engineering, both at the University of Ljubljana, and the Department of Automatics, Biocybernetics, and Robotics (IJS DABR) and the Department of Communication Systems (IJS DCS), both at the Jožef Stefan Institute. Each group specializes in a particular topic that will be indispensable for achieving the project's goal. The VICOS group has recently developed a methodology for automatic construction (learning) of, and inference in, compositional models for shape categorization. The MVL group has a long history of research in motion tracking and analysis. The expertise of VICOS and MVL will play a central role in designing the methodology of hierarchical compositional models of motion. The DCS group specializes in parallel algorithms and parallel architectures and will contribute to a fast implementation of our algorithms. The DABR group focuses on cognitive robotics and will provide an experimental platform for evaluating the implementations.
