In this paper we describe a system that makes context-based decisions about the actions of people in a room. These actions include entering a room, using a computer terminal, opening a cabinet, and picking up a phone. Our system recognizes these actions by using prior knowledge about the layout of the room. The ideas presented in this system are applicable to automated security. The low-level computer vision techniques of tracking, skin detection, and scene change detection are used in our system to help perform action recognition. The output of the system is both a textual description and a key-frame description of the recognized actions.
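To make the idea of layout-driven recognition concrete, the following is a minimal sketch, not the system's actual method. It assumes the room layout is given as labelled rectangles in image coordinates, that a tracker supplies a bounding box for the person, and that scene change detection supplies boxes of changed pixels. The names `Zone`, `overlaps`, and `recognize`, the rule itself, and all coordinates are hypothetical; skin detection is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Axis-aligned box in pixels: (x_min, y_min, x_max, y_max).
Box = Tuple[int, int, int, int]


@dataclass
class Zone:
    """One entry of the prior room layout (hypothetical structure)."""
    name: str    # object in the room, e.g. "cabinet"
    box: Box     # where the object appears in the image
    action: str  # text reported when the object is used


def overlaps(a: Box, b: Box) -> bool:
    """Return True if two boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def recognize(person: Box, changed: List[Box], layout: List[Zone]) -> List[str]:
    """Report an action for each zone that the tracked person overlaps
    and in which a scene change was detected (assumed rule)."""
    actions = []
    for zone in layout:
        person_near = overlaps(person, zone.box)
        object_changed = any(overlaps(c, zone.box) for c in changed)
        if person_near and object_changed:
            actions.append(zone.action)
    return actions


# Illustrative layout and detections; all values are made up.
layout = [
    Zone("cabinet", (0, 50, 40, 150), "opened cabinet"),
    Zone("phone", (200, 80, 230, 100), "picked up phone"),
]
person = (20, 40, 70, 200)    # from the tracker
changed = [(5, 60, 35, 140)]  # from scene change detection

print(recognize(person, changed, layout))
```

In this sketch the layout acts as the context: the same low-level detections would be reported as a different action, or as none, if the person stood next to a different object.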
It is a technology with many possible uses, but it is being promoted as a surveillance tool for monitoring employees and their actions.