Most current research in computer vision attempts to model generic capabilities: image feature detection and region segmentation, stereo and motion perception, object recognition and tracking, and so on. However, these general approaches are not sufficient in themselves to cope with the wide variability of real-world scenes and the task-specific requirements of many applied vision systems. The subfield of visual interpretation and understanding combines techniques from AI and knowledge-based systems with computer vision techniques to deliver enhanced functionality in such systems. Naturally, we encounter many of the major issues in AI, such as knowledge representation and reasoning, control, the handling of uncertainty, and machine learning. Much of the work assumes that knowledge drives reasoning in visual interpretation, using expectation-driven or hypothesis-driven processing. This means we take sides in one of the great debates in vision, since knowledge-based vision has a major ``seeing as'' bias rather than relying on generic processing. In our subfield, visual context is seen as essential for understanding what is depicted in images or image sequences. If we are to build efficient systems that can tackle many different tasks, high-level attention and control of the processing are also essential. In addition, if we are to use scene and task knowledge, we have to address the question of how that knowledge can be acquired. Formal structural and propositional knowledge has to be designed by hand but, as we will see, representations can also be learnt.
Research in this subfield goes beyond recognising features and objects to produce descriptions of scene content that are meaningful to the observer or user of the system. To achieve this, we build in domain-specific knowledge of the scene and tasks by representing prior knowledge in a readily accessible form. For example, VIEWS [12,18], a major European knowledge-based vision project, developed advanced visual surveillance capabilities using a mixture of constraint-based, model-based, logic-based and probabilistic interpretation techniques. Demonstrations showed that performance in object detection and tracking was much improved by scene-based knowledge of expected object trajectories, size and speed [26,27]. In addition, both scene-based and task-based knowledge allowed selective processing under attentional control for behavioural evaluation in traffic scenes [31,32]. There are many such applications in which prior scene and task knowledge provides context for conditioning computer vision algorithms.
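To make the idea of conditioning a vision algorithm on scene knowledge concrete, the following is a minimal sketch of how prior expectations about location, size and speed might be used to reject implausible candidate detections. It is an illustration only and does not reproduce the VIEWS implementation: the names `Detection`, `ScenePrior`, `is_plausible` and `filter_detections`, the rectangular corridor standing in for an expected trajectory, and all numeric thresholds are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    """A candidate object reported by a generic detector (hypothetical format)."""
    x: float      # image-plane position
    y: float
    area: float   # apparent size in pixels
    speed: float  # estimated speed in pixels per frame


@dataclass(frozen=True)
class ScenePrior:
    """Hand-designed scene knowledge for one region (all values are assumed)."""
    x_min: float  # rectangular corridor approximating the expected trajectories
    x_max: float
    y_min: float
    y_max: float
    min_area: float  # expected range of object size
    max_area: float
    max_speed: float  # upper bound on expected speed


def is_plausible(det: Detection, prior: ScenePrior) -> bool:
    """Return True if the detection agrees with every scene expectation."""
    in_corridor = (prior.x_min <= det.x <= prior.x_max
                   and prior.y_min <= det.y <= prior.y_max)
    size_ok = prior.min_area <= det.area <= prior.max_area
    speed_ok = det.speed <= prior.max_speed
    return in_corridor and size_ok and speed_ok


def filter_detections(dets: List[Detection], prior: ScenePrior) -> List[Detection]:
    """Keep only the detections that the scene prior considers plausible."""
    return [d for d in dets if is_plausible(d, prior)]


# Example: a road lane modelled as a horizontal corridor in the image.
prior = ScenePrior(x_min=0, x_max=100, y_min=40, y_max=60,
                   min_area=200, max_area=2000, max_speed=15)
candidates = [
    Detection(x=50, y=50, area=800, speed=10),  # consistent with the prior
    Detection(x=50, y=5, area=800, speed=10),   # outside the corridor
    Detection(x=20, y=45, area=30, speed=3),    # too small
    Detection(x=70, y=55, area=900, speed=40),  # too fast
]
kept = filter_detections(candidates, prior)
```

A hard accept/reject test is the simplest choice for a sketch; a probabilistic system would more plausibly weight each candidate by its likelihood under the prior rather than discard it outright.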