Pointing and command gestures under mixed illumination
conditions: video sequence dataset

M. B. Holte and M. Stoerring

This text as pdf (1MB)

Introduction

Robust computer-vision-based gesture recognition is important for future human-computer interfaces. In collaborative Augmented and Virtual Reality, e.g. for urban planning, where several people work together on a virtual model of a town, a set of hand gestures such as select, copy, paste and move would be more convenient than using devices like the mouse.

A typical place to use (future) gesture interfaces is an office environment with a cluttered background and mixed illumination conditions. The combination of artificial indoor illumination and outdoor illumination (through windows) may cause very high intensity variations and large changes in the colour of the illumination, ranging from rather bluish outdoor to yellowish-reddish indoor light. A cluttered background and such illumination conditions often make the low-level segmentation of computer-vision-based gesture interfaces fail. In particular, skin-colour-like objects and illumination colour changes are difficult to cope with, whereas the problem of high intensity ranges will be solved by future camera technology – high dynamic range cameras are already available that can capture much higher ranges than the human eye.

The sequences of this database were recorded using a rather high quality (low noise) 3CCD computer vision camera. The scene contains objects that appear skin-colour-like under certain illuminations, and the illumination colour ranges from indoor to outdoor illumination.

The range of intensities is, however, not as drastic as it would be when mixing indoor and outdoor light sources. All light sources used are artificial, and their intensities were adjusted such that the number of under- and overexposed pixels is small. Although this is not realistic, it is not considered a “constraint”, since camera technology that can cope with high intensity ranges is on its way.

This website documents the scene setup and camera/software settings utilized for capturing the 16 video sequences of hand gestures, which can be downloaded here.

The document covers the chosen gesture vocabulary, the scenario script performed by the test persons, the scene setup, the camera calibration and software settings, and the annotation and download of the video sequences. This information can be found in the following sections:

Gesture Vocabulary
Scenario Script
Scene Setup
Camera Calibration
Annotation
Gesture Video Sequence Database

Gesture Vocabulary

The gesture vocabulary consists of 13 gestures, of which 9 are static and 4 are dynamic. All other hand movements and postures are classified as an “unspecified gesture”. The 13 chosen gestures are shown in figures 1 – 13; figures 1 – 9 illustrate the static gestures and figures 10 – 13 the dynamic ones.

 

Figure 1: Residue.

Figure 2: Point.

Figure 3: Copy.

Figure 4: Paste.

Figure 5: Properties.

Figure 6: Deselect.

Figure 7: Menu.

Figure 8: Delete.

Figure 9: Yes/confirm.

 

Figures 1 – 9: Static gestures.

 

 

Figure 10: Select.

Figure 11: Select all.

Figure 12: Scale.

Figure 13: No/undo.

 

Figures 10 – 13: Dynamic gestures.

 

Scenario Script

The scenario script is created to make the test persons imagine that they are interacting with the objects placed on the table. Furthermore, some of the objects are moved during the scenario to introduce background changes in addition to hand movements.

The full scenario script is listed in the following.

 

Scenario Script for Test Persons

 

  1. Start with the fist in the middle of the camera view.

  2. Point with the index finger and move the hand to the pen.
     Select (click) the pen and move to the paper.
     Move to the pencil case, close the hand, deselect and close the hand again.

  3. Add the marker pen to the left of the compass, with the other hand.
     Remove the other hand from the field of view.

  4. Point and move to the rubber, select (click) it, close the hand, copy and close the hand again.
     Point and move to the book, close the hand, paste and close the hand again.

  5. Point and move to the paper, select (click) it, close the hand, delete and close the hand again.
     Confirm no and close the hand.

  6. Shift the position of the pencil case and the pen, with both hands.
     Close the hands and remove the other hand from the field of view.

  7. Point and move to the calculator, select (click) it, close the hand, choose properties and close the hand again.
     Confirm yes and close the hand.

  8. Point and move to the compass, select (click) it, close the hand, scale up, down and up again, and close the hand again.

  9. Point and move to the middle of the view, close the hand, select all objects, close the hand, deselect them and close the hand again.

  10. Point and move to the book, close the hand, display the menu and close the hand again.
      Confirm yes and close the hand.

  11. Point and move to the middle of the field of view and make the fist gesture.

 

 

Additionally, the test person has to comply with the following restrictions:

  1. The test person should move slowly; any fast movements will blur the sequences.

  2. The gestures should be performed distinctly; for instance, the fingers should be fully stretched.

Scene Setup

The captured scene is a messy table environment with typical objects for paper work. As the test persons interact with these objects, it is important that the scene setup does not get too complex, otherwise the test persons may get confused. The objects on the table are placed with a large enough distance to the edge of the field of view to ensure that the whole hand is captured during the recording process.

The light setup is arranged so that the table is split into two parts with the same intensity (measured with a luxmeter). One side of the table has a colour temperature of 2600 K and the other 4700 K. The two light sources are of type Photax 3200K head with Philips PF 308 E12 Argaphoto-B 240 V / 500 W light bulbs. The scene setup is illustrated in figures 14 – 17.

 

Figure 14: Scene setup including a test person observed from the front.

Figure 15: Scene setup including a test person observed from the side.

Figure 16: Hands illuminated by the split light configuration.

Figure 17: The messy table environment captured by the utilized camera.

 

Figures 14 – 17: The scene setup.

 

Camera Calibration

The utilized camera and frame grabber are of type JAI CV-M90 and Picasso PCI-3C, respectively. The aperture of the camera is set to 2.2, white balancing has been performed at a colour temperature of 3040 K, and the camera is calibrated to have an offset/dark current close to zero. Further camera and frame grabber software settings concerning iris control, RGB gain and offset are illustrated in figures 18 and 19.

 

Figure 18: Picasso PCI-3C frame grabber software settings.

Figure 19: ArvooCord recording software settings for JAI CV-M90 camera.

 

Figures 18 and 19: Camera and frame grabber software settings.

 

To investigate the linearity of the camera response, a series of images of a static scene is captured while changing the exposure. Figure 20 shows the camera response function as the relation between intensity and exposure at a certain point in the image sequence. Figure 21 shows the intensity values for five of the neutral coloured squares of the Macbeth ColorChecker captured with the camera. As the figures illustrate, the camera response is nearly linear, and no further gamma correction or nonlinearity compensation has to be applied.

Figure 20: Camera response based on images captured with different exposures.

Figure 21: Camera response based on Macbeth Y-values.

Figures 20 and 21: JAI CV-M90 camera response function.
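Such a linearity check can be reproduced on one's own measurements with a simple least-squares fit. The sketch below uses made-up exposure/intensity pairs as placeholders, not the actual measurements behind figures 20 and 21; a coefficient of determination close to 1 indicates an effectively linear response.

```python
import numpy as np

# Illustrative exposure/intensity pairs (placeholders, not the
# actual measurements behind figures 20 and 21).
exposure = np.array([1.0, 2.0, 4.0, 8.0, 16.0])         # relative exposure
intensity = np.array([16.0, 31.0, 63.0, 127.0, 252.0])  # mean pixel value

# Least-squares fit of intensity = a * exposure + b.
a, b = np.polyfit(exposure, intensity, 1)

# Coefficient of determination: close to 1.0 means the
# camera response is effectively linear.
pred = a * exposure + b
ss_res = np.sum((intensity - pred) ** 2)
ss_tot = np.sum((intensity - intensity.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
print(f"slope={a:.2f}, offset={b:.2f}, R^2={r_squared:.4f}")
```

A notably lower R^2, or a large offset b, would indicate that gamma correction or an offset calibration is still needed.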

A final check is carried out to investigate the RGB gains. This is done to avoid saturated intensity values, or values close to zero, in important parts of the sequences. Figure 22 shows some selections on a captured image, which are processed to calculate the minimum, maximum and mean intensity values for each RGB colour channel. Furthermore, the r and g chromaticities are computed and plotted to check the influence of the split light configuration with two different colour temperatures.

 

Figure 22: Investigation of the RGB gains to check intensity and chromaticity values.
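The per-channel statistics and rg chromaticities described above can be sketched as follows; the 4x4 RGB patch is a made-up placeholder for a selection from a captured frame, not data from figure 22.

```python
import numpy as np

# Placeholder image region (4x4 pixels, uniform RGB = (200, 120, 80)),
# standing in for a selection on a captured frame.
patch = np.array([[[200.0, 120.0, 80.0]] * 4] * 4)

# Minimum, maximum and mean intensity per RGB colour channel.
for i, name in enumerate("RGB"):
    ch = patch[..., i]
    print(f"{name}: min={ch.min():.0f} max={ch.max():.0f} mean={ch.mean():.0f}")

# Normalised rg chromaticities: r = R/(R+G+B), g = G/(R+G+B).
# These are intensity-invariant, so a difference between the 2600 K
# and 4700 K halves of the table shows up as a shift in (r, g).
total = patch.sum(axis=-1)
r = patch[..., 0] / total
g = patch[..., 1] / total
print(f"mean r={r.mean():.3f}, mean g={g.mean():.3f}")
```

In practice one would run this over one selection per table half and compare the two (r, g) clusters.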

 

Annotation

The recorded video sequences are annotated with the program FGAnno to provide the ground truth for all sequences. This program was specially developed to annotate video sequences with hand gestures. The user interface can be seen in figure 23. The structure of the annotation files is illustrated in figure 24 and further specified in the following.

Each annotation file lists the test person at the top of the file, followed by the list of gestures. The EVENTLIST lists all the events a test person has performed during a sequence. The elements each event contains are illustrated in figure 24.

Figure 23: The utilized annotation program with Person Lister, Gesture Lister, Event Lister, Video Viewer and Annotation windows.

Figure 24: Structure of an annotation file with Person List, Gesture List, Annotation Layers and Event List.

Figure 23 and 24: Annotation program and file.

 

Further Annotation Work

Further annotation work concerns layers 1 and 2, which describe the hand's posture and movement at pixel and image level. Annotation layer 1 includes information about the pixel positions of a contour of the hand, and layer 2 concerns tracking of all visible fingers. The FGAnno user interface for annotation on layers 1 and 2 is shown in figure 25.

Figure 25: Annotation program setup for annotation on layers 1 and 2.

 

Gesture Video Sequence Database

The video sequences are recorded in PAL resolution, 768 x 576, and each image in the video sequences is named sequenceType_personID_skinType_sex_frameNumber.fileName. The sequenceType is set to A for all the available sequences, since all sequences have been recorded as the same type of sequence. The personID is a number from 01 to 16, the skinType is one of (White, Middle east, Yellow, Black), the sex is either (Male, Female), and the frameNumber is a four-digit number (0000 >> XXXX). The annotation files are named in the same way, just without the frameNumber.
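A minimal sketch of parsing this naming scheme is shown below; the example filename and the .bmp extension are assumptions, not taken from the dataset. Note that the skinType may contain a space ("Middle east") but no underscore, so splitting on "_" is safe.

```python
def parse_name(filename):
    """Split sequenceType_personID_skinType_sex_frameNumber.ext
    into its components (naming scheme as described above)."""
    stem, _, ext = filename.rpartition(".")
    seq_type, person_id, skin_type, sex, frame = stem.split("_")
    return {
        "sequenceType": seq_type,   # always "A" for these sequences
        "personID": person_id,      # "01" .. "16"
        "skinType": skin_type,      # White / Middle east / Yellow / Black
        "sex": sex,                 # Male / Female
        "frameNumber": int(frame),  # four-digit frame index
        "extension": ext,
    }

# Hypothetical example filename (extension assumed):
info = parse_name("A_07_Middle east_Female_0042.bmp")
print(info["personID"], info["skinType"], info["frameNumber"])
```

The annotation files can be handled the same way by dropping the frameNumber field from the split.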

A compressed video sequence can be found here. The full sequences, downsampled to 384 x 288, can be downloaded here. If you are interested in the full size images, please contact: fgnet AT cvmt DOT dk


Last modified: Wed Nov 17, 2004