Pointing Gestures: video sequence database

Introduction

This database consists of 24 video sequences of hands pointing onto a desk. The sequences are recorded with a head mounted camera (HMC) under three different lighting and background setups. It was originally recorded as part of the FGnet project for the Pointing'04 ICPR Workshop (Cambridge, United Kingdom - 22 August 2004).

The focus and guidelines of the recording were to make the sequences simulate a person operating an augmented reality system on top of a messy office desk.

As the focus of the workshop is pointing, the gestures in the sequences are divided into only three categories (pointing with the thumb out, pointing with the thumb in, and not pointing).

The research task at hand is to classify each image in each sequence into one of these three categories.

The test persons were told that they should think of the hand as a pointer, and the thumb as a left mouse button. This was done to ensure that they would spend a significant part of the time pointing.

Point with the thumb out
Pointing with the thumb in
Not pointing
It was also decided not to give the test persons too many instructions, to ensure that they would behave more naturally during recordings. This also meant that re-recordings of sequences had to be kept to a minimum, since the test persons quickly changed their behaviour after only a few recordings. The test persons' movements were also restricted to ensure certain lighting conditions.

The following sections describe each aspect of the recording and the post-processing of the sequences.

Representation of the capture environment.

Capture setup

The capture setup was made with focus on controlling the lighting conditions of the scene. Each test person was given a short instruction and placed in front of the scene, so that all sequences would have a similar setup.

Each recording was planned to last a little longer than a minute. Depending on the test person's reaction time and the sequence pre-processing, the length of each sequence varies by several seconds. The captures were made with three different setups:

X:     The scene background is a table covered with a black piece of fabric, a Macbeth color-checker, and two small pieces of light gray rubber. The rubber pieces were placed in the scene to give the test persons some objects to interact with. During recordings the light in the scene is switched back and forth between four different light sources, which is described in more detail below.
Y:     The scene background is a messy table with a lot of moveable objects, and a Macbeth color-checker. The test persons were asked to interact with as many objects as they felt like, but not to start reading any of the books. During recordings the light in the scene was fixed to a specific light type, which is described in more detail below.
Z:     The scene background is a messy table with a lot of moveable objects, and a Macbeth color-checker. The test persons were asked to interact with as many objects as they felt like, but not to start reading any of the books. During recordings the light in the scene is switched back and forth between four different light sources, which is described in more detail below.

Lighting conditions

The scene was illuminated by four different light sources and some faint normal artificial office lights. The four primary lights each consisted of a pair of fluorescent tubes with a specific color temperature. The Y sequences were recorded under light #2 only.

Light 1 : Philips : TLD 58W/965 (6200K)
Light 2 : Philips : TLD 58W/950 (4700K)
Light 3 : Philips : TLD 58W/940 (3680K)
Light 4 : Philips : TLD 58W/927 (2600K)
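For code that needs to reason about the lighting, the lamp-to-color-temperature mapping above can be kept in a small lookup table. A minimal Python sketch (the names `LIGHT_TEMPERATURE_K` and `color_temperature` are illustrative, not part of the database):

```python
# Color temperature (in kelvin) of each primary light source,
# taken from the Philips TLD 58W tube list above.
LIGHT_TEMPERATURE_K = {
    1: 6200,  # TLD 58W/965
    2: 4700,  # TLD 58W/950 (used for all Y sequences)
    3: 3680,  # TLD 58W/940
    4: 2600,  # TLD 58W/927
}

def color_temperature(light_id):
    """Return the color temperature in kelvin for a light source id (1-4)."""
    return LIGHT_TEMPERATURE_K[light_id]
```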

Capture

Representation of the HMC setup.

The sequences were recorded with a head mounted JAI-CVM2250 camera with the following settings:

Lens aperture: Fully opened.
Gamma correction: None.
Y/C Reversal: Y/C Positive
Shutter speed: 1/250
Gain: 140
Levels (Black / White): (2/254)
Chroma (R-Y / B-Y): (136/96)
Backlight: Off
WBC: 4600K


As described above, the camera was set to a fixed gain and a color temperature of 4600K. The sequences were recorded as 768x584 pixel interlaced video with a frame rate of 25 Hz.

Provided data

The original database made for this workshop contains approximately 45 GB of data in compressed form. As a result of this, a smaller downscaled version of the database is used instead. Each sequence was downscaled to a 384x292 pixel non-interlaced version, and only 24 of the original 40 sequences are used.
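The downscaling halves each dimension of the original 768x584 frames. A quick Python sanity check (a sketch, not part of the database tools):

```python
ORIGINAL = (768, 584)    # width, height of the interlaced source video
DOWNSCALED = (384, 292)  # width, height of the distributed sequences

def downscale(size, factor=2):
    """Divide each dimension by the downscale factor (2 for this database)."""
    width, height = size
    return (width // factor, height // factor)
```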

The size of the compressed database is 7 GB.

Image sequences

The sequences are provided as sets of PNG format images, (View sequence size), with lossless compression. The .png files are named sequenceType_personID_skinType_sex_frameNumber.png. The sequence types are described above; personID is a number from 01 to 15, skinType is one of (White, Yellow, Black), sex is either (Male, Female), and frameNumber is a four digit number (0000 >> XXXX).
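Given the naming scheme above, the metadata fields can be recovered from a filename with a regular expression. A Python sketch (the function name is illustrative, and the example filename in the test is made up to match the scheme, not taken from the database):

```python
import re

# Pattern for the naming scheme described above:
# sequenceType_personID_skinType_sex_frameNumber.png
FILENAME_RE = re.compile(
    r"^(?P<sequence_type>[XYZ])_"
    r"(?P<person_id>\d{2})_"
    r"(?P<skin_type>White|Yellow|Black)_"
    r"(?P<sex>Male|Female)_"
    r"(?P<frame_number>\d{4})\.png$"
)

def parse_frame_name(filename):
    """Split a frame filename into its metadata fields, or return None."""
    match = FILENAME_RE.match(filename)
    return match.groupdict() if match else None
```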

To ease downloading of the database, all sequences are compressed into zip files (one file for each sequence: sequenceType_personID_skinType_sex.zip).

The workshop image sequences can be downloaded HERE!

The original ~45 GB database is also available for download HERE (/original_db/).

Ground truth

To provide a ground truth for all sequences, each sequence was annotated with an annotation program specially built for this purpose. The program was built as part of the FG-Net project, and can be downloaded from : http://www.cvmt.dk/~fgnet/apps.html.

The transition between pointing and not pointing is not always clear in the sequences, so an (Uncertain) annotation was used whenever the annotator was in doubt.

The annotation uses four different categories:
 HAND : A hand is present in the picture, but NOT pointing.
 POINT : A pointing hand (with thumb IN) is present in the picture.
 POINT_THUMB : A pointing hand (with thumb OUT) is present in the picture.
 UNCERTAIN : A hand is present in the picture, but two or more gestures might be correct.

Finally, all pictures not containing any part of a hand are left without annotation.
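For processing the ground truth, the four annotation categories can be represented as an enumeration. A Python sketch (the type name is illustrative; frames without a hand carry no annotation and are therefore not a member of the enumeration):

```python
from enum import Enum

class Gesture(Enum):
    """Annotation categories used in the ground-truth files."""
    HAND = "HAND"                # hand present, but NOT pointing
    POINT = "POINT"              # pointing hand, thumb IN
    POINT_THUMB = "POINT_THUMB"  # pointing hand, thumb OUT
    UNCERTAIN = "UNCERTAIN"      # hand present, two or more gestures possible
```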

Each test person alternated between using the left and the right hand; therefore, the annotation was done for each hand separately, resulting in two files for each sequence.

The annotation files are named sequenceType_personID_skinType_sex_hand.txt, and can be downloaded HERE.

Instructions on how to interpret the annotation files can be found HERE

Last modified: Thursday November 18, 2004