Pointing Gestures: video sequence database
This database consists of 24 video sequences of hands
pointing onto a desk. The sequences were recorded with a head-mounted
camera (HMC) under three different lighting and background setups. It was
originally recorded as part of the
FGnet for the
Pointing'04 ICPR Workshop
(Cambridge, United Kingdom - 22 August 2004).
The recording guidelines aimed to make the sequences
simulate a person operating an augmented reality system on top of a
messy office desk.
As the focus of the workshop is pointing, the gestures in the sequences
are divided into only three categories (pointing with the thumb out,
pointing with the thumb in, and not pointing).
The research task at hand is
to classify each image in each sequence into one of these three
categories.
The test persons
were told that they should think of the hand as a pointer, and the
thumb as a left mouse button. This was done to ensure that
they would spend a significant part of the time pointing.
| Pointing with the thumb out | Pointing with the thumb in | Not pointing |
It was also decided not to give the test persons too many instructions,
to ensure that they would react more naturally during recordings. This
also meant that re-recordings of sequences had to be kept to a minimum,
since the test persons quickly changed their behaviour after only a few
recordings. The test persons' movements were also restricted to ensure
certain lighting conditions.
The following sections describe each aspect of the recording and
post-processing of the sequences.
Capture setup
The capture setup was designed with a focus on controlling the lighting
conditions of the scene. Each test person was given a short instruction
and placed in front of the scene, so that all sequences would have a
similar setup.
Each recording was planned to last a little longer than a minute.
Depending on the test person's reaction time and the sequence
pre-processing, the length of each sequence varies by several seconds.
The captures are made with three different setups:
| X: | The scene background is a table covered with a black piece of fabric, a Macbeth color-checker, and two small pieces of light gray rubber. The rubber pieces were placed in the scene to give the test persons some objects to interact with. During recordings, the light in the scene was switched back and forth between four different light sources, which are described in more detail below. |
| Y: | The scene background is a messy table with a lot of moveable objects, and a Macbeth color-checker. The test persons were asked to interact with as many objects as they felt like, but not to start reading any of the books. During recordings, the light in the scene was fixed to a specific light type, which is described in more detail below. |
| Z: | The scene background is a messy table with a lot of moveable objects, and a Macbeth color-checker. The test persons were asked to interact with as many objects as they felt like, but not to start reading any of the books. During recordings, the light in the scene was switched back and forth between four different light sources, which are described in more detail below. |
Lighting conditions
The scene was illuminated by four different light sources and some
faint, ordinary artificial office lights.
Each of the four primary lights consisted of a pair of
fluorescent tubes with a specific color temperature.
The Y sequences were recorded under light 2 only.
| Light 1: | Philips TLD 58W/965 (6200K) |
| Light 2: | Philips TLD 58W/950 (4700K) |
| Light 3: | Philips TLD 58W/940 (3680K) |
| Light 4: | Philips TLD 58W/927 (2600K) |
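As a small illustration, the color temperatures listed above can be compared with the camera's fixed white-balance setting of 4600K (see the WBC entry under Capture). The sketch below is not part of the database tools; the names LIGHTS_KELVIN, CAMERA_WBC_KELVIN, closest_light and offsets_from_camera are made up for this example.

```python
# Color temperatures of the four primary lights, as listed in the table above.
LIGHTS_KELVIN = {1: 6200, 2: 4700, 3: 3680, 4: 2600}

# Camera white-balance setting (WBC) from the capture setup.
CAMERA_WBC_KELVIN = 4600


def closest_light(target_kelvin, lights=LIGHTS_KELVIN):
    """Return the light number whose color temperature is nearest to target_kelvin."""
    return min(lights, key=lambda light_id: abs(lights[light_id] - target_kelvin))


def offsets_from_camera(lights=LIGHTS_KELVIN, wbc=CAMERA_WBC_KELVIN):
    """Return {light number: color temperature minus camera white balance}."""
    return {light_id: kelvin - wbc for light_id, kelvin in lights.items()}


if __name__ == "__main__":
    print("Light closest to the camera setting:", closest_light(CAMERA_WBC_KELVIN))
    print("Offsets in kelvin:", offsets_from_camera())
```

Under these numbers, light 2 lies nearest to the camera setting, so sequences recorded under the other three lights can be expected to show a visible color cast.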
Capture
The sequences were recorded with a head-mounted JAI-CVM2250 camera
with the following setup:
| Lens aperture: | Fully opened |
| Gamma correction: | None |
| Y/C Reversal: | Y/C Positive |
| Shutter speed: | 1/250 |
| Gain: | 140 |
| Levels (Black / White): | (2/254) |
| Chroma (R-Y / B-Y): | (136/96) |
| Backlight: | Off |
| WBC: | 4600K |
As listed above, the camera was set to a fixed gain and a color
temperature of 4600K. The sequences were recorded as 768x584 pixel
interlaced video with a frame rate of 25 Hz.
Provided data
The original database made for this workshop contains, in compressed
form, approximately 45 GB of data. Because of this, a smaller
downscaled version of
the database is used instead.
Each sequence was downscaled to a 384x292 pixel non-interlaced
version,
and only 24 of the original 40 sequences are used.
The size of the compressed downscaled database is 7 GB.
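The exact downscaling method is not stated here. One simple way to get from a 768x584 interlaced frame to a 384x292 non-interlaced one is to keep a single field (every second row) and every second column. The sketch below shows only that arithmetic, under this assumption; the function name keep_one_field_and_halve_width is hypothetical and the frame is synthetic.

```python
FULL_WIDTH, FULL_HEIGHT = 768, 584


def keep_one_field_and_halve_width(frame, field=0):
    """Keep every second row (one interlaced field) and every second column.

    `frame` is a list of rows, each row a list of pixel values.
    `field` selects the even (0) or odd (1) rows.
    This is an assumed method, not necessarily the one used for the database.
    """
    return [row[::2] for row in frame[field::2]]


if __name__ == "__main__":
    # Synthetic frame: each pixel stores its own (row, column) position.
    frame = [[(y, x) for x in range(FULL_WIDTH)] for y in range(FULL_HEIGHT)]
    small = keep_one_field_and_halve_width(frame)
    print(len(small[0]), "x", len(small))
```

Keeping a single field avoids the comb artifacts that appear when the two fields of an interlaced frame are captured at different instants during hand motion.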
Image sequences
The sequences are provided as sets of PNG images with lossless
compression. The .png files are named sequenceType_personID_skinType_sex_frameNumber.png.
The sequenceTypes are described above, personID is a number from 01
to 15, skinType is one of (White, Yellow, Black), sex is
either (Male, Female), and frameNumber is a four-digit number (0000
to XXXX).
To ease downloading of the database, each sequence is compressed
into its own zip file (sequenceType_personID_skinType_sex.zip).
The workshop image sequences can be downloaded HERE.
The original ~45 GB database is also available for download
HERE (/original_db/).
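The naming scheme above can be parsed mechanically. The following is a minimal sketch, assuming the fields appear literally as X, Y or Z for the sequence type, and as White, Yellow, Black, Male and Female with that capitalization; the real files may differ. FrameName, parse_frame_name and zip_name are names invented for this example.

```python
import re
from typing import NamedTuple, Optional


class FrameName(NamedTuple):
    sequence_type: str
    person_id: int
    skin_type: str
    sex: str
    frame_number: int


# Assumed literal spellings of each field; adjust if the real files differ.
FRAME_PATTERN = re.compile(
    r"(?P<sequence_type>[XYZ])_"
    r"(?P<person_id>\d{2})_"
    r"(?P<skin_type>White|Yellow|Black)_"
    r"(?P<sex>Male|Female)_"
    r"(?P<frame_number>\d{4})\.png"
)


def parse_frame_name(filename: str) -> Optional[FrameName]:
    """Split an image file name into its fields, or return None if it does not fit."""
    match = FRAME_PATTERN.fullmatch(filename)
    if match is None:
        return None
    person_id = int(match["person_id"])
    if not 1 <= person_id <= 15:  # personID runs from 01 to 15
        return None
    return FrameName(
        match["sequence_type"],
        person_id,
        match["skin_type"],
        match["sex"],
        int(match["frame_number"]),
    )


def zip_name(frame: FrameName) -> str:
    """Name of the zip file that would hold the sequence this frame belongs to."""
    return f"{frame.sequence_type}_{frame.person_id:02d}_{frame.skin_type}_{frame.sex}.zip"


if __name__ == "__main__":
    frame = parse_frame_name("X_01_White_Male_0000.png")  # made-up file name
    print(frame)
    print(zip_name(frame))
```

Returning None for names that do not fit makes it easy to skip stray files when walking an unpacked sequence directory.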
Ground truth
To provide a ground truth for all sequences, each sequence was
annotated using an annotation program specially built for this purpose.
The program was built as part of the FG-Net project, and can
be
downloaded from: http://www.cvmt.dk/~fgnet/apps.html.
The transition between pointing and not pointing is not always clear in
the sequences,
so when the
annotator was in doubt, an UNCERTAIN annotation was used.
The annotation uses four different categories:
HAND : A hand is present in the picture, but NOT pointing.
POINT : A pointing hand (with thumb IN) is present in the picture.
POINT_THUMB : A pointing hand (with thumb OUT) is present in the picture.
UNCERTAIN : A hand is present in the picture, but two or more gestures might be correct.
Finally, all pictures that do not include any part of a hand are left
with no annotation.
Each test person alternated between using the left hand and the right
hand, so the annotation was done for each hand separately, resulting in
two files for each sequence.
The annotation files are named sequenceType_personID_skinType_sex_hand.txt,
and can be downloaded HERE.
Instructions on how to interpret the annotation files can be found HERE.
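The annotation file layout is documented separately and is not reproduced here. As a rough sketch of how the four annotation categories could relate to the three task categories, the code below maps labels to classes and computes a per-frame accuracy. It rests on assumptions that are not part of the database description: that HAND counts as "not pointing", and that UNCERTAIN and unannotated frames are skipped when scoring. The names Annotation, TASK_CLASS and accuracy are hypothetical.

```python
from enum import Enum


class Annotation(Enum):
    HAND = "HAND"
    POINT = "POINT"
    POINT_THUMB = "POINT_THUMB"
    UNCERTAIN = "UNCERTAIN"


# Assumed mapping from annotation label to task category.
# UNCERTAIN is deliberately left out, so such frames are not scored.
TASK_CLASS = {
    Annotation.POINT_THUMB: "pointing, thumb out",
    Annotation.POINT: "pointing, thumb in",
    Annotation.HAND: "not pointing",
}


def accuracy(pairs):
    """Fraction of scored frames whose predicted class matches the ground truth.

    `pairs` is an iterable of (annotation, predicted_class), where annotation
    is an Annotation member or None for a frame with no annotation.
    Returns None when no frame can be scored.
    """
    scored = 0
    correct = 0
    for truth, predicted in pairs:
        target = TASK_CLASS.get(truth)  # None for UNCERTAIN and unannotated frames
        if target is None:
            continue
        scored += 1
        if predicted == target:
            correct += 1
    return correct / scored if scored else None


if __name__ == "__main__":
    example = [
        (Annotation.POINT, "pointing, thumb in"),
        (Annotation.HAND, "pointing, thumb in"),
        (Annotation.UNCERTAIN, "not pointing"),
        (None, "not pointing"),
    ]
    print(accuracy(example))
```

Whether frames without any hand should instead count as "not pointing" is a choice for the evaluation protocol; the sketch simply leaves them out.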
Last modified: Thursday November 18, 2004