Minutes of ECVnet workshop on Benchmarking


held Monday 10th July 1995 at Cap Gemini Sogeti offices, 76 Avenue Kleber, PARIS

This workshop was attended by 11 delegates: Pascal Brand, LIFIA, France; Henrik I. Christensen, Aalborg University, Denmark; Adrian Clark, University of Essex, England; Patrick Courtney, ITMI-Aptor, France; Jim Crowley, LIFIA-IMAG, France; Wolfgang Foerstner, University Bonn, Germany; Christophe Guizard, Cemagref, France; Claus B. Madsen, Royal Institute of Technology, Sweden; Jan Nielsen, AITEK S.r.l., Italy; Patrick Stelmaszyk, ITMI-Aptor, France; Neil Thacker, University of Sheffield, England.

copies of overheads available from Patrick.Courtney@itmi.cgs.fr

INTRODUCTION

what is benchmarking and why do it ? Patrick Stelmaszyk

Benchmarking is defined here as "qualitative and quantitative evaluation of performance" where performance has many dimensions. Beneficiaries of benchmarking are identified and the usefulness of benchmarking at five key stages of system design (tuning; selection; combination; improvement and reuse) is presented.

STATE OF THE ART - THEORETICAL AND NUMERICAL APPROACHES

1: studies in pose estimation stability, Claus Madsen

A study of pose estimation stability is motivated by the observation that some views of 3D objects seem subjectively preferable to others, and that an understanding of this would be helpful in developing strategies for camera placement. It is subsequently demonstrated that theoretical analysis and experiments result in knowledge about the stability of edge segments and that this knowledge can be used to study the stability of a pose estimation technique. The non-intuitiveness of some of the results obtained emphasises the importance of studying stability issues.

2: 10 con's and pro's for testing vision algorithms, Wolfgang Foerstner

Responses to ten common counter-arguments to benchmarking are presented (it is task-dependent; vision is only one module; vision is too complex; the models used are wrong; measures are not comparable; there is no theory; there are too many parameters; ground truth is too costly; simulations are not reality; testing is not acknowledged), together with recommendations for circumventing them.

3: statistical analysis of correspondence algorithms, Neil Thacker

A procedure for algorithmic modeling is presented. This involves identifying the statistical assumptions underlying the algorithms and sampling the corresponding data distributions. From this, algorithm performance parameters can be calculated directly. This has the advantages of needing less data, allowing cross-comparison, being quick and giving data-independent behaviour. Examples are given for both area-based and feature-based correspondence algorithms.

4: characterisation of vision algorithms: an experimental approach, Jan Nielsen

A space pose algorithm is described comprising several cameras, an edge detector, line linker, line grouper and search engine. Benchmarking of this system in terms of success rate and pose error is carried out for variable numbers of observations, distance between observers, layout and object. Recommendations are made regarding this system and benchmarking in general.

5: evaluating reliability and robustness - an example in estimation techniques, Patrick Courtney

An unreliable feature detector is considered to give valid but noisy data as well as erroneous outlier data. Criteria for reliability (percentage of estimates within given error limits) and robustness (variation in reliability with proportion of outliers) are described. The performance of various well-known estimation techniques is given with respect to these criteria, indicating when they should be used.
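The two criteria above can be illustrated with a small simulation. The following is only a sketch under stated assumptions and not the procedure used in the talk: the contamination model (Gaussian noise plus uniform outliers), the error limit, the sample sizes and the choice of mean and median as the estimators compared are all hypothetical.

```python
import random
import statistics

def make_samples(n, outlier_fraction, rng, true_value=0.0,
                 noise_sd=1.0, outlier_range=50.0):
    """Draw n measurements of true_value.

    Assumed model (hypothetical): valid data are Gaussian around the
    true value; outliers are uniform over a wide interval around it.
    """
    samples = []
    for _ in range(n):
        if rng.random() < outlier_fraction:
            samples.append(rng.uniform(true_value - outlier_range,
                                       true_value + outlier_range))
        else:
            samples.append(rng.gauss(true_value, noise_sd))
    return samples

def reliability(estimator, outlier_fraction, trials=500, n=25,
                error_limit=1.0, true_value=0.0, seed=0):
    """Fraction of trials whose estimate lies within error_limit of the truth."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        data = make_samples(n, outlier_fraction, rng, true_value)
        if abs(estimator(data) - true_value) <= error_limit:
            hits += 1
    return hits / trials

def robustness_curve(estimator, fractions=(0.0, 0.1, 0.2, 0.3, 0.4)):
    """Reliability as a function of the proportion of outliers."""
    return [(f, reliability(estimator, f)) for f in fractions]

if __name__ == "__main__":
    for name, est in (("mean", statistics.mean),
                      ("median", statistics.median)):
        print(name, robustness_curve(est))
```

Under these assumptions, an estimator whose curve stays flat as the outlier proportion grows would be called robust, and one whose curve falls quickly would not.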

6: benchmarking for recognition and indexing, Henrik Christensen

An analysis of the issues surrounding benchmarking for recognition systems is presented, outlining the underlying goals which should be addressed and the diversity of the problem and methods. Possible approaches based on standards and objective measures are presented.

7: testing an industrial camera - practical experiences, Christophe Guizard

A survey of CCD array cameras is presented to address the problem of the lack of available information. The workplan is outlined, covering electrical, optical and mechanical considerations. An extension to consider line scan and colour cameras is outlined. Conclusions are drawn regarding future related studies.

8: sharing software and image datasets, Adrian Clark

Reasons for sharing code and data are proposed (difficulty in obtaining data; to permit comparisons; to improve efficiency of research). The history of the development of PEIPA, the Pilot European Image Processing Archive, is also presented. Recommendations are made regarding the archive's data and software, and its future use and dissemination.

9: testing image matching procedures, Wolfgang Foerstner

The automation of a satellite image scanning task is described which requires both external evaluation (controlled tests) and internal evaluation (self-diagnosis). A comparison is made of cost with and without self-diagnosis using a cost matrix. Some of the issues and problems surrounding self-diagnosis are discussed.
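One way such a cost comparison might be set up is as an expected cost over a matrix indexed by the true outcome of a match and the decision taken on it. This is a hedged sketch only: the cost values, the error rate and the detection and false-alarm rates of the self-diagnosis step are hypothetical and are not taken from the talk.

```python
def expected_cost(joint_prob, cost):
    """Sum of P(truth, decision) * cost[truth][decision] over all cells."""
    return sum(joint_prob[truth][decision] * cost[truth][decision]
               for truth in joint_prob
               for decision in joint_prob[truth])

# Hypothetical cost matrix: rows are the true state of a match,
# columns are the decision taken. Rejected results are assumed to be
# redone manually at unit cost; an accepted wrong match is expensive.
COST = {
    "correct": {"accept": 0.0, "reject": 1.0},
    "wrong":   {"accept": 20.0, "reject": 1.0},
}

def joint_without_diagnosis(p_wrong):
    """No self-diagnosis: every result is accepted."""
    return {
        "correct": {"accept": 1.0 - p_wrong, "reject": 0.0},
        "wrong":   {"accept": p_wrong, "reject": 0.0},
    }

def joint_with_diagnosis(p_wrong, detection_rate, false_alarm_rate):
    """Self-diagnosis rejects wrong matches with detection_rate and
    correct matches with false_alarm_rate (both assumed values)."""
    p_correct = 1.0 - p_wrong
    return {
        "correct": {"accept": p_correct * (1.0 - false_alarm_rate),
                    "reject": p_correct * false_alarm_rate},
        "wrong":   {"accept": p_wrong * (1.0 - detection_rate),
                    "reject": p_wrong * detection_rate},
    }

if __name__ == "__main__":
    p_wrong = 0.05  # hypothetical matching error rate
    print("without:", expected_cost(joint_without_diagnosis(p_wrong), COST))
    print("with:   ", expected_cost(
        joint_with_diagnosis(p_wrong, detection_rate=0.9,
                             false_alarm_rate=0.1), COST))
```

With these illustrative numbers self-diagnosis lowers the expected cost, but the conclusion depends entirely on the cost matrix and on how often the diagnosis raises false alarms.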

afternoon - identification of blocks and proposals for the future

DISCUSSION AND SYNTHESIS

Points of Consensus

Is There a Methodology ?

The key points in dealing with benchmarking seem to be:

Several approaches have been successfully demonstrated including:

This range of techniques can be used on a single algorithm to progressively qualify it.

Resistance to Benchmarking - A Cultural Attitude

There is a need to change the cultural attitude towards benchmarking by:

Proposals for Action

The following actions were proposed:

minutes by Patrick Courtney 19/7/95


For more information please contact : Patrick.Courtney@itmi.cgs.fr


Last updated 24 April 1997 by Patrick Courtney.