Hand gestures are a form of nonverbal communication that can be used in several fields such as communication between deaf-mute people, robot control, human–computer interaction (HCI), home automation and medical applications. Research papers on hand gestures have adopted many different techniques, including those based on instrumented sensor technology and computer vision. Hand signs can be classified along several lines, such as posture versus gesture, dynamic versus static, or a hybrid of the two. This paper reviews the literature on hand gesture techniques and introduces their merits and limitations under different circumstances. In addition, it tabulates the performance of these methods, focusing on computer vision techniques, in terms of their similarities and differences, the hand segmentation technique used, the classification algorithms and their drawbacks, the number and types of gestures, the dataset used, the detection range (distance) and the type of camera used. This paper is a thorough general overview of hand gesture methods with a brief discussion of some possible applications.
Keywords: hand gesture, hand posture, computer vision, human–computer interaction (HCI)

Hand gestures are an aspect of body language that can be conveyed through the center of the palm, the position of the fingers and the shape constructed by the hand. Hand gestures can be classified as static or dynamic. As its name implies, a static gesture refers to a stable hand shape, whereas a dynamic gesture comprises a series of hand movements such as waving. There is considerable variation within any single gesture; for example, a handshake varies from one person to another and changes according to time and place. The main difference between posture and gesture is that posture focuses on the shape of the hand, whereas gesture focuses on the movement of the hand. The main approaches to hand gesture research can be classified into the wearable glove-based sensor approach and the camera vision-based sensor approach [1,2].
Hand gestures offer an inspiring field of research because they can facilitate communication and provide a natural means of interaction that can be used across a variety of applications. Previously, hand gesture recognition was achieved with wearable sensors attached directly to a glove worn on the hand. These sensors detected a physical response according to hand movements or finger bending. The collected data were then processed by a computer connected to the glove by wire. This glove-based sensor system could be made portable by attaching the sensors to a microcontroller.
As illustrated in Figure 1, hand gestures for human–computer interaction (HCI) started with the invention of the data glove sensor, which offered simple commands for a computer interface. The gloves used different sensor types to capture hand motion and position by detecting the coordinates of the palm and fingers [3]. Sensors that share the same bending-angle technique include the curvature sensor [4], the angular displacement sensor [5], the optical fiber transducer [6], flex sensors [7] and the accelerometer sensor [8]. These sensors exploit different physical principles according to their type.
Figure 1. Different techniques for hand gestures. (a) Glove-based attached sensor, either connected to the computer or portable; (b) computer vision-based camera using a marked glove or just the naked hand.
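To make the bending-angle principle concrete, the following is a minimal sketch of how a flex sensor reading might be converted into a finger bend angle. It assumes a sensor wired as a voltage divider and sampled by a 10-bit ADC; the resistance values and the linear resistance-to-angle mapping are illustrative assumptions, not figures taken from the cited works.

```python
# Minimal sketch: estimating a finger bend angle from a flex sensor wired as a
# voltage divider and sampled by a 10-bit ADC (raw values 0-1023). All constants
# below are illustrative assumptions for a hypothetical sensor.

VCC = 5.0           # supply voltage (V)
R_FIXED = 47_000.0  # fixed divider resistor (ohms), assumed
R_FLAT = 25_000.0   # flex sensor resistance when flat (ohms), assumed
R_BENT = 100_000.0  # flex sensor resistance at ~90 degrees (ohms), assumed

def adc_to_voltage(adc_value: int, adc_max: int = 1023) -> float:
    """Convert a raw ADC reading to the divider output voltage."""
    return VCC * adc_value / adc_max

def voltage_to_resistance(v_out: float) -> float:
    """Solve the voltage divider for the flex sensor's resistance."""
    # v_out = VCC * R_FIXED / (R_FIXED + R_flex)
    # => R_flex = R_FIXED * (VCC - v_out) / v_out
    return R_FIXED * (VCC - v_out) / v_out

def resistance_to_angle(r_flex: float) -> float:
    """Map resistance linearly onto a 0-90 degree bend angle (a crude model)."""
    fraction = (r_flex - R_FLAT) / (R_BENT - R_FLAT)
    return max(0.0, min(90.0, fraction * 90.0))

if __name__ == "__main__":
    raw = 512  # example ADC sample
    angle = resistance_to_angle(voltage_to_resistance(adc_to_voltage(raw)))
    print(f"estimated bend angle: {angle:.1f} degrees")
```

In a real glove, one such channel would be read per finger and the resulting angle vector fed to a gesture classifier; commercial sensors also require per-unit calibration rather than the fixed linear mapping assumed here.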
Although the techniques mentioned above have provided good outcomes, they have various limitations that make them unsuitable for the elderly, who may experience discomfort and confusion due to wire connection problems. In addition, elderly people suffering from chronic conditions that result in loss of muscle function may be unable to put on and take off gloves, causing them discomfort and constraining them if the gloves are worn for long periods. These sensors may also cause skin damage, infection or adverse reactions in people with sensitive skin or those suffering from burns. Moreover, some sensors are quite expensive. Some of these problems were addressed in a study by Lamberti and Camastra [9], who developed a computer vision system based on colored marked gloves. Although this system did not require sensors to be attached, it still required colored gloves to be worn.
These drawbacks led to the development of promising and cost-effective techniques that do not require cumbersome gloves to be worn: camera vision-based sensor technologies. With the evolution of open-source software libraries, it is easier than ever to detect hand gestures for a wide range of applications such as clinical operations [10], sign language [11], robot control [12], virtual environments [13], home automation [14], personal computers and tablets [15] and gaming [16]. These techniques essentially replace the instrumented glove with a camera. Different types of cameras are used for this purpose, such as RGB, time-of-flight (TOF), thermal and night vision cameras.
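As an illustration of how accessible vision-based detection has become with open-source libraries, the sketch below captures webcam frames and detects hand landmarks using OpenCV and MediaPipe. This is a generic example of the camera-based approach, not the method of any particular study reviewed here.

```python
# Minimal sketch: webcam hand landmark detection with the open-source
# OpenCV and MediaPipe libraries (pip install opencv-python mediapipe).
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

cap = cv2.VideoCapture(0)  # default webcam
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV delivers BGR frames.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand_landmarks in results.multi_hand_landmarks:
                # Draw the 21 detected hand keypoints and their connections.
                mp.solutions.drawing_utils.draw_landmarks(
                    frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)
        cv2.imshow("hand detection", frame)
        if cv2.waitKey(1) & 0xFF == 27:  # press Esc to quit
            break
cap.release()
cv2.destroyAllWindows()
```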
Algorithms based on computer vision methods have been developed to detect hands using these different types of cameras. The algorithms attempt to segment the hand and detect features such as skin color, appearance, motion, skeleton, depth, 3D model and deep learning-based detection. These methods involve several challenges, which are discussed in the following sections of this paper.
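For instance, skin color segmentation, the first of the features listed above, is commonly implemented by thresholding in a color space such as HSV. The sketch below shows one plausible version using OpenCV; the threshold values are illustrative assumptions and in practice must be tuned for lighting conditions and skin tones.

```python
# Minimal sketch: skin color segmentation by HSV thresholding with OpenCV.
import cv2
import numpy as np

def segment_skin(bgr_image: np.ndarray) -> np.ndarray:
    """Return a binary mask of candidate skin pixels via HSV thresholding."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    # Illustrative skin range; real systems tune these bounds (or learn a
    # skin color model) per illumination condition and skin tone.
    lower = np.array([0, 40, 60], dtype=np.uint8)
    upper = np.array([25, 255, 255], dtype=np.uint8)
    mask = cv2.inRange(hsv, lower, upper)
    # Morphological opening/closing to suppress noise and fill small holes.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    return mask

def largest_skin_blob(mask: np.ndarray):
    """Pick the biggest skin-colored contour, assumed to be the hand region."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea) if contours else None
```

The fragility of such fixed thresholds under changing illumination and varied skin tones is precisely one of the challenges that motivates the alternative approaches (depth, skeleton, deep learning) surveyed later in this paper.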
Several studies based on computer vision techniques have been published in the past decade. A study by Murthy et al. [17] covered the role and fundamental techniques of HCI in terms of the recognition approach, classification and applications, describing the limitations of computer vision under various conditions. Another study by Khan et al. [18] presented a recognition system concerned with feature extraction and gesture classification, and considered the application areas of the studies. Suriya et al. [19] provided a specific survey on hand gesture recognition for mouse control applications, including the methodologies and algorithms used for human–machine interaction, together with a brief review of the hidden Markov model (HMM). A study by Sonkusare et al. [20] reported various techniques and compared them according to hand segmentation methodology, tracking, feature extraction and recognition techniques, concluding that recognition rate trades off against processing speed, which is limited by computing power. Finally, Kaur et al. [16] reviewed several sensor-based and vision-based methods for hand gesture recognition, aiming to improve the precision of algorithms by integrating current techniques.
The studies above give insight into gesture recognition systems under various scenarios and address issues such as scene background limitations, illumination conditions, feature extraction accuracy, dataset type, classification algorithm and application. However, no review paper mentions camera type, distance limitations or recognition rate. Therefore, the objective of this study is to provide a comparative review of recent studies concerning computer vision techniques for hand gesture detection and classification supported by different technologies. The current paper discusses the seven most reported approaches to the problem: skin color, appearance, motion, skeleton, depth, 3D model and deep learning. It discusses these approaches in detail and summarizes recent research under different considerations (type of camera used, resolution of the processed image or video, segmentation technique, classification algorithm, recognition rate, type of region-of-interest processing, number of gestures, application area, limitations or invariant factors, detection range achieved and, in some cases, the dataset used, runtime speed, hardware and type of error). In addition, the review presents the most popular applications associated with this topic.
The remainder of this paper is organized as follows. Section 2 explains hand gesture methods with a focus on computer vision techniques, describing the seven most common approaches (skin color, appearance, motion, skeleton, depth, 3D model and deep learning) and supporting them with tables. Section 3 illustrates in detail seven application areas that deal with hand gesture recognition systems. Section 4 briefly discusses research gaps and challenges. Finally, Section 5 presents our conclusions. Figure 2 below clarifies the classification of methods conducted in this review.
Figure 2. Classification of methods conducted in this review.