Apple's Advancements in Machine Learning and Camera Capabilities
In recent years, Apple has made significant strides in machine learning and camera technology, both in hardware and software. Notably, many Apple devices now ship with a dedicated Neural Engine, a specialized processor designed to accelerate machine learning models.
During WWDC 2020, Apple unveiled several exciting developments. The introduction of iOS 14 brought a plethora of enhancements and intriguing new features to Apple's computer vision framework. Initially released in 2017, the Vision framework lets developers harness sophisticated computer vision algorithms with little effort. In iOS 14, Apple focused on expanding the framework's capabilities, particularly hand tracking and improved body pose estimation for images and videos. Alongside hand and body tracking, the update introduced several other notable features:
- Trajectory Detection: the ability to analyze a video sequence and detect the trajectories of moving objects. iOS 14 introduces a new Vision request for this, VNDetectTrajectoriesRequest.
- Contour Detection: VNDetectContoursRequest identifies the contours of shapes in an image. It comes in handy when you need to locate specific objects or group them by shape or size (see the sketch after this list).
- Optical Flow: VNGenerateOpticalFlowRequest determines the directional change of pixels from one image to the next, which is useful for motion estimation or surveillance tracking.
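To make the request pattern concrete, here is a minimal contour detection sketch. The helper function name and the tuning values are illustrative assumptions rather than Apple sample code; the Vision calls themselves (VNImageRequestHandler, VNDetectContoursRequest, VNContoursObservation) are the real API.

```swift
import UIKit
import Vision

// Hypothetical helper: detect the contours in a UIImage and return the observation.
func detectContours(in image: UIImage) -> VNContoursObservation? {
    guard let cgImage = image.cgImage else { return nil }

    // 1. Create a request handler for the image.
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])

    // 2. Create the contour detection request and tune it (illustrative values).
    let request = VNDetectContoursRequest()
    request.contrastAdjustment = 1.5   // boost contrast so edges stand out
    request.detectsDarkOnLight = true  // look for dark shapes on a light background

    // 3. Perform the request and read back the first observation.
    do {
        try handler.perform([request])
        return request.results?.first
    } catch {
        print("Contour detection failed: \(error)")
        return nil
    }
}
```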
Let’s look at how to create a Vision Hand Pose Request in iOS 14.
Vision Hand Pose Estimation
So how do you use Vision for this? To use any algorithm in Vision, you generally follow these three steps:
1. First, create a request handler. Here we use a VNImageRequestHandler.
2. Next, create the request. In this case, use VNDetectHumanHandPoseRequest.
3. Finally, you get potential results, or observations, back. These observations are instances of VNObservation matching the request you made. The sketch after this list shows all three steps together.
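Putting those three steps together, a camera-driven hand pose pipeline might look roughly like the sketch below. The HandPoseDetector class name is an illustrative assumption; the Vision types and the AVCaptureVideoDataOutputSampleBufferDelegate callback are the real APIs.

```swift
import AVFoundation
import Vision

final class HandPoseDetector: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    // Created once and reused for every frame.
    private let handPoseRequest: VNDetectHumanHandPoseRequest = {
        let request = VNDetectHumanHandPoseRequest()
        request.maximumHandCount = 1   // we only need one hand for this example
        return request
    }()

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // 1. Create a request handler for the current camera frame.
        let handler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer,
                                            orientation: .up,
                                            options: [:])
        do {
            // 2. Perform the hand pose request.
            try handler.perform([handPoseRequest])

            // 3. Read back the observations (VNHumanHandPoseObservation instances).
            let observations = handPoseRequest.results ?? []
            print("Detected \(observations.count) hand(s) in this frame")
        } catch {
            print("Hand pose request failed: \(error)")
        }
    }
}
```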
The Vision framework detects hands in detail
Here is a quick overview of the hand landmarks that are returned. There are four for each finger and thumb and one for the wrist, for a total of twenty-one hand landmarks. There is also a new type called VNRecognizedPointGroupKey, and each hand landmark belongs to at least one of these groups.
If we find a hand, we get an observation back. From that observation, we can retrieve the thumb points and the index finger points by passing their VNRecognizedPointGroupKey to recognizedPoints. Using those collections, we look up the fingertip points, ignore any low-confidence points, and finally convert the points from Vision coordinates to AVFoundation coordinates. A sketch of this step follows below.
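The sketch below uses the typed recognizedPoints(_:) convenience API, whose JointsGroupName and JointName values correspond to the group keys described above. The helper function name and the 0.3 confidence threshold are assumptions for illustration.

```swift
import CoreGraphics
import Vision

// Hypothetical helper: pull the thumb and index fingertips out of an observation
// and return them in AVFoundation coordinates (origin at the top left).
func fingertipPoints(from observation: VNHumanHandPoseObservation) throws -> (thumb: CGPoint, index: CGPoint)? {
    // Get all recognized points for the thumb and index finger groups.
    let thumbPoints = try observation.recognizedPoints(.thumb)
    let indexPoints = try observation.recognizedPoints(.indexFinger)

    // Look for the tip points specifically.
    guard let thumbTip = thumbPoints[.thumbTip],
          let indexTip = indexPoints[.indexTip] else { return nil }

    // Ignore low-confidence points (threshold is an assumption).
    guard thumbTip.confidence > 0.3, indexTip.confidence > 0.3 else { return nil }

    // Vision uses a lower-left origin; AVFoundation uses an upper-left origin,
    // so flip the y-axis of the normalized coordinates.
    let thumb = CGPoint(x: thumbTip.location.x, y: 1 - thumbTip.location.y)
    let index = CGPoint(x: indexTip.location.x, y: 1 - indexTip.location.y)
    return (thumb, index)
}
```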
So let's go into processPoints, which does two things (sketched below):
- Convert the points from AVFoundation relative coordinates to UIKit coordinates so you can draw them on screen.
- Call the closure with the converted points.
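Here is a minimal sketch of what processPoints might look like, assuming the camera view controller owns an AVCaptureVideoPreviewLayer named previewLayer and a pointsProcessorHandler closure supplied from the SwiftUI side; the class and property names are illustrative.

```swift
import AVFoundation
import UIKit

final class CameraViewController: UIViewController {
    // Assumed to be configured elsewhere in the view controller.
    var previewLayer: AVCaptureVideoPreviewLayer!
    var pointsProcessorHandler: (([CGPoint]) -> Void)?

    func processPoints(_ fingerTips: [CGPoint]) {
        // Convert from AVFoundation relative coordinates (0...1) to UIKit
        // coordinates on the preview layer so the points can be drawn on screen.
        let convertedPoints = fingerTips.map {
            previewLayer.layerPointConverted(fromCaptureDevicePoint: $0)
        }

        // Call the closure with the converted points on the main queue.
        DispatchQueue.main.async {
            self.pointsProcessorHandler?(convertedPoints)
        }
    }
}
```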
Displaying Fingertips
pointsProcessorHandler is what gets your detected fingertips onto the screen. You can pass those values to your SwiftUI view and display them on your camera overlay, as in the sketch below.
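How you surface the points in SwiftUI is up to you; below is a minimal overlay sketch. The FingertipsOverlay view, and the assumption that the points arriving from pointsProcessorHandler are already converted to screen coordinates, are illustrative rather than code from the project.

```swift
import SwiftUI

// Draws a small circle at each detected fingertip on top of the camera preview.
struct FingertipsOverlay: View {
    let fingertipPoints: [CGPoint]   // points already in UIKit/screen coordinates

    var body: some View {
        ForEach(fingertipPoints.indices, id: \.self) { index in
            Circle()
                .fill(Color.orange)
                .frame(width: 20, height: 20)
                .position(fingertipPoints[index])
        }
    }
}
```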
Summary
In this tutorial, you saw how easy it is to take advantage of the new APIs in Vision to perform hand recognition; here, that meant detecting individual fingertips on your hand. You can take the project further, for example, by detecting when the thumb touches the index finger and then drawing on the screen of your iPhone without touching it. You could also drag items around the screen of the device. Now a question for you: how should the hand be positioned to do something like this? 😀 You could even use all the points on the hand to control a robot arm remotely, which seems really interesting. As you can see, the possibilities are endless.
Below, you can see a video from our sample project using hand recognition.