Face detection is everywhere these days. Recent smartphones ship with facial recognition that lets users unlock their devices or photograph whoever tries to break into them. Social media platforms like Facebook automatically tag you and your friends in uploaded images. Snapchat's popular face filters turn us into pirates or puppies. Google Photos groups similar faces so that photos with the people closest to us are easy to find. The technology is also at the center of controversies such as DeepFakes and the privacy implications of services like FindFace.

Meet Pepper the Robot

Another technology we are fascinated with is robotics, and Pepper is one of the most popular humanoid robots. He can speak, listen, gesture, dance and even recognize your emotions. But one thing Pepper currently cannot do is remember a human face. Today I will show our approach to giving Pepper this ability.

Face recognition

So how can we enable Pepper to “see” the people speaking to him, and thereby improve the quality of the conversation? It takes a few steps.

Capturing an image from the camera

Pepper has two cameras built into his body: one on his tablet and one on his forehead. Because Pepper can move his head, it is easier to capture people’s faces with the forehead camera, as in the picture below.
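
As a rough sketch of this step, the snippet below grabs one frame from the forehead camera through NAOqi's `ALVideoDevice` service and converts it to a NumPy array. It assumes the NAOqi Python SDK (`qi`) is installed and the robot is reachable. The address `tcp://pepper.local:9559`, the subscriber name `face_memory` and both function names are placeholders chosen for illustration.

```python
import numpy as np

TOP_CAMERA = 0       # NAOqi index of the forehead (top) camera
RESOLUTION_VGA = 2   # NAOqi constant for 640x480
COLORSPACE_RGB = 11  # NAOqi constant for 8-bit RGB


def image_from_naoqi(container):
    """Convert the list returned by ALVideoDevice.getImageRemote
    into a height x width x channels uint8 array."""
    width, height, channels = container[0], container[1], container[2]
    pixels = np.frombuffer(bytes(container[6]), dtype=np.uint8)
    return pixels.reshape((height, width, channels))


def capture_frame(robot_url="tcp://pepper.local:9559"):
    """Grab a single color frame from the forehead camera (needs a robot)."""
    import qi  # NAOqi Python SDK; imported here so the helper above works without it

    session = qi.Session()
    session.connect(robot_url)
    video = session.service("ALVideoDevice")
    handle = video.subscribeCamera(
        "face_memory", TOP_CAMERA, RESOLUTION_VGA, COLORSPACE_RGB, 5  # 5 fps
    )
    try:
        container = video.getImageRemote(handle)
        if container is None:
            raise RuntimeError("camera returned no image")
        return image_from_naoqi(container)
    finally:
        # Always release the camera subscription, even on errors.
        video.unsubscribe(handle)
```

Keeping the decoding in its own function means it can be tried on recorded frames without a robot nearby.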

Head of the robot - Pepper

Face detection

Example filters used for face detection.

Before recognizing a face, Pepper first needs to detect that a face is present. This is commonly done with the Viola-Jones algorithm, which uses a group of simple filters to decide whether the current fragment of the image contains a face. The algorithm is widely used in cameras to spot a face and mark it with a square.
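
To give a feel for what one such filter computes, here is a minimal sketch of a single two-rectangle Haar-like filter evaluated with an integral image, the trick that keeps Viola-Jones fast. It only illustrates the idea and is not Pepper's detector: a real cascade combines many trained filters, and the function names here are made up for this example.

```python
import numpy as np


def integral_image(gray):
    """Summed-area table with an extra zero row and column on the top/left."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]


def two_rect_feature(ii, top, left, height, width):
    """Lower half minus upper half: large when a dark band (eyes)
    sits above a brighter band (cheeks)."""
    half = height // 2
    upper = rect_sum(ii, top, left, half, width)
    lower = rect_sum(ii, top + half, left, half, width)
    return int(lower - upper)


# Toy 4x4 patch: two dark rows above two bright rows.
patch = np.array([[10] * 4, [10] * 4, [200] * 4, [200] * 4], dtype=np.uint8)
print(two_rect_feature(integral_image(patch), 0, 0, 4, 4))  # prints 1520
```

A flat patch gives 0, so thresholding the filter response already separates "dark above bright" regions from uniform ones.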

Pepper already has a built-in face detection feature, but to recognize the detected face we need a full-size color image. So we use Pepper’s face detection to make sure his camera is pointed at a person before capturing the image. That way, we don’t accidentally take a picture of a wall or a table if the person suddenly moves.
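
The gating logic can be sketched independently of the robot. In the snippet below, `face_visible` and `grab_frame` are hypothetical callables supplied by the caller. On Pepper, `face_visible` could be backed by the built-in detector (for example by checking whether the `FaceDetected` key in ALMemory is non-empty), but that wiring is an assumption and is left out here.

```python
import time


def capture_face_image(face_visible, grab_frame, max_attempts=5, delay=0.2):
    """Return a frame only if a face was visible both before and after
    the capture, otherwise retry. Returns None if every attempt fails."""
    for _ in range(max_attempts):
        if face_visible():
            frame = grab_frame()
            # Re-check: the person may have moved away during the capture.
            if face_visible():
                return frame
        time.sleep(delay)
    return None
```

Checking twice costs one extra detector query per frame, but it avoids storing a picture of the wall under somebody's name.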

Result of the face detection algorithm

Pre-processing

Once a face is detected, we can focus on one person at a time. The next step is optional, but it helps with further processing. We rotate and scale the detected face so that the eyes lie on a horizontal line and always land in the same place in the processed picture. We can also apply histogram equalization and gamma correction. In the last step we crop the image.
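
A minimal NumPy sketch of these operations is shown below. It assumes the eye positions are already known as pixel coordinates (how they are found is not covered here). The output size of 96 pixels, the target eye placement and the gamma value are arbitrary illustrative choices.

```python
import numpy as np


def eye_alignment_matrix(left_eye, right_eye, out_size=96, eye_y=0.35, eye_dist=0.4):
    """2x3 similarity transform (rotate, scale, shift) that puts both eyes
    on a horizontal line at fixed positions of an out_size x out_size crop.
    left_eye is the eye that appears on the left in the image, as (x, y)."""
    left = np.asarray(left_eye, dtype=float)
    right = np.asarray(right_eye, dtype=float)
    dx, dy = right - left
    angle = np.arctan2(dy, dx)                        # tilt of the eye line
    scale = (eye_dist * out_size) / np.hypot(dx, dy)  # normalize eye distance
    cos, sin = scale * np.cos(angle), scale * np.sin(angle)
    rotation = np.array([[cos, sin], [-sin, cos]])    # rotates by -angle
    target_left = np.array([out_size * (0.5 - eye_dist / 2), out_size * eye_y])
    shift = target_left - rotation @ left
    return np.hstack([rotation, shift[:, None]])


def equalize_histogram(gray):
    """Spread the intensities of a uint8 grayscale image over 0..255."""
    cdf = np.bincount(gray.ravel(), minlength=256).cumsum()
    cdf_min = cdf[cdf > 0][0]
    span = cdf[-1] - cdf_min
    if span == 0:  # single-valued image, nothing to equalize
        return gray.copy()
    lut = np.clip(np.round((cdf - cdf_min) / span * 255), 0, 255).astype(np.uint8)
    return lut[gray]


def gamma_correct(gray, gamma=0.8):
    """Apply out = 255 * (in / 255) ** gamma through a lookup table."""
    lut = np.round(255.0 * (np.arange(256) / 255.0) ** gamma).astype(np.uint8)
    return lut[gray]
```

The returned matrix has the 2x3 layout that OpenCV's `cv2.warpAffine` expects, so warping with an output size of `(out_size, out_size)` would perform the rotation and the final crop in one call.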

Eye alignment

Feature extraction

Now we can transform the image into a vector of numbers that describes the person. Sounds like rocket science? Let’s simplify it to only two features: skin color and gender. Let’s assign -1 to a dark skin tone and 1 to a fair skin tone, and 1 to women and -1 to men. With this, every person becomes a point in a two-dimensional feature space.

With such an assignment we can compute the Euclidean distances between these vectors.
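
As a small worked example, the snippet below computes those distances for four made-up people, one per corner of the feature space. The names are hypothetical and each vector is ordered as (skin tone, gender) following the encoding above.

```python
import math
from itertools import combinations

# (skin tone, gender): fair = 1, dark = -1, woman = 1, man = -1
people = {
    "Alice": (1, 1),
    "Beth": (-1, 1),
    "Carl": (1, -1),
    "Dan": (-1, -1),
}


def euclidean(u, v):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


for (name_a, vec_a), (name_b, vec_b) in combinations(people.items(), 2):
    print("%-5s - %-5s: %.2f" % (name_a, name_b, euclidean(vec_a, vec_b)))
```

People who differ in one feature end up 2 apart, and people who differ in both end up about 2.83 apart, so the distance grows with the number of differing features.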

We could choose many other features, like nose size, hair color or head shape, but then we would have to train a classifier for every specific feature. Instead of doing this manually, we can let a neural network select and tune the best features. Luckily, researchers have already done this, so we can reuse their work. For a given image, such a network returns 128 numbers. For facial recognition to work, the vectors computed from pictures of the same person should be close to each other, and far from the vectors computed from pictures of different people.

Finally, we compare the calculated vector against a database of vectors to predict whether Pepper has already met the person he is looking at.
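
One simple way to do this comparison is a nearest-neighbor search with a distance cut-off, sketched below. The function name and the threshold of 0.6 are assumptions for illustration; a suitable threshold depends on the network and has to be tuned on real data.

```python
import numpy as np


def find_match(query, known_vectors, known_names, threshold=0.6):
    """Return (name, distance) of the closest stored vector, or
    (None, distance) if even the closest one is farther than threshold."""
    if len(known_names) == 0:
        return None, None
    known = np.asarray(known_vectors, dtype=float)
    distances = np.linalg.norm(known - np.asarray(query, dtype=float), axis=1)
    best = int(np.argmin(distances))
    best_distance = float(distances[best])
    if best_distance > threshold:
        return None, best_distance  # nobody is similar enough: a stranger
    return known_names[best], best_distance
```

Without the cut-off, every stranger would be matched to whoever happens to be the closest entry in the database.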

Let’s be friends

We can use this technique to enable Pepper to remember people. If he meets you and doesn’t know you yet, he can ask you to introduce yourself. Then, once he knows your name, he can take a picture and transform it into a vector of numbers. Now you have a friend in Pepper. If Pepper sees you again, he can calculate the distance to everyone he has already met and pick the person you are most similar to. Almost always you will be most similar to yourself, and that’s how he will retrieve your name.
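
The whole "meet or greet" flow fits in a few lines. The class below is a hypothetical sketch rather than our production code: `ask_name` stands in for whatever dialog Pepper uses to ask for a name, vectors are plain lists of numbers, and the threshold is again an assumed value.

```python
import math


class FaceMemory:
    """Remembers (name, vector) pairs and greets known people by name."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold
        self.people = []  # list of (name, vector) tuples

    def recognize(self, vector):
        """Name of the closest known person, or None if nobody is close enough."""
        best_name, best_distance = None, float("inf")
        for name, known in self.people:
            distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(vector, known)))
            if distance < best_distance:
                best_name, best_distance = name, distance
        return best_name if best_distance <= self.threshold else None

    def meet(self, vector, ask_name):
        """Greet a known person, or ask a stranger for a name and store it."""
        name = self.recognize(vector)
        if name is not None:
            return "Nice to see you again, %s!" % name
        name = ask_name()
        self.people.append((name, list(vector)))
        return "Nice to meet you, %s!" % name
```

For example, calling `meet` twice with nearly identical vectors asks for the name only the first time and greets the person by name the second time.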

Trivia

There is a cognitive disorder called prosopagnosia that makes it hard to remember faces. People often cope with it by looking at certain details such as clothes, hair color or skin color. That’s very similar to what the neural network tries to do. It can’t perceive the face as a whole, so it tries to capture many features which, when combined, allow it to identify the person.

Summary

This solution opens up a lot of new capabilities, starting with simply remembering someone’s name. Pepper could now retrieve the last topic he talked about with you. He could also build up knowledge about the people he has met, like their preferences or hobbies. He could bring back the emotions felt in the last conversation. Such interactions would make him an even more authentic humanoid robot. We could also give Pepper permissions, for example to send mail from your account. There are infinite possibilities, and only our imagination is the limit.

Sources