Everyone knows the feeling of unexpectedly seeing your own face on your phone's screen because the camera app opened with the selfie camera enabled. Not pretty. Well, today I learned that this is due to an effect called perspective distortion.
According to Wikipedia, when shooting a portrait photo and fitting the same area inside the frame:

> The wide-angle will be used from closer, making the nose larger compared to the rest of the photo, and the telephoto will be used from farther, making the nose smaller compared to the rest of the photo.
Photographers have known this for ages. That's why professional portraits are usually shot from a distance, using a telephoto lens to fit the subject's face in the frame. But we civilians mostly capture our faces with a selfie camera, which uses a wide-angle lens. Maybe if we had 4 m long selfie sticks, we could do something about it. But that doesn't seem very practical to me.
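The effect follows directly from the fact that apparent size scales with the inverse of distance. A toy calculation (my own numbers, not from any source): if the nose tip sits a few centimetres closer to the camera than the ears, it gets magnified noticeably more at arm's length than from across the room.

```python
# Toy illustration of perspective distortion: apparent size scales as
# 1/distance, so a nose sitting 4 cm in front of the ear plane is
# magnified more at selfie range than at portrait range.
# The 4 cm offset and the distances are assumptions for illustration.

NOSE_OFFSET = 0.04  # metres the nose tip sits in front of the ear plane

def nose_magnification(camera_distance):
    """Ratio of the nose's apparent scale to the ear plane's."""
    return camera_distance / (camera_distance - NOSE_OFFSET)

selfie = nose_magnification(0.30)   # arm's length, wide-angle
portrait = nose_magnification(2.0)  # telephoto from across the room

print(f"selfie:   nose appears {100 * (selfie - 1):.1f}% too large")
print(f"portrait: nose appears {100 * (portrait - 1):.1f}% too large")
```

At 30 cm the nose is exaggerated by roughly 15%, at 2 m by about 2% — which matches the intuition that telephoto portraits look flatter and more flattering.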
Researchers from Princeton and Adobe have developed an algorithm that can adjust camera distance in post-production[^1]. They achieve this by estimating a 3D model of the face, including camera position and orientation, and fitting it to the 2D image. If you then manually change one variable in this model, the algorithm calculates the expected changes in the remaining properties, and the result is projected onto a 2D image corresponding to the new camera position.
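The re-projection step can be sketched with a plain pinhole camera model. This is my own simplified illustration, not the paper's method: move the camera back while scaling the focal length so the face still fills the frame (a dolly zoom), and the nose shrinks relative to the ears.

```python
import numpy as np

def project(points, distance, focal):
    """Pinhole projection of Nx3 points for a camera on the z-axis."""
    depth = distance - points[:, 2]          # per-point depth from camera
    return focal * points[:, :2] / depth[:, None]

# Made-up face geometry (metres): nostrils 4 cm proud of the ear plane.
face = np.array([
    [-0.015, 0.0, 0.04],   # left nostril
    [ 0.015, 0.0, 0.04],   # right nostril
    [-0.070, 0.0, 0.00],   # left ear
    [ 0.070, 0.0, 0.00],   # right ear
])

def nose_to_ear_ratio(distance, focal):
    p = project(face, distance, focal)
    nose_span = p[1, 0] - p[0, 0]
    ear_span = p[3, 0] - p[2, 0]
    return nose_span / ear_span

# Dolly zoom: pulling back while scaling focal length keeps the ear span
# constant in the frame, but the projected nose narrows relative to it.
near = nose_to_ear_ratio(distance=0.3, focal=1.0)
far = nose_to_ear_ratio(distance=2.0, focal=2.0 / 0.3)
print(f"nose/ear ratio at 0.3 m: {near:.3f}, at 2.0 m: {far:.3f}")
```

The real system works with a dense 3D face mesh rather than four points, but the underlying geometry of why the warp changes facial proportions is the same.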
The representation of the face is obtained by automatically locating 66 facial key points around the eyes, nose and chin. For this step, the researchers employ existing technology by Saragih et al. (2009)[^2]. Because the detector they use doesn't find key points on the ears and top of the head, these points have to be added manually. They are necessary to incorporate the ears and hair into the model; without them, warping would produce an uncanny result where the perspective of the face changes but that of the hair and ears stays the same.
The head model is trained on an existing data set of 150 faces scanned with a Kinect camera (these cameras capture depth information in addition to light). The scans are automatically annotated with the same facial key points, but in 3D.
The key points on the 3D head model and in the 2D image are marked in the same order, so the location of each 2D facial point on the 3D mesh is known. This correspondence is used to minimize the sum of distances between each pair of key points, resulting in a head model that fits the profile of the subject in the selfie.
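To make the fitting step concrete, here is a deliberately reduced toy version (my own simplification, not the paper's solver): find a scale and 2D translation that minimise the summed squared distance between the model's projected key points and the detected image key points. The actual method also solves for camera rotation and distance.

```python
import numpy as np

def fit_scale_translation(model_2d, image_2d):
    """Least-squares fit of image ≈ s * model + t for Nx2 point sets."""
    n = model_2d.shape[0]
    # Unknowns x = [s, tx, ty]; stack one row per coordinate: A @ x = b.
    A = np.zeros((2 * n, 3))
    A[0::2, 0] = model_2d[:, 0]; A[0::2, 1] = 1.0   # u_i = s*x_i + tx
    A[1::2, 0] = model_2d[:, 1]; A[1::2, 2] = 1.0   # v_i = s*y_i + ty
    b = image_2d.reshape(-1)
    (s, tx, ty), *_ = np.linalg.lstsq(A, b, rcond=None)
    return s, np.array([tx, ty])

# Synthetic check: recover a known scale and shift from matched points.
model = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
image = 2.5 * model + np.array([10.0, -3.0])
s, t = fit_scale_translation(model, image)
```

Because the key points are matched by index, no correspondence search is needed — that is exactly why marking them in the same order matters.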
## Why should you care?
It may not be life-changing technology, but I think it is valuable to celebrate minor things too. Especially because 'groundbreaking' research never seems to make its way into consumer products. But this technology works as we speak, despite its impractical runtime. You can try out the demo here.
This is what makes it so exciting. One can easily speculate that Adobe will use this technology in future releases of their Photoshop software, as it becomes more accurate and efficient. Further down the road, it might even be a standard filter in any photography or social media app.
## Other artificially intelligent selfie tools
When looking further into the topic of AI-assisted selfie beautification, I came across an app called FaceApp. It uses deep learning techniques to modify your face: you can, for example, put a smile on it. It also includes age and gender swap functions, but these are very unstable. The creators promise 'photo-realistic' renders, but in practice the images are obviously altered.
Another project I came across attempted to gain some insight into what makes a good selfie. The developer trained a convolutional neural network to classify a selfie as either 'good' or 'bad'. Good selfies are those that were seen and liked by many people; bad selfies are those that were seen by many but rarely liked.
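The labelling rule described above can be sketched in a few lines. Note that the function name, the view threshold, and the like-rate cutoff are my own illustrative assumptions, not values from that project:

```python
# Hedged sketch of the engagement-based labelling rule: a selfie that
# reached many viewers is "good" if its like rate is high, "bad" if it
# was widely seen but rarely liked. Thresholds here are invented.

def label_selfie(views, likes, min_views=1000, like_rate_cutoff=0.05):
    """Return 'good', 'bad', or None if there were too few views to judge."""
    if views < min_views:
        return None          # not enough exposure to label either way
    rate = likes / views
    return "good" if rate >= like_rate_cutoff else "bad"

print(label_selfie(5000, 400))   # widely seen, well liked
print(label_selfie(5000, 50))    # widely seen, rarely liked
print(label_selfie(200, 50))     # too little exposure to tell
```

Normalising likes by views like this is what separates "unpopular" from simply "unseen", which is why the project compared like rates rather than raw like counts.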
Notably, one conclusion from examining the results of the selfie-judging network was that pictures taken very close up often ranked as 'bad' selfies.
## Real-time video manipulation
On a scarier note, there are also algorithms around that are capable of manipulating video in a way comparable to what FaceApp does. Stanford researchers have named their implementation Face2Face[^3]. Their software allows an actor to project his or her facial expressions onto a video of a target face, all in real time. It is easy to see how this could be used for malicious purposes, like fake news. But the researchers expect the technology to find its main application in simulated environments, such as virtual-reality conference calls.
[^1]: Fried, O., Shechtman, E., Goldman, D. B., & Finkelstein, A. (2016). Perspective-aware manipulation of portrait photos. ACM Transactions on Graphics (TOG), 35(4), 128.
[^2]: Saragih, J. M., Lucey, S., & Cohn, J. F. (2009, September). Face alignment through subspace constrained mean-shifts. In Computer Vision, 2009 IEEE 12th International Conference on (pp. 1034-1041). IEEE.
[^3]: Thies, J., Zollhöfer, M., Stamminger, M., Theobalt, C., & Nießner, M. (2016). Face2Face: Real-time face capture and reenactment of RGB videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2387-2395).