The following pages will help you understand the technology behind deepfakes through the narrative of our hands-on research. Furthermore, we will discuss how visual flaws allow you to recognise them, drawing on our experiments and external examples.
In this video, Elon Musk’s face has been swapped onto a baby. This type of face swap is the most common use of deepfake technology. The edges are not sharp and there is a difference in skin colour.
Technical details
Visual flaws
Skin colour mismatch:
A difference in skin tone between the mask and target face. The face seems to be covered by a layer of different colours, showing edges or spots.
A deepfake is created by a computer program that trains itself to reproduce a face. By adjusting the parameters in its system, the program becomes better at recreating a specific person; this is a form of deep learning. As output, the model creates an overlay for the face, and traces of this facial mask can be seen in this video.
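To make this idea concrete, here is a minimal, illustrative sketch of such a training loop in PyTorch. The network shape, layer sizes, and data are our own assumptions for demonstration, not the actual program we used:

```python
import torch
import torch.nn as nn

class FaceAutoencoder(nn.Module):
    """Toy autoencoder: compress a 64x64 face crop, then reconstruct it."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 64 * 3, 512), nn.ReLU(),
            nn.Linear(512, 128),
        )
        self.decoder = nn.Sequential(
            nn.Linear(128, 512), nn.ReLU(),
            nn.Linear(512, 64 * 64 * 3), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x)).view(-1, 3, 64, 64)

model = FaceAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

faces = torch.rand(16, 3, 64, 64)  # stand-in for a batch of real face crops

for step in range(100):
    reconstruction = model(faces)
    loss = loss_fn(reconstruction, faces)  # how far off is the recreated face?
    optimizer.zero_grad()
    loss.backward()                        # compute parameter adjustments
    optimizer.step()                       # apply them; the model improves
```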
Deepfake target video: YouTube - Cutest Baby Montage Ever
Deepfake video source: YouTube - Baby Elon
Target Video
Deepfaked Video
Deepfake videos can be made on consumer computers, but you need quite a powerful graphics card. This video shows our first trial. Having the right source videos is very important.
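Before starting, it is worth checking what graphics card a machine offers. A small check like the one below (using PyTorch as an example; how much memory you actually need depends on the chosen model and resolution) tells you whether training is feasible:

```python
import torch

# Check for a CUDA-capable graphics card and report its video memory.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 1024**3:.1f} GB VRAM")
else:
    print("No CUDA GPU found; training would be impractically slow on CPU.")
```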
Shia LaBeouf
Pilar
Technical details
Visual flaws
Skin colour mismatch:
A difference in skin tone between the mask and target face. The face seems to be covered by a layer of different colours, showing edges or spots.
Mismatched expressions:
The expressions on the face do not match the target face. Natural behaviour of facial features seems unarticulated, doubled, or not present.
Visible edges:
The edges of the mask are visible, either as a sharp or a blurred edge surrounding the face.
The images used to train the algorithm did not contain the right facial expressions to cover Shia, nor did they contain footage of the face in profile. If the neural network is not trained for these situations, it cannot produce the right overlay. Pay attention to how Shia's mouth appears from underneath the mask, resulting in two mouths.
Target video source: YouTube - Just Do It!
You need two videos: a source video and a target video. The program trains itself on both and creates a mask that can be placed onto the target video with video editing software.
Original target video: The Devil Wears Prada (2/5) Movie CLIP - Andy's Interview (2006) HD
Original
Dataset
Mask
Alignment
Deepfake
Post
Select a target video you want to insert a face into. Choosing a video that is steady and consistent will give you a better result.
Record a dataset for the face you want to place, matching the lighting and expressions as much as possible.
Cover the faces of other people in the target video; otherwise the algorithm will use them to train itself.
The algorithm will crop the faces for the training process and save their positions, so the output mask can later be placed correctly (see the sketch after these steps).
The algorithm generates a mask of the source face that you need to align onto the target video.
Video editing software will allow you to blend in the mask better and refine the final result.
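As a rough illustration of the Dataset and Alignment steps, the sketch below crops faces from a target video and records their positions. Real deepfake tools use far better detectors and facial landmarks; the detector, file names, and crop size here are our own assumptions:

```python
import csv
import os
import cv2

# OpenCV's bundled Haar cascade stands in for a proper face detector.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

os.makedirs("faces", exist_ok=True)
video = cv2.VideoCapture("target.mp4")   # assumed target video file
positions = []                           # where each crop came from
frame_idx = 0
while True:
    ok, frame = video.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5):
        crop = cv2.resize(frame[y:y + h, x:x + w], (128, 128))
        cv2.imwrite(f"faces/{frame_idx:05d}.jpg", crop)
        # Save the position so the generated mask can later be
        # placed back onto exactly this spot in the target video.
        positions.append((frame_idx, x, y, w, h))
    frame_idx += 1
video.release()

with open("alignments.csv", "w", newline="") as f:
    csv.writer(f).writerows(positions)
```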
In this experiment, two models were given a different number of images. More source material clearly improves the output: the model had more information about the face and could develop a better result.
Technical details
This experiment was executed with the same source video exported at different framerates, so both models were trained on exactly the same studio setup. The number of training cycles per image was equal, but the training time increased due to the bigger dataset. It is clearly visible that the algorithm trained with more images can produce a more refined expression matching the target.
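Exporting one video at two framerates is an easy way to vary the dataset size while keeping everything else identical. A sketch with ffmpeg (paths and rates are illustrative assumptions):

```python
import os
import subprocess

# A higher fps samples more frames per second, giving a bigger dataset
# from exactly the same footage.
for fps, out_dir in [(5, "frames_small"), (25, "frames_large")]:
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run([
        "ffmpeg", "-i", "source.mp4",
        "-vf", f"fps={fps}",
        f"{out_dir}/%05d.jpg",
    ], check=True)
```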
Original target video: Rooftop Showdown | The Reichenbach Fall | Sherlock
Benedict Cumberbatch
Arthur
We took all the Facebook images of one of our team members and created a deepfake. She smiled in almost all of the source images; as a result, the algorithm could not generate non-smiling output.
Natalie Portman
Pilar
Technical details
Visual flaws
Skin colour mismatch:
A difference in skin tone between the mask and target face. The face seems to be covered by a layer of different colours, showing edges or spots.
Mismatched expressions:
The expressions on the face do not match the target face. Natural behaviour of facial features seems unarticulated, doubled, or not present.
Blurred face:
The mask is blurred: there is a difference in sharpness or resolution between the mask and the rest of the video.
Profile borders:
The side view of the face seems incorrect. The deepfake mask does not work, is less detailed, or is incorrectly aligned.
A video contains many more facial nuances than the images we took from Facebook. Our team member presented herself on the internet through self-selected images, leaving out all the content needed to create realistic facial expressions for speech. Although better technologies might be able to fabricate expressions, without diverse source material the result will never convince an acquaintance.
Original target video: YouTube - Padmé meets Anakin
Even with a good source, it can be hard to create a deepfake. Indiana Jones contains chaotic shots; compared to the cleaner videos we used before, the algorithm has difficulty keeping up.
Technical details
Visual flaws
Visible edges:
The edges of the mask are visible, either as a sharp or a blurred edge surrounding the face.
Blurred face:
The mask is blurred: there is a difference in sharpness or resolution between the mask and the rest of the video.
Flicker effect:
The video flickers between the original and the deepfaked face. In critical moments the algorithm cannot recognise the face and briefly stops creating the mask.
Wrong perspective:
The deepfake has a different perspective from the rest of the video. The source and target video differ in focal length.
The deepfake was exported at a resolution of 64 px, which meant it took less time to train the algorithm: the model only had to learn how to create a low-resolution image. In close-up face shots, the low resolution is evident.
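You can reproduce this softness yourself: downscale a sharp face crop to 64×64 pixels and scale it back up, as in this small sketch (file names are illustrative assumptions):

```python
import cv2

face = cv2.imread("face_hd.jpg")                 # assumed high-resolution crop
small = cv2.resize(face, (64, 64))               # what a 64 px model outputs
upscaled = cv2.resize(small, face.shape[1::-1])  # scaled back to full size
cv2.imwrite("comparison.jpg", cv2.hconcat([face, upscaled]))
```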
Original target video: Rope Bridge Fight | Indiana Jones and the Temple of Doom
Harrison Ford
Andrej
This deepfake video was made from a talk show fragment in which Bill Hader impersonates Arnold Schwarzenegger. Because the right source material for Arnold Schwarzenegger was used, the results came out very convincing.
Bill Hader
Arnold Schwarzenegger
Technical details
Visual flaws
Face occlusion:
When objects pass in front of the face, the mask distorts or covers the object.
The face blending, skin tone, and resolution are very good, and the distant shot makes it difficult to see any blur; the post-production was probably well executed. Only when Bill Hader moves his finger in front of his face does it disappear behind the mask. The difference in sharpness and the angle of the finger suggest that the creator tried to hide this effect in post-production.
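The occlusion flaw follows from how naive compositing works: the generated mask is pasted over the whole detected face region, so anything between the camera and the face is painted over. A toy sketch of that naive paste (coordinates and file names are assumptions):

```python
import cv2

frame = cv2.imread("target_frame.jpg")   # assumed frame, finger over the face
mask = cv2.imread("generated_face.jpg")  # assumed model output
x, y, w, h = 200, 120, 160, 160          # assumed alignment for this frame
frame[y:y + h, x:x + w] = cv2.resize(mask, (w, h))  # the finger disappears here
cv2.imwrite("naive_composite.jpg", frame)
```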
Original target video: Bill Hader Presents: Schwarzenegger Baby
Source deepfaked video: Bill Hader impersonates Arnold Schwarzenegger
For this experiment, both models were trained for different amounts of time: one model was trained for 4 hours whilst the other ran for 48 hours. With longer training, the facial detail improved and the face became more three-dimensional.
Technical details
Training time is related to the number of iterations the algorithm performs. The model iterates between creating a face, applying face recognition to compare it with the source, and adjusting the parameters of the system to improve. In every iteration, the model runs this cycle once for each source image. The more powerful the computer, the less time each iteration takes.
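A back-of-the-envelope calculation shows how these factors interact; all numbers below are assumptions for illustration:

```python
dataset_size = 1500        # source images (assumed)
iterations = 50_000        # create/compare/adjust cycles over the dataset (assumed)
seconds_per_image = 0.002  # cost of one cycle on one image on our GPU (assumed)

# Each iteration runs the cycle once for every source image, so total
# training time scales with both dataset size and iteration count.
hours = dataset_size * iterations * seconds_per_image / 3600
print(f"about {hours:.0f} training hours")  # a faster GPU lowers seconds_per_image
```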
Original target video: Constance Wu Explains What "Couture" Means
Constance Wu
Yueling
For this experiment, we created both the source and the target video ourselves. The approach of each algorithm is clearly visible: H128 creates a square mask whilst SAEHD matches the face better.
Arthur
Andrej
Technical details
H128 is the lighter model of the two; it reaches quality in less time. The more precise mask of SAEHD deals better with the occluding finger, and it blends in better with the lighting. H128 seems to be better trained to make the face: its mask is sharper, more stable, and performs better with movement and perspective changes. However, community forums report that with more training time, SAEHD will outperform H128.
Nowadays, deepfakes can reach such a high quality that they are very difficult to recognise. Although we have focused solely on face swaps, deepfakes can also re-enact a face to make it seem as if a person said something. Be aware of this technology and remember the signs we have shown you.
Technical details
Facial re-enactment takes much more computing power but is much harder to recognise. Many of the difficulties arising from the source video do not apply to re-enactment; however, the algorithm will still behave similarly. The recreated parts of the face will be slightly blurred and less detailed.
Deepfake source video: Fake Freeman mouth manipulation.
Also, pay attention to the audio: there might be flaws or lip-sync problems. If a video is a likely target, if you question whether it is real, and if, according to what you have learned on this website, the conditions are suitable for a deepfake, always check the source.