UAlbany News Podcast

Detecting Deepfake Videos with Siwei Lyu

Episode Summary

Siwei Lyu is an associate professor of computer science in UAlbany's College of Engineering and Applied Sciences. Lyu and his research team discovered a way to detect deepfake videos. The clue is in the blink of an eye - literally.

Episode Notes

Siwei Lyu is an associate professor of computer science in UAlbany's College of Engineering and Applied Sciences. Lyu and his research team discovered a way to detect deepfake videos. The clue is in the blink of an eye - literally.

The UAlbany News Podcast is hosted and produced by Sarah O'Carroll, a Communications Specialist at the University at Albany, State University of New York, with production assistance by Patrick Dodson and Scott Freedman.

Have a comment or question about one of our episodes? You can email us at mediarelations@albany.edu, and you can find us on Twitter @UAlbanyNews.

Episode Transcription

Sarah O'Carroll:
Welcome to the UAlbany News Podcast. I'm your host, Sarah O'Carroll.

Sarah O'Carroll:
Creating a deceptive photo or video by combining source material with something added has never been easier. Artificial intelligence software is being used both to make these so-called deepfakes and to detect them. With me today is Siwei Lyu, an associate professor of computer science. His research looks at how to detect whether, in a video, one person's face has been superimposed on another person's body. He says the difference can be detected in the blink of an eye.

Newscast:
In the old days, if you wanted to threaten the United States, you needed 10 aircraft carriers and nuclear weapons and long-range missiles. Today, you just need access to our internet system, to our banking system, to our electrical grid and infrastructure. And increasingly, all you need is the ability to produce a very realistic fake video.

Newscast:
They're called deepfake videos. Lawmakers on both sides fear they could become the latest weapon in disinformation wars against the US and other democracies.

Sarah O'Carroll:
Siwei, why are deepfakes a problem? I know it might sound obvious, but what makes this form of deception important enough to research, and where are they having an impact?

Siwei Lyu:
Well, first of all, as a matter of basic psychology, we humans understand the world relying heavily on the information we get from our visual system. There are sayings like "seeing is believing" and "a picture is worth a thousand words," which means we trust what we see with our eyes as what is really happening in the world. So if there were any way to distort what we are seeing, that would, even just hypothetically, cause a big problem. Deepfakes play exactly this kind of trick on our eyes. What the software does is generate fake videos, and in particular fake face videos. We recognize another person mostly by looking at their face or listening to their voice.

Siwei Lyu:
If those become unreliable because we can fake them, that obviously distorts our perception of what is happening, and in a very significant way. For instance, suppose we see somebody in a place that is at odds with where we believe that person should be, and suppose that footage actually comes from a video that has been manipulated or synthesized with software. Once we start drawing conclusions and making decisions about that situation, that is where the real impact occurs. So I would say that deepfake software, because of the way it can manipulate our perception, will cause even more impact down the road.

Sarah O'Carroll:
Now, speaking for those who might have relatively little knowledge of or exposure to artificial intelligence software, can you walk us through how deepfake technology works?

Siwei Lyu:
Okay. Deepfake technology is a recently developed method that can synthesize realistic videos, in particular face videos. I always use an analogy, the same one I use when I teach my machine learning class at UAlbany: you can think of a machine learning algorithm as a baby. Say you want to teach a baby to speak some words, or to recognize some objects. What do you do? You give her a lot of examples. You show her a picture of an apple and tell her this is an apple; you show her a real apple and tell her this is an apple. Hopefully the baby starts to associate those images with those names, and later on, when she is presented with something else, she will be able to say whether or not it is an apple.

Siwei Lyu:
The deepfake algorithm is doing something like that. The task it is specifically trying to achieve is taking one person's face and synthesizing it as another person's face, mimicking the original person's facial movements, emotions and micro-expressions. You can think of it as translation, not of languages but of facial expressions and facial characteristics. In language translation, I give you a lot of English sentences and a lot of French sentences and tell you which ones have similar meanings; the algorithm, like the baby, learns to make that connection. The same thing happens here: with deepfakes, we gather a lot of images of one particular person and train a model of that person.

Siwei Lyu:
Then, when we have a face of another person in the original video, we treat it like an English sentence, and the trained model translates that face into the first person's face, the way an English sentence is translated into French. We then put that face back into the original video using other computer vision technology, such as face alignment, taking care of the artifacts, so that when you look at the end result it looks like a realistic fake video.
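
To make the translation analogy concrete, here is a minimal, hypothetical sketch of the shared-encoder, two-decoder idea that face-swap software of this kind is generally built on. It is not the specific tool discussed in the episode: real systems use convolutional networks, face detection and alignment, and large photo collections of each person, and the random tensors below merely stand in for face crops.

```python
# Hedged sketch of the shared-encoder / two-decoder idea behind face-swap
# ("deepfake") tools. Random tensors stand in for real face crops.
import torch
import torch.nn as nn

DIM = 64 * 64 * 3      # flattened 64x64 RGB face crop (illustrative size)
LATENT = 256           # shared "pose and expression" representation

encoder = nn.Sequential(nn.Linear(DIM, 1024), nn.ReLU(), nn.Linear(1024, LATENT))
decoder_a = nn.Sequential(nn.Linear(LATENT, 1024), nn.ReLU(), nn.Linear(1024, DIM))
decoder_b = nn.Sequential(nn.Linear(LATENT, 1024), nn.ReLU(), nn.Linear(1024, DIM))

params = list(encoder.parameters()) + list(decoder_a.parameters()) + list(decoder_b.parameters())
opt = torch.optim.Adam(params, lr=1e-4)
loss_fn = nn.MSELoss()

faces_a = torch.rand(32, DIM)   # stand-in for person A's training photos
faces_b = torch.rand(32, DIM)   # stand-in for person B's training photos

# Each decoder learns to rebuild "its" person from the shared encoding.
for step in range(100):
    opt.zero_grad()
    loss = (loss_fn(decoder_a(encoder(faces_a)), faces_a)
            + loss_fn(decoder_b(encoder(faces_b)), faces_b))
    loss.backward()
    opt.step()

# The "translation": encode a frame of person A, decode with person B's
# decoder, yielding B's face with A's pose and expression; a real pipeline
# would then align and blend this face back into the original frame.
swapped = decoder_b(encoder(faces_a[:1]))
```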

Sarah O'Carroll:
Wow, that's really interesting.

Siwei Lyu:
Yeah.

Sarah O'Carroll:
And this technology is accessible to anyone, not just movie producers with big budgets.

Siwei Lyu:
That's right. That's one of the reasons deepfakes have become so popular. We have had the capacity to make realistic fake videos for almost 10 years, but to be able to do that, you needed high-end computers, a lot of training, and an understanding of all the 3D modeling work in computer graphics.

Siwei Lyu:
What deepfake software changed is essentially lowering that threshold. It is basically a piece of software, and you need access to a lot of images of someone, but with the internet that is not a problem anymore. With Facebook and all the photo portals, you can get tons and tons of images of a particular subject. Beyond that, you need a computer that is fast enough, with machine learning software and hardware on it, and that is also accessible: spending a couple of thousand dollars these days, you can get a very decent computer for the job. So the problem caused by deepfakes is essentially that the threshold has been lowered to the point where anybody can make this kind of fake video.

Sarah O'Carroll:
Well, going back to before you started this project, can you tell me a bit about what your goals were going into it?

Siwei Lyu:
We actually got interested in this problem in January, during a meeting. One of my colleagues, my former PhD advisor, Professor [inaudible 00:07:58] from Dartmouth College, pointed the problem out to me, and we started collaborating on solving it. I have been working in the area of digital media forensics for more than 15 years, and it is one of the main research themes I pursue here at UAlbany with my lab, so we got interested immediately because of the potential social impact of this problem. I worked with my PhD student, [inaudible 00:08:32] Lee, and my colleague, an assistant professor of computer science, [inaudible 00:08:37]. We started by generating a lot of fake videos ourselves: we got a copy of the deepfake software and made some modifications to it.

Siwei Lyu:
For a while we were actually generating better fake videos than anybody else, and we had a little fun with that. We made quite a few fake videos, because I believe the first step of any detection algorithm we develop should be understanding precisely how the fakes [inaudible 00:09:09] are made, and that was our purpose. So we started by generating a lot of fake videos and observing them. We came up with some small ideas to try to detect them, but nothing worked for a while, until one day we had been staring at the videos for quite a long time. It is kind of painstaking and a little frustrating, but we kept getting this eerie feeling that some of these fake videos did not look right, and at that moment we realized: the faces in the synthesized videos don't blink.

Siwei Lyu:
That weird feeling is like being in a staring contest with a friend, and your friend does not blink for a whole minute. That kind of feeling, right? It's uncomfortable. The first time we noticed this, my first thought was that it must be some artifact of that particular video. Then we went back and checked all the videos we had generated, and it was very consistent: no blinking in any of the synthesized faces. Then we went back to some of the videos we had collected online, and the same thing happened. That gave us an interesting cue that this may be one way we can tell the difference between a fake video and a real video.

Sarah O'Carroll:
But that must have been a really neat moment, too.

Siwei Lyu:
Yeah, it was. It was kind of an aha moment, a kind of [inaudible 00:10:43] discovery, in my opinion. So we went ahead and worked on two things. The first was understanding why this happened. It cannot be just a coincidence; there has to be some deeper reason. We went back to the original deepfake method to understand it, and we figured out there may be a reason for it. This goes back to my earlier explanation of the algorithm as a baby: the baby learns by associating the examples and the labels we give it. The same thing happens with the deepfake algorithm. It learns to generate faces based on all the faces we give it, and where do we get all those faces?

Siwei Lyu:
As I mentioned, if we want to train this deepfake algorithm, we need tens of thousands of images, maybe more, of one particular person's face. Usually we get them from the internet: we do a Google search of somebody's name, we get many, many faces, we download all of them, and we use them as the training data for the model. When we actually looked at the faces used in the algorithm's training dataset, one thing was very striking: we rarely found any face with the eyes closed. That is one technical reason why this software has difficulty generating a blinking eye; it has never seen a closed eye. It is just like the baby: if you have never given her a picture of a banana and told her it is a banana, when you actually show her a banana she will have a lot of difficulty understanding what it is.

Siwei Lyu:
The same thing happens here, because the training data contains no images with the eyes closed. The reason for that is kind of interesting. First of all, blinking is a very transient eye movement, so actually capturing a blink in action with a camera is not easy; a blink usually takes less than one tenth of a second. And even in the rare moments when we do capture closed eyes, when we upload the photographs, of ourselves or of somebody else, we think those photographs are not good photographs. So there is a bias in the training dataset: we accidentally remove the images with the eyes closed. All of this contributes to the fact that the deepfake algorithm has a lot of trouble synthesizing images with the eyes closed.
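
As an illustration of the data bias Lyu describes, here is a hedged sketch that audits a folder of scraped face photos for closed eyes, using the common eye-aspect-ratio heuristic on dlib's 68 facial landmarks. The folder name, the landmark model file, and the 0.2 threshold are illustrative assumptions rather than the researchers' actual tooling; on a typical web-scraped face set, the fraction of closed-eye photos it reports would be expected to be very small.

```python
# Hedged sketch: count closed-eye photos in a scraped face dataset using the
# eye aspect ratio (EAR) computed from dlib's 68-point facial landmarks.
import glob
import math
import dlib

detector = dlib.get_frontal_face_detector()
# Standard dlib landmark model; downloading it is left to the reader.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
EYES_CLOSED = 0.2   # typical EAR cutoff for a closed eye; tune on real data

def ear(p):
    """Eye aspect ratio for the six landmarks p1..p6 around one eye."""
    def d(a, b):
        return math.dist((a.x, a.y), (b.x, b.y))
    return (d(p[1], p[5]) + d(p[2], p[4])) / (2.0 * d(p[0], p[3]))

closed = total = 0
for path in glob.glob("training_faces/*.jpg"):   # hypothetical folder
    img = dlib.load_rgb_image(path)
    faces = detector(img, 1)
    if not faces:
        continue
    landmarks = predictor(img, faces[0])
    left = [landmarks.part(i) for i in range(36, 42)]    # left-eye points
    right = [landmarks.part(i) for i in range(42, 48)]   # right-eye points
    total += 1
    if min(ear(left), ear(right)) < EYES_CLOSED:
        closed += 1

print(f"{closed}/{total} training photos show closed eyes")
```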

Siwei Lyu:
So this gave us a first hint of how to detect this kind of fake video. The second thing our group did was develop an algorithm that can actually detect whether the eyes are open or closed, and somewhat ironically, the way we did that was by training another deep learning algorithm to detect eye blinking, using one machine learning algorithm to detect another machine learning algorithm's product. As it turns out, this algorithm works pretty reliably, and we summarized the results in a paper we put online on arXiv in June. Ever since, we have continued to explore this algorithm, and we will keep working in that direction, trying to find more reliable ways to detect this kind of fake video.
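
The recipe Lyu outlines, a learned per-frame eye-state detector combined with reasoning over time, can be sketched roughly as follows. This is an illustrative stand-in rather than the published method: a tiny convolutional network classifies eye crops as open or closed, and a simple temporal rule flags clips that go implausibly long without a blink. The network size, the random stand-in data, the 30 fps assumption and the 10-second rule are all assumptions made for the example.

```python
# Hedged sketch: per-frame eye-state classifier plus a temporal blink check.
import torch
import torch.nn as nn

# Tiny CNN over 24x24 grayscale eye crops -> logit for P(eye closed)
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(16 * 6 * 6, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

# Stand-ins for labeled eye crops (label 1 = closed, 0 = open).
crops = torch.rand(64, 1, 24, 24)
labels = torch.randint(0, 2, (64, 1)).float()
for _ in range(50):
    opt.zero_grad()
    loss_fn(model(crops), labels).backward()
    opt.step()

def longest_gap_without_blink(closed_per_frame, fps=30):
    """Longest run of consecutive open-eye frames, in seconds."""
    gap = best = 0
    for is_closed in closed_per_frame:
        gap = 0 if is_closed else gap + 1
        best = max(best, gap)
    return best / fps

# At test time: classify every frame's eye crop, then apply a temporal rule.
# People normally blink every few seconds, so a long blink-free stretch in a
# talking-head video is suspicious.
frame_crops = torch.rand(900, 1, 24, 24)            # stand-in 30 s clip
closed = (torch.sigmoid(model(frame_crops)) > 0.5).squeeze(1).tolist()
if longest_gap_without_blink(closed) > 10.0:
    print("No blink for over 10 seconds: possibly a synthesized face")
```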

Sarah O'Carroll:
You've been talking about blinking, and I'm wondering about other gestures or bodily behaviors.

Sarah O'Carroll:
What about those and other signals?

Siwei Lyu:
Yes, there are. There is actually a general theme here. While we were working on detecting deepfake videos using blinking, a broader theme emerged from this line of work: using physiological signals to detect fake media. By physiological signals I mean all the signals we can detect that originate from a human's biological processes. Blinking is one; so are heartbeat, breathing, and the micro-expressions on our faces when we spontaneously react to certain things we see. Those signals are very, very subtle and difficult to fake, and they can also be very effective for detecting fake videos involving human faces or human bodies.

Siwei Lyu:
Also, as we have been working in this area of digital video forensics, the previously established methods fall into three categories: signal-based, physics-based and semantics-based. Signal-based means we detect fake media based on statistical abnormalities at the signal level, like [inaudible 00:16:12] compression or camera noise, and we have previous work on that. Physics-based involves things like shadows and lighting: for instance, if we put two faces in the same image but they come from different lighting environments, we can detect the inconsistency of the lighting environments and know the image has been manipulated. Semantics-based is essentially verifying whether the picture was taken at the time and place it is supposed to have been, that kind of verification, like GPS verification.

Siwei Lyu:
So we think physiological signals are a complementary category to these that has not been extensively explored. That is going to be one of the main research themes of my lab in the coming years.

Sarah O'Carroll:
So I was going to ask: what's next for this avenue of research? It sounds like it might be building upon this, because it's so new.

Siwei Lyu:
Right, right. I think we'll keep pushing on the frontier of detecting these AI-generated fake videos. There are a lot more challenging problems there, and I think blinking is only a first small step, even though we believe it is the first method developed to directly target deepfake-generated videos. To be able to do this more reliably, we need more sophisticated algorithms and methods, and that is what we are working very hard on these days.

Sarah O'Carroll:
Well, one of the questions that I have is this: technology seems to be eating away at these distinctions. On one hand we have the real and the authentic, and on the other we have the simulated, the created and the recreated. But it almost feels like an arms race where, no matter how sophisticated our tools for detection become, those on the other side advancing the dissemination of misinformation are just as technologically savvy. Is this an accurate description, or where do we stand?

Siwei Lyu:
Yes, I think you're right. It's an arms race, exactly; that's the exact word we use. The particular nature of this kind of work is very different from other areas of academic research, because we do have an active, I would say, enemy on the other side: the people making fake media. My understanding of this issue has two levels. The first, at the level of pure research, is that this just makes the work very exciting and challenging. We're not trying to solve a problem that sits there fixed; we're actually working against an opponent. It's like playing a chess game: we keep getting better as our opponent gets better. So I think this makes the work more challenging and more exciting.

Siwei Lyu:
On the other hand, I think it does cause a problem, because at a very high level this is an imbalanced arms race. The people making fake media have more incentive and more advantages on their side, while digital media forensics is still, relatively speaking, a less well-known research field within computer science. Even though we are seeing a lot more activity recently, I think we are still at a little disadvantage, but I believe the whole community will keep working hard and we will catch up.

Sarah O'Carroll:
That actually was the more positive response: we might be facing an opponent, but that makes it more exciting, as opposed to, you know, "we're doomed."

Siwei Lyu:
Right, right. Every time I give [inaudible 00:20:25] talks at various places, especially for journalists, and I also gave a talk at Yale Law School, I am not just presenting my work but also giving people hope, because we are not sitting here idly letting this wave of fake media distort everything. There are actually scientists and researchers trying to use technical methods to stop this problem. I think in general that is the message people take away the most. But on the other hand, by itself this is a very interesting and challenging research problem, and we will keep working on it.

Sarah O'Carroll:
And another concern that I would have is that we might be losing even an interest in distinguishing between what has been distorted and what has simply been filtered so that it looks really nice on our Instagram or something. I hope that we aren't losing the sense that there should be a distinction, a separation, between what has been changed and what is true.

Siwei Lyu:
Right. I think that is more of a philosophical question: once we have so much information, and some of that information is very likely subject to tampering, can we still say what is true and what is false? In my talks I use the title "Seeing is not believing, seeing is deceiving," because we are truly facing an era where we have so much information but so little trust in it. To me, that is the most severe problem that deepfakes, fake media and fake news are creating. There has always been fake media and misinformation, but the amount we have right now, and how easily people can make it, is really bothersome. To a certain extent, the worst scenario is that we stop trusting everything, and that is what we do not want to see. That is why we need all kinds of measures, from the individual user's point of view, from the technical point of view, from the journalistic-ethics and legal points of view. We need to put some measures in place to stop or deter this trend.

Sarah O'Carroll:
Siwei, thank you so much for sharing your research with us. Really important work.

Siwei Lyu:
You're welcome. My pleasure.

Sarah O'Carroll:
Thank you for listening to the UAlbany News Podcast. I'm your host, Sarah O'Carroll, and that was Siwei Lyu, an associate professor of computer science. You can let us know what you thought of the episode by emailing us at mediarelations@albany.edu, and you can find us on Twitter @UAlbanyNews.