Mathematics and Fluid Dynamics: How Scientists Recognize Audio Deepfakes
Audio deepfakes pose an even greater threat than video fakes because they are much harder to detect. Researchers are turning to mathematics and even anatomy for ways to solve the problem
Video and audio deepfakes have become so realistic that even large companies and media outlets can be deceived by them, to say nothing of ordinary users. To identify deepfakes, many researchers have turned to analyzing visual artifacts in video. Audio deepfakes, however, which have no picture to analyze, are harder to identify. New research is trying to solve this problem as well. RBC Trends reviews the most promising developments in this area.
Sound Recognition Of Deepfakes
Many researchers are searching for extraneous sound elements in fake voice recordings. The American company Pindrop has developed a method for recognizing audio deepfakes by their sound artifacts. The technique was later improved to detect more sophisticated deepfakes. The algorithm can analyze 8,000 to 50,000 audio samples per second of recording.
The method aims to identify artifacts, or sound elements that should not be present, in audio deepfakes. One example is fricative consonants such as f, s, v, and z. The sounds produced when pronouncing these consonants are especially difficult for deep learning systems to learn, because the program mistakes them for background noise. As a result, these consonants sound different on fake recordings or are omitted entirely.
In addition, algorithms struggle to generate word endings, which they also perceive as background noise. Because of this, many synthesized recordings break off abruptly at the end, whereas human speech sounds smoother.
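Pindrop has not published its detector, but the abrupt-ending artifact described above can be illustrated with a simple heuristic: compare the energy of a recording's final frames to its peak energy. The following is a minimal pure-Python sketch on synthetic signals, not the company's actual method; the frame size and threshold are illustrative assumptions.

```python
import math

def short_time_energy(signal, frame=160):
    """Mean squared amplitude per non-overlapping frame."""
    return [sum(x * x for x in signal[i:i + frame]) / frame
            for i in range(0, len(signal) - frame + 1, frame)]

def ends_abruptly(signal, frame=160, ratio=0.5):
    """Flag recordings whose final frame still carries most of the peak
    energy, i.e. speech that cuts off instead of fading out."""
    energy = short_time_energy(signal, frame)
    return energy[-1] > ratio * max(energy)

# Natural speech stand-in: a tone that fades out toward the end.
fading = [math.sin(0.3 * i) * max(0.0, 1 - i / 1600) for i in range(1600)]
# Synthetic stand-in: the same tone chopped off at full volume.
chopped = [math.sin(0.3 * i) for i in range(1600)]

print(ends_abruptly(fading))   # False
print(ends_abruptly(chopped))  # True
```

A production system would of course look at word-final segments inside the recording, not just its last frames, but the underlying cue is the same.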
The American company Resemble AI, which previously launched a platform for synthesizing audio deepfakes, has now developed a tool for recognizing them.
An open-source tool called Resemblyzer uses artificial intelligence and machine learning to detect deepfakes by producing high-quality representations of voice samples. It creates a vector of 256 values that summarizes the characteristics of a voice and, from that vector, predicts whether the voice is real or generated.
Resemblyzer runs about 1,000 times faster than real time and also recognizes extraneous noise in recordings.
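The idea of summarizing a voice as a 256-value vector can be illustrated with toy embeddings: recordings of the same speaker should yield nearby vectors, while a generated voice drifts away. Below is a minimal sketch using cosine similarity; the random vectors merely stand in for real encoder output (in Resemblyzer, such embeddings come from its `VoiceEncoder` class).

```python
import math
import random

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for 256-value voice embeddings.
random.seed(0)
claimed = [random.gauss(0, 1) for _ in range(256)]
# A genuine recording of the same speaker: close to the claimed embedding.
same_speaker = [x + random.gauss(0, 0.1) for x in claimed]
# A generated voice: an unrelated embedding.
generated = [random.gauss(0, 1) for _ in range(256)]

print(round(cosine_similarity(claimed, same_speaker), 2))  # high, near 1.0
print(round(cosine_similarity(claimed, generated), 2))     # much lower
```

A detector can then threshold this similarity to decide whether a sample plausibly belongs to the claimed speaker.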
Fluid Dynamics To The Rescue
Researchers at the University of Florida, writing in The Conversation, presented a method that measures the acoustic and fluid dynamic differences between voice samples from real people and deepfakes. The researchers drew on knowledge of the anatomy of the human voice.
The acoustic properties of a voice can be measured by analyzing the sounds a person produces with the vocal cords, tongue, and lips. Human anatomy does not allow a person to make more than about 200 such sounds, so the range is quite small.
In contrast, audio deepfakes are created by having a computer listen to a set of audio recordings and extract key information about the unique aspects of the victim's voice. The attacker then (in most cases) chooses a phrase for the deepfake to speak and uses a modified text-to-speech algorithm to generate a sample that sounds as if the victim were speaking the chosen phrase.
When creating an audio deepfake, the computer reconstructs the human vocal tract. However, AI cannot recreate all of its anatomical features.
As a result, deepfakes mimic vocal tract shapes that humans simply do not have.
Some are as narrow as a drinking straw, unlike the human vocal tract, which is much wider and more variable in shape.
Detecting Deepfakes By Voice Frequencies
Other researchers are turning to physics to identify audio deepfakes. Joel Frank and Lea Schoenherr of the Horst Görtz Institute for IT Security at Ruhr University Bochum have developed an algorithm that can distinguish a real human voice from a deepfake by frequency.
They collected about 118,000 samples of synthesized voice recordings, or 196 hours of deepfakes, in English and Japanese. To keep the dataset diverse, the team used six different AI algorithms to create the deepfakes. Alongside the dataset, they provided algorithms for detailed frequency analysis of the audio data.
After that, the researchers analyzed the frequency distribution in real and fake audio recordings and compared them.
This comparison “revealed subtle differences in high frequencies between real and fake files.” According to the researchers, the difference was significant enough to detect a deepfake.
The developed software is only the beginning, as “the algorithms are intended for other researchers as a starting point for developing new methods for detecting deepfakes.”
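The Bochum team's actual pipeline is more elaborate, but the core comparison above, how much spectral energy a recording carries at high frequencies, can be sketched with a naive discrete Fourier transform on synthetic signals. The signals, sample rate, and 2,000 Hz cutoff here are all illustrative assumptions, not values from the study.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Naive discrete Fourier transform; magnitudes for bins 0..N//2."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def high_freq_ratio(signal, sample_rate, cutoff_hz):
    """Fraction of spectral energy above cutoff_hz."""
    mags = dft_magnitudes(signal)
    cutoff_bin = int(cutoff_hz * len(signal) / sample_rate)
    total = sum(m * m for m in mags)
    return sum(m * m for m in mags[cutoff_bin:]) / total

sr = 8000  # assumed sample rate
t = [i / sr for i in range(256)]
# "Real" voice stand-in: a low tone plus clear high-frequency content.
real = [math.sin(2 * math.pi * 300 * x)
        + 0.5 * math.sin(2 * math.pi * 3000 * x) for x in t]
# "Fake" stand-in: the same tone with the high frequencies attenuated.
fake = [math.sin(2 * math.pi * 300 * x)
        + 0.05 * math.sin(2 * math.pi * 3000 * x) for x in t]

print(high_freq_ratio(real, sr, 2000) > high_freq_ratio(fake, sr, 2000))  # True
```

In practice such a comparison is done statistically over many recordings and frequency bands rather than on a single pair of signals.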
Some research groups are working ahead of the curve, trying not only to create mechanisms for detecting deepfakes but also to classify them. Researchers at the University of Malaga in Spain have developed a fake audio dataset, created by analyzing how the characteristics of real and fake sounds deviate from each other. Using the resulting H-Voice dataset, the researchers built a machine-learning model that detects deepfakes with 98% accuracy.
The H-Voice dataset includes 6,672 visualizations of voice recordings: 2,088 fake, 2,020 original, and 864 that are a mixture of the two. With this dataset, researchers can train, validate, and test deepfake classification models. It can also be used to identify the type of deepfake, for example, to determine whether it was produced through machine learning, voice imitation techniques, or manipulation of a real voice.
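The Malaga team's exact model is not reproduced here; as a stand-in, the following nearest-centroid sketch shows how deviations in a couple of hypothetical features could separate original from fake recordings. The feature names and training values are invented for illustration; real H-Voice features come from visualizations of the recordings.

```python
import math

def centroid(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def classify(sample, centroids):
    """Assign the label of the nearest class centroid (Euclidean distance)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(centroids, key=lambda label: dist(sample, centroids[label]))

# Toy training features, e.g. (high-frequency energy ratio, pause abruptness).
train = {
    "original": [[0.30, 0.10], [0.28, 0.20], [0.35, 0.15]],
    "fake":     [[0.05, 0.80], [0.08, 0.70], [0.04, 0.90]],
}
centroids = {label: centroid(vecs) for label, vecs in train.items()}

print(classify([0.32, 0.12], centroids))  # original
print(classify([0.06, 0.85], centroids))  # fake
```

Extending this to more than two labels (machine-learned, imitated, manipulated) is just a matter of adding more classes to the training dictionary.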
Researchers are also turning to the analysis of extraneous and unnatural noises that may be present in voice fakes. Singapore's DSO National Laboratories, which specializes in defense research, is developing such tools. It has created a program that assesses unnatural effects in audio recordings, such as sharp pauses and sudden changes in the pace of speech. In addition, the algorithm is robust to adversarial noise, particularly sounds embedded in video to make a deepfake harder to recognize.
Detecting Deepfakes Using Math
Some scientists use mathematics to study audio deepfakes. An international team of researchers has unveiled a method that analyzes a few seconds of audio to determine whether it is genuine human speech or a deepfake. The method involves four main steps. First, the researchers apply the discrete Fourier transform to the raw audio signals. They then use the resulting Fourier coefficients to construct spectrograms of the audio signals. Finally, they analyze the spectrograms with a neural network and classify them.
The researchers used a dataset of 120,000 audio recordings, which included both deepfakes and recordings of real people speaking, and trained the neural network on this set to classify sounds.
The researchers then used the Fast Fourier Transform (FFT), an algorithm that computes the Fourier coefficients far faster than evaluating the defining formula directly. They converted each resulting coefficient into a value in decibels. From these values they built a spectrogram for each sound recording: an image showing the signal's power density over time. Spectrograms convey the intensity of the audio signal as a function of time and frequency.
One axis of the spectrogram represents time and the other represents frequency. The intensity of the audio signal at a given time and frequency is represented by color: brighter colors, close to shades of yellow, indicate greater intensity and louder audio, while darker colors, close to shades of purple or black, indicate lower intensity and quieter sound. The neural network examines the various frequency ranges of the spectrogram in detail, revealing artifacts in them. This method can also help detect video deepfakes.
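The four steps, Fourier transform, coefficients, decibel conversion, spectrogram, can be sketched in a few lines of pure Python. The frame size, hop, and sample rate are illustrative choices, and a naive DFT stands in for the FFT to keep the sketch self-contained.

```python
import cmath
import math

def dft(frame):
    """Discrete Fourier transform of one frame (bins 0..N//2)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n // 2 + 1)]

def to_db(magnitude, floor=1e-10):
    """Convert a magnitude to decibels, clamped to avoid log(0)."""
    return 20 * math.log10(max(magnitude, floor))

def spectrogram(signal, frame_size=64, hop=32):
    """Rows = time frames, columns = frequency bins, values in decibels."""
    frames = [signal[i:i + frame_size]
              for i in range(0, len(signal) - frame_size + 1, hop)]
    return [[to_db(abs(c)) for c in dft(f)] for f in frames]

sr = 8000  # assumed sample rate
signal = [math.sin(2 * math.pi * 1000 * i / sr) for i in range(512)]
spec = spectrogram(signal)
print(len(spec), len(spec[0]))  # prints: 15 33
```

Each row of `spec` is one time slice; the 1,000 Hz tone shows up as a bright column at bin 8 (1,000 Hz × 64 samples / 8,000 Hz), which is exactly the kind of structure the neural network inspects.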