Detecting the Deceptive: Unmasking Deep Fake Voices (2024)

Published October 29, 2023

Andyrasika (Ankush Singal)

Introduction:

In an era where artificial intelligence continues to redefine the boundaries of technology, one of the most intriguing and concerning developments is the emergence of deep fake voices. These uncanny imitations of real human voices are crafted with remarkable precision and have the potential to deceive even the most discerning ears. In this article, we'll delve into the world of Audio Deep Fake Detection, exploring its significance, the challenges it poses, and the strategies employed to combat the rise of deceptive deep fake voices.

Artificial intelligence (AI) has been a recurring theme in literature and film for decades, and it remains a subject of regular discussion and exploration today. In recent years, deepfake technology, an innovation built on artificial intelligence and deep learning, has emerged as a prominent topic within the field, and numerous deepfake applications have had a large public impact. Beyond manipulated videos that target well-known individuals, the technology has potential applications across several domains. The objective of this study is to explore those applications. Deepfake technology was examined by concentrating on the concept of deep learning and referencing artificial intelligence technology, and its applications were classified through a comprehensive literature analysis and an examination of usage examples in diverse domains. Based on the findings, the significant applications of deepfake technology fall into four groups: arts and entertainment, advertising and marketing, the film industry, and political communication and media.

The Role of Voice in AI:

The human voice is a powerful tool for communication, emotion, and identity. In the realm of AI, the role of voice has expanded dramatically, giving rise to a plethora of voice-related applications:

  1. Voice Assistants: Virtual assistants like Siri, Alexa, and Google Assistant rely on voice recognition technology to understand and respond to user commands.

  2. Text-to-Speech (TTS): AI-driven TTS systems transform written text into natural-sounding speech, enhancing accessibility and enabling natural human-machine interaction.

  3. Voice Authentication: Voice biometrics are used for security and authentication, allowing individuals to unlock devices or access sensitive information with their unique voiceprints.

  4. Audiobooks and Podcasts: AI has made it possible to convert written content into spoken words, expanding the reach and accessibility of literature and information.

Audio Deep Fake Detection: Revealing the Sounds of Deceit

  1. The Challenge of Audio Deepfakes: With startling precision, audio deepfake technology can mimic a person's voice and speech patterns. This poses a significant challenge because it is increasingly difficult to distinguish between real and fake audio. Identifying audio deepfakes requires a multifaceted strategy that combines knowledge, technology, and alertness.

  2. Data Gathering and Arrangement: Data is the cornerstone of every deepfake detection algorithm. A diverse dataset encompassing both real and deepfake audio recordings is imperative, and it should represent a wide array of voices, languages, and settings. Preprocessing is used to extract significant features from the audio, such as spectrograms or mel-frequency cepstral coefficients (MFCCs). These features serve as the foundation for machine learning models.

  3. Models of Machine Learning: Selecting the appropriate machine learning model is a crucial choice in the identification of audio deep fakes. Convolutional neural networks (CNNs), recurrent neural networks (RNNs), and hybrid architectures are among the candidate model types. Pre-trained models intended for audio classification can be a good place to start.

  4. Extraction of Features: To distinguish between real and deep fake audio, feature extraction is essential. MFCCs, spectrogram images, or a mix of the two can be used as the model's input features. These features capture the frequency and temporal aspects of the audio, helping the model identify anomalies.

  5. Training and Evaluation: The training procedure is the primary component of the detection system. The model is trained on both real and deep fake data so that it learns to distinguish between the two. Data augmentation techniques are applied to improve the robustness of the model, and several metrics are used to assess its performance. Testing with unseen data and cross-validation are essential to confirm that the model is effective.

  6. Optimization and After-Processing: Fine-tuning improves the model's performance and addresses biases or weaknesses. Post-processing methods are used to refine the model's predictions and lower the number of false positives.

  7. Continuous Monitoring and Real-Time Detection: The final goal is to deploy the model for real-time detection in audio files or streams. Integration with audio processing frameworks and tools allows the model to function in real-world situations. Constant observation and updating are needed to adjust to new deepfake methods.

  8. Ethical Considerations and User Education: Individuals and organizations alike need to be informed about the existence of audio deep fakes. Encouraging the responsible use of audio content and confirming its validity is a shared responsibility. Addressing moral and legal considerations, such as security and privacy concerns, is also critical.

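As a rough illustration of the feature-extraction step in the list above, the sketch below computes a log-magnitude spectrogram with plain NumPy. It is a minimal sketch rather than the pipeline used later in this article: the function name `log_spectrogram`, the 400-sample frame, and the 160-sample hop are assumptions chosen for 16 kHz audio, and in practice a library such as librosa would more typically be used to obtain mel spectrograms or MFCCs.

```python
import numpy as np


def log_spectrogram(signal, frame_len=400, hop=160):
    """Return a (num_frames, frame_len // 2 + 1) log-magnitude spectrogram.

    frame_len and hop are illustrative defaults (25 ms / 10 ms at 16 kHz).
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        # Pad short clips with zeros so at least one frame exists.
        signal = np.pad(signal, (0, frame_len - signal.size))
    num_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([
        signal[i * hop:i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # A small constant avoids log(0) on silent frames.
    return np.log(magnitude + 1e-10)


# Synthetic one-second 440 Hz tone sampled at 16 kHz (hypothetical input).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)
features = log_spectrogram(tone)

# Frequency bin with the most energy, averaged over all frames.
peak_bin = int(features.mean(axis=0).argmax())
```

Each row of `features` describes one short time frame and each column one frequency bin, which is the kind of two-dimensional input a CNN-based detector could consume.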
Code Implementation

In this section, we walk through loading and exploring the Deepfake Detection Challenge dataset from Kaggle, which will serve as the foundation for the deepfake detection project. The dataset is a rich resource of manipulated and unaltered videos, an essential component for training and evaluating deepfake detection models.

Step 1: Import libraries

import numpy as np
import pandas as pd
import os
import matplotlib
import seaborn as sns
import matplotlib.pyplot as plt
from tqdm import tqdm_notebook
%matplotlib inline
import cv2 as cv
from pathlib import Path
import subprocess
import librosa.display
import librosa.filters

DATA_FOLDER = '/kaggle/input/deepfake-detection-challenge'
TRAIN_SAMPLE_FOLDER = 'train_sample_videos'
TEST_FOLDER = 'test_videos'
INPUT_PATH = '../input/realfake045/all/all'
WAV_PATH = './wavs/'

print(f"Train samples: {len(os.listdir(os.path.join(DATA_FOLDER, TRAIN_SAMPLE_FOLDER)))}")
print(f"Test samples: {len(os.listdir(os.path.join(DATA_FOLDER, TEST_FOLDER)))}")

This code sets up variables for the paths to data files in a deepfake detection challenge. It defines "DATA_FOLDER" as the main data directory, "TRAIN_SAMPLE_FOLDER" as the folder containing labeled training videos, and "TEST_FOLDER" as the folder for testing videos. It uses the "os" module to count the files in these folders. The code utilizes f-strings to print the sample counts for training and testing data. This code is a helpful step in data exploration for a deepfake detection challenge, allowing easy assessment of data sample sizes.

Step 2: Check file types

Here we check the file extensions in the training data. Most of the files appear to have the mp4 extension; let's check whether there are other extensions as well.

train_list = list(os.listdir(os.path.join(DATA_FOLDER, TRAIN_SAMPLE_FOLDER)))
ext_dict = []
for file in train_list:
    # Take the text after the last dot so names with extra dots still work
    file_ext = file.split('.')[-1]
    if (file_ext not in ext_dict):
        ext_dict.append(file_ext)
print(f"Extensions: {ext_dict}")

Output:

Extensions: ['mp4', 'json']

Let's count how many files there are with each extension.

for file_ext in ext_dict:
    print(f"Files with extension `{file_ext}`: {len([file for file in train_list if file.endswith(file_ext)])}")

Output:

Files with extension `mp4`: 400
Files with extension `json`: 1

Let's repeat the same process for the test videos folder.

test_list = list(os.listdir(os.path.join(DATA_FOLDER, TEST_FOLDER)))
ext_dict = []
for file in test_list:
    file_ext = file.split('.')[-1]
    if (file_ext not in ext_dict):
        ext_dict.append(file_ext)
print(f"Extensions: {ext_dict}")
for file_ext in ext_dict:
    # Count within test_list (not train_list) so the numbers describe the test folder
    print(f"Files with extension `{file_ext}`: {len([file for file in test_list if file.endswith(file_ext)])}")
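Because the same extension check is run on both the train and test folders, it can be folded into one reusable helper. The following is a minimal sketch, not the notebook's own code: the function name `count_extensions` and the demo file names are hypothetical, and the demo uses a temporary folder rather than the Kaggle paths.

```python
import os
import tempfile
from collections import Counter
from pathlib import Path


def count_extensions(folder):
    """Count the entries in `folder` by lowercase extension (without the dot)."""
    counts = Counter()
    for name in os.listdir(folder):
        # Path.suffix is '' for names that have no extension.
        ext = Path(name).suffix.lstrip('.').lower() or '(none)'
        counts[ext] += 1
    return counts


# Demonstration on a throwaway folder with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ['a.mp4', 'b.mp4', 'metadata.json', 'README']:
        Path(tmp, name).touch()
    demo_counts = count_extensions(tmp)
```

Calling the helper once per folder avoids copying the loop and accidentally counting against the wrong file list.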

Let's check the JSON file.

json_file = [file for file in train_list if file.endswith('json')][0]
print(f"JSON file: {json_file}")

This code snippet searches for a file in the train_list that ends with the extension .json and assigns it to the variable json_file.

Let's explore this JSON file.

def get_meta_from_json(path):
    df = pd.read_json(os.path.join(DATA_FOLDER, path, json_file))
    df = df.T
    return df

meta_train_df = get_meta_from_json(TRAIN_SAMPLE_FOLDER)
meta_train_df.head()

Output

                label  split  original
aagfhgtpmv.mp4  FAKE   train  vudstovrck.mp4
aapnvogymq.mp4  FAKE   train  jdubbvfswz.mp4
abarnvbtwb.mp4  REAL   train  None
abofeumbvv.mp4  FAKE   train  atvmxvwyns.mp4
abqwwspghj.mp4  FAKE   train  qzimuostzz.mp4
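The same metadata can also be inspected without pandas. The sketch below assumes the JSON maps each file name to an object with `label`, `split`, and `original` keys, which is the layout implied by the transposed DataFrame above; the inline string stands in for the real file and reuses the five rows shown, with a missing original written as `null` (an assumption).

```python
import json
from collections import Counter

# Inline stand-in for the metadata file, using the five rows shown above.
raw = '''
{
  "aagfhgtpmv.mp4": {"label": "FAKE", "split": "train", "original": "vudstovrck.mp4"},
  "aapnvogymq.mp4": {"label": "FAKE", "split": "train", "original": "jdubbvfswz.mp4"},
  "abarnvbtwb.mp4": {"label": "REAL", "split": "train", "original": null},
  "abofeumbvv.mp4": {"label": "FAKE", "split": "train", "original": "atvmxvwyns.mp4"},
  "abqwwspghj.mp4": {"label": "FAKE", "split": "train", "original": "qzimuostzz.mp4"}
}
'''
metadata = json.loads(raw)

# Tally labels and collect the names of the real videos.
label_counts = Counter(entry["label"] for entry in metadata.values())
real_files = [name for name, entry in metadata.items() if entry["label"] == "REAL"]
```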

Step 3: Metadata exploration

Let's now explore the metadata in the train sample.

Missing data

We start by checking for any missing values.

def missing_data(data):
    total = data.isnull().sum()
    percent = (data.isnull().sum()/data.isnull().count()*100)
    tt = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
    types = []
    for col in data.columns:
        dtype = str(data[col].dtype)
        types.append(dtype)
    tt['Types'] = types
    return(np.transpose(tt))

This code defines a function missing_data(data) that takes a pandas DataFrame object data as input and returns a summary of the missing data in the DataFrame.

missing_data(meta_train_df)

Output

         label   split   original
Total    0       0       77
Percent  0       0       19.25
Types    object  object  object

This code is calling the missing_data() function and passing the meta_train_df DataFrame as an argument.

The original column is missing for 77 samples (19.25% of the data). We suspect that these missing values belong to the real videos, which have no original (generalizing from the rows we glimpsed above). Let's check this hypothesis.

missing_data(meta_train_df.loc[meta_train_df.label=='REAL'])

This code is calling the missing_data() function on a subset of the meta_train_df DataFrame that meets a specific condition, using the .loc method to select rows based on the value of the label column.
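The hypothesis can also be tested directly by comparing two boolean masks. This is a minimal sketch on a hypothetical four-row miniature of `meta_train_df` (the file names are made up); on the real DataFrame the same three expressions would apply unchanged.

```python
import pandas as pd

# Hypothetical miniature of meta_train_df; file names are made up.
sample_df = pd.DataFrame(
    {
        "label": ["FAKE", "REAL", "FAKE", "REAL"],
        "split": ["train"] * 4,
        "original": ["src_a.mp4", None, "src_b.mp4", None],
    },
    index=["v1.mp4", "v2.mp4", "v3.mp4", "v4.mp4"],
)

missing_original = sample_df["original"].isnull()
is_real = sample_df["label"] == "REAL"

# The hypothesis holds when the two masks agree row for row.
hypothesis_holds = bool((missing_original == is_real).all())
n_real_missing = int((missing_original & is_real).sum())
n_fake_missing = int((missing_original & ~is_real).sum())
```

If `n_fake_missing` were greater than zero on the real data, some fake videos would lack a recorded source, which would contradict the hypothesis.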

Step 4: Unique values

def unique_values(data):
    total = data.count()
    tt = pd.DataFrame(total)
    tt.columns = ['Total']
    uniques = []
    for col in data.columns:
        unique = data[col].nunique()
        uniques.append(unique)
    tt['Uniques'] = uniques
    return(np.transpose(tt))

This code defines a function unique_values(data) that takes a pandas DataFrame object data as input and returns a summary of the unique values in the DataFrame.

  • Overall, this code is useful for quickly identifying the number of unique values in a pandas DataFrame, providing a summary of the number of unique values for each column in the DataFrame.
unique_values(meta_train_df)

This code is calling the unique_values() function and passing the meta_train_df DataFrame as an argument.

Step 5: Most frequent originals

def most_frequent_values(data):
    total = data.count()
    tt = pd.DataFrame(total)
    tt.columns = ['Total']
    items = []
    vals = []
    for col in data.columns:
        counts = data[col].value_counts()  # compute once per column
        itm = counts.index[0]
        val = counts.values[0]
        items.append(itm)
        vals.append(val)
    tt['Most frequent item'] = items
    tt['Frequency'] = vals
    tt['Percent from total'] = np.round(vals / total * 100, 3)
    return(np.transpose(tt))

most_frequent_values(meta_train_df)

This call applies most_frequent_values to the meta_train_df DataFrame and reports, for each column, the most frequent value, how often it occurs, and its share of the non-null entries in that column.
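For a quick cross-check of the two helper functions, pandas' built-in `DataFrame.describe` reports similar statistics (non-null count, number of unique values, most frequent value, and its frequency) for text columns. The sketch below runs it on a hypothetical four-row frame; the file names are made up.

```python
import pandas as pd

# Hypothetical miniature of the metadata; file names are made up.
sample_df = pd.DataFrame({
    "label": ["FAKE", "FAKE", "REAL", "FAKE"],
    "split": ["train"] * 4,
    "original": ["src_a.mp4", "src_a.mp4", None, "src_b.mp4"],
})

# Rows of the result: count, unique, top, freq.
summary = sample_df.describe(include="all")
```

Here `summary.loc["top", "label"]` is the most frequent label and `summary.loc["freq", "label"]` is how many times it occurs.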

Step 6: Data distribution visualizations

def plot_count(feature, title, df, size=1):
    '''
    Plot count of classes / feature
    param: feature - the feature to analyze
    param: title - title to add to the graph
    param: df - dataframe from which we plot feature's classes distribution
    param: size - default 1.
    '''
    f, ax = plt.subplots(1, 1, figsize=(4*size, 4))
    total = float(len(df))
    # Pass the column via x=; newer seaborn releases treat the first positional argument as `data`
    g = sns.countplot(x=df[feature], order=df[feature].value_counts().index[:20], palette='Set3', ax=ax)
    g.set_title("Number and percentage of {}".format(title))
    if(size > 2):
        plt.xticks(rotation=90, size=8)
    for p in ax.patches:
        height = p.get_height()
        ax.text(p.get_x()+p.get_width()/2., height + 3,
                '{:1.2f}%'.format(100*height/total),
                ha="center")
    plt.show()

plot_count('split', 'split (train)', meta_train_df)
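The percentages that the plot writes above each bar can also be obtained without drawing anything. A minimal sketch, assuming a hypothetical label column with eight fake and two real entries:

```python
import pandas as pd

# Hypothetical label column: 8 fake and 2 real entries.
labels = pd.Series(["FAKE"] * 8 + ["REAL"] * 2, name="label")

# normalize=True returns fractions; scale to percentages and round.
percentages = (labels.value_counts(normalize=True) * 100).round(2)
```

Applying the same expression to `meta_train_df['label']` would give the class balance of the training sample, which is worth knowing before choosing evaluation metrics.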

Step 7: Video data exploration

In the following we will explore some of the video data.

Missing video (or meta) data

We first check whether the list of files in the metadata matches the list of files in the folder.

meta = np.array(list(meta_train_df.index))
storage = np.array([file for file in train_list if file.endswith('mp4')])
print(f"Metadata: {meta.shape[0]}, Folder: {storage.shape[0]}")
print(f"Files in metadata and not in folder: {np.setdiff1d(meta,storage,assume_unique=False).shape[0]}")
print(f"Files in folder and not in metadata: {np.setdiff1d(storage,meta,assume_unique=False).shape[0]}")

Output

Metadata: 400, Folder: 400
Files in metadata and not in folder: 0
Files in folder and not in metadata: 0
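When the two listings do not match, it helps to see which names differ rather than only how many. The sketch below does the same comparison with Python sets; `compare_listings` and the sample names are hypothetical.

```python
def compare_listings(metadata_names, folder_names):
    """Return (missing_from_folder, missing_from_metadata) as sorted lists."""
    meta_set = set(metadata_names)
    folder_set = set(folder_names)
    return sorted(meta_set - folder_set), sorted(folder_set - meta_set)


# Hypothetical listings with one mismatch on each side.
only_meta, only_folder = compare_listings(
    ["a.mp4", "b.mp4", "c.mp4"],
    ["b.mp4", "c.mp4", "d.mp4"],
)
```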

A few fake videos

fake_train_sample_video = list(meta_train_df.loc[meta_train_df.label=='FAKE'].sample(3).index)
fake_train_sample_video

Output

['bguwlyazau.mp4', 'byfenovjnf.mp4', 'dsndhujjjb.mp4']

Modifying a function for displaying a selected image from a video

def display_image_from_video(video_path):
    '''
    input: video_path - path for video
    process:
    1. perform a video capture from the video
    2. read the image
    3. display the image
    '''
    capture_image = cv.VideoCapture(video_path)
    ret, frame = capture_image.read()
    capture_image.release()  # free the underlying video handle
    if not ret:
        # read() returns False when no frame could be decoded
        print(f"Could not read a frame from {video_path}")
        return
    fig = plt.figure(figsize=(10,10))
    ax = fig.add_subplot(111)
    frame = cv.cvtColor(frame, cv.COLOR_BGR2RGB)
    ax.imshow(frame)

for video_file in fake_train_sample_video:
    display_image_from_video(os.path.join(DATA_FOLDER, TRAIN_SAMPLE_FOLDER, video_file))

Let's now try the same for a few of the real videos.

real_train_sample_video = list(meta_train_df.loc[meta_train_df.label=='REAL'].sample(3).index)
real_train_sample_video

Output

['ciyoudyhly.mp4', 'ekcrtigpab.mp4', 'cfxkpiweqt.mp4']

for video_file in real_train_sample_video:
    display_image_from_video(os.path.join(DATA_FOLDER, TRAIN_SAMPLE_FOLDER, video_file))
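The display function converts each frame with `cv.cvtColor(frame, cv.COLOR_BGR2RGB)` because OpenCV returns pixels in blue-green-red order while matplotlib expects red-green-blue. For a three-channel image, the effect of that conversion can be illustrated with plain NumPy by reversing the channel axis; the two-pixel array below is a made-up example, not a frame from the dataset.

```python
import numpy as np

# A 1x2 "image" in OpenCV's BGR order: one pure-blue and one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels.
rgb = bgr[..., ::-1]
```

Skipping this step is a common reason for faces appearing with a blue tint when OpenCV frames are shown with matplotlib.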

Step 8: Videos with same original

meta_train_df['original'].value_counts()[0:5]

Output:

meawmsgiti.mp4    6
atvmxvwyns.mp4    6
qeumxirsme.mp4    5
kgbkktcjxf.mp4    5
qzklcjjxdq.mp4    4
Name: original, dtype: int64

We modify our visualization function to work with multiple images.

def display_image_from_video_list(video_path_list, video_folder=TRAIN_SAMPLE_FOLDER):
    '''
    input: video_path_list - path for video
    process:
    0. for each video in the video path list
        1. perform a video capture from the video
        2. read the image
        3. display the image
    '''
    # plt.subplots already creates the figure, so no separate plt.figure() call is needed
    fig, ax = plt.subplots(2,3,figsize=(16,8))
    # we only show images extracted from the first 6 videos
    for i, video_file in enumerate(video_path_list[0:6]):
        video_path = os.path.join(DATA_FOLDER, video_folder,video_file)
        capture_image = cv.VideoCapture(video_path)
        ret, frame = capture_image.read()
        capture_image.release()
        frame = cv.cvtColor(frame, cv.COLOR_BGR2RGB)
        ax[i//3, i%3].imshow(frame)
        ax[i//3, i%3].set_title(f"Video: {video_file}")
        ax[i//3, i%3].axis('on')

same_original_fake_train_sample_video = list(meta_train_df.loc[meta_train_df.original=='meawmsgiti.mp4'].index)
display_image_from_video_list(same_original_fake_train_sample_video)

This code displays the first frame of each video in the training sample whose original is "meawmsgiti.mp4" (up to six frames). Viewing them side by side is useful for analyzing the quality and characteristics of the fake videos generated from one specific original video.
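To look at every original rather than a single hand-picked one, the fake videos can be grouped by their source. The following is a minimal sketch over a plain dictionary shaped like the metadata; the function name `group_fakes_by_original` and the file names are hypothetical.

```python
from collections import defaultdict


def group_fakes_by_original(metadata):
    """Map each original video name to the sorted list of fakes derived from it."""
    groups = defaultdict(list)
    for name, entry in metadata.items():
        original = entry.get("original")
        # Real videos have no original, so they are skipped.
        if entry.get("label") == "FAKE" and original:
            groups[original].append(name)
    return {original: sorted(names) for original, names in groups.items()}


# Hypothetical metadata entries; the file names are made up.
demo_metadata = {
    "f1.mp4": {"label": "FAKE", "original": "src_a.mp4"},
    "f2.mp4": {"label": "FAKE", "original": "src_a.mp4"},
    "f3.mp4": {"label": "FAKE", "original": "src_b.mp4"},
    "r1.mp4": {"label": "REAL", "original": None},
}
groups = group_fakes_by_original(demo_metadata)
```

Grouping this way also matters when splitting data for training: keeping all fakes of one original in the same split is one way to reduce leakage between training and validation sets.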

Step 9: Test video files

Let's also look at a few of the test data files.

test_videos = pd.DataFrame(list(os.listdir(os.path.join(DATA_FOLDER, TEST_FOLDER))), columns=['video'])
test_videos.head()

Let's now visualize one of the videos.

display_image_from_video(os.path.join(DATA_FOLDER, TEST_FOLDER, test_videos.iloc[0].video))

The purpose of the "display_image_from_video" function is to display the first frame of the specified video file as an image. Therefore, the overall purpose of the code is to display the first frame of the first video file in the "test" folder of the data directory, allowing for easy inspection of the content and quality of the video.

Step 10: Play video files

fake_videos = list(meta_train_df.loc[meta_train_df.label=='FAKE'].index)

from IPython.display import HTML
from base64 import b64encode

def play_video(video_file, subset=TRAIN_SAMPLE_FOLDER):
    '''
    Display video
    param: video_file - the name of the video file to display
    param: subset - the folder where the video file is located (can be TRAIN_SAMPLE_FOLDER or TEST_FOLDER)
    '''
    # Read the raw bytes and close the file promptly
    with open(os.path.join(DATA_FOLDER, subset, video_file), 'rb') as video:
        video_bytes = video.read()
    data_url = "data:video/mp4;base64," + b64encode(video_bytes).decode()
    return HTML("""<video width=500 controls><source src="%s" type="video/mp4"></video>""" % data_url)

play_video(fake_videos[0])
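The player works by embedding the whole file in the page as a base64 data URL. The sketch below isolates that step so its cost is visible; `to_data_url` is a hypothetical helper and the payload is stand-in bytes rather than a real video.

```python
from base64 import b64decode, b64encode


def to_data_url(payload, mime_type="video/mp4"):
    """Encode raw bytes as a base64 data URL."""
    return f"data:{mime_type};base64," + b64encode(payload).decode("ascii")


# Stand-in bytes; a real call would pass the contents of an .mp4 file.
payload = bytes(range(256)) * 3
url = to_data_url(payload)

encoded_part = url.split(",", 1)[1]
overhead = len(encoded_part) / len(payload)
```

Base64 uses four characters for every three bytes, so the embedded video is about one third larger than the file on disk, which is worth keeping in mind before embedding many clips in one notebook.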


Step 11: Download the public datasets found at https://www.kaggle.com/rakibilly/ffmpeg-static-build and https://www.kaggle.com/datasets/phoenix9032/realfake045

!tar xvf /kaggle/input/ffmpeg-static-build/ffmpeg-git-amd64-static.tar.xz

output_format = 'wav'  # other formats such as aac also work
output_dir = Path(f"{output_format}s")
Path(output_dir).mkdir(exist_ok=True, parents=True)
fake_name = 'aaeflzzhvy'
real_name = 'flqgmnetsg'

list_of_files = []
for file in os.listdir(os.path.join(DATA_FOLDER, TRAIN_SAMPLE_FOLDER)):
    # Join all three parts so the folder and file name are separated correctly
    filename = os.path.join(DATA_FOLDER, TRAIN_SAMPLE_FOLDER, file)
    list_of_files.append(filename)

%%time
create_wav(list_of_files)
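The call to `create_wav` relies on a helper whose definition is not shown in this article. The sketch below is one hypothetical way such a helper could look, assuming an `ffmpeg` executable is reachable at the path given in `ffmpeg_bin`; the mono and 16 kHz settings are illustrative choices, not values taken from the original notebook.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, output_dir, output_format="wav",
                         ffmpeg_bin="ffmpeg"):
    """Build the argument list that extracts the audio track of one video."""
    video_path = Path(video_path)
    target = Path(output_dir) / f"{video_path.stem}.{output_format}"
    return [
        ffmpeg_bin,
        "-y",                 # overwrite an existing output file
        "-i", str(video_path),
        "-vn",                # drop the video stream
        "-ac", "1",           # mono (assumed; adjust as needed)
        "-ar", "16000",       # 16 kHz sample rate (assumed)
        str(target),
    ]


def create_wav(list_of_files, output_dir="wavs", output_format="wav",
               ffmpeg_bin="ffmpeg"):
    """Run ffmpeg once per file and return the paths that failed."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for video_path in list_of_files:
        cmd = build_ffmpeg_command(video_path, output_dir, output_format, ffmpeg_bin)
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            failed.append(video_path)
    return failed


# Inspect the command for one file without running ffmpeg.
example_cmd = build_ffmpeg_command("train_sample_videos/aaeflzzhvy.mp4", "wavs")
```

Passing the arguments as a list avoids shell quoting problems with unusual file names, and returning the failed paths makes it easy to spot videos that have no audio track.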

Conclusion

In conclusion, "Detecting the Deceptive: Unmasking Deep Fake Voices" sheds light on the ever-evolving realm of audio deep fake technology. As the digital era progresses, the ability to manipulate audio recordings with unprecedented realism has raised significant concerns, including misinformation, privacy breaches, and cybersecurity risks.

This article has delved into the landscape of audio deep fake detection and the challenges faced in this domain. From data collection and arrangement to the use of various machine learning models, feature extraction techniques, and robust training procedures, the methodologies behind unmasking deep fake voices are diverse and demanding.

Furthermore, the critical phase of model optimization and after-processing ensures the highest levels of performance while addressing biases and weaknesses. Achieving real-time detection in audio streams and files is the ultimate goal, requiring continuous monitoring and updates to thwart new deep fake methods.

Not only is the article a technical exploration, but it also emphasizes the ethical considerations surrounding the responsible use of audio content. It underscores the collective responsibility to safeguard the integrity of audio information and addresses the moral and legal dimensions, including security and privacy.

In a world increasingly shaped by artificial intelligence, understanding and countering the rise of deceptive deep fake voices is a paramount endeavor. With vigilance, innovation, and a commitment to ethical principles, we can strive to preserve the authenticity of audio in an era of technological marvels and deceptions.

Stay connected and support my work through various platforms:

Requests and questions: If you have a project in mind that you’d like me to work on or if you have any questions about the concepts I’ve explained, don’t hesitate to let me know. I’m always looking for new ideas for future Notebooks and I love helping to resolve any doubts you might have.

Remember, each “Like”, “Share”, and “Star” greatly contributes to my work and motivates me to continue producing more quality content. Thank you for your support!
