
Voice Cloning Application Using RVC

Ever been curious about voice cloning? Thanks to deep learning and RVC (Retrieval-based Voice Conversion), it is now readily available! In this project, we will walk through the detailed process of creating a Voice Cloning Application. Don't panic if you are not a computer wizard - every detail is explained in the simplest way possible. If you have an interest in AI, machine learning, and voice technology, this project is for you!

Project Overview

In this project, you will experience the process of making a voice cloning tool using RVC technology. The platform for this tutorial is Google Colab, so you don't need to worry about any troublesome installations - just follow the steps. You will learn how to use pre-trained models to produce realistic clones of someone's voice from input audio. What's more, you will be able to manipulate the voice, making this project suitable for voice transformation: it can turn a man's voice into a woman's voice (and vice versa) for various purposes.


Advances in RVC have made it possible to clone voices with great precision. Whether you are a developer, a voice tech hobbyist, or simply curious about AI voice synthesis, this project will give you hands-on experience with the voice cloning technology that everyone has been wondering about.


Prerequisites

Before embarking on this fun-filled Voice Cloning Application project, there are a few prerequisites you'll need.

  • Basic knowledge of Python programming is required to follow the coding tasks and scripts in the project.

  • It is necessary to know how to work with Google Colab to create an environment for the project and run the code.

  • A background in deep learning will be useful, particularly for understanding how models are trained and how existing models are reused.

  • Knowledge of the Librosa and PyDub libraries for tasks like audio processing and manipulation (a short warm-up example follows this list).

  • Good understanding of RVC and its significance in voice cloning.

  • You need basic knowledge of WAV/MP3 standards to create and handle voice databases correctly.

  • Knowledge of using pip for the installation of Python packages.
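
If you want a quick warm-up with those two libraries, here is a minimal, hypothetical sketch; the file name sample.mp3 is an assumption, so use any audio file you have on hand:

    # Warm-up: convert an MP3 to WAV with PyDub, then inspect it with Librosa
    import librosa
    from pydub import AudioSegment  # requires ffmpeg to be installed

    AudioSegment.from_file("sample.mp3").export("sample.wav", format="wav")
    audio, sr = librosa.load("sample.wav", sr=None)  # sr=None keeps the native rate
    print(f"Duration: {len(audio) / sr:.2f}s at {sr} Hz")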

    Approach

    In this project, we create the Voice Cloning Application using RVC (Retrieval-based Voice Conversion) and deep learning techniques. To begin with, we will set up our environment in Google Colab, which avoids the hassle of local installations. After that, we will gather and prepare audio sources and the pre-trained models that are essential for voice processing. In addition, we will use the Python libraries Librosa and PyDub to manipulate the audio files and extract the features we need.


    These features will be used for training the model. When the data is prepared, we will move on to the model training stage, starting from existing pre-trained weights to improve voice quality. Upon completing this phase, we will proceed to the most entertaining part of the work - inference!


    In this stage, we'll use the trained model to replicate voices from input audio samples, tweaking features like pitch for extra customization. Throughout the project, we'll keep things straightforward and approachable, so that even beginners can easily follow along.


    Workflow and Methodologies

    The workflow and methodology for building the "Voice Cloning Application using RVC" are as follows:


    Workflow

    • Configure the Google Colab environment to run the project without the need for installation on local computers.

    • Obtain the necessary pre-trained models and audio datasets to begin working on the task of voice cloning.

    • Use audio processing libraries such as Librosa and PyDub to work with audio files, extract essential components, and clean the dataset.

    • Select the most suitable RVC (Retrieval-based Voice Conversion) technique for the training and inference operations.

    • Fine-tune the model and evaluate it on test audio to check voice cloning accuracy.

    • Tune vocal characteristics such as pitch so the output voice suits the intended transformation.

    • Test the training outcomes by assessing the model's accuracy and the quality of the cloned voice.

    • Use TensorBoard throughout model training to track training performance in real time.


    Methodology

    • Mount Google Drive to save and access files within the Colab environment.

    • Clone the RVC repository from GitHub to have the right tools and software for the project.

    • Download pre-trained models from Hugging Face using aria2c for faster, more efficient downloads.

    • Upload or download audio files, making sure they are in the right format for training and processing.

    • Use Librosa to process audio, including format conversion and feature extraction.

    • Pass the processed data to the RVC model and train it from the existing pre-trained weights to improve accuracy.

    • Apply pitch and F0 extraction techniques so the voice can be transformed freely.

    • Test and verify the output through inference.


    Data Collection and Preparation

    Data Collection Workflow

    • Collect datasets: Collect audio files from various sources for voice cloning tasks.
    • Format datasets: Make sure all audio files are in a supported format, such as MP3 or WAV, for processing (a conversion sketch follows this list).
    • Evaluate datasets: Assess audio files for quality as well as for their suitability in voice cloning.
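
    As a concrete illustration of the formatting step above, the hedged sketch below converts every MP3 in a folder to WAV with PyDub. The folder names raw_audio and formatted_audio are assumptions for illustration, not part of the project code.

    # Hypothetical batch conversion: MP3 -> WAV (folder names are placeholders)
    import os
    from pydub import AudioSegment

    raw_dir, out_dir = "raw_audio", "formatted_audio"
    os.makedirs(out_dir, exist_ok=True)
    for name in os.listdir(raw_dir):
        if name.lower().endswith(".mp3"):
            seg = AudioSegment.from_mp3(os.path.join(raw_dir, name))
            wav_name = os.path.splitext(name)[0] + ".wav"
            seg.export(os.path.join(out_dir, wav_name), format="wav")
            print(f"Converted {name} -> {wav_name}")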

    Data Preparation Workflow

    • Preprocess audio data: Use Librosa and PyDub to extract key features, clean the dataset, and remove noise.
    • Normalize and resample audio: Standardize sampling rates, then normalize audio levels for consistent input to the model.
    • Split datasets: Divide the audio data into three portions - training, validation, and testing - so the model can be trained and assessed effectively. A hedged preprocessing sketch follows this list.
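
    Here is a minimal preprocessing sketch using librosa and soundfile. The 32 kHz target matches the training sample rate used later in this project, while the file names are placeholders.

    import librosa
    import soundfile as sf

    TARGET_SR = 32000  # matches the 32k training sample rate used later

    def normalize_and_resample(in_path, out_path, target_sr=TARGET_SR):
        audio, sr = librosa.load(in_path, sr=None)  # load at the native rate
        if sr != target_sr:
            audio = librosa.resample(audio, orig_sr=sr, target_sr=target_sr)
        peak = float(max(abs(audio.max()), abs(audio.min())))  # peak normalization
        if peak > 0:
            audio = audio / peak * 0.95  # leave a little headroom
        sf.write(out_path, audio, target_sr)

    # normalize_and_resample("raw.wav", "clean.wav")  # hypothetical file names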

    Code Explanation

    STEP 1:

    Mounting Drive

    This code shows how to connect your Google Drive account to a Colab workspace. It makes the files in your Google Drive accessible under a particular folder ('/content/drive').

    from google.colab import drive
    drive.mount('/content/drive')

    Initial Setup for WebUI Voice Conversion

    This code changes the current working directory in the Google Colab environment to /content. Then it imports the required packages - clear_output, Button, subprocess, shlex, and os - which are used to clear output cells, create UI buttons, and run shell commands and system operations. It also imports the drive module from google.colab for later use. Subsequently, a few string variables (var, test, c_word, r_word) are defined; these spell out "WebUI", "Voice", "Conversion", and "Retrieval" for use in later steps.

    %cd /content
    from IPython.display import clear_output
    from ipywidgets import Button
    import subprocess, shlex, os
    from google.colab import drive
    var = "We"+"bU"+"I"
    test = "Voice"
    c_word = "Conversion"
    r_word = "Retrieval"

    Cloning Repository and Installing Dependencies

    The code first clones a GitHub repository into the /content/RVC directory. It then pins pip to version 24.0 (the Python package manager). Finally, it uses the apt package manager to install the aria2 package, a command-line downloader. This prepares the environment for working with the repository.

    !git clone https://github.com/splendormagic/RVC_BahaaMahmoud /content/RVC
    !pip install pip==24.0
    !apt -y install -qq aria2

    Downloading Pretrained Models for Voice Conversion

    The code checks /content/RVC/assets/pretrained_v2 for the specified pretrained files. If a file is not present, the aria2c download manager fetches it from the corresponding Hugging Face repository. The filenames to download are listed in pretrains and new_pretrains. The subprocess module runs the download commands, and exception blocks handle any errors.

    pretrains = ["f0D32k.pth","f0G32k.pth"]
    new_pretrains = ["f0Ov2Super32kD.pth","f0Ov2Super32kG.pth"]
    for file in pretrains:
        if not os.path.exists(f"/content/RVC/assets/pretrained_v2/{file}"):
            command = "aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/%s%s%s/resolve/main/pretrained_v2/%s -d /content/RVC/assets/pretrained_v2 -o %s" % ("Voice","Conversion","WebUI",file,file)
            try:
                subprocess.run(shlex.split(command))
            except Exception as e:
                print(e)
    for file in new_pretrains:
        if not os.path.exists(f"/content/RVC/assets/pretrained_v2/{file}"):
            command = "aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/poiqazwsx/Ov2Super32kfix/resolve/main/%s -d /content/RVC/assets/pretrained_v2 -o %s" % (file,file)
            try:
                subprocess.run(shlex.split(command))
                print(shlex.split(command))
            except Exception as e:
                print(e)

    STEP 2:

    Setting Up Directories and Downloading Necessary Files

    The code creates directories for dataset and audio files, then retrieves Python scripts and sound files from several sources. wget only downloads files that have not been downloaded before (the -nc flag). After setup, the download_files.py script is executed to fetch any additional files.

    !mkdir -p /content/dataset && mkdir -p /content/RVC/audios
    !wget -nc https://raw.githubusercontent.com/RejektsAI/EasyTools/main/original -O /content/RVC/original.py
    !wget -nc https://raw.githubusercontent.com/RejektsAI/EasyTools/main/app.py -O /content/RVC/demo.py
    !wget -nc https://raw.githubusercontent.com/RejektsAI/EasyTools/main/easyfuncs.py -O /content/RVC/easyfuncs.py
    !wget -nc https://huggingface.co/Rejekts/project/resolve/main/download_files.py -O /content/RVC/download_files.py
    !wget -nc https://huggingface.co/Rejekts/project/resolve/main/a.png -O /content/RVC/a.png
    !wget -nc https://huggingface.co/Rejekts/project/resolve/main/easy_sync.py -O /content/RVC/easy_sync.py
    !wget -nc https://huggingface.co/spaces/Rejekts/RVC_PlayGround/raw/main/app.py -O /content/RVC/playground.py
    !wget -nc https://huggingface.co/spaces/Rejekts/RVC_PlayGround/raw/main/tools/useftools.py -O /content/RVC/tools/useftools.py
    !wget -nc https://huggingface.co/Rejekts/project/resolve/main/astronauts.mp3 -O /content/RVC/audios/astronauts.mp3
    !wget -nc https://huggingface.co/Rejekts/project/resolve/main/somegirl.mp3 -O /content/RVC/audios/somegirl.mp3
    !wget -nc https://huggingface.co/Rejekts/project/resolve/main/someguy.mp3 -O /content/RVC/audios/someguy.mp3
    !wget -nc https://huggingface.co/Rejekts/project/resolve/main/unchico.mp3 -O /content/RVC/audios/unchico.mp3
    !wget -nc https://huggingface.co/Rejekts/project/resolve/main/unachica.mp3 -O /content/RVC/audios/unachica.mp3
    !cd /content/RVC && python /content/RVC/download_files.py

    Installing Project Dependencies

    The code checks whether the installed variable exists. If it does not, it installs the Python packages listed in requirements.txt by running pip install in the /content/RVC directory. Other dependencies are installed as well: mega.py, gdown, pytube, pydub, and a specific version of gradio. After installation, it sets installed=True so the installations are skipped on future runs. This makes sure all the packages essential for the project are present.

    if not "installed" in locals():
        !cd /content/RVC && pip install -r requirements.txt
        !pip install mega.py gdown==5.1.0 pytube pydub  gradio==3.42.0
    installed=True

    Optional Google Drive Integration for Saving Files

    This snippet enables optional Google Drive saving through the save_to_drive flag. When it is set to True, it authenticates the user with Google Drive and mounts the drive in Colab. After that, the GarbageMan class from the easy_sync module is imported; it periodically deletes unwanted files from the Google Drive trash. Any errors arising from this process are caught and printed. This feature is useful for automating file management.

    #save_to_drive=True#@param {type:"boolean"}
    save_to_drive=False
    if save_to_drive:
        try:
            from google.colab import auth
            from pydrive2.auth import GoogleAuth
            from oauth2client.client import GoogleCredentials
            from pydrive2.drive import GoogleDrive
            auth.authenticate_user()
            gauth = GoogleAuth()
            gauth.credentials = GoogleCredentials.get_application_default()
            my_drive = GoogleDrive(gauth)
            drive.mount('/content/drive')
            drive_trash = my_drive.ListFile({'q': "trashed = true"}).GetList()
            from RVC.easy_sync import GarbageMan
            kevin = GarbageMan()
            kevin.start(path=drive_trash,every=40,pattern="[GD]_*.pth")
        except Exception as e:
            print(e)

    Automating Backup of Logs and Weights to Google Drive

    This code automatically backs up logs and model weights from /content/RVC to Google Drive. It starts by importing the easy_sync Channel class, which manages file synchronization. Two channels, logs_backup and weights_backup, are set up to sync logs and weights every 30 minutes while excluding any "mute" files. If necessary, it creates the target folders in Google Drive, then starts the syncing process. Finally, a success button shows that the setup completed.

    from RVC.easy_sync import Channel
    logs_folder ='/content/drive/MyDrive/project-m/logs'
    weights_folder = '/content/drive/MyDrive/project-m/assets/weights'
    if not "logs_backup" in locals(): logs_backup = Channel('/content/RVC/logs',logs_folder,every=30,exclude="mute")
    if not "weights_backup" in locals(): weights_backup = Channel('/content/RVC/assets/weights',weights_folder,every=30)
    if os.path.exists('/content/drive/MyDrive'):
        if not os.path.exists(logs_folder): os.makedirs(logs_folder)
        if not os.path.exists(weights_folder): os.makedirs(weights_folder)
        logs_backup.start()
        weights_backup.start()
    clear_output()
    Button(description="\u2714 Success", button_style="success")

    STEP 3:

    Importing Modules for File Operations and Audio Display

    The os library is used for interacting with the file system, while IPython.display.Audio lets the user play an audio file inside the notebook. The display function presents the audio player to the user. This setup makes it easy to handle and listen to sound files within the environment.

    import os
    from IPython.display import Audio
    from IPython.core.display import display

    Handling Dataset Directory and Removing Previous Audio Files

    The following code prepares for file uploads. It ensures the /content/dataset directory exists, creating it if missing. The code then checks whether a previously recorded audio file named vocal_audio.wav exists in the dataset folder; if found, the file is deleted to make room for a new input audio file. This facilitates uploading new audio content.

    upload_method = 'Upload'
    if not os.path.exists('/content/dataset'):
        os.makedirs('/content/dataset')
    # Remove previous input audio
    if os.path.isfile('/content/dataset/vocal_audio.wav'):
        os.remove('/content/dataset/vocal_audio.wav')
    def displayAudio():
      display(Audio('/content/dataset/vocal_audio.wav'))

    STEP 4:

    Uploading and Processing Audio Files

    This code lets users upload audio files when the upload method is 'Upload'. It uses the files.upload() function provided by Google Colab to perform the upload. For each uploaded file, it prints the name and the size in bytes.


    The first uploaded file is then read with librosa, which loads the audio data along with its sampling rate. Next, the soundfile library saves the audio as vocal_audio.wav in the /content/dataset directory, ensuring the audio is stored correctly for further processing. Finally, a 'DONE' message is displayed.

    if upload_method == 'Upload':
      from google.colab import files
      uploaded = files.upload()
      for fn in uploaded.keys():
        print('User uploaded file "{name}" with length {length} bytes.'.format(
            name=fn, length=len(uploaded[fn])))
      # Consider only the first file
      PATH_TO_YOUR_AUDIO = str(list(uploaded.keys())[0])
      # Load audio with specified sampling rate
      import librosa
      audio, sr = librosa.load(PATH_TO_YOUR_AUDIO, sr=None)
      # Save audio with specified sampling rate
      import soundfile as sf
      sf.write('/content/dataset/vocal_audio.wav', audio, sr, format='wav')
    print("DONE.")

    Audio Processing, Preprocessing, and Index Training for Voice Conversion Model

    This piece of code performs multiple tasks in preparing and training the voice conversion model. First, if a YouTube link is provided, the source audio is downloaded from the link, converted to WAV format, and stored in the dataset folder. It then measures the total duration of the dataset audio to decide whether to enable caching.


    In the next step, the preprocess.py script is run, and its logs are saved in a folder named after the model. The code then extracts F0 features using the chosen method, e.g. rmvpe_gpu.


    This ensures the audio is in a usable format for voice conversion. In the last step, the FAISS library is used to build and train an index from the features captured earlier. The index is stored for the subsequent voice conversion steps.

    import os
    from pytube import YouTube
    from pydub import AudioSegment  # needed by calculate_audio_duration below
    from IPython.display import clear_output
    def calculate_audio_duration(file_path):
        duration_seconds = len(AudioSegment.from_file(file_path)) / 1000.0
        return duration_seconds
    def youtube_to_wav(url,dataset_folder="/content/dataset"):
        try:
            yt = YouTube(url).streams.get_audio_only().download(output_path=dataset_folder)
            mp4_path = os.path.join(dataset_folder,'audio.mp4')
            wav_path = os.path.join(dataset_folder,'audio.wav')
            os.rename(yt,mp4_path)
            !ffmpeg -i {mp4_path} -acodec pcm_s16le -ar 44100 {wav_path}
            os.remove(mp4_path)
        except Exception as e:
            print(e)
    %cd /content/RVC
    #@markdown  Model name must be in English. Don't use "Spaces" or "Symbols".
    #@markdown **You have to use the same model name in the following steps Or you will have errors.**
    model_name = 'Aionline_voice' #@param {type:"string"}
    #dataset_folder = '/content/dataset' #@param {type:"string"}
    dataset_folder = '/content/dataset'
    #or_paste_a_youtube_link=""#@param {type:"string"}
    or_paste_a_youtube_link=""
    if or_paste_a_youtube_link !="":
        youtube_to_wav(or_paste_a_youtube_link)
    # Enable GPU caching only when the total dataset audio is under 10 minutes
    try:
        files = os.listdir(dataset_folder)
        duration = sum(calculate_audio_duration(os.path.join(dataset_folder, f)) for f in files)
        cache = 0 < duration < 600
    except Exception:
        cache = False
    # Pause until at least one audio file is present in the dataset folder
    while len(os.listdir(dataset_folder)) < 1:
        input("Your dataset folder is empty. Add audio files, then press Enter.")
    !mkdir -p ./logs/{model_name}
    with open(f'./logs/{model_name}/preprocess.log','w') as f:
        print("Starting...")
    !python infer/modules/train/preprocess.py {dataset_folder} 32000 2 ./logs/{model_name} False 3.0 > /dev/null 2>&1
    with open(f'./logs/{model_name}/preprocess.log','r') as f:
        if 'end preprocess' in f.read():
            clear_output()
            display(Button(description="\u2714 Success", button_style="success"))
        else:
            print("Error preprocessing data... Make sure your dataset folder is correct.")
    f0method = "rmvpe_gpu" # @param ["pm", "harvest", "rmvpe", "rmvpe_gpu"]
    %cd /content/RVC
    with open(f'./logs/{model_name}/extract_f0_feature.log','w') as f:
        print("Starting...")
    if f0method != "rmvpe_gpu":
        !python infer/modules/train/extract/extract_f0_print.py ./logs/{model_name} 2 {f0method}
    else:
        !python infer/modules/train/extract/extract_f0_rmvpe.py 1 0 0 ./logs/{model_name} True
    #!python infer/modules/train/extract_feature_print.py cuda:0 1 0 0 ./logs/{model_name} v2
    !python infer/modules/train/extract_feature_print.py cuda:0 1 0 ./logs/{model_name} v2 True
    with open(f'./logs/{model_name}/extract_f0_feature.log','r') as f:
        if 'all-feature-done' in f.read():
            clear_output()
        else:
            print("Error preprocessing data... Make sure your data was preprocessed.")
    import traceback
    import numpy as np
    import faiss
    from sklearn.cluster import MiniBatchKMeans  # used only for very large feature sets
    %cd /content/RVC
    def train_index(exp_dir1, version19):
        exp_dir = "logs/%s" % (exp_dir1)
        os.makedirs(exp_dir, exist_ok=True)
        feature_dir = (
            "%s/3_feature256" % (exp_dir)
            if version19 == "v1"
            else "%s/3_feature768" % (exp_dir)
        )
        if not os.path.exists(feature_dir):
            return " "
        listdir_res = list(os.listdir(feature_dir))
        if len(listdir_res) == 0:
            return " "
        infos = []
        npys = []
        for name in sorted(listdir_res):
            phone = np.load("%s/%s" % (feature_dir, name))
            npys.append(phone)
        big_npy = np.concatenate(npys, 0)
        big_npy_idx = np.arange(big_npy.shape[0])
        np.random.shuffle(big_npy_idx)
        big_npy = big_npy[big_npy_idx]
        if big_npy.shape[0] > 2e5:
            infos.append("Trying kmeans to reduce %s features to 10k centers." % big_npy.shape[0])
            yield "\n".join(infos)
            try:
                # Reduce very large feature sets to 10k k-means centers
                big_npy = (
                    MiniBatchKMeans(
                        n_clusters=10000,
                        verbose=True,
                        batch_size=256 * (os.cpu_count() or 1),  # was config.n_cpu in the RVC source
                        compute_labels=False,
                        init="random",
                    )
                    .fit(big_npy)
                    .cluster_centers_
                )
            except Exception:
                info = traceback.format_exc()
                print(info)  # the RVC source logs this via its own logger
                infos.append(info)
                yield "\n".join(infos)
        np.save("%s/total_fea.npy" % exp_dir, big_npy)
        n_ivf = min(int(16 * np.sqrt(big_npy.shape[0])), big_npy.shape[0] // 39)
        infos.append("%s,%s" % (big_npy.shape, n_ivf))
        yield "\n".join(infos)
        index = faiss.index_factory(256 if version19 == "v1" else 768, "IVF%s,Flat" % n_ivf)
        infos.append("training")
        yield "\n".join(infos)
        index_ivf = faiss.extract_index_ivf(index)  #
        index_ivf.nprobe = 1
        index.train(big_npy)
        faiss.write_index(
            index,
            "%s/trained_IVF%s_Flat_nprobe_%s_%s_%s.index"
            % (exp_dir, n_ivf, index_ivf.nprobe, exp_dir1, version19),
        )
        infos.append("adding")
        yield "\n".join(infos)
        batch_size_add = 8192
        for i in range(0, big_npy.shape[0], batch_size_add):
            index.add(big_npy[i : i + batch_size_add])
        faiss.write_index(
            index,
            "%s/added_IVF%s_Flat_nprobe_%s_%s_%s.index"
            % (exp_dir, n_ivf, index_ivf.nprobe, exp_dir1, version19),
        )
        infos.append(
            "added_IVF%s_Flat_nprobe_%s_%s_%s.index"
            % (n_ivf, index_ivf.nprobe, exp_dir1, version19)
        )
    training_log = train_index(model_name, 'v2')
    for line in training_log:
        print(line)
        if 'adding' in line:
            clear_output()
            display(Button(description="\u2714 Success", button_style="success"))

    Changing Directory and Importing Required Modules

    The first thing this code does is change the working directory to /content/RVC. It then imports the libraries needed for file I/O, JSON handling, and executing shell commands, and saves the current directory path in the variable now_dir.

    %cd /content/RVC
    from random import shuffle
    import json
    import os
    import pathlib
    from subprocess import Popen, PIPE, STDOUT
    now_dir = os.getcwd()

    This code initializes the parameters for training the voice conversion model: the model name, the number of training epochs, and the save frequency. Additionally, it selects which pre-trained generator (G_file) and discriminator (D_file) weights to use depending on the OV2 option, choosing the matching files from assets/pretrained_v2 for the 32k sample rate.

    model_name = 'Aionline_voice' #@param {type:"string"}
    epochs = 50 # @param {type:"slider", min:50, max:2000, step:10}
    save_frequency = 50 # @param {type:"slider", min:10, max:100, step:10}
    OV2 = True
    batch_size = 8
    sample_rate = '32k'
    if OV2:
        G_file = f'assets/pretrained_v2/f0Ov2Super{sample_rate}G.pth'
        D_file = f'assets/pretrained_v2/f0Ov2Super{sample_rate}D.pth'
    else:
        G_file = f'assets/pretrained_v2/f0G{sample_rate}.pth'
        D_file = f'assets/pretrained_v2/f0D{sample_rate}.pth'

    Filelist Generation and Model Training for Voice Conversion

    The click_train function generates a filelist for training and starts training the voice conversion model. It builds the file paths for ground-truth waveforms, features, and F0-related data (if F0 is enabled), then writes all of them to filelist.txt. Based on the version and model configuration, it sets the training parameters, including the choice of pretrained models, GPUs, batch size, and number of epochs.


    The function then forms the command that starts the training script, prints the output of the process as it runs, and waits for the training to finish, which signals the end of the training process.

    def click_train(
        exp_dir1,
        sr2,
        if_f0_3,
        spk_id5,
        save_epoch10,
        total_epoch11,
        batch_size12,
        if_save_latest13,
        pretrained_G14,
        pretrained_D15,
        gpus16,
        if_cache_gpu17,
        if_save_every_weights18,
        version19,
    ):
        # Filelist Generation
        exp_dir = "%s/logs/%s" % (now_dir, exp_dir1)
        os.makedirs(exp_dir, exist_ok=True)
        gt_wavs_dir = "%s/0_gt_wavs" % (exp_dir)
        feature_dir = (
            "%s/3_feature256" % (exp_dir)
            if version19 == "v1"
            else "%s/3_feature768" % (exp_dir)
        )
        if if_f0_3:
            f0_dir = "%s/2a_f0" % (exp_dir)
            f0nsf_dir = "%s/2b-f0nsf" % (exp_dir)
            names = (
                set([name.split(".")[0] for name in os.listdir(gt_wavs_dir)])
                & set([name.split(".")[0] for name in os.listdir(feature_dir)])
                & set([name.split(".")[0] for name in os.listdir(f0_dir)])
                & set([name.split(".")[0] for name in os.listdir(f0nsf_dir)])
            )
        else:
            names = set([name.split(".")[0] for name in os.listdir(gt_wavs_dir)]) & set(
                [name.split(".")[0] for name in os.listdir(feature_dir)]
            )
        opt = []
        for name in names:
            if if_f0_3:
                opt.append(
                    "%s/%s.wav|%s/%s.npy|%s/%s.wav.npy|%s/%s.wav.npy|%s"
                    % (
                        gt_wavs_dir.replace("\\", "\\\\"),
                        name,
                        feature_dir.replace("\\", "\\\\"),
                        name,
                        f0_dir.replace("\\", "\\\\"),
                        name,
                        f0nsf_dir.replace("\\", "\\\\"),
                        name,
                        spk_id5,
                    )
                )
            else:
                opt.append(
                    "%s/%s.wav|%s/%s.npy|%s"
                    % (
                        gt_wavs_dir.replace("\\", "\\\\"),
                        name,
                        feature_dir.replace("\\", "\\\\"),
                        name,
                        spk_id5,
                    )
                )
        fea_dim = 256 if version19 == "v1" else 768
        if if_f0_3:
            for _ in range(2):
                opt.append(
                    "%s/logs/mute/0_gt_wavs/mute%s.wav|%s/logs/mute/3_feature%s/mute.npy|%s/logs/mute/2a_f0/mute.wav.npy|%s/logs/mute/2b-f0nsf/mute.wav.npy|%s"
                    % (now_dir, sr2, now_dir, fea_dim, now_dir, now_dir, spk_id5)
                )
        else:
            for _ in range(2):
                opt.append(
                    "%s/logs/mute/0_gt_wavs/mute%s.wav|%s/logs/mute/3_feature%s/mute.npy|%s"
                    % (now_dir, sr2, now_dir, fea_dim, spk_id5)
                )
        shuffle(opt)
        with open("%s/filelist.txt" % exp_dir, "w") as f:
            f.write("\n".join(opt))
        # Training Process
        print("Write filelist done")
        print("Use gpus:", str(gpus16))
        if pretrained_G14 == "":
            print("No pretrained Generator")
        if pretrained_D15 == "":
            print("No pretrained Discriminator")
        if version19 == "v1" or sr2 == "40k":
            config_path = "configs/v1/%s.json" % sr2
        else:
            config_path = "configs/v2/%s.json" % sr2
        config_save_path = os.path.join(exp_dir, "config.json")
        if not pathlib.Path(config_save_path).exists():
            with open(config_save_path, "w", encoding="utf-8") as f:
                with open(config_path, "r") as config_file:
                    config_data = json.load(config_file)
                    json.dump(
                        config_data,
                        f,
                        ensure_ascii=False,
                        indent=4,
                        sort_keys=True,
                    )
                f.write("\n")
        cmd = (
            'python infer/modules/train/train.py -e "%s" -sr %s -f0 %s -bs %s -g %s -te %s -se %s %s %s -l %s -c %s -sw %s -v %s'
            % (
                exp_dir1,
                sr2,
                1 if if_f0_3 else 0,
                batch_size12,
                gpus16,
                total_epoch11,
                save_epoch10,
                "-pg %s" % pretrained_G14 if pretrained_G14 != "" else "",
                "-pd %s" % pretrained_D15 if pretrained_D15 != "" else "",
                1 if if_save_latest13 == True else 0,
                1 if if_cache_gpu17 == True else 0,
                1 if if_save_every_weights18 == True else 0,
                version19,
            )
        )
        p = Popen(cmd, shell=True, cwd=now_dir, stdout=PIPE, stderr=STDOUT, bufsize=1, universal_newlines=True)
        for line in p.stdout:
            print(line.strip())
        p.wait()
        return "train.log"

    STEP 5:

    Launching TensorBoard and Initiating Model Training

    The following code first loads TensorBoard and points it at the logs in the ./logs directory on port 8888. Then it checks whether a cache variable exists and, if not, sets it to False. The click_train function is called with several arguments: the model name, sample rate, F0 processing flag, save frequency, number of epochs, batch size, pre-trained model files, GPU configuration, and caching options. This starts training the voice conversion model, and the log file name is printed once training is over.

    %load_ext tensorboard
    %tensorboard --logdir ./logs --port=8888
    if "cache" not in locals():
        cache = False
    training_log = click_train(
        model_name,
        sample_rate,
        True,
        0,
        save_frequency,
        epochs,
        batch_size,
        True,
        G_file,
        D_file,
        0,
        cache,
        True,
        'v2',
    )
    print(training_log)

    Backing Up or Loading a Trained Model to/from Google Drive

    This code allows you to either save the trained model to, or load a saved one from, Google Drive. If the folder /content/RVC is missing, the script tells the user to run the initialization cell first. Otherwise, when saving (Type == "Save"), the code packs the model logs and weights into a .tar.gz file and uploads it to Google Drive under the folder RVC_Packages. When loading, it extracts the backup from Google Drive into /content/RVC.

    #@markdown  Enter the same exact name of the model you trained.
    import os
    if not os.path.exists('/content/RVC'):
      print("You need to run the first cell before loading your model! Run the GUI, stop it and then load the model.")
    else:
      Model_Name = "Aionline_voice"#@param {type:"string"}
      folder = Model_Name
      Type = "Save" #@param ["Save"]
      import tarfile, os
      from google.colab import drive
      drive.mount('/content/drive')
      !mkdir -p /content/drive/MyDrive/RVC_Packages
      if Type=='Save':
        with tarfile.open(f'/content/drive/MyDrive/RVC_Packages/{folder}.tar.gz','w:gz') as tar:
          tar.add(f'/content/RVC/logs/{folder}', arcname=f'logs/{folder}')
          if os.path.exists(f'/content/RVC/assets/weights/{folder}.pth'):
            tar.add(f'/content/RVC/assets/weights/{folder}.pth', arcname=f'assets/weights/{folder}.pth')
          print(f'Backed up {folder} to RVC_Packages in your google drive.')
      else:
        if not os.path.exists(f'/content/drive/MyDrive/RVC_Packages/{folder}.tar.gz'):
          print("File not found.")
        else:
          with tarfile.open(f'/content/drive/MyDrive/RVC_Packages/{folder}.tar.gz','r:gz') as tar:
            tar.extractall('/content/RVC')

    Upgrading gdown and Importing tarfile

    The first command upgrades the gdown package, which is used for downloading files from Google Drive, fetching the new version without using cached files. The second line imports the tarfile module, which handles compression and decompression of .tar archives in the script. This ensures the environment is set up correctly for handling downloads and archives.

    !pip install --upgrade --no-cache-dir gdown
    import tarfile

    Downloading and Extracting a Pretrained Model from Google Drive

    The following code retrieves a pretrained model from a specified Google Drive link, then extracts the downloaded archive into the /content/RVC directory. First, the code builds the model's filename (Model_Name + ".tar.gz"), then uses gdown to fetch the .tar.gz file from Google Drive into the /content/sample_data/ folder. If the download succeeds, the .tar.gz file is extracted into the /content/RVC directory.


    Then, if the extracted files contain a weights folder, it moves those weights to the /content/RVC/assets folder and deletes the weights folder. This ensures the model files end up in the right places for use.


    Note: A form interface will appear for the following code cell; enter the exact model name and the model link before running it.

    #@markdown  Enter the same exact name of the model you want to load.
    Model_Name = "Aionline_voice"#@param {type:"string"}
    #@markdown Paste the link of (.tar.gz) file from Google Drive and Remember to make it public link.
    model_pth_name = Model_Name + ".tar.gz"
    MODEL_LINK = "/content/drive/MyDrive/RVC_Packages/Aionline_voice.tar.gz" #@param {type:"string"}
    if MODEL_LINK != "":
      pth = '/content/sample_data/'
      dwnld = pth + model_pth_name
      print('Download model...')
      !gdown --fuzzy -O $dwnld "$MODEL_LINK"
      #clear_output()
      print('Done!')
    else:
      print('Paste model link and try again!')
    if not os.path.exists(f'/content/sample_data/{Model_Name}.tar.gz'):
      print("File not found.")
    else:
      with tarfile.open(f'/content/sample_data/{Model_Name}.tar.gz','r:gz') as tar:
            tar.extractall('/content/RVC')
    if os.path.exists(f'/content/RVC/weights'):
      !cp -R /content/RVC/weights /content/RVC/assets
      !rm -R /content/RVC/weights

    Downloading the Pre-trained Model's .pth File

    This code makes it easy to download the .pth file of a pretrained model from Google Colab to your local machine. From the provided Model_Name, it builds the file path and checks that the file exists in the /content/RVC/assets/weights/ directory. Then it uses Colab's files.download() to start the download. If the file is not found, it prints an error message, so make sure the model name is correct.

    #@markdown  Use this code block to download the .pth file of model to use it anywhere.
    # Step 1: Set the Model_Name variable
    Model_Name = "Aionline_voice"#@param {type:"string"}
    # Step 2: Construct the file path
    file_path = f'/content/RVC/assets/weights/{Model_Name}.pth'
    # Step 3: Download the file
    from google.colab import files
    # Check if the file exists before attempting to download
    import os
    if os.path.exists(file_path):
        files.download(file_path)
    else:
        print(f"File not found. Make sure that the model name is correct")

    STEP 6:

    Uploading, Processing, and Playing an Audio File

    This piece of code uploads, processes, and plays audio files in the Google Colab environment. Initially, the code looks for a previously saved audio file called input_audio.wav in the directory /content/sample_data/ and removes it to prevent processing the same audio file twice.


    When the upload method is set to 'Upload', the user is prompted to upload a new audio file, and the file's name and size are displayed. Librosa then reads the first uploaded file, fetching the audio data and sampling rate, and the file is saved as input_audio.wav in that directory. Once the file is saved, the previous output is cleared and IPython.display.Audio() plays the uploaded audio, so users can listen to it right away in the Colab interface.

    import os
    from IPython.display import Audio
    from IPython.core.display import display
    upload_method = 'Upload'
    #remove previous input audio
    if os.path.isfile('/content/sample_data/input_audio.wav'):
        os.remove('/content/sample_data/input_audio.wav')
    def displayAudio():
      display(Audio('/content/sample_data/input_audio.wav'))
    if upload_method == 'Upload':
      from google.colab import files
      uploaded = files.upload()
      for fn in uploaded.keys():
        print('User uploaded file "{name}" with length {length} bytes.'.format(
            name=fn, length=len(uploaded[fn])))
    # Consider only the first file
    PATH_TO_YOUR_AUDIO = str(list(uploaded.keys())[0])
    # Load audio with specified sampling rate
    import librosa
    audio, sr = librosa.load(PATH_TO_YOUR_AUDIO, sr=None)
    # Save audio with specified sampling rate
    import soundfile as sf
    sf.write('/content/sample_data/input_audio.wav', audio, sr, format='wav')
    from IPython.display import clear_output
    clear_output(wait=True)
    displayAudio()

    Using a Pretrained Voice Conversion Model for Audio Processing

    This cell uses a pretrained model to convert a voice in Google Colab. A model name (e.g., 'Aionline_voice') is entered, and the model's .pth and .index files are located. The index file name is determined dynamically from the model's logs folder via a temporary folder, which is deleted after use to clean up. The script checks for the necessary input audio and model files, then sets the pitch adjustment, F0 extraction method (rmvpe, pm, or harvest), and output file path. The index rate, volume normalization, and consonant protection settings affect audio quality.


    Once the command-line tool (infer_cli.py) finishes the conversion, the output audio is saved to the chosen location, and IPython.display.Audio() plays the processed audio immediately in Colab if there are no errors.

    import os
    import IPython.display as ipd
    %cd /content/RVC
    #@markdown  Enter the name of the model you want to use.
    model_name = 'Aionline_voice'#@param {type:"string"}
    model_filename = model_name + '.pth'
    #index_filename = 'added_IVF156_Flat_nprobe_1_' + model_name + '_v2.index'
    ###########
    # Setting .index File Name
    #Create Folder temp .index file
    %cd /content
    index_temp = 'Index_Temp'
    if not os.path.exists(index_temp):
      os.mkdir(index_temp)
      print("Index_Temp Folder Created.")
    else:
      print("Index_Temp Folder Found.")
    #Copying .index file to Index_Temp folder
    from os import listdir
    import shutil
    index_file_path = os.path.join('/content/RVC/logs/', model_name,'')
    # Copy the trained 'added_*.index' file into the temporary folder
    for file_name in listdir(index_file_path):
        if file_name.startswith('added') and file_name.endswith('.index'):
            shutil.copy(index_file_path + file_name, os.path.join('/content/', index_temp, file_name))
            print('Index file copied successfully.')
    #Getting the name of .index file
    %cd /content/Index_Temp
    import os
    # Get the current working directory
    indexfile_directory = os.getcwd()
    # List all files in the current directory
    files = os.listdir(indexfile_directory)
    # Get the first filename from the list
    first_filename = files[0]
    print(first_filename)
    # Save the filename as a variable
    index_filename = first_filename
    #Deleting Index_Temp folder
    shutil.rmtree('/content/Index_Temp')
    %cd /content/RVC
    #############
    model_path = "/content/RVC/assets/weights/" + model_filename
    index_path = "/content/RVC/logs/" + model_name + "/" + index_filename
    #model_path = "/content/RVC/assets/weights/My-Voice.pth"#@param {type:"string"}
    #index_path = "/content/RVC/logs/My-Voice/added_IVF439_Flat_nprobe_1_My-Voice_v2.index"#@param {type:"string"}
    from colorama import Fore
    print(Fore.GREEN + f"{index_path} was found") if os.path.exists(index_path) else print(Fore.RED + f"{index_path} was not found")
    #@markdown ---
    #@markdown **Set the "pitch" value if you are clonning...**
    #@markdown **- Male to Male OR Female to Female : 0**
    #@markdown **- Female to Male : -12**
    #@markdown **- Male to Female : 12**
    pitch = -12 # @param {type:"slider", min:-12, max:12, step:1}
    #input_path = "/content/sample_data/input_audio.wav"#@param {type:"string"}
    input_path = "/content/sample_data/input_audio.wav"
    if not os.path.exists(input_path):
        raise ValueError(f"{input_path} was not found in your RVC folder.")
    os.environ['index_root']  = os.path.dirname(index_path)
    index_path = os.path.basename(index_path)
    #@markdown ---
    f0_method = "rmvpe" # @param ["rmvpe", "pm", "harvest"]
    save_as = "/content/RVC/audios/output_audio.wav"#@param {type:"string"}
    model_name = os.path.basename(model_path)
    os.environ['weight_root'] = os.path.dirname(model_path)
    index_rate = 0.5 # @param {type:"slider", min:0, max:1, step:0.01}
    volume_normalization = 0 #param {type:"slider", min:0, max:1, step:0.01}
    consonant_protection = 0.5 #param {type:"slider", min:0, max:1, step:0.01}
    !rm -f $save_as
    !python tools/cmd/infer_cli.py --f0up_key $pitch --input_path $input_path --index_path $index_path --f0method $f0_method --opt_path $save_as --model_name $model_name --index_rate $index_rate --device "cuda:0" --is_half True --filter_radius 3 --resample_sr 0 --rms_mix_rate $volume_normalization --protect $consonant_protection
    #show_errors = True #@param {type:"boolean"}
    show_errors = True
    if not show_errors:
        ipd.clear_output()
    ipd.Audio(save_as)

    Project Conclusion

    Together, we've successfully built a voice cloning application using RVC. To simplify everything, we set up the project on Google Colab, which packaged everything so that no local installation was needed. We had the pleasure of playing with a pre-trained model, did quite a bit of audio engineering work, and tuned the model in ways that let us replicate voices and change their pitch.


    By the end of it, we didn't just prove a theory. We built a functional system that can take an audio sample and turn it into a realistic clone of that person's voice. Whether you are a developer, a creator, or just someone with an interest in voice technology, this project demonstrates the effective use of AI in creating sound. We were truly blown away by the possibilities offered by AI speech synthesis, and we hope this project has the same effect on your creativity.


    So why stop here? Try more, enjoy more, and who knows what other amazing voices you may manage to imitate!


    Challenges and Troubleshoot

    Challenge: Long Training Times Due to Limited Resources

    Solution: Prefer Google Colab, as it gives free access to a GPU, which makes training much faster. You can use Google Colab Pro for enhanced performance, but Colab's free tier is quite sufficient for most people.

    Challenge: Error occurred while installing dependencies

    Solution: If a package does not install, try reinstalling with !pip install --upgrade [library-name] or !pip install [library-name]==[version]. In cases of critical errors, restarting the Colab session can also fix the problem.

    Challenge: Dataset Format Issues

    Solution: Use Librosa and PyDub for preprocessing the audio. With these libraries, you can resample audio, change the format, and remove noise from the dataset. It's important that all audio files are in the same format: WAV with the correct sample rate.

    Challenge: Long Inference Times or an Unresponsive Session

    Solution: During inference, use shorter audio files for testing. If files are large, divide them into smaller parts and process each part separately, as in the sketch below. Also, make sure you have selected a GPU runtime in Colab, because it makes the process much quicker.
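
    For example, a long input can be chunked with PyDub before inference. This is a hedged sketch with a hypothetical file name, not part of the notebook itself.

    from pydub import AudioSegment

    CHUNK_MS = 30 * 1000  # 30-second chunks keep inference responsive

    song = AudioSegment.from_file("long_input.wav")  # hypothetical input file
    for i, start in enumerate(range(0, len(song), CHUNK_MS)):
        chunk = song[start:start + CHUNK_MS]  # pydub slices are in milliseconds
        chunk.export(f"chunk_{i:03d}.wav", format="wav")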


    FAQ

    1. What is voice cloning, and how does this project achieve it?

      • Answer: Voice cloning means creating machine-synthesized speech that sounds like a particular person's natural voice. In this project, we apply RVC (Retrieval-based Voice Conversion) techniques along with deep learning, taking an audio sample and replicating the voice by training a model on the provided dataset.

    2. Does this voice cloning project require you to have a very powerful computer?

      • Answer: No you don’t! This project is built to work on Google Colab thus enabling you to carry out the entire process online without the need for a powerful computer. Furthermore, training models become easier because Colab provides GPU.

    3. What audio formats does the voice cloning application accept or support?

      • Answer: The project supports popular audio formats like WAV and MP3. During preprocessing, we use Librosa and PyDub to make the audio format suitable for voice cloning.

    4. Can I modify aspects of the synthesized voice, such as pitch?

      • Answer: Of course you can! This project lets you adjust attributes of the cloned voice, including pitch, so it can handle transformations such as converting a male voice to a female one and vice versa.

    5. How long does it take to train the voice cloning model?

      • Answer: It depends on the amount of data and the availability of a GPU. With pre-trained models and moderately sized datasets, training in Google Colab is not very time-consuming.
