This dataset consists of audio samples and metadata for voice cloning, designed for use with text-to-speech (TTS) systems. It has been pre-processed and contains high-quality audio clips of Ratan Tata speaking in English.

Dataset Information

License: OpenRAIL (Open Responsible AI License)
Total Dataset Duration: 2880.55 seconds (approx. 48 minutes)
Minimum Audio Length: 3.65 seconds
Maximum Audio Length: 11.77 seconds
Average Audio Length: approx. 7 seconds per clip
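
The duration figures above can be recomputed from a local copy of the clips. The following is a minimal sketch using only the Python standard library; it assumes the clips are PCM WAV files stored in a single folder, and the function names are illustrative rather than part of the dataset.

```python
import wave
from pathlib import Path
from statistics import mean


def clip_duration(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        # Duration = number of sample frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def summarize_durations(folder):
    """Compute clip count, total, minimum, maximum, and mean length for a folder."""
    durations = [clip_duration(p) for p in sorted(Path(folder).glob("*.wav"))]
    if not durations:
        raise ValueError(f"no .wav files found in {folder}")
    return {
        "clips": len(durations),
        "total_s": sum(durations),
        "min_s": min(durations),
        "max_s": max(durations),
        "mean_s": mean(durations),
    }
```

Calling `summarize_durations("wavs")`, where `wavs` is a hypothetical local folder holding the downloaded clips, returns a dictionary that can be compared against the totals listed here.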

Audio Format

Audio Channels: Mono (1 channel)
Sample Rate: 16 kHz
Precision: 16-bit
Audio Encoding: 16-bit Signed Integer PCM
Typical Bit Rate: 256 kbps
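
The bit rate follows directly from the other three values: for uncompressed PCM it is sample rate × bit depth × channels. The sketch below reproduces that arithmetic and estimates the raw audio payload for a clip of a given length; the helper names are illustrative, and the size estimate excludes the WAV header.

```python
SAMPLE_RATE_HZ = 16_000  # "Sample Rate: 16 kHz"
BITS_PER_SAMPLE = 16     # "Precision: 16-bit"
CHANNELS = 1             # "Audio Channels: Mono"


def pcm_bit_rate(sample_rate_hz=SAMPLE_RATE_HZ,
                 bits_per_sample=BITS_PER_SAMPLE,
                 channels=CHANNELS):
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def pcm_payload_bytes(duration_s):
    """Approximate size of the raw audio samples (header excluded), in bytes."""
    return round(duration_s * pcm_bit_rate() / 8)


print(pcm_bit_rate() // 1000, "kbps")  # 256 kbps
print(pcm_payload_bytes(7.10))         # 227200 bytes for a 7.10-second clip
```

At 32,000 bytes per second, the full 2880.55 seconds of audio comes to roughly 92 MB of sample data.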

Example of an audio file specification (using SoX tool):

Input File     : 'converted_ratan_tata_tts_1.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:07.10 (approx. 7 seconds)
File Size      : 227 KB
Bit Rate       : 256 kbps
Sample Encoding: 16-bit Signed Integer PCM
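
When SoX is not available, the same header fields can be checked with Python's standard `wave` module. This is a rough sketch, not the procedure used to build the dataset: the `EXPECTED` values are taken from the Audio Format section, and the function names are hypothetical.

```python
import wave

# Expected format, taken from the Audio Format section of this card.
EXPECTED = {"channels": 1, "sample_rate": 16_000, "sample_width_bytes": 2}


def read_wav_spec(path):
    """Read the basic format fields from a PCM WAV header."""
    with wave.open(str(path), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "sample_width_bytes": wav.getsampwidth(),  # 2 bytes = 16-bit
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def format_mismatches(path, expected=EXPECTED):
    """Return {field: (actual, expected)} for every field that differs."""
    spec = read_wav_spec(path)
    return {k: (spec[k], v) for k, v in expected.items() if spec[k] != v}
```

An empty dictionary from `format_mismatches` means the file matches the mono, 16 kHz, 16-bit layout described above; any other result lists the fields that would need resampling or conversion before training.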

Models trained or fine-tuned on RamananR/Ratan_Tata_TTS_Data_English

RamananR/Ratan_Tata_SpeechT5_Voice_Cloning_Model