speech_recognition
prepare.DataPreparation Class Reference

Public Member Functions

def __init__ (self, csv_path, srs_path, dataset)
 
def csv_check (self)
 
def spk2gender (self)
 
def spk2utt (self)
 
def text (self)
 
def utt2spk (self)
 
def wavscp (self)
 

Public Attributes

 csv_data_root
 
 csv_delimiter
 
 csv_path
 
 flag
 
 index
 
 srs_path
 
 srs_path_data
 
 srs_path_data_dataset
 
 total_files
 

Detailed Description

Class DataPreparation takes three input arguments.

Args:
    csv_path (str): Absolute path of the CSV metadata file
    srs_path (str): Absolute path of the root directory of the Speech Recognition System
    dataset (str): Type of dataset

Definition at line 48 of file prepare.py.

Constructor & Destructor Documentation

◆ __init__()

def prepare.DataPreparation.__init__(self, csv_path, srs_path, dataset)
DataPreparation class constructor 

Definition at line 57 of file prepare.py.
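The constructor's path attributes are not described individually on this page. One plausible reading, inferred only from the attribute names, is sketched below: csv_data_root is the directory holding the CSV file (the WAV paths are relative to it), and the output directory is <srs_path>/data/<dataset>. The helper derive_paths is hypothetical, and the "data" subdirectory layout is an assumption, not something this page confirms.

```python
import os


def derive_paths(csv_path, srs_path, dataset):
    """Hypothetical helper mirroring the constructor's path attributes.

    Assumption: csv_data_root is the CSV file's directory, and the output
    directory is <srs_path>/data/<dataset>.
    """
    csv_path = os.path.abspath(csv_path)
    srs_path = os.path.abspath(srs_path)
    srs_path_data = os.path.join(srs_path, "data")  # assumed layout
    return {
        "csv_path": csv_path,
        # WAV_PATH entries in the CSV are resolved against this directory.
        "csv_data_root": os.path.dirname(csv_path),
        "srs_path": srs_path,
        "srs_path_data": srs_path_data,
        # The per-dataset directory that receives text, wav.scp, utt2spk, ...
        "srs_path_data_dataset": os.path.join(srs_path_data, dataset),
    }
```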

Member Function Documentation

◆ csv_check()

def prepare.DataPreparation.csv_check(self)
csv_check performs the preliminary checks required before the data files
are constructed.

The method proceeds as follows:
    1. Read the CSV file to detect its delimiter.
    2. Read the header of the CSV file.
    3. Check that the header contains the following fields and store their indices:
        a. SPEAKER_ID
        b. WAV_PATH         (relative to the CSV file)
        c. transcription
        d. GENDER
        e. UTTERANCE_ID
    4. Read every row of the CSV file and check that every WAV path exists.
    5. Check that the number of WAV files equals the number of transcriptions.

Definition at line 108 of file prepare.py.
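The five checks above can be sketched with the standard csv module as follows. This is an illustration under assumptions, not the code in prepare.py: the delimiter is guessed with csv.Sniffer from a small candidate set, the header names are taken verbatim from the list above, "number of transcriptions" is read as the number of non-empty transcription cells, and the function returns its results instead of storing them on self.

```python
import csv
import os

# Header names as listed in step 3 above.
REQUIRED_FIELDS = ("SPEAKER_ID", "WAV_PATH", "transcription", "GENDER", "UTTERANCE_ID")


def csv_check(csv_path):
    """Sketch of the preliminary checks; returns (delimiter, index, total_files)."""
    csv_data_root = os.path.dirname(os.path.abspath(csv_path))
    with open(csv_path, newline="") as f:
        # Step 1: guess the delimiter from a sample (candidate set is an assumption).
        sample = f.read(4096)
        delimiter = csv.Sniffer().sniff(sample, delimiters=",;\t|").delimiter
        f.seek(0)
        reader = csv.reader(f, delimiter=delimiter)

        # Step 2: read the header row.
        header = next(reader)

        # Step 3: every required field must be present; remember its column.
        missing = [name for name in REQUIRED_FIELDS if name not in header]
        if missing:
            raise ValueError(f"CSV header is missing fields: {missing}")
        index = {name: header.index(name) for name in REQUIRED_FIELDS}

        wav_count = 0
        transcription_count = 0
        for row in reader:
            if not row:
                continue  # ignore blank lines
            # Step 4: WAV paths are relative to the CSV file's directory.
            wav_path = os.path.join(csv_data_root, row[index["WAV_PATH"]])
            if not os.path.isfile(wav_path):
                raise FileNotFoundError(wav_path)
            wav_count += 1
            if row[index["transcription"]].strip():
                transcription_count += 1

    # Step 5: every WAV file needs a transcription.
    if wav_count != transcription_count:
        raise ValueError(f"{wav_count} WAV files but {transcription_count} transcriptions")
    return delimiter, index, wav_count
```

Raising on the first problem keeps the sketch short; a real preparation script might instead collect every missing file and report them together.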

◆ spk2gender()

def prepare.DataPreparation.spk2gender(self)
spk2gender prepares the file 'spk2gender' in the DATASET directory.

The method proceeds as follows:
1. Read the CSV file.
2. Read the first row (the first speaker) and extract that speaker's gender.
3. Scan the remaining rows for each new speaker and extract that speaker's gender.
4. Write an output file where each line has the structure:
    <SPEAKER_ID><Tab_space><GENDER>

Definition at line 246 of file prepare.py.
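A minimal sketch of the spk2gender construction, assuming the CSV header uses the field names listed under csv_check and a comma delimiter by default. The function name and the choice to return lines (rather than write the file) are illustrative; writing the result is a single join into <DATASET>/spk2gender.

```python
import csv


def spk2gender(csv_lines, delimiter=","):
    """Return one '<SPEAKER_ID><TAB><GENDER>' line per distinct speaker.

    csv_lines may be an open file or any iterable of CSV text lines.
    """
    reader = csv.DictReader(csv_lines, delimiter=delimiter)
    genders = {}  # dicts keep insertion order, so speakers stay in CSV order
    for row in reader:
        speaker = row["SPEAKER_ID"]
        # Only the first row seen for each speaker contributes the gender.
        if speaker not in genders:
            genders[speaker] = row["GENDER"]
    return [f"{speaker}\t{gender}" for speaker, gender in genders.items()]
```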

◆ spk2utt()

def prepare.DataPreparation.spk2utt(self)
spk2utt prepares the file 'spk2utt' in the DATASET directory.

The method proceeds as follows:
1. Read 'utt2spk' from the DATASET directory (create it first if it is missing).
2. From each row extract:
   a. FILE_ID
   b. SPEAKER_ID
3. Write an output file where each line has the structure:
   <SPEAKER_ID> <FILE_ID_1> <FILE_ID_2> ... <FILE_ID_END>

Definition at line 369 of file prepare.py.
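The inversion from utt2spk to spk2utt amounts to grouping FILE_IDs by speaker. The sketch below assumes whitespace-separated utt2spk lines (tab or space) and keeps speakers in order of first appearance; the function name is hypothetical and the "create utt2spk if missing" step is left out.

```python
def spk2utt(utt2spk_lines):
    """Invert '<FILE_ID> <SPEAKER_ID>' lines into '<SPEAKER_ID> <FILE_ID_1> ...' lines."""
    groups = {}  # speaker -> FILE_IDs, in order of first appearance
    for line in utt2spk_lines:
        fields = line.split()  # splits on tabs and spaces, drops the newline
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        file_id, speaker = fields
        groups.setdefault(speaker, []).append(file_id)
    return [" ".join([speaker] + file_ids) for speaker, file_ids in groups.items()]
```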

◆ text()

def prepare.DataPreparation.text(self)
text prepares the file 'text' in the DATASET directory.

The method proceeds as follows:
1. Read the CSV file
2. From each row extract:
    a. SPEAKER_ID
    b. UTTERANCE_ID
    c. TRANSCRIPTION
3. Make FILE_ID "<SPEAKER_ID>U<UTTERANCE_ID>"
4. Write an output file where each line has the structure:
    <FILE_ID><Tab_space><TRANSCRIPTION>

Definition at line 209 of file prepare.py.
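A short sketch of how each 'text' line could be assembled. It assumes the transcription column is named "transcription" (as in the header list under csv_check) and that the CSV is comma-delimited unless told otherwise; make_file_id and text_lines are illustrative names, not functions confirmed to exist in prepare.py.

```python
import csv


def make_file_id(speaker_id, utterance_id):
    """FILE_ID as documented: <SPEAKER_ID>U<UTTERANCE_ID>."""
    return f"{speaker_id}U{utterance_id}"


def text_lines(csv_lines, delimiter=","):
    """Return one '<FILE_ID><TAB><TRANSCRIPTION>' line per CSV row."""
    reader = csv.DictReader(csv_lines, delimiter=delimiter)
    return [
        f"{make_file_id(row['SPEAKER_ID'], row['UTTERANCE_ID'])}\t{row['transcription']}"
        for row in reader
    ]
```

Using csv.DictReader means a transcription containing the delimiter is still parsed correctly, provided the CSV quotes that cell.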

◆ utt2spk()

def prepare.DataPreparation.utt2spk(self)
utt2spk prepares the file 'utt2spk' in the DATASET directory.

The method proceeds as follows:
1. Read the CSV file
2. From each row extract:
    a. SPEAKER_ID
    b. UTTERANCE_ID
3. Make FILE_ID "<SPEAKER_ID>U<UTTERANCE_ID>"
4. Write an output file where each line has the structure:
    <FILE_ID><Tab_space><SPEAKER_ID>

Definition at line 334 of file prepare.py.
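A sketch of the utt2spk construction together with a small file writer, since spk2utt later reads this file back from the DATASET directory. The header names and comma default are assumptions carried over from csv_check; utt2spk and write_lines are hypothetical names.

```python
import csv
import os


def utt2spk(csv_lines, delimiter=","):
    """Return one '<FILE_ID><TAB><SPEAKER_ID>' line per CSV row."""
    reader = csv.DictReader(csv_lines, delimiter=delimiter)
    lines = []
    for row in reader:
        # FILE_ID follows the documented <SPEAKER_ID>U<UTTERANCE_ID> pattern.
        file_id = f"{row['SPEAKER_ID']}U{row['UTTERANCE_ID']}"
        lines.append(f"{file_id}\t{row['SPEAKER_ID']}")
    return lines


def write_lines(path, lines):
    """Write one entry per line, creating the parent directory if needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as out:
        out.write("".join(line + "\n" for line in lines))
```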

◆ wavscp()

def prepare.DataPreparation.wavscp(self)
wavscp prepares the file 'wav.scp' in the DATASET directory.

The method proceeds as follows:
1. Read the CSV file
2. From each row extract:
    a. SPEAKER_ID
    b. UTTERANCE_ID
    c. WAV_PATH     (relative to the CSV file)
3. Make FILE_ID "<SPEAKER_ID>U<UTTERANCE_ID>"
4. Make FILE_PATH "<CSV_DATA_ROOT>/<WAV_PATH>"
5. Write an output file where each line has the structure:
    <FILE_ID><Tab_space><FILE_PATH>

Definition at line 295 of file prepare.py.
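The FILE_PATH rule in step 4 resolves each relative WAV_PATH against csv_data_root. A minimal sketch, assuming the header names from csv_check and a caller-supplied csv_data_root (the directory of the CSV file); wavscp_lines is a hypothetical name.

```python
import csv
import os


def wavscp_lines(csv_lines, csv_data_root, delimiter=","):
    """Return one '<FILE_ID><TAB><FILE_PATH>' line per CSV row."""
    reader = csv.DictReader(csv_lines, delimiter=delimiter)
    lines = []
    for row in reader:
        file_id = f"{row['SPEAKER_ID']}U{row['UTTERANCE_ID']}"
        # WAV_PATH is relative to the CSV file, so anchor it at csv_data_root.
        file_path = os.path.join(csv_data_root, row["WAV_PATH"])
        lines.append(f"{file_id}\t{file_path}")
    return lines
```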

Member Data Documentation

◆ csv_data_root

prepare.DataPreparation.csv_data_root

Definition at line 68 of file prepare.py.

◆ csv_delimiter

prepare.DataPreparation.csv_delimiter

Definition at line 128 of file prepare.py.

◆ csv_path

prepare.DataPreparation.csv_path

Definition at line 67 of file prepare.py.

◆ flag

prepare.DataPreparation.flag

Definition at line 70 of file prepare.py.

◆ index

prepare.DataPreparation.index

Definition at line 71 of file prepare.py.

◆ srs_path

prepare.DataPreparation.srs_path

Definition at line 86 of file prepare.py.

◆ srs_path_data

prepare.DataPreparation.srs_path_data

Definition at line 87 of file prepare.py.

◆ srs_path_data_dataset

prepare.DataPreparation.srs_path_data_dataset

Definition at line 88 of file prepare.py.

◆ total_files

prepare.DataPreparation.total_files

Definition at line 177 of file prepare.py.

