ASR-with-Speech-Sentiment-&-Text-Summarizer

Introduction

This project aims to develop a system that integrates Automatic Speech Recognition (ASR), Speech Emotion Recognition (SER), and text summarization. It addresses three challenges: recognizing speech accurately across diverse accents and in noisy environments, interpreting emotional tone in real time (sentiment analysis), and generating summaries that retain the essential information. Target applications include customer service, business meetings, media, and education, where the project seeks to improve documentation, understanding, and emotional context in communication.

Intermediate Goals

Baseline Model for ASR: CNN-BiLSTM
Baseline Model for SER: XGBoost
Baseline Model for Text Summarizer: T5-Small, T5-Base
Final Model for ASR: Conformer
Final Model for SER
Final Model for Text Summarizer: BART Large

Goals

Accurate ASR System: Handle diverse accents and operate effectively in noisy environments
Emotion Analysis: Detect emotion from the tone of speech
Meaningful Text Summarizer: Preserve critical information without loss
Integrated System: Combine all components to provide real-time transcription and summaries
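
The way the three components could fit together can be sketched as below. This is a minimal illustration, not the project's actual code: `AnalysisResult`, `analyze`, and the three callables are hypothetical names, and the stand-in lambdas only mimic what trained ASR, SER, and summarizer models would return.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AnalysisResult:
    """Combined output of the three components (hypothetical container)."""
    transcript: str
    emotion: str
    summary: str


def analyze(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    classify_emotion: Callable[[bytes], str],
    summarize: Callable[[str], str],
) -> AnalysisResult:
    """Run ASR and SER on the audio, then summarize the transcript."""
    transcript = transcribe(audio)
    # Emotion is read from the audio itself (tone of speech), not from the text.
    emotion = classify_emotion(audio)
    summary = summarize(transcript)
    return AnalysisResult(transcript, emotion, summary)


if __name__ == "__main__":
    # Stand-in callables; real models would replace these placeholders.
    result = analyze(
        b"raw-audio-bytes",
        transcribe=lambda audio: "the meeting is moved to friday",
        classify_emotion=lambda audio: "neutral",
        summarize=lambda text: text.capitalize(),
    )
    print(result)
```

Passing the components in as callables keeps the sketch independent of any particular model, so a baseline and a final model can be swapped without changing the pipeline function.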

Contributors

Project Architecture

1. ASR (Automatic Speech Recognition)

Base Model (CNN-BiLSTM)	Final Model

2. SER (Speech Emotion Recognition)

Base Model (XGBoost)	Final Model

3. Text Summarizer

Base Model (T5-Small, T5-Base)	Final Model

High Level Next Steps

Usage

Clone the Repository

Important

To clone the repository with its sub-modules, enter the following command:

git clone --recursive https://github.com/LuluW8071/ASR-with-Speech-Sentiment-and-Text-Summarizer.git

1. Install Required Dependencies

Important

Before installing the dependencies from requirements.txt, make sure the prerequisites below are installed. The CUDA ToolKit and PyTorch CUDA are not needed for inference; in that case the CPU build of PyTorch is sufficient.

CUDA ToolKit v11.8/12.1
PyTorch

SOX

For Linux:

sudo apt update
sudo apt install sox libsox-fmt-all build-essential zlib1g-dev libbz2-dev liblzma-dev

# Verify installation
sox --version

pip install -r requirements.txt
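
A quick way to confirm the prerequisites are visible before training is a small check script. The sketch below is an assumption about what is worth checking (the `sox` binary on the PATH and an importable `torch` package); `check_prerequisites` is a hypothetical helper and is not part of the repository.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report whether the sox binary and the torch package can be found."""
    return {
        "sox": shutil.which("sox") is not None,
        "torch": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```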

2. Configure Comet-ML Integration

Note

Replace dummy_key with your actual Comet-ML API key and project name in the .env file to enable real-time loss curve plotting, system metrics tracking, and confusion matrix visualization.

API_KEY = "dummy_key"
PROJECT_NAME = "dummy_key"
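
To catch a forgotten placeholder before a long training run, the .env file can be checked with a few lines of Python. This is a sketch under assumptions: `read_env` and `missing_comet_settings` are hypothetical helpers, the parser only handles simple `KEY = "value"` lines like the two above, and how the training scripts themselves read the file is not shown here.

```python
from pathlib import Path

PLACEHOLDER = "dummy_key"


def read_env(path):
    """Parse simple KEY = "value" lines into a dict, skipping blanks and comments."""
    settings = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def missing_comet_settings(settings):
    """Return the setting names that are absent, empty, or still the placeholder."""
    return [
        name
        for name in ("API_KEY", "PROJECT_NAME")
        if settings.get(name, PLACEHOLDER) in ("", PLACEHOLDER)
    ]


if __name__ == "__main__":
    missing = missing_comet_settings(read_env(".env"))
    if missing:
        print("Still set to the placeholder:", ", ".join(missing))
```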

Usage Instructions

ASR (Automatic Speech Recognition)

1. Audio Conversion

Note

Pass --not-convert to skip audio conversion.

py common_voice.py --file_path file_path/to/validated.tsv
                   --save_json_path file_path/to/save/json
                   -w 4
                   --percent 10
                   --output_format wav/flac

2. Train Model

Note

Add --checkpoint_path path/to/checkpoint_file to load a pre-trained model and fine-tune it.

py train.py --train_json path/to/train.json
            --valid_json path/to/test.json
            -w 4 
            --batch_size 128 
            -lr 2e-4 
            --epochs 20

3. Sentence Extraction

py extract_sentence.py --file_path file_path/to/validated.tsv
                       --save_txt_path file_path/to/save/json

Speech Sentiment

1. Audio Downsample and Augment

Note

Run Speech_Sentiment.ipynb first to generate the CSV table of file paths and emotion labels, then downsample all clips.

py downsample.py --file_path path/to/audio_file.csv 
                 --save_csv_path output/path 
                 -w 4 
                 --output_format wav/flac

py augment.py --file_path "path/to/emotion_dataset.csv" 
              --save_csv_path "output/path" 
              -w 4 
              --percent 20

2. Train the Model

py neuralnet/train.py --train_csv "path/to/train.csv" 
                      --test_csv "path/to/test.csv" 
                      -w 4 
                      --batch_size 256 
                      --epochs 25 
                      -lr 1e-3

Text Summarization

Note

Run the notebook in the src/Text_Summarizer directory. You may need a 🤗 Hugging Face token with write permission to upload your trained model directly to the 🤗 HF Hub.

Data Source

Project	Dataset Source
ASR	Mozilla Common Voice
SER	RAVDESS, CremaD, TESS, SAVEE
Text Summarizer	XSum, BillSum

Code Structure

The code styling adheres to autopep8 formatting.

Results

Project	Base Model Link	Final Model Link
ASR	CNN-BiLSTM	Conformer
SER	XGBoost
Text Summarizer	T5 Small-FineTune, T5 Base-FineTune	BART

Metrics Used

Project	Metrics Used
ASR	WER, CER
SER	Accuracy, F1-Score, Precision, Recall
Text Summarizer	Rouge1, Rouge2, RougeL, RougeLsum, Gen Len
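
For readers unfamiliar with the ASR metrics in the table, WER and CER are both edit distances normalized by the length of the reference, counted over words and characters respectively. The sketch below uses the standard Levenshtein definition; it is an illustration only, the function names are hypothetical, and the project's own evaluation code may normalize text or compute the scores differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 errors over 6 words.
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    print(cer("the cat sat", "the cat sit"))
```

Lower is better for both metrics, and either can exceed 1.0 when the hypothesis contains many insertions.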

Loss Curve Evaluation

Project	Base Model	Final Model Link
ASR	CNN-BiLSTM
Speech Sentiment	XGBoost
Text Summarizer

Evaluation Metrics Results

Project	Base Model	Final Model Link
ASR	CNN-BiLSTM
Speech Sentiment	XGBoost
Text Summarizer

