An AI project using Multinomial Naive Bayes to identify toxic speech in English, Filipino, and Taglish (mix of both).

Nesvier-Tech/toxic-speech-classifier

 
 

toxic-speech-classifier

This repository contains the code and resources for a toxic speech classifier built on Multinomial Naive Bayes. The classifier identifies toxic or offensive language in text written in English, Filipino, or Taglish (a mix of English and Filipino).

Key Features:

  • Multinomial Naive Bayes Classifier: The core of this project is a Multinomial Naive Bayes classifier, a widely used machine learning algorithm for text classification. It models how often each word appears in toxic and non-toxic training texts and uses those frequencies to predict the toxicity of new text.

  • Language Support: The classifier handles text in English, Filipino, and Taglish, including messages that switch between the two languages.

  • Preprocessing and Feature Extraction: The project includes preprocessing steps that clean and transform the text before it reaches the classifier, and feature extraction methods that turn the cleaned text into inputs for the model.

  • Training and Evaluation: The repository provides training scripts for building the toxic speech detection model, and evaluation scripts that report accuracy, precision, recall, and F1-score.

  • Dataset: A labeled dataset of toxic and non-toxic speech samples in English, Filipino, and Taglish is included in the repository. It is the data used to train and tune the classifier.
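To make the word-frequency idea behind Multinomial Naive Bayes concrete, here is a minimal pure-Python sketch. It is not the repository's implementation: the `tokenize` helper, the `TinyMultinomialNB` class, the add-one smoothing value, and the six toy training sentences are all assumptions made for illustration, and the real project may preprocess text and train its model differently.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters (an assumed, simplistic preprocessing step)."""
    return re.findall(r"[a-z']+", text.lower())


class TinyMultinomialNB:
    """Illustrative Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_scores(self, text):
        # Score = log prior + sum of smoothed log likelihoods of the known words.
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[word] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy data, NOT taken from the repository's dataset.
train_texts = [
    "you are stupid and useless",
    "ang bobo mo talaga",
    "you are so bobo",
    "thank you for your help",
    "salamat sa tulong mo",
    "ang galing mo, good job",
]
train_labels = ["toxic", "toxic", "toxic", "non-toxic", "non-toxic", "non-toxic"]

model = TinyMultinomialNB().fit(train_texts, train_labels)
print(model.predict("bobo ka talaga"))
print(model.predict("salamat, good job"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps a word that appears in only one class from forcing the other class's probability to zero.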

Usage:

  • Clone the repository and navigate to the project directory.

  • Install the dependencies listed in requirements.txt (typically with pip install -r requirements.txt).

  • Explore the dataset, preprocess the data, and extract features using the provided scripts.

  • Train the Multinomial Naive Bayes classifier on the prepared dataset.

  • Evaluate the trained model's performance using the evaluation scripts.

  • Integrate the trained classifier into your own applications or services.
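The evaluation step reports accuracy, precision, recall, and F1-score. As a rough illustration of how these four numbers relate to each other, the sketch below computes them from a list of true labels and a list of predicted labels. The function name `binary_metrics`, the label strings, and the sample labels are assumptions for this example; the repository's evaluation scripts may compute and report the metrics differently.

```python
def binary_metrics(y_true, y_pred, positive="toxic"):
    """Return accuracy, precision, recall, and F1 for the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    # Guard against division by zero when a class is never predicted or never present.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for five texts, made up for illustration.
y_true = ["toxic", "toxic", "non-toxic", "non-toxic", "toxic"]
y_pred = ["toxic", "non-toxic", "non-toxic", "toxic", "toxic"]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a toxicity filter, precision and recall are usually more informative than accuracy alone: low precision means harmless messages are flagged, while low recall means toxic messages slip through.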

Contributing: Contributions to this project are welcome! If you have ideas for improvements or additional language support, please feel free to submit a pull request. Make sure to follow the established coding conventions and document any changes made.

License: This project is licensed under the MIT License. Please review the LICENSE file for more details.

Disclaimer: The classifier aims to identify offensive language, but it can misclassify text. Exercise caution and review its results in context. The developers are not liable for any consequences resulting from the use of this software.

Start detecting and combating toxic speech in English, Filipino, and Taglish with this Multinomial Naive Bayes-based AI project. Let's create a more respectful and inclusive online environment together!

Languages

  • Jupyter Notebook 82.7%
  • Dart 6.7%
  • C++ 5.0%
  • CMake 4.0%
  • Swift 0.5%
  • Python 0.4%
  • Other 0.7%