This repo contains the files I worked on and my submission notebooks on Kaggle for the Jigsaw Multilingual Toxic Comment Classification contest

nikjohn7/Jigsaw-Multilingual-Toxic-Comment-Classification

Jigsaw Multilingual Toxic Comment Classification

This is my initial submission for the Jigsaw Multilingual Toxic Comment Classification Kaggle competition. I plan to keep modifying it to improve my score. If you're attempting this competition for the first time, feel free to fork this repo and modify the code, and do let me know if you manage to improve the score. Running it on Kaggle gives an accuracy of about 91%.

Contest details

This competition is based on Conversation AI, an initiative of Jigsaw and Google. The main area of focus is creating machine learning models that can identify toxicity in online conversations, where toxicity is defined as anything rude, disrespectful, or otherwise likely to make someone leave a discussion.

Data

You can access all datasets and details of each file from here

How to run

The files in src are sufficient to train and test the model.

The Jupyter notebooks Jigsaw-multilingual-nikhiljohn.ipynb and jigsaw-inference-nikhiljohn.ipynb are my Kaggle notebooks for training and inference, respectively. Feel free to use them too. If you do, make sure to use the TPUs provided by Kaggle. If you need a guide on how to work with TPUs, use this link. It's a video tutorial by Abhishek Thakur, a data scientist I really admire.
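Since the notebooks are meant to run on a TPU, it can help to check which accelerator the session has before starting a long training run. The snippet below is only a rough sketch and not code from this repo: it assumes a PyTorch/XLA setup (which this README does not confirm), and the helper name `pick_accelerator` is hypothetical. It only checks whether the `torch_xla` package is installed, which is a heuristic and does not prove a TPU is actually attached.

```python
import importlib.util


def pick_accelerator() -> str:
    """Guess the available accelerator: 'tpu', 'gpu', or 'cpu'.

    Heuristic only (assumption, not taken from the notebooks):
    - 'tpu' if the torch_xla package is installed
    - 'gpu' if torch is installed and reports a usable CUDA device
    - 'cpu' otherwise
    """
    # torch_xla being importable suggests a TPU runtime, but does not guarantee it.
    if importlib.util.find_spec("torch_xla") is not None:
        return "tpu"
    # Only import torch if it is installed, so the check also works without it.
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            return "gpu"
    return "cpu"


if __name__ == "__main__":
    accelerator = pick_accelerator()
    if accelerator != "tpu":
        print(f"Warning: running on {accelerator}; enable the TPU in the Kaggle notebook settings.")
    else:
        print("torch_xla found; a TPU runtime is probably available.")
```

If the check reports anything other than `tpu`, switch the notebook's accelerator to TPU in the Kaggle settings before training.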
