News Classification Project

This project classifies news articles as either fake or true, using several machine learning models and text vectorization techniques. The goal is a robust text classifier that accurately distinguishes fake news from true news.

Vectorization Techniques

Three different text vectorization techniques have been used:

Count Vectorizer: Converts text into numerical features based on how often each word occurs in a document.
TF-IDF Vectorizer: TF-IDF (Term Frequency-Inverse Document Frequency) weights each word by how often it occurs in a document, scaled down by how common the word is across the entire corpus.
Word2Vec Vectorizer: Creates dense word embeddings by learning word associations from the contexts in which words appear in the text data.

Machine Learning Models

The following machine learning models have been implemented and evaluated for news classification:

Logistic Regression: Implemented with variations for Count Vectorizer, TF-IDF Vectorizer, and Word2Vec Vectorizer.
Naive Bayes: Implemented with variations for Count Vectorizer, TF-IDF Vectorizer, and Word2Vec Vectorizer.
Support Vector Machine (SVM): Implemented with variations for Count Vectorizer, TF-IDF Vectorizer, and Word2Vec Vectorizer.
Random Forest: Initially implemented with variations for Count Vectorizer, TF-IDF Vectorizer, and Word2Vec Vectorizer. An improved version of the Random Forest model with TF-IDF Vectorization is also presented.
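
One way to pair each model with each vectorizer is a scikit-learn pipeline per combination, sketched below for the Count and TF-IDF variants. The toy texts, the label convention (1 = true, 0 = fake), the use of LinearSVC for the SVM, and all settings are assumptions for the example rather than the project's actual configuration. For Word2Vec features, which can be negative, MultinomialNB would not apply directly; GaussianNB is one possible substitute.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data: 1 = true news, 0 = fake news.
texts = [
    "parliament passes annual budget after debate",
    "central bank keeps interest rates unchanged",
    "city council approves new transit plan",
    "health ministry publishes vaccination figures",
    "aliens secretly replace world leaders",
    "miracle pill cures every disease overnight",
    "moon landing was filmed in a basement",
    "celebrity reveals earth is actually flat",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Factories so that every combination gets fresh, unfitted objects.
vectorizers = {"count": CountVectorizer, "tfidf": TfidfVectorizer}
classifiers = {
    "logistic_regression": lambda: LogisticRegression(max_iter=1000),
    "naive_bayes": lambda: MultinomialNB(),
    "svm": lambda: LinearSVC(),
    "random_forest": lambda: RandomForestClassifier(
        n_estimators=50, random_state=0
    ),
}

# Fit one pipeline per (vectorizer, classifier) pair.
fitted = {}
for vec_name, make_vec in vectorizers.items():
    for clf_name, make_clf in classifiers.items():
        model = make_pipeline(make_vec(), make_clf())
        model.fit(texts, labels)
        fitted[(vec_name, clf_name)] = model

sample = ["ministry publishes new budget figures"]
for key, model in sorted(fitted.items()):
    print(key, model.predict(sample)[0])
```

Keeping the vectorizer inside the pipeline means it is fitted only on the training data, which avoids leaking test vocabulary into the model.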

Evaluation

The models have been evaluated on the test dataset, and the following metrics have been calculated:

Accuracy
F1 Score
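
Both metrics are available in scikit-learn. The sketch below computes them on a small made-up set of labels and predictions, assuming 1 = true news is treated as the positive class; the values are illustrative and are not results from this project.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical labels and predictions: 1 = true news, 0 = fake news.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

# Accuracy: share of predictions that match the true label (4 of 6 here).
accuracy = accuracy_score(y_true, y_pred)

# F1: harmonic mean of precision and recall for the positive class.
# Here precision = 3/4 and recall = 3/4.
f1 = f1_score(y_true, y_pred)

print(f"accuracy={accuracy:.3f}, f1={f1:.3f}")
# prints: accuracy=0.667, f1=0.750
```

Accuracy can look good on an imbalanced dataset even when one class is mostly misclassified, which is why reporting F1 alongside it is useful.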

Benchmark Improvement

The Random Forest model with TF-IDF Vectorization has been tuned by adjusting its hyperparameters, improving on the initial version's performance.
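
The README does not state which parameters were adjusted or how. As one possible approach, the sketch below runs a small cross-validated grid search over a TF-IDF plus Random Forest pipeline. The toy data, the grid values, the F1 scoring choice, and the two-fold split are all assumptions for the example.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy data: 1 = true news, 0 = fake news.
texts = [
    "parliament passes annual budget after debate",
    "central bank keeps interest rates unchanged",
    "city council approves new transit plan",
    "health ministry publishes vaccination figures",
    "aliens secretly replace world leaders",
    "miracle pill cures every disease overnight",
    "moon landing was filmed in a basement",
    "celebrity reveals earth is actually flat",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("rf", RandomForestClassifier(random_state=0)),
])

# Illustrative grid only; the "rf__" prefix routes each parameter
# to the Random Forest step of the pipeline.
param_grid = {
    "rf__n_estimators": [50, 100],
    "rf__max_depth": [None, 10],
}

# cv=2 only because the toy dataset is tiny; a real run would use more folds.
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=2)
search.fit(texts, labels)

print(search.best_params_)
print(round(search.best_score_, 3))
```

After fitting, search.best_estimator_ holds the pipeline refitted on all training data with the best parameter combination, ready to be evaluated on the held-out test set.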

Dependencies

Ensure you have the following Python libraries installed:

numpy
pandas
scikit-learn
gensim
matplotlib
seaborn

Usage

Clone this repository to your local machine:

git clone https://github.com/NikosMav/AI-FakeNews-Classification.git

Install the required dependencies:

pip install numpy pandas scikit-learn gensim matplotlib seaborn

Run the Jupyter Notebook or Python script to execute the machine learning models and perform news classification.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Feel free to explore the Jupyter Notebook for a detailed step-by-step explanation of the project and its implementation.
