A Java-based project that extracts news articles from a large .sgm file, processes them, and loads them into a MongoDB database. It includes an Apache Spark job for word frequency analysis directly from .sgm files, and a sentiment analysis implementation using a Bag-of-Words model in Java.

Keval-Gandevia/BigDataETLAndSentimentAnalysis

BigDataETLAndSentimentAnalysis

Overview

This project processes and analyzes Reuters news data. It includes:

  • A Java application for parsing and storing news articles in MongoDB.
  • An Apache Spark job for word frequency analysis directly from .sgm files.
  • A Java-based sentiment analysis implementation that uses a Bag-of-Words model to determine the polarity of words.
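
The parsing step can be illustrated with a minimal sketch. It assumes the .sgm input follows the usual Reuters-21578 layout, where each article sits inside a `<REUTERS ...>` element with optional `<TITLE>` and `<BODY>` children. The class name `SgmArticleExtractor`, the `Article` holder, and the regex-based approach are illustrative assumptions, not the repository's actual code. Loading into MongoDB is left out because it needs the MongoDB Java driver.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls title/body pairs out of .sgm text with regular expressions.
public class SgmArticleExtractor {

    /** Simple holder for one article; the field names are illustrative. */
    public static final class Article {
        public final String title;
        public final String body;

        Article(String title, String body) {
            this.title = title;
            this.body = body;
        }
    }

    // Assumed tag layout: <REUTERS ...> ... <TITLE>..</TITLE> ... <BODY>..</BODY> ... </REUTERS>
    // DOTALL lets '.' match line breaks, since articles span many lines.
    private static final Pattern ARTICLE =
            Pattern.compile("<REUTERS[^>]*>(.*?)</REUTERS>", Pattern.DOTALL);
    private static final Pattern TITLE =
            Pattern.compile("<TITLE>(.*?)</TITLE>", Pattern.DOTALL);
    private static final Pattern BODY =
            Pattern.compile("<BODY>(.*?)</BODY>", Pattern.DOTALL);

    /** Returns one Article per REUTERS element found in the given text. */
    public static List<Article> extract(String sgm) {
        List<Article> articles = new ArrayList<>();
        Matcher m = ARTICLE.matcher(sgm);
        while (m.find()) {
            String block = m.group(1);
            articles.add(new Article(firstGroup(TITLE, block), firstGroup(BODY, block)));
        }
        return articles;
    }

    // Returns the trimmed first capture, or an empty string when the tag is absent.
    private static String firstGroup(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) {
        String sample = "<REUTERS NEWID=\"1\"><TITLE>Sample title</TITLE>"
                + "<BODY>Sample body text.</BODY></REUTERS>";
        for (Article a : extract(sample)) {
            System.out.println(a.title + " | " + a.body);
        }
    }
}
```

Each extracted `Article` could then be mapped to a document and inserted into a MongoDB collection. For very large files, reading and matching one article at a time would avoid holding the whole file in memory.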

Features

  • Data Parsing and Storage: Extracts news articles from .sgm files and stores them in a MongoDB database.
  • Word Frequency Analysis: Utilizes Apache Spark to count word frequencies in news articles.
  • Sentiment Analysis: Implements a Bag-of-Words model in Java to classify news article titles as positive, negative, or neutral.
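
The sentiment step can be sketched as follows. This is a minimal illustration of Bag-of-Words polarity scoring, not the repository's implementation: the class name `TitleSentiment` and the tiny positive and negative word lists are hypothetical placeholders, and a real run would load much larger lexicons. The same `bagOfWords` counting is also the core of a word frequency analysis, shown here in plain Java rather than as a Spark job.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of Bag-of-Words sentiment scoring for article titles.
public class TitleSentiment {

    // Placeholder lexicons (assumption): real word lists would be far larger.
    private static final Set<String> POSITIVE =
            Set.of("gain", "gains", "profit", "growth", "rise");
    private static final Set<String> NEGATIVE =
            Set.of("loss", "losses", "fall", "decline", "crisis");

    /** Builds the bag of words: each lower-cased word mapped to its count. */
    public static Map<String, Integer> bagOfWords(String text) {
        Map<String, Integer> bag = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                bag.merge(token, 1, Integer::sum);
            }
        }
        return bag;
    }

    /** Sums word polarities weighted by frequency and maps the score to a label. */
    public static String classify(String title) {
        int score = 0;
        for (Map.Entry<String, Integer> e : bagOfWords(title).entrySet()) {
            if (POSITIVE.contains(e.getKey())) {
                score += e.getValue();
            } else if (NEGATIVE.contains(e.getKey())) {
                score -= e.getValue();
            }
        }
        if (score > 0) {
            return "positive";
        }
        return score < 0 ? "negative" : "neutral";
    }

    public static void main(String[] args) {
        System.out.println(classify("Profit growth offsets loss"));
        System.out.println(classify("Oil prices fall"));
        System.out.println(classify("Company names new chairman"));
    }
}
```

A title whose positive and negative matches cancel out, or that contains no lexicon words at all, is labeled neutral under this scoring.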

Technologies Used

  • Java
  • MongoDB
  • Apache Spark
  • Bag-of-Words Model
