
Several popular machine learning problems, solved using Scikit-learn.


YoussefEssDS/ML-classic-problems-with-Python


ML-classic-problems-with-Python

Working through some of the best-known machine learning problems.

Specifically, I will cover:

1-Spam filter: Using Scikit-learn, I will implement a Naive Bayes model to classify emails as spam or not spam.
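A minimal sketch of the idea, assuming a bag-of-words representation fed into Scikit-learn's `MultinomialNB`. The four example emails and their labels are invented for illustration; the repository's actual data and preprocessing may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hand-written corpus (hypothetical, for illustration only).
emails = [
    "Win a free prize now",
    "Free money, claim your prize",
    "Meeting agenda for Monday",
    "Please review the project report",
]
labels = ["spam", "spam", "ham", "ham"]

# CountVectorizer turns each email into word counts;
# MultinomialNB models those counts per class.
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(emails, labels)

new_emails = ["free prize inside", "project meeting on Monday"]
predictions = spam_filter.predict(new_emails)
print(list(predictions))
```

On a real corpus you would hold out part of the data and check accuracy on it before trusting the filter.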

2-Prediction of candidates to invite for job interviews:

Using Decision Tree / Random Forest algorithms, we will implement a tool that uses historical data to predict whether or not to invite a candidate for a job interview, based on the candidate's skills and some demographic traits such as experience, gender, disability, and an indicator of whether the candidate belongs to an under-represented minority.

See the following link for further info about the dataset: https://www.kaggle.com/vingkan/strategeion-resume-skills/downloads/strategeion-resume-skills.zip/2
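As a rough sketch of the approach, the snippet below trains a `RandomForestClassifier` on synthetic stand-in data. The two feature columns and the labelling rule are assumptions made up for this example and are not taken from the Kaggle dataset; the demographic traits mentioned above would simply be additional columns.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; column meanings are hypothetical.
rng = np.random.default_rng(42)
n_candidates = 300
years_experience = rng.integers(0, 15, n_candidates)
num_skills = rng.integers(0, 10, n_candidates)
X = np.column_stack([years_experience, num_skills])

# Toy labelling rule, invented for this sketch only.
y = ((years_experience >= 3) & (num_skills >= 4)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# A random forest averages many decision trees trained on bootstrap samples.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

test_accuracy = forest.score(X_test, y_test)
importances = forest.feature_importances_
print("held-out accuracy:", test_accuracy)
print("feature importances:", importances)
```

Inspecting `feature_importances_` is one way to see which inputs drive the invitations, which matters when demographic traits are among the features.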

3-Prediction of credit card defaulting clients:

Using an SVM (support vector machine), we will implement a tool that uses data on credit card clients in Taiwan between April and September 2005 as a training dataset to predict whether a new client is likely to default in the following month, based on their demographic traits and financial situation.

For further info about the dataset take a look at this link: https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset
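The general shape of such a classifier can be sketched as follows. SVMs are sensitive to feature scale, so this sketch standardizes the inputs first. The two features and the labelling rule are synthetic assumptions for illustration, not columns or results from the real dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; both features are hypothetical.
rng = np.random.default_rng(0)
n_clients = 200
credit_limit = rng.uniform(10_000, 500_000, n_clients)
months_late = rng.integers(0, 6, n_clients)
X = np.column_stack([credit_limit, months_late])

# Toy rule invented for this sketch: long payment delays mean default.
y = (months_late >= 3).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale features, then fit an RBF-kernel support vector classifier.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)

test_predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print("held-out accuracy:", accuracy)
```

Putting the scaler inside the pipeline keeps the test data from influencing the scaling statistics.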

4-Cat Vs. Not Cat image classification:

Using a neural network, we will tackle a classic computer vision problem: a model that takes an image as input and predicts whether or not it is a picture of a cat. The dataset (in h5 format) is available at this link: https://www.kaggle.com/mriganksingh/cat-images-dataset#train_catvnoncat.h5
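To illustrate the pipeline (flatten each image, scale pixel values, train a small network), here is a hedged sketch using Scikit-learn's `MLPClassifier` on randomly generated stand-in images. The image size, the brightness trick that makes the classes separable, and the network size are all assumptions for this example; the repository's own network and the real h5 data may look quite different.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Random stand-in "images" (hypothetical shape; not the real dataset).
rng = np.random.default_rng(0)
n_images, height, width, channels = 60, 16, 16, 3
images = rng.integers(0, 256, size=(n_images, height, width, channels))
labels = rng.integers(0, 2, size=n_images)

# Make class-1 images brighter so there is a signal to learn (toy assumption).
images[labels == 1] = np.clip(images[labels == 1] + 60, 0, 255)

# Flatten each image into one feature vector and scale pixels to [0, 1].
X = images.reshape(n_images, -1) / 255.0

# One hidden layer with 16 units; sizes chosen arbitrarily for the sketch.
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
net.fit(X, labels)

predictions = net.predict(X[:5])
print(predictions)
```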

5-Movie recommender:

Using the MovieLens 100k data, we will build a movie recommendation system that, given the movies you have rated, suggests movies you might like. You can find out more about the dataset here: https://grouplens.org/datasets/movielens/100k/

Part I: We use the correlation between movies' rating vectors. Part II: We instead use a correlation matrix between each movie and all the others, based on ratings.
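Both parts can be sketched with pandas on a tiny user-by-movie ratings table. The movie names and ratings below are invented placeholders, and the `min_periods` threshold and the way similarities are weighted by the user's rating are assumptions for this example, not necessarily what the notebooks do.

```python
import numpy as np
import pandas as pd

# Toy user x movie ratings table (hypothetical values; NaN = not rated).
ratings = pd.DataFrame(
    {
        "movie_a": [5, 4, 1, 5, 2],
        "movie_b": [5, 5, 1, 4, 2],
        "movie_c": [1, 2, 5, np.nan, 4],
    },
    index=["user1", "user2", "user3", "user4", "user5"],
)

# Part I: correlate one movie's rating vector with every other movie's.
similar_to_a = (
    ratings.corrwith(ratings["movie_a"])
    .drop("movie_a")
    .sort_values(ascending=False)
)

# Part II: full movie x movie correlation matrix, ignoring pairs
# with fewer than 3 users in common.
corr_matrix = ratings.corr(method="pearson", min_periods=3)

# Score unseen movies by similarity to what the user rated, weighted by rating.
my_ratings = pd.Series({"movie_a": 5})
scores = pd.Series(dtype=float)
for movie, rating in my_ratings.items():
    scores = scores.add(corr_matrix[movie].dropna() * rating, fill_value=0)
recommendations = scores.drop(my_ratings.index).sort_values(ascending=False)

print(similar_to_a)
print(recommendations)
```

On the real data, a higher `min_periods` helps avoid spurious correlations between movies that only a handful of users rated in common.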