- Problem Statement
- Project Motivation
- Installation
- Files
- Results and Conclusions
- Licenses and Acknowledgements
- Consider a marketing campaign for a company (Arvato Financial Services), in which we
need to identify the individuals most likely to become the company's future customers. For this task,
we have the following datasets: demographic information for Germany (the country where the
company is located) and information on individuals who are already customers of this
company.
- First, we analyze the demographic information of the German population in order to
understand and explore its main characteristics.
- Then, we build a predictive model that can determine with reasonable accuracy whether
a person is likely to become a customer of the company when targeted by a given
marketing campaign.
- Finally, we classify each potential customer in an unseen test dataset and
submit the results on the Kaggle platform.
- The project addresses a real business problem, with real data and several possible approaches. The dataset is rich and the problem is interesting to solve. Submitting work on Kaggle is a way to compare the quality of our algorithm with those of other students. That is why I chose this specific project, which motivated me to learn even more.
- The following packages are required: numpy, datetime, pandas, matplotlib, seaborn, math, sklearn, pylab, itertools, imblearn, pickle, xgboost
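Some of these import names differ from what needs to be installed: `datetime`, `math`, `itertools` and `pickle` ship with the Python standard library, `pylab` is provided by matplotlib, and `sklearn` and `imblearn` are distributed on PyPI as `scikit-learn` and `imbalanced-learn`. A minimal sketch for checking which of the listed modules can be found in the current environment follows; the helper name `missing_modules` is hypothetical and not part of this repository.

```python
import importlib.util

# Import names as listed in this README (not the PyPI distribution names).
REQUIRED = [
    "numpy", "datetime", "pandas", "matplotlib", "seaborn", "math",
    "sklearn", "pylab", "itertools", "imblearn", "pickle", "xgboost",
]


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for unusual module states; treat as missing.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Any name reported as missing would then need to be installed under its distribution name, for example `scikit-learn` for `sklearn`.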
- project.pdf - Report with a detailed explanation of the entire project.
- capstone_proposal.pdf - Proposal report for this project.
- util.py - Python module for data processing and feature engineering
- cluster.py - Python module with clustering methods for the segmentation report
- pca.py - Python module with PCA methods for dimensionality reduction
- Udacity_AZDIAS_052018.csv: Demographics data for the general population of Germany; 891 211 persons (rows) x 366 features (columns)
- Udacity_CUSTOMERS_052018.csv: Demographics data for customers of a mail-order company; 191 652 persons (rows) x 369 features (columns)
- Udacity_MAILOUT_052018_TRAIN.csv: Demographics data for individuals who were targets of a marketing campaign; 42 982 persons (rows) x 367 features (columns)
- Udacity_MAILOUT_052018_TEST.csv: Demographics data for individuals who were targets of a marketing campaign; 42 833 persons (rows) x 366 features (columns)
- unknown_values.csv: Mapping of each attribute to the value that encodes "unknown" for it
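To illustrate how a mapping such as unknown_values.csv might be applied during data processing, here is a dependency-free sketch using only the standard library. The layout of the mapping file is an assumption: two columns named `attribute` and `unknown`, with multiple codes separated by commas. The column names `AGE_GROUP` and `GENDER`, the sample values, and the helper names are made up for the example and do not come from the project files.

```python
import csv
import io


def load_unknown_map(handle):
    """Read (attribute, unknown) rows into {attribute: set of code strings}.

    Assumed layout: an 'attribute' column and an 'unknown' column whose
    cell may hold several comma-separated codes, e.g. "-1,0".
    """
    mapping = {}
    for row in csv.DictReader(handle):
        codes = {c.strip() for c in row["unknown"].split(",") if c.strip()}
        mapping[row["attribute"]] = codes
    return mapping


def mask_unknowns(rows, unknown_map):
    """Yield copies of rows with each column's unknown codes set to None."""
    for row in rows:
        yield {
            column: None if value in unknown_map.get(column, ()) else value
            for column, value in row.items()
        }


# Tiny made-up inputs standing in for unknown_values.csv and a data file.
mapping_csv = io.StringIO('attribute,unknown\nAGE_GROUP,"-1,0"\nGENDER,-1\n')
data_csv = io.StringIO("AGE_GROUP,GENDER\n3,-1\n0,2\n")

unknown_map = load_unknown_map(mapping_csv)
cleaned = list(mask_unknowns(csv.DictReader(data_csv), unknown_map))
print(cleaned)
```

Values are compared as strings here because the `csv` module does not convert types. With pandas, the same idea could be expressed by replacing each column's codes with a missing-value marker before further feature engineering.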
- The results of this work, along with implementation details, conclusions and future work, can be found in the file final_project.pdf
- The project is part of Udacity's Machine Learning Nanodegree program. The data provided are not public and belong to Arvato and Udacity