HumasLin/CSE-6240-Project

CSE-6240-Project

100cities_details_indeed.py: This program scrapes job postings from Indeed.com for 100 cities and saves them as 100 data files. Each posting includes information such as salary, location, job title, job description, company, etc.

locs.txt: This is a plain text file listing 100 big cities in the US.

merge.py: This program merges the job posting information from all 100 US cities into a single integrated dataset.
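As an illustration of the merging step, here is a minimal sketch. It is not the code in merge.py: it assumes the per-city data files are CSV files with a header row, which this README does not state, and the function name `merge_city_files` is hypothetical.

```python
import csv


def merge_city_files(input_paths, output_path):
    """Concatenate per-city CSV files into one CSV and return the row count.

    Assumption: each input file is a CSV with a header row. Columns that
    are missing from some files are filled with empty strings.
    """
    fieldnames = []
    rows = []
    for path in input_paths:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            # Collect the union of column names, preserving first-seen order.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)

    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Taking the union of the column names keeps a posting's fields even when one city's file lacks a column that another has, such as salary.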

salary_prediction.ipynb: This notebook performs the following tasks:
- transform specific salary values into salary levels;
- predict salaries for job postings that have no salary values, based on vectorized job descriptions;
- categorize job postings into 16 industries by counting the keywords from each industry;
- calculate the confidence interval of gender scores in each industry/salary level.
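Three of the notebook's tasks can be sketched as follows. This is a rough illustration under stated assumptions, not the notebook's code: the salary thresholds, the two example industries and their keywords, and all function names are hypothetical, and the confidence interval uses a normal approximation that the notebook may or may not use. Gender scores are treated here as a plain list of numbers.

```python
import math
import re
from collections import Counter

# Hypothetical keyword lists. The notebook uses 16 industries, whose
# keywords are not given in this README.
INDUSTRY_KEYWORDS = {
    "healthcare": {"nurse", "patient", "clinical"},
    "software": {"python", "software", "developer"},
}


def salary_level(salary, thresholds=(40000, 80000)):
    """Map a salary value to an ordinal level (0 is the lowest).

    The thresholds are illustrative assumptions.
    """
    return sum(salary >= t for t in thresholds)


def categorize(description, industry_keywords=INDUSTRY_KEYWORDS):
    """Return the industry whose keywords occur most often, or None."""
    tokens = Counter(re.findall(r"[a-z]+", description.lower()))
    counts = {
        industry: sum(tokens[word] for word in words)
        for industry, words in industry_keywords.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def mean_confidence_interval(scores, z=1.96):
    """Normal-approximation confidence interval for the mean of scores.

    z=1.96 corresponds to roughly a 95% interval.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
    half_width = z * math.sqrt(variance / n)
    return mean - half_width, mean + half_width
```

For example, `categorize("Python developer needed to build software")` counts three software keywords and no healthcare keywords, so it returns `"software"`. A posting that matches no keywords returns `None`.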

*When running these files, make sure all the data files are in the same directory.
