A Python application which presents the top similar job matches based on search keywords using KNN to find similar jobs online and K-means to cluster similar jobs together

sakethsaxena/Intelligent-Career-Agent

Intelligent Career Agent

San Jose State University

This project was created towards completion of course requirements for CS265-F17 - Topics in Artificial Intelligence

The Intelligent Career Agent takes search keywords from the user, scrapes ieee.org, acm.org and indeed.com for related jobs, and returns a list of the closest matches. It uses KNN to give the K closest sites and K-means to cluster similar sites together within the data.


Usage: in your terminal, run python CareerAgent.py and the interactive command line interface will ask for your input. When you search for the first keyword, the script scrapes the websites mentioned above, generates a lookup table, and stores it on your system as a shelve file.

Files:

careerlookup - this file is created by shelve to store the lookup table. shelve is a Python standard-library module built on top of pickle.
    The script uses the shelve file as a lookup table that stores search results and their links, so that they can be returned as quickly 
    as possible when they are searched again. If a word is searched for again after more than 24 hours, the lookup table is updated.

CareerResults.html - this file is generated by the script to output the result of a query in the browser
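
The caching behaviour described for careerlookup could be sketched roughly as follows. This is only an illustration, not the code in CareerAgent.py: the entry layout (a dict with "timestamp" and "links"), the function names, and the fetch callback are all assumptions, as is the choice to measure the 24 hours from the time each keyword was last stored.

```python
import shelve
import time

MAX_AGE_SECONDS = 24 * 60 * 60  # refresh entries older than 24 hours


def is_stale(entry, now, max_age=MAX_AGE_SECONDS):
    """Return True when a cached entry is older than max_age seconds."""
    return now - entry["timestamp"] > max_age


def lookup(db, keyword, fetch, now=None):
    """Return cached links for keyword, refreshing them on a miss or a stale hit.

    db is any dict-like store, such as the object returned by shelve.open().
    fetch is a hypothetical callback that scrapes the sites for a keyword.
    """
    now = time.time() if now is None else now
    entry = db.get(keyword)
    if entry is None or is_stale(entry, now):
        # Assumed entry layout: when it was stored, plus the result links.
        entry = {"timestamp": now, "links": fetch(keyword)}
        db[keyword] = entry
    return entry["links"]
```

With a real shelve file this would be called inside `with shelve.open("careerlookup") as db:`, passing the scraping function as `fetch`. A plain dict works for trying it out without touching the disk.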

A word about the algorithms:


KNN - I have embedded the KNN algorithm into the lookup table generation by calculating the Jaccard similarity of each entry and sorting the 
    list in descending order of similarity. If the key is present in the lookup table, the k nearest neighbours can then be returned 
    extremely quickly: the script simply has to take the top K links.
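
A minimal sketch of the idea above, assuming each scraped page is reduced to a collection of tokens and compared against the tokens of the search keywords. The function names and the shape of the documents mapping are hypothetical and are not taken from CareerAgent.py.

```python
def jaccard(a, b):
    """Jaccard similarity of two token collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty collections
    return len(a & b) / len(a | b)


def rank_by_similarity(query_tokens, documents):
    """Return (similarity, link) pairs sorted from most to least similar.

    documents is assumed to map a link to the tokens extracted from that page.
    """
    scored = [(jaccard(query_tokens, tokens), link)
              for link, tokens in documents.items()]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def top_k(ranked, k):
    """With a pre-sorted list, the k nearest neighbours are its first k entries."""
    return [link for _, link in ranked[:k]]
```

Because the sorting happens once when the table is built, answering a repeated query is just a slice of the stored list.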

K-means clustering algorithm - I have used the Jaccard similarity to cluster vectors together. 
    As the stopping measure I have used difflib, a Python standard-library module that uses 
    Gestalt pattern matching to produce a similarity ratio between two vectors in the range [0,1].
    The program stops and returns the current clusters when either of the following holds:
        the average similarity value (averagesm) over all clusters is greater than 0.7, 
        that is, the clusters are more than about 70% similar on average;
        the absolute difference between the current average similarity and the old average 
        similarity is less than 0.1, that is, the two differ by less than 10%.
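
The two stopping conditions could be checked along these lines. This is a sketch under stated assumptions: the source does not say which vectors are passed to difflib, so comparing each centroid with its cluster members is a guess, and every name except averagesm and difflib is hypothetical.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.7  # stop once the average similarity exceeds 0.7
MIN_CHANGE = 0.1            # stop once the average changes by less than 0.1


def similarity(a, b):
    """Gestalt pattern matching ratio in [0, 1] between two sequences."""
    return SequenceMatcher(None, a, b).ratio()


def average_similarity(clusters, centroids):
    """Average, over non-empty clusters, of the mean centroid-to-member ratio.

    Comparing centroids with members is an assumption made for this sketch.
    """
    per_cluster = []
    for centroid, members in zip(centroids, clusters):
        if members:
            ratios = [similarity(centroid, member) for member in members]
            per_cluster.append(sum(ratios) / len(ratios))
    return sum(per_cluster) / len(per_cluster) if per_cluster else 0.0


def should_stop(averagesm, previous_averagesm):
    """Apply the two stopping conditions described above."""
    if averagesm > SIMILARITY_THRESHOLD:
        return True
    if previous_averagesm is not None and abs(averagesm - previous_averagesm) < MIN_CHANGE:
        return True
    return False
```

On the first iteration there is no previous average, so `previous_averagesm` is passed as `None` and only the 0.7 threshold applies.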

The output of the script is displayed in a new page in the browser
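
One way to produce such a page with the standard library is shown below. It is a hedged sketch rather than the script's actual output code: the HTML layout and the function names are assumptions, and only the file name CareerResults.html comes from this README.

```python
import html
import webbrowser
from pathlib import Path


def render_results(keyword, links):
    """Build a small HTML page listing the result links (layout is illustrative)."""
    items = "\n".join(
        f'<li><a href="{html.escape(link, quote=True)}">{html.escape(link)}</a></li>'
        for link in links
    )
    return (
        f"<html><body><h1>Results for {html.escape(keyword)}</h1>\n"
        f"<ul>\n{items}\n</ul></body></html>"
    )


def show_results(keyword, links, path="CareerResults.html"):
    """Write the page to disk and open it in a new browser tab."""
    out = Path(path).resolve()
    out.write_text(render_results(keyword, links), encoding="utf-8")
    webbrowser.open_new_tab(out.as_uri())
```

Keeping the rendering separate from the browser call makes the HTML easy to inspect without opening a window.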
