This repository contains experiments on the Sarcasm and Irony datasets using various model architectures. Model performance is evaluated with confusion matrices, and a detailed report is available in 15-report.pdf.
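This is a minimal sketch of a confusion-matrix evaluation. It assumes scikit-learn (sklearn is one of the listed dependencies) and binary labels where 1 means sarcastic/ironic. The snippet computes the matrix and per-class precision, recall, and F1. The labels and predictions are invented for illustration and are not results from the report.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical gold labels and model predictions: 1 = sarcastic/ironic, 0 = not.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes, both ordered by `labels`.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = (int(v) for v in cm.ravel())
print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")

# Per-class precision, recall and F1, plus accuracy and macro/weighted averages.
print(classification_report(y_true, y_pred, labels=[0, 1],
                            target_names=["not sarcastic", "sarcastic"]))
```

For binary labels, `cm.ravel()` yields the counts in the order TN, FP, FN, TP, which is convenient when comparing models side by side.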
The preprocessing pipeline includes the following steps:
- replace_url to replace URLs with URL
- replace_hashtags to replace hashtags with HASHTAG
- replace_email to replace email addresses with EMAIL
- replace_mentions to replace user mentions with MENTION
- replace_numbers to replace numbers with NUMBER
- remove_abbrevations to replace possessive words with their expanded forms
- remove_special_patterns to remove special patterns found in the corpus, such as tokens like 10334m and delimiter words
- remove_punctuation to remove punctuation, optionally replacing it with PUNCT
- Include the special token '[EMOTICON]' for sentences that contain emoticons or text-based smilies
- Include the special token '[ELONGATED]' for sentences that contain elongated words such as "foreveeer", "yayyy", and "Aweeeeesome"
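The preprocessing steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the repository's implementation. The regular expressions, the step order, the optional `punct_token` parameter, the `preprocess` wrapper, and appending the special tokens at the end of the sentence are all assumptions. `remove_abbrevations` and `remove_special_patterns` are omitted because their rules depend on the corpus.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration; the repository's own regexes may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
NUMBER_RE = re.compile(r"\b\d+(?:[.,]\d+)*\b")
PUNCT_RE = re.compile(r"[^\w\s]")
# A small, non-exhaustive set of text smilies such as :) :-( ;D =P :/ <3
EMOTICON_RE = re.compile(r"[:;=][-^']?[()\[\]dDpP/\\|]|<3")
# A letter repeated three or more times, e.g. "foreveeer", "yayyy", "Aweeeeesome".
ELONGATED_RE = re.compile(r"([A-Za-z])\1{2,}")


def replace_url(text: str) -> str:
    return URL_RE.sub("URL", text)


def replace_email(text: str) -> str:
    return EMAIL_RE.sub("EMAIL", text)


def replace_mentions(text: str) -> str:
    return MENTION_RE.sub("MENTION", text)


def replace_hashtags(text: str) -> str:
    return HASHTAG_RE.sub("HASHTAG", text)


def replace_numbers(text: str) -> str:
    return NUMBER_RE.sub("NUMBER", text)


def remove_punctuation(text: str, replacement: Optional[str] = None) -> str:
    # Delete punctuation, or swap it for a space-separated token (e.g. PUNCT),
    # then collapse repeated whitespace.
    sub = f" {replacement} " if replacement else ""
    return " ".join(PUNCT_RE.sub(sub, text).split())


def preprocess(text: str, punct_token: Optional[str] = None) -> str:
    # URLs and emails go first so their ':', '/' and '@' are not read
    # as emoticons or mentions.
    text = replace_email(replace_url(text))
    # Detect emoticons and elongations before punctuation is stripped.
    has_emoticon = EMOTICON_RE.search(text) is not None
    has_elongated = ELONGATED_RE.search(text) is not None
    text = replace_numbers(replace_hashtags(replace_mentions(text)))
    text = remove_punctuation(text, punct_token)
    # Assumption: special tokens are appended at the end of the sentence.
    if has_emoticon:
        text += " [EMOTICON]"
    if has_elongated:
        text += " [ELONGATED]"
    return text


print(preprocess("@john Sooo happy to wait 3 hours :) #blessed http://t.co/x"))
```

Ordering matters in this sketch. Emoticons such as `:)` consist entirely of punctuation, so they have to be detected before `remove_punctuation` runs.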
To reproduce the work shared at https://drive.google.com/drive/folders/1wwpnXvfuH1vbCFFsTfMj_xuE1fhlSRZd?usp=drive_link, create a virtual environment and install the dependencies transformers, nltk, gensim, and sklearn (published on PyPI as scikit-learn).
Execution may fail if the Python path is not set correctly. In that case, try opening the project in an IDE, which usually configures the path automatically.
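To run scripts from a terminal instead, one workaround is to prepend the project root to `sys.path` at the top of the entry-point script. This assumes the project's modules are imported relative to the repository root. `ensure_on_path` is a hypothetical helper for illustration and is not part of the repository. Setting the `PYTHONPATH` environment variable to the project root achieves the same effect without code changes.

```python
import sys
from pathlib import Path
from typing import Union


def ensure_on_path(project_root: Union[str, Path]) -> str:
    """Prepend project_root to sys.path (only once) so project-local imports resolve.

    Hypothetical helper; not part of the repository.
    """
    root = str(Path(project_root).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# Typical use at the top of a script that lives in the project root:
# ensure_on_path(Path(__file__).resolve().parent)
```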
Contact the author with any questions about reproducing the results.