Advanced NLP – Project Submission

About

This repository contains experiments conducted on Sarcasm and Irony datasets using various model architectures. Model quality is measured with a confusion matrix, and a detailed report is available in 15-report.pdf.
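As a minimal illustration of the confusion-matrix evaluation mentioned above, the sketch below counts true/predicted label pairs for a binary task. The function name, label names and sample data are invented for illustration; they are not this project's code or results.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Return a nested dict: matrix[true_label][predicted_label] -> count."""
    counts = Counter(zip(y_true, y_pred))
    return {t: {p: counts[(t, p)] for p in labels} for t in labels}

# Toy labels, invented for illustration only (not project results).
y_true = ["ironic", "ironic", "not_ironic", "not_ironic", "ironic"]
y_pred = ["ironic", "not_ironic", "not_ironic", "ironic", "ironic"]
matrix = confusion_matrix(y_true, y_pred, labels=["ironic", "not_ironic"])

# Rows are true labels, columns are predicted labels.
for true_label, row in matrix.items():
    print(true_label, row)
```

The diagonal entries (same true and predicted label) are the correct predictions; the off-diagonal entries are the errors.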

Project files

Common Util files

The preprocessing.ipynb notebook provides methods to load datasets, generate datasets, apply tokenization, etc.

It also provides the following text-cleaning methods:

replace_url to replace URLs with URL
replace_hashtags to replace hashtags with HASHTAG
replace_email to replace email addresses with EMAIL
replace_mentions to replace mentions with MENTION
replace_numbers to replace numbers with NUMBER
remove_abbrevations to replace possessive words with their expanded forms
remove_special_patterns to replace delimiter-like words such as 10334m found in the corpus
remove_punctuation to remove punctuation, or optionally replace it with PUNCT
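The placeholder replacements listed above could be sketched with regular expressions roughly as follows. This is only an illustration: the patterns and the combined clean_text helper are assumptions, and the notebook's actual implementations may differ.

```python
import re

# Hypothetical patterns; the notebook's actual regular expressions may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")
NUMBER_RE = re.compile(r"\b\d+(?:\.\d+)?\b")

def clean_text(text):
    """Replace URLs, emails, hashtags, mentions and numbers with placeholders."""
    # Order matters: URLs and emails go first so that "#" or "@" inside them
    # is not mistaken for a hashtag or a mention.
    text = URL_RE.sub("URL", text)
    text = EMAIL_RE.sub("EMAIL", text)
    text = HASHTAG_RE.sub("HASHTAG", text)
    text = MENTION_RE.sub("MENTION", text)
    text = NUMBER_RE.sub("NUMBER", text)
    return text

cleaned = clean_text(
    "@bob loved it #not, mail me at a.b@example.com or see https://t.co/x1 in 5 days"
)
print(cleaned)
```

Running the replacements in a fixed order keeps the placeholders from interfering with one another, for example an email address being partly rewritten as a mention.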

Transformers specific preprocessing

Since transformers rely on powerful subword tokenization such as BPE, we restricted preprocessing to the following:

replace_url to replace URLs with URL
Add the special token '[EMOTICON]' to sentences that contain emoticons or text-based smilies
Add the special token '[ELONGATED]' to sentences that contain elongated words such as "foreveeer", "yayyy", "Aweeeeesome", etc.
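A rough sketch of this lighter preprocessing is shown below. The regular expressions, the add_special_tokens helper, and the choice to append the tokens at the end of the sentence are all assumptions made for illustration; the project's notebooks may detect emoticons and elongated words differently and may place the tokens elsewhere.

```python
import re

# Hypothetical patterns; the project's actual rules may differ.
URL_RE = re.compile(r"https?://\S+")
# A few text smilies such as :) ;-) :D :( plus some common emoji code-point ranges.
EMOTICON_RE = re.compile(r"[:;=][-']?[)(DPp]|[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# The same letter three or more times in a row, e.g. "yayyy" or "foreveeer".
ELONGATED_RE = re.compile(r"([A-Za-z])\1{2,}")

def add_special_tokens(text):
    """Replace URLs and append marker tokens for emoticons and elongated words."""
    text = URL_RE.sub("URL", text)
    tags = []
    if EMOTICON_RE.search(text):
        tags.append("[EMOTICON]")
    if ELONGATED_RE.search(text):
        tags.append("[ELONGATED]")
    # Assumption: tokens are appended to the end of the sentence.
    return " ".join([text] + tags)

print(add_special_tokens("yayyy this is Aweeeeesome :)"))
print(add_special_tokens("see https://example.com now"))
```

If such marker tokens are used with a pretrained tokenizer, they usually also need to be registered with it as additional special tokens so that they are not split into subwords.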

Model specific files

experiment/Irony_bilstm.ipynb contains the training code and evaluation loop for the BiLSTM model with attention

experiment/irony_transformers_hf.ipynb contains the bidirectional encoder transformer model implementation

experiment/irony_transformers_torch.ipynb contains the transformer encoder model implementation

experiment/setfit_impl.ipynb contains the SetFit few-shot training implementation

experiment/irony_tf_exponential.ipynb contains the training implementation for the transformer with exponential task-specific positional encoding

Dependencies

Create a virtual environment and install the dependencies (transformers, nltk, gensim, sklearn) to reproduce the results: https://drive.google.com/drive/folders/1wwpnXvfuH1vbCFFsTfMj_xuE1fhlSRZd?usp=drive_link
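Before running the notebooks, a quick check like the one below can confirm that the listed dependencies are importable in the active environment. The missing_packages helper is a hypothetical convenience, not part of this repository. One detail worth knowing: sklearn is the import name, while the package is published on PyPI as scikit-learn.

```python
import importlib.util

# Import name -> pip package name for the dependencies listed in this README.
REQUIRED = {
    "transformers": "transformers",
    "nltk": "nltk",
    "gensim": "gensim",
    "sklearn": "scikit-learn",
}

def missing_packages(required=REQUIRED):
    """Return the pip names of required packages that cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]

missing = missing_packages()
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All listed dependencies are available.")
```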

Troubleshooting

Execution may fail if the Python path is not set correctly. Try loading the project in an IDE for smoother execution.
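If an IDE is not an option, one common workaround is to add the project directory to the Python path at the top of a notebook. The sketch below assumes the notebook is started from the repository root; the add_to_python_path helper is hypothetical and the directory should be adjusted to your layout.

```python
import sys
from pathlib import Path

def add_to_python_path(directory):
    """Prepend a directory to sys.path if it is not already there."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved

# Assumption: the current working directory is the repository root.
project_root = add_to_python_path(".")
print("Using project root:", project_root)
```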

Contact

Contact the author with any questions about reproducing the results.

tanalpha-aditya/NLP-Sarcasm-Irony-Detection
