This is a Statistical Learning application consisting of various machine learning algorithms, their implementation in R (done by me), and an in-depth interpretation of the results. Documents and reports on the techniques mentioned below can be found on my Rpubs profile.
This repository captures not only hands-on experience with parameter fine-tuning, but also other practical techniques such as model ensembling (boosting, bagging, and stacking) used in Kaggle and other competitions; a minimal sketch follows.
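As a rough illustration of the three ensembling styles, here is a minimal R sketch on the built-in `mtcars` data; the package choices (`randomForest`, `gbm`) and all hyperparameters are illustrative assumptions, not the exact competition setups.

```r
# Minimal sketch: bagging, boosting, and a simple two-model stack.
# Data (mtcars) and hyperparameters are illustrative only.
library(randomForest)  # bagging via random forests
library(gbm)           # gradient boosting

set.seed(42)
idx   <- sample(nrow(mtcars), 22)
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]

# Bagging: a random forest with mtry = all predictors is plain bagging
bag <- randomForest(mpg ~ ., data = train,
                    mtry = ncol(train) - 1, ntree = 300)

# Boosting: shallow trees fitted sequentially to the residuals
boost <- gbm(mpg ~ ., data = train, distribution = "gaussian",
             n.trees = 300, interaction.depth = 2,
             shrinkage = 0.05, n.minobsinnode = 5)

# Stacking: a linear meta-model over the base models' predictions.
# (A real stack would use out-of-fold predictions to avoid leakage.)
stack_train <- data.frame(
  p_bag   = predict(bag, train),
  p_boost = predict(boost, train, n.trees = 300),
  mpg     = train$mpg
)
meta <- lm(mpg ~ p_bag + p_boost, data = stack_train)

# Test-set RMSE of the stacked ensemble
pred <- predict(meta, data.frame(
  p_bag   = predict(bag, test),
  p_boost = predict(boost, test, n.trees = 300)
))
sqrt(mean((pred - test$mpg)^2))
```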
It comes from Kaggle competitions where the training dataset is very small and the testing dataset is very large, so we have to avoid or reduce overfitting, one of the most common problems in predictive analytics, by looking for the best possible countermeasures.
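Two of the standard countermeasures, k-fold cross-validation and penalized regression, can be combined in a few lines with `glmnet`; this sketch on `mtcars` is an assumption-laden illustration, not the competition code.

```r
# Minimal sketch: choose model complexity on held-out folds rather
# than training fit, using an elastic-net penalty (glmnet).
library(glmnet)

set.seed(1)
x <- as.matrix(mtcars[, -1])  # predictors
y <- mtcars$mpg               # response

# cv.glmnet tunes the penalty strength lambda by 10-fold CV
cv_fit <- cv.glmnet(x, y, alpha = 0.5, nfolds = 10)

cv_fit$lambda.min               # lambda with the lowest CV error
cv_fit$lambda.1se               # simpler model within one SE
coef(cv_fit, s = "lambda.1se")  # shrunken coefficients
```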
Analysed the syntax and semantics of a corpus of text documents, retrieved by web-scraping news articles from Inshorts, following a standard NLP workflow within the CRISP-DM process model.
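The scrape-and-preprocess stage of such a workflow might look like the sketch below; the Inshorts URL and the CSS selector are assumptions for illustration, since the live page structure may differ.

```r
# Hedged sketch: scrape headlines, then run standard tm preprocessing.
# The URL and ".news-card-title a" selector are assumed, not verified.
library(rvest)
library(tm)

page <- read_html("https://inshorts.com/en/read")   # assumed endpoint
headlines <- page %>%
  html_elements(".news-card-title a") %>%           # assumed selector
  html_text2()

# Standard preprocessing: lowercase, strip punctuation and numbers,
# drop stopwords, then build a document-term matrix
corpus <- VCorpus(VectorSource(headlines))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
dtm <- DocumentTermMatrix(corpus)
inspect(dtm)
```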
The objective of this project is to detect hate speech in tweets. For the sake of simplicity, we say a tweet contains hate speech if it has a racist or sexist sentiment associated with it. So the task is to separate racist or sexist tweets from all other tweets.
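A minimal baseline for this task is TF-IDF features plus a regularized logistic regression; the tiny `tweets` data frame and its `text`/`label` columns below are hypothetical stand-ins for the real labelled data.

```r
# Minimal sketch: TF-IDF + logistic regression baseline.
# The `tweets` data frame is a hypothetical stand-in (label 1 = hate).
library(tm)
library(glmnet)

tweets <- data.frame(
  text  = c("offensive insult example", "lovely weather today",
            "another abusive remark", "great match last night",
            "targeted slur here", "new phone works well",
            "more hateful content", "coffee with friends"),
  label = c(1, 0, 1, 0, 1, 0, 1, 0)
)

corpus <- VCorpus(VectorSource(tweets$text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
dtm <- DocumentTermMatrix(corpus,
                          control = list(weighting = weightTfIdf))

# Regularized logistic regression on the TF-IDF features
x   <- as.matrix(dtm)
fit <- glmnet(x, tweets$label, family = "binomial")
predict(fit, x, type = "class", s = 0.01)  # in-sample sanity check
```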
Performance analysis of Decision Trees, Boosting & Bagging, KNN, Neural Networks, and Linear Regression over two data sets intended to be very different in nature and volume.
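A compact way to run such a comparison in R is `caret`'s resampling interface, sketched below on `iris` for the classification side (linear regression would apply to the regression data set); the methods and data here are illustrative assumptions, not the project's actual setup.

```r
# Hedged sketch: compare several learners under one 5-fold CV scheme.
# The methods and the iris data are illustrative stand-ins.
library(caret)

set.seed(7)
ctrl <- trainControl(method = "cv", number = 5)

models <- list(
  tree = train(Species ~ ., data = iris, method = "rpart",
               trControl = ctrl),
  bag  = train(Species ~ ., data = iris, method = "treebag",
               trControl = ctrl),
  knn  = train(Species ~ ., data = iris, method = "knn",
               trControl = ctrl),
  nnet = train(Species ~ ., data = iris, method = "nnet",
               trControl = ctrl, trace = FALSE)
)

# Side-by-side resampled accuracy and kappa for each model
summary(resamples(models))
```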