The Wayback Machine - https://web.archive.org/web/20201029165546/https://github.com/Shubhamjain27/nlp-for-telugu

NLP for Telugu

This repository contains state-of-the-art language models and a classifier for Telugu, a language spoken in the Indian subcontinent.

The models trained here have been used in the Natural Language Toolkit for Indic Languages (iNLTK).

Datasets

The following datasets were created as part of this project:

  1. Telugu Wikipedia Dataset

  2. Telugu News Dataset

  3. Telugu News Dataset II

Results

Language Model Perplexity

| Architecture | Perplexity (Telugu Wikipedia Articles) |
| --- | --- |
| ULMFiT | 27.47 |
| TransformerXL | 29.44 |
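Perplexity is the exponential of the average per-token negative log-likelihood, so lower is better: a perplexity of 27.47 means the model is, on average, about as uncertain as a uniform choice among roughly 27 tokens. The following is a minimal Python sketch of that formula, assuming you already have natural-log probabilities for each token from some language model. The function name `perplexity` and the example inputs are illustrative only, not code from this repository.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities log p(w_i | w_<i).

    PPL = exp(-(1/N) * sum_i log p(w_i | w_<i))
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)  # average negative log-likelihood
    return math.exp(mean_nll)


if __name__ == "__main__":
    # Hypothetical input: a model that assigns probability 1/27.47 to every token
    # has perplexity 27.47, the ULMFiT figure in the table above.
    log_probs = [math.log(1 / 27.47)] * 100
    print(round(perplexity(log_probs), 2))  # 27.47
```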

Classification Metrics

ULMFiT

| Dataset | Accuracy (%) | Kappa Score |
| --- | --- | --- |
| Telugu News Articles | 95.4 | 93.8 |
| Telugu News Articles - Andhra Jyoti | 92.09 | not reported |
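The README does not say how the kappa score was computed. Assuming it is Cohen's kappa, it compares the observed agreement p_o between predicted and true labels with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). Kappa lies in [-1, 1], so the 93.8 reported above is presumably 0.938 scaled by 100. Below is a minimal, dependency-free Python sketch of accuracy and Cohen's kappa; the function names and the news-category labels are hypothetical and not taken from the Telugu datasets.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # observed agreement
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: probability that both label sources independently pick the same class.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical labels for six news articles.
    y_true = ["sports", "politics", "sports", "business", "politics", "sports"]
    y_pred = ["sports", "politics", "business", "business", "politics", "sports"]
    print(f"accuracy = {accuracy(y_true, y_pred) * 100:.1f}")  # accuracy = 83.3
    print(f"kappa    = {cohen_kappa(y_true, y_pred) * 100:.1f}")  # kappa    = 75.0
```

Kappa is lower than accuracy here because part of the 5/6 agreement (p_e = 1/3) would be expected even from a classifier that guessed according to the label frequencies.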

Visualizations

Embedding Space

| Architecture | Visualization |
| --- | --- |
| ULMFiT | Embeddings projection |
| TransformerXL | Embeddings projection |

Pretrained Language Model

Download the pretrained language model from here.

Classifier

Download the classifier from here.

Tokenizer

The tokenizer was trained using Google's SentencePiece.

Download the trained model and vocabulary from here.
