KothaGPT/bilingual-corpus
This repository contains the complete collection of KothaGPT bilingual language models and tools for Bangla (Bengali) and English. All models have been updated and published to the Hugging Face Hub.
Last Updated: January 2026
Organization: KothaGPT
License: Apache 2.0
All models are published using the automated script:

```bash
HF_TOKEN=your_token bash scripts/huggingface/publish_all.sh false
```

Uploads use `hf upload-large-folder` for better large-file handling.

| Model | Parameters | Files | Size | Use Case |
|---|---|---|---|---|
| bilingual-lm | ~125M | 42 | ~500MB | General text generation |
| literary-lm | ~125M | 2 | ~5MB | Literary text analysis |
| readability-classifier | - | 5 | ~2MB | Text assessment |
| sentiment-tone-classifier | - | 2 | ~1MB | Sentiment analysis |
| text-complexity-predictor | - | 1 | ~505KB | Complexity scoring |
| poetic-meter-detector | - | 2 | ~1MB | Poetry analysis |
| metaphor-simile-detector | - | 2 | ~1MB | Literary analysis |
| named-entity-recognizer | - | 2 | ~1MB | Entity extraction |
| cross-lingual-embed | - | 1 | ~1MB | Embeddings |
| style-transfer-gpt | - | 2 | ~1MB | Style transfer |
| tokenizer | - | 2 | ~262KB | Tokenization |
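
The table lists file counts and sizes but not the format of each repository. As a minimal sketch, the helper below inspects a repository's file listing before anything is downloaded. It assumes the `huggingface_hub` package is installed; `summarize_repo` and its `list_files` parameter are hypothetical names introduced for illustration, and the weight-file extensions it checks are a guess, not something the collection documents.

```python
from typing import Callable, Iterable, Optional


def summarize_repo(
    repo_id: str,
    list_files: Optional[Callable[[str], Iterable[str]]] = None,
) -> dict:
    """Return a small summary of the files in a Hub repository."""
    if list_files is None:
        # Imported lazily so the helper can be exercised without network access
        from huggingface_hub import list_repo_files

        list_files = list_repo_files

    files = sorted(list_files(repo_id))
    # Assumption: these extensions cover the weight formats used in the collection
    weight_suffixes = (".safetensors", ".bin", ".pkl", ".joblib")
    return {
        "repo_id": repo_id,
        "num_files": len(files),
        # A config.json usually indicates a transformers-style checkpoint
        "has_config": "config.json" in files,
        "weight_files": [f for f in files if f.endswith(weight_suffixes)],
    }
```

Calling `summarize_repo("KothaGPT/bilingual-lm")` queries the Hub. A repository without `config.json` probably cannot be loaded through the `transformers` `Auto*` classes.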
```python
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

# Load main bilingual model
tokenizer = AutoTokenizer.from_pretrained("KothaGPT/bilingual-lm")
model = AutoModelForCausalLM.from_pretrained("KothaGPT/bilingual-lm")

# Load classifier
classifier = AutoModelForSequenceClassification.from_pretrained("KothaGPT/readability-classifier")
```
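
Once the main model and tokenizer are loaded, text generation could look roughly like the sketch below. It assumes `transformers` and PyTorch are installed and relies on the model's default generation settings. `generate_text` and `main` are hypothetical helpers, and the prompt is only an example.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=50):
    """Generate a continuation of `prompt` with a causal language model."""
    # Tokenize the prompt into PyTorch tensors
    inputs = tokenizer(prompt, return_tensors="pt")
    # max_new_tokens caps the length of the generated continuation
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the first (and only) sequence back to text
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def main():
    # Imported here so the helper above stays importable without transformers
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("KothaGPT/bilingual-lm")
    model = AutoModelForCausalLM.from_pretrained("KothaGPT/bilingual-lm")
    print(generate_text(model, tokenizer, "আমার সোনার বাংলা"))
```

Calling `main()` downloads the model on first use. Sampling parameters such as `do_sample` or `temperature` can be passed to `model.generate` if the defaults are unsuitable.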
```python
models = {
    "sentiment": "KothaGPT/sentiment-tone-classifier",
    "readability": "KothaGPT/readability-classifier",
    "complexity": "KothaGPT/text-complexity-predictor",
}

for task, model_name in models.items():
    # Load and process
    pass
```
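
The loop above leaves the loading and processing step open. One way to fill it in is sketched below, on the assumption that each repository works with the `transformers` `text-classification` pipeline. The collection does not confirm this for every model, so failures are recorded per task instead of aborting the run. `load_pipeline` and `run_all` are hypothetical helper names.

```python
from typing import Callable, Dict

MODELS = {
    "sentiment": "KothaGPT/sentiment-tone-classifier",
    "readability": "KothaGPT/readability-classifier",
    "complexity": "KothaGPT/text-complexity-predictor",
}


def load_pipeline(model_name: str) -> Callable:
    # Assumption: the repository is compatible with the text-classification pipeline
    from transformers import pipeline

    return pipeline("text-classification", model=model_name)


def run_all(
    text: str,
    models: Dict[str, str] = MODELS,
    loader: Callable[[str], Callable] = load_pipeline,
) -> Dict[str, object]:
    """Run every model in `models` on `text` and collect results by task."""
    results: Dict[str, object] = {}
    for task, model_name in models.items():
        try:
            classifier = loader(model_name)
            results[task] = classifier(text)
        except Exception as exc:
            # Keep going: one incompatible repository should not stop the others
            results[task] = {"error": str(exc)}
    return results
```

The `loader` parameter makes it possible to swap in a different loading strategy for repositories that turn out not to be pipeline-compatible.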
`models/` directory

```bash
bash scripts/huggingface/publish_all.sh false
```

All models in this collection are licensed under Apache 2.0. See individual model repositories for specific usage terms.
Note: This collection represents the complete suite of KothaGPT bilingual models. Models are regularly updated with new training data and improved architectures.