Instructions to use goldfish-models/nep_deva_1000mb with libraries, notebooks, and local apps.

- Libraries
  - Transformers

    How to use goldfish-models/nep_deva_1000mb with Transformers:

    ```python
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="goldfish-models/nep_deva_1000mb")
    ```

    ```python
    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("goldfish-models/nep_deva_1000mb")
    model = AutoModelForCausalLM.from_pretrained("goldfish-models/nep_deva_1000mb")
    ```

- Notebooks
  - Google Colab
  - Kaggle
- Local Apps
  - vLLM

    How to use goldfish-models/nep_deva_1000mb with vLLM:

    Install from pip and serve model

    ```shell
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "goldfish-models/nep_deva_1000mb"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "goldfish-models/nep_deva_1000mb",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
    ```

    Use Docker

    ```shell
    docker model run hf.co/goldfish-models/nep_deva_1000mb
    ```

  - SGLang

    How to use goldfish-models/nep_deva_1000mb with SGLang:

    Install from pip and serve model

    ```shell
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
      --model-path "goldfish-models/nep_deva_1000mb" \
      --host 0.0.0.0 \
      --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "goldfish-models/nep_deva_1000mb",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
    ```

    Use Docker images

    ```shell
    docker run --gpus all \
      --shm-size 32g \
      -p 30000:30000 \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      --env "HF_TOKEN=<secret>" \
      --ipc=host \
      lmsysorg/sglang:latest \
      python3 -m sglang.launch_server \
        --model-path "goldfish-models/nep_deva_1000mb" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "goldfish-models/nep_deva_1000mb",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
    ```

  - Docker Model Runner

    How to use goldfish-models/nep_deva_1000mb with Docker Model Runner:

    ```shell
    docker model run hf.co/goldfish-models/nep_deva_1000mb
    ```
---
license: apache-2.0
language:
- npi
- nep
datasets:
- allenai/MADLAD-400
- oscar-corpus/OSCAR-2109
- allenai/nllb
- cis-lmu/Glot500
library_name: transformers
pipeline_tag: text-generation
tags:
- goldfish
- arxiv:2408.10441
---

# nep_deva_1000mb

Goldfish is a suite of monolingual language models trained for 350 languages.
This model is the <b>Nepali</b> (Devanagari script) model trained on 1000MB of data, after accounting for an estimated byte premium of 2.63; content-matched text in Nepali takes on average 2.63x as many UTF-8 bytes to encode as English.
The Goldfish models are trained primarily for comparability across languages and for low-resource languages; Goldfish performance for high-resource languages is not designed to be comparable with modern large language models (LLMs).

Note: nep_deva is a [macrolanguage](https://iso639-3.sil.org/code_tables/639/data) code. None of its contained individual languages are included in Goldfish (for script deva).

All training and hyperparameter details are in our paper, [Goldfish: Monolingual Language Models for 350 Languages (Chang et al., 2024)](https://www.arxiv.org/abs/2408.10441).

Training code and sample usage: https://github.com/tylerachang/goldfish

Sample usage also in this Google Colab: [link](https://colab.research.google.com/drive/1rHFpnQsyXJ32ONwCosWZ7frjOYjbGCXG?usp=sharing)

## Model details:

To access all Goldfish model details programmatically, see https://github.com/tylerachang/goldfish/blob/main/model_details.json.
All models are trained with a [CLS] (same as [BOS]) token prepended, and a [SEP] (same as [EOS]) token separating sequences.
For best results, make sure that [CLS] is prepended to your input sequence (see sample usage linked above)!
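
The [CLS] requirement above can be sketched as follows. This is a minimal illustration rather than the authors' sample code: the helper names `with_cls` and `generate` are hypothetical, and it assumes the tokenizer maps the literal string `[CLS]` to its special token id. The linked repository and Colab remain the reference for the intended usage.

```python
MODEL_ID = "goldfish-models/nep_deva_1000mb"
CLS_TOKEN = "[CLS]"  # literal taken from the model card; same role as [BOS]


def with_cls(text: str, cls_token: str = CLS_TOKEN) -> str:
    """Return `text` with the [CLS] token prepended exactly once."""
    if text.startswith(cls_token):
        return text
    return cls_token + text


def generate(prompt: str, max_new_tokens: int = 50) -> str:
    """Generate a continuation; requires `transformers` and `torch`."""
    # Imported lazily so that with_cls() works without these packages.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Assumption: the literal "[CLS]" string is recognized as a special token.
    # add_special_tokens=False avoids a second, automatically added [CLS]/[SEP].
    inputs = tokenizer(
        with_cls(prompt), return_tensors="pt", add_special_tokens=False
    )
    output_ids = model.generate(inputs["input_ids"], max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("नेपाल")` downloads the model on first use. Keeping the prompt length plus `max_new_tokens` within the 512-token maximum sequence length avoids running past the positions the model was trained on.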

Details for this model specifically:

* Architecture: gpt2
* Parameters: 124770816
* Maximum sequence length: 512 tokens
* Training text data (raw): 2629.97MB
* Training text data (byte premium scaled): 1000.005MB
* Training tokens: 215368192 (x10 epochs)
* Vocabulary size: 50000
* Compute cost: 1.09918502977536e+18 FLOPs or ~103.9 NVIDIA A6000 GPU hours
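
As a sanity check, the figures listed above can be related to each other with a few lines of arithmetic. The sketch below uses only numbers from this card; the variable names are illustrative, and it reads "(x10 epochs)" as ten passes over the listed token count, which is an assumption.

```python
RAW_MB = 2629.97            # training text data (raw)
SCALED_MB = 1000.005        # training text data (byte premium scaled)
TOKENS_PER_EPOCH = 215_368_192
EPOCHS = 10                 # assumption: "(x10 epochs)" means ten passes
TOTAL_FLOPS = 1.09918502977536e18
GPU_HOURS = 103.9

# Raw size divided by scaled size recovers the byte premium (about 2.63).
implied_byte_premium = RAW_MB / SCALED_MB

# Total tokens processed during training under the ten-pass reading.
total_tokens_seen = TOKENS_PER_EPOCH * EPOCHS

# Effective throughput implied by the reported compute cost and GPU hours.
implied_flops_per_second = TOTAL_FLOPS / (GPU_HOURS * 3600)

print(f"byte premium ~ {implied_byte_premium:.2f}")
print(f"tokens seen  = {total_tokens_seen:,}")
print(f"throughput   ~ {implied_flops_per_second:.2e} FLOP/s")
```

The implied byte premium agrees with the 2.63 quoted in the introduction. The throughput value is an average derived from the reported totals, not a measured hardware figure.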

Training datasets (percentages prior to deduplication):

* 53.66687%: [MADLAD-400 (CommonCrawl)](https://huggingface.co/datasets/allenai/MADLAD-400)
* 23.99086%: [OSCAR 2021/09](https://huggingface.co/datasets/oscar-corpus/OSCAR-2109)
* 16.12038%: [NLLB (CommonCrawl and ParaCrawl)](https://huggingface.co/datasets/allenai/nllb)
* 5.97982%: [Glot500](https://huggingface.co/datasets/cis-lmu/Glot500), including [CCNet](https://github.com/facebookresearch/cc_net), [Earthlings](https://publicdata.canterbury.ac.nz/Research/Geocorpus/CCGLU_v5.0/), [Tatoeba](https://tatoeba.org/en/), [TICO](https://tico-19.github.io/), [W2C](https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0022-6133-9), [WikiMatrix](https://github.com/facebookresearch/LASER/tree/main/tasks/WikiMatrix)
* 0.24207%: [eBible](https://ebible.org/find/)
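
The dataset mix above can be checked and turned into rough per-source sizes. This is only a sketch: the percentages are stated prior to deduplication, so multiplying them by the raw training size (2629.97MB, from the model details) gives approximate figures at best, and the dictionary keys are shortened labels chosen here for illustration.

```python
RAW_MB = 2629.97  # raw training text size listed in the model details

# Shares as listed on this card (percent, prior to deduplication).
SHARES_PERCENT = {
    "MADLAD-400": 53.66687,
    "OSCAR 2021/09": 23.99086,
    "NLLB": 16.12038,
    "Glot500": 5.97982,
    "eBible": 0.24207,
}

total_percent = sum(SHARES_PERCENT.values())

# Rough size per source, assuming the shares carry over after deduplication.
approx_mb = {name: RAW_MB * pct / 100 for name, pct in SHARES_PERCENT.items()}
largest_source = max(approx_mb, key=approx_mb.get)

print(f"shares sum to {total_percent:.5f}%")
for name, mb in approx_mb.items():
    print(f"{name}: ~{mb:.0f}MB")
```

The shares sum to 100%, and MADLAD-400 accounts for a little over half of the mix.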

## Citation

If you use this model, please cite:

```
@article{chang-etal-2024-goldfish,
  title={Goldfish: Monolingual Language Models for 350 Languages},
  author={Chang, Tyler A. and Arnett, Catherine and Tu, Zhuowen and Bergen, Benjamin K.},
  journal={Preprint},
  year={2024},
  url={https://www.arxiv.org/abs/2408.10441},
}
```