Instructions for using LLM360/K2 with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use LLM360/K2 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="LLM360/K2")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("LLM360/K2")
model = AutoModelForCausalLM.from_pretrained("LLM360/K2")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use LLM360/K2 with vLLM:
Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "LLM360/K2"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LLM360/K2",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker:

```shell
docker model run hf.co/LLM360/K2
```
- SGLang
How to use LLM360/K2 with SGLang:
Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "LLM360/K2" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LLM360/K2",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker images:

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "LLM360/K2" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LLM360/K2",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
- Docker Model Runner
How to use LLM360/K2 with Docker Model Runner:
```shell
docker model run hf.co/LLM360/K2
```
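The vLLM and SGLang servers above both answer OpenAI-compatible `POST /v1/completions` requests, as their curl examples show. A minimal sketch for making the same call from Python with only the standard library follows. The helper names `build_completion_request`, `extract_text`, and `complete` are hypothetical (not part of vLLM or SGLang), and the code assumes the server returns the usual OpenAI completions response with the generated text in `choices[0].text`.

```python
import json
import urllib.request


def build_completion_request(base_url: str, prompt: str, model: str = "LLM360/K2",
                             max_tokens: int = 512,
                             temperature: float = 0.5) -> urllib.request.Request:
    """Build (but do not send) a POST request for an OpenAI-compatible /v1/completions endpoint."""
    # Same payload fields as the curl examples above.
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response: dict) -> str:
    """Return the generated text of the first choice (assumes the OpenAI completions response shape)."""
    return response["choices"][0]["text"]


def complete(base_url: str, prompt: str, **kwargs) -> str:
    """Send the request to a running server and return the generated text."""
    request = build_completion_request(base_url, prompt, **kwargs)
    # Generous timeout (an assumption): long generations on a large model can be slow.
    with urllib.request.urlopen(request, timeout=600) as resp:
        return extract_text(json.load(resp))
```

For example, `complete("http://localhost:8000", "Once upon a time,")` targets the vLLM server started above, and `complete("http://localhost:30000", "Once upon a time,")` targets the SGLang one. Building the request separately from sending it keeps the payload easy to inspect or test without a running server.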
Update README.md

Changed file: README.md
```diff
@@ -16,23 +16,22 @@ K2 is a fully transparent large language model on par with Llama 2 - 70B.
 <center><img src="eval_table_temp.png" alt="eval table"/></center>
 
 ## Datasets and Mix
-| Dataset | Starting Tokens | Multiplier | Total Tokens |
+| Dataset | Starting Tokens | Multiplier | Total Tokens |% of Total |
 | ----------- | ----------- | ----------- | ----------- | ----------- |
-| dm-math | 4.
-
-
-
-
-
-
-
-
-
-
-
-
-
-| Checkpoint 356[link] | Checkpoint 351[link] | Checkpoint 355[link] | Checkpoint 355[link] |
+| dm-math | 4.33B | 3x | 13B | 1% |
+| pubmed-abstracts | 4.77B | 3x | 14.3B | 1.1% |
+| uspto | 4.77B | 3x | 14.3B | 1.1% |
+| pubmed-central | 26B | 1x | 26B | 2% |
+| redpajama.arxiv | 27.3B | 1x | 27.3B | 2.1% |
+| starcoder.spm | 67.6B | 0.5x | 33.8B | 2.6% |
+| starcoder.fim | 67.6B | 0.5x | 33.8B | 2.6% |
+| redpajama.stackexchange | 61.1B | 1x | 61.1B | 4.7% |
+| starcoder | 132.6B | 0.5x | 66.3B | 5.1% |
+| pile-of-law | 76.7B | 1x | 76.7B | 5.9% |
+| redpajama.book | 80.6B | 1x | 80.6B | 6.2% |
+| s2orc | 107.9B | 1x | 107.9B | 8.3% |
+| redpajama.wikipedia | 22.1B | 6x | 132.6B | 10.2% |
+| refinedweb | 612.3B | 1x | 612.3B | 47.1% |
 
 ## First 10 Checkpoints
 | Checkpoints | |
```
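In the updated "Datasets and Mix" table, each Total Tokens value is Starting Tokens times Multiplier (up to rounding), and % of Total is that row's total divided by the sum of all totals. The following minimal sketch recomputes both from the listed figures as a consistency check. The names `MIX`, `recompute`, and `check_totals` are hypothetical, and the 0.1B tolerance is an assumption chosen to absorb the table's rounding (for example, 4.33B times 3 is 12.99B, listed as 13B).

```python
# Rows copied from the updated table:
# (dataset, starting tokens in billions, multiplier, total tokens in billions as listed)
MIX = [
    ("dm-math", 4.33, 3, 13.0),
    ("pubmed-abstracts", 4.77, 3, 14.3),
    ("uspto", 4.77, 3, 14.3),
    ("pubmed-central", 26.0, 1, 26.0),
    ("redpajama.arxiv", 27.3, 1, 27.3),
    ("starcoder.spm", 67.6, 0.5, 33.8),
    ("starcoder.fim", 67.6, 0.5, 33.8),
    ("redpajama.stackexchange", 61.1, 1, 61.1),
    ("starcoder", 132.6, 0.5, 66.3),
    ("pile-of-law", 76.7, 1, 76.7),
    ("redpajama.book", 80.6, 1, 80.6),
    ("s2orc", 107.9, 1, 107.9),
    ("redpajama.wikipedia", 22.1, 6, 132.6),
    ("refinedweb", 612.3, 1, 612.3),
]


def recompute(mix):
    """Return (grand total in billions, {dataset: share of the grand total in percent})."""
    grand_total = sum(total for _, _, _, total in mix)
    shares = {name: 100 * total / grand_total for name, _, _, total in mix}
    return grand_total, shares


def check_totals(mix, tol=0.1):
    """Return dataset names whose starting tokens times multiplier differs from the listed total by more than tol."""
    return [name for name, start, mult, total in mix if abs(start * mult - total) > tol]


grand_total, shares = recompute(MIX)
print(f"Grand total: {grand_total:.1f}B tokens")
for name, share in shares.items():
    print(f"{name:25s} {share:5.1f}%")
print("Rows with inconsistent totals:", check_totals(MIX))
```

Run against the rows above, the listed totals sum to 1,300B tokens (about 1.3T), every recomputed share rounds to the value in the % of Total column, and no row fails the multiplier check.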