Instructions to use LLM360/K2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use LLM360/K2 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="LLM360/K2")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("LLM360/K2")
model = AutoModelForCausalLM.from_pretrained("LLM360/K2")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use LLM360/K2 with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "LLM360/K2"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "LLM360/K2",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker

```bash
docker model run hf.co/LLM360/K2
```
- SGLang
How to use LLM360/K2 with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "LLM360/K2" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "LLM360/K2",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "LLM360/K2" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "LLM360/K2",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner

How to use LLM360/K2 with Docker Model Runner:

```bash
docker model run hf.co/LLM360/K2
```
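
The curl calls shown above for vLLM and SGLang can also be issued from Python. The following is a minimal sketch using only the standard library, not an official client: the helper names `build_completion_request` and `complete` are hypothetical, and the response shape (`choices[0].text`) is assumed to follow the OpenAI completions format that both servers advertise compatibility with.

```python
import json
import urllib.request


def build_completion_request(base_url, model="LLM360/K2",
                             prompt="Once upon a time,",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for /v1/completions.

    The payload mirrors the JSON body used in the curl examples.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, **kwargs):
    """Send the request and return the first completion's text.

    Assumes a server is already running at base_url and that it
    returns an OpenAI-style body with a "choices" list.
    """
    request = build_completion_request(base_url, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a vLLM server started as above, `complete("http://localhost:8000")` would send the same request as the curl example; for SGLang, the base URL would be `http://localhost:30000`.
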
---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- nlp
- llm
---
# K2 - Deciphering Llama 2 70B

K2 is a fully transparent large language model on par with Llama 2 70B.

## Evaluations

<center><img src="eval_table_temp.png" alt="eval table"/></center>

## Datasets and Mix

The following data mix was used to train K2 and achieve results in line with Llama 2 70B. The full data sequence will be available soon.

| Dataset | Starting Tokens | Multiplier | Total Tokens | % of Total |
| ----------- | ----------- | ----------- | ----------- | ----------- |
| dm-math | 4.33B | 3x | 13B | 1% |
| pubmed-abstracts | 4.77B | 3x | 14.3B | 1.1% |
| uspto | 4.77B | 3x | 14.3B | 1.1% |
| pubmed-central | 26B | 1x | 26B | 2% |
| redpajama.arxiv | 27.3B | 1x | 27.3B | 2.1% |
| starcoder.spm | 67.6B | 0.5x | 33.8B | 2.6% |
| starcoder.fim | 67.6B | 0.5x | 33.8B | 2.6% |
| redpajama.stackexchange | 61.1B | 1x | 61.1B | 4.7% |
| starcoder | 132.6B | 0.5x | 66.3B | 5.1% |
| pile-of-law | 76.7B | 1x | 76.7B | 5.9% |
| redpajama.book | 80.6B | 1x | 80.6B | 6.2% |
| s2orc | 107.9B | 1x | 107.9B | 8.3% |
| redpajama.wikipedia | 22.1B | 6x | 132.6B | 10.2% |
| refinedweb | 612.3B | 1x | 612.3B | 47.1% |
| Totals | - | - | 1.3T | 100% |
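
The Total Tokens and % of Total columns in the data mix table follow from the starting token counts and multipliers. As a rough check, the sketch below recomputes them; the values are copied from the table (in billions of tokens), while the names `DATA_MIX` and `summarize_mix` are illustrative only and not part of any released K2 tooling.

```python
# (dataset, starting tokens in billions, multiplier), copied from the table
DATA_MIX = [
    ("dm-math", 4.33, 3),
    ("pubmed-abstracts", 4.77, 3),
    ("uspto", 4.77, 3),
    ("pubmed-central", 26.0, 1),
    ("redpajama.arxiv", 27.3, 1),
    ("starcoder.spm", 67.6, 0.5),
    ("starcoder.fim", 67.6, 0.5),
    ("redpajama.stackexchange", 61.1, 1),
    ("starcoder", 132.6, 0.5),
    ("pile-of-law", 76.7, 1),
    ("redpajama.book", 80.6, 1),
    ("s2orc", 107.9, 1),
    ("redpajama.wikipedia", 22.1, 6),
    ("refinedweb", 612.3, 1),
]


def summarize_mix(mix):
    """Return ({dataset: (total_tokens_B, percent_of_total)}, grand_total_B)."""
    # Total tokens per dataset = starting tokens * multiplier
    totals = {name: start * mult for name, start, mult in mix}
    grand_total = sum(totals.values())
    rows = {
        name: (round(tokens, 1), round(100 * tokens / grand_total, 1))
        for name, tokens in totals.items()
    }
    return rows, grand_total


if __name__ == "__main__":
    rows, grand_total = summarize_mix(DATA_MIX)
    for name, (tokens, share) in rows.items():
        print(f"{name:<26}{tokens:>8.1f}B{share:>7.1f}%")
    print(f"{'Totals':<26}{grand_total / 1000:>8.2f}T")
```

Small differences from the table are expected, since its entries are rounded (for example, 4.33B × 3 is 12.99B, listed as 13B).
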
## First 10 Checkpoints

| Checkpoints | |
| ----------- | ----------- |
| Checkpoint 360[link] | Checkpoint 355[link] |
| Checkpoint 359[link] | Checkpoint 354[link] |
| Checkpoint 358[link] | Checkpoint 353[link] |
| Checkpoint 357[link] | Checkpoint 352[link] |
| Checkpoint 356[link] | Checkpoint 351[link] |

## Additional Artifacts

We are working on release-caliber artifacts for the dataset, code, and analysis, which will be released over the next few weeks.

## Model Description

- **Model type:** Language model with the same architecture as LLaMA.
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Resources for more information:**
  - [Training Code]
  - [Data Preparation]
  - [Metrics]
  - [Fully processed Amber pretraining data]

## About LLM360

LLM360 is an initiative for comprehensive and fully open-sourced LLMs, where all training details, model checkpoints, intermediate results, and additional analyses are made available to the community. Our goal is to advance the field by inviting the community to deepen the understanding of LLMs together. As the first step of the LLM360 project, we release all intermediate model checkpoints, our fully prepared pre-training dataset, all source code and configurations, and training details. We are committed to continually pushing the boundaries of LLMs through this open-source effort.

[Visit us](https://www.llm360.ai/)