Instructions to use TwinDoc/RedWhale-tv-10.8B-sft-k with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use TwinDoc/RedWhale-tv-10.8B-sft-k with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="TwinDoc/RedWhale-tv-10.8B-sft-k")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("TwinDoc/RedWhale-tv-10.8B-sft-k")
    model = AutoModelForCausalLM.from_pretrained("TwinDoc/RedWhale-tv-10.8B-sft-k")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use TwinDoc/RedWhale-tv-10.8B-sft-k with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "TwinDoc/RedWhale-tv-10.8B-sft-k"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "TwinDoc/RedWhale-tv-10.8B-sft-k",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
Use Docker
docker model run hf.co/TwinDoc/RedWhale-tv-10.8B-sft-k
- SGLang
How to use TwinDoc/RedWhale-tv-10.8B-sft-k with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "TwinDoc/RedWhale-tv-10.8B-sft-k" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "TwinDoc/RedWhale-tv-10.8B-sft-k",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "TwinDoc/RedWhale-tv-10.8B-sft-k" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "TwinDoc/RedWhale-tv-10.8B-sft-k",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
- Docker Model Runner
How to use TwinDoc/RedWhale-tv-10.8B-sft-k with Docker Model Runner:
docker model run hf.co/TwinDoc/RedWhale-tv-10.8B-sft-k
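The curl requests shown for vLLM and SGLang can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_completion_request` and `complete` are hypothetical, and the response parsing assumes the server returns the usual OpenAI-style `choices[0].text` field.

```python
import json
import urllib.request

MODEL_ID = "TwinDoc/RedWhale-tv-10.8B-sft-k"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (without sending) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: OpenAI-style completions response layout.
    return body["choices"][0]["text"]


# Usage, once a server is running:
#   complete("Once upon a time,")                                    # vLLM, port 8000
#   complete("Once upon a time,", base_url="http://localhost:30000") # SGLang, port 30000
```

Keeping request construction separate from sending makes it easy to inspect the payload before any server is started.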
Model Description
This model was trained with Supervised Fine-Tuning (SFT) on a RAG dataset created during a project for the client K-S. The training dataset is not released for security reasons.
About the Model
Name: TwinDoc/RedWhale-tv-10.8B-sft-k
Finetuned from model: TwinDoc/RedWhale-tv-10.8B-v1.0
Train Datasets: private
Developed by: AGILESODA (애자일소다)
Model type: llama
Language(s) (NLP): Korean
License: cc-by-nc-sa-4.0
train setting
- LoRA r, alpha : 32, 32
- Dtype : bf16
- Epoch : 5
- Learning rate : 1e-5
- Global batch : 1
- Context length : 4096
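To make the LoRA values in the list above concrete: the rank `r` and `alpha` together determine how strongly the low-rank update is scaled. The sketch below uses the standard LoRA formulation (scale = alpha / r, and r * (d_in + d_out) trainable parameters per adapted weight matrix). The 4096 x 4096 matrix shape is a hypothetical illustration, not a figure from this model card, and the card does not state which modules were adapted.

```python
LORA_R = 32       # from the train setting list
LORA_ALPHA = 32   # from the train setting list


def lora_scale(r, alpha):
    """Scaling applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_trainable_params(d_in, d_out, r):
    """Parameters added for one adapted matrix: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


# Hypothetical square projection used only for illustration.
HYPOTHETICAL_DIM = 4096

scale = lora_scale(LORA_R, LORA_ALPHA)
added = lora_trainable_params(HYPOTHETICAL_DIM, HYPOTHETICAL_DIM, LORA_R)
full = HYPOTHETICAL_DIM * HYPOTHETICAL_DIM

print(f"scale = {scale}")                       # scale = 1.0
print(f"adapter params per matrix = {added}")   # adapter params per matrix = 262144
print(f"fraction of full matrix = {added / full}")  # fraction of full matrix = 0.015625
```

With r equal to alpha, the update is applied unscaled, so the learning rate alone controls the effective step size of the adapter.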
inference setting
- BOS id : 1
- EOS id : 2
- Top-p : 0.95
- Temperature : 0.01
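The inference settings above could be passed to Transformers roughly as follows. This is a sketch, not the authors' inference script: `do_sample=True` and `max_new_tokens=512` are assumptions (the card lists only the token ids, top-p, and temperature), the helper name `generate` is hypothetical, and loading in bf16 simply mirrors the training dtype.

```python
MODEL_ID = "TwinDoc/RedWhale-tv-10.8B-sft-k"

GENERATION_KWARGS = {
    "bos_token_id": 1,      # from the inference setting list
    "eos_token_id": 2,      # from the inference setting list
    "top_p": 0.95,          # from the inference setting list
    "temperature": 0.01,    # from the inference setting list
    "do_sample": True,      # assumption: sampling must be on for top-p/temperature to apply
    "max_new_tokens": 512,  # assumption: not specified in the model card
}


def generate(prompt, model_id=MODEL_ID):
    """Generate a completion for `prompt` (downloads the model on first use)."""
    # Imported lazily so the settings can be inspected without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, **GENERATION_KWARGS)

    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A temperature of 0.01 makes sampling nearly deterministic, which suits retrieval-grounded question answering where reproducible answers matter more than variety.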
prompt template
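### User: 당신은 인공지능 비서입니다. 사용자가 여러분에게 과제를 줍니다. 당신의 목표는 가능한 한 충실하게 작업을 완료하는 것입니다. 작업을 수행하는 동안 단계별로 생각하고 단계를 정당화하세요. User의 질문이 주어지면 고품질의 답변을 만들어주세요.
원문: {CONTEXT}
질문: 원문을 참고하여 답변하세요. {QUESTION}
### Assistant: {ANSWER}
English translation, for reference:
### User: You are an AI assistant. The user gives you a task. Your goal is to complete the task as faithfully as possible. While performing the task, think step by step and justify your steps. Given the User's question, produce a high-quality answer.
Source text: {CONTEXT}
Question: Answer with reference to the source text. {QUESTION}
### Assistant: {ANSWER}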
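The template above can be filled programmatically. The following is a minimal sketch: the function name `build_prompt` is hypothetical, the Korean instruction text is the template's own wording, and the sketch assumes that at inference time the prompt ends right after `### Assistant:` so that the model produces the `{ANSWER}` part itself.

```python
# Korean instruction text from the prompt template.
SYSTEM_TEXT = (
    "당신은 인공지능 비서입니다. 사용자가 여러분에게 과제를 줍니다. "
    "당신의 목표는 가능한 한 충실하게 작업을 완료하는 것입니다. "
    "작업을 수행하는 동안 단계별로 생각하고 단계를 정당화하세요. "
    "User의 질문이 주어지면 고품질의 답변을 만들어주세요."
)

# Assumption: the answer slot is left empty for the model to fill.
PROMPT_TEMPLATE = (
    "### User: " + SYSTEM_TEXT + "\n"
    "원문: {CONTEXT}\n"
    "질문: 원문을 참고하여 답변하세요. {QUESTION}\n"
    "### Assistant:"
)


def build_prompt(context, question):
    """Insert a retrieved passage and a user question into the template."""
    return PROMPT_TEMPLATE.format(
        CONTEXT=context.strip(),
        QUESTION=question.strip(),
    )


if __name__ == "__main__":
    # Placeholder inputs; in a RAG setup `context` would be the retrieved passage.
    print(build_prompt("<retrieved passage>", "<user question>"))
```

The resulting string can be sent as the `prompt` field of a completions request or tokenized directly for local generation.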
License
The content of this project, created by AGILESODA, is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0).
Citation
@misc{vo2024redwhaleadaptedkoreanllm,
title={RedWhale: An Adapted Korean LLM Through Efficient Continual Pretraining},
author={Anh-Dung Vo and Minseong Jung and Wonbeen Lee and Daewoo Choi},
year={2024},
eprint={2408.11294},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2408.11294},
}