Instructions to use jojo0217/ChatSKKU5.8B with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use jojo0217/ChatSKKU5.8B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="jojo0217/ChatSKKU5.8B")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("jojo0217/ChatSKKU5.8B")
model = AutoModelForCausalLM.from_pretrained("jojo0217/ChatSKKU5.8B")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use jojo0217/ChatSKKU5.8B with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "jojo0217/ChatSKKU5.8B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "jojo0217/ChatSKKU5.8B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/jojo0217/ChatSKKU5.8B
- SGLang
How to use jojo0217/ChatSKKU5.8B with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "jojo0217/ChatSKKU5.8B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "jojo0217/ChatSKKU5.8B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "jojo0217/ChatSKKU5.8B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "jojo0217/ChatSKKU5.8B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use jojo0217/ChatSKKU5.8B with Docker Model Runner:
docker model run hf.co/jojo0217/ChatSKKU5.8B
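The curl calls to the vLLM and SGLang servers above can also be made from Python. The following is a minimal sketch using only the standard library. It assumes a server is already running locally; the helper names `build_completion_request` and `complete` are hypothetical and not part of vLLM or SGLang.

```python
import json
import urllib.request

MODEL_ID = "jojo0217/ChatSKKU5.8B"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the first completion's text.

    Needs a running server; nothing is sent until this function is called.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["text"]
```

With the vLLM server from the example above, `complete("Once upon a time,")` would target port 8000; for the SGLang server, pass `base_url="http://localhost:30000"`.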
This is a test model built with Sungkyunkwan University industry-academia cooperation data.
It was trained on the existing 107,000 examples plus 2,000 additional everyday-conversation examples.
The model was fine-tuned with EleutherAI/polyglot-ko-5.8b as the base,
and the training parameters are as follows.
batch_size: 128
micro_batch_size: 8
num_epochs: 3
learning_rate: 3e-4
cutoff_len: 1024
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
weight_decay: 0.1
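Two quantities follow from these parameters and may help when reproducing the setup. The sketch below assumes the common convention in which batch_size is the effective batch per optimizer step and micro_batch_size is the batch per forward pass on a single device. It also assumes the standard LoRA scaling of lora_alpha / lora_r. The training script itself is not shown in this card, so both are assumptions.

```python
# Values copied from the parameter list above.
batch_size = 128
micro_batch_size = 8
lora_r = 8
lora_alpha = 16

# Assumption: single device, so the effective batch is reached by
# accumulating gradients over several micro-batches.
gradient_accumulation_steps = batch_size // micro_batch_size

# Assumption: standard LoRA scaling, where the low-rank update is
# multiplied by lora_alpha / lora_r.
lora_scaling = lora_alpha / lora_r

print(gradient_accumulation_steps)  # 16
print(lora_scaling)  # 2.0
```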
The measured kobest 10-shot scores are as follows.
The prompt template is the one used by kullm.
The test code is as follows.
https://colab.research.google.com/drive/1xEHewqHnG4p3O24AuqqueMoXq1E3AlT0?usp=sharing
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

model_name = "jojo0217/ChatSKKU5.8B"

# Load the model in 8-bit (requires the bitsandbytes package).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    load_in_8bit=True,  # set to False to turn off quantization
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
)


def answer(message):
    # kullm-style prompt, kept in Korean because the model was trained on it.
    # English meaning: "Below is an instruction that describes a task.
    # Write a response that appropriately completes the request."
    prompt = f"아래는 작업을 설명하는 명령어입니다. 요청을 적절히 완료하는 응답을 작성하세요.\n\n### 명령어:\n{message}"
    ans = pipe(
        prompt + "\n\n### 응답:",  # "### Response:" marker
        do_sample=True,
        max_new_tokens=512,
        temperature=0.7,
        repetition_penalty=1.0,
        return_full_text=False,
        eos_token_id=2,
    )
    msg = ans[0]["generated_text"]
    return msg


# "Tell me about Sungkyunkwan University"
answer('성균관대학교에 대해 알려줘')