Instructions to use aloobun/Pruned-SmolLM2-1.4B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use aloobun/Pruned-SmolLM2-1.4B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="aloobun/Pruned-SmolLM2-1.4B")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("aloobun/Pruned-SmolLM2-1.4B")
model = AutoModelForCausalLM.from_pretrained("aloobun/Pruned-SmolLM2-1.4B")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use aloobun/Pruned-SmolLM2-1.4B with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "aloobun/Pruned-SmolLM2-1.4B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "aloobun/Pruned-SmolLM2-1.4B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker

```shell
docker model run hf.co/aloobun/Pruned-SmolLM2-1.4B
```
- SGLang
How to use aloobun/Pruned-SmolLM2-1.4B with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "aloobun/Pruned-SmolLM2-1.4B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "aloobun/Pruned-SmolLM2-1.4B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "aloobun/Pruned-SmolLM2-1.4B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "aloobun/Pruned-SmolLM2-1.4B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use aloobun/Pruned-SmolLM2-1.4B with Docker Model Runner:
```shell
docker model run hf.co/aloobun/Pruned-SmolLM2-1.4B
```
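
The curl calls shown for vLLM and SGLang can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_completion_request` and `complete` are hypothetical, and the response is assumed to follow the usual OpenAI-style completions shape (`choices[0].text`). The default `base_url` matches the vLLM port used above; pass `http://localhost:30000` for the SGLang server.

```python
import json
import urllib.request

MODEL_ID = "aloobun/Pruned-SmolLM2-1.4B"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request to a running server and return the generated text.

    Assumes an OpenAI-style response body with a "choices" list.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With one of the servers running, `complete("Once upon a time,", max_tokens=64)` should return the continuation as a string. Keeping request construction separate from the network call makes the payload easy to inspect without a live server.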
A pruned version of SmolLM2-1.7B with 1.47B total parameters.
This is an intermediate checkpoint that requires further training. Try it yourself.
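
To put the pruning in proportion, the arithmetic can be sketched as below. The original size is taken as the nominal 1.7B from the base model's name, which is an assumption; the exact parameter count of the base model may differ slightly.

```python
# Nominal size from the name "SmolLM2-1.7B" (assumption, not an exact count).
ORIGINAL_PARAMS = 1.7e9
# Total parameters reported for the pruned model.
PRUNED_PARAMS = 1.47e9

removed = ORIGINAL_PARAMS - PRUNED_PARAMS
reduction = removed / ORIGINAL_PARAMS

print(f"Removed about {removed / 1e9:.2f}B parameters ({reduction:.1%} of the original)")
```

Under these nominal figures, pruning removes roughly 0.23B parameters, about 13.5% of the base model.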
Eval results using SmolLM evaluation scripts (LightEval):
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| all | | acc_norm | 0.4555 | ± | 0.0114 |
| all | | qem | 0.0431 | ± | 0.0022 |
| custom:arc:_average:0 | | acc_norm | 0.5021 | ± | 0.0120 |
| custom:arc:challenge:0 | 0 | acc_norm | 0.3686 | ± | 0.0141 |
| custom:arc:easy:0 | 0 | acc_norm | 0.6355 | ± | 0.0099 |
| custom:commonsense_qa:0 | 0 | acc_norm | 0.3333 | ± | 0.0135 |
| custom:gsm8k:5 | 0 | qem | 0.0076 | ± | 0.0024 |
| custom:hellaswag:0 | 0 | acc_norm | 0.5568 | ± | 0.0050 |
| custom:mmlu_pro:0 | 0 | acc_norm | 0.1287 | ± | 0.0031 |
| custom:openbook_qa:0 | 0 | acc_norm | 0.3660 | ± | 0.0216 |
| custom:piqa:0 | 0 | acc_norm | 0.7187 | ± | 0.0105 |
| custom:trivia_qa:0 | 0 | qem | 0.0787 | ± | 0.0020 |
| custom:winogrande:0 | 0 | acc_norm | 0.5367 | ± | 0.0140 |
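
The aggregate rows appear to be unweighted means of the per-task rows that share a metric. This is an observation from the numbers in the table, not something the evaluation scripts are confirmed to do. A short sketch that recomputes them, with `average_by_metric` as a hypothetical helper:

```python
from statistics import mean

# (metric, value) per task, copied from the table above.
TASK_SCORES = {
    "custom:arc:challenge:0": ("acc_norm", 0.3686),
    "custom:arc:easy:0": ("acc_norm", 0.6355),
    "custom:commonsense_qa:0": ("acc_norm", 0.3333),
    "custom:gsm8k:5": ("qem", 0.0076),
    "custom:hellaswag:0": ("acc_norm", 0.5568),
    "custom:mmlu_pro:0": ("acc_norm", 0.1287),
    "custom:openbook_qa:0": ("acc_norm", 0.3660),
    "custom:piqa:0": ("acc_norm", 0.7187),
    "custom:trivia_qa:0": ("qem", 0.0787),
    "custom:winogrande:0": ("acc_norm", 0.5367),
}


def average_by_metric(scores):
    """Group task scores by metric and take the unweighted mean of each group."""
    grouped = {}
    for metric, value in scores.values():
        grouped.setdefault(metric, []).append(value)
    return {metric: mean(values) for metric, values in grouped.items()}


averages = average_by_metric(TASK_SCORES)
arc_average = mean(
    value for task, (_, value) in TASK_SCORES.items() if task.startswith("custom:arc:")
)

for metric, value in averages.items():
    print(f"all / {metric}: {value:.4f}")
print(f"custom:arc:_average:0 / acc_norm: {arc_average:.4f}")
```

The recomputed means agree with the reported `all` and `custom:arc:_average:0` rows to within rounding of the four-decimal inputs.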