Instructions for using the1ullneversee/Restful-Llama-3-7b with libraries and local apps.

Libraries

How to use the1ullneversee/Restful-Llama-3-7b with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="the1ullneversee/Restful-Llama-3-7b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("the1ullneversee/Restful-Llama-3-7b")
model = AutoModelForCausalLM.from_pretrained("the1ullneversee/Restful-Llama-3-7b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
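
The slice in the final `print` call relies on `generate` returning the prompt tokens followed by the newly generated ones, which is the usual behavior for decoder-only models. A minimal sketch of that indexing with plain Python lists; the token IDs are made up for illustration and no model is loaded:

```python
# Hypothetical token IDs; real values come from the tokenizer and the model.
prompt_ids = [128000, 15546, 527, 499, 30]                 # stands in for inputs["input_ids"][0]
generated_ids = prompt_ids + [40, 1097, 264, 4221, 1646]   # stands in for outputs[0]


def strip_prompt(output_ids, prompt_length):
    """Return only the tokens produced after the prompt."""
    return output_ids[prompt_length:]


new_tokens = strip_prompt(generated_ids, len(prompt_ids))
print(new_tokens)
```

Passing `skip_special_tokens=True` to `tokenizer.decode` would additionally drop special tokens, such as end-of-turn markers, from the printed text.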

Local Apps

vLLM

How to use the1ullneversee/Restful-Llama-3-7b with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "the1ullneversee/Restful-Llama-3-7b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "the1ullneversee/Restful-Llama-3-7b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
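
The same request can be sent from Python instead of curl. The following is a minimal sketch using only the standard library; the helper names (`build_chat_request`, `extract_reply`, `ask`) are hypothetical, and the response layout assumed in `extract_reply` is the usual OpenAI-style chat-completions shape (`choices[0].message.content`):

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of a chat-completions response dict."""
    return response_body["choices"][0]["message"]["content"]


def ask(base_url, model, prompt, timeout=60):
    """Send the request. Needs a running server, so it is not called here."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))


# Building the request does not touch the network.
request = build_chat_request(
    "http://localhost:8000",
    "the1ullneversee/Restful-Llama-3-7b",
    "What is the capital of France?",
)
print(request.full_url)
```

Once the server is running, `ask("http://localhost:8000", "the1ullneversee/Restful-Llama-3-7b", "What is the capital of France?")` would return the reply text. The same helpers should work against the SGLang server by swapping in port 30000.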

Use Docker

```
docker model run hf.co/the1ullneversee/Restful-Llama-3-7b
```

SGLang

How to use the1ullneversee/Restful-Llama-3-7b with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "the1ullneversee/Restful-Llama-3-7b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "the1ullneversee/Restful-Llama-3-7b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "the1ullneversee/Restful-Llama-3-7b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "the1ullneversee/Restful-Llama-3-7b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use the1ullneversee/Restful-Llama-3-7b with Docker Model Runner:
```
docker model run hf.co/the1ullneversee/Restful-Llama-3-7b
```

LLaMA-7B-Instruct-API-Coder

Model Description

This model is a fine-tuned version of the LLaMA-7B-Instruct model, specifically trained on conversational data related to RESTful API usage and code generation. The training data was generated by LLaMA-70B-Instruct, focusing on API interactions and code creation based on user queries and JSON REST schemas.

Intended Use

This model is designed to assist developers and API users in:

- Understanding and interacting with RESTful APIs
- Generating code snippets to call APIs based on user questions
- Interpreting JSON REST schemas
- Providing conversational guidance on API usage
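
As an illustration of the schema-plus-question use case, the sketch below packs a JSON REST schema and a user question into a single chat message. The card does not document the prompt layout used during training, so the wording of the prompt, the `build_messages` helper, and the schema fragment are all assumptions made for this example:

```python
import json

# Hypothetical schema fragment for illustration; not taken from the training data.
schema = {
    "paths": {
        "/users/{id}": {
            "get": {
                "summary": "Fetch a single user",
                "parameters": [{"name": "id", "in": "path", "required": True}],
            }
        }
    }
}


def build_messages(api_schema, question):
    """Combine a JSON REST schema and a user question into chat messages."""
    schema_text = json.dumps(api_schema, indent=2)
    content = (
        "Here is the API schema:\n"
        f"{schema_text}\n\n"
        f"Question: {question}"
    )
    return [{"role": "user", "content": content}]


messages = build_messages(schema, "Write Python code that fetches user 42.")
print(messages[0]["content"])
```

The resulting `messages` list has the same shape as the one used in the Transformers examples, so it could be passed to `pipe(messages)` or sent as the `messages` field of a chat-completions request.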

Training Data

The model was fine-tuned on a dataset of conversational interactions generated by LLaMA-70B-Instruct. This dataset includes:

- Discussions about RESTful API concepts
- Examples of API usage
- Code generation based on API schemas
- Q&A sessions about API integration

Training Procedure

Base Model: LLaMA-7B-Instruct
Quantization: The base model was loaded in 4-bit precision using Unsloth for efficient training
Fine-tuning Method: SFTTrainer (Supervised Fine-Tuning Trainer) was used for the fine-tuning process
LoRA (Low-Rank Adaptation): The model was fine-tuned using LoRA to generate an adapter
Merging: The LoRA adapter was merged back with the original model to create the final fine-tuned version

This approach keeps fine-tuning efficient: only the low-rank adapter weights are trained, which reduces compute and memory requirements while aiming to preserve the quality of the base model.
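
The merging step can be illustrated with the standard LoRA formulation, in which the merged weight is `W + (alpha / r) * B @ A`. This is a toy sketch in pure Python, not the code used to build this model (in practice the training library performs the merge); the rank, `alpha`, and matrix values are illustrative assumptions, since the card does not state the actual hyperparameters:

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(weight, lora_b, lora_a, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy values: a 2x3 frozen weight with a rank-1 adapter (all numbers illustrative).
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
lora_b = [[1.0], [2.0]]       # shape (2, rank)
lora_a = [[0.5, 0.0, 1.0]]    # shape (rank, 3)

merged = merge_lora(weight, lora_b, lora_a, alpha=2, rank=1)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, because their contribution is folded into the weight itself.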

Limitations

- The model's knowledge is limited to the APIs and schemas present in the training data
- It may not be up-to-date with the latest API standards or practices
- The generated code should be reviewed and tested before use in production environments
- Performance may vary compared to the full-precision model due to 4-bit quantization
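
For context on why 4-bit precision is attractive despite this trade-off, the arithmetic below gives a rough estimate of weight storage at different precisions. It assumes the rounded 8 billion parameter figure listed for this model and ignores activations, the KV cache, and the per-block metadata that quantization formats add, so the numbers are lower bounds rather than measurements:

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 8e9  # rounded figure; the exact parameter count is an assumption

for label, bits in [("F32", 32), ("F16", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.0f} GB")
```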

Ethical Considerations

- The model should not be used to access or manipulate APIs without proper authorization
- Users should be aware of potential biases in the generated code or API usage suggestions

Additional Information

Model Type: Causal Language Model
Language: English
License: Apache 2.0
Fine-tuning Technique: LoRA (Low-Rank Adaptation)
Quantization: 4-bit precision

For any questions or issues, please open an issue in the GitHub repository.

Safetensors

Model size: 8B params
Tensor type: F16, F32