Instructions to use the1ullneversee/Restful-Llama-3-7b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use the1ullneversee/Restful-Llama-3-7b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="the1ullneversee/Restful-Llama-3-7b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("the1ullneversee/Restful-Llama-3-7b")
model = AutoModelForCausalLM.from_pretrained("the1ullneversee/Restful-Llama-3-7b")
messages = [
    {"role": "user", "content": "Who are you?"},
]

# Render the chat template and move the tensors to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use the1ullneversee/Restful-Llama-3-7b with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "the1ullneversee/Restful-Llama-3-7b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "the1ullneversee/Restful-Llama-3-7b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
docker model run hf.co/the1ullneversee/Restful-Llama-3-7b
- SGLang
How to use the1ullneversee/Restful-Llama-3-7b with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "the1ullneversee/Restful-Llama-3-7b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "the1ullneversee/Restful-Llama-3-7b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "the1ullneversee/Restful-Llama-3-7b" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "the1ullneversee/Restful-Llama-3-7b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use the1ullneversee/Restful-Llama-3-7b with Docker Model Runner:
docker model run hf.co/the1ullneversee/Restful-Llama-3-7b
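The curl calls shown for the vLLM and SGLang servers can also be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are hypothetical, and the response parsing assumes the server returns the usual OpenAI-style chat completion body (`choices[0].message.content`).

```python
import json
import urllib.request

MODEL_ID = "the1ullneversee/Restful-Llama-3-7b"


def build_chat_request(base_url: str, question: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, question: str, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response body; requires a running server.
    """
    request = build_chat_request(base_url, question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]


# Building the request does not contact the server.
request = build_chat_request("http://localhost:8000", "What is the capital of France?")
print(request.full_url)
```

With a server running, `ask("http://localhost:8000", "...")` targets the vLLM port from the example above, and `ask("http://localhost:30000", "...")` targets the SGLang port.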
LLaMA-7B-Instruct-API-Coder
Model Description
This model is a fine-tuned version of the LLaMA-7B-Instruct model, specifically trained on conversational data related to RESTful API usage and code generation. The training data was generated by LLaMA-70B-Instruct, focusing on API interactions and code creation based on user queries and JSON REST schemas.
Intended Use
This model is designed to assist developers and API users in:
- Understanding and interacting with RESTful APIs
- Generating code snippets to call APIs based on user questions
- Interpreting JSON REST schemas
- Providing conversational guidance on API usage
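As an illustration of the second and third use cases, the sketch below packs a JSON REST schema and a user question into a chat-format message list. The prompt wording, the `build_api_messages` helper, and the example schema are all assumptions made for illustration; this card does not document the prompt format used during training.

```python
import json


def build_api_messages(schema: dict, question: str) -> list:
    """Pair a REST schema with a user question in chat-message format."""
    schema_text = json.dumps(schema, indent=2, sort_keys=True)
    prompt = (
        "Here is a JSON description of a REST API:\n"
        f"{schema_text}\n\n"
        f"Question: {question}\n"
        "Reply with a short Python snippet that calls the right endpoint."
    )
    return [{"role": "user", "content": prompt}]


# Hypothetical schema, for illustration only.
example_schema = {
    "base_url": "https://api.example.com",
    "endpoints": [
        {"method": "GET", "path": "/users/{id}", "description": "Fetch one user"},
    ],
}

messages = build_api_messages(example_schema, "How do I fetch user 42?")
print(messages[0]["content"])
```

The resulting `messages` list has the same shape as the one passed to `pipe(messages)` in the Transformers example, so it can be used in its place.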
Training Data
The model was fine-tuned on a dataset of conversational interactions generated by LLaMA-70B-Instruct. This dataset includes:
- Discussions about RESTful API concepts
- Examples of API usage
- Code generation based on API schemas
- Q&A sessions about API integration
Training Procedure
- Base Model: LLaMA-7B-Instruct
- Quantization: The base model was loaded in 4-bit precision using Unsloth for efficient training
- Fine-tuning Method: SFTTrainer (Supervised Fine-Tuning Trainer) was used for the fine-tuning process
- LoRA (Low-Rank Adaptation): The model was fine-tuned using LoRA to generate an adapter
- Merging: The LoRA adapter was merged back with the original model to create the final fine-tuned version
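Loading the base model in 4-bit precision and training only the low-rank adapter weights reduces the memory and compute needed for fine-tuning while aiming to preserve model quality. Merging the adapter afterwards produces a single standalone model.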
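To make the merging step concrete, here is a toy numeric sketch of the standard LoRA formulation, in which a weight matrix W is updated as W + (alpha / r) * B * A. It uses tiny hand-written matrices in plain Python and is not the training code for this model; the matrix values, `alpha`, and `rank` are arbitrary illustrative choices.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * B @ A, the merged weight matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy sizes: W is 3x3, rank is 1, so A is 1x3 and B is 3x1.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
lora_a = [[0.5, -1.0, 2.0]]
lora_b = [[1.0], [0.0], [-2.0]]
alpha, rank = 2, 1

merged = merge_lora(w, lora_a, lora_b, alpha, rank)

# Compare the two paths on one input column vector.
x = [[1.0], [2.0], [3.0]]
base_out = matmul(w, x)
adapter_out = matmul(lora_b, matmul(lora_a, x))
unmerged_out = [[b[0] + (alpha / rank) * a[0]] for b, a in zip(base_out, adapter_out)]
merged_out = matmul(merged, x)

print(unmerged_out == merged_out)
```

Both paths give the same output, which is why the merged model can be served without the adapter or any LoRA-specific code.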
Limitations
- The model's knowledge is limited to the APIs and schemas present in the training data
- It may not be up-to-date with the latest API standards or practices
- The generated code should be reviewed and tested before use in production environments
- Performance may vary compared to the full-precision model due to 4-bit quantization
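The last point can be illustrated with a simplified round trip through 4-bit integers. This sketch uses uniform absmax scaling over a handful of made-up weights; real 4-bit loaders commonly use block-wise, non-uniform schemes, so this is not necessarily the scheme applied to this model. It only shows where the precision loss comes from.

```python
def quantize_4bit(values):
    """Map floats to signed integer codes in [-7, 7] using absmax scaling."""
    scale = max(abs(v) for v in values) / 7 or 1.0  # fall back to 1.0 if all zeros
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


# Made-up weights, for illustration only.
weights = [0.7, -0.33, 0.1, 0.02, -0.7]
codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]

print(codes)
print(max(errors))
```

Each restored value lands within about half a quantization step of the original, and a small weight such as 0.02 collapses to zero. Errors of this kind are one reason outputs can differ from a full-precision model.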
Ethical Considerations
- The model should not be used to access or manipulate APIs without proper authorization
- Users should be aware of potential biases in the generated code or API usage suggestions
Additional Information
- Model Type: Causal Language Model
- Language: English
- License: Apache 2.0
- Fine-tuning Technique: LoRA (Low-Rank Adaptation)
- Quantization: 4-bit precision
For any questions or issues, please open an issue in the GitHub repository.