---
pipeline_tag: text-generation
license: apache-2.0
tags:
- text generation
programming_language:
- Java
- JavaScript
- Python
metrics:
- code_eval
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
model-index:
- name: DeciCoder-1b
  results:
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Python)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.191
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (JavaScript)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.184
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Java)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.166
      verified: false
datasets:
- bigcode/starcoderdata
---

# Model Card for DeciCoder 1B

DeciCoder 1B is a 1 billion parameter decoder-only code completion model
trained on the Python, Java, and JavaScript subsets of the [StarCoder Training Dataset](https://huggingface.co/datasets/bigcode/starcoderdata).
The model uses Grouped Query Attention and has a context window of 2048
tokens. It was trained with a Fill-in-the-Middle training objective. The model's
architecture was generated by Deci's proprietary Neural Architecture
Search-based technology, AutoNAC.

## Model Details

- **Developed by:** Deci
- **Model type:** DeciCoder is an auto-regressive language model based on the transformer decoder architecture, using Grouped Query Attention.
- **Language(s):** Python, Java, JavaScript
- **License:** Model checkpoints are licensed under the [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) license.

## Model Architecture

| Parameters | Layers | Heads | Sequence Length | GQA num_key_value_heads | Hidden Size |
|:----------|:----------|:----------|:----------|:----------|:----------|
| 1.1B | 20 | 32 | 2048 | 4 | 2048 |

- **Decoder layer:** Grouped Query Attention ([Ainslie et al., 2023](https://arxiv.org/abs/2305.13245))
- **Position Embeddings:** Rotary Position Embeddings ([Su et al., 2021](https://arxiv.org/abs/2104.09864))

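To illustrate what the `GQA num_key_value_heads` column above means, here is a minimal NumPy sketch (not the model's actual implementation) of how 32 query heads share 4 key/value heads, each KV head serving a group of 8 query heads. The sequence length and head dimension below are arbitrary illustration values.

```python
import numpy as np

# Grouped Query Attention sketch: 32 query heads attend using only
# 4 key/value heads; each KV head is shared by a group of 8 query heads.
num_q_heads, num_kv_heads, head_dim, seq_len = 32, 4, 64, 8
group_size = num_q_heads // num_kv_heads  # 8 query heads per KV head

rng = np.random.default_rng(0)
q = rng.standard_normal((num_q_heads, seq_len, head_dim))
k = rng.standard_normal((num_kv_heads, seq_len, head_dim))
v = rng.standard_normal((num_kv_heads, seq_len, head_dim))

# Expand each KV head across its group of query heads, then attend as usual.
k_exp = np.repeat(k, group_size, axis=0)  # (32, seq_len, head_dim)
v_exp = np.repeat(v, group_size, axis=0)

scores = q @ k_exp.transpose(0, 2, 1) / np.sqrt(head_dim)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
out = weights @ v_exp

print(out.shape)  # (32, 8, 64)
```

The memory win is in the KV cache: only 4 key/value heads are stored per layer instead of 32, while the output shape matches standard multi-head attention.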
## Uses

The model is intended for single- and multi-line code completion from a
context window of up to 2048 tokens. It is *not* an instruction model,
and commands like "Write a function that computes the absolute value of
an integer" won't yield the desired results. A more effective approach
is to frame instructions in the style of source code comments (e.g. "# 
this function calculates the absolute value of an integer") or to present
a function signature and docstring, enabling the model to complete the
function's body.

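For example, the guidance above can be put into practice with prompts like the following (the function name and exact wording are illustrative, not a prescribed format):

```python
# Instead of a natural-language command, give the model code to continue.

# Style 1: a source-code comment describing the desired behavior,
# followed by the start of a definition for the model to complete.
comment_prompt = (
    "# this function calculates the absolute value of an integer\n"
    "def "
)

# Style 2: a function signature and docstring; the model fills in the body.
docstring_prompt = (
    "def absolute_value(n: int) -> int:\n"
    '    """Return the absolute value of an integer."""\n'
)

for prompt in (comment_prompt, docstring_prompt):
    print(prompt)
```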
### How to Use

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Deci/DeciCoder-1b"
device = "cuda"  # for GPU usage, or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```

### Attribution

DeciCoder was trained on the StarCoder Training Dataset, filtered for
Python, Java, and JavaScript code. For additional information, please
refer to [https://huggingface.co/datasets/bigcode/starcoderdata](https://huggingface.co/datasets/bigcode/starcoderdata).

### Limitations

The model was trained on source code in Python, Java, and
JavaScript. While the primary natural language in the source is English, it does
contain other languages. The model can therefore produce code snippets
given some context, but there is no assurance that the resulting
code will function as expected. It might be suboptimal, contain bugs, or
even contain security exploits.

## Training Details

### Training Data

DeciCoder was trained on the Python, Java, and JavaScript subsets of the [StarCoder Training Dataset](https://huggingface.co/datasets/bigcode/starcoderdata).

### Training Procedure

- **Warm-Up Steps**: 9,000
- **Total Training Steps**: 284k
- **Total Tokens**: 446B
- **Global Batch Size**: 768
- **Optimizer**: AdamW
- **Optimizer Parameters**: beta1=0.9, beta2=0.95
- **Weight Decay**: 0.1
- **Learning Rate**: 4e-4
- **Learning Rate Schedule**: cosine

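The warm-up and cosine schedule above can be sketched as follows. The linear warm-up shape and the final learning rate of zero are assumptions, since the card only states the peak rate, warm-up steps, and schedule type:

```python
import math

WARMUP_STEPS = 9_000
TOTAL_STEPS = 284_000
PEAK_LR = 4e-4

def learning_rate(step: int) -> float:
    """Assumed schedule: linear warm-up to PEAK_LR, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * PEAK_LR * (1 + math.cos(math.pi * progress))

print(learning_rate(9_000))    # peak: 4e-4
print(learning_rate(284_000))  # end of training: ~0
```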
## Evaluation

Below are DeciCoder's pass@1 scores on the MultiPL-E HumanEval benchmark:

| Python | JavaScript | Java |
|:----------|:----------|:----------|
| 19.1% | 18.4% | 16.6% |

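For reference, pass@k on HumanEval-style benchmarks is conventionally computed with the unbiased estimator from Chen et al., 2021: generate `n` samples per problem, count the `c` that pass the unit tests, and estimate `pass@k = 1 - C(n-c, k) / C(n, k)`. This is the benchmark's usual definition, not code from this repository:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem (n samples, c correct)."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k = 1 this reduces to c / n: e.g. 4 passing samples out of 20.
print(pass_at_k(n=20, c=4, k=1))  # ≈ 0.2
```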
### Runtime Benchmarks

| Inference Tool/Hardware | A10 (tokens/sec) | A100 (tokens/sec) |
|:----------|:----------|:----------|
| HF Inference Endpoints | 1,364.2 | 3,244.4 |
| Infery LLM | 3,889.3 | 11,676.8 |

- Throughput (tokens/sec) measured with the optimal batch size for each GPU: batch size 128 on the A10 and batch size 512 on the A100.

## Documentation

- [Notebook](https://colab.research.google.com/drive/1JCxvBsWCZKHfIcHSMVf7GZCs3ClMQPjs)
- Blog post: [Introducing DeciCoder: The New Gold Standard in Efficient and Accurate Code Generation](https://deci.ai/blog/decicoder-efficient-and-accurate-code-generation-llm/)
- Questions: Feel free to contact us via our [Discord community](https://discord.com/invite/p9ecgRhDR8/)!

## How to Cite

Please cite this model using this format.

```bibtex
@misc{DeciFoundationModels,
  title = {DeciCoder},
  author = {DeciAI Research Team},
  year = {2023},
  url = {https://huggingface.co/deci/decicoder-1b},
}
```