Experimental global target bits‑per‑weight quantization of Octen/Octen-Embedding-0.6B

  • Quantized with a non-standard (forked) llama.cpp branch carrying a small set of patches.
  • KLD evaluation and imatrix calibration datasets for the GGUF models were built with a CLI tool, sourced from eaddario/imatrix-calibration.
  • Dataset sources: text_en, text_ru.
  • Dataset chunks: 750.
  • Tensors stored as F16 instead of BF16, which keeps the files friendly to NVIDIA Pascal GPUs such as the P100.
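For a given global bits-per-weight target, the on-disk tensor payload is roughly params × BPW / 8 bytes. A minimal sketch in pure Python (the 0.6 B parameter count comes from the model card; the helper name is ours, and GGUF metadata/per-tensor overhead is ignored):

```python
def gguf_payload_gib(params: float, bpw: float) -> float:
    """Approximate tensor payload in GiB for a global bits-per-weight target.

    Ignores GGUF metadata and per-tensor overhead, so treat the result
    as a lower-bound estimate of the final file size.
    """
    return params * bpw / 8 / 2**30

# Rough sizes for a 0.6B-parameter model at a few of the BPW targets above.
for bpw in (3.5, 8.0, 15.0):
    print(f"{bpw:5.2f} bpw -> ~{gguf_payload_gib(0.6e9, bpw):.2f} GiB")
```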

Many thanks to Ed Addario for his impressive work.

Quantization comparison

| BPW/TGS | PPL correlation | PPL mean ratio | ΔPPL | Mean KLD | Maximum KLD | 99.9% KLD | Mean Δp | RMS Δp |
|---|---|---|---|---|---|---|---|---|
| 3.50 | 85.56% | 1.328184 ± 0.006198 | 113.055840 ± 2.030409 | 1.738910 ± 0.002968 | 19.720037 | 10.812436 | -3.331 ± 0.037 % | 17.941 ± 0.063 % |
| 4.00 | 93.58% | 1.360601 ± 0.004364 | 124.223093 ± 1.754530 | 0.787587 ± 0.001566 | 19.145452 | 6.742295 | -1.688 ± 0.027 % | 12.813 ± 0.052 % |
| 4.50 | 95.88% | 1.235695 ± 0.003178 | 81.194364 ± 1.267961 | 0.492418 ± 0.001110 | 16.026636 | 5.010898 | -1.135 ± 0.022 % | 10.330 ± 0.047 % |
| 5.00 | 97.12% | 1.225914 ± 0.002645 | 77.825006 ± 1.132301 | 0.323054 ± 0.000773 | 14.785572 | 3.622450 | -1.015 ± 0.018 % | 8.707 ± 0.042 % |
| 5.50 | 98.61% | 1.140659 ± 0.001722 | 48.455459 ± 0.754229 | 0.142510 ± 0.000361 | 10.251470 | 1.794577 | -0.500 ± 0.012 % | 5.834 ± 0.031 % |
| 6.00 | 99.03% | 1.095986 ± 0.001385 | 33.066115 ± 0.583789 | 0.089186 ± 0.000256 | 9.821702 | 1.338731 | -0.228 ± 0.010 % | 4.639 ± 0.028 % |
| 6.50 | 99.28% | 1.086861 ± 0.001185 | 29.922486 ± 0.515390 | 0.056230 ± 0.000156 | 9.794716 | 0.742507 | -0.200 ± 0.008 % | 3.695 ± 0.022 % |
| 7.00 | 99.47% | 1.032955 ± 0.000969 | 11.352594 ± 0.361050 | 0.033634 ± 0.000093 | 3.941931 | 0.450176 | -0.041 ± 0.006 % | 2.933 ± 0.020 % |
| 7.50 | 99.53% | 1.015463 ± 0.000891 | 5.326939 ± 0.316756 | 0.024763 ± 0.000071 | 4.088861 | 0.356147 | 0.027 ± 0.005 % | 2.558 ± 0.019 % |
| 8.00 | 99.56% | 1.012318 ± 0.000866 | 4.243431 ± 0.305956 | 0.021680 ± 0.000059 | 1.800071 | 0.296283 | 0.037 ± 0.005 % | 2.380 ± 0.015 % |
| 8.50 | 99.63% | 1.020174 ± 0.000803 | 6.949556 ± 0.292500 | 0.013173 ± 0.000038 | 3.675829 | 0.164360 | -0.008 ± 0.004 % | 1.903 ± 0.015 % |
| 9.00 | 99.64% | 1.017785 ± 0.000789 | 6.126855 ± 0.285191 | 0.011793 ± 0.000035 | 3.644340 | 0.155200 | 0.001 ± 0.004 % | 1.822 ± 0.015 % |
| 9.50 | 99.64% | 1.023754 ± 0.000790 | 8.182888 ± 0.292987 | 0.011307 ± 0.000036 | 3.703608 | 0.149881 | -0.017 ± 0.004 % | 1.799 ± 0.017 % |
| 10.00 | 99.64% | 1.026870 ± 0.000790 | 9.256477 ± 0.297126 | 0.010960 ± 0.000039 | 4.368244 | 0.146243 | -0.023 ± 0.004 % | 1.781 ± 0.017 % |
| 10.50 | 99.65% | 1.031793 ± 0.000791 | 10.952329 ± 0.305078 | 0.010631 ± 0.000033 | 2.633135 | 0.139757 | -0.033 ± 0.004 % | 1.756 ± 0.016 % |
| 11.00 | 99.65% | 1.032131 ± 0.000785 | 11.068814 ± 0.303826 | 0.010088 ± 0.000026 | 1.066854 | 0.125272 | -0.039 ± 0.004 % | 1.698 ± 0.010 % |
| 11.50 | 99.66% | 1.034359 ± 0.000785 | 11.836127 ± 0.307385 | 0.009816 ± 0.000026 | 1.311119 | 0.122771 | -0.039 ± 0.004 % | 1.685 ± 0.010 % |
| 12.00 | 99.66% | 1.033216 ± 0.000782 | 11.442555 ± 0.304820 | 0.009535 ± 0.000028 | 2.683337 | 0.118959 | -0.036 ± 0.004 % | 1.669 ± 0.015 % |
| 12.50 | 99.66% | 1.035454 ± 0.000782 | 12.213620 ± 0.308823 | 0.009296 ± 0.000026 | 1.888097 | 0.117823 | -0.036 ± 0.003 % | 1.653 ± 0.011 % |
| 13.00 | 99.66% | 1.035315 ± 0.000779 | 12.165472 ± 0.307315 | 0.009015 ± 0.000024 | 1.563708 | 0.112765 | -0.037 ± 0.003 % | 1.641 ± 0.012 % |
| 13.50 | 99.66% | 1.035511 ± 0.000777 | 12.233185 ± 0.307197 | 0.008828 ± 0.000027 | 2.020073 | 0.110392 | -0.042 ± 0.003 % | 1.634 ± 0.016 % |
| 14.00 | 99.67% | 1.034812 ± 0.000775 | 11.992309 ± 0.305653 | 0.008529 ± 0.000025 | 1.908343 | 0.105182 | -0.034 ± 0.003 % | 1.617 ± 0.017 % |
| 14.50 | 99.66% | 1.035023 ± 0.000780 | 12.064932 ± 0.307359 | 0.008970 ± 0.000023 | 0.820967 | 0.111618 | -0.034 ± 0.003 % | 1.618 ± 0.010 % |
| 15.00 | 99.66% | 1.035123 ± 0.000777 | 12.099435 ± 0.306609 | 0.008687 ± 0.000022 | 1.018949 | 0.102148 | -0.033 ± 0.003 % | 1.612 ± 0.010 % |
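The KLD and Δp columns follow the usual definitions from llama.cpp-style KL-divergence evaluation: the per-token KL divergence between the full-precision and quantized next-token distributions, and the change in probability assigned to the reference token. A toy pure-Python illustration (the logits and the token index are made up for demonstration):

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    """KL(p || q) in nats: expected log-ratio under the reference distribution p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token logits: full-precision reference vs. a quantized model.
ref_logits = [2.0, 1.0, 0.1]
quant_logits = [1.8, 1.1, 0.3]
p, q = softmax(ref_logits), softmax(quant_logits)

target = 0  # hypothetical index of the "correct" token at this position
kld = kl_divergence(p, q)
delta_p = q[target] - p[target]  # Δp: change in probability of the correct token
print(f"KLD = {kld:.6f} nats, Δp = {delta_p * 100:+.3f} %")
```

The table aggregates these per-token values over the evaluation set (mean, maximum, 99.9th percentile, RMS), which is why Mean KLD shrinking toward zero with rising BPW indicates the quantized distribution converging to the full-precision one.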
GGUF · Model size: 0.6B params · Architecture: qwen3
Quantized from Octen/Octen-Embedding-0.6B; published as ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF.