Experimental global target bits‑per‑weight quantization of Octen/Octen-Embedding-0.6B

  • Quantized with a non-standard (forked) llama.cpp branch carrying a small set of patches.
  • KLD evaluation and imatrix calibration datasets for the GGUF models were built with a CLI tool, sourced from eaddario/imatrix-calibration.
  • Dataset sources: text_en, text_ru.
  • Dataset chunks: 750.
  • Tensors stored as F16 instead of BF16, which keeps the files friendly to NVIDIA Pascal GPUs such as the P100.
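For a given global bits-per-weight target, the on-disk tensor payload is roughly params × BPW / 8 bytes. A minimal sketch in pure Python (the 0.6 B parameter count comes from the model card; the helper name is ours, and GGUF metadata/per-tensor overhead is ignored):

```python
def gguf_payload_gib(params: float, bpw: float) -> float:
    """Approximate tensor payload in GiB for a global bits-per-weight target.

    Ignores GGUF metadata and per-tensor overhead, so treat the result
    as a lower-bound estimate of the final file size.
    """
    return params * bpw / 8 / 2**30

# Rough sizes for a 0.6B-parameter model at a few of the BPW targets above.
for bpw in (3.5, 8.0, 15.0):
    print(f"{bpw:5.2f} bpw -> ~{gguf_payload_gib(0.6e9, bpw):.2f} GiB")
```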

Many thanks to Ed Addario for his impressive work.

Quantization comparison

| BPW/TGS | PPL correlation | PPL mean ratio | ΔPPL | Mean KLD | Maximum KLD | 99.9% KLD | Mean Δp | RMS Δp |
|---|---|---|---|---|---|---|---|---|
| 3.50 | 85.56% | 1.328184 ± 0.006198 | 113.055840 ± 2.030409 | 1.738910 ± 0.002968 | 19.720037 | 10.812436 | -3.331 ± 0.037 % | 17.941 ± 0.063 % |
| 4.00 | 93.58% | 1.360601 ± 0.004364 | 124.223093 ± 1.754530 | 0.787587 ± 0.001566 | 19.145452 | 6.742295 | -1.688 ± 0.027 % | 12.813 ± 0.052 % |
| 4.50 | 95.88% | 1.235695 ± 0.003178 | 81.194364 ± 1.267961 | 0.492418 ± 0.001110 | 16.026636 | 5.010898 | -1.135 ± 0.022 % | 10.330 ± 0.047 % |
| 5.00 | 97.12% | 1.225914 ± 0.002645 | 77.825006 ± 1.132301 | 0.323054 ± 0.000773 | 14.785572 | 3.622450 | -1.015 ± 0.018 % | 8.707 ± 0.042 % |
| 5.50 | 98.61% | 1.140659 ± 0.001722 | 48.455459 ± 0.754229 | 0.142510 ± 0.000361 | 10.251470 | 1.794577 | -0.500 ± 0.012 % | 5.834 ± 0.031 % |
| 6.00 | 99.03% | 1.095986 ± 0.001385 | 33.066115 ± 0.583789 | 0.089186 ± 0.000256 | 9.821702 | 1.338731 | -0.228 ± 0.010 % | 4.639 ± 0.028 % |
| 6.50 | 99.28% | 1.086861 ± 0.001185 | 29.922486 ± 0.515390 | 0.056230 ± 0.000156 | 9.794716 | 0.742507 | -0.200 ± 0.008 % | 3.695 ± 0.022 % |
| 7.00 | 99.47% | 1.032955 ± 0.000969 | 11.352594 ± 0.361050 | 0.033634 ± 0.000093 | 3.941931 | 0.450176 | -0.041 ± 0.006 % | 2.933 ± 0.020 % |
| 7.50 | 99.53% | 1.015463 ± 0.000891 | 5.326939 ± 0.316756 | 0.024763 ± 0.000071 | 4.088861 | 0.356147 | 0.027 ± 0.005 % | 2.558 ± 0.019 % |
| 8.00 | 99.56% | 1.012318 ± 0.000866 | 4.243431 ± 0.305956 | 0.021680 ± 0.000059 | 1.800071 | 0.296283 | 0.037 ± 0.005 % | 2.380 ± 0.015 % |
| 8.50 | 99.63% | 1.020174 ± 0.000803 | 6.949556 ± 0.292500 | 0.013173 ± 0.000038 | 3.675829 | 0.164360 | -0.008 ± 0.004 % | 1.903 ± 0.015 % |
| 9.00 | 99.64% | 1.017785 ± 0.000789 | 6.126855 ± 0.285191 | 0.011793 ± 0.000035 | 3.644340 | 0.155200 | 0.001 ± 0.004 % | 1.822 ± 0.015 % |
| 9.50 | 99.64% | 1.023754 ± 0.000790 | 8.182888 ± 0.292987 | 0.011307 ± 0.000036 | 3.703608 | 0.149881 | -0.017 ± 0.004 % | 1.799 ± 0.017 % |
| 10.00 | 99.64% | 1.026870 ± 0.000790 | 9.256477 ± 0.297126 | 0.010960 ± 0.000039 | 4.368244 | 0.146243 | -0.023 ± 0.004 % | 1.781 ± 0.017 % |
| 10.50 | 99.65% | 1.031793 ± 0.000791 | 10.952329 ± 0.305078 | 0.010631 ± 0.000033 | 2.633135 | 0.139757 | -0.033 ± 0.004 % | 1.756 ± 0.016 % |
| 11.00 | 99.65% | 1.032131 ± 0.000785 | 11.068814 ± 0.303826 | 0.010088 ± 0.000026 | 1.066854 | 0.125272 | -0.039 ± 0.004 % | 1.698 ± 0.010 % |
| 11.50 | 99.66% | 1.034359 ± 0.000785 | 11.836127 ± 0.307385 | 0.009816 ± 0.000026 | 1.311119 | 0.122771 | -0.039 ± 0.004 % | 1.685 ± 0.010 % |
| 12.00 | 99.66% | 1.033216 ± 0.000782 | 11.442555 ± 0.304820 | 0.009535 ± 0.000028 | 2.683337 | 0.118959 | -0.036 ± 0.004 % | 1.669 ± 0.015 % |
| 12.50 | 99.66% | 1.035454 ± 0.000782 | 12.213620 ± 0.308823 | 0.009296 ± 0.000026 | 1.888097 | 0.117823 | -0.036 ± 0.003 % | 1.653 ± 0.011 % |
| 13.00 | 99.66% | 1.035315 ± 0.000779 | 12.165472 ± 0.307315 | 0.009015 ± 0.000024 | 1.563708 | 0.112765 | -0.037 ± 0.003 % | 1.641 ± 0.012 % |
| 13.50 | 99.66% | 1.035511 ± 0.000777 | 12.233185 ± 0.307197 | 0.008828 ± 0.000027 | 2.020073 | 0.110392 | -0.042 ± 0.003 % | 1.634 ± 0.016 % |
| 14.00 | 99.67% | 1.034812 ± 0.000775 | 11.992309 ± 0.305653 | 0.008529 ± 0.000025 | 1.908343 | 0.105182 | -0.034 ± 0.003 % | 1.617 ± 0.017 % |
| 14.50 | 99.66% | 1.035023 ± 0.000780 | 12.064932 ± 0.307359 | 0.008970 ± 0.000023 | 0.820967 | 0.111618 | -0.034 ± 0.003 % | 1.618 ± 0.010 % |
| 15.00 | 99.66% | 1.035123 ± 0.000777 | 12.099435 ± 0.306609 | 0.008687 ± 0.000022 | 1.018949 | 0.102148 | -0.033 ± 0.003 % | 1.612 ± 0.010 % |
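The KLD and Δp columns follow the usual definitions from llama.cpp-style KL-divergence evaluation: the per-token KL divergence between the full-precision and quantized next-token distributions, and the change in probability assigned to the reference token. A toy pure-Python illustration (the logits and the token index are made up for demonstration):

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    """KL(p || q) in nats: expected log-ratio under the reference distribution p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token logits: full-precision reference vs. a quantized model.
ref_logits = [2.0, 1.0, 0.1]
quant_logits = [1.8, 1.1, 0.3]
p, q = softmax(ref_logits), softmax(quant_logits)

target = 0  # hypothetical index of the "correct" token at this position
kld = kl_divergence(p, q)
delta_p = q[target] - p[target]  # Δp: change in probability of the correct token
print(f"KLD = {kld:.6f} nats, Δp = {delta_p * 100:+.3f} %")
```

The table aggregates these per-token values over the evaluation set (mean, maximum, 99.9th percentile, RMS), which is why Mean KLD shrinking toward zero with rising BPW indicates the quantized distribution converging to the full-precision one.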
GGUF · Model size: 0.6B params · Architecture: qwen3
Quantized from Octen/Octen-Embedding-0.6B; published as ENOSYS/Octen-Embedding-0.6B-750-v1-GGUF.