Qwodel
GGUF Backend

CPU-friendly quantization for llama.cpp-compatible runtimes.

Install: pip install "qwodel[gguf]" (the quotes prevent shell glob expansion, e.g. in zsh)


Supported Formats

| Format | Description |
|--------|-------------|
| Q4_K_M | Best balance of speed and quality. Recommended for most users. |
| Q5_K_M | Better quality than Q4_K_M, slightly larger. |
| Q5_K_S | Small 5-bit K-quant. |
| Q6_K | High quality, between Q4_K_M and Q8_0. |
| Q8_0 | Near-lossless. Requires more RAM. |
| Q2_K | Maximum compression. Reduced quality. |
| Q3_K_M | 3-bit medium quality. |
| Q4_0 | Compact 4-bit, slightly smaller than Q4_K_M. |
| Q4_K_S | Small 4-bit K-quant. |
| IQ4_NL | 4.5 bpw importance-based quantization. |
| IQ3_M | 3.66 bpw compact importance quantization. |
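As a rough rule of thumb (an approximation for planning, not part of qwodel), the quantized file size is about parameter_count * bits-per-weight / 8. A sketch using approximate bits-per-weight figures for the formats above (the BPW values are rough community estimates, not exact llama.cpp numbers):

```python
# Approximate bits-per-weight for common GGUF quantization formats.
# These are rough estimates; actual sizes vary by model architecture.
BPW = {
    "Q2_K": 2.6, "Q3_K_M": 3.9, "IQ3_M": 3.66, "Q4_0": 4.5,
    "Q4_K_S": 4.6, "Q4_K_M": 4.8, "IQ4_NL": 4.5, "Q5_K_S": 5.2,
    "Q5_K_M": 5.3, "Q6_K": 6.6, "Q8_0": 8.5,
}

def estimated_size_gb(n_params: float, fmt: str) -> float:
    """Estimate quantized file size in GB for a model with n_params weights."""
    return n_params * BPW[fmt] / 8 / 1e9

# An 8B-parameter model at Q4_K_M works out to roughly 4.8 GB.
print(f"{estimated_size_gb(8e9, 'Q4_K_M'):.1f} GB")
```

This is useful for checking that a chosen format fits in available RAM before quantizing.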

Parameters

Quantizer(...) — Initialization

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| model_path | str | — | Path to HuggingFace model directory or existing .gguf file. |
| output_dir | str | ./quantized_models | Output directory. |

GGUF has no additional backend-specific initialization parameters.

quantize(format) — Runtime

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| format | str | Yes | One of the formats listed above. |

Example

from qwodel import Quantizer

quantizer = Quantizer(
    backend="gguf",
    model_path="./llama-3",
    output_dir="./output"
)
output = quantizer.quantize(format="Q4_K_M")
print(f"Output: {output}")
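If the format string comes from user input or a config file, it can help to validate it before calling quantize(). A small hypothetical helper (validate_format and SUPPORTED_FORMATS are not part of qwodel; the set mirrors the table above):

```python
# Supported GGUF format names, mirroring the formats table above.
SUPPORTED_FORMATS = {
    "Q2_K", "Q3_K_M", "IQ3_M", "Q4_0", "Q4_K_S", "Q4_K_M",
    "IQ4_NL", "Q5_K_S", "Q5_K_M", "Q6_K", "Q8_0",
}

def validate_format(fmt: str) -> str:
    """Normalize a format name and reject unsupported values early."""
    fmt = fmt.strip().upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported GGUF format: {fmt!r}")
    return fmt
```

Failing fast here gives a clearer error than waiting for the backend to reject the format mid-run.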

CLI:

qwodel quantize ./llama-3 --backend gguf --format Q4_K_M --output ./output

After Quantization

Your output is a single .gguf file that any llama.cpp-compatible runtime can load.
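For example, with llama.cpp's llama-cli (the output filename below is illustrative, since qwodel's naming scheme is not specified here):

```shell
# Run the quantized model with llama.cpp's CLI.
# Check ./output for the actual generated filename.
llama-cli -m ./output/llama-3-Q4_K_M.gguf -p "Hello, world" -n 64
```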