Qwodel
Getting Started

Get your first quantized model in under 5 minutes.


Python API

from qwodel import Quantizer

# Quantize with the GGUF backend and write the result to ./quantized
quantizer = Quantizer(
    backend="gguf",
    model_path="meta-llama/Llama-2-7b-hf",
    output_dir="./quantized"
)

# Q4_K_M is a 4-bit K-quant; quantize() returns the path to the .gguf file
output_path = quantizer.quantize(format="Q4_K_M")
print(f"Done! Quantized model: {output_path}")

That's it. You now have a .gguf file ready for deployment.
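If you want a quick sanity check before deploying, you can load the file with any GGUF runtime. A minimal sketch using the llama-cpp-python package (not part of Qwodel; the exact output filename below is a placeholder, use the path printed by quantize()):

from llama_cpp import Llama

# Placeholder path: use the path returned by quantize() above
llm = Llama(model_path="./quantized/llama-2-7b.Q4_K_M.gguf")

# Generate a few tokens to confirm the quantized model loads and runs
print(llm("Hello, world", max_tokens=16)["choices"][0]["text"])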


Command Line

qwodel quantize meta-llama/Llama-2-7b-hf \
    --backend gguf \
    --format Q4_K_M \
    --output ./quantized

Which format should I use?

Not sure what Q4_K_M means? Start here:

  • GGUF: Q4_K_M is the recommended default — best balance of size and quality.
  • AWQ: Use int4 — it's the only GPU format.
  • CoreML: Use float16 for broadest iOS/macOS compatibility.

See Concepts → Perplexity vs Speed for a deeper explanation.
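For reference, here is how those choices map onto the Python API shown above. This is a sketch: only the "gguf" backend string appears earlier on this page, so the lowercase backend names "awq" and "coreml" are assumptions about the exact strings Qwodel expects.

from qwodel import Quantizer

model = "meta-llama/Llama-2-7b-hf"

# GGUF was shown above; the other backends follow the same pattern.
# "awq" / "coreml" backend strings are assumed spellings.
Quantizer(backend="awq", model_path=model, output_dir="./quantized").quantize(format="int4")
Quantizer(backend="coreml", model_path=model, output_dir="./quantized").quantize(format="float16")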


Next Steps

📚 Understand the concepts — Why do different formats exist?
⚙️ Choose your backend — GGUF, AWQ, or CoreML?
🚀 Run your model — Load into Ollama, llama.cpp, vLLM, or iOS
📖 API Reference — Full Quantizer class docs