Common installation and runtime errors, and how to fix them.
## Installation issues
### `build backend is missing the 'build_editable' hook`

**Why:** Older pip or setuptools versions don't support the editable-install hook (`build_editable`, defined by PEP 660).
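Before reinstalling anything, you can check whether your tooling is already new enough. A minimal sketch using only the standard library; the minimums here (pip 21.3 and setuptools 64, which introduced PEP 660 editable-install support) are the only assumptions:

```python
from importlib.metadata import PackageNotFoundError, version

# PEP 660 editable installs (the 'build_editable' hook) need
# pip >= 21.3 and setuptools >= 64.
MINIMUMS = {"pip": (21, 3), "setuptools": (64, 0)}

def parse_version(text):
    """'68.2.0' -> (68, 2); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split(".")[:2]:
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits or 0))
    return tuple(parts)

def supports_editable_hook():
    """True if installed pip and setuptools are new enough, False if not,
    None if either package's metadata is missing."""
    try:
        return all(parse_version(version(name)) >= minimum
                   for name, minimum in MINIMUMS.items())
    except PackageNotFoundError:
        return None
```

If this returns `True`, the error likely comes from a different environment (for example, a virtualenv with its own older pip) than the one you checked.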
**Fix:**

```bash
pip install --upgrade pip setuptools wheel
pip install -e .
```

### `llama-quantize` not found
**Why:** GGUF quantization requires the `llama-quantize` C++ binary, which is separate from the Python package.
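To confirm the binary really is missing, you can check your `PATH` from Python; only the binary name `llama-quantize` is taken from the text above:

```python
import shutil

def find_llama_quantize():
    """Return the absolute path of llama-quantize if it is on PATH, else None."""
    return shutil.which("llama-quantize")

print(find_llama_quantize() or "llama-quantize is not on PATH")
```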
**Fix:** Ensure `llama-quantize` is compiled from llama.cpp and available on your `PATH`, or install it via the pip package:
```bash
pip install llama-cpp-python
```

### CUDA not available (AWQ)
**Why:** PyTorch was installed without CUDA support.
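You can verify this from Python before reinstalling; a small diagnostic that works whether or not torch is importable:

```python
import importlib.util

def torch_cuda_status():
    """Describe whether torch is installed and whether its build sees a GPU."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so the check runs even without torch
    return f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}"

print(torch_cuda_status())
```

A CPU-only wheel reports `CUDA available: False` even on a machine with a working GPU driver.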
**Fix:** Reinstall PyTorch with the correct CUDA wheel:
```bash
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 \
  --index-url https://download.pytorch.org/whl/cu121
```

## Runtime issues
### `ModuleNotFoundError: No module named 'qwodel'`

**Fix:** Install in editable mode from the repo root:

```bash
pip install -e .
```

### Out of memory (OOM) during AWQ quantization
**Why:** Your GPU doesn't have enough VRAM for the default calibration config.

**Fix:** Reduce the batch size and sequence length manually:

```python
quantizer.quantize(format="int4", batch_size=1, seq_len=512, num_samples=32)
```

Or let Qwodel auto-select by not passing any of these arguments; it reads the available VRAM headroom automatically.
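The auto-selection can be pictured as a mapping from free VRAM to calibration settings. This is a hypothetical sketch, not Qwodel's actual logic; the thresholds and values are invented for illustration:

```python
def pick_calibration_config(free_vram_gb):
    """Illustrative mapping from VRAM headroom to calibration settings.
    Thresholds and returned values are made up for this sketch."""
    if free_vram_gb >= 24:
        return {"batch_size": 8, "seq_len": 2048, "num_samples": 128}
    if free_vram_gb >= 12:
        return {"batch_size": 4, "seq_len": 1024, "num_samples": 64}
    # Low-VRAM fallback mirrors the manual settings shown above.
    return {"batch_size": 1, "seq_len": 512, "num_samples": 32}
```

The point is that smaller headroom trades calibration throughput (and sample count) for a lower peak memory footprint.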
### Failed to load calibration dataset
**Why:** The default dataset (`mit-han-lab/pile-val-backup`) may be unavailable or rate-limited.
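If the default fails only intermittently, one workaround is to try candidate dataset specs in order and keep the first that loads. A hypothetical pattern; the `make` callable and the broad exception catch are assumptions, not Qwodel API:

```python
DATASET_CANDIDATES = [
    "mit-han-lab/pile-val-backup",   # default
    "wikitext:wikitext-2-raw-v1",    # fallback
]

def quantizer_with_fallback(make, candidates=DATASET_CANDIDATES):
    """Call `make(spec)` (e.g. a wrapper around the Quantizer constructor)
    for each candidate dataset spec; return the first that succeeds."""
    last_err = None
    for spec in candidates:
        try:
            return make(spec)
        except Exception as err:  # dataset unavailable / rate-limited
            last_err = err
    raise last_err
```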
**Fix:** Use a different dataset:

```python
Quantizer(
    backend="awq",
    model_path="./model",
    output_dir="./output",
    calibration_dataset="wikitext:wikitext-2-raw-v1",
)
```

## Still stuck?
Open an issue on GitHub with:
- Your OS and Python version
- The full error traceback
- The command or code you ran
