Common installation and runtime errors, and how to fix them.
## Installation issues
### `build backend is missing the 'build_editable' hook`

**Why:** Older pip or setuptools versions don't support the editable-install hook (`build_editable`, defined by PEP 660).
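Before reinstalling anything, you can check whether your tooling is already new enough. A minimal sketch using only the standard library; the minimums here (pip 21.3 and setuptools 64, which introduced PEP 660 editable-install support) are the only assumptions:

```python
from importlib.metadata import PackageNotFoundError, version

# PEP 660 editable installs (the 'build_editable' hook) need
# pip >= 21.3 and setuptools >= 64.
MINIMUMS = {"pip": (21, 3), "setuptools": (64, 0)}

def parse_version(text):
    """'68.2.0' -> (68, 2); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split(".")[:2]:
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits or 0))
    return tuple(parts)

def supports_editable_hook():
    """True if installed pip and setuptools are new enough, False if not,
    None if either package's metadata is missing."""
    try:
        return all(parse_version(version(name)) >= minimum
                   for name, minimum in MINIMUMS.items())
    except PackageNotFoundError:
        return None
```

If this returns `True`, the error likely comes from a different environment (for example, a virtualenv with its own older pip) than the one you checked.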
**Fix:**

```bash
pip install --upgrade pip setuptools wheel
pip install -e .
```

### `llama-quantize` not found
**Why:** GGUF quantization requires the `llama-quantize` C++ binary, which is separate from the Python package.
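To confirm the binary really is missing, you can check your `PATH` from Python; only the binary name `llama-quantize` is taken from the text above:

```python
import shutil

def find_llama_quantize():
    """Return the absolute path of llama-quantize if it is on PATH, else None."""
    return shutil.which("llama-quantize")

print(find_llama_quantize() or "llama-quantize is not on PATH")
```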
**Fix:** Ensure `llama-quantize` is compiled from llama.cpp and available on your `PATH`, or install it via the pip package:
```bash
pip install llama-cpp-python
```

### CUDA not available (AWQ)
**Why:** PyTorch was installed without CUDA support.
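You can verify this from Python before reinstalling; a small diagnostic that works whether or not torch is importable:

```python
import importlib.util

def torch_cuda_status():
    """Describe whether torch is installed and whether its build sees a GPU."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so the check runs even without torch
    return f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}"

print(torch_cuda_status())
```

A CPU-only wheel reports `CUDA available: False` even on a machine with a working GPU driver.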
**Fix:** Reinstall PyTorch with the correct CUDA wheel:
```bash
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 \
  --index-url https://download.pytorch.org/whl/cu121
```

## Runtime issues
### `ModuleNotFoundError: No module named 'qwodel'`

**Fix:** Install in editable mode from the repo root:

```bash
pip install -e .
```

### Out of memory (OOM) during AWQ quantization
**Why:** Your GPU doesn't have enough VRAM for the default calibration config.

**Fix:** Reduce the batch size and sequence length manually:

```python
quantizer.quantize(format="int4", batch_size=1, seq_len=512, num_samples=32)
```

Or let Qwodel auto-select by not passing any of these arguments; it reads the available VRAM headroom automatically.
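The auto-selection can be pictured as a mapping from free VRAM to calibration settings. This is a hypothetical sketch, not Qwodel's actual logic; the thresholds and values are invented for illustration:

```python
def pick_calibration_config(free_vram_gb):
    """Illustrative mapping from VRAM headroom to calibration settings.
    Thresholds and returned values are made up for this sketch."""
    if free_vram_gb >= 24:
        return {"batch_size": 8, "seq_len": 2048, "num_samples": 128}
    if free_vram_gb >= 12:
        return {"batch_size": 4, "seq_len": 1024, "num_samples": 64}
    # Low-VRAM fallback mirrors the manual settings shown above.
    return {"batch_size": 1, "seq_len": 512, "num_samples": 32}
```

The point is that smaller headroom trades calibration throughput (and sample count) for a lower peak memory footprint.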
### Failed to load calibration dataset
**Why:** The default dataset (`mit-han-lab/pile-val-backup`) may be unavailable or rate-limited.
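If the default fails only intermittently, one workaround is to try candidate dataset specs in order and keep the first that loads. A hypothetical pattern; the `make` callable and the broad exception catch are assumptions, not Qwodel API:

```python
DATASET_CANDIDATES = [
    "mit-han-lab/pile-val-backup",   # default
    "wikitext:wikitext-2-raw-v1",    # fallback
]

def quantizer_with_fallback(make, candidates=DATASET_CANDIDATES):
    """Call `make(spec)` (e.g. a wrapper around the Quantizer constructor)
    for each candidate dataset spec; return the first that succeeds."""
    last_err = None
    for spec in candidates:
        try:
            return make(spec)
        except Exception as err:  # dataset unavailable / rate-limited
            last_err = err
    raise last_err
```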
**Fix:** Use a different dataset:

```python
Quantizer(
    backend="awq",
    model_path="./model",
    output_dir="./output",
    calibration_dataset="wikitext:wikitext-2-raw-v1",
)
```

## Still stuck?
Open an issue on GitHub with:
- Your OS and Python version
- The full error traceback
- The command or code you ran
