If you already have a model stored on disk — whether downloaded with git lfs, exported from a training run, or converted to GGUF yourself — Qwodel can quantize it directly. No internet connection is required once the model is on disk.
## Supported local formats
| Source | Typical layout | Qwodel accepts? |
|---|---|---|
| HuggingFace `snapshot_download` | `model_dir/config.json` + `*.safetensors` | ✓ |
| `git lfs` clone of a HF repo | Same as above | ✓ |
| Manual HF format (`save_pretrained`) | Same as above | ✓ |
| llama.cpp self-build output | `.gguf` file | ✓ (GGUF backend pass-through) |
| Raw PyTorch checkpoint (`.pt`/`.pth`) | Single weight file | See PyTorch guide |
## Step 1: Verify your model directory
A valid HuggingFace-format model directory must contain at minimum:
```text
my-model/
├── config.json              # required — architecture definition
├── tokenizer.json           # required — tokenizer config
├── tokenizer_config.json
└── model.safetensors        # weights (or sharded: model-00001-of-00003.safetensors …)
```

Quick check:
```bash
ls -lh ./my-model/
# Look for config.json and at least one .safetensors or .bin file
```
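If you prefer a scripted check, a minimal sketch along these lines covers the same ground. Note that `check_model_dir` is a hypothetical helper for illustration, not part of the Qwodel API:

```python
from pathlib import Path

def check_model_dir(path: str) -> bool:
    """Rough sanity check for a HuggingFace-format model directory (illustrative helper)."""
    d = Path(path)
    has_config = (d / "config.json").is_file()
    # Accept either safetensors or legacy .bin weights, sharded or not
    has_weights = any(d.glob("*.safetensors")) or any(d.glob("*.bin"))
    if not has_config:
        print("Missing config.json")
    if not has_weights:
        print("No .safetensors or .bin weight files found")
    return has_config and has_weights

print(check_model_dir("./my-model"))
```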
## Step 2: Quantize

```python
from qwodel import Quantizer

quantizer = Quantizer(
    backend="gguf",           # or "awq" / "coreml"
    model_path="./my-model",  # absolute or relative local path
    output_dir="./output"
)

output = quantizer.quantize(format="Q4_K_M")
print(f"Output: {output}")
```

CLI equivalent:
```bash
qwodel quantize \
  --backend gguf \
  --format Q4_K_M \
  --model ./my-model \
  --output ./output
```
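If you want the same model at several quantization levels, repeated calls to `quantize()` produce them one after another. A minimal sketch, assuming the `Quantizer` instance from the Python example above can be reused across calls (the format names are standard GGUF quantization levels):

```python
# Sketch: produce several GGUF quantization levels from one source model.
# Assumes the quantizer defined above can be reused for repeated quantize() calls.
for fmt in ["Q4_K_M", "Q5_K_M", "Q8_0"]:
    result = quantizer.quantize(format=fmt)
    print(f"{fmt}: {result}")
```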
## Working with git lfs

Many HF repos use Git LFS for large weight files. Clone correctly:
```bash
# Install git-lfs if needed
sudo apt install git-lfs   # Ubuntu/Debian
brew install git-lfs       # macOS

git lfs install
git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 ./mistral-7b
```

Then point Qwodel at ./mistral-7b.
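A common pitfall: if git-lfs was not installed when you cloned, the weight files on disk are tiny LFS pointer stubs rather than real tensors, and quantization will fail when Qwodel tries to read them. Here is a minimal sketch to detect that case; the pointer header check reflects standard Git LFS behaviour, but the helper itself is not part of Qwodel:

```python
from pathlib import Path

def looks_like_lfs_pointer(path: Path) -> bool:
    """True if the file is a Git LFS pointer stub instead of real weights."""
    # LFS pointer files are tiny text files that start with this version line.
    if path.stat().st_size > 1024:
        return False
    return path.read_bytes().startswith(b"version https://git-lfs")

for f in Path("./mistral-7b").glob("*.safetensors"):
    if looks_like_lfs_pointer(f):
        print(f"{f.name} is an LFS pointer: run `git lfs pull` inside the repo")
```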
## Self-built llama.cpp models
If you converted a model to GGUF yourself using llama.cpp, you can still pass it to the GGUF backend for re-quantization to a different format:
```python
from qwodel import Quantizer

quantizer = Quantizer(
    backend="gguf",
    model_path="./my-model.gguf",  # existing GGUF file
    output_dir="./output"
)

output = quantizer.quantize(format="Q2_K")  # re-quantize to a smaller format
```
## Tips for large models

- **Use fast local storage.** Place the model directory on an SSD. GGUF conversion reads the entire model sequentially; a slow spinning disk can make it 3–5× slower.
- **RAM requirements.** Qwodel loads model weights into CPU RAM during conversion. As a rough guide, allow ~2× the model file size in available RAM (see the sketch after this list).
- **Sharded safetensors.** Qwodel handles sharded safetensors (model-0000X-of-0000Y.safetensors) transparently; just pass the directory.
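To turn the 2× rule of thumb into a concrete number for your model, a small sketch like this sums the weight files on disk. The `estimate_ram_gb` helper is illustrative, not part of Qwodel:

```python
from pathlib import Path

def estimate_ram_gb(model_dir: str) -> float:
    """Rough RAM estimate for conversion: ~2x the total weight file size (illustrative helper)."""
    total_bytes = sum(
        f.stat().st_size
        for f in Path(model_dir).iterdir()
        if f.suffix in (".safetensors", ".bin", ".gguf")
    )
    return 2 * total_bytes / 1e9

print(f"Plan for roughly {estimate_ram_gb('./my-model'):.1f} GB of free RAM")
```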
## Verify your output
```python
from pathlib import Path

output = Path("./output/my-model-q4_k_m.gguf")
print(f"Exists: {output.exists()}")
if output.exists():
    print(f"Size: {output.stat().st_size / 1e9:.2f} GB")
```

Next: PyTorch Post-Training →
