# Integrations
This section covers two things:
- Sourcing models — how to get a model into Qwodel (HuggingFace Hub, local disk, post-training).
- Deploying output — how to load a Qwodel-quantized file into popular runtimes.
## Model sources
| Page | Use case |
|---|---|
| HuggingFace | Download a model from HF Hub and quantize it |
| Local LLM | Use a model already on disk (git-lfs clone, compiled build, etc.) |
| PyTorch Post-Training | Quantize your own fine-tuned / LoRA-merged model |
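Whichever source you pick, the quantization call itself is the same; only where the weights come from changes. Here is a minimal sketch, assuming a top-level `quantize()` entry point. The function name and its signature are illustrative assumptions, not the documented API (follow the guides above for the real calls); only the backend names (`"gguf"`, `"awq"`, `"coreml"`) come from this page.

```python
# Sketch only: `quantize` and its signature are assumptions, not
# Qwodel's documented API. The backend names match the tree below.
from qwodel import quantize

quantize(
    model="meta-llama/Llama-3.2-1B",  # HF Hub ID, local path, or your fine-tuned checkpoint
    backend="gguf",                   # or "awq" / "coreml"
    output="llama-3.2-1b-q4.gguf",
)
```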
## Deployment runtimes
| Page | Output type | Description |
|---|---|---|
| Ollama | .gguf | Run models locally with one command |
| llama-cpp-python | .gguf | Python bindings for llama.cpp |
| vLLM | AWQ safetensors | High-throughput GPU inference server |
| iOS App (CoreML) | .mlpackage | On-device inference in Swift/Xcode |
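As a concrete example of the second row, a quantized .gguf file loads straight into llama-cpp-python. This uses llama-cpp-python's actual `Llama` class; the model path is a placeholder for whatever file Qwodel wrote.

```python
from llama_cpp import Llama

# Load the quantized .gguf (path is a placeholder for Qwodel's output).
llm = Llama(model_path="llama-3.2-1b-q4.gguf", n_ctx=2048)

# Run a single completion and print the generated text.
out = llm("Q: Name one cost of 4-bit quantization. A:", max_tokens=48)
print(out["choices"][0]["text"])
```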
## Which integration do I need?
```
Where is my model?
├─ On Hugging Face Hub → HuggingFace guide
├─ Already on disk → Local LLM guide
└─ Just finished training/fine-tuning → PyTorch guide
```

```
What backend did I use?
├─ backend="gguf"
│  ├─ Want one-command serving? → Ollama
│  └─ Want Python integration? → llama-cpp-python
├─ backend="awq"
│  └─ GPU serving / batch inference → vLLM
└─ backend="coreml"
   └─ iOS / macOS app → iOS App (CoreML)
```
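On the `backend="awq"` branch, the safetensors output can be served with vLLM's offline API. This is standard vLLM usage (`quantization="awq"` selects the AWQ kernels); the model directory is a placeholder for wherever Qwodel wrote the safetensors.

```python
from vllm import LLM, SamplingParams

# Model directory is a placeholder for the AWQ safetensors Qwodel wrote;
# quantization="awq" tells vLLM to use its AWQ kernels.
llm = LLM(model="./my-model-awq", quantization="awq")

# Batch of one prompt; vLLM is built for many prompts at once.
outputs = llm.generate(
    ["Name one cost of 4-bit quantization."],
    SamplingParams(max_tokens=48),
)
print(outputs[0].outputs[0].text)
```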