Integrations

This section covers two things:

  1. Sourcing models — how to get a model into Qwodel (HuggingFace Hub, local disk, post-training).
  2. Deploying output — how to load a Qwodel-quantized file into popular runtimes.

Model sources

| Page | Use case |
|------|----------|
| HuggingFace | Download a model from HF Hub and quantize it |
| Local LLM | Use a model already on disk (git-lfs clone, compiled build, etc.) |
| PyTorch Post-Training | Quantize your own fine-tuned or LoRA-merged model |
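How you point Qwodel at a model differs by source: the HuggingFace guide takes a Hub repo id (`org/name`), while the Local LLM and PyTorch guides take a filesystem path. A minimal sketch of telling the two apart (the helper name and heuristic are ours, not part of Qwodel):

```python
import re
from pathlib import Path

def classify_model_source(ref: str) -> str:
    """Guess whether `ref` is a local path or a Hub repo id (heuristic only)."""
    if Path(ref).exists():
        return "local"          # already on disk -> Local LLM guide
    if re.fullmatch(r"[\w.-]+/[\w.-]+", ref):
        return "huggingface"    # looks like org/name -> HuggingFace guide
    return "unknown"

print(classify_model_source("meta-llama/Llama-3.2-1B"))  # huggingface
```

This is only a convenience for scripts that accept either form; each guide below spells out the exact argument it expects.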

Deployment runtimes

| Page | Output type | Runtime |
|------|-------------|---------|
| Ollama | .gguf | Ollama: run models locally with one command |
| llama-cpp-python | .gguf | Python bindings for llama.cpp |
| vLLM | AWQ safetensors | High-throughput GPU inference server |
| iOS App (CoreML) | .mlpackage | On-device inference in Swift/Xcode |
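Before handing a file to one of these runtimes, a quick sanity check can save a confusing load error. Every GGUF file begins with the four ASCII magic bytes `GGUF`; a small illustrative check (this helper is ours, not a Qwodel API):

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file

def looks_like_gguf(path: str) -> bool:
    """Return True if the file begins with the GGUF magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC

# Example: write a dummy header (magic + little-endian version 3) and check it
Path("dummy.gguf").write_bytes(b"GGUF" + b"\x03\x00\x00\x00")
print(looks_like_gguf("dummy.gguf"))  # True
```

AWQ safetensors and .mlpackage outputs have their own formats; this check only applies to the two GGUF-based runtimes.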

Which integration do I need?

Where is my model?
    ├─ On Hugging Face Hub                 → HuggingFace guide
    ├─ Already on disk                     → Local LLM guide
    └─ Just finished training/fine-tuning  → PyTorch guide

What backend did I use?
    ├─ backend="gguf"
    │   ├─ Want one-command serving?   → Ollama
    │   └─ Want Python integration?    → llama-cpp-python
    ├─ backend="awq"
    │   └─ GPU serving / batch inference → vLLM
    └─ backend="coreml"
        └─ iOS / macOS app             → iOS App (CoreML)
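The backend tree above is straightforward to encode if you need to route programmatically, e.g. in a CI pipeline. A sketch (function and argument names are ours, not Qwodel's):

```python
def pick_runtime(backend: str, goal: str = "") -> str:
    """Map a Qwodel backend (and optional goal) to the suggested runtime."""
    if backend == "gguf":
        # One-command serving vs. in-process Python integration
        return "llama-cpp-python" if goal == "python" else "Ollama"
    if backend == "awq":
        return "vLLM"
    if backend == "coreml":
        return "iOS App (CoreML)"
    raise ValueError(f"unknown backend: {backend}")

print(pick_runtime("gguf", goal="python"))  # llama-cpp-python
print(pick_runtime("awq"))                  # vLLM
```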