Train custom AI models that run on your hardware. Pick a dataset, train a LoRA in the cloud, run the model locally forever.

© 2026 DozyTech

DozyTune

How it works

Cloud power for training,
your hardware for inference.

DozyTune bridges the gap between cloud GPU training and local model inference. You get the best of both: cloud compute for the expensive part, full ownership for the part that runs every day.

  1. Choose a dataset

    Browse the curated catalogue of fine-tuning datasets. Each one is hand-built for a specific job (email writing, RPG dungeon mastering, code review, customer support). Datasets ship in OpenAI chat format with 1,000–3,000 high-quality examples.

    Technical details

    Datasets are JSONL files in OpenAI chat format (messages array with system/user/assistant roles). Built via seed-and-expand using Claude or GPT for synthetic generation, then human-reviewed for quality before listing.
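As a sketch, one record in that format looks like this, along with a minimal validity check mirroring the structure described above (the message content is illustrative):

```python
import json

# One JSONL record in OpenAI chat format: a "messages" array of
# role/content pairs with system, user, and assistant roles.
record = {
    "messages": [
        {"role": "system", "content": "You are a concise email-writing assistant."},
        {"role": "user", "content": "Decline the meeting invite politely."},
        {"role": "assistant", "content": "Thanks for the invite, but I can't make it this week."},
    ]
}

def is_valid_chat_record(line: str) -> bool:
    """Check that a JSONL line parses and has a well-formed messages array."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        return False
    msgs = obj.get("messages")
    if not isinstance(msgs, list) or not msgs:
        return False
    return all(
        isinstance(m, dict)
        and m.get("role") in {"system", "user", "assistant"}
        and isinstance(m.get("content"), str)
        for m in msgs
    )

print(is_valid_chat_record(json.dumps(record)))  # True
```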

  2. Optionally add your samples

    Upload 10–50 examples of your own writing to personalise the model. Your samples are mixed with the dataset during training so the model learns your style, vocabulary, and patterns, not just the dataset author's.

    Technical details

    User samples are parsed and validated for format correctness, deduplicated against the base dataset, and appended to the training set. 10–50 is the sweet spot: enough to influence the model without overfitting on a small slice.
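The dedupe-and-append step can be sketched as follows. The exact-match key on the serialised messages array is an illustrative choice; the real pipeline may compare more loosely:

```python
import json

def dedupe_and_merge(base_lines, user_lines):
    """Merge user samples into the base dataset, dropping exact duplicates.

    Each input is a list of JSONL strings in OpenAI chat format. The
    dedup key is the canonicalised messages array.
    """
    seen = set()
    merged = []
    for line in base_lines + user_lines:
        obj = json.loads(line)
        key = json.dumps(obj["messages"], sort_keys=True)
        if key not in seen:
            seen.add(key)
            merged.append(line)
    return merged

base = ['{"messages": [{"role": "user", "content": "hi"}]}']
user = ['{"messages": [{"role": "user", "content": "hi"}]}',
        '{"messages": [{"role": "user", "content": "hello"}]}']
print(len(dedupe_and_merge(base, user)))  # 2: the duplicate "hi" sample is dropped
```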

  3. Training in the cloud

    A worker spins up a cluster for the training job and runs Unsloth for LoRA training. You can watch progress in real time, including live sample outputs from the model as it trains, and we notify you when the job completes.

    Technical details

    Training uses Unsloth with QLoRA (4-bit quantisation) for memory efficiency. LoRA rank 16, learning rate 2e-4, cosine scheduler. Output is a LoRA adapter in safetensors format. The backend then merges with the base model and converts to GGUF.
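The hyperparameters above can be collected into a config sketch. The key names here are illustrative, not the backend's actual job schema:

```python
# Training configuration matching the details above.
# "<base-model-id>" is a placeholder chosen per dataset.
training_config = {
    "base_model": "<base-model-id>",
    "method": "qlora",                # 4-bit quantised LoRA via Unsloth
    "load_in_4bit": True,
    "lora_rank": 16,
    "learning_rate": 2e-4,
    "lr_scheduler": "cosine",
    "adapter_format": "safetensors",  # output artefact before GGUF merge
}
print(training_config["method"])  # qlora
```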

  4. GGUF conversion

    After training, the LoRA adapter is automatically merged with the base model and converted to GGUF (Q4_K_M quantisation). MLX format is also generated for Apple Silicon users. You don’t touch any of this. It’s all wired up.

    Technical details

    Pipeline: merge LoRA → convert to FP16 with llama.cpp convert.py → quantise to Q4_K_M GGUF. For MLX: merge → mlx_lm.convert. Models are cached on S3 for 7 days to support resumed downloads.
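The conversion half of that pipeline looks roughly like the commands below. Paths and filenames are placeholders, and script/binary names vary across llama.cpp versions (newer builds ship convert_hf_to_gguf.py and llama-quantize):

```shell
# Assumes the LoRA adapter has already been merged into ./merged-model
# (e.g. via a peft merge step in the training backend).

# Convert merged weights to an FP16 GGUF with llama.cpp.
python convert.py ./merged-model --outtype f16 --outfile model-f16.gguf

# Quantise to Q4_K_M for a small, fast local model.
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
```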

  5. Install and run locally

    Download the GGUF and run it with Ollama (ollama create + ollama run) or drop it into LM Studio. The model runs on your hardware with zero ongoing costs and zero API calls. The DozyTune desktop app can auto-install for you.

    Technical details

    The Tauri-based desktop companion app can auto-install to Ollama via a generated Modelfile, or copy the GGUF to LM Studio’s models directory. For manual install: ollama create mymodel -f Modelfile with FROM pointing at the downloaded GGUF.
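For the manual route, the whole install is a two-line Modelfile plus the commands already mentioned. Filenames and the model name are placeholders:

```shell
# Write a minimal Modelfile pointing at the downloaded GGUF.
cat > Modelfile <<'EOF'
FROM ./mymodel-q4_k_m.gguf
EOF

# Register the model with Ollama, then chat with it locally.
ollama create mymodel -f Modelfile
ollama run mymodel
```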

Ready?

Pick a dataset and train your first model.

Every dataset is curated and ready to train. Browse the catalogue and queue a run.

Browse datasets