This was the first full LoRA training run I’ve completed locally from start to finish.
Not rented compute.
Not a hosted service.
Not somebody else’s infrastructure.
An actual local training pass running on hardware sitting a few feet away from me.
The Setup
This run used:
- Kohya SS
- Juggernaut XL v11 (an SDXL-derived checkpoint)
- CUDA acceleration
- Float16 training
- Local custom dataset
- Local storage
- Local GPU (5070 Ti)
The training process ran for:
- 10 epochs
- 2,000 total steps
- approximately 5 hours and 43 minutes
The resulting model artifact: kalisoc_v1.safetensors
Checkpoints were configured to save once per epoch. Being able to compare intermediate outputs across epochs makes the training process feel much less like black magic and much more like engineering.
Or at least engineering adjacent chaos.
What Surprised Me
The first thing that stood out was how normal the process eventually became. Not easy. Not polished. Not stable in the “consumer appliance” sense. God knows I got irritated with the UI on Kohya SS a few times.
But it was understandable. Once the environment was configured correctly, the workflow stopped feeling mystical.
The second surprise was realizing the actual training run took less effort than curating the dataset. Maybe I was just too much of a stickler for details. I’ll work on it.
The Actual Training Run
The run stayed relatively stable throughout the process.
- Average loss hovered roughly between 0.096 - 0.109.
- Step timing gradually increased over the course of the run, eventually settling around ~10 seconds per iteration
The training checkpoints completed successfully all the way through epoch 10 without catastrophic failure, GPU crashes, or my entire machine deciding it no longer believed in electricity or God.
That alone felt like a win.
What This Experiment Was Really About
Honestly, this wasn’t about producing the perfect LoRA.
This was about proving the pipeline.
There’s a huge difference between watching tutorials, reading Reddit threads, and actually running a full training cycle yourself on your own hardware.
Once you complete a run end-to-end, the entire system starts making more sense. The abstractions start to fall away, and eventually you stop seeing black magic and start seeing interconnected systems with tradeoffs everywhere.
At the end of the day, it’s just math.
Current Thoughts
This was the first full training run. Unfortunately, I’m exhausted at the end of the day and I have to go set up a way to see how well it performed throughout all the epochs.
Stay tuned and we’ll find out.