First Local LoRA Training Run

There’s something deeply satisfying about watching a machine spend nearly six hours cooking a .safetensors file in your office while you keep getting more work done during the process.

This was the first full LoRA training run I’ve completed locally from start to finish.

Not rented compute.

Not a hosted service.

Not somebody else’s infrastructure.

An actual local training pass running on hardware sitting a few feet away from me.

The Setup

This run used:

Kohya SS
Juggernaut XL v11 (an SDXL-derived checkpoint)
CUDA acceleration
Float16 training
Local custom dataset
Local storage
Local GPU (5070 Ti)

The training process ran for:

10 epochs
2,000 total steps
approximately 5 hours and 43 minutes

The resulting model artifact: kalisoc_v1.safetensors

Checkpoints were configured to save once per epoch. Being able to compare intermediate outputs across epochs makes the training process feel much less like black magic and much more like engineering.

Or at least engineering adjacent chaos.

What Surprised Me

The first thing that stood out was how normal the process eventually became. Not easy. Not polished. Not stable in the “consumer appliance” sense. God knows I got irritated with the UI on Kohya SS a few times.

But it was understandable. Once the environment was configured correctly, the workflow stopped feeling mystical.

The second surprise was realizing the actual training run took less effort than curating the dataset. Maybe I was just too much of a stickler for details. I’ll work on it.

The Actual Training Run

The run stayed relatively stable throughout the process.

Average loss hovered roughly between 0.096 - 0.109.
Step timing gradually increased over the course of the run, eventually settling around ~10 seconds per iteration

The training checkpoints completed successfully all the way through epoch 10 without catastrophic failure, GPU crashes, or my entire machine deciding it no longer believed in electricity or God.

That alone felt like a win.

What This Experiment Was Really About

Honestly, this wasn’t about producing the perfect LoRA.

This was about proving the pipeline.

There’s a huge difference between watching tutorials, reading Reddit threads, and actually running a full training cycle yourself on your own hardware.

Once you complete a run end-to-end, the entire system starts making more sense. The abstractions start to fall away, and eventually you stop seeing black magic and start seeing interconnected systems with tradeoffs everywhere.

At the end of the day, it’s just math.

Current Thoughts

This was the first full training run. Unfortunately, I’m exhausted at the end of the day and I have to go set up a way to see how well it performed throughout all the epochs.

Stay tuned and we’ll find out.