Successfully Running a Local LLM on an RTX 2060

For a long time, I assumed local AI was reserved for people with absurd workstation budgets and datacenter-grade GPUs.

Then I started seeing people online claiming they were running local language models on hardware that frankly had no business doing this kind of work. Naturally, I became skeptical. So of course, I decided to try it myself.

At the time, my machine was running an RTX 2060 with 6GB of VRAM. Not exactly the kind of hardware people usually put in flashy YouTube thumbnails with titles like:

“THE FUTURE OF AI IS HERE!!!!!”

Still, I figured if the entire machine exploded, at least I’d learn something. My wife just told me not to burn down the house.

The Goal

I wasn’t trying to build sentient machine intelligence in my office. I’ve seen Terminator, too, and far more evil engineers than I are already working on that anyway. I just wanted to see if useful local AI workloads were actually practical on consumer hardware.

Not theoretically. Not “technically if you sacrifice a goat under a full moon.”

Actually usable.

…Although I’m still not ruling out the goat.

The Initial Learning Curve

The first thing I discovered is that the local AI ecosystem has approximately the same level of user friendliness as a pile of extension cords thrown into a washing machine.

At the time, I barely knew what GGUF meant, how quantization worked, why model sizes mattered, or why everyone seemed emotionally attached to llama.cpp.

Every tutorial assumed different software, different folder structures, different launch commands, different opinions about what was “obviously” the correct setup.

It’s overwhelming.

It’s also completely normal open source behavior.

Eventually I got llama.cpp running with CUDA acceleration and loaded my first model.

That moment was honestly kind of surreal.

And then, I too, caught feelings for llama.cpp.

What Actually Worked

To my surprise, the 2060 was far more capable than I expected.

Hardly infinite capability. Not magic. But usable. I had tokens generating, and I was actually getting results.

Smaller quantized models were completely workable, especially for:

  • experimentation,
  • basic coding assistance,
  • local testing,
  • and infrastructure learning.

The biggest realization was that quantization and model compression change the entire conversation. They are what allows all of this to actually work.

At the end of the day, it’s just math and memory management.

Very aggressive math and memory management, but still.

What Didn’t Work

The internet occasionally gives the impression that any GPU can run any model if you just believe in yourself hard enough.

This is not entirely accurate. Or even partially accurate.

The 2060 absolutely had limits:

  • VRAM ceilings arrived quickly
  • larger context windows hurt
  • bigger models became painfully slow
  • memory spillover into system RAM was brutal
  • loading times escalated rapidly

I learned fast that “can technically load,” and “is actually usable,” are two completely different categories. That distinction matters.

A lot.

The Important Realization

The most important takeaway wasn’t “the RTX 2060 is secretly a supercomputer.”

It’s not.

The important realization was that local AI experimentation has crossed a threshold where normal enthusiast hardware can now meaningfully participate.

That changes everything.

Because once experimentation becomes accessible:

  • workflows emerge,
  • tooling improves,
  • understanding compounds,
  • and the systems stop feeling magical.

They start feeling engineerable. That was probably the moment Pacific Wharf really started, even if I didn’t realize it at the time.

Naturally, This Escalated

Of course, once I proved local inference was possible, the project immediately spiraled into:

  • larger models,
  • better GPUs,
  • SDXL,
  • ComfyUI,
  • LoRA training,
  • WAN video generation,
  • benchmarking,
  • VRAM bottlenecks,
  • UPS load problems,
  • and a growing collection of folders with names that increasingly resemble industrial accidents.

In other words, this escalated exactly the way I should have expected.

Current Thoughts

The RTX 2060 is not a miracle machine. But, it proved something important: local AI has become accessible long before I thought it ever would.

And once I realized that, the entire field became much more interesting.

Now, about that goat…

Rights notice: Editorial content and published media for this entry are © 2026 Pacific Wharf. All rights reserved. Site code and layout are licensed separately under MIT.

Back to tech blog