Running AI on My Mac: Why I Ditched ChatGPT for LM Studio (And Saved $240/Year)
TL;DR
LM Studio lets you run powerful AI models locally on Mac with Apple Silicon, offering complete privacy and no subscription costs, though it's slower than cloud alternatives.
Key Takeaways
- LM Studio runs AI models locally with zero data leaving your machine
- M4 Pro with 48GB RAM can run 30B parameter vision models like Qwen3-VL
- Vision models understand screenshots, diagrams, and UI mockups
- Costs $0 vs $240/year for ChatGPT Plus
- Trade-offs: slower responses (5-10 sec), requires storage for models
LM Studio lets you run large language models locally on Apple Silicon Macs — completely free, offline, and private. On an M4 Pro with 48GB RAM, the best models are Qwen3-30B for coding/analysis and Qwen3-VL-30B for vision tasks. You trade cloud speed (responses take 5-10 seconds) for zero cost, full privacy, and no dependency on OpenAI’s uptime.
I was knee-deep in a coding problem when ChatGPT went dark. The little error message mocked me. “Try again later,” it said. My deadline wasn’t going to wait.
That outage was annoying. But it got me thinking. Every time I use ChatGPT, my data flies to OpenAI’s servers. They log it. They train on it. Maybe that’s fine for “what’s a good pizza recipe.” But code snippets? Research notes? That felt wrong.
I needed something different. Something local. Something mine.
The Cloud Problem No One Talks About
Here’s the thing about cloud AI. It’s convenient. Type a question, get an answer. But convenience has a cost.
Your prompts live on someone else’s computer. They say they don’t use it for training anymore. Maybe they don’t. But can you be sure? What about government requests? Data breaches? Server logs?
I’m not paranoid. I just value control. My Mac has plenty of power. Why send my data across the internet when I can process it right here?
How I Found LM Studio
A friend mentioned LM Studio in passing. “Run AI models on your Mac,” he said. “Totally free.”
I was skeptical. Local AI sounded slow. Complicated. Probably worse than the cloud versions.
I downloaded it anyway. The install was simple. No account. No credit card. Just download and go.
The interface looked clean. Like a chat app, but with a model picker. I could browse thousands of open-source models. Download them. Run them locally.
No API keys. No monthly bills. No data leaving my machine.
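Beyond the chat window, LM Studio can also expose an OpenAI-compatible HTTP server on localhost (port 1234 by default, enabled from the app's developer tab), so your own scripts can query whatever model you have loaded. Here's a minimal Python sketch using only the standard library; the model name is a placeholder, not a requirement:

```python
import json
import urllib.request

# LM Studio's local server speaks the OpenAI chat-completions format.
# Default address; turn the server on from the app's developer tab first.
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(prompt: str, model: str = "qwen2.5-coder-14b") -> dict:
    """Build an OpenAI-style chat payload. The model name is just an
    example -- use whichever model you have loaded locally."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def ask(prompt: str) -> str:
    """POST the prompt to the local server and return the reply text."""
    payload = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Usage (requires LM Studio running with a model loaded):
#   print(ask("Explain Python's GIL in two sentences."))
```

Nothing in that request leaves your machine; it's a loopback call to the app.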
My M4 Pro Makes This Possible
I upgraded to an M4 Pro MacBook Pro earlier this year. 48GB of unified memory. At the time, I thought I’d gone overboard.
Turns out, that RAM is perfect for AI. Most people run smaller models. Maybe 8B parameters. Those work fine on 16GB machines.
But with 48GB? I can run Qwen3-VL-30B. That’s a big model. A smart model. And it can do something most AI tools can’t.
It can see.
Model Recommendations by RAM
Not sure which model to run? Here’s what works at different RAM levels:
| RAM | Best Models | Use Case |
|---|---|---|
| 16GB | Qwen-7B, Llama-8B, Mistral-7B | Basic chat, simple coding questions |
| 32GB | Qwen-14B, Llama-13B, Deepseek-Coder-6.7B | Complex reasoning, longer context windows |
| 48GB+ | Qwen3-VL-30B, Llama-70B (quantized), CodeLlama-34B | Vision models, advanced analysis, code generation |
More RAM means bigger models. Bigger models mean better reasoning. It’s that simple.
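As a rough sanity check before downloading, you can estimate a quantized model's memory footprint from its parameter count: bytes per weight is roughly the quantization bit width divided by 8, plus some overhead for the KV cache and runtime buffers. A sketch (the bits-per-weight values are approximate averages for common GGUF quants, and the overhead factor is my own fudge):

```python
def approx_model_ram_gb(params_billion: float, quant_bits: float = 4.5,
                        overhead: float = 1.2) -> float:
    """Rough RAM estimate for a quantized model.

    params_billion: parameter count in billions (30 for a 30B model)
    quant_bits:     effective bits per weight (Q4_K_M is roughly 4.5,
                    Q5_K_M roughly 5.5, Q8_0 roughly 8.5)
    overhead:       fudge factor for KV cache and runtime buffers
    """
    bytes_per_param = quant_bits / 8
    return params_billion * bytes_per_param * overhead

# A 30B model at ~4.5 bits/weight lands around 20 GB:
# comfortable on 48GB, hopeless on 16GB.
```

The exact numbers vary by architecture and context length, but the estimate is close enough to tell you whether a download is worth your time.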
Best Models for Apple Silicon Macs (2026)
| Mac Config | Best Model | Quant | Memory Use | Best For | Rating |
|---|---|---|---|---|---|
| M4 Pro 24GB | Qwen 2.5 Coder 14B | Q5_K_M | ~12GB | Coding, general chat | ⭐⭐⭐⭐⭐ |
| M4 Pro 24GB | Llama 3.1 8B | Q8_0 | ~10GB | Fast general purpose | ⭐⭐⭐⭐ |
| M4 Max 36GB | Qwen 2.5 32B | Q5_K_M | ~24GB | Best all-rounder | ⭐⭐⭐⭐⭐ |
| M4 Max 48GB | Llama 3.3 70B | Q4_K_M | ~42GB | Maximum capability | ⭐⭐⭐⭐⭐ |
| M4 Max 48GB | Qwen 2.5 VL 32B | Q5_K_M | ~24GB | Vision + text | ⭐⭐⭐⭐ |
| Any Mac 16GB+ | Phi-4 Mini 3.8B | Q8_0 | ~5GB | Light tasks, testing | ⭐⭐⭐ |
My daily driver for text work: Qwen 2.5 Coder 14B at Q5_K_M. It handles coding tasks, document analysis, and general questions without breaking a sweat, and everything stays on my machine.
Vision Models Are the Unexpected Part
Qwen3-VL is a vision-language model. Feed it text, it responds. Feed it an image, it understands that too.
Last week I got a weird error in my terminal. Red text everywhere. I took a screenshot. Pasted it into LM Studio. Asked “what’s wrong here?”
The model looked at my screenshot. Pointed out the exact line causing trouble. Explained why. Suggested a fix.
Here’s the kind of prompt that works:
Look at this terminal screenshot. The red error text starts with “TypeError”. What’s causing this and how do I fix it?
And the model responds with context-aware analysis:
The error “TypeError: Cannot read properties of undefined” on line 47 indicates you’re trying to access a property on a variable that wasn’t initialized. Check that `userData` is defined before accessing `userData.profile.name`. Add a null check or optional chaining: `userData?.profile?.name`.
This isn’t OCR. The model actually understands visual context. Diagrams. Charts. UI mockups. Code screenshots with syntax highlighting intact.
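If you're scripting against the local server instead of pasting into the chat UI, vision models accept images through the same OpenAI-style message format, with the screenshot embedded as a base64 data URL. A sketch under those assumptions; the model name is a placeholder, and this presumes the server is enabled with a vision-capable model loaded:

```python
import base64

def build_vision_request(image_bytes: bytes, question: str,
                         model: str = "qwen3-vl-30b") -> dict:
    """Build an OpenAI-style chat payload that attaches a screenshot as a
    base64 data URL. The model name is an example -- any vision-capable
    model loaded in LM Studio should accept this message shape."""
    data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    }

# Usage (assumes the local server is running):
#   payload = build_vision_request(open("error.png", "rb").read(),
#                                  "What's causing this TypeError?")
#   then POST payload to http://localhost:1234/v1/chat/completions
```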
I also run the smaller Qwen3-VL-8B for quick questions — it’s faster and good enough for most things. The 30B model is noticeably better for anything that requires multi-step reasoning or a longer context.
Most people with 16GB machines are running 7B or 8B models. With 48GB you can run 30B, and the quality difference is real.
Speed Comparison: Cloud vs Local
Let’s be honest about performance. Here’s what I measured:
| Task | ChatGPT | LM Studio (Qwen-30B) |
|---|---|---|
| Simple question | ~1 sec | ~5 sec |
| Code explanation (100 lines) | ~2 sec | ~10 sec |
| Image analysis | ~3 sec | ~15 sec |
| Long document summary | ~4 sec | ~20 sec |
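Your numbers will differ with model, quant, and prompt length, so it's worth timing your own setup. A small helper that reports the median of several runs; `ask` here stands in for any function that sends a prompt to the model and returns the reply (for example, a wrapper around the local server):

```python
import time
from statistics import median

def time_responses(ask, prompts, runs: int = 3) -> dict:
    """Time a model-querying callable over several prompts.

    `ask` is any function taking a prompt string and returning the reply.
    Returns median seconds per prompt so one slow outlier (like the first
    request after a model load) doesn't skew the numbers.
    """
    results = {}
    for prompt in prompts:
        samples = []
        for _ in range(runs):
            start = time.perf_counter()
            ask(prompt)
            samples.append(time.perf_counter() - start)
        results[prompt] = median(samples)
    return results
```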
The slower speed is the main trade-off. But here’s the thing—I’ve found it rarely matters for actual work. While waiting 10 seconds for a response, I’m reading the question I just asked. By the time I look up, the answer is there.
What I Actually Use It For
Every day looks different. Sometimes I’m researching a new library. I paste documentation. Ask questions. The model helps me understand faster than reading alone.
Other times I’m experimenting. Testing prompts. Trying different models. Seeing what works. LM Studio makes this easy. Switch models with one click.
I’ve used it for code reviews. Explaining legacy code. Brainstorming architecture. Even writing. If you’re curious how local models compare to cloud-based options for coding, I broke that down in my AI coding tools comparison. The AI isn’t perfect. But it’s always there. Always private.
Yesterday I analyzed a competitor’s UI design. Screenshot → LM Studio → detailed breakdown of their layout choices. All offline. No one tracking what I’m researching.
That’s the real win. Privacy isn’t about hiding. It’s about control.
Alternatives I Considered
LM Studio isn’t the only option. Here’s how it compares:
- Ollama – Command-line focused, no built-in GUI. Great for devs who prefer terminal. Excellent for automation and scripting.
- GPT4All – Simpler UI, fewer models, easier for beginners. Good starting point if LM Studio feels overwhelming.
- Jan.ai – Similar to LM Studio, newer project, currently has fewer model options. Worth watching.
I picked LM Studio for the model variety and vision model support. The interface is clean. The model selection is massive. And it just works.
The Honest Downsides
LM Studio isn’t perfect. Let me be real about the problems.
First, it’s slower than ChatGPT. Cloud models have massive GPU clusters. My Mac has… one M4 Pro. Responses take longer. Maybe 5-10 seconds instead of instant.
Second, you manage everything yourself. Want a new model? Download it. That’s a few gigabytes. Models pile up fast. You’ll need storage space.
Third, there’s a learning curve. Which model for which task? What’s the difference between 7B and 70B? You have to learn this stuff.
The interface is simpler than the web version of ChatGPT. No plugins. No browsing. Just you and the model.
For some tasks, cloud AI still wins. Web search? Up-to-date info? Yeah, ChatGPT is better there. And when AI-generated code fails in production, running models locally gives you more control over the fallout — something I explored in my take on vibe coding.
Why I Keep Using It Anyway
The practical answer: my data stays on my machine, it works offline, and I’m not paying per token. ChatGPT Plus is $20/month — $240/year. LM Studio is free, and the models are free. I bought the Mac for other reasons anyway.
There’s also the ecosystem thing. I’m not locked into one model or one provider. If Qwen releases something better next week, I download it. If I don’t trust a particular model for a particular task, I swap. That flexibility matters more than I expected when I started.
The pace of open-source model releases has been relentless. What counts as “good” keeps moving upward.
Getting Started
Go to lmstudio.ai. Download the app. Open it.
Click “Discover” to browse models. Search for “Qwen3-VL-8B” if you want to try vision. Or “Qwen-7B” for text-only.
Download a model. Wait. It’s big. Go make coffee.
Once downloaded, click “Load.” Pick your model. Start chatting.
That’s it. Seriously.
If you have a Mac with Apple Silicon, you’re golden. 16GB RAM minimum. More is better. Windows and Linux work too.
The models live at Hugging Face. Thousands of them. Some good. Some bad. LM Studio shows you ratings and downloads to help pick.
My Final Take
I still use ChatGPT. It’s better for anything that needs web access or current information. But for daily coding work, document analysis, and anything involving code I’d rather not send to an external server — LM Studio is what I reach for.
The M4 Pro with 48GB was expensive. Running local models is what made it feel worth it. The vision capability was the surprise — I didn’t expect that to be genuinely useful, and it is.
If you have Apple Silicon and care about where your data goes, it’s worth downloading and trying. The barrier is lower than it sounds.
Related Posts
- AI Coding Tools Compared (2026) — How local models stack up against cloud AI assistants
- The Truth About Vibe Coding — When AI-generated code works and when it bites you
- I Built Logwell for Self-Hosted Logging — PostgreSQL-native logging with Claude Code
- Full-Stack App on Cloudflare Workers — D1, Durable Objects, Queues, and AI parsing
- My Side Project Stack in 2026 — The full toolkit alongside LM Studio
Frequently Asked Questions
How much RAM do I need to run LM Studio?
Minimum 16GB for smaller 7B-8B models. 32GB+ recommended for 13B models. 48GB+ needed for 30B+ parameter models like Qwen3-VL-30B.
Is LM Studio really free?
Yes, completely free. No account, no credit card, no subscription. The models from Hugging Face are also free and open-source.
Can LM Studio analyze images and screenshots?
Yes, with vision-language models like Qwen3-VL. You can paste screenshots of code errors, UI designs, charts, or diagrams and the model will understand the visual context.
How does LM Studio compare to ChatGPT?
LM Studio is slower (5-10 seconds vs instant), works offline, has complete privacy, costs nothing, but lacks web browsing and real-time information. For daily coding and analysis work, it's highly capable.
What Mac models support LM Studio?
Any Mac with Apple Silicon (M1, M2, M3, M4). Performance scales with RAM and chip tier. M4 Pro/Max with 48GB+ RAM offers the best experience for large models.
Divanshu Chauhan (@divkix)
Software Engineer based in Tempe, Arizona, USA. More about divkix