Om Malik is a San Francisco-based writer, photographer, and investor.
It is late April 2026, and you cannot walk into an Apple Store and pick up the Mac you want.
A Mac mini with 64GB of RAM, ordered today, ships in sixteen to eighteen weeks. A Mac Studio with 256GB of RAM ships in four to five months. The 128GB and 256GB Mac Studio configurations are listed as “currently unavailable” on Apple’s online store. Apple removed the 512GB Mac Studio option entirely earlier this year. As of last week, even the base $599 Mac mini is sold out.
Have you wondered why?
The easy answers: a global memory shortage, thanks to the AI boom, and the fact that Apple makes devices that are good for AI work.
Both are true. And yet, that is not the whole story.
For instance, a maxed-out M5 Max MacBook Pro with 128GB of RAM and 2TB of storage ships in ten to fifteen days, while the Mac mini with 64GB of RAM does not ship until August. Why? Simply put, Apple makes gobs more money from its laptops, especially the high-end machines, and it is saving its memory resources for them.
The desktop Mac mini is getting hit by demand like never before. Even Apple wasn’t expecting the upsurge driven by the popularity of edge-AI sensations like OpenClaw, as I explained in my piece, Lobster Boil.
Macs are hot because they are good for AI. Make that very good for AI.
The question is which part of the device is good for AI. Apple’s press release wants you to look at the new super cores, the new performance cores, the next-generation GPU with a Neural Accelerator in every core, and the new Fusion Architecture I wrote about last month. They are not sold out.
What is sold out is the memory, and believe it or not, memory is as crucial to AI as are those uber-powerful GPUs made by Nvidia and TPUs by Google.
We have spent years talking about CPUs and GPUs while treating memory as a footnote, especially when it comes to Apple Silicon. The reason: you can’t market memory as sexy, but more CPU cores and more GPU cores pack a greater marketing punch.
Let me elaborate on why memory matters so much in edge AI, and why it will matter even more as edge devices proliferate, from robots to glasses to neck pendants. Memory, in a way, is a bridge AI needs as much as it needs low-latency network connectivity.
Think of a large language model as a giant warehouse of numbers. A 70-billion-parameter model is a warehouse with 70 billion individual numbers, called parameters, that the model learned during training. To produce one word of output, the chip walks through the entire warehouse, reads every number, multiplies it against your input, and adds things up. To produce the next word, it walks through the whole warehouse again. Every word streaming back at you on the screen is one full pass through the warehouse.
Memory bandwidth is how fast you can walk through the warehouse. The bigger the warehouse, the longer each pass takes. The faster the bandwidth, the more passes per second. That is the whole game.
The numbers are not arbitrary. A 70-billion-parameter model, compressed to four-bit precision, is about 35 gigabytes. On a machine with 614 GB/s of memory bandwidth, that is roughly seventeen passes through the warehouse per second. So seventeen tokens per second, which feels like a conversation. On a machine with 100 GB/s, it is roughly three tokens per second. That feels like waiting.
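The back-of-the-envelope math above fits in a few lines of Python. The figures are the ones from this piece; real-world decode speeds land somewhat below this memory-bound ceiling, since no machine sustains its peak bandwidth:

```python
def tokens_per_second(params_billions: float, bits_per_param: int,
                      bandwidth_gb_s: float) -> float:
    """Memory-bound ceiling on decode speed: generating one token
    requires one full pass over the weights, so the ceiling is
    bandwidth divided by model size."""
    model_gb = params_billions * bits_per_param / 8  # 70B at 4-bit ≈ 35 GB
    return bandwidth_gb_s / model_gb

# 70B model, 4-bit quantized, at M5 Max-class bandwidth (614 GB/s)
print(round(tokens_per_second(70, 4, 614), 1))  # → 17.5
# Same model at 100 GB/s
print(round(tokens_per_second(70, 4, 100), 1))  # → 2.9
```

This is also why the “300 to 500 GB/s or more” threshold quoted below makes intuitive sense: at 350 GB/s the same 35 GB model yields about ten tokens per second, right at the edge of conversational speed.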
The semiconductor industry’s own panel discussion earlier this month put the threshold for usable edge AI at “300 to 500 GB/s or more.” This is not some engineer’s wish list. It is the floor below which local large language models do not feel real.
Apple is the only consumer hardware company shipping above that threshold in volume. The M5 Pro hits 307 GB/s. The M5 Max hits 460 GB/s in the 32-core GPU configuration and 614 GB/s in the 40-core configuration. The M5 Ultra, expected in the next Mac Studio refresh in October, will go higher.
For inference, the compute waits on the memory. The chip is mostly idle. The bottleneck is feeding it. Bandwidth and capacity are not specs alongside the CPU and the GPU. They are the specs.

Apple made the decision that delivered this advantage in November 2020.
The M1 shipped with memory packaged onto the system on a chip, shared between the CPU, GPU, and Neural Engine, with bandwidth that was unusual for a laptop. Most coverage focused on the CPU performance and the Rosetta translation story. The unified memory was treated as a curiosity, something Apple could do because it controlled the whole stack. I wrote at the time that the M1’s combination of rapid memory access, task-adaptive computing, and machine-learning architecture was the future. That was correct, but I underestimated how much of the future would turn out to be about the memory specifically.
AMD, Intel, Qualcomm, and NVIDIA kept memory separate from the processor. PCIe to discrete GPU memory. Socketed DDR for the CPU. The traditional architecture, which is good for upgradability and bad for AI inference, because every tensor that crosses the bus from CPU memory to GPU memory pays a tax in latency and bandwidth that a unified architecture does not pay.
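The size of that tax is easy to estimate. A minimal sketch, using the article's 35 GB model and assuming PCIe 4.0 x16 (~32 GB/s theoretical peak; my assumption, not a figure from the piece):

```python
# Hypothetical illustration of the discrete-memory "tax":
# weights and tensors must cross PCIe before the GPU can touch them.
PCIE4_X16_GB_S = 32.0   # theoretical peak; real throughput is lower (assumption)
MODEL_GB = 35.0          # 70B parameters at 4-bit precision

upload_seconds = MODEL_GB / PCIE4_X16_GB_S
print(f"one-time weight upload over PCIe: {upload_seconds:.2f} s")  # ≈ 1.09 s

# Any tensor that round-trips between CPU and GPU memory pays the same
# per-byte cost on every crossing. Unified memory pays it zero times:
# CPU, GPU, and Neural Engine read the same physical bytes.
```

The one-time upload is tolerable; the per-crossing cost on every intermediate tensor is what the unified architecture avoids entirely.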
Five years later, AMD has Strix Halo trying to copy the approach. Qualcomm has Snapdragon X. Intel is reorganizing around something similar. They are four to five years behind, because the decision was hard to walk back once the existing customers expected discrete memory and socketed RAM.
The Fusion Architecture extends the bet. By splitting the chip into two dies but preserving unified memory across the boundary, Apple solved the manufacturing wall without giving up the architectural advantage. M5 Max at 614 GB/s in a 14-inch laptop. M5 Ultra coming in the next Mac Studio. M6 generation built on the same Fusion approach, with more dies and more bandwidth. The roadmap is clear, and the lead is durable.
Apple did not win an AI race. Apple made a memory-architecture decision in 2020 that turned out to be the AI race, five years before there was an AI race.
The piece of this story almost no one writes about is what is actually running on these machines.
Apple Intelligence is weak. The gap between what M5 silicon can do and what Apple’s own software asks it to do is enormous. For instance, I use a tiny beta app called Typeahead. It runs on my three-year-old MacBook Pro M3 Max and uses the latest Alibaba Qwen and Google Gemma models to auto-complete sentences in any app, trained on my own data. I don’t think about what model is running. I just find it useful. By comparison, Apple Intelligence’s proofreading and writing features feel like a toddler playing with alphabet blocks.
That example shows what people are actually doing with their Macs. The reason the Mac mini is sold out has nothing to do with Apple Intelligence. It is sold out because we can run AI locally. Is it all-powerful and perfect? No. But we are less than a year removed from the launch of OpenClaw, which is showing consumers the possibilities of edge AI.
But wait, there is something else.
The “something else” is overwhelmingly open-weight models. An outsized share of the best open-weight models for local deployment come out of China. DeepSeek. Qwen, from Alibaba. Kimi, from Moonshot. Baichuan. Zhipu. The list keeps getting longer. (By the way, you can run all of these models on Apple Silicon using MLX, the machine-learning framework created by Apple’s research team.)
There is a structural reason for this. Chinese AI labs have operated under the October 2022 chip export controls and the rounds that followed. They cannot get the latest NVIDIA hardware in volume. So they have spent three years optimizing the model side of the equation, because that is the only side they can change.
There is a reason everyone in China is so hopped up on OpenClaw: they are all in on edge AI. I recently played around with a tiny $10 computer called PicoClaw, about a quarter the size of a Square card reader, and it can run a variant of OpenClaw connected to cloud models. That is what they are excited about in China. The American model, to put it simply, is to push cloud AI. “Sell more tokens” is the silent mission of Claude, Gemini, and ChatGPT. Elsewhere, they want to do things locally.
The interesting next step on edge AI is going to come with the introduction of Apple Intelligence powered by Gemini. I have been told that it will be some kind of hybrid. And with the memory capabilities of Apple devices, that could be something else.
April 27, 2026. San Francisco.