Om Malik is a San Francisco-based writer, photographer, and investor.
It is late April 2026, and you cannot walk into an Apple Store and pick up the Mac you want.
A Mac mini with 64GB of RAM, ordered today, ships in sixteen to eighteen weeks. A Mac Studio with 256GB of RAM ships in four to five months. The 128GB and 256GB Mac Studio configurations are listed as “currently unavailable” on Apple’s online store. Apple removed the 512GB Mac Studio option entirely earlier this year. As of last week, even the base $599 Mac mini is sold out.
Have you wondered why?
The easy answers: a global memory shortage, thanks to the AI boom, and the fact that Apple makes devices that are good for AI work.
Both are true. And yet, that is not the whole story.
For instance, a maxed-out M5 Max MacBook Pro with 128GB of RAM and 2TB of storage ships in ten to fifteen days, while the Mac mini with 64GB of RAM does not ship until August. Why? Simply put, Apple makes gobs more money from its laptops, especially the high-end machines, and it is saving its memory resources for them.
The desktop Mac mini is getting hit by demand like never before. Even Apple wasn’t expecting the upsurge driven by the popularity of edge-AI sensations like OpenClaw, as I explained in my piece, Lobster Boil.
Macs are hot because they are good for AI. Make that very good for AI.
The question is which part of the device is good for AI. Apple’s press release wants you to look at the new super cores, the new performance cores, the next-generation GPU with a Neural Accelerator in every core, and the new Fusion Architecture I wrote about last month. They are not sold out.
What is sold out is the memory, and believe it or not, memory is as crucial to AI as are those uber-powerful GPUs made by Nvidia and TPUs by Google.
We have spent years talking about CPUs and GPUs while treating memory as a footnote, especially when it comes to Apple Silicon. The reason: you can’t market memory as sexy, but more CPU cores and more GPU cores pack a greater marketing punch.
Let me elaborate on why memory matters so much in edge AI, and why it will matter even more as edge devices proliferate, from robots to glasses to neck pendants. Memory, in a way, is a bridge AI needs as much as it needs low-latency network connectivity.
Think of a large language model as a giant warehouse of numbers. A 70-billion-parameter model is a warehouse with 70 billion individual numbers, called parameters, that the model learned during training. To produce one word of output, the chip walks through the entire warehouse, reads every number, multiplies it against your input, and adds things up. To produce the next word, it walks through the whole warehouse again. Every word streaming back at you on the screen is one full pass through the warehouse.
Memory bandwidth is how fast you can walk through the warehouse. The bigger the warehouse, the longer each pass takes. The faster the bandwidth, the more passes per second. That is the whole game.
The numbers are not arbitrary. A 70-billion-parameter model, compressed to four-bit precision, is about 35 gigabytes. On a machine with 614 GB/s of memory bandwidth, that is roughly seventeen passes through the warehouse per second. So seventeen tokens per second, which feels like a conversation. On a machine with 100 GB/s, it is roughly three tokens per second. That feels like waiting.
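The back-of-the-envelope math above fits in a few lines of Python. The figures are the ones from this piece; real-world decode speeds land somewhat below this memory-bound ceiling, since no machine sustains its peak bandwidth:

```python
def tokens_per_second(params_billions: float, bits_per_param: int,
                      bandwidth_gb_s: float) -> float:
    """Memory-bound ceiling on decode speed: generating one token
    requires one full pass over the weights, so the ceiling is
    bandwidth divided by model size."""
    model_gb = params_billions * bits_per_param / 8  # 70B at 4-bit ≈ 35 GB
    return bandwidth_gb_s / model_gb

# 70B model, 4-bit quantized, at M5 Max-class bandwidth (614 GB/s)
print(round(tokens_per_second(70, 4, 614), 1))  # → 17.5
# Same model at 100 GB/s
print(round(tokens_per_second(70, 4, 100), 1))  # → 2.9
```

This is also why the “300 to 500 GB/s or more” threshold quoted below makes intuitive sense: at 350 GB/s the same 35 GB model yields about ten tokens per second, right at the edge of conversational speed.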
The semiconductor industry’s own panel discussion earlier this month put the threshold for usable edge AI at “300 to 500 GB/s or more.” This is not some engineer’s wish list. It is the floor below which local large language models do not feel real.
Apple is the only consumer hardware company shipping above that threshold in volume. The M5 Pro hits 307 GB/s. The M5 Max hits 460 GB/s in the 32-core GPU configuration and 614 GB/s in the 40-core configuration. The M5 Ultra, expected in the next Mac Studio refresh in October, will go higher.
For inference, the compute waits on the memory. The chip is mostly idle. The bottleneck is feeding it. Bandwidth and capacity are not specs alongside the CPU and the GPU. They are the specs.

Apple made the decision that delivered this advantage in November 2020.
The M1 shipped with memory packaged onto the system on a chip, shared between the CPU, GPU, and Neural Engine, with bandwidth that was unusual for a laptop. Most coverage focused on the CPU performance and the Rosetta translation story. The unified memory was treated as a curiosity, something Apple could do because it controlled the whole stack. I wrote at the time that the M1’s combination of rapid memory access, task-adaptive computing, and machine-learning architecture was the future. That was correct, but I underestimated how much of the future would turn out to be about the memory specifically.
AMD, Intel, Qualcomm, and NVIDIA kept memory separate from the processor. PCIe to discrete GPU memory. Socketed DDR for the CPU. The traditional architecture, which is good for upgradability and bad for AI inference, because every tensor that crosses the bus from CPU memory to GPU memory pays a tax in latency and bandwidth that a unified architecture does not pay.
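The size of that tax is easy to estimate. A minimal sketch, using the article's 35 GB model and assuming PCIe 4.0 x16 (~32 GB/s theoretical peak; my assumption, not a figure from the piece):

```python
# Hypothetical illustration of the discrete-memory "tax":
# weights and tensors must cross PCIe before the GPU can touch them.
PCIE4_X16_GB_S = 32.0   # theoretical peak; real throughput is lower (assumption)
MODEL_GB = 35.0          # 70B parameters at 4-bit precision

upload_seconds = MODEL_GB / PCIE4_X16_GB_S
print(f"one-time weight upload over PCIe: {upload_seconds:.2f} s")  # ≈ 1.09 s

# Any tensor that round-trips between CPU and GPU memory pays the same
# per-byte cost on every crossing. Unified memory pays it zero times:
# CPU, GPU, and Neural Engine read the same physical bytes.
```

The one-time upload is tolerable; the per-crossing cost on every intermediate tensor is what the unified architecture avoids entirely.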
Five years later, AMD has Strix Halo trying to copy the approach. Qualcomm has Snapdragon X. Intel is reorganizing around something similar. They are four to five years behind, because the decision was hard to walk back once the existing customers expected discrete memory and socketed RAM.
The Fusion Architecture extends the bet. By splitting the chip into two dies but preserving unified memory across the boundary, Apple solved the manufacturing wall without giving up the architectural advantage. M5 Max at 614 GB/s in a 14-inch laptop. M5 Ultra coming in the next Mac Studio. M6 generation built on the same Fusion approach, with more dies and more bandwidth. The roadmap is clear, and the lead is durable.
Apple did not win an AI race. Apple made a memory-architecture decision in 2020 that turned out to be the AI race, five years before there was an AI race.
The piece of this story almost no one writes about is what is actually running on these machines.
Apple Intelligence is weak. The gap between what M5 silicon can do and what Apple’s own software asks it to do is enormous. For instance, I use a tiny beta app called Typeahead. It runs on my three-year-old MacBook Pro M3 Max and uses the latest Alibaba Qwen and Google Gemma models to auto-complete sentences in any app, trained on my own data. I don’t think about what model is running. I just find it useful. By comparison, Apple Intelligence’s proofreading and writing features feel like a toddler playing with alphabet blocks.
That example shows what people are actually doing with their Macs. The reason the Mac mini is sold out has nothing to do with Apple Intelligence. It is sold out because we can run AI locally. Is it all-powerful and perfect? No. But we are less than a year removed from the launch of OpenClaw, which is showing consumers the possibilities of edge AI.
But wait, there is something else.
The “something else” is overwhelmingly open-weight models. An outsized share of the best open-weight models for local deployment come out of China. DeepSeek. Qwen, from Alibaba. Kimi, from Moonshot. Baichuan. Zhipu. The list keeps getting longer. (By the way, you can run all of these models on Apple Silicon using MLX, the machine-learning framework created by Apple’s research team.)
There is a structural reason for this. Chinese AI labs have operated under the October 2022 chip export controls and the rounds that followed. They cannot get the latest NVIDIA hardware in volume. So they have spent three years optimizing the model side of the equation, because that is the only side they can change.
There is a reason everyone in China is so hopped up on OpenClaw: they are all in on edge AI. I recently played around with a tiny $10 computer called PicoClaw, about a quarter the size of a Square card reader, and it can run a variant of OpenClaw connected to cloud models. That is what they are excited about in China. The American model, to put it simply, is to push cloud AI. “Sell more tokens” is the silent mission of Claude, Gemini, and ChatGPT. Elsewhere, they want to do things locally.
The interesting next step on edge AI is going to come with the introduction of Apple Intelligence powered by Gemini. I have been told that it will be some kind of hybrid. And with the memory capabilities of Apple devices, that could be something else.
April 27, 2026. San Francisco.