Meanwhile, the corresponding "non-standard" desktop PC is the Framework Desktop, which with the Ryzen AI Max+ 395 can use 120GB of its 128GB of RAM for the GPU: "How to Run a One Trillion-Parameter LLM Locally: An AMD Ryzen™ AI Max+ Cluster Guide" https://www.amd.com/en/developer/resources/technical-article...
The Framework Desktop motherboard does actually have a PCIe x4 slot, their case just doesn't expose it for whatever reason. But you can buy the board separately and put it in your own choice of Mini-ITX case which does.
That PCIe slot has low power output, not suitable for what people would plug into it. It's easier to cover it than run into support problems. It won't run a GPU directly, but it will run an OCuLink card, which allows use of an eGPU.
I can't imagine why you'd want to buy a beefy SoC with unified memory, only to have it host a discrete GPU through its narrow PCIe x4 interface. You'd be better off with a traditional CPU that supports a proper x16 slot.
Yeah, but all of this is pointless when RAM by itself is as expensive as two CPUs - if it's even in stock. AMD/Intel should focus on that first if they want to save their DIY business at all - which I'm starting to doubt they do.
Let me describe this in the simplest terms possible: you have speculators speculating about AI products. The speculators are not very smart when it comes to technology, and think RAM is RAM. There are at least three kinds of RAM that are important to this: DDR for system RAM, GDDR for GPUs, and HBM for high-density enterprise products, and they are not interchangeable; there is no one-die-fits-all solution.
So, these speculators are like "oh no, more GPUs require more RAM!", and then just start speculating on all RAM. Which of these RAMs do they actually need to worry about? Exclusively HBM, which is a minority in production; DDR and GDDR dominate production.
If you're into inference, and have older machines, you're buying Hxxx or Bxxx cards that use HBM, fit into dual slot x16 configurations, and you're jamming (optimally) 8 of them in. If you're into hardware that is newer, somewhere in the middle of the inference boom, you're using MXM cards. In either situation, the host machine has DDR, but if you're OpenAI, Anthropic, Microsoft, or Google, you're not building (more) inference machines like this.
The first two are buying Nvidia's all-in-one SBC solution: unified HBM, an onboard ARM CPU to babysit the dual GPUs, its own dual QSFP network controller that can do RDMA, etc. No DDR or GDDR involved. Any machines built before this platform are being phased out entirely.
Microsoft is doing the same, but with AMD's products, the MI series that co-locates Epyc-grade Zen 4/5 CCDs with CDNA compute chiplets, running the entire thing off HBM, thus also unified and no DDR/GDDR needed. They, too, are phasing out machines older than this.
Google has a mix: they offer the Nvidia all-in-one SBCs as part of GCP for legacy inference tasks (so your stack that can't run on AMD yet still can run), they also offer the same MI products that Microsoft offers via Azure's inference product, and they have their own TPUs that some of Gemini runs on; the TPUs run on HBM, AFAICT. No DDR or GDDR here.
So, what do AMD or Intel do here? Let's say they waste fab time to make their own DRAM dies on the wrong process (TSMC and Intel Foundry do not have DRAM-optimized processes)... they would be producing DDR and GDDR for a market that almost has its entire demand met. Intel lacks the die-stacking technology required to build HBM, and TSMC I think can't do it for that many layers (HBM has 8 to 16 layers in current-gen stuff, IIRC).
Micron, for example, is already bringing two large factories online here in the US to meet the projected growth in demand for the next 20+ years. When these factories finally start producing, it will not change the minds of speculators: they still seem to think AI datacenters need RAM of any kind, and refuse to understand even the most basic nuance. Also, when they come online, HBM will be a minority product; the AI inference boom is still just a bump in the road for them.
Nvidia kinda screwed their consumer partners, btw: they no longer bundle the GDDR required for the card with the purchase of the die. There is a slight short-term bump in GDDR spot prices as partners build up war chests to push series 60 GPUs into production; once that is done, spot prices will return to normal (outside of the wild speculative manipulation).
One last thing: what about LPDDR, used by AMD Strix Halo and Apple stuff? Speculation seems to have not actually affected it. I consider it a sub-category of DDR (and some dies seem to work as either DDR or LPDDR as of DDR5, due to the merger of the specs by JEDEC), but since it isn't something you find in datacenters, it seems to have avoided speculation.
The Ryzen Max CPUs mentioned in the linked article? They use LPDDR. Doubling down on the Ryzen Max product line might be a brilliant move.
> The speculators are not very smart when it comes to technology, and think RAM is RAM. There are at least three kinds of RAM that are important to this: DDR for system RAM, GDDR for GPUs, and HBM for high-density enterprise products, and they are not interchangeable; there is no one-die-fits-all solution.
The commenter is also not very smart, and does not realize that the companies making the RAM can trade capacity of one kind for another, and that any re-tooling at current prices is still profitable.
The commenter also does not realize that this is also true for lines currently making SSDs.
They can trade capacity, but they generally aren't. The huge storage-only fabs owned by Samsung and Micron do runs that go for 9 to 12 months.
Flash chips haven't been speculated on nearly as hard, and are suffering from the same sort of weird lack-of-nuance. Samsung, for example, isn't reassigning capacity to meet some sort of phantom datacenter demand that isn't already there, generically, across all datacenters, AI or not.
A lot of the SSD price skyrocketing is "SSDs have RAM on them for cache", not "SSDs have flash chips, and they're both made at the same fabs"... which oddly affects low-end SSDs that don't have external cache.
To make it worse for the speculators who do understand this: because flash isn't some universal homogeneous product either, the flash chips that go into enterprise SSDs aren't the same ones that go into consumer SSDs.
The Big Three still aren't doing any major re-tasking of capacity, as actual global demand isn't outstripping supply any more than normal. There is no short-term problem to fix; speculators are just gonna have to stop hoarding toilet paper like it's the start of Covid.
Edit: Oh, and if you want to ask how AMD/TSMC or Intel solve this? They can't, for the same reason making their own in-house HBM isn't happening.
> what about LPDDR, used by AMD Strix Halo and Apple stuff? Speculation seems to have not actually affected it
Good luck actually finding them in stock with 128GB+ of RAM. I got a Strix laptop a while ago; now the price in the EU is technically the same, but there's no stock. Maybe in a month or three.
There is also the claw hype. And large Qwen3.5 models can run very well on DDR5 CPUs or Mac minis...
I find the panic over RAM prices to be overestimated. 32GB of DDR5 RAM is around $500, which is comparable to a 9800X3D. Sure, it sucks that it increased by around 4x, but when you factor in the overall price of a top-end PC at around $1000-2000, especially with the lion's share going to the GPU, the increase is marginal.
This only affects a very narrow slice of highly budget-conscious consumers trying to build high-end PCs at razor-thin margins.
$500 for 32GB is about $15/GB, which is a high we haven't seen since the mid-2000s. This is a big deal; it turns RAM, and to some extent storage (especially fast storage), into a massive economic bottleneck.
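As a quick sanity check on those figures (a trivial sketch using only the numbers quoted in this thread):

    # $/GB for the kit quoted above
    price_usd, capacity_gb = 500, 32
    print(f"${price_usd / capacity_gb:.2f}/GB")       # -> $15.62/GB
    # the ~4x increase mentioned upthread implies ~$3.9/GB before the spike
    print(f"${price_usd / capacity_gb / 4:.2f}/GB")   # -> $3.91/GB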
That is an insane amount of money for just 32GB of RAM! That's what we were paying back when it was hard to use more than 32-64GB in a desktop setting. These days, with all the Electron and Node bloatware, containers everywhere, and AI, 32GB doesn't get you far.
I disagree with you. The issue does not only affect a “very narrow slice” of consumers. https://www.techspot.com/news/111472-hp-warns-ram-now-makes-... A major brand is now suggesting that this is a “new normal” and one solution is to just offer systems with less ram. This is an issue when lots of modern software seems to expect an unending supply.
How much dedicated cache do these NPUs have? Because it's easy enough to saturate the memory bandwidth using the CPU for compute, never mind the GPU. Adding dark silicon for some special operations isn't going to make our memory bandwidth any faster.
I'm not sure what you mean - I think the mobile 300 series can do quad channel already for its APU at least. I'd assume it can do more but do you not need more slots beyond that?
It's streaming access, and no, not as far as I'm aware. APUs have always been hilariously bottlenecked on memory bandwidth as soon as the task actually needed to pull in data. The only exception I know of is the PS5, because it uses GDDR instead of desktop memory.
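A rough sketch of why bandwidth is the ceiling for local LLM inference on these APUs: each generated token has to stream every active weight through the GPU once, so the token rate is bounded by bandwidth over model size. Strix Halo's ~256 GB/s is from the published spec; the model size is an illustrative assumption:

    # Back-of-envelope token-rate ceiling for bandwidth-bound inference.
    bandwidth_gbs = 256   # Strix Halo: 256-bit LPDDR5X-8000 ~= 256 GB/s (spec)
    model_gb = 40         # assumed: a ~70B-parameter model at 4-bit quantization
    # Every token streams all active weights once, so bandwidth sets the cap:
    print(f"~{bandwidth_gbs / model_gb:.1f} tokens/s upper bound")  # ~6.4 tokens/s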
I guess it depends on what games you play. I have an AI Max 395 (Framework Desktop) and it runs every game in my library flawlessly. I'm sure if I played this year's most resource-intensive games it might stutter, but I don't. For me, it's an amazing low-power mini PC doing triple duty as a gaming PC, development box, and host for my self-hosted services for the rest of the house.
Yeah, I have one. People buying the Framework Desktop mainboard aren't buying it just for gaming; there are better and far cheaper options for that. What this does is everything, though: good enough for 1440p gaming, a powerful 16c/32t CPU, and it can run LLMs. An SFF main PC that can do everything in a tiny space is a win.
That is already the case with datacenter "GPUs". An A100, MI300, or Intel PVC/Gaudi does not have useful graphics performance or capabilities. Coprocessors à la NPU/VPU are also on the rise again for CPUs.
Yes, this has already been the case for years on mobile devices, CoPilot+ PC design requires this approach as well.
Additionally, GPUs are going back to the early days, by becoming general purpose parallel compute devices, where you can use the old software rendering techniques, now hardware accelerated.
Even the latest NVIDIA Blackwell GPUs are general purpose, albeit with negligible "graphics" capabilities. They can run fairly arbitrary C/C++ code with only some limitations, and the area of the chip dedicated to matrix products (the "tensor units") is relatively small: less than 20% of the area!
Conversely, the Google TPUs dedicate a large area of each chip to pure tensor ops, hence the name.
This is partly why Google's Gemini is 4x cheaper to serve than OpenAI's GPT-5 models.
Jensen Huang has said in recent interviews that he stands by the decision to keep the NVIDIA GPUs more general purpose, because this makes them flexible and able to be adapted to future AI designs, not just the current architectures.
That may or may not pan out.
I strongly suspect that the winning chip architecture will have about 80% of its area dedicated to tensor units, very little onboard cache, and model weights streamed in from High Bandwidth Flash (HBF). This would be dramatically lower power and cost compared to the current hardware that's typically used.
Something to consider is that as the size of the matrices in a model scales up, the compute needed to perform matrix multiplications goes up as the cube of their size, but the other miscellaneous operations such as softmax, ReLU, etc. scale up only linearly with the size of the vectors being multiplied.
Hence, as models scale into the trillions of parameters, the matrix multiplications ("tensor" ops) dominate everything else.
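A toy FLOP count makes that scaling argument concrete (the constants are rough assumptions, not profiled numbers):

    # Multiplying two n x n matrices costs ~2*n^3 FLOPs; elementwise ops like
    # softmax/ReLU over the n-length vectors involved cost only ~O(n) each.
    for n in (1_000, 10_000, 100_000):
        matmul_flops = 2 * n ** 3
        elementwise_flops = 10 * n    # generous constant for the misc. ops
        print(f"n={n:>7,}: matmul / elementwise ~ {matmul_flops / elementwise_flops:.1e}")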
AMD marketing is hoping the "AI" branding is a positive. Anecdotally, I know many consumers who are not sold on AI. This branding could actually hurt sales.
We are dealing with a hype, but the reality is that AI will change everything we do. Local models will start being helpful in [more] unobtrusive ways. Machines with decent local NPUs will stay usable for longer before they feel too slow.
> the reality is that AI will change everything we do
Your true-believer convictions don't matter here. Those AI accelerators are merely marketing stunts. They won't help your local inference because they are not general-purpose enough for that, they are too weak to be impactful, most people won't ever run local inference because it sucks and is a resource hog most can't afford, and it goes against the interests of those scammy, unprofitable corporations who are selling us LLMs as AI as the silver bullet to every problem and got us there in the first place (they are already successful in that, by making computing unaffordable). There's little to no economic or functional meaning to those NPUs.
> most people won't ever run local inference because it sucks and is a resource hog most can't afford
a) Local inference for chats sucks. Using LLMs for chatting is stupid though.
b) Local inference is cheap if you're not selling a general-purpose chatbot.
There's lots of fun stuff you can get with a local LLM that previously wasn't economically possible.
Two big ones are gaming (for example, text adventure games or complex board games like Magic: The Gathering) and office automation (word processors, Excel tables).
Your comment is almost completely irrelevant to what the parent is saying. "AI will change everything we do" has nothing to do with "This new chip, along with bloat from Windows, enables new workflows for you". If you've been paying attention, you'd know that the NPUs in these new CPUs have barely made any difference from a consumer's perspective.
For some people, maybe. I don't want to use local AI, and an NPU will be dead weight for me. Can't imagine a single task in my workflow that would benefit from AI.
It's similar to performance/efficiency cores. I don't need power efficiency, and I'd actually buy a CPU that doesn't make that distinction.
> Can't imagine a single task in my workflow that would benefit from AI.
You don't do anything involving realtime image, video, or sound processing? You don't want ML-powered denoising and other enhancements for your webcam, live captions/transcription for video, OCR allowing you to select and copy text out of any image, object and face recognition for your photo library enabling semantic search? I can agree that local LLMs aren't for everybody—especially the kind of models you can fit on a consumer machine that isn't very high-end—but NPUs aren't really meant for LLMs, anyways, and there are still other kinds of ML tasks.
> It's similar to performance/efficiency cores. I don't need power efficiency, and I'd actually buy a CPU that doesn't make that distinction.
Do you insist that your CPU cores must be completely homogeneous? AMD, Intel, Qualcomm and Apple are all making at least some processors where the smaller CPU cores aren't optimized for power efficiency so much as maximizing total multi-core throughput with the available die area. It's a pretty straightforward consequence of Amdahl's Law that only a few of your CPU cores need the absolute highest single-thread performance, and if you have the option of replacing the rest with a significantly larger number of smaller cores that individually have most of the performance of the larger cores, you'll come out ahead.
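A minimal sketch of that Amdahl's Law argument, with made-up performance and area numbers purely for illustration:

    # Compare 8 big cores vs (roughly the same die area) 2 big + 12 small cores.
    def speedup(p, big, small, n_big, n_small):
        # p = parallel fraction; the serial part runs on one big core.
        serial_time = (1 - p) / big
        parallel_time = p / (n_big * big + n_small * small)
        return 1 / (serial_time + parallel_time)

    print(speedup(0.9, 1.0, 0.0, 8, 0))    # 8 big cores      -> ~4.7x
    print(speedup(0.9, 1.0, 0.7, 2, 12))   # 2 big + 12 small -> ~5.4x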
None of what I listed was in any way specific to "content creators". They're not the only ones who participate in video calls or take photos.
And on the platforms that have a NPU with a usable programming model and good vendor support, the NPU absolutely does get used for those tasks. More fragmented platforms like Windows PCs are least likely to make good use of their NPUs, but it's still common to see laptop OEMs shipping the right software components to get some of those tasks running on the NPU. (And Microsoft does still seem to want to promote that; their AI PC branding efforts aren't pure marketing BS.)
The issue is that the consumer strongly associates "AI" with LLMs specifically. The fact that machine learning is used to blur your background in a video call, for example, is irrelevant to the consumer and isn't thought of as AI.
Never wanted to do high-quality voice recognition? No need for face/object detection at near-instant speed for your photos, or embedding-based indexing and RAG for your local documents with free-text search where synonyms also work? All locally, in real time, with minimal energy use.
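For a flavor of that embedding-based indexing use case, a minimal local semantic-search sketch; the `embed()` helper here is a hypothetical stand-in for whatever local embedding model an NPU would accelerate:

    # Minimal local semantic search via cosine similarity over embeddings.
    import numpy as np

    def embed(texts):                      # stand-in: any local embedding model
        rng = np.random.default_rng(0)     # fake vectors just to keep this runnable
        return rng.normal(size=(len(texts), 384))

    docs = ["tax return 2023.pdf", "holiday in Lisbon", "insurance policy"]
    index = embed(docs)
    index /= np.linalg.norm(index, axis=1, keepdims=True)   # normalize rows

    query = embed(["travel photos"])[0]
    query /= np.linalg.norm(query)
    best = int(np.argmax(index @ query))   # highest cosine similarity wins
    print("match:", docs[best])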
That is fine. Most ordinary users can benefit from these very basic use cases which can be accelerated.
I guess people also said this about video encoding acceleration, and now they use it on a daily basis, for video conferencing, for example.
Those use cases are at least 5, if not 10, years out. They require software support, which won't come until a significant part of the PC market has the necessary hardware for it. Until then, paying extra for the hardware is foolish.
This will only come if Windows 12 requires an NPU and most of the old hardware is decommissioned.
Also similar to GPU + CPU on the same die, yet here we are. In a sense, AI has already been in every x86 CPU for many years, and you already benefit from using it locally (branch prediction in modern processors is ML-based).
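For the curious, the ML in question is typically a perceptron predictor (AMD has publicly described Zen's branch prediction as perceptron-based); a toy sketch of the idea, with made-up sizes and thresholds:

    # Toy perceptron branch predictor: learn branch outcomes from global history.
    HISTORY = 8
    THRESHOLD = 14
    weights = [0] * (HISTORY + 1)      # index 0 is the bias weight
    history = [1] * HISTORY            # recent outcomes, each -1 or +1

    def predict():
        y = weights[0] + sum(w * h for w, h in zip(weights[1:], history))
        return y, (1 if y >= 0 else -1)

    def train(outcome):
        y, pred = predict()
        if pred != outcome or abs(y) <= THRESHOLD:   # perceptron update rule
            weights[0] += outcome
            for i in range(HISTORY):
                weights[i + 1] += outcome * history[i]
        history.pop(0)
        history.append(outcome)

    correct = 0
    for i in range(1000):              # a strictly alternating branch pattern
        outcome = 1 if i % 2 == 0 else -1
        correct += predict()[1] == outcome
        train(outcome)
    print(f"accuracy: {correct / 1000:.0%}")   # learns the pattern quickly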
> Also similar to GPU + CPU on the same die, yet here we are.
I think the overall trend is now moving somewhat away from having the CPU and GPU on one die. Intel's been splitting things up into several chiplets for most of their recent generations of processors, AMD's desktop processors have been putting the iGPU on a different die than the CPU cores for both of the generations that have an iGPU, their high-end mobile part does the same, even NVIDIA has done it that way.
Where we still see monolithic SoCs as a single die is mostly smaller, low-power parts used in devices that wouldn't have the power budget for a discrete GPU. But as this article shows, sometimes those mobile parts get packaged for a desktop socket to fill a hole in the product line without designing an entirely new piece of silicon.
So I’ve grown a lot warmer to believing that AI can be a better programmer than most programmers these days. That is a low bar :). The current approach to AI can definitely change how effective a programmer is, but then it is up to the market to decide whether we need so many programmers. The talk about how each company is going to keep all the existing programmers and just expect productivity multipliers is just what execs are currently telling programmers; that might change when the same execs are talking to shareholders, etc.
But does this extrapolate to the current way of doing AI ending up in normal life in a good way that turns out to be popular? The way Microsoft etc. is trying to put AI in everything kind of says no, it isn't actually what users want.
I’d like voice control on my PC or phone. That’s a use for these NPUs. But I imagine it is like AR: what we all want, until it arrives and it’s meh.
Indeed, I was buying a laptop for my wife, and she was viscerally against "Ryzen AI": "I don't want a CPU with built-in AI to spy on my screen all the time!"
As far as I can find, Plex does not support AMD iGPU for transcoding. Jellyfin will work, but support seems rather spotty. For other AI/ML work, it seems like ROCm is up and coming, but support - e.g. for Frigate object detection - is still a work in progress, especially for newer chips.
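For what it's worth, Jellyfin's AMD iGPU path is just VAAPI under the hood; a hedged sketch of the equivalent ffmpeg invocation, assuming Linux with Mesa drivers and that the iGPU's render node is /dev/dri/renderD128 (file names and bitrate are placeholders):

    # 4K -> 1080p transcode, scaled and encoded on the AMD iGPU via VAAPI
    # (decode stays on the CPU in this minimal form).
    import subprocess

    subprocess.run([
        "ffmpeg",
        "-vaapi_device", "/dev/dri/renderD128",    # iGPU render node
        "-i", "input_4k.mkv",
        "-vf", "format=nv12,hwupload,scale_vaapi=w=1920:h=1080",
        "-c:v", "h264_vaapi",                      # hardware encoder
        "-b:v", "8M",
        "-c:a", "copy",                            # pass audio through untouched
        "output_1080p.mkv",
    ], check=True)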
Is it actually using the iGPU, or just "brute forcing" it?
I've put it in quotes because the effort required from these chips for streaming transcoding is so low these days that "brute force" makes it sound like more effort than it really is.
Putting Strix Halo into the AM5 socket would make no sense. Half the memory controllers would be orphaned and the GPU would be severely bandwidth-starved (assuming that the memory controller on Strix Halo actually supports DDR5 and not just LPDDR5).
Yeah the next generation of Strix Halo is what would get me excited. I think right now TSMC has no capacity, so maybe we have to wait another year. Kinda ironic that all CPU/RAM capacity is being sold to LLM companies, and as a result we can't get the hardware needed for good local LLMs.
> all CPU/RAM capacity is being sold to LLM companies, and as a result we can't get the hardware needed for good local LLMs.
Yeah... ironic, I guess. It's as if they've realised that it's only a matter of time until we get a "good enough" FOSS model that runs on consumer hardware. The fact that such a thing would demolish their entire business of getting VC hype while giving out their service at a loss surely got lost on them. Surely they and Nvidia have not realised that the only thing that could stop this is making good hardware unreachable for anything smaller than a massive corp.
Mark my words: in less than one year, we'll probably get something akin to an Opus 4.6 FOSS model. China is putting as much money into that as they can, because they know it would crash the US economy, which is in the green only thanks to big tech pumping up AI. China wants Trump either gone or neutered as soon as possible, and they know they can do that by making Republicans as unelectable as possible - something that will probably happen if the economy crashes and a recession hits.
The Ryzen AI line is actually great if deployed to an entire org as the bottom tier, as it guarantees every device has a 50 TOPS NPU. We deploy local software at $STARTUP, and this makes deployment to a Windows corp more predictable.
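As a sketch of what that predictability can look like in practice: ONNX Runtime lets you prefer the Ryzen AI NPU (Vitis AI execution provider) and fall back to CPU when it's absent. The model path is hypothetical, and provider availability depends on the installed Ryzen AI software stack:

    # Prefer the NPU, fall back to CPU - every target still runs the model.
    import onnxruntime as ort

    available = ort.get_available_providers()
    preferred = [p for p in ("VitisAIExecutionProvider",   # Ryzen AI NPU
                             "CPUExecutionProvider")       # universal fallback
                 if p in available]
    session = ort.InferenceSession("model.onnx", providers=preferred)
    print("running on:", session.get_providers()[0])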
> This makes them AMD’s first desktop chips to qualify for Microsoft’s Copilot+ PC label, which enables a handful of unique Windows 11 features like Recall and Click to Do.
Microsoft: "Friendship ended with Intel, now AMD is my best friend"
Actually, it is Qualcomm, as they keep trying to push for ARM, but due to the way the PC ecosystem has been going since the IBM PC clones started, no one is rushing out to adopt ARM.
> Micron, for example, is already bringing two large factories online here in the US
Micron killed Crucial to focus on AI.
> $500 for 32GB is about $15/GB, which is a high we haven't seen since the mid-2000s
Did you adjust for inflation?
All 4 of my "top end PCs" have 128GB of RAM. My server (I self-host everything) has 512GB. Lucky for me, all were bought before this insanity.
> model weights streamed in from High Bandwidth Flash (HBF)
I don't know much about it, but my mental model is that for transformers you need random access to billions of parameters.
> First wave of Ryzen AI desktop CPUs targets business PCs rather than DIYers.
I can see separate cards for datacenter use, but for consumers they will probably come on the same SoC as the CPU.
Pretty much every hardware vendor has an NPU
> the compute needed to perform matrix multiplications goes up as the cube of their size,
Are they really not using even Strassen multiplication?
> Those AI accelerators are merely marketing stunts.
Do you have a real argument, especially a technical one, that you can contribute?
Ask your friends or a small business owner if they are going to spend $1k on a new laptop because "there's lots of fun stuff".
For office automation, you'll get a lot more mileage with Claude and similar.
> Using LLMs for chatting is stupid though.
/r/SillyTavernAI would disagree with you.
We are talking about NPUs here.
> You don't do anything involving realtime image, video, or sound processing?
Nothing that's not already hardware accelerated by the GPU or trivial to do on the CPU.
> You don't want ML-powered denoising and other enhancements for your webcam
Not really.
> live captions/transcription for video
Not really, since they're always bad. Maybe if it's really good, but I haven't seen that yet.
> OCR allowing you to select and copy text out of any image
Yet to see this implemented well, but it would be a nice QOL feature, but not one I'd care all that much about being absent.
> object and face recognition for your photo library enabling semantic search?
Maybe for my old vacation photos, but that's a solid 'eh'. Nice to have, wouldn't care if it wasn't there.
Besides, most of what you mentioned doesn't run on the NPU anyway; those are usually standard GPU workloads.
> the reality is that AI will change everything we do
Toilets also changed everything we do and are helpful in unobtrusive ways, but that won't make the "Ryzen Crapper" a customer favorite.
> the effort required from these chips for streaming transcoding is so low these days
What's your source for this? Transcoding without acceleration is incredibly expensive, especially for 4K content, and especially for 4K HDR content.
Even a single 4K HDR -> 1080p transcode takes a huge amount of resources.
The Asustor Lockerstor4 Gen3 has a Quad-Core Ryzen Embedded V3C14 and cannot transcode 4K content.
Meanwhile, an old Kaby Lake Intel chip does so just fine, but only because its QSV can handle H.265.
My homelab setup runs out of memory much faster than it does CPU cores.
I wanted a better Strix Halo (which has 128GB of unified RAM and 40 CUs on the Radeon 8060S iGPU).
This looks like the normal Ryzen mobile chips, but with fewer CUs.