Probably fun for those who already bought DDR5 memory... still kicking myself for not just pulling the trigger on that 128GB dual stick kit I looked at for $600 back in September. Now it's listed at $4k...
Meanwhile I hope my AM4 will chug along a few more years.
If you don't need 128GB, there are quality 64GB kits for under $700 on Newegg right now, which is cheaper than this CPU.
If someone needs to build something now and can wait to upgrade RAM in a year or two, 32GB kits are in the $370 range.
I don't like this RAM price spike either, but in the context of a high-end build with a 16-core flagship CPU like this and probably an expensive GPU, the total is still reasonable. If you must have 128GB of RAM it can be done with bundles like the one I linked above, but I'd recommend waiting at least 6 months if you can. There are signs that prices are falling now that panic-buying has started to trail off.
128GB of RAM should not cost $4K even in this market.
In January I upgraded my desktop: 9950X3D £600, 64GB DDR5-6000 £600, MSI MAG Tomahawk X870E £300, Samsung 990 Pro 4TB £350, Asus Prime 9070XT £580. I spent another £250 on a PSU and cooler and reused my case (Phanteks Evolv Enthoo TG, beautiful case but horrible cooling; will cut some holes in it, and if that doesn't work out, look for something with more airflow).
The RAM price was already inflated at that time, and the same kit is now £800. In October or earlier last year I'd have saved possibly the cost of the CPU/GPU on the whole thing, but now it'd be about the cost of a CPU/GPU more expensive.
On a side note, for anyone not aware: the 9950X3D isn't the best choice for pure gaming; the 9850X3D is cheaper and marginally better. I also went with a 2-stick RAM kit, since 4 sticks are much harder to run at the advertised speed (6000 MT/s), which is technically an overclock.
I'm a dev and a Linux user/gamer, hence my choice of CPU/GPU.
Very similar config, but I bought a second pair of RAM; running 4 sticks at 3600.
Also, the motherboard's LAN port stopped working after a week, so I had to buy an Ethernet card.
Due to the high prices of DRAM and SSDs, they are now the largest fraction of the total price of a computer.
In January I was forced to upgrade an ancient Intel NUC, replacing it with an Arrow Lake H-based ASUS NUC. The complete system with 32 GB of DRAM and 3 TB of SSDs cost EUR 1200, including VAT.
Got it running at 4800 MT/s, with literally 30-minute boot times, in an AM5 machine. The 30 minute boot time could be worked around by enabling the (off-by-default) memory context restore option in BIOS, but it really made me think something was broken, and it wasn't until I found other people talking about 30-minute boot times that I stopped debugging and just let it sit for an eternity.
It's so bad. I don't get why they sell AM5 motherboards with 4 RAM slots.
At least that system has been running well for like two years. But had I known that the situation is so much more dire than with DDR4, I would've just gotten the same amount of RAM in two sticks rather than four.
You need to enable MCR (which trains the memory once and caches the result for, IIRC, 30 days), otherwise, yeah, booting is horribly slow; even the 64GB I have can take several minutes, but with MCR it boots basically instantly.
Memory training seems to be getting faster with each bios update. In 2024 when I upgraded to AM5, 64GB memory training took like 15 minutes. Now the same setup takes about a minute when it needs to retrain, then near instant with MCR (Windows 11 takes significantly longer to load than the POST process).
I’m in the same situation! My machine will take 2-5 minutes to POST every few reboots; it seems random. The messed-up part is the marketing material says these things can handle 256GB of RAM or whatever absurd number, f me for thinking 128GB should be no problem. Honestly this whole thing has soured me on AMD. Yeah, they have bigger numbers than Intel, but at what cost, stability?
It's the RAM. It needs to be "trained", which takes some time, but for some reason these boards seem to randomly forget their training, requiring it to happen again.
I've never had memory training be forgotten with my AM4 nor LPDDR5-based laptops and NUCs. Is this a new thing with AM5 or something? Or just a certain brand of BIOSes?
DDR5 is much, much more fickle than DDR4 and earlier standards. I think it's primarily due to pushing clock speeds (6000 MT/s would be insanely fast for DDR4, but kinda slow for DDR5).
Memory training has always been a thing: during boot, your PC runs tests to work out what slight signal-timing adjustments it needs for your particular hardware. With DDR4 and earlier, that was really fast because the timings were relatively loose. With DDR5, it can be really slow because the timings are so tight.
I’m running 128GB on a 9950X now with 4x32GB sticks and it’s terrible. It’s unstable, POST time is about 2 minutes (not exaggerating), and I’m stuck at a lower speed.
I’m considering just taking 2 of the sticks out, working with 64GB, and increasing my swap partition. The NVMe drive is fast at least.
This is my first time off intel and I have to say I don’t understand the hype.
> It’s unstable, post time is about 2 minutes (not exaggerating)
The long POST times must mean it's retraining the memory each time, which is not normal. Just in case you haven't tried it yet, I'd start by reseating the sticks; I've had weird issues with marginally seated RAM before.
Also, you definitely have to go much slower with 4 sticks compared to two, so lower the speed as much as you can. If that doesn't help, I'd verify them in pairs.
If they work in pairs but not in quad at the slowest speed, something is surely wrong.
Once you get them working in quad, you can start bumping up the speed, might need voltage boost as well.
No such bundle deals where I am. Absolute cheapest DDR5 128GB kit around is 2 sticks of 5600 64GB for $2k.
Cheapest 64GB kit is $930.
The kit I was oh-so-close to buying was two 6400 64GB sticks.
Not gonna buy now, not that desperate. I have a spare AM4 board, DDR4 memory and heck even CPU, I'll ride this one out. Likely skip AM5 entirely if something doesn't drastically change.
> Absolute cheapest DDR5 128GB kit around is 2 sticks of 5600 64GB for $2k.
That's not far from the bundle deal above, once you subtract the $700 CPU.
If you really need 128GB the 5600 kit is fine. Having 208MB of total cache on the CPU means the real world difference between a 5600 kit and a slightly faster kit is negligible in most use cases.
If you don't need to upgrade then clearly don't force an upgrade right now. I just wanted to comment that $4K for 128GB of RAM is a very bad price right now, even with the current situation.
> a slightly faster kit is negligible in most use cases
Does that “most use cases” caveat really apply to someone buying 128G of RAM? If I’m buying that much, it means I’m actually going to put it through its paces, unless it’s just there for huge reserved guest VM overhead.
The 208MB of total cache on the CPU we’re discussing does a good job of reducing sensitivity to RAM speed differences on this platform.
If you’re trying to run LLMs off of the CPU instead of the GPU, then RAM speed dictates a lot. It’s going to be slow no matter what, though. Dual-channel DDR5 just isn’t enough to run large LLMs that start to fill 128GB of RAM, and the difference between 5600 and 6400 isn’t going to make it usable.
If you’re just running a lot of VMs or doing a lot of mixed tasks that keep a lot of RAM occupied then you’d probably have a hard time measuring a difference between 5600 and 6400 if you tried with one of these X3D CPUs with a lot of cache.
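The bandwidth argument above can be sketched with back-of-the-envelope numbers. A rough model, assuming decode is purely memory-bandwidth-bound and every token streams all weights once (real figures vary with quantization, MoE sparsity, and sustained rather than peak bandwidth):

```python
# Rough estimate of CPU-only LLM decode speed on dual-channel DDR5.
# Assumes a memory-bound dense model: every token reads all weights once.
def tokens_per_second(mt_rate: float, channels: int, model_gb: float) -> float:
    bytes_per_transfer = 8  # each DDR5 DIMM channel is 64 bits wide
    bandwidth = mt_rate * 1e6 * bytes_per_transfer * channels  # bytes/s
    return bandwidth / (model_gb * 1e9)

for speed in (5600, 6400):
    rate = tokens_per_second(speed, channels=2, model_gb=100)
    print(f"DDR5-{speed}, 100 GB of weights: ~{rate:.2f} tokens/s")
```

Both land at roughly one token per second, which is the point: the 5600-vs-6400 gap doesn't change whether it's usable.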
This is a frequent topic of discussion for gamers because some people obsess over optimizing their RAM speed and timings and pay large premiums for RAM with CAS latency of 28 instead of 36. Then they see benchmarks showing 1-2% differences in games or even most productivity apps and realize they would have been better spending that extra money on the next faster GPU or CPU or other part.
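For context on why those CAS-latency premiums buy so little, first-word latency in nanoseconds is CAS cycles times the clock period; since DDR transfers twice per clock, one clock is 2000 / (MT/s) ns. A quick sketch:

```python
# First-word latency of a DDR kit: CAS cycles times the memory clock period.
def cas_latency_ns(cl: int, mt_rate: int) -> float:
    # DDR transfers twice per clock, so one clock is 2000 / mt_rate ns
    return 2000 * cl / mt_rate

for cl in (28, 36):
    print(f"DDR5-6000 CL{cl}: first-word latency {cas_latency_ns(cl, 6000):.2f} ns")
```

The gap between CL28 and CL36 at 6000 MT/s works out to under 3 ns, which a large cache hides most of the time; that's consistent with the 1-2% benchmark differences mentioned above.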
I really want an X3D because a game I play is heavily single-threaded. I have the income and the financial stability, but I can't in good conscience upgrade to AM5 with these RAM prices. It's insane.
AMD had an upgrade path with the 5700x3d, assuming you’re on AM4.
Just reading now that they went out of production half a year ago which is a shame. I was very impressed being able to upgrade with the same motherboard 6 years down the line.
I'm the mythical customer who went from a 1700X in a B350 motherboard near launch day to a 5800X3D in the same board (after a dozen BIOS updates). Felt amazing. Like the old 486DX2 days.
Same! Kept checking back for bios updates and even years later they kept announcing more support! Truly crazy.
Other than the speed, the upgrade scope is a very good reason to go with AMD; it's massive. On AM5 you can go from a 6-core and soon all the way to a 24-core with the new Zen 6.
I was waiting too, but the one game I play often that requires FPS performance decided to ruin their game with poor development direction. Now, I'm planning to buy for local llm hosting.
Here's hoping for more developments like TurboQuant to improve LLM memory efficiency.
I can't imagine it's looking good in the consumer space, but server space seems to be lit[1]:
Su said that typically, the first quarter (Q1) is slower due to seasonal patterns, but AMD has seen its data center business expand from Q4 into Q1, demonstrating ongoing strength across both CPUs and GPUs. This growth underscores the company’s ability to capitalize on rising demand for AI compute and enterprise workloads, even during traditionally quieter periods.
“We are going into a big inflection year here in 2026. The CPU business is absolutely on fire.”
PCPartPicker are also publishing charts showing the astronomic rise in DDR5 prices over time: https://pcpartpicker.com/trends/price/memory/. Those charts don't cover any kits with 64 GB sticks, but they're a good demonstration of the general scale.
> Probably fun for those who already bought DDR5 memory
Nah, those of us who already bought DDR5 memory also already bought decent CPUs. Dropping another $1k for these incremental gains would be silly. It'd make a lot more sense if DDR5 had been around longer so that people had the option to make generational upgrades to this CPU but DDR5 on AMD has only been around for Zen4 and Zen5.
I'm looking at building a new system, and was waiting to see what happens with this chip and Intel's Arc Pro B70 card. I can't find ECC UDIMMs of 64GB per-stick to make 128GB, but I can put together two solo UDIMMs of 32GB or 48GB for $800 and $1000 per stick respectively.
I really want to see what enabling the L3 cache options in the BIOS do from a NUMA standpoint. I have some projects I want to work on where being able to even just simulate NUMA subdivisions would be highly useful.
The not-so-good side is that getting an RVA23 development board this year with a usable amount of RAM (e.g. for compiling and linking large code bases) is not going to be cheap.
>Meanwhile I hope my AM4 will chug along a few more years.
I am fine with my 2 year old 128GB DDR4 for now. I will just upgrade the 14700K to 14900KS CPU and wait 2 more years.
Judging by the benchmarks newer CPUs aren't much better for multithreading workloads than 14900KS anyway, so it doesn't make a lot of sense to upgrade to newer CPUs, DDR5 and a new mobo.
After randomly breaking the AM4 CPU and motherboard in my 4-year-old PC last year, I saw that replacing the parts and rebuilding it would have cost almost as much as a new PC. Less if I did a complete rebuild myself, but I'm over building PCs; I've done that for years.
It was an expensive mistake as I bought a few options to experiment including a NUC and an M4 Mac Mini but eventually bought a 9800X3D 5070Ti PC for <$2 and for no reason in particular I bought a 64GB DDR5-6000 kit for $200 in August or so. I checked recently and that kit is pushing $1000. I also bought a 4080 laptop and bought a 64GB kit and an extra SSD for it too last year.
That's pretty lucky given what's happened since. I don't claim any kind of foresight about what would happen.
I do kind of want to take the parts I have and build another AM4 PC. The 5900XT is not a bad option with 16 cores for ~$300 but my DDR4 RAM is almost useless because the best deals now are for combos of CPU + motherboard + RAM at steep discounts.
You can get some good deals on prebuilts still. Not as good as 6+ months ago but still not bad. Costco has a 5080 PC for $2300. There's no way I'm going overboard and building a 128GB+ PC right now.
I've seen multiple RAM spikes. We had one at the height of the crypto hysteria IIRC but this is significantly worse and is also impacting SSDs. I kinda wish I'd bought 1-2 4TB+ SSDs last year but oh well.
We're really waiting for the AI bubble to pop. Part of me thinks that'll be in the next year, but it could stay irrational substantially longer than that.
The C30 64GB kits are nearly impossible to buy now, so, well done. Got one in September '23 for ~$380 AUD, on the rare occasions it's available today it's been over $1600 AUD.
I upgraded my UPS to a sine interactive unit to minimise the risk of it dying to bad power while the market is so crazy...
Context: Early in the firmware boot process the memory controller isn't configured yet so the firmware uses the cache as RAM. In this mode cache lines are never evicted since there's no memory to evict them to.
There may be server workloads for which the L3 cache is sufficient, would be interesting if it made sense to create boards for just the CPU and no memory at scale.
I imagine for such a workload you can always solder a small memory chip to avoid having to waste L3 on unused memory and a non-standard booting process so probably not.
Most definitely, I work in finance and optimizing workloads to fit entirely in cache (and not use any memory allocations after initialization) is the de-facto standard of writing high perf / low latency code.
Lots of optimizations happening to make a trading model as small as possible.
I remember from a talk about Wii/Wii U hacking that they intentionally kept the early boot code in cache so that it couldn't be sniffed or modified on the RAM bus, which was external to the CPU and thus glitchable.
In my case it began with 16K of RAM (yes, 16×1024 bytes) and 90K (yes, 90×1024 bytes) 5.25" floppy disks (although the floppies came a few months after the computer). Eventually upgraded to 48K RAM and 180K double-density floppy disks. The computer: Atari 800.
I'll see your Atari 800 and raise you my Atari 2600 with its whopping 128 bytes of RAM. Bytes with a B. I can kinda sorta call it a computer because you could buy a BASIC cartridge for it (I didn't and stand by that decision - it was pretty bad).
I thought the Timex Sinclair 1000 with 2 Kbytes of RAM was bad.
The membrane keyboard wasn’t great (the lack of a space bar was a weird choice) but it did work. We had programs on cassette and did get the 16-Kbyte memory expansion.
Maybe in 50 years the cache of CPUs and GPUs will be 1TB. Enough to run multiple LLMs (a model entirely run for each task). Having robots like in the movies would need LLMs much much faster than what we see today.
KolibriOS would fit in there, even with the data in memory. You cannot load it into the cache directly, but when the cache capacity is larger than all the data you read there should be no cache eviction and the OS and all data should end up in the cache more or less entirely. In other words it should be really, really fast, which KolibriOS already is to begin with.
Unless you lay everything out contiguously in memory, you’ll still get cache eviction due to associativity, depending on the eviction strategy of the CPU. But certainly DOS or even early Windows 95 could conceivably run entirely out of the cache.
Windows 95 only needed 4MB RAM and 50 MB disk, so that's certainly doable. The trick is to have a hypervisor spread that allocation across cache lines.
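To put those retro footprints in scale against this chip: the 9950X3D2 has 96MB of L3 per CCD, and a core only sees its own CCD's slice. A quick comparison, using nominal minimum-RAM figures for a few old OSes (approximate, from memory):

```python
# Nominal minimum-RAM figures for some old OSes vs. one CCD's L3 slice
# (192MB of L3 total, but split 96MB per CCD).
L3_PER_CCD_MB = 96
os_min_ram_mb = {"DOS 6.22": 1, "Windows 95": 4, "Windows 98": 16, "Windows XP": 64}
for name, mb in os_min_ram_mb.items():
    verdict = "fits" if mb <= L3_PER_CCD_MB else "too big"
    print(f"{name}: {mb} MB -> {verdict} in one CCD's L3")
```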
Yeah, cache eviction is the reason I was assuming it is "probably not possible architecturally", but I also figured there could be features beyond my knowledge that might make it possible.
Edit: Also this 192MB of L3 is spread across two Zen CCDs, so it's not as simple as "throw it all in L3" either, because any given core would only have access to half of that.
Well, yeah, reality strikes again. All you need is an exploit in the microcode to gain access to AMD's equivalent to the ME and now you can just map the cache as memory directly. Maybe. Can microcode do this or is there still hardware that cannot be overcome by the black magic of CPU microcode?
That assumes KolibriOS or any major component is pinned to one core and one cache slice instead of getting dragged between CCDs or losing memory affinity. Throw actual users, IO, and interrupts at it and you get traffic across chiplets, or at least across L3 groups, so the nice 'everything lives in cache' story falls apart fast.
Nice demo, bad model. The funny part is that an entire OS can fit in cache now, the hard part is making the rest of the system act like that matters.
Back in 2004 my PC had 256MB of RAM. My relative's laptop had 128MB. That's crazy when a modern CPU cache can theoretically host an OS (or even multiple OSes) from the early 2000s.
The Power4 MCM had 128 MB cache in 2001. The G4 TiBook sold the same year came with 128 MB of system RAM base, and OS X supported 64 MB configurations for a few years after this.
The RAM prices are so high and the storage is also getting more expensive every day, so we're forced to fit everything inside the CPU cache as a solution! /s
Several processors support this by effectively locking cache lines. At the low end, it allows a handful of fast interrupt routines without dedicated TCM. At the high end, it allows boot ROMs to negotiate DRAM links in software, avoiding both the catch-22 and complex hardware negotiation.
The extra cache doesn't do a damn thing (maybe +2%)
The lower leakage currents at lower voltages allowed them to implement a far more aggressive clock curve from the factory. That's where the higher allcore clock comes from (+30W TDP)
I'm not complaining at all, I think this is an excellent way to leverage binning to sell leftover cache.
Though if I may complain, Ars used to actually write about such things in their articles instead of speculate in a way that suspiciously resembles what an AI would write.
> The extra cache doesn't do a damn thing (maybe +2%)
It depends on the task. For some memory-bound tasks the extra cache is very helpful. For CFD and other simulation workloads the benefits are huge.
For other tasks it doesn't help at all.
If someone wants a simple gaming CPU or general-purpose CPU, they don't need to spend the money for this. They don't need the 16-core CPU at all. The 9850X3D is a better buy for most users who aren't frequently doing a lot of highly parallel work.
CFD benefits from cache, but it benefits even more from sustained memory bandwidth, no? A small(ish) chunk of L3 + two channels of DRAM is not going to compete with a quarter as much L3 plus eight channels of DRAM when typical working set sizes (in my experience) are in the tens of gigabytes, is it?
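The channel-count point can be made concrete with peak theoretical numbers (8 bytes per channel per transfer; real sustained bandwidth is lower, and the server speed grade here is an illustrative assumption):

```python
# Peak theoretical DRAM bandwidth: channels * 8 bytes * transfer rate.
def peak_bandwidth_gb_s(channels: int, mt_rate: int) -> float:
    return channels * 8 * mt_rate * 1e6 / 1e9

print(f"dual-channel DDR5-6000: {peak_bandwidth_gb_s(2, 6000):.1f} GB/s")
print(f"8-channel DDR5-4800:    {peak_bandwidth_gb_s(8, 4800):.1f} GB/s")
```

Even at a lower clock, eight channels give more than 3x the streaming bandwidth, which dominates once the working set blows past any amount of L3.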
It's also useful to point out that the use cases and workloads where SDCI are most beneficial are far, far beyond the scope of what anyone will have installed in a Zen rig. Dual 100G networking cards? The cost of both of those damn near buys all of a 9950X3D2 setup.
It really doesn't. In virtually every case the work is being completed faster than the cache can grow to that size. What little gains are being realized are from not having to wait for cores with access to the cache to become available.
There are some very specific workloads (say, simple object detection) that fit into cache and get crazy performance; there the value of this CPU will be unbeatable, as the alternative is one of the cache Epycs. Everywhere else it'll be only a small improvement unless the software is purpose-made for it.
> Here is the side-by-side of the Ryzen 9 9950X vs. 9950X3D for showing the areas where 3D V-Cache really is helpful:
Coincidentally, it looks like they filtered to all benchmarks with differences greater than 2%. The biggest speedup is 58.1%, and that's with 3D V-Cache on just half the chip.
On AMD the L3 cache is partitioned between the 2 chiplets.
So for 9950X3D half of the cores use a small L3 cache.
For applications that use all 16 cores, the cases where X3D2 provides a great benefit will be much more frequent than for a hypothetical CPU where the same cache increase would have been applied to a unified L3 cache.
The threads that happen to be scheduled on the 2nd chiplet will have a 3 times bigger L3 cache, which can enhance their performance a lot and many applications may have synchronization points where they wait for the slowest thread to finish a task, so the speed of the slowest thread may have a lot of influence on the performance.
Agree. The article's 2nd para notes "AMD relies on its driver software to make sure that software that benefits from the extra cache is run on the V-Cache-enabled CPU cores, which usually works well but is occasionally error-prone." - in regard to the older, mixed-cache-size chips.
> I'm curious to see...
Yeah - though I don't expect current-day Ars Technica will bother digging that deep. It could take some very specialized benchmarks to show such large gains.
Some of their writers, who are quite excellent, still do. Others just seem to regurgitate press releases with very little useful investigation.
My criticism of the lazy writers may seem outsized, but I grew up reading and learning from the much better version of Ars, one I used to subscribe to.
I'm interested to know if the L3 cache all behaves as a single pool for any core on either CCD, whether there's a penalty in access time depending on locality or whether they are just entirely localised.
It does not. For any of the dual CCD parts AMD has ever released for consumers. Even Strix Halo which has higher bandwidth, lower latency interconnect doesn't make a single L3 across CCDs.
It'll probably only happen when they have a singular, large die filled with cache upon which both CCDs are stacked.
And that answer is good enough for most workloads. You should stop reading now.
_______________________
The complex answer is that there is some ability for one CCD to pull cachelines from the other CCD. But I've never been able to find a solid answer on the limitations of this. I know it can pull a dirty cacheline from another CCD's L1/L2 (this is the core-to-core latency test you often see in benchmarks, and there is an obvious cross-die latency hit).
But I'm not sure it can pull a clean cacheline from another CCD at all, or if those just get redirected to main memory (as the latency to main memory isn't that much higher than between CCDs). And even if it can pull a clean cacheline, I'm not sure it can pull them from another CCD's L3 (which is an eviction cache, so only holds clean cachelines).
The only way for a cacheline to get into a CCD's L3 is to be evicted from an L2 on that CCD, so if a dataset is active across both CCDs, it will end up duplicated across both L3s. Cachelines evicted from one L3 do NOT end up in the other L3, so an idle CCD can't act as a pseudo-L4.
I haven't seen anyone make a benchmark which would show the effect, if it exists.
AMD didn't have to introduce a special driver for the Ryzen 9 5950X to keep threads resident on the "gaming" CCD. There was only a small difference between the 5950X and the non-X3D Ryzen 7 5800X in workloads that used no more than 8 cores, unlike the slowdowns observed in the Ryzen 9 7950X3D and 7900X3D at release compared to the Ryzen 7 7800X3D.
When the L3 sizes are different across CCDs the special AMD driver is needed to keep threads pinned to the larger L3 CCD and prevent them from being placed on the small L3 CCD where their memory requests can exploit the other CCD's L3 as an L4. The AMD driver reduces CCD to CCD data requests by keeping programs contained in one CCD.
With equal L3 caches, when a process spills onto the second CCD it will still use the first's L3 cache as "L4", but it no longer has to evict that data at the same rate as on the lopsided models. Additionally, the first CCD can use the second CCD's L3 in kind, reducing the number of requests that need to go to main memory.
The same sized L3s reduce contention to the IO die and the larger sized L3s reduce memory contention, it's a win-win.
I am so grateful that I bought my 128 GB RAM kit in January of last year for my own 9950 upgrade. We just built my dad a 7000-series system to replace his old AM4 (2017 build), and 32GB of DDR5 was nearly the same price at Micro Center as what I paid last year. I was able to gift him an Nvidia 1060 discrete graphics card so that he could continue to run his two monitors; the newer motherboards have much less onboard capability for that.
I upgraded to a 4070 super last year. I ran both cards at the same time for a little bit, but it got really frustrating to keep the wrong card from being assigned to a particular task with llama. I really should’ve taken an R&D tax credit on my AI research but I’m still able to expense it for the business.
Oh man. I am running computations on my server that involve computing geodesic distances with the heat method. The job turns out to be an L3 cache thrasher, leaving my CPUs underutilized for multi-worker jobs... 208MB instead of my 25MB per socket sounds amazing.
The gain is very workload dependent, so there are no generally-applicable rules.
There are many applications which need synchronization between threads, so the speed of the slowest thread has a disproportionate influence on the performance.
In such applications, the slowest thread has a 3-times-bigger cache on the X3D2 than on the X3D. That can make a lot of difference.
So there will be applications with no difference in performance, but also applications with a very large difference in performance, equal to the best performance differences shown by X3D vs. plain 9950X.
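The slowest-thread effect is easy to see in a toy model: with a barrier at the end of each step, every step costs as much as its slowest participant. The numbers below are hypothetical, just to illustrate the shape of the argument:

```python
# Toy model of barrier-synchronized threads: with a barrier after every
# step, each step costs the max over threads, so one slow thread sets
# the pace for the whole job.
def total_time(per_thread_step_times):
    return sum(max(step) for step in zip(*per_thread_step_times))

balanced = [[1.0] * 4, [1.0] * 4, [1.0] * 4]
skewed = [[1.0] * 4, [1.0] * 4, [1.5] * 4]  # one cache-starved thread, 50% slower
print(total_time(balanced))  # 4.0
print(total_time(skewed))    # 6.0: the slowest thread drags the whole job
```

Making only the slow thread faster (say, by giving it a bigger cache slice) recovers the entire gap, even though the other threads were never the bottleneck.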
It really comes down to how much more this CPU costs over the next one down if you're building a new rig for a long period of time. I'm running a 5950X which is coming up on 6 years in November. I could have spent a little less on the next model down, but I expect this rig will last me a few more years (especially with how much memory costs now). The extra per-year expense for that CPU was almost nothing over its lifetime.
Now, would I upgrade an existing computer with a slightly slower processor to this one? Probably not.
That's what's different about this one. "Enter the Ryzen 9 9950X3D2 Dual Edition, a mouthful of a chip that includes 64MB of 3D V-Cache on both processor dies, without the hybrid arrangement that has defined the other chips up until now."
Given that the dies still have L3 on them does this count as L4 or does the hardware treat it as a single pool of L3?
Would be neat to have an additional cache layer of ~1 GB of HBM on the package but I guess there's no way that happens in the consumer space any time soon.
Per compute die it functions as one 96M L3 with uniform latency. It is 4 cycles more latency than the configuration with smaller 32M L3. But there are two compute dies, each with their own L3. And like the 9950X coherency between these two L3 is maintained over global memory interconnect to the third (IO) die.
But to do it literally - I'm not a low-level motherboard EE, but I'd bet you're looking at 5 to 7 figures (US $) of engineering work, to get around all the ways in which that would violate assumptions baked into the designs of the CPU, support chips, firmwares, etc.
Make a fake RAM device that offers a write-through guarantee and responds on the bus no matter what address is referenced. You could possibly short-circuit any "is RAM there" test if it just says yes for whatever size and stride got configured.
I know the prices of RAM are high, but the 256GB RAM limit seems like an omission. If they supported at least 512GB in quad or eight channel, that would be something worth looking at for me. I know there is Threadripper, but ECC memory is out of reach.
Makes sense. RAM pricing has surely led to a fall in AM5 high-end CPU purchases, so they might as well try to get some extra cash from those who still buy, and bin the remaining now non-X3D chips as something else.
You can buy 128GB of DDR5-6000 with a 9950X3D (not this newest X2 version, but still a $699 CPU) and a motherboard and a case for $2800 right now: https://www.newegg.com/Product/ComboDealDetails?ItemList=Com...
Last summer, a 9950X3D + motherboard + cooler + 128 GB DRAM, including VAT, was the equivalent of $1400 in Europe, where I live.
That's half of your quoted price. That was without case and PSU, but adding e.g. $200 for those would not change much.
I don't really want to run my RAM that slow, which is why I'll probably stick with two sticks.
I commented because someone thought that $4K was the going price for 128GB of RAM, which is way too much even with the demand crunch.
The distribution of the price was like this:
Since then, the prices of DDR5 and SSDs have continued to increase, so now the fraction spent on memory would be even higher than 59%. Before 2026, for such small amounts of memory, the cost would have been much less than the rest of the system.
6 or so weeks after I returned it the kit was listed at 1499.
The most I could get running on 10GB VRAM + 96GB RAM was a REAP'd + quantized version of MiniMax-M2.5
> The 30 minute boot time could be worked around by enabling the (off-by-default) memory context restore option in BIOS
Some motherboards have it off by default.
This is my first time off intel and I have to say I don’t understand the hype.
The long POST times must mean it's retraining the memory each time, which is not normal. Just in case you haven't tried it yet, I'd start by reseating them; I've had weird issues with marginally seated RAM before.
Also you definitely have to go much slower with 4 sticks compared to two, so lower speed as much as you can. If that doesn't help, I'd verify them in pairs.
If they work in pairs but not in quad at the slowest speed, something is surely wrong.
Once you get them working in quad, you can start bumping up the speed, might need voltage boost as well.
You may need to bump up voltages slightly for your CPU's IMC (I needed to on my Ryzen 8700F to run 6000 stable). It's CPU sample dependent.
Also, as another commenter pointed out, 4 sticks will typically achieve lower stable clocks.
Cheapest 64GB kit is $930.
The kit I was oh-so-close to buying was two 6400 64GB sticks.
Not gonna buy now, not that desperate. I have a spare AM4 board, DDR4 memory and heck even CPU, I'll ride this one out. Likely skip AM5 entirely if something doesn't drastically change.
That's not far from the bundle deal above, once you subtract the $700 CPU.
If you really need 128GB the 5600 kit is fine. Having 208MB of total cache on the CPU means the real world difference between a 5600 kit and a slightly faster kit is negligible in most use cases.
If you don't need to upgrade then clearly don't force an upgrade right now. I just wanted to comment that $4K for 128GB of RAM is a very bad price right now, even with the current situation.
Does that “most use cases” caveat really apply to someone buying 128G of RAM? If I’m buying that much, it means I’m actually going to put it through its paces, unless it’s just there for huge reserved guest VM overhead.
If you’re trying to run LLMs off of the CPU instead of the GPU then the RAM speed dictates a lot. It’s going to be slow no matter what, though. Dual channel DDR5 just isn’t enough to run large LLMs that start to fill 128GB of RAM, and the difference between 5600 and 6400 isn’t going to make it usable.
If you’re just running a lot of VMs or doing a lot of mixed tasks that keep a lot of RAM occupied then you’d probably have a hard time measuring a difference between 5600 and 6400 if you tried with one of these X3D CPUs with a lot of cache.
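To put rough numbers on that: CPU decode is mostly memory-bandwidth-bound, so a back-of-the-envelope ceiling on tokens/sec is peak bandwidth divided by the bytes streamed per token. A sketch (the 100GB model size is a made-up stand-in for something filling most of 128GB):

```python
# Back-of-the-envelope decode speed for CPU-only LLM inference.
# Assumption: each generated token streams roughly the full active weights
# from RAM, so tokens/sec <= peak bandwidth / model size.

def peak_bandwidth_gbs(mt_per_s: int, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical DDR5 peak: channels * 8 bytes per transfer * MT/s."""
    return channels * bus_bytes * mt_per_s / 1000  # GB/s

def rough_tokens_per_s(bandwidth_gbs: float, model_gb: float) -> float:
    return bandwidth_gbs / model_gb

MODEL_GB = 100  # hypothetical model filling most of a 128GB system

for speed in (5600, 6400):
    bw = peak_bandwidth_gbs(speed)
    print(f"DDR5-{speed}: ~{bw:.1f} GB/s peak, ~{rough_tokens_per_s(bw, MODEL_GB):.2f} tok/s ceiling")
```

Real throughput lands well below those ceilings, but it shows why 5600 vs. 6400 barely matters here: both round to about 1 token/sec for a model that size.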
This is a frequent topic of discussion for gamers because some people obsess over optimizing their RAM speed and timings and pay large premiums for RAM with CAS latency of 28 instead of 36. Then they see benchmarks showing 1-2% differences in games or even most productivity apps and realize they would have been better spending that extra money on the next faster GPU or CPU or other part.
Oh absolutely. Just mentioned it since I was very close to buying it back then, and now it's completely bonkers.
That bundle deal is quite well priced all things considered, it basically prices the memory where it was. Again, sadly no great bundle deals here.
I would not be surprised if we see casualties in adjacent markets, such as motherboards, coolers and whatnot.
Just reading now that they went out of production half a year ago which is a shame. I was very impressed being able to upgrade with the same motherboard 6 years down the line.
Other than the speed, it's a very good reason to go with AMD; the upgrade scope is massive. On AM5 you can go from a 6-core and, soon, all the way to a 24-core with the new Zen 6.
Here's hoping to more developments like TurboQuant to improve LLM memory efficiency.
Su said that typically, the first quarter (Q1) is slower due to seasonal patterns, but AMD has seen its data center business expand from Q4 into Q1, demonstrating ongoing strength across both CPUs and GPUs. This growth underscores the company’s ability to capitalize on rising demand for AI compute and enterprise workloads, even during traditionally quieter periods.
“We are going into a big inflection year here in 2026. The CPU business is absolutely on fire.”
[1]: https://stocktwits.com/news-articles/markets/equity/amd-ceo-...
(cheapest at $1240 USD)
Nah, those of us who already bought DDR5 memory also already bought decent CPUs. Dropping another $1k for these incremental gains would be silly. It'd make a lot more sense if DDR5 had been around longer so that people had the option to make generational upgrades to this CPU but DDR5 on AMD has only been around for Zen4 and Zen5.
I hope this is still enough for the planned upgrade to Zen7 in 2028.
I really want to see what enabling the L3 cache options in the BIOS do from a NUMA standpoint. I have some projects I want to work on where being able to even just simulate NUMA subdivisions would be highly useful.
While I was aiming at 128, I settled for 96GB, because any more than 2 sticks means a sharp drop in RAM clocks this generation.
Feeling pretty chuffed now XD (though still sad because building a new PC is dumb when RAM costs more than a 24 core monster CPU)
The not-so-good side is that getting an RVA23 development board this year with a usable amount of RAM (e.g. for compiling and linking large code bases) is not going to be cheap.
I am fine with my 2 year old 128GB DDR4 for now. I will just upgrade the 14700K to 14900KS CPU and wait 2 more years.
Judging by the benchmarks newer CPUs aren't much better for multithreading workloads than 14900KS anyway, so it doesn't make a lot of sense to upgrade to newer CPUs, DDR5 and a new mobo.
It was an expensive mistake as I bought a few options to experiment including a NUC and an M4 Mac Mini but eventually bought a 9800X3D 5070Ti PC for <$2 and for no reason in particular I bought a 64GB DDR5-6000 kit for $200 in August or so. I checked recently and that kit is pushing $1000. I also bought a 4080 laptop and bought a 64GB kit and an extra SSD for it too last year.
That's pretty lucky given what's happened since. I don't claim any kind of foresight about what would happen.
I do kind of want to take the parts I have and build another AM4 PC. The 5900XT is not a bad option with 16 cores for ~$300 but my DDR4 RAM is almost useless because the best deals now are for combos of CPU + motherboard + RAM at steep discounts.
You can get some good deals on prebuilts still. Not as good as 6+ months ago but still not bad. Costco has a 5080 PC for $2300. There's no way I'm going overboard and building a 128GB+ PC right now.
I've seen multiple RAM spikes. We had one at the height of the crypto hysteria IIRC but this is significantly worse and is also impacting SSDs. I kinda wish I'd bought 1-2 4TB+ SSDs last year but oh well.
We're really waiting for the AI bubble to pop. Part of me thinks that'll be in the next year, but it could stay irrational substantially longer than that.
I upgraded my UPS to a pure sine wave line-interactive unit to minimise the risk of the PC dying to bad power while the market is so crazy...
It's probably not possible architecturally, but it would be amusing to see an entire early 90's OS running entirely in the CPU's cache.
I imagine for such a workload you'd still want to solder on a small memory chip to avoid wasting L3 holding unused memory, plus you'd need a non-standard boot process, so probably not.
Lots of optimizations happening to make a trading model as small as possible.
The membrane keyboard wasn’t great (the lack of a space bar was a weird choice) but it did work. We had programs on cassette and did get the 16 Kbyte memory expansion.
https://en.wikipedia.org/wiki/Timex_Sinclair_1000
I didn’t realize the Atari 2600 had BASIC; I always thought of it as a game console.
https://ww1.microchip.com/downloads/en/DeviceDoc/1006S.pdf
Edit: Also this 192MB of L3 is spread across two Zen CCDs, so it's not as simple as "throw it all in L3" either, because any given core would only have access to half of that.
Nice demo, bad model. The funny part is that an entire OS can fit in cache now, the hard part is making the rest of the system act like that matters.
* https://en.wikipedia.org/wiki/Commodore_PET
Same time as the Trash-80 and BBC micro were making inroads.
There are actually already two running (MINIX and UEFI), and it’s the opposite of amusing - https://www.zdnet.com/article/minix-intels-hidden-in-chip-op...
If you run a VM on a CPU like this, using a baremetal hypervisor, you can get very close to "everything in cache".
Consider a VM where that kind of stuff has been removed, like the firecracker hypervisor used for AWS Lambda. You're talking milliseconds.
The lower leakage currents at lower voltages allowed them to implement a far more aggressive clock curve from the factory. That's where the higher allcore clock comes from (+30W TDP)
I'm not complaining at all, I think this is an excellent way to leverage binning to sell leftover cache.
Though if I may complain, Ars used to actually write about such things in their articles instead of speculate in a way that suspiciously resembles what an AI would write.
It depends on the task. For some memory-bound tasks the extra cache is very helpful. For CFD and other simulation workloads the benefits are huge.
For other tasks it doesn't help at all.
If someone wants a simple gaming CPU or general purpose CPU, they don't need to spend the money for this. They don't need the 16-core CPU at all. The 9850X3D is a better buy for most users who aren't frequently doing a lot of highly parallel work.
If your tasks don’t benefit then don’t buy it.
But stop claiming that it doesn’t help anywhere because that’s simply wrong. I do some FEA work occasionally and the extra cache is a HUGE help.
There are also a lot of non-LLM AI workloads that have models in the size range than fit into this cache.
See https://www.phoronix.com/review/amd-ryzen-9-9950x3d-linux/10
> Here is the side-by-side of the Ryzen 9 9950X vs. 9950X3D for showing the areas where 3D V-Cache really is helpful:
Coincidentally, it looks like they filtered to the benchmarks with differences greater than 2%. The biggest speedup is 58.1%, and that's with 3D V-Cache on only half the chip.
I’m curious to see whether the same benchmarks benefit again so greatly.
So for 9950X3D half of the cores use a small L3 cache.
For applications that use all 16 cores, the cases where X3D2 provides a great benefit will be much more frequent than for a hypothetical CPU where the same cache increase would have been applied to a unified L3 cache.
The threads that happen to be scheduled on the 2nd chiplet will have a 3 times bigger L3 cache, which can enhance their performance a lot. Many applications have synchronization points where they wait for the slowest thread to finish a task, so the speed of the slowest thread can have a lot of influence on overall performance.
Agree. The article's 2nd para notes "AMD relies on its driver software to make sure that software that benefits from the extra cache is run on the V-Cache-enabled CPU cores, which usually works well but is occasionally error-prone." - in regard to the older, mixed-cache-size chips.
> I'm curious to see...
Yeah - though I don't expect current-day Ars Technica will bother digging that deep. It could take some very specialized benchmarks to show such large gains.
My criticism of the lazy writers may seem outsized, but I grew up reading and learning from a much better version of Ars, one I used to subscribe to.
I might even shell out for an upgrade to AM5 and DDR5. On the other hand, my 5900X is still blazing fast.
It'll probably only happen when they have a singular, large die filled with cache upon which both CCDs are stacked.
Run this test if you're curious: https://github.com/ChipsandCheese/MemoryLatencyTest
On a regular CCD:
32768,46.115
65536,74.243
98304,85.699
131072,91.42
262144,99.402
On a 3D cache CCD:
32768,11.992
65536,12.712
98304,29.921
131072,49.91
262144,86.059
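To read those numbers: each pair looks like (test size in KB, latency in ns); the units are my inference from the magnitudes, e.g. 98304 KB = 96 MB matches the V-Cache die's L3. A quick sketch (units assumed) to locate the biggest latency jump, i.e. roughly where the working set spills out of cache:

```python
# Find the largest latency jump between consecutive test sizes in the
# MemoryLatencyTest output above. Units (KB, ns) are an assumption
# inferred from the magnitudes, not stated by the tool here.

def latency_cliff(samples):
    """Return the (size_kb, size_kb) pair spanning the largest latency jump."""
    pairs = sorted(samples)
    jumps = [(b_lat - a_lat, a_sz, b_sz)
             for (a_sz, a_lat), (b_sz, b_lat) in zip(pairs, pairs[1:])]
    _, a, b = max(jumps)
    return a, b

regular = [(32768, 46.115), (65536, 74.243), (98304, 85.699),
           (131072, 91.42), (262144, 99.402)]
vcache  = [(32768, 11.992), (65536, 12.712), (98304, 29.921),
           (131072, 49.91), (262144, 86.059)]

print(latency_cliff(regular))  # regular CCD: already spilling past 32 MB
print(latency_cliff(vcache))   # V-Cache CCD: the steep climb comes much later
```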
And that answer is good enough for most workloads. You should stop reading now.
_______________________
The complex answer is that there is some ability for one CCD to pull cachelines from the other CCD. But I've never been able to find a solid answer on the limitations of this. I know it can pull a dirty cache line from the L1/L2 of another CCD (this is the core-to-core latency test you often see in benchmarks, and there is an obvious cross-die latency hit).
But I'm not sure it can pull a clean cacheline from another CCD at all, or if those just get redirected to main memory (as the latency to main memory isn't that much higher than between CCDs). And even if it can pull a clean cacheline, I'm not sure it can pull them from another CCD's L3 (which is an eviction cache, so only holds clean cachelines).
The only way for a cacheline to get into a CCD's L3 is to be evicted from an L2 on that core, so if a dataset is active across both CCDs, it will end up duplicated across both L3s. Cachelines evicted from one L3 do NOT end up in another L3, so an idle CCD can't act as a pseudo L4.
I haven't seen anyone make a benchmark which would show the effect, if it exists.
When the L3 sizes are different across CCDs the special AMD driver is needed to keep threads pinned to the larger L3 CCD and prevent them from being placed on the small L3 CCD where their memory requests can exploit the other CCD's L3 as an L4. The AMD driver reduces CCD to CCD data requests by keeping programs contained in one CCD.
With equal L3 caches when a process spills onto the second CCD it will still use the first's L3 cache as "L4" but it no longer has to evict that data at the same rate as the lopsided models. Additionally the first CCD can use the second CCD's L3 in kind reducing the number of requests that need to go to main memory.
The same sized L3s reduce contention to the IO die and the larger sized L3s reduce memory contention, it's a win-win.
https://www.phoronix.com/review/amd-3d-vcache-optimizer-9950...
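If you want to do by hand what the driver does, you can confine a process to one CCD's cores with scheduler affinity. A hedged Linux sketch; the core numbering below is an assumption (on many 16-core AM5 parts cores 0-7 plus SMT siblings 16-23 share a CCD, but check `lscpu -e` for your actual layout):

```python
# Manually confining a process to one CCD on Linux, i.e. doing by hand
# what AMD's driver does automatically. ASSUMPTION: cores 0-7 (and SMT
# siblings 16-23) belong to the first CCD; verify with `lscpu -e`.

import os

FIRST_CCD = set(range(0, 8)) | set(range(16, 24))  # assumed core layout

def pin_to_first_ccd(pid: int = 0) -> set:
    """Restrict pid (0 = calling process) to the first CCD's cores, if present."""
    available = os.sched_getaffinity(pid)
    target = FIRST_CCD & available
    if target:                      # no-op on machines without those cores
        os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)

print(pin_to_first_ccd())
```

With equal L3 on both CCDs this kind of manual pinning matters a lot less, which is the point being made above.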
For comparison, the 9950X3D has a total cache of 144MB.
It is indeed 8MB per compute die but really 1MB per core. Not shared among the entire CCD.
For gaming, AMD already pins the game threads to the CCD with the extra cache pretty well.
For multi-threaded workloads the gain from having cache on both CCDs is quite small.
There are many applications which need synchronization between threads, so the speed of the slowest thread has a disproportionate influence on the performance.
In such applications, the slowest thread has a 3 times bigger cache on an X3D2 vs. an X3D. That can make a lot of difference.
So there will be applications with no difference in performance, but also applications with a very large difference in performance, equal to the best performance differences shown by X3D vs. plain 9950X.
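The slowest-thread point is easy to make concrete with a toy model: with a barrier after each parallel phase, the phase takes as long as its slowest thread, so removing one straggler's cache-miss penalty speeds up the whole phase. The numbers below are illustrative, not measurements:

```python
# Toy model of the "slowest thread" effect: with barrier synchronization,
# a parallel phase takes as long as its slowest thread. The 1.5x straggler
# penalty is a made-up illustrative number, not a measurement.

def phase_time(per_thread_times):
    return max(per_thread_times)

# 16 threads; suppose cache thrashing makes the worst thread 1.5x slower.
before = [1.0] * 15 + [1.5]   # one straggler on a small L3
after  = [1.0] * 16           # bigger L3 removes the straggler

speedup = phase_time(before) / phase_time(after)
print(f"whole-phase speedup from fixing only the slowest thread: {speedup:.2f}x")
```

So even though 15 of 16 threads were already fast, the whole phase still gets the straggler's full 1.5x improvement.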
Now, would I upgrade an existing computer with a slightly slower processor to it? Probably not.
If they are stacked then why not 9800X3D2?
Would be neat to have an additional cache layer of ~1 GB of HBM on the package but I guess there's no way that happens in the consumer space any time soon.
But to do it literally - I'm not a low-level motherboard EE, but I'd bet you're looking at 5 to 7 figures (US $) of engineering work, to get around all the ways in which that would violate assumptions baked into the designs of the CPU, support chips, firmwares, etc.