Probably fun for those who already bought DDR5 memory... still kicking myself for not just pulling the trigger on that 128GB dual stick kit I looked at for $600 back in September. Now it's listed at $4k...
Meanwhile I hope my AM4 will chug along a few more years.
If you don't need 128GB, there are quality 64GB kits for under $700 on Newegg right now, which is cheaper than this CPU.
If someone needs to build something now and can wait to upgrade RAM in a year or two, 32GB kits are in the $370 range.
I don't like this RAM price spike either, but in the context of a high-end build with a 16-core flagship CPU like this and probably an expensive GPU, the total is still reasonable. If you must have 128GB of RAM it can be done with bundles like the one I linked above, but I'd recommend waiting at least 6 months if you can. There are signs that prices are falling now that panic-buying has started to trail off.
128GB of RAM should not cost $4K even in this market.
In January I upgraded my desktop: 9950X3D £600, 64GB DDR5-6000 £600, MSI MAG Tomahawk X870E £300, Samsung 990 Pro 4TB £350, Asus Prime 9070XT £580. I spent another £250 on a PSU and cooler and reused my case (Phanteks Evolv Enthoo TG, beautiful case but horrible cooling; will cut some holes in it, and if that doesn't work out, look for something with more airflow).
The RAM price was already inflated at that time, and the same kit is now £800. In October or earlier last year I'd have saved possibly the cost of the CPU/GPU on the whole thing, but now it'd be about the cost of a CPU/GPU more expensive.
On a side note, for anyone not aware: the 9950X3D isn't the best choice for pure gaming; the 9850X3D is cheaper and marginally better. I also went with a 2-stick RAM kit, since 4 sticks are much harder to run at the advertised speed (6000 MT/s), which is technically an overclock.
I'm a dev and a Linux user/gamer, hence my choice of CPU/GPU.
Very similar config, but I bought a second pair of RAM; running 4 sticks at 3600.
Also, the motherboard's LAN port stopped working after a week, so I had to buy an Ethernet card.
Due to the high prices of DRAM and SSDs, they are now the largest fraction of the total price of a computer.
In January I was forced to upgrade an ancient Intel NUC, replacing it with an Arrow Lake H-based ASUS NUC. The complete system with 32 GB of DRAM and 3 TB of SSDs cost EUR 1200, including VAT.
Got it running at 4800 MT/s, with literally 30-minute boot times, in an AM5 machine. The 30 minute boot time could be worked around by enabling the (off-by-default) memory context restore option in BIOS, but it really made me think something was broken, and it wasn't until I found other people talking about 30-minute boot times that I stopped debugging and just let it sit for an eternity.
It's so bad. I don't get why they sell AM5 motherboards with 4 RAM slots.
At least that system has been running well for like two years. But had I known that the situation is so much more dire than with DDR4, I would've just gotten the same amount of RAM in two sticks rather than four.
You need to enable MCR (which trains the memory once and caches the result for, IIRC, 30 days), otherwise, yeah, booting is horribly slow; even the 64GB I have can take several minutes, but with MCR it boots basically instantly.
Memory training seems to be getting faster with each bios update. In 2024 when I upgraded to AM5, 64GB memory training took like 15 minutes. Now the same setup takes about a minute when it needs to retrain, then near instant with MCR (Windows 11 takes significantly longer to load than the POST process).
I’m in the same situation! My machine will take 2-5 minutes to POST every few reboots; it seems random. The messed-up part is the marketing material says these things can handle 256GB of RAM or whatever absurd number, f me for thinking 128GB should be no problem. Honestly this whole thing has soured me on AMD. Yeah, they have bigger numbers than Intel, but at what cost, stability?
It's the RAM. It needs to be "trained", which takes some time, but for some reason these boards seem to randomly forget their training, requiring it to happen again.
I've never had memory training be forgotten with my AM4 nor LPDDR5-based laptops and NUCs. Is this a new thing with AM5 or something? Or just a certain brand of BIOSes?
DDR5 is much, much more fickle than DDR4 and earlier standards. I think it's primarily due to pushing clock speeds (6000 MT/s would be insanely fast for DDR4, but kinda slow for DDR5).
Memory training has always been a thing: during boot, your PC runs tests to work out what slight signal-timing adjustments it needs for your particular hardware. With DDR4 and earlier, that was really fast because the timings were relatively loose. With DDR5, it can be really slow because the timings are so tight.
I’m running 128GB on a 9950X now with 4x32GB sticks and it’s terrible. It’s unstable, POST time is about 2 minutes (not exaggerating), and I’m stuck at a lower speed.
I’m considering just taking 2 of the sticks out, working with 64GB, and increasing my swap partition. The NVMe drive is fast at least.
This is my first time off intel and I have to say I don’t understand the hype.
> It’s unstable, post time is about 2 minutes (not exaggerating)
The long POST times must mean it's retraining the memory each time, which is not normal. Just in case you haven't tried it yet, I'd start by reseating the sticks; I've had weird issues with marginally seated RAM before.
Also, you definitely have to go much slower with 4 sticks compared to two, so lower the speed as much as you can. If that doesn't help, I'd verify them in pairs.
If they work in pairs but not in quad at the slowest speed, something is surely wrong.
Once you get them working in quad, you can start bumping up the speed, might need voltage boost as well.
No such bundle deals where I am. Absolute cheapest DDR5 128GB kit around is 2 sticks of 5600 64GB for $2k.
Cheapest 64GB kit is $930.
The kit I was oh-so-close to buying was two 6400 64GB sticks.
Not gonna buy now, not that desperate. I have a spare AM4 board, DDR4 memory and heck even CPU, I'll ride this one out. Likely skip AM5 entirely if something doesn't drastically change.
> Absolute cheapest DDR5 128GB kit around is 2 sticks of 5600 64GB for $2k.
That's not far from the bundle deal above, once you subtract the $700 CPU.
If you really need 128GB the 5600 kit is fine. Having 208MB of total cache on the CPU means the real world difference between a 5600 kit and a slightly faster kit is negligible in most use cases.
If you don't need to upgrade then clearly don't force an upgrade right now. I just wanted to comment that $4K for 128GB of RAM is a very bad price right now, even with the current situation.
> a slightly faster kit is negligible in most use cases
Does that “most use cases” caveat really apply to someone buying 128G of RAM? If I’m buying that much, it means I’m actually going to put it through its paces, unless it’s just there for huge reserved guest VM overhead.
The 208MB of total cache on the CPU we’re discussing does a good job of reducing sensitivity to RAM speed differences on this platform.
If you’re trying to run LLMs off of the CPU instead of the GPU, then RAM speed dictates a lot. It’s going to be slow no matter what, though. Dual-channel DDR5 just isn’t enough to run large LLMs that start to fill 128GB of RAM, and the difference between 5600 and 6400 isn’t going to make it usable.
If you’re just running a lot of VMs or doing a lot of mixed tasks that keep a lot of RAM occupied then you’d probably have a hard time measuring a difference between 5600 and 6400 if you tried with one of these X3D CPUs with a lot of cache.
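The bandwidth argument above can be sketched with back-of-the-envelope numbers. A rough model, assuming decode is purely memory-bandwidth-bound and every token streams all weights once (real figures vary with quantization, MoE sparsity, and sustained rather than peak bandwidth):

```python
# Rough estimate of CPU-only LLM decode speed on dual-channel DDR5.
# Assumes a memory-bound dense model: every token reads all weights once.
def tokens_per_second(mt_rate: float, channels: int, model_gb: float) -> float:
    bytes_per_transfer = 8  # each DDR5 DIMM channel is 64 bits wide
    bandwidth = mt_rate * 1e6 * bytes_per_transfer * channels  # bytes/s
    return bandwidth / (model_gb * 1e9)

for speed in (5600, 6400):
    rate = tokens_per_second(speed, channels=2, model_gb=100)
    print(f"DDR5-{speed}, 100 GB of weights: ~{rate:.2f} tokens/s")
```

Both land at roughly one token per second, which is the point: the 5600-vs-6400 gap doesn't change whether it's usable.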
This is a frequent topic of discussion for gamers because some people obsess over optimizing their RAM speed and timings and pay large premiums for RAM with CAS latency of 28 instead of 36. Then they see benchmarks showing 1-2% differences in games or even most productivity apps and realize they would have been better spending that extra money on the next faster GPU or CPU or other part.
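For context on why those CAS-latency premiums buy so little, first-word latency in nanoseconds is CAS cycles times the clock period; since DDR transfers twice per clock, one clock is 2000 / (MT/s) ns. A quick sketch:

```python
# First-word latency of a DDR kit: CAS cycles times the memory clock period.
def cas_latency_ns(cl: int, mt_rate: int) -> float:
    # DDR transfers twice per clock, so one clock is 2000 / mt_rate ns
    return 2000 * cl / mt_rate

for cl in (28, 36):
    print(f"DDR5-6000 CL{cl}: first-word latency {cas_latency_ns(cl, 6000):.2f} ns")
```

The gap between CL28 and CL36 at 6000 MT/s works out to under 3 ns, which a large cache hides most of the time; that's consistent with the 1-2% benchmark differences mentioned above.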
I really want an X3D because a game I play is heavily single-threaded. I have the income and the financial stability, but I can't in good conscience upgrade to AM5 with these RAM prices. It's insane.
AMD had an upgrade path with the 5700x3d, assuming you’re on AM4.
Just reading now that they went out of production half a year ago which is a shame. I was very impressed being able to upgrade with the same motherboard 6 years down the line.
I'm the mythical customer who went from a 1700X in a B350 motherboard near launch day to a 5800X3D in the same board (after a dozen BIOS updates). Felt amazing. Like the old 486DX2 days.
Same! Kept checking back for bios updates and even years later they kept announcing more support! Truly crazy.
Other than the speed, the upgrade scope is a very good reason to go with AMD; it's massive. On AM5 you can go from a 6-core and soon all the way to a 24-core with the new Zen 6.
I was waiting too, but the one game I play often that requires FPS performance decided to ruin their game with poor development direction. Now, I'm planning to buy for local llm hosting.
Here's hoping for more developments like TurboQuant to improve LLM memory efficiency.
I can't imagine it's looking good in the consumer space, but server space seems to be lit[1]:
Su said that typically, the first quarter (Q1) is slower due to seasonal patterns, but AMD has seen its data center business expand from Q4 into Q1, demonstrating ongoing strength across both CPUs and GPUs. This growth underscores the company’s ability to capitalize on rising demand for AI compute and enterprise workloads, even during traditionally quieter periods.
“We are going into a big inflection year here in 2026. The CPU business is absolutely on fire.”
PCPartPicker are also publishing charts showing the astronomic rise in DDR5 prices over time: https://pcpartpicker.com/trends/price/memory/. Those charts don't cover any kits with 64 GB sticks, but they're a good demonstration of the general scale.
> Probably fun for those who already bought DDR5 memory
Nah, those of us who already bought DDR5 memory also already bought decent CPUs. Dropping another $1k for these incremental gains would be silly. It'd make a lot more sense if DDR5 had been around longer so that people had the option to make generational upgrades to this CPU but DDR5 on AMD has only been around for Zen4 and Zen5.
I'm looking at building a new system, and was waiting to see what happens with this chip and Intel's Arc Pro B70 card. I can't find ECC UDIMMs of 64GB per-stick to make 128GB, but I can put together two solo UDIMMs of 32GB or 48GB for $800 and $1000 per stick respectively.
I really want to see what enabling the L3 cache options in the BIOS do from a NUMA standpoint. I have some projects I want to work on where being able to even just simulate NUMA subdivisions would be highly useful.
The not-so-good side is that getting an RVA23 development board this year with a usable amount of RAM (e.g. for compiling and linking large code bases) is not going to be cheap.
>Meanwhile I hope my AM4 will chug along a few more years.
I am fine with my 2 year old 128GB DDR4 for now. I will just upgrade the 14700K to 14900KS CPU and wait 2 more years.
Judging by the benchmarks newer CPUs aren't much better for multithreading workloads than 14900KS anyway, so it doesn't make a lot of sense to upgrade to newer CPUs, DDR5 and a new mobo.
After randomly breaking the AM4 CPU and motherboard in my 4-year-old PC last year, I saw that replacing the parts and rebuilding it would have cost almost as much as a new PC. Less if I did a complete rebuild myself, but I'm over building PCs; I've done that for years.
It was an expensive mistake as I bought a few options to experiment including a NUC and an M4 Mac Mini but eventually bought a 9800X3D 5070Ti PC for <$2 and for no reason in particular I bought a 64GB DDR5-6000 kit for $200 in August or so. I checked recently and that kit is pushing $1000. I also bought a 4080 laptop and bought a 64GB kit and an extra SSD for it too last year.
That's pretty lucky given what's happened since. I don't claim any kind of foresight about what would happen.
I do kind of want to take the parts I have and build another AM4 PC. The 5900XT is not a bad option with 16 cores for ~$300 but my DDR4 RAM is almost useless because the best deals now are for combos of CPU + motherboard + RAM at steep discounts.
You can get some good deals on prebuilts still. Not as good as 6+ months ago but still not bad. Costco has a 5080 PC for $2300. There's no way I'm going overboard and building a 128GB+ PC right now.
I've seen multiple RAM spikes. We had one at the height of the crypto hysteria IIRC but this is significantly worse and is also impacting SSDs. I kinda wish I'd bought 1-2 4TB+ SSDs last year but oh well.
We're really waiting for the AI bubble to pop. Part of me thinks that'll be in the next year, but it could stay irrational substantially longer than that.
The C30 64GB kits are nearly impossible to buy now, so, well done. Got one in September '23 for ~$380 AUD, on the rare occasions it's available today it's been over $1600 AUD.
I upgraded my UPS to a sine interactive unit to minimise the risk of it dying to bad power while the market is so crazy...
Context: Early in the firmware boot process the memory controller isn't configured yet so the firmware uses the cache as RAM. In this mode cache lines are never evicted since there's no memory to evict them to.
There may be server workloads for which the L3 cache is sufficient, would be interesting if it made sense to create boards for just the CPU and no memory at scale.
I imagine for such a workload you can always solder a small memory chip to avoid having to waste L3 on unused memory and a non-standard booting process so probably not.
Most definitely, I work in finance and optimizing workloads to fit entirely in cache (and not use any memory allocations after initialization) is the de-facto standard of writing high perf / low latency code.
Lots of optimizations happening to make a trading model as small as possible.
I remember from a talk about Wii/Wii U hacking that they intentionally kept the early boot code in cache so that it couldn't be sniffed or modified on the RAM bus, which was external to the CPU and thus glitchable.
In my case it began with 16K of RAM (yes, 16×1024 bytes) and 90K (yes, 90×1024 bytes) 5.25" floppy disks (although the floppies came a few months after the computer). Eventually upgraded to 48K RAM and 180K double-density floppy disks. The computer: Atari 800.
I'll see your Atari 800 and raise you my Atari 2600 with its whopping 128 bytes of RAM. Bytes with a B. I can kinda sorta call it a computer because you could buy a BASIC cartridge for it (I didn't and stand by that decision - it was pretty bad).
I thought the Timex Sinclair 1000 with 2 Kbytes of RAM was bad.
The membrane keyboard wasn’t great (the lack of a space bar was a weird choice) but it did work. We had programs on cassette and did get the 16-Kbyte memory expansion.
Maybe in 50 years the cache of CPUs and GPUs will be 1TB. Enough to run multiple LLMs (a model entirely run for each task). Having robots like in the movies would need LLMs much much faster than what we see today.
KolibriOS would fit in there, even with the data in memory. You cannot load it into the cache directly, but when the cache capacity is larger than all the data you read there should be no cache eviction and the OS and all data should end up in the cache more or less entirely. In other words it should be really, really fast, which KolibriOS already is to begin with.
Unless you lay everything out contiguously in memory, you’ll still get cache eviction due to associativity, depending on the eviction strategy of the CPU. But certainly DOS or even early Windows 95 could conceivably run entirely out of the cache.
Windows 95 only needed 4MB RAM and 50 MB disk, so that's certainly doable. The trick is to have a hypervisor spread that allocation across cache lines.
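To put those retro footprints in scale against this chip: the 9950X3D2 has 96MB of L3 per CCD, and a core only sees its own CCD's slice. A quick comparison, using nominal minimum-RAM figures for a few old OSes (approximate, from memory):

```python
# Nominal minimum-RAM figures for some old OSes vs. one CCD's L3 slice
# (192MB of L3 total, but split 96MB per CCD).
L3_PER_CCD_MB = 96
os_min_ram_mb = {"DOS 6.22": 1, "Windows 95": 4, "Windows 98": 16, "Windows XP": 64}
for name, mb in os_min_ram_mb.items():
    verdict = "fits" if mb <= L3_PER_CCD_MB else "too big"
    print(f"{name}: {mb} MB -> {verdict} in one CCD's L3")
```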
Yeah, cache eviction is the reason I was assuming it is "probably not possible architecturally", but I also figured there could be features beyond my knowledge that might make it possible.
Edit: Also this 192MB of L3 is spread across two Zen CCDs, so it's not as simple as "throw it all in L3" either, because any given core would only have access to half of that.
Well, yeah, reality strikes again. All you need is an exploit in the microcode to gain access to AMD's equivalent to the ME and now you can just map the cache as memory directly. Maybe. Can microcode do this or is there still hardware that cannot be overcome by the black magic of CPU microcode?
That assumes KolibriOS or any major component is pinned to one core and one cache slice instead of getting dragged between CCDs or losing memory affinity. Throw actual users, IO, and interrupts at it and you get traffic across chiplets, or at least across L3 groups, so the nice 'everything lives in cache' story falls apart fast.
Nice demo, bad model. The funny part is that an entire OS can fit in cache now, the hard part is making the rest of the system act like that matters.
Back in 2004 my PC had 256MB of RAM. My relative's laptop had 128MB. That's crazy when a modern CPU cache can theoretically host an OS (or even multiple OSes) from the early 2000s.
The Power4 MCM had 128 MB cache in 2001. The G4 TiBook sold the same year came with 128 MB of system RAM base, and OS X supported 64 MB configurations for a few years after this.
The RAM prices are so high and the storage is also getting more expensive every day, so we're forced to fit everything inside the CPU cache as a solution! /s
Several processors support this by effectively locking cache lines. At the low end, it allows a handful of fast interrupt routines without dedicated TCM. At the high end, it allows boot ROMs to negotiate DRAM links in software, avoiding both the catch-22 and complex hardware negotiation.
The extra cache doesn't do a damn thing (maybe +2%)
The lower leakage currents at lower voltages allowed them to implement a far more aggressive clock curve from the factory. That's where the higher allcore clock comes from (+30W TDP)
I'm not complaining at all, I think this is an excellent way to leverage binning to sell leftover cache.
Though if I may complain, Ars used to actually write about such things in their articles instead of speculate in a way that suspiciously resembles what an AI would write.
> The extra cache doesn't do a damn thing (maybe +2%)
It depends on the task. For some memory-bound tasks the extra cache is very helpful. For CFD and other simulation workloads the benefits are huge.
For other tasks it doesn't help at all.
If someone wants a simple gaming CPU or general-purpose CPU, they don't need to spend the money for this. They don't need the 16-core CPU at all. The 9850X3D is a better buy for most users who aren't frequently doing a lot of highly parallel work.
CFD benefits from cache, but it benefits even more from sustained memory bandwidth, no? A small(ish) chunk of L3 + two channels of DRAM is not going to compete with a quarter as much L3 plus eight channels of DRAM when typical working set sizes (in my experience) are in the tens of gigabytes, is it?
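The channel-count point can be made concrete with peak theoretical numbers (8 bytes per channel per transfer; real sustained bandwidth is lower, and the server speed grade here is an illustrative assumption):

```python
# Peak theoretical DRAM bandwidth: channels * 8 bytes * transfer rate.
def peak_bandwidth_gb_s(channels: int, mt_rate: int) -> float:
    return channels * 8 * mt_rate * 1e6 / 1e9

print(f"dual-channel DDR5-6000: {peak_bandwidth_gb_s(2, 6000):.1f} GB/s")
print(f"8-channel DDR5-4800:    {peak_bandwidth_gb_s(8, 4800):.1f} GB/s")
```

Even at a lower clock, eight channels give more than 3x the streaming bandwidth, which dominates once the working set blows past any amount of L3.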
It's also useful to point out that the use cases and workloads where SDCI are most beneficial are far, far beyond the scope of what anyone will have installed in a Zen rig. Dual 100G networking cards? The cost of both of those damn near buys all of a 9950X3D2 setup.
It really doesn't. In virtually every case the work is being completed faster than the cache can grow to that size. What little gains are being realized are from not having to wait for cores with access to the cache to become available.
There are some very specific workloads (say, simple object detection) that fit into cache and get crazy performance; there the value of this CPU will be unbeatable, as the alternative is one of the cache Epycs. Everywhere else it'll be only a small improvement unless the software is purpose-made for it.
> Here is the side-by-side of the Ryzen 9 9950X vs. 9950X3D for showing the areas where 3D V-Cache really is helpful:
Coincidentally, it looks like they filtered to all benchmarks with differences greater than 2%. The biggest speedup is 58.1%, and that's with 3D V-Cache on just half the chip.
On AMD the L3 cache is partitioned between the 2 chiplets.
So for 9950X3D half of the cores use a small L3 cache.
For applications that use all 16 cores, the cases where X3D2 provides a great benefit will be much more frequent than for a hypothetical CPU where the same cache increase would have been applied to a unified L3 cache.
The threads that happen to be scheduled on the 2nd chiplet will have a 3 times bigger L3 cache, which can enhance their performance a lot and many applications may have synchronization points where they wait for the slowest thread to finish a task, so the speed of the slowest thread may have a lot of influence on the performance.
Agree. The article's 2nd para notes "AMD relies on its driver software to make sure that software that benefits from the extra cache is run on the V-Cache-enabled CPU cores, which usually works well but is occasionally error-prone." - in regard to the older, mixed-cache-size chips.
> I'm curious to see...
Yeah - though I don't expect current-day Ars Technica will bother digging that deep. It could take some very specialized benchmarks to show such large gains.
Some of their writers, who are quite excellent, still do. Others just seem to regurgitate press releases with very little useful investigation.
My criticism of the lazy writers may seem outsized, but I grew up reading and learning from the much better version of Ars, one I used to subscribe to.
I'm interested to know if the L3 cache all behaves as a single pool for any core on either CCD, whether there's a penalty in access time depending on locality or whether they are just entirely localised.
It does not. For any of the dual CCD parts AMD has ever released for consumers. Even Strix Halo which has higher bandwidth, lower latency interconnect doesn't make a single L3 across CCDs.
It'll probably only happen when they have a singular, large die filled with cache upon which both CCDs are stacked.
And that answer is good enough for most workloads. You should stop reading now.
_______________________
The complex answer is that there is some ability for one CCD to pull cachelines from the other CCD. But I've never been able to find a solid answer on the limitations of this. I know it can pull a dirty cacheline from another CCD's L1/L2 (this is the core-to-core latency test you often see in benchmarks, and there is an obvious cross-die latency hit).
But I'm not sure it can pull a clean cacheline from another CCD at all, or if those just get redirected to main memory (as the latency to main memory isn't that much higher than between CCDs). And even if it can pull a clean cacheline, I'm not sure it can pull them from another CCD's L3 (which is an eviction cache, so only holds clean cachelines).
The only way for a cacheline to get into a CCD's L3 is to be evicted from an L2 on that CCD, so if a dataset is active across both CCDs, it will end up duplicated across both L3s. Cachelines evicted from one L3 do NOT end up in the other L3, so an idle CCD can't act as a pseudo-L4.
I haven't seen anyone make a benchmark which would show the effect, if it exists.
AMD didn't have to introduce a special driver for the Ryzen 9 5950X to keep threads resident on the "gaming" CCD. There was only a small difference between the 5950X and the non-X3D Ryzen 7 5800X in workloads that used no more than 8 cores, unlike the slowdowns observed in the Ryzen 9 7950X3D and 7900X3D at release compared to the Ryzen 7 7800X3D.
When the L3 sizes are different across CCDs the special AMD driver is needed to keep threads pinned to the larger L3 CCD and prevent them from being placed on the small L3 CCD where their memory requests can exploit the other CCD's L3 as an L4. The AMD driver reduces CCD to CCD data requests by keeping programs contained in one CCD.
With equal L3 caches, when a process spills onto the second CCD it will still use the first's L3 cache as "L4", but it no longer has to evict that data at the same rate as on the lopsided models. Additionally, the first CCD can use the second CCD's L3 in kind, reducing the number of requests that need to go to main memory.
The same sized L3s reduce contention to the IO die and the larger sized L3s reduce memory contention, it's a win-win.
I am so grateful that I bought my 128 GB RAM kit in January of last year for my own 9950 upgrade. We just built my dad a 7000-series system to replace his old AM4 (2017 build), and 32GB of DDR5 was nearly the same price at Micro Center as what I paid last year. I was able to gift him an Nvidia 1060 discrete graphics card so that he could continue to run his two monitors; the newer motherboards have much less onboard capability for that.
I upgraded to a 4070 super last year. I ran both cards at the same time for a little bit, but it got really frustrating to keep the wrong card from being assigned to a particular task with llama. I really should’ve taken an R&D tax credit on my AI research but I’m still able to expense it for the business.
Oh man. I am running computations on my server that involve computing geodesic distances with the heat method. The job turns out to be an L3 cache thrasher, leaving my CPUs underutilized for multi-worker jobs... 208MB instead of my 25MB per socket sounds amazing.
The gain is very workload dependent, so there are no generally-applicable rules.
There are many applications which need synchronization between threads, so the speed of the slowest thread has a disproportionate influence on the performance.
In such applications, the slowest thread has a 3-times-bigger cache on the X3D2 than on the X3D. That can make a lot of difference.
So there will be applications with no difference in performance, but also applications with a very large difference in performance, equal to the best performance differences shown by X3D vs. plain 9950X.
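The slowest-thread effect is easy to see in a toy model: with a barrier at the end of each step, every step costs as much as its slowest participant. The numbers below are hypothetical, just to illustrate the shape of the argument:

```python
# Toy model of barrier-synchronized threads: with a barrier after every
# step, each step costs the max over threads, so one slow thread sets
# the pace for the whole job.
def total_time(per_thread_step_times):
    return sum(max(step) for step in zip(*per_thread_step_times))

balanced = [[1.0] * 4, [1.0] * 4, [1.0] * 4]
skewed = [[1.0] * 4, [1.0] * 4, [1.5] * 4]  # one cache-starved thread, 50% slower
print(total_time(balanced))  # 4.0
print(total_time(skewed))    # 6.0: the slowest thread drags the whole job
```

Making only the slow thread faster (say, by giving it a bigger cache slice) recovers the entire gap, even though the other threads were never the bottleneck.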
It really comes down to how much more this CPU costs over the next one down if you're building a new rig for a long period of time. I'm running a 5950X which is coming up on 6 years in November. I could have spent a little less on the next model down, but I expect this rig will last me a few more years (especially with how much memory costs now). The extra per-year expense for that CPU was almost nothing over its lifetime.
Now, would I upgrade an existing computer with a slightly slower processor to this one? Probably not.
That's what's different about this one. "Enter the Ryzen 9 9950X3D2 Dual Edition, a mouthful of a chip that includes 64MB of 3D V-Cache on both processor dies, without the hybrid arrangement that has defined the other chips up until now."
Given that the dies still have L3 on them does this count as L4 or does the hardware treat it as a single pool of L3?
Would be neat to have an additional cache layer of ~1 GB of HBM on the package but I guess there's no way that happens in the consumer space any time soon.
Per compute die it functions as one 96M L3 with uniform latency. It is 4 cycles more latency than the configuration with smaller 32M L3. But there are two compute dies, each with their own L3. And like the 9950X coherency between these two L3 is maintained over global memory interconnect to the third (IO) die.
But to do it literally - I'm not a low-level motherboard EE, but I'd bet you're looking at 5 to 7 figures (US $) of engineering work, to get around all the ways in which that would violate assumptions baked into the designs of the CPU, support chips, firmwares, etc.
Make a fake RAM device that offers a write-through guarantee and responds on the bus no matter what address is referenced. You could possibly short-circuit any "is RAM there" test if it just says yes for whatever size and stride got configured.
I know the prices of RAM are high, but the 256GB RAM limit seems like an omission. If they supported at least 512GB in quad or eight channel, that would be something worth looking at for me. I know there is Threadripper, but ECC memory is out of reach.
Makes sense. RAM pricing has surely led to a fall in AM5 high-end CPU purchases, so they might as well try to get some extra cash from those who still buy, and bin the remaining now non-X3D chips as something else.
You can buy 128GB of DDR5-6000 with a 9950X3D (not this newest X2 version, but still a $699 CPU) and a motherboard and a case for $2800 right now: https://www.newegg.com/Product/ComboDealDetails?ItemList=Com...
Last summer, a 9950X3D + motherboard + cooler + 128 GB DRAM, including VAT, was the equivalent of $1400 in Europe, where I live.
That's half of your quoted price. That was without case and PSU, but adding e.g. $200 for those would not change much.
I don't really want to run my RAM that slow, which is why I'll probably stick with two sticks.
I commented because someone thought that $4K was the going price for 128GB of RAM, which is way too much even with the demand crunch.
The distribution of the price was like this:
Since then, the prices of DDR5 and SSDs have continued to increase, so now the fraction spent on memory would be even higher than 59%. Before 2026, for such small amounts of memory, the cost would have been much less than the rest of the system.
6 or so weeks after I returned it the kit was listed at 1499.
The most I could get running on 10GB VRAM + 96GB RAM was a REAP'd + quantized version of MiniMax-M2.5
> The 30 minute boot time could be worked around by enabling the (off-by-default) memory context restore option in BIOS
Some motherboards have it off by default.
This is my first time off intel and I have to say I don’t understand the hype.
The long POST times must mean it's retraining the memory each time, which is not normal. Just in case you haven't tried it yet, I'd start by reseating them; I've had weird issues with marginally seated RAM before.
Also you definitely have to go much slower with 4 sticks compared to two, so lower speed as much as you can. If that doesn't help, I'd verify them in pairs.
If they work in pairs but not in quad at the slowest speed, something is surely wrong.
Once you get them working in quad, you can start bumping up the speed, might need voltage boost as well.
You may need to bump up voltages slightly for your CPU's IMC (I needed to on my Ryzen 8700F to run 6000 stable). It's CPU sample dependent.
Also, as another commenter pointed out, 4 sticks will typically achieve lower stable clocks.
Cheapest 64GB kit is $930.
The kit I was oh-so-close to buying was two 6400 64GB sticks.
Not gonna buy now, not that desperate. I have a spare AM4 board, DDR4 memory and heck even CPU, I'll ride this one out. Likely skip AM5 entirely if something doesn't drastically change.
That's not far from the bundle deal above, once you subtract the $700 CPU.
If you really need 128GB the 5600 kit is fine. Having 208MB of total cache on the CPU means the real world difference between a 5600 kit and a slightly faster kit is negligible in most use cases.
If you don't need to upgrade then clearly don't force an upgrade right now. I just wanted to comment that $4K for 128GB of RAM is a very bad price right now, even with the current situation.
Does that “most use cases” caveat really apply to someone buying 128G of RAM? If I’m buying that much, it means I’m actually going to put it through its paces, unless it’s just there for huge reserved guest VM overhead.
If you’re trying to run LLMs off of the CPU instead of the GPU then the RAM speed dictates a lot. It’s going to be slow no matter what, though. Dual channel DDR5 just isn’t enough to run large LLMs that start to fill 128GB of RAM, and the difference between 5600 and 6400 isn’t going to make it usable.
If you’re just running a lot of VMs or doing a lot of mixed tasks that keep a lot of RAM occupied then you’d probably have a hard time measuring a difference between 5600 and 6400 if you tried with one of these X3D CPUs with a lot of cache.
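To put rough numbers on that: CPU decode is mostly memory-bandwidth-bound, so a back-of-the-envelope ceiling on tokens/sec is peak bandwidth divided by the bytes streamed per token. A sketch (the 100GB model size is a made-up stand-in for something filling most of 128GB):

```python
# Back-of-the-envelope decode speed for CPU-only LLM inference.
# Assumption: each generated token streams roughly the full active weights
# from RAM, so tokens/sec <= peak bandwidth / model size.

def peak_bandwidth_gbs(mt_per_s: int, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical DDR5 peak: channels * 8 bytes per transfer * MT/s."""
    return channels * bus_bytes * mt_per_s / 1000  # GB/s

def rough_tokens_per_s(bandwidth_gbs: float, model_gb: float) -> float:
    return bandwidth_gbs / model_gb

MODEL_GB = 100  # hypothetical model filling most of a 128GB system

for speed in (5600, 6400):
    bw = peak_bandwidth_gbs(speed)
    print(f"DDR5-{speed}: ~{bw:.1f} GB/s peak, ~{rough_tokens_per_s(bw, MODEL_GB):.2f} tok/s ceiling")
```

Real throughput lands well below those ceilings, but it shows why 5600 vs. 6400 barely matters here: both round to about 1 token/sec for a model that size.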
This is a frequent topic of discussion for gamers because some people obsess over optimizing their RAM speed and timings and pay large premiums for RAM with CAS latency of 28 instead of 36. Then they see benchmarks showing 1-2% differences in games or even most productivity apps and realize they would have been better spending that extra money on the next faster GPU or CPU or other part.
Oh absolutely. Just mentioned it since I was very close to buying it back then, and now it's completely bonkers.
That bundle deal is quite well priced all things considered, it basically prices the memory where it was. Again, sadly no great bundle deals here.
I would not be surprised if we see casualties in adjacent markets, such as motherboards, coolers and whatnot.
Just reading now that they went out of production half a year ago which is a shame. I was very impressed being able to upgrade with the same motherboard 6 years down the line.
Other than the speed, it's a very good reason to go with AMD; the upgrade scope is massive. On AM5 you can go from a 6-core and, soon, all the way to a 24-core with the new Zen 6.
Here's hoping to more developments like TurboQuant to improve LLM memory efficiency.
Su said that typically, the first quarter (Q1) is slower due to seasonal patterns, but AMD has seen its data center business expand from Q4 into Q1, demonstrating ongoing strength across both CPUs and GPUs. This growth underscores the company’s ability to capitalize on rising demand for AI compute and enterprise workloads, even during traditionally quieter periods.
“We are going into a big inflection year here in 2026. The CPU business is absolutely on fire.”
[1]: https://stocktwits.com/news-articles/markets/equity/amd-ceo-...
(cheapest at $1240 USD)
Nah, those of us who already bought DDR5 memory also already bought decent CPUs. Dropping another $1k for these incremental gains would be silly. It'd make a lot more sense if DDR5 had been around longer so that people had the option to make generational upgrades to this CPU but DDR5 on AMD has only been around for Zen4 and Zen5.
I hope this is still enough for the planned upgrade to Zen7 in 2028.
I really want to see what enabling the L3 cache options in the BIOS do from a NUMA standpoint. I have some projects I want to work on where being able to even just simulate NUMA subdivisions would be highly useful.
While I was aiming at 128, I settled for 96GB, because any more than 2 sticks means a sharp drop in RAM clocks this generation.
Feeling pretty chuffed now XD (though still sad because building a new PC is dumb when RAM costs more than a 24 core monster CPU)
The not-so-good side is that getting an RVA23 development board this year with a usable amount of RAM (e.g. for compiling and linking large code bases) is not going to be cheap.
I am fine with my 2 year old 128GB DDR4 for now. I will just upgrade the 14700K to 14900KS CPU and wait 2 more years.
Judging by the benchmarks newer CPUs aren't much better for multithreading workloads than 14900KS anyway, so it doesn't make a lot of sense to upgrade to newer CPUs, DDR5 and a new mobo.
It was an expensive mistake as I bought a few options to experiment including a NUC and an M4 Mac Mini but eventually bought a 9800X3D 5070Ti PC for <$2 and for no reason in particular I bought a 64GB DDR5-6000 kit for $200 in August or so. I checked recently and that kit is pushing $1000. I also bought a 4080 laptop and bought a 64GB kit and an extra SSD for it too last year.
That's pretty lucky given what's happened since. I don't claim any kind of foresight about what would happen.
I do kind of want to take the parts I have and build another AM4 PC. The 5900XT is not a bad option with 16 cores for ~$300 but my DDR4 RAM is almost useless because the best deals now are for combos of CPU + motherboard + RAM at steep discounts.
You can get some good deals on prebuilts still. Not as good as 6+ months ago but still not bad. Costco has a 5080 PC for $2300. There's no way I'm going overboard and building a 128GB+ PC right now.
I've seen multiple RAM spikes. We had one at the height of the crypto hysteria IIRC but this is significantly worse and is also impacting SSDs. I kinda wish I'd bought 1-2 4TB+ SSDs last year but oh well.
We're really waiting for the AI bubble to pop. Part of me thinks that'll be in the next year, but it could stay irrational substantially longer than that.
I upgraded my UPS to a pure sine wave line-interactive unit to minimise the risk of the PC dying to bad power while the market is so crazy...
It's probably not possible architecturally, but it would be amusing to see an entire early 90's OS running entirely in the CPU's cache.
I imagine for such a workload you'd still want to solder on a small memory chip to avoid wasting L3 holding unused memory, plus you'd need a non-standard boot process, so probably not.
Lots of optimizations happening to make a trading model as small as possible.
The membrane keyboard wasn’t great (the lack of a space bar was a weird choice) but it did work. We had programs on cassette and did get the 16 Kbyte memory expansion.
https://en.wikipedia.org/wiki/Timex_Sinclair_1000
I didn’t realize the Atari 2600 had BASIC; I always thought of it as a game console.
https://ww1.microchip.com/downloads/en/DeviceDoc/1006S.pdf
Edit: Also this 192MB of L3 is spread across two Zen CCDs, so it's not as simple as "throw it all in L3" either, because any given core would only have access to half of that.
Nice demo, bad model. The funny part is that an entire OS can fit in cache now, the hard part is making the rest of the system act like that matters.
* https://en.wikipedia.org/wiki/Commodore_PET
Same time as the Trash-80 and BBC micro were making inroads.
There are actually already two running (MINIX and UEFI), and it’s the opposite of amusing - https://www.zdnet.com/article/minix-intels-hidden-in-chip-op...
If you run a VM on a CPU like this, using a baremetal hypervisor, you can get very close to "everything in cache".
Consider a VM where that kind of stuff has been removed, like the firecracker hypervisor used for AWS Lambda. You're talking milliseconds.
The lower leakage currents at lower voltages allowed them to implement a far more aggressive clock curve from the factory. That's where the higher allcore clock comes from (+30W TDP)
I'm not complaining at all, I think this is an excellent way to leverage binning to sell leftover cache.
Though if I may complain, Ars used to actually write about such things in their articles instead of speculate in a way that suspiciously resembles what an AI would write.
It depends on the task. For some memory-bound tasks the extra cache is very helpful. For CFD and other simulation workloads the benefits are huge.
For other tasks it doesn't help at all.
If someone wants a simple gaming CPU or general purpose CPU, they don't need to spend the money for this. They don't need the 16-core CPU at all. The 9850X3D is a better buy for most users who aren't frequently doing a lot of highly parallel work.
If your tasks don’t benefit then don’t buy it.
But stop claiming that it doesn’t help anywhere because that’s simply wrong. I do some FEA work occasionally and the extra cache is a HUGE help.
There are also a lot of non-LLM AI workloads that have models in the size range than fit into this cache.
See https://www.phoronix.com/review/amd-ryzen-9-9950x3d-linux/10
> Here is the side-by-side of the Ryzen 9 9950X vs. 9950X3D for showing the areas where 3D V-Cache really is helpful:
Coincidentally, it looks like they filtered to the benchmarks with differences greater than 2%. The biggest speedup is 58.1%, and that's with 3D V-Cache on only half the chip.
I’m curious to see whether the same benchmarks benefit again so greatly.
So for 9950X3D half of the cores use a small L3 cache.
For applications that use all 16 cores, the cases where X3D2 provides a great benefit will be much more frequent than for a hypothetical CPU where the same cache increase would have been applied to a unified L3 cache.
The threads that happen to be scheduled on the 2nd chiplet will have a 3 times bigger L3 cache, which can enhance their performance a lot. Many applications have synchronization points where they wait for the slowest thread to finish a task, so the speed of the slowest thread can have a lot of influence on overall performance.
Agree. The article's 2nd para notes "AMD relies on its driver software to make sure that software that benefits from the extra cache is run on the V-Cache-enabled CPU cores, which usually works well but is occasionally error-prone." - in regard to the older, mixed-cache-size chips.
> I'm curious to see...
Yeah - though I don't expect current-day Ars Technica will bother digging that deep. It could take some very specialized benchmarks to show such large gains.
My criticism of the lazy writers may seem outsized, but I grew up reading and learning from a much better version of Ars, one I used to subscribe to.
I might even shell out for an upgrade to AM5 and DDR5. On the other hand, my 5900X is still blazing fast.
It'll probably only happen when they have a singular, large die filled with cache upon which both CCDs are stacked.
Run this test if you're curious: https://github.com/ChipsandCheese/MemoryLatencyTest
On a regular CCD:
32768,46.115
65536,74.243
98304,85.699
131072,91.42
262144,99.402
On a 3D cache CCD:
32768,11.992
65536,12.712
98304,29.921
131072,49.91
262144,86.059
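To read those numbers: each pair looks like (test size in KB, latency in ns); the units are my inference from the magnitudes, e.g. 98304 KB = 96 MB matches the V-Cache die's L3. A quick sketch (units assumed) to locate the biggest latency jump, i.e. roughly where the working set spills out of cache:

```python
# Find the largest latency jump between consecutive test sizes in the
# MemoryLatencyTest output above. Units (KB, ns) are an assumption
# inferred from the magnitudes, not stated by the tool here.

def latency_cliff(samples):
    """Return the (size_kb, size_kb) pair spanning the largest latency jump."""
    pairs = sorted(samples)
    jumps = [(b_lat - a_lat, a_sz, b_sz)
             for (a_sz, a_lat), (b_sz, b_lat) in zip(pairs, pairs[1:])]
    _, a, b = max(jumps)
    return a, b

regular = [(32768, 46.115), (65536, 74.243), (98304, 85.699),
           (131072, 91.42), (262144, 99.402)]
vcache  = [(32768, 11.992), (65536, 12.712), (98304, 29.921),
           (131072, 49.91), (262144, 86.059)]

print(latency_cliff(regular))  # regular CCD: already spilling past 32 MB
print(latency_cliff(vcache))   # V-Cache CCD: the steep climb comes much later
```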
And that answer is good enough for most workloads. You should stop reading now.
_______________________
The complex answer is that there is some ability for one CCD to pull cachelines from the other CCD. But I've never been able to find a solid answer on the limitations of this. I know it can pull a dirty cache line from the L1/L2 of another CCD (this is the core-to-core latency test you often see in benchmarks, and there is an obvious cross-die latency hit).
But I'm not sure it can pull a clean cacheline from another CCD at all, or if those just get redirected to main memory (as the latency to main memory isn't that much higher than between CCDs). And even if it can pull a clean cacheline, I'm not sure it can pull them from another CCD's L3 (which is an eviction cache, so only holds clean cachelines).
The only way for a cacheline to get into a CCD's L3 is to be evicted from an L2 on that core, so if a dataset is active across both CCDs, it will end up duplicated across both L3s. Cachelines evicted from one L3 do NOT end up in another L3, so an idle CCD can't act as a pseudo L4.
I haven't seen anyone make a benchmark which would show the effect, if it exists.
When the L3 sizes are different across CCDs the special AMD driver is needed to keep threads pinned to the larger L3 CCD and prevent them from being placed on the small L3 CCD where their memory requests can exploit the other CCD's L3 as an L4. The AMD driver reduces CCD to CCD data requests by keeping programs contained in one CCD.
With equal L3 caches when a process spills onto the second CCD it will still use the first's L3 cache as "L4" but it no longer has to evict that data at the same rate as the lopsided models. Additionally the first CCD can use the second CCD's L3 in kind reducing the number of requests that need to go to main memory.
The same sized L3s reduce contention to the IO die and the larger sized L3s reduce memory contention, it's a win-win.
https://www.phoronix.com/review/amd-3d-vcache-optimizer-9950...
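If you want to do by hand what the driver does, you can confine a process to one CCD's cores with scheduler affinity. A hedged Linux sketch; the core numbering below is an assumption (on many 16-core AM5 parts cores 0-7 plus SMT siblings 16-23 share a CCD, but check `lscpu -e` for your actual layout):

```python
# Manually confining a process to one CCD on Linux, i.e. doing by hand
# what AMD's driver does automatically. ASSUMPTION: cores 0-7 (and SMT
# siblings 16-23) belong to the first CCD; verify with `lscpu -e`.

import os

FIRST_CCD = set(range(0, 8)) | set(range(16, 24))  # assumed core layout

def pin_to_first_ccd(pid: int = 0) -> set:
    """Restrict pid (0 = calling process) to the first CCD's cores, if present."""
    available = os.sched_getaffinity(pid)
    target = FIRST_CCD & available
    if target:                      # no-op on machines without those cores
        os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)

print(pin_to_first_ccd())
```

With equal L3 on both CCDs this kind of manual pinning matters a lot less, which is the point being made above.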
For comparison, the 9950X3D has a total cache of 144MB.
It is indeed 8MB per compute die but really 1MB per core. Not shared among the entire CCD.
For gaming, AMD already pins the game threads to the CCD with the extra cache pretty well.
For multi-threaded workloads the gain from having cache on both CCDs is quite small.
There are many applications which need synchronization between threads, so the speed of the slowest thread has a disproportionate influence on the performance.
In such applications, the slowest thread has a 3 times bigger cache on an X3D2 vs. an X3D. That can make a lot of difference.
So there will be applications with no difference in performance, but also applications with a very large difference in performance, equal to the best performance differences shown by X3D vs. plain 9950X.
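The slowest-thread point is easy to make concrete with a toy model: with a barrier after each parallel phase, the phase takes as long as its slowest thread, so removing one straggler's cache-miss penalty speeds up the whole phase. The numbers below are illustrative, not measurements:

```python
# Toy model of the "slowest thread" effect: with barrier synchronization,
# a parallel phase takes as long as its slowest thread. The 1.5x straggler
# penalty is a made-up illustrative number, not a measurement.

def phase_time(per_thread_times):
    return max(per_thread_times)

# 16 threads; suppose cache thrashing makes the worst thread 1.5x slower.
before = [1.0] * 15 + [1.5]   # one straggler on a small L3
after  = [1.0] * 16           # bigger L3 removes the straggler

speedup = phase_time(before) / phase_time(after)
print(f"whole-phase speedup from fixing only the slowest thread: {speedup:.2f}x")
```

So even though 15 of 16 threads were already fast, the whole phase still gets the straggler's full 1.5x improvement.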
Now, would I upgrade an existing computer with a slightly slower processor to it? Probably not.
If they are stacked then why not 9800X3D2?
Would be neat to have an additional cache layer of ~1 GB of HBM on the package but I guess there's no way that happens in the consumer space any time soon.
But to do it literally - I'm not a low-level motherboard EE, but I'd bet you're looking at 5 to 7 figures (US $) of engineering work, to get around all the ways in which that would violate assumptions baked into the designs of the CPU, support chips, firmwares, etc.