Arm's Cortex X925: Reaching Desktop Performance

(chipsandcheese.com)

98 points | by ingve 3 hours ago

6 comments

  • pdpi 2 hours ago
    Kind of weird to see an article about high-performance ARM cores without a single reference to Apple or how this hardware compares to M4 or M5 cores.
    • hrmtst93837 0 minutes ago
      You make a valid point; Apple has indeed set a high standard for ARM cores in performance. A comparison with their M4 and M5 cores would provide valuable context for these new developments.
    • ezst 1 hour ago
      That would only matter (to me, at least) if those Apple chips were propping up an open platform that suits my needs. As things stand today, procuring an M chip represents a commitment to the Apple software ecosystem, which Apple has made abundantly clear doesn't optimize for user needs. Those marginally faster CPU cycles happen on a time scale that can't offset the time wasted fighting macOS and rebuilding decades-long muscle memory anyway, so thanks but no thanks.
      • pdpi 1 hour ago
        Sure. Insofar as Apple Silicon beats these things, "I'll take less powerful hardware if it means I'm not stuck with the Apple ecosystem" is a perfectly reasonable tradeoff to make. Two things, though.

        First, I don't like making blind tradeoffs. If what I need (for whatever reason) is a really beefy ARM CPU, I'd like to know what the "Apple-less tax" costs me (if anything!)

        Second, the status quo is that Apple Silicon is the undisputed king of ARM CPU performance, so it's the obvious benchmark to compare this thing against. Providing that context is just basic journalistic practice, even if just to say "but it's irrelevant because we can't use the hardware without the software".

      • flembat 1 hour ago
        When purchasing any ARM-based computer, a key question for me is how many of those I can purchase for the cost of a Mac mini, how many Mac minis I can purchase for the cost of that, and whether it has working drivers...
      • renewiltord 10 minutes ago
        Last time I tried, getting Linux working on Apple Silicon actually worked better than on a Qualcomm ARM machine (which only supports strange Windows).
      • tucnak 25 minutes ago
        FWIW, Apple's Virtualization framework is fantastic, and Rosetta 2 is unmatched on other Arm desktops, where QEMU is required. For example, you can trivially get Vivado working on a Debian guest with a macOS host that way.
      • spiderfarmer 1 hour ago
        > represents a commitment to the Apple software ecosystem

        I don't see how that's holding you back from using these tools for your work any more than using a Makita power tool with an LXT battery pack.

    • atwrk 53 minutes ago
      Those are of almost zero use for people wishing to run Linux etc.

      Yes, Asahi exists, and props to the developers, but I don't think I'm alone in being unwilling to buy hardware from a manufacturer who obviously is not interested in supporting open operating systems.

    • SG- 2 hours ago
      Same, I wish Chips and Cheese would compare some of these cores to Apple Silicon, especially in this case where they're talking about another ARM core.

      A few years ago they were writing articles about Apple Silicon.

    • amelius 1 hour ago
      Apple does not produce general purpose computing parts.

      This is an industry blog, not a consumer oriented blog.

    • DeathArrow 19 minutes ago
      >Kind of weird to see an article about high-performance ARM cores without a single reference to Apple

      And Qualcomm.

  • dinglo 1 hour ago
    If ARM starts dominating in desktop and laptop spaces with a quite different set of applications, might we start seeing more software bugs around race conditions, caused by developers writing software with x86 in mind, with its differing constraints on memory ordering?
    • vardump 45 minutes ago
      That's a possibility. Some code still assumes (without realizing!) x86-style ordered loads and stores. This is called a strong memory model, specifically TSO, Total Store Order. If you tell x86 to execute "a=1; b=2;", it will always store the value to 'a' first. Of course compilers might reorder stores and loads, but that's another matter.

      ARM is free to reorder stores and loads. This is called a weak memory model. So unless the required ordering is stated explicitly in the code, e.g. with C++ memory_order::acquire and memory_order::release, you might get invalid behavior. Heisenbugs in the worst case.
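
A minimal C++ sketch of the acquire/release pairing mentioned in the comment above. The names `payload`, `ready`, `producer`, `consumer` and `run_once` are hypothetical and chosen only for illustration. The release store on the flag guarantees that the earlier plain write to `payload` is visible to any thread whose acquire load observes the flag as true. If both orderings were weakened to `memory_order_relaxed` (or the flag were a plain `bool`), the program would have a data race that may go unnoticed on x86 and surface on a weakly ordered ARM core.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                   // ordinary, non-atomic data
std::atomic<bool> ready{false};    // flag used to publish the payload

void producer() {
    payload = 42;                                  // (1) write the data
    ready.store(true, std::memory_order_release);  // (2) publish: (1) may not be reordered after this store
}

int consumer() {
    // Spin until the flag is observed. The acquire load pairs with the
    // release store, so the write to payload is visible afterwards.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

// Runs one producer and one consumer thread and returns what the consumer saw.
int run_once() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

With the orderings as written, `run_once()` returns 42 on any conforming implementation, regardless of whether the hardware is x86 or ARM.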

    • runeks 1 hour ago
      Wouldn't the compiler take care of producing the correct machine code?
      • octachron 41 minutes ago
        The issue is that the C memory model allows more behaviours than the memory model of x86-64 processors. You can thus write code which is incorrect according to the C language specification but will happen to work on x86-64 processors. Moving to arm64 (with its weaker memory model than x86-64) will then reveal the latent bug in your program.
      • mhh__ 1 hour ago
        The compiler relies on the language and programmer to enforce and follow a memory consistency model.
    • ivolimmen 1 hour ago
      If you go around your OS, yes, that could be the case, but you can already have issues running an application from machine to machine with the same OS but different amounts of RAM and different CPUs. But I am not an expert in these matters.
    • dd_xplore 1 hour ago
      The major issue is that these days most software is Electron-based or a webapp. I miss the days of 98/XP, where you'd find tons of desktop software. A PC actually felt like something that had a purpose. Even if you spin up an XP/98 VM now (especially 98/2000), you'd see the entire OS feels like something that you can spend some time on. Nowadays most PCs feel like a random terminal where I open the browser and do some basic work (except for gaming, of course). I really hate the UX of Win 11, and even 10 isn't much better compared to XP. I really hope we go back to that old era.
    • jordiburgos 1 hour ago
      Only for the hand-written assembly parts of the source code. The rest will be handled by the compilers.
      • bpye 55 minutes ago
        You don't need to be writing assembly. Anything sharing memory between multiple threads could have bugs with ARM's memory model, even if written in C, C++, etc.
      • silon42 56 minutes ago
        Not even close. Except maybe in Rust /s
  • Incipient 1 hour ago
    Without being a CPU geek, a lot of the branch prediction details go over my head; however, generally a good review. I liked the detail of performance on more complex workloads, where IPC can get muddy when you need more instructions.

    I feel these days, however, that for any comparison of performance, the power envelope needs to be included (I realise this is dependent on the final chip).

    • adrian_b 38 minutes ago
      ARM Cortex-X925 does indeed achieve a very good IPC, but it has competitive performance only in general-purpose applications that cannot benefit from using array operations (i.e. the vector instructions and registers). The results shown in the parent article for the integer tests of SPEC CPU2017 are probably representative for Cortex-X925 when running this kind of application.

      While the parent article shows AMD Zen 5 having significantly better results in floating-point SPEC CPU2017, these benchmark results are still misleading, because in applications properly optimized for AVX-512 the difference between Zen 5 and Cortex-X925 would be much greater. I have no idea how SPEC has been compiled by the author of the article, but the floating-point results are not consistent with programs optimized for Zen 5.

      One disadvantage of Cortex-X925 is having narrower vector instructions and registers, which requires more instructions for the same task. This is only partially compensated by the fact that Cortex-X925 can execute up to 6 128-bit instructions per clock cycle (vs. up to 4 vector instructions per clock cycle for Intel/AMD, but those are wider: 256-bit for Intel and up to 512-bit for Zen 5). This has been shown in the parent article.

      The second disadvantage of Cortex-X925 is that it has an unbalanced microarchitecture for vector operations. For decades most CPUs with good vector performance had an equal throughput for fused multiply-add operations and for loads from the L1 cache memory. This is required to ensure that the execution units are fed all the time with operands in many applications.

      However, Cortex-X925 can do at most 4 loads per cycle, while it can do 6 FMAs. Because of this lower load throughput, Cortex-X925 can reach its maximum FMA throughput much less frequently than the AMD or Intel CPUs. This is compounded by the fact that achieving better FMA-to-load ratios requires more storage space in the architectural vector registers, and Cortex-X925 is also at a disadvantage here, having vector registers 4 times smaller than those of Zen 5.
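
The arithmetic behind the comment above can be sketched in a few lines of C++. All figures (issue widths, vector widths, loads and FMAs per cycle) are taken from the comment as stated and are not independently verified. The type `VectorCore` and the helpers `bits_per_cycle` and `loads_per_fma` are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Peak vector issue capability as described in the comment (assumed figures).
struct VectorCore {
    int ops_per_cycle;  // vector instructions issued per clock cycle
    int width_bits;     // width of each vector instruction in bits
};

// Peak vector bits processed per cycle: issue width times vector width.
constexpr int bits_per_cycle(VectorCore c) {
    return c.ops_per_cycle * c.width_bits;
}

// L1 loads available per FMA issued, at peak rates.
constexpr double loads_per_fma(int loads, int fmas) {
    return static_cast<double>(loads) / fmas;
}

constexpr VectorCore cortex_x925{6, 128};  // 6 x 128-bit per cycle
constexpr VectorCore intel_256{4, 256};    // 4 x 256-bit per cycle
constexpr VectorCore zen5_512{4, 512};     // 4 x 512-bit per cycle

static_assert(bits_per_cycle(cortex_x925) == 768,  "6 * 128");
static_assert(bits_per_cycle(intel_256)   == 1024, "4 * 256");
static_assert(bits_per_cycle(zen5_512)    == 2048, "4 * 512");

// 4 loads against 6 FMAs: fewer than one load per FMA at peak.
static_assert(loads_per_fma(4, 6) < 1.0, "load throughput below FMA throughput");

// Register width ratio behind the "4 times smaller" remark.
static_assert(zen5_512.width_bits / cortex_x925.width_bits == 4, "512 / 128");
```

Under these assumed figures the six narrow pipes deliver 768 bits per cycle against 1024 or 2048 for the wider designs, and only about 0.67 loads are available per FMA, which is the imbalance the comment describes.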

      • DeathArrow 15 minutes ago
        Still, what percentage of software uses AVX-512 for its core functionality, such that vector performance matters in practice?
  • xarope 1 hour ago
    I can't seem to find any power draw or efficiency figures (e.g. <perf>/watts).

    Only found this, which talks about performance-per-area (PPA) and performance-per-clock (I assume cycle) (PPC): https://www.reddit.com/r/hardware/comments/1gvo28c/latest_ar...

  • sylware 33 minutes ago
    But with hardware IP locks like x86_64.

    Better to favor RISC-V implementations as much as possible.

    But I don't know if there are already good modern-desktop-grade RISC-V implementations (in the US, SiFive is moving fast as far as I know)... and the hard part: accessing the latest and greatest silicon process of TSMC, aka ~5GHz.

    Those markets are completely saturated, so at best progress will be very slow unless something big happens: for instance, AMD adapting its best micro-architecture to RISC-V (mostly ISA decoding), etc.

    And if Valve starts to distribute a client with a strong RISC-V game compilation framework...

    • DeathArrow 11 minutes ago
      This is kind of a solution in search of a problem. RISC-V will grow only if people find some value in it, if it solves their actual problems in ways that other architectures can't.
  • ddtaylor 2 hours ago
    Can't zoom any of the content on mobile so most of the charts are unreadable.
    • sfdlkj3jk342a 53 minutes ago
      Zoom works fine with Firefox on Android.
    • GaggiX 1 hour ago
      Browsers usually have an accessibility option to force the ability to zoom on all websites.