For many enterprise-class customers, Intel has been the go-to standard for a number of years, commanding over 98 percent of the market. AMD’s Opteron series lacked the raw performance many buyers wanted, and the ARM architecture was not widely supported by the applications required for everyday operations.
The landscape is vastly different now from what it was nearly a decade ago. The rise of ARM and of many-core CPUs in dozens of configurations, each optimized for different uses, makes it easier to process data efficiently. A former Intel architect now working for Cloudflare has shared some benchmarks comparing dual-socket Intel Xeon E5-2630 v4 processors to an offering from Qualcomm based on the 64-bit ARMv8 architecture.
A Qualcomm Centriq server based on the Falkor core gave both the most recent Xeon Skylake CPU and the previous-generation Broadwell part a run for their money. The specifications for each test system can be found in the table below.
| |Intel Broadwell|Intel Skylake|Qualcomm Falkor|
|---|---|---|---|
|Issue|8 µops/cycle|8 µops/cycle|8 instructions/cycle|
|Dispatch|4 µops/cycle|5 µops/cycle|4 instructions/cycle|
|Core Count|10 x 2S + HT (40 threads)|12 x 2S + HT (48 threads)|46C / 46T|
|Clock Speed|2.2 GHz (3.1 GHz turbo)|2.1 GHz (3.0 GHz turbo)|2.5 GHz|
|Cache|2.5 MB/core|1.35 MB/core|1.25 MB/core|
|TDP|170 W (85 W x 2S)|170 W (85 W x 2S)|120 W|
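To put the TDP row in perspective, the rated power can be divided by the number of physical cores and hardware threads in each system. The short Python sketch below does that arithmetic using only the figures from the table. The column labels (Broadwell, Skylake, Falkor, in that order) are an assumption inferred from the surrounding text, and TDP is a rating rather than measured power draw, so treat the results as a rough guide.

```python
# Figures copied from the specification table above.
# Assumption: the columns are Broadwell, Skylake and Falkor, in that order.
systems = {
    "Broadwell": {"tdp_watts": 170, "cores": 20, "threads": 40},
    "Skylake": {"tdp_watts": 170, "cores": 24, "threads": 48},
    "Falkor": {"tdp_watts": 120, "cores": 46, "threads": 46},
}


def watts_per(specs, key):
    """Divide each system's rated TDP by its core or thread count."""
    return {name: round(s["tdp_watts"] / s[key], 2) for name, s in specs.items()}


per_core = watts_per(systems, "cores")
per_thread = watts_per(systems, "threads")

print(per_core)    # {'Broadwell': 8.5, 'Skylake': 7.08, 'Falkor': 2.61}
print(per_thread)  # {'Broadwell': 4.25, 'Skylake': 3.54, 'Falkor': 2.61}
```

On paper, Falkor's rated power per core is roughly a third of the Intel systems', which is the gap the rest of the comparison keeps coming back to.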
Prior to this year, using ARM servers was difficult due to a lack of software support. Over time, Linux distributions and libraries have been updated to add compatibility and optimizations for ARM systems.
In the first benchmark run using OpenSSL, single-core performance proves best on Broadwell due to the higher clock speed compared to Skylake. Falkor falls slightly behind for single-core results but clearly gets the win for multi-core tests.
Compression is another common task that servers must handle. Reduction in file size keeps bandwidth use down and reduces disk space usage. Using a modified version of the generic zlib library adjusted to run on ARMv8, Falkor again loses the single-core test but absolutely dominates the Intel platforms when all cores are in use.
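For readers who want to see the single-core versus all-core distinction on their own machine, here is a minimal Python sketch built on the standard `zlib` module. It is not the benchmark Cloudflare ran: the payload, the job count and the `make_payload` and `compress_throughput` helpers are all hypothetical. It uses threads on the assumption that CPython's `zlib` releases the global interpreter lock while compressing, so that work can spread across cores.

```python
import os
import time
import zlib
from concurrent.futures import ThreadPoolExecutor


def make_payload(size: int = 1 << 20) -> bytes:
    """Build a compressible, made-up test payload of `size` bytes."""
    line = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
    return (line * (size // len(line) + 1))[:size]


def compress_throughput(payload: bytes, workers: int, jobs: int = 8, level: int = 6) -> float:
    """Return MB/s of input compressed when `jobs` copies run on `workers` threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: zlib.compress(payload, level), range(jobs)))
    elapsed = time.perf_counter() - start
    # Sanity check outside the timed region: every job must round-trip.
    assert all(zlib.decompress(r) == payload for r in results)
    return len(payload) * jobs / elapsed / 1e6


if __name__ == "__main__":
    data = make_payload()
    single = compress_throughput(data, workers=1)
    multi = compress_throughput(data, workers=os.cpu_count() or 1)
    print(f"1 worker: {single:.1f} MB/s, all cores: {multi:.1f} MB/s")
```

The absolute numbers will vary widely with hardware and input data. The ratio between the two figures is the interesting part, since it shows how far a many-core design can pull ahead once every core is busy.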
Moving on to a newer benchmark, Golang was used to evaluate a number of different performance metrics. Golang offers support for ARMv8 without the additional headaches of compiler extensions or special optimization options.
Cryptographic performance on Falkor is currently very poor in comparison to Intel’s platforms on both single-core and all-core tests. Go is well optimized for math functions on Intel platforms but does not have the same optimizations for ARMv8. However, there is potential for up to a 10x increase in performance of ECDSA, ChaCha20-Poly1305 and AES-GCM by using proper assembly optimizations for Falkor.
A number of other Golang benchmarks show similar optimization gaps on ARMv8. The hardware is capable of being highly competitive with Intel CPUs, and often of surpassing them, but the software as currently written does not take full advantage of it.
So, should Intel really be worried about Qualcomm breaking into the server market with ARM? Based on the data provided by Cloudflare, it is fairly safe to say that Intel will need to keep a close eye on how ARM platforms begin to shape up. There may be no immediate threat, since it takes years for large data centers to implement major changes, but several years from now there could be serious competition for the enterprise market.
It’s worth noting that AMD is also returning to the server market with Epyc. AMD enjoys showing off how it can beat the majority of Intel’s offerings depending on which benchmark is shown.
To wrap up, Intel will remain in good shape for the near future, but there are issues it will need to address. Its power consumption is significantly higher than that of the ARM platform, and its performance lead is unlikely to last in the long term without architecture changes. The results are only a small sample and do not include Intel’s flagship Xeon offerings, but there is clearly some motivation for Intel to plan future developments carefully.