
Intel Xeon 6 6700E Sierra Forest shatters Xeon expectations

Intel Xeon 6780E 3

Intel needs a Xeon shake-up, and it is finally happening. For more than two years, since we published “Intel Sierra Forest E-Core Xeon Intel Needs”, Sierra Forest has been on our radar as the most important part we expected from Intel. Now called Intel Xeon 6, this is really the first part of Intel’s large-scale Xeon transition. As I write this, I am fairly confident that the next-generation Clearwater Forest will be much more commercially successful, but I think it would be wrong to sleep on Sierra Forest.

Let’s just put it out there: 144 Xeon cores with DDR5 and more PCIe Gen5 in a 250W TDP are here. These are E-cores without Hyper-Threading, an important point when the hyper-scalers’ own Arm designs avoid SMT implementations. Better yet, these are x86 cores that can take on legacy workloads without the hassle of switching architectures. As we’ll get into, Sierra Forest isn’t perfect, but it puts Intel Xeon on a new and exciting path.

As a quick disclosure, Intel is sponsoring this, as you can imagine, since we had access to the chips prior to launch. Intel sent a QCT development platform for testing. Because that system arrived just a few days before launch, we sent an SOS to Supermicro for a second system, which ended up arriving the morning after the Intel-QCT machine. A quick thank you to everyone involved in making this happen.

Intel QCT Birch Stream Xeon 6 platform

Update: Since it was a night shoot, we waited until morning to upload the video. It’s here for those who prefer to watch or listen.

As always we suggest watching this in a dedicated tab, window or app for the best viewing experience.

Supermicro SYS-222H-TN Birch Stream platform with Xeon 6: 288 cores and 2TB of memory

P-cores and E-cores. New platform features. Different platforms. There’s a lot here, but for those who just want to know what parts they can buy, we’ll get to the Intel Xeon 6700E SKUs first. If you’d like to know more about P-cores and E-cores, platform features, release schedule, etc., we’ll get to that later.

Intel Xeon 6700E SKU list

This is one of those launches where I think people will focus on the top-bin parts, but there are some fascinating SKUs on the list. For the record, I think Intel has enough complexity with sockets, core types, MCR DIMM support, CXL Type-3 memory support, and so on that differentiating SKUs by DDR5 memory speed should just stop. Intel desperately needs less complexity in decoding its products, not more.

Intel Xeon 6700E SKU list

On the plus side, there are no more precious metal names and no more “Scalable”, which is a huge win for the brand. These SKUs also have SST profiles for use in embedded spaces. Here is the table we found for those.

Intel Xeon 6700E SST profiles

The “E” in the Intel Xeon 6700E parts is for efficient cores versus the “P” parts for performance cores. At the top end are the 144-core Intel Xeon 6780E and 6766E SKUs, each with 108MB of L3 cache.

Intel Xeon 6780E lscpu output

These SKUs are the ones we received to use for this piece.

Intel Xeon 6766E lscpu output

It’s great to see two each of the DSA, IAA, QAT, and DLB accelerators active on these parts, although up to four of each are available.

Intel Xeon 6780E lspci -v output, filtered with grep for QAT

One point you shouldn’t sleep on is that even with those accelerators and 144 cores, these are 330W and 250W TDP parts. A theme of this piece will be moving legacy virtual machines and microservices from older-generation hardware to E-cores. The Intel Xeon Gold 5218 was a 16-core mid-stack CPU from the 2019 “Cascade Lake” 2nd Generation Intel Xeon Scalable era with a 125W TDP. We’ll be talking about consolidation ratios of 4.5:1 to 9:1 over something like that. When you see 250W, consider that at worst you are cutting power by about 2.5 times at the server level, and probably by a lot more than that.
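To make that consolidation arithmetic concrete, here is a rough sketch that compares CPU TDP only. It assumes TDP is a usable proxy for power draw and that every retired Xeon Gold 5218 sits at its 125W rating. The function name `cpu_tdp_ratio` is ours, purely for illustration. It ignores memory, fans, power supplies, NICs, and everything else in each retired server, which is why savings at the server level should look better than these CPU-only figures.

```python
def cpu_tdp_ratio(consolidation_ratio: float,
                  old_tdp_w: float = 125.0,
                  new_tdp_w: float = 250.0) -> float:
    """Return (total TDP of the old CPUs replaced) / (TDP of one new CPU).

    consolidation_ratio is how many old CPUs a single new CPU replaces.
    Illustrative only: TDP is a rating, not measured wall power.
    """
    if consolidation_ratio <= 0 or old_tdp_w <= 0 or new_tdp_w <= 0:
        raise ValueError("all inputs must be positive")
    return consolidation_ratio * old_tdp_w / new_tdp_w


# The two ends of the 4.5:1 to 9:1 range, against a 250W part.
for ratio in (4.5, 9.0):
    print(f"{ratio}:1 consolidation -> {cpu_tdp_ratio(ratio):.2f}x less CPU TDP")
```

On CPU TDP alone, the low end of the range works out to 2.25x and the high end to 4.5x. Passing `new_tdp_w=330.0` gives the same comparison for the 330W part.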

Intel QCT Birch Stream Xeon 6 platform

This will be rough conceptually for people. Still, anything 2nd Gen Xeon Scalable and older that doesn’t make heavy use of AVX-512 will be hard to justify going forward. When we say E-core, if you want to translate to the P-cores in the currently available 5th Gen Xeons for workloads that do not use AVX-512 or AMX, divide the E-core count by 1.8 to 2.0 or so and you’ll be close to the same level of performance. More on that later.
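As a quick sketch of that rule of thumb, the snippet below turns an E-core count into a rough P-core-equivalent range. The helper name `p_core_equivalent` is hypothetical, and the 1.8 to 2.0 divisor is only the ballpark quoted above for workloads that do not lean on AVX-512 or AMX, not a measured result.

```python
def p_core_equivalent(e_cores: int,
                      low_divisor: float = 1.8,
                      high_divisor: float = 2.0) -> tuple[float, float]:
    """Return a (pessimistic, optimistic) P-core-equivalent estimate.

    Dividing by the larger divisor gives the pessimistic end of the range.
    """
    if e_cores <= 0 or low_divisor <= 0 or high_divisor < low_divisor:
        raise ValueError("expected e_cores > 0 and 0 < low_divisor <= high_divisor")
    return e_cores / high_divisor, e_cores / low_divisor


for cores in (144, 64):
    worst, best = p_core_equivalent(cores)
    print(f"{cores} E-cores ~ {worst:.0f} to {best:.0f} P-cores")
```

For the 144-core parts, that lands at roughly 72 to 80 P-cores.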

Here is the topology map for the 144-core Intel Xeon 6780E in a dual-socket configuration.

2p Intel Xeon 6780E Supermicro topology

We can see 4MB L2 cache blocks shared between four cores. We see the same thing on the Intel Xeon 6766E system, which makes sense since it should mostly just be a lower-TDP, lower-clock-speed part.
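For a sense of scale, here is a small sketch of what that layout implies for a 144-core socket. It assumes every cluster is fully populated with four active cores, and `cluster_of` further assumes cores are numbered contiguously within a cluster, which is our simplification for illustration. The real mapping should be read from the topology output on the system itself.

```python
def l2_summary(cores: int, cores_per_cluster: int = 4,
               l2_mb_per_cluster: int = 4) -> tuple[int, int]:
    """Return (number of L2 clusters, total L2 in MB) for one socket."""
    clusters, remainder = divmod(cores, cores_per_cluster)
    if remainder:
        raise ValueError("core count is not a multiple of the cluster size")
    return clusters, clusters * l2_mb_per_cluster


def cluster_of(core_id: int, cores_per_cluster: int = 4) -> int:
    """Map a core ID to its L2 cluster index.

    ASSUMPTION: contiguous core numbering within each cluster.
    """
    return core_id // cores_per_cluster


clusters, total_l2_mb = l2_summary(144)
print(f"{clusters} clusters, {total_l2_mb}MB of L2 per socket")
print("cores 0 and 3 share L2:", cluster_of(0) == cluster_of(3))
print("cores 3 and 4 share L2:", cluster_of(3) == cluster_of(4))
```

Under those assumptions, a 144-core part has 36 four-core clusters and 144MB of L2 in total per socket.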

2p Intel Xeon 6766E QCT topology

Notably missing here is the PCH I/O connection, as these are now PCH-less processors. Finally! AMD EPYC has been PCH-less since the original EPYC 7001 series in 2017. Intel used its PCHs to provide lower-speed I/O, SATA, and so on. There were even Lewisburg PCH versions with Intel QAT acceleration (and those PCHs were also used on PCIe QAT accelerator cards).

Intel QCT Birch Stream Xeon 6 platform without PCH

By running a “fast” core-to-core latency run, we can easily identify clusters of four cores. Cores in the same cluster appear to be in the 86ns range. Most of the other core pairs tend to fall in the 99-105ns range. Then, for every four cores, we get a different latency pattern in the 112-124ns range. It’s really cool to hear about an architectural feature like 4MB of L2 cache per four-core cluster and then see the pattern repeat for clusters of four cores.

144 Core Intel Xeon 6780E Core to Core latency Low

By “fast”, we mean it still takes hours to run, so when we only have a few days with a system, we can only run it once. At the same time, you can easily see the four-core L2 cache clusters.
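For readers who want to pull those clusters out of their own latency data, here is a minimal sketch that groups cores whose pairwise latency falls under a cutoff. The 8-core matrix is synthetic, not our measured data. The 86ns and 102ns values simply echo the ranges above, and the 95ns threshold and the `group_by_latency` name are our assumptions for illustration.

```python
def group_by_latency(latency_ns, threshold_ns):
    """Group core indices whose pairwise latency is below threshold_ns.

    Uses a simple union-find over a square latency matrix.
    """
    n = len(latency_ns)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the group root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a in range(n):
        for b in range(a + 1, n):
            if latency_ns[a][b] < threshold_ns:
                parent[find(b)] = find(a)

    groups = {}
    for core in range(n):
        groups.setdefault(find(core), []).append(core)
    return sorted(groups.values())


# SYNTHETIC example: 86ns inside a four-core cluster, 102ns everywhere else.
synthetic = [
    [0 if a == b else (86 if a // 4 == b // 4 else 102) for b in range(8)]
    for a in range(8)
]
print(group_by_latency(synthetic, threshold_ns=95))
```

On this synthetic input, the cores fall into two groups of four. Real measurements are noisier, so the threshold would need to be picked by looking at the actual distribution.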

Intel Xeon 6766E Core to Core Latency Start

The Intel Xeon 6766E was very similar to the Xeon 6780E, which seems to be a feature of the Sierra Forest design.

From socket to socket, the numbers increase.

Intel Sierra Forest 144 Core C2C Latency 2S Large

This is common among dual-socket systems. At the same time, the latency numbers aren’t what we’d expect to see from a P-core-only CPU these days. That’s fair enough since we’re trading maximum performance for density.

As for pricing, Intel released list prices right after we published this piece, so we have a ninja update. Here is the table.

Intel Xeon 6700E Price List

Intel is in a bit of a bind, as the Intel Xeon 6780E delivers more than 20% higher overall performance than its top-end Emerald Rapids part and has more cores. As such, it needs a high list price. Still, the platform will be cheaper than larger-socket designs, so keep that in mind at the system level. The 64-core Xeon 6710E with a 205W TDP is also impressive. If you have older servers with 16 or fewer cores, this will give you many of the benefits of consolidation at a low cost.

Next, let’s put this in the context of P-cores vs. E-cores and the different platforms.
