In August 2018, fibeReality published the article “Co-Packaged Optics on Trial,” in which we offered this summary judgement: “It is not hard to see why co-packaged optics has failed to be applied commercially, yet, despite work being done on it as long as eight years ago.” Over time, our negative outlook on CPO, particularly for intra-data center applications, became even more intense, culminating in a piece this past March. We characterized Arista Networks’ announcement of the OSFP-XD at OFC 2021, with an implicit endorsement by Google, as a knockout blow, most notably for CPO. (Our latest reflection on the solution concerns Microsoft leapfrogging from 400G to 1.6T, which means the company is planning to stay at the lower rate for a long time, so what is the rush for CPO? The big user seemingly attempted to adjust its messaging on the fly at the large optics show.) Nevertheless, regarding the very next panel we attended, we stated that the new concept “will still need to be watched.” Consequently, it was only fair that we put OSFP-XD on trial as well. We made our customary, concerted effort to be as brutally objective as possible. We were not prejudiced by any obligation to be unjustifiably deferential to CPO, nor by the positive messages delivered by Andy Bechtolsheim, Chairman of Arista, on the five OFC panels where he was given the opportunity to tout the novel idea. Indeed, there are issues with OSFP-XD that need to be resolved, and too little reflection has been given to practicality. One should also note that the full technical details do not yet appear to have been made widely available. The product, for the 1.6Tb interface, is not projected to be available until 2024 at the earliest, a date we believe could easily turn out to be quite optimistic.
One argument against Arista is that, despite any technical advantages of the OSFP-XD, the supplier was forced to counter CPO because it lacked the three silicon photonics platforms possessed by Cisco Systems. However, this still amounts, at least partially, to an ad hominem attack, for four reasons: 1) CPO, again, is not even close to being ready for prime time, and widespread deployment could be many years away; 2) SiPh has never come close to reaching its full promise; 3) Cisco has not especially pushed hard for the CPO model; and 4) Arista’s solution retains the advantages of previous pluggables.
The OSFP is a well-designed form factor with numerous engineering benefits compared with legacy form factors (FFs). Signal integrity is improved by several dB, and it has more internal volume (particularly compared with the QSFP-DD). So, the OSFP-XD, with a 16-lane electrical host interface, gives a clear path to 1.6Tb modules based on proven 100G electrical signaling. In summary, it offers better thermal performance and mechanical integration than other FFs. Certainly, the technical advantages of silicon using 100G SerDes and 100G per lane are now well understood.
Additionally, the OSFP-XD provides an answer to a clear problem (unlike CPO, where one can argue that the actual problem to be solved is not clearly defined). Switch producers have the task of scaling up their 1RU switches to keep pace with the webscalers’ expansion. Of the seven form factors that have reached, or are expected to reach, volumes in the millions of units, the OSFP-XD stands out at 1.6Tb Ethernet: twice the capacity of an OSFP and four times that of a QSFP112.
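The capacity comparison above reduces to lanes multiplied by the per-lane electrical rate. The sketch below assumes 100G electrical lanes, with 4 lanes for QSFP112, 8 for OSFP and 16 for OSFP-XD; the dictionary and function names are purely illustrative.

```python
# Illustrative lane counts per form factor (assumption: 100G electrical lanes).
LANES = {
    "QSFP112": 4,
    "OSFP": 8,
    "OSFP-XD": 16,
}


def module_capacity_gbps(form_factor: str, lane_rate_gbps: int = 100) -> int:
    """Aggregate module capacity in Gb/s: number of lanes x per-lane rate."""
    return LANES[form_factor] * lane_rate_gbps


if __name__ == "__main__":
    xd = module_capacity_gbps("OSFP-XD")
    for ff in LANES:
        cap = module_capacity_gbps(ff)
        # Ratio of OSFP-XD capacity to each form factor at 100G per lane.
        print(f"{ff:8s} {cap:5d}G  (OSFP-XD is {xd // cap}x)")
    # The same arithmetic at 200G per lane, as discussed for future modules.
    print("OSFP    @200G:", module_capacity_gbps("OSFP", 200), "G")
    print("OSFP-XD @200G:", module_capacity_gbps("OSFP-XD", 200), "G")
```

At 100G per lane this gives 1.6T for the XD, twice the 800G OSFP and four times the 400G QSFP112. Moving to 200G lanes doubles every figure, so 8x200G in an OSFP reaches 1.6T and 16x200G in an XD reaches 3.2T.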
On the negative side, signal integrity (SI) complications are the most pressing worry with OSFP-XD. They are tougher to deal with than for the QSFP-DD (although the footprint allows the two FFs to be interchanged simply in the host card design).
With the present QSFP-DD and OSFP, there is barely enough space to route eight differential signal pairs from the gold fingers to the DSP. Doing so with 16 of them appears to be a stretch right now. Presumably, the industry will be expected to solve this hurdle as well as another one: an OSFP-XD using a DD-style 16-lane layout will be even more challenging than usual for the DD, and given the power density, the network planner is better off building a 2RU box with OSFP or QSFP-DD (with flyover cables) than a 1RU box with OSFP-XD.
Another problem is that much of the industry is no longer committed to the 1RU switch box, which is what drives the faceplate density requirement. Some operators are content to go to 2RU or 3RU, negating the requirement for extra-dense I/O. (Naturally, one key driver of interface port speed will be what is on the router, not just the server, and routers will certainly have 800G I/Os that necessitate such an interface.)
There is a big push for server remoting over MMF, with Microsoft perhaps still an exception, at least vocally. Conversely, the OSFP-XD would be spot-on for future scenarios with a direct SMF fanout link from the leaf to the servers.
Moreover, because each I/O port is still a 100G SerDes lane, there are more connections per OSFP-XD module, so if one fails, all 16 I/O ports are taken down to replace the module: a lot of capacity to knock out just to replace one failed port. Although this is definitely true, there is 2:1 redundancy on the connections, and the fabric itself can tolerate the failure of an eighth of all links before there is any meaningful impact, not to mention that it beats a CPO failure.
Then, a 33W OSFP package ostensibly runs counter to the plans of Microsoft, Facebook, and perhaps Amazon, which are demanding less power per RU in the data center. They already have plenty of empty rack space, so there may be no need for a denser I/O port configuration. The goal is to keep the links in the leaf as short as feasible. (Bechtolsheim always talks about power per bit, which is 30% lower with each bandwidth step, but the raw power per package is the snag.)
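The tension between falling power per bit and rising power per package can be made concrete with simple arithmetic. The sketch below takes the 33W envelope and the 30% per-step reduction from the text and assumes that each step doubles module bandwidth. The 800G module power it prints is back-calculated from those two figures, not taken from any datasheet, and the function names are hypothetical.

```python
PER_BIT_REDUCTION = 0.30  # energy-per-bit reduction per bandwidth step, as cited


def pj_per_bit(module_watts: float, capacity_gbps: float) -> float:
    """Energy per bit in pJ. W / (Gb/s) gives nJ/bit, so multiply by 1000."""
    return module_watts / capacity_gbps * 1000.0


def next_step_module_watts(prev_watts: float) -> float:
    """Package power after one bandwidth doubling at 30% lower energy per bit."""
    return prev_watts * 2 * (1 - PER_BIT_REDUCTION)


if __name__ == "__main__":
    xd_watts, xd_gbps = 33.0, 1600.0  # 33W OSFP-XD envelope at 1.6T
    print(f"OSFP-XD: {pj_per_bit(xd_watts, xd_gbps):.1f} pJ/bit")
    # Back out the 800G module power implied by the 30% rule (illustrative only).
    implied_800g_watts = xd_watts / (2 * (1 - PER_BIT_REDUCTION))
    print(f"Implied 800G module: {implied_800g_watts:.1f} W")
    print(f"Package power ratio per step: {2 * (1 - PER_BIT_REDUCTION):.1f}x")
```

Even with a 30% cut in energy per bit, doubling the bandwidth raises package power by roughly 1.4x. That is why efficiency per bit does not settle the per-RU power and cooling concerns.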
Inevitably, power leads to questions about cooling an OSFP-XD, specifically with a 33W module envelope, which may call for an exotic scheme. For now, it is only logical to presume that one or two of the principal customers pushing for the FF will relax temperature requirements, as extremely dense 1RU switches already have cooling (and EMI) problems because of faceplate density. Additionally, from a connector perspective, the XD would have to tackle stacked 2x options while attaining 200G electrical lanes.
Next, there is the concern, already seen with 400ZR, about the narrow adoption of the OSFP. While we would not be surprised to see all of the hyperscalers ultimately move in an OSFP direction, the installed base of DD will be around for quite a while.
Lastly, there could be real patent causes of action, because the OSFP-XD seems to have borrowed heavily from the QSFP-DD FF. These would probably not be brought against the OSFP-XD MSA itself, but perhaps against transceiver producers. On the other hand, such lawsuits would not make the big end customers happy.
Despite each of these technical challenges, Bechtolsheim made a sensible and shrewd move in making the OSFP more attractive for higher-speed Ethernet, as it is a natural progression of the historic trends with pluggables. Since 200G optics will be easier to pull off than 200G electrical lanes for pluggable modules, even under the most pessimistic scenario the 16x100G electrical interface should at least establish a foothold in the marketplace in the foreseeable future as one of the options for next-gen switch chips. That cannot be said of CPO for the same type of application.
CPO proponents still argue that Arista needed a narrative that could be pitched as an easier choice than CPO, given what they see as the inevitability of 2RU boxes with flyover cables. fibeReality believes that a combination of new tech developments and, yes, concessions from at least one big buyer to relax requirements is cause for legitimate hope. We would also disagree with any current prediction that an “OSFP-XD2” at 100T would be as difficult as CPO, partly because the industry is, at best, still in the midst of filling in 25G switches with chips, and internationally there is quite a way to go in deploying 200GbE optics in data centers, let alone 400GbE optics.
The good news for Arista is that Google appears to be somewhat optimistic about the prospects for OSFP-XD, as it is immensely interested in denser optical FFs. We perceive that the search engine giant really wants the benefit of pushing out CPO by at least two generations of interconnects while retaining backward compatibility with its OSFPs. It was also made clear at OFC that Google is not a fan of SiPh, supposedly a key enabler of CPO.
At an OFC panel, it was learned that Google is presently looking at 200G SerDes and 200G per channel for a 1.6Tb pluggable, keeping the x8 lane structure of the OSFP package. Assuming that the silicon and power can be managed, Google is hopeful that an XD with 200G lanes (16x200G) can provide it with 3.2Tb modules.
Microsoft, of course, will be interested in looking at anything new, in particular in getting more data on 1.6T applicability. Nonetheless, by prematurely punting on 800G, the cloud player could be putting exponentially more pressure on itself in the very long term to come up with a radically new design, like CPO, to help accommodate the immense power surge at that rate. In contrast, fibeReality thinks that Google is taking a more pragmatic, incremental approach with its emphasis on continuing to “double-down” on capacity.
It is quite conceivable that the OSFP-XD will mainly be architected in breakout mode for fanout: 16x100G and 8x200G in the leaf, and 4x400G and 2x800G in the spine. Opponents of this point will say that it is one of those Bechtolsheim side-bar implementations being proposed, and that breakout cables are disliked because they are just one more physical component that can fail (while recognizing that sometimes there is no other choice). Needless to say, Google will press on with OSFP fanout (2x200G-FR4), later with 2x400G-FR4 and 2x800G, and possibly deploy 2x800G-FR4 with OSFP-XD in five or more years (according to our current projections).
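The breakout options above all divide the same 16 electrical lanes differently. The sketch below assumes 100G per lane, as in the 16x100G interface discussed earlier. It checks that each leaf or spine mode consumes exactly the 16 lanes with a whole number of lanes per port. The `Breakout` class and its fields are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

LANE_RATE_GBPS = 100  # assumption: 100G electrical SerDes per lane
XD_LANES = 16         # OSFP-XD host lanes


@dataclass(frozen=True)
class Breakout:
    tier: str       # "leaf" or "spine", following the split suggested in the text
    ports: int      # number of logical ports the module is broken out into
    port_gbps: int  # speed of each logical port

    @property
    def lanes_per_port(self) -> int:
        return self.port_gbps // LANE_RATE_GBPS

    def is_valid(self) -> bool:
        """Valid if each port uses whole lanes and all 16 lanes are consumed."""
        return (
            self.port_gbps % LANE_RATE_GBPS == 0
            and self.ports * self.lanes_per_port == XD_LANES
        )


MODES = [
    Breakout("leaf", 16, 100),
    Breakout("leaf", 8, 200),
    Breakout("spine", 4, 400),
    Breakout("spine", 2, 800),
]

if __name__ == "__main__":
    for m in MODES:
        print(
            f"{m.tier:5s} {m.ports:2d}x{m.port_gbps}G -> "
            f"{m.lanes_per_port} lane(s)/port, valid={m.is_valid()}"
        )
```

Each mode carries the same 1.6T. Only the number of logical ports changes, from one lane per port in a 16x100G leaf breakout to eight lanes per port in a 2x800G spine breakout.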
Succinctly put, our ruling on OSFP-XD is cautious optimism that the obstacles of fitting double-density, 16 I/O lanes into the same form factor can be overcome from a mechanical packaging perspective. Prototypes need to come out, followed by close examination of parasitic noise in a tight space, and the XD connector has to be designed with the area for each interface halved. Arista thinks the other option for the XD will be 200G SerDes with eight lanes to reach 1.6T. If that works, the module has the same number of components as the 800G OSFP module and therefore a lower cost per bit of bandwidth.
On a final note, there is a difference between viability and near-term popularity. Operational annoyances have to be taken into consideration. There is always something to be said for having the optic match the Ethernet MAC speed: once an 8-lane OSFP or 16-lane OSFP-XD exceeds the most popular MAC rates, the customer is forced to divide the module into two or four logical optics, each with half or a quarter of the bandwidth (e.g., a 2x400G-FR4 module, or a 400G-DR4 split into four DR1s).
In fact, the XD may thrive to a much greater extent as a pluggable ZR than inside the data center network. At the same time, technical challenges with the XD do not automatically mean that CPO will be in the driver’s seat anytime soon, other than in theory, as the hurdles facing CPO are far greater. When NPO is still, painfully, being given official legitimacy this late in the game, the next step in the so-called evolution looks less credible than ever.