Broadcom’s Tomahawk 3: An Optics Riddle


There is a mystery concerning stranded capacity. Broadcom’s StrataXGS Tomahawk 3 next-gen switching chipset offers a total capacity of 12.8 terabits per second (Tb/s), yet it is hard to imagine how all of it gets used, because near-term front-panel pluggables cannot provide enough bandwidth on a panel of 2 rows by 16 ports. We are assuming the use of QSFP-DD, and even with two 100G PMDs packaged in each module, the 32 QSFP-DDs on that panel at 2x100G apiece work out to only 6.4 Tb/s. So, a hyperscale data center operator would be short 50% in what is projected to be a 200G world indefinitely. Although individuals at Microsoft can be found stating they would be willing to live with this situation, such large users will ultimately not put up with so inefficient a state of affairs from a cost point of view. In fact, it raises the important question: Why buy the Tomahawk 3 if not enough optical ports can be provided to use its bandwidth? In this article, we will discuss various ways to deal with this matter.
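The shortfall is simple arithmetic, and a minimal Python sketch makes it explicit. The function and variable names here are purely illustrative, and the only inputs are the figures quoted in the paragraph above (12.8 Tb/s of switch capacity, 2 rows of 16 ports, 2x100G per module).

```python
# Illustrative arithmetic only; names are hypothetical, figures come from the article.
SWITCH_CAPACITY_GBPS = 12_800  # Tomahawk 3 total capacity, in Gb/s


def panel_capacity_gbps(rows, ports_per_row, pmds_per_module, gbps_per_pmd):
    """Aggregate front-panel optical bandwidth in Gb/s."""
    return rows * ports_per_row * pmds_per_module * gbps_per_pmd


# 2 rows x 16 QSFP-DD cages, each module carrying 2x100G
panel_gbps = panel_capacity_gbps(rows=2, ports_per_row=16,
                                 pmds_per_module=2, gbps_per_pmd=100)
stranded_gbps = SWITCH_CAPACITY_GBPS - panel_gbps
stranded_fraction = stranded_gbps / SWITCH_CAPACITY_GBPS

print(f"front panel: {panel_gbps / 1000} Tb/s")
print(f"stranded:    {stranded_gbps / 1000} Tb/s ({stranded_fraction:.0%})")
```

Changing `pmds_per_module` or `gbps_per_pmd` shows how much each module would have to carry to close the gap on the same 32-port panel.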

For instance, the use of multiple RUs has been considered. With a 2RU chassis, there would now be four rows of 16 ports. Yet, if the switch chip is sitting on a single PCB, how does one get the 50G electrical signals up to the second RU?

One idea has been to use a mezzanine card inside. However, this card can disrupt signal integrity, to the point that 50G PAM4 may not even work. PAM4 is more sensitive to reflections, and a mezzanine card might suffer from impedance discontinuities.

Perhaps this difficulty helps open the door to Finisar’s 50G solution, in that NRZ could be a better choice. Nevertheless, the vendor would have to contend with an industry that is, for better or worse, committed to moving on to PAM4, as well as with its own distraction with 3D sensing.

Another option is that flyover copper cables could be used to get to the other RU. Still, it sounds like an awfully expensive way to build a switch.

On a side note, a lot of engineers expect to use flyover cables from companies such as Samtec and Molex with serial 100G electrical lanes in the future, as there is a challenge with the edge modules. While Arista claims it hates this type of cable, its founder and Chief Development Officer is pushing OSFP, and whether the connector circumstances are truly better with that solution remains to be seen. For many of the others, the flyover cable is like a copper pigtail, which is somewhat ironic given the common dislike for optical pigtails in the industry.

Returning to the trapped capacity problem, fiber breakout/shuffle cables may be a more practical and cheaper option, although a complete cost breakdown analysis would have to be done. It could easily be what Google has in mind with its plan indicating a 2x(2x100G) in a module. Still, 2x100G in a QSFP apparently strands switch bandwidth, and we do not know whether it would be sufficient for the operator’s requirements.

Engineers are also claiming a 2RU switch can be done, resulting in 12.8 Tb/s using front-panel pluggables. Google would still go 200G, which would mean each pluggable port running at that rate, with 64 of them required. While it may be the most practical of all of the choices, it would also be a technical challenge, and we understand there is still some bandwidth that remains unusable.
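As a rough check on the 2RU claim, the nominal port count and bandwidth can be worked out the same way. This is a sketch under the article's stated figures (16 ports per row, two rows per RU, 200G per port). The names are hypothetical, and the calculation deliberately covers only the headline numbers, not whatever residual unusable bandwidth remains in practice.

```python
# Nominal figures only; names are illustrative, numbers come from the article.
PORTS_PER_ROW = 16
ROWS_PER_RU = 2
SWITCH_CAPACITY_GBPS = 12_800  # 12.8 Tb/s


def nominal_panel(rack_units, gbps_per_port):
    """Return (port count, aggregate Gb/s) for a front panel of the given height."""
    ports = rack_units * ROWS_PER_RU * PORTS_PER_ROW
    return ports, ports * gbps_per_port


ports_2ru, gbps_2ru = nominal_panel(rack_units=2, gbps_per_port=200)
print(f"{ports_2ru} ports x 200G = {gbps_2ru / 1000} Tb/s")
print("matches switch capacity on paper:", gbps_2ru == SWITCH_CAPACITY_GBPS)
```

On paper the 64 ports account for the full 12.8 Tb/s, so any bandwidth left unusable would have to come from factors outside this simple count.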

Despite all of these concepts, there appears to be inadequate attention being given to what would ordinarily be considered a ludicrous difficulty, with such a high number of stranded ports. It is not even clear when 50G optics themselves will be ready for prime time and able to make a significant leap from 25G. Although there has been progress with 50G, companies are, unsurprisingly, still largely in the learning phase at the higher rate. The bulk of the activity is still happening in the labs, and it is difficult to know how well it is working.

Once again, the damage to the optical ecosystem will not help in even getting to 200G (4x50G). The owners of these large, private networks, which are the most in need of additional capacity, created that mess.

As always, fibeReality does not recommend any securities, and this writer does not invest in any companies being analyzed by us.


[written by Mark Lutkowitz]
