Today many different interconnection topologies are used for multicore chips. For as few as eight cores direct bus connections can be made — cores taking turns using the same bus. MIT’s 36-core processors, on the other hand, are connected by an on-chip mesh network reminiscent of Intel’s 2007 Teraflop Research Chip — code-named Polaris — where direct connections were made to adjacent cores, with data intended for remote cores passed from core-to-core until reaching its destination. For its 50-core Xeon Phi, however, Intel settled instead on using multiple high-speed rings for data, address, and acknowledgement instead of a mesh.
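To make the "passed from core-to-core" idea in that quote a little more concrete, here is a rough sketch of how a message might hop across a mesh. Everything in it is my own assumption for illustration: the 6x6 layout for 36 cores, the row-by-row core numbering, and the simple X-then-Y routing rule. None of it is confirmed as what the MIT chip or Polaris actually does.

```python
def xy_route(src, dst, width):
    """Return the core indices visited from src to dst on a square mesh.

    Assumes cores are numbered row by row and that a message travels
    along the X direction first, then along Y (dimension-ordered routing).
    This is an illustrative rule, not the routing of any specific chip.
    """
    x, y = src % width, src // width
    dst_x, dst_y = dst % width, dst // width
    path = [src]
    while x != dst_x:                 # walk left/right to the target column
        x += 1 if dst_x > x else -1
        path.append(y * width + x)
    while y != dst_y:                 # then walk up/down to the target row
        y += 1 if dst_y > y else -1
        path.append(y * width + x)
    return path


def mesh_hops(src, dst, width):
    """Number of core-to-core hops between src and dst."""
    return len(xy_route(src, dst, width)) - 1


if __name__ == "__main__":
    # Corner to corner on a hypothetical 6x6 (36-core) mesh.
    print(xy_route(0, 35, 6))
    print(mesh_hops(0, 35, 6))
```

The point of the sketch is only that hop count grows with distance across the grid, whereas on a shared bus every core is one "hop" away but has to wait its turn.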
I commented some time back on a similar article on the same topic. It appears the MIT research group now has working silicon of the design. As mentioned in the pull-quote, the Xeon Phi (which has made some news in the Top 500 supercomputer stories recently) is a massively multicore architecture, but it uses a different interconnect that Intel designed on its own. Stories like these get filed into the category of massively multicore or low power CPU developments. Most times these CPUs add cores without drawing significantly more power and thus provide a net increase in compute ability. Tilera, Calxeda and yes, even SeaMicro were all working toward those ends. Whether through mergers or cuts in funding, each one has seemed to trail off without reaching its original goal (massively multicore, low power designs). Along the way Intel has done everything it can to dull the novelty of the new designs by revising its Atom and Celeron based CPUs to provide much lower power at the scale of maybe 2 cores per CPU.
Like the chip MIT just announced, Tilera too was originally an MIT research project spun off from the university campus. Its principals were the PI and a research associate, if I remember correctly. Now that MIT has working silicon, they’re going to benchmark, test and verify their design. Once they’ve completed their own study, the researchers will release the Verilog hardware description of the chip for anyone to use, research or verify for themselves. It will be interesting to see how much of an incremental improvement this design provides, and it could possibly be the launch of another Tilera-style product out of MIT.
“We’re here today shipping a 64-bit processor core and we are what looks like two years ahead of ARM,” says Bishara. “The architecture of the Tile-Gx is aligned to the workload and gives one server node per chip rather than a sea of wimpy nodes not acting in a cache coherent manner. We have been in this market for two years now and we know what hurts in data centers and what works. And 32-bit ARM just is not going to cut it. Applied Micro is doing their own core, and that adds a lot of risks.”
Tilera is preparing to ship a 36 core Tile-Gx cpu in March. It’s going to be packaged with a re-compiled Linux distribution of CentOS on a development board (TILEencore). It will also have a number of re-compiled Unix utilities and packages included, so OEM shops can begin product development as soon as possible.
I’m glad to see Tilera is still duking it out, battling for design wins with manufacturers selling into the data center. Larger memory addressing will help make the Tilera chips more competitive with commodity Intel hardware among data center shops who build their own servers. Maybe we’ll see full 64-bit memory addressing at some point as a follow-on to the current 40-bit address space extensions. The extensions are necessary to address more than the 32-bit limit of 4 GBytes, and each extra bit doubles the addressable memory, so an extra 8 bits (256 times the space, or 1 TByte) goes a long, long way toward competing against a fully 64-bit address space.
Also, considering the work being done at ARM to optimize their chip designs for narrower design rules, Tilera should follow suit and attempt to shrink their chip architecture too. This would allow clock speeds to ease upward while keeping the thermal design point consistent with previous generation Tile architecture chips, making Tile-Gx more competitive in the coming years. ARM announced a month ago that they will be developing a 22nm CPU core for future licensing by ARM customers. As it is now, Tilera uses an older fabrication design rule of around 40nm (which is still quite good given the expense required to shrink to smaller design rules), and they have plans to eventually migrate to a narrower design rule. Ideally, however, they would not stay more than one generation behind the top-end process lines of Intel (who is targeting 14nm production lines in the near future).
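For anyone who wants to check the address space arithmetic, here is a quick sketch. The helper names are my own, and it only counts raw addressable bytes for a given number of address bits; it says nothing about how much memory a particular Tilera board can actually hold.

```python
def addressable_bytes(bits):
    """Bytes reachable with the given number of address bits."""
    return 2 ** bits


def human(n):
    """Format a byte count using binary units (KiB, MiB, ...)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    i = 0
    while n >= 1024 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n} {units[i]}"


for bits in (32, 40, 64):
    print(f"{bits}-bit: {human(addressable_bytes(bits))}")

# How much bigger is a 40-bit space than a 32-bit one?
print(addressable_bytes(40) // addressable_bytes(32))
```

Going from 32 to 40 bits multiplies the reachable memory by 2 to the 8th power, which is why eight extra bits matter so much in practice.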
Tilera’s roadmap calls for its next generation of processors, code-named Stratton, to be released in 2013. The product line will expand the number of processors in both directions, down to as few as four and up to as many as 200 cores. The company is going from a 40-nm to a 28-nm process, meaning they’re able to cram more circuits in a given area. The chip will have improvements to interfaces, memory, I/O and instruction set, and will have more cache memory.
I’m enjoying the survey of companies doing massively parallel, low power computing products. Wired.com|Enterprise started last week with a look at SeaMicro and how the two principal founders got their start observing Google’s initial stabs at a warehouse sized computer. Since that time things have fractured somewhat instead of coalescing, and now three big attempts are competing to fulfil the promise of the low power, massively parallel computer in a box. Tilera is the longest running of the three, a startup out of MIT that goes back further than Calxeda or SeaMicro.
However, application of this technology has been completely dependent on the software. Whether it be OSes or applications, they all have to be constructed carefully to take full advantage of the Tile processor architecture. To their credit, Tilera has attempted to insulate application developers from some of the vagaries of the underlying chip by creating an OS that does the heavy lifting of queuing and scheduling. But still, there’s got to be a learning curve there, even if it isn’t quite as daunting as the one facing folks who develop applications for the supercomputers at the National Labs here in the U.S. Suffice it to say it’s a non-trivial choice to adopt a Tilera CPU for a product or project you are working on. And the people who need a Tile-Gx CPU for their app already know all they need to know about the chip in advance. It’s that kind of choice they are making.
I’m also relieved to know they are continuing development to shrink down the design rules. Intel, being the biggest leader in semiconductor manufacturing, continues to shrink its design and manufacturing rules. We’re fast approaching a 20nm-18nm production line in both Oregon and Arizona. Both are Intel fabrication plants, and they’re not about to stop and take a breath. Companies like Tilera, Calxeda and SeaMicro need to do continuous development on their products to keep from being blindsided by Intel’s continuous product development juggernaut. So Tilera is very wise to shrink its design rule from 40nm down to 28nm as fast as it can and then get good yields on the production lines once they start sampling chips at this size.
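As a back-of-the-envelope check on what a 40nm to 28nm shrink could buy, here is a tiny sketch. It assumes idealized scaling, where the area of a circuit shrinks with the square of the design rule. Real processes do not scale this cleanly, and node names are partly marketing labels, so treat the numbers as a rough upper bound rather than anything Tilera has published.

```python
def ideal_area_ratio(old_nm, new_nm):
    """Area of the same circuit after a shrink, relative to before.

    Assumes both dimensions scale linearly with the design rule,
    which is an idealization.
    """
    return (new_nm / old_nm) ** 2


ratio = ideal_area_ratio(40, 28)
print(f"same circuit takes about {ratio:.2f}x the area")
print(f"or about {1 / ratio:.2f}x the circuits in the same area")
```

Under that assumption the same die area could hold roughly twice the circuitry, which squares with a roadmap that talks about more cores and more cache.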
*UPDATE: Just saw this run through my blogroll last week. Tilera has announced a new chip coming in March. Glad to see Tilera is still duking it out, battling for design wins with manufacturers selling into the data center. Larger memory addressing will help make the Tilera chips more competitive with commodity Intel hardware shops, and maybe we’ll see full 64-bit memory addressing at some point as a follow-on to the 40-bit address space extensions currently being touted in this article from The Register.
Wired.com isn’t the best at following the cloud data industry. In fact, they at least partially want to keep their advertisers happy, so they will publish a Fear, Uncertainty and Doubt raising response direct from an Intel PR engineer. Happily, the Intel folks aren’t even fully aware of what people are doing with their SeaMicro and Quanta SQ-2 boxes and continue to beat the drum on virtualized servers on multi-core, high-clocked chips. That’s the old school thinking on what a compute cloud can be. The new school says put the cloud in a single box, let the clock run slower and use less power, and everybody wins. Read On:
So Intel gets an interview with a Conde-Nast writer for a sub-blog of Wired.com. I doubt too many purchasers or data center architects consult Cloudline@Wired.com. But all the same, I saw through many thinly veiled bits of handwaving and old saws from Intel saying, “Yes, this exists but we’re already addressing it with our existing product lines...” So I wrote a comment on this very article, especially regarding a throw-away line mentioning the ‘future’ of the data center and the direction the data center and cloud computing market was headed. However, the moderator never published the comment. In effect, I raised the question: Whither Tilera? And the Quanta SQ-2 server based on the Tilera chip?
Aren’t they exactly what is described by the author Jon Stokes as a network of cores on a chip? And given the scale of Tilera’s own product plans going into the future, and the fact they are not just concentrating on network gear but actual compute clouds too, I’d say both Stokes and Walcyzk are asking the wrong questions and directing our attention in the wrong direction. This is not a PR battle but a flat out technology battle. You cannot win this with words and white papers; it requires benchmarks, deployments and case histories. Technical merit and superior technology will differentiate the players in the cloud in a box race. This hasn’t been the case in the past, as Intel has battled AMD in the desktop consumer market. In the data center, Fear, Uncertainty and Doubt is the only weapon Intel has.
And I’ll quote directly from Jon Stokes’s article here describing EXACTLY the kind of product that Tilera has been shipping already:
“Instead of Xeon with virtualization, I could easily see a many-core Atom or ARM cluster-on-a-chip emerging as the best way to tackle batch-oriented Big Data workloads. Until then, though, it’s clear that Intel isn’t going to roll over and let ARM just take over one of the hottest emerging markets for compute power.”
The key phrase here is cluster on a chip, in essence exactly what Tilera has striven to achieve with its TILE64-based architecture. To review from previous blog entries on this website following the announcements and timelines published by Tilera:
It seems like massive scale multi-core CPUs are increasing in popularity. A new competitor is entering the market with a mobile co-processor. Adapteva is announcing the Epiphany co-processor, but the real questions are what it is good at and who is going to integrate it into a new phone design. Read On:
Many-core processors are apparently the new black for 2011. Intel continues to work on both its single chip cloud computer and Knights Corner, Tilera made headlines earlier this year, and now a new company, Adapteva, has announced its own entry into the field.
A competitor to Tilera and Intel’s MIC has entered the field with a mobile co-processor. Given the volatile nature of chip architectures in the mobile market, this is going to be a hard sell for some device designers, I think. I say this because each new generation of mobile CPU gets more and more integrated features as each new die shrink allows more embedded functions. Graphics processors are now being embedded wholesale into every smartphone CPU. Other features like memory controllers and baseband processors will no doubt soon be added to the list as well. If Adapteva wants any traction at all in the mobile market, they will need to further their development of the Epiphany into a synthesizable core that can be added to an existing CPU (most likely a design from ARM). Otherwise, sticking with a separate auxiliary chip is going to severely limit the potential applications of their product.
Witness the integration of the graphics processing unit. Not long ago it was a way to differentiate a phone, but it had to be designed into the motherboard along with whatever power budget it needed. Within a very short time of GPUs being added to cell phones, they were integrated into the CPU chip sandwich to help keep manufacturing cost and power budget in check. If the Epiphany had been introduced around the golden age of discrete chips on cell phone motherboards, it would make a lot more sense. But now you need to be embedded, integrated and 100% ARM compatible, with a fully baked developer toolkit. Otherwise, it’s all uphill from the product introduction forward. If there’s an application for the Epiphany co-processor, I hope they concentrate more on the tools to fully use the device and develop a niche right out of the gate, rather than attempt to get some big name but small scale wins on individual devices from the Android market. Those seem like the most likely candidates for shipping product right now.
One of the more radical departures from the off the shelf commodity data centers built on Intel is the Quanta SQ-2. Based on the Tilera chip, it has multiple cores (many more than an equivalent Intel architecture) and uses a mesh network on chip to speed communications between the cores. It’s been a long, slow slog to get Tilera to market in any product other than a network or comm switch of some sort. But according to Facebook, Tilera shows promise in the clock cycles versus energy consumption category. Read On:
Facebook lined up the Tilera-based Quanta servers against a number of different server configurations making use of Intel’s four-core Xeon L5520 running at 2.27GHz and eight-core Opteron 6128 HE processors running at 2GHz. Both of these x64 chips are low-voltage, low power variants. Facebook ran the tests on single-socket 1U rack servers with 32GB and on dual-socket 1U rack servers with 64GB. All three machines ran CentOS Linux with the 2.6.33 kernel and Memcached 1.2.3h.
You will definitely want to read this whole story as presented by El Reg. They have a few graphs displaying the performance of the Tilera based Quanta data cloud in a box versus the Intel server rack. And let me tell you, on certain very specific workloads like web caching using Memcached, I declare advantage Tilera. No doubt data center managers need to pay attention to this and get some more evidence to back up this initial white paper from Facebook, but this is big, big news. And all one need do, apart from tuning the software for the chipset, is add a few PCIe based SSDs or a TMS RamSan and you have what could theoretically be the fastest web performance possible. Even at this level of performance, there’s still room to grow, I think, on the hard drive storage front. What I would hope to see in future is Facebook doing an exhaustive test on the Quanta SQ-2 product versus Calxeda (ARM cloud in a box) and the Seamicro SM-10000×64 (64bit Intel Atom cloud in a box). It would prove an interesting research project just to see how much chipsets, chip architectures and instruction sets matter in optimizing each for a particular style and category of data center workload. I know I will be waiting and watching.
Seamicro just keeps cranking out new product. They are like the Apple of the massively parallel cloud computer in a box segment of the industry. They just recently moved from old style x86 32bit Intel Atom CPUs to fully x64 capable CPUs. And now they’ve increased the density of the CPUs on each compute node within their 10U server box, bringing the grand total of cores up to a staggering 768!
An original SM10000 server with 512 cores and 1TB of main memory cost $139,000. The bump up to the 64-bit Atom N570 for 512 cores and the same 1TB of memory boosted the price to $165,000. A 768-core, 1.5TB machine using the new 64HD cards will run you $237,000. That’s 50 per cent more oomph and memory for 43.6 per cent more money. ®
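The percentages in that quote are easy to reproduce from the prices and core counts it gives. A small sketch follows; the dictionary labels and function name are mine, and the price per core figure is just my own division, not something SeaMicro or El Reg published.

```python
def pct_increase(old, new):
    """Percentage increase going from old to new."""
    return (new - old) / old * 100


# (cores, list price in US dollars) taken from the quote above;
# the labels are just shorthand for this sketch.
configs = {
    "original 32-bit": (512, 139_000),
    "64-bit N570": (512, 165_000),
    "64HD": (768, 237_000),
}

more_cores = pct_increase(configs["64-bit N570"][0], configs["64HD"][0])
more_money = pct_increase(configs["64-bit N570"][1], configs["64HD"][1])
print(f"{more_cores:.1f}% more cores for {more_money:.1f}% more money")

for name, (cores, price) in configs.items():
    print(f"{name}: ${price / cores:,.2f} per core")
```

So the densest box is a bit cheaper per core than the 512-core 64-bit machine, though still pricier per core than the original 32-bit one.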
SeaMicro continues to pump out the jams, releasing another updated chassis in less than a year. There is now a grand total of 768 processor cores jammed into that 10U high box. Which leads me to believe they have just eclipsed the compute per rack unit of the Tilera and Calxeda massively parallel cloud servers in a box. But that would be wrong, because Calxeda is making a 2U server rack unit hold 120 four-core ARM CPUs. That gives you a grand total of 480 cores in just 2 rack units. Multiply that by 5 and you get 2400 cores in 10U of rack space. So advantage Calxeda in total core count. However, let’s consider software too. Atom, the CPU that Seamicro has chosen all along, is an Intel architecture chip and an x64 architecture at that. It is the best of both worlds for anyone who already has a big investment in Intel binary compatible OSes and applications. It is most often the software and its legacy pieces that drive the choice of which processor goes into your data cloud.
Anyone who had a clean slate to start from might be able to choose between Calxeda and Seamicro for their applications and infrastructure. And if density or thermal design point per rack unit is very important, Calxeda will suit your needs, I would think. But who knows? Maybe your workflow isn’t as massively parallel as a Calxeda server, and you might have a much lower implementation threshold getting started on an Intel system, so again advantage Seamicro. A real industry analyst would look at these two competing companies as complementary: different architectures for different workflows.
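Here is the core density comparison worked through, using only the figures mentioned above. The function and variable names are my own, and raw core count per rack unit obviously says nothing about how an Atom core and an ARM core compare in actual throughput.

```python
def cores_per_rack_unit(cores, rack_units):
    """Average number of cores packed into each rack unit (U)."""
    return cores / rack_units


# SeaMicro: 768 Atom cores in a 10U chassis.
seamicro_density = cores_per_rack_unit(768, 10)

# Calxeda: 120 four-core ARM CPUs in a 2U chassis.
calxeda_cores_per_2u = 120 * 4
calxeda_density = cores_per_rack_unit(calxeda_cores_per_2u, 2)

# Five 2U Calxeda boxes fill the same 10U as one SeaMicro chassis.
calxeda_cores_per_10u = calxeda_cores_per_2u * 5

print(f"SeaMicro: {seamicro_density} cores per U")
print(f"Calxeda:  {calxeda_density} cores per U")
print(f"Calxeda in 10U: {calxeda_cores_per_10u} cores")
```

By this crude measure Calxeda packs in roughly three times as many cores per rack unit.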