Categories
cloud data center fpga science & technology

MIT Puts 36-Core Internet on a Chip | EE Times

Partially connected mesh topology
Partially connected mesh topology (Photo credit: Wikipedia)

Today many different interconnection topologies are used for multicore chips. For as few as eight cores direct bus connections can be made — cores taking turns using the same bus. MIT’s 36-core processors, on the other hand, are connected by an on-chip mesh network reminiscent of Intel’s 2007 Teraflop Research Chip — code-named Polaris — where direct connections were made to adjacent cores, with data intended for remote cores passed from core-to-core until reaching its destination. For its 50-core Xeon Phi, however, Intel settled instead on using multiple high-speed rings for data, address, and acknowledgement instead of a mesh.

via MIT Puts 36-Core Internet on a Chip | EE Times.

I commented some time back on a similar article on the same topic. It appears now the MIT research group has working silicon of the design. As mentioned in the pull-quote, the Xeon Phi (which has made some news in the Top 500 SuperComputer stories recently) is a massively multicore architecture but uses a different interconnect that Intel designed on their own. These stories as they appear get filed into the category of massively multicore or low power CPU developments. Most times the same CPUs add cores without significantly drawing more power and thus provide a net increase in compute ability. Tilera, Calxeda and yes even SeaMicro were all working along towards those ends. Either through mergers, or cutting of funding each one has seemed to trail off and not succeed at its original goal (massively multicore, low power designs). Also along the way Intel has done everything it can to dull and dent the novelty of the new designs by revising an Atom based or Celeron based CPU to provide much lower power at the scale of maybe 2 cores per CPU.

Like this chip MIT announced Tilera too was originally an MIT research product spun off of the University campus. Its principals were the PI and a research associate if I remember correctly. Now that MIT has the working silicon they’re going to benchmark and test and verify their design. The researchers will release the verilog hardware description of chip for anyone use, research or verify for themselves once they’ve completed their own study. It will be interesting to see how much of an incremental improvement this design provides, and possibly could be the launch of another Tilera style product out of MIT.

Categories
cloud computers data center technology

Tilera preps many-cored Gx chips for March launch • The Register

“Were here today shipping a 64-bit processor core and we are what looks like two years ahead of ARM,” says Bishara. “The architecture of the Tile-Gx is aligned to the workload and gives one server node per chip rather than a sea of wimpy nodes not acting in a cache coherent manner. We have been in this market for two years now and we know what hurts in data centers and what works. And 32-bit ARM just is not going to cut it. Applied Micro is doing their own core, and that adds a lot of risks.”

via Tilera preps many-cored Gx chips for March launch • The Register.

Tile of a TILE64 processor from Tilera
Image via Wikipedia

Tilera is preparing to ship a 36 core Tile-Gx cpu in March. It’s going to be packaged with a re-compiled Linux distribution of CentOS on a development board (TILEencore). It will also have a number of re-compiled Unix utilities and packages included, so OEM shops can begin product development as soon as possible.

I’m glad to see Tilera is still duking it out, battling for the design wins with manufacturers selling into the Data Center as it were. Larger Memory addressing will help make the Tilera chips more competitive with Commodity Intel Hardware data center shops who build their own hardware. Maybe we’ll see full 64bit memory extensions at some point as a follow on to the current 40bit address space extensions currently. The memory extensions are necessary to address more than the 32bit limit of 4GBytes, so an extra 8 bits goes a long, long way to competing against a fully 64bit address space.

Also considering work being done at ARM for optimizing their chip designs for narrower design rules, Tilera should follow suit and attempt to shrink their chip architecture too. This would allow clock speeds to ease upward and keep the thermal design point consistent with previous generation Tile architecture chips, making Tile-Gx more competitive in the coming years. ARM announced 1 month ago they will be developing a 22nm sized cpu core for future licensing by ARM customers. As it is now Tilera uses an older fabrication design rule of around 40nm (which is still quite good given the expense required to shrink to lower design rules). And they have plans to eventually migrate to a narrower design rule. However ideally they would not stay farther behind that 1 generation from the top-end process lines of Intel (who is targeting 14nm production lines in the near future).

Categories
cloud computers data center google technology

Tilera | Wired Enterprise | Wired.com

Tilera’s roadmap calls for its next generation of processors, code-named Stratton, to be released in 2013. The product line will expand the number of processors in both directions, down to as few as four and up to as many as 200 cores. The company is going from a 40-nm to a 28-nm process, meaning they’re able to cram more circuits in a given area. The chip will have improvements to interfaces, memory, I/O and instruction set, and will have more cache memory.

via Tilera | Wired Enterprise | Wired.com.

Image representing Wired Magazine as depicted ...
Image via CrunchBase

I’m enjoying the survey of companies doing massively parallel, low power computing products. Wired.com|Enterprise started last week with a look at SeaMicro and how the two principal founders got their start observing Google’s initial stabs at a warehouse sized computer. Since that time things have fractured somewhat instead of coalescing and now three big attempts are competing to fulfil the low power, massively parallel computer in a box. Tilera is a longer term project startup from MIT going back further than Calxeda or SeaMicro.

However application of this technology has been completely dependent on the software. Whether it be OSes or Applications, they all have to be constructed carefully to take full advantage of the Tile processor architecture. To their credit Tilera has attempted to insulate application developers from some of the vagaries of the underlying chip by creating an OS that does the heavy lifting of queuing and scheduling. But still, there’s got to be a learning curve there even if it isn’t quite as daunting as say folks who develop applications for the super computers at National Labs here in the U.S. Suffice it to say it’s a non-trivial choice to adopt a Tilera cpu for a product/project you are working on. And the people who need a Tilera GX cpu for their app, already know all they need to know about the the chip in advance. It’s that kind of choice they are making.

I’m also relieved to know they are continuing development to shrink down the design rules. Intel being the biggest leader in silicon semi-conductor manufacturing, continues to shrink its design, development and manufacturing design rules. We’re fast approaching a 20nm-18nm production line in both Oregon and Arizona. Both are Intel design fabrication plants and there not about to stop and take a breath. Companies like Tilera, Calxeda and SeaMicro need to do continuous development on their products to keep from being blind sided by Intel’s continuous product development juggernaut. So Tilera is very wise to shrink its design rule from 40nm down to 28nm as fast as it can and then get good yields on the production lines once they start sampling chips at this size.

*UPDATE: Just saw this run through my blogroll last week. Tilera has announced a new chip coming in March. Glad to see Tilera is still duking it out, battling for the design wins with manufacturers selling into the Data Center as it were. Larger Memory addressing will help make the Tilera chips more competitive with Commodity Intel Hardware shops, and maybe we’ll see full 64bit memory extensions at some point as a follow on to the current 40bit address space extenstions currently being touted in this article from The Register.

English: Block diagram of the Tilera TILEPro64...
Image via Wikipedia
Categories
blogtools cloud computers data center technology

Intel Responds to Calxeda/HP ARM Server News (Wired.com)

Now, you’re probably thinking, isn’t Xeon the exact opposite of the kind of extreme low-power computing envisioned by HP with Project Moonshot? Surely this is just crazy talk from Intel? Maybe, but Walcyzk raised some valid points that are worth airing.via Cloudline | Blog | Intel Responds to Calxeda/HP ARM Server News: Xeon Still Wins for Big Data.

Structure of the TILE64 Processor from Tilera
Image via Wikipedia: Tile64 mesh network processor from Tilera
Image representing Tilera as depicted in Crunc...
Image via CrunchBase

So Intel gets an interview with a Conde-Nast writer for a sub-blog of Wired.com. I doubt too many purchasers or data center architects consult Cloudline@Wired.com. But all the same, I saw through many thinly veiled bits of handwaving and old saws from Intel saying, “Yes, this exists but we’re already addressing it with our exiting product lines,. . .” So, I wrote in a comment to this very article. Especially regarding a throw-away line mentioning the ‘future’ of the data center and the direction the Data Center and Cloud Computing market was headed. However the moderator never published the comment. In effect, I raised the Question: Whither Tilera? And the Quanta SM-2 server based on the Tilera Chip?

Aren’t they exactly what is described by the author John Stokes as a network of cores on a chip? And given the scale of Tilera’s own product plans going into the future and the fact they are not just concentrating on Network gear but actual Compute Clouds too, I’d say both Stokes and Walcyzk are asking the wrong questions and directing our attention in the wrong direction. This is not a PR battle but a flat out technology battle. You cannot win this with words and white papers but in fact it requires benchmarks and deployments and Case Histories. Technical merit and superior technology will differentiate the players in the  Cloud in a Box race. And this hasn’t been the case in the past as Intel has battled AMD in the desktop consumer market. In the data center Intel Fear Uncertainty and Doubt is the only weapon they have.

And I’ll quote directly from John Stokes’s article here describing EXACTLY the kind of product that Tilera has been shipping already:

“Instead of Xeon with virtualization, I could easily see a many-core Atom or ARM cluster-on-a-chip emerging as the best way to tackle batch-oriented Big Data workloads. Until then, though, it’s clear that Intel isn’t going to roll over and let ARM just take over one of the hottest emerging markets for compute power.”

The key phrase here is cluster on a chip, in essence exactly what Tilera has strived to achieve with its Tilera64 based architecture. To review from previous blog entries of this website following the announcements and timelines published by Tilera:

Categories
gpu mobile technology

Rise of the Multi-Core Mesh Munchkins: Adapteva Announces New Epiphany Processor – HotHardware

Epiphany Processor from Adapteva
Epiphany Block Diagram

Many-core processors are apparently the new black for 2011. Intel continues to work on both its single chip cloud computer and Knights Corner, Tilera made headlines earlier this year, and now a new company, Adapteva, has announced its own entry into the field.

via Rise of the Multi-Core Mesh Munchkins: Adapteva Announces New Epiphany Processor – HotHardware.

A competitor to Tilera and Intel’s MIC  has entered the field as a mobile processor, co-processor. Given the volatile nature of chip architectures in the mobile market, this is going to be hard sell for some device designers I think. I say this as each new generation of Mobile CPU gets more and more integrated features as each new die shrink allows more embedded functions. The Graphic processors are now being embedded wholesale into every smartphone cpu. Other features like memory controllers and baseband processors will now doubt soon be added to the list as well. If Adapteva wants any traction at all in the Mobile market they will need to further their development of the Epiphany into a synthesizable core that can be added to an existing cpu (most likely a design from ARM). Otherwise trying to stick with being a separate auxiliary chip is going to hamper and severely limit the potential applications of their product.

Witness the integration of the graphics processing unit. Not long ago it was a way to differentiate a phone but required it to be integrated into the motherboard design along with any of the power requirements it required. In a very short time, after GPUs were added to cell phones they were integrated into the CPU chip sandwich to help keep manufacturing and power budget in check. If the Epiphany had been introduced around the golden age of discrete chips on cell phone motherboards, it would make a lot more sense. But now you need to be embedded, integrated and 100% ARM compatible with a fully baked developer toolkit. Otherwise, it’s all uphill from the product introduction forward. If there’s an application for the Ephiphany co-processor I hope they concentrate more on the tools to fully use the device and develop a niche right out of the gate rather than attempt to get some big name but small scale wins on individual devices from the Android market. That seems like the most likely candidates for shipping product right now.

Categories
cloud computers data center technology

Tilera routs Intel, AMD in Facebook bakeoff • The Register

Structure of the TILE64 Processor from Tilera
Tile64 processor from Tilera

Facebook lined up the Tilera-based Quanta servers against a number of different server configurations making use of Intels four-core Xeon L5520 running at 2.27GHz and eight-core Opteron 6128 HE processors running at 2GHz. Both of these x64 chips are low-voltage, low power variants. Facebook ran the tests on single-socket 1U rack servers with 32GB and on dual-socket 1U rack servers with 64GB.All three machines ran CentOS Linux with the 2.6.33 kernel and Memcached 1.2.3h.

via Tilera routs Intel, AMD in Facebook bakeoff • The Register.

You will definitely want to read this whole story as presented El Reg. They have a few graphs displaying the performance of the Tilera based Quanta data cloud in a box versus the Intel server rack. And let me tell you on certain very specific workloads like the Web Caching using Memcached I declare advantage Tilera. No doubt data center managers need to pay attention to this and get some more evidence to back up this initial white paper from Facebook, but this is big, big news. And all one need do apart from tuning the software for the chipset is add a few PCIe based SSDs or TMS RamSan and you have what could theoretically be the fastest possible web performance possible. Even at this level of performance, there’s still room to grow I think on the hard drive storage front. What I would hope in future to see is Facebook do an exhaustive test on the Quanta SQ-2 product versus Calxeda (ARM cloud in a box) and the Seamicro SM-10000×64 (64bit Intel Atom cloud in a box). It would prove an interesting research project just to see how much chipsets, chip architectures and instruction sets play in optimizing each for a particular style and category of data center workload. I know I will be waiting and watching.

Categories
cloud computers data center technology

SeaMicro pushes Atom smasher to 768 cores in 10U box • The Register

Image representing SeaMicro as depicted in Cru...
Image via CrunchBase

An original SM10000 server with 512 cores and 1TB of main memory cost $139,000. The bump up to the 64-bit Atom N570 for 512 cores and the same 1TB of memory boosted the price to $165,000. A 768-core, 1.5TB machine using the new 64HD cards will run you $237,000. Thats 50 per cent more oomph and memory for 43.6 per cent more money. ®

via SeaMicro pushes Atom smasher to 768 cores in 10U box • The Register.

SeaMicro continues to pump out the jams releasing another updated chassis in less than a year. There is now a grand total of 768 processor cores jammed in that 10U high box. Which leads me to believe they have just eclipsed the compute per rack unit of the Tilera and Calxeda massively parallel cloud servers in a box. But that would wrong because Calxeda is making a 2U server rack unit hold 120-4 core ARM cpus. So that gives you a grand total of 480 in just 2 rack units alone. Multiply that by 5 and you get 2400 cores in a 10U rack serving. So advantage Calxeda in total core count, however lets also consider software too. Atom being the cpu that Seamicro has chosen all along is an intel architecture chip and an x64 architecture at that. It is the best of both worlds for anyone who already had a big investment in Intel binary compatible OSes and applications. It is most often the software and it’s legacy pieces that drive the choice of which processor goes into your data cloud.

Anyone who had clean slate to start from might be able to choose between Calxeda versus Seamicro for their applications and infrastructure. And if density/thermal design point per rack unit is very important Calxeda too will suit your needs I would think. But who knows? Maybe your workflow isn’t as massively parallel as a Calxeda server and you might have a much lower implementation threshold getting started on an Intel system, so again advantage Seamicro. A real industry analyst would look at these two competing companies as complimentary, different architectures for different workflows.

Categories
cloud computers data center technology wintel

Tilera throws gauntlet at Intels feet • The Register

Upstart mega-multicore chip maker Tilera has not yet started sampling its future Tile-Gx 3000 series of server processors, and companies have already locked in orders for the chips.

via Tilera throws gauntlet at Intels feet • The Register.

Proof that sometimes  a shipping product doesn’t always make all the difference. Although it might be nice to tout performance of actual shipping product. What’s becoming more real is the power efficiency of the Tilera architcture core for core versus the Intel IA-64 architecture. Tilera can provide a much lower Thermal Design Point (TDM) per core than typical Intel chips running the same workloads. So Tilera for the win on paper anyways.

Categories
cloud computers data center science & technology technology

Tilera preps 100-core chips for network gear • The Register

One Blue Gene/L node board
Image via Wikipedia

Upstart multicore chip maker Tilera is using the Interop networking trade show as the coming out party for its long-awaited Tile-Gx series of processors, which top out at 100 cores on a single die.

via Tilera preps 100-core chips for network gear • The Register.

A further update on Tilera’s product launches as the old Interop tradeshow for network switch and infrastructure vendors is held in Las Vegas. They have tweaked the chip packaging of their cpus and now are going to market different cpus to different industries. This family of Tilera chips is called the 8000 series and will be followed by a next generation of 3000 and 5000 series chips. Projections are by the time the Tilera 3000 series is released the density of the chips will be sufficient to pack upwards of 20,000 cpu cores of Tilera chips in a single 42 unit tall, 19 inch wide server rack. with a future revision possibly doubling that number of cores to 40,000. That road map is very agressive but promising and shows that there is lots of scaling possible with the Tilera product over time. Hopefully these plans will lead to some big customers signing up to use Tilera in shipping product in the immediate and near future.

What I’m most interested in knowing is how does the Qanta server currently shipping that uses the Tilera cpu benchmark compared to an Intel Atom based or ARM based server on a generic webserver benchmark. While white papers and press releases have made regular appearances on the technolog weblogs, very few have attempted to get sample product and run it through the paces. I suspect, and cannot confirm that anyone who is a potential customer are given Non-disclosure Agreements and shipping samples to test in their data centers before making any big purchases. I also suspect that as is often the case the applications for these low power massively parallel dense servers is very narrow. Not unlike that for a super computer. IBM‘s Cell Processor that powers the Blue Gene super computers is essentially a PowerPC architecture with some extra optimizations and streamlining to make it run very specific workloads and algorithms faster. In a super computing environment you really need to tune your software to get the most out of the huge up front investment in the ‘iron’ that you got from the manufacturer. There’s not a lot of value add available in that scientific and super computing environment. You more or less roll your own solution, or beg, borrow or steal it from a colleague at another institution using the same architecture as you. So the Quanta S2Q server using the Tilera chip is similarly likely to be a one off or niche product, but a very valuable one to those who  purchase it. Tilera will need a software partner to really pump up the volumes of shipping product if they expect a wider market for their chips.

But using a Tilera processor in a network switch or a ‘security’ device or some other inspection engine might prove very lucrative. I’m thinking of your typical warrantless wire-tapping application like the NSA‘s attempt to scoop up and analyze all the internet traffic at large carriers around the U.S. Analyzing data traffic in real time prevents folks like NSA from capturing and having to move around large volumes of useless data in order to have it analyzed at a central location. Instead localized computing nodes can do the initial inspection in realtime keying on phrases, words, numbers, etc. which then trigger the capturing process and send the tagged data back to NSA for further analysis. Doing that in parallel with a 100 core CPU would be very advantageous in that a much smaller footprint would be required in the secret closets NSA maintains at those big data carriers operations centers. Smaller racks, less power makes for a much less obvious presence in the data center.

Categories
cloud data center google technology

Facebook: No ‘definite plans’ to ARM data centers • The Register

Image representing Facebook as depicted in Cru...
Image via CrunchBase

Clearly, ARM and Tilera are a potential threat to Intel’s server business. But it should be noted that even Google has called for caution when it comes to massively multicore systems. In a paper published in IEEE Micro last year, Google senior vice president of operations Urs Hölzle said that chips that spread workloads across more energy-efficient but slower cores may not be preferable to processors with faster but power-hungry cores.

“So why doesn’t everyone want wimpy-core systems?” Hölzle writes. “Because in many corners of the real world, they’re prohibited by law – Amdahl’s law.

via Facebook: No ‘definite plans’ to ARM data centers • The Register.

The explanation given here by Google’s top systems person is that latency versus parallel processes overhead. Which means if you have to do all the steps in order, with a very low level of parallel tasks that results in much higher performance. And that is the measure that all the users of your service will judge you by. Making things massively parallel might provide the same level of response, but at a lower energy cost. However the complications due to communication and processing overhead to assemble all the data and send it over the wire will offset any advantage in power efficiency. In other words, everything takes longer and latency increases, and the users will deem your service to be slow and unresponsive. That’s the dilemna of Amdahl’s Law, the point of diminishing returns when adopting parallel computer architectures.

Now compare this to something say we know much more concretely, like the Airline Industry. As the cost of tickets came down, the attempt to cut costs went up. Schedules for landings and gate assignments got more complicated and service levels have suffered terribly. No one is really all that happy about the service they get, even from the best airline currently operating. So maybe Amdahl’s Law doesn’t apply when there’s a false ceiling placed on what is acceptable in terms of the latency of a ‘system’. If airlines are not on time, but you still make your connection 99% of the time, who will complain? So by way of comparison there is a middle ground that may be achieved where more parallelizing of compute tasks will lower the energy required by a data center. It will require greater latency, and a worse experience for the users. But if everyone suffers equally from this and the service is not great but adequate, then the company will be able to cut costs through implementing more parallel processors in their data centers.

I think Tilera holds a special attraction potentially for Facebook. Especially since Quanta their hardware assembler of choice is already putting together computers with the Tilera chip for customers now. It seems like this chain of associations might prove a way for Facebook to test the waters on a scale large enough to figure out the cost/benefits of massively parallel cpus in the data center. Maybe it will take another build out of a new data center to get there, but it will happen no doubt eventually.