Categories
cloud data center fpga science & technology

MIT Puts 36-Core Internet on a Chip | EE Times

Partially connected mesh topology (Photo credit: Wikipedia)

Today many different interconnection topologies are used for multicore chips. For as few as eight cores direct bus connections can be made — cores taking turns using the same bus. MIT’s 36-core processors, on the other hand, are connected by an on-chip mesh network reminiscent of Intel’s 2007 Teraflop Research Chip — code-named Polaris — where direct connections were made to adjacent cores, with data intended for remote cores passed from core-to-core until reaching its destination. For its 50-core Xeon Phi, however, Intel settled instead on using multiple high-speed rings for data, address, and acknowledgement instead of a mesh.

via MIT Puts 36-Core Internet on a Chip | EE Times.
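To make the quote's core-to-core forwarding concrete, here is a minimal sketch of dimension-ordered (XY) routing, the classic scheme for 2D mesh interconnects. The 6×6 grid and hop model are my own illustrative assumptions, not MIT's actual router design.

```python
# Minimal sketch of dimension-ordered (XY) routing on a 6x6 mesh,
# the kind of hop-by-hop forwarding the article describes.
# Illustrative only -- not MIT's actual on-chip router.

def xy_route(src, dst):
    """Return the (x, y) hops a packet takes from src to dst.

    Packets travel along the X axis first, then the Y axis, a
    simple deadlock-free scheme for 2D mesh interconnects.
    """
    x, y = src
    dx, dy = dst
    path = [(x, y)]
    while x != dx:                    # move horizontally first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                    # then vertically
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

# On a 6x6 grid of 36 cores, corner to corner is the worst case.
hops = xy_route((0, 0), (5, 5))
print(f"{len(hops) - 1} hops:", hops)   # 10 hops
```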

I commented some time back on a similar article on this topic. It appears the MIT research group now has working silicon of the design. As mentioned in the pull-quote, the Xeon Phi (which has made news in recent Top 500 supercomputer stories) is also a massively multicore architecture, but it uses a different interconnect that Intel designed on its own. Stories like these get filed here under massively multicore or low-power CPU developments. Most times these CPU families add cores without drawing significantly more power, and thus provide a net increase in compute ability. Tilera, Calxeda, and yes, even SeaMicro were all working toward those ends. Through mergers or loss of funding, each one has seemed to trail off without reaching its original goal of massively multicore, low-power designs. Along the way, Intel has also done everything it can to dull and dent the novelty of the new designs by revising an Atom- or Celeron-based CPU to provide much lower power, albeit at the scale of maybe two cores per CPU.

Like the chip MIT just announced, Tilera too was originally an MIT research project spun off the university campus; its principals were the PI and a research associate, if I remember correctly. Now that MIT has working silicon, the group is going to benchmark, test, and verify its design. Once they have completed their own study, the researchers will release the Verilog hardware description of the chip for anyone to use, research, or verify for themselves. It will be interesting to see how much of an incremental improvement this design provides, and it could possibly be the launch of another Tilera-style product out of MIT.

Categories
cloud computers data center gpu technology wintel

AMD Snatches New-Age Server Maker From Under Intel | Wired Enterprise | Wired.com


Chip designer and chief Intel rival AMD has signed an agreement to acquire SeaMicro, a Silicon Valley startup that seeks to save power and space by building servers from hundreds of low-power processors.

via AMD Snatches New-Age Server Maker From Under Intel | Wired Enterprise | Wired.com.

It was bound to happen eventually, I guess: SeaMicro has been acquired by AMD. We'll see what happens as a result, since SeaMicro has been a customer for Intel's Atom chips and, most recently, its Xeon server chips as well. I have no idea where this is going or what AMD intends to do, but hopefully this won't scare off any current or near-future customers.

SeaMicro's competitive advantage has been, and will continue to be, the development work it performed on the custom ASIC used in all of its systems. That bit of intellectual property is in essence the reason AMD decided to acquire SeaMicro, and it should give AMD an engineering advantage in the systems it brings to market for large-scale data centers in the future.

While this is all pretty cool technology, I think that SeaMicro’s best move was to design its ASIC so that it could take virtually any common CPU. In fact, SeaMicro’s last big announcement introduced its SM10000-EX option, which uses low-power, quad-core Xeon processors to more than double compute performance while still keeping the high density, low-power characteristics of its siblings.

via SeaMicro acquisition: A game-changer for AMD • The Register.

So there you have it: Wired and The Register are reporting the whole transaction pretty positively. On the surface it looks like a win for AMD, which can now design new server products and get them to market quickly using the SeaMicro ASIC as a key ingredient. SeaMicro can keep servicing its current customers and eventually allow AMD to upsell or upgrade them as needed to keep the ball rolling. And with AMD's Fusion architecture marrying GPUs with CPU cores, who knows what cool new servers might be possible? But as usual the naysayers, the spreaders of Fear, Uncertainty and Doubt, have questioned the value of SeaMicro and its original product, the SM-10000.

Diane Bryant, the general manager of Intel's data center and connected systems group, was asked at a press conference for the launch of new Xeon processors about SeaMicro's attempt to interest Intel in buying the company. She had this to say: "We looked at the fabric and we told them thereafter that we weren't even interested in the fabric." To Intel, there is nothing special enough in the SeaMicro fabric to warrant buying the company. Furthermore, Bryant told Wired.com:

“…Intel has its own fabric plans. It just isn’t ready to talk about them yet. “We believe we have a compelling solution; we believe we have a great road map,” she said. “We just didn’t feel that the solution that SeaMicro was offering was superior.”

This is a move straight out of Microsoft's marketing department circa 1992, when it would pre-announce a product that never shipped and was barely developed beyond the prototype stage. If Intel were really working on this as a new product offering, you would have seen an announcement by now, rather than a vague, tangential reference that reads more like a parting shot than a strategic direction. So I will be watching intently in the coming months and years to see what, if any, Intel 'fabric technology' makes its way from the research lab to the development lab and into shipping product. Don't be surprised if this is Intel attempting to undermine AMD's choice to purchase SeaMicro. Likewise, Forbes.com later reported, citing a SeaMicro representative, that the company had never tried to encourage Intel to acquire it. It is anyone's guess who is correct and being 100% honest in their recollections. However, I am still betting on SeaMicro's long-term strategy of pursuing low-power, ultra-dense, massively parallel servers. It is an idea whose time has come.

Categories
computers data center technology

SeaMicro adds Xeons to Atom smasher microservers • The Register

There's some interesting future possibilities for the SeaMicro machines. First, SeaMicro could extend that torus interconnect to span multiple chassis. Second, it could put a "Patsburg" C600 chipset on an auxiliary card and actually make fatter SMP nodes out of single processor cards and then link them into the torus interconnect. Finally, it could of course add other processors to the boards, such as Tilera's 64-bit Tile-Gx3000s or 64-bit ARM processors when they become available.

via SeaMicro adds Xeons to Atom smasher microservers • The Register.

SeaMicro SM10000 (Photo credit: blogeee.net)

Timothy Prickett Morgan, writing for The Register, has a great article on SeaMicro's recent announcement of a Xeon-based 10U server chassis. Seemingly going against its first two generations of low-power, massively parallel server boxes, this one uses a brawny Intel Xeon server chip (albeit one with a fairly low power draw and Thermal Design Power).

Sad as it may seem to me, the market for the low-power, massively parallel CPU box must not be very lucrative. But it is a true testament to the flexibility of the original 10U chassis design that SeaMicro could swap in the higher-power Intel Xeon CPUs at all. I doubt many competitors in this section of the market could 'turn on a dime' the way SeaMicro appears to have done with this Xeon-based server. Most designs are so heavily optimized for a particular CPU, power supply, and form-factor layout that changing one component might force a much bigger change order in the design department, and the product would take longer to develop and ship as a result.

So even though I hope the 64-bit Intel Atom will remain SeaMicro's flagship product, I'm also glad the company can stay in the fight longer by selling into the 'established' older data center accounts worldwide. Adapt or die is the clichéd adage of some technology writers, and I would mark this one with a plus (+) in the adapt column.

Categories
cloud computers data center google technology

How Google Spawned The 384-Chip Server | Wired Enterprise | Wired.com

SeaMicro’s latest server includes 384 Intel Atom chips, and each chip has two “cores,” which are essentially processors unto themselves. This means the machine can handle 768 tasks at once, and if you’re running software suited to this massively parallel setup, you can indeed save power and space.

via How Google Spawned The 384-Chip Server | Wired Enterprise | Wired.com.


Great article from Wired.com on SeaMicro and the two principal minds behind its formation. Both of these fellows were quite impressed with Google's data center infrastructure when they each got to visit a Google data center. But rather than just sit back and gawk, they decided to take action and borrow, nay steal, some of the interesting ideas the Google engineers adopted early on. However, the typical naysayers pull a page out of Google's own white paper to argue against SeaMicro and the large number of smaller, lower-powered cores it uses in the SM-10000 product.

SeaMicro SM10000 (Image by blogeee.net via Flickr)

But nothing speaks of success more than product sales, and SeaMicro is selling its product into data centers. While it may not achieve the level of commerce reached by Apple Inc., it's a good start. What still needs to be done is more benchmarking and real-world comparison to reproduce or negate the results of Google's white paper promoting its choice of off-the-shelf commodity Intel chips. Google is adamant that higher-clock-speed 'server' chips on single motherboards, connected to one another in large quantity, are the best way to go. However, the two guys who started SeaMicro insist that while Google's choice makes perfect sense for Google, no one else is quite like Google in their compute infrastructure requirements; nobody operates an enterprise at that scale (except maybe Facebook, and possibly Amazon). So maybe there is an opening at the middle and lower end of the data center market? Every data center's needs are different, especially when it comes to available space, available power, and cooling restrictions for a given application. And SeaMicro might be the secret weapon for shops constrained by all three: power, cooling, and space.

*UPDATE: Just saw this flash through my Google Reader blogroll this past Wednesday: SeaMicro is now selling an Intel Xeon-based server. I guess the market for larger numbers of lower-power chips just isn't strong enough to sustain a business. Sadly, this makes all the wonder and speculation surrounding the SM10000 seem kind of moot now. But hopefully there's enough intellectual property and patents in the original design to keep the idea going for a while. SeaMicro does have quite a head start over competitors like Tilera, Calxeda, and Applied Micro. And if it can help finance further development of Atom-based servers by selling a few Xeons along the way, all the better.

Categories
cloud computers data center technology

HP hooks up with Calxeda to form server ARMy • The Register

Calxeda is producing 4-core, 32-bit, ARM-based system-on-chip (SoC) designs, developed from ARM's Cortex-A9. It says it can deliver a server node with a thermal envelope of less than 5 watts. In the summer it was designing an interconnect to link thousands of these things together. A 2U rack enclosure could hold 120 server nodes: that's 480 cores.

via HP hooks up with Calxeda to form server ARMy • The Register.

EnergyCore prototype card: the first attempt at making an OEM compute node from Calxeda

HP signing on as an OEM for Calxeda-designed equipment is going to push ARM-based, massively parallel server designs into a lot more data centers. Add to this the announcement of the new Cortex-A15 CPU and its timeline for 64-bit memory addressing, and you have a battle royale shaping up against Intel. Currently the Intel Xeon is the preferred choice for applications requiring large amounts of DRAM to hold whole databases and memcached web pages for lightning-quick fetches. At the other end of the scale are the four-core ARM chips dissipating a mere 5 watts per server node (rough numbers below). Intel is trying to drive down the Thermal Design Power of its chips, even resorting to 64-bit Atom chips to keep the memory-addressing advantage. But Intel's timeline for decreasing TDP doesn't quite match up to ARM's 64-bit timeline, so I suspect ARM, and Calxeda with it, will have the advantage for quite some time to come.
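To put that 5-watt node figure in perspective, here is a back-of-the-envelope watts-per-core comparison. The 95 W quad-core Xeon TDP is my own illustrative assumption (and covers only the chip, while Calxeda's 5 W covers a whole server node), so treat this as a sketch, not a vendor comparison.

```python
# Back-of-the-envelope watts per core. The 5 W Calxeda figure (whole
# server node) comes from the article; the 95 W quad-core Xeon TDP is
# an illustrative assumption and covers only the chip, not the node.
calxeda_node_w, calxeda_cores = 5, 4
xeon_chip_w, xeon_cores = 95, 4

print(f"Calxeda: {calxeda_node_w / calxeda_cores:.2f} W per core")  # 1.25
print(f"Xeon:    {xeon_chip_w / xeon_cores:.2f} W per core")        # 23.75
# A full 120-node 2U enclosure: 120 * 5 W = 600 W for 480 cores.
```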

While I had hoped the recent Cortex-A15 announcement would also usher in a fully 64-bit-capable CPU, it will at least be able to address far more memory than a plain 32-bit chip: the physical address width quoted was 40 bits, and that can be extended further in software. And it doesn't seem to have discouraged HP at all, which is testing the Calxeda-designed prototype EnergyCore evaluation board. This is all new territory for both Calxeda and HP, so a fully engineered and designed prototype is absolutely necessary to get the project off the ground. My hope is that HP can do a large-scale test and work out the software configuration and optimization needed to gain an advantage in power savings, density, and speed over an Intel Atom server (like SeaMicro's).
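The arithmetic behind that 40-bit figure is easy to check, assuming byte-addressable memory:

```python
# Addressable physical memory as a function of address width,
# assuming byte-addressable memory: the 32-bit vs 40-bit gap.
for bits in (32, 40):
    print(f"{bits}-bit addressing: {2**bits / 2**30:,.0f} GiB")
# 32-bit addressing: 4 GiB
# 40-bit addressing: 1,024 GiB (i.e., 1 TiB)
```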

Categories
cloud computers data center technology

SeaMicro pushes Atom smasher to 768 cores in 10U box • The Register


An original SM10000 server with 512 cores and 1TB of main memory cost $139,000. The bump up to the 64-bit Atom N570 for 512 cores and the same 1TB of memory boosted the price to $165,000. A 768-core, 1.5TB machine using the new 64HD cards will run you $237,000. That's 50 per cent more oomph and memory for 43.6 per cent more money.

via SeaMicro pushes Atom smasher to 768 cores in 10U box • The Register.
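The Register's percentages check out; here is the quick arithmetic using the prices and core counts from the quote:

```python
# Price/performance check on the quoted figures: the 512-core,
# $165,000 N570 config vs. the 768-core, $237,000 64HD config.
old_cores, old_price = 512, 165_000
new_cores, new_price = 768, 237_000

core_gain = new_cores / old_cores - 1    # 0.500 -> 50% more cores
price_gain = new_price / old_price - 1   # 0.436 -> 43.6% more money
print(f"{core_gain:.1%} more cores for {price_gain:.1%} more money")
```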

SeaMicro continues to pump out the jams, releasing another updated chassis in less than a year. There is now a grand total of 768 processor cores jammed into that 10U box, which led me to believe SeaMicro had just eclipsed the compute-per-rack-unit of the Tilera and Calxeda massively parallel cloud-servers-in-a-box. But that would be wrong, because Calxeda is making a 2U enclosure that holds 120 four-core ARM CPUs. That gives you a grand total of 480 cores in just 2 rack units; multiply by 5 and you get 2,400 cores in the same 10U of rack space (see the quick tally below). So, advantage Calxeda in total core count. However, let's also consider software. The Atom, the CPU SeaMicro has chosen all along, is an Intel-architecture chip, and an x64-capable one at that. It is the best of both worlds for anyone who already has a big investment in Intel binary-compatible OSes and applications. It is most often the software, and its legacy pieces, that drives the choice of which processor goes into your data cloud.
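Here is that tally, normalizing both vendors' published densities to the same 10U footprint. It's straight multiplication and ignores power, memory, and interconnect differences:

```python
# Core density normalized to a 10U footprint, using each vendor's
# published configuration. Ignores power, memory, and interconnect.
seamicro_cores_per_10u = 768                       # SM10000 64HD chassis
calxeda_cores_per_10u = 120 * 4 * (10 // 2)        # 120 quad-core nodes/2U

print(f"SeaMicro: {seamicro_cores_per_10u} cores per 10U")
print(f"Calxeda:  {calxeda_cores_per_10u} cores per 10U")   # 2400
```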

Anyone starting from a clean slate could weigh Calxeda against SeaMicro on the merits of their applications and infrastructure. If density and thermal design power per rack unit are paramount, Calxeda will suit your needs, I would think. But who knows? Maybe your workflow isn't as massively parallel as a Calxeda server assumes, and you might have a much lower implementation threshold getting started on an Intel system, so again, advantage SeaMicro. A real industry analyst would look at these two competing companies as complementary: different architectures for different workflows.

Categories
cloud computers data center SSD technology test

Atom smasher claims Hadoop cloud migration victory • The Register


SeaMicro has been peddling its SM10000-64 micro server, based on Intel's dual-core, 64-bit Atom N570 processor and cramming 256 of these chips into a 10U chassis. . .

. . . The SM10000-64 is not so much a micro server as a complete data center in a box, designed for low power consumption and loosely coupled parallel processing, such as Hadoop or Memcached, or small monolithic workloads, like Web servers.

via Atom smasher claims Hadoop cloud migration victory • The Register.

While it is not always easy to illustrate the cost/benefit and return on investment of a lower-power box like the SeaMicro, running it head-to-head on a similar workload against a bunch of off-the-shelf Xeon boxes really shows the difference. The calculation of the benefit is critical, too. What do you measure? Is it speed? Speed per transaction? Total volume allowed through? Or is it cost per unit transaction within a set amount of transactions? You're getting closer with that last one. The test setup used a set number of transactions needing to be done in a set period of time; the benchmark then measured the total power dissipated to accomplish that number of transactions in the allotted time (sketched below). SeaMicro came away the winner in unit cost per transaction in power terms. While the Xeon-based servers had huge excess speed and capacity, their power dissipation put them pretty far into the higher cost-per-transaction category.
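The figure of merit boils down to energy per transaction for a fixed workload in a fixed window. A minimal sketch; the wattages and transaction counts below are placeholders, not the published benchmark results:

```python
# Sketch of the benchmark's figure of merit: energy per transaction
# for a fixed workload completed within a fixed time window.
# All numbers below are placeholders, not published results.
def joules_per_txn(avg_watts, seconds, transactions):
    """Energy cost per transaction for a fixed-size, fixed-time run."""
    return avg_watts * seconds / transactions

txns, window_s = 1_000_000, 3600          # same workload for both racks
atom = joules_per_txn(avg_watts=2_500, seconds=window_s, transactions=txns)
xeon = joules_per_txn(avg_watts=7_000, seconds=window_s, transactions=txns)
print(f"Atom rack: {atom:.1f} J/txn")     # lower energy per transaction
print(f"Xeon rack: {xeon:.1f} J/txn")     # excess capacity costs power
```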

However, it is very difficult to communicate this advantage that SeaMicro has over Intel. Future tests and benchmarks need to be constructed with clearly stated goals and criteria, specifically framed as a case history of a particular problem that could be solved either by a SeaMicro server or by a bunch of Intel boxes running big-cache Xeon CPUs. Once that case history is well described, the two architectures can be put to work with the end goal stated in clear terms (cost per transaction). Then and only then will SeaMicro communicate effectively how it does things differently and how that can save money. Otherwise it is too different to measure effectively against an Intel Xeon-based rack of servers.

Categories
cloud data center google technology

Facebook: No ‘definite plans’ to ARM data centers • The Register


Clearly, ARM and Tilera are a potential threat to Intel’s server business. But it should be noted that even Google has called for caution when it comes to massively multicore systems. In a paper published in IEEE Micro last year, Google senior vice president of operations Urs Hölzle said that chips that spread workloads across more energy-efficient but slower cores may not be preferable to processors with faster but power-hungry cores.

"So why doesn't everyone want wimpy-core systems?" Hölzle writes. "Because in many corners of the real world, they're prohibited by law – Amdahl's law."

via Facebook: No ‘definite plans’ to ARM data centers • The Register.

The explanation given here by Google's top systems person comes down to latency versus parallelization overhead. If your workload's steps must largely be done in order, a small number of fast cores delivers much higher performance than many slow ones, and responsiveness is the measure all the users of your service will judge you by. Making things massively parallel might provide the same throughput at a lower energy cost, but the communication and processing overhead of assembling all the data and sending it over the wire offsets the advantage in power efficiency. In other words, everything takes longer, latency increases, and users will deem your service slow and unresponsive. That's the dilemma of Amdahl's Law: the point of diminishing returns when adopting parallel computer architectures.
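Amdahl's Law itself makes the point crisply: the speedup from n cores is capped by the serial fraction of the work. A quick sketch with illustrative (not measured) values:

```python
# Amdahl's Law: speedup on n cores when a fraction p of the work
# can be parallelized. The p values are illustrative, not measured.
def amdahl_speedup(p, n):
    return 1 / ((1 - p) + p / n)

for p in (0.50, 0.90, 0.99):
    print(f"p={p:.2f}: 64 wimpy cores -> {amdahl_speedup(p, 64):.1f}x")
# p=0.50 ->  2.0x  (even infinite cores cap at 1/(1-p) = 2x)
# p=0.90 ->  8.8x
# p=0.99 -> 39.3x
```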

Now compare this to something we know much more concretely, like the airline industry. As the cost of tickets came down, the pressure to cut costs went up; schedules for landings and gate assignments got more complicated, and service levels have suffered terribly. No one is really all that happy with the service they get, even from the best airline currently operating. So maybe Amdahl's Law matters less when there's a false ceiling placed on what counts as acceptable latency for a 'system'. If airlines are not on time but you still make your connection 99% of the time, who will complain? By way of comparison, there may be a middle ground where further parallelizing compute tasks lowers the energy required by a data center. It will mean greater latency and a worse experience for the users. But if everyone suffers equally and the service is not great but adequate, then the company can cut costs by deploying more parallel processors in its data centers.

I think Tilera holds a special attraction for Facebook, especially since Quanta, its hardware assembler of choice, is already putting together computers with the Tilera chip for customers now. This chain of associations might give Facebook a way to test the waters at a scale large enough to figure out the costs and benefits of massively parallel CPUs in the data center. Maybe it will take the build-out of another new data center to get there, but no doubt it will happen eventually.

Categories
cloud computers data center technology

Quanta crams 512 cores into pizza box server • The Register


Two of these boards are placed side-by-side in the chassis and stacked two high, for a total of eight server nodes. Eight nodes at 64 cores each gives you 512 total cores in a 2U chassis. The server boards slide out on individual trays and share two 1,100 watt power supplies that are stacked on top of each other and that are put in the center of the chassis. Each node has three SATA II ports and can have three 2.5-inch drives allocated to it; the chassis holds two dozen drives, mounted in the front and hot pluggable.

via Quanta crams 512 cores into pizza box server • The Register.

Amazing how power-efficient Tilera has made its shipping products: Quanta has jammed 512 cores into a box just 2 rack units high, roughly 20% the size of the Intel Atom-based SeaMicro SM-10000. Now that there's a shipping product, I would like to see benchmarks or comparisons made on similar workloads using both sets of hardware. Numerically speaking, it would be an apples-to-apples comparison. But each of these products is unique, and they are going to be difficult to judge in the coming year.

First off, the Intel Atom is an x86-compatible low-power chip that helped launch the Asus/Acer netbook revolution (which, until the iPad killed it, was a big deal). However, Quanta, in order to get higher density in its hardware, has chosen a different CPU from the Intel Atom used by SeaMicro. Instead, Quanta is the primary customer of an innovative chip company we have covered on carpetbomberz.com previously: Tilera. For those who have not been following the company's press releases, Tilera is a spin-off of an MIT research project in chip-scale networking. The idea was to create very simplified systems-on-a-chip (whole computers scaled down to a single chip) and then network them together, all on the same slice of silicon die. The speeds are higher because most of the physical interfaces and buses are contained directly in the chip's circuits instead of out on the computer's motherboard. The promise of the Tilera design is that you scale up on the silicon die rather than across racks and racks of equipment in the data center. Performance of the Tilera chip has been somewhat of a secret; no public benchmarks or real comparisons to commercially shipping CPUs have been performed. But the general feeling is that any single core within a Tilera chip should be about as capable as the processor in your smartphone, and every bit as power-efficient. Tilera has been planning to scale up to 100 CPUs within a single processor die eventually, and has reached 64 cores in the chips shipping in this Quanta box.

I suspect both SeaMicro and Quanta will have their own custom supervisor software that lets administrators install and set up instances of their favorite workhorse OSes. Each OS instance will be doled out to an available CPU core and then linked to a virtual network and a virtual storage interface. Boom! You've got a web server, file server, rendering station, streaming server, whatever you need, in one fell swoop. And it is all bound together with two 1,100-watt power supplies in each 2-rack-unit box. I don't know how that compares to the SeaMicro power supply, but I imagine it is likely smaller per core than the SM-10000's, which can only mean that in the war for data center power efficiency, Quanta might deliver a huge shot across SeaMicro's bow. All I can say is let the games begin, and let the market determine the winner.

Categories
computers data center mobile technology

Calxeda boasts of 5 watt ARM server node • The Register

Calxeda is not going to make and sell servers, but rather make chips and reference machines that it hopes other server makers will pick up and sell in their product lines. The company hopes to start sampling its first ARM chips and reference servers later this year. The first reference machine has 120 server nodes in a 2U rack-mounted format, and the fabric linking the nodes together internally can be extended to interconnect multiple enclosures together.

via Calxeda boasts of 5 watt ARM server node • The Register.

SeaMicro and now Calxeda are going gangbusters for the ultra-dense, low-power server market. Unlike SeaMicro, Calxeda wants to create reference designs it licenses to manufacturers, who will build machines holding 120 four-core server nodes (480 cores) in a 2U enclosure. SeaMicro's record right now is 512 cores per 10U chassis, or roughly 102 cores per 2 rack units. The other difference is that the SeaMicro product uses Intel's low-power Atom CPU, whereas Calxeda is using a processor found more often in smartphones and tablet computers. SeaMicro has hinted it is not wedded to the Intel architecture, but it is more interested in shipping real, live product than in coming up with generic designs others can license. In the long run it's entirely possible SeaMicro may switch to a different CPU; the company has indicated previously that its servers are designed with enough flexibility to swap the processor for any other CPU if necessary. It would be really cool to see an apples-to-apples comparison of a SeaMicro server using first Intel CPUs and then ARM-based CPUs.