SUMMARY: Microsoft has been experimenting with its own custom chip effort in order to make its data centers more efficient, and these chips aren’t centered around ARM-based cores, but rather FPGAs from Altera.
FPGAs for the win, at least for eliminating unnecessary Xeon CPUs doing online analytic processing for the Bing search service. MS says it can process the same amount of data with half the number of CPUs by offloading some of the heavy lifting from general purpose CPUs to specially programmed FPGAs tuned to the MS algorithms that deliver up the best search results. For MS the cost of the data center will out, and if you can drop half of the Xeons in a data center you cut the CPU share of your per transaction costs by roughly half. That is quite an accomplishment in these days of radical incrementalism when it comes to Data Center ops and DevOps. The Field Programmable Gate Array is known as a niche, discipline specific kind of hardware solution. But when flashed and programmed properly, and re-configured as workloads and needs change, it can do some magical heavy lifting from a computing standpoint.
Specifically I’m thinking of really repetitive loops or recursive algorithms that take forever to unwind and deliver a final result. Those are things best done in hardware versus software. For Search Engines that might be the process used to determine the authority of a page in the rankings (like Google’s PageRank). And knowing you can further tune the hardware to fit the algorithm means you’ll spend less time attempting to do the heavy lifting on the general purpose CPU using really fast C/C++ code instead. In Microsoft’s plan that means fewer CPUs are needed to do the same amount of work. And better yet, if you determine a better algorithm for your daily batch processes, you can spin up a new hardware/circuit diagram and apply that to the compute cluster over time (and not have to do a pull and replace of large sections of the cluster). It will be interesting to see if Microsoft reports any efficiencies in a final report. As of now this seems somewhat theoretical, though it may have been tested at least in a production test bed of some sort using real data.
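To make the "repetitive loop" idea concrete, here is a minimal sketch of the textbook power-iteration form of PageRank in Python. This is not Microsoft's ranking code or Google's production algorithm; the toy link graph, the damping value of 0.85 and the function name are all my own assumptions for illustration. The point is just how much of the work is the same multiply-and-add repeated over and over, which is the kind of thing that maps well onto tuned hardware.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank.

    links maps each page to the list of pages it links to.
    Every link target must also appear as a key in links.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with a uniform score

    for _ in range(iterations):
        # Every page gets a small baseline share each round.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's score evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score everywhere.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page web, purely for illustration.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

The inner loop is the same arithmetic on every pass, with no branching that depends on the data beyond the link structure. On a real web graph with billions of pages that loop is where the time goes.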
The Center IT outfit I work for is dumping as much on-premises Exchange Mailbox hosting as it can. However we are sticking with Outlook365 as provisioned by Microsoft (essentially an Outlook’d version of Hotmail). It has the calendar and global address list we all have come to rely on. But as this article explains in great detail for the rest of the Office Suite, people aren’t creating as many documents as they once did. We’re viewing them, yes, but we just aren’t creating them.
I wonder how much of this is due in part to re-use, or to the job of authorship shifting to much higher level people. Your average admin assistant or even secretary doesn’t draft anything dictated to them anymore. The top level types now generally would be embarrassed to dictate something out to anyone. Plus the culture of secrecy necessitates more 1-to-1 style communications. And long form writing? Who does that anymore? No one writes letters, they write brief emails or even briefer texts, Tweets or Facebook updates. Everything is abbreviated to such a degree you don’t need a thesaurus, pagination, or any of the super specialized doo-dads and add-ons we all begged M$ and Novell to add to their première word processors back in the day.
From an evolutionary standpoint, we could get by with the original text editors first made available on timesharing systems. I’m thinking of utilities like line editors (that’s really a step backwards, so I’m being really facetious here). The point I’m making is we’ve gone through a very advanced stage in the evolution of our writing tool of choice and it became a monopoly. WordPerfect lost out and fell by the wayside. Primary, Secondary and Middle Schools across the U.S. adopted M$ Word. They made it a requirement. Every college freshman has been given discounts to further the loyalty to the Office Suite. Now we don’t write like we used to, much less read. What’s the use of writing something so many pages long that no one will ever read it? We’ve jumped the shark of long form writing, and therefore the premier app, the killer app for the desktop computer, is slowly receding behind us as we keep speeding ahead. Eventually we’ll see it on the horizon, its sails being the last visible part, the crow’s nest, then poof! It will disappear below the horizon line. We’ll be left with our nostalgic memories of the first time we used MS Word.
The looming introduction of a 64-bit ARM-based server core (production 64-bit ARM server chips are expected from a variety of vendors later this year) also changes the economics of developing a server chip. While Moorhead believes building your own core is a multihundred million dollar process, Andrew Feldman, the corporate vice president and general manager of Advanced Micro Devices’ server chip business, told me last December that it could be in the tens of millions.
Things are changing rapidly in the ARM licensing market. The cost of a license is reasonable, you just need to get a contract fabricator to help process the silicon wafers for you. As the pull quote suggests, even for someone just “dabbling” in the custom silicon CPU market, the threshold and risk are pretty darned low, and that goes for an outfit like Amazon. And like so many other fields and areas in the cloud services sector, many others have done a lot of the heavy lifting already. Google and Facebook both have detailed and outlined their custom computer build process (with Facebook going further and drafting the Open Compute Project spec). Apple (though not really a cloud provider) has shown the way towards a workable, scalable and somewhat future proof path to spinning many revs of custom CPUs (granted ARM derived, but still admirable). Between Apple’s contract manufacturing with Samsung and TSMC for their custom mobile CPUs and the knowledge Amazon has in house for their own rack based computers, there’s no telling how optimized they could make their AWS and EC2 data center services given more time.
No doubt to stay competitive against Google, Facebook, Microsoft and IBM, Amazon will go the custom route and try to lower ALL the marginal operating costs and capital costs. At least as far as is technically feasible and cost effective. There’s a new cold war on in the Cloud, and it’s going to be customized, custom made, ultra-tailored computer configurations. And each player will find its competitive advantage each step along the way. Some will go for MIPS, some for FLOPS, others for TDP, and all the marginal costs and returns will be optimized for each completed instruction for each clock cycle. It’s a brave new closed source, closed hardware world and we’re just the ones living in it, or should I say living in the cloud.
It’s not just photos. I want the same for my whole expanding set of digital objects, including medical and financial records, commercial transactions, personal correspondence, home energy use data, you name it. I want all of my lifebits to be hosted in the cloud under my control. Is that feasible? Technically there are huge challenges, but they’re good ones, the kind that will spawn new businesses.
From Gordon Bell’s MyLifeBits to most recently Stephen Wolfram’s personal collection of data and now to Jon Udell. Witness the ever expanding universe of personal data. Thinking about Gordon Bell now, I think the emphasis from Microsoft Research was always on video and pictures and ‘recollecting’ what’s happened in any given day. Stephen Wolfram’s emphasis was not so much on collecting the data but analyzing it after the fact and watching patterns emerge. Now with Jon Udell we get a nice kind of advancing of the art by looking at possible end-game scenarios. So you have collected a mass of LifeBits, now what?
Who’s going to manage this thing? Is anyone going to offer a service that will help manage it? All great questions, because the disparate forms social networking lifebits take versus others like health and ‘performance’ lifebits (like Stephen Wolfram collects and maintains for himself) point up a big gap that exists in the cloud services sector. Ripe pickings for anyone in the entrepreneurial vein to step in and bootstrap a service like the one Jon Udell proposes. If someone was really smart they could get it up and running cheaply on Amazon Web Services (AWS) until it got to be too cost and performance prohibitive to keep it hosted there. That would allow an initial foray to test the waters, gauge the size and tastes of the market, and adapt the hosted lifebits service to anyone willing to pay up. That might just be a recipe for success.
SeaMicro’s latest server includes 384 Intel Atom chips, and each chip has two “cores,” which are essentially processors unto themselves. This means the machine can handle 768 tasks at once, and if you’re running software suited to this massively parallel setup, you can indeed save power and space.
Great article from Wired.com on SeaMicro and the two principal minds behind its formation. Both of these fellows were quite impressed with Google’s data center infrastructure when they each got to visit a Google Data Center. But rather than just sit back and gawk, they decided to take action and borrow, nay steal, some of those interesting ideas the Google Engineers adopted early on. However, the typical naysayers pull a page out of the Google white paper arguing against SeaMicro and the large number of smaller, lower-powered cores they use in the SM-10000 product.
But nothing speaks of success more than product sales, and SeaMicro is selling its product into data centers. While they may not achieve the level of commerce reached by Apple Inc., it’s a good start. What still needs to be done is more benchmarks and real world comparisons that reproduce or negate the results of Google’s whitepaper promoting their choice of off the shelf commodity Intel chips. Google is adamant that higher clock speed ‘server’ chips attached to single motherboards connected to one another in large quantity is the best way to go. However, the two guys who started SeaMicro insist that while Google’s choice for itself makes perfect sense, NO ONE else is quite like Google in their compute infrastructure requirements. Nobody has such a large enterprise or the scale Google requires (except for maybe Facebook, and possibly Amazon). So maybe there is a market at the middle and lower end of the data center owner’s market? Every data center’s needs will be different, especially when it comes to available space, available power and cooling restrictions for a given application. And SeaMicro might be the secret weapon for shops constrained by all three: power/cooling/space.
*UPDATE: Just saw this flash through my Google Reader blogroll this past Wednesday, SeaMicro is now selling an Intel Xeon based server. I guess the market for larger numbers of lower power chips just isn’t strong enough to sustain a business. Sadly this makes all the wonder and speculation surrounding the SM-10000 seem kinda moot now. But hopefully there’s enough intellectual property rights and patents in the original design to keep the idea going for a while. SeaMicro does have quite a head start over competitors like Tilera, Calxeda and Applied Micro. And if they can help finance further developments of Atom based servers by selling a few Xeons along the way, all the better.
By itself Calxeda has made some big plans attempting to create computers like the SeaMicro SM10000. But the ability to manufacture on any scale and then sell that product is a bit limited. But as of today HP has partnered with Calxeda to sell product and help design a server using the reference design for a compute node. So the ball is rolling, and now there’s a third leg in this race between the Compute Cloud in a Box manufacturers (Calxeda, SeaMicro and Tilera). Read On:
Calxeda is producing 4-core, 32-bit, ARM-based system-on-chip (SoC) designs, developed from ARM’s Cortex A9. It says it can deliver a server node with a thermal envelope of less than 5 watts. In the summer it was designing an interconnect to link thousands of these things together. A 2U rack enclosure could hold 120 server nodes: that’s 480 cores.
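The numbers in that quote are easy to sanity check. Here is a quick back-of-the-envelope sketch in Python, using only the figures quoted above (120 nodes per 2U enclosure, 4 cores per node, under 5 watts per node). Treating 5 watts as a ceiling is my own simplification, and the power figure covers the server nodes only. It leaves out the interconnect, power supplies, fans and disks, none of which are given in the quote.

```python
# Figures taken from the quote above.
NODES_PER_ENCLOSURE = 120   # server nodes in one 2U enclosure
CORES_PER_NODE = 4          # 4-core Cortex A9 based SoC
MAX_WATTS_PER_NODE = 5.0    # "less than 5 watts", used here as an upper bound


def enclosure_totals(nodes, cores_per_node, watts_per_node):
    """Return (total cores, upper bound on node power in watts)."""
    return nodes * cores_per_node, nodes * watts_per_node


cores, watts = enclosure_totals(
    NODES_PER_ENCLOSURE, CORES_PER_NODE, MAX_WATTS_PER_NODE
)
print(f"{cores} cores per 2U enclosure")
print(f"at most {watts:.0f} W for the nodes alone")
print(f"at most {watts / cores:.2f} W per core")
```

So the 480 core figure checks out, and the node power budget for the whole enclosure tops out at 600 watts, or 1.25 watts per core, before any of the supporting hardware is counted.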
HP signing on as an OEM for Calxeda designed equipment is going to push ARM based massively parallel server designs into a lot more data centers. Add to this the announcement of the new ARM-15 CPU and its timeline for addressing 64-bit memory and you have a battle royale going up against Intel. Currently the Intel Xeon is the preferred choice for applications requiring large amounts of DRAM to hold whole databases and Memcached webpages for lightning quick fetches. On the other end of the scale are the low power 4 core ARM chips dissipating a mere 5 watts. Intel is trying to drive down the Thermal Design Point for their chips, even resorting to 64bit Atom chips to keep the Memory Addressing advantage. But the timeline for decreasing the Thermal Design Point doesn’t quite match up to the 64-bit ARM timeline. So I suspect ARM will have the advantage, as will Calxeda, for quite some time to come.
While I had hoped the recent ARM-15 announcement was also going to usher in a fully 64-bit capable CPU, it will at least be able to fake larger size memory access. The address width I remember being quoted was 40 bits, and that can be further extended using software. And it doesn’t seem to have discouraged HP at all, which is testing the Calxeda designed prototype EnergyCore evaluation board. This is all new territory for both Calxeda and HP, so a fully engineered and designed prototype is absolutely necessary to get this project off the ground. My hope is HP can do a large scale test and figure out some of the software configuration optimization that needs to occur to gain an advantage in power savings, density and speed over an Intel Atom server (like SeaMicro).
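To put that 40-bit figure in perspective, here is a small Python sketch comparing how much memory a chip can address at 32, 40 and 64 bits. I am assuming the 40 bits refers to the width of a physical memory address, and the helper names below are just mine for illustration. The arithmetic itself is only powers of two.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


def human_readable(num_bytes):
    """Format a byte count using binary units (KiB, MiB, ...)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    index = 0
    while num_bytes >= 1024 and index < len(units) - 1:
        num_bytes //= 1024
        index += 1
    return f"{num_bytes} {units[index]}"


for bits in (32, 40, 64):
    print(f"{bits}-bit addressing: {human_readable(addressable_bytes(bits))}")
```

Going from 32 to 40 bits multiplies the reachable memory by 256, from 4 GiB to 1 TiB. That is a long way short of a true 64-bit address space, but it is plenty for the DRAM-heavy database and Memcached style workloads mentioned earlier.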
In past readings of announcements and analysis of announcements from ARM and Calxeda, I got the impression everyone was looking forward to ARM-15 4 core CPUs that had 64bit capability, especially the 64-bit addressing for large amounts of DRAM. Well now the first test chip for ARM-15 has been announced. And the timescale for the production release of that chip is now clearer. My only question is when will they announce the 64-bit version of ARM-15. Let’s first look at what’s been written so far. Read On:
The test chip will be fabbed at TSMC on its next-generation 20nm process, a full node reduction (~50% transistor scaling) over its 28nm process. With the first 28nm ARM based products due out from TSMC in 2012, this 20nm tape-out announcement is an important milestone but we’re still around two years away from productization.
Happy Halloween! And like most years there are some tricks up ARM’s sleeve announced this past week along with some partnerships that should make things trickier for the Engineers trying to equip ever more energy efficient and dense Data Centers the world over.
It’s been announced, the ARM15 is coming to market some time in the future. Albeit a ways off yet. And it’s going to be using a really narrow design rule to ensure it’s as low power as it possibly can be. I know manufacturers of the massively parallel compute cloud in a box will be seeking out this chip as soon as samples can arrive. The 64bit version of ARM15 is the real potential jewel in the crown for Calxeda, which is attempting to balance low power and 64bit performance in the same design.
I can’t wait to see the first benchmarks of these chips, and then the benchmarks from the first shipping product Calxeda can get out with the 64bit ARM15. Also note just this week Hewlett-Packard has signed on to sell designs by Calxeda in forthcoming servers targeted at Energy Efficient Data Center build-outs. So more news to come regarding that partnership and you can read it right here @ Carpetbomberz.com