Archive for the ‘cloud’ Category
Today many different interconnection topologies are used for multicore chips. For as few as eight cores direct bus connections can be made — cores taking turns using the same bus. MIT’s 36-core processors, on the other hand, are connected by an on-chip mesh network reminiscent of Intel’s 2007 Teraflop Research Chip — code-named Polaris — where direct connections were made to adjacent cores, with data intended for remote cores passed from core-to-core until reaching its destination. For its 50-core Xeon Phi, however, Intel settled instead on using multiple high-speed rings for data, address, and acknowledgement instead of a mesh.
I commented some time back on a similar article on the same topic. It now appears the MIT research group has working silicon of the design. As mentioned in the pull-quote, the Xeon Phi (which has made some news in the Top 500 supercomputer stories recently) is a massively multicore architecture, but it uses a different interconnect that Intel designed on its own. Stories like these get filed into the category of massively multicore or low-power CPU developments. Most of the time these CPUs add cores without drawing significantly more power and thus provide a net increase in compute ability. Tilera, Calxeda and yes, even SeaMicro were all working toward those ends. Whether through mergers or funding cuts, each one has seemed to trail off without reaching its original goal (massively multicore, low-power designs). Along the way Intel has also done everything it can to dull and dent the novelty of the new designs by revising Atom-based or Celeron-based CPUs to deliver much lower power at the scale of maybe 2 cores per CPU.
Like the chip MIT just announced, Tilera too was originally an MIT research product spun off from the university campus. Its principals were the PI and a research associate, if I remember correctly. Now that MIT has working silicon, they’re going to benchmark, test and verify their design. The researchers will release the Verilog hardware description of the chip for anyone to use, research or verify for themselves once they’ve completed their own study. It will be interesting to see how much of an incremental improvement this design provides, and it could possibly be the launch of another Tilera-style product out of MIT.
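The mesh-versus-ring distinction in the pull-quote can be made a bit more concrete with a rough hop count. This is only a sketch: it assumes the 36 cores sit in a 6x6 grid with simple XY routing, and it treats the 50-core part as one bidirectional ring with one stop per core. The article spells out neither layout, and the function names are mine.

```python
from itertools import product


def mesh_hops(src, dst, width):
    """Hops between two cores on a 2D mesh using XY (dimension-ordered) routing."""
    sx, sy = src % width, src // width
    dx, dy = dst % width, dst // width
    return abs(sx - dx) + abs(sy - dy)


def ring_hops(src, dst, n):
    """Hops between two stops on a bidirectional ring of n stops (shortest way round)."""
    d = abs(src - dst) % n
    return min(d, n - d)


def average_hops(n, hop_fn):
    """Mean hop count over every ordered pair of distinct nodes."""
    pairs = [(a, b) for a, b in product(range(n), repeat=2) if a != b]
    return sum(hop_fn(a, b) for a, b in pairs) / len(pairs)


# Assumed layouts: 6x6 mesh for 36 cores, one ring stop per core for 50 cores.
mesh_avg = average_hops(36, lambda a, b: mesh_hops(a, b, width=6))
ring_avg = average_hops(50, lambda a, b: ring_hops(a, b, n=50))
print(f"6x6 mesh average: {mesh_avg:.2f} hops")
print(f"50-stop ring average: {ring_avg:.2f} hops")
```

Hop count is only one factor. Link width, clock speed and contention matter just as much, so this says nothing about which chip is actually faster.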
SUMMARY: Microsoft has been experimenting with its own custom chip effort in order to make its data centers more efficient, and these chips aren’t centered around ARM-based cores, but rather FPGAs from Altera.
FPGAs for the win, at least for eliminating unnecessary Xeon CPUs doing online analytic processing for the Bing search service. MS is saying it can process the same amount of data with half the number of CPUs by offloading some of the heavy lifting from general-purpose CPUs to specially programmed FPGAs tuned to the MS algorithms that deliver up the best search results. For MS the cost of the data center will out, and if you can drop half of the Xeons in a data center you have cut the CPU portion of your per-transaction costs roughly in half. That is quite an accomplishment in these days of radical incrementalism when it comes to data center ops and DevOps. The Field Programmable Gate Array is known as a niche, discipline-specific kind of hardware solution. But when flashed and programmed properly, and reconfigured as workloads and needs change, it can do some magical heavy lifting from a computing standpoint.
Specifically I’m thinking of really repetitive loops or recursive algorithms that take forever to unwind and deliver a final result; those are things best done in hardware rather than software. For search engines that might be the process used to determine the authority of a page in the rankings (like Google’s PageRank). And knowing you can further tune the hardware to fit the algorithm means you’ll spend less time attempting to do the heavy lifting on the general-purpose CPU using really fast C/C++ code instead. In Microsoft’s plan that means fewer CPUs are needed to do the same amount of work. And better yet, if you determine a better algorithm for your daily batch processes, you can spin up a new hardware/circuit design and apply that to the compute cluster over time (and not have to do a pull-and-replace of large sections of the cluster). It will be interesting to see if Microsoft reports any efficiencies in a final report. As of now this seems somewhat theoretical, though it may have been tested at least in a production test bed of some sort using real data.
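To show the kind of repetitive loop I have in mind, here is a minimal sketch of the textbook PageRank power iteration. It is not Bing’s ranking code and not whatever Microsoft actually put on the FPGAs; the toy link graph, the 0.85 damping factor and the function name are all my own assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power iteration. `links` maps each page to the pages it links to.

    Every link target must also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web, purely for illustration.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

The inner loops do the same small multiply-and-add over and over across the whole graph, which is the sort of fixed, regular work that is a candidate for moving out of software and into a circuit.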
Cavium will try to drive ARM SoCs into mainstream servers, challenging Intel’s Xeon x86 with a family of 28 nm devices using up to 48 2.5 GHz custom 64-bit ARM cores
Another entry into the massively multi-core, low-power server race. Since the fading of other competitors like Calxeda and SeaMicro, there haven’t been a lot of announcements or shipping products that promised to be the low-power choice. Each time an inventor or entrepreneur stepped up with a lower-power or higher-core-count device, Intel would blunt the advantage by running a benchmark and claiming that shutting cores off saves more power than using an inherently low-power design. The race today, as designed by Intel, is a race to sleep, and that’s the benchmark by which they are measuring their own progress in the low-power, massively multi-core CPU market. Now, however, Cavium is stepping up with an ARM-based CPU with 48 cores. So let’s find out what we can about this new chip from this EE Times article.
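The race-to-sleep argument is easier to weigh with some back-of-the-envelope arithmetic. The sketch below compares a fast, power-hungry chip that finishes early and idles against a slow, frugal chip that stays busy for the whole window. Every wattage and timing here is a made-up number for illustration, not a measurement of any Intel, Cavium or other part.

```python
def energy_joules(active_watts, idle_watts, busy_seconds, window_seconds):
    """Energy used over a fixed window: busy at active power, then idle."""
    if busy_seconds > window_seconds:
        raise ValueError("job does not fit in the window")
    idle_seconds = window_seconds - busy_seconds
    return active_watts * busy_seconds + idle_watts * idle_seconds


WINDOW = 10  # seconds; hypothetical

# Hypothetical fast chip: 100 W busy, done in 2 s, then sleeps at 5 W.
race_to_sleep = energy_joules(100, 5, busy_seconds=2, window_seconds=WINDOW)

# Hypothetical low-power chip: 25 W busy, needs the full 10 s.
low_power = energy_joules(25, 5, busy_seconds=10, window_seconds=WINDOW)

# Same fast chip, but with a worse sleep state that still draws 10 W.
leaky_sleep = energy_joules(100, 10, busy_seconds=2, window_seconds=WINDOW)

print(race_to_sleep, low_power, leaky_sleep)
```

With these invented figures the fast chip wins narrowly, but only while its sleep state is very cheap. Double the idle draw and the low-power design comes out ahead, which is why the choice of benchmark matters so much in this argument.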
It appears the manufacturing partner for this new product is Gigabyte, which is creating a 2-socket motherboard for the 48-core ARM-based CPU. The 48-core CPU is ARMv8-based and uses 64-bit addressing, so large amounts of RAM can be used with this architecture (a failing of past products from previous manufacturers attempting ARM-based servers). Cavium already has network processors on the market using MIPS-based CPUs, and this new architecture using ARM-based chips tries to leverage a lot of that expertise. Architecturally the motherboard interfaces and protocols are still in place, with the CPU swap being the most noticeable difference. Cavium is primarily known as a network processor manufacturer, but this move could push them into large-scale, cloud-type data applications, with a tight binding to network operations supplied by their existing network processor products. Dates are still a little hazy, with the end of the calendar year being the most likely time for a product to have been developed, tested, manufactured and shipped.
I’m so happy to see the pressure being kept up in this one niche of computing. I still think ARM-based CPUs with massive numbers of cores are a new growth area. Similarly, the move to 64 bits takes away one of the last impediments most buyers pointed out when folks like Calxeda tried to market their wares to data centers. Bit by bit, each attempt by each startup and each design outfit gets a little closer to a competitive product that might yet go up against the mighty Intel Xeon multi-core CPU.
It’s not unprecedented: Google already offers a testing suite for Android apps, though that’s focused on making sure they run well on smartphones and tablets, not testing the cloud-based services they connect to. If Google added testing services for the websites and services those apps connect to, it would have an end-to-end lock on developing for both the Web and mobile.
Load testing websites and web apps is a market whose time has come. Where I work we have a Project group with a guy who manages an installation of Silk as a load tester. Behind that is a little farm of old Latitude E6400s that he manages from the Silk console to point at whichever app is in development/QA/testing before it goes into production. Knowing there’s potential for a cloud-based tool for this makes me very, very interested.
As outsourcing goes, the Software as a Service (SaaS), Platform as a Service (PaaS) or even Infrastructure as a Service (IaaS) categories are great as raw materials. But if there were just an app that I could log in to, spin up some VMs, install my load-test tool of choice and then manage them from my desktop, I would feel like I had accomplished something. Or failing that, even just a toolkit for load testing with whatever tool du jour is already available (nothing is perfect that way) would be cool too. And better yet, if I could do that with an updated tool whenever I needed to conduct a round of testing, the tool would take into account things like the Heartbleed bug in a timely fashion. That’s the kind of benefit a cloud-based, centrally managed, centrally updated load test service could provide.
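For anyone who has never watched a load tester work, the core idea fits in a few dozen lines. The sketch below is not Silk and not any Google or Microsoft service. It uses only the Python standard library, and the app under test is a throwaway local server I made up so the example is self-contained.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import ProxyHandler, build_opener

# Ignore any system proxy settings, since the target is on localhost.
OPENER = build_opener(ProxyHandler({}))


class HelloHandler(BaseHTTPRequestHandler):
    """Stand-in for the app under test; always answers 200 with a tiny body."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet during the run


def timed_get(url):
    """Fetch the URL once and return (HTTP status, seconds taken)."""
    start = time.perf_counter()
    with OPENER.open(url, timeout=5) as response:
        response.read()
        status = response.status
    return status, time.perf_counter() - start


def run_load_test(url, total_requests=50, concurrency=5):
    """Fire total_requests GETs using `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_get, [url] * total_requests))
    latencies = sorted(elapsed for _, elapsed in results)
    return {
        "requests": len(results),
        "ok": sum(1 for status, _ in results if status == 200),
        "median_s": latencies[len(latencies) // 2],
        "max_s": latencies[-1],
    }


# Port 0 asks the OS for any free port.
server = ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
target = f"http://127.0.0.1:{server.server_address[1]}/"
report = run_load_test(target)
server.shutdown()
server.server_close()
print(report)
```

A real tool adds ramp-up schedules, scripted user journeys and reporting, and a cloud service would swap my five threads for a fleet of VMs. The request-and-measure loop in the middle stays the same.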
And now that Microsoft has just announced a partnership with Salesforce on their Azure cloud platform, things get even more interesting. Not only could you develop using an existing toolkit like Salesforce.com, but you could host it on more than one cloud platform (AWS or Azure) as your needs change. And I would hope this would include unit testing, load testing and the whole sweet suite of security auditing one would expect for a web app (thereby helping prevent vulnerabilities like the Heartbleed bug in OpenSSL).
After stripping out unnecessary licensing Office licenses, organisations were left with a hybrid environment, part cloud, part desktop Office.
The Center IT outfit I work for is dumping as much on-premises Exchange mailbox hosting as it can. However, we are sticking with Outlook365 as provisioned by Microsoft (essentially an Outlook’d version of Hotmail). It has the calendar and global address list we all have come to rely on. But as this article details for the rest of the Office suite, people aren’t creating as many documents as they once did. We’re viewing them, yes, but we just aren’t creating them.
I wonder how much of this is due in part to re-use, or to authorship duties being assigned to much higher-level people. Your average admin assistant or even secretary doesn’t draft anything dictated to them anymore. The top-level types now generally would be embarrassed to dictate something to anyone. Plus the culture of secrecy necessitates more 1-to-1 style communications. And long-form writing? Who does that anymore? No one writes letters; they write brief emails or even briefer texts, Tweets or Facebook updates. Everything is abbreviated to such a degree that you don’t need a thesaurus, pagination, or any of the super-specialized doo-dads and add-ons we all begged M$ and Novell to add to their premier word processors back in the day.
From an evolutionary standpoint, we could get by with the original text editors first made available on timesharing systems. I’m thinking of utilities like line editors (that’s really a step backwards, so I’m being really facetious here). The point I’m making is that we’ve gone through a very advanced stage in the evolution of our writing tool of choice, and it became a monopoly. WordPerfect lost out and fell by the wayside. Primary, secondary and middle schools across the U.S. adopted M$ Word. They made it a requirement. Every college freshman has been given discounts to further the loyalty to the Office suite. Now we don’t write like we used to, much less read. What’s the use of writing something so many pages long that no one will ever read it? We’ve jumped the shark of long-form writing, and therefore the premier app, the killer app for the desktop computer, is slowly receding behind us as we keep speeding ahead. Eventually we’ll see it on the horizon, its sails being the last visible part, then the crow’s nest, then poof! It will disappear below the horizon line. We’ll be left with our nostalgic memories of the first time we used MS Word.
In addition, AMD is planning to contribute to the Open Compute Project with a new micro-server design that utilizes the Opteron A-series, along with other architecture specifications for motherboards that Facebook helped developed called “Group Hug,” an agnostic server board design that can support traditional x86 processors, as well as ARM chips.
Kudos to Facebook as they continue to support the Open Compute Project, which they spearheaded some years back to encourage more widespread expertise and knowledge of large-scale data centers. This new charge is to allow a pick-and-choose, best-of-breed kind of design whereby a CPU is not a fixed quantity but can be chosen or changed like a hard drive or RAM module, with the motherboard firmware remaining more or less consistent regardless of the CPU chosen. This would allow mass customization based solely on the best CPU for a given job (HTTP, DNS, compute, storage, etc.). And the spare capacity might be allowed to erode a little so that any general-purpose CPU could be somewhat more aggressively scheduled while some of its former, less efficient services could be migrated to more specialist mobile CPUs on another cluster, each CPU handling the set of protocols and services it inherently does best. This flies further in the face of always choosing general-compute-style CPUs and letting the software do most of the heavy lifting once the programming is completed.
Does Fusion-io have a sustainable competitive advantage or will it get blown away by a hurricane of other PCIe flash card vendors attacking the market, such as EMC, Intel, Micron, OCZ, TMS, and many others?
More updates on the data center uptake of PCI SSD cards in the form of two big wins from Facebook and Apple. Price/performance for database applications seems to be skewed heavily toward Fusion-io versus the big guns in large-scale SAN roll-outs. It seems that, thanks to its smaller scale and faster speed, a PCI SSD needs far fewer resources than an equally fast disk-based storage array (including power and the square feet taken up by all the racks). Typically a large rack of spinning disks can be aggregated using RAID controllers and caches to look like one very large, high-speed hard drive. The Fibre Channel connections add yet another layer of aggregation on top of all that, so that you can start splitting the underlying massive disk array into virtual logical drives that fit the storage needs of individual servers and OSes along the way. But to match the speed of a Fusion-io style PCI SSD, say to speed up JUST your MySQL server, the number of equivalent drives, racks, RAID controllers, caches and Fibre Channel host bus adapters is so large and costs so much that it isn’t worth it.
A single PCI SSD won’t have quite the same total storage capacity as, say, that larger-scale SAN. But for a single, one-off speed-up of a MySQL database you don’t need the massive storage so much as the massive speed-up in I/O. And that’s where the PCI SSD comes into play. With the newest PCI Express 3.0 interfaces and x8 (eight-lane) connectors, the current generation of cards is able to maintain 2GB/sec throughput on a single card. To achieve that using the older SAN technology is not just cost prohibitive but seriously SPACE prohibitive in all but the largest of data centers. The race now is to see how dense and energy efficient a data center can be constructed. So it comes as no surprise that Facebook and Apple (who are attempting to lower costs all around) are the ones leading this charge toward higher density and higher power efficiency as well.
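Some rough arithmetic shows why the disk-array route gets out of hand. The PCI Express 3.0 figures below (8 GT/s per lane, 128b/130b encoding) come from the published spec. The per-disk numbers are rule-of-thumb assumptions of mine for a fast spinning drive, not measurements, and the 530,000 figure is the sustained I/O number quoted for the OCZ card later in this post.

```python
import math


def pcie3_bandwidth_gb_s(lanes):
    """Theoretical one-direction PCIe 3.0 bandwidth in GB/s for a given lane count."""
    transfers_per_second = 8e9        # 8 GT/s per lane
    encoding_efficiency = 128 / 130   # 128b/130b line encoding
    bytes_per_second = transfers_per_second * encoding_efficiency / 8 * lanes
    return bytes_per_second / 1e9


def disks_needed(target, per_disk):
    """How many disks it takes to reach a target, rounding up to whole disks."""
    return math.ceil(target / per_disk)


link_gb_s = pcie3_bandwidth_gb_s(lanes=8)

# Assumed per-disk figures (hypothetical rule of thumb, not benchmarks).
DISK_MB_S = 150   # sequential throughput of one spinning disk
DISK_IOPS = 200   # random I/Os per second of one spinning disk

disks_for_throughput = disks_needed(2000, DISK_MB_S)    # match 2GB/sec
disks_for_iops = disks_needed(530_000, DISK_IOPS)       # match 530,000 IOPS

print(f"x8 link ceiling: {link_gb_s:.2f} GB/s")
print(f"disks to match 2GB/sec: {disks_for_throughput}")
print(f"disks to match 530,000 IOPS: {disks_for_iops}")
```

Under these assumptions raw throughput takes only a shelf of disks to match, but random I/O takes thousands of spindles, and that is before counting the RAID controllers, caches and host bus adapters in front of them.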
Don’t get me wrong when I tout the PCI SSD so heavily. Disk storage will never go away in my lifetime. It’s just too cost effective and it is fast enough. But for the SysOps in charge of deploying production apps and hitting performance brick walls, the PCI SSD is going to really save the day. And if nothing else it will act as a bridge for most until a better solution can be designed and procured in any given situation. That alone I think would make the cost of trying out a PCI SSD well worth it. Longer term, which vendor will win is still a toss-up. I’m not well versed in the scale of sales into enterprises of the big vendors in the PCI SSD market. But Fusion-io is doing a great job keeping their name in the press and marketing to some big identifiable names.
But I also give OCZ some credit for their Z-Drive R5, though it’s not quite considered an enterprise data center player. Design-wise, the OCZ R5 is helping push the state of the art by trying out new controllers and new designs in an attempt to raise the total number of I/Os and the bandwidth on a single card. I’ve seen one story so far (AnandTech) about a test sample at Computex in which a brand-new, clean R5 hit nearly 800,000 IOPS in benchmark tests. That peak performance eventually eroded as the flash chips filled up and fell to around 530,000 IOPS, but the trend is clear. We may see 1 million IOPS on a single PCI SSD before long. And that, my readers, is going to be an Andy Grove style 10X difference that brings changes we never thought possible.