Archive for the ‘cloud’ Category
Today many different interconnection topologies are used for multicore chips. For as few as eight cores direct bus connections can be made — cores taking turns using the same bus. MIT’s 36-core processors, on the other hand, are connected by an on-chip mesh network reminiscent of Intel’s 2007 Teraflop Research Chip — code-named Polaris — where direct connections were made to adjacent cores, with data intended for remote cores passed from core-to-core until reaching its destination. For its 50-core Xeon Phi, however, Intel settled instead on using multiple high-speed rings for data, address, and acknowledgement instead of a mesh.
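Purely to illustrate why topology matters at these core counts, here's a tiny sketch (my own toy model, not MIT's or Intel's actual routing) comparing average hop counts on a 6x6 mesh like the 36-core part against a single 36-node ring, assuming simple XY routing and a bidirectional ring:

```python
# Toy hop-count comparison: 6x6 mesh with XY routing vs. a 36-node bidirectional ring.
# Real on-chip routers (and Intel's multiple rings) are far more sophisticated than this.
from itertools import product

def mesh_avg_hops(n):
    cores = list(product(range(n), repeat=2))
    total = sum(abs(ax - bx) + abs(ay - by)          # Manhattan distance = hops under XY routing
                for (ax, ay) in cores for (bx, by) in cores)
    return total / (len(cores) * (len(cores) - 1))

def ring_avg_hops(count):
    total = sum(min(abs(a - b), count - abs(a - b))  # take the shorter way around the ring
                for a in range(count) for b in range(count))
    return total / (count * (count - 1))

print(f"6x6 mesh, average hops:     {mesh_avg_hops(6):.2f}")   # ~4.0
print(f"36-node ring, average hops: {ring_avg_hops(36):.2f}")  # ~9.3
```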
I commented some time back on a similar article on the same topic. It appears the MIT research group now has working silicon of the design. As mentioned in the pull-quote, the Xeon Phi (which has made some news in the Top 500 supercomputer stories recently) is a massively multicore architecture, but it uses a different interconnect that Intel designed on its own. Stories like these get filed into the category of massively multicore or low-power CPU developments. Most times the new CPUs add cores without drawing significantly more power and thus provide a net increase in compute ability. Tilera, Calxeda, and yes, even SeaMicro were all working toward those ends. Through mergers or cuts in funding, each one has seemed to trail off without succeeding at its original goal (massively multicore, low-power designs). Along the way Intel has also done everything it can to dull the novelty of the new designs by revising an Atom-based or Celeron-based CPU to provide much lower power at the scale of maybe two cores per CPU.
Like the chip MIT just announced, Tilera too was originally an MIT research project spun off of the university campus. Its principals were the PI and a research associate, if I remember correctly. Now that MIT has working silicon, they're going to benchmark, test, and verify their design. Once they've completed their own study, the researchers will release the Verilog hardware description of the chip for anyone to use, research, or verify for themselves. It will be interesting to see how much of an incremental improvement this design provides, and it could possibly be the launch of another Tilera-style product out of MIT.
SUMMARY: Microsoft has been experimenting with its own custom chip effort in order to make its data centers more efficient, and these chips aren’t centered around ARM-based cores, but rather FPGAs from Altera.
FPGAs for the win, at least for eliminating unnecessary Xeon CPUs doing online analytic processing for the Bing Search service. MS are saying they can process the same amount of data with half the number of CPUs by offloading some of the heavy lifting from general-purpose CPUs to specially programmed FPGAs tuned to the MS algorithms that deliver up the best search results. For MS the cost of the data center will out, and if you can drop half of the Xeons in a data center, you've just cut your per-transaction costs substantially. That is quite an accomplishment in these days of radical incrementalism when it comes to data center ops and DevOps. The Field Programmable Gate Array is known as a niche, discipline-specific kind of hardware solution. But when flashed, programmed properly, and re-configured as workloads and needs change, it can do some magical heavy lifting from a computing standpoint.
Specifically, I'm thinking really repetitive loops or recursive algorithms that take forever to unwind and deliver a final result are the things best done in hardware rather than software. For search engines that might be the process used to determine the authority of a page in the rankings (like Google's PageRank). Knowing you can further tune the hardware to fit the algorithm means you'll spend less time attempting to do the heavy lifting on a general-purpose CPU with really fast C/C++ code. In Microsoft's plan that means fewer CPUs are needed to do the same amount of work. And better yet, if you determine a better algorithm for your daily batch processes, you can spin up a new hardware/circuit diagram and apply it to the compute cluster over time (and not have to do a pull-and-replace of large sections of the cluster). It will be interesting to see if Microsoft reports out any efficiencies; as of now this seems somewhat theoretical, though it may have been tested at least in a production test bed of some sort using real data.
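To make that concrete, here's a minimal sketch of a PageRank-style power iteration in Python. It's purely illustrative of the kind of tight, repetitive loop that maps well onto an FPGA pipeline; it is not Bing's ranking algorithm, and the toy graph and parameters are made up.

```python
# Illustrative only: a toy PageRank-style power iteration.
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):                      # the hot loop an FPGA pipeline would absorb
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:                                    # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# A three-page toy web: "c" is linked to by "b" and links back out to "a" and "b".
print(pagerank({"a": ["b"], "b": ["c"], "c": ["a", "b"]}))
```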
Cavium will try to drive ARM SoCs into mainstream servers, challenging Intel’s Xeon x86 with a family of 28 nm devices using up to 48 2.5 GHz custom 64-bit ARM cores
Another entry into the massively multicore, low-power server race. Since the fading of competitors like Calxeda and SeaMicro there haven't been many announcements or shipping products promising to be the low-power vendor of choice. Each time an inventor or entrepreneur stepped up with a lower-power or higher core-count device, Intel would blunt the advantage by running a benchmark and claiming that shutting cores off saves more power than using an inherently low-power design. The race today, as defined by Intel, is the race to sleep, and that's the benchmark by which they measure their own progress in the low-power, massively multicore CPU market. Now, however, Cavium is stepping up with an ARM-based CPU with 48 cores. So let's find out what we can about this new chip from this EE Times article.
It appears the manufacturing partner for this new product is Gigabyte, who are creating a 2-socket motherboard for the 48-core ARM-based CPU. The 48-core CPU is ARMv8-based with 64-bit addressing, so large amounts of RAM can be used with this architecture (a failing of past products from previous manufacturers attempting ARM-based servers). Cavium already has network processors in the market using MIPS-based CPUs, and this new architecture using ARM-based chips tries to leverage a lot of their expertise in the network processor market. Architecturally the motherboard interfaces and protocols are still in place, with the CPU swap being the most noticeable difference. To date Cavium is primarily known as a network processor manufacturer, but this move could push them into large-scale cloud data applications, with a tight binding to network operations supplied by their existing network processor products. Dates are still a little hazy, with the end of the calendar year being the most likely time a product will have been developed, tested, manufactured, and shipped.
I'm so happy to see the pressure being kept up in this one niche of computing. I still think ARM-based CPUs with massive numbers of cores are a new growth area. Similarly, the move to 64 bits takes away one of the last impediments buyers pointed out when folks like Calxeda tried to market their wares into the data centers. Bit by bit, each attempt by each startup and each design outfit gets a little closer to a competitive product that might yet go up against the mighty Intel Xeon multicore CPU.
It’s not unprecedented: Google already offers a testing suite for Android apps, though that’s focused on making sure they run well on smartphones and tablets, not testing the cloud-based services they connect to. If Google added testing services for the websites and services those apps connect to, it would have an end-to-end lock on developing for both the Web and mobile.
Load testing websites and web apps is a market whose time has come. I know where I work we have a project group with a guy who manages an installation of Silk as a load tester. Behind that is a little farm of old Latitude E6400s that he manages from the Silk console to point at whichever app is in development/QA/testing before it goes into production. Knowing there's potential for a cloud-based tool for this makes me very, very interested.
As outsourcing goes, the Software as a Service (SaaS), Platform as a Service (PaaS), and even Infrastructure as a Service (IaaS) categories are great as raw materials. But if there were just an app I could log in to, spin up some VMs, install my load-test tool of choice, and then manage them from my desktop, I would feel like I had accomplished something. Or failing that, even just a toolkit for load testing with whatever tool du jour is already available (nothing is perfect that way) would be cool too. And better yet, if I could do that with an updated tool whenever I needed to conduct a round of testing, the tool would take into account things like the Heartbleed bug in a timely fashion. That's the kind of benefit a cloud-based, centrally managed, centrally updated load-test service could provide.
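For the sake of illustration, here's a minimal sketch of the sort of load test I'd want such a service to run on those freshly spun-up VMs, using nothing but the Python standard library. The target URL, request count, and worker count are hypothetical placeholders, not anything from the article or a real product.

```python
# Minimal load-test sketch: hit a (hypothetical) endpoint with concurrent requests
# and report the success rate and average latency.
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

TARGET = "https://app-under-test.example.com/"   # hypothetical app endpoint
REQUESTS = 200
WORKERS = 20

def hit(_):
    start = time.time()
    try:
        with urlopen(TARGET, timeout=10) as resp:
            resp.read()
            return resp.status, time.time() - start
    except Exception:
        return None, time.time() - start

with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    results = list(pool.map(hit, range(REQUESTS)))

ok = [latency for status, latency in results if status == 200]
if ok:
    print(f"{len(ok)}/{REQUESTS} succeeded, average latency {sum(ok) / len(ok):.3f}s")
else:
    print("no successful requests")
```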
And now that Microsoft has just announced a partnership with Salesforce on its Azure cloud platform, things get even more interesting. Not only could you develop using an existing toolkit like Salesforce.com, but you could host it on more than one cloud platform (AWS or Azure) as your needs change. And I would hope this would include unit testing, load testing, and the whole sweet suite of security auditing one would expect for a web app (thereby helping prevent vulnerabilities like the OpenSSL Heartbleed bug).
After stripping out unnecessary Office licenses, organisations were left with a hybrid environment, part cloud, part desktop Office.
The central IT outfit I work for is dumping as much on-premises Exchange mailbox hosting as it can. However, we are sticking with Outlook 365 as provisioned by Microsoft (essentially an Outlook'd version of Hotmail). It has the calendar and global address list we have all come to rely on. But as this article details for the rest of the Office suite, people aren't creating as many documents as they once did. We're viewing them, yes, but we just aren't creating them.
I wonder how much of this is due to re-use, or to authorship shifting up to much higher-level people. Your average admin assistant or even secretary doesn't draft anything dictated to them anymore. The top-level types now would generally be embarrassed to dictate something out to anyone. Plus the culture of secrecy necessitates more one-to-one style communications. And long-form writing? Who does that anymore? No one writes letters; they write brief emails or even briefer texts, Tweets, or Facebook updates. Everything is abbreviated to such a degree you don't need a thesaurus, pagination, or any of the super-specialized doo-dads and add-ons we all begged M$ and Novell to add to their premiere word processors back in the day.
From an evolutionary standpoint, we could get by with the original text editors first made available on timesharing systems. I'm thinking of utilities like line editors (that's really a step backwards, so I'm being facetious here). The point I'm making is that we've gone through a very advanced stage in the evolution of our writing tool of choice, and it became a monopoly. WordPerfect lost out and fell by the wayside. Primary, secondary, and middle schools across the U.S. adopted M$ Word. They made it a requirement. Every college freshman has been given discounts to further the loyalty to the Office suite. Now we don't write like we used to, much less read. What's the use of writing something so long in pages that no one will ever read it? We've jumped the shark of long-form writing, and therefore the premiere app, the killer app for the desktop computer, is slowly receding behind us as we keep speeding ahead. Eventually we'll see it on the horizon, its sails being the last visible part, then the crow's nest, then poof! It will disappear below the horizon line. We'll be left with our nostalgic memories of the first time we used MS Word.
In addition, AMD is planning to contribute to the Open Compute Project with a new micro-server design that utilizes the Opteron A-series, along with other architecture specifications for motherboards that Facebook helped develop called "Group Hug," an agnostic server board design that can support traditional x86 processors, as well as ARM chips.
Kudos to Facebook, as they continue to support the Open Compute Project they spearheaded some years back to encourage more widespread expertise and knowledge of large-scale data centers. This new charge is to allow a pick-and-choose, best-of-breed kind of design whereby a CPU is not a fixed quantity but can be chosen or changed like a hard drive or RAM module, with the motherboard firmware remaining more or less consistent regardless of the CPU chosen. This would allow mass customization based solely on the best CPU for a given job (HTTP, DNS, compute, storage, etc.). Spare capacity might be allowed to erode a little, so that a general-purpose CPU could be scheduled somewhat more aggressively while some of its former, less efficient services are migrated to more specialized mobile CPUs on another cluster, each CPU doing the set of protocols and services it inherently does best. This flies further in the face of always choosing general-purpose CPUs and letting the software do most of the heavy lifting once the programming is completed.
Does Fusion-io have a sustainable competitive advantage or will it get blown away by a hurricane of other PCIe flash card vendors attacking the market, such as EMC, Intel, Micron, OCZ, TMS, and many others?
More updates on the data center uptake of PCI SSD cards, in the form of two big wins from Facebook and Apple. Price/performance for database applications seems to be skewed heavily toward Fusion-io versus the big guns in large-scale SAN roll-outs. Thanks to its smaller scale and faster speed, a PCI SSD outstrips the resources needed to get an equally fast disk-based storage array (including power, and the square feet taken up by all the racks). Typically a large rack of spinning disks is aggregated using RAID controllers and caches to look like a very large, high-speed hard drive. The Fibre Channel connections add yet another layer of aggregation on top of that, so you can split the underlying massive disk array into virtual logical drives that fit the storage needs of individual servers and OSes along the way. But to get speed equal to a Fusion-io-style PCI SSD, say to speed up JUST your MySQL server, the number of equivalent drives, racks, RAID controllers, caches, and Fibre Channel host bus adapters is so large and costs so much that it isn't worth it.
A single PCI SSD won't have the same total storage capacity as that larger-scale SAN. But for a single, one-off speed-up of a MySQL database you don't need the massive storage so much as the massive speed-up in I/O. And that's where the PCI SSD comes into play. With the newest PCIe 3.0 interfaces and x8 (eight-lane) connectors, the current generation of cards is able to maintain 2 GB/sec of throughput on a single card. To achieve that using the older SAN technology is not just cost prohibitive but seriously SPACE prohibitive in all but the largest of data centers. The race now is to see how dense and energy efficient a data center can be constructed. So it comes as no surprise that Facebook and Apple (who are attempting to lower costs all around) are the ones leading this charge toward higher density and higher power efficiency as well.
Don't get me wrong when I tout the PCI SSD so heavily. Disk storage will never go away in my lifetime. It's just too cost effective, and it is fast enough. But for the SysOps in charge of deploying production apps and hitting performance brick walls, the PCI SSD is going to really save the day. And if nothing else it will act as a bridge until a better solution can be designed and procured in any given situation. That alone, I think, makes the cost of trying out a PCI SSD well worth it. Longer term, which vendor will win is still a toss-up. I'm not well versed in the scale of the big vendors' enterprise sales in the PCI SSD market. But Fusion-io is doing a great job keeping their name in the press and marketing to some big, identifiable names.
But I also give OCZ some credit with their Z-Drive R5, though it's not quite considered an enterprise data center player. Design-wise, the OCZ R5 is helping push the state of the art by trying out new controllers and new designs that attempt to raise the total IOPS and bandwidth on a single card. I've seen one story so far about a test sample at Computex (AnandTech) where a brand-new, clean R5 hit nearly 800,000 IOPS in benchmark tests. That peak performance eventually eroded as the flash chips filled up and fell to around 530,000 IOPS, but the trend is clear. We may see 1 million IOPS on a single PCI SSD before long. And that, my readers, is going to be an Andy Grove-style 10X difference that brings changes we never thought possible.
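As a back-of-envelope sanity check (my own rough per-disk assumptions, not figures from any of the linked articles), here's roughly how many 15K RPM spindles it would take to match the headline numbers above:

```python
# Rough spindle-count estimate to match one PCIe SSD's headline numbers.
PCIE_SSD_THROUGHPUT_MBS = 2000      # ~2 GB/sec sustained, per the post
PCIE_SSD_IOPS = 800_000             # peak IOPS cited for the OCZ R5

DISK_THROUGHPUT_MBS = 200           # assumed sequential throughput of a 15K RPM disk
DISK_IOPS = 200                     # assumed random IOPS of a 15K RPM disk

print(f"~{PCIE_SSD_THROUGHPUT_MBS / DISK_THROUGHPUT_MBS:.0f} disks to match throughput")
print(f"~{PCIE_SSD_IOPS / DISK_IOPS:,.0f} disks to match random IOPS")  # thousands of spindles
```

Under those assumptions it's about 10 disks to match the sequential throughput but roughly 4,000 to match the random IOPS, before counting any of the RAID controllers, caches, and Fibre Channel gear, which is exactly why the comparison tilts so hard toward the PCI SSD for I/O-bound databases.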
- SanDisk Reveals Lightning PCI Express SSD Cards (news.softpedia.com)
- Intel’s SSD 910: Finally a PCIe SSD from Intel (anandtech.com)
- Three questions Fusion-io’s rivals face after flash API bombshell (go.theregister.com)
- Souping Up the Mac Pro: OWC Accelsior PCI Express SSD (barefeats.com)
Chip designer and chief Intel rival AMD has signed an agreement to acquire SeaMicro, a Silicon Valley startup that seeks to save power and space by building servers from hundreds of low-power processors.
It was bound to happen eventually, I guess: SeaMicro has been acquired by AMD. We'll see what happens as a result, since SeaMicro is a customer of Intel's Atom chips and, most recently, Xeon server chips as well. I have no idea where this is going or what AMD intends to do, but hopefully this won't scare off any current or near-future customers.
SeaMicro's competitive advantage has been, and will continue to be, the development work they performed on the custom ASIC they use in all their systems. That bit of intellectual property was in essence the reason AMD decided to acquire SeaMicro, and it will hopefully give AMD an engineering advantage for the large-scale data center systems it might put on the market in the future.
While this is all pretty cool technology, I think that SeaMicro’s best move was to design its ASIC so that it could take virtually any common CPU. In fact, SeaMicro’s last big announcement introduced its SM10000-EX option, which uses low-power, quad-core Xeon processors to more than double compute performance while still keeping the high density, low-power characteristics of its siblings.
So there you have it: Wired and The Register are reporting the whole transaction pretty positively. On the surface it looks like a win for AMD, as it can design new server products and get them to market quickly using the SeaMicro ASIC as a key ingredient. SeaMicro can still service its current customers and eventually allow AMD to upsell or upgrade as needed to keep the ball rolling. And with AMD's Fusion architecture marrying GPUs with CPU cores, who knows what cool new servers might be possible? But as usual the nay-sayers, the spreaders of Fear, Uncertainty and Doubt, have questioned the value of SeaMicro and their original product, the SM-10000.
Diane Bryant, the general manager of Intel's data center and connected systems group, had this to say at a press conference for the launch of new Xeon processors when asked about SeaMicro's attempt to interest Intel in buying the company: "We looked at the fabric and we told them thereafter that we weren't even interested in the fabric." To Intel there's nothing special enough in SeaMicro to warrant buying the company. Furthermore, Bryant told Wired.com:
“…Intel has its own fabric plans. It just isn’t ready to talk about them yet. “We believe we have a compelling solution; we believe we have a great road map,” she said. “We just didn’t feel that the solution that SeaMicro was offering was superior.”
This is a move straight out of Microsoft's marketing department circa 1992, when they would pre-announce a product that never shipped and was barely developed beyond a prototype stage. If Intel were really working on this as a new product offering you would have seen an announcement by now, rather than a vague, tangential reference that reads more like a parting shot than a strategic direction. So I will be watching intently in the coming months and years, if needed, to see what Intel 'fabric technology', if any, makes its way from the research lab to the development lab and into a final shipping product. Don't be surprised if this is Intel attempting to undermine AMD's choice to purchase SeaMicro. Likewise, Forbes.com later reported, citing a SeaMicro representative, that the company had not tried to encourage Intel to acquire it. It is anyone's guess who is really correct and being 100% honest in their recollections. However, I am still betting on SeaMicro's long-term strategy of pursuing low-power, ultra-dense, massively parallel servers. It is an idea whose time has come.
- Intel Says AMD’s New Baby Is Ugly (wired.com)
- SeaMicro Investor, Intel Trade Barbs Following AMD Acquisition (forbes.com)
- With SeaMicro buy, AMD doubles down on servers (gigaom.com)
Now, Facebook has provided a new option for these big name Wall Street outfits. But Krey also says that even among traditional companies who can probably benefit from this new breed of hardware, the project isn’t always met with open arms. “These guys have done things the same way for a long time,” he tells Wired.
Interesting article further telling the story of Facebook's Open Compute Project. This part of the story concentrates on the mass storage needs of the social media company, and on the fact that Wall Street data center designers/builders aren't as enthusiastic about Open Compute as one might think. The old-school Wall Streeters have, as Peter Krey says, been doing things the same way for a very long time. But that gets to the heart of the issue: what the members of the Open Compute Project hope to accomplish. Rackspace AND Goldman Sachs are members, both contributing and getting pointers from one another. Rackspace is even beginning to virtualize equipment down to the functional level, replacing motherboards with a virtual I/O service. That would allow components to be ganged together based on the frequency of their replacement and maintenance. According to the article, CPUs could be in one rack cabinet, DRAM in another, disks in yet another (which is already the case now with storage area networks).
The newest item to come into the Open Compute circus tent is storage. Up until now that's been left to Value Added Resellers (VARs) to provide, so different brand loyalties and technologies still hold sway for many data center shops, including Open Compute ones. Now Facebook is redesigning the disk storage rack to create a totally tool-less design: no screws, no drive carriers, just a drive and a latch and that's it. I looked further into this tool-less phenomenon and found an interesting video at HP, along with a professional video touting how easy it is to upgrade their all-in-one design.
Having recently purchased a similarly sized 27″ iMac and upgraded it by adding a single SSD inside the case, I can tell you this HP Z1 demonstrates in every way possible the miracle of tool-less designs. I was bowled over, and it brought back memories of different Dell tower designs over the years (some with more tool-less awareness than others). If a tool-less future is inevitable, I say bring it on. And if Facebook ushers in the era of tool-less storage racks as a central design tenet of Open Compute, so much the better.
- Do-It-Yourself: Facebook Building Storage Hardware (allfacebook.com)
- Facebook Shakes Hardware World With Own Storage Gear | Wired Enterprise (thesmileystone.wordpress.com)
“We’re here today shipping a 64-bit processor core and we are what looks like two years ahead of ARM,” says Bishara. “The architecture of the Tile-Gx is aligned to the workload and gives one server node per chip rather than a sea of wimpy nodes not acting in a cache coherent manner. We have been in this market for two years now and we know what hurts in data centers and what works. And 32-bit ARM just is not going to cut it. Applied Micro is doing their own core, and that adds a lot of risks.”
Tilera is preparing to ship a 36-core Tile-Gx CPU in March. It's going to be packaged with a recompiled CentOS Linux distribution on a development board (TILEncore). It will also have a number of recompiled Unix utilities and packages included, so OEM shops can begin product development as soon as possible.
I'm glad to see Tilera is still duking it out, battling for design wins with manufacturers selling into the data center, as it were. Larger memory addressing will help make the Tilera chips more competitive with the commodity Intel hardware of data center shops that build their own gear. Maybe we'll see a full 64-bit address space at some point as a follow-on to the current 40-bit address extensions. The extensions are necessary to address more than the 32-bit limit of 4 GB, so an extra 8 bits goes a long, long way toward competing against a fully 64-bit address space.
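A quick bit of arithmetic (a minimal sketch, nothing more) shows why those extra address bits matter so much:

```python
# Each additional address bit doubles the reachable memory.
for bits in (32, 40, 48, 64):
    print(f"{bits}-bit addressing -> {2**bits / 2**30:,.0f} GiB")
# 32-bit ->              4 GiB
# 40-bit ->          1,024 GiB (1 TiB)
# 48-bit ->        262,144 GiB (256 TiB)
# 64-bit -> 17,179,869,184 GiB (16 EiB)
```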
Also, considering the work being done at ARM to optimize their chip designs for narrower design rules, Tilera should follow suit and attempt to shrink their chip architecture too. This would allow clock speeds to ease upward while keeping the thermal design power consistent with previous-generation Tile architecture chips, making Tile-Gx more competitive in the coming years. ARM announced a month ago that they will be developing a 22 nm CPU core for future licensing by ARM customers. As it stands, Tilera uses an older fabrication design rule of around 40 nm (which is still quite good given the expense required to shrink to smaller design rules), and they have plans to eventually migrate to a narrower one. Ideally, though, they would stay no more than one generation behind the top-end process lines of Intel (who is targeting 14 nm production lines in the near future).
- Tilera preps many-cored Gx chips for March launch (go.theregister.com)
- Intel Responds to Calxeda/HP ARM Server News (Wired.com) (carpetbomberz.com)