Category: cloud

  • Atom smasher claims Hadoop cloud migration victory • The Register


    SeaMicro has been peddling its SM10000-64 micro server, based on Intel’s dual-core, 64-bit Atom N570 processor and cramming 256 of these chips into a 10U chassis…

    … The SM10000-64 is not so much a micro server as a complete data center in a box, designed for low power consumption and loosely coupled parallel processing, such as Hadoop or Memcached, or small monolithic workloads, like Web servers.

    via Atom smasher claims Hadoop cloud migration victory • The Register.

    While it is not always easy to illustrate the cost/benefit and return on investment of a lower-power box like the SeaMicro, running it head to head against a bunch of off-the-shelf Xeon boxes on a similar workload really shows the difference. How you calculate the benefit is critical too. What do you measure? Speed? Speed per transaction? Total volume allowed through? Or cost per unit transaction within a set number of transactions? You’re getting closer with that last one. The test setup used a set number of transactions that had to be completed in a set period of time. The benchmark then measured the total power dissipated to accomplish that number of transactions in the allotted time. SeaMicro came away the winner on unit cost per transaction in power terms. While the Xeon-based servers had huge excess speed and capacity, their power dissipation put them pretty far into the higher cost-per-transaction category.
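    The arithmetic behind that kind of benchmark is worth making concrete. Below is a minimal sketch of the “power cost per transaction” calculation described above; all of the wattage and throughput numbers are invented placeholders, not figures from the actual test.

```python
# Hypothetical energy-per-transaction comparison: a low-power
# SeaMicro-style box versus a faster but hungrier Xeon rack, both asked
# to finish the same fixed batch of transactions in the same window.

def joules_per_transaction(avg_power_watts, transactions, seconds):
    """Total energy consumed (watts x seconds) divided by transactions completed."""
    return (avg_power_watts * seconds) / transactions

# Illustrative numbers only (not measured figures):
batch = 1_000_000   # transactions required
window = 3600       # one hour to complete them

seamicro = joules_per_transaction(avg_power_watts=2500, transactions=batch, seconds=window)
xeon = joules_per_transaction(avg_power_watts=8000, transactions=batch, seconds=window)

print(f"SeaMicro-style box: {seamicro:.1f} J/txn")
print(f"Xeon rack:          {xeon:.1f} J/txn")
# The Xeon rack finishes with speed to spare, but every transaction
# still carries the energy cost of the whole rack for the whole window.
```

    Run with real measured wall power and transaction counts, this single number is what lets a slower, cooler box win even against machines with huge excess capacity.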

    However, it is very difficult to communicate this advantage that SeaMicro has over Intel. Future tests and benchmarks need to be constructed with clearly stated goals and criteria, ideally communicated as a case history of a particular problem that could be solved either by a SeaMicro server or by a bunch of Intel boxes running Xeon CPUs with big caches. Once that case history is well described, the two architectures can be put to work with the end goal stated in clear terms (cost per transaction). Then and only then will SeaMicro communicate effectively how it does things differently and how that can save money. Otherwise it’s too different to measure effectively against an Intel Xeon-based rack of servers.

  • Tilera throws gauntlet at Intel’s feet • The Register

    Upstart mega-multicore chip maker Tilera has not yet started sampling its future Tile-Gx 3000 series of server processors, and companies have already locked in orders for the chips.

    via Tilera throws gauntlet at Intel’s feet • The Register.

    Proof that sometimes a shipping product isn’t what makes all the difference, although it might be nice to tout the performance of an actual shipping product. What’s becoming more real is the power efficiency of the Tilera architecture, core for core, versus the Intel x86-64 architecture. Tilera can deliver a much lower thermal design power (TDP) per core than typical Intel chips running the same workloads. So Tilera for the win, on paper anyway.

  • Artur Bergman of Wikia on SSDs @ O’Reilly Media Conferences / Don Basile, CEO of Violin Memory


    Artur Bergman of Wikia explains why you should buy and use Solid State Disks (strong language)

    via Artur Bergman of Wikia on SSDs on O’Reilly Media Conferences – live streaming video powered by Livestream.

    This is the shortest and most pragmatic presentation I’ve seen on what SSDs can do for you. He recommends buying Intel 320s and getting your feet wet, likening it to moving from a bicycle to a Ferrari. Later on, if you need to go with a PCIe SSD, do it, but that’s more like the difference between a Ferrari and a Formula 1 race car. Personally, in spite of the small difference Artur is trying to illustrate, I still like the idea of buying once and getting more than you need. And if this doesn’t start you down the road of seriously buying SSDs of some sort, check out this interview with Violin Memory CEO Don Basile:

    Violin tunes up for billion dollar flash gig: Chris Mellor@theregister.co.uk (Saturday June 25th)

    Basile said: “Larry is telling people to use flash … That’s the fundamental shift in the industry. … Customers know their competitors will adopt the technology. Will they be first, second or last in their industry to do so? … It will happen and happen relatively quickly. It’s not just speed; it’s the lowest cost of database transaction in history. [Flash] is faster and cheaper on the exact same software. It’s a no-brainer.”

    Violin Memory is the current market leader in data center SSD installations for transactional or analytical processing. The boost folks get from putting their databases on Violin Memory boxes is automatic, requires very little tuning, and the results are just flat-out astounding. The ‘Larry’ quoted above is Larry Ellison of Oracle, the giant database maker. With that kind of praise I’m going to say the tipping point is near, but please read the article. Chris Mellor lays out a pretty detailed future of evolution in SSD sales and new product development. Three-bit multi-level memory cells in NAND flash are what Mellor thinks will be the tipping point, as price is still the biggest sticking point for anyone responsible for bidding on new storage system installs.

    However, while that price sticking point is a bigger issue for batch-oriented, off-line data warehouse analysis, for online streaming analysis SSD is cheaper per byte per second of throughput. So depending on the typical style of database work you do, or the performance you need, SSD is putting the big-iron spinning-hard-disk vendors to shame. The inertia of those big capital outlays and cozy vendor relationships will make it harder for some shops to adopt the new technology (“But IBM is giving us such a big discount!” … “We are an EMC shop,” etc.). However, the competitors of the folks who own those data centers will soon eat all the low-hanging fruit that a simple cutover to SSDs affords, and the competitive advantage will swing to the early adopters.
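    The “cheaper per byte per second” argument comes down to which denominator you divide the purchase price by. Here is a back-of-the-envelope sketch; every price and performance figure below is a made-up placeholder for illustration, not vendor data.

```python
# Flash loses on dollars per gigabyte but wins on dollars per IOPS
# (I/O operations per second), which is the number that matters for
# online, latency-sensitive database work.

def dollars_per_gb(price, capacity_gb):
    return price / capacity_gb

def dollars_per_iops(price, iops):
    return price / iops

# Invented placeholder figures:
hdd = {"price": 300.0, "capacity_gb": 2000, "iops": 150}    # enterprise spinning disk
ssd = {"price": 600.0, "capacity_gb": 300, "iops": 40000}   # SATA-era flash drive

print(f"HDD: ${dollars_per_gb(hdd['price'], hdd['capacity_gb']):.2f}/GB, "
      f"${dollars_per_iops(hdd['price'], hdd['iops']):.4f}/IOPS")
print(f"SSD: ${dollars_per_gb(ssd['price'], ssd['capacity_gb']):.2f}/GB, "
      f"${dollars_per_iops(ssd['price'], ssd['iops']):.4f}/IOPS")
```

    A capacity buyer sees the first column and balks; a throughput buyer sees the second column and wonders why anyone still bids spinning disk for transactional work.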

    *Late Note: Chris Mellor just followed up Monday night (June 27th) with an editorial further laying out the challenge to disk storage presented by the data center Flash Array vendors. Check it out:

    What should the disk drive array vendors do, if this scenario plays out? They should buy in or develop their own all-flash array technology. Having a tier of SSD storage in a disk drive array is a good start but customers will want the simpler choice of an all-flash array and, anyway, they are here now. Guys like Violin and Whiptail and TMS are knocking on the storage array vendors’ customer doors right now.

    via All aboard the flash array train? • The Register.

  • ARM server hero Calxeda lines up software super friends • The Register


    via ARM server hero Calxeda lines up software super friends • The Register.

    Calxeda is in the news again this week with some more announcements regarding its plans. Recalling the last article I posted on Calxeda: this company boasts an ARM-based server packing 120 CPUs (each with four cores) into a 2U rack enclosure (making it just 3-1/2″ tall). With every evolution in hardware one needs an equal if not greater revolution in software, which is the point of Calxeda’s announcement of its new software partners.

    It’s mostly cloud apps, cloud provisioning and cloud management types of vendors. With the partnership, each company gets early access to the hardware Calxeda is promising to design, prototype and eventually manufacture. Both Google and Intel have pooh-poohed the idea of using “wimpy processors” on massively parallel workloads, claiming faster serialized workloads are still easier to manage through existing software and programming techniques. Yet for many years, even as Intel has complained about the programming tools, it has gone the multi-core/multi-thread route itself, hoping to continue its domination by offering up ‘newer’ and higher-performing products. So while Intel bad-mouths parallelism on competing CPUs, it seems desperate to sell multi-core to willing customers year over year.

    And as power efficient as those cores may be, Intel’s old culture of maximum performance for the money still holds sway. Even the most recent ultra-low-voltage i-series CPUs still hit about 17 watts for chips clocked around 1.8GHz (speed-boosting up to 2.9GHz in a pinch). Even if Intel allowed these chips to be installed in servers, we’re still talking about a lot of thermal design power (TDP) that has to be chilled to keep things running.

  • Goal oriented visualizations? (via Erik Duval’s Weblog)

    Charles Minard's 1869 chart showing the losses...
    Image via Wikipedia

    Visualizations and their efficacy always take me back to Edward Tufte‘s big hardcover books on infographics (or chartjunk, when it’s done badly). In terms of this specific category, visualization leading to a goal, I think it’s still very much a ‘general case’. But examples are always better than theoretical descriptions of an ideal. So while I don’t have an example to give (which is what Erik Duval really wants), I can at least point to a person who knows how infographics get misused.

    I’m also reminded of the most recent issue of Wired magazine, which has an article on feedback loops. How are goal-oriented visualizations different from, or better than, feedback loops? That’s an interesting question to investigate further. The primary example given in that story is the radar-equipped speed limit sign. It doesn’t tell you the posted speed; it merely tells you how fast you are going, and that by itself, apart from ticketing and making speed limit signs more noticeable, did more to effect a change in behavior than any other option. So maybe a goal-oriented visualization could also benefit from some feedback-loop techniques?

    Some of the fine fleur of information visualisation in Europe gathered in Brussels today at the Visualizing Europe meeting. Definitely worth to follow the links of the speakers on the program! Twitter has a good trace of what was discussed. Revisit offers a rather different view on that discussion than your typical twitter timeline. In the Q&A session, Paul Kahn asked the Rather Big Question: how do you choose between different design alterna … Read More

    via Erik Duval’s Weblog

  • EMC’s all-flash benediction: Turbulence ahead • The Register


    A flash array controller needs: “An architecture built from the ground up around SSD technology that sizes cache, bandwidth, and processing power to match the IOPS that SSDs provide while extending their endurance. It requires an architecture designed to take advantage of SSDs unique properties in a way that makes a scalable all-SSD storage solution cost-effective today.”

    via EMC’s all-flash benediction: Turbulence ahead • The Register.

    I think storage controllers are the point of differentiation now for the SSDs coming on the market today. Similarly, the device that ties those SSDs into the computer and its OS is equally, nay more, important. I’m thinking specifically about a product like the SandForce 2000-series SSD controllers. They more or less provide a SATA or SAS interface into a small array of flash memory chips made to look and act like a spinning hard drive. However, the time is coming soon when all those transitional conventions can just go away and a clean-slate design can go forward. That’s why I’m such a big fan of the PCIe-based flash storage products.

    I would love to see SandForce create a disk controller with one interface that speaks PCIe 2.0/3.0 and another that is open to whatever technology flash memory manufacturers are using today. Ideally the host bus would always be a high-speed PCI Express interface, licensed or designed from the ground up to speed I/O in and out of the flash memory array. The memory-facing side could be almost like an FPGA, made to order for the features and idiosyncrasies of whatever flash memory architecture is shipping at the time of manufacture. The same would apply to any type of error correction and over-provisioning for failed memory cells as the SSD ages through multiple read/write cycles.

    In the article quoted at the top, from The Register, the big storage array vendors are attempting to market new products by adding flash memory to one component of the whole array product or, in the case of EMC, building the whole product from flash-based SSDs throughout. That more aggressive approach had seemed overly cost-prohibitive given the low manufacturing cost of large-capacity commodity hard drives. But the problem is, in the market where these vendors compete, everyone pays an enormous price premium for the hard drives, storage controllers, cabling and software that makes it all work. Though the hard drive might be cheaper to manufacture, the storage array is not, and that margin is what makes storage a very profitable business to be in.

    As stated last week in the benchmark comparisons of high-throughput storage arrays, flash-based arrays are ‘faster’ per dollar than a well-designed, well-engineered, top-of-the-line hard-drive-based storage array from IBM. So for the segment of the industry that needs throughput more than total space, EMC will likely win out. But Texas Memory Systems (TMS) is out there too, attempting to sign OEM contracts with folks selling into the storage array market. The Register does a very good job surveying the current field of vendors and manufacturers, looking at which companies might buy a smaller outfit like TMS. But the more important trend spotted throughout the survey is the decidedly strong move toward native flash memory in the storage arrays being sold into the enterprise market. EMC has a lead that most will be following real soon now.

  • From Big Data to NoSQL: The ReadWriteWeb Guide to Data Terminology (Part 1)


    Big Data

    In short, big data simply means data sets that are large enough to be difficult to work with. Exactly how big is big is a matter of debate. Data sets that are multiple petabytes in size are generally considered big data (a petabyte is 1,024 terabytes). But the debate over the term doesn’t stop there.

    via From Big Data to NoSQL: The ReadWriteWeb Guide to Data Terminology (Part 1).

    There’s big doin’s inside and outside the data center these days. You cannot go a day without a cool new article about some project that’s just been open-sourced from one of the departments inside the social networking giants, Hadoop being the biggest example. What, you ask, is Hadoop? It is a project Yahoo backed after Google started spilling the beans on its two huge technological leaps in massively parallel storage and data processing. The first was called BigTable: a huge distributed database that could be brought up on an inordinately large number of commodity servers and then ingest all the indexing data sent back by Google’s web bots as they found new websites. That’s the database and ingestion point. The second was the way the rankings and ‘pertinence’ of the indexed websites would be calculated for PageRank. Google’s invention for processing that mass of collected data is called MapReduce: a way of pulling in, processing and quickly boiling down the data to surface the important, highly ranked websites. Yahoo read the white papers Google put out and subsequently created a version of those technologies, which today powers the Yahoo! search engine. Having put this into production and realized its benefits, Yahoo turned it into an open source project to lower the threshold for people wanting to get into the Big Data industry. Just as importantly, Yahoo wanted many programmers’ eyes on the source code: adding features, packaging it and, all-importantly, debugging what was already there. Hadoop was the name given to the Yahoo bag of software, and it is what a lot of people initially adopt if they are trying to do large-scale collection and analysis of Big Data.
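    For readers who have never seen it, the shape Hadoop borrows from Google’s MapReduce paper can be sketched in a few lines: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step collapses each group. This toy single-process word count only illustrates the phases; real Hadoop distributes each phase across a cluster.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: collapse each group to a single value (here, a sum).
    return {key: sum(values) for key, values in groups.items()}

docs = ["the cat sat", "the dog sat"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

    Because the map and reduce steps touch only one record or one key group at a time, each can be farmed out to hundreds of commodity machines, which is the whole trick.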

    Another discovery along the way toward the Big Data movement came from a parallel attempt to overcome the limitations of extending the schema of a typical database holding all the incoming indexed websites. Tables, rows and Structured Query Language (SQL) have ruled the day since about 1977 or so, and for many kinds of tabular data there is no substitute. However, much of the data being stored now falls into the big amorphous mass of binary large objects (BLOBs) that can slow down a traditional database. So a non-SQL approach was adopted, and there are parts of BigTable and Hadoop that dump the unique keys and relational tables of SQL just to get the data in and characterize it as quickly as possible, or better yet to re-characterize it by adding elements to the schema after the fact. Whatever you are doing, what you collect might not be structured or easily structurable, so you’re going to need to play fast and loose with it, and you need a database equal to that task. Enter the NoSQL movement, built to collect and analyze Big Data in its least structured form. So my recommendation to anyone trying to fit the square peg of relational databases into the round hole of their unstructured data is: give up. Go NoSQL and get to work.
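    The schema-flexibility argument is easy to see in miniature. In the sketch below, plain Python dicts stand in for what a document-oriented NoSQL store would persist: records of different shapes sit side by side, and fields are added after the fact with no ALTER TABLE. The keys and field names are invented examples.

```python
import json

store = {}  # key -> document; a stand-in for a schemaless document store

# Two records with different shapes go in side by side -- no shared schema:
store["site:1"] = {"url": "example.com", "rank": 0.8}
store["site:2"] = {"url": "example.org", "rank": 0.5, "language": "en"}

# Re-characterize existing data by adding an element after the fact:
store["site:1"]["crawled"] = True

print(json.dumps(store, indent=2))
```

    A relational table would force both records into one column layout up front; here the structure is whatever each record brought with it, which is exactly the property that makes ingestion fast and loose.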

    This first article from ReadWriteWeb is good in that it lays the foundation for what the relational database universe looks like and how you can manipulate it. Having established what is, future articles will look at the quick, dirty workarounds and one-off projects people have come up with to fit their needs, and subsequently which ‘works for me’ solutions have been turned into bigger open source projects that ‘work for others’, as that is where each of these technologies will really differentiate itself. Ease of use and a low threshold for entry will be deciding factors in many people’s adoption of a NoSQL database, I’m sure.

  • Tilera preps 100-core chips for network gear • The Register


    Upstart multicore chip maker Tilera is using the Interop networking trade show as the coming out party for its long-awaited Tile-Gx series of processors, which top out at 100 cores on a single die.

    via Tilera preps 100-core chips for network gear • The Register.

    A further update on Tilera’s product launches as the old Interop trade show for network switch and infrastructure vendors is held in Las Vegas. Tilera has tweaked the packaging of its CPUs and is now marketing different CPUs to different industries. This family of Tilera chips is called the 8000 series and will be followed by next-generation 3000 and 5000 series chips. Projections are that by the time the Tilera 3000 series is released, chip density will be sufficient to pack upwards of 20,000 Tilera CPU cores into a single 42U-tall, 19-inch-wide server rack, with a future revision possibly doubling that to 40,000 cores. That road map is very aggressive but promising, and shows there is a lot of scaling possible in the Tilera product line over time. Hopefully these plans will lead to some big customers signing up to use Tilera in shipping products in the near future.

    What I’m most interested in knowing is how the currently shipping Quanta server that uses the Tilera CPU benchmarks against an Intel Atom-based or ARM-based server on a generic web server benchmark. While white papers and press releases have made regular appearances on the technology weblogs, very few outlets have attempted to get sample product and run it through its paces. I suspect, but cannot confirm, that potential customers are given non-disclosure agreements and shipping samples to test in their data centers before making any big purchases. I also suspect that, as is often the case, the applications for these low-power, massively parallel, dense servers are very narrow, not unlike those for a supercomputer. IBM‘s Blue Gene supercomputers, for instance, are built on PowerPC-derived cores with extra optimizations and streamlining to make them run very specific workloads and algorithms faster. In a supercomputing environment you really need to tune your software to get the most out of the huge up-front investment in the ‘iron’ you got from the manufacturer. There’s not a lot of value-add available in that scientific and supercomputing environment; you more or less roll your own solution, or beg, borrow or steal it from a colleague at another institution using the same architecture as you. So the Quanta S2Q server using the Tilera chip is similarly likely to be a one-off or niche product, but a very valuable one to those who purchase it. Tilera will need a software partner to really pump up the volumes of shipping product if it expects a wider market for its chips.

    But using a Tilera processor in a network switch or a ‘security’ device or some other inspection engine might prove very lucrative. I’m thinking of your typical warrantless wiretapping application, like the NSA‘s attempt to scoop up and analyze all the internet traffic at large carriers around the U.S. Analyzing data traffic in real time keeps folks like the NSA from having to capture and move around large volumes of useless data just to have it analyzed at a central location. Instead, localized computing nodes can do the initial inspection in real time, keying on phrases, words, numbers, etc., which then trigger the capture process and send the tagged data back to the NSA for further analysis. Doing that in parallel on a 100-core CPU would be very advantageous, in that a much smaller footprint would be required in the secret closets the NSA maintains at the big data carriers’ operations centers. Smaller racks and less power make for a much less obvious presence in the data center.
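    The inspection pattern described above (fan traffic out across many cores, key on trigger terms, forward only the hits) can be sketched with ordinary worker processes. The trigger words and messages below are invented examples, and process-level parallelism here merely stands in for the per-core parallelism a 100-core part would provide.

```python
from multiprocessing import Pool

TRIGGERS = {"blueprint", "wire", "transfer"}  # invented example keywords

def inspect(message):
    """Return the message if it contains a trigger word, else None."""
    words = set(message.lower().split())
    return message if words & TRIGGERS else None

if __name__ == "__main__":
    stream = [
        "lunch at noon",
        "send the blueprint tonight",
        "wire transfer confirmed",
        "weather looks fine",
    ]
    # Fan the stream out across workers; only tagged hits come back.
    with Pool(4) as pool:
        hits = [m for m in pool.map(inspect, stream) if m is not None]
    print(hits)
```

    The point of doing this at the edge is that only the two tagged messages travel onward; the uninteresting bulk of the stream is dropped where it was captured.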

  • Cloud on a chip: Sometimes the best hypervisor is none at all   • The Register


    On the cloud front, one of the more interesting projects that Held is working on is called the Single-chip Cloud Computer, or SCC for short.

    via Cloud on a chip: Sometimes the best hypervisor is none at all   • The Register.

    Single-chip Cloud Computer sounds a lot like the 80-core and 48-core CPU experiments Intel was working on a while back. There is a note that the core is a Pentium P54C, and that rings a bell too, as it was the same core used for those multi-core CPUs. Now the research appears to be centered on the communications links between the cores and getting an optimal amount of work out of a given amount of interconnectivity. Twenty-four cores is a big step down from 80 and 48 cores; I’m thinking Intel’s manufacturing process engineers are attempting to rein in the scope of this research to make it more worthy of manufacture. Whatever happens, you will likely see adaptations or bits and pieces of these technologies in a future shipping product. I’m a little disappointed, though, that the scope has grown smaller. I had real high hopes Intel could pull off a big technological breakthrough with an 80-core CPU, but change comes slowly, and chip fab lines are incredibly expensive to build, pilot and line out as they take on new products. Conservatism is to be expected in an industry with the highest level of up-front capital expenditure required before there’s a return on investment. If nothing else, companies like SeaMicro, Tilera and ARM will continue to goose Intel into research efforts like this and push it to innovate its old serial processors a little bit more.

    On the other side of the argument there is the massive virtualization of OSes on more typical serial-style multi-core CPUs from Intel. VMware and its competitors still slice the Intel processor’s clock cycles to make one box appear to be more than one physical machine. Data centers have found the performance compromises of this scheme well worth the effort in staff and software licenses, given the amount of space saved through consolidation: the less rack space and power required, the higher the marginal return for each computer host sitting on the network. But what this article from The Register is saying is that if a sufficiently dense multi-core CPU is used, and its power requirements are scaled down enough, you get the same kind of rack-space consolidation without the layer of software on top of it all to provide the virtualized computers. A one-to-one relationship between compute core and virtual machine can be had without the machinations and complications of a hypervisor-style OS riding herd over the virtualized computers. In that case, less hypervisor is more. More robust, that is: more total compute cycles devoted to hosts, and a design architecture with fewer single points of failure and choke points. So I say there’s plenty of room yet to innovate in the virtualization industry, given that CPU architectures are at an early stage of going massively multi-core.

  • Stop Blaming the Customers – the Fault is on Amazon Web Services – ReadWriteCloud


    Almost as galling as the Amazon Web Services outage itself is the litany of blog posts, such as this one and this one, that place the blame not on AWS for having a long failure and not communicating with its customers about it, but on AWS customers for not being better prepared for an outage.

    via Stop Blaming the Customers – the Fault is on Amazon Web Services – ReadWriteCloud.

    As Klint Finley points out in his article, everyone seems to be blaming the folks who ponied up money to host their websites and web apps on the Amazon data center cloud. Until the outage, I was not really aware of the ins and outs, workflow and configuration required to run something on Amazon’s infrastructure. I am small-scale, small potatoes, mostly relying on free services which, when they work, are great, and when they don’t work, meh! I can take them or leave them; my livelihood doesn’t depend on them (thank goodness). But those who do depend on uptime, and pay money for it, need a greater level of understanding from their service provider.

    Amazon doesn’t make things explicit enough to let you follow a best practice in configuring a website installation on its services. It appears some businesses had no outages (despite not following best practices) while some folks had long outages even though they had set everything up ‘by the book’. The services at the center of the outage were the Relational Database Service (RDS) and Elastic Block Store (EBS). Many websites use databases to hold the contents of the site, collect data and transaction information, collect metadata about users’ likes and dislikes, and so on. Elastic Block Store acts as the container for the data in RDS. If you have things set up correctly, then when your website goes down it fails gracefully: duplicate RDS and EBS containers in the Amazon data center cloud take over and keep responding to people clicking on things and typing in information, instead of throwing up error messages or not responding at all (in a word, it just magically continues working). However, if you don’t follow the “guidelines” as specified by Amazon, all bets are off, and you have wasted money paying double for the more robust, fault-tolerant failover service.

    Most people don’t care about this, especially if they weren’t affected by the outages. But the business owners who suffered, and the customers they are liable to, definitely do. So if the entrepreneurial spirit bites you and you’re interested in online commerce, always be aware: nothing is free, and especially nothing is free when you pay for it and don’t get what you paid for. I would hope a leading online commerce company like Amazon can do a better job and in the future make good on its promises.