Tuesday at Computex, OCZ claimed that it set a new benchmark of 1 million 4K write IOPS and 1.5 million read IOPS with a single Z-Drive R4 88-equipped 3U Colfax International Server.
Between the RevoDrive and the Z-Drive, OCZ is tearing up the charts with product releases announced at the Computex 2011 trade show in Taipei, Taiwan. This particular one-off demonstration used a number of OCZ's announced but as yet unreleased Z-Drive R4 88 cards packed into a 3U Colfax International enclosure. In other words, it's an idealized demonstration of the kind of performance you might achieve in a best-case scenario. The speeds are in excess of 3 GBytes/sec. for writing and reading, which for web serving or database hosting is going to make a big difference for people who need the I/O. Previously you would have had to use a very expensive, large-scale Fibre Channel hard drive array that split and RAID'd the data across so many spinning spindles that you might only come partway to matching these speeds. But the SIZE! Ohmigosh. You would never be able to fit that amount of hardware into a 3U enclosure. So space-constrained data centers will benefit enormously from dumping some of their drive array infrastructure for these more compact I/O monsters (some of which come from other manufacturers too, like Violin, RamSan and Fusion-io). Again, as I have said before, when AnandTech and Tom's Hardware can get sample hardware to benchmark, I will be happy to see what else these PCIe SSDs can do.
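Just as a sanity check on those headline numbers, 4K IOPS convert to raw throughput by simple multiplication. This little Python sketch uses the figures quoted above; keep in mind that peak IOPS and peak sequential bandwidth are usually measured separately, so treat it as a rough upper bound rather than OCZ's own math.

```python
def iops_to_throughput_gb_s(iops, block_size_bytes=4096):
    """Convert an IOPS figure at a given block size into GB/s of raw throughput."""
    return iops * block_size_bytes / 1e9

print(iops_to_throughput_gb_s(1_000_000))   # 1M 4K write IOPS   -> ~4.1 GB/s
print(iops_to_throughput_gb_s(1_500_000))   # 1.5M 4K read IOPS  -> ~6.1 GB/s
```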
There's a new PCIe SSD in town: the RevoDrive 3. Armed with two SF-2281 controllers and anywhere from 128–256GB of NAND (120/240GB capacities), the RevoDrive 3 is similar to its predecessors in that the two controllers are RAIDed on card. Here's where things start to change though.
OCZ is back with a revision of its consumer-grade PCIe SSD, the RevoDrive. This time out the SandForce SF-2281 makes an appearance, to great I/O effect. The bus interface is a true PCIe bridge chip, as opposed to the last version's PCI-X-to-PCIe bridge. This device can also be managed entirely through the OS's own drive utilities, with TRIM support. All combined, this is the most natively and well supported PCIe SSD to hit the market. There are no benchmarks yet from a commercially shipping product, but my fingers are crossed that this thing is going to be faster than OCZ's Vertex 3 and Vertex 3 Pro (I hope) while possibly holding more flash memory chips than those SATA 6Gb/s-based SSDs.
One other upshot of this revised product is full OS booting support. So not only will TRIM work, but your motherboard and the PCIe card's electronics will allow you to boot directly off of the card. This is by far the most evolved and versatile PCIe-based SSD to date. Pricing is the next big question on my mind after reading the specifications. Hopefully it will not be enterprise grade (greater than $1,200). I've found most of the prosumer and gamer upgrade manufacturers are comfortable setting prices at the $1,200 price point for these PCIe SSDs, and that trend has been pretty reliable going back to the original RevoDrive.
Lens-FitzGerald: I never thought of going into augmented reality, but cyberspace, any form of digital worlds, have always been one of the things I've been thinking about since I found out about science fiction. One of the first books I read of the cyberpunk genre was Bruce Sterling's "Mirrorshades." Mirror shades, meaning, of course, AR goggles. And that book came out in 1988 and ever since, this was my world.
An interview with the man who created Layar, the most significant Augmented Reality (AR) application on handheld devices. In the time since the first releases on Android smartphones in Europe, Layar has branched out to cover more of the OSes available on handheld devices. Interest in AR has, I think, cooled somewhat as social networking and location have seemed to rule the day. And I would argue even location isn't as fiery hot as it was at the beginning. But Facebook is still here with a vengeance. So whither the market for AR? What's next, you wonder? Well, it seems Qualcomm today has announced its very own AR toolkit to help jump-start the developer market toward more useful, nay killer, AR apps. Stay tuned.
In short, big data simply means data sets that are large enough to be difficult to work with. Exactly how big is big is a matter of debate. Data sets that are multiple petabytes in size are generally considered big data (a petabyte is 1,024 terabytes). But the debate over the term doesn't stop there.
There's big doin's inside and outside the data center these days. You cannot spend a day without seeing a cool new article about some project that's just been open sourced from a department inside one of the social networking giants, Hadoop being the biggest example. What, you ask, is Hadoop? It is a project Yahoo took up and championed after Google started spilling the beans on its two huge technological leaps in massively parallel storage and data processing. The first was called BigTable: a huge distributed database that could be brought up on an inordinately large number of commodity servers and then ingest all the indexing data sent by Google's web bots as they found new websites. That's the database and ingestion point. The second is the way the rankings and 'pertinence' of the indexed websites get calculated for PageRank. Google's invention for processing that collected data in bulk is called MapReduce: a way of pulling in, processing and quickly sorting out the important, highly ranked websites across many machines at once.

Yahoo read the white papers put out by Google and subsequently created a version of those technologies which today powers the Yahoo! search engine. Having put this into production and realized the benefits, Yahoo turned it into an open source project to lower the threshold for people wanting to get into the Big Data industry. Just as importantly, they wanted many programmers' eyes looking at the source code, adding features, packaging it, and, all importantly, debugging what was already there. Hadoop is the name given to that bag of software, and it is what a lot of people initially adopt if they are trying to do large-scale collection and analysis of Big Data.
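To make the MapReduce idea concrete, here is a minimal sketch in plain Python (not Hadoop's actual Java API, just the shape of the model): a map phase turns records into key/value pairs, a shuffle groups them by key, and a reduce phase collapses each group into a result.

```python
from collections import defaultdict

# Map phase: turn each input record (a line of text) into (key, value) pairs.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

# Shuffle phase: group every value emitted under the same key.
def shuffle(pairs):
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

# Reduce phase: collapse each key's values into a single result.
def reduce_phase(grouped):
    return {key: sum(values) for key, values in grouped.items()}

documents = ["the quick brown fox", "the lazy dog", "the quick dog"]
print(reduce_phase(shuffle(map_phase(documents))))
# {'the': 3, 'quick': 2, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 2}
```

In Hadoop the map and reduce steps run on many machines at once and the shuffle happens over the network, but the division of labor is the same.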
Another discovery along the way toward the Big Data movement was a parallel attempt to overcome the limitations of extending the schema of a typical database holding all the incoming indexed websites. Tables, rows and Structured Query Language (SQL) have ruled the day since about 1977 or so, and for many kinds of tabular data there is no substitute. However, a lot of the data being stored now falls into the big amorphous mass of binary large objects (BLOBs) that can slow down a traditional database. So a non-SQL approach was adopted, and there are parts of BigTable and Hadoop that dump SQL's unique keys and relational tables in order to get the data in and characterize it as quickly as possible, or better yet to re-characterize it by adding elements to the schema after the fact. Whatever you are doing, what you collect might not be structured or easily structurable, so you're going to need to play fast and loose with it, and you need a database equal to that task. Enter the NoSQL movement to collect and analyze Big Data in its least structured form. So my recommendation to anyone trying to fit the square peg of relational databases into the round hole of their unstructured data is: give up. Go NoSQL and get to work.
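As an illustration only, and not any particular NoSQL product's API, here is a sketch using Python's built-in shelve module as a stand-in for a schema-less key-value store: each record is a document keyed by URL, records don't have to share a shape, and you can re-characterize one after the fact without touching a table definition.

```python
import shelve

# A throwaway key-value "document store": keys are URLs, values are dicts
# holding whatever fields we happened to capture at crawl time.
with shelve.open("crawl_index") as db:
    db["http://example.com"] = {"title": "Example", "links_out": 12}
    db["http://example.org"] = {"title": "Example Org"}   # different shape, no schema change

    # Re-characterize a record later by adding a field after the fact.
    record = db["http://example.com"]
    record["pagerank"] = 0.42
    db["http://example.com"] = record

    for url, doc in db.items():
        print(url, doc)
```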
This first article from ReadWriteWeb is good in that it lays the foundation for what a relational database universe looks like and how you can manipulate it. Having established what IS, future articles will look at what quick-and-dirty workarounds and one-off projects people have come up with to fit their needs, and subsequently which 'Works for Me' solutions have been turned into bigger open source projects that will 'Work for Others', as that is where each of these technologies will really differentiate itself. Ease of use and a lower threshold to entry will be deciding factors in many people's adoption of a NoSQL database, I'm sure.
Upstart multicore chip maker Tilera is using the Interop networking trade show as the coming out party for its long-awaited Tile-Gx series of processors, which top out at 100 cores on a single die.
A further update on Tilera's product launches as the old Interop trade show for network switch and infrastructure vendors is held in Las Vegas. They have tweaked the chip packaging of their CPUs and are now going to market different CPUs to different industries. This family of Tilera chips is called the 8000 series and will be followed by a next generation of 3000 and 5000 series chips. Projections are that by the time the Tilera 3000 series is released, chip density will be sufficient to pack upwards of 20,000 Tilera CPU cores into a single 42-unit-tall, 19-inch-wide server rack, with a future revision possibly doubling that number of cores to 40,000. That road map is very aggressive but promising, and it shows there is a lot of scaling possible with the Tilera product over time. Hopefully these plans will lead to some big customers signing up to use Tilera in shipping product in the near future.
What I'm most interested in knowing is how the Quanta server currently shipping with the Tilera CPU benchmarks against an Intel Atom-based or ARM-based server on a generic web server benchmark. While white papers and press releases have made regular appearances on the technology weblogs, very few outlets have attempted to get sample product and run it through its paces. I suspect, but cannot confirm, that potential customers are given non-disclosure agreements and shipping samples to test in their data centers before making any big purchases. I also suspect that, as is often the case, the applications for these low-power, massively parallel, dense servers are very narrow, not unlike those for a supercomputer. IBM's Cell processor, which powers the Roadrunner supercomputer, is essentially a PowerPC architecture with some extra optimizations and streamlining to make it run very specific workloads and algorithms faster. In a supercomputing environment you really need to tune your software to get the most out of the huge up-front investment in the 'iron' you got from the manufacturer. There's not a lot of off-the-shelf value-add available in that scientific and supercomputing environment; you more or less roll your own solution, or beg, borrow or steal it from a colleague at another institution using the same architecture as you. So the Quanta S2Q server using the Tilera chip is similarly likely to be a one-off or niche product, but a very valuable one to those who purchase it. Tilera will need a software partner to really pump up the volumes of shipping product if they expect a wider market for their chips.
But using a Tilera processor in a network switch or a 'security' device or some other inspection engine might prove very lucrative. I'm thinking of your typical warrantless wiretapping application, like the NSA's attempt to scoop up and analyze all the Internet traffic at large carriers around the U.S. Analyzing data traffic in real time saves folks like the NSA from capturing and having to move around large volumes of useless data in order to analyze it at a central location. Instead, localized computing nodes can do the initial inspection in real time, keying on phrases, words, numbers, etc., which then triggers the capturing process and sends the tagged data back to the NSA for further analysis. Doing that in parallel with a 100-core CPU would be very advantageous, in that a much smaller footprint would be required in the secret closets the NSA maintains at those big data carriers' operations centers. Smaller racks and less power make for a much less obvious presence in the data center.
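For a toy sense of what that looks like (this is not any real inspection system, and the trigger phrases and records are invented), here is a sketch that fans a stream of text records out to a pool of worker processes, one per core, and keeps only the records that match a watch list:

```python
from multiprocessing import Pool

# Invented trigger phrases standing in for a real watch list.
TRIGGERS = ("keyword one", "keyword two", "555-0100")

def inspect_record(record):
    """Return the record if any trigger phrase appears in it, else None."""
    lowered = record.lower()
    return record if any(t in lowered for t in TRIGGERS) else None

if __name__ == "__main__":
    # Stand-in for a live traffic feed: just a list of text records.
    traffic = [
        "routine status update",
        "call me back at 555-0100 about the shipment",
        "lunch at noon?",
    ]
    with Pool(processes=4) as pool:   # in principle, one worker per available core
        flagged = [r for r in pool.map(inspect_record, traffic) if r is not None]
    print(flagged)   # only the matching record survives
```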
Texas Memory Systems has absolutely creamed the SPC-1 storage benchmark with a system that comfortably exceeds the current record-holding IBM system at a cost per transaction of 95 per cent less.
One might ask a simple question: how is this even possible, given the cost of the storage media involved? How did a Flash-based RamSan array from Texas Memory Systems beat a huge pile of IBM hard drives all networked and bound together in a massive storage system? And how did it do it for less? Woe be to those unschooled in the ways of the Per-feshunal Data Center purchasing dept. You cannot enter the halls of the big players unless you've got million-dollar budgets for big iron servers and big iron storage. Fibre Channel and InfiniBand rule the day when it comes to big data throughput: all those spinning drives accessed simultaneously as if each one held one slice of the data you were asking for, each one delivering up its 1/10 of 1% of the total file you were trying to retrieve. The resulting speed makes it look like one hard drive that is 100 times faster than your desktop computer's hard drive, all through the smoke and mirrors of the storage controllers and the software that makes them go. But what if, just what if, we decided to take Flash memory chips and knit them together with a storage controller that made them appear just like a big iron storage system? Well, since Flash obviously costs something more than $1 per gigabyte and disk drives cost somewhere less than 10 cents per gigabyte, the Flash storage loses, right?
In terms of total storage capacity, Flash will lose for quite some time if you are talking about holding everything on disk all at the same time. But that is not what's being benchmarked here at all. No, what is being benchmarked is the rate at which input (writing of data) and output (reading of data) is done through the storage controllers. IOPS measures the number of completed reads/writes per second. Prior to this latest example, the RamSan-630, IBM was king of the mountain with its huge striped Fibre Channel arrays all linked up through its own storage array controllers. The RamSan came in at 400,503.2 IOPS, as compared to 380,489.3 for IBM's top-of-the-line SAN Volume Controller. That's not very much difference, you say, especially considering how much less data a RamSan can hold... and that would be a valid argument, but consider again: that's not what we're benchmarking. It is the IOPS.
Total cost per IOPS for the benchmarked IBM system was $18.83. The RamSan (which bested IBM in total IOPS) came in at a measly $1.05 per IOPS. The cost is literally 95% less than IBM's. Why? Consider that IBM's benchmarked system costs $7.17 million (even if that was steeply discounted, as most tech writers will note as a caveat). Remember, I said you need million-dollar budgets to play in the data center space. Now consider that the RamSan-630 costs $419,000. If you want speed, dump your spinning hard drives; Flash is here to stay, and you cannot argue with the speed versus the price at this level of performance. No doubt this is going to threaten the livelihood of a few big iron storage manufacturers. But through disruption, progress is made.
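Just to spell out the arithmetic behind those figures (the prices and IOPS numbers are the ones cited above):

```python
def cost_per_iops(total_price_usd, iops):
    """Dollars of system price for each I/O operation per second of benchmarked throughput."""
    return total_price_usd / iops

ibm_svc = cost_per_iops(7_170_000, 380_489.3)   # IBM SAN Volume Controller config
ramsan  = cost_per_iops(419_000, 400_503.2)     # Texas Memory Systems RamSan-630

print(f"IBM:    ${ibm_svc:.2f} per IOPS")        # ~$18.84
print(f"RamSan: ${ramsan:.2f} per IOPS")         # ~$1.05
print(f"Savings: {1 - ramsan / ibm_svc:.0%}")    # ~94%, i.e. roughly the 95 per cent quoted
```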
Every time there's a die shrink of a computer processor, there's an attendant evolution of the technology used to produce it. I think back to the recent introduction of ultra-pure water immersion lithography. The goal of immersion lithography is to increase the ability to resolve the fine wire traces of the photomasks as they are exposed onto the photosensitive emulsion coating a silicon wafer. The problem is that the light normally travels the last stretch to the surface of the wafer through 'air'. There's a small gap, and air is full of optically scrambling atoms and molecules that make the projected photomask slightly blurry. If you put a layer of water into that gap above the wafer, you have in a sense a 'lens' made of optically superior water molecules that act more predictably than 'air'. The result: better chip yields, more profit, higher margins, etc.
As the wire traces on microchips continue to get thinner and transistors smaller, the physics involved gets harder to control. Behavior begins to follow the laws of quantum electrodynamics rather than Maxwell's equations. This makes it harder to tell when a transistor has switched on or off, and the basic digits of the digital computer (1s and 0s) become harder and harder to measure and register properly. IBM and Intel waged a die-shrink war all through the '80s and '90s. IBM chose to adopt new, sometimes exotic materials (copper traces instead of aluminum, silicon-on-insulator, high-k dielectric gates). Intel chose to improve what it had, using higher-energy light sources and only adopting very new processes when absolutely, positively necessary. At the same time, Intel was cranking out such volumes of current-generation product that it almost seemed as though it didn't need to innovate at all. But IBM kept Intel honest, as did Taiwan Semiconductor Manufacturing Co. (the contract manufacturer of microprocessors). And Intel continued to maintain its volume and technological advantage.
ARM (formerly the Acorn RISC Machine) got its start as a CPU maker during the golden age of RISC computers (the early and mid-1980s). Over time the company got out of manufacturing and started licensing its processor designs to anyone who wanted to embed a core microprocessor into a bigger chip design. Eventually ARM became the de facto standard microprocessor for smart handheld devices and telephones before Intel had to react. Intel had come up with a market-leading, low-voltage, cheap CPU in the Atom processor, but it did not have the specialized knowledge and capability ARM had with embedded CPUs. Licensees of ARM designs began cranking out newer generations of higher-performance, lower-power CPUs than Intel's research labs could create, and the stage was set for a battle royale of low power versus high performance.
Which brings us to the latest attempt to keep scaling down processor power requirements through the same brute force that worked in the past. Moore's Law, the observation attributed to Intel's Gordon Moore, held that the industry would keep shrinking the 'wires' in silicon chips, increasing speed and lowering cost: transistor counts would double, prices would halve, and this would continue ad infinitum into some distant future. The problem has always been that the future is now. Intel hit a brick wall back around the end of the Pentium 4 era, when it couldn't double speeds anymore without also doubling the amount of waste heat coming off the chip. That heat was harder and harder to remove efficiently, and soon it appeared the chips would create so much heat they might melt. Intel worked around this by putting multiple CPU cores on a single die, in the same process generations it was already running, and got some amount of performance scaling to work. Along those lines it has research projects to create first an 80-core processor, then a 48-core and now a 24-core processor (which might actually turn into a shippable product).

But what about Moore's Law? Well, the scaling has continued downward and power requirements have improved, but it's getting harder and harder to shave down those little wire traces and get the bang that drives profits for Intel. Now Intel is going the full-on research and development route by adopting a new way of making transistors on silicon. It's called the Fin Field-Effect Transistor, or FinFET: the transistor's channel is raised into a thin fin so the gate wraps around its top and its left and right sides, effectively giving you 3x the surface through which to control the flow of electrons. If Intel can get this to work on a modern-day silicon chip production line, it will be able to keep differentiating its product, keep its costs manageable and sell more chips. But it's a big bet, and I'm sure everyone hopes it will pay off.
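For a feel of what that doubling compounds to, here's a back-of-the-envelope sketch of the classic Moore's Law projection (a doubling roughly every two years from an arbitrary, made-up baseline; this is the rule of thumb, not Intel's actual roadmap):

```python
def moores_law_projection(transistors_now, years, doubling_period_years=2.0):
    """Project a transistor count forward assuming one doubling per doubling period."""
    return transistors_now * 2 ** (years / doubling_period_years)

baseline = 1_000_000_000   # illustrative chip with ~1 billion transistors today
for years in (2, 4, 10):
    print(f"+{years} years: ~{moores_law_projection(baseline, years):,.0f} transistors")
# +2 years:  ~2,000,000,000
# +4 years:  ~4,000,000,000
# +10 years: ~32,000,000,000
```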
As part of the “Let’s make the web faster” initiative, we are experimenting with alternative protocols to help reduce the latency of web pages. One of these experiments is SPDY (pronounced “SPeeDY”), an application-layer protocol for transporting content over the web, designed specifically for minimal latency. In addition to a specification of the protocol, we have developed a SPDY-enabled Google Chrome browser and open-source web server. In lab tests, we have compared the performance of these applications over HTTP and SPDY, and have observed up to 64% reductions in page load times in SPDY. We hope to engage the open source community to contribute ideas, feedback, code, and test results, to make SPDY the next-generation application protocol for a faster web.
Google wants the World Wide Web to go faster. I think we all would like that as well. But what kind of heavy lifting is it going to take? The transition from the ARPANET protocols to TCP/IP took a very long time and required some heavy-handed shoving to accomplish the cutover in 1983. We can all thank Vint Cerf for making that happen so that we could continue to grow and evolve as an online species (tip of the hat). But now what? There's been a move to evolve from IP version 4 to version 6 to accommodate the increase in the number of networked devices, but speed really wasn't a consideration in that revision. I don't know how this project integrates with IPv6, but I hope it can be pursued on a parallel course with that big migration.
The worst thing that could happen would be to create another Facebook/Twitter/Apple Store/Google/AOL cul-de-sac that only benefits the account holders loyal to Google. Yes, it would be nice if Google Docs and all the other attendant services provided via Google got on board the SPDY accelerator train. I would stand to benefit, but things like this should be pushed further out into the wider Internet so that everyone, everywhere has the same benefits. Otherwise this is just an attempt to steal away user accounts and create churn in the competitors' account databases.
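The core trick SPDY brings is multiplexing many requests over one connection instead of opening a connection per resource. SPDY later became the basis for HTTP/2, so purely as an illustration (this is not Google's SPDY code, and the URLs are placeholders), here is a sketch using the third-party httpx library's HTTP/2 support to fetch several resources over a single multiplexed connection:

```python
# Illustration only; requires: pip install "httpx[http2]"
import httpx

URLS = [
    "https://example.com/",
    "https://example.com/style.css",   # made-up resource paths
    "https://example.com/app.js",
]

# One client, one connection: with http2=True these requests can be multiplexed
# over a single TCP connection rather than one connection per request, which is
# the latency-saving idea SPDY pioneered.
with httpx.Client(http2=True) as client:
    for url in URLS:
        response = client.get(url)
        print(url, response.status_code, response.http_version)
```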
The Single-chip Cloud Computer sounds a lot like the 80-core and 48-core CPU experiments Intel has been working on for a while. There is a note that the core is a Pentium P54C, and that rings a bell too, as it was the same core used for those multi-core CPUs. Now the research appears to be centered on the communication links between the cores and getting an optimal amount of work out of a given amount of interconnect. Twenty-four cores is a big step down from 80 and 48 cores; I'm thinking Intel's manufacturing process engineers are attempting to rein in the scope of this research to make it more worthy of manufacture. Whatever happens, you will likely see adaptations, or bits and pieces, of these technologies in a future shipping product. I'm a little disappointed, though, that the scope has grown smaller. I had really high hopes Intel could pull off a big technological breakthrough with an 80-core CPU, but change comes slowly, and chip fab lines are incredibly expensive to build, pilot and line out as they take on new products. Conservatism is to be expected in an industry with the highest level of up-front capital expenditure required before there's a return on the investment. If nothing else, companies like SeaMicro, Tilera and ARM will continue to goose Intel into research efforts like this and into innovating its old serial processors a little bit more.
On the other side of the argument there is the massive virtualization of OSes on more typical multi-core CPUs from Intel. VMware and its competitors continue to slice up the clock cycles of an Intel processor to make one physical machine appear to be many. Data centers have found the performance compromises of this scheme well worth the effort in staff and software licenses, given the amount of space saved through consolidation: the less rack space and power required, the higher the marginal return on that one computer host sitting on the network. But what this article from The Register is saying is that if a sufficiently dense multi-core CPU is used, and the power requirements are scaled down sufficiently, you get the same kind of consolidation of rack space but without the layer of software on top of it all to provide the virtualized computers themselves. A one-to-one relationship between CPU core and actual machine can be had without the typical machinations and complications required by a hypervisor-style OS riding herd over the virtualized computers. In that case, less hypervisor is more: more compute cycles devoted to the hosts themselves, and a more robust architecture with fewer single points of failure and choke points. So I say there's plenty of room left to innovate in the virtualization industry, given that massively multi-core CPUs and their architectures are still at an early stage.
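To illustrate the one-core-per-host idea without a hypervisor, here is a minimal, Linux-only sketch (the worker is a made-up stand-in for a real network service) that pins each worker process to its own core using the OS scheduler instead of a virtualization layer:

```python
import os
from multiprocessing import Process

def serve(core_id):
    # Pin this worker to a single dedicated core (Linux-only call), the way a
    # per-core "host" would own its CPU outright instead of sharing time slices.
    os.sched_setaffinity(0, {core_id})
    print(f"worker pinned to core {core_id}, pid {os.getpid()}")
    # ... a real worker would run its service loop here ...

if __name__ == "__main__":
    cores = sorted(os.sched_getaffinity(0))       # cores available to this process
    workers = [Process(target=serve, args=(c,)) for c in cores]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```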
Almost as galling as the Amazon Web Services outage itself is the litany of blog posts, such as this one and this one, that place the blame not on AWS for having a long failure and not communicating with its customers about it, but on AWS customers for not being better prepared for an outage.
As Klint Finley points out in his article, everyone seems to be blaming the folks who ponied up money to host their websites and web apps on the Amazon data center cloud. Until the outage, I was not really aware of the ins and outs, workflow and configuration required to run something on Amazon's infrastructure. I am small-scale, small potatoes, mostly relying on free services which, when they work, are great, and when they don't work, meh! I can take them or leave them; my livelihood doesn't depend on them (thank goodness). But those who do depend on uptime and pay money for it need some greater level of understanding from their service provider.
Amazon doesn't make things explicit enough for customers to follow a best practice in configuring a website on its services. It appears some businesses had no outages despite not following best practices, while some folks had long outages even though they had set everything up 'by the book'. The services at the center of the outage were the Relational Database Service (RDS) and Elastic Block Store (EBS). Many websites use databases to hold the contents of the site, collect data and transaction information, gather metadata about users' likes and dislikes, and so on, and Elastic Block Store acts as the container for the data in RDS. If you have things set up correctly, then when something goes down it fails gracefully: you have duplicate RDS and EBS resources in the Amazon cloud that take over and keep responding to people clicking on things and typing in information on your website, instead of throwing up error messages or not responding at all (in a word, it just magically keeps working). However, if things don't work as the 'guidelines' from Amazon promise, all bets are off, and the money spent paying double for the more robust, fault-tolerant failover service has been wasted.
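For what the 'by the book' setup roughly looks like in code, here is a hedged sketch using the current boto3 SDK (which post-dates this outage; every name and size below is a placeholder, not a recommendation) to request an RDS instance with the Multi-AZ standby that is supposed to take over during exactly this kind of failure:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Ask RDS for a database instance with a synchronous standby replica in a
# second Availability Zone; RDS is supposed to fail over to it automatically.
rds.create_db_instance(
    DBInstanceIdentifier="example-site-db",    # placeholder name
    Engine="mysql",
    DBInstanceClass="db.t3.small",             # placeholder instance size
    AllocatedStorage=20,                       # GB of EBS-backed storage
    MasterUsername="admin",
    MasterUserPassword="change-me-please",     # never hard-code real credentials
    MultiAZ=True,                              # the failover option discussed above
)
```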
Most people don't care about this, especially if they weren't affected by the outages. But the business owners who suffered, and the customers they are liable to, definitely do. So if the entrepreneurial spirit bites you and you're interested in online commerce, always be aware: nothing is free, and especially nothing is free when you pay for it and don't get what you paid for. I would hope a leading online commerce company like Amazon could do a better job and, in future, make good on its promises.