Tuesday at Computex, OCZ claimed that it set a new benchmark of 1 million 4K write IOPS and 1.5 million read IOPS with a single Z-Drive R4 88-equipped 3U Colfax International Server.
Between the RevoDrive and the Z-Drive, OCZ is tearing up the charts with product releases announced at the Computex 2011 trade show in Taipei, Taiwan. This particular one-off demonstration used a number of OCZ's announced but as-yet-unreleased Z-Drive R4 88 cards packed into a 3U Colfax International enclosure. In other words, it's an idealized demonstration of the kind of performance you might achieve in a best-case scenario. The speeds are in excess of 3GBytes/sec for both writing and reading, which for web serving or database hosting is going to make a big difference for people who need the I/O. Previously you would have had to use a very expensive, large-scale Fibre Channel hard drive array that split and RAID'd the data across so many spinning hard drive spindles that you might come partially close to matching these speeds. But the SIZE! Ohmigosh. You could never fit that amount of hardware into a 3U enclosure. So space-constrained data centers will benefit enormously from dumping some of their drive array infrastructure for these more compact I/O monsters (some of which come from other manufacturers too, like Violin, RamSan and Fusion-io). Again, as I have said before, when AnandTech and Tom's Hardware can get sample hardware to benchmark, I will be happy to see what else these PCIe SSDs can do.
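As a quick sanity check, here is the arithmetic connecting the quoted IOPS figures to that 3GBytes/sec claim (a minimal sketch; the 4KiB transfer size is assumed from the "4K" in OCZ's claim):

```python
# Convert 4K-block IOPS into sustained bandwidth.
block_bytes = 4 * 1024   # assumed from "4K IOPS"

write_iops = 1_000_000
read_iops = 1_500_000

write_gb_s = write_iops * block_bytes / 1e9   # ~4.1 GB/s
read_gb_s = read_iops * block_bytes / 1e9     # ~6.1 GB/s
print(f"write: {write_gb_s:.1f} GB/s, read: {read_gb_s:.1f} GB/s")
# Both comfortably clear the 3GBytes/sec figure mentioned above.
```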
There's a new PCIe SSD in town: the RevoDrive 3. Armed with two SF-2281 controllers and anywhere from 128GB to 256GB of NAND (120/240GB usable capacities), the RevoDrive 3 is similar to its predecessors in that the two controllers are RAIDed on card. Here's where things start to change though.
OCZ is back with a revision of its consumer-grade PCIe SSD, the RevoDrive. This time out the SandForce SF-2281 makes an appearance, and to great I/O effect. The bus interface is a true PCIe bridge chip, as opposed to the last version's PCI-X to PCIe bridge. This device can also be managed completely through the OS's own drive utilities, with TRIM support. All combined, this is the most natively and well-supported PCIe SSD to hit the market. No benchmarks yet from a commercially shipping product, but my fingers are crossed that this thing is going to be faster than OCZ's Vertex 3 and Vertex 3 Pro (I hope) while possibly holding more flash memory chips than those SATA 6Gbit/s based SSDs.
One other upshot of this revised product is full OS boot support. So not only will TRIM work, but your motherboard and the PCIe card's electronics will allow you to boot directly off of the card. This is by far the most evolved and versatile PCIe-based SSD to date. Pricing is the next big question on my mind after reading the specifications; hopefully it will not be enterprise grade (greater than $1,200). I've found most of the prosumer and gamer upgrade-market manufacturers are comfortable setting prices at the $1,200 price point for these PCIe SSDs, and that trend has been pretty reliable going back to the original RevoDrive.
Lens-FitzGerald: I never thought of going into augmented reality, but cyberspace, any form of digital worlds, have always been one of the things I’ve been thinking about since I found out about science fiction. One of the first books I read of the cyber punk genre was Bruce Sterling‘s “Mirror Shades.” Mirror shades, meaning, of course, AR goggles. And that book came out in 1988 and ever since, this was my world.
An interview with the man who created Layar, the most significant Augmented Reality (AR) application on handheld devices. In the time since its first releases on Android smartphones in Europe, Layar has branched out to cover more of the OSes available on handheld devices. Interest in AR has, I think, cooled somewhat as social networking and location have seemed to rule the day. And I would argue even location isn't as fiery hot as it was at the beginning. But Facebook is still here with a vengeance. So whither the market for AR? What's next, you wonder? Well, it seems Qualcomm has today announced its very own AR toolkit to help jump-start the developer market toward more useful, nay killer, AR apps. Stay tuned.
There's another issue holding users back from the Vertex 3: capacity. The Vertex 3 is available in 120, 240 and 480GB versions; there is no 60GB model. If you're on a budget or like to plan frequent but rational upgrades, the Vertex 3 can be a tough sell.
OCZ, apart from having the fastest SSD on the market now, is attempting to branch out and move down market simultaneously. And by down market I don't mean anything other than the almighty PRICE. It's all about the upgrade market for the PC fanboys who want to trade up to the next higher-performing part for their gaming computer (if people still do that, play games on their PeeCees). It is designed to be less expensive, and performance-wise this SSD shows it: it is not the highest-speed part. So if you demand to own an OCZ-branded SSD and won't settle for anything less, but you don't want to pay $499 to get it, the Agility 3 is just for you. Also, if you read the full review, the charts will show how all the current-generation SATA 6Gbit/s drives are shaping up (Intel included) versus the previous generation of SATA 3Gbit/s drives. The OCZ Vertex 3 is still the king of the mountain at the 240GB size, but is still very much at a price premium.
A flash array controller needs: "An architecture built from the ground up around SSD technology that sizes cache, bandwidth, and processing power to match the IOPS that SSDs provide while extending their endurance. It requires an architecture designed to take advantage of SSDs' unique properties in a way that makes a scalable all-SSD storage solution cost-effective today."
I think that storage controllers are the point of differentiation now for the SSDs coming on the market today. Similarly, the device that ties those SSDs into the computer and its OS is equally, nay more, important. I'm thinking specifically about a product like the SandForce 2000 series SSD controllers. They more or less provide a SATA or SAS interface into a small array of flash memory chips that are made to look and act like a spinning hard drive. However, the time is coming soon when all those transitional conventions can just go away and a clean-slate design can go forward. That's why I'm such a big fan of PCIe-based flash storage products. I would love to see SandForce create a disk controller where one interface speaks PCIe 2.0/3.0 and the other is simply open to whatever technology flash memory manufacturers are using today. Ideally the host bus would always be a high-speed PCI Express interface, licensed or designed from the ground up to speed I/O in and out of the flash memory array. The memory-facing side could be almost like an FPGA, made to order according to the features and idiosyncrasies of whatever flash memory architecture is shipping at the time of manufacture. The same would apply for any type of error correction and over-provisioning for failed memory cells as the SSD ages through multiple read/write cycles.
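To make that two-sided idea concrete, here is a toy sketch of the separation I'm imagining. All class names and figures below are hypothetical illustrations, not any real SandForce design:

```python
# Toy model of the two-sided controller idea: a fixed high-speed host
# interface on one side, a swappable media-specific back end on the other.

class HostLink:
    """The PCIe-facing side: stable, generation-versioned."""
    def __init__(self, generation, lanes):
        self.generation = generation
        self.lanes = lanes

class FlashGeometry:
    """The media-facing side, retargeted per NAND vendor the way an
    FPGA image might be, as speculated above."""
    def __init__(self, page_bytes, pages_per_block, spare_pct):
        self.page_bytes = page_bytes
        self.pages_per_block = pages_per_block
        self.spare_pct = spare_pct  # over-provisioning for worn cells

class Controller:
    def __init__(self, host, media):
        self.host = host
        self.media = media

    def describe(self):
        return (f"PCIe {self.host.generation} x{self.host.lanes} front end; "
                f"{self.media.page_bytes}B pages, "
                f"{self.media.spare_pct}% spare area")

# The same host side paired with two different NAND back ends.
nand_25nm = FlashGeometry(page_bytes=8192, pages_per_block=256, spare_pct=7)
nand_34nm = FlashGeometry(page_bytes=4096, pages_per_block=128, spare_pct=7)
for media in (nand_25nm, nand_34nm):
    print(Controller(HostLink(2.0, 8), media).describe())
```

The design point is that the host side never changes while the media side is re-spun for each NAND generation, which is exactly the flexibility a clean-slate PCIe controller would buy.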
In the article I quoted at the top from The Register, the big storage array vendors are attempting to market new products by adding flash memory to either one component of the whole array product or, in the case of EMC, using flash memory based SSDs throughout the whole product. That more aggressive approach has seemed overly cost prohibitive given the manufacturing cost of large-capacity commodity hard drives. But the problem is, in the market where these vendors compete, everyone pays an enormous price premium for the hard drives, storage controllers, cabling and software that makes it all work. Though the hard drive might be cheaper to manufacture, the storage array is not, and that margin is what makes storage a very profitable business for the vendors to be in. As noted last week in the benchmark comparisons of high-throughput storage arrays, flash-based arrays are 'faster' per dollar than a well-designed, well-engineered, top-of-the-line hard drive based storage array from IBM. So for the segment of the industry that needs throughput more than total space, EMC will likely win out. But Texas Memory Systems (TMS) is out there too, attempting to sign OEM contracts with companies selling into the storage array market. The Register does a very good job surveying the current field of vendors and manufacturers, looking at which companies might buy a smaller player like TMS. But the more important trend spotted throughout the survey is the decidedly strong move towards native flash memory in the storage arrays being sold into the enterprise market. EMC has a lead that most will be following real soon now.
In short, big data simply means data sets that are large enough to be difficult to work with. Exactly how big is big is a matter of debate. Data sets that are multiple petabytes in size are generally considered big data (a petabyte is 1,024 terabytes). But the debate over the term doesn't stop there.
There's big doin's inside and outside the data center these days. You cannot spend a day without a cool new article about some new project that's just been open sourced from one of the departments inside the social networking giants, Hadoop being the biggest example. What, you ask, is Hadoop? It is a project Yahoo started after Google began spilling the beans on its two huge technological leaps in massively parallel storage and data processing. The first was called BigTable: a huge distributed database that could be brought up on an inordinately large number of commodity servers and then ingest all the indexing data sent by Google's web bots as they found new websites. That's the database and ingestion point. The second was the way the rankings and 'pertinence' of the indexed websites would be calculated through PageRank. Google's invention for processing that mass of collected data is called MapReduce: a way of pulling in, processing and quickly sorting out the important, highly ranked websites. Yahoo read the white papers put out by Google and subsequently created a version of those technologies which today powers the Yahoo! search engine. Having put this into production and realized its benefits, Yahoo turned it into an open source project to lower the threshold for people wanting to get into the Big Data industry. Just as importantly, Yahoo wanted many programmers' eyes looking at the source code, adding features, packaging it, and, all-importantly, debugging what was already there. Hadoop is the name given to that Yahoo bag of software, and it is what a lot of people initially adopt when they are trying to do large-scale collection and analysis of Big Data.
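For the curious, the MapReduce idea itself is simple enough to sketch in a few lines. Here is a single-process toy version using word counting, the canonical example; Hadoop's contribution is running these same phases across thousands of commodity machines (the documents here are made up):

```python
# A single-process toy of the map/shuffle/reduce pattern.
from collections import defaultdict

def map_phase(document):
    # Emit one ("word", 1) pair per word found.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Group values by key, as the framework does between phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Collapse each key's list of values into a final count.
    return {key: sum(values) for key, values in grouped.items()}

docs = ["the quick brown fox", "the lazy dog", "the fox again"]
pairs = (pair for doc in docs for pair in map_phase(doc))
print(reduce_phase(shuffle(pairs)))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1, 'again': 1}
```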
Another discovery along the way toward the Big Data movement was a parallel attempt to overcome the limitations of extending the schema of a typical database holding all the incoming indexed websites. Tables, rows and Structured Query Language (SQL) have ruled the day since about 1977 or so, and for many kinds of tabular data there is no substitute. However, the kinds of data being stored now fall into the big amorphous mass of binary large objects (BLOBs) that can slow down a traditional database. So a non-SQL approach was adopted, and there are parts of BigTable and Hadoop that dump the unique key values and relational tables of SQL to get the data in and characterize it as quickly as possible, or better yet to re-characterize it by adding elements to the schema after the fact. Whatever you are doing, what you collect might not be structured or easily structured, so you're going to need to play fast and loose with it, and you need a database equal to that task. Enter the NoSQL movement, to collect and analyze Big Data in its least structured form. So my recommendation to anyone trying to fit the square peg of relational databases into the round hole of their unstructured data is: give up. Go NoSQL and get to work.
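To illustrate that schema-after-the-fact point, here is a minimal, library-free sketch; the field names are invented for the example:

```python
# Records in the same "collection" need not share fields, and new
# attributes can be bolted on later without an ALTER TABLE migration.
crawl = [
    {"url": "http://example.com", "status": 200},
    {"url": "http://example.org", "status": 200,
     "outbound_links": ["http://example.com"]},  # extra field, no schema change
]

# Re-characterize existing records after the fact.
for doc in crawl:
    doc["indexed"] = doc["status"] == 200

# Queries tolerate missing fields instead of failing a fixed schema.
with_links = [d["url"] for d in crawl if "outbound_links" in d]
print(with_links)  # ['http://example.org']
```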
This first article from Read Write Web is good in that it lays the foundation for what a relational database universe looks like and how you can manipulate it. Having established what IS, future articles will look at the quick-and-dirty workarounds and one-off projects people have come up with to fit their needs, and subsequently which 'works for me' solutions have been turned into bigger open source projects that 'work for others,' as that is where each of these technologies will really differentiate itself. Ease of use and a lower threshold to entry will be deciding factors in many people's adoption of a NoSQL database, I'm sure.
I know Bruce Schneier has been very hard on the TSA's screening changes over the years since they were first rushed into service. If Bruce's attitude toward security theater can evolve, so can mine.
The WSJ reported recently that the FBI, looking for fresh leads in the 1982 case of Tylenol poisonings, suspects Ted "Unabomber" Kaczynski and is trying to get hold of a sample of his DNA. Coincidentally, I was just thinking about that case thanks to Bruce Schneier. In his recent TED talk he mentions that the Tylenol incident led to tamper-proof caps, a perfect example of what Schneier likes to call "security theater": As a homework assignment, …
Upstart multicore chip maker Tilera is using the Interop networking trade show as the coming out party for its long-awaited Tile-Gx series of processors, which top out at 100 cores on a single die.
A further update on Tilera's product launches, as the old Interop trade show for network switch and infrastructure vendors is held in Las Vegas. Tilera has tweaked the chip packaging of its CPUs and is now going to market different CPUs to different industries. This family of Tilera chips is called the 8000 series, and it will be followed by a next generation of 3000 and 5000 series chips. Projections are that by the time the Tilera 3000 series is released, the density of the chips will be sufficient to pack upwards of 20,000 Tilera CPU cores into a single 42U-tall, 19-inch-wide server rack, with a future revision possibly doubling that to 40,000 cores. That roadmap is very aggressive but promising, and it shows there is a lot of scaling possible in the Tilera product line over time. Hopefully these plans will lead to some big customers signing up to use Tilera in shipping products in the immediate and near future.
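A quick back-of-the-envelope check on that density projection, using my own assumed figures (100 cores per chip, from the Tile-Gx announcement above) rather than vendor specs:

```python
# What 20,000 cores in one rack implies at 100 cores per chip.
cores_per_chip = 100
rack_units = 42
target_cores = 20_000

chips = target_cores / cores_per_chip   # 200 chips in the rack
chips_per_u = chips / rack_units        # ~4.8 chips per rack unit
print(f"{chips:.0f} chips, ~{chips_per_u:.1f} chips per U")
# Doubling to 40,000 cores means ~9.5 chips per U at 100 cores each,
# or the same chip count with denser parts.
```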
What I'm most interested in knowing is how the Quanta server currently shipping with the Tilera CPU benchmarks against an Intel Atom based or ARM based server on a generic web server benchmark. While white papers and press releases have made regular appearances on the technology weblogs, very few outlets have managed to get sample product and run it through its paces. I suspect, though cannot confirm, that potential customers are given non-disclosure agreements and shipping samples to test in their data centers before making any big purchases. I also suspect that, as is often the case, the applications for these low-power, massively parallel, dense servers are very narrow, not unlike those for a supercomputer. IBM's Cell processor, which powers the Roadrunner supercomputer, is essentially a PowerPC architecture with some extra optimizations and streamlining to make it run very specific workloads and algorithms faster. In a supercomputing environment you really need to tune your software to get the most out of the huge up-front investment in the 'iron' you got from the manufacturer. There's not a lot of off-the-shelf value-add available in that scientific and supercomputing environment; you more or less roll your own solution, or beg, borrow or steal it from a colleague at another institution using the same architecture as you. So the Quanta S2Q server using the Tilera chip is similarly likely to be a one-off or niche product, but a very valuable one to those who purchase it. Tilera will need a software partner to really pump up the volumes of shipping product if it expects a wider market for its chips.
But using a Tilera processor in a network switch or a 'security' device or some other inspection engine might prove very lucrative. I'm thinking of your typical warrantless wiretapping application, like the NSA's attempt to scoop up and analyze all the internet traffic at large carriers around the U.S. Analyzing data traffic in real time saves folks like the NSA from capturing and moving around large volumes of useless data just to have it analyzed at a central location. Instead, localized computing nodes can do the initial inspection in real time, keying on phrases, words, numbers, etc., which then trigger the capture process and send the tagged data back to the NSA for further analysis. Doing that in parallel on a 100-core CPU would be very advantageous, in that a much smaller footprint would be required in the secret closets the NSA maintains at the big data carriers' operations centers. Smaller racks and less power make for a much less obvious presence in the data center.
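As a purely illustrative sketch of that trigger-then-capture pattern: everything below is hypothetical; a real inspection engine would key on live packet buffers rather than toy strings, and WATCH_LIST stands in for whatever selectors an analyst configures:

```python
# Many workers scan traffic in parallel; only packets matching a watch
# list get tagged for forwarding, so bulk traffic is never shipped out.
from multiprocessing import Pool

WATCH_LIST = {"keyword1", "keyword2"}

def inspect(packet):
    """Return the packet for capture only if a watched term appears."""
    payload = packet["payload"].lower()
    if any(term in payload for term in WATCH_LIST):
        return packet
    return None

if __name__ == "__main__":
    # Fake traffic: one packet in a thousand contains a watched term.
    traffic = []
    for i in range(10_000):
        body = f"packet {i} keyword1" if i % 1000 == 0 else f"packet {i}"
        traffic.append({"id": i, "payload": body})

    # Four workers here; a 100-core part could run one per core.
    with Pool(processes=4) as pool:
        tagged = [p for p in pool.map(inspect, traffic) if p]

    print(f"captured {len(tagged)} of {len(traffic)} packets")
```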
Texas Memory Systems has absolutely creamed the SPC-1 storage benchmark with a system that comfortably exceeds the current record-holding IBM system at a cost per transaction of 95 per cent less.
One might ask a simple question: how is this even possible, given the cost of the storage media involved? How did a flash-based RamSan storage array beat a huge pile of IBM hard drives all networked and bound together in a massive storage system? And how did it do it for less? Woe be to those unschooled in the ways of the Per-feshunal Data Center purchasing department. You cannot enter the halls of the big players unless you have million-dollar budgets for big iron servers and big iron storage. Fibre Channel and InfiniBand rule the day when it comes to big data throughput: all those spinning drives accessed simultaneously, as if each one held one slice of the data you were asking for, each one delivering up its 1/10th of 1% of the total file you were trying to retrieve. The resulting speed makes the array look like one hard drive that is 100X faster than your desktop computer's, all through the smoke and mirrors of the storage controllers and the software that makes them go. But what if, just what if, we decided to take flash memory chips and knit them together with a storage controller that made them appear to be just like a big iron storage system? Well, since flash obviously costs something more than $1 per gigabyte and disk drives cost somewhere less than 10 cents per gigabyte, the flash storage loses, right?
In terms of total storage capacity, flash will lose for quite some time when you are talking about holding everything on disk all at the same time. But that is not what's being benchmarked here at all. No, in fact what is being benchmarked is the rate at which input (writing of data) and output (reading of data) is done through the storage controllers: IOPS measures the total number of completed reads/writes in a given amount of time. Prior to this latest showing by the RamSan-630, IBM was king of the mountain with its huge striped Fibre Channel arrays all linked up through its own storage array controllers. The RamSan came in at 400,503.2 IOPS, compared to 380,489.3 for IBM's top-of-the-line SAN Volume Controller. That's not very much difference, you say, especially considering how much less data a RamSan can hold... And that would be a valid argument, but consider again: that's not what we're benchmarking. It is the IOPS.
Total cost per IOPS for the IBM benchmarked system was $18.83. The RamSan (which bested IBM in total IOPS) was a measly $1.05 per IOPS. That cost is literally 95% less than IBM's. Why? Consider that IBM's benchmarked system costs $7.17 million (even if it was steeply discounted, as most tech writers will note as a caveat). Remember I said you need million-dollar budgets to play in the data center space. Now consider that the RamSan-630 costs $419,000. If you want speed, dump your spinning hard drives; flash is here to stay, and you cannot argue with the speed versus the price at this level of performance. No doubt this is going to threaten the livelihood of a few big iron storage manufacturers. But through disruption, progress is made.
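The per-IOPS math is easy to reproduce from the quoted SPC-1 figures (system prices as reported, rounded; the exact SPC-1 filings differ slightly, which is why the division lands a penny off the quoted $18.83):

```python
# Cost-per-IOPS arithmetic behind the quoted figures.
ibm_price, ibm_iops = 7_170_000, 380_489.3        # IBM SAN Volume Controller
ramsan_price, ramsan_iops = 419_000, 400_503.2    # TMS RamSan-630

ibm_per_iops = ibm_price / ibm_iops               # ~$18.84 per IOPS
ramsan_per_iops = ramsan_price / ramsan_iops      # ~$1.05 per IOPS
savings = 1 - ramsan_per_iops / ibm_per_iops      # ~94-95% less per IOPS

print(f"IBM:    ${ibm_per_iops:5.2f}/IOPS")
print(f"RamSan: ${ramsan_per_iops:5.2f}/IOPS  ({savings:.0%} less)")
# In line with The Register's "95 per cent less" headline after rounding.
```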