computers data center flash memory SSD technology

Fusion-io’s flash drill threatens to burst Violin’s pipes • The Register

NoSQL database supplier Couchbase says it is tweaking its key-value storage server to hook into Fusion-io’s PCIe flash ioMemory products – caching the hottest data in RAM and storing lukewarm info in flash. Couchbase will use the ioMemory SDK to bypass the host operating system’s IO subsystems and buffers to drill straight into the flash cache.

via Fusion-io’s flash drill threatens to burst Violin’s pipes • The Register.

Can you hear it? It’s starting to happen. Can you feel it? The biggest single meme of the last two years, Big Data/NoSQL, is mashing up with PCIe SSDs and in-memory databases. What does it mean? One can only guess, but the performance gains to be had from using a product like Couchbase to overcome the limits of a traditional tables-and-rows SQL database will be amplified when it is optimized for and paired with PCIe SSD data stores. I’m imagining something like a 10X boost in data reads/writes on the Couchbase back end, and something more like realtime performance from what might previously have been treated like a Data Mart/Data Warehouse. If the move to use the ioMemory SDK and directFS technology with Couchbase is successful, you are going to see some interesting benchmarks and white papers about the performance gains.
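To make the hot/lukewarm split from the quote a little more concrete, here is a minimal sketch in Python. It is not Couchbase's or Fusion-io's code and it does not touch the ioMemory SDK; the `TieredCache` class, its `ram` and `flash` attributes and the capacity of 2 are all made up for illustration, with a plain dictionary standing in for the flash tier.

```python
from collections import OrderedDict


class TieredCache:
    """Toy two-tier store: a small LRU 'RAM' tier in front of a larger 'flash' tier."""

    def __init__(self, ram_capacity):
        self.ram_capacity = ram_capacity
        self.ram = OrderedDict()   # hottest keys, most recently used last
        self.flash = {}            # hypothetical stand-in for a flash-backed store

    def put(self, key, value):
        self.flash.pop(key, None)      # a key lives in exactly one tier
        self.ram[key] = value
        self.ram.move_to_end(key)
        self._demote_if_full()

    def get(self, key):
        if key in self.ram:            # hot hit: refresh its recency
            self.ram.move_to_end(key)
            return self.ram[key]
        if key in self.flash:          # lukewarm hit: promote back into RAM
            value = self.flash.pop(key)
            self.put(key, value)
            return value
        return None                    # miss in both tiers

    def _demote_if_full(self):
        # Push the least recently used entries down to the flash tier.
        while len(self.ram) > self.ram_capacity:
            cold_key, cold_value = self.ram.popitem(last=False)
            self.flash[cold_key] = cold_value


cache = TieredCache(ram_capacity=2)
for k in ("a", "b", "c"):
    cache.put(k, k.upper())
```

After the three writes, "a" has been demoted to the flash tier; reading it again promotes it back into RAM and pushes the next-coldest key down in its place.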

What is Violin Memory Inc. doing in this market segment of tiered database caches? Violin is teaming with SAP to create a tiered cache for SAP’s HANA in-memory database. The SSD SAN array provided by Violin could be multi-tasked to do other duties (providing a cache to any machine on the SAN). However, this product would most likely be a dedicated caching store to speed up all operations of a RAM-based HANA installation, speeding up online transaction processing and parallel queries on realtime data. No doubt SAP users stand to gain a lot if they are already heavily invested in the SAP universe of products. But for the more enterprising, entrepreneurial types, I think Fusion-io and Couchbase could help get a legacy-free group of developers up and running with equal performance and scale. Whichever one you pick is likely to do the job once it’s been purchased, installed and is up and running in a QA environment.

cloud data center technology web standards

From Big Data to NoSQL: Part 3

In Part One we covered data, big data, databases, relational databases and other foundational issues. In Part Two we talked about data warehouses, ACID compliance, distributed databases and more. Now we’ll cover non-relational databases, NoSQL and related concepts.

via From Big Data to NoSQL: The ReadWriteWeb Guide to Data Terminology Part 3.

I really give a lot of credit to ReadWriteWeb for packaging up this three-part series (started May 24th, I think). This at least narrows down what is meant by all the fast and loose terms that white papers and admen throw around to get people to consider their products in RFPs. Just know this, though: in many cases the NoSQL databases that keep coming onto the market tend to be one-off solutions created by big social networking companies that couldn’t get MySQL/Oracle/MS SQL to scale sufficiently in size or speed during their early build-outs. Just think of Facebook hitting the 500 million user mark and you will know that there’s got to be a better way than relational algebra and tables with columns and rows.

In Part 3 we finally get to what we have all been waiting for: non-relational databases, so-called NoSQL. Google’s MapReduce technology is quickly shown as one of the most widely known examples of a NoSQL-type distributed system that, while not adhering to absolute or immediate consistency, gets there with ‘eventual consistency’ (Consistency being the big C in the acronym ACID). The coolest thing about MapReduce is the similarity (at least in my mind) it bears to the SETI@home project, where ‘work units’ were split out of large data tapes, distributed piecemeal over the Internet and analyzed on a person’s desktop computer. The completed units were then gathered up and brought together into a final result. This is similar to how Google does its big data analysis to get work done in its data centers. And it carries on in Hadoop, an open source implementation of MapReduce developed largely at Yahoo and now part of the Apache organization.
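The split, analyze and gather pattern is easy to sketch on a single machine. What follows is only a toy word count in Python, assuming made-up helper names (`split_into_work_units`, `map_unit`, `shuffle`, `reduce_counts`) and a three-line input; it is not how Google or Hadoop implement MapReduce, where each work unit would run on a different server.

```python
from collections import defaultdict


def split_into_work_units(lines, unit_size):
    """Cut the input into independent chunks, like SETI@home work units."""
    return [lines[i:i + unit_size] for i in range(0, len(lines), unit_size)]


def map_unit(unit):
    """Map step: emit a (word, 1) pair for every word in one work unit."""
    return [(word.lower(), 1) for line in unit for word in line.split()]


def shuffle(mapped_units):
    """Group all emitted values by key so each key can be reduced on its own."""
    grouped = defaultdict(list)
    for pairs in mapped_units:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


lines = ["big data big", "data center", "big web"]
units = split_into_work_units(lines, unit_size=2)
# In a real cluster each map_unit call would run on a separate machine.
word_counts = reduce_counts(shuffle(map_unit(u) for u in units))
```

The point of the shape is that no map call needs to know about any other work unit, which is what lets the work be farmed out piecemeal and gathered up afterwards.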

Document databases are cool too, and very much like an object-oriented database where you have a core item with attributes appended. I think also of LDAP directories, which have similarities to object-oriented databases as well. A person has a ‘Common Name’ or CN attribute. The CN is as close to a unique identifier as you can get, with all the other attributes strung along, appended on the end as they need to be added, in no particular order. The ability to add attributes as needed is like ‘tagging’ in the way social networking, picture-sharing and bookmarking websites do it. You just add an arbitrary tag in order to help search engines index the site and help relevant web searches find your content.
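Here is a rough sketch of that "core item plus whatever attributes you need" idea, using nothing but Python dictionaries. The `documents` store, the `put_document` and `find_by_tag` helpers and the sample records are all hypothetical; a real document database adds persistence, indexing and a query language on top.

```python
documents = {}  # doc_id -> free-form dict of attributes, no fixed schema


def put_document(doc_id, **attributes):
    """Create the document if needed, then bolt on any attributes given."""
    documents.setdefault(doc_id, {}).update(attributes)


def find_by_tag(tag):
    """Return the ids of every document carrying the given tag."""
    return sorted(
        doc_id for doc_id, doc in documents.items()
        if tag in doc.get("tags", [])
    )


# A directory-style entry that grows one attribute at a time.
put_document("jdoe", cn="Jane Doe")
put_document("jdoe", mail="jdoe@example.com")

# Differently shaped documents live side by side in the same store.
put_document("photo-17", title="Sunset", tags=["beach", "vacation"])
put_document("bookmark-3", url="http://example.com", tags=["vacation"])

vacation_items = find_by_tag("vacation")
```

Nothing had to be declared up front for the photo to have a title or the bookmark to have a URL, which is the contrast with a table where every row shares the same columns.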

The relationship between Graph Databases and Mind-Mapping is also very interesting. There’s a good graphic illustrating a Graph database of blog content to show how relation lines are drawn and labeled. So now I have a much better understanding of Graph databases as I have used mind-mapping products before. Nice parallel there I think.
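The labeled relation lines can be sketched as a list of (label, target) pairs hanging off each node. This is only an illustration in Python with an invented `Graph` class and invented blog data (authors, posts, a tag); it is not the graphic from the article or the API of any particular graph database.

```python
from collections import defaultdict


class Graph:
    """Toy graph store: nodes are strings, edges carry a relationship label."""

    def __init__(self):
        self.edges = defaultdict(list)  # source node -> [(label, target node)]

    def relate(self, source, label, target):
        """Draw a labeled line from source to target."""
        self.edges[source].append((label, target))

    def related(self, source, label):
        """Follow every line with the given label out of source."""
        return [target for edge_label, target in self.edges[source]
                if edge_label == label]


g = Graph()
g.relate("alice", "wrote", "post-1")
g.relate("bob", "commented_on", "post-1")
g.relate("post-1", "tagged", "nosql")
g.relate("alice", "wrote", "post-2")

alice_posts = g.related("alice", "wrote")
```

Just like a mind map, a query is a matter of starting at one bubble and walking the labeled lines outward.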

At the very end of the article there’s mention of NewSQL, of which Drizzle is an interesting offshoot. Looking into it further, I found it interesting as a fork of the MySQL project. Specifically, Drizzle factors out tons of functions that some folks absolutely need but many never use (like, say, 32-bit legacy support). There have been a lot of attempts to make the code smaller, so the overall lines of code went from over 1 million for MySQL to just under 300,000 for the Drizzle project. Speed and simplicity are the order of the day with Drizzle. You add missing functions by simply adding a plug-in to the main app, and you get back some of the MySQL features that might have been missing.

*Note: Older survey of the NoSQL field conducted by ReadWriteWeb in 2009

cloud computers data center technology

From Big Data to NoSQL: Part 2 (from ReadWriteWeb)

In this section we’ll talk about data warehouses, ACID compliance, distributed databases and more.

via From Big Data to NoSQL: The ReadWriteWeb Guide to Data Terminology Part 2.

After linking to Part 1 of this series of articles on ReadWriteWeb (all the way back in May), today there’s yet more terminology and info for enterprising, goal-oriented technologists. Again, there’s some good info and a diagram to explain some of the concepts, and what makes these things different from what we are already using today. I particularly like finding out about the performance benefits of these different architectures versus the tables, columns and rows of traditional relational algebra driven SQL databases.

Where I work we have lots of historic data kept on file in a Data Warehouse. This typically gets used to generate reports to show compliance, meet regulations and continue to receive government grants. For the more enterprising Information Analyst it also provides a source of historic data for creating forecasts modeled on past activity. For the Data Scientist it provides an opportunity to discover things people didn’t know existed within the data (Data Mining). But now that things are becoming more ‘realtime’ there’s a call for analyzing data streams as they occur instead of after the fact (as Data Warehouses and Data Mining do).
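The difference between the two styles shows up even in something as small as an average. The sketch below, in Python, uses invented readings and a hypothetical `RunningMean` class: the warehouse approach waits until everything is stored and then computes the answer, while the streaming approach keeps an up-to-date answer after every event without storing the history.

```python
class RunningMean:
    """Keep an average current as each new value arrives, storing no history."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Incremental mean: nudge the old mean toward the new value.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


readings = [12, 15, 9, 20]  # made-up sample data

# After the fact: load everything from the warehouse, then report.
batch_mean = sum(readings) / len(readings)

# As it occurs: an answer is available after every single event.
stream = RunningMean()
means_so_far = [stream.update(r) for r in readings]
```

Both routes end at the same number, but only the streaming one could have told you something after the second reading.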

cloud data center technology

NoSQL is What? (via Jeremy Zawodny’s blog)

Great set of comments along with a very good description of the advantages of using NoSQL in a web application. There seem to be quite a few philosophical differences over whether or not NoSQL needs to be chosen at the earliest stages of ANY project. But Jeremy’s comments more or less prove that you pick the right tool for the job. ‘Nuff said.

Jeremy Zawodny: I found myself reading “NoSQL is a Premature Optimization” a few minutes ago and threw up in my mouth a little. That article is so far off base that I’m not even sure where to start, so I guess I’ll go in order. In fact, I would argue that starting with NoSQL because you think you might someday have enough traffic and scale to warrant it is a premature optimization, and as such, should be avoided by smaller and even medium sized organizations. You …

via Jeremy Zawodny’s blog

cloud data center google surveillance web standards

From Big Data to NoSQL: The ReadWriteWeb Guide to Data Terminology (Part 1)

Big Data

In short, big data simply means data sets that are large enough to be difficult to work with. Exactly how big is big is a matter of debate. Data sets that are multiple petabytes in size are generally considered big data (a petabyte is 1,024 terabytes). But the debate over the term doesn’t stop there.

via From Big Data to NoSQL: The ReadWriteWeb Guide to Data Terminology (Part 1).

There’s big doin’s inside and outside the data center these days. You cannot go a day without seeing a cool new article about some project that’s just been open sourced by one of the departments inside the social networking giants, Hadoop being the biggest example. What, you ask, is Hadoop? It is a project that grew up at Yahoo after Google started spilling the beans on its two huge technological leaps in massively parallel databases and large-scale data processing. The first one was called BigTable. It is a huge distributed database that could be brought up on an inordinately large number of commodity servers and then ingest all the indexing data sent by Google’s web bots as they found new websites. That’s the database and ingestion point. The second is the way in which the rankings and ‘pertinence’ of the indexed websites are calculated through PageRank. The invention for processing all of this collected data is called MapReduce. It is a way of pulling in, processing and quickly sorting out the important, highly ranked websites by spreading the work across many machines.

Yahoo read the white papers put out by Google and subsequently created a version of those technologies which today power the Yahoo! search engine. Having put this into production and realized the benefits of it, Yahoo turned it into an open source project to lower the threshold for people wanting to get into the Big Data industry. Similarly, they wanted to get many programmers’ eyes looking at the source code, adding features, packaging it and, all importantly, debugging what was already there. Hadoop was the name given to the Yahoo bag of software, and this is what a lot of people initially adopt if they are trying to do large scale collection and analysis of Big Data.

Another discovery along the way towards the Big Data movement was a parallel attempt to overcome the limitations of extending the schema of a typical database holding all the incoming indexed websites. Tables and rows and Structured Query Language (SQL) have ruled the day since about 1977 or so, and for many kinds of tabular data there is no substitute. However, the kinds of data being stored now fall into the big amorphous mass of binary large objects (BLOBs) that can slow down a traditional database. So a non-SQL approach was adopted, and there are parts of the BigTable database and Hadoop that dump the unique key values and relational tables of SQL to just get the data in and characterize it as quickly as possible, or better yet to re-characterize it by adding elements to the schema after the fact. Whatever you are doing, what you collect might not be structured or easily structured, so you’re going to need to play fast and loose with it, and you need a database of some sort equal to that task. Enter the NoSQL movement to collect and analyze Big Data in its least structured form. So my recommendation to anyone trying to get the square peg of Relational Databases to fit the round hole of their unstructured data is to give up. Go NoSQL and get to work.

This first article from ReadWriteWeb is good in that it lays the foundation for what a relational database universe looks like and how you can manipulate it. Having established what IS, future articles will look at the quick, dirty workarounds and one-off projects people have come up with to fit their needs, and subsequently at which ‘Works for Me’ type solutions have been turned into bigger open source projects that will ‘Work for Others’, as that is where each of these technologies will really differentiate itself. Ease of use and a lower threshold will be deciding factors in many people’s adoption of a NoSQL database, I’m sure.

cloud computers data center technology wintel

Microsoft Research Watch: AI, NoSQL and Microsoft’s Big Data Future

Probase is a Microsoft Research project described as an “ongoing project that focuses on knowledge acquisition and knowledge serving.” Its primary goal is to “enable machines to understand human behavior and human communication.” It can be compared to Cyc, DBpedia or Freebase in that it is attempting to compile a massive collection of structured data that can be used to power artificial intelligence applications.

via Microsoft Research Watch: AI, NoSQL and Microsoft’s Big Data Future – ReadWriteCloud.

Who knew Microsoft was so interested in things only IBM Research’s Watson could demonstrate? AI (artificial intelligence) seems to be targeted at Bing search engine results. And in order to back this all up, they have to ditch their huge commitment to Microsoft SQL Server and go for a NoSQL database in order to hold all the unstructured data. This seems like a huge shift away from desktop and data center applications toward something much more oriented to cloud computing, where collected data is money in the bank. This is best expressed in the story’s example of Google vs. Facebook. Google may collect data, but it is really delivering ads to eyeballs, whereas Facebook is just collecting the data and sharing it with the highest bidder. It seems like Microsoft is going the Facebook route of wanting to collect and own the data rather than merely hosting other people’s data (like Google and Yahoo).

google science & technology technology wintel wired culture

Big Web Operations Turn to Tiny Chips –

Stephen O’Grady, a founder at the technology analyst company RedMonk, said the technology industry often has swung back and forth between more standard computing systems and specialized gear.

via Big Web Operations Turn to Tiny Chips –

A little tip of the hat to Andrew Feldman, CEO of SeaMicro, the startup company that announced its first product last week. The giant 512-CPU computer is being covered in this NYTimes article to spotlight the ‘exotic’ technologies, both hardware and software, that some companies use to deploy huge web apps. It’s part NoSQL, part low-power massive parallelism.