In Part One we covered data, big data, databases, relational databases and other foundational issues. In Part Two we talked about data warehouses, ACID compliance, distributed databases and more. Now well cover non-relational databases, NoSQL and related concepts.
I really give a lot of credit to ReadWriteWeb for packaging up this 3 part series (started May 24th I think). This at least narrows down what is meant by all the fast and loose terms White Papers and Admen are throwing around to get people to consider their products in RFPs. Just know this though, in many cases to NoSQL databases that keep coming into the market tend to be one-off solutions created by big social networking companies who couldn’t get MySQL/Oracle/MSQL to scale in size/speed sufficiently during their early build-outs. Just think of Facebook hitting the 500million user mark and you will know that there’s got to be a better way than relational algebra and tables with columns and rows.
In part 3 we finally get to what we have all been waiting for, Non-relational Databases, so-called NoSQL. Google’s MapReduce technology is quickly shown as one of the most widely known examples of a NoSQL type distributed database that while not adhering to absolute or immediate consistency gets there with ‘eventual consistency (Consistency being the big C in the acronym ACID). The coolest thing about MapReduce is the similarity (at least in my mind) it bears to the Seti@Home Project where ‘work units’ were split out of large data tapes and distributed piecemeal over the Internet and analyzed on a person’s desktop computer. The complete units were then gathered up and brought together into a final result. This is similar to how Google does it’s big data analysis to get work done in its data centers. And it follows on in the opensource project Hadoop, an opensource version of MapReduce started by Yahoo and now part of the Apache organization.
Document databases are cool too, and very much like an Object-oriented Database where you have a core item with attributes appended. I think also of LDAP directories which also have similarities to Object -oriented databases. A person has a ‘Common Name’ or CN attribute. The CN is as close to a unique identifier as you can get, with all the attributes strung along, appended on the end as they need to be added, in no particular order. The ability to add attributes as needed is like ‘tagging’ in the way Social networking websites like Picture, Bookmark websites do it. You just add an arbitrary tag in order to help search engines index the site and help relevant web searches find your content.
The relationship between Graph Databases and Mind-Mapping is also very interesting. There’s a good graphic illustrating a Graph database of blog content to show how relation lines are drawn and labeled. So now I have a much better understanding of Graph databases as I have used mind-mapping products before. Nice parallel there I think.
At the very end of hte article there’s mention of NewSQL of which Drizzle is an interesting offshoot. Looking up more about it, I found it interesting as a fork of the MySQL project. Specifically Drizzle factors out tons of functions some folks absolutely need but don’t always have (like say 32-bit legacy support). There’s a lot of attempts to get the code smaller so the overall lines of code went from over 1 million for MySQL to just under 300,000 for the Drizzle project. Speed and simplicity is the order of the day with Drizzle. Add missing functions by simply add the plug-in to the main app and you get back some of the MySQL features that might have been missing.
*Note: Older survey of the NoSQL field conducted by ReadWriteWeb in 2009
- NoSQL for the rest of us (econsultancy.com)
- A Taxonomy of NoSQL Databases (arnoldit.com)
- Big Data, Medium Data…Learning More (infobright.org)