Category: data center

  • From Big Data to NoSQL: Part 3 (ReadWriteWeb.com)

    Image representing ReadWriteWeb as depicted in...
    Image via CrunchBase

    In Part One we covered data, big data, databases, relational databases and other foundational issues. In Part Two we talked about data warehouses, ACID compliance, distributed databases and more. Now well cover non-relational databases, NoSQL and related concepts.

    via From Big Data to NoSQL: The ReadWriteWeb Guide to Data Terminology Part 3.

    I really give a lot of credit to ReadWriteWeb for packaging up this 3 part series (started May 24th I think). This at least narrows down what is meant by all the fast and loose terms White Papers and Admen are throwing around to get people to consider their products in RFPs. Just know this though, in many cases to NoSQL databases that keep coming into the market tend to be one-off solutions created by big social networking companies who couldn’t get MySQL/Oracle/MSQL to scale in size/speed sufficiently during their early build-outs. Just think of Facebook hitting the 500million user mark and you will know that there’s got to be a better way than relational algebra and tables with columns and rows.

    In part 3 we finally get to what we have all been waiting for, Non-relational Databases, so-called NoSQL. Google’s MapReduce technology is quickly shown as one of the most widely known examples of a NoSQL type distributed database that while not adhering to absolute or immediate consistency gets there with ‘eventual consistency (Consistency being the big C in the acronym ACID). The coolest thing about MapReduce is the similarity (at least in my mind) it bears to the Seti@Home Project where ‘work units’ were split out of large data tapes and distributed piecemeal over the Internet and analyzed on a person’s desktop computer. The complete units were then gathered up and brought together into a final result. This is similar to how Google does it’s big data analysis to get work done in its data centers. And it follows on in the opensource project Hadoop, an opensource version of MapReduce started by Yahoo and now part of the Apache organization.

    Document databases are cool too, and very much like an Object-oriented Database where you have a core item with attributes appended. I think also of LDAP directories which also have similarities to Object -oriented databases. A person has a ‘Common Name’ or CN attribute. The CN is as close to a unique identifier as you can get, with all the attributes strung along, appended on the end as they need to be added, in no particular order. The ability to add attributes as needed is like ‘tagging’ in the way Social networking websites like Picture, Bookmark websites do it. You just add an arbitrary tag in order to help search engines index the site and help relevant web searches find your content.

    The relationship between Graph Databases and Mind-Mapping is also very interesting. There’s a good graphic illustrating a Graph database of blog content to show how relation lines are drawn and labeled. So now I have a much better understanding of Graph databases as I have used mind-mapping products before. Nice parallel there I think.

    At the very end of hte article there’s mention of NewSQL of which Drizzle is an interesting offshoot. Looking up more about it, I found it interesting as a fork of the MySQL project. Specifically Drizzle factors out tons of functions some folks absolutely need but don’t always have (like say 32-bit legacy support). There’s a lot of attempts to get the code smaller so the overall lines of code went from over 1 million for MySQL to just under 300,000 for the Drizzle project. Speed and simplicity is the order of the day with Drizzle. Add missing functions by simply add the plug-in to the main app and you get back some of the MySQL features that might have been missing.

    *Note: Older survey of the NoSQL field conducted by ReadWriteWeb in 2009

  • From Big Data to NoSQL: Part 2 (from ReadWriteWeb)

    Image representing ReadWriteWeb as depicted in...
    Image via CrunchBase

    In this section we’ll talk about data warehouses, ACID compliance, distributed databases and more.

    via From Big Data to NoSQL: The ReadWriteWeb Guide to Data Terminology Part 2.

    After linking to the Part 1 of this series of articles on ReadWriteWeb (all the way back in May), today there’s yet more terminology and info for the enterprising, goal-oriented technologists. Again, there’s some good info and a diagram to explain some of the concepts, and what makes these things different from what we are already using today. I particularly like finding out about performance benefits of these different architectures versus tables, columns and rows of traditional associative algebra driven SQL databases.

    Where I work we have lots of historic data kept on file in a Data Warehouse. This typically gets used to generate reports to show compliance, meet regulations and continue to receive government grants. For the more enterprising Information Analyst it also provides a source of  historic data for creating forecasts modeled on past activity. For the Data Scientist ir provides an opportunity to discover things people didn’t know existed within the data (Data Mining). But now that things are becoming more ‘realtime’ there’s a call for analyzing data streams as they occur instead of after the fact (Data Warehouses and Data Mining).

  • Tilera routs Intel, AMD in Facebook bakeoff • The Register

    Structure of the TILE64 Processor from Tilera
    Tile64 processor from Tilera

    Facebook lined up the Tilera-based Quanta servers against a number of different server configurations making use of Intels four-core Xeon L5520 running at 2.27GHz and eight-core Opteron 6128 HE processors running at 2GHz. Both of these x64 chips are low-voltage, low power variants. Facebook ran the tests on single-socket 1U rack servers with 32GB and on dual-socket 1U rack servers with 64GB.All three machines ran CentOS Linux with the 2.6.33 kernel and Memcached 1.2.3h.

    via Tilera routs Intel, AMD in Facebook bakeoff • The Register.

    You will definitely want to read this whole story as presented El Reg. They have a few graphs displaying the performance of the Tilera based Quanta data cloud in a box versus the Intel server rack. And let me tell you on certain very specific workloads like the Web Caching using Memcached I declare advantage Tilera. No doubt data center managers need to pay attention to this and get some more evidence to back up this initial white paper from Facebook, but this is big, big news. And all one need do apart from tuning the software for the chipset is add a few PCIe based SSDs or TMS RamSan and you have what could theoretically be the fastest possible web performance possible. Even at this level of performance, there’s still room to grow I think on the hard drive storage front. What I would hope in future to see is Facebook do an exhaustive test on the Quanta SQ-2 product versus Calxeda (ARM cloud in a box) and the Seamicro SM-10000×64 (64bit Intel Atom cloud in a box). It would prove an interesting research project just to see how much chipsets, chip architectures and instruction sets play in optimizing each for a particular style and category of data center workload. I know I will be waiting and watching.

  • First Sungard goes private and now Blackboard

    The buyers include Bain Capital, the Blackstone Group, Goldman Sachs Capital Partners, Kohlberg Kravis Roberts, Providence Equity Partners and Texas Pacific Group. The group is led by Silver Lake Partners. The deal is a leveraged buyout – Sungard will be taken private and its shares removed from Wall Street.

    via Sungard goes private • The RegisterPosted in CIO29th March 2005 10:37 GMT

    RTTNews – Private equity firm Providence Equity Partners, Inc. agreed Friday to take educational software and systems provider Blackboard, Inc. (BBBB: News ) private for $45 per share in an all-cash deal of $1.64 billion.

    It would appear now that Providence Equity Partners owns two giants in the Higher Ed outsourcing industry Sungard and Blackboard. What does this mean? Will there be consolidation where there is overlap between the two companies? Will there be attempts to steal customers or upsell each other’s products?

  • SeaMicro pushes Atom smasher to 768 cores in 10U box • The Register

    Image representing SeaMicro as depicted in Cru...
    Image via CrunchBase

    An original SM10000 server with 512 cores and 1TB of main memory cost $139,000. The bump up to the 64-bit Atom N570 for 512 cores and the same 1TB of memory boosted the price to $165,000. A 768-core, 1.5TB machine using the new 64HD cards will run you $237,000. Thats 50 per cent more oomph and memory for 43.6 per cent more money. ®

    via SeaMicro pushes Atom smasher to 768 cores in 10U box • The Register.

    SeaMicro continues to pump out the jams releasing another updated chassis in less than a year. There is now a grand total of 768 processor cores jammed in that 10U high box. Which leads me to believe they have just eclipsed the compute per rack unit of the Tilera and Calxeda massively parallel cloud servers in a box. But that would wrong because Calxeda is making a 2U server rack unit hold 120-4 core ARM cpus. So that gives you a grand total of 480 in just 2 rack units alone. Multiply that by 5 and you get 2400 cores in a 10U rack serving. So advantage Calxeda in total core count, however lets also consider software too. Atom being the cpu that Seamicro has chosen all along is an intel architecture chip and an x64 architecture at that. It is the best of both worlds for anyone who already had a big investment in Intel binary compatible OSes and applications. It is most often the software and it’s legacy pieces that drive the choice of which processor goes into your data cloud.

    Anyone who had clean slate to start from might be able to choose between Calxeda versus Seamicro for their applications and infrastructure. And if density/thermal design point per rack unit is very important Calxeda too will suit your needs I would think. But who knows? Maybe your workflow isn’t as massively parallel as a Calxeda server and you might have a much lower implementation threshold getting started on an Intel system, so again advantage Seamicro. A real industry analyst would look at these two competing companies as complimentary, different architectures for different workflows.

  • NoSQL is What? (via Jeremy Zawodny’s blog)

    Image representing Jeremy Zawodny as depicted ...
    Image by Flickr / Jeremy Zawodny via CrunchBase

    Great set of comments along with a very good description of advantages of using NoSQL in a web application. There seems to be quite a bit of philosophical differences over whether or not NoSQL needs to be chosen at the earliest stages of ANY project. But Jeremy’s comments more or less prove, you pick the right tool for the right job, ‘Nuff Said.

    Jeremy Zawodny: I found myself reading NoSQL is a Premature Optimization a few minutes ago and threw up in my mouth a little. That article is so far off base that I’m not even sure where to start, so I guess I’ll go in order. In fact, I would argue that starting with NoSQL because you think you might someday have enough traffic and scale to warrant it is a premature optimization, and as such, should be avoided by smaller and even medium sized organizations.  You … Read More

    via Jeremy Zawodny’s blog

  • Apple patents hint at future AR screen tech for iPad | Electronista

    Structure of liquid crystal display: 1 – verti...
    Image via Wikipedia

    Apple may be working on bringing augmented reality views to its iPad thanks to a newly discovered patent filing with the USPTO.

    via Apple patents hint at future AR screen tech for iPad | Electronista. (Originally posted at AppleInsider at the following link below)

    Original Article: Apple Insider article on AR

    Just a very brief look at a couple of patent filings by Apple with some descriptions of potential applications. They seem to want to use it for navigation purposes using the onboard video camera. One half the screen will use the live video feed, the other half is a ‘virtual’ rendition of that scene in 3D to allow you to find a path or maybe a parking space in between all those buildings.

    The second filing mentions a see-through screen whose opacity can be regulated by the user. The information display will take precedence over the image seen through the LCD panel. It will default to totally opaque using no voltage whatsoever (In Plane switching design for the LCD).

    However the most intriguing part of the story as told by AppleInsider is the use of sensors on the device to determine angle, direction, bearing to then send over the network. Why the network? Well the whole rendering of the 3D scene as described in first patent filing is done somewhere in the cloud and spit back to the iOS device. No onboard 3D rendering needed or at least not at that level of detail. Maybe those datacenters in North Carolina are really cloud based 3D rendering farms?

  • ARM daddy simulates human brain with million-chip super • The Register

    British Scientist, nominated for the Millenniu...
    Steve Furber (Image via Wikipedia)

    While everyone in the IT racket is trying to figure out how many Intel Xeon and Atom chips can be replaced by ARM processors, Steve Furber, the main designer of the 32-bit ARM RISC processor at Acorn in the 1980s and now the ICL professor of engineering at the University of Manchester, is asking a different question, and that is: how many neurons can an ARM chip simulate?

    via ARM daddy simulates human brain with million-chip super • The Register.

    The phrase reminds me a bit of an old TV commercial that would air during the Saturday cartoons. Tootsie Roll brand lollipops had a center made out of Tootsie Roll. The challenge was to determine how many licks does it take to get to the center of a Tootsie Roll Pop? The answer was, “The World May Never Know”. And so it goes for the simulations large scale and otherwise of the human brain.

    I remember also reading Stewart Brand’s 1985 book about the MIT Media Lab and their installation of a brand new multi-processor super computer called The Connection Machine (TCM). Danny Hillis was the designer and author of the original concept of stringing together a series of small one bit computer cores to act like ‘neurons’ in a larger array of cpus. The scale was designed to top out at around 65,535 (2^16). At the time MIT Media Lab only had the machine filled up 1/4 of the way but was attempting to do useful work with it at that size. Hillis spun out of MIT to create a startup company called Thinking Machines (to reflect the neuron style architecture he had pursued as a grad student). In fact all of Hillis’s ideas stemmed from his research that led up to the original Connection Machine Mark. 1.

    Spring forward to today and the sudden appearance of massively parallel, low-power servers like Calxeda using ARM chips, Qanta Sq-2 using Tilera chips (also an MIT spin out). Similarly the Seamicro SM-10000×64 which uses Intel Atom chips in large scale, large quantity. And Seamicro is making sales TODAY. It almost seems like a stereotypical case of an idea being way ahead of its time. So recognize the opportunity because now the person directly responsible for designing the ARM chip is attacking that same problem Danny Hillis was all those years ago.

    Personally I would like to see Hillis join in some way with this program not as Principal Investigator but may a background consultant. Nothing wrong with a few more eyes on the preliminary designs. Especially with Hillis’s background in programming those old mega-scale computers. That is the true black art of trying to do a brain simulator on this scale. Steve Furber might just be able to make lightning strike twice (once for Acorn/ARM cpus and once more for simulating the brain in silicon).

  • Atom smasher claims Hadoop cloud migration victory • The Register

    Image representing SeaMicro as depicted in Cru...
    Image via CrunchBase

    SeaMicro has been peddling its SM10000-64 micro server, based on Intels dual-core, 64-bit Atom N570 processor and cramming 256 of these chips into a 10U chassis. . .

    . . . The SM10000-64 is not so much a micro server as a complete data center in a box, designed for low power consumption and loosely coupled parallel processing, such as Hadoop or Memcached, or small monolithic workloads, like Web servers.

    via Atom smasher claims Hadoop cloud migration victory • The Register.

    While it is not always easy to illustrate the cost/benefit and Return on Investment on a lower power box like the Seamicro, running it head to head on a similar workload with a bunch of off the shelf Xeon boxes really shows the difference. The calculation of the benefit is critical too. What do you measure? Is it speed? Is it speed per transaction? Is it total volume allowed through? Or is it cost per unit transaction within a set amount of transactions? You’re getting closer with that last one. The test setup used a set number of transaction needing to be done in a set period of time. The benchmark then measure total power dissipation to accomplish that number of transactions in the set amount of time. SeaMicro came away the winner in unit cost per transaction in power terms. While the Xeon based servers had huge excess speed and capacity the power dissipation put it pretty far into the higher cost per transaction category.

    However it is very difficult to communicate this advantage that SeaMicro has over Intel. Future tests/benchmarks need to be constructed with clearly stated goals and criteria. Specifically if it can be communicated as a Case History of a particular problem that could be solved by either a SeaMicro server or a bunch of Intel boxes running Xeon cpus with big caches. Once that Case History is well described, then the two architectures are then put to work showing what the end goal is in clear terms (cost per transaction). Then and only then will SeaMicro communicate effectively how it does things different and how that can save money. Otherwise it’s too different to measure effectively versus a Intel Xeon based rack of servers.

  • Tilera throws gauntlet at Intels feet • The Register

    Upstart mega-multicore chip maker Tilera has not yet started sampling its future Tile-Gx 3000 series of server processors, and companies have already locked in orders for the chips.

    via Tilera throws gauntlet at Intels feet • The Register.

    Proof that sometimes  a shipping product doesn’t always make all the difference. Although it might be nice to tout performance of actual shipping product. What’s becoming more real is the power efficiency of the Tilera architcture core for core versus the Intel IA-64 architecture. Tilera can provide a much lower Thermal Design Point (TDM) per core than typical Intel chips running the same workloads. So Tilera for the win on paper anyways.