Category: technology

General technology, not anything in particular

  • From Big Data to NoSQL: Part 3 (ReadWriteWeb.com)

    In Part One we covered data, big data, databases, relational databases and other foundational issues. In Part Two we talked about data warehouses, ACID compliance, distributed databases and more. Now we'll cover non-relational databases, NoSQL and related concepts.

    via From Big Data to NoSQL: The ReadWriteWeb Guide to Data Terminology Part 3.

    I really give a lot of credit to ReadWriteWeb for packaging up this 3-part series (started May 24th, I think). It at least narrows down what is meant by all the fast and loose terms White Papers and Admen are throwing around to get people to consider their products in RFPs. Just know this, though: in many cases the NoSQL databases that keep coming onto the market tend to be one-off solutions created by big social networking companies who couldn’t get MySQL/Oracle/MS SQL to scale in size or speed sufficiently during their early build-outs. Just think of Facebook hitting the 500-million-user mark and you will know that there’s got to be a better way than relational algebra and tables with columns and rows.

    In Part 3 we finally get to what we have all been waiting for: non-relational databases, so-called NoSQL. Google’s MapReduce technology is quickly presented as one of the most widely known examples of the NoSQL-style distributed approach, one that does not adhere to absolute or immediate consistency but gets there with ‘eventual consistency’ (Consistency being the big C in the ACID acronym). The coolest thing about MapReduce is the similarity (at least in my mind) it bears to the SETI@home project, where ‘work units’ were split out of large data tapes, distributed piecemeal over the Internet and analyzed on a person’s desktop computer. The completed units were then gathered up and brought together into a final result. This is similar to how Google does its big data analysis to get work done in its data centers. And it follows on in Hadoop, an open source implementation of MapReduce that was fostered at Yahoo and is now an Apache project.
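
    To make that split/process/gather idea concrete, here is a rough sketch of the MapReduce pattern in plain Python. It is my own illustration, not Google's or Hadoop's actual API: the map step emits key/value pairs from each chunk of input, a shuffle groups them by key, and the reduce step folds each group into a final answer.

      from collections import defaultdict

      def map_phase(chunk):
          """Map step: emit (word, 1) for every word in one chunk of text."""
          for word in chunk.split():
              yield word.lower(), 1

      def reduce_phase(word, counts):
          """Reduce step: fold all the counts for one word into a total."""
          return word, sum(counts)

      def mapreduce_wordcount(chunks):
          # Shuffle: group emitted pairs by key, the step a real framework
          # performs between the distributed map and reduce phases.
          groups = defaultdict(list)
          for chunk in chunks:
              for word, count in map_phase(chunk):
                  groups[word].append(count)
          return dict(reduce_phase(w, c) for w, c in groups.items())

      if __name__ == "__main__":
          # Each string stands in for a 'work unit' processed on a separate node.
          chunks = ["big data big tables", "data warehouses and big data"]
          print(mapreduce_wordcount(chunks))
          # {'big': 3, 'data': 3, 'tables': 1, 'warehouses': 1, 'and': 1}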

    Document databases are cool too, and very much like an object-oriented database where you have a core item with attributes appended. I think also of LDAP directories, which have similarities to object-oriented databases. A person has a ‘Common Name’ or CN attribute. The CN is as close to a unique identifier as you can get, with all the other attributes strung along, appended on the end as they need to be added, in no particular order. The ability to add attributes as needed is like ‘tagging’ the way picture- and bookmark-sharing websites do it: you just add an arbitrary tag to help search engines index the site and help relevant web searches find your content.
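
    As a rough illustration (my own, not from the article, and the names are made up), here is what a ‘document’ in a document store might look like, expressed as a plain Python dictionary: one core identifier plus whatever attributes and tags happen to be appended, with no fixed schema forcing every record to have the same columns.

      import json

      # A schema-less "document": one core identifier plus whatever attributes
      # have been tacked on over time, much like LDAP attributes or social tags.
      person = {
          "cn": "Jane Smith",            # the closest thing to a unique key
          "mail": "jane@example.com",
          "title": "Information Analyst",
          "tags": ["nosql", "databases", "bookmarks"],
      }

      # Attributes can be appended later without altering any table definition.
      person["mobile"] = "+1-555-0100"
      person["tags"].append("big-data")

      # Document stores typically persist and exchange records as JSON.
      print(json.dumps(person, indent=2))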

    The relationship between graph databases and mind-mapping is also very interesting. There’s a good graphic illustrating a graph database of blog content, showing how the relation lines between items are drawn and labeled. Having used mind-mapping products before, I now have a much better understanding of graph databases. Nice parallel there, I think.
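
    A graph database stores exactly those labeled relation lines as first-class data. Here is a minimal sketch (my own toy illustration, not any particular product's API) of blog content modeled as nodes and labeled edges:

      # Nodes are things (posts, authors, tags); edges are labeled relationships.
      nodes = {"post:42", "author:alice", "tag:nosql"}
      edges = [
          ("author:alice", "WROTE", "post:42"),
          ("post:42", "TAGGED_WITH", "tag:nosql"),
      ]

      def neighbors(node, label=None):
          """Follow outgoing relation lines from a node, optionally by label."""
          return [dst for src, lbl, dst in edges
                  if src == node and (label is None or lbl == label)]

      print(neighbors("author:alice"))            # ['post:42']
      print(neighbors("post:42", "TAGGED_WITH"))  # ['tag:nosql']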

    At the very end of the article there’s mention of NewSQL, of which Drizzle is an interesting offshoot. Looking up more about it, I found it interesting as a fork of the MySQL project. Specifically, Drizzle factors out tons of functions that some folks absolutely need but many don’t (like, say, 32-bit legacy support). There has been a lot of effort to make the code smaller, so the overall line count went from over 1 million for MySQL to just under 300,000 for the Drizzle project. Speed and simplicity are the order of the day with Drizzle. Add a missing function by simply adding its plug-in to the main app and you get back some of the MySQL features that might otherwise be missing.

    *Note: Older survey of the NoSQL field conducted by ReadWriteWeb in 2009

  • From Big Data to NoSQL: Part 2 (from ReadWriteWeb)

    In this section we’ll talk about data warehouses, ACID compliance, distributed databases and more.

    via From Big Data to NoSQL: The ReadWriteWeb Guide to Data Terminology Part 2.

    After linking to Part 1 of this series of articles on ReadWriteWeb (all the way back in May), today there’s yet more terminology and info for the enterprising, goal-oriented technologist. Again, there’s some good info and a diagram to explain some of the concepts and what makes these things different from what we are already using today. I particularly like finding out about the performance benefits of these different architectures versus the tables, columns and rows of traditional relational-algebra-driven SQL databases.

    Where I work we have lots of historic data kept on file in a Data Warehouse. This typically gets used to generate reports to show compliance, meet regulations and continue to receive government grants. For the more enterprising Information Analyst it also provides a source of historic data for creating forecasts modeled on past activity. For the Data Scientist it provides an opportunity to discover things people didn’t know existed within the data (Data Mining). But now that things are becoming more ‘realtime’, there’s a call for analyzing data streams as they occur, instead of after the fact as with Data Warehouses and Data Mining.
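
    As a toy illustration of that distinction (my own, not from the article): a warehouse-style query summarizes events after the fact, while stream analysis keeps a running answer as each event arrives.

      # Batch / data-warehouse style: the events already sit in storage,
      # and we summarize them after the fact.
      events = [120, 80, 200, 95, 310]          # e.g. daily transaction amounts
      print(sum(events) / len(events))          # 161.0

      # Streaming style: update the answer as each event arrives, so the
      # "report" is always current instead of being run at month end.
      def running_average(stream):
          total = count = 0
          for amount in stream:
              total += amount
              count += 1
              yield total / count               # up-to-the-moment average

      for current in running_average(events):
          print(current)                        # 120.0, 100.0, 133.33..., 123.75, 161.0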

  • OCZ samples twin-core ARM SSD controller • The Register

    OCZ says it is available for evaluation now by OEMs and, we presume, OCZ will be using it in its own flash products. We’re looking at 1TB SSDs using TLC flash, shipping sequential data out at 500MB/sec, which boot quickly, and could be combined to provide multi-TB flash data stores. Parallelising data access would provide multi-GB/sec I/O. The flash future looks bright.

    via OCZ samples twin-core ARM SSD controller • The Register.

    Who knew pairing an ARM core with the drive electronics for a Flash-based SSD could be so successful? Not only are ARM chips helping to drive the CPUs in our handheld devices, they are now becoming the SSD drive controllers too! If OCZ is able to create these drive controllers with good yields (say 70% on the first run), then they are hopefully going to give themselves a pricing advantage and a higher profit margin per device sold. This is assuming they don’t have to pay royalties for a SandForce drive controller on every device they ship.

    If OCZ were able to draw up their own drive controller from scratch, I would be surprised. However, since they have acquired Indilinx, it seems they are making good on the promise held by Indilinx’s current crop of drive controllers. Let’s just hope they are able to match the performance of SandForce at the same price points as well. Otherwise it’s nothing more than a kind of patent machine that will allow OCZ to wage lawsuits against competitors using the Intellectual Property they acquired through the Indilinx acquisition. And we have seen too much of that recently with Apple’s secret bid for Nortel’s patent pool and Google’s acquisition of Motorola.

  • Tilera routs Intel, AMD in Facebook bakeoff • The Register

    Facebook lined up the Tilera-based Quanta servers against a number of different server configurations making use of Intel’s four-core Xeon L5520 running at 2.27GHz and eight-core Opteron 6128 HE processors running at 2GHz. Both of these x64 chips are low-voltage, low-power variants. Facebook ran the tests on single-socket 1U rack servers with 32GB and on dual-socket 1U rack servers with 64GB. All three machines ran CentOS Linux with the 2.6.33 kernel and Memcached 1.2.3h.

    via Tilera routs Intel, AMD in Facebook bakeoff • The Register.

    You will definitely want to read this whole story as presented by El Reg. They have a few graphs displaying the performance of the Tilera-based Quanta ‘data cloud in a box’ versus the Intel server racks. And let me tell you, on certain very specific workloads, like web caching using Memcached, the advantage goes to Tilera. No doubt data center managers need to pay attention to this and gather more evidence to back up this initial white paper from Facebook, but this is big, big news. Apart from tuning the software for the chipset, all one need do is add a few PCIe-based SSDs or a TMS RamSan and you have what could theoretically be the fastest web serving performance possible. Even at this level of performance, there’s still room to grow, I think, on the storage front. What I would hope to see in future is Facebook doing an exhaustive test of the Quanta SQ-2 product versus Calxeda (ARM cloud in a box) and the SeaMicro SM10000-64 (64-bit Intel Atom cloud in a box). It would prove an interesting research project just to see how much chipsets, chip architectures and instruction sets matter in optimizing each for a particular style and category of data center workload. I know I will be waiting and watching.
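
    For context, the Memcached workload in the benchmark boils down to very small, very frequent set/get operations against an in-memory cache. Here is a minimal sketch of that pattern, assuming a memcached server on localhost and the third-party pymemcache client library (my choice of client for illustration, not one named in the article):

      from pymemcache.client.base import Client

      # Connect to a memcached instance; the benchmark hammers servers like this
      # with huge volumes of tiny set/get requests.
      cache = Client(("localhost", 11211))

      def get_user_profile(user_id):
          """Classic cache-aside lookup: hit memcached first, fall back to the DB."""
          key = f"user:{user_id}"
          cached = cache.get(key)
          if cached is not None:
              return cached.decode("utf-8")
          profile = f"profile-for-{user_id}"   # stand-in for a real database query
          cache.set(key, profile, expire=300)  # cache the result for five minutes
          return profile

      print(get_user_profile(42))  # first call populates the cache
      print(get_user_profile(42))  # second call is served from memcached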

  • History of Sage

    The Sage Project Webpage http://www.sagemath.org/

    Sage is mathematical software, very much in the same vein as MATLAB, MAGMA, Maple, and Mathematica. Unlike these systems, every component of Sage is GPL-compatible. The interpretative language of Sage is Python, a mainstream programming language. Use Sage for studying a huge range of mathematics, including algebra, calculus, elementary to very advanced number theory, cryptography, numerical computation, commutative algebra, group theory, combinatorics, graph theory, and exact linear algebra.
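
    To give a flavor of what working in Sage looks like, here is a short illustrative session of my own (Sage’s language is Python, with a preparser that, among other things, lets ^ mean exponentiation); the specific computations are my picks, not examples from the Sage site:

      # Run inside Sage (e.g. `sage example.sage`); factor, var, integrate,
      # matrix and graphs are Sage built-ins, not plain-Python ones.

      # Exact number theory: factor a large integer.
      print(factor(2^64 - 1))          # 3 * 5 * 17 * 257 * 641 * 65537 * 6700417

      # Symbolic calculus: integrate sin(x)^2 with respect to x.
      x = var('x')
      print(integrate(sin(x)^2, x))    # 1/2*x - 1/4*sin(2*x)

      # Exact linear algebra over the rationals.
      M = matrix(QQ, [[1, 2], [3, 4]])
      print(M.det())                   # -2

      # Graph theory: the Petersen graph and its chromatic number.
      G = graphs.PetersenGraph()
      print(G.chromatic_number())      # 3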

    Explanation of what Sage does by the original author William Stein 

    (Long – roughly 50 minutes)

    Original developer http://wstein.org/ and his history of Sage mathematical software development. Wiki listing http://wiki.sagemath.org/ with a list of participating committers. Discussion lists for developers: mostly done through Google Groups with associated RSS feeds. Mercurial repository (start date Sat Feb 11 01:13:08 2006); Gonzalo Tornaria seems to have loaded the project in at this point. Current list of source code in Trac, with a listing of committers for the most recent release of Sage (4.7).

    • William Stein (wstein) Still very involved based on frequency of commits
    • Michael Abshoff (mabs) Ohloh has him ranked second only to William Stein with commits and time on project. He’s now left the project according to the Trac log.
    • Jeroen Demeyer (jdemeyer) commits a lot
    • J.H. Palmieri (palmieri) has done a number of tutorials and documentation; he’s on the IRC channel
    • Minh Van Nguyen (nguyenminh2) has done some tutorials, documentation and work on the Categories module. He also appears to be the sysadmin on the Wiki
    • Mike Hansen (mhansen) is on the IRC channel irc.freenode.net#sagemath and is a big contributor
    • Robert Bradshaw (robertwb) has done some very recent commits

    Changelog for the most recent release (4.7) of Sage. Moderators of irc.freenode.net#sagemath: Keshav Kini (who maintains the Ohloh info) & schilly@boxen.math.washington.edu. Big milestone release of version 4.7, with tickets listed by module. And the Ohloh listing of top contributors to the project. There’s an active developer and end-user community. Workshops are tracked here; Sage Days workshops tend to be hackfests for interested parties. But more importantly, developers can read on that page how to get started and what the process is as a Sage developer.

    Further questions that need to be considered. Look at the source repository and the developer blogs and ask the following questions:

    1. Who approves patches? How many people? (There’s a large number of people responsible for reviewing patches; if I had to guess it could be 12 in total based on the most recent changelog)
    2. Who has commit access? & how many?
    3. Who is involved in the history of the project? (That’s pretty easy to figure out from the Ohloh and Trac websites for Sage)
    4. Who are the principal contributors, and have they changed over time?
    5. Who are the maintainers?
    6. Who is on the front end (user interface) and back end (processing or server side)?
    7. What have been some of the major bugs/problems/issues that have arisen during development? Who is responsible for quality control and bug repair?
    8. How is the project’s participation trending and why? (Participation seems to have stabilized after a big peak of 41 contributors about 2 years ago; the Ohloh graph of commits shows peak activity in 2009 and 2010.)

    Note: the period the Gource visualization covers starts in 2009, while the earliest entry in the Mercurial repository I could find was from 2005. Sage was already a going concern before the Mercurial repository was put on the web, so the visualization doesn’t show the full history of development.

  • AppleInsider | Apple seen merging iOS, Mac OS X with custom A6 chip in 2012

    Rumors of an ARM-based MacBook Air are not new. In May, one report claimed that Apple had built a test notebook featuring the same low-power A5 processor found in the iPad 2. The report, which came from Japan, suggested that Apple officials were impressed by the results of the experiment.

    via AppleInsider | Apple seen merging iOS, Mac OS X with custom A6 chip in 2012.

    Following up on an article they did back on May 27th, and one prior to that on May 6th, AppleInsider does a bit of prediction and prognosticating about the eventual fusion of iOS and Mac OS X. What they see triggering this is an ARM chip that would be able to execute 64-bit binaries across all of the product lines (the fabled A6). How long would it take to do this consolidation and interweaving? How many combined updaters, security patches and Pro App updates would it take to get OS X 10.7 to be ‘more’ like iOS than it is today? The software development is going to take a while, and it’s not just a matter of cross-compiling to an ARM chip from software built for Intel chips.

    Given that 64-bit Intel Atom chips are already running in the new SeaMicro SM10000 (x64), it won’t be long now, I’m sure, before the equivalent ARM Cortex-A15 chip hits full stride. The designers have been aiming for a 4-core design encompassed by the Cortex-A15 release, real soon now (RSN). The next step after that chip is licensed, piloted, tested and put into production will be a 64-bit clean design. I’m curious to see whether 64-bit will be applied across ALL the different product lines within Apple. Especially when the issue of power usage and Thermal Design Power (TDP) is considered, will 64-bit ARM chips be as battery friendly? I wonder. True, Intel jumped the 64-bit divide on the desktop with the Core 2 Duo line some time ago and made those chips somewhat battery friendly. But they cannot compare at all to the 10+ hours one gets on a 32-bit ARM chip today using the iPad.

    Lastly, App developers will also need to keep their Xcode environments up to date and merge in new changes constantly up to the big cutover to 64-bit ARM. There’s no telling what that’s going to be like, apart from the two problems I have already raised here. In the 10.7 Lion run-up Apple was very late in providing the support and tools to allow developers to get their Apps ready. I will say, though, that in the history of Apple’s hardware/software migrations, they have done more of them, more successfully, than any other company. So I think they will be able to pull it off, no doubt, but there will be much wailing and gnashing of teeth. And hopefully we, the end users of the technology, will see something better out of it than a much bigger profit margin for Apple (though that seems to be the prime mover in most recent cases as Steve Jobs has done the long slow fade into obscurity).

    If 64-bit ARM is inevitable, and iOS on everything too, then I’m hoping things don’t change so much that I can’t do things the way I do them now on the desktop. Currently on OS X 10.7 I am completely ignoring:

    1. Gestures
    2. Mission Control
    3. Launch Pad
    4. App Store (not really, because I had to download Lion through it)

    Let’s hope this roster doesn’t get even longer over time as iOS becomes the de facto OS on all Apple products. Because I was sure hoping the future would be brighter than this. And as AppleInsider quoted on May 6th:

    “In addition to laptops, the report said that Apple would ‘presumably’ be looking to move its desktop Macs to ARM architecture as well. It characterized the transition to Apple-made chips for its line of computers as a ‘done deal’.”

  • First Sungard goes private and now Blackboard

    The buyers include Bain Capital, the Blackstone Group, Goldman Sachs Capital Partners, Kohlberg Kravis Roberts, Providence Equity Partners and Texas Pacific Group. The group is led by Silver Lake Partners. The deal is a leveraged buyout – Sungard will be taken private and its shares removed from Wall Street.

    via Sungard goes private • The Register, posted 29th March 2005 10:37 GMT

    RTTNews – Private equity firm Providence Equity Partners, Inc. agreed Friday to take educational software and systems provider Blackboard, Inc. (BBBB) private for $45 per share in an all-cash deal of $1.64 billion.

    It would appear now that Providence Equity Partners owns two giants in the Higher Ed outsourcing industry: Sungard and Blackboard. What does this mean? Will there be consolidation where there is overlap between the two companies? Will there be attempts to steal customers or to upsell each other’s products?

  • Google confirms Maps with local map downloads as iOS lags | Electronista

    Google Maps gets map downloads in Labs beta
    After a brief unofficial discovery, Google on Thursday confirmed that Google Maps 5.7 has the first experimental support for local maps downloads.

    via Google confirms Maps with local map downloads as iOS lags | Electronista.

    Google Maps for Android is starting to show a level of maturity only seen on dedicated GPS units. True, there is still no offline routing feature (you need access to Google’s servers for that functionality), but you at least get a downloaded map that you can zoom in and out on without incurring heavy data charges. Yes, overseas you may rack up some big charges as you navigate live maps via the Google Maps app on Android. This is now partially solved by downloading in advance the immediate area you will be visiting (within a few miles’ radius). It’s an incremental improvement to be sure, and it makes Android phones a little more self-sufficient without making you regret the data charges.

    Apple, on the other hand, is behind. Hands down, they are kind of letting third-party GPS development go to folks like Navigon and TomTom, who both require somewhat hefty fees to license their downloadable content. Apple’s built-in Maps app doesn’t compare to Navigon or TomTom, much less Google, for actual usefulness in a wide range of situations. And Apple isn’t currently using the downloadable vector-based maps introduced with this revision of Google Maps for Android (version 5.7), so it will struggle with large JPEG images as you pan and scan around the map to find your location.

  • SeaMicro pushes Atom smasher to 768 cores in 10U box • The Register

    An original SM10000 server with 512 cores and 1TB of main memory cost $139,000. The bump up to the 64-bit Atom N570 for 512 cores and the same 1TB of memory boosted the price to $165,000. A 768-core, 1.5TB machine using the new 64HD cards will run you $237,000. That’s 50 per cent more oomph and memory for 43.6 per cent more money. ®

    via SeaMicro pushes Atom smasher to 768 cores in 10U box • The Register.

    SeaMicro continues to pump out the jams, releasing another updated chassis in less than a year. There is now a grand total of 768 processor cores jammed into that 10U-high box, which at first led me to believe they had just eclipsed the compute per rack unit of the Tilera and Calxeda massively parallel cloud-servers-in-a-box. But that would be wrong, because Calxeda is making a 2U rack unit hold 120 four-core ARM CPUs. That gives you a grand total of 480 cores in just 2 rack units; multiply that by 5 and you get 2,400 cores in a 10U rack server. So, advantage Calxeda in total core count, but let’s also consider software. Atom, the CPU Seamicro has chosen all along, is an Intel-architecture chip, and an x64 architecture at that. It is the best of both worlds for anyone who already has a big investment in Intel-binary-compatible OSes and applications. It is most often the software, and its legacy pieces, that drive the choice of which processor goes into your data cloud.
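
    A quick back-of-the-envelope sketch of the density and pricing arithmetic above (the Calxeda figures are the ones I am citing here, not vendor-verified numbers):

      # SeaMicro's new chassis (from the article): 768 Atom cores in a 10U box.
      seamicro_cores_per_10u = 768

      # Calxeda, as cited above: 120 quad-core ARM nodes per 2U enclosure.
      calxeda_cores_per_2u = 120 * 4                      # 480 cores in 2U
      calxeda_cores_per_10u = calxeda_cores_per_2u * 5    # scale 2U up to 10U

      print(seamicro_cores_per_10u)               # 768
      print(calxeda_cores_per_10u)                # 2400

      # Price/performance from the quoted Register numbers: 50% more cores and
      # memory for roughly 43.6% more money.
      print(768 / 512 - 1)                        # 0.5
      print(round(237_000 / 165_000 - 1, 3))      # 0.436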

    Anyone who has a clean slate to start from might be able to choose between Calxeda and Seamicro for their applications and infrastructure. If density and thermal design point per rack unit are very important, Calxeda will suit your needs, I would think. But who knows? Maybe your workflow isn’t as massively parallel as a Calxeda server assumes, and you might have a much lower implementation threshold getting started on an Intel system, so again, advantage Seamicro. A real industry analyst would look at these two competing companies as complementary: different architectures for different workflows.

  • NoSQL is What? (via Jeremy Zawodny’s blog)

    Great set of comments, along with a very good description of the advantages of using NoSQL in a web application. There seem to be quite a few philosophical differences over whether or not NoSQL needs to be chosen at the earliest stages of ANY project. But Jeremy’s comments more or less prove it: you pick the right tool for the right job. ‘Nuff said.

    Jeremy Zawodny: I found myself reading NoSQL is a Premature Optimization a few minutes ago and threw up in my mouth a little. That article is so far off base that I’m not even sure where to start, so I guess I’ll go in order. In fact, I would argue that starting with NoSQL because you think you might someday have enough traffic and scale to warrant it is a premature optimization, and as such, should be avoided by smaller and even medium sized organizations. You …

    via Jeremy Zawodny’s blog