What elements need to be present for an open source project to be successful (and really, what is success)? A recruiting pipeline is critical; participation is essential to the life of the project. Think Eclipse, Sage, Octave, Blender, MySQL, Fedora, Linux, Apache, Firefox, Handbrake, VLC. Give up power early and let more people participate in the project as early as possible. Advertise the on-ramp for your committers/contributors clearly. Choose a license that is compatible with the target audience (GPL, LGPL, BSD, MIT). Re-use existing technology to get going quicker and avoid re-inventing wheels.
What does the path to development entry look like? We need to collect some stories from people who came into a developer community and are still with it (for example, the Red Hat interns who started in their mid-teenage years). For the classroom experience, inquiry/active/constructivist-style learning on open source projects is a good start. Providing an outlet for creativity is another path.
What does small-scale Community Architecture look like? (Still an open question.) Open source project managers need to look at the contribution pathway, lower barriers, and maximize not just the visibility of the project but also the transparency of its processes and roadmaps.
The key idea was to create a component that could be scaled from use as a single embedded chip in dedicated devices like a TV set-top box, all the way up to a vast supercomputer built from a huge array of interconnected Transputers.
Connect them up and you had what was, for its era, a hugely powerful system, able to render Mandelbrot Set images and even do ray tracing in real time – a complex computing task only now coming into the reach of the latest GPUs, but solved by British boffins 30-odd years ago.
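To see why this sort of task maps so naturally onto an array of processors, here is a minimal Python sketch (my own illustration, nothing to do with the Transputer's actual toolchain): every row of a Mandelbrot image can be computed independently, so the rows can simply be handed out to however many workers you have.

```python
from multiprocessing import Pool

# Toy illustration: each image row is an independent work unit, which is
# exactly why an array of processors (Transputers then, cores/GPUs now)
# can chew through the image in parallel.
WIDTH, HEIGHT, MAX_ITER = 80, 40, 100

def mandelbrot_row(y):
    row = []
    for x in range(WIDTH):
        c = complex(-2.0 + 3.0 * x / WIDTH, -1.5 + 3.0 * y / HEIGHT)
        z, n = 0j, 0
        while abs(z) <= 2 and n < MAX_ITER:
            z = z * z + c
            n += 1
        row.append("#" if n == MAX_ITER else " ")
    return "".join(row)

if __name__ == "__main__":
    with Pool() as pool:                      # one worker per CPU core
        rows = pool.map(mandelbrot_row, range(HEIGHT))
    print("\n".join(rows))
```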
I remember the Transputer. I remember seeing ISA-based add-on cards for desktop computers back in the early 1980s. Vendors would advertise them in the back of the popular computer technology magazines of the day. And while it seemed really mysterious what you could do with a Transputer, the price premium on those boards made you realize it must have been pretty magical.
Most recently, while attending a workshop on Open Source software, I met a couple of former employees of a famous manufacturer of camera film. In their research labs these guys used to build custom machines using arrays of Transputers to speed up image processing tasks inside the products they were developing. So knowing that there are now even denser architectures using chips like Tilera, Intel Atom and ARM absolutely blows them away. The Transputer's price/performance ratio doesn't come close.
Software was probably the biggest point of friction, in that the tools to integrate the Transputer into the overall design required another level of expertise. That is true, too, of the General Purpose Graphics Processing Unit (GPGPU) that nVidia championed and now markets with its Tesla product line. And the Chinese have created a hybrid supercomputer mating Tesla boards with commodity CPUs. It's too bad that the economics of designing and producing the Transputer didn't scale over time (the way it has for Intel, by comparison). Clock speeds fell behind too, which allowed general-purpose microprocessors to spend the extra clock cycles performing the same calculations, only faster. This is also the advantage RISC chips had, until they couldn't overcome the performance increases Intel designed in.
In Part One we covered data, big data, databases, relational databases and other foundational issues. In Part Two we talked about data warehouses, ACID compliance, distributed databases and more. Now we'll cover non-relational databases, NoSQL and related concepts.
I really give a lot of credit to ReadWriteWeb for packaging up this three-part series (started May 24th, I think). It at least narrows down what is meant by all the fast-and-loose terms white papers and admen are throwing around to get people to consider their products in RFPs. Just know this, though: in many cases the NoSQL databases that keep coming onto the market tend to be one-off solutions created by big social networking companies who couldn't get MySQL/Oracle/MSQL to scale in size or speed sufficiently during their early build-outs. Just think of Facebook hitting the 500 million user mark and you will know there's got to be a better way than relational algebra and tables with columns and rows.
In Part 3 we finally get to what we have all been waiting for: non-relational databases, so-called NoSQL. Google's MapReduce technology is quickly shown as one of the most widely known examples of a NoSQL-style distributed system that, while not adhering to absolute or immediate consistency, gets there with 'eventual consistency' (consistency being the big C in the acronym ACID). The coolest thing about MapReduce is the similarity (at least in my mind) it bears to the SETI@home project, where 'work units' were split out of large data tapes, distributed piecemeal over the Internet and analyzed on people's desktop computers. The completed units were then gathered up and brought together into a final result. This is similar to how Google does its big data analysis to get work done in its data centers. And it carries on in Hadoop, an open source implementation of MapReduce that got its start at Yahoo and is now part of the Apache organization.
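To make the map/shuffle/reduce idea concrete, here is a minimal word-count sketch in plain Python. It is only an illustration of the concept, not Google's internal API or Hadoop's actual interfaces; the three short documents stand in for the 'work units' described above.

```python
from collections import defaultdict

def map_phase(document):
    """Emit (word, 1) pairs for every word in one chunk of input."""
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    """Combine all the partial counts for a single word."""
    return (word, sum(counts))

documents = ["the cat sat", "the cat ran", "a dog ran"]  # the 'work units'

# Shuffle: group intermediate pairs by key, as the framework would do
# across many machines before handing each group to a reducer.
groups = defaultdict(list)
for doc in documents:
    for word, count in map_phase(doc):
        groups[word].append(count)

results = [reduce_phase(word, counts) for word, counts in groups.items()]
print(sorted(results))   # [('a', 1), ('cat', 2), ('dog', 1), ('ran', 2), ...]
```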
Document databases are cool too, and very much like an object-oriented database where you have a core item with attributes appended. I think also of LDAP directories, which have similarities to object-oriented databases. A person has a 'Common Name', or CN, attribute. The CN is as close to a unique identifier as you can get, with all the other attributes strung along, appended on the end as they need to be added, in no particular order. The ability to add attributes as needed is like 'tagging' in the way social networking sites (picture- and bookmark-sharing websites) do it. You just add an arbitrary tag to help search engines index the site and help relevant web searches find your content.
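A rough sketch of that flexible-schema idea, using nothing but Python dicts (this is not the API of any particular document database such as CouchDB or MongoDB, and the names below are made up): each 'document' carries whatever attributes and tags it happens to need.

```python
# Two 'documents' with different attribute sets, stored as plain dicts.
people = [
    {"cn": "Alice Smith", "mail": "alice@example.org",
     "tags": ["admin", "python"]},
    {"cn": "Bob Jones", "phone": "555-0100",            # no mail attribute yet
     "tags": ["graphics"]},
]

# Attributes can be appended later, in no particular order, just like tagging.
people[1]["mail"] = "bob@example.org"
people[1]["tags"].append("blender")

# A simple tag lookup, analogous to a search filtering on a tag.
admins = [p["cn"] for p in people if "admin" in p.get("tags", [])]
print(admins)   # ['Alice Smith']
```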
The relationship between graph databases and mind-mapping is also very interesting. There's a good graphic illustrating a graph database of blog content, showing how relation lines are drawn and labeled. So now I have a much better understanding of graph databases, having used mind-mapping products before. Nice parallel there, I think.
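Along the same lines, here is a hand-rolled sketch of a labeled graph of blog content in Python. It is only meant to show how nodes and labeled relation lines fit together; a real graph database (Neo4j and friends) has its own storage and query language, and the node names below are invented.

```python
# Nodes and labeled edges, kept as a simple list of (source, relation, target).
edges = [
    ("Alice", "WROTE", "Post: NoSQL Primer"),
    ("Post: NoSQL Primer", "TAGGED_WITH", "databases"),
    ("Bob", "COMMENTED_ON", "Post: NoSQL Primer"),
    ("Bob", "FOLLOWS", "Alice"),
]

def neighbours(node, relation=None):
    """Follow outgoing relation lines from a node, much like a mind-map branch."""
    return [dst for src, rel, dst in edges
            if src == node and (relation is None or rel == relation)]

print(neighbours("Bob"))                      # everything Bob points at
print(neighbours("Alice", relation="WROTE"))  # ['Post: NoSQL Primer']
```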
At the very end of the article there's mention of NewSQL, of which Drizzle is an interesting offshoot. Looking it up, I found it interesting as a fork of the MySQL project. Specifically, Drizzle factors out tons of functionality that some folks absolutely need but most don't (like, say, 32-bit legacy support). A lot of effort has gone into making the code smaller, so the overall line count went from over 1 million for MySQL to just under 300,000 for the Drizzle project. Speed and simplicity are the order of the day with Drizzle. Add missing functionality by simply adding a plug-in to the main app, and you get back some of the MySQL features that would otherwise be missing.
After linking to Part 1 of this series of articles on ReadWriteWeb (all the way back in May), today there's yet more terminology and info for the enterprising, goal-oriented technologist. Again, there's some good info and a diagram to explain some of the concepts, and what makes these things different from what we are already using today. I particularly like finding out about the performance benefits of these different architectures versus the tables, columns and rows of traditional relational-algebra-driven SQL databases.
Where I work we have lots of historic data kept on file in a Data Warehouse. This typically gets used to generate reports to show compliance, meet regulations and continue to receive government grants. For the more enterprising Information Analyst it also provides a source of historic data for creating forecasts modeled on past activity. For the Data Scientist it provides an opportunity to discover things people didn't know existed within the data (Data Mining). But now that things are becoming more 'realtime', there's a call for analyzing data streams as they occur, rather than after the fact as Data Warehouses and Data Mining do.
OCZ says it is available for evaluation now by OEMs and, we presume, OCZ will be using it in its own flash products. We're looking at 1TB SSDs using TLC flash, shipping sequential data out at 500MB/sec, which boot quickly and could be combined to provide multi-TB flash data stores. Parallelising data access would provide multi-GB/sec I/O. The flash future looks bright.
Who knew pairing an ARM core with the drive electronics for a flash-based SSD could be so successful? Not only are ARM chips helping to drive the CPUs in our handheld devices, they are now becoming the SSD drive controllers too! If OCZ is able to create these drive controllers with good yields (say 70% on the first run) then they will hopefully give themselves a pricing advantage and a higher profit margin per device sold. This is assuming they don't have to pay royalties for the SandForce drive controller on every device they ship.
If OCZ had drawn up its own drive controller from scratch, I would be surprised. However, since they acquired Indilinx, it seems they are making good on the promise held by Indilinx's current crop of drive controllers. Let's just hope they are able to match SandForce's performance at the same price points as well. Otherwise it's nothing more than a kind of patent machine that will allow OCZ to wage lawsuits against competitors over Intellectual Property acquired through the Indilinx purchase. And we have seen too much of that recently with Apple's secret bid for Nortel's patent pool and Google's acquisition of Motorola.
Facebook lined up the Tilera-based Quanta servers against a number of different server configurations making use of Intel's four-core Xeon L5520 running at 2.27GHz and eight-core Opteron 6128 HE processors running at 2GHz. Both of these x64 chips are low-voltage, low-power variants. Facebook ran the tests on single-socket 1U rack servers with 32GB and on dual-socket 1U rack servers with 64GB. All three machines ran CentOS Linux with the 2.6.33 kernel and Memcached 1.2.3h.
You will definitely want to read the whole story as presented by El Reg. They have a few graphs displaying the performance of the Tilera-based Quanta data-cloud-in-a-box versus the Intel server rack. And let me tell you, on certain very specific workloads, like web caching using Memcached, I declare advantage Tilera. No doubt data center managers need to pay attention to this and get some more evidence to back up this initial white paper from Facebook, but this is big, big news. And all one need do, apart from tuning the software for the chipset, is add a few PCIe-based SSDs or a TMS RamSan, and you have what could theoretically be the fastest web-serving performance possible. Even at this level of performance, there's still room to grow, I think, on the storage front. What I would hope to see in future is Facebook doing an exhaustive test of the Quanta SQ-2 product versus Calxeda (ARM cloud in a box) and the Seamicro SM-10000×64 (64-bit Intel Atom cloud in a box). It would prove an interesting research project just to see how big a role chipsets, chip architectures and instruction sets play in optimizing each for a particular style and category of data center workload. I know I will be waiting and watching.
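For anyone who hasn't run into Memcached, the workload Facebook was benchmarking looks roughly like this from the application side. This is only a hedged sketch using the python-memcached client against a local daemon; the function and key names are made up for illustration.

```python
import memcache  # python-memcached client; assumes a memcached daemon on localhost

mc = memcache.Client(["127.0.0.1:11211"])

def render_profile_from_database(user_id):
    # Stand-in for the expensive database query and templating work.
    return "<html>profile %d</html>" % user_id

def get_profile_page(user_id):
    """Serve from the cache when possible; fall back to the slow path on a miss."""
    key = "profile:%d" % user_id
    page = mc.get(key)
    if page is None:                                  # cache miss
        page = render_profile_from_database(user_id)
        mc.set(key, page, time=300)                   # cache for five minutes
    return page

print(get_profile_page(42))
```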
The first IRC server, tolsun.oulu.fi. A Sun-3 server.
How does “The Opensource Way” apply to your course?
I'm very impressed with the results of the groups working on the software projects we saw in the Commarch presentations. Octave, Sage, Blender and Eclipse in some ways beat the commercial offerings in the sectors where they compete. Given the size of these groups, their ability to work collaboratively over wide distances AND part-time on some pieces seems like a miraculous accomplishment. There are ways, I think, to apply this loose organizational structure to other group projects.
How will you incorporate what you learned here in your course work?
As a desktop support person, maybe the best I can do is be aware of and sympathetic to the Opensource Way. I've known students (and fewer profs) over the years attempting to use Linux as their regular desktop OS. So now I think I can be somewhat of an advocate for those folks. And as a graphics person, I'm definitely much more sympathetic to Blender after watching Ted and Rolando's presentations on it.
Any feedback/comments on POSSE RIT itself you would like to add?
I want to thank both Chris and Dave for forcing us to use IRC. I had not given IRC nearly enough credit as a tool for collaboration and group projects. So thanks for changing my bad attitude towards IRC. It's incredibly useful, valid and important to this very day.
Wednesday afternoon: Worked with Ben and Nate a little. Got the latest Sugar emulator running under Fedora 15, and got Fortune Hunter running on it too. Screen size is a little big, but everything's there and installed. Next step is to clone the repository and read the wiki to see whazzup, yo! Created an account on Gitorious and made a personal clone on the Gitorious website.
Might be able to force the screen size smaller, as making my screen bigger is out of the question. I’m at 1024×768 not 1200×900 (the default it seems for the XO). Also curious to look at some of the underlying bits contained in pygame.py as well.
Thursday morning: Found an example of how to clone using Git and figured out the right syntax to clone my clone off of Gitorious. Gitorious also provides a correctly formatted Git URL, which I pasted into the command line after first pasting in the example Git clone command. This was the string that did the job:
Yes, it was THAT easy to clone. Now I'm browsing around the folders. So my next question is: if I change anything, how do I then re-build/package this up into a .xo file I can pull into the Sugar emulator? Also looking at the level builder, Fortune Maker, which I haven't installed into Sugar yet. Might just pursue that as a form of even lower-hanging fruit. Downloaded the Fortune Maker .xo file and got it installed as an activity in the Sugar emulator. Might try playing around with making a level in the game.
Made a dungeon, and I think I exported it. Tried running Math Adventure: Fortune Hunter on the Sugar emulator. It's not getting past the opening 'cut scene' screen. Directions indicate you can hit the right arrow or click the 'check mark'. I'm not seeing the check mark and it doesn't respond to the right arrow on the keyboard. Also, the main screen is too large for my screen (1024×768 is not the same size as the XO's 1200×900).
Might also try downloading Lemonade Stand and getting it running, as it uses the same Fortune Engine as Math Adventure: Fortune Hunter.
I knew ahead of time that the path of least resistance, based on reading the Mozilla Build page, would be to go with Fedora and follow the directions explicitly. I already had a Fedora LXDE install whose NetworkManager I had managed to screw up just today. And Chris Tyler had supplied us with shiny new full Fedora 15 Live discs, so I figured, why not just do the install and get a working Fedora back on that old partition? So I ran the install before dinner, tested that everything was working, and around 6:30 or so started on the Mozilla build instructions.
Luckily everything about my install was 100% vanilla and un-customized, save for the fact that the Gnome 3 desktop would not run on the integrated Intel graphics chip (oh well). I actually cut and pasted each command line directly from the web page into my terminal and got the developer tools downloaded and updated. I got Mercurial all squared away, then did the clone of the Firefox repository. That didn't take long, but then came the make build, and that took a while. Three hours of chugging along on a circa-2003 low-voltage Intel 830 CPU at about 1.2GHz with 640MB of RAM. I did, however, upgrade that internal HD to 250GB, so plenty of swap space to be had there.
Since the build was taking so long, I had plenty of time to sign into the IRC channel and put in a status report. Chris was there and immediately recognized the RAM-starvation issue. So I just patiently checked back to make sure the laptop wasn't sleeping as it worked away. Three hours later, just as I was worrying it might not finish before I went to bed, I started seeing some concluding messages from make, and voilà, it was done. The Mozilla build directions tell you to go into dist/bin/ and run firefox from there. On my laptop I had to go into a platform-specific folder (something-gnu-something-i686) first, and then I found dist/bin/firefox. Launched firefox and it ran. So I think I picked the right OS, as I didn't have any path idiosyncrasies to sort out, nor any missing libraries or binaries. Pretty straightforward on Fedora 15 32-bit Intel. Two thumbs up.
Sage is mathematical software, very much in the same vein as MATLAB, MAGMA, Maple, and Mathematica. Unlike these systems, every component of Sage is GPL-compatible. The interpreted language of Sage is Python, a mainstream programming language. Use Sage for studying a huge range of mathematics, including algebra, calculus, elementary to very advanced number theory, cryptography, numerical computation, commutative algebra, group theory, combinatorics, graph theory, and exact linear algebra.
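A few lines typed at the Sage prompt give the flavor (Sage's preparser lets you write ^ for exponentiation on top of ordinary Python; exact output formatting may vary between versions):

```python
# Entered at the `sage:` prompt, which is Python plus Sage's preparser.
factor(2^64 - 1)          # 3 * 5 * 17 * 257 * 641 * 65537 * 6700417
x = var('x')
integrate(x * sin(x), x)  # symbolic calculus: -x*cos(x) + sin(x)
E = EllipticCurve('37a')  # a classic number-theory/cryptography object
E.rank()                  # 1
```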
Explanation of what Sage does by the original author William Stein
(Long – roughly 50 minutes)
Original developer http://wstein.org/ and his history of Sage mathematical software development. Wiki listing http://wiki.sagemath.org/ with a list of participating committers. Discussion lists for developers: mostly done through Google Groups with associated RSS feeds. Mercurial repository (start date Sat Feb 11 01:13:08 2006); Gonzalo Tornaria seems to have loaded the project in at this point. Current list of source code in Trac with a listing of committers for the most recent release of Sage (4.7).
William Stein (wstein): still very involved, based on frequency of commits.
Michael Abshoff (mabs): Ohloh has him ranked second only to William Stein in commits and time on the project. He has now left the project, according to the Trac log.
Jeroen Demeyer (jdemeyer): commits a lot.
J.H. Palmieri (palmieri): has done a number of tutorials and documentation; he's on the IRC channel.
Minh Van Nguyen (nguyenminh2): has done some tutorials, documentation and work on the Categories module. He also appears to be the sysadmin for the wiki.
Mike Hansen (mhansen): is on the IRC channel irc.freenode.net#sagemath and is a big contributor.
Robert Bradshaw (robertwb): has done some very recent commits.
Changelog for the most recent release (4.7) of Sage. Moderators of irc.freenode.net#sagemath: Keshav Kini (who maintains the Ohloh info) & schilly@boxen.math.washington.edu. Version 4.7 was a big milestone release, with its tickets listed by module, and the Ohloh listing shows the top contributors to the project. There's an active developer and end-user community, and workshops are tracked on the wiki; Sage Days workshops tend to be hackfests for interested parties. But more importantly, there is a page where developers can read up on how to get started and what the process is as a Sage developer.
Further questions need to be considered. Looking at the Git repository and the developer blogs, ask the following questions:
Who approves patches? How many people? (There are a large number of people responsible for reviewing patches; if I had to guess, it could be 12 in total, based on the most recent changelog.)
Who has commit access? & how many?
Who is involved in the history of the project? (That’s pretty easy to figure out from the Ohloh and Trac websites for Sage)
Who are the principal contributors, and have they changed over time?
Who are the maintainers?
Who is on the front end (user interface) and back end (processing or server side)?
What have been some of the major bugs/problems/issues that have arisen during development? Who is responsible for quality control and bug repair?
How is the project's participation trending and why? (It seems to have stabilized after a big peak of 41 contributors about two years ago; based on the Ohloh graph of commits, peak activity was in 2009 and 2010.)
Note that the period covered by the Gource visualization runs from 2009 onward, while the earliest entry in the Mercurial repository I could find was 2005. Sage was already a going concern before the Mercurial repository was put on the web, so the visualization doesn't show the full history of development.