Blog

  • Intel’s Tri-Gate gamble: It’s now or never • The Register

    Analysis  There are two reasons why Intel is switching to a new process architecture: it can, and it must.

    via Intel’s Tri-Gate gamble: It’s now or never • The Register.

    Nearly every die shrink of a computer processor comes with an attendant evolution of the technology used to produce it. I think back to the relatively recent introduction of ultra-purified water immersion lithography. The goal of immersion lithography was to improve the ability to resolve the fine wire traces of the photomasks as they were exposed onto the photosensitive emulsion coating a silicon wafer. The problem is that the light travels from the photomask to the surface of the wafer through ‘air’. There’s a small gap, and air is full of optically scrambling atoms and molecules that make the projected photomask slightly blurry. If you put a layer of water between the final lens and the wafer, you have in a sense a ‘lens’ made of optically superior water molecules that behave more predictably than ‘air’. The result: better chip yields, higher margins, more profit.
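    A rough sketch of the optics involved, using the Rayleigh criterion. The numbers are illustrative assumptions, not any fab’s actual figures: 193nm ArF light, water’s refractive index of roughly 1.44 at that wavelength, and a made-up k-factor and lens angle:

```python
# Back-of-the-envelope Rayleigh criterion for why water helps.
# Assumed numbers: 193 nm ArF light, k1 ~ 0.3, sin(theta) ~ 0.93,
# refractive index of water at 193 nm ~ 1.44. All illustrative.

def min_feature_nm(wavelength_nm, k1, n_medium, sin_theta):
    """Smallest printable half-pitch: CD = k1 * lambda / (n * sin(theta))."""
    numerical_aperture = n_medium * sin_theta
    return k1 * wavelength_nm / numerical_aperture

dry = min_feature_nm(193, 0.3, 1.00, 0.93)   # air between lens and wafer
wet = min_feature_nm(193, 0.3, 1.44, 0.93)   # water between lens and wafer

print(f"dry: {dry:.1f} nm, immersion: {wet:.1f} nm")
```

    With these assumed inputs, immersion buys you the water’s refractive index as a straight multiplier on numerical aperture, and therefore ~1.4x finer features from the same light source.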

    As the wire traces on microchips continue to get thinner and the transistors smaller, the physics involved get harder to control. Electrodynamics begin to follow the laws of quantum electrodynamics rather than Maxwell’s equations. That makes it harder to tell when a transistor has switched on or off, and the basic digits of the digital computer (1s and 0s) become harder and harder to measure and register properly. IBM and Intel waged a die-shrinking war all through the ’80s and ’90s. IBM chose to adopt new, sometimes exotic materials (copper traces instead of aluminum, silicon-on-insulator, high-k dielectric gates). Intel chose to improve what it had, using higher-energy light sources and adopting brand-new processes only when absolutely, positively necessary. At the same time, Intel was cranking out such volumes of current-generation product it almost seemed as though it didn’t need to innovate at all. But IBM kept Intel honest, as did Taiwan Semiconductor Manufacturing Co. (a contract manufacturer of microprocessors). And Intel continued to maintain its volume and technological advantage.

    ARM (formerly the Acorn RISC Machine) became a CPU manufacturer during the golden age of RISC computers (the early and mid-1980s). Over time it got out of manufacturing and started selling its processor designs to anyone who wanted to embed a core microprocessor into a bigger chip design. Eventually ARM became the de facto standard microprocessor for smart handheld devices and telephones, and Intel had to react. Intel had come up with a market-leading, low-voltage, cheap CPU in the Atom processor. But it did not have the specialized knowledge and capability ARM had with embedded CPUs. Licensees of ARM designs began cranking out newer generations of higher-performance, lower-power CPUs than Intel’s research labs could create, and the stage was set for a battle royale of low power versus high performance.

    Which brings us to the current attempt to continue scaling down processor power requirements through the same brute force that worked in the past. Moore’s Law, an epigram attributed to Intel’s Gordon Moore, held that the rate at which the ‘industry’ scaled down the ‘wires’ in silicon chips would keep increasing speed and lowering costs. Speeds would double, prices would halve, and this would continue ad infinitum into some distant future. The problem has always been that the future is now. Intel hit a brick wall around the end of the Pentium 4 era, when it couldn’t get speeds to double anymore without also doubling the amount of waste heat coming off the chip. That heat was harder and harder to remove efficiently, and soon it appeared the chips would create so much heat they might melt. Intel worked around this by putting multiple CPU cores on the same size silicon die it used for previous-generation chips and got some amount of performance scaling to work. Along those lines it has run research projects to create first an 80-core processor, then a 48-core and now a 24-core processor (which might actually turn into a shippable product).

    But what about Moore’s Law? Well, the scaling has continued downward, and power requirements have improved, but it’s getting harder and harder to shave down those little wire traces and get the bang that drives profits for Intel. Now Intel is going the full-on research and development route by adopting a new way of making transistors on silicon. It’s called a Fin Field Effect Transistor, or FinFET. Instead of the gate touching only the top surface of the channel, the gate wraps over the top and both sides of a raised fin, effectively giving you 3x the surface through which to control the electrons moving around the processor. If Intel can get this to work on a modern-day silicon chip production line, it will be able to keep differentiating its product, keep its costs manageable and sell more chips. But it’s a big risk, and a bet I’m sure everyone hopes will pay off.
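    The ‘3x the surface’ idea can be sketched with some made-up geometry: a planar gate controls the channel only across its top width, while a tri-gate/FinFET gate wraps the top and both sides of the fin. The dimensions below are invented for illustration, not Intel’s actual geometry:

```python
# Rough sketch of the tri-gate idea: a planar transistor's gate touches the
# channel only on top (width W), while a FinFET gate wraps over the top and
# both sides of a fin, so effective width ~ W_top + 2 * fin_height.
# Numbers below are made up for illustration.

def effective_gate_width(top_nm, fin_height_nm=0.0):
    """Channel width under gate control; fin_height > 0 models a FinFET."""
    return top_nm + 2.0 * fin_height_nm

planar = effective_gate_width(20.0)                      # top surface only
finfet = effective_gate_width(20.0, fin_height_nm=20.0)  # top + two sides

print(f"planar: {planar} nm, finfet: {finfet} nm, gain: {finfet / planar:.1f}x")
```

    With a fin as tall as the gate is wide, the gate gets three equal faces on the channel, which is where the ‘3x’ shorthand comes from.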

  • SPDY: An experimental protocol for a faster web – The Chromium Projects

    As part of the “Let’s make the web faster” initiative, we are experimenting with alternative protocols to help reduce the latency of web pages. One of these experiments is SPDY (pronounced “SPeeDY”), an application-layer protocol for transporting content over the web, designed specifically for minimal latency.  In addition to a specification of the protocol, we have developed a SPDY-enabled Google Chrome browser and open-source web server. In lab tests, we have compared the performance of these applications over HTTP and SPDY, and have observed up to 64% reductions in page load times in SPDY. We hope to engage the open source community to contribute ideas, feedback, code, and test results, to make SPDY the next-generation application protocol for a faster web.

    via SPDY: An experimental protocol for a faster web – The Chromium Projects.

    Google wants the World Wide Web to go faster. I think we all would like that as well. But what kind of heavy lifting is it going to take? The transition from ARPANET to the TCP/IP protocol took a very long time and required some heavy-handed shoving to accomplish the cutover in 1983. We can all thank Vint Cerf for making that happen so that we could continue to grow and evolve as an online species (tip of the hat). But now what? There’s been a move from IP version 4 to version 6 to accommodate the increase in the number of network devices. Speed really wasn’t a consideration in that revision. I don’t know how this project integrates with IPv6, but I hope it can be pursued on a parallel course with that big migration.

    The worst thing that could happen would be to create another Facebook/Twitter/App Store/Google/AOL cul-de-sac that only benefits account holders loyal to Google. Yes, it would be nice if Google Docs and all the other attendant services provided through Google got on board the SPDY accelerator train. I would stand to benefit, but things like this should be pushed further up into the wider Internet so that everyone, everywhere gets the same benefits. Otherwise this is an attempt to steal away user accounts and create churn in competitors’ account databases.

  • Cloud on a chip: Sometimes the best hypervisor is none at all   • The Register

    On the cloud front, one of the more interesting projects that Held is working on is called the Single-chip Cloud Computer, or SCC for short.

    via Cloud on a chip: Sometimes the best hypervisor is none at all   • The Register.

    The Single-chip Cloud Computer sounds a lot like the 80-core and 48-core CPU experiments Intel had been working on a while back. There is a note that the core is a Pentium P54C, and that rings a bell too, as it was the same core used for those multi-core CPUs. Now the research appears to be centered on the communications links between those cores and getting an optimal amount of work out of a given amount of interconnect. Twenty-four cores is a big step down from 80 and 48 cores. I’m thinking Intel’s manufacturing process engineers are attempting to rein in the scope of this research to make it more worthy of manufacture. Whatever happens, you will likely see adaptations or bits and pieces of these technologies in a future shipping product. I’m a little disappointed, though, that the scope has grown smaller. I had really high hopes Intel could pull off a big technological breakthrough with an 80-core CPU, but change comes slowly, and chip fab lines are incredibly expensive to build, pilot and line out as they make new products. Conservatism is to be expected in an industry with the highest level of up-front capital expenditure required before there’s a return on the investment. If nothing else, companies like SeaMicro, Tilera and ARM will continue to goose Intel into research efforts like this and into innovating its old serial processors a little bit more.

    On the other side of the argument there is the massive virtualization of OSes on more typical serial-style multi-core CPUs from Intel. VMware and its competitors still slice out clock cycles of the Intel processor to make it appear to be more than one physical machine. Data centers have found the performance compromises of this scheme well worth the effort in staff and software licenses, given the amount of space saved through consolidation: the less rack space and power required, the higher the marginal return on each computer host sitting on the network. But what this article from The Register is trying to say is that if a sufficiently dense multi-core CPU is used and the power requirements are scaled down sufficiently, you get the same kind of consolidation of rack space, but without the layer of software on top of it all to provide the virtualized computers themselves. A one-to-one relationship between CPU core and virtual machine can be had without the typical machinations and complications of a hypervisor-style OS riding herd over the virtualized computers. In that case, less hypervisor is more. More robust, that is, in terms of total compute cycles devoted to hosts, and a more robust design architecture that minimizes single points of failure and choke points. So I say there’s plenty of room yet to innovate in the virtualization industry, given that CPUs and their architectures are at an early stage of massively multi-core innovation.

  • Stop Blaming the Customers – the Fault is on Amazon Web Services – ReadWriteCloud

    Almost as galling as the Amazon Web Services outage itself is the litany of blog posts, such as this one and this one, that place the blame not on AWS for having a long failure and not communicating with its customers about it, but on AWS customers for not being better prepared for an outage.

    via Stop Blaming the Customers – the Fault is on Amazon Web Services – ReadWriteCloud.

    As Klint Finley points out in his article, everyone seems to be blaming the folks who ponied up money to host their websites/webapps on the Amazon data center cloud. Until the outage, I was not really aware of the ins and outs, workflow and configuration required to run something on Amazon’s infrastructure. I am small-scale, small potatoes, mostly relying on free services, which when they work are great, and when they don’t work, meh! I can take them or leave them; my livelihood doesn’t depend on them (thank goodness). But those who do depend on uptime and pay money for it need some greater level of understanding from their service provider.

    Amazon doesn’t make it explicit enough how to follow best practices in configuring a website installation using its services. It appears some businesses had no outages (though they didn’t follow best practices), while some folks had long outages even though they had set everything up ‘by the book’. The services at the center of the outage were the Relational Database Service (RDS) and Elastic Block Store (EBS). Many websites use databases to hold the contents of the website, collect data and transaction information, collect metadata about users’ likes/dislikes, etc. Elastic Block Store acts as the container for the data in the RDS. If you have things set up correctly, when your website goes down it fails gracefully: you have duplicate RDS and EBS containers in the Amazon data center cloud that take over and keep responding to people clicking on things and typing in information on your website, instead of throwing up error messages or not responding at all (in a word, it just magically keeps working). However, if you don’t follow the “guidelines” as specified by Amazon, all bets are off, and you wasted money paying double for the more robust, fault-tolerant failover service.
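    As a sketch of the knob in question: the parameter names below follow the shape of Amazon’s RDS CreateDBInstance API, but treat the helper, the instance name and the values as hypothetical illustration, not a deployment script:

```python
# Hypothetical sketch of the failover setup described above. Parameter
# names follow Amazon's RDS CreateDBInstance API; everything else here
# (the helper, values, instance name) is made up for illustration.

def rds_create_params(instance_id, multi_az=True):
    """Build the request a client would send to create an RDS instance.
    MultiAZ=True asks Amazon to keep a synchronized standby in a second
    Availability Zone and fail over to it automatically."""
    return {
        "DBInstanceIdentifier": instance_id,
        "Engine": "mysql",
        "DBInstanceClass": "db.m1.small",
        "AllocatedStorage": 20,   # GB, backed by EBS volumes
        "MultiAZ": multi_az,      # the 'pay double for failover' knob
    }

params = rds_create_params("shop-db")
print(params["MultiAZ"])
```

    The point of the post is exactly this flag: paying for it was supposed to be the ‘by the book’ insurance policy, and for some customers it still wasn’t enough.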

    Most people don’t care about this, especially if they weren’t affected by the outages. But the business owners who suffered, and the customers to whom they are liable, definitely do. So if the entrepreneurial spirit bites you and you’re very interested in online commerce, always be aware: nothing is free, and especially nothing is free if you pay for it and don’t get what you paid for. I would hope a leading online commerce company like Amazon could do a better job and in the future make good on its promises.

  • Viking Modular plugs flash chips into memory sockets • The Register

    What a brilliant idea: put flash chips into memory sockets. That’s what Viking Modular is doing with its SATADIMM product.

    via Viking Modular plugs flash chips into memory sockets • The Register.

    This sounds like an interesting evolution of SSD-style storage. But I don’t know if there is a big advantage in forcing a RAM memory controller to be the bridge to a flash memory controller. In terms of bandwidth, the speed seems comparable to a 4x PCIe interface. I’m thinking now of how it might compare to PCIe-based SSDs from OCZ or Fusion-io. It seems the advantage is still held by PCIe in terms of total bandwidth and capacity (above 500MB/sec and 2 terabytes of total storage). The SATADIMM may be slightly lower cost, but its use of single-level cell (SLC) flash memory chips raises the cost considerably for any given amount of storage. I think if this product ships, it will not compete very well against consumer-level SSDs, PCIe SSDs, etc. However, if Viking continues to develop and evolve the product, there might be a niche where it can be performance- or price-competitive.
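    For a rough sense of the comparison, here are nominal, pre-overhead line rates for the interfaces involved, circa 2011. The figures are approximate rules of thumb, not measured product numbers:

```python
# Rough bandwidth arithmetic behind the comparison above. Figures are
# nominal line rates circa 2011, before protocol overhead; approximate.

interfaces_mb_s = {
    "SATA 3Gb/s":       300,   # ~3 Gb/s less 8b/10b encoding
    "SATA 6Gb/s":       600,
    "PCIe 2.0 x4":     2000,   # ~500 MB/s per lane * 4 lanes
    "DDR3-1333 DIMM": 10666,   # the kind of socket a SATADIMM occupies
}

for name, mb_s in sorted(interfaces_mb_s.items(), key=lambda kv: kv[1]):
    print(f"{name:>15}: {mb_s:>6} MB/s")
```

    On these numbers a 4-lane PCIe 2.0 card has several times the headroom of a single SATA 6Gb/s link, which is why the PCIe SSDs keep the total-bandwidth crown.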

  • Facebook: No ‘definite plans’ to ARM data centers • The Register

    Clearly, ARM and Tilera are a potential threat to Intel’s server business. But it should be noted that even Google has called for caution when it comes to massively multicore systems. In a paper published in IEEE Micro last year, Google senior vice president of operations Urs Hölzle said that chips that spread workloads across more energy-efficient but slower cores may not be preferable to processors with faster but power-hungry cores.

    “So why doesn’t everyone want wimpy-core systems?” Hölzle writes. “Because in many corners of the real world, they’re prohibited by law – Amdahl’s law.

    via Facebook: No ‘definite plans’ to ARM data centers • The Register.

    The explanation given here by Google’s top systems person is a trade-off between latency and parallel-processing overhead. If you have to do all the steps in order, a small number of fast cores running few parallel tasks yields much higher performance, and that is the measure all the users of your service will judge you by. Making things massively parallel might provide the same level of response at a lower energy cost, but the communication and processing overhead of assembling all the data and sending it over the wire will offset any advantage in power efficiency. In other words, everything takes longer, latency increases, and users will deem your service slow and unresponsive. That’s the dilemma of Amdahl’s Law: the point of diminishing returns when adopting parallel computer architectures.
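    The law Hölzle is invoking is simple enough to state in a few lines: with a fraction p of the work parallelizable across n cores, the best-case speedup is 1 / ((1 − p) + p/n), so the serial fraction puts a hard ceiling on what wimpy cores can recover:

```python
# Amdahl's Law: with parallel fraction p spread across n cores,
# speedup = 1 / ((1 - p) + p / n). The serial fraction (1 - p)
# caps the speedup at 1 / (1 - p) no matter how many cores you add.

def amdahl_speedup(p, n):
    """Best-case speedup on n cores when fraction p of the work is parallel."""
    return 1.0 / ((1.0 - p) + p / n)

# Even a workload that is 90% parallel tops out below 10x, no matter
# how many slow-but-efficient cores you throw at it.
for n in (2, 8, 64, 1024):
    print(f"{n:>5} cores: {amdahl_speedup(0.9, n):.2f}x")
```

    That ceiling is why a pile of wimpy cores can look great on a power budget and still lose on the latency users actually experience.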

    Now compare this to something we know much more concretely, like the airline industry. As the cost of tickets came down, the attempts to cut costs went up. Schedules for landings and gate assignments got more complicated, and service levels have suffered terribly. No one is really all that happy with the service they get, even from the best airline currently operating. So maybe Amdahl’s Law doesn’t bite as hard when there’s a false ceiling placed on what counts as acceptable latency for a ‘system’. If airlines are not on time, but you still make your connection 99% of the time, who will complain? By way of comparison, there may be a middle ground where further parallelizing compute tasks lowers the energy required by a data center. It will mean greater latency and a worse experience for the users. But if everyone suffers equally and the service is not great but adequate, then the company will be able to cut costs by putting more parallel processors in its data centers.

    I think Tilera holds a special attraction for Facebook, especially since Quanta, its hardware assembler of choice, is already putting together computers with the Tilera chip for customers now. This chain of associations might prove a way for Facebook to test the waters on a scale large enough to figure out the costs/benefits of massively parallel CPUs in the data center. Maybe it will take the build-out of another new data center to get there, but no doubt it will happen eventually.

  • Data hand tools – O’Reilly Radar

    Whenever you need to work with data, don’t overlook the Unix “hand tools.” Sure, everything I’ve done here could be done with Excel or some other fancy tool like R or Mathematica. Those tools are all great, but if your data is living in the cloud, using these tools is possible, but painful. Yes, we have remote desktops, but remote desktops across the Internet, even with modern high-speed networking, are far from comfortable. Your problem may be too large to use the hand tools for final analysis, but they’re great for initial explorations. Once you get used to working on the Unix command line, you’ll find that it’s often faster than the alternatives. And the more you use these tools, the more fluent you’ll become.

    via Data hand tools – O’Reilly Radar.

    This is a great remedial refresher on the Unix command line, and for me it reinforces an idea I’ve had that when it comes to computing, We Live Like Kings. What? How is that possible? Well, thinking about what you are trying to accomplish and finding the least complicated, quickest way to that point is a dying art. More often one is forced, or at least highly encouraged, to set out on a journey with very well-defined protocols/rituals included. You must use the APIs, the tools, the methods specified by your group. Things falling outside that orthodoxy are frowned upon no matter the speed and accuracy of the result. So doing it quick and dirty with some shell scripting and utilities is going to be embarrassing to those unfamiliar with those same tools.

    My own experience doing this involved a very low-end attempt to split web access logs into nice neat pieces that began and ended on certain dates. I used grep, split, and a bunch of binaries I borrowed for doing log analysis and formatting the output into a web report. Overall it didn’t take much time and required very little downloading, uploading, uncompressing, etc. It was all command-line based, with all the output dumped to a directory on the same machine. I probably spent 20 minutes every Sunday running these by hand (as I’m not a cron job master, much less an at job master). And none of the work I did was mission-critical, other than being a barometer of how much use the websites were getting. I realize now I could have automated the whole works with variables set up in the shell script to accommodate running on different days of the week, time changes, etc. But editing the scripts by hand in the vi editor only made me quicker and more proficient in vi (which I still gravitate toward even now).

    And as low-end as my needs were, and as little experience as I had initially using these tools, I am grateful for the time I spent doing it. I feel so much more comfortable knowing I can figure out how to do these tasks on my own, pipe outputs into inputs for other utilities and get useful results. I think I understand it, though I’m not a programmer and couldn’t really leverage higher-level things like data structures to get work done. I’m a brute-force kind of guy, and given how fast CPUs run, a few ugly, inefficient passes over the data aren’t going to kill me or my reputation. So here’s to Mike Loukides’ article and how much it reminds me of what I like about Unix.

  • Toshiba unwraps 24nm flash memory in possible iPhone 5 clue | Electronista

    The schedules may help back mounting beliefs that the iPhone 5 will have 64GB of storage. A 64GB iPhone 4 prototype appeared last month that hinted Apple was exploring the idea as early as last year. Just on Tuesday, a possible if disputed iPod touch with 128GB of storage also appeared and hinted at an upgrade for the MP3 player as well. Both the iPhone and the iPod have been stuck at 32GB and 64GB of storage respectively since 2009 and are increasingly overdue for additional space.

    via Toshiba unwraps 24nm flash memory in possible iPhone 5 clue | Electronista.

    Toshiba has revised its flash memory production lines again to keep pace with the likes of Intel, Micron and Samsung. Higher densities and smaller form factors seem to indicate it is gearing up for a big production run of the highest-capacity memory modules it can make. It’s looking like a new iPhone might be the candidate to receive the newer multi-layer, single-chip 64GB flash memory modules this year.

    A note of caution in this arms race of ever-smaller feature sizes on flash memory modules: the smaller you go, the fewer read/write cycles you get. I’m becoming aware that each new generation of flash memory production has lost some amount of robustness. This problem has been camouflaged, maybe even handled outright, by the increase in over-provisioning of chips on a given-size solid-state disk (sometimes as low as 17% more chips than are typically available when the drive is full). Through careful statistical modeling and use of algorithms, an ideal shuffling of the deck of available flash memory cells allows the load to be spread out. No single chip fails, as its workload is shifted continuously to ensure it never receives anywhere near its maximum number of reliable read/write cycles. Similarly, attempts to ‘recover’ data from failing memory cells within a chip module are also making up for these problems. Last but not least, outright error-correcting hardware has been implemented on chip to ensure everything just works, from the beginning of the life of the solid-state disk (SSD) to the final days of its useful life.
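    That ‘shuffling of the deck’ can be sketched as a toy wear-leveling loop. The block count and write count are made-up illustrative numbers, and real controllers are far more sophisticated, but the idea is the same: writes always go to the least-worn block, so no one block races toward its erase limit:

```python
# Toy wear-leveling sketch: direct every write to the currently
# least-worn block so wear stays nearly uniform across the drive.
# Block and write counts are made up for illustration.

def wear_level_writes(num_blocks, num_writes):
    """Send each write to the least-worn block; return per-block wear counts."""
    wear = [0] * num_blocks
    for _ in range(num_writes):
        coldest = wear.index(min(wear))   # least-worn block gets the write
        wear[coldest] += 1
    return wear

wear = wear_level_writes(num_blocks=16, num_writes=10_000)
print(f"max wear {max(wear)}, min wear {min(wear)}")   # stays nearly uniform
```

    Without this policy, a naive controller rewriting one hot block would burn through that block’s cycle budget while the rest of the flash sat fresh.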

    We may not see the SSD eclipse the venerable king of high-density storage, the hard disk drive (HDD). Given the point of diminishing returns on Moore’s Law (scaling down increases density, increases speed, lowers costs), flash may never reach the capacity per dollar we enjoy in a typical consumer-brand HDD (2TB). We may have to settle for schemes that get us to that target through other means. Which brings me to my favorite product of the moment, the PCIe-based SSD: nothing more than a big circuit board with a bunch of SSDs tied together in a disk array, with a big, fat memory controller/error-correction controller sitting on it. In terms of speed over the PCI Express bus, there are current products that beat single SATA 6Gb/s SSDs by a factor of two. And given the allowances of PCIe, the form factor of a given module could be several times bigger, and use chips two generations older, to reach the desired 2-terabyte capacity of a typical SATA hard drive of today. That sounds like a great deal to me if we could also see drops in price and increases in reliability by using older, previous-generation products and technology.

    But the mobile market is hard to please, and it drives most decisions about what kinds of flash memory modules get ordered en masse. No doubt Apple, Samsung and everyone else in consumer electronics will advise manufacturers to consistently shrink their chip sizes to increase density and keep prices up on the final shipping product. I don’t know how efficiently an iPhone or iPad uses the available memory on, say, a 64GB iPod touch. Most of it goes into storing the music, TV shows and apps people want readily available while passing time. The beauty of that design is that it rewards consumption, providing more capacity and raising marginal profit at the same time. This engine of consumer electronics design doesn’t look likely to stall in spite of the physical limitations of shrinking flash memory chips. But there will be a day of reckoning soon, not unlike when Intel hit the wall at 4GHz serial processors and had to go multi-core to keep its marginal revenue flowing. Progress in processor performance has been very lateral since then. It is more than likely flash memory chips cannot get much smaller without becoming really unreliable and defective, thereby sliding into the same lateral incrementalism Intel has adopted. Get ready for the plateau.

  • Bye, Flip. We’ll Miss You | Epicenter | Wired.com

    Cisco killed off the much-beloved Flip video camera Tuesday. It was an unglamorous end for a cool device that just a few years earlier shocked us all by coming to dominate the video-camera market, utterly routing established players like Sony and Canon.

    via Bye, Flip. We’ll Miss You | Epicenter | Wired.com.

    I don’t usually write about consumer electronics per se. This particular product got my attention due to its long gestation and overwhelming domination of a market category that didn’t exist until the Flip created it: the pocket video camera with a built-in flip-out USB connector. Like a USB flash drive with an LCD screen, a lens and one big red button, the Flip pared everything down to the absolute essentials, including the absolute immediacy of online video sharing via YouTube and Facebook. Now the revolution has ended, devices have converged, and many are telling the story of why(?) this happened. Wired.com’s Robert Capps claims Flip lost its way after Cisco fumbled the Flip 2 revision, trying to get a WiFi-connected camera out there for people to record their ‘lifestream’.

    Prior to Robert Capps, different writers for different pubs all parroted the conclusion of Cisco’s own media relations folks: Cisco’s Flip camera was the victim of inevitable convergence, pure and simple. Smartphones, in particular Apple’s iPhone, kept adding features once available only on the Flip. Easy recording, easy sharing, larger resolution, bigger LCD screen, and it could play Angry Birds too! I don’t cotton to that conclusion as fed to us by Cisco. It’s too convenient, and the convergence myth does not account for the one thing the Flip has that the iPhone doesn’t have, has never had and WILL never have: a simple, industry-standard connector. Yes folks, convergence is not simply lifting cherry-picked features from one device and incorporating them into yours. True convergence is picking up all that is BEST about one device and incorporating it, so that fewer and fewer compromises must be made. Which brings me to the issue of the Apple multi-pin connector that has been with us since the early days of the iPod.

    See, the Flip didn’t have a proprietary connector; it just had a big old ugly USB connector, just as big and ugly as the one your mouse and keyboard use to connect to your desktop computer. The beauty of that choice was that the Flip could connect to just about any computer manufactured after 1998 (when USB first hit the market). The second thing was that all the apps for playing back the videos you shot, or for cutting them down and editing them, were sitting on the Flip itself, just like on a hard drive, waiting for you to install them on whichever random computer you wanted to use. It didn’t matter whether the computer already had the software installed; it COULD be installed directly from the Flip itself. Isn’t that slick?! You didn’t have to search for the software online, download it and install it. It was right there: just double-click and go.

    Compare this to the Apple iOS cul-de-sac we all know as iTunes. Your iPhone, iPod touch, iPad or iPod doesn’t get to know your computer simply by communicating through a USB connector. You must first have iTunes installed AND have your proprietary Apple-to-USB cable to link up. Then and only then can your device ‘see’ your computer and the Internet. This gated community provided through iTunes allows Apple to see what you are doing, market directly to you and watch as you connect to YouTube to upload your video, all with the intention of one day acting on that information, maintaining full control at each step along the pathway from shooting to sharing your video. If this is convergence, I’ll keep my old Flip Mino (non-HD), thank you very much. Freedom (as in choice) is a wonderful thing, and compromising it in the name of convergence (misrecognized as convenience) is no compromise at all. It is a racket, and everyone wants to sell you on the ‘good’ points of the racket. I am not buying it.

  • links for 2011-04-08

    • Great example of using Unix command-line utilities to do useful data reduction on structured text files. Need to read this later.