Calxeda is in the news again this week with some more announcements regarding its plans. Thinking back to the last article I posted on Calxeda, this company boasts an ARM-based server packing 120 CPUs (each with four cores) into a 2U rack enclosure (making it just 3-1/2″ tall *see note). With every evolution in hardware there needs to be an equal if not greater revolution in software, which is the point of Calxeda’s announcement of its new software partners.
It’s mostly cloud apps, cloud provisioning and cloud management types of vendors. Through the partnership, each company gets early access to the hardware Calxeda is promising to design, prototype and eventually manufacture. Both Google and Intel have pooh-poohed the idea of using “wimpy processors” on massively parallel workloads, claiming faster serialized workloads are still easier to manage with existing software and programming techniques. Yet for many years, even as Intel has complained about the programming tools, it has gone the multi-core/multi-threaded route itself, hoping to continue its dominance by offering up ‘newer’ and higher-performing products. So while Intel bad-mouths parallelism on competing CPUs, it seems desperate to sell multi-core to willing customers year over year.
Power efficient as those cores may be, Intel’s old culture of maximum performance for the money still holds sway. Even the most recent ultra-low-voltage i-series CPUs still draw about 17 watts for chips clocking in around 1.8GHz (with speed boosts up to 2.9GHz in a pinch). Even if Intel pushed these chips into servers, we’re still talking about a lot of Thermal Design Power (TDP) that has to be chilled to keep everything running.
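Just as a back-of-the-envelope comparison (my own numbers, not anything Calxeda or Intel have published; the 5-watt-per-node figure is purely an assumption for illustration), the heat math looks something like this:

```python
# Rough heat budget for a hypothetical 2U box of 120 nodes.
# The 17 W figure comes from the ULV i-series chips mentioned above;
# the 5 W per ARM node is an assumption for illustration only.
NODES_PER_2U = 120
ULV_X86_TDP_W = 17
ARM_NODE_W = 5  # assumed

print(f"120 ULV x86 chips: {NODES_PER_2U * ULV_X86_TDP_W} W of heat per 2U")
print(f"120 ARM nodes:     {NODES_PER_2U * ARM_NODE_W} W of heat per 2U (assumed)")
```

Either way that is a lot of heat packed into 3-1/2″ of rack space, which is why the per-chip wattage matters so much.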
Nearly every time there’s a die shrink of a computer processor there’s an attendant evolution in the technology used to produce it. I think back to the introduction of immersion lithography using ultra-purified water. The goal of immersion lithography was to improve the ability to resolve the fine wire traces of the photomasks as they were exposed onto the photosensitive emulsion coating a silicon wafer. The problem is that the light has to travel from the final lens to the surface of the wafer through ‘air’, and air’s low index of refraction limits how finely the optics can focus the image of the mask. If you put a layer of ultra-pure water in that gap, you have in a sense an extra ‘lens’: water’s higher index of refraction lets the projection optics resolve finer lines than ‘air’ ever could. The result is smaller features, better chip yields, more profit, higher margins and so on.
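The optics behind that trick boil down to the textbook Rayleigh resolution criterion (a standard formula I’m adding here for reference, not something from Intel’s or IBM’s literature):

\[ \text{minimum half-pitch} \;=\; k_1\,\frac{\lambda}{NA}, \qquad NA \;=\; n\,\sin\theta \]

With the same 193nm light source, replacing the air in that gap (n ≈ 1.0) with ultra-pure water (n ≈ 1.44) raises the numerical aperture NA, which shrinks the smallest feature you can print.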
As the wire traces on microchips get thinner and transistors smaller, the physics involved become harder to control. Electrons begin to follow the rules of quantum electrodynamics rather than Maxwell’s classical equations. That makes it harder to tell when a transistor has switched on or off, and the basic digits of the digital computer (1s and 0s) become harder and harder to measure and register properly. IBM and Intel waged a war of die shrinks all through the ’80s and ’90s. IBM chose to adopt new, sometimes exotic materials (copper traces instead of aluminum, silicon-on-insulator, high-k dielectric gates). Intel chose to improve what it had, using higher-energy light sources and only adopting brand-new processes when absolutely, positively necessary. At the same time, Intel was cranking out such volumes of current-generation product that it almost seemed as though it didn’t need to innovate at all. But IBM kept Intel honest, as did Taiwan Semiconductor Manufacturing Co. (the big contract manufacturer of microprocessors). And Intel continued to maintain its volume and technological advantage.
ARM (originally the Acorn RISC Machine) got its start as a CPU design during the golden age of RISC computers (the early and mid-1980s). Over time the company got out of making chips itself and started licensing its processor designs to anyone who wanted to embed an ARM core into a bigger chip design. Eventually ARM became the de facto standard microprocessor for smart handheld devices and telephones before Intel had to react. Intel did come up with a market-leading low-voltage, cheap CPU in the Atom processor, but it did not have the specialized knowledge and experience ARM had with embedded CPUs. Licensees of ARM designs began cranking out new generations of higher-performance, lower-power CPUs faster than Intel’s research labs could respond, and the stage was set for a battle royale of low power versus high performance.
Which brings us to the attempt to keep scaling down processor power requirements through the same brute force that worked in the past. Moore’s Law, the famous observation from Intel’s Gordon Moore, held that the industry would keep shrinking the ‘wires’ in silicon chips, doubling the number of transistors on a chip roughly every two years while increasing speed and lowering cost per transistor, and that this would continue ad infinitum into some distant future. The problem has always been that the future is now. Intel hit a brick wall around the end of the Pentium 4 era when it couldn’t get clock speeds to double anymore without also doubling the amount of waste heat coming off the chip. That heat was harder and harder to remove efficiently, and soon it appeared the chips would create so much heat they might melt. Intel worked around this by putting multiple CPU cores on the same die, using the same manufacturing processes as previous-generation chips, and got some amount of performance scaling to work. Along those lines it has run research projects to create first an 80-core processor, then a 48-core and now a 24-core processor (which might actually turn into a shippable product). But what about Moore’s Law? Well, the scaling has continued downward and power requirements have improved, but it’s getting harder and harder to shave down those little wire traces and get the bang that drives profits for Intel. Now Intel is going the full-on research and development route by adopting a new way of building transistors on silicon. It’s called a Fin Field Effect Transistor, or FinFET, and instead of a flat channel it uses a raised ‘fin’ whose top and both sides are wrapped by the gate, effectively giving roughly three times the surface area for controlling the flow of electrons through the transistor. If Intel can get this to work on a modern-day silicon chip production line, it will be able to keep differentiating its product, keep its costs manageable and sell more chips. But it’s a big risk and a bet I’m sure everyone hopes will pay off.
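As a footnote to the Moore’s Law hand-waving above, the usual back-of-the-envelope form of the observation (my paraphrase, not Intel’s wording) is simply:

\[ N(t) \;\approx\; N_0 \cdot 2^{\,t/2} \]

transistors per chip after t years, i.e. a doubling roughly every two years. Note that it says nothing about clock speed or waste heat, which is exactly where the brick wall showed up.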
The Single-chip Cloud Computer sounds a lot like those 80-core and 48-core CPU experiments that Intel was working on a while back. There is a note that the core is the Pentium P54C, and that rings a bell too: it was the same core used for those earlier multi-core experiments. Now the research appears to be centered on the communications links between the cores and getting an optimal amount of work out of a given amount of interconnect. Twenty-four cores is a big step down from 80 and 48. I’m thinking Intel’s manufacturing process engineers are trying to rein in the scope of this research to make it more worthy of manufacture. Whatever happens, you will likely see adaptations or bits and pieces of these technologies in a future shipping product. I’m a little disappointed, though, that the scope has grown smaller. I had really high hopes Intel could pull off a big technological breakthrough with an 80-core CPU, but change comes slowly, and chip fab lines are incredibly expensive to build, pilot and line out as they ramp new products. Conservatism is to be expected in an industry with the highest level of up-front capital expenditure before there’s any return on investment. If nothing else, companies like SeaMicro, Tilera and ARM will continue to goose Intel into research efforts like this and into innovating its old serial processors a little bit more.
On the other side of the argument there is the massive virtualization of OSes on more typical serial-style multi-core CPUs from Intel. VMware and its competitors continue to slice the Intel processor’s clock cycles so that one physical machine appears to be many. Datacenters have found the performance compromises of this scheme to be well worth the cost in staff and software licenses, given the amount of space saved through consolidation: the less rack space and power required, the higher the marginal return on that one computer host sitting on the network. But what this article from The Register is trying to say is that if a sufficiently dense multi-core CPU is used and the power requirements are scaled down enough, you get the same kind of rack-space consolidation without the layer of software on top of it all to provide the virtualized computers. A one-to-one relationship between CPU core and virtual machine can be had without the usual machinations and complications of a hypervisor-style OS riding herd over the virtualized computers. In that case, less hypervisor is more. More robust, that is, in terms of total compute cycles devoted to hosts, and a more robust architecture with fewer single points of failure and choke points. So I say there’s plenty of room to innovate yet in the virtualization industry, given that CPU architectures are still at an early stage of going massively multi-core.
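To make that one-core-per-host idea concrete, here’s a minimal sketch of my own (Linux-only, not anything from The Register’s article or Calxeda’s software) showing how a workload can be pinned to a single core so it never competes with its neighbors the way hypervisor-scheduled guests do:

```python
import os
from multiprocessing import Process

def serve(core_id: int) -> None:
    # Pin this worker to exactly one core: a crude one-to-one mapping
    # between "host" and physical core, with no hypervisor scheduler in between.
    os.sched_setaffinity(0, {core_id})  # Linux-only API
    print(f"worker for core {core_id} now restricted to {os.sched_getaffinity(0)}")
    # ... run that host's actual workload here ...

if __name__ == "__main__":
    workers = [Process(target=serve, args=(c,)) for c in range(os.cpu_count())]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

On a massively multi-core part you would hand each pinned worker its own core outright, which is the whole point of skipping the hypervisor.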
Quick Sync is just awesome. It’s simply the best way to get videos onto your smartphone or tablet. Not only do you get most if not all of the quality of a software based transcode, you get performance that’s better than what high-end discrete GPUs are able to offer. If you do a lot of video transcoding onto portable devices, Sandy Bridge will be worth the upgrade for Quick Sync alone.
For everyone else, Sandy Bridge is easily a no brainer. Unless you already have a high-end Core i7, this is what you’ll want to upgrade to.
Previously in this blog I have recounted stories from Tom’s Hardware and Anandtech.com about the wickedly cool idea of tapping the vast resources inside your GPU while you’re not playing video games. GPU makers like nVidia and AMD both wanted to market their products to people who not only gamed but occasionally ripped video from DVDs and played it back on iPods or other mobile devices. The time sunk into those conversions was made somewhat less painful by the ability to run the process on a dual-core Wintel computer, browsing web pages while re-encoding the video in the background. But to get better speeds one almost always needs to monopolize all the cores on the machine, and free software like HandBrake will happily take advantage of those extra cores, slowing your machine but speeding up the transcode. There was hope that GPUs could accelerate transcoding beyond what was achievable with a multi-core CPU from Intel. Another example is Apple’s widespread adoption of OpenCL as a pipeline for sending the GPU any video rendering or processing that needs to be done in iTunes, QuickTime or the iLife applications. And where I work, we get asked to do a lot of transcoding of video into different formats for customers; usually someone wants a rip from a DVD that they can put on a flash drive and take into a classroom.
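For the record, the software-only route described above is about this simple to script; here is a hedged sketch (the file names and quality setting are my own placeholders) of driving HandBrake’s command-line interface from Python and letting x264 chew up every core it can find:

```python
import subprocess

# CPU-only transcode: the x264 encoder spreads its work across all available cores.
# Input/output paths and the quality value are illustrative placeholders.
subprocess.run(
    [
        "HandBrakeCLI",
        "-i", "classroom_dvd_rip.mkv",  # source file (hypothetical)
        "-o", "classroom.mp4",          # destination file (hypothetical)
        "-e", "x264",                   # software H.264 encoder
        "-q", "20",                     # constant-quality target
    ],
    check=True,
)
```

Fast enough if you have cores to spare and nothing better for them to do, which is exactly the problem Quick Sync is about to solve.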
However, now it appears there is a revolution in speed in the works, with Intel giving you faster transcodes essentially for free. I’m talking about Intel’s new Quick Sync technology, which uses the integrated graphics core as a video transcode accelerator. The transcodes are amazingly fast and, given the speed, trivial for anyone to do, including the casual user. In the past everyone seemed to complain about how slow their computer was, especially for ripping DVDs or transcoding the rips into smaller, more portable formats. Now it takes a few minutes to get an hour of video into the right format. No more blue Monday. Follow the link to the story and analysis from Anandtech.com, where they ran head-to-head comparisons of all the available techniques for re-encoding a Blu-ray release into a smaller .mp4 file encoded as H.264. They compared an Intel four-core CPU (which took the longest and produced pretty good quality) against GPU-accelerated transcodes and against the new Quick Sync technology coming out on the Sandy Bridge generation of Intel Core i7 CPUs. It is wickedly cool how fast these transcodes are, and it will make transcoding trivial compared to the time it takes to actually ‘watch’ the video you spent all that time converting.
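Anandtech tested Quick Sync through off-the-shelf transcoding apps, but for the curious, an ffmpeg build with Quick Sync support can reach the same hardware; here is a rough sketch under that assumption (the file names and bitrate are placeholders of mine):

```python
import subprocess

# Hardware-accelerated transcode via Intel Quick Sync.
# Requires an ffmpeg build with the QSV (Quick Sync Video) encoders enabled;
# the file names and bitrate below are illustrative placeholders.
subprocess.run(
    [
        "ffmpeg",
        "-i", "bluray_rip.mkv",   # source file (hypothetical)
        "-c:v", "h264_qsv",       # Quick Sync H.264 encoder
        "-b:v", "4M",             # target video bitrate (illustrative)
        "-c:a", "copy",           # pass the audio through untouched
        "movie.mp4",
    ],
    check=True,
)
```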
Intel and Achronix – 2 great tastes that taste great together
According to Greg Martin, a spokesman for the FPGA maker, Achronix can compete with Xilinx and Altera because it has, at 1.5GHz in its current Speedster1 line, the fastest such chips on the market. And by moving to Intel’s 22nm technology, the company could have ramped up the clock speed to 3GHz.
That kind of says it all in one sentence, or two sentences in this case. The fastest FPGA on the market is quite an accomplishment unto itself. Putting that FPGA on the world’s most advanced production line and silicon wafer technology is what Andy Grove would have called a 10X effect. FPGAs are reconfigurable processors whose circuits can be re-routed and optimized for different tasks over and over again. This is really beneficial for very small batches of chips where you need a custom design. Among the things they can speed up are heavy math and lookups across very large databases. In the past I was always curious whether one could be used as a general purpose computer that could switch gears and optimize itself for different tasks. I didn’t know whether it would work or be worthwhile, but it really seemed like there was a vast untapped reservoir of power in the FPGA.
Some supercomputer manufacturers have started using FPGAs as special-purpose co-processors and have found immense speed-ups as a result. Oil prospecting companies have also used them to speed up analysis of seismic data and place good bets on dropping a well bore in the right spot. But price has always been a big barrier to entry: as quoted in this article, the cost is around $1,000 per chip, which limits the appeal to buyers for whom price is no object but speed and time matter more. The two big competitors in the field of FPGA manufacturing are Altera and Xilinx, both of which design the chips but have them manufactured overseas. This has left FPGAs as second-class citizens, built with older-generation process technologies on older manufacturing lines; they have always had to make do with whatever capacity they could get, and clock speeds have always lagged as a result.
During the megahertz and gigahertz wars it was not unusual to see chip speeds increase every few months. FPGAs sped up too, but not nearly as fast; I remember seeing 200MHz and 400MHz touted for the top-of-the-line Xilinx and Altera products. With Achronix running at 1.5GHz, things have changed quite a bit. That’s general-purpose-CPU speed in a completely customizable FPGA, which makes the FPGA even more useful. However, as this article points out, instead of going faster many buyers would rather keep the same speed while using less electricity and generating less heat. There’s no better way to do that than to shrink the size of the circuits on the FPGA, and that is the core philosophy of Intel. The two companies have just teamed up to put the Achronix FPGA on the smallest-feature-size production line run by the most optimized, cost-conscious manufacturer of silicon chips bar none.
Another point made in the article is that the market for FPGAs at this level of performance tends to be defense-contract oriented. To maintain the level of security necessary to sell chips into that industry, the chips need to be made in the good ol’ USA, and Intel doesn’t outsource anything when it comes to its top-of-the-line production facilities. Everything is in Oregon, Arizona or New Mexico, and is guaranteed not to have any secret backdoors built in to funnel data to foreign governments.
I would love to see some university research projects start looking at FPGAs again and see whether, as speeds go up and power comes down, there’s a happy medium, some mix of general purpose CPUs and FPGAs, that might help the average Joe working on his desktop, laptop or iPad. All I know is that Intel entering a market will make it more competitive and hopefully lower the barrier to entry for anyone who would really like to get their hands on a useful processor they can customize to their needs.
At an IDF keynote, Intel launched “Tunnel Creek,” a new Atom E600 SoC processor. One particular processor detailed is codenamed “Stellarton,” which consists of the Atom E600 processor paired with an Altera FPGA on a multi-chip package that provides additional flexibility for customers who want to incorporate proprietary I/O or acceleration.
Intel has announced a future product that pairs an Intel Atom processor with an Altera FPGA. Now this is interesting: I just mentioned FPGA (field programmable gate array) chips, and out of the blue Intel has summoned the same kind of chip and married it to a little Atom core on one package. Intel says it could be used as an accelerator of some sort. I’m wondering what specifically they have in mind (something very esoteric and niche like a TCP/IP offload engine?). I would like to see some touting of its possible uses and not just, “We want to see what happens.” Unfortunately, the way competition works in consumer electronics, you never tell people what’s inside. You let folks like iFixit do a teardown and put the pictures up. You let industry websites research all the chips and what they cost, estimate the ones that are custom integrated circuits, and report the cost to manufacture the device. That’s what they do with every Apple iPhone these days.
It would be cool if Intel could also sell this as a development kit for Stellarton users. Keep the price high enough to discourage people from releasing products based on just the kit’s CPU, but low enough to get people to try out some interesting projects. I’m guessing it would be a great tool for video transcoding, muxing/demuxing video streams, and so on. If anyone does release a shipping product, though, it would be cool if they put a “Stellarton Inside” logo on it, so we know an FPGA is doing the heavy lifting. The other possibility Intel mentions is using the FPGA for proprietary I/O, so possibly something like an Infiniband network interface? I still have hopes it finds a use in the consumer electronics world.
Comment With Intel sending its “Larrabee” graphics co-processor out to pasture late last year – before it even reached the market – it is natural to assume that the chip maker is looking for something to boost the performance of high performance compute clusters and the supercomputer workloads they run. Nvidia has its Tesla co-processors and its CUDA environment. Advanced Micro Devices has its FireStream co-processors and the OpenCL environment it has helped create. And Intel has been relegated to a secondary role.
Larrabee was Intel’s long-term graphics accelerator project. It’s an unfortunate side effect of losing all that money to delays on the project that Intel is now forced to reuse the processor as a component for High Performance Computing (the so-called supercomputer market). Competitors have been providing hooks into their CPUs and motherboards for auxiliary processors or co-processors for a number of years; AMD notably opened up a CPU socket specification that FPGAs could slide right into. Field Programmable Gate Arrays are big general-purpose chips whose internal circuits can be reconfigured in all kinds of ways, so optimizations that were previously done in machine code/assembler by a compiler for a particular CPU can instead be baked directly into hardware. Moving from a high-level programming language to an optimized hardware implementation of an algorithm can speed a calculation up by several orders of magnitude (1,000 times in some examples). AMD has had a number of wins in some small niches of the High Performance Computing market as a result. But not all algorithms are created equal, and not all of them lend themselves to implementation in hardware (FPGA or its cousin, the ASIC), so co-processors are a very limited market for any manufacturer trying to sell into HPC. Intel isn’t going to garner a lot of extra sales by throwing development versions of Larrabee at HPC developers. Another strike is the dependence on the PCI Express bus for communication with the Larrabee chipset. While PCI Express is more than fast enough for graphics, an HPC setup would prefer a co-processor sitting in a CPU socket adjacent to the general purpose CPUs; the way AMD designs its motherboards, all the sockets can communicate directly with one another instead of going over the PCI Express bus. Thus Intel loses again trying to market Larrabee to HPC. One can only hope that other secret code-named projects, like the 80-core CPU, see the light of day while they can still make a difference, rather than suffer the opportunity cost of a very delayed launch the way Larrabee did.
Intel’s executives were quite brash when talking about Larrabee even though most of its public appearances were made on PowerPoint slides. They said that Larrabee would roar onto the scene and outperform competing products.
And now the NY Times finally nails the coffin shut on Intel’s Larrabee saga. To refresh your memory, this is the second attempt by Intel to create a graphics processor. The first failed attempt came in the late 1990s, when 3dfx (later bought by nVidia) was tearing up the charts with its Voodoo 1 and Voodoo 2 PCI-based 3D accelerator cards. The age of Quake and Quake 2 was upon us, and everyone wanted smoother frame rates. Intel wanted to show its prowess by designing a low-cost graphics card running on the brand-new AGP slot Intel had just invented (remember AGP?). What followed was a similar pattern of delays and poor performance as engineering samples came out of the development labs. Given the torrid pace of products released by nVidia and eventually ATI, Intel couldn’t keep up. Its benchmark numbers were surpassed by the time the card saw the light of day, and Intel couldn’t give them away. (see Wikipedia: Intel i740)
1998 saw the failure of the Intel i740 AGP graphics card
The Intel740, or i740, is a graphics processing unit using an AGP interface released by Intel in 1998. Intel was hoping to use the i740 to popularize the AGP port, while most graphics vendors were still using PCI. Released with enormous fanfare, the i740 proved to have disappointing real-world performance, and sank from view after only a few months on the market.
Enter Larrabee, a whole new ball game at Intel, right?! The trend toward larger numbers of parallel processors on GPUs from nVidia and ATI/AMD led Intel to believe it might leverage some of its production lines to make a graphics card again. But this time things were different: nVidia had moved from single-purpose GPUs to general-purpose GPUs in order to create a secondary market for its cards as compute-intensive co-processors. It called the effort CUDA and provided a few development tools in the early stages. Intel latched onto this idea of the general-purpose GPU and decided it could do better. What’s more general purpose than an Intel x86 processor, right? And what if you could provide the libraries and a hardware abstraction layer that could turn a large number of x86 cores into something that looked and smelled like a GPU?
For Intel it seemed like a win/win/win; everybody wins. The manufacturing lines using the older 45nm design rules could be used for production, making the graphics card pure profit. Intel could put 32 processors on a card and program them to do double duty for the OS (graphics for games, co-processor for transcoding videos to MP4). But each time Intel showed a product white paper or a demo at a trade show, it became obvious the timeline and schedule were slipping. There were benchmarks to show, great claims to make, future projections of performance to declare; roadmaps were the order of the day. But just last week the rumors started to set in.
As with its graphics card foray of the past, Intel couldn’t beat its time-to-market demons. The Larrabee project was going to be very late and was still using 45nm design rules. Given that Intel’s top-of-the-line production lines moved to 32nm this year, and that nVidia and AMD are doing process shrinks on their current products, Intel was at a disadvantage. Rather than scrap the thing and lose face again, Intel decided to salvage what it could and put Larrabee out there as a software/hardware development kit to see if that was enough to get people to bite. I don’t know what benefit, if any, development on this platform would bring. It would rank right up there with the Itanium and the i740 as hugely promoted dead-end products with zero to negative market share. Big Fail – Do Not Want.
And for you armchair, Monday-morning technology quarterbacks, here are some links to enjoy leading up to today’s NYTimes article:
Intel is finally going to ramp up its newest production lines to include flash memory chips, shrinking the design rules down to 34nm. The density of the new flash chips is going to allow even larger Solid State Drives (SSDs), and in some cases the prices may be lower for the newer drives than for the equivalent preceding generation of SSDs. Price points quoted in the article are projected to be around $276, possibly as low as $261, for the 80GB/34nm SSD from Intel. The closer to $200 the better; that’s the point at which you can buy some of the higher-capacity traditional HDDs from Seagate and Western Digital. The day of the $200 flash drive is coming soon.
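Quick math on those price points (the $200 figure is just the target I keep hoping for):

```python
# Cost per gigabyte for the 80GB drive at the quoted and hoped-for prices.
for price in (276, 261, 200):
    print(f"${price} / 80GB = ${price / 80:.2f} per GB")
```

Somewhere around $3.45 down to $2.50 per gigabyte, still a long way from spinning-disk territory, but closing fast.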
A Canadian RedFlagDeals technology website expects an announcement within a week and says there will be 80GB, 160GB and 320GB models.
Things are really beginning to heat up now that Toshiba and Samsung are making moves to bring new SSD products to market. Intel is also revising its product line by moving its SSDs to higher-end process technology at the 34nm design rule. Moving from 50nm to 34nm is going to increase densities, but most likely prices will stay high, as usual for Intel-based product offerings. Nobody wants SSDs to suddenly become a commodity product. Not yet.
Intel is expected to bring forward the projected doubling of its SSD capacities to as early as next month.
The current X18-M and X25-M solid state drives (SSDs) use a 50nm process and have 80GB and 160GB capacities with 2-bit multi-level cell (MLC) technology. A single level cell (SLC) X25-E has faster I/O rates and comes in 32GB and 64GB capacities.