Categories
cloud data center fpga

Why Microsoft is building programmable chips that specialize in search — Tech News and Analysis

English: Altera Stratix IV EP4SGX230 FPGA on a PCB (Photo credit: Wikipedia)

SUMMARY: Microsoft has been experimenting with its own custom chip effort in order to make its data centers more efficient, and these chips aren’t centered around ARM-based cores, but rather FPGAs from Altera.

via Why Microsoft is building programmable chips that specialize in search — Tech News and Analysis.

FPGAs for the win, at least for eliminating unnecessary Xeon CPUs doing online analytic processing for the Bing search service. Microsoft says it can process the same amount of data with half the number of CPUs by offloading some of the heavy lifting from general purpose CPUs to specially programmed FPGAs tuned to the Microsoft algorithms that deliver the best search results. For Microsoft the cost of the data center will out, and if you can drop half of the Xeons in a data center you cut the CPU portion of your per-transaction costs roughly in half (less whatever the FPGAs themselves cost to buy and power). That is quite an accomplishment in these days of radical incrementalism in data center ops and DevOps. The Field Programmable Gate Array is known as a niche, discipline-specific kind of hardware solution. But when programmed properly, and re-configured as workloads and needs change, it can do some magical heavy lifting from a computing standpoint.

Specifically, I’m thinking of really repetitive loops or recursive algorithms that take forever to unwind and deliver a final result; those are things best done in hardware rather than software. For search engines that might be the process used to determine the authority of a page in the rankings (like Google’s PageRank). And knowing you can further tune the hardware to fit the algorithm means you’ll spend less time attempting the heavy lifting on the general purpose CPU with really fast C/C++ code. In Microsoft’s plan that means fewer CPUs are needed to do the same amount of work. And better yet, if you determine a better algorithm for your daily batch processes, you can spin up a new hardware/circuit diagram and apply that to the compute cluster over time (and not have to do a pull and replace of large sections of the cluster). It will be interesting to see if Microsoft reports any efficiencies in a final report. As of now this seems somewhat theoretical, though it may have been tested at least in a production test bed of some sort using real data.
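
To make the "repetitive loop" idea concrete, here is a minimal sketch of the textbook power-iteration form of PageRank in Python. It is not Microsoft's or Google's production ranking code; the function name, the toy link graph and the damping value of 0.85 are all assumptions chosen for illustration. The point is only that the same small multiply-and-add step repeats over every link, pass after pass, which is the kind of inner loop one might hope to push into hardware.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults, not tuned values.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each pass with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", nothing links to "d".
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph page "c" comes out on top because three pages link to it, while "d" sits at the bottom with no inbound links at all.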

Categories
computers fpga gpu

10 Reasons OpenCL Will Change Your Design Strategy & Market Position | EE Times

OpenCL logo (Photo credit: Wikipedia)

OpenCL is a breakthrough precisely because it enables developers to accelerate the real-time execution of their algorithms quickly and easily — particularly those that lend themselves to the considerable parallel processing capabilities of FPGAs (which yield superior compute densities and far better performance/Watt than CPU- and GPU-based solutions)

via 10 Reasons OpenCL Will Change Your Design Strategy & Market Position | EE Times.

There’s still a lot of untapped energy available in the OpenCL programming tools. Apple is still the single largest manufacturer to have adopted OpenCL across a large number of its products (OS and app software). And I know from reading about supercomputing on GPUs that some large scale hybrid CPU/GPU computers have been ranked worldwide (the Chinese Tianhe being the first and biggest example). This article from EETimes encourages anyone with a background in C programming to give it a shot and see what algorithms could stand to be accelerated using the resources on the motherboard alone. But being EETimes, they are also touting the benefits of using FPGAs in the mix as well.

To date the low-hanging fruit for desktop PC makers and their peripheral designers and manufacturers has been to reuse the GPU as a massively parallel co-processor where it makes sense. But as the EETimes writer emphasizes, FPGAs can be equal citizens too and might provide some more flexible acceleration. Interest in the FPGA as a co-processor for everything from desktop to higher end enterprise data center motherboards was brought to the fore by AMD back in 2006 with the Torrenza CPU socket initiative. The hope back then was that giving a secondary specialty processor (at the time an FPGA) a socket of its own might open a market no one had addressed up to that point. So depending on your needs and what extra processors you might have available on your motherboard, OpenCL might be generic enough going forward to get a boost from ALL the available co-processors on your motherboard.

Whether or not we see benefits at the consumer level desktop is very dependent on OS level support for OpenCL. To date the biggest adopter of OpenCL has been Apple, as it needed an OS level acceleration API for video intensive apps like video editing suites. Eventually Adobe recompiled some of its Creative Suite apps to take advantage of OpenCL on MacOS. On the PC side Microsoft has always had DirectX as its API for accelerating any number of different multimedia apps (for playback, editing) and is less motivated to incorporate OpenCL at the OS level. But that’s not to say a 3rd party developer who saw a benefit to OpenCL over DirectX couldn’t create their own plumbing and libraries and ship a runtime package that used OpenCL to support their own apps. They could also license it to anyone who wanted to bundle it in a larger package installer (say for a game or a multimedia authoring suite).

For the data center this makes way more sense than for the desktop, as DirectX isn’t seen as a scientific computing API or as a means of using a GPU as a numeric accelerator for scientific calculations. In this context, OpenCL might be a nice, open and easy to adopt library for people working on compute farms with massive numbers of both general purpose CPUs and GPUs handing off parts of a calculation to one another over the PCI Express bus or across CPU sockets on a motherboard. Everyone’s needs are going to vary, and vary widely in some cases. But OpenCL might make that variation easier to address by providing a common library that can touch all the co-processors available when a computation needs to be sped up. So keep an eye on OpenCL as a competitor to any GPGPU style API and library put out by Nvidia, AMD or Intel. OpenCL might help people bridge the differences between these manufacturers too.
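
For anyone who has not seen the OpenCL programming model, the core idea is that you write one small function (a "kernel") that computes a single output element, and the runtime launches one copy of it per element across whatever device is available. The sketch below only emulates that idea in plain Python; it is not the OpenCL API. The `enqueue` helper and the `saxpy_kernel` name are hypothetical stand-ins, and a real kernel would be written in OpenCL C and would fetch its index with `get_global_id(0)`.

```python
def saxpy_kernel(global_id, a, x, y, out):
    """Body of a single work-item: computes exactly one output element."""
    out[global_id] = a * x[global_id] + y[global_id]


def enqueue(kernel, global_size, *args):
    """Hypothetical stand-in for a device queue.

    It runs the work-items one after another. A GPU or FPGA would run
    many of them at the same time, which is where the speed-up comes from.
    """
    for global_id in range(global_size):
        kernel(global_id, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)

# One work-item per element of the input vectors.
enqueue(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)
```

Because no work-item depends on any other, the same kernel could in principle be handed to a CPU, a GPU or an FPGA without rewriting the algorithm, which is the portability argument the article is making.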

Categories
computers science & technology surveillance technology

Intel lets outside chip maker into its fabs • The Register

Banner image Achronix 22i
Intel and Achronix-2 Great tastes that taste great together

According to Greg Martin, a spokesman for the FPGA maker, Achronix can compete with Xilinx and Altera because it has, at 1.5GHz in its current Speedster1 line, the fastest such chips on the market. And by moving to Intel’s 22nm technology, the company could have ramped up the clock speed to 3GHz.

via Intel lets outside chip maker into its fabs • The Register.

That kind of says it all in one sentence, or two sentences in this case. The fastest FPGA on the market is quite an accomplishment unto itself. Putting that FPGA on the world’s most advanced production line and silicon wafer technology is what Andy Grove would have called a 10X change. FPGAs are reconfigurable chips whose circuits can be re-routed and optimized for different tasks over and over again. This is really beneficial for very small production runs where you need a custom design. Some of the things they can speed up are math operations and lookups in a very large database search. In the past I was always curious whether they could be used as a general purpose computer that could switch gears and optimize itself for different tasks. I didn’t know whether or not it would work or be worthwhile, but it really seemed like there was a vast untapped reservoir of power in the FPGA.

Some supercomputer manufacturers have started using FPGAs as special purpose co-processors and have found immense speed-ups as a result. Oil prospecting companies have also used them to speed up analysis of seismic data and place good bets on dropping a well bore in the right spot. But price has always been a big barrier to entry, as quoted in this article: $1,000 per chip is the cost. That limits the appeal to buyers for whom price is no object and speed and time matter more. The two big competitors in the field of FPGA manufacturing are Altera and Xilinx, both of which design the chips but have them manufactured by outside foundries in other countries. This has left FPGAs as second class citizens, built with older generation chip technologies on old manufacturing lines. They always had to deal with what they could get. Performance in terms of clock speed was always lower too.

It was not unusual during the Megahertz and Gigahertz wars to see CPU speeds increasing every month. FPGAs sped up too, but not nearly as fast. I remember seeing 200MHz and 400MHz touted for the Xilinx and Altera top of the line products. With Achronix running at 1.5GHz, things have changed quite a bit. That’s a general purpose CPU speed in a completely customizable FPGA. This means you get speed that makes the FPGA even more useful. However, instead of going faster, this article points out, people would rather buy the same speed but use less electricity and generate less heat. There’s no better way to do this than to shrink the size of the circuits on the FPGA, and that is the core philosophy of Intel. The two companies have just teamed up to put the Achronix FPGA on the smallest feature size production line, run by the most optimized, cost conscious manufacturer of silicon chips bar none.
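
The "same speed, less electricity" trade-off follows from the standard first-order model of switching power in CMOS chips: power is roughly activity × capacitance × voltage² × clock frequency. Here is a minimal sketch of that arithmetic. The capacitance and voltage figures are made-up illustrative numbers, not Achronix or Intel specifications, and the model ignores leakage current entirely.

```python
def dynamic_power(switched_capacitance_f, supply_voltage_v, clock_hz, activity=1.0):
    """First-order CMOS switching power in watts: activity * C * V^2 * f."""
    return activity * switched_capacitance_f * supply_voltage_v ** 2 * clock_hz


CLOCK_HZ = 1.5e9  # the 1.5GHz clock speed mentioned above, held constant

# Hypothetical "older process" chip: more capacitance, higher supply voltage.
older = dynamic_power(1.0e-9, 1.2, CLOCK_HZ)

# Hypothetical "shrunk" chip: smaller circuits, so less capacitance and
# a lower supply voltage (both values are assumptions for illustration).
newer = dynamic_power(0.7e-9, 1.0, CLOCK_HZ)

print(f"older process: {older:.2f} W")
print(f"newer process: {newer:.2f} W")
print(f"ratio: {newer / older:.2f}")
```

With these assumed numbers the shrunk chip burns about half the power at the same clock. Because voltage enters squared, even a modest drop in supply voltage pays off more than the same percentage drop in capacitance.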

Another point made in the article is that the market for FPGAs at this level of performance tends to be defense contract oriented. As a result, to maintain the level of security necessary to sell chips to this industry, the chips need to be made in the good ol’ USA, and Intel doesn’t outsource anything when it comes to its top of the line production. Its leading-edge U.S. fabs are in Oregon and Arizona, which makes it much easier to vouch that no secret backdoors have been built in to funnel data to foreign governments.

I would love to see some university research projects start looking at FPGAs again. As speeds go up and power goes down, there may be a happy medium or mix of general purpose CPUs and FPGAs that could help the average Joe working on his desktop, laptop or iPad. All I know is that Intel entering a market will make it more competitive and hopefully lower the barrier to entry for anyone who would really like to get their hands on a useful processor that they can customize to their needs.

Categories
computers technology wintel

Intel Debuts New Atom System-on-Chip Processor

This is an Altera Flex FPGA with 20,000 cell...
Image via Wikipedia

At an IDF keynote, Intel launched “Tunnel Creek,” a new Atom E600 SoC processor. One particular processor detailed is codenamed “Stellarton,” which consists of the Atom E600 processor paired with an Altera FPGA on a multi-chip package that provides additional flexibility for customers who want to incorporate proprietary I/O or acceleration.

via Intel Debuts New Atom System-on-Chip Processor.

Intel has announced a future product that pairs an Intel Atom processor with an Altera FPGA. Now this is interesting: I just mentioned FPGA (field programmable gate array) chips, and out of the blue Intel has summoned the same kind of chip and married it to a little Atom core processor. They say it could be used as an accelerator of some sort. I’m wondering what specifically they had in mind (something very esoteric and niche like a TCP/IP offload processor?). I would like to see Intel tout some of its possible uses and not just say, “We want to see what happens.” Unfortunately, the way competition works in consumer electronics, you never tell people what’s inside. You let folks like iFixit do a teardown and put pictures up. You let industry websites research all the chips and what they cost, estimate which ones are custom integrated circuits and report the cost to manufacture the device. That’s what they do with every Apple iPhone these days.

It would be cool if Intel could also sell this as a development kit for Stellarton’s users. Keep the price high enough to prevent people from releasing product based just on the kit’s CPU, but low enough to get people to try out some interesting projects. I’m guessing it would be a great tool for video transcoding, muxing/demuxing video streams, etc. If anyone does release a shipping product, though, it would be cool if they put a “Stellarton Inside” logo on it, so we know that FPGAs are doing the heavy lifting. The other possibility Intel mentions is to use the FPGA for proprietary I/O, so possibly something like an InfiniBand network interface? I still have hopes it’s used in the consumer electronics world.

Categories
computers navigation science & technology surveillance

Custom superchippery pulls 3D from 2D images like humans • The Register

Computing brainboxes believe they have found a method which would allow robotic systems to perceive the 3D world around them by analysing 2D images as the human brain does – which would, among other things, allow the affordable development of cars able to drive themselves safely.

via Custom superchippery pulls 3D from 2D images like humans • The Register.

The beauty of this new work is that they designed a custom processor using a Virtex 6 FPGA (Field Programmable Gate Array). An FPGA, for those who don’t know, is a computer chip that you can ‘re-wire’ through software to take on any mathematical task you can dream up. In the old days this would have required a custom chip to be engineered, validated and manufactured at great cost. With FPGAs you need only a development kit and the FPGA chips you program. With this you can optimize every step within the processor and speed things up much more than on a general purpose computer processor (like the Intel chip that powers your Windows or Mac computer). In this research the custom designed circuitry uses video images to decide where in the world a robot can safely drive as it maneuvers around on the ground. I know Hans Moravec has done a lot in this area at Carnegie Mellon U. This group is from Yale’s engineering dept., and it is encouraging to see the techniques embraced and extended by another U.S. university. The low power of this processor and its facility for processing video images in real time are ahead of their time and hopefully will find some commercial application in either robotics or automotive safety controls. As for me, I’m still hoping for a robot chauffeur.
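
What "re-wiring through software" means in practice: the basic building block of an FPGA is a lookup table (LUT), a tiny memory whose contents decide which logic function it computes. Load different bits and the same silicon becomes a different gate. Below is a toy Python model of a 2-input LUT; the class and method names are my own invention, and real devices use wider LUTs plus programmable routing between them.

```python
class LookupTable:
    """Toy model of a 2-input FPGA lookup table (LUT)."""

    def __init__(self, truth_table):
        self.configure(truth_table)

    def configure(self, truth_table):
        """Load four configuration bits, one per input combination."""
        if len(truth_table) != 4:
            raise ValueError("a 2-input LUT needs exactly 4 configuration bits")
        self.bits = list(truth_table)

    def evaluate(self, a, b):
        """Use the two input bits as an address into the stored table."""
        return self.bits[(a << 1) | b]


# Configuration bits for inputs (a, b) = 00, 01, 10, 11.
AND_BITS = [0, 0, 0, 1]
XOR_BITS = [0, 1, 1, 0]

inputs = [(a, b) for a in (0, 1) for b in (0, 1)]

lut = LookupTable(AND_BITS)
and_outputs = [lut.evaluate(a, b) for a, b in inputs]

# "Re-wire" the very same block by loading a different configuration.
lut.configure(XOR_BITS)
xor_outputs = [lut.evaluate(a, b) for a, b in inputs]

print("AND:", and_outputs)
print("XOR:", xor_outputs)
```

A real design strings huge numbers of these blocks together, which is how a custom vision pipeline like the one in this research can live on an off-the-shelf chip.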

Categories
computers technology wintel

Intel linked with HPC boost buy • The Register

Comment With Intel sending its “Larrabee” graphics co-processor out to pasture late last year – before it even reached the market – it is natural to assume that the chip maker is looking for something to boost the performance of high performance compute clusters and the supercomputer workloads they run. Nvidia has its Tesla co-processors and its CUDA environment. Advanced Micro Devices has its FireStream co-processors and the OpenCL environment it has helped create. And Intel has been relegated to a secondary role.

via Intel linked with HPC boost buy • The Register.

This is about Intel’s long-term graphics accelerator project, code-named “Larrabee.” It’s an unfortunate side effect of losing all that money to delays on the project that Intel is now forced to reuse the processor as a component in High Performance Computing (so-called supercomputers). The competition has been providing hooks or links into its CPUs and motherboards for auxiliary processors or co-processors for a number of years. AMD notably created a CPU socket with open specs that FPGAs could slide into. Field Programmable Gate Arrays are big arrays of general purpose logic with all kinds of ways to reconfigure the circuits inside of them. So huge optimizations can be made in hardware that previously had to be made in machine code/assembler by the compilers for a particular CPU. Moving from a high level programming language to an optimized hardware implementation of an algorithm can speed a calculation up by several orders of magnitude (1,000 times in some examples). AMD has had a number of wins in some small niches of the High Performance Computing market. But not all algorithms are created equal, and not all of them lend themselves to implementation in hardware (an FPGA or its cousin the ASIC). So co-processors are a very limited market for any manufacturer trying to sell into the HPC market. Intel isn’t going to garner a lot of extra sales by throwing development versions of Larrabee out to the HPC developers. Another strike is the dependence on a PCI Express bus for communications with the Larrabee chip. While PCI Express is more than fast enough for graphics processing, an HPC setup would prefer a CPU socket adjacent to the general purpose CPUs. The way AMD has designed its motherboards, all sockets are on the same board and can communicate directly with one another instead of using the PCI Express bus. Thus, Intel loses again trying to market Larrabee in the HPC market.
One can only hope that other secret code-name projects like the CPU with 80 cores will see the light of day soon, while they can still make a difference, rather than suffer the opportunity costs of a very delayed launch like Larrabee’s.
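
The "not all algorithms are created equal" point above can be put in numbers with Amdahl's law, the standard formula for how much an accelerator helps when only part of a job can be offloaded to it. This is a minimal sketch; the function name and the example fractions are assumptions for illustration, not measurements from any real HPC workload.

```python
def overall_speedup(offload_fraction, accelerator_speedup):
    """Amdahl's law: whole-job speedup when only part of it is accelerated.

    offload_fraction is the share of run time (0.0 to 1.0) that can move
    to the co-processor; accelerator_speedup is how much faster that
    share runs once it is there.
    """
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be between 0 and 1")
    if accelerator_speedup <= 0:
        raise ValueError("accelerator_speedup must be positive")
    remaining = 1.0 - offload_fraction
    return 1.0 / (remaining + offload_fraction / accelerator_speedup)


# Hypothetical workloads, each given a 1,000x faster co-processor.
for fraction in (0.50, 0.90, 0.99):
    speedup = overall_speedup(fraction, 1000.0)
    print(f"{fraction:.0%} offloaded -> {speedup:.1f}x overall")
```

Even with a 1,000 times faster co-processor, a job that can only offload half its work never quite doubles in speed. That is why the niches where FPGAs win big are the ones where nearly all of the run time sits in one hardware-friendly loop.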