OpenCL is a breakthrough precisely because it enables developers to accelerate the real-time execution of their algorithms quickly and easily — particularly those that lend themselves to the considerable parallel processing capabilities of FPGAs (which yield superior compute densities and far better performance/Watt than CPU- and GPU-based solutions)
There’s still a lot of untapped potential in the OpenCL programming tools. Apple is still the single largest vendor to have adopted OpenCL across a large number of its products (OS and application software). And I know from reading about supercomputing on GPUs that some large-scale hybrid CPU/GPU computers have been ranked worldwide (the Chinese Tianhe being the first and biggest example). This article from EETimes encourages anyone with a background in C programming to give it a shot and see what algorithms could stand to be accelerated using the resources on the motherboard alone. But being EETimes, they are also touting the benefits of using FPGAs in the mix as well.
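If you have a C background and wonder what kind of algorithm "could stand to be accelerated," the usual answer is a loop whose iterations don't depend on one another. Here is a minimal sketch of that idea in plain Python rather than real OpenCL. The function names are my own hypothetical stand-ins, not part of any OpenCL library. In actual OpenCL C the per-item function would be a kernel, and the index would come from get_global_id(0).

```python
# Sketch only: plain Python standing in for the OpenCL execution model.
# brighten_kernel and run_over_range are hypothetical names for illustration.

def brighten_kernel(gid, pixels, out, gain):
    """Work for ONE item; it reads and writes only its own index."""
    out[gid] = min(255, int(pixels[gid] * gain))

def run_over_range(kernel, global_size, *args):
    """Stand-in for the runtime: call the kernel once per index.
    A real device would run many of these calls at the same time."""
    for gid in range(global_size):
        kernel(gid, *args)

pixels = [10, 100, 200, 250]
out = [0] * len(pixels)
run_over_range(brighten_kernel, len(pixels), pixels, out, 1.5)
print(out)
```

Because each call touches only its own element, the order of the calls doesn't matter, and that is the property that lets a GPU or FPGA run them side by side.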
To date the low-hanging fruit for desktop PC makers and their peripheral designers and manufacturers has been to reuse the GPU as a massively parallel co-processor where it makes sense. But as the EETimes writer emphasizes, FPGAs can be equal citizens too and might provide some more flexible acceleration. Interest in the FPGA as a co-processor for everything from desktop to higher-end enterprise data center motherboards was brought to the fore by AMD back in 2006 with the Torrenza CPU socket. The hope back then was that a socket for a secondary specialty processor (at the time an FPGA) might open up a market no one had addressed up to that point. So depending on your needs and what extra processors you have available, OpenCL might be generic enough going forward to get a boost from ALL the available co-processors on your motherboard.
Whether or not we see benefits on the consumer-level desktop is very dependent on OS-level support for OpenCL. To date the biggest adopter of OpenCL has been Apple, as they needed an OS-level acceleration API for video-intensive apps like video editing suites. Eventually Adobe recompiled some of its Creative Suite apps to take advantage of OpenCL on MacOS. On the PC side Microsoft has always had DirectX as its API for accelerating any number of multimedia apps (for playback and editing) and is less motivated to incorporate OpenCL at the OS level. But that’s not to say a third-party developer who saw a benefit to OpenCL over DirectX couldn’t create their own plumbing and libraries and ship a runtime package that used OpenCL to support their apps. They could also license it to anyone who wanted it as part of a larger package installer (say for a game or a multimedia authoring suite).
For the data center this makes way more sense than for the desktop, as DirectX isn’t seen as a scientific computing API or as a means of using a GPU as a numeric accelerator for scientific calculations. In this context, OpenCL might be a nice, open and easy-to-adopt library for people working on compute farms with massive numbers of both general-purpose CPUs and GPUs handing off parts of a calculation to one another over the PCI bus or across CPU sockets on a motherboard. Everyone’s needs are going to vary, and vary widely in some cases. But OpenCL might make that variation easier to address by providing a common library that lets one touch all the co-processors available when a computation needs to be sped up. So keep an eye on OpenCL as a competitor to any GPGPU-style API and library put out by nVidia, AMD or Intel. OpenCL might help people bridge the differences between these manufacturers too.
For now, use Handbrake for simple, effective encodes. Arcsoft or Xilisoft might be worth a look if you know you’ll be using CUDA or Quick Sync and have no plans for any demanding work. Avoid MediaEspresso entirely.
Joel Hruska does a great survey of GPU-enabled video encoders. He even goes back to the original Avivo and Badaboom encoders put out by AMD and nVidia when they were promoting GPU-accelerated video encoding. Sadly the results don’t live up to the hype. Even Intel’s most recent competitor in the race, Quick Sync, is left wanting. HandBrake appears to be the best option for most people and the most reliable and repeatable in the results it gives.
Ideally the maintainers of the HandBrake project might get a boost by starting up a fork of the source code that has Intel Quick Sync support. But there’s no indication now that everyone is interested in a proprietary Intel technology like Quick Sync, as expressed in this article from Anandtech. OpenCL seems like a more attractive option for the open source community at large, so the OpenCL/HandBrake development is at least a little encouraging. Still, as Joel Hruska points out, the CPU remains the best option for high-quality encoding at smaller frame sizes. It just beats the pants off all the GPU-accelerated options available to date.
AMD, and NVIDIA before it, has been trying to convince us of the usefulness of its GPUs for general purpose applications for years now. For a while it seemed as if video transcoding would be the killer application for GPUs, that was until Intel’s Quick Sync showed up last year.
There’s a lot to talk about when it comes to accelerated video transcoding, really. Not the least of which is HandBrake’s general dominance for anyone doing small-scale size reductions of their DVD collections for transport on mobile devices. We owe it all to the open source x264 codec and all the programmers who have contributed to it over the years, standing on one another’s shoulders and allowing us to effortlessly encode or transcode gigabytes of video to manageable sizes. But Intel has attempted to rock the boat by inserting itself into the fray, tooling its Quick Sync technology to accelerate the compression and decompression of video frames. However, it is a proprietary path pursued by only a few small-scale software vendors. And it prompts the question: when is open source going to benefit from the proprietary Intel Quick Sync technology? Maybe it’s going to take a long time. Maybe it won’t happen at all. Luckily for the HandBrake users in the audience, some attempt is being made now to re-engineer the x264 codec to take advantage of any OpenCL-compliant hardware on a given computer.
The newest generation of Intel chips was demonstrated at the Consumer Electronics Show in Las Vegas. Some of the technology fanboi websites got early samples of the chips and of motherboards that use the new chips and chipsets. Aside from having the memory controller on the CPU, another benefit is that the integrated graphics chip can be re-purposed to accelerate video transcoding. Intel calls it Quick Sync, and I call it effing magic.
Quick Sync is just awesome. It’s simply the best way to get videos onto your smartphone or tablet. Not only do you get most if not all of the quality of a software based transcode, you get performance that’s better than what high-end discrete GPUs are able to offer. If you do a lot of video transcoding onto portable devices, Sandy Bridge will be worth the upgrade for Quick Sync alone.
For everyone else, Sandy Bridge is easily a no brainer. Unless you already have a high-end Core i7, this is what you’ll want to upgrade to.
Previously in this blog I have recounted stories from Tom’s Hardware and Anandtech.com surrounding the wicked cool idea of tapping the vast resources contained within your GPU while you’re not playing video games. Producers of GPUs like nVidia and AMD both wanted to market their products to people who not only gamed but occasionally ripped video from DVDs and played it back on iPods or other mobile devices. The time sunk into these kinds of conversions was made somewhat less of a pain by the ability to run the process on a dual-core Wintel computer, browsing web pages while re-encoding the video in the background. But to get better speeds one almost always needs to monopolize all the cores on the machine, and free software like HandBrake will take advantage of those extra cores, slowing the rest of your machine but effectively speeding up the transcoding process. There was hope that GPUs could accelerate transcoding beyond what was achievable with a multi-core CPU from Intel. Another example is Apple’s widespread adoption of OpenCL as a pipeline to the GPU for any video frames or video processing that may need to be handled in iTunes, QuickTime or the iLife applications. And where I work, we get asked to do a lot of transcoding of video to different formats for customers. Usually someone wants a rip from a DVD that they can put on a flash drive and take with them into a classroom.
However, now it appears there is a revolution in speed in the works where Intel is giving you faster transcodes for free. I’m talking about Intel’s new Quick Sync technology, which uses the integrated graphics core as a video transcode accelerator. The transcodes are amazingly fast and, given the speed, trivial to do for anyone, including the casual user. In the past everyone seemed to complain about how slow their computer was, especially for ripping DVDs or transcoding the rips to smaller, more portable formats. Now it takes a few minutes to get an hour of video into the right format. No more blue Monday. Follow the link to the story and analysis from Anandtech.com, where they ran head-to-head comparisons of all the available techniques for re-encoding/transcoding a Blu-ray video release into a smaller .mp4 file encoded as H.264. They compared Intel four-core CPUs (which took the longest and got pretty good quality) against GPU-accelerated transcodes and against the new Intel Quick Sync technology coming out soon on the Sandy Bridge generation of Intel i7 CPUs. It is wicked cool how fast these transcodes are, and it will make the process of transcoding trivial compared to how long it takes to actually ‘watch’ the video you spent all that time converting.