Category: gpu

  • AnandTech – Testing OpenCL Accelerated Handbrake with AMD’s Trinity

    AMD, and NVIDIA before it, has been trying to convince us of the usefulness of its GPUs for general purpose applications for years now. For a while it seemed as if video transcoding would be the killer application for GPUs, that was until Intel’s Quick Sync showed up last year.

    via AnandTech – What We’ve Been Waiting For: Testing OpenCL Accelerated Handbrake with AMD’s Trinity.

    There’s a lot to talk about when it comes to accelerated video transcoding, not least HandBrake’s dominance among people doing small-scale size reductions of their DVD collections for transport on mobile devices. We owe it all to the open source x264 encoder and all the programmers who have contributed to it over the years, standing on one another’s shoulders and allowing us to effortlessly encode or transcode gigabytes of video down to manageable sizes. But Intel has attempted to rock the boat with QuickSync, dedicated hardware for accelerating the compression and decompression of video frames. However, it is a proprietary path pursued by only a few small-scale software vendors, and it prompts the question: when is open source going to benefit from the proprietary Intel QuickSync technology? Maybe it’s going to take a long time. Maybe it won’t happen at all. Luckily for the HandBrake users in the audience, an attempt is now being made to re-engineer x264 to take advantage of any OpenCL-compliant hardware on a given computer.
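
    For a concrete sense of what “any OpenCL-compliant hardware on a given computer” means, here’s a minimal sketch (using the third-party pyopencl package, not anything from the HandBrake or x264 projects) that simply lists the OpenCL platforms and devices a machine exposes:

    ```python
    # Minimal sketch: enumerate the OpenCL platforms and devices this machine
    # exposes, to see what an OpenCL-accelerated x264 could target.
    import pyopencl as cl

    for platform in cl.get_platforms():
        print(f"Platform: {platform.name} ({platform.version})")
        for device in platform.get_devices():
            is_gpu = bool(device.type & cl.device_type.GPU)
            print(f"  {'GPU' if is_gpu else 'other'}: {device.name}, "
                  f"{device.max_compute_units} compute units, "
                  f"{device.global_mem_size // (1024 ** 2)} MB global memory")
    ```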

  • AnandTech – The Intel Ivy Bridge Core i7 3770K Review

    Similarly disappointing for everyone who isn’t Intel, it’s been more than a year after Sandy Bridge’s launch and none of the GPU vendors have been able to put forth a better solution than Quick Sync. If you’re constantly transcoding movies to get them onto your smartphone or tablet, you need Ivy Bridge. In less than 7 minutes, and with no impact to CPU usage, I was able to transcode a complete 130 minute 1080p video to an iPad friendly format—that’s over 15x real time.

    via AnandTech – The Intel Ivy Bridge Core i7 3770K Review.

    QuickSync, for anyone who doesn’t follow Intel’s technology white papers and CPU releases, is a special feature of Sandy Bridge era Intel CPUs. Its lineage actually goes back to the Clarkdale series with embedded graphics (the first round of the 32nm design rule). It can do things like speeding up the decoding of video streams saved in a number of popular formats (VC-1, H.264, MP4, etc.). Now it’s marketed to anyone trying to speed up the transcoding of video from one format to another. The first Sandy Bridge CPUs using the hardware encoding portion of QuickSync showed incredible speeds compared to the GPU-accelerated encoders of that era. Things have been kicked up a further notch in the embedded graphics of the Intel Ivy Bridge series CPUs.
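
    As a rough illustration of what handing a transcode off to QuickSync looks like from the user’s side, here is a hedged sketch that drives the h264_qsv encoder of a modern ffmpeg build from Python. It assumes an ffmpeg compiled with Intel Quick Sync (Media SDK) support, something open source users did not have when this was written, and the file names are placeholders:

    ```python
    # Hedged sketch: offload the H.264 encode to Intel's Quick Sync hardware via
    # a modern ffmpeg build that was compiled with QSV (Media SDK) support.
    import subprocess

    def quicksync_transcode(source: str, target: str, bitrate: str = "4M") -> None:
        """Decode `source` with ffmpeg and re-encode it on the h264_qsv encoder."""
        subprocess.run(
            [
                "ffmpeg",
                "-i", source,        # input file, e.g. a ripped 1080p stream
                "-c:v", "h264_qsv",  # Quick Sync H.264 hardware encoder
                "-b:v", bitrate,     # target video bitrate
                "-c:a", "copy",      # pass the audio track through untouched
                target,
            ],
            check=True,
        )

    quicksync_transcode("movie-1080p.mkv", "movie-ipad.mp4")  # placeholder names
    ```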

    In the quote at the beginning of this article, I included a summary from the AnandTech review of the Intel Core i7 3770K which gives a better sense of the magnitude of the improvement. The full 130-minute 1080p Blu-ray movie was converted at over 15 times real time, meaning each minute of video coming off the disc takes roughly 4 seconds to transcode! That is major progress for anyone who has followed this niche of desktop computing. Having spent time capturing, editing and exporting video, I will admit transcoding between formats is a lengthy process that uses up a lot of CPU resources. Offloading all that burden to the embedded graphics controller completely changes the traditional experience of the machine slowing to a crawl while you walk away and let it work.
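
    The arithmetic behind those figures is simple enough to check:

    ```python
    # Back-of-the-envelope check of the review's numbers quoted above.
    runtime_min = 130            # length of the source video, in minutes
    wall_clock_min = 7           # transcode time reported in the review
    speedup = runtime_min / wall_clock_min
    print(f"{speedup:.1f}x real time")                          # ~18.6x, i.e. "over 15x"
    print(f"{60 / 15:.0f} seconds per minute of video at 15x")  # 4 seconds
    ```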

    Now transcoding is trivial; it costs essentially nothing in terms of CPU load. Any time it runs faster than real time means you don’t have to walk away from your computer (or at least not for very long), and 10x faster than real time makes that doubly true. Now we are at a full 15x real time for a full-length movie. The time spent is so short you would never give a second thought to “Will this transcode slow down the computer?” It won’t; in fact you can continue doing all your other work, be productive, have fun and carry on just as if you hadn’t asked your computer to do the most complicated, time-consuming chore that (up until now) you could possibly ask of it.

    Knowing this application of the embedded graphics is so useful for desktop computers makes me wonder about scientific computing. What could Intel provide in terms of performance increases for simulations and computation in a supercomputer cluster? Seeing how hybrid supercomputers using nVidia Tesla GPU co-processors mixed with Intel CPUs have slowly marched up the list of the Top 500 Supercomputers makes me think Intel could leverage QuickSync further… much further. Unfortunately this performance boost is solely dependent on a few vendors of proprietary transcoding software. Open source developers do not have an opening into the QuickSync hardware that would let them write a library to redirect a video stream into the QuickSync acceleration pipeline. When somebody does accomplish that feat, it may not be long before you see some Linux compute clusters attempt to use QuickSync as an embedded algorithm accelerator too.

    Timeline of Intel processor codenames including released, future and canceled processors. (Photo credit: Wikipedia)
  • Nvidia: No magic compilers for HPC coprocessors • The Register

    And with clock speeds topped out and electricity use and cooling being the big limiting issue, Scott says that an exaflops machine running at a very modest 1GHz will require one billion-way parallelism, and parallelism in all subsystems to keep those threads humming.

    via Nvidia: No magic compilers for HPC coprocessors • The Register.

    Interesting write-up of a blog entry from nVidia‘s chief of super-computing, including his thoughts on scaling up to an exascale supercomputer. I’m surprised at how power efficient a GPU is for floating point operations, and I’m amazed at these companies’ ability to measure power consumption down to the single-operation level. Microjoules and picojoules are worlds apart from one another, and here’s the illustration:

    1 microjoule is one millionth of a joule, or 1×10⁻⁶ (six decimal places), whereas 1 picojoule is 1×10⁻¹², twice as many decimal places for a total of twelve. So that is a HUGE difference: six orders of magnitude from an electrical consumption standpoint. Nvidia’s Steve Scott estimates that to get to exascale supercomputers, any hybrid CPU/GPU machine would need GPUs with one order of magnitude higher efficiency in joules per floating point operation (FLOP), i.e. 1×10⁻¹³ joules, a full order of magnitude better. To borrow a cliché, supercomputer manufacturers have their work cut out for them. The way forward is efficiency; the GPU has the edge per operation, and all they need do is improve that efficiency by one more order of magnitude to get into the exascale league of super-computing.
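
    To see why that single order of magnitude matters so much, here is a rough sketch of the power arithmetic, using only the joules-per-FLOP figures mentioned above and counting just the arithmetic itself (in real machines, moving data between memory, interconnect and cores adds considerably more):

    ```python
    # Rough exascale power arithmetic using the energy-per-operation figures
    # discussed in the paragraph above (FLOPs only; data movement not counted).
    EXAFLOPS = 1e18                          # floating point operations per second

    for joules_per_flop in (1e-12, 1e-13):   # 1 pJ/FLOP vs. 0.1 pJ/FLOP
        megawatts = EXAFLOPS * joules_per_flop / 1e6
        print(f"{joules_per_flop:.0e} J/FLOP -> {megawatts:.1f} MW for the FLOPs alone")
    ```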

    Why is exascale important to the scientific community at large? In some fields there are never enough cycles per second to satisfy the scale of the computations being done. Models of systems can be created, but the simulations they provide may not have enough fine-grained detail. A weather model simulating some period of time in the future, say, needs to know the current conditions before it can start the calculation. But the ‘resolution’, the fine-grained detail of those ‘conditions’, is what limits the accuracy over time, especially when small errors get amplified by each successive cycle of calculation. One way to help limit the damage from those small errors is to increase the resolution, shrinking the patch of land to which you assign a single ‘current condition’. So instead of 10-mile resolution (meaning each block on the face of the planet is 10 miles square), you switch to 1-mile resolution. Any error in a one-mile-square patch is less likely to cause huge errors in the future weather prediction. But now you have to calculate 100 times the number of squares compared to the previous best model at 10-mile resolution, since each 10-mile square splits into 100 one-mile squares. That’s probably the easiest way to see how demands on the computer increase as people increase the resolution of their weather prediction models. But it’s not limited just to weather. It could be used to simulate a nuclear weapon aging over time. Or it could be used to decrypt foreign messages intercepted by NSA satellites, where the speed of the computer would allow more brute-force attempts at decrypting any message they capture.
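
    The cost of that extra resolution is easy to quantify; here is a sketch of the surface-cell count alone (real models also add vertical layers and shorter time steps, which multiply the cost further):

    ```python
    # How the number of surface cells grows when a weather grid is refined from
    # 10-mile squares to 1-mile squares.
    coarse_miles = 10
    fine_miles = 1
    cells_per_old_cell = (coarse_miles / fine_miles) ** 2
    print(f"Each {coarse_miles}-mile square becomes {cells_per_old_cell:.0f} "
          f"{fine_miles}-mile squares")   # 100
    ```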

    Nvidia Riva TNT2 M64 graphics processor (Photo credit: Wikipedia)

    In spite of all the gains to be had with an exascale computer, you still have to program the bloody thing to work with your simulation. And that’s really the gist of this article: there is no free lunch in High Performance Computing. The level of knowledge of the hardware required to get anything like the maximum theoretical speed is a lot higher than one would think. There’s no magic bullet or ‘re-compile’ button that’s going to get your old software running smoothly on an exascale computer. More likely you and a team of the smartest scientists are going to work for years to tailor your simulation to the hardware you want to run it on. And therein lies the rub: the hardware alone isn’t going to get you the extra performance.

  • Apple A5X CPU in Review

    A meta-analysis of the Apple A5X system on chip

    (from the currently shipping 3rd Gen iPad)

    New iPad’s A5X beats NVIDIA Tegra 3 in some tests (MacNN|Electronista)

    Apple’s A5X Die (and Size?) Revealed (Anandtech.com)

    Chip analysis reveals subtle changes to new iPad innards (AppleInsider-quoting Anandtech)

    Apple A5X Die Size Measured: 162.94mm^2, Samsung 45nm LP Confirmed (Update from Anandtech based on a more technical analysis of the chip)

    Reading through all the hubbub and hand-waving from the technology ‘teardown’ press outlets, one would have expected a bigger leap from Apple’s chip designers. A fairly large chip sporting an enormous graphics processor integrated into the die is what Apple came up with to boost itself to the next higher-resolution display (the so-called Retina Display). The design rule is still a pretty conservative 45nm (rather than trying to push the envelope with 32nm or smaller to bring down the power requirements). Apple likewise had to boost battery capacity, to almost 2x that of the first-generation iPad, to feed this power-hungry pixel demon. So for roughly the ‘same’ battery life (10 hours of reserve power), you get the higher-resolution display. But a bigger chip and a higher-resolution display add up to some extra heat being generated, generally speaking. Which leads us to a controversy.
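
    To put the GPU’s job in perspective, the published panel resolutions alone show why the graphics portion of the die grew: the Retina display quadruples the pixel count the A5X has to push.

    ```python
    # Pixel counts for the iPad displays (published panel resolutions).
    old_pixels = 1024 * 768     # iPad 1 / iPad 2
    new_pixels = 2048 * 1536    # 3rd-generation iPad "Retina" display
    print(f"{old_pixels:,} -> {new_pixels:,} pixels "
          f"({new_pixels / old_pixels:.0f}x as many)")   # 786,432 -> 3,145,728 (4x)
    ```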

    Given all this, there has been a recent back-and-forth argument over the thermal behavior of the 3rd-generation iPad. Consumer Reports published an online article saying the power/heat dissipation was much higher than in previous-generation iPads, and included some thermal photographs indicating the hot spots on the back of the device and their relative temperatures. While the iPad doesn’t run hotter than a lot of other handheld devices (say, Android tablets), it does run hotter than, say, an iPod touch. But as Apple points out, that has ALWAYS been the case. So you gain some things, you give up some things, and still Apple is the market leader in this form factor, years ahead of the competition. And now the tempest in the teapot is winding down, as Consumer Reports (via LATimes.com) has rated the 3rd Gen iPad as its no. 1 tablet on the market (big surprise). So while they aren’t willing to retract their original claim of high heat, they are willing to say it doesn’t count as ’cause for concern’. So you be the judge when you try out the iPad in the Apple Store. Run it through its paces; a full-screen video or two should heat up the GPU and CPU enough to get the electrons really racing through the device.

    The Apple A5X, the system-on-chip used in the 3rd-generation iPad
  • AMD Snatches New-Age Server Maker From Under Intel | Wired Enterprise | Wired.com

    Chip designer and chief Intel rival AMD has signed an agreement to acquire SeaMicro, a Silicon Valley startup that seeks to save power and space by building servers from hundreds of low-power processors.

    via AMD Snatches New-Age Server Maker From Under Intel | Wired Enterprise | Wired.com.

    It was bound to happen eventually, I guess: SeaMicro has been acquired by AMD. We’ll see what happens as a result, since SeaMicro has been a customer for Intel’s Atom chips and, most recently, its Xeon server chips as well. I have no idea where this is going or what AMD intends to do, but hopefully this won’t scare off any current or near-future customers.

    SeaMicro’s competitive advantage has been, and will continue to be, the development work it performed on the custom ASIC used in all of its systems. That bit of intellectual property was in essence the reason AMD decided to acquire SeaMicro, and it should give AMD an engineering advantage for systems it might put on the market in the future for large-scale data centers.

    While this is all pretty cool technology, I think that SeaMicro’s best move was to design its ASIC so that it could take virtually any common CPU. In fact, SeaMicro’s last big announcement introduced its SM10000-EX option, which uses low-power, quad-core Xeon processors to more than double compute performance while still keeping the high density, low-power characteristics of its siblings.

    via SeaMicro acquisition: A game-changer for AMD • The Register.

    So there you have it: Wired and The Register are reporting the whole transaction pretty positively. On the surface it looks like a win for AMD, as it can design new server products and get them to market quickly using the SeaMicro ASIC as a key ingredient. SeaMicro can still service its current customers and eventually allow AMD to upsell or upgrade them as needed to keep the ball rolling. And with AMD’s Fusion architecture marrying GPUs with CPU cores, who knows what cool new servers might be possible? But as usual the nay-sayers, the spreaders of Fear, Uncertainty and Doubt, have questioned the value of SeaMicro and its original product, the SM10000.

    Diane Bryant, the general manager of Intel’s data center and connected systems group, was asked about SeaMicro’s attempt to interest Intel in buying the company at a press conference for the launch of new Xeon processors, and had this to say: “We looked at the fabric and we told them thereafter that we weren’t even interested in the fabric.” To Intel there’s nothing special enough in SeaMicro to warrant buying the company. Furthermore, Bryant told Wired.com:

    “…Intel has its own fabric plans. It just isn’t ready to talk about them yet. “We believe we have a compelling solution; we believe we have a great road map,” she said. “We just didn’t feel that the solution that SeaMicro was offering was superior.”

    This is a move straight out of Microsoft’s marketing department circa 1992, when it would pre-announce a product that never shipped and was barely developed beyond the prototype stage. If Intel were really working on this as a new product offering, you would have seen an announcement by now, rather than a vague, tangential reference that reads more like a parting shot than a strategic direction. So I will be watching intently in the coming months and years to see what, if any, Intel ‘fabric technology’ makes its way from the research lab to the development lab and into a final shipping product. Don’t be surprised if this is Intel attempting to undermine AMD’s decision to purchase SeaMicro. Likewise, Forbes.com later reported, citing a SeaMicro representative, that the company had not tried to encourage Intel to acquire it. It is anyone’s guess who is really correct and being 100% honest in their recollections. However, I am still betting on SeaMicro’s long-term strategy of pursuing low-power, ultra-dense, massively parallel servers. It is an idea whose time has come.

  • AnandTech – AMD Radeon HD 7970 Review: 28nm And Graphics Core Next, Together As One

    Quick Sync made real-time H.264 encoding practical on even low-power devices, and made GPU encoding redundant at the time. AMD of course isn’t one to sit idle, and they have been hard at work at their own implementation of that technology: the Video Codec Engine VCE.

    via AnandTech – AMD Radeon HD 7970 Review: 28nm And Graphics Core Next, Together As One.

    Intel’s QuickSync helped speed up the real-time encoding of H.264 video. AMD is striking back with Hybrid Mode VCE operations that promise to speed things up EVEN MORE! The key to having this hit the market and get widely adopted, of course, is software compatibility with a wide range of AMD video cards. The original CUDA software environment from nVidia took a while to diffuse into the mainstream because it supported a limited number of graphics cards when it rolled out; now it’s part of the infrastructure and more or less provided gratis whenever you buy ANY nVidia graphics card. AMD has to reach that same kind of ubiquity for VCE as fast as possible to deliver the benefit quickly. At the same time, the user interface to this VCE software had better be well designed and easy to use. Any configuration-file dependencies and tweaking through preference files should be eliminated, to the point where you merely move a slider up and down a scale (Slower -> Faster). And that should be it.

    And if need be, AMD should commission an encoder app, or a plug-in to an open source project like HandBrake, that uses the VCE capability automatically upon detecting the graphics chip in the computer. Make it ‘just happen’, without the tempting early-adopter approach of making a tool available and forcing people to ‘build’ their own version of an open source encoder to use the hardware properly. Hands-off approaches that favor early adopters are going to consign this technology to the margins for years if AMD doesn’t take a more activist role. QuickSync on Intel hasn’t been widely touted either, so maybe it’s a moot point to urge anyone to treat their technology as an insanely great offering. But I think there’s definitely brand loyalty that could be brought into play if the performance gains from a discrete graphics card far outpace Intel’s integrated QuickSync solution. If you can achieve an order-of-magnitude (10x) boost, you should be pushing that to all potential computer purchasers from this announcement forward.
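
    As a sketch of that ‘just happen’ behaviour (and only a sketch: it assumes a modern ffmpeg build that exposes AMD’s hardware encoder as h264_amf, the successor to the original VCE interface, which is tooling that did not exist in this form at the time), the idea is simply to probe for the hardware encoder and quietly fall back to software x264 when it is absent:

    ```python
    # Hedged sketch: prefer AMD's hardware H.264 encoder when the local ffmpeg
    # build offers it, otherwise fall back to the software x264 encoder.
    import subprocess

    def available_encoders() -> str:
        result = subprocess.run(
            ["ffmpeg", "-hide_banner", "-encoders"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout

    def transcode(source: str, target: str) -> None:
        encoder = "h264_amf" if "h264_amf" in available_encoders() else "libx264"
        subprocess.run(
            ["ffmpeg", "-i", source, "-c:v", encoder, "-c:a", "copy", target],
            check=True,
        )

    transcode("input.mkv", "output.mp4")  # placeholder file names
    ```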

  • Rise of the Multi-Core Mesh Munchkins: Adapteva Announces New Epiphany Processor – HotHardware

    Epiphany processor block diagram (Adapteva)

    Many-core processors are apparently the new black for 2011. Intel continues to work on both its single chip cloud computer and Knights Corner, Tilera made headlines earlier this year, and now a new company, Adapteva, has announced its own entry into the field.

    via Rise of the Multi-Core Mesh Munchkins: Adapteva Announces New Epiphany Processor – HotHardware.

    A competitor to Tilera and Intel’s MIC has entered the field as a mobile co-processor. Given the volatile nature of chip architectures in the mobile market, this is going to be a hard sell for some device designers, I think. I say this because each new generation of mobile CPU gains more integrated features as each new die shrink allows more embedded functions. Graphics processors are now being embedded wholesale into every smartphone CPU, and other features like memory controllers and baseband processors will no doubt soon be added to the list as well. If Adapteva wants any traction at all in the mobile market, it will need to develop the Epiphany further into a synthesizable core that can be added to an existing CPU (most likely a design from ARM). Otherwise, sticking with being a separate auxiliary chip is going to hamper and severely limit the potential applications of the product.

    Witness the integration of the graphics processing unit. Not long ago it was a way to differentiate a phone, but it required its own place in the motherboard design along with whatever power it demanded. Very soon after GPUs were added to cell phones, they were integrated into the CPU chip sandwich to help keep manufacturing costs and power budgets in check. If the Epiphany had been introduced during the golden age of discrete chips on cell phone motherboards, it would make a lot more sense. But now you need to be embedded, integrated and 100% ARM compatible, with a fully baked developer toolkit. Otherwise, it’s all uphill from the product introduction forward. If there’s an application for the Epiphany co-processor, I hope Adapteva concentrates more on the tools to fully use the device and develops a niche right out of the gate, rather than chasing big-name but small-scale wins on individual devices in the Android market. That seems like the most likely candidate for a shipping product right now.

  • ARM vet: The CPUs future is threatened • The Register

    8-inch silicon wafer with multiple Intel Penti… (Image via Wikipedia)

    Harkening back to when he joined ARM, Segars said: “2G, back in the early ’90s, was a hard problem. It was solved with a general-purpose processor, DSP, and a bit of control logic, but essentially it was a programmable thing. It was hard then – but by today’s standards that was a complete walk in the park.”

    He wasn’t merely indulging in “Hey you kids, get off my lawn!” old-guy nostalgia. He had a point to make about increasing silicon complexity – and he had figures to back it up: “A 4G modem,” he said, “which is going to deliver about 100X the bandwidth … is going to be about 500 times more complex than a 2G solution.”

    via ARM vet: The CPUs future is threatened • The Register.

    A very interesting look at the state of the art in microprocessor manufacturing: The Register talks with one of the principals at ARM, the folks who license their processor designs to almost every cell phone manufacturer worldwide. Looking at the trends in manufacturing, Simon Segars predicts that sustained performance gains will be harder to come by in the near future. Most advancement, he feels, will come from integrating more kinds of processing and coordinating the I/O between those processors on the same die, which is roughly what Intel is attempting by integrating graphics cores, memory controllers and CPUs all on one slice of silicon. But the software integration is the trickiest part, and Intel still sees fit to just add more general-purpose CPU cores to continue making new sales. Processor clocks stay pretty rigidly near the 3GHz boundary and have not shifted significantly since the end of the Pentium 4 era.

    Note too the difficulty of scaling up manufacturing as well as designing the next-generation chips. Referring back to my article from Dec. 21, 2010, on 450mm wafers (commentary on an Electronista article), Intel is the only company rich enough to scale up to the next size of wafer. Every step in the manufacturing process has become so specialized that the motivation for equipment makers to create new manufacturing and test devices just isn’t there, because the total number of chip makers who could move up to the next largest silicon wafer is probably four companies worldwide. That’s a measure of how exorbitantly expensive large-scale chip manufacturing has become. It seems more and more that a plateau is being reached in terms of clock speeds and the size of wafers finished in manufacturing. With these limits, Simon Segars’ thesis becomes even stronger.
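
    For a sense of the scale-up being discussed, the jump from today’s 300mm wafers to 450mm is not incremental; wafer area (and thus, roughly, dies per wafer) grows with the square of the diameter:

    ```python
    # Area gain from moving 300mm wafers to 450mm wafers.
    import math

    def wafer_area_mm2(diameter_mm: float) -> float:
        return math.pi * (diameter_mm / 2) ** 2

    ratio = wafer_area_mm2(450) / wafer_area_mm2(300)
    print(f"A 450mm wafer has {ratio:.2f}x the area of a 300mm wafer")  # 2.25x
    ```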

  • Apple patents hint at future AR screen tech for iPad | Electronista

    Structure of a liquid crystal display (Image via Wikipedia)

    Apple may be working on bringing augmented reality views to its iPad thanks to a newly discovered patent filing with the USPTO.

    via Apple patents hint at future AR screen tech for iPad | Electronista. (Originally posted at AppleInsider; see the link below.)

    Original Article: Apple Insider article on AR

    Just a very brief look at a couple of patent filings by Apple, with some descriptions of potential applications. Apple seems to want to use the technology for navigation, using the onboard video camera. One half of the screen shows the live video feed; the other half is a ‘virtual’ rendition of that scene in 3D, to help you find a path or maybe a parking space in between all those buildings.

    The second filing mentions a see-through screen whose opacity can be regulated by the user. The information display takes precedence over the image seen through the LCD panel, and it defaults to totally opaque when no voltage is applied (an in-plane switching design for the LCD).

    However, the most intriguing part of the story as told by AppleInsider is the use of sensors on the device to determine angle, direction and bearing, which are then sent over the network. Why the network? Well, the whole rendering of the 3D scene described in the first patent filing is done somewhere in the cloud and sent back to the iOS device. No onboard 3D rendering is needed, or at least not at that level of detail. Maybe those data centers in North Carolina are really cloud-based 3D rendering farms?

  • AppleInsider | Expanded GPU support in Apple’s Mac OS X 10.6.7 hints at future Mac hardware

    HIS Radeon HD 5850 (AMD/ATI) (Image by Forrestal_PL via Flickr)

    “Could Apple be opening up the platform more?” he asked. “What happens to NVIDIA? Why support for cards that aren’t in Macs yet? Will the 2011 Sandy Bridge iMacs contain one or more of these new 6xxx cards?”

    via AppleInsider | Expanded GPU support in Apple’s Mac OS X 10.6.7 hints at future Mac hardware.

    This is an interesting tidbit of news. A Macintosh hacker has discovered, within the most recent update to Mac OS X 10.6, a number of hardware drivers for ATI graphics cards that Apple does not ship and that are currently ‘unsupported’ on the Mac. Anyone who has attempted to buy aftermarket, third-party OEM graphics cards for Macs knows this is a treacherous minefield to navigate. The principal problem is that Apple absolutely, positively does not want people sticking any old graphics card into its Mac Pro towers, or even into old legacy towers going back to the first PowerPC/PCI-based Macs. No, you must buy the bona fide supported hardware direct from Apple, with the drivers it supplies. In a pinch you might be able to fake it with a PC graphics card that has had its BIOS flashed to make it appear to be a genuine Apple part.

    But now, if Apple is bundling up a bunch of drivers for various and sundry graphics cards (albeit from one supplier: ATI), is it possible you could finally buy any card you wanted and have it work? That would be big news indeed for any owner of an end-user-upgradeable Mac Pro, and welcome news at that. I’m hoping this story continues to develop and that Apple comes out with a policy or strategy statement heralding a change in its past stance towards peripheral manufacturers. More devices being supported would be a great thing.