nVidia is making a new bit of electronics hardware to be added to LCD displays made by third-party manufacturers. The idea is to send sync data to the display to let it know when a frame has been rendered by the 3D hardware on the video card. This extra bit of electronics should smooth out the high-res, high-frame-rate games played by elite desktop gamers.
It would be cool to see this adopted for the game console market as well, meaning TV manufacturers could use the same idea to make your PS4 and Xbox One play smoother. It’s a chicken-and-egg situation though: unless someone like Steam or another manufacturer pushes this out to a wider audience, it will get stuck as a niche product for the highest end of the high-end PC desktop gamers. But it is definitely a step in the right direction and helps push us further away from the old VGA standard from years ago. Video cards AND displays should both be smart; there’s no reason, no excuse, not to have them both be somewhat more aware of their surroundings and coordinate things. And if AMD decides it needs this capability too, how soon after that will AMD and nVidia have to come to the table and get a standard going? I hope that happens sooner rather than later, since that too could drive this technology to a wider audience.
And with clock speeds topped out and electricity use and cooling being the big limiting issue, Scott says that an exaflops machine running at a very modest 1GHz will require one billion-way parallelism, and parallelism in all subsystems to keep those threads humming.
Interesting write-up of a blog entry from nVidia’s chief of super-computing, including his thoughts on scaling up to an exascale supercomputer. I’m surprised at how power efficient a GPU is for floating point operations. I’m amazed at these companies’ ability to measure power consumption down to the level of a single operation. Microjoules and picojoules are worlds apart from one another, and here’s the illustration:
1 microjoule is one millionth of a joule, or 1×10^-6 (six decimal places), whereas 1 picojoule is 1×10^-12, twice as many decimal places for a total of 12. So that is a HUGE difference: 6 orders of magnitude in efficiency from an electrical consumption standpoint. The nVidia guy, author Steve Scott, estimates that to get to exascale supercomputers any hybrid CPU/GPU machine would need GPUs that are one order of magnitude more efficient in joules per floating point operation (FLOP), or 1×10^-13, one whole decimal place better. To borrow a cliché, supercomputer manufacturers have their work cut out for them. The way forward is efficiency, and the GPU has the edge per operation; all they need to do is improve efficiency by that one decimal place to get closer to the exascale league of super-computing.
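To make those decimal places a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It is my own arithmetic, not anything from the nVidia post: the function names are made up for illustration, and it assumes every operation costs the same energy and ignores memory, interconnect and cooling overhead.

```python
# Back-of-the-envelope exascale arithmetic (illustrative assumptions only).
EXAFLOPS = 1e18  # floating point operations per second

def power_watts(joules_per_flop, flops_per_second=EXAFLOPS):
    """Sustained power draw if every operation costs joules_per_flop."""
    return joules_per_flop * flops_per_second

def parallelism_needed(clock_hz, flops_per_second=EXAFLOPS):
    """Operations that must complete on every clock tick to hit the target rate."""
    return flops_per_second / clock_hz

# Energy-per-operation figures discussed above, in joules.
for label, energy in [("1 microjoule", 1e-6),
                      ("1 picojoule", 1e-12),
                      ("0.1 picojoule", 1e-13)]:
    print(f"{label:>14} per flop -> {power_watts(energy):.3g} watts")

# The "billion-way parallelism" figure: 10^18 operations/s at a 1 GHz clock.
print(f"operations in flight per tick at 1 GHz: {parallelism_needed(1e9):.3g}")
```

At a microjoule per operation an exaflops machine would draw on the order of a terawatt, at a picojoule about a megawatt, and at a tenth of a picojoule about 100 kilowatts for the arithmetic alone. The same division is where the one billion-way parallelism at 1GHz comes from.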
Why is exascale important to the scientific community at large? In one segment there are never enough cycles per second to satisfy the scale of the computations being done. Models of systems can be created, but the simulations they provide may not have enough fine-grained ‘detail’. A weather model simulating a period of time in the future, say, needs to know the current conditions before it can start the calculation. But the ‘resolution’ or fine-grained detail of those ‘conditions’ is what limits the accuracy over time, especially when small errors get amplified by each successive cycle of calculation. One way to help limit the damage done by these small errors is to increase the resolution, that is, to shrink the land area over which you assign a ‘current condition’. So instead of 10 miles of resolution (meaning each block on the face of the planet is 10 miles square), you switch to 1 mile resolution. Any error in a one mile square patch is less likely to cause huge errors in the future weather prediction. But now you have to calculate 100x the number of squares compared to the previous best model, which you set at 10 miles of resolution (10x as many along each side of the grid). That’s probably the easiest way to see how demands on the computer increase as people increase the resolution of their weather prediction models. But it’s not limited just to weather. It could be used to simulate a nuclear weapon aging over time. Or it could be used to decrypt foreign messages intercepted by NSA satellites. The speed of the computer would allow more brute force attempts at decrypting any message they capture.
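The jump in work from a finer grid is easy to check with a few lines of Python. This is only a sketch under my own assumptions: the surface area of the Earth is rounded to 197 million square miles, the function name is hypothetical, and it counts flat surface squares only, where real weather models also carry vertical layers and usually need shorter time steps at finer resolution.

```python
# Rough count of grid squares needed to tile the planet's surface.
# 197 million square miles is a rounded figure used for illustration.
EARTH_SURFACE_SQ_MILES = 197e6

def grid_cells(cell_side_miles, area_sq_miles=EARTH_SURFACE_SQ_MILES):
    """Number of square cells of the given side length needed to cover the area."""
    return area_sq_miles / cell_side_miles ** 2

coarse = grid_cells(10)  # 10-mile squares
fine = grid_cells(1)     # 1-mile squares

print(f"10-mile grid: {coarse:,.0f} cells")
print(f" 1-mile grid: {fine:,.0f} cells")
print(f"growth factor: {fine / coarse:.0f}x")
```

Because the cell count scales with the square of the resolution, cutting the side of each square by 10 multiplies the number of squares by 100, before any extra cost from layers or time steps is counted.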
In spite of all the gains to be had with an exascale computer, you still have to program the bloody thing to work with your simulation. And that’s really the gist of this article: there’s no free lunch in High Performance Computing. The level of hardware knowledge required to get anything like the maximum theoretical speed is a lot higher than one would think. There’s no magic bullet or ‘re-compile’ button that’s going to get your old software running smoothly on the exascale computer. More likely, you and a team of the smartest scientists are going to work for years to tailor your simulation to the hardware you want to run it on. And therein lies the rub: the hardware alone isn’t going to get you the extra performance.
Given Tuesday’s announcement of the first ARM-15 architecture chip from ARM and TSMC, the ball is rolling now. We’re getting closer and closer to desktop-capable CPUs in terms of clock, core count and now data/instruction bus widths. Once the ARM-15 64-bit chip hits the market, Qualcomm is going to need to accelerate its development of competing chips. But for now, I think integrating many functions into the same die will have to do. Here now is Qualcomm’s Snapdragon. Read on:
Qualcomm remains the only active player in the smartphone/tablet space that uses its architecture license to put out custom designs. The benefit to a custom design is typically better power and performance characteristics compared to the more easily synthesizable designs you get directly from ARM. The downside is development time and costs go up tremendously.
I’m very curious to see how the different ARM-based processors fare against one another in each successive generation, especially with the move to ARM-15 (64-bit), which won’t see a quick implementation on a handheld mobile device. ARM-15 is a long way off yet, but while we wait for the next big thing in ARM-designed cores, there’s a ton of incremental improvement and evolutionary progress being made on current-generation ARM cores. ARM-8 and ARM-9 have a lot of life left in them for the foreseeable future, including die shrinks that allow either faster clock speeds, or the same clock speeds with lower power drain and a lower Thermal Design Power (TDP).
Apple is also moving steadily towards a die shrink in order to cement the gains made in its A5 chip design. Taiwan Semiconductor Manufacturing Company (TSMC) is the biggest partner in this direction and is attempting to run the next iteration of Apple mobile processors on its state-of-the-art 22 nanometer design rule process.