Even a PCIe 2.0 x2 link offers about a 40% increase in maximum throughput over SATA 6Gbps. Like most interfaces, PCIe 2.0 isn’t 100% efficient: based on our internal tests the bandwidth efficiency is around 78-79%, so in the real world you should expect to get ~780MB/s out of a PCIe 2.0 x2 link. Remember that SATA 6Gbps isn’t 100% efficient either; around 515MB/s is the typical real-world maximum we see. The currently available PCIe SSD controller designs are all 2.0 based, but we should start to see some PCIe 3.0 drives next year. We don’t have efficiency numbers for 3.0 yet, but I would expect nearly twice the bandwidth of 2.0, making 1GB/s+ the norm.
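As a back-of-the-envelope sketch of where that ~780MB/s figure comes from: PCIe 2.0 runs at 5 GT/s per lane with 8b/10b encoding, leaving 500MB/s of payload bandwidth per lane before protocol overhead. Applying the article's measured ~78% efficiency to an x2 link gives the quoted number (the function name and structure here are my own illustration):

```python
def pcie2_effective_mb_s(lanes, efficiency=0.78):
    """Estimated real-world throughput of a PCIe 2.0 link.

    PCIe 2.0 signals at 5 GT/s per lane; 8b/10b encoding means only
    8 of every 10 bits are payload, i.e. 4 Gb/s = 500 MB/s per lane.
    The efficiency factor models protocol/controller overhead.
    """
    per_lane_mb_s = 5e9 * (8 / 10) / 8 / 1e6  # 500 MB/s per lane
    return lanes * per_lane_mb_s * efficiency

print(round(pcie2_effective_mb_s(2)))  # 780
```

The same arithmetic explains the SATA side: 6Gb/s with 8b/10b encoding tops out at 600MB/s, and the typical ~515MB/s real-world ceiling works out to roughly 86% efficiency.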
As I’ve watched the SSD market slowly grow and bloom, it seems the rate at which big changes occur has slowed. The SATA controllers on the drives themselves were kicked up a notch as the transition to SATA 6Gbps (SATA-3) gave us consistent ~500MB/sec read/write speeds, and things have stayed there ever since due to the inherent limit of the interface. I had been watching developments in PCIe-based SSDs very closely, but their prices were always artificially high because the market for these devices was the data center. Proof positive: Fusion-io catered mostly to two big purchasers of its product, Facebook and Apple. Consequently its prices always stayed at the enterprise level, around $15K for one PCIe slot device (at any size/density of storage).
Apple has come to the rescue in every sense of the word by adopting PCIe SSDs as the base-level storage for its portable computers. Starting in Summer 2013, Apple began releasing MacBook Pro laptops with PCIe SSDs and eventually designed them into the MacBook Air as well. The last step was to fully adopt them in the desktop Mac Pro (which has been slow to hit the market). The PCIe SSD in the Mac Pro is the fastest in any shipping consumer-level computer. As the Mac gains market share among all computers being shipped, Mac buyers are gaining more speed from their SSDs as well.
So what further plans are in the works for the REST of the industry? Well, SATA Express seems to be the way forward for the 90% of the market still buying Windows PCs. It’s a new standard being put forth by the SATA-IO standards body. With any luck the enthusiast motherboard manufacturers will adopt it as fast as it clears the committees, and we’ll see an AnandTech or Tom’s Hardware review with a real benchmark and analysis of how well it matches up against previous-generation hardware.
I would like to applaud Apple’s 32nm migration plan. By starting with lower-volume products, and even then only on a portion of the iPad 2s available on the market, Apple maintains a low profile and gains great experience with Samsung’s 32nm high-k + metal gate (HKMG) process.
Anand Lal Shimpi @ Anandtech.com does a great job explaining some of the electrical-engineering minutiae behind Apple’s unpublicized switch to a smaller design rule for some of its second-generation iPads. Specifically, this iPad’s firmware identifies it as the iPad2,4, indicating a 32nm version of the Apple A5 chip. And boy howdy, is there a difference between the 45nm A5 and the 32nm A5 in the iPad 2.
Anand first explains the process technology involved in making the new chip: metal gate electrodes and high dielectric constant (high-k) gate oxides. Both are chosen to keep electricity from leaking through the millions of transistor “switches” that populate the circuits on the processor. The high-k oxide lets you use a physically thicker insulating layer while keeping the same gate capacitance, which sharply cuts the current that tunnels through the gate; the metal gate replaces the old polysilicon electrode, which stops behaving well when paired with a high-k oxide. A great explanation, I think, of those two on-die changes in the new Samsung 32nm design rules. Both changes help keep the electrical current from leaking all over the processor.
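The thickness argument is easy to make concrete. Gate capacitance per unit area goes as C = k·ε0/t, so for the same capacitance a higher-k dielectric can be proportionally thicker, and tunneling leakage falls off roughly exponentially with thickness. A minimal sketch, using textbook k values (SiO2 ~3.9, hafnium-based high-k ~25) that are my assumption, not figures from the article:

```python
# Why high-k helps: capacitance per unit area is C = k * eps0 / t, so a
# higher-k material can be physically thicker at the same capacitance,
# and gate tunneling leakage drops steeply with physical thickness.
K_SIO2 = 3.9    # relative permittivity of silicon dioxide
K_HIGHK = 25.0  # typical value for a hafnium-based high-k dielectric

def equivalent_thickness_nm(t_sio2_nm, k_new=K_HIGHK, k_old=K_SIO2):
    """Physical thickness of the new dielectric that matches the
    capacitance of t_sio2_nm of plain SiO2."""
    return t_sio2_nm * (k_new / k_old)

# A leaky ~1.2 nm SiO2 gate oxide can be swapped for ~7.7 nm of high-k
print(round(equivalent_thickness_nm(1.2), 1))  # 7.7
```

Roughly 6x more physical insulator between the gate and the channel, with no loss of the capacitance that makes the transistor switch.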
What does this change mean? The follow-up to that question is the benchmarks Anand runs in the rest of the article, checking battery life at each step of the way. Informally, it appears the iPad2,4 gets roughly one extra hour of battery life compared to the original iPad2,1 using the larger 45nm A5 chip. Graphics and CPU performance are exactly the SAME as the first-generation A5. So, as the article title indicates, this change was just a straightforward die shrink from 45nm to 32nm, and it no doubt helps validate the A5 architecture on the new production line’s process technology. That validation will absolutely be required to wedge the very large current-generation A5X chip from the iPad 3 into a new iPhone in Fall 2012.
But consider this: even as Apple and Samsung both refine and innovate on the ARM architecture for mobile devices, Intel is still the process-technology leader, bar none. Intel has 22nm production lines up and running and is releasing Ivy Bridge CPUs on that design rule this Summer 2012. While Intel doesn’t really compete in the mobile chip industry (there have been attempts in the past), it can at least tout having the densest, most power-efficient chips in the categories it dominates. I cannot help but wonder what kind of gains could be made if an innovator like Apple had access to an ARM chip foundry with all of Intel’s process engineering and optimization. What would an A5X chip look like at the 22nm design rule with all that power efficiency and silicon process technology applied to it? How large would the die be? What kind of battery life would you see if you die-shrunk an A5X all the way down to 22nm? That, to me, is the Andy Grove 10X improvement I would like to see. Could we get 11-12 continuous hours of battery life on a cell phone? Could we see a cell phone with more CPU/graphics capability than the current-generation Xbox and PlayStation? Hard to tell, I know, but it is just so darned much fun to think about.
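Just for fun, here is the idealized die-size arithmetic behind that daydream. In the best case, die area scales with the square of the linear design rule; real shrinks recover less because I/O pads, analog blocks, and wiring don’t scale as well as logic. The ~163 mm² figure for the 45nm A5X is a commonly reported estimate, not a number from this post:

```python
# Ideal (optimistic) die-area scaling for a node shrink: area goes as the
# square of the linear design rule. Real-world shrinks recover less than
# this because pads, analog circuitry, and wiring scale poorly.
def shrunk_area_mm2(area_mm2, node_from_nm, node_to_nm):
    return area_mm2 * (node_to_nm / node_from_nm) ** 2

a5x_45nm_mm2 = 163.0  # widely reported estimate for the 45nm A5X die
print(round(shrunk_area_mm2(a5x_45nm_mm2, 45, 22)))  # ~39 mm^2, ideally
```

Even granting that the real number would land well above the ideal ~39 mm², a chip that started life too big for a phone could plausibly end up comfortably phone-sized two nodes later.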
The newest generation of Intel chips (Sandy Bridge) was demonstrated at the Consumer Electronics Show in Las Vegas. Some of the technology fanboi websites got early samples of chips and motherboards that use the new CPUs and chipsets. Aside from having the memory controller on the CPU, another benefit is that the integrated graphics can be repurposed to accelerate video transcoding. Intel calls it Quick Sync, and I call it effing magic.
Quick Sync is just awesome. It’s simply the best way to get videos onto your smartphone or tablet. Not only do you get most if not all of the quality of a software-based transcode, you get performance that’s better than what high-end discrete GPUs are able to offer. If you do a lot of video transcoding onto portable devices, Sandy Bridge will be worth the upgrade for Quick Sync alone.
For everyone else, Sandy Bridge is easily a no-brainer. Unless you already have a high-end Core i7, this is what you’ll want to upgrade to.
Previously in this blog I have recounted stories from Tom’s Hardware and Anandtech.com about the wicked cool idea of tapping the vast resources of your GPU while you’re not playing video games. GPU makers like NVIDIA and AMD both wanted to market their products to people who not only gamed but occasionally ripped video from DVDs and played it back on iPods or other mobile devices. The time sunk into these conversions was made somewhat less painful by the ability to run the process on a dual-core Wintel computer, browsing web pages while re-encoding the video in the background. But to get better speeds one almost always needs to monopolize all the cores on the machine; free software like HandBrake will happily take advantage of those extra cores, slowing your machine but effectively speeding up the transcoding process. There was hope that GPUs could accelerate transcoding beyond what was achievable with a multi-core Intel CPU. Another example is Apple’s widespread adoption of OpenCL as a pipeline for sending the GPU any video rendering or processing that needs to be done in iTunes, QuickTime, or the iLife applications. And where I work, we get asked to do a lot of transcoding of video into different formats for customers. Usually someone wants a rip from a DVD that they can put on a flash drive and take with them into a classroom.
However, it now appears a revolution in speed is in the works, with Intel giving you faster transcodes for free. I’m talking about Intel’s new Quick Sync technology, which uses the integrated graphics core as a video transcode accelerator. Transcoding speeds are amazingly fast, and given that speed, transcoding becomes trivial for anyone, including the casual user. In the past everyone seemed to complain about how slow their computer was, especially for ripping DVDs or transcoding the rips to smaller, more portable formats. Now it takes a few minutes to get an hour of video into the right format. No more blue Monday. Follow the link to the story and analysis from Anandtech.com, which ran head-to-head comparisons of all the available techniques for re-encoding a Blu-ray video release into a smaller .mp4 file encoded as H.264. They compared Intel quad-core CPUs (which took the longest and produced pretty good quality) against GPU-accelerated transcodes and the new Quick Sync technology coming soon on the Sandy Bridge generation of Intel Core i7 CPUs. It is wicked cool how fast these transcodes are, and it makes the process of transcoding trivial compared to how long it takes to actually ‘watch’ the video you spent all that time converting.