Just when you think you understand the trio (as I thought I did up until my final interview with Grove) you learn something new that turns everything upside-down. The Intel Trinity must be considered one of the most successful teams in business history, yet it seems to violate all the laws of successful teams.
Agreed. This is a topic near and dear to my heart, as I’ve read a number of the stories the tech press has published over the years: Tracy Kidder’s The Soul of a New Machine, Fred Brooks’s The Mythical Man-Month, and Steven Levy’s Insanely Great; the story of Xerox PARC as told in Dealers of Lightning; and the ARPANET project as told in Where Wizards Stay Up Late. Moving somewhat along those lines, there are Stewart Brand’s The Media Lab and Howard Rheingold’s Virtual Reality. All of these are, at some level, studies of organizational theory in the high technology field.
And one thing you find commonly is that there’s one charismatic individual who joins up at some point (early or late doesn’t matter) and then brings in a flood of followers and talent, the kick in the pants that really gets momentum going. The problem with a startup company like Intel or its predecessor, Fairchild Semiconductor, is that there’s more than one charismatic individual, and keeping that organization stitched together even loosely is probably the biggest challenge of all. So I’ll be curious to read this book by Michael Malone and see how it compares to the other books in my anthology of organizational theory in high tech. It should be a good, worthwhile read.
In almost every kind of electronic equipment we buy today, there is memory in the form of SRAM and/or flash memory. Following Moore’s law, memories have doubled in size every second year. When Intel introduced the 1103 1Kbit dynamic RAM in 1971, it cost $20. Today, we can buy a 4Gbit SDRAM for the same price.
Read now a look back from an Ericsson engineer surveying the use of solid-state, chip-based memory in electronic devices. It is always interesting to know how these things started and evolved over time. Advances in RAM design and manufacture are the quintessential example of Moore’s Law, even more so than the advances in processors during the same period. Yes, CPUs are cool and very much a foundation upon which everything else rests (especially dynamic RAM storage). But remember this: Intel didn’t start out making microprocessors. It started out as a dynamic RAM chip company at a time when DRAM was just entering the market. That’s the foundation from which even Gordon Moore knew the rate at which change was possible with silicon-based semiconductor manufacturing.
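To put the quoted numbers in perspective, here is a rough back-of-the-envelope sketch. It only uses the two capacities from the quote (1Kbit in 1971, 4Gbit "today") and the two-year doubling period the quote mentions; the function names are my own and nothing here comes from the Ericsson article itself.

```python
import math

# Capacities taken from the quote above, both sold at roughly $20.
BITS_1103 = 1 * 1024          # Intel 1103 DRAM, 1 Kbit (1971)
BITS_TODAY = 4 * 1024 ** 3    # 4 Gbit SDRAM ("today" in the quote)


def doublings(start_bits, end_bits):
    """Number of capacity doublings needed to go from start_bits to end_bits."""
    return math.log2(end_bits / start_bits)


def years_needed(n_doublings, years_per_doubling=2.0):
    """Years implied by a fixed doubling period (assumed: two years, per the quote)."""
    return n_doublings * years_per_doubling


n = doublings(BITS_1103, BITS_TODAY)
print("doublings:", n)
print("years at one doubling every two years:", years_needed(n))
print("bits per dollar improvement factor:", BITS_TODAY // BITS_1103)
```

Going from 1Kbit to 4Gbit is 22 doublings, which at one doubling every two years works out to about 44 years. That lines up reasonably well with the four decades or so since the 1103 shipped.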
Now we’re looking at mobile smartphone processors and System on Chip (SoC) designs advancing the state of the art. Desktop and server CPUs are making incremental gains, but the smartphone is really trailblazing in showing what’s possible. We went from combining the CPU with the memory (so-called 3D memory), and now graphics accelerators (GPUs) are in the mix. Multiple cores and soon fully 64-bit clean CPU designs are entering the market (in the form of the latest model iPhones). It’s not just a memory revolution, but memory was definitely a driver in the market as we migrated from magnetic core memory (state of the art in 1951-52, when it was developed at MIT) to the dynamic RAM chip (state of the art in 1968-69). That drive to develop the DRAM brought all other silicon-based processes along with it, and all the boats were raised. So here’s to the DRAM chip that helped spur the revolution. Without those shoulders, the giants of today wouldn’t be able to stand.
Similarly disappointing for everyone who isn’t Intel, it’s been more than a year after Sandy Bridge’s launch and none of the GPU vendors have been able to put forth a better solution than Quick Sync. If you’re constantly transcoding movies to get them onto your smartphone or tablet, you need Ivy Bridge. In less than 7 minutes, and with no impact to CPU usage, I was able to transcode a complete 130 minute 1080p video to an iPad friendly format; that’s over 15x real time.
QuickSync, for anyone who doesn’t follow Intel’s own technology white papers and CPU releases, is a special feature of Sandy Bridge era Intel CPUs. Its origins at Intel go back as far as the Clarkdale series with embedded graphics (the first round of the 32nm design rule). It can do things like simply speeding up the decoding of a video stream saved in a number of popular video formats: VC-1, H.264, MP4, etc. Now it’s marketed to anyone trying to speed up the transcoding of video from one format to another. The first Sandy Bridge CPUs using the hardware encoding portion of QuickSync showed incredible speeds compared to GPU-accelerated encoders of that era. However, things have been kicked up a further notch in the embedded graphics of the Intel Ivy Bridge series CPUs.
In the quote at the beginning of this article, I included a summary from the Anandtech review of the Intel Core i7 3770, which gives a better sense of the magnitude of the improvement. The full 130 minute Blu-ray disc was converted at a rate of 15 times real time, meaning that for every minute of video coming off the disc, QuickSync is able to transcode it in 4 seconds! That is major progress for anyone who has followed this niche of desktop computing. Having spent time capturing, editing and exporting video, I will admit transcoding between formats is a lengthy process that uses up a lot of CPU resources. Offloading all that burden to the embedded graphics controller totally changes the traditional routine of slowing the computer to a crawl and having to walk away and let it work.
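The arithmetic behind those "times real time" figures is simple enough to sketch. The snippet below uses only the numbers from the Anandtech quote (a 130 minute video, under 7 minutes to transcode, "over 15x"); the function names are mine and purely illustrative.

```python
def speedup(video_minutes, transcode_minutes):
    """How many times faster than real time a transcode ran."""
    return video_minutes / transcode_minutes


def seconds_per_video_minute(speedup_factor):
    """Wall-clock seconds spent on each minute of source video."""
    return 60.0 / speedup_factor


def transcode_minutes(video_minutes, speedup_factor):
    """Wall-clock minutes needed at a given speedup factor."""
    return video_minutes / speedup_factor


# At exactly 15x, one minute of video takes 4 seconds to transcode.
print(seconds_per_video_minute(15))

# A 130 minute movie at exactly 15x would take a bit under 9 minutes.
print(round(transcode_minutes(130, 15), 2))

# Finishing 130 minutes in 7 minutes is closer to 18.6x, so "over 15x" is fair.
print(round(speedup(130, 7), 1))
```

So the "less than 7 minutes" in the review actually implies something better than the 15x headline number; 15x is the conservative way of stating it.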
Now transcoding is trivial; it costs nothing in terms of CPU load. Any time it can be faster than real time means you don’t have to walk away from your computer (or at least not for very long), and 10X faster than real time makes that doubly true. Now we are fully at 15X real time for a full length movie. The time spent is so short you wouldn’t ever have a second thought about “Will this transcode slow down the computer?” It won’t. In fact, you can continue doing all your other work, be productive, have fun and continue on your way just as if you hadn’t just asked your computer to do the most complicated, time consuming chore that (up until now) you could possibly ask it to do.
Knowing this application of the embedded graphics is so useful for desktop computers makes me wonder about scientific computing. What could Intel provide in terms of performance increases for simulations and computation in a supercomputer cluster? Seeing how hybrid supercomputers using nVidia Tesla GPU co-processors mixed with Intel CPUs have slowly marched up the list of the Top 500 Supercomputers makes me think Intel could leverage QuickSync further... much further. Unfortunately, this performance boost is solely dependent on a few vendors of proprietary transcoding software. Open source developers do not have an opening into the QuickSync tech that would let them write a library to redirect a video stream into the QuickSync acceleration pipeline. When somebody does accomplish this feat, it may not be long before you see some Linux compute clusters attempt to use QuickSync as an embedded algorithm accelerator too.
Chip designer and chief Intel rival AMD has signed an agreement to acquire SeaMicro, a Silicon Valley startup that seeks to save power and space by building servers from hundreds of low-power processors.
It was bound to happen eventually, I guess. SeaMicro has been acquired by AMD. We’ll see what happens as a result, since SeaMicro is a customer for Intel’s Atom chips and, most recently, Xeon server chips as well. I have no idea where this is going or what AMD intends to do, but hopefully this won’t scare off any current or near-future customers.
SeaMicro’s competitive advantage has been, and will continue to be, the development work they performed on the custom ASIC chip they use in all their systems. That bit of intellectual property was in essence the reason AMD decided to acquire SeaMicro, and it will hopefully give AMD an engineering advantage in systems it might put on the market in the future for large scale data centers.
While this is all pretty cool technology, I think that SeaMicro’s best move was to design its ASIC so that it could take virtually any common CPU. In fact, SeaMicro’s last big announcement introduced its SM10000-EX option, which uses low-power, quad-core Xeon processors to more than double compute performance while still keeping the high density, low-power characteristics of its siblings.
So there you have it: Wired and The Register are reporting the whole transaction pretty positively. It looks on the surface to be a win for AMD, as it can design new server products and get them to market quickly using the SeaMicro ASIC as a key ingredient. SeaMicro can still service its current customers and eventually allow AMD to upsell or upgrade as needed to keep the ball rolling. And with AMD’s Fusion architecture marrying GPUs with CPU cores, who knows what cool new servers might be possible? But as usual the nay-sayers, the spreaders of Fear, Uncertainty and Doubt, have questioned the value of SeaMicro and their original product, the SM-10000.
Diane Bryant, the general manager of Intel’s data center and connected systems group, had this to say at a press conference for the launch of new Xeon processors when asked about SeaMicro’s attempt to interest Intel in buying out the company: “We looked at the fabric and we told them thereafter that we weren’t even interested in the fabric.” To Intel, there’s nothing special enough in SeaMicro to warrant buying the company. Furthermore, Bryant told Wired.com:
“…Intel has its own fabric plans. It just isn’t ready to talk about them yet. “We believe we have a compelling solution; we believe we have a great road map,” she said. “We just didn’t feel that the solution that SeaMicro was offering was superior.”
This is a move straight out of Microsoft’s marketing department circa 1992, when they would pre-announce a product that never shipped or was barely developed beyond the prototype stage. If Intel were really working on this as a new product offering, you would have seen an announcement by now, rather than a vague, tangential reference that appears more like a parting shot than a strategic direction. So I will be watching intently in the coming months, and years if needed, to see what, if any, Intel ‘fabric technology’ makes its way from the research lab to the development lab and on to a final shipping product. However, don’t be surprised if this is Intel attempting to undermine AMD’s choice to purchase SeaMicro. Likewise, Forbes.com later reported, citing a SeaMicro representative, that the company had not tried to encourage Intel to acquire it. It is anyone’s guess who is really correct and being 100% honest in their recollections. However, I am still betting on SeaMicro’s long term strategy of pursuing low power, ultra dense, massively parallel servers. It is an idea whose time has come.
Three-dimensional transistors are in the news again. Previously Intel announced it was adopting a new transistor design at the next smaller design rule for the Ivy Bridge generation of Intel CPUs. Now ARM is also doing work to integrate similar technology into its ARM CPU cores. No doubt the need to lower the Thermal Design Point while maintaining clock speed is driving this move to refine and narrow the design rules for the ARM architecture. Knowing Intel is still the top research and development outfit for silicon semiconductors would give pause to anyone directly competing with them, but ARM is king of the low power semiconductor, and keeping pace with Intel’s design rules is an absolute necessity.
I don’t know how quickly ARM is going to be able to get a licensee to jump onboard and adopt the new design. Hopefully a large operation like Samsung can take this on and get the chip into its design, development and production lines at a chip fabrication facility as soon as possible. Likewise, other contract manufacturers like Taiwan Semiconductor Manufacturing Company (TSMC) should try to get this chip into their facilities quickly too. That way the cell phone and tablet markets can benefit as well, since they use a lot of ARM licensed CPU cores and similar intellectual property in their shipping products. My interest is not so much in the competition between Intel and ARM for low power computing as in the overall performance of any single ARM design once it’s been in production for a while and optimized the way Apple designs its custom CPUs using ARM licensed CPU cores. The single most outstanding achievement of Apple in the design and production of the iPad is the battery charge duration of 10 hours. To date, that achievement has not been beaten, even by other manufacturers and products that also license ARM intellectual property. So if the ARM design is good and can be validated and prototyped with useful yields quickly, Apple will no doubt be the first to benefit, and by way of Apple so will the consumer (hopefully).
The big question is endurance, however we won’t see a reduction in write cycles this time around. IMFT’s 20nm client-grade compute NAND used in consumer SSDs is designed for 3K – 5K write cycles, identical to its 25nm process.
If true this will help considerably in driving down cost of Flash memory chips while maintaining the current level of wear and performance drop seen over the lifetime of a chip. Stories I have read previously indicated that Flash memory might not continue to evolve using the current generation of silicon chip manufacturing technology. Performance drops occur as memory cells wear out. Memory cells were wearing out faster and faster as the wires and transistors got smaller and narrower on the Flash memory chip.
The reason for this is that memory cells have to be erased in order to free them up, and writing and erasing take a toll on the memory cell each time one of these operations is performed. Single Level memory cells are the most robust, and can go through many thousands, even millions, of write and erase cycles before they wear out. However, the cost per megabyte of Single Level memory cells puts them at a premium, Enterprise price level, generally for corporate customers. Two Level memory cells are much more cost effective, but the structure of the cells makes them less durable than Single Level cells. And as the wires connecting them get thinner and narrower, the number of write and erase cycles they can endure without failing drops significantly. Enterprise customers in the past would not purchase products specifically because of this limitation of the Two Level memory cell.
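To get a feel for what a 3K to 5K write cycle rating means in practice, here is a rough endurance estimate. Only the cycle counts come from the quote above; the 128GB capacity, the 20GB of writes per day and the write amplification factor are hypothetical values I picked for illustration, and the model assumes wear is spread perfectly evenly across every cell.

```python
def total_host_writes_gb(capacity_gb, pe_cycles, write_amplification=1.0):
    """Total data the host can write before the rated cycles are used up.

    Assumes perfectly even wear across the whole drive. write_amplification
    is the ratio of flash writes to host writes (1.0 is the ideal case).
    """
    return capacity_gb * pe_cycles / write_amplification


def lifetime_years(capacity_gb, pe_cycles, host_gb_per_day, write_amplification=1.0):
    """Years until the rated write cycles run out at a steady daily write load."""
    days = total_host_writes_gb(capacity_gb, pe_cycles, write_amplification) / host_gb_per_day
    return days / 365.0


# Hypothetical consumer drive: 128 GB, 20 GB written per day.
for cycles in (3000, 5000):          # the 3K - 5K range from the quote
    for wa in (1.0, 3.0):            # ideal vs. an assumed pessimistic amplification
        years = lifetime_years(128, cycles, 20, wa)
        print(f"{cycles} cycles, write amplification {wa}: about {years:.0f} years")
```

Even with pessimistic assumptions the numbers come out in decades for a light consumer workload, which is why holding the cycle rating steady at 20nm matters more than raising it. A heavily written enterprise drive is a very different story.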
As companies like Intel and Samsung tried to make Flash memory chips smaller and less expensive to manufacture, the durability of the chips became less and less. The question everyone asked was: is there a point of diminishing returns where smaller design rules and thinner wires make chips too fragile to use? The solution for most manufacturers is to add spare memory cells, “over-provisioning” so that when a cell fails, you can unlock a spare and continue using the whole chip. This not-so-secret over-provisioning trick has been the way most Solid State Disks (SSDs) have handled the write/erase problem for Two Level memory cells. But even then, the question is how much do you over-provision? Another technique used is called wear-levelling, where a memory controller distributes writes/erases over ALL the chips available to it. A statistical scheme is used to make sure each and every chip suffers equally and gets the same amount of wear and tear applied to it. It’s a difficult balancing act for manufacturers of Flash memory and for the storage product manufacturers who consume those chips: to make products that perform adequately, do not fail unexpectedly and do not cost too much for laptop and desktop manufacturers to offer to their customers.
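The two tricks described above, wear-levelling and over-provisioning, can be sketched in a few lines. This is a toy model under my own assumptions, not how any real SSD controller works: it uses a simple "always pick the least-worn block" rule instead of the statistical scheme mentioned above, and the class and method names are made up for illustration.

```python
class WearLeveler:
    """Toy flash controller: least-worn-first writes plus a pool of spare blocks."""

    def __init__(self, active_blocks, spare_blocks, max_erases):
        self.max_erases = max_erases
        # Erase count for every block currently in service.
        self.erase_counts = {block: 0 for block in range(active_blocks)}
        # Over-provisioned blocks held back until an active block wears out.
        self.spares = list(range(active_blocks, active_blocks + spare_blocks))
        self.retired = []

    def write(self):
        """Erase and rewrite one block, returning the id of the block used."""
        if not self.erase_counts:
            raise RuntimeError("no usable blocks left")
        # Wear-levelling: always send the write to the least-worn block.
        block = min(self.erase_counts, key=self.erase_counts.get)
        self.erase_counts[block] += 1
        if self.erase_counts[block] >= self.max_erases:
            self._retire(block)
        return block

    def _retire(self, block):
        """Take a worn-out block out of service and unlock a spare if one is left."""
        del self.erase_counts[block]
        self.retired.append(block)
        if self.spares:
            self.erase_counts[self.spares.pop(0)] = 0


# Four active blocks, two spares, each block good for three erases.
controller = WearLeveler(active_blocks=4, spare_blocks=2, max_erases=3)
for _ in range(8):
    controller.write()
print(controller.erase_counts)   # wear is spread evenly after 8 writes
```

The point of the sketch is the trade-off: every spare block is capacity the customer pays for but cannot use, so how much to over-provision is really a question of how many extra write cycles you are willing to buy.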
If Intel and Micron can successfully address the fragility of Flash chips as the wiring and design rules get smaller and smaller, we will start to see larger memories included in more mobile devices. I predict you will see iPhones and Samsung Android smartphones with upwards of 128GBytes of Flash memory storage. Similarly, tablets and ultra-mobile laptops will also start to have larger and larger SSDs available. Costs should stay about where they are now in comparison to current shipping products. We’ll just have more products to choose from, say like 1TByte SSDs instead of the more typical high end 512GByte SSDs we see today. Prices might also come down, but that’s bound to take a little longer until all the other Flash memory manufacturers catch up.
So Intel gets an interview with a Condé Nast writer for a sub-blog of Wired.com. I doubt too many purchasers or data center architects consult Cloudline@Wired.com. But all the same, I saw through many thinly veiled bits of handwaving and old saws from Intel saying, “Yes, this exists but we’re already addressing it with our existing product lines...” So I wrote a comment on this very article, especially regarding a throw-away line mentioning the ‘future’ of the data center and the direction the data center and cloud computing market was headed. However, the moderator never published the comment. In effect, I raised the question: Whither Tilera? And the Quanta SM-2 server based on the Tilera chip?
Aren’t they exactly what is described by the author John Stokes as a network of cores on a chip? And given the scale of Tilera’s own product plans going into the future, and the fact they are not just concentrating on network gear but actual compute clouds too, I’d say both Stokes and Walcyzk are asking the wrong questions and directing our attention in the wrong direction. This is not a PR battle but a flat out technology battle. You cannot win this with words and white papers; it requires benchmarks, deployments and case histories. Technical merit and superior technology will differentiate the players in the Cloud in a Box race. And this hasn’t been the case in the past as Intel has battled AMD in the desktop consumer market. In the data center, Fear, Uncertainty and Doubt is the only weapon Intel has.
And I’ll quote directly from John Stokes’s article here describing EXACTLY the kind of product that Tilera has been shipping already:
“Instead of Xeon with virtualization, I could easily see a many-core Atom or ARM cluster-on-a-chip emerging as the best way to tackle batch-oriented Big Data workloads. Until then, though, it’s clear that Intel isn’t going to roll over and let ARM just take over one of the hottest emerging markets for compute power.”
The key phrase here is cluster on a chip, in essence exactly what Tilera has striven to achieve with its TILE64-based architecture. To review from previous blog entries on this website following the announcements and timelines published by Tilera:
Through first quarter of 2012, Intel will be releasing new SSDs: Intel SSD 520 “Cherryville” Series replacement for the Intel SSD 510 Series, Intel SSD 710 “Lyndonville” Series Enterprise HET-MLC SSD replacement for X25-E series, and Intel SSD 720 “Ramsdale” Series PCIe based SSD. In addition, you will be seeing two additional mSATA SSDs codenamed “Hawley Creek” by the end of the fourth quarter 2011.
That’s right folks, Intel is jumping on the high performance PCIe SSD bandwagon with the Intel SSD 720 in the first quarter of 2012. I don’t know what price they will charge, but given quotes and pre-releases of specs, it’s going to compete against products from competitors like RamSan, Fusion-io and the top level OCZ PCIe product, the R4. My best guess, based on pricing for those products, is that it will be in the roughly $10,000+ category with an 8x PCIe interface and a full complement of Flash memory (usually over 1TB on this class of PCIe card).
Knowing that Intel’s got some big engineering resources behind their SSD designs, I’m curious to see how close they can come to the performance statistics quoted in this table here:
2200 Mbytes/sec of read throughput and 1100 Mbytes/sec of write throughput. Those are some pretty hefty numbers compared to currently shipping products in the upper prosumer and lower Enterprise class price category. Hopefully Anandtech will get a shipping or even pre-release version before the end of the year and give it a good torture test. Following Anand Lal Shimpi on his Twitter feed, I’m seeing all kinds of tweets about how a lot of pre-release products from manufacturers of SSDs and PCIe SSDs fail during the benchmarks. That doesn’t bode well for the quality control departments at the manufacturers assembling and testing these products. Especially considering the price premium of these items, it would be much more reassuring if the testing were more rigorous and conservative.
Proof that a shipping product doesn’t always make all the difference, although it might be nice to tout the performance of an actual shipping product. What’s becoming more real is the power efficiency of the Tilera architecture, core for core, versus the Intel IA-64 architecture. Tilera can provide a much lower Thermal Design Point (TDP) per core than typical Intel chips running the same workloads. So Tilera for the win, on paper anyway.
Thus far, Intel’s Many Integrated Core MIC is little more than a research project. Intel picked up the remnants of the failed “Larrabee” graphics card project and rechristened it Knights and put it solely in the service of the king of computing, the CPU.
Ahhh, alas poor ol’ Larrabee, we hardly knew ye. And yet, somehow your ghost will rise again, and again and again. I remember the hints at the 80-core CPU, which then fell to 64 cores, then 40 cores, and now just today I read this article to find out it is merely Larrabee and only has a grand total of (hold tight, are you ready for this shocker?) 32 cores. Wait, what was that? Did you say 32 cores? Let’s turn back the page to May 15, 2009, when Intel announced the then-new Larrabee graphics processing engine with a 32-core processor. That’s right, nothing (well, maybe not nothing) has happened in TWO YEARS! Or very little has happened: a few die shrinks, and now the upcoming 3D (tri-gate) transistors for the 22nm design revision of Intel Architecture CPUs. It also looks like they may have shuffled around the floor plan/layout of the first-gen Larrabee CPU to help speed things up a bit. But other than these incrementalist appointments, the car looks very much like the model from two years ago. Now, what we can also hope has improved since 2009 is the speed and efficiency of the compilers Intel’s engineers have crafted to accompany the release of this re-packaged Larrabee.