computers mobile technology

AnandTech – Qualcomm’s New Snapdragon S4: MSM8960 & Krait Architecture Explored

Qualcomm remains the only active player in the smartphone/tablet space that uses its architecture license to put out custom designs. The benefit to a custom design is typically better power and performance characteristics compared to the more easily synthesizable designs you get directly from ARM. The downside is development time and costs go up tremendously.

via AnandTech – Qualcomm’s New Snapdragon S4: MSM8960 & Krait Architecture Explored.

The Snapdragon CPU
From the Qualcomm Website: Snapdragon

I’m very curious to see how the different ARM-based processors fare against one another in each successive generation, especially with the move to the Cortex-A15, which won’t see a quick implementation in a handheld mobile device. The Cortex-A15 is a long way off yet, but even with the next big ARM-designed core on the horizon, there’s a ton of incremental improvement and evolutionary progress being made on current-generation ARM cores. The Cortex-A8 and Cortex-A9 have a lot of life in them for the foreseeable future, including die shrinks that allow either faster clock speeds or the same clock speeds with lower power drain and a lower Thermal Design Power (TDP).

Apple is also moving steadily towards a die shrink in order to cement the gains made in its A5 chip design. Taiwan Semiconductor Manufacturing Company (TSMC) is the biggest partner in this direction and is attempting to run the next iteration of Apple mobile processors on its state-of-the-art 22 nanometer design rule process.

cloud computers technology

ARM daddy simulates human brain with million-chip super • The Register

Steve Furber (Image via Wikipedia)

While everyone in the IT racket is trying to figure out how many Intel Xeon and Atom chips can be replaced by ARM processors, Steve Furber, the main designer of the 32-bit ARM RISC processor at Acorn in the 1980s and now the ICL professor of engineering at the University of Manchester, is asking a different question, and that is: how many neurons can an ARM chip simulate?

via ARM daddy simulates human brain with million-chip super • The Register.

The phrase reminds me a bit of an old TV commercial that would air during the Saturday cartoons. Tootsie Roll brand lollipops had a center made out of Tootsie Roll. The challenge was to determine how many licks it takes to get to the center of a Tootsie Roll Pop. The answer was, “The world may never know.” And so it goes for simulations of the human brain, large-scale and otherwise.

I also remember reading Stewart Brand’s 1987 book about the MIT Media Lab and their installation of a brand new multi-processor supercomputer called The Connection Machine (TCM). Danny Hillis was the designer and author of the original concept of stringing together a series of small one-bit computer cores to act like ‘neurons’ in a larger array of CPUs. The scale was designed to top out at 65,536 (2^16) processors. At the time the MIT Media Lab only had the machine filled up 1/4 of the way but was attempting to do useful work with it at that size. Hillis spun out of MIT to create a startup company called Thinking Machines (to reflect the neuron-style architecture he had pursued as a grad student). In fact all of Hillis’s ideas stemmed from the research that led up to the original Connection Machine Mark 1.

Spring forward to today and the sudden appearance of massively parallel, low-power servers like Calxeda’s using ARM chips and the Quanta S2Q using Tilera chips (Tilera is also an MIT spin-out). Similarly, the SeaMicro SM10000-64 uses Intel Atom chips in large quantities, and SeaMicro is making sales TODAY. It almost seems like a stereotypical case of an idea being way ahead of its time. So recognize the opportunity, because now the person directly responsible for designing the ARM chip is attacking the same problem Danny Hillis was all those years ago.

Personally I would like to see Hillis join this program in some way, not as Principal Investigator but maybe as a background consultant. There’s nothing wrong with a few more eyes on the preliminary designs, especially given Hillis’s background in programming those old mega-scale computers. That is the true black art of trying to do a brain simulator on this scale. Steve Furber might just be able to make lightning strike twice (once for Acorn/ARM CPUs and once more for simulating the brain in silicon).

cloud data center google technology wintel

ARM server hero Calxeda lines up software super friends • The Register

Company Logo
Maker of the massively parallel ARM-based server

via ARM server hero Calxeda lines up software super friends • The Register.

Calxeda is in the news again this week with more announcements regarding its plans. As you may remember from the last article I posted on Calxeda, this company boasts an ARM-based server packing 120 CPUs (each with four cores) into a 2U-high rack enclosure (making it just 3-1/2″ tall *see note). Every evolution in hardware needs an equal if not greater revolution in software, which is the point of Calxeda’s announcement of its new software partners.

It’s mostly cloud app, cloud provisioning and cloud management vendors. With the partnership each company gets early access to the hardware Calxeda is promising to design, prototype and eventually manufacture. Both Google and Intel have pooh-poohed the idea of using “wimpy processors” on massively parallel workloads, claiming faster serialized workloads are still easier to manage through existing software and programming techniques. Yet for all the years Intel has complained about the programming tools, it has still gone the multi-core/multi-thread route, hoping to continue its domination by offering up ‘newer’ and higher-performing products. So while Intel bad-mouths parallelism on competing CPUs, it seems desperate to sell multi-core to willing customers year over year.

Even as power efficient as those cores may be, Intel’s old culture of maximum performance for the money still holds sway. Even the most recent ultra-low-voltage i-series CPUs are still hitting about 17 watts of power for chips clocking in around 1.8 GHz (speed boosting up to 2.9 GHz in a pinch). Even if Intel allowed these chips to be installed into servers, we’re still talking about a lot of Thermal Design Power (TDP) that has to be cooled to keep things running.

cloud data center google technology

Facebook: No ‘definite plans’ to ARM data centers • The Register


Clearly, ARM and Tilera are a potential threat to Intel’s server business. But it should be noted that even Google has called for caution when it comes to massively multicore systems. In a paper published in IEEE Micro last year, Google senior vice president of operations Urs Hölzle said that chips that spread workloads across more energy-efficient but slower cores may not be preferable to processors with faster but power-hungry cores.

“So why doesn’t everyone want wimpy-core systems?” Hölzle writes. “Because in many corners of the real world, they’re prohibited by law – Amdahl’s law.”

via Facebook: No ‘definite plans’ to ARM data centers • The Register.

The explanation given here by Google’s top systems person comes down to latency versus the overhead of parallel processing. If most of the steps have to be done in order, with very few tasks that can run in parallel, then a few fast cores give much higher performance, and that is the measure by which all the users of your service will judge you. Making things massively parallel might provide the same level of response at a lower energy cost. However, the complications due to communication and processing overhead to assemble all the data and send it over the wire can offset any advantage in power efficiency. In other words, everything takes longer, latency increases, and the users will deem your service slow and unresponsive. That’s the dilemma of Amdahl’s Law: the point of diminishing returns when adopting parallel computer architectures.
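Amdahl’s Law fits in a few lines of arithmetic. The sketch below is only an illustration, not anything taken from Hölzle’s paper: the function names, the core counts, and the 0.33 relative speed of a “wimpy” core are all made-up assumptions chosen to show the shape of the trade-off.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when only `parallel_fraction` of the work can be spread out."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_latency(parallel_fraction: float, cores: int, core_speed: float) -> float:
    """Time to finish one request, relative to a single core of speed 1.0.

    `core_speed` is a hypothetical per-core speed ratio (1.0 = the fast core).
    """
    return 1.0 / (core_speed * amdahl_speedup(parallel_fraction, cores))


# Hypothetical comparison: 4 fast cores versus 16 slow cores at one third the speed.
for p in (0.5, 0.9, 0.99):
    brawny = relative_latency(p, cores=4, core_speed=1.0)
    wimpy = relative_latency(p, cores=16, core_speed=0.33)
    print(f"parallel fraction {p:.2f}: brawny {brawny:.3f}, wimpy {wimpy:.3f}")
```

With these assumed numbers the slow cores only come out ahead when nearly all of the work (the 0.99 case) can run in parallel. At 0.5 and 0.9 the serial part dominates and the fast cores answer sooner, which is the point Hölzle is making.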

Now compare this to something we know much more concretely, like the airline industry. As the cost of tickets came down, the pressure to cut costs went up. Schedules for landings and gate assignments got more complicated and service levels have suffered terribly. No one is really all that happy about the service they get, even from the best airline currently operating. So maybe Amdahl’s Law doesn’t apply when there’s a false ceiling placed on what is acceptable in terms of the latency of a ‘system’. If airlines are not on time, but you still make your connection 99% of the time, who will complain? By way of comparison, there is a middle ground that may be achieved where more parallelizing of compute tasks will lower the energy required by a data center. It will mean greater latency and a worse experience for the users. But if everyone suffers equally from this and the service is not great but adequate, then the company will be able to cut costs by implementing more parallel processors in its data centers.

I think Tilera potentially holds a special attraction for Facebook, especially since Quanta, their hardware assembler of choice, is already putting together computers with the Tilera chip for customers now. It seems like this chain of associations might prove a way for Facebook to test the waters on a scale large enough to figure out the costs and benefits of massively parallel CPUs in the data center. Maybe it will take the build-out of another new data center to get there, but no doubt it will happen eventually.

computers data center mobile technology

Calxeda boasts of 5 watt ARM server node • The Register

Calxeda is not going to make and sell servers, but rather make chips and reference machines that it hopes other server makers will pick up and sell in their product lines. The company hopes to start sampling its first ARM chips and reference servers later this year. The first reference machine has 120 server nodes in a 2U rack-mounted format, and the fabric linking the nodes together internally can be extended to interconnect multiple enclosures together.

via Calxeda boasts of 5 watt ARM server node • The Register.

SeaMicro and now Calxeda are going gangbusters for the ultra-dense, low-power server market. Unlike SeaMicro, Calxeda wants to create reference designs it licenses to manufacturers, who will build machines with 120 server nodes in a 2U enclosure. SeaMicro’s record right now is 512 cores per 10U chassis, or roughly 102 cores per 2U. The difference is that the SeaMicro product uses a low-power Intel Atom CPU, whereas Calxeda is using a processor found more often in smartphones and tablet computers. SeaMicro has hinted they are not wedded to the Intel architecture, but they are more interested in shipping real live product than in coming up with generic designs others can license. In the long run it’s entirely possible SeaMicro may switch to a different CPU; they have indicated previously that they designed their servers with enough flexibility to swap out the processor for any other CPU if necessary. It would be really cool to see an apples-to-apples comparison of a SeaMicro server using Intel CPUs versus one using ARM-based CPUs.
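The 2U figure for SeaMicro is just its 10U number scaled down to the same rack height. Here is a small sketch of that normalization, with a function name of my own invention. The quoted article gives Calxeda’s count as 120 server nodes; the four cores per node used below comes from my later Calxeda post, so treat it as an assumption for this comparison.

```python
def per_rack_height(count: float, chassis_height_u: float, target_height_u: float = 2.0) -> float:
    """Scale a per-chassis count (cores or nodes) to a common rack height in U."""
    if chassis_height_u <= 0 or target_height_u <= 0:
        raise ValueError("rack heights must be positive")
    return count * target_height_u / chassis_height_u


# SeaMicro: 512 Atom cores in a 10U chassis, expressed per 2U.
seamicro_cores_per_2u = per_rack_height(512, chassis_height_u=10)

# Calxeda reference design: 120 server nodes in 2U.
calxeda_nodes_per_2u = per_rack_height(120, chassis_height_u=2)

# ASSUMPTION: four cores per Calxeda node (taken from the later post, not this article).
CORES_PER_CALXEDA_NODE = 4
calxeda_cores_per_2u = calxeda_nodes_per_2u * CORES_PER_CALXEDA_NODE

print(f"SeaMicro: {seamicro_cores_per_2u:.1f} cores per 2U")
print(f"Calxeda:  {calxeda_nodes_per_2u:.0f} nodes per 2U, "
      f"{calxeda_cores_per_2u:.0f} cores if each node is quad-core")
```

Counting cores says nothing about how fast each core is, so this measures packing density only, not performance.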

computers mobile technology

IBM Teams Up With ARM for 14-nm Processing

iPad, iPhone, MacBook Pro
Big, Little & Little-est!

Monday IBM announced a partnership with UK chip developer ARM to develop 14-nm chip processing technology. The news confirms the continuation of an alliance between both parties that launched back in 2008 with an overall goal to refine SoC density, routability, manufacturability, power consumption and performance.

via IBM Teams Up With ARM for 14-nm Processing.

Interesting that IBM is striking out so far beyond the current state-of-the-art processing node for silicon chips. 22nm or thereabouts is what most producers of flash memory are targeting for their next-generation products. Smaller sizes mean more chips per wafer, and higher density means storage sizes go up for both flash drives and SSDs without increasing in physical size (who wants to use brick-sized external SSDs, right?). It is also interesting that ARM is IBM’s partner for its farthest target yet in chip production design rule sizes. But it appears that System-on-Chip (SoC) designers like ARM are now the state-of-the-art producers of power- and waste-heat-optimized computing. Look at Apple’s custom A4 processor for the iPad and iPhone. That chip has very low power requirements, and it is currently leading the pack for battery life in the iPad (10 hours!). So maybe it does make sense to choose ARM right now, as they can benefit the most and the fastest from any shrink in the size of the wire traces used to create a microprocessor or a whole integrated system on a chip. Strength built on strength is a winning combination, and it shows that IBM and ARM have an affinity for the lower-power future of cell phone and tablet computing.
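The “more chips per wafer” point is basic geometry: if every feature shrinks linearly, the area of a die shrinks with the square of that ratio. Here is a rough sketch assuming ideal scaling. Real processes lose some of this to yield, wafer-edge waste, and parts of the chip that do not shrink, and node names are not exact feature sizes. The function name is my own.

```python
def ideal_die_gain(old_nm: float, new_nm: float) -> float:
    """Idealized multiple of dies per wafer when the same design moves from old_nm to new_nm.

    ASSUMPTION: perfect linear shrink in both dimensions, so area scales with the square.
    """
    if old_nm <= 0 or new_nm <= 0:
        raise ValueError("feature sizes must be positive")
    return (old_nm / new_nm) ** 2


gain = ideal_die_gain(old_nm=22, new_nm=14)
print(f"22nm -> 14nm: about {gain:.2f}x as many dies per wafer (ideal case)")
```

Even if the real-world gain is well short of the ideal, that is the economic pull behind chasing each new node.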

But consider this also: the last article I wrote was about Tilera’s product plans for cloud computing in a box. ARM chips could easily be the basis for much lower-power, much higher-density computing clouds. Imagine a Googleplex-style data center running ARM CPUs on cookie trays instead of commodity Intel parts. That’s a lot of CPUs and a lot less power draw, both big pluses for a Google design team working on a new data center. True, legacy software concerns might overrule a switch to lower-power parts. But if the savings on electricity would offset the opportunity cost of switching to a new CPU (and having to recompile software for the new chip), then Google would be crazy not to seize upon this.

computers macintosh technology

Apple A4 SOC unveiled – It’s an ARM CPU and the GPU! – Bright Side Of News*

Getting back to Apple A4, Steve Jobs incorrectly addressed Apple A4 as a CPU. We’re not sure was this to keep the mainstream press enthused, but A4 is not a CPU. Or we should say, it’s not just a CPU. Nor did PA Semi/Apple had anything to do with the creation of the CPU component.

via Apple A4 SOC unveiled – It’s an ARM CPU and the GPU! – Bright Side Of News*.

Apple's press release image of the A4 SoC

Interesting info on the Apple A4 System on Chip, which is being used by the recently announced iPad tablet computer. The world of mobile, low-power processors is dominated by the designs of ARM Holdings. According to the article, ARM is providing the graphics processor intellectual property too. So in the commodity CPU/GPU and System on Chip (SoC) market, ARM is the only way to go. You buy the license, you lay out the chip with all the core components you licensed, and you shop that around to a chip foundry. Samsung has a lot of expertise fabricating these chips made to order using the ARM designs. But apparently another competitor, GlobalFoundries, is shrinking its design rules (meaning lower power and higher clock speeds) and may become the foundry of choice. Unfortunately, outfits like iFixit can only figure out what chips and components go into an electronics device. They cannot reverse engineer the components going into the A4, and anyone else would probably be sued by Apple if they did spill the beans on the A4’s exact layout and components. But because everyone is working from the same set of Lego blocks for the CPUs and GPUs and forming them into full Systems on a Chip, some similarities are going to occur.

The heart of the new Apple A4 System on Chip

One thing pointed out in this article is the broad adoption of the same clock speed for all these ARM-derived SoCs. 1 GHz is the clock speed across the board despite differences in manufacturers and devices. The reason is that everyone is using the same ARM CPU cores, and they are designed to run optimally at a 1 GHz clock rate. So the more things change (meaning faster and faster time to market for more earth-shaking designs), the more they stay the same (people adopt commodity CPU designs and become more similar in performance). It will take a big investment for Apple and PA Semiconductor to really, TRULY differentiate themselves with a unique, proprietary CPU of any type. They just don’t have the time, though they may have the money. So when Jobs tells you something is exclusive to Apple, that may be true for industrial design. But for CPU/GPU/SoC, … Don’t Believe the Hype surrounding the Apple A4.

Also check out AppleInsider’s coverage of this same topic.


NYTimes weighs in on the Apple A4 chip and what it means for the iPad maintaining its competitive advantage. NYTimes gives Samsung more credit than Apple because Samsung manufactures the chip. What they will not speculate on or guess at is ARM Holdings’ sale of licenses for its Cortex-A9 to Apple. They do hint that the nVidia Tegra CPU is going to compete directly against Apple’s iPad using the A4. However, as Steve Jobs has pointed out more than once, “Great Products Ship”. And anyone else in the market who has licensed the Cortex-A9 from ARM had better get going. You’ve got 60 or 90 days, depending on your sales and marketing projections, to compete directly with the iPad.