Intel and Achronix: 2 great tastes that taste great together
According to Greg Martin, a spokesman for the FPGA maker, Achronix can compete with Xilinx and Altera because it has, at 1.5GHz in its current Speedster1 line, the fastest such chips on the market. And by moving to Intel’s 22nm technology, the company could ramp the clock speed up to 3GHz.
That kind of says it all in one sentence, or two sentences in this case. The fastest FPGA on the market is quite an accomplishment unto itself. Putting that FPGA on the world’s most advanced production line and silicon wafer technology is what Andy Grove would have called the 10X Effect. FPGAs are reconfigurable processors whose circuits can be re-routed and optimized for different tasks over and over again. That is really beneficial for very small batches of processors where you need a custom design. Among the things they can speed up are heavy math and lookups across a very large database. In the past I was always curious whether one could be used as a general-purpose computer that could switch gears and optimize itself for different tasks. I didn’t know whether it would work or be worthwhile, but it really seemed like there was a vast untapped reservoir of power in the FPGA.
Some supercomputer manufacturers have started using FPGAs as special-purpose co-processors and have found immense speed-ups as a result. Oil prospecting companies have also used them to speed up analysis of seismic data and place good bets on dropping a well bore in the right spot. But price has always been a big barrier to entry, as quoted in this article: the cost is around $1,000 per chip, which limits the appeal to buyers for whom price is no object but speed and time matter most. The two big competitors in the field of FPGA manufacturing are Altera and Xilinx, both of which design the chips but have them manufactured in other countries. This has left FPGAs as second-class citizens, built with older-generation chip technologies on old manufacturing lines. They always had to make do with what they could get, and performance in terms of clock speed was always lower too.
During the Megahertz and Gigahertz wars it was not unusual to see chip speeds increasing every month. FPGAs sped up too, but not nearly as fast; I remember 200MHz and 400MHz being touted for Xilinx and Altera top-of-the-line products. With Achronix running at 1.5GHz, things have changed quite a bit. That’s general-purpose CPU speed in a completely customizable FPGA, which makes the FPGA even more useful. However, instead of going faster, this article points out that people would rather buy the same speed but use less electricity and generate less heat. There’s no better way to do that than to shrink the size of the circuits on the FPGA, and that is the core philosophy of Intel. The two companies have just teamed up to put the Achronix FPGA on the smallest-feature-size production line, run by the most optimized, cost-conscious manufacturer of silicon chips bar none.
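To put a rough number on that “same speed, less heat” idea, here is a back-of-the-envelope sketch. It uses the textbook first-order relation for dynamic power (P ≈ C·V²·f) with made-up capacitance and voltage reductions standing in for the process shrink, so treat the figures as purely illustrative, not anything Intel or Achronix has published.

```python
# Rough sketch: first-order dynamic power, P ~ C * V^2 * f.
# The "shrink" gains below are assumptions for illustration, not published numbers.

def dynamic_power(capacitance, voltage, frequency_hz):
    """Classic first-order dynamic power estimate (activity factor folded into C)."""
    return capacitance * voltage ** 2 * frequency_hz

old_process = dynamic_power(capacitance=1.0, voltage=1.0, frequency_hz=1.5e9)
new_process = dynamic_power(capacitance=0.6, voltage=0.8, frequency_hz=1.5e9)

print(f"Same 1.5GHz clock, roughly {new_process / old_process:.0%} of the old dynamic power")
```

Same clock, noticeably less power and heat, which is exactly the trade the article says customers are asking for.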
Another point made in the article is that the market for FPGAs at this level of performance also tends to be defense-contract oriented. As a result, to maintain the level of security necessary to sell chips to that industry, the chips need to be made in the good ol’ USA, and Intel doesn’t outsource anything when it comes to its top-of-the-line production facilities. Everything is in Oregon, Arizona or Washington State and is guaranteed not to have any secret backdoors built in to funnel data to foreign governments.
I would love to see some university research projects start looking at FPGAs again and see whether, as speeds go up and power goes down, there’s a happy medium or mix of general-purpose CPUs and FPGAs that might help the average Joe working on his desktop, laptop or iPad. All I know is that Intel entering a market will make it more competitive and hopefully lower the barrier to entry for anyone who would really like to get their hands on a useful processor they can customize to their needs.
Building upon the original 1st-generation RevoDrive, the new version boasts speeds up to 740 MB/s and up to 120,000 IOPS, almost three times the throughput of other high-end SATA-based solutions.
You cannot make this stuff up: two weeks ago Angelbird announced its bootable PCI Express SSD, and late yesterday OCZ, one of the biggest third-party aftermarket makers of SSDs, announced a new PCI Express SSD which is also bootable. The big difference between the Angelbird product and OCZ’s RevoDrive is the throughput at the top end. If you purchase the most expensive, fully equipped card from either manufacturer, you will get 900+MBytes/sec. on the Angelbird versus 700+MBytes/sec. on the RevoDrive from OCZ. Other differences include the ‘native’ support of the OCZ on the host OS. I think this means they aren’t using a ‘virtual OS’ on embedded chips to boot so much as having the PCIe drive electronics make everything appear to be a real native boot drive. Angelbird uses an embedded OS to virtualize and abstract the hardware so that you get to boot any OS you want and run it off the flash memory onboard.
The other difference I can see from reading the announcements is that on the Angelbird, only the largest configured size gets you the fastest throughput. As drives are added, the RAID array is striped over more of the available flash modules. The OCZ product also uses a RAID array to increase speed, but it hits its maximum throughput at an intermediate size (~250GByte configuration) as well as at the maximum size. So if you want ‘normal’ to ‘average’ sized storage with better throughput, you don’t have to buy the maxed-out, most expensive version of the OCZ RevoDrive to get there. That could mean a more manageable price for the gaming market or for the PC fanboys who want faster boot times. Don’t get me wrong, though: I’m not recommending buying an expensive 250GByte RevoDrive if a similarly sized SATA SSD costs a good deal less. Far from it, the speed difference may not be worth the price you pay. But the RevoDrive could be upgraded over time and keep you at the 700+MBytes/sec. you get with its high-throughput intermediate configuration. Right now I don’t have any prices to compare for either the Angelbird or the OCZ RevoDrive. I can tell you, however, that the Fusion-io low-end desktop product is in the $700-$800 range and doesn’t come with upgradeable storage; you get a few sizes to choose from, and that’s it. If either of these two products ships at a price significantly less than the Fusion-io product, everyone will flock to them, I’m sure.
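To make the striping point concrete, here is a tiny sketch of how I picture these cards scaling. The per-module speed is my own placeholder; only the 740MBytes/sec. ceiling comes from OCZ’s announcement, so take this as a mental model rather than vendor data.

```python
# Toy model of RAID-0 striping on a PCIe flash card: throughput grows with the
# number of flash modules until some other limit (controller, bridge, or PCIe
# link) caps it. Per-module speed is a placeholder; 740 MB/s is OCZ's quoted ceiling.

def striped_throughput(per_module_mb_s, modules, cap_mb_s):
    """Aggregate sequential throughput of a stripe, clipped at the card's ceiling."""
    return min(per_module_mb_s * modules, cap_mb_s)

for modules in (1, 2, 4, 8):
    print(f"{modules} modules -> {striped_throughput(200, modules, 740)} MB/s")
```

Run it and you see why an intermediate configuration can already sit at the card’s maximum: past a certain number of modules, adding more flash adds capacity but not speed.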
Another significant feature touted by both product announcements is the SandForce SF-1200 flash controller. Right now that controller is the de facto standard high-throughput part everyone is using for SATA SSD products; there’s also a higher-end part on the market, the SF-1500 (their top-end offering). So it’s de rigueur to include the SandForce SF-1200 in any product you hope to sell to a wide audience (especially hardware fanboys). However, let me caution you, amid this flurry of product announcements and always with an eye toward preventing buyer’s remorse: SandForce very recently announced a new drive controller line labelled the SF-2000 series. This part may or may not be targeted at the consumer desktop market, but depending on how well it performs once it starts shipping, you may want to wait and see whether a revision of this crop of newly announced PCIe cards adopts the new SandForce controller to gain the extra throughput it is touting. The new controller is rated at 740MBytes/sec. all by itself; put four of them on a PCIe card with SSDs attached, and theoretically four times 740 is 2,960MBytes/sec., a substantially large quantity of data to push through the PCI Express bus. A four-lane (4X) PCI Express 2.0 link only moves roughly 2GBytes/sec. in each direction, so this generation of controllers is already knocking on that door. The question is how long it will take real shipping products to overwhelm a four-lane PCI Express connector. I hope to see the day this happens.
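And since I’m throwing those numbers around, here is the same arithmetic written out. I’m assuming roughly 500MBytes/sec. of usable bandwidth per PCIe 2.0 lane after encoding overhead; the 740MBytes/sec. per-controller figure is the one quoted above.

```python
# Back-of-envelope PCIe bandwidth check (assumes ~500 MB/s usable per PCIe 2.0
# lane after 8b/10b overhead; the 740 MB/s per-controller figure is from above).

PER_LANE_MB_S = 500
lanes = 4
link_capacity = PER_LANE_MB_S * lanes          # what a 4X slot can move, roughly

controllers = 4
aggregate = controllers * 740                  # four controllers striped together

print(f"x{lanes} PCIe 2.0 link: ~{link_capacity} MB/s")
print(f"Four controllers combined: ~{aggregate} MB/s")
```

The exact numbers will depend on which PCIe generation and lane count a given card actually uses, but the gap between a handful of fast controllers and a narrow slot is closing quickly.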
Intel, Dell, EMC, Fujitsu and IBM are forming a working group to standardise PCIe-based solid state drives (SSDs), and have a webcast coming out today to discuss it.
Now this is interesting: just two weeks after Angelbird pre-announced its own PCIe flash-based SSD product, Intel is forming a consortium. Things are heating up, this is now a hot new category, and I want to draw your attention to a sentence in this Register article:
By connecting to a server’s PCIe bus, SSDs can pour out their contents faster to the server than by using Fibre Channel or SAS connectivity. The flash is used as a tier of memory below DRAM and cuts out drive array latency when reading and writing data.
This is without a doubt the first instance I have read of a belief, even if only in the mind of this article’s author, that Fibre Channel and Serial Attached SCSI aren’t fast enough. Who knew PCI Express would be preferable to an old storage interface when it comes to enterprise computing? Look out world, there’s a new sheriff in town and his name is PCIe SSD. This product category, though, will not be for the consumer end of the market, at least not for this consortium. It is targeting the high-margin, high-end data center market, where interoperability keeps vendor lock-in from occurring. By choosing interoperability, everyone has to gain an advantage not necessarily through engineering but most likely through firmware. If that’s the differentiator, then whoever has the best embedded programming team will have the best throughput and the highest-rated product. Let’s hope this all eventually finds a market saturation point driving the technology down into the consumer desktop, enabling the next big burst in desktop computer performance. I hope PCIe SSDs become the storage of choice and that motherboards can be rid of all SATA disk I/O ports and firmware in the near future. We don’t need SATA SSDs, we do need PCIe SSDs.
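Just to illustrate the “tier of memory below DRAM” idea from the quote, here is a minimal toy model of my own devising, not any vendor’s design: a small, fast cache sitting in front of a slower flash tier, with blocks promoted into the cache on a miss.

```python
# Toy two-tier read path: a small, fast "DRAM" cache in front of a slower "flash"
# tier. Nothing vendor-specific here; it just illustrates the tiering idea.
from collections import OrderedDict

class TieredStore:
    def __init__(self, dram_blocks, flash):
        self.dram = OrderedDict()   # fast tier, bounded in size
        self.dram_blocks = dram_blocks
        self.flash = flash          # slow tier: a dict standing in for a PCIe SSD

    def read(self, key):
        if key in self.dram:                    # hit in the fast tier
            self.dram.move_to_end(key)
            return self.dram[key]
        value = self.flash[key]                 # miss: fetch from the flash tier
        self.dram[key] = value                  # promote into DRAM
        if len(self.dram) > self.dram_blocks:
            self.dram.popitem(last=False)       # evict the least-recently-used block
        return value

store = TieredStore(dram_blocks=2, flash={"a": 1, "b": 2, "c": 3})
print(store.read("a"), store.read("b"), store.read("a"), store.read("c"))
```

The closer and faster that bottom tier is (PCIe instead of a drive array behind Fibre Channel or SAS), the cheaper every cache miss becomes, which is the whole argument of the quoted sentence.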
Extreme SSD performance over PCI-Express on the cheap? There’s hope!
A company called Angelbird is working on bringing high-performance SSD solutions to the masses, specifically, user upgradeable PCI-Express SSD solution.
This is one of a pair of SSD announcements that came in on Tuesday. SSDs are all around us now and the product announcements are coming in faster and harder. The first is from a company named Angelbird. Looking at the website announcing the specs of their product, it is, on paper, a very fast PCIe-based SSD, right up there with Fusion-io in terms of what you get for the dollars spent. I’m a little concerned, however, about the reliance on an OS hosted in the firmware of the PCIe card. I would prefer something a little more like a peripheral the OS supports natively, rather than have the card become the OS. But this is all speculative until actual production or test samples hit the review websites and we see some kind of benchmarks from the likes of Tom’s Hardware or Anandtech.
From MacNN|Electronista:
Iomega threw itself into external solid-state drives today through the External SSD Flash Drive. The storage uses a 1.8-inch SSD that lets it occupy a very small footprint but still outperform a rotating hard drive:
The second story covers a new product from Iomega: for the first time we have an external SSD from a mainstream manufacturer. The price is at a premium compared to the performance, but if you like the looks you’ll be willing to pay. The read and write speeds are not bad, but they’re not the best for the amount of money you’re paying. And why do they still use a 2.5″ external case if it’s internally a 1.8″ drive? Couldn’t they shrink it down to the old Firefly HDD size from back in the day? It should be smaller.
SandForce has now announced an SF-2000 controller that doubles up the I/O performance of the SF-1500. The new product runs at 60,000 sustained read and write IOPS and does 500MB/sec when handling read or write data. It uses a 6Gbit/s SATA interface and SandForce says it can make use of single-level cell flash, MLC or the enterprise MLC put out by Micron.
SandForce is continuing to make great strides in its SSD controller architecture. There’s no stopping the train now. But as always, read the fine print on any SSD product you buy and find out who manufactures the drive controller and what version it is. Benchmarks are always a good thing to consult before you buy, too.
At an IDF keynote, Intel launched “Tunnel Creek,” a new Atom E600 SoC processor. One particular processor detailed is codenamed “Stellarton,” which consists of the Atom E600 processor paired with an Altera FPGA on a multi-chip package that provides additional flexibility for customers who want to incorporate proprietary I/O or acceleration.
Intel has announced a future product that pairs an Intel Atom processor with an Altera FPGA. Now this is interesting: I just mentioned FPGA (field programmable gate array) chips, and out of the blue Intel has summoned the same kind of chip and married it to a little Atom core. They say it could be used as an accelerator of some sort. I’m wondering what specifically they had in mind (something very esoteric and niche like a TCP/IP offload processor?). I would like to see some touting of its possible uses and not just, “We want to see what happens.” Unfortunately, the way competition works in consumer electronics, you never tell people what’s inside. You let folks like iFixit do a teardown and put pictures up. You let industry websites research all the chips and what they cost, estimate the prices of the custom integrated circuits, and report the cost to manufacture the device. That’s what they do with every Apple iPhone these days.
It would be cool if Intel could also sell this as a development kit for Stellarton’s users. Keep the price high enough to prevent people from releasing products based just on the kit’s CPU, but low enough to get people to try out some interesting projects. I’m guessing it would be a great tool for video transcoding, muxing/demuxing video streams, and so on. If anyone does release a shipping product, though, it would be cool if they put a “Stellarton Inside” logo on it, so we know an FPGA is doing the heavy lifting. The other possibility Intel mentions is using the FPGA for proprietary I/O, so possibly something like an InfiniBand network interface? I still have hopes it gets used in the consumer electronics world.
Computing brainboxes believe they have found a method which would allow robotic systems to perceive the 3D world around them by analysing 2D images as the human brain does – which would, among other things, allow the affordable development of cars able to drive themselves safely.
The beauty of this new work is that they designed a custom processor using a Virtex 6 FPGA (field programmable gate array). An FPGA, for those who don’t know, is a computer chip that you can ‘re-wire’ through software to take on any mathematical task you can dream up. In the old days this would have required a custom chip to be engineered, validated and manufactured at great cost; with an FPGA you only need a development kit and the chips themselves to program. You can optimize every step within the processor and speed things up far beyond a general-purpose processor (like the Intel chip that powers your Windows or Mac computer). In this research, the custom-designed circuitry uses video images to decide where in the world a robot can safely drive as it maneuvers around on the ground. I know Hans Moravec has done a lot of work like this at Carnegie Mellon, and it seems this group is from Yale’s engineering department, which is encouraging: the techniques are being embraced and extended by another U.S. university. The low power of this processor and its facility for processing video images in real time is ahead of its time, and hopefully it will find some commercial application either in robotics or automotive safety controls. As for me, I’m still hoping for a robot chauffeur.
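For a flavor of the kind of per-pixel number crunching an FPGA pipeline eats for breakfast, here is a toy sketch of my own, emphatically not the researchers’ actual algorithm: score every pixel of a frame for local texture and tentatively call the smooth regions drivable ground.

```python
# Toy illustration only: the dense, per-pixel arithmetic an FPGA can pipeline in
# parallel. A tiny high-pass filter scores local texture; smooth regions are
# tentatively flagged as "drivable". Not the researchers' actual pipeline.
import numpy as np

def drivable_mask(frame):
    """Flag pixels whose neighborhood has little texture (a crude stand-in heuristic)."""
    kernel = np.array([[-1, -1, -1],
                       [-1,  8, -1],
                       [-1, -1, -1]], dtype=float)
    h, w = frame.shape
    texture = np.zeros_like(frame)
    for y in range(1, h - 1):                      # an FPGA would do these in parallel
        for x in range(1, w - 1):
            texture[y, x] = abs(np.sum(frame[y-1:y+2, x-1:x+2] * kernel))
    return texture < texture.mean()                # low texture -> tentatively drivable

frame = np.random.randint(0, 256, (64, 64)).astype(float)  # stand-in for a camera frame
print(f"{drivable_mask(frame).mean():.0%} of pixels flagged as smooth ground (toy numbers)")
```

On a general-purpose CPU those nested loops run one pixel at a time; laid out as custom circuitry, every pixel’s little sum can happen at once, which is where the real-time, low-power advantage comes from.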
We have seen a turnaround however. At last year’s IDF Intel showed off a proof of concept PCIe SSD that could push 1 million IOPS. And with the consumer SSD market dominated by a few companies, the smaller players turned to building their own PCIe SSDs to go after the higher margin enterprise market. Enterprise customers had the budget and the desire to push even more bandwidth. Throw a handful of Indilinx controllers on a PCB, give it a good warranty and you had something you could sell to customers for over a thousand dollars.
Anandtech has a review of the OCZ RevoDrive, a PCIe SSD for the consumer market. It’s not as fast as a Fusion-io, but then it isn’t nearly as expensive either. How fast is it, say, compared to a typical SATA SSD? Based on the benchmarks in this review, the RevoDrive is a little faster than most SATA SSDs, but it also costs about $20 more than a really good 120GB SSD. Be warned that this is the suggested retail price and no shipping product exists yet; prices may vary once this PCIe card finally hits the market. But I agree 100% with this quote from the end of the review:
“If OCZ is able to deliver a single 120GB RevoDrive at $369.99 this is going to be a very tempting value.”
Indeed, much more reasonable than a low end Fusion-io priced closer to $700+, but not as fast either. You picks your products, you pays yer money.
The Register recently ran an article following up on a press release from Tilera. The news this week is that Tilera is now working on the next big thing: Quanta will be shipping a 2U rack-mounted computer with 512 processing cores inside. Why is that significant? Well, 512 is the magic number quoted in last week’s announcement from upstart server maker SeaMicro, whose SM10000 boasts 512 Intel cores inside a 10U box. Which makes me wonder: who or what is all this good for? Based solely on the press releases and articles written to date about Tilera, their targeted customers aren’t quite as general as, say, SeaMicro’s. Even though each core in a Tilera CPU can run its own OS and share data, it is up to the device manufacturers licensing the Tilera chip to do the heavy lifting of developing the software and applications that make all that raw iron do useful work. The CPUs in the SeaMicro hardware, however, are fully x86-capable Intel Atoms tied together with a lot of management hardware and software provided by SeaMicro. Customers in this case are most likely going to load software applications they already run on existing Intel hardware; development time, re-coding or recompiling is unnecessary, because SeaMicro’s value-add is the management interface for all that raw iron. Quanta is packaging up the Tilera chips in a way that will make them more palatable to a potential customer who might also be considering SeaMicro’s product. It all depends on what apps you want to run, what performance you expect, and how dense you need all your cores to be when they are mounted in the rack. Numerically speaking, in the race for ultimate density the Quanta SQ2 wins right now with 512 general-purpose CPUs in a 2U rack mount; SeaMicro has 512 in a 10U rack mount. However, that in no way reflects the differences in the OSes, types of applications, and performance you might see when using either piece of hardware.
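If you just want the density arithmetic, it works out like this, using only the core counts and rack-unit figures quoted in the two announcements:

```python
# Cores-per-rack-unit, using only the figures quoted in the articles above.
systems = {
    "Quanta SQ2 (Tilera)": (512, 2),    # cores, rack units
    "SeaMicro SM10000":    (512, 10),
}

for name, (cores, rack_units) in systems.items():
    print(f"{name}: {cores / rack_units:.0f} cores per rack unit")
```

Roughly 256 cores per rack unit versus about 51, keeping in mind that a Tilera core and an x86 Atom core are very different animals.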
“Hot Chips The multi-core chip revolution advanced this week with the emergence of Tilera – a start-up using so-called mesh processor designs to go after the networking and multimedia markets.”
“Tilera introduced a Linux-based development kit for its scalable, 64-core Tile64 SoC (system-on-chip). The company also announced a dual 10GbE PCIExpress card based on the chip (pictured at left), revealed a networking customer win with Napatech, and demo’d the Tile64 running real-time 1080P HD video.”
“This week, Tilera is putting its second-generation chips into the field and is getting some traction among various IT suppliers, who want to put the Tile64 processors and their homegrown Linux environment to work.”
“Tilera was founded in Santa Clara, California, in October 2004. The company’s research and development is done in its Westborough, Massachusetts lab, which makes sense given that the Tile64 processor that is based on an MIT project called Raw. The Raw project was funded by the U.S. National Science Foundation and the Defense Advanced Research Projects Agency, the research arm of the U.S. Department of Defense, back in 1996, and it delivered a 16-core processor connected by a mesh of on-core switches in 2002.”
“Upstart massively multicore chip designer Tilera has divulged the details on its upcoming third generation of Tile processors, which will sport from 16 to 100 cores on a single die.”
“Look at the markets Tilera is aiming these chips at. These applications have lots of parallelism, require very high throughput, and need a low power footprint. The benefits of a system using a custom processor are large enough that paying someone to write software for the job is more than worth it.”
“While Doud was not at liberty to reveal the details, he did tell El Reg that Tilera had inked a deal with Quanta that will see the Taiwanese original design manufacturer make servers based on the future Tile-Gx series of chips, which will span from 16 to 100 RISC cores and which will begin to ship at the end of 2010.”
“The current processors have made some design wins among networking, wireless infrastructure, and communications equipment providers, but the Tile-Gx series is going to give gear makers a slew of different options.”
From where I stand, the SM10000 looks like the type of product that, if you could benefit from it, you have already been waiting for. In other words, you will have been asking for something like the SM10000 for quite a while already; SeaMicro is simply granting your wish.
This announcement has been making the rounds this Monday, June 14th; it has hit Wired.com, Anandtech, Slashdot, everywhere. It is a press-release full-court press. But it is an interesting product on paper for anyone doing analysis of data sets using large numbers of CPUs for regressions or large-scale simulations. And at its core it is virtual machines, with virtual peripherals (memory, disk, networking). I don’t know how you benchmark something like this, but it is impressive in its low power consumption and size: it only takes up 10U of a 42U rack, and it fits 512 CPUs in that 10U space.
Imagine 324 of these plugged in and racked up
This takes me back to the days of RLX Technologies, when blade servers were so new nobody knew what they were good for. The top-of-the-line RLX setup had 324 CPUs in a 42U rack, and each blade had a Transmeta Crusoe processor, which was designed to run at a lower clock speed and much more efficiently from a thermal standpoint. When managed by the RLX chassis hardware and software and paired up with an F5 Networks BIG-IP load balancer, the whole thing was an elegant design. However, the advantage of using Transmeta’s CPU was lost on a lot of people, including technology journalists who bashed it as too low performance for most IT shops and data centers. Nobody had considered the total cost of ownership, including the cooling and electricity. In those days, clock speed was the only measure of a server’s usefulness.
Enter Google into the data center market, and the whole scale changes. Google didn’t care about clock speed nearly as much as lowering the total overall cost of its huge data centers. Even the technical journalists began to understand the savings from lowering the clock speed a few hundred megahertz and packing servers more densely into a fixed-size data center. Movements in high-performance computing also led to large-scale installations of commodity servers bound together into one massively parallel supercomputer. More space was needed for physical machines racked up in the data centers, and everyone could see the only ways to build out were to build more data centers, build bigger data centers, or pack more servers into the existing footprint of current data centers. Manufacturers like Compaq got into the blade server market, along with IBM and Hewlett-Packard. Everyone engineered their own proprietary interfaces and architectures, but all of them focused on the top-of-the-line server CPUs from Intel. As a result, the heat dissipation was enormous and the density of these blade centers was pretty low (possibly 14 CPUs in a 4U rack mount).
Look at all those CPUs on one motherboard!
IBM began to experiment with lower-clocked PowerPC chips in a massively parallel supercomputer called Blue Gene. In my opinion this started to change people’s beliefs about what direction data center architectures could go. The density of the ‘drawers’ in the Blue Gene cabinets is pretty high: a lot more CPUs, power supplies, storage and RAM in each unit than in a comparable base-level commodity server from Dell or HP (previously the most common building block for massively parallel supercomputers). Given these trends, it’s very promising to see what SeaMicro has done with its first product. I’m not saying this is a supercomputer in a 10U box, but there are plenty of workloads that would fit within the scope of this server’s capabilities. And what’s cooler is the virtual abstraction of all the hardware, from the RAM to the networking to the storage. It’s like the golden age of IBM machine partitioning and virtual machines, but on an Intel architecture. Depending on how quickly they can ramp up production and market their goods, SeaMicro might be a game changer or it might be a takeover target for the likes of HP or IBM.