Micron’s ClearNAND: 25nm + ECC, Combats Increasing Error Rates – AnandTech
This is a really good technical article on attempts by Micron and Intel to fix read/write errors in their solid-state storage based on flash memory chips. Each revision of their design and manufacturing materials shrinks the individual memory cells on the flash chip. However, as the design rules (the distance between the wires) shrink, random errors increase. And the materials themselves suffer from fatigue with each write and erase cycle. The fatigue is due in no small part (pun intended) to size, specifically the thickness of some of the layers in the sandwich that makes up a flash memory cell. Thinner materials just wear out quicker. Typically this wearing out was addressed by adding extra, unused memory cells that could stand in as spares whenever one of the working cells finally gave up the ghost and stopped working altogether. Another technique is to spread writes over an area greater (sometimes 23% bigger) than the storage capacity printed on the outside of the packaging. This is called wear levelling, and it’s like rotating your tires to ensure they don’t start to get bare patches too quickly.
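To make the tire-rotation idea concrete, here is a minimal toy sketch in Python of wear levelling with spare blocks. It is not how Micron, Intel or any real controller does it; the class name `WearLeveledFlash`, the `endurance` figure and the "pick the least-worn free block" policy are all assumptions made up for illustration.

```python
class WearLeveledFlash:
    """Toy model: a few logical blocks mapped onto a larger pool of physical blocks."""

    def __init__(self, logical_blocks, physical_blocks, endurance):
        # The extra physical blocks are the spare area hidden from the user.
        assert physical_blocks > logical_blocks
        self.logical_blocks = logical_blocks
        self.endurance = endurance                  # hypothetical erase cycles a block survives
        self.erase_counts = [0] * physical_blocks   # wear per physical block
        self.data = [None] * physical_blocks
        self.mapping = {}                           # logical block -> physical block
        self.free = set(range(physical_blocks))     # erased blocks ready for writing
        self.retired = set()                        # worn-out blocks, never reused

    def write(self, logical, value):
        if not 0 <= logical < self.logical_blocks:
            raise IndexError("logical block out of range")
        if not self.free:
            raise RuntimeError("no usable blocks left")
        # Rotate the tires: always write to the least-worn free block.
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        self.data[target] = value
        old = self.mapping.get(logical)
        self.mapping[logical] = target
        if old is not None:
            self._erase(old)

    def _erase(self, block):
        # Erasing is what wears a block out in this model.
        self.erase_counts[block] += 1
        self.data[block] = None
        if self.erase_counts[block] >= self.endurance:
            self.retired.add(block)
        else:
            self.free.add(block)

    def read(self, logical):
        return self.data[self.mapping[logical]]


if __name__ == "__main__":
    flash = WearLeveledFlash(logical_blocks=4, physical_blocks=5, endurance=1000)
    for i in range(50):
        flash.write(0, i)           # hammer the same logical block over and over
    print(flash.erase_counts)       # wear ends up spread across all five blocks
```

Even though only one logical block is ever rewritten, the erases land evenly on every physical block, which is the whole point of the technique.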
All these techniques will only go so far as the sizes and thicknesses continue to shrink. So, taking a page out of the bad old days of computing, we are back to Error Correcting Codes, or ECC. When memory errors were common and you needed to guarantee your electronic logic was not creating spontaneous errors, extra bits of data called parity bits would be woven into all the operations to ensure nothing accidentally flipped from a 1 to a 0. ECC memory is still widely used in data center computers that need to guarantee bits don’t get spontaneously flipped by, say, a stray cosmic ray raining down upon us. Now, however, ECC is becoming the next tool after spare memory cells and wear levelling to ensure flash memory can continue to shrink and still be reliable.
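As a small illustration of how parity bits can not only detect a flipped bit but also repair it, here is a sketch of the classic Hamming(7,4) code in Python. This is a textbook example chosen for brevity, not the scheme used in ClearNAND; real flash ECC relies on much stronger codes (BCH, for example) that correct many bits per page. The function names are my own.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword laid out as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position is 0 when no error was seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The three parity checks, read as a binary number, give the
    # 1-based position of a single flipped bit.
    error_position = s1 + 2 * s2 + 4 * s3
    if error_position:
        c[error_position - 1] ^= 1   # flip the bad bit back
    return [c[2], c[4], c[5], c[6]], error_position


if __name__ == "__main__":
    stored = hamming74_encode([1, 0, 1, 1])
    stored[4] ^= 1                       # simulate a stray cosmic ray hitting one bit
    recovered, where = hamming74_decode(stored)
    print(recovered, where)
```

The cost is visible right in the layout: three extra bits are stored for every four bits of data, and this particular code can only repair one bad bit per codeword. Two flipped bits in the same codeword would be miscorrected.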
Two methods are in use today. The first is to build the ECC controller into the flash memory modules themselves. This raises the cost of the chip, but lowers the cost to the manufacturer of a solid-state disk or MP3 player. They don’t have to add the error correction after the fact or buy another part and integrate it into their design. The other, more ‘state of the art’ method is to build the error correction into the flash memory controller (as opposed to the memory modules), which provides much more leeway in how it can be implemented and updated over time. As it turns out, the premier designer of flash memory controllers, SandForce, already does this with the current shipping version of its SF-1200 controller. SandForce still has two more advanced controllers yet to hit the market, so it is only going to become stronger, given that it has already built ECC into its current shipping product.
Which way the market goes will depend on how low the target price is for the final shipping product. Low-margin, high-volume goods will most likely go with no error correction and take their chances. Other higher-end goods may adopt the embedded ECC from Micron and Intel. Top-of-the-line data center purchasers will not stray far from the cream of the crop, the high-margin SandForce controllers, as they are still providing great performance and value even in their early-generation products.