ECC RAM

Status
Not open for further replies.

rookie1010

Does the ECC in ECC RAM mean error correcting code?

If it does, can someone tell me what correcting code the RAM uses? I think it might be some kind of block code.
 
ECC RAM = Error Checking and Correcting RAM.

It is my understanding that this type of RAM generally goes into servers.

The explanation I was given was: in a server, the RAM modules are very busy. If the RAM gets a bad bit of data, it is able to correct it. If it wasn't able to correct it, eventually the RAM would be overwhelmed with bad bits of data and the server would crash. Essentially it is a method of keeping the server online.

That may not be the most accurate or complete explanation, but it works for me. I guess if I ever build a server, I'll do some real research on the subject. :)
 
ECC is defined as Error Checking and Correcting memory, but there is no code to it per se. It is all a function of the memory and is preprogrammed in the SPD chip. ECC memory will fix a one-bit error but will crash on a two-bit error. There are either 9 or 18 chips on the module. When information passes through the memory, the ECC portion will add a 9th bit to the 8-bit string, which will be a 1 or 0, and this should nullify a one-bit error. The AMD FX series requires the use of ECC/REG memory, so this type of memory is not specific to servers alone.

Hope this helps. You can also make good use of Google to search for answers.
 
If it corrects errors, then there has to be a code.
There are two types of codes:
convolutional, e.g. 1/2 rate, 1/3 rate
block code, e.g. Hamming, Reed-Solomon, Reed-Muller.

Block codes are invariably easier to decode than convolutional codes.
Convolutional codes and turbo codes are really complicated.

Bet you could not compile that information using Google.

All I was asking is, does anybody know what code they use for memories? (I have a feeling it is a block code.)
 
Actually, crucial is correct. Parity/non-parity/ECC (a continuation of parity) use a 9th bit for error detection. Essentially, when data is written into memory, the number of 1 bits in it is counted. If the count is odd, the 9th bit is set to 0; if it is even, the 9th bit is set to 1 (this is known as odd parity, weird huh?). When data is read back from memory, the system checks it to make sure. It looks and sees a 1. Well, then the remaining 8 bits had better contain an even number of 1s or there are problems. Note that with the 9th bit included, the total number of 1s will ALWAYS be odd. For example (number of 1s in the data + parity bit = total):
0 (all 0's, even) + 1 = 1
1 (odd) + 0 = 1
2 (even) + 1 = 3
3 + 0 = 3
4 + 1 = 5
5 + 0 = 5
and so on.
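
The odd-parity rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described in this post, not how any particular chipset implements it, and the function name `odd_parity_bit` is made up for the example.

```python
def odd_parity_bit(byte):
    """Return the 9th bit that makes the total number of 1s odd."""
    ones = bin(byte & 0xFF).count("1")
    # Odd count of 1s already: parity bit is 0. Even count: parity bit is 1.
    return 0 if ones % 2 == 1 else 1


# Reproduce the table: data with 0, 1, 2, 3, 4, 5 one-bits.
for value in (0b00000000, 0b00000001, 0b00000011,
              0b00000111, 0b00001111, 0b00011111):
    ones = bin(value).count("1")
    parity = odd_parity_bit(value)
    print(f"{value:08b}: {ones} + {parity} = {ones + parity}")
```

Whatever the data byte is, the total printed on the right always comes out odd.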

Now the correction comes like this. If ONLY one bit is wrong, then a halt command is given to the processor, since it KNOWS something is amiss, saving any possible corruption. How does it know this? Good question. Because when the info was passed from memory and checked, the parity bit was set high, or 1, for example. Well, that means the data should have an even number of 1s (0110 0000 = two 1s = an even number). The system sees the bit that is set high. It now checks the data. Now let's say something happened and it read (0100 0000 = one 1 = an odd number). Well, that doesn't coincide with the parity/ECC bit. It SHOULD be an even number, but it's not! Well, something's wrong and it KNOWS it. So stop the processor so nothing gets corrupted. Now here's scenario number two:

2 bits (or any even number of bits) have gone bad. Well, here's the problem with that. Say you have info being passed to memory and it's (0011 0110). Well, that's four 1s, an even number, so the parity bit is set high, or to a 1. Now when it's passed OUT of memory, say something weird happens and now it's read as (1101 1011). Well, as you can see, the data has changed. Problem is, the parity check will NOT catch this. The reason is that you now have six 1s, which is still even, so when it's passed from memory it sees the bit set high. It checks, says "okay, my information SHOULD be even." Well, it is even, so it lets it go on. The info, though, is NOT correct. It's corrupted data, and the system never knew it because the count was still even. That's the problem. Of course the odds of this happening are not very high, but they do exist.
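
Here is a small Python sketch of both scenarios, using the same byte values as above. It assumes the odd-parity convention described earlier in this post; the helper names `odd_parity_bit` and `parity_ok` are hypothetical.

```python
def odd_parity_bit(byte):
    """Parity bit chosen so data plus parity has an odd number of 1s."""
    return 0 if bin(byte & 0xFF).count("1") % 2 == 1 else 1


def parity_ok(byte, parity_bit):
    """True if data plus parity bit still has an odd number of 1s."""
    return (bin(byte & 0xFF).count("1") + parity_bit) % 2 == 1


stored = 0b00110110                 # four 1s, so the parity bit is 1
parity = odd_parity_bit(stored)

one_flip = stored ^ 0b00000100      # a single bit goes bad
two_flips = stored ^ 0b00000101     # two bits go bad
from_post = 0b11011011              # the corrupted byte from the example

print(parity_ok(stored, parity))     # True: nothing wrong
print(parity_ok(one_flip, parity))   # False: error detected
print(parity_ok(two_flips, parity))  # True: error slips through
print(parity_ok(from_post, parity))  # True: six 1s, still even, slips through
```

Any odd number of flipped bits changes the odd/even state and gets caught; any even number leaves it unchanged and is missed.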


Also, this is only for error detection. I'll put more in on correction later, but I have to run on a network call.
 
Thanks killians45, appreciate the deeper explanation. rookie1010, I would also agree you will not be able to find the correct coding from Google, hehe. You might want to try looking for the answer at:

www.micron.com. Check the data sheets for any one of the ECC modules and it might tell you there. Most of the coding info is not readily available, only because it takes a special machine to interface and flash. There may be programs out there, but I am not aware of any.
 
Now, onto the correction code. ECC does this (the above was for parity). ECC uses larger blocks of memory to support its algorithms: 7 bits for 32, 8 for 64, etc., and a chipset that blocks these 7 bits into a group is needed to support this. What happens is this...

ECC can detect and fix 1-bit errors, with no notice at all depending on your OS. It can ALSO detect 2-, 3- and 4-bit errors but won't fix them... doesn't matter, seeing how they're rare. Damn, I ramble a lot... heh, anyhow. There is usually a VERY small degradation in performance due to the algorithms being more complex than simple parity checking. Not sure how much, but it probably won't be noticeable.

Now explaining how is a bit harder... Okay, you have a block of data, all 1s and 0s, 32 bits' worth. This is going to assume you know your binary. Now take those 32 bits and add them up (the decimal value of 32 bits' worth of binary). You will get a certain number. Now, what will happen *I may be mistaken on this, it's been a LOONNNGGG TIME!!! If so, crucial... correct me on this one!!!* is the bits used for ECC will take the TOTAL decimal form and convert the numerical value (NOT THE DATA, JUST THE NUMERICAL VALUE OF IT!!!!) BACK into binary form. It can then look at the section header and determine where the block of info needs changing. Say the total value of some part is 244, but according to the error correction block it should have been 240. Well, we can deduce this from it: the info that came across was 1111 0100, or 244. The bit in the 4s place should NOT be a 1, since subtracting that value would give the correct total. So it will change it accordingly to 1111 0000, which is equal to 240 in decimal. That matches what the ECC block/number value is. So, it's fixed.

Like I said, it's been a while since I've needed to go this in depth, so crucial, if I'm wrong on any of this or left something out... let me know!!! Thnx :)
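
As a side note, the "7 bits for 32, 8 for 64" figures line up with what a single-error-correcting, double-error-detecting (SECDED) Hamming code would need. The sketch below only checks that arithmetic; it assumes a Hamming-style scheme, which this post does not actually confirm, and the function name is invented for the example.

```python
def secded_check_bits(data_bits):
    """Check bits needed by a SECDED Hamming code for a given data width."""
    r = 0
    # Hamming condition: r check bits must be able to point at every bit
    # position (data plus check bits) and also signal "no error".
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One extra overall parity bit lets the code detect double-bit errors.
    return r + 1


for width in (8, 16, 32, 64):
    print(width, "data bits ->", secded_check_bits(width), "check bits")
```

For 64 data bits this gives 8 check bits, so 72 bits in total, which fits the 9 chips (at 8 bits each) mentioned earlier in the thread.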
 
What I meant by code is not computer code but a scheme.
killian, what you described, as you said, is a detection code.

The 8+1 scheme or code is a parity code, which is also used in serial communications.

killian, when you get back, can you tell us how the parity bit can be used for correction? I always thought it just tells you your byte is corrupted, and that is it. Since it could not tell the position of the corruption, it could not correct it.
 
I may have typed incorrectly. Parity is ONLY detection. It corrects nothing; if it finds something wrong, it halts the process. However, if 2 bits are wrong, it is in the same odd/even state and passes the corrupt info along the line. Won't correct it, though. The post above is about ECC, not JUST parity; sorry for the mis-type! You are correct, though. Parity just recognizes the problem. ECC will recognize and correct 1-bit problems and will also recognize 2-, 3- and 4-bit problems but not fix them, although those are very rare. As far as the scheme goes, the labeling or "code" is just the algorithm it uses, I believe. Found this, so editing it in here: on most systems, a 64-bit bus architecture uses the Hamming code or algorithm.
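
To show how a Hamming code can locate and fix a single bad bit, which plain parity cannot, here is a scaled-down Python sketch. It protects 8 data bits with a 13-bit word rather than the 64 data bits of a real module, and the layout (overall parity at index 0, check bits at indexes 1, 2, 4, 8) and all names are assumptions made for illustration, not taken from any memory controller.

```python
# Scaled-down SECDED Hamming sketch: 8 data bits stored in a 13-bit word.
WORD_LEN = 13
CHECKS = (1, 2, 4, 8)  # check bits sit at the power-of-two positions
DATA_POSITIONS = [p for p in range(1, WORD_LEN) if p & (p - 1)]


def encode(data):
    code = [0] * WORD_LEN
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (data >> i) & 1
    for check in CHECKS:
        # Even parity over every position whose index has this bit set.
        code[check] = sum(code[p] for p in range(1, WORD_LEN) if p & check) % 2
    code[0] = sum(code) % 2  # overall parity makes the whole word even
    return code


def decode(code):
    code = list(code)
    syndrome = 0
    for check in CHECKS:
        if sum(code[p] for p in range(1, WORD_LEN) if p & check) % 2:
            syndrome |= check
    overall_odd = sum(code) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd and syndrome < WORD_LEN:
        # Single-bit error: the syndrome is the index of the bad bit
        # (0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Failing checks but even overall parity: at least two bits are bad.
        return None, "uncorrectable"
    data = sum(code[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return data, status


word = encode(0b11110000)   # 240
word[6] ^= 1                # one bit goes bad; the data now reads as 244
print(decode(word))         # (240, 'corrected')
word[9] ^= 1                # a second bit goes bad
print(decode(word))         # (None, 'uncorrectable')
```

The failing checks spell out the position of the bad bit in binary, which is what lets the code fix a single error, while the extra overall parity bit is what tells one bad bit apart from two.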
 