Why So Fast?

bonehelm · Jul 28, 2006

gaara said:
"Netburst" is the name of the core design behind Pentium 4-era processors such as the Presler 965 you've listed above; it has nothing to do with C2D processors.

I'm pretty sure that massacreinfallx was talking about the unified cache found on Conroe chips. This just means that the extremely fast on-die memory is accessible to both cores. In other words, say you're running a single-threaded app and one core is idle: the other core is free to use the entire 4MB cache. Previously, multicore processors had cores with dedicated cache (on Presler each core had its own 2MB), and they could not "share" each other's cache.

Now, a processor executes an operation by sending an instruction down a "pipeline" that has various stages that fetch, decode, and execute it. The number of stages defines the length of the pipeline. Presler was based on the Prescott revision, so it has 31 pipeline stages, whereas Conroe/C2D has 14 pipeline stages. The longer a pipeline is, the less work each stage has to do per cycle and the easier it is for the core to scale in terms of clock frequency, which is why the Presler has a faster operating frequency.

However, a clock cycle is just one tick of the core clock, roughly the time it takes an instruction to move from one pipeline stage to the next, not the time to execute a whole operation. A longer pipeline pays for its higher clock: every time the core mispredicts a branch it has to flush the pipeline and refill it, and a 31-stage pipeline throws away far more cycles doing that than a 14-stage one. Conroe is also a wider design that can issue more instructions per cycle than Netburst. Therefore the 14-stage core gets more useful work done per cycle, so it needs fewer cycles to perform a similar task than a 31-stage core, and it can operate at a slower frequency and still perform at the same, or even greater, speeds.

The megahertz wars, as they have been labelled, are irrelevant now; you have to factor in the IPC (instructions per cycle) rate of a core in order to determine its performance.
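
To put rough numbers on the clock speed vs. IPC point, here is a minimal back-of-the-envelope sketch in Python. The clock speeds are the stock speeds of a Presler 965 (3.73 GHz) and an E6700 (2.66 GHz); everything else (the base IPC values, the branch fraction, the mispredict rate, and treating the flush penalty as equal to the pipeline depth) is a made-up assumption for illustration, not a measurement of either chip.

```python
def effective_ipc(base_ipc, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average instructions per cycle once branch-mispredict flushes are counted.

    base_ipc: instructions per cycle when nothing goes wrong (assumed value).
    branch_fraction: share of instructions that are branches (assumed value).
    mispredict_rate: share of branches that are mispredicted (assumed value).
    flush_penalty_cycles: cycles lost per mispredict; here we simply use the
        pipeline depth, which is a simplification.
    """
    base_cpi = 1.0 / base_ipc
    stall_cpi = branch_fraction * mispredict_rate * flush_penalty_cycles
    return 1.0 / (base_cpi + stall_cpi)


def instructions_per_second(clock_hz, ipc):
    """Throughput is clock frequency times instructions per cycle."""
    return clock_hz * ipc


# Hypothetical workload: 20% branches, 5% of them mispredicted.
long_pipe_ipc = effective_ipc(1.0, 0.2, 0.05, flush_penalty_cycles=31)
short_pipe_ipc = effective_ipc(2.0, 0.2, 0.05, flush_penalty_cycles=14)

long_pipe_rate = instructions_per_second(3.73e9, long_pipe_ipc)
short_pipe_rate = instructions_per_second(2.66e9, short_pipe_ipc)

print(f"31-stage core: IPC {long_pipe_ipc:.2f}, {long_pipe_rate / 1e9:.2f} G instr/s")
print(f"14-stage core: IPC {short_pipe_ipc:.2f}, {short_pipe_rate / 1e9:.2f} G instr/s")
```

With these assumed inputs the slower-clocked 14-stage core comes out ahead, which is the whole argument: throughput is frequency times IPC, not frequency alone.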

He's right.

Conroes are significantly faster than most Intel CPUs you can buy now. People should stop judging how fast and good a CPU is just by its operating speed and cache size. The same thing applies to graphics cards.

gaara · Jul 28, 2006

massacreinfallx was saying that instead of having a dual core with 2 separate dies, Core 2 Duo is the first to have the 2 cores on the same die.

Again, there are misconceptions that need to be cleared up. The die is the piece of silicon itself; everything from the pins to the IHS is the package, and one package can hold more than one die without needing a second socket. The first Pentium D (Smithfield) had both cores on one die, while Presler put two separate dies in one package, so Core 2 Duo is not the first to have two cores on the same die. Either way, Pentium D was pretty much just two single-core processors "glued" together, and they aren't considered "native". In other words, the two cores on a Pentium D don't directly communicate with one another; each one goes out over the FSB to the northbridge/memory controller, which then in turn sends anything back down to the other core if it has to. AMD64 has the memory controller on-die, so the cores can basically relay between each other at their own effective speeds plus the latency without relying on an external bus, creating a "direct connection". Hence why K8L is being labeled a native quad-core.

I haven't followed C2D very closely, so I'm not sure whether Intel corrected this or whether it is in fact just another two cores glued together that don't really work together on-die, and I wasn't able to find anything to indicate one way or another. The fact that Conroe cores have unified cache leads me to believe that they are directly connected; however, if someone can confirm, that would be nice.

You have to consider that if independent Intel cores are in fact still dependent on an external bus/memory controller, it will create problems for them once we start seeing larger multicore variations (i.e. 4-core, 8-core, etc.) that actually use multiple threads. Unlike AMD64, which is effectively communicating at whatever the CPU speed is, the Intel cores will end up with an FSB bottleneck that will severely hinder performance. Of course, if Intel finally decides to incorporate an on-die memory controller (and they have tried in the past), this issue should become irrelevant.
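
A quick sketch of why a shared external bus becomes a bottleneck as core counts grow, written in Python. It assumes a 1066 MT/s, 64-bit (8-byte) FSB and assumes the bus bandwidth is split evenly between cores that are all busy at once; real traffic is not that tidy, so treat the figures as illustrative only. The function names are hypothetical.

```python
def bus_bandwidth_gbs(transfers_per_second, bus_width_bytes):
    """Peak bus bandwidth in GB/s (decimal gigabytes)."""
    return transfers_per_second * bus_width_bytes / 1e9


def per_core_share_gbs(total_gbs, cores):
    """Even split of a shared bus between cores (simplifying assumption)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return total_gbs / cores


# Assumed bus: 1066 million transfers per second, 8 bytes per transfer.
fsb_total = bus_bandwidth_gbs(1066e6, 8)

for core_count in (1, 2, 4, 8):
    share = per_core_share_gbs(fsb_total, core_count)
    print(f"{core_count} core(s): {share:.2f} GB/s each")
```

The total stays fixed while each core's share shrinks with every core added, which is the bottleneck described above.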

The General has the right idea, I think. Basically, PD looks like this (where [] represents everything on-die):

[CORE] > FSB > MEM CONTROLLER < FSB < [CORE]

AMD64 looks like:

[CORE(<HTT>)MEM CONTROLLER(<HTT>)CORE]

I wouldn't know what C2D looks like.

JoshSB · Jul 28, 2006

Interesting

Jumping_Bean514 · Jul 28, 2006

Pretty good information here guys, thanks a lot.

Saint71 · Jul 28, 2006

I'm starting to get overwhelmed trying to understand all this.

psp_crazy1 · Jul 28, 2006

lol but with gfx cards it's the opposite.
the more pipelines the better.
btw awesome info ^_^

Khann · Jul 28, 2006

Completely different pipelines.

gaara · Jul 28, 2006

the more pipelines the better.

Yes, but a CPU only has one pipeline, and I was referring to the stages in a single pipeline. A GPU is pretty much the same concept: look at the R580 core, it does 3 shader operations per cycle per pipeline, yet it only has 16 pipelines compared to the 24 on the 7900GTX.
