Why So Fast? - Page 2 - Techist - Tech Forum

Old 07-28-2006, 03:55 AM   #11 (permalink)
Ultra Techie
 
Join Date: Oct 2005
Posts: 612

Quote:
Originally posted by gaara
"NetBurst" is the name of the core design (microarchitecture) of Pentium 4 processors such as the Presler 965 you've listed above; it has nothing to do with C2D processors.

I'm pretty sure that massacreinfallx was talking about the unified cache found on Conroe chips. This just means that the extremely fast on-die memory the cores use is accessible to both cores. In other words, say you're running a single-threaded app and one core is idle: the other core is free to use the entire 4MB cache. Previously, multicore processors had a dedicated cache per core, say 2MB each, and the cores could not "share" each other's cache.

Now, a processor executes an operation by sending an instruction down a "pipeline" that has various stages that fetch and decode the instruction so the processor can execute it. The number of stages defines the length of the pipeline. Presler was based on the Prescott revision and probably had 31 pipeline stages, whereas Conroe/C2D has 14 pipeline stages. The longer a pipeline is, the less work each stage does per cycle and the easier it is for the core to scale in clock frequency, which is why the Presler has a faster operating frequency.

However, a clock cycle is just one tick of the core clock, and what matters is how much work gets done per tick. You should be able to understand that an instruction gets through a shorter pipeline in fewer stages than through a longer one, and that whenever the pipeline has to be flushed and refilled (after a mispredicted branch, for example) a 14-stage core throws away far fewer cycles than a 31-stage core. Therefore the 14-stage core gets more done per cycle, therefore it requires fewer cycles to perform a similar operation when compared to a 31-stage core, therefore it can operate at a slower frequency and still perform at the same, or even greater, speeds.

The megahertz wars, as they have been labelled, are irrelevant now; you have to factor in the IPC (instructions per cycle) rate of a core in order to determine its power.
He's right.

Conroes are significantly faster than most Intel CPUs you can buy now. And people should stop judging how fast and good a CPU is just by its clock speed and cache size. The same thing applies to graphics cards.
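
To put rough numbers on the clock speed versus IPC point, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the base IPC values, the misprediction rate, the clock speeds, and the simplification that a flush costs one cycle per pipeline stage. None of these are measured figures for any real CPU.

```python
# Illustrative only: every number below is a placeholder, not a benchmark.

def instructions_per_second(ipc: float, clock_ghz: float) -> float:
    """Rough throughput: instructions per cycle times cycles per second."""
    return ipc * clock_ghz * 1e9


def effective_ipc(base_ipc: float, mispredicts_per_instr: float,
                  flush_penalty_cycles: int) -> float:
    """IPC after charging a pipeline refill for each mispredicted branch."""
    base_cpi = 1.0 / base_ipc  # cycles per instruction with no flushes
    cpi = base_cpi + mispredicts_per_instr * flush_penalty_cycles
    return 1.0 / cpi


# Hypothetical 31-stage core: higher clock, expensive flushes.
long_pipe = instructions_per_second(effective_ipc(1.0, 0.01, 31), 3.73)

# Hypothetical 14-stage core: lower clock, cheaper flushes, higher base IPC.
short_pipe = instructions_per_second(effective_ipc(1.5, 0.01, 14), 2.93)

print(f"long pipeline:  {long_pipe / 1e9:.2f} billion instr/s")
print(f"short pipeline: {short_pipe / 1e9:.2f} billion instr/s")
```

With these placeholder inputs the lower-clocked core comes out ahead, which is the whole argument: throughput is IPC times frequency, not frequency alone.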
__________________

__________________
PC Specs:

Intel Core 2 Duo E6420 Conroe 2.13ghz
Gigabyte P35-S3 Motherboard
A-DATA 2x1GB DDR2 800mhz memory
Geforce 8600GT 256MB GDDR3 PCI-Express
bonehelm is offline  
Old 07-28-2006, 12:28 PM   #12 (permalink)
Wizard Techie
 
Join Date: Dec 2004
Location: Canada
Posts: 3,790

Quote:
massacreinfallx was saying that instead of having a dual core with 2 separate dies, Core 2 Duo is the first to have the 2 cores on the same die.
Again, there are misconceptions that need to be cleared up. Two dies don't automatically mean two literal chips and a second socket: the die is the piece of silicon itself, while the package is everything from the pins to the IHS, somewhat like a PCB for the silicon. So no, C2D isn't the first: the original Pentium D (Smithfield) already had two cores on the same die, and Presler put two dies in one package. Either way, Pentium D was pretty much just two single-core processors "glued" together, and they aren't considered "native". In other words, the two cores on Pentium D processors don't directly communicate with one another; each one uses the FSB to send information to the northbridge/memory controller, which in turn sends anything back down the FSB to the other core if it has to. AMD64 has the memory controller on-die, so there is no external FSB in between: the cores can basically relay between each other at their own effective speeds, plus the latency, without relying on an external bus, creating a "direct connection". Hence why K8L is being labeled a native quad-core.

I haven't followed C2D very closely, so I'm not sure whether Intel corrected this or whether it is in fact just another two cores glued together that don't really work together on-die, and I wasn't able to find anything to indicate one way or the other. The fact that Conroe cores have unified cache leads me to believe that they are directly connected; if someone can confirm, that would be nice.

You have to consider that if the Intel cores are in fact still dependent on an external bus/memory controller, it will create problems for them once we start seeing larger multicore variations (i.e. 4 core, 8 core, etc.) that actually use multiple threads. Unlike AMD64, which is effectively communicating at whatever the CPU speed is, the Intel cores will end up with an FSB bottleneck that will severely hinder performance. Of course, if Intel finally decides to incorporate an on-die memory controller (and they have tried in the past), this issue should become irrelevant.

The General has the right idea, I think. Basically, PD looks like this (where [] represents everything on-die):

[CORE] > FSB > MEM CONTROLLER < FSB < [CORE]

AMD64 looks like:

[CORE(<HTT>)MEM CONTROLLER(<HTT>)CORE]

I wouldn't know what C2D looks like.
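
The two diagrams above can be turned into a small Python sketch that walks the path from one core to the other and lists which stops sit outside the die. This is only a toy model: the node names (CORE0, FSB0, HTT0 and so on) are labels invented here, links are treated as graph nodes, and the on-die sets simply follow the [] brackets in the diagrams. It says nothing about real latencies.

```python
from collections import deque

# Topologies transcribed from the diagrams; names are hypothetical labels.
PENTIUM_D = {
    "CORE0": ["FSB0"],
    "FSB0": ["CORE0", "MEM_CONTROLLER"],
    "MEM_CONTROLLER": ["FSB0", "FSB1"],
    "FSB1": ["MEM_CONTROLLER", "CORE1"],
    "CORE1": ["FSB1"],
}
AMD64 = {
    "CORE0": ["HTT0"],
    "HTT0": ["CORE0", "MEM_CONTROLLER"],
    "MEM_CONTROLLER": ["HTT0", "HTT1"],
    "HTT1": ["MEM_CONTROLLER", "CORE1"],
    "CORE1": ["HTT1"],
}

# Whatever the diagrams put inside [] counts as on-die.
PENTIUM_D_ON_DIE = {"CORE0", "CORE1"}
AMD64_ON_DIE = set(AMD64)


def shortest_path(graph, src, dst):
    """Breadth-first search; returns the node list from src to dst, or None."""
    parents = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def off_die_stops(graph, on_die, src="CORE0", dst="CORE1"):
    """Stops on the core-to-core path that are outside the die."""
    return [n for n in shortest_path(graph, src, dst) if n not in on_die]


print("Pentium D off-die stops:", off_die_stops(PENTIUM_D, PENTIUM_D_ON_DIE))
print("AMD64 off-die stops:", off_die_stops(AMD64, AMD64_ON_DIE))
```

In this model both paths have the same number of stops; the difference is that every intermediate stop on the Pentium D path is off-die, which is where the FSB bottleneck concern comes from.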
__________________

__________________
Intel C2D E6320 / AMD Athlon X2 3800+
Gigabyte 965P DS3 / DFI nF4 Ultra-D
2GB OCZ Gold PC2-6400 / 2GB OCZ Gold PC4000
eVGA 8800GTS 320MB / eVGA 6800GS 256MB
150GB Raptor / 74GB Raptor
2x500GB / 320GB
OCZ GameXStreme 850w / OCZ StealthXStream 600w
gaara is offline  
Old 07-28-2006, 06:58 PM   #13 (permalink)
Geek Squad
 
Join Date: Nov 2005
Location: Ohio
Posts: 788

Interesting
__________________
"Truth is violated by falsehood but outraged by silence."

"Mastering others is strength, mastering yourself is true power." ~Sun Tzu

"Ad Astra Per Aspera"

"Our greatest glory is not in never falling, but in rising every time we fall." ~Confucious

"The true measure of a man is how he treats someone who can do him absolutely no good.” ~ Samuel Jackson

“Omnis vir est fortunae faber”
JoshSB is offline  
Old 07-28-2006, 07:01 PM   #14 (permalink)
Ultra Techie
 
Join Date: Mar 2006
Location: Seattle
Posts: 752

Pretty good information here guys, thanks a lot.
Jumping_Bean514 is offline  
Old 07-28-2006, 07:25 PM   #15 (permalink)
Super Techie
 
Join Date: Aug 2005
Posts: 376

I'm starting to get overwhelmed trying to understand all this.
Saint71 is offline  
Old 07-28-2006, 07:26 PM   #16 (permalink)
Monster Techie
 
Join Date: Jul 2006
Location: New Stanton, Pennsylvania
Posts: 1,017

Quote:
Originally posted by gaara
Now, a processor executes an operation by sending an instruction down a "pipeline" that has various stages that fetch and decode the instruction so the processor can execute it. The number of stages defines the length of the pipeline. Presler was based on the Prescott revision and probably had 31 pipeline stages, whereas Conroe/C2D has 14 pipeline stages. The longer a pipeline is, the less work each stage does per cycle and the easier it is for the core to scale in clock frequency, which is why the Presler has a faster operating frequency.

lol but with gfx cards it's the opposite.
the more pipelines the better.
btw awesome info ^_^
psp_crazy1 is offline  
Old 07-28-2006, 08:10 PM   #17 (permalink)
True Techie
 
Join Date: Jul 2006
Posts: 143

Completely different pipelines.
Khann is offline  
Old 07-28-2006, 09:27 PM   #18 (permalink)
Wizard Techie
 
Join Date: Dec 2004
Location: Canada
Posts: 3,790

Quote:
the more pipelines the better.
Yes, but a CPU only has one pipeline, and I was referring to the stages in that single pipeline. A GPU is pretty much the same concept: look at the R580 core, it does 3 shader operations per cycle per pipeline yet only has 16 pipelines, compared to the 24 on the 7900GTX.
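
The arithmetic behind that comparison can be sketched in a few lines of Python. The function name is made up for this example, the 16, 3 and 24 come from the post above, and the per-pipeline figure for the 24-pipeline card is left as an unknown because the thread does not give it.

```python
def shader_ops_per_cycle(pipelines: int, ops_per_pipeline: int) -> int:
    """Peak shader operations per clock: pipeline count times ops per pipeline."""
    return pipelines * ops_per_pipeline


# Figures quoted in the post: 16 pipelines, 3 shader ops each per cycle.
r580_ops = shader_ops_per_cycle(16, 3)

# Ops per pipeline a 24-pipeline part would need to match that per clock.
break_even_ops = r580_ops / 24

print("R580 shader ops per cycle:", r580_ops)
print("ops per pipeline needed with 24 pipelines:", break_even_ops)
```

So pipeline count alone does not settle it any more than clock speed does; the work done per pipeline per cycle, and the clock each card runs at, matter just as much.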
gaara is offline  