hi all,

i was trying to understand this new functionality of P4 processors... but there are many things not clear...... maybe some good soul can explain some misterious points..

Hyper-Threading Technology provides thread-level-parallelism (TLP) on each processor resulting in increased utilization of processor execution resources.

mm how can 2 instructions of 2 different threads be execute simultaneously (what happen if two threads run simultaneously the same variable ?)

Then, seems XP use this technology, but it seem that the new technology is automatically used also in older threading applications. Mmm i don't understand, to take advantage of this new technology must be done something software or not ?

Posted on 2003-09-25 06:07:47 by Bit7
maybe i am wrong, but as far as i can remember (unfortunately i dont remember what is hyper tech is), those tech have no corelation with multi processor. the statement you quote seems ambiguous too. if its true, then it must be done in hardware, and i dont think it will increase something.
Posted on 2003-09-25 07:39:17 by dion

and i dont think it will increase something.

...but intels profit...
Maybe it's a third/forth pipe, u, v, z, w? (Just guesing/specualting here)
Posted on 2003-09-25 11:53:25 by scientica
hehe, i agree. i've never tought to this side of things, but it's really an interesting point. It's very impressive how many new technologies come developed/improved between one pentium family end the next.. Sometimes i think... how many department, employes, minds, genius have the intel and other electronics companies inside to put out all this new features in so fast fashon.. i don't know, probably is so, but probably sometimes words are easier to write intade of millon of transistors, stages, and so on :) I guess no-one can look inside the chip .. to see if the new improovements are really an hardware-technology or.. just some little changes, big words and target of managers.. :)
Posted on 2003-09-25 14:02:06 by Bit7
i just quote it from someone/many experts. i think, "real" and "true" benchmark is the only way to distinguish them.
Posted on 2003-09-26 08:00:36 by dion
CPU with HyperThreading technology just have two sets of registers. Virtually two cores.
When some blocs of executing unit are free, then CPU executes other thread with second set of registers.
Virtually, two threads are executing simultaneously, so performance must be increased.

But there is one bottleneck: reading speed from memory is dramatically slower.
Newer CPUs tend to improve this situation, but let's not forget Moore Law: core speed of P4 will rise up faster.

I belive, if we're able to write optimal code, then we reserve this feature for C compiler :grin:

Anything about cache bank conflicts?

Posted on 2003-09-26 08:26:38 by S.T.A.S.
HyperThreading TM is designed to make use of idle exection engines.
The idea of several execution pipes was left behind with the Pentium MMX, now the processor has several exection engines which execute micro operations. A single operation will be broken down into these micro ops, and pushed out to the engines. Some instructions (and, or, not, add etc.) are simple and will require only the one micro op, while the more complex (adc, sbb, and similar) will require several of these operations. Further to this sub-division, the micro operations can be executed in a different order to that explicitly defined by the code (the P6 core is described as out-of-order exection for this reason). Even with this dynamic re-ordering the majority of programs will not fully stretch the core of a processor.

By the very nature of applications, they tend to either be integer, or floating point, so half of the exection power could in theory be wasted. Even in a purely integer based application there are depenencies which stop the execution cores running at full speed.

Intel decided to duplicate parts of the core (there are two re-order buffers, a second set of registers etc.), and also added logic to split other parts of the core (the TLBs are effectively halved, as are the caches). When enabled, HyperThreading creates a second virtual processor appears, which then uses the second set of logic.

This will work very well if you try to run an integer based application along side a heavily floating point application. But it isn't always better to run HT, if for example the applications are both integer, or floating point, they will be competing for the same parts of the core. This makes both run slowly, in fact they will run more slowly than singly threaded applications, because they will be competeing for the same memory bandwidth, and only have half of the cache.

As for the question that if two processes use the same space, this is not only a problem for HT, any multi-processor system must deal with this problem. You may well have used MUTEX functions before to stop multiple instances of the same app, but they were created to avoid multi-processor access conflicts. The word "mutex" comes from the conjuction of mutually exclusive, in other words it is a blocking mechanism. The problem is even bigger on systems where processors don't share the same cache, as the cache may be dirty in one processor, and not have updated main memory (look at snooping caches).

The more interesting development IMHO is the embedding of multiple cores on a single die. There is talk of 4 or 8 core Itaniums being built, IBM are also talking about multiple POWER5 cores on a single die, and Sun's Ultra5 is supposed to be dual core. The cores will be able to communicate at high speed, and share cache, but will not compete for the execution engines, registers & buffers. HyperThreading is dual processing by deception and hand waving, of course its cheap, and in the right circumstances it works... So I can't complain too much :)

Posted on 2003-09-26 17:19:18 by Mirno
mirno, stas, many thanks. Is impressive how many things you know. How many lives yuo've lived ? :) So if i understand well, if mutex an CriticalSections was not mandatory for example to write a byte in memory in a single instruction, if HT is used, then that's become mandatory.

Anyway, i've not the ideas very clear about processor structures... what is the core ? Is it the place where pipelines/engines are located ?
Posted on 2003-09-27 08:51:07 by Bit7
Mutex is needed for any multi-threaded application, where threads access the same data.
It's like having a person doing maths problems and shouting you the answer, you can hear this fine, but if there are more than two people doing the problems, you may get confused between which one is the answer to which problem. So you add a protocol to suspend one of the people while you listen to the other guys answer, mutex's provide a protocol to do this at an operating system level.

The idea of a core has become blured, in the good old days, processors were simple and did one thing at a time. But in order to get the speed of processing up (without increasing clock speed), the first thing Intel did was add a second pipe line. This meant that two instructions could be executed at once.

Take the idea of a programmer, one programmer, one computer can write one program at a time. If you add a second programmer with a second computer, the two of you will be able to write one program twice as fast. If there are several programmers, who have specialist areas (AI programmer, GUI developer etc.) some of you may not be busy at all times, and cannot be at work because your waiting for a bit of work to be finished by another guy. If you give the team a second manager, who can control another project at the same time then the idle guy can be put to work on the second project.
HyperThreading TM is the addition of this second manager. The analogy holds well, as you can imagine what happens when two managers want the same type of programmer, and there are none free! They will fight, and much less work gets done than if there was one :grin: .

So things have gone from one super programmer who could do everything (the 486 was single pipeline).
To 2 programmers who could do everything, and would cooperate (write alternate functions), but sometimes have to wait if they needed to use a function the other was writing before they continued (the Pentium & Penitum MMX were dual pipelines (the U & V pipes)).
To several programmers, who were specialised, and a manager who would tell each programmer which bit they could do. This meant that some were idle at certain times.... (The Pentium Pro, Pentium II, Pentium III, and Pentium 4 were all out of order execution, several execution engined cores).
To the Pentium 4 with HT, which was the addition of a second manager!

Posted on 2003-09-27 13:59:50 by Mirno
I'm tired so I might have missed it but: It's feels like it's two managers shouting at each other, one question who shouts at the two mangers?
Posted on 2003-09-27 16:49:15 by scientica
There isn't anybody!
This is where HT fails (its performance is worse than with it disabled), because there is no control over which thread gets the resource the two can fight. They will swap back and forth over who gets the time on the processing element. When there is no resource conflict they work happily, and get more stuff done in the given time.
So instead of one application being (for example) floating point limited, there are now two, with the added cost of switching between the two threads.

Posted on 2003-09-28 05:01:56 by Mirno
thx again mirno foer the time you spend :=) you're a great teacher.
Posted on 2003-09-29 01:51:38 by Bit7
I think this could be nice link about it... on sandpile - where else ;)

Hyperthreading and 'lock' prefix
Posted on 2003-09-29 12:53:40 by MazeGen