Heya.

I'm looking for the fastest way to project an arbitrary 3D triangle onto an arbitrary plane.

It has to be fast for small numbers of triangles, so matrix transformation is not suitable...

I've been thinking that it may be most simple to calculate the signed distance of (any) point from the plane, and then multiply that by the plane's normal (ie take advantage of the plane equation), and add the result to the original point(s) .. but is this the fastest?

What can you suggest?

ps : the actual problem I'm trying to solve involves clipping one polygon with another polygon, where both polys live on the same plane (thus the 'any' above).. the polys do not necessarily intersect, we are performing a 'geometric subtraction'. This sounds like a 2D problem being solved in 3D, doesn't it?

I'm looking for the fastest way to project an arbitrary 3D triangle onto an arbitrary plane.

It has to be fast for small numbers of triangles, so matrix transformation is not suitable...

I've been thinking that it may be most simple to calculate the signed distance of (any) point from the plane, and then multiply that by the plane's normal (ie take advantage of the plane equation), and add the result to the original point(s) .. but is this the fastest?

What can you suggest?

ps : the actual problem I'm trying to solve involves clipping one polygon with another polygon, where both polys live on the same plane (thus the 'any' above).. the polys do not necessarily intersect, we are performing a 'geometric subtraction'. This sounds like a 2D problem being solved in 3D, doesn't it?

Using the the plane's normal and distance must be the best approach, especially when used with SSE. You need to compute the plane-normal only once :).

Though, the 1/sqrt() opcode gives only 12-bit precision...

Though, the 1/sqrt() opcode gives only 12-bit precision...

Attached my implementation of makeShadowSSE and makeShadowFPUSSE. Both use the 12-bit precision 1/sqrt SSE instruction.

makeShadowSSE executes in 42 cycles ( 6 for proc call+ret, 12 for normalization, 18 for distance, 6 for outVec)

makeShadowFPUSSE executes in 58 cycles (6 for proc call+ret, 18 for normalization, 20 for distance, 14 for outVec)

Sempron 3000+ (64-bit) (SSE1/2/3)

my SSE skills are poor, I hope daydreamer or someone else improves my code :)

makeShadowSSE executes in 42 cycles ( 6 for proc call+ret, 12 for normalization, 18 for distance, 6 for outVec)

makeShadowFPUSSE executes in 58 cycles (6 for proc call+ret, 18 for normalization, 20 for distance, 14 for outVec)

Sempron 3000+ (64-bit) (SSE1/2/3)

my SSE skills are poor, I hope daydreamer or someone else improves my code :)

Heh - its Christmas all over again - thanks Ultrano 8)

I haven't even looked at it yet but I'm convinced its already faster than my first attempt from late last night :lol:

It's reassuring that I was thinking along the same lines, just one question - looking at the SSE version, I note that you're normalizing the plane's normal - don't you keep your planes in normalized form? If not, why not?

I haven't even looked at it yet but I'm convinced its already faster than my first attempt from late last night :lol:

It's reassuring that I was thinking along the same lines, just one question - looking at the SSE version, I note that you're normalizing the plane's normal - don't you keep your planes in normalized form? If not, why not?

A better explanation of the problem I am trying to solve:

Polygon A is a 'potential portal', and Polygon B is 'a potential occluder'.. where occlusion is not confined to one side of the plane.

Noting that I store polygons as coplanar sets of triangles with precalculated and prenormalized planes, I am trying to find the subtractive union of (A - B), in other words - what parts of A are NOT overlapped by B, and what is the geometric outcome in terms of triangles? I'm 'trimming A with respect to B'.

I figured that if I projected B onto the plane of A, I would then be able to fragment each polygon with respect to the other (by using all edges in one to cut the other), and finally delete any fragments that exist in both A and B.

It sounds expensive, eh?

Still, it has to be faster than the 'edge tree' type of solutions, until the polygons involved have large numbers of edges - agree?

Polygon A is a 'potential portal', and Polygon B is 'a potential occluder'.. where occlusion is not confined to one side of the plane.

Noting that I store polygons as coplanar sets of triangles with precalculated and prenormalized planes, I am trying to find the subtractive union of (A - B), in other words - what parts of A are NOT overlapped by B, and what is the geometric outcome in terms of triangles? I'm 'trimming A with respect to B'.

I figured that if I projected B onto the plane of A, I would then be able to fragment each polygon with respect to the other (by using all edges in one to cut the other), and finally delete any fragments that exist in both A and B.

It sounds expensive, eh?

Still, it has to be faster than the 'edge tree' type of solutions, until the polygons involved have large numbers of edges - agree?

Using the the plane's normal and distance must be the best approach, especially when used with SSE. You need to compute the plane-normal only once :).

Though, the 1/sqrt() opcode gives only 12-bit precision...

together with newton-raphsonmethod you can get 24bit ,36bit,48bit,64bits etc, works about the same as a human learns MUL/DIV LUT for numbers between 0-100 and still theoretically can divide as much digits time permit until human NN is exhausted and cant concentrate anymore

besides final output is maybe max 2048x1536 screenresolution, on lowend machine max 1024x768, still even if its polys defined by floats the hardware squeeze these in a integer 2d space question is for a resolution 1024 = 10bit might work enough with 12bit,

judging from my experience with floormappingcode and I ran it on 2 monitors and my goal was to run it on all 3 monitors+thru netw 3 more, but old pci card that ran simultanously with agp only had 2mb vram, too small for 1280x1024

A better explanation of the problem I am trying to solve:

Polygon A is a 'potential portal', and Polygon B is 'a potential occluder'.. where occlusion is not confined to one side of the plane.

Noting that I store polygons as coplanar sets of triangles with precalculated and prenormalized planes, I am trying to find the subtractive union of (A - B), in other words - what parts of A are NOT overlapped by B, and what is the geometric outcome in terms of triangles? I'm 'trimming A with respect to B'.

I figured that if I projected B onto the plane of A, I would then be able to fragment each polygon with respect to the other (by using all edges in one to cut the other), and finally delete any fragments that exist in both A and B.

It sounds expensive, eh?

Still, it has to be faster than the 'edge tree' type of solutions, until the polygons involved have large numbers of edges - agree?

shouldnt the fastest UNION A-B be heavy usage of MAXPS on A coordinatesunrolled to check 12 xcoordinates at once and MINPS on Bcoordinates and compare maxcoordinates in a polyA against mincoordinates in a polyB one plane at a time? and continue with plane y and z

follow with a few polys partially hidden, which could be left out if its too distant to matter

Attached my implementation of makeShadowSSE and makeShadowFPUSSE. Both use the 12-bit precision 1/sqrt SSE instruction.

makeShadowSSE executes in 42 cycles ( 6 for proc call+ret, 12 for normalization, 18 for distance, 6 for outVec)

makeShadowFPUSSE executes in 58 cycles (6 for proc call+ret, 18 for normalization, 20 for distance, 14 for outVec)

Sempron 3000+ (64-bit) (SSE1/2/3)

my SSE skills are poor, I hope daydreamer or someone else improves my code :)

I must check if my code is correct, but so far 7.7cycles for each normalization+distance+outvec

looking at the SSE version, I note that you're normalizing the plane's normal - don't you keep your planes in normalized form? If not, why not?

Well... I made this code for you, and as an exercise for me to study SSE.

On 12-bit precision... I had misread the SSE manual: it's 12-bit _mantissa_ precision. Meaning the precision is actually 19-bit - well enough :). But since the plane would come pre-normalized, we needn't care now - we'll be having the normal-vector with full 32-bit precision :).

daydreamer: 7.7 cycles can't be right. Even a CoreDuo would take at least 35 cycles, at least according to specs I read today from Agner Fog's site.

(btw, I think I'll try to make a custom pipeline-profiler - I've had enough of being in the darkness when optimizing code . CodeAnalyst beta crashes on me, and I can't wait for a stable release)

daydreamer: 7.7 cycles can't be right. Even a CoreDuo would take at least 35 cycles, at least according to specs I read today from Agner Fog's site.

storm in sweden, lost code but can remake it about the same from memory

I targetted coreduo when I wrote it

no Coreduo should run it even faster because it can do 3 SSE parallel, compared to my old Asocket Athlon replacement Sempron which is even slower than your 64bit when it comes to clockcycles/instruction

because I do 3 in parallel and therefore only need to loop 333334 instead of 1 000 000

actually it does it 6 in parallel, because I placed your first calculation to be interleaved with your second calculation, because they have no dependency until last few opcodes and there are enough regs todo that

after that its issue mulps with no dependency as often as you can as its the slowest operation and interleave movaps and addps because they go to different ports

wow, then hopefully you can reconstruct the code (interleaving) and share ^^

The projection of 3 points to one plane is what Homer needed, so doing it all in 23 cycles instead of 3*42 is a great optimization :D

Yet, it'll be good to benchmark your code on a more down-to-earth, for-people's-pocket CPU, like mine :P

The projection of 3 points to one plane is what Homer needed, so doing it all in 23 cycles instead of 3*42 is a great optimization :D

Yet, it'll be good to benchmark your code on a more down-to-earth, for-people's-pocket CPU, like mine :P