i am some code for a video sampler. the problem though is that video uses a lot of memory. i am wanting to know what a good algorithm would be to store it that has pretty good compression, but is fast to compress and decompress.
my video data is 32 bit, but i am willing to convert it to 16 bit for the sake of size.
other than that i'd want the compression to probably be lossless (though you could also point me to some lossy sources)

what would be a good technique and does anybody know of such a technique in assembler?

Karl
Posted on 2004-06-09 01:30:33 by klumsy
i am some code for a video sampler.


Nice to meet you, "Some code for a video sampler". :)

The problem with truecolour images is that common LZ/Huffman-based algorithms don't work very well on them. Most video recorders use Motion-JPEG. It's more or less a simplified version of MPG that outputs only I-frames, I believe (no motion information). You can later recompress it to MPG.
Posted on 2004-06-09 03:10:32 by Scali
is it mostly because of the complicated nature of a truecolor image? meaning even a really grainy 256 color photo wouldn't compress well with such algorithms? or is it just that they can't find easy patterns in 24 bit data?
what if you rearranged it so all the red bytes were next to each other, all the green etc.. it probably wouldn't perform any better, would it?
do you know what sort of cpu usage motion-jpeg will hit me with?
Posted on 2004-06-09 17:15:17 by klumsy
Well, 'complicated'... The thing is that with photographs, the chance of having many pixels of EXACTLY the same colour is very small. Let alone that you will have repeating sequences of pixels of EXACTLY the same colour. So RLE/entropy-based encoding is not very effective.
A 256 colour photo would work. When you only have 256 possible colours, the chances of repeating sequences are much higher than with 2^24, of course. But it would work better if you converted it to an image with a 256 colour palette.
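To illustrate, a byte-wise RLE boils down to something like this (a rough C sketch, untested; the value+count format is just one variant):

```c
#include <stdint.h>
#include <stddef.h>

/* Encode a buffer as (value, run length) pairs. Worst case output is 2*n
   bytes, which is exactly what happens on data without repeats. */
size_t rle_encode(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        uint8_t v = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == v && run < 255)
            run++;
        out[o++] = v;
        out[o++] = (uint8_t)run;
        i += run;
    }
    return o;   /* bytes written */
}
```

On a photograph the runs are almost always of length 1, so you store two bytes per pixel per channel and gain nothing.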

Motion-JPEG can be done in realtime on... maybe P3 500 MHz or so?
Depends on the resolution, framerate and quality of the implementation, of course.
Basically you just convert each frame to a JPG image with fixed quantization/huffman tables. That can be done quite quickly with a decent DCT implementation in MMX/SSE.
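For reference, the 8x8 DCT itself is conceptually simple; a naive floating-point version looks roughly like this (just a sketch of what the fast MMX/SSE implementations compute, not how you'd actually write it; a real encoder uses a separable fixed-point version):

```c
#include <math.h>

/* Naive 8x8 forward DCT (DCT-II) as used per block in JPG/MPG. */
void fdct_8x8(const double in[8][8], double out[8][8])
{
    const double pi = 3.14159265358979323846;
    for (int u = 0; u < 8; u++) {
        for (int v = 0; v < 8; v++) {
            double cu = (u == 0) ? sqrt(0.5) : 1.0;
            double cv = (v == 0) ? sqrt(0.5) : 1.0;
            double sum = 0.0;
            for (int x = 0; x < 8; x++)
                for (int y = 0; y < 8; y++)
                    sum += in[x][y]
                         * cos((2 * x + 1) * u * pi / 16.0)
                         * cos((2 * y + 1) * v * pi / 16.0);
            out[u][v] = 0.25 * cu * cv * sum;
        }
    }
}
```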
Posted on 2004-06-10 04:07:44 by Scali
hmmm, huffman would already work on each r,g,b plane if you consider byte elements.
huffman is efficient if some values come more often than others, so it should work unless all colors are equally represented.

i've thought of taking the three r,g,b planes and storing each line as the gradient (difference) between 2 consecutive pixels instead of the absolute values. this way you would have far fewer values to consider, as a line of the image is pretty continuous data (only small differences between consecutive pixels), so there would be many more small values than big ones.
then huffman would be good (on the whole plane while we're at it). what do you think of it?

also you could maybe use the info of the line above to store the current line, as it, too, should be pretty close... but it's not obvious: you can, for each pixel, store the diff with the pixel above, but then you don't use line continuity... hmm..
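in code, the left-neighbour version of the idea would be something like this (untested C sketch; mod-256 arithmetic, so it stays perfectly reversible / lossless):

```c
#include <stdint.h>

/* store each plane line as differences between neighbouring pixels,
   so most stored bytes cluster near 0 (small positive steps) or near
   255 (small negative steps), which huffman then likes. */
void delta_encode_line(const uint8_t *line, int width, uint8_t *out)
{
    uint8_t prev = 0;                        /* first pixel stored as-is */
    for (int x = 0; x < width; x++) {
        out[x] = (uint8_t)(line[x] - prev);  /* wraps mod 256 */
        prev = line[x];
    }
}

void delta_decode_line(const uint8_t *in, int width, uint8_t *out)
{
    uint8_t prev = 0;
    for (int x = 0; x < width; x++) {
        prev = (uint8_t)(prev + in[x]);
        out[x] = prev;
    }
}
```

the diff-with-the-pixel-above variant is the same thing, just with "prev" taken from the previous line instead of the previous pixel.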

jpeg, like mp3, stores fourier-style coefficients (the coefficients of the different sine/cosine functions "contained" in the wave) of the "wave" (a line) over a very short period (8 pixels i think (!!!!) (i would've said the best compromise would've been longer)), (from what i've understood, recent implementations/codecs are likely to do far more complex things).
(btw i read jpeg stores the coefficients of a 2D function (2 variables).)
reconstructing the signal is fast, but building it requires something like an fft.
so it depends whether only rebuilding the image fast is important, or whether the buffer must be encoded AND decoded fast.
and it's lossy, unless you store coefficients until you reach the exact original wave...

it also depends on whether you must be able to read/write the buffer at an arbitrary location... if yes, you can't compress anything much i guess.. who knows..
compressed framebuffers soon? :)

btw i heard newer nvidia boards will (losslessly i think) compress texture data in real time in hardware before sending it over the board's bus wires, because bandwidth isn't high enough... and it will decompress it LIVE on the other end!
that really blew my head off.
Posted on 2004-06-10 12:27:11 by HeLLoWorld
and hey, once you've huffman-encoded the difference (gradient) buffer of one plane, why not encode the 2 others as differences relative to this one (and then of course line by line, the gradient again)? then you would have even more zeroes and small values to compress.

the thing is, if you've got continuous values to compress, and they contain sines, the gradient will also be sines, so why can we compress it better? because byte values are finite (integers), so the gradient will not be 1.54, it will be 0, 1 or 2, while the colors themselves are between 0 and 255. of course, if you've got a black pixel with a white one next to it, the gradient is 255, but that's not often.

probably this has already been thought of and implemented long ago... but i'm not sure.
i would be very interested to know what kind of ratio you would get with this (lossless) method.

interesting.
Posted on 2004-06-10 12:36:37 by HeLLoWorld
and then for a movie you store the difference buffer with the next frame...
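roughly like this (untested sketch again, mod-256 so it's reversible):

```c
#include <stdint.h>
#include <stddef.h>

/* per-byte difference against the previous frame; static parts of the
   picture become long runs of zeros for the RLE/huffman pass to eat. */
void frame_delta(const uint8_t *prev, const uint8_t *cur, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(cur[i] - prev[i]);
}
```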
Posted on 2004-06-10 12:37:57 by HeLLoWorld
JPG and MPG work with 2D Discrete Cosine Transforms. They take an 8x8 block, and find the cosine coefficients in the horizontal and vertical direction. They work in YUV-space, in 4:2:0 format. So it is quite similar to separate r, g, b, but it works better, because only the Y channel is stored for every pixel; U and V are not that important and can be stored once per 2x2 block, and linearly interpolated on decode (this is why red and blue areas sometimes look fuzzy in mpg movies, while green looks good: green dominates the Y-component, while U and V carry the blue and red differences from Y).
It is very similar to your difference encoding, except that cosine functions can encode more complex shapes than linear functions can.
The coefficients are difference-encoded and quantized (scaled), and you end up with only a few significant coefficients at the start, then very small ones, and eventually a list of zeros.
The non-zero coefficients are stored using Huffman, and the number of zeros at the end is stored as a run length (zero-run encoding). The quantization table is actually the most important part. It allows you to control compression ratio/quality in a very flexible way.
The huffman tables for MPG are static, while the tables for JPG are actually stored inside the JPG, so JPG allows you to generate an optimal huffman table for the specific picture. For movies this does not make much sense anyway, and a static table means it can be optimized nicely, especially in hardware.
MPG also supports motion vectors, allowing blocks to move over the screen from one frame to the next, in order to encode the difference between several frames.
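The quantization step itself is just a per-coefficient division, something like this (rough C sketch; the table values are the interesting part and are not shown here, real JPG/MPG tables use larger divisors for the higher frequencies):

```c
#include <math.h>

/* Divide each DCT coefficient by its quantization table entry and round.
   Large divisors on the high frequencies turn most of them into zeros,
   which the zero-run and Huffman stages then exploit. */
void quantize_block(const double dct[8][8], const int qtable[8][8], int out[8][8])
{
    for (int u = 0; u < 8; u++)
        for (int v = 0; v < 8; v++)
            out[u][v] = (int)lround(dct[u][v] / qtable[u][v]);
}
```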

And texture compression for 3d cards has been around for quite a while. At least since the first GeForce cards. A certain S3 card was the first to support it. In OpenGL it still carries the S3-name, it's called S3TC, for S3 Texture Compression, iirc.
Posted on 2004-06-10 13:22:17 by Scali
thanks for the informative reply, it has taught me a lot.. as for speed though.. 500mhz for the algorithm really is too slow (sure, my users have 2ghz to 3ghz machines), however this videomixer effect would normally be chained realtime with lots of different effects, which are cpu intensive, and i wouldn't want this algorithm taking more than 10% of the total cpu that the effects chain uses..

about the compressed textures, that gave me another idea (maybe i can do it in video memory). what sort of levels of compression can it handle?

my other options i've thought of (other than just coding it directly) would be to use the video card, or directshow and some directshow samples and codecs that store it in memory rather than on disk (dunno how difficult or slow that would be)

also some 3d cards do realtime mpeg2 encoding/decoding in hardware, right? maybe in those cases it would be much better to stream it through the video card to some buffer (and back again).
the problem with mpeg though is that it is encoded based on previous frames, so maybe i wouldn't be able to play the buffer 'in reverse' quickly etc.
Posted on 2004-06-10 19:25:00 by klumsy
http://msdn.microsoft.com/archive/default.asp?url=/archive/en-us/directx9_c/directx/graphics/programmingguide/FixedFunction/Textures/compressed/alphatextures.asp

hmm 4 bits per pixel.. i could live with that i suppose.. though i wonder if i could get the card to compress it, then bring it back into main memory raw to keep for later (so as not to use up way too much card memory)
Posted on 2004-06-10 20:21:03 by klumsy
The videocard can only decompress textures, it cannot compress.
And it's not such a good compression algo anyway. It's mainly kept simple enough to be decompressed inside the pipeline, so that it can be decompressed on the fly and you save memory bandwidth.
Videocards don't actually compress/decompress mpg itself, but they can accelerate parts of the algo, such as the (i)DCT and the YUV<->RGB.
So you could most probably also write an MJPEG compressor with hardware, which means you can still play reverse easily.
Posted on 2004-06-11 02:02:10 by Scali
i was not speaking of compressed textures in vram; from what i understood, the card would live-compress an rgb buffer in one chip before sending it over the board's wires, and live-decompress it at the other end to store it in another chip.

but maybe i missed something.
Posted on 2004-06-11 10:45:49 by HeLLoWorld
thanks very much for the info about compression!
Posted on 2004-06-11 10:46:26 by HeLLoWorld
i did some tests, and i think my users probably would have a video buffer of more than 30 seconds usually (though they might have more than one). anyway, with a lot of memory i mightn't need such huge compression.. speed is the most important thing, and any jpeg/mjpeg i've seen uses at least 100mhz to process a good framerate in realtime (and often way more)
to test, i tried some example stuff (complicated) from the program being fed through microsoft RLE, and actually it reduced to about 1/3 of the size (i wonder if that codec uses the standard RLE techniques?) which made me quite happy (it performed much better with less complicated stuff)..
however next i put an RGB to YUV filter in before compressing, and it compressed much much better..
so i think i could get away with just a YUV RLE
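(fwiw, the RGB to YUV conversion is roughly the standard weighted-sum thing, something like this C sketch with BT.601/JPEG-style weights; i'm only assuming the directshow filter i used does the same in integer math)

```c
#include <stdint.h>

static uint8_t clamp_u8(double v)
{
    if (v < 0.0)   return 0;
    if (v > 255.0) return 255;
    return (uint8_t)(v + 0.5);
}

/* per-pixel RGB -> Y, Cb, Cr (YUV-style) conversion */
void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                  uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    *y  = clamp_u8( 0.299    * r + 0.587    * g + 0.114    * b);
    *cb = clamp_u8(-0.168736 * r - 0.331264 * g + 0.5      * b + 128.0);
    *cr = clamp_u8( 0.5      * r - 0.418688 * g - 0.081312 * b + 128.0);
}
```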
Posted on 2004-06-11 17:20:02 by klumsy
Note that you can have YUV->RGB (or variations, such as UYVY) conversion in hardware with most videocards, using an overlay. Can save you some more time perhaps.
Posted on 2004-06-11 17:23:02 by Scali