I have an algorithm that generates large amounts of data - more than could ever fit in memory (32k * 2^24 = 512 GB). I'm trying to cache the data at intervals so it's easier to scroll through it. I save the initial condition and the greatest generated condition, but I'm trying to find a good algorithm to select which cached item to discard from the set. Each interval takes the same constant amount of time to generate from the previous condition.

For example:

If the conditions up to 1024 have been generated, but we are only caching 9 conditions:

the optimal cached intervals would be:

0
128
256
384
512
640
768
896
1024

Currently, I compute the sum of the positive differences (ignoring the negative ones) for each possible outcome, and select for removal the one with the greatest sum. Is this the optimal algorithm?
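In C, the selection loop I have in mind looks roughly like this (untested sketch - MAX_CACHE, pick_victim and score_cache are just names I made up, and the largest-gap metric shown is only a stand-in, not necessarily equivalent to my sum-of-differences rule):

/* Untested sketch: try every legal eviction and keep the one whose
   resulting cache scores best. score_cache() is a placeholder metric;
   here it is the largest gap between entries, smaller = better. */
#include <limits.h>

#define MAX_CACHE 64

static unsigned score_cache(const unsigned *c, int n)
{
    unsigned worst = 0;
    for (int i = 1; i < n; i++)
        if (c[i] - c[i-1] > worst)
            worst = c[i] - c[i-1];
    return worst;
}

/* Returns the index (1..n-1) of the entry to discard;
   index 0 (the initial condition) is never discarded. */
int pick_victim(const unsigned *cache, int n)
{
    int best_i = 1;
    unsigned best_score = UINT_MAX;
    unsigned tmp[MAX_CACHE];

    for (int i = 1; i < n; i++) {
        int k = 0;
        for (int j = 0; j < n; j++)      /* copy cache minus entry i */
            if (j != i)
                tmp[k++] = cache[j];
        unsigned s = score_cache(tmp, k);
        if (s < best_score) {
            best_score = s;
            best_i = i;
        }
    }
    return best_i;
}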
Posted on 2003-01-11 23:15:15 by bitRAKE
I don't know much about caching algorithms but I think your idea is fine... :)

You might want to look at LRU (Least Recently Used) caching or "Read Ahead". <- I don't know if these ideas might help you.

:)
Posted on 2003-01-11 23:45:05 by arkane
Ricky:
At first I thought about HD caching, but 512 GB is probably more than you can keep on HDs.

I understood that you generate a new block of data (32 KB?) from the previous one, but I have to admit I haven't understood what "conditions" means in this context, so I'm having a hard time picturing it well enough to try to help.
Could you give some more examples, explaining it a bit more "low level"?
Posted on 2003-01-12 04:53:35 by Maverick
Maverick, cellular automata is what I'm working on. The algorithm is very fast, so generating a screen full of data is quick. Moving 10,000 iterations forward is not realtime, and moving 10,000 iterations backward would require starting from the beginning if states are not cached. I would like to minimize the time by storing states at intervals. Once the user has explored a region of the automata, they could more easily scroll through that region. Or the program could fill the cache automatically in the background.
Posted on 2003-01-12 10:47:53 by bitRAKE
Hi

Would inline compression be of any help (if not already done)? You would fit more data into the cache. And if your data has certain regularities you could make an optimized compressor just for it - either for one full screen, or for a succession of full screens, optimizing stable values.



Bye
Posted on 2003-01-13 08:46:51 by valy

Sorry, I was late, didn't see (or get) the notification for this thread.

bitRAKE: thanks for the explanation.. I see, it's some kind of i-frame (to use a video metaphor). :)

If I got it right (still in doubt), I may hazard that the best caching strategy is, given n cacheable blocks (n depends on the amount of memory you can dedicate to the cache), to distribute them at equal distances between a start and end point (the region being explored), as you're already doing.
When you have to free some cache blocks, compute the new checkpoints in the region (there will be fewer than before), and mark as "keep it" only the already-cached block closest to each new checkpoint. Then free the other cached blocks.

Graphically put:



1) The cache is full (10 elements), and looks like this:
0
10
20
30
40
50
60
70
80
90


We need to free 6 elements, to get back some cache memory,
and thus only 4 will then remain:
0
30
60
90


Thus we scan the list of already cached blocks, and "confirm"
only the ones closest to the 0,30,60,90 list we just computed:
0 *
10
20
30 *
40
50
60 *
70
80
90 *
In this case they coincided exactly, but in general we would
select the closest one.


All the non-marked cache blocks can now be freed.


This strategy should ensure an optimal distribution of cached blocks.. again.. if I understood the requirements correctly.
Sorry if I instead said something banal or was off-point. :]
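In C it could look something like this (untested sketch, all the names are invented; it assumes the cached positions are sorted and keep >= 2):

/* Untested sketch: mark the cached block closest to each ideal,
   evenly spaced checkpoint; everything unmarked can be freed.
   Two checkpoints may pick the same block, leaving a few extra
   free slots - harmless for this purpose. */
void confirm_checkpoints(const unsigned *cached, int n,
                         int keep, int *marked)   /* marked[n], all 0 */
{
    unsigned span = cached[n-1] - cached[0];

    for (int k = 0; k < keep; k++) {
        /* ideal checkpoint, evenly spaced over the explored region */
        unsigned ideal = cached[0] + (unsigned)(span * (double)k / (keep - 1));
        int best = 0;
        unsigned best_dist = (unsigned)-1;

        for (int i = 0; i < n; i++) {
            unsigned d = cached[i] > ideal ? cached[i] - ideal
                                           : ideal - cached[i];
            if (d < best_dist) {
                best_dist = d;
                best = i;
            }
        }
        marked[best] = 1;   /* "keep it" */
    }
}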
Posted on 2003-01-13 15:32:03 by Maverick
Maverick, you understand the problem, but not the way the elements are obtained. I can't select any element I want from all the possible ones - I can only select from the elements already present in the cache. And the elements in the cache are not at aligned intervals.
Posted on 2003-01-13 15:59:11 by bitRAKE
Sorry, my English has some limitations; to understand fully I need a more extensive example. :stupid:
Posted on 2003-01-13 16:17:35 by Maverick
Cache: 0,10,20,30,40

Need to remove one to add 55: best choice is to remove 40.

Cache: 0,10,20,30,55

Need to remove one to add 67: best choice is to remove 10 or 20 (can't remove zero).

Cache: 0,20,30,55,67

There is no pattern to how values are added, only that each new value is greater than the largest value in the cache plus the smallest difference between sequential values.
Posted on 2003-01-13 16:26:10 by bitRAKE
I thought of something like this:

Frames in cache:

0,10,20,30,40,50

The cache can hold six frames.

We want to add the next frame (60) and have to delete one frame. Start with the first odd frame and delete it. After we have added 60 we have:

0,20,30,40,50,60

Then, remove the next odd frame:

0,20,40,50,60,70

And then:

0,20,40,60,70,80
0,20,40,60,80,90
0,20,40,60,80,100

Now restart and remove 20 (now add every 20th frame):

0,40,60,80,100,120
0,40,80,100,120,140
0,40,80,120,140,160
0,40,80,120,160,180
0,40,80,120,160,200

The spacing is now 40 between all cacheframes.


I don't know if this is optimal in any way, but it's simple and efficiently keeps the distance between frames close to Tf/Cf, where Tf = total frames and Cf = cache frames.
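In C the bookkeeping is just an eviction index that sweeps the list (untested sketch; N and the names are only for illustration):

/* Untested sketch: 'frames' always holds N cached frames; 'evict'
   is the slot removed next, never 0 (the first frame). Each full
   sweep doubles the spacing between the surviving frames. */
#define N 6

static unsigned frames[N] = { 0, 10, 20, 30, 40, 50 };
static int evict = 1;

void add_frame(unsigned f)
{
    for (int i = evict; i < N - 1; i++)   /* drop frames[evict] */
        frames[i] = frames[i + 1];
    frames[N - 1] = f;                    /* append the new frame */

    if (++evict == N)                     /* end of sweep: restart */
        evict = 1;
}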

Do you expect the user to scroll back to the beginning as often as to scroll back just a few frames? If not, it may be advantageous to store more recent frames than old ones.
Posted on 2003-01-14 03:41:48 by gliptic
valy, thanks for the input. Compressing the stored lines is a good idea, and I'll put it on my list for later features to add.

gliptic, very good idea - I think I'll give it a try. The search space is too great, so I don't know where the user will scroll. I could limit the interface and force them to scroll at a certain rate - I guess I'll see once it's all together.
Posted on 2003-01-14 12:08:13 by bitRAKE
bitRAKE wrote:
Cache: 0,10,20,30,40

Need to remove one to add 55: best choice is to remove 40.

Cache: 0,10,20,30,55

Need to remove one to add 67: best choice is to remove 10 or 20 (can't remove zero).

Cache: 0,20,30,55,67
Ah, sure.. the most logical method to me seems to be to iterate over every possibility (= all the cached blocks), give a weight to each possible choice, and then decide based on that.

This weight should express the "optimal distribution".. i.e. the smallest "maximum delta" between the blocks of a possible new output sequence.

Another strategy may be to find the block with the smallest delta, and remove it.

Can't think of any better methods.. after all, what you want is to have the most "spacey" possible distribution, thus your motivation will drive the strategy.

In your example:

Cache: 0,10,20,30,40

Let's add 55, what do we remove? (let's use the second method, it's simpler for now):

We iterate from the 2nd element (because we can't remove the 1st anyway) to the last, and we try to find the smallest delta. They happen to all be the same (10), so we can remove either the former (10) or the latter (40). Our output would thus be:

either
Cache: 0,10,20,30,55
or
Cache: 0,20,30,40,55

What for the second method are equally valid solutions wouldn't have been for the first method, which would have selected the latter solution because it has a better distribution (i.e. the maximum delta in the first solution is 25, while in the second solution it is 20.. thus preferable). So let's switch to the first method, finally:

Cache: 0,20,30,40,55

Now we want to add 66. We iterate from the 2nd element to the last one this time, and we try to find the smallest MAXIMUM DELTA BETWEEN THE BLOCKS. Let's compute all the possible solutions:

Cache: 0,30,40,55,66 -- MAXIMUM DELTA = 30
Cache: 0,20,40,55,66 -- MAXIMUM DELTA = 20
Cache: 0,20,30,55,66 -- MAXIMUM DELTA = 25
Cache: 0,20,30,40,66 -- MAXIMUM DELTA = 26

So we choose the second.. that is:
Cache: 0,20,40,55,66

Now we have optimal distribution :alright:
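Coded naively in C, the first method could be something like this (untested sketch, the names are mine):

/* Untested sketch: return the index (>= 1, never the first block)
   whose removal leaves the smallest maximum delta once 'newval'
   (greater than every cached value) is appended. */
static unsigned max_delta_without(const unsigned *c, int n,
                                  int skip, unsigned newval)
{
    unsigned prev = c[0], worst = 0;

    for (int i = 1; i <= n; i++) {
        unsigned cur = (i == n) ? newval : c[i];
        if (i == skip)
            continue;                /* merges the two gaps around it */
        if (cur - prev > worst)
            worst = cur - prev;
        prev = cur;
    }
    return worst;
}

int choose_removal(const unsigned *c, int n, unsigned newval)
{
    int victim = 1;
    unsigned best = (unsigned)-1;

    for (int i = 1; i < n; i++) {
        unsigned w = max_delta_without(c, n, i, newval);
        if (w < best) {
            best = w;
            victim = i;
        }
    }
    return victim;
}

With the cache 0,20,30,40,55 and 66 as the new value, this picks the 30, giving 0,20,40,55,66 just like above.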
Posted on 2003-01-14 15:59:38 by Maverick

More tests:

it was:
0,20,40,55,66

let's add 67 (i.e. a very small increment):
0,40,55,66,67 -- MAXIMUM DELTA = 40
0,20,55,66,67 -- MAXIMUM DELTA = 35
0,20,40,66,67 -- MAXIMUM DELTA = 26
0,20,40,55,67 -- MAXIMUM DELTA = 20

---

it was:
0,20,40,55,67

let's add 140 (i.e. a very big increment):
0,40,55,67,140 -- MAXIMUM DELTA = 73
0,20,55,67,140 -- MAXIMUM DELTA = 73
0,20,40,67,140 -- MAXIMUM DELTA = 73
0,20,40,55,140 -- MAXIMUM DELTA = 85

In case of equal lowest scores, you should then decide based on the smallest "SECOND MAXIMUM DELTA", and so on.

0,40,55,67,140 -- 2ND MAXIMUM DELTA = 40
0,20,55,67,140 -- 2ND MAXIMUM DELTA = 35
0,20,40,67,140 -- 2ND MAXIMUM DELTA = 27

And go deeper, if required (in this case it's not required, we stop at the solution 0,20,40,67,140).

Recursion here should look neat (although iteration will be more efficient) :alright:
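Without recursion, one way is to build each candidate's delta list, sort it in descending order, and compare the lists lexicographically - that applies the 2nd, 3rd, ... maximum delta rule in one shot (untested sketch, names invented):

#include <stdlib.h>
#include <string.h>

#define MAXN 64   /* maximum cache size, for the sketch */

static int desc(const void *a, const void *b)
{
    unsigned x = *(const unsigned *)a, y = *(const unsigned *)b;
    return (x < y) ? 1 : (x > y) ? -1 : 0;
}

/* Deltas of 'c' with element 'skip' removed and 'newval' appended,
   sorted in descending order; always n-1 deltas. */
static void build_deltas(const unsigned *c, int n, int skip,
                         unsigned newval, unsigned *out)
{
    unsigned prev = c[0];
    int k = 0;

    for (int i = 1; i <= n; i++) {
        unsigned cur = (i == n) ? newval : c[i];
        if (i == skip)
            continue;
        out[k++] = cur - prev;
        prev = cur;
    }
    qsort(out, k, sizeof *out, desc);
}

int choose_removal_deep(const unsigned *c, int n, unsigned newval)
{
    unsigned best[MAXN], cand[MAXN];
    int victim = 1;

    build_deltas(c, n, 1, newval, best);
    for (int i = 2; i < n; i++) {
        build_deltas(c, n, i, newval, cand);
        int j = 0;                 /* lexicographic compare */
        while (j < n - 1 && cand[j] == best[j])
            j++;
        if (j < n - 1 && cand[j] < best[j]) {   /* strictly better */
            memcpy(best, cand, (n - 1) * sizeof *best);
            victim = i;
        }
    }
    return victim;
}

On the 0,20,40,55,67 + 140 example above, this settles on removing 55, as expected.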


PS:
I haven't tested it.. but perhaps if instead of the MAXIMUM DELTA you computed the SUM OF SQUARED DELTAs, you may avoid the recursion/iteration in case of several equal MAXIMUM DELTAs (the plain sum of deltas won't work here, since it's always just the last block minus the first one) (sorry for Gibsoning, btw :grin: ).
Dunno if it works.. but once you've set up the algorithm in a HLL, it will be easy to verify.
Posted on 2003-01-14 16:17:10 by Maverick
This should (to some extent) help keep the distribution of cached samples even. However, I wonder if this is optimal for the program, as the distribution of requests might not be even (it seems likely that the user will request a number of samples within the range he is currently exploring).

My point is, that if we have, say, 10 cache entries and the user goes to position 1,000,000 and starts exploring in a range of 100,000 forward and back from that point, then the evenly distributed 10 cached positions in the range 0 .. 1,000,000 will not help much.

It may be better not to distribute the cached samples evenly, but instead to have more samples around the current point of exploration, to give faster immediate response at the expense of slower moves to faraway points.
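For instance, each gap could be weighted by its distance from the current position; the exact weighting below is just a guess (untested sketch):

/* Untested sketch: score a cache layout so that large gaps near the
   user's current position cost more than large gaps far away. Try
   every candidate eviction and keep the layout with the lowest score.
   The 1/(1 + distance) weighting is only an example. */
static double weighted_cost(const unsigned *c, int n, unsigned cursor)
{
    double worst = 0.0;

    for (int i = 1; i < n; i++) {
        unsigned mid  = c[i-1] + (c[i] - c[i-1]) / 2;
        unsigned dist = (mid > cursor) ? mid - cursor : cursor - mid;
        double   cost = (double)(c[i] - c[i-1]) / (1.0 + (double)dist);
        if (cost > worst)
            worst = cost;
    }
    return worst;
}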

.. just a thought :-)
Posted on 2003-01-15 04:09:55 by Jibz
Thanks for all the replies! You have all convinced me that a locally weighted solution will be better, but 'local' is defined by where the user is and by the speed of the processor. This is getting more complex than I wanted. :) I think I'll limit the interface to a local position around the current view, and have another option to go to a specific view.
Posted on 2003-01-15 10:52:21 by bitRAKE