heya all,
I am trying to find the best way to filter text out of an array using a list of filter words.
I.e.:
a user enters between 1 and 1,000 words into an array to filter with,
and between 1 and 30,000 words as the text.
I want to find the best way to scan the second array and, using the filter words in the first array, 'delete' those words from the second array.
Any word that contains a filter word as a substring will be filtered out as well.
Filters/words can be separated by any agreed-upon delimiter character (e.g. ' ' or carriage return).

E.g.:
Filter words: 'asm'
Text: 'Win32asm.cjb.net rocks'
screen output: 'rocks'

Note: my asm environment is 16-bit.
Any known good algorithms?

I know I can loop over the text several times, once for each filter word, and use rep scasb every time...
Maybe there are better ways?
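For reference, a minimal sketch of that brute-force approach, written in Python rather than 16-bit asm and assuming words are separated by whitespace (space, CR, LF, tab); `filter_text_brute_force` is a hypothetical name. Every word is tested against every filter word, so the work grows with (number of words) × (number of filters): with 30,000 words and 1,000 filters that is up to 30,000,000 substring searches.

```python
def filter_text_brute_force(filters, text):
    """Return the words of `text` that contain none of `filters` as a substring.

    Assumption: words are separated by whitespace; str.split() with no
    arguments splits on runs of space, CR, LF and tab and drops empty strings.
    """
    kept = []
    for word in text.split():
        # One separate substring search per filter word, for every word.
        if not any(f in word for f in filters):
            kept.append(word)
    return kept


print(" ".join(filter_text_brute_force(["asm"], "Win32asm.cjb.net rocks")))  # rocks
```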
Posted on 2003-03-31 06:06:30 by wizzra
Just a quick idea, I haven't thought much about this, but what about building a hash table of all the filter words?

The process would then be simple; in pseudocode:


while (more input text)
{
    word = gettoken();
    if (not inhashtable(word)) output(word);
}


There might be better solutions than a hash table, but it should definitely beat brute-force string comparisons.
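A minimal Python sketch of that pseudocode, assuming whitespace-separated words: the built-in `set` plays the role of the hash table, `str.split()` stands in for `gettoken()`, and `filter_text_hashed` is a hypothetical name. Note that a set lookup only matches whole words.

```python
def filter_text_hashed(filters, text):
    """Drop every word of `text` that exactly equals one of the filter words."""
    filter_set = set(filters)  # build the hash table once
    # Keep a word only if it is not in the hash table.
    return [word for word in text.split() if word not in filter_set]


print(filter_text_hashed(["asm"], "asm Win32asm.cjb.net rocks"))
# ['Win32asm.cjb.net', 'rocks']  (only the exact word "asm" was removed)
```

Building the set costs one pass over the filter words; after that, the average cost of each lookup does not grow with the number of filter words.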
Posted on 2003-03-31 06:40:52 by f0dder
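The problem with the hash table approach is that it has to identify substring matches too. A hash lookup only finds a word that is exactly equal to a filter word, so 'Win32asm.cjb.net' would not be caught by the filter 'asm'.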

I think a viable solution would be to build a search tree containing the filter words. That way you can check against all filter words at once at each position in the input.
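A minimal sketch of this idea, assuming the "search tree" is a trie (prefix tree) built from nested Python dicts and that words are whitespace-separated; `build_trie`, `contains_filter`, `filter_text_trie` and the `END` marker are hypothetical names. From each start position in a word, one walk down the trie tests all filter words at once, so the cost per position is bounded by the length of the longest filter word rather than by the number of filters.

```python
END = "$end"  # hypothetical marker key: "a filter word ends at this node"


def build_trie(filters):
    """Build a trie of nested dicts from the filter words."""
    root = {}
    for f in filters:
        node = root
        for ch in f:
            node = node.setdefault(ch, {})
        node[END] = True  # multi-char key, so it can never clash with a char
    return root


def contains_filter(word, trie):
    """True if any filter word occurs somewhere in `word` as a substring."""
    for start in range(len(word)):
        node = trie
        for i in range(start, len(word)):
            node = node.get(word[i])
            if node is None:
                break  # no filter word continues with this character
            if END in node:
                return True  # a complete filter word matched here
    return False


def filter_text_trie(filters, text):
    """Return the words of `text` that contain no filter word."""
    trie = build_trie(filters)
    return [w for w in text.split() if not contains_filter(w, trie)]


print(filter_text_trie(["asm"], "Win32asm.cjb.net rocks"))  # ['rocks']
```

This sketch restarts the walk at every position. The Aho-Corasick algorithm extends the same trie with failure links so that each input character is examined only once; that refinement is not implemented here.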
Posted on 2003-04-01 09:27:15 by Jibz
Would it be wise to use a linked list?
I would put each word into a struc, which should make
searching and removing easier, no?
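A minimal sketch of the linked-list idea, assuming each word of the text becomes one node (the `WordNode` class is a hypothetical stand-in for the asm struc) and that a node is removed when its word contains a filter word. Unlinking a node is a constant-time relink, but finding which nodes to remove still means testing every node against the filter words, so the list speeds up the removal step rather than the search.

```python
class WordNode:
    """One word of the text (hypothetical stand-in for an asm struc)."""

    def __init__(self, word, next_node=None):
        self.word = word
        self.next = next_node


def build_list(text):
    """Build a singly linked list of the whitespace-separated words."""
    head = None
    for word in reversed(text.split()):
        head = WordNode(word, head)
    return head


def remove_filtered(head, filters):
    """Unlink every node whose word contains a filter word; return the new head."""
    dummy = WordNode("", head)  # sentinel, so the first node is handled like the rest
    prev = dummy
    while prev.next is not None:
        node = prev.next
        if any(f in node.word for f in filters):
            prev.next = node.next  # removal: relink around the node
        else:
            prev = node
    return dummy.next


def to_words(head):
    """Collect the remaining words in order."""
    out = []
    while head is not None:
        out.append(head.word)
        head = head.next
    return out


print(to_words(remove_filtered(build_list("Win32asm.cjb.net rocks"), ["asm"])))  # ['rocks']
```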
Posted on 2003-04-07 07:51:52 by wizzra