I want to make a database format and functions, what would be the best way to set it up so searching through it is fast?
If your really concerned with fast searching, we need more information: what type for information are you building a database of, and what type of connectivity do you need? Type is the specific as well as the quantity. For example if you were creating a database that was an encylopedia you couldn't fit it in memory and so your options would be limited. Let the data choose the path you take - what is it and how do you want to get at it? Hope I make sense to someone besides myself :P bitRAKE
It will carry Name, Age, ID#, and stuff like that. Im gonna have quite a few databases and each one will be farly small but some of them, like the one above may have a couple hundred items in it. And for searching i want to be able to search for anything, ID#, Name or/and Age and get the rest of the info. I'm also gonna have to be able to add and delete items. The way i started to make it was that each thing would have an index based on its first char, so if I wanted to get someone name "Joe" it would go to the index find J and there would be an array of pointer to names beginning with "J" so that way the ammount the program would have to search through would be reduced. But the problem I had was deleting/adding an item wouldnt be easy cause all the pointers would have to be updated, and i didnt think that would go to fast.
Databases are the hardest and sure not the funny part of programming...i mean large databases...the ones that will not get into the momory.... My advice: study the Xbase (Dbase/Delphi and Vfox database structures) : make it an fixed field size array, add a header on top describeing the field name/type and size..calculate the record size. so you can easy: "go to record no 1567" just by 1567*recordsize then make an index file (the hard part)...make it a binary tree (at least) so the search will be fast (for extra speed try to compress the index file...to read smaaler sizes from drive) about delete: nobody can afford a "delete/insert" command (nowdays...it was one in dbase 2/3/4 i belive) so just mark the record for deletion (a flag) and ignore it in search etc...but leave it there in the file ...(index also) at some latter maintenance time (like the "pack" command dump the "marked for deletion" records...but warn the user it...can take a while :) and rede the whole index and data file from zero...in a temporary copy of course :D whenever you add a record just place it at the end of the file...but make the changes into the index file also (incremental/production operation) PS. the index file : try not to rearange records into the index file...instead just reset the corect pointers (links) like the FAT system This message was edited by BogdanOntanu, on 2/28/2001 11:08:46 PM
I've been dealing with "databases" for more than 3 decades on both IBM mainframes and PCs. If you are sure that the files will ALWAYS be small enough to fit into memory, then just read the entire file and go for the "brute force" method. The problems start when a file gets too big to fit into memory. In this case, I agree with BogdanOntanu. Indexed files are the easiest, and fastest, way to set up a search procedure. Imagine the index as a seperate file. If you want to index a file by name, for example, then the index "file" would consist of the name field, followed by a pointer (see SetFilePointer) to the actual record. Again, the idea here is to make the index small enough to fit into memory, so you can search the entire index very quickly. If you organize the index in sequence, then you can use a "binary search" for even faster searches, but that's another story... This approach works well for "medium" sized files. But what if, even after you have indexed the file, the index is still too big to fit into memory? Then we get into second, and maybe even third level indices. Think of these as an index to the index. Again, that's a whole 'nother story... From what you have posted, you want to be able to search a file on EVERY field. You have much to consider here. What are you going to do with duplicates, such as age? If you have 100 records, chances are pretty good that you will have at least 2 records with the same age. Do you want only the first one that you find, or do you want to continue and find all of them? You have the same sort of problems with any index (like 2 guys named JOHN SMITH). As far as adding/deleting records, it depends on the application. These are one-time events, but most systems will access/update a record many times during the life of the record. You must deal with the extra aggrivation involved in updating the indices, especially when it comes to things like keeping the index in sequence, so you can take advantage of binary searches, etc. etc... I agree with BogdanOntanu re. "marking" deleted records, if you can afford the "down time" needed to reorganize a file. But if you can't, it's not "that much" (all considered!) extra work to do it real-time. I don't want to scare you away, but these are things to consider. Entire systems, like SQL, or VSAM on the mainframe, have been designed to deal with just this sort of problem. :) --- On the keyboard of life, always keep one finger on the escape key.
You could consider generating a hash code of all the fields you want to search by that are greater than 4 bytes (use the exact value for small fields). Then make your index all those four byte fields. These can be search very quickly. If your hash index doesn't fit in memory, then generate hash codes for your groups of hash codes :) bitRAKE