Ikonboard » Printable Version of Topic

Forum: The Classroom
Topic: A Topic
started by: StreetRaver

Posted by StreetRaver on Oct. 12 2001,15:13

So, What does everyone want to talk about?

------------------
Take XTC, Then you wake up in the morning wondering "Wheres My Underwear?"

This message has been edited by StreetRaver on October 13, 2001 at 03:19 PM

Posted by StreetRaver on Oct. 12 2001,15:14

To*, Damn I can't spell

------------------
Take XTC, Then you wake up in the morning wondering "Wheres My Underwear?"

Posted by DeadAnztac on Oct. 12 2001,15:28

Templated Linked Lists. I'm having trouble making an efficent array system here. My goal was to make a define on use, dynamic sized array class here. In other words on the outside it has all the ease of use of an array (like the [ ] operator) but it really has a dynamic inside. See my plan was to make the index a linked list with pointers to dynamic typed data. Another thing this could help is that the index would be independent of the numbers, so, like a[5] could be the first index, a[1] could be the second. I was also going to add in string indexing as well, like a["foo"], but I want to get this other stuff down first. It's rather complicated, does anyone have a tutorial on templated classes that might help?

Edit: In other words PHP like arrays.

/Thus endeth Anztac's idea for a discussion./
------------------
~Anztac [ I'm just this guy, you know? ]

This message has been edited by DeadAnztac on October 13, 2001 at 10:30 AM

Posted by Beldurin on Oct. 12 2001,15:58

Disclaimer: it's been a couple of years since I've had a real programming class.

But if you want to use the [] operator, you would either have to decide on some kind of regular expression to use as your index value( i.e. a[#0] or some other kind of reserved character so that you could do a string or integer search within), or else go w/out a true index.

Also, if you made it string-indexable such as a["foo"], your search times (for large, arrays) are going to approach the mythic O unless you force the arrays to be ordered, arent't you?

------------------

quote:
Originally posted by Dark-Angel99:
How come {name removed} doesn't like you? I find you really funny :D

Never argue with an idiot...he may be doing the same thing

Posted by veilside on Oct. 12 2001,16:36

quote:
Originally posted by StreetRaver:
To*, Damn I can't spell

that's what the edit button's for.

Posted by damien_s_lucifer on Oct. 12 2001,19:29

quote:
Originally posted by Beldurin:
Also, if you made it string-indexable such as a["foo"], your search times (for large, arrays) are going to approach the mythic O unless you force the arrays to be ordered, arent't you?

Not if you use a hash, which is probably what you want in the first place. They use more memory than arrays but they're a *lot* faster - with a decent implementation, retrieval from an arbitrarily large hash happens in constant time, i.e. retrieving a value from a 1 bazillion element hash happens just as quickly as retrieving a value from a 10 element hash. w00t!

Posted by miNus on Oct. 12 2001,19:50

quote:
Originally posted by damien_s_lucifer:
hash
*snip*
hash
*snip*
hash
*snip
hash. w00t!

Dear God, is that ALL you think about!!??!?

...

I took a C++ class in high school, just to get started off with it, and we got just as far as arrays. I never really learned about pointers. Half the class could barely open up Microsoft Visual C++, so it was a VERY slowly progressing class.

Get your mind off of the drugs damien!

------------------
"Take a cable and simulate a sex act on each end. Press some combonation of play and record buttons." - askheaves

Posted by Beldurin on Oct. 13 2001,02:26

Stupid me forgot about hashes. Had a 200-level college data structures class that we dealt briefly with them, but since the prof couldn't teach worth a fuck, noone really got them. All I remember is that we had to put the American Heritage Dictionary into a hash and then spell check the first two books of the Bible with it.

And get this...there were errors...sheeh...you'd think God could spell.

P.S. It's a joke...don't flame me for it

------------------

quote:
Originally posted by Dark-Angel99:
How come {name removed} doesn't like you? I find you really funny :D

Never argue with an idiot...he may be doing the same thing

Posted by damien_s_lucifer on Oct. 13 2001,04:59

quote:
Originally posted by miNus:
Dear God, is that ALL you think about!!??!?

Yes. And thanks for calling me God

DeadAnztac, implementing a hash is fairly straightforward. Essentially, you create an array of known size whose elements are a simple struct (forgive my C syntax, I'm pretty rusty) :

struct hashrec
{
char key[32];
void* data;
}

Then you define a hashing function :

unsigned int hash_func(char* key)
{
int keylen,genlen,count,return_value;
return_value = 0;
keylen = strlen(key);

// sum up all the characters in our key
for (count=0; count<keylen; count++)
{
return_value += key[count];
}
// return the modulus of our sum and our array size. For efficient hashing,
// HASH_ARRAY_SIZE should be a prime number
return_value = return_value \% HASH_ARRAY_SIZE;
return return_value;
}

Your array should be as large as the *maximum* value your hash function will ever return. For the function above, it will be HASH_ARRAY_SIZE - 1 :

struct hashrec hash_array[HASH_ARRAY_SIZE -1];

I'm just going to pseudo-code retrieval :

int hash_retrieve(char *key, char *data)
{
// figure out what our hash value is
int hash_array_offset = hash_func(key);

// Is our hashed struct null?
if (hash_array[hash_array_offset].key == NULL)
{
// yes, return failure
return 0;
}

// no. see if keys match
if (strcmp(hash_array[hash_array_offset].key,key) == 0)
{
// we found it. Copy out our data and return success.
copy_data(hash_array[hash_array_offset],data); // you'll have to define this
return 1;
}

// No match. Fall back on one of many collision resolution schemes.
}

Insertion is almost the same as retrieval. I'll leave it to you to figure out how to do it.

The only tricky part is collision resolution. With the hash function listed above, the keys 'fuck you' and 'you fuck' will both produce the same hash value, so if you try and insert those two keys into the hash you'll end up with a collision.

There are a lot of ways to resolve this - my favorite is to expand your hash_struct so it can become a linked list. After hashing, you search thru the list until you find a match or run off the end of the list.

Obviously there's more to hash implementation than just the above, but that should give you an idea of how they work at a fundamental level. Have fun!

This message has been edited by damien_s_lucifer on October 14, 2001 at 12:01 AM

Posted by damien_s_lucifer on Oct. 13 2001,07:55

I'm not sure what you mean by just hashing the indeces... do you mean storing an index into an array of pointers? If so, you're adding an unnecessary layer of indirection, because you'd have to :

- perform the hash lookup and get the index
- use the index to retrieve a pointer from your array
- then dereference the pointer to get the data

If you're storing the data itself in the array, that's a little different... but then you have the problem of having a fixed limit on how much data you can store per key.

Either way, though, inserts are going to slow down drastically as things are added and removed from the array, since you'll pretty much have to start at the beginning of the array and hunt until you find an empty slot, thereby negating a lot of the benefit you'd get from a hash.

Better to just alloc the memory you need during an insert, and store the data pointer in the hash itself. Free the memory on a delete.

As far as autogenerating enumerated data types, you'll have to use strings for that. The compiler's going to balk any time it sees something like

my_hash[foo] = "bar";

if you haven't declared foo somewhere else *and* made your class use the enum. But it won't care about

my_hash["foo"] = "bar";

as long as you've overloaded "foo" properly, i.e. as taking a char*.

A really good hash class should be able to handle

my_hash[char*] = char*
my_hash[int] = char*
my_hash[float] = char*

So that you can use arbitrary keys to store arbitrary data (you can store just about anything in a char* if you're careful )

<advocacy>
All of this crap is why I went to Perl in the first place. Weakly typed data + native lists, hashes, and regular expressions make it a programmer's dream.
</advocacy>

Posted by askheaves on Oct. 13 2001,16:00

I'm always going to be a fan of strong typing. I got away with too much when I was passing void*s around in my youth, and it bit me a lot of times. That's why I hate RTTI, because those decisions should be made before hitting the compile button.

One interesting idea would be to have a variable length hash of indices. Keep it roughly the square root of the number of entries in the hash. Have a rehash function that would reslot the parts in their appropriate places.

For a better explanation, let's take a 100 element hash. You make a keying function that groups all data into one of 10 bins. And each bin holds, on average, 10 elements in a linked list, or array, or whatever. Then, to find an element, you do a key lookup first, and find your bin where your data would be. Then, you do a regular iterate to find the exact data you want.

If you were to add 44 elements to the hash (making it 144), it may be a good time to rehash. You call the function, and it divides everything into 12 bins and divies the data out accordingly.

I've forgotten all there is to know about the big-O, but I think this approach is a relatively fast one... possibly Root(O)? and it doesn't create a whole lot of excess memory use. When you got up to 1 million entries, you'd have 1000 bins with an average of 1000 entries to search through to find your data, instead of 1 million in a regular linked list.

Data Structures suck.

Posted by askheaves on Oct. 13 2001,17:30

Actually, from what I've heard, the idea of taking your hash and building linked lists under the nodes is the preferred method. It allows you to greatly reduce the size of your hash, and basically divides your search time by the number of nodes you allow.

Posted by DeadAnztac on Oct. 13 2001,17:42

Oh, I like, I like! perhaps though I could just hash the indecies? My plan is slowly evolving me thinks, I'll just have:
hash->linked list of indecies->templated data

in other words I'll have to have a Node class that has templated data (any data), it's index number, and a pointer. A lList class to manage all those fun nodes (so it will have head, tail, cur, etc.). Then a class to handel it all, which will hash the indecies on base 10. Now that's what I call an efficent data system Well, this should do some things, it will take the dynamicity of a linked list, make O=O/(number of hashes), and also add the ease of use of a normal array (as in outwardly using the index operator instead of having to do a raw search everytime.) Plus on top of that it will be defined on use, and every sub-section of this array could be a different type ([0] could be an int, [1] could be a char [], etc.)

Here's a thinker for you though, do you think there's a way to make dynamic enumerated data types? I think that would make this whole thing just that much cooler (as in [foo] upon use is added into a enum where it's given a value of the current index).

If I've failed to grasp a concept please help, I'm writting code in my head right now, I just want to make sure it'd work Thanks for the help Damien, as always you've gone beyond the call of duty!

------------------
~Anztac [ I'm just this guy, you know? ]

This message has been edited by DeadAnztac on October 14, 2001 at 12:45 AM

Posted by damien_s_lucifer on Oct. 13 2001,18:46

Actually, if you're going to use AH's technique, you want to make sure your hash size is the next prime # that's larger than root(n). That ensures that keys are spread evenly throughout the hash, instead of everything trying to store itself in the same few locations.

As for saving memory - the bulk of your memory consumption is going to be in your key nodes. The hash itself is only going to take 4 bytes per slot, which is small compared to how much your key nodes will take. So you might want to consider making the hash the next prime after 2*root(n), or even larger, just to cut O down even further. The larger the hash array, the faster lookups work - until they hit the magical "constant time"

Posted by damien_s_lucifer on Oct. 13 2001,19:30

DAMN the escape key. I just erased my entire revised explanation!!!!

Anyway - using a hash+LL with a hash size of root(n) is going to have worse behavior than a B-tree.

code:


Algorithm     O      100 elements   10000 elements
B-tree     log2(n)   7 lookups      14 lookups
Hash+LL    1+root(n) 11 lookups     101 lookups

I'd say the best bet is simply to use a hash size very close to n. It seems like you're wasting memory, but in reality your data is *always* going to take up a lot more memory than your hash array - on a million elements, unless your keys & data are tiny (less than 32 bytes each), your data itself is going to take 64 or more MB - so the 4MB extra space for a million-element hash array is insignificant.

But if you *really* need to conserve space, use a B-tree instead of a LL to perform collision resolution :

code:


Algorithm     O            100 elements   1000000 elements   4 billion elements
B-tree     log2(n)         7 lookups      20 lookups            32 lookups
Hash+LL    1+root(n)       11 lookups     1001 lookups       63246 lookups
Hash+Btree 1+log2(root(n)) 5 lookups      11 lookups            17 lookups
Hash with  1               1 lookup       1 lookup             1 lookup
large n

Take your pick

This message has been edited by damien_s_lucifer on October 14, 2001 at 02:35 PM

Posted by DeadAnztac on Oct. 14 2001,05:45

Yes, ask hit it, It's going to be a linked list, I don't want to search the entire linked list every time, so I'm going to hash it. See, i can't make it a straight hash because I can't make an algoritim based on the data because the data is going to be of different types. So, I think that's what I'll do. I'll show you guys the code when i'm done if your interesting, it might make for an interesting thing to interpret, hehehe

BTW, I'm doing this because I miss my PHP arrays when I'm coding in C++.

------------------
~Anztac [ I'm just this guy, you know? ]