Discussion:
[perl #59810] [PATCH] store hash seed in parrot_string_t struct
(too old to reply)
Christoph Otto
2008-10-11 21:53:53 UTC
Permalink
# New Ticket Created by Christoph Otto
# Please include the string: [perl #59810]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=59810 >


Calling string_hash with a seed value other than the one used in src/hash.c
(3793) can cause strange and wonderful failures if the STRING is reused by imcc.

What happens is that after the STRING's hash is computed, it's cached in
s->hashval. This is works fine unless the first caller of string_hash on a
given STRING uses a seed other than 3793 *and* the STRING is reused by imcc as
a hash key. When this happens, the second call to string_hash sees a cached
hash which was computed with an unexpected seed. When this STRING is used as
a hash key, parrot_hash_get_bucket looks in the wrong bucket and fails to find
the associated value.

This leads to various levels of badness. In Pipp's case, it means that with
the following PIR code, the lookup of the hypothetical do_stuff METHOD would
fail because the STRING 'do_stuff' would be hashed by
Parrot_PhpArray_get_string_keyed_str with an unexpected seed.

p = new 'PhpArray'
p['do_stuff'] = 'x' #caches an unexpected hash in s->hashval
p.'do_stuff'() #method lookup fails with reused STRING 'do_stuff'

Because string_hash is marked PARROT_API and presumably intended for external
use, a caller should be able to call it with an arbitrary seed without messing
up the rest of Parrot. The attached patch allows this by adding a seedval
field to parrot_string_t and checking both the seedval and the hashval before
returning the cached hashval of a STRING.

I'd appreciate comments on whether this is the right approach and if there is
anything the patch misses.
Chromatic
2008-11-17 02:52:08 UTC
Permalink
Post by Christoph Otto
Calling string_hash with a seed value other than the one used in src/hash.c
(3793) can cause strange and wonderful failures if the STRING is reused by imcc.
What happens is that after the STRING's hash is computed, it's cached in
s->hashval.  This is works fine unless the first caller of string_hash on a
given STRING uses a seed other than 3793 *and* the STRING is reused by imcc
as a hash key.  When this happens, the second call to string_hash sees a
cached hash which was computed with an unexpected seed.  When this STRING
is used as a hash key, parrot_hash_get_bucket looks in the wrong bucket and
fails to find the associated value.
This leads to various levels of badness.  In Pipp's case, it means that
with the following PIR code, the lookup of the hypothetical do_stuff METHOD
would fail because the STRING 'do_stuff' would be hashed by
Parrot_PhpArray_get_string_keyed_str with an unexpected seed.
I'd rather remove the hash seed from the key calculation. Instead, let's use
a global seed (#defined somewhere) as the initial seed, cache the calculated
key value, then hash against any hash seed and the string's length if the
hash has a non-zero seed. That doesn't spread the hash seed's entropy
througout the key as much, but it lets us cache calculated keys and should
give us more entropy to reduce collisions.

Any mathematician is welcome to prove that this makes things worse, however.

-- c
Nicholas Clark
2008-11-17 11:52:39 UTC
Permalink
Post by Chromatic
I'd rather remove the hash seed from the key calculation. Instead, let's use
a global seed (#defined somewhere) as the initial seed, cache the calculated
You don't want a constant global seed, else you fall foul of Algorithmic
Complexity attacks: http://www.cs.rice.edu/~scrosby/hash/
Post by Chromatic
Any mathematician is welcome to prove that this makes things worse, however.
Scott A Crosby and Dan S Wallach appear to be computer scientists. Will they
do? :-)

Nicholas Clark
Christoph Otto
2008-11-17 12:42:03 UTC
Permalink
Post by Nicholas Clark
Post by Chromatic
I'd rather remove the hash seed from the key calculation. Instead, let's use
a global seed (#defined somewhere) as the initial seed, cache the calculated
You don't want a constant global seed, else you fall foul of Algorithmic
Complexity attacks: http://www.cs.rice.edu/~scrosby/hash/
Post by Chromatic
Any mathematician is welcome to prove that this makes things worse, however.
Scott A Crosby and Dan S Wallach appear to be computer scientists. Will they
do? :-)
Nicholas Clark
What about a per-interp seed, set to some random value during Parrot's
initialization? This would keep any sharp edges away from users and avoid
those charming algorithmic complexity attacks (assuming Parrot_int_rand can
find sufficient randomness).
Christoph Otto
2008-12-08 06:33:14 UTC
Permalink
I'd appreciate comments or a quick code review as to whether I should
apply the patch as-is (sans randomization) once the failing OrderedHash
test passes. It's admittedly not a complete solution, but it does hide
Parrot's hash seed from any PARROT_EXPORT functions and makes testing
and fixing the other bugs easier.
After some testing, it turns out that the OrderedHash bug is orthogonal to
this patch and can easily be reproduced by changing the value in
src/hash.c:816 in trunk (e.g. to 3794).

Since I now know that the patch doesn't break anything new, I'd appreciate a
+1 to apply it (with a fixed seed of 3793, matching the current behavior).
I'll assume the patch [1] is ok to commit if I don't hear a negative responses
by the 16th.

I'll also be working on the OrderedHash bug and anything else that's keeping
randomized seeds from working.

[1] http://rt.perl.org/rt3/Ticket/Attachment/498952/234164/noseed_1.patch
Loading...