A friend of mine wrote me up on IM the other day, asking how to use a hash algorithm to generate short, unique identifiers out of a longer piece of information. I suggested that she ditch the hash idea and just generate random numbers. It's pretty much the same thing, after all, since she was going to be storing the lookup information in a database anyway.
Her first (and fairly natural) reaction was "What about collisions?" After thinking about it for a second, she remembered that a hash isn't guaranteed to be unique - how can you come up with a one-to-one mapping from all possible four-page documents to a single 32-bit number? - so she had to have the code in place to check for duplicates anyway. But it did raise the question, "How likely is a collision to occur?"
In this article, I explain how we answered that question.
No comments:
Post a Comment