## Wednesday, February 17, 2010

### "Microcomputers: the unredeemed lollipop..."

So let us formalize the problem of finding a pair of strings (not necessarily distinct) from a dictionary of size `N`, such that the Java string hash code of their concatenation is `Integer.MIN_VALUE` or 0.

How quickly can this be done?

A quote from Joshua Bloch himself on how he found `"polygenelubricants"` (at 1:07:00 in the video):
I have a dictionary of 200,000 words, which means there are 40 billion word pairs, [...] and it took it 10 minutes to calculate all the hash values of every word pair
The quote suggests at least an `O(N^2)` solution, assuming that the hash code calculation is `O(1)` for each word pair (which can in fact be done).

I think you can do MUCH better than that, however. I'm still working on this as we speak, but here's a Maple snippet to shed some light:
```> # "lubricants".hashCode() = -573982625
> # "lubricants".length() = 10
> msolve(x * 31^(10) + (-573982625) = 0, 2^31);
{x = 561786081}
> # 561786081 = "polygene".hashCode()
```
Thus, the problem is solvable in `O(N)` times the time it takes to do `msolve`.

My goal is to demonstrate the superiority of this technique by coming up with word triplets. If `O(N^2)` takes 10 minutes, then `O(N^3)` would take almost 4 years.

Specifically, we need to be able to solve this equation:
```x = -H/31^L (mod 2^31)
```
The division translates to finding the modular multiplicative inverse. Note, however, that we only need to do this for very few choice values of `L`, so we can simply precompute a lookup table in Maple. Thus, `x` can be found in `O(1)`.
It is done...
```Microcomputers: the unredeemed lollipop...
Incentively, my dear, I don't tessellate a derangement.
A person who never made a mistake, never bombed resurgent spiders.
And so my fellow mismanagements: ask not what your newsdealer can sugarcoat for you -- ask what you can sugarcoat for your newsdealer.
```
By the way, bonus points if you can identify the original quotes.
Update 2/20:

I challenged myself to solve more complex patterns, e.g. ABCABC, where it's not just suffix -> prefix matchings.
```Foliates recycle; putrid foliates recycle putridly.
Allocator redistricts; strict allocator redistricts strictly.
Familiarness proffers; freakish familiarness proffers freakishly.
Entrails transmigrate; swinging entrails transmigrate swingingly.
```
A couple of things that make this problem a bit harder:
• The multiplicative inverse is no longer uniquely defined. Per the linear congruence theorem, an equation may have no solution at all, or it may have many (bounded by the modulus).
• The equation to solve is dependent on the length of the infix to inject. Fortunately, in any practical dictionary, length of longest word is `O(1)`.