## Tuesday, September 28, 2010

### Puzzle: Java identity hashcode

Just for fun, I came up with this original (hopefully) puzzle: can you find a Java string whose `hashCode()` is equal to the numeric value that the string "represents"?

Here are some random guesses that fail to solve the puzzle but perhaps convey the idea:
```System.out.println("42".hashCode()); // 1662

System.out.println(
Integer.toBinaryString(
"10101010101010101010101010101010"
.hashCode()
)
);  // 11001111110011111101111111110000

System.out.println(
"Long.MAX_VALUE + Integer.valueOf(666)".hashCode()
==
Long.MAX_VALUE + Integer.valueOf(666)
); // false
```
This is just for fun, so the rules are pretty relaxed. You can use base conversion, expressions, constants, widening conversion, autoboxing, etc. Anything creative is acceptable.

## Sunday, September 19, 2010

### Matching palindromes in PCRE regex

```\$palindrome = '/(?x)
^
(?:
(.) (?=
.*
(
\1
(?(2) \2 | )
)
\$
)
)*
.?
\2?
\$

/';
```
(see with testcases on ideone.com)

This is "better" than the one given in the man page, which is based on recursive subpattern matching which doesn't quite "work" because it's atomic.

Too lazy to explain how it works, so I'm just going to toss the bone to stackoverflow (see: How does this PCRE pattern detect palindromes?).

## Thursday, September 16, 2010

### Finding factorials using regex: .NET balancing groups

Believe it or not, this is also a first iteration solution. In fact, as a personal first, this pattern was completely developed on paper (see also on ideone.com).

```var r = new Regex(@"(?xn)

^(
(
( ^.
| (?=  (?<temp-n> .)+ )
(?<= (?<fact>  .+)  )
(?<n-temp> \k<fact> )+?
(?(temp) (?!))
)
(?<n>)
)+
)\$

");

for (int x = 0; x < 6000; x++) {
if (m.Success) {
Console.WriteLine("{0,4} = {1}! ", x, m.Groups["n"].Captures.Count);
}
}
```
The only thing that tripped me up was the need for reluctant quantifier, but everything else was exactly as it was first written on paper.
I'm halfway convinced that the problem with the greedy repetition is a bug, or perhaps was and has since been fixed (see: stackoverflow - Backtracking a balancing group in a greedy repetition may cause imbalance?)

## Wednesday, September 15, 2010

### Finding factorials using regex: a catastrophic first iteration solution

I finally got this to work, and it was such a car wreck of a pattern that I just had to immediately exhibit it for posterity (see also on ideone.com):
```import java.util.regex.*;

public class Factorial {
static String assertPrefix(String pattern) {
return "(?<=(?=^pattern).*)".replace("pattern", pattern);
}
public static void main(String[] args) {
final Pattern FACTORIAL = Pattern.compile(
"(?x) (?: inc stepUp)+"
.replace("inc", "(?=(^|\\1 .))")
//                     1

.replace("stepUp", "(?: ^. | (?<=(^.*)) (?=(.*)) (?: notThereYet \\2)+ exactlyThere )")
//                                 2         3

.replace("notThereYet", "(?:  (?=((?=\\3) .  |  \\4 .)) (?=\\1(.*)) (?=\\4\\5)  )")
//                                        4                   5

.replace("exactlyThere", "measure4 measure1")
.replace("measure4", assertPrefix("\\4(.*)"))
.replace("measure1", assertPrefix("\\1\\6"))
);

for (int n = 0; n < 1000; n++) {
Matcher m = FACTORIAL.matcher(new String(new char[n]));
if (m.matches()) {
System.out.printf("%3s = %s!%n", n, m.group(1).length() + 1);
}
}
}
}
```
Yes, I will definitely work on this and make it more elegant... somehow...

## Tuesday, September 14, 2010

### Computing the root of squares using regex

```final Pattern SQUARE = Pattern.compile(
"(?x) (?: (\\2|^) (\\1 .) )+"
);
for (int n = 0; n < 1000; n++) {
Matcher m = SQUARE.matcher(new String(new char[n]));
if (m.matches()) {
System.out.printf("%3s = %2s^2%n", n, m.group(2).length());
}
}
//   1 =  1^2
//   4 =  2^2
//   9 =  3^2
//  16 =  4^2
//     :
// 961 = 31^2
```

Group 1 contains a forward reference to group 2, and group 2 contains a back reference to group 1.

Alternatively you can have just a single self-referencing group, but this is way more fun.

### Bug 6984178: Greedy repetition throws StringIndexOutOfBoundsException

During the development of the Fibonacci regex pattern, I discovered a bug in the Java regex engine (see: Why does Java regex engine throw StringIndexOutOfBoundsException on a + repetition?). This has been filed under BugID 6984178, though I filed another bug months ago and it still hasn't shown up in the external database.

Since I reported the bug, I thought I should at least also investigate it a bit further. I've been able to further simplify the pattern to reproduce the bug (see also on ideone.com).
```System.out.println(
"abaab".matches("(?x) (?: (?=(a+)) \\1 b )* x")
); // StringIndexOutOfBounds: -1
```
The out of bounds index is the difference in length between the first and the second `a+` (e.g. `"aabaaaaab"` gets -3).

Note that using reluctant `*?` or possessive `*+` simply returns `false`. Only the greedy `*` raises the exception.

It looks like the problem is triggered by the attempt to backtrack a greedy repetition when there's a reference to a capturing group inside a lookahead.

## Monday, September 13, 2010

### Finding Fibonacci numbers using regex

This is one of the more fun patterns I've written; it exploits the Second Identity of Fibonacci Numbers.

```final String FIBONACCI =
"(?x) .? | ( \\2?+ (\\1|^.) )* ..";

for (int n = 0; n < 10000; n++) {
String s = new String(new char[n]);
if (s.matches(FIBONACCI)) {
System.out.printf("%s ", n);
}
}
// 0 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765
```