Tuesday, September 28, 2010

Puzzle: Java identity hashcode

Just for fun, I came up with this original (hopefully) puzzle: can you find a Java string whose hashCode() is equal to the numeric value that the string "represents"?

Here are some random guesses that fail to solve the puzzle but perhaps convey the idea:
System.out.println("42".hashCode()); // 1662

System.out.println(
   Integer.toBinaryString(
      "10101010101010101010101010101010"
         .hashCode()
   )
);  // 11001111110011111101111111110000

System.out.println(
   "Long.MAX_VALUE + Integer.valueOf(666)".hashCode()
   ==
    Long.MAX_VALUE + Integer.valueOf(666)
); // false
This is just for fun, so the rules are pretty relaxed. You can use base conversion, expressions, constants, widening conversion, autoboxing, etc. Anything creative is acceptable.

Sunday, September 19, 2010

Matching palindromes in PCRE regex

$palindrome = '/(?x)
^
  (?:
      (.) (?=
              .*
              (
                \1
                (?(2) \2 | )
              )
              $
          )
  )*
  .?
  \2?
$

/';
(see with testcases on ideone.com)

This is "better" than the one given in the man page, which is based on recursive subpattern matching which doesn't quite "work" because it's atomic.

Too lazy to explain how it works, so I'm just going to toss the bone to stackoverflow (see: How does this PCRE pattern detect palindromes?).

Thursday, September 16, 2010

Finding factorials using regex: .NET balancing groups

Believe it or not, this is also a first iteration solution. In fact, as a personal first, this pattern was completely developed on paper (see also on ideone.com).

var r = new Regex(@"(?xn) 

^(
   (
     ( ^.
     | (?=  (?<temp-n> .)+ )
       (?<= (?<fact>  .+)  )
       (?<n-temp> \k<fact> )+?
       (?(temp) (?!))
     )
     (?<n>)
   )+
 )$

");

for (int x = 0; x < 6000; x++) {
   Match m = r.Match("".PadLeft(x));
   if (m.Success) {
      Console.WriteLine("{0,4} = {1}! ", x, m.Groups["n"].Captures.Count);
   }
}
The only thing that tripped me up was the need for reluctant quantifier, but everything else was exactly as it was first written on paper.
I'm halfway convinced that the problem with the greedy repetition is a bug, or perhaps was and has since been fixed (see: stackoverflow - Backtracking a balancing group in a greedy repetition may cause imbalance?)

Wednesday, September 15, 2010

Finding factorials using regex: a catastrophic first iteration solution

I finally got this to work, and it was such a car wreck of a pattern that I just had to immediately exhibit it for posterity (see also on ideone.com):
import java.util.regex.*;

public class Factorial {
   static String assertPrefix(String pattern) {
      return "(?<=(?=^pattern).*)".replace("pattern", pattern);
   }
   public static void main(String[] args) {
      final Pattern FACTORIAL = Pattern.compile(
         "(?x) (?: inc stepUp)+"
            .replace("inc", "(?=(^|\\1 .))")
            //                     1
            
            .replace("stepUp", "(?: ^. | (?<=(^.*)) (?=(.*)) (?: notThereYet \\2)+ exactlyThere )")
            //                                 2         3
            
            .replace("notThereYet", "(?:  (?=((?=\\3) .  |  \\4 .)) (?=\\1(.*)) (?=\\4\\5)  )")
            //                                        4                   5
            
            .replace("exactlyThere", "measure4 measure1")
               .replace("measure4", assertPrefix("\\4(.*)"))
               .replace("measure1", assertPrefix("\\1\\6"))
      );
      
      for (int n = 0; n < 1000; n++) {
         Matcher m = FACTORIAL.matcher(new String(new char[n]));
         if (m.matches()) {
            System.out.printf("%3s = %s!%n", n, m.group(1).length() + 1);
         }
      }
   }
}
Yes, I will definitely work on this and make it more elegant... somehow...

Tuesday, September 14, 2010

Computing the root of squares using regex

final Pattern SQUARE = Pattern.compile(
   "(?x) (?: (\\2|^) (\\1 .) )+"
);
for (int n = 0; n < 1000; n++) {
   Matcher m = SQUARE.matcher(new String(new char[n]));
   if (m.matches()) {
      System.out.printf("%3s = %2s^2%n", n, m.group(2).length());
   }
}
//   1 =  1^2
//   4 =  2^2
//   9 =  3^2
//  16 =  4^2
//     :
// 961 = 31^2
(see also on ideone.com)

Group 1 contains a forward reference to group 2, and group 2 contains a back reference to group 1.

Alternatively you can have just a single self-referencing group, but this is way more fun.

Bug 6984178: Greedy repetition throws StringIndexOutOfBoundsException

During the development of the Fibonacci regex pattern, I discovered a bug in the Java regex engine (see: Why does Java regex engine throw StringIndexOutOfBoundsException on a + repetition?). This has been filed under BugID 6984178, though I filed another bug months ago and it still hasn't shown up in the external database.

Since I reported the bug, I thought I should at least also investigate it a bit further. I've been able to further simplify the pattern to reproduce the bug (see also on ideone.com).
System.out.println(
   "abaab".matches("(?x) (?: (?=(a+)) \\1 b )* x")
); // StringIndexOutOfBounds: -1
The out of bounds index is the difference in length between the first and the second a+ (e.g. "aabaaaaab" gets -3).

Note that using reluctant *? or possessive *+ simply returns false. Only the greedy * raises the exception.

It looks like the problem is triggered by the attempt to backtrack a greedy repetition when there's a reference to a capturing group inside a lookahead.

Monday, September 13, 2010

Finding Fibonacci numbers using regex

This is one of the more fun patterns I've written; it exploits the Second Identity of Fibonacci Numbers.


final String FIBONACCI = 
   "(?x) .? | ( \\2?+ (\\1|^.) )* ..";
                        
for (int n = 0; n < 10000; n++) {
   String s = new String(new char[n]);
   if (s.matches(FIBONACCI)) {
      System.out.printf("%s ", n);
   }
}
// 0 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765
(see also on ideone.com | PHP | C#)

Wednesday, September 8, 2010

The power of nested references in Java regular expressions

One day it dawned upon me that I haven't seen any interesting uses for nested references just yet, so I decided to sit down and brew some interesting patterns; boy were there many.

Here's what I've published on stackoverflow so far:
I still have a few more good ones to go.
These aren't part of the series, but use nested references regardless:
  • Split string of varying length using regex (an absolutely wonderful problem; I'd like to go back to this sometime in the future to apply new things I'd have learned by then, because the current solution is a bit convoluted).