Tuesday, September 14, 2010

Bug 6984178: Greedy repetition throws StringIndexOutOfBoundsException

While developing the Fibonacci regex pattern, I discovered a bug in the Java regex engine (see: "Why does Java regex engine throw StringIndexOutOfBoundsException on a + repetition?"). It has been filed under BugID 6984178, though I filed another bug months ago and that one still hasn't shown up in the external database.

Having reported the bug, I thought I should at least investigate it a bit further. I've been able to simplify the pattern that reproduces it even more (see also on ideone.com).
System.out.println(
    "abaab".matches("(?x) (?: (?=(a+)) \\1 b )* x")
); // throws StringIndexOutOfBoundsException (index -1)
The out-of-bounds index is the length of the first a+ run minus the length of the second: "abaab" gives 1 - 2 = -1, and "aabaaaaab" gives 2 - 5 = -3.
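To check that relationship on other inputs, here is a small sketch that computes the predicted index from the two a+ runs and prints it next to whatever the engine does. The class and method names (IndexDifference, runLengthDifference, attempt) are made up for this illustration. On a JVM without the bug, I would expect attempt to report false, since none of the inputs contains an x.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndexDifference {
    // The simplified pattern from this post.
    static final String BUGGY = "(?x) (?: (?=(a+)) \\1 b )* x";

    // Length of the first run of a's minus the length of the second run.
    // A missing run counts as length 0.
    static int runLengthDifference(String input) {
        Matcher m = Pattern.compile("a+").matcher(input);
        int first = m.find() ? m.group().length() : 0;
        int second = m.find() ? m.group().length() : 0;
        return first - second;
    }

    // Runs the match and reports either the boolean result or the exception.
    static String attempt(String input) {
        try {
            return String.valueOf(input.matches(BUGGY));
        } catch (IndexOutOfBoundsException e) {
            return "exception: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        for (String input : new String[] {"abaab", "aabaaaaab"}) {
            System.out.println(input
                + " -> predicted index " + runLengthDifference(input)
                + ", engine: " + attempt(input));
        }
    }
}
```

If the observation above holds, the index in the exception message should equal the predicted index for each input.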

Note that replacing the greedy * with the reluctant *? or the possessive *+ makes matches simply return false. Only the greedy * raises the exception.
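The three quantifiers can be compared side by side with a sketch like the following. QuantifierComparison and attempt are hypothetical names chosen for this example; the pattern is the one from this post with only the quantifier swapped. Whether the greedy case throws depends on the JVM, so the code reports the exception type instead of assuming one outcome.

```java
public class QuantifierComparison {
    // Builds the pattern from this post with the given quantifier
    // ("*", "*?" or "*+") and reports the result of matches().
    static String attempt(String input, String quantifier) {
        String regex = "(?x) (?: (?=(a+)) \\1 b )" + quantifier + " x";
        try {
            return String.valueOf(input.matches(regex));
        } catch (IndexOutOfBoundsException e) {
            // StringIndexOutOfBoundsException is a subclass of this type.
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        for (String q : new String[] {"*", "*?", "*+"}) {
            System.out.println(q + " -> " + attempt("abaab", q));
        }
    }
}
```

Since "abaab" contains no x, false is the correct answer for all three variants. Any other outcome for the greedy line points at the bug.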

It looks like the problem is triggered by the attempt to backtrack a greedy repetition when there's a reference to a capturing group inside a lookahead.
