Friday, April 9, 2010

CodingBat string manipulation problems - Part 3

This part of the series covers relatively more complicated solutions, including one that exploits a known bug, and another using nested assertions.

Negative assertions

gHappy: We'll say that a lowercase 'g' in a string is "happy" if there is another 'g' immediately to its left or right. Return true if all the 'g's in the given string are happy.
public boolean gHappy(String str) {
  return !str.matches(".*(?<!g)g(?!g).*");
}
In the absence of s.contains("pattern"), one can always resort to s.matches(".*pattern.*").

Also note the use of first-order logic duality ∀x P(x) ⇔ ¬∃x ¬P(x) (the quantified version of De Morgan's law): instead of checking that all 'g's are "happy", we check that there isn't a 'g' that isn't.

Alternatively, one can solve the problem using this 2-steps algorithm:
public boolean gHappy(String str) {
  return !str.replaceAll("gg+", "").contains("g");
}
The reasoning is straightforward: remove all "happy" 'g's; any remaining 'g's aren't.

sameStarChar: Returns true if for every '*' (star) in the string, if there are chars both immediately before and after the star, they are the same.
public boolean sameStarChar(String str) {
  return !str.matches(".*(?<=(.))\\*(?!\\1).+");
}

When greed matters

sameEnds: Given a string, return the longest substring that appears at both the beginning and end of the string without overlapping.
public String sameEnds(String str) {
  return str.replaceAll("(.*).*\\1$", "$1");
}
Here we take advantage of the fact that \1 is greedy: when the regex matches (and it will have to eventually, i.e. when \1 captures nothing), \1 would've been as long as it could've possibly been.
getSandwich: Return the string that is between the first and last appearance of "bread" in the given string, or return the empty string if there are not two pieces of bread.
public String getSandwich(String str) {
  return str.replaceAll(".*?bread(.*)bread.*|.*", "$1");
}
To capture the string between the first and last "bread", we simply need to ensure that the captured string is as long as possible. For \1 to be able to capture the longest possible match, it is essential that not only must it be greedy, but also everything preceding it be reluctant. Also note the use of the "bail-out" pattern (|.*). If the sandwich attempt fails, then just match the whole thing; \1 would capture nothing.

Brilliant insanity

stringSplosion: Given a non-empty string like "Code" return a string like "CCoCodCode".
public String stringSplosion(String str) {
  return str.replaceAll(".(?<=(^.*))", "$1");
}
This solution exploits Bug# 6695369. See also:
repeatEnd: Given a string and an int N (whose value is no more than the length of the string), return a string made of N repetitions of the last N characters of the string.
public String repeatEnd(String str, int N) {
  return str.replaceAll(
    "(?=.{0,N}$(?<=(.{N}))).|."
      .replace("N", Integer.toString(N)),
    "$1"
  );
}
Nested assertions are used to skip at most N characters to reach the end of the string ($) and immediately "turn around" and grab the preceding N characters. I consider this solution a masterpiece.
wordEnds: Given a string and a non-empty word string, return a string made of each char just before and just after every appearance of the word in the string. Ignore cases where there is no char before or after the word, and a char may be included twice if it is between two words.
public String wordEnds(String str, String word) {
  return str.replaceAll(
     ".+?(?<=(^|.)word)(?=(.?))|.+"
       .replace("word", java.util.regex.Pattern.quote(word)),
     "$1$2"
  );
}
This is actually a very interesting problem, due to these two factors:
  • The same character can be captured twice:
    wordEnds("XY1XY", "XY") → "11".
  • Characters can be captured out of order:
    wordEnds("XYXY", "XY") → "XY"
Capturing lookarounds can take care of both quite easily, though. I too consider this solution a masterpiece.

Note: Pattern.quote isn't necessary to pass the available tests, but technically it's required for a proper regex-based solution.

No comments:

Post a Comment