Saturday, April 10, 2010

CodingBat string manipulation problems - Part 4

This second to last part shares a few things you should know about Java regex, how to measure lengths, the "2-for-1" technique, and a few other tricks.

Know your flags

withoutString: Given two strings, base and remove, return a version of the base string where all non-overlapping instances of the remove string have been removed (not case sensitive).
public String withoutString(String base, String remove) {
  return base.replaceAll(
    "(?i)" + java.util.regex.Pattern.quote(remove), "");
}
The 'i' flag enables case-insensitive matching; Pattern.quote is used (unnecessarily for codingbat tests) to match remove literally, giving no special meaning to any regex special metacharacters.

Know your character classes

countYZ: Given a string, count the number of words ending in 'y' or 'z' (not case sensitive). We'll say that a 'y' or 'z' is at the end of a word if it's not followed by a letter (Note: Character.isLetter tests if a char is a letter.)
public int countYZ(String str) {
  return str.split("[yzYZ](?!\\p{javaLetter})", -1).length - 1;
}
Any non-deprecated Character.isXXX method maps to \p{javaXXX} character class.

Know your bounds

notReplace: Given a string, return a string where every appearance of the lowercase word "is" has been replaced with "is not". The word should not be immediately preceded or followed by a letter.
public String notReplace(String str) {
  return str.replaceAll("\\b(is)\\b", "is not");
}
Do note that while the above solution passes all tests, it is technically incorrect, because it uses \b word boundary anchor. Regex words consist of [a-zA-Z_0-9]. When it is applicable, though, you should use the anchor, because it results in a clean regex like above.

Capture only when necessary

startOz: Given a string, return a string made of the first 2 chars (if present), however include first char only if it is 'o' and include the second only if it is 'z'.
public String startOz(String str) {
  return str.replaceAll("(?:(o)|.)(?:(z)|.).*", "$1$2");
}
Note that deFront is essentially the same problem.

Measuring lengths

everyNth: Given a non-empty string and an int N, return the string made starting with char 0, and then every Nth char of the string.
public String everyNth(String str, int N) {
  return str.replaceAll(String.format("(.).{0,%d}", N-1), "$1");
}

conCat: Given two strings, return their concatenation. However, if the concatenation creates a double-char, then omit one of the chars.
public String conCat(String a, String b) {
  return (a + b).replaceAll(
    String.format("(?<=^.{%d})(?<=(.))\\1", a.length()), "");
}
Admittedly both solutions are awkward in regex, but it shows how one can use finite repetition to measure lengths (see also the solution to repeatEnd in Part 3).

2-for-1

startWord: Given a string and a second non-empty "word" string, we'll say that the word matches the string if it appears at the front of the string, except its first char does not need to match exactly. On a match, return the front of the string; otherwise return the empty string.
public String startWord(String str, String word) {
  return (word + "#" + str)
    .replaceAll(".(.*)#(.\\1)?.*", "$2");
}
Regex tasks involving two strings can sometimes be done on their concatenation instead, using a special delimiter (which must not occur in either string) to mark where one ends and the other begins. Note that if no special delimiter is safely available, you can always measure length instead (see above, and compare with below).
An alternate solution for conCat using this technique would look something like this:
public String conCat(String a, String b) {
  return (a + "#" + b)
    .replaceAll("(.)?#(?=\\1|(.?)).?", "$1$2");
}

2-steps

stringE: Return true if the given string contains between 1 and 3 'e' chars.
public boolean stringE(String str) {
  return str.replaceAll("[^e]", "").matches("e{1,3}");
}
Sometimes a task becomes a lot easier to solve if you naturally break it down into smaller steps, such is the case here. For reference, this is how you'd solve this one in one step:
public boolean stringE(String str) {
  return str.matches("([^e]*e){1,3}[^e]*");
}
See also gHappy in Part 3.

Normalize input to eliminate exceptional cases

lastChars: Given 2 strings, a and b, return a new string made of the first char of a and the last char of b. If either string is empty, use '@' for its missing char.
public String lastChars(String a, String b) {
  return (a + "@@" + b).replaceAll("(?<!^).*(?!$)", "");
}
Note that the way we concatenate the strings is such that the regex doesn't have to consider any special case. This is a technique that is often applicable to many other things as well.

It should be mentioned that while a lot of the regex solutions given here aren't the most efficient way of solving these problems, this one in particular makes it obvious: it's linear when a constant-time solution exists.

No comments:

Post a Comment