Monday, December 27, 2010

Reference point

Learning Python. No doubt, this code will become a source of embarrassment in the future.

(Actually, I'm embarrassed already...)

CoderLoop: reversesequence
#!/usr/bin/python

import sys
from collections import defaultdict

f = open(sys.argv[1], 'r')

succ = defaultdict(set)

for [p, q] in [line.split() for line in f]:
   succ[p].add(q)
   succ[None].add(p)
   succ[None].add(q)

stack = [None]
visited = set([None])

while stack:
   p = stack[-1]
   nexts = [q for q in succ[p] if q not in visited]

   if nexts:
      stack.extend(nexts)
   else:
      if p not in visited: print p
      visited.add(p)
      stack.pop()

Dear god...

CoderLoop: reversewords
#!/usr/bin/python

import sys
import re

print re.sub(r"\w+", lambda m: m.group(0)[::-1], open(sys.argv[1], 'r').read()),

Monday, October 11, 2010

Writing more elegant comparison logic with Guava's Ordering

One of the essential interfaces to define object ordering in Java is Comparator<T>. The contract is rather straightforward:
int compare(T o1, T o2): Compares its two arguments for order. Returns a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second.
This simplicity can be deceiving: implementing a correct and readable comparator logic can be quite tricky. There are many common pitfalls that one should be aware of, and failure to do so may lead to subtle bugs. Often we also see complicated (and therefore error-prone) comparators that implement various additional rules (e.g. null handling and/or tie breaking) that obfuscates the big picture logic. Moreover, even the simplest and most straightforward comparator often duplicates code on the two objects being compared.

One of the better solutions to solve all of these problems is Ordering<T> from Guava (a strictly compatible superset of the old Google Collections library). This post shows how it enables elegant implementation of various comparison patterns. An appendix also explains why the commonly seen comparison by subtraction and reverse order comparison by negation "tricks" are broken.

Tuesday, September 28, 2010

Puzzle: Java identity hashcode

Just for fun, I came up with this original (hopefully) puzzle: can you find a Java string whose hashCode() is equal to the numeric value that the string "represents"?

Here are some random guesses that fail to solve the puzzle but perhaps convey the idea:
System.out.println("42".hashCode()); // 1662

System.out.println(
   Integer.toBinaryString(
      "10101010101010101010101010101010"
         .hashCode()
   )
);  // 11001111110011111101111111110000

System.out.println(
   "Long.MAX_VALUE + Integer.valueOf(666)".hashCode()
   ==
    Long.MAX_VALUE + Integer.valueOf(666)
); // false
This is just for fun, so the rules are pretty relaxed. You can use base conversion, expressions, constants, widening conversion, autoboxing, etc. Anything creative is acceptable.

Sunday, September 19, 2010

Matching palindromes in PCRE regex

$palindrome = '/(?x)
^
  (?:
      (.) (?=
              .*
              (
                \1
                (?(2) \2 | )
              )
              $
          )
  )*
  .?
  \2?
$

/';
(see with testcases on ideone.com)

This is "better" than the one given in the man page, which is based on recursive subpattern matching which doesn't quite "work" because it's atomic.

Too lazy to explain how it works, so I'm just going to toss the bone to stackoverflow (see: How does this PCRE pattern detect palindromes?).

Thursday, September 16, 2010

Finding factorials using regex: .NET balancing groups

Believe it or not, this is also a first iteration solution. In fact, as a personal first, this pattern was completely developed on paper (see also on ideone.com).

var r = new Regex(@"(?xn) 

^(
   (
     ( ^.
     | (?=  (?<temp-n> .)+ )
       (?<= (?<fact>  .+)  )
       (?<n-temp> \k<fact> )+?
       (?(temp) (?!))
     )
     (?<n>)
   )+
 )$

");

for (int x = 0; x < 6000; x++) {
   Match m = r.Match("".PadLeft(x));
   if (m.Success) {
      Console.WriteLine("{0,4} = {1}! ", x, m.Groups["n"].Captures.Count);
   }
}
The only thing that tripped me up was the need for reluctant quantifier, but everything else was exactly as it was first written on paper.
I'm halfway convinced that the problem with the greedy repetition is a bug, or perhaps was and has since been fixed (see: stackoverflow - Backtracking a balancing group in a greedy repetition may cause imbalance?)

Wednesday, September 15, 2010

Finding factorials using regex: a catastrophic first iteration solution

I finally got this to work, and it was such a car wreck of a pattern that I just had to immediately exhibit it for posterity (see also on ideone.com):
import java.util.regex.*;

public class Factorial {
   static String assertPrefix(String pattern) {
      return "(?<=(?=^pattern).*)".replace("pattern", pattern);
   }
   public static void main(String[] args) {
      final Pattern FACTORIAL = Pattern.compile(
         "(?x) (?: inc stepUp)+"
            .replace("inc", "(?=(^|\\1 .))")
            //                     1
            
            .replace("stepUp", "(?: ^. | (?<=(^.*)) (?=(.*)) (?: notThereYet \\2)+ exactlyThere )")
            //                                 2         3
            
            .replace("notThereYet", "(?:  (?=((?=\\3) .  |  \\4 .)) (?=\\1(.*)) (?=\\4\\5)  )")
            //                                        4                   5
            
            .replace("exactlyThere", "measure4 measure1")
               .replace("measure4", assertPrefix("\\4(.*)"))
               .replace("measure1", assertPrefix("\\1\\6"))
      );
      
      for (int n = 0; n < 1000; n++) {
         Matcher m = FACTORIAL.matcher(new String(new char[n]));
         if (m.matches()) {
            System.out.printf("%3s = %s!%n", n, m.group(1).length() + 1);
         }
      }
   }
}
Yes, I will definitely work on this and make it more elegant... somehow...

Tuesday, September 14, 2010

Computing the root of squares using regex

final Pattern SQUARE = Pattern.compile(
   "(?x) (?: (\\2|^) (\\1 .) )+"
);
for (int n = 0; n < 1000; n++) {
   Matcher m = SQUARE.matcher(new String(new char[n]));
   if (m.matches()) {
      System.out.printf("%3s = %2s^2%n", n, m.group(2).length());
   }
}
//   1 =  1^2
//   4 =  2^2
//   9 =  3^2
//  16 =  4^2
//     :
// 961 = 31^2
(see also on ideone.com)

Group 1 contains a forward reference to group 2, and group 2 contains a back reference to group 1.

Alternatively you can have just a single self-referencing group, but this is way more fun.

Bug 6984178: Greedy repetition throws StringIndexOutOfBoundsException

During the development of the Fibonacci regex pattern, I discovered a bug in the Java regex engine (see: Why does Java regex engine throw StringIndexOutOfBoundsException on a + repetition?). This has been filed under BugID 6984178, though I filed another bug months ago and it still hasn't shown up in the external database.

Since I reported the bug, I thought I should at least also investigate it a bit further. I've been able to further simplify the pattern to reproduce the bug (see also on ideone.com).
System.out.println(
   "abaab".matches("(?x) (?: (?=(a+)) \\1 b )* x")
); // StringIndexOutOfBounds: -1
The out of bounds index is the difference in length between the first and the second a+ (e.g. "aabaaaaab" gets -3).

Note that using reluctant *? or possessive *+ simply returns false. Only the greedy * raises the exception.

It looks like the problem is triggered by the attempt to backtrack a greedy repetition when there's a reference to a capturing group inside a lookahead.

Monday, September 13, 2010

Finding Fibonacci numbers using regex

This is one of the more fun patterns I've written; it exploits the Second Identity of Fibonacci Numbers.


final String FIBONACCI = 
   "(?x) .? | ( \\2?+ (\\1|^.) )* ..";
                        
for (int n = 0; n < 10000; n++) {
   String s = new String(new char[n]);
   if (s.matches(FIBONACCI)) {
      System.out.printf("%s ", n);
   }
}
// 0 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765
(see also on ideone.com | PHP | C#)

Wednesday, September 8, 2010

The power of nested references in Java regular expressions

One day it dawned upon me that I haven't seen any interesting uses for nested references just yet, so I decided to sit down and brew some interesting patterns; boy were there many.

Here's what I've published on stackoverflow so far:
I still have a few more good ones to go.
These aren't part of the series, but use nested references regardless:
  • Split string of varying length using regex (an absolutely wonderful problem; I'd like to go back to this sometime in the future to apply new things I'd have learned by then, because the current solution is a bit convoluted).

Saturday, August 28, 2010

Double.valueOf does NOT cache instances!

I had one of those lightbulb moments where I asked myself, "You know, I wonder how Double.valueOf implements the caching... Or does it???".

The documentation is an obvious place to look for information, of course:
public static Double valueOf(double d)

Returns a Double instance representing the specified double value. If a new Double instance is not required, this method should generally be used in preference to the constructor Double(double), as this method is likely to yield significantly better space and time performance by caching frequently requested values.
No obvious "frequently requested values" come to mind, aside from perhaps the NaN, the signed zeroes, the signed ones, and the signed infinities. Perhaps MIN/MAX_VALUE and MIN_NORMAL are also worth caching, since they at least do have the privilege of being defined as class constants.

Either way, it's not as simple as having an array the way e.g. Integer instances are cached by Integer.valueOf for values around zero.

In my pursuit to uncover what is actually cached and how, I went straight to the source code.
public static Double valueOf(double d) {
    return new Double(d);
}
Ladies and gentlemen, we have been duped.

Saturday, August 21, 2010

Fluid dynamics: a code formatting experiment

Currently experimenting with fluent interface formatting. Specifically, how should you format it when you are switching context in the middle of a chain? (Should you even do that in the first place?)

System.out.println(
   new StringBuilder()
      .append("pine-apple")
      .append("apple-pie")
      .append("pie-crust")
      .toString()
         .replaceAll("([a-z]+)\\1", "$1")
         .toUpperCase()
); // PINE-APPLE-PIE-CRUST

Wednesday, August 11, 2010

(Unofficial) Errata for the Java Language Specification 3rd Edition

I don't think there's an official errata for the 3rd Edition, so here's my (half-hearted) attempt to compile one.

5.1.4 "…combines both widening and narrowing primitive convesions…"
4.8 "…a raw type is define to be…"
15.8.2 "…a class literal, C.Class, where…"
8.4 "…but this is discouraged as a matter of syle."

There are also some more serious non-typographical errors/omissions/inconsistencies in the Java programming language grammar given throughout the document when cross referenced with 18.1.

See also: stackoverflow.com - Errata for Java Language Specification 3rd Edition

By the way, those section numbers aren't linked, but docref.polygenelubricants.com is still up (see blog entry: OpenSearch for JLS).

Tuesday, August 3, 2010

Egex-ray astardization-bay

A (loosely specified) Pig Latin regex translator:

String[] words = "ask another stupid question please".split(" ");

for (String word : words) {
   System.out.printf("%s -> %s%n", word,
      ("w" + word).replaceAll(
         "w(qu|[^aeiou]+|(?<=(w)))([a-z]*)",
         "$3-$1$2ay"
      )
   );
}
This prints (as seen on ideone.com)
ask -> ask-way
another -> another-way
stupid -> upid-stay
question -> estion-quay
please -> ease-play

Sunday, August 1, 2010

Rickrolling regex

System.out.println(
   ("Never gonna give you up, let you down, run around and desert you." + "\n" +
    "Bob Dole is a friend of the tobacco industry, likes your style,,,."
   ).replaceAll("(?m)(?<=^(\\w+ \\w+).*,)", "\n$1")
);
This prints (as seen on ideone.com)
Never gonna give you up,
Never gonna let you down,
Never gonna run around and desert you.
Bob Dole is a friend of the tobacco industry,
Bob Dole likes your style,
Bob Dole,
Bob Dole,
Bob Dole.

Monday, July 5, 2010

Split into powers of 2 using regex

There are so many levels of abuse here it's not even funny.
for (String s :
   new String(new char[(1 << 8) - 1])
   .split("(?=(.*))(?<=(?<=(^.*))\\G.*)(?<=(?=\\2\\2.\\1)^.*)")
) {
   System.out.printf("%s ", s.length());
} // "1 2 4 8 16 32 64 128 "
I think there's still room for "improvement", though...
"Improvements", as Java string literals:
"(?=(.*))(?<=(?<=(^.*))\\G.*)(?<=(?=\\2\\2.\\1)^.*)"
"(?=(.*))(?<=(?<=(^.*))\\G.*)(?<=(?=\\2.\\1)\\G.*)"

Wednesday, June 23, 2010

Regex insanity

Now the most insane regex I've written:
String[] tests = {
   "APPLE juice; APPLE cream; APPLE pie; ORANGE sherbet; ORANGE punch; ",
   "APPLE juice; ORANGE punch; ",
   "APPLE juice; ORANGE punch; ORANGE sherbet; ",
   "APPLE juice; ORANGE punch; LEMON pie; ",
   "APPLE juice; APPLE pie; APPLE cream; ",
   "Tommy eats; Tommy sleeps; Tommy drinks; Jamie tosses; Jamie turns; Amy blinks; "
};
for (String test : tests) {         
   System.out.println(test.replaceAll(
      "(?<=^(?=.*\\b(\\w+) \\w+; \\G).*)?(?:\\1 |(\\w+ ))(\\w+; )",
      "$2$3"
   ));
}
This prints, believe it or not:
APPLE juice; cream; pie; ORANGE sherbet; punch; 
APPLE juice; ORANGE punch; 
APPLE juice; ORANGE punch; sherbet; 
APPLE juice; ORANGE punch; LEMON pie; 
APPLE juice; pie; cream; 
Tommy eats; sleeps; drinks; Jamie tosses; turns; Amy blinks; 
It'll be fun to go back to this, say oh, five years from now, and try to figure out what's going on here...

Sunday, June 20, 2010

Using regex to replace a digit with that many number of zeroes

This was an illustrative example I concocted to answer a stackoverflow question.

The problem: using one String.replaceAll, replace each digit with that many number of zeroes.
See also: [stackoverflow] -- Regex: how can I replace $n with n instances of a string?

It's not pretty, but this can in fact be done (see on ideone.com):
// meta-regexing to generate the regex and replacement string
String seq = "123456789";
String regex = "(?x)\n" + seq.replaceAll(".", "(?=[$0-9].*(0)\\$)?\n") + "[0-9]";
String repl = seq.replaceAll(".", "\\$$0");

// let's see what they look like!  
System.out.println(repl);
// $1$2$3$4$5$6$7$8$9
System.out.println(regex);
// (?x)
// (?=[1-9].*(0)$)?
// (?=[2-9].*(0)$)?
// (?=[3-9].*(0)$)?
// (?=[4-9].*(0)$)?
// (?=[5-9].*(0)$)?
// (?=[6-9].*(0)$)?
// (?=[7-9].*(0)$)?
// (?=[8-9].*(0)$)?
// (?=[9-9].*(0)$)?
// [0-9]

String input = "3 2 0 4 x 11 9";
System.out.println(
 (input + "0").replaceAll(regex, repl)
);
// 000 00  0000 x 00 000000000
// it works!  

Here's a bonus snippet that shows the same technique, applied differently (see also on ideone.com):
String seq = "123456789";
String regex = seq.replaceAll(".", "(?=[$0-9]([a-z]))?") + "[0-9][a-z]";
String repl = seq.replaceAll(".", "\\$$0");

String input = "3a 2b 0c 4d 5x";
System.out.println(input.replaceAll(regex, repl));
// aaa bb  dddd xxxxx

Sunday, June 13, 2010

[TF005] Widening primitive conversion doesn't lose any information (true/false?)

(true/false?) The reason why Java doesn't let you assign a double value to a long variable without an explicit narrowing conversion is because some information may be lost during the conversion. Conversely, the reason why Java lets you assign a long value to a double variable without an explicit widening conversion is because no information is ever lost during the conversion.

Friday, June 4, 2010

My first Eclipse bug

I filed my first Eclipse bug recently:

Bug 314830 - [JDT/core/compiler]
Switching on a null expression doesn't always throw NullPointerException
.

The person to which this bug is assigned made no mentions about it being specific to any version, so presumably this affects all current versions of Eclipse.

The essence of the bug is that due to overly aggressive optimizations on the part of Eclipse compiler, the following code does not throw NullPointerException.
java.math.RoundingMode x = null;
switch(x) {};

switch((Integer) null) {};

switch((Character) null) {
   default: System.out.println("I've got sunshine!");
}
This is a violation of JLS 14.11 The Switch Statement.
SwitchStatement:
        switch ( Expression ) SwitchBlock

When the switch statement is executed, first the Expression is evaluated. If it evaluates to null, a NullPointerException is thrown and the entire switch statement completes abruptly for that reason.

See also: stackoverflow.com - Eclipse bug? Switching on a null with only default case

Tuesday, May 25, 2010

A Java printf quine

The challenge: can you write a printf statement in Java that prints the statement itself?

The answer: a resounding yes (see output on ideone.com)
System.out.printf("System.out.printf(%1$c%2$s%1$c,34,%1$c%2$s%1$c);",34,"System.out.printf(%1$c%2$s%1$c,34,%1$c%2$s%1$c);");
My, my, my. Just look at it!!! Have you ever seen such symmetrical self-generating concentrated beauty???

This is without a doubt one of my masterpieces in Java.

Saturday, May 22, 2010

I don't like regex! I love it!

System.out.println("du hast mich".replaceAll("(?<=^(.*)) ", ", $1 "));
// prints "du, du hast, du hast mich"

System.out.println("hold thrill kiss kill me".replaceAll("(?= .* (.*))", " $1,"));
// prints "hold me, thrill me, kiss me, kill me"

System.out.println("Mmm ".replaceAll(".(?=.*?( )?(?<=(^.*) ))", "$2$1"));
// prints "Mmm Mmm Mmm Mmm"

Thursday, May 20, 2010

On "meta-formatting" experiments

I've been way too obsessed with stackoverflow to write in the blog, but here's a fun little snippet:
String[] arr = { "a", "b", "c" };
final int N = arr.length;
String format = new String(new char[N])
    .replace("\0", ":")
    .replaceAll(".(?=(.)?)", "%s$1");
System.out.println(format);      // prints "%s:%s:%s"
System.out.format(format, arr);  // prints "a:b:c"
Not knowing any better, I call this "meta-formatting", i.e. the programmatic generation of formatting strings. It's really not that much different than "meta-regexing", i.e. the programmatic generation of... do I really have to say this?

And here's a slightly more complicated example:
String[] arr = { "alpha", "omega", "xxx", "yyy" };
final int N = arr.length;
String format = new String(new char[N/2])
    .replaceAll(".", "(%7s : %-7s)%n");
System.out.format(format, arr);
// prints: (  alpha : omega  )
//         (    xxx : yyy    )
And another one for giggles:
String[] arr = { "a", "b", "c", "d" };
final int N = arr.length;
String format = new String(new char[N])
    .replaceAll("(?<=(^.*)).", "%s$1")
    .replaceAll("(?!^)(?=%)", " = ")
    .replaceAll("\0", "+%<s");
System.out.format(format, arr);
// prints "a = b+b = c+c+c = d+d+d+d"

Monday, May 10, 2010

[TF004] NaN is not a Number (true/false?)

(true/false?) NaN is acronym for "Not a Number", therefore it's not a Number.

(Please don't kill me for this...)

Saturday, May 1, 2010

Java simple assignment operator semantics

I'll elaborate on this more once I have the time, but I was surprised by the semantics of a simple assignment in Java. I knew it's not "right-then-left" (then assign), but rather "left-then-right" (then assign). It turns out, though, that this isn't as straightforward as I initially thought.

Here are some snippets:
int[] arr = { };
arr[-1] = arr[-2];   
// Throws ArrayIndexOutOfBoundsException: -2

int[] arr = { };
arr[arr[-1]] = arr[arr[-2]];
// Throws ArrayIndexOutOfBoundsException: -1
int[] arr = null;
arr[-1] = (arr = new int[0])[0];
// Throws ArrayIndexOutOfBoundsException: 0

int[] arr = null;
arr[-1] = (arr = new int[1])[0];
// Throws NullPointerException

Tuesday, April 27, 2010

[TF003] You can only throw instanceof Throwable in Java (true/false?)

(true/false?) In Java, you can only throw something that is an instanceof Throwable.

Saturday, April 24, 2010

On the all other products, no division problem

This is among my favorite interview questions:
Given an array of numbers, return an array of the products of all other elements in the array.

More formally, write a method int[] products(int...), where products(nums)[i] is the product of all nums[j], j != i.
System.out.println(
   java.util.Arrays.toString(
      products(1, 2, 3, 4, 5)
   )
); // prints "[120, 60, 40, 30, 24]"
   // i.e. [(2*3*4*5), (1*3*4*5), (1*2*4*5), (1*2*3*5), (1*2*3*4)]
You must do this in O(N) without using division.
I offer two solutions, both beautiful and hideous in their own ways.

Thursday, April 22, 2010

[TF002] dot-star matches all strings! (true/false?)

(true/false?) In regex, the dot metacharacter matches everything, and the star allows zero or more repetition, so for all s instanceof String, s.matches(".*").
static void tf002(String s) {
   if (s instanceof String) {
      assert s.matches(".*");
   }
}

Wednesday, April 21, 2010

[TF001] An empty string doesn't contain anything! (true/false?)

Note: This is the start of a (daily, hopefully!) series of articles exploring various "True or False?" statements regarding Java. Hint: they're all false. The idea is to correct misconceptions through counterproofs just have fun.
(true/false?) By definition, something is empty if it doesn't contains anything, so given String s1, s2, if s1.isEmpty(), then !s1.contains(s2).
static void tf001(String s1, String s2) {
   if (s1.isEmpty()) {
      assert !(s1.contains(s2));
   }
}

Tuesday, April 13, 2010

CodingBat string manipulation problems - Extra

This part includes solutions that I didn't come up with, my own variants of the original problems, exercising variations on previous regex solutions, etc.

A little help

plusOut: Given a string and a non-empty word string, return a version of the original string where all chars have been replaced by pluses ("+"), except for appearances of the word string which are preserved unchanged.
public String plusOut(String str, String word) {
  return str.replaceAll(
    "(?<!(?=word).{0,M})."
      .replace("word", java.util.regex.Pattern.quote(word))
      .replace("M", String.valueOf(word.length()-1)),
    "+"
  );  
}
Credit to Alan Moore on stackoverflow for this last one.

Monday, April 12, 2010

CodingBat string manipulation problems - Appendix

This appendix just has more examples, including one where my first attempt ended in an unreadable disaster.

Octoslash disaster

last2: Given a string, return the count of the number of times that the string consisting of the last 2 characters also appear elsewhere.
public int last2(String str) {
  return str.isEmpty() ? 0
    : str.split(
        "(.)(?=(.).)(?=.*\\1\\2$)",
        -1
      ).length - 1;
}
The above was not my first solution. That would be this one:

Sunday, April 11, 2010

CodingBat string manipulation problems - Part 5

This last part covers some solutions that aren't pure regex; unlike the other parts where I used stubbornly used regex whenever possible, here I use whatever comes naturally.

xyBalance: Return true if the given string is xy-balanced, that is for all the 'x' chars in the string, there exists a 'y' char somewhere later in the string. One 'y' can balance multiple 'x's.
public boolean xyBalance(String str) {
  return str.lastIndexOf('x') <= str.lastIndexOf('y');
}

Saturday, April 10, 2010

CodingBat string manipulation problems - Part 4

This second to last part shares a few things you should know about Java regex, how to measure lengths, the "2-for-1" technique, and a few other tricks.

Know your flags

withoutString: Given two strings, base and remove, return a version of the base string where all non-overlapping instances of the remove string have been removed (not case sensitive).
public String withoutString(String base, String remove) {
  return base.replaceAll(
    "(?i)" + java.util.regex.Pattern.quote(remove), "");
}
The 'i' flag enables case-insensitive matching; Pattern.quote is used (unnecessarily for codingbat tests) to match remove literally, giving no special meaning to any regex special metacharacters.

Friday, April 9, 2010

CodingBat string manipulation problems - Part 3

This part of the series covers relatively more complicated solutions, including one that exploits a known bug, and another using nested assertions.

Negative assertions

gHappy: We'll say that a lowercase 'g' in a string is "happy" if there is another 'g' immediately to its left or right. Return true if all the 'g's in the given string are happy.
public boolean gHappy(String str) {
  return !str.matches(".*(?<!g)g(?!g).*");
}
In the absence of s.contains("pattern"), one can always resort to s.matches(".*pattern.*").

Also note the use of first-order logic duality ∀x P(x) ⇔ ¬∃x ¬P(x) (the quantified version of De Morgan's law): instead of checking that all 'g's are "happy", we check that there isn't a 'g' that isn't.

Thursday, April 8, 2010

CodingBat string manipulation problems - Part 2

Continuing on with the series, this part shows basic uses of lookarounds and the split counting idiom.

Lookarounds

frontAgain: Given a string of any length, return true if the first 2 chars in the string also appear at the end of the string (allowing overlap).
public boolean frontAgain(String str) {
  return str.matches("(?=(..)).*\\1");
}

Wednesday, April 7, 2010

CodingBat string manipulation problems - Part 1

I had fun doing all the Java exercises on codingbat.com the other week (all 264 at last count). None of them were truly hard, but I challenged myself to stubbornly use regex whenever possible, but also especially when it actually leads to a much better solution.

This part shows some of the easier ones.

zipZap: Look for patterns like "zip" and "zap" in the string -- length-3, starting with 'z' and ending with 'p'. Return a string where for all such words, the middle letter is gone.
public String zipZap(String str) {
  return str.replaceAll("z.p", "zp");
}
Why regex? This is why!

Tuesday, April 6, 2010

Ternary operator idioms

I'm sure others have done something similar, but these are the idioms I use for ternary operators:
return what ? yes : no;

return what ? yes
  : noooooooooooooooooooooooo
;

return whaaaaatttttttttt
  ? yeeessss : noooooo
;

return whaaaaatttttttttt
  ? yeeeeeeeeesssssssssssssss
  : noooooooooooooooooooooooo
;

return
  thisWay ? thisThing :
  thatWay ? thatThing :
  otherThing
;
Variation include NOT having the semicolon on its own line, and of course return can be replaced by other grammatical possibilities (e.g. variable declaration and initialization), but these are the idioms in my standard repertoire that I've developed over the years.

I'm not sure how common these idioms are (the people at stackoverflow often complain of unreadability), but I'd like to think that these are very useful idioms, and very readable once you get used to it.

Why Floyd-Warshall is one of my favorite algorithms

Among classes of algorithms, dynamic programming is my favorite. The way it happened was almost a bait-and-switch: I was quickly drawn to it by its name, followed by a brief period of rejection ("Such a cool name and this is it?") and misunderstanding ("It's just caching!"), before eventually appreciation and finally admiration comes in.

If I have to pick a favorite among many dynamic programming algorithms, it will be Floyd-Warshall's all-pairs shortest path algorithm.
procedure FloydWarshall ()
   for k := 1 to n
     for i := 1 to n
        for j := 1 to n
          path[i][j] = min ( path[i][j], path[i][k]+path[k][j] );
Here's why it's cool: when you first learn about the shortest-path problem in your graph theory course, you probably start with Dijkstra's algorithm that solves single-source shortest path. It's quite complicated at first, but then you get over it, and you fully understood it.

Then the teacher says "Now we want to solve the same problem but for ALL sources". You think to yourself, "Oh god, this is going to be a much harder problem! It's going to be at least N times more complicated than Dijkstra's algorithm!!!".

Then the teacher gives you Floyd-Warshall. And your mind explodes. Then you start to tear up at how beautifully simple the algorithm is. It's just a triply-nested loop. It only uses a simple array for its data structure.

The most eye-opening part for me is the following realization: say you have a solution for problem A. Then you have a bigger "superproblem" B which contains problem A. The solution to problem B may in fact be simpler than the solution to problem A.

Sunday, April 4, 2010

sort3

A snippet from my latest question on stackoverflow "Reordering arguments using recursion (pro, cons, alternatives)", what I honestly believe is the most elegant solution one can write for sorting 3 values:
// sorts 3 values and return as array
static int[] sort3(int a, int b, int c) {
    return
      (a > b) ? sort3(b, a, c) :
      (b > c) ? sort3(a, c, b) :
      new int[] { a, b, c };
}
So far, most people say that the code is unreadable. Which is rather sad, actually.

I hope I won't ever have to dumb down my work for stupid people.

Saturday, April 3, 2010

ABC of regex

System.out.println(
   "abc".replaceAll("(?=(.*)).", "$1!")
); // prints "abc!bc!c!"

System.out.println(
   "abc".replaceAll(".(?<=(^.*))", "$1!")
); // prints "a!ab!abc!"

System.out.println(
   "abc".replaceAll(".(?=.*(?<=(^.*)))", "$1!")
); // prints "abc!abc!abc!"
Fun fact: non-obvious length lookbehind shouldn't even compile (Bug ID 6695369).

From regular-expressions.info:
Java takes things a step further by allowing finite repetition. You still cannot use the star or plus, but you can use the question mark and the curly braces with the max parameter specified. Java recognizes the fact that finite repetition can be rewritten as an alternation of strings with different, but fixed lengths.
Well, so much for that limitation.

Friday, March 19, 2010

On Java's (supposedly) immutable strings

The power of setAccessible is beyond belief.
import java.lang.reflect.*;

public class MutableStrings {
   public static void main(String args[]) throws Exception {
      Field value = String.class.getDeclaredField("value");
      value.setAccessible(true);
      value.set("foo", "bar".toCharArray());      

      System.out.println("foo"); // prints "bar"
   }
}
Note: Although I label this post as [fun], I don't think it actually is...

Thursday, March 18, 2010

Modifying static final fields through reflection

In Java, nothing is impossible.
import java.lang.reflect.*;

public class EverythingIsTrue {
   static void setFinalStatic(Field field, Object newValue) throws Exception {
      field.setAccessible(true);

      Field modifiersField = Field.class.getDeclaredField("modifiers");
      modifiersField.setAccessible(true);
      modifiersField.setInt(field, field.getModifiers() & ~Modifier.FINAL);

      field.set(null, newValue);
   }
   public static void main(String args[]) throws Exception {      
      setFinalStatic(Boolean.class.getField("FALSE"), true);

      System.out.format("Everything is %s", false); // "Everything is true"
   }
}
Having fun yet?

Sunday, March 14, 2010

Open facebull

I was asked to share my solution for my facebull, and I finally did (github).

I'm not particularly proud of the code, since it's rather messy. I spent about 5 days thinking about the algorithm, and I coded it in about 30 min and submitted it, and to my surprise it passed.

I am aware that this is probably the most challenging Facebook puzzle, which is why I was initially reluctant to share the solution. In the end, though, I decided that none of this really matters anyway. It's just 250 lines of poorly written Java code that when compiled and ran against some input produces the expected output, and some bot decides that it's good enough.

So there.
Update 03/18: I plan to rewrite this using immutable sets instead of java.util.BitSet; I was worried that the algorithm is way too slow and thought I'd need to squeeze every bit of performance to pass the bot. I think this optimization is premature.

Fact: the original passing solution had 100 more lines of various preprocessing heuristics; they turned out to be unnecessary, since the current version still passes.

Saturday, March 13, 2010

St-st-stuttering rr-rr-regex

Credit for this one goes to Alan Moore on stackoverflow.com (again!) from his answer to my question:
Matcher m = Pattern.compile("(?=(..))").matcher("12345");
while (m.find()) {
   System.out.println(m.group(1));
} // prints "12", "23", "34", "45"
This is the most instructive pattern I've seen BY FAR. The most important revelations for me are:
  • You can capture during lookaround! Which means that...
  • Group zero is not necessarily a superstring of all the other groups!

Tuesday, March 9, 2010

Looking behind as far as the eyes could see...

System.out.println(
   java.util.Arrays.deepToString(
      "This works! Yes, really!! Try it! JUST DO IT!!!".split("(?<=!++)")
   )
); // [This works!,  Yes, really!!,  Try it!,  JUST DO IT!!!]
 
The above snippet works. I had to experiment with the different quantifiers, and neither greedy nor reluctant worked, but somehow possessive does what I wanted. And I'm not sure how and why.

It turns out that it isn't supposed to work. That pattern isn't supposed to be compilable; a PatternSyntaxException should've been thrown. That it compiles is a bug (ID 6695369); that it works is mind-boggling. So, unlike the case in the bug report, this pattern doesn't "fail silently" -- it "works unexpectedly"!!!

For what it's worth, the "right" pattern to use is "(?<=!)(?!!)" (credit to Alan Moore on stackoverflow.com).

Tuesday, March 2, 2010

Kidnapping an inner class instance

This blew my mind. Although I suppose this naturally follows from the fact that Java's final fields aren't:
import java.lang.reflect.*;

public class Me {
    final String name;

    Me(String name) {
        this.name = name;
    }

    class InnerMe {
        void whoAreYou() {
            System.out.println(name);
        }
    }
    
    InnerMe innerSelf() {
        return new InnerMe();
    }

    public static void main(String args[]) throws Exception {
        final Me me = new Me("Just the old me!");
        final InnerMe innerMe = me.innerSelf();

        innerMe.whoAreYou(); // "Just the old me!"

        Field outerThis = innerMe.getClass().getDeclaredFields()[0];
        outerThis.setAccessible(true);
        outerThis.set(innerMe, new Me("New and improved me!"));

        innerMe.whoAreYou(); // "New and improved me!"
    }
}

Monday, March 1, 2010

Go invoke yourself!

import java.lang.reflect.Method;

public class InfiniteMirror {
   public static void main(String args[]) throws Exception {
      Method m = Method.class.getMethod(
         "invoke", Object.class, Object[].class
      );
      Object[] a = { m, null };
      a[1] = a;
      m.invoke(m, a);
   }   
}
Oh my...
This is not a bug by any mean, but out of curiosity, I looked into the database to see if there's anything like it. There is (ID 4185411), since it was a bug back when this would crash the JVM (!!!).
WORK AROUND: Never, ever, write code that is as stupid as this in any real project.

Saturday, February 27, 2010

Final field in Java: is it really?

import java.lang.reflect.Field;

public class ItsFinallyTrue {
   
   public final boolean dream1 = false;
   public final Boolean dream2 = false;
   
   @Override public String toString() {
      return dream1 + "-" + dream2;
   }

   public static void main(String[] args)
         throws IllegalAccessException, SecurityException {

      ItsFinallyTrue dreamer = new ItsFinallyTrue();
      System.out.println(dreamer);
      
      for (Field dream : dreamer.getClass().getDeclaredFields()) {
         dream.setAccessible(true);
         dream.set(dreamer, true);
      }      
      System.out.println(dreamer);
   }
}
What will happen if you run this code?

(A) It throws IllegalAccessException
(B) It throws SecurityException
(C) It prints false-false and false-false
(D) It prints false-false and false-true
(E) It prints false-false and true-true

Thursday, February 25, 2010

Concerning deep*$#!%

How do Arrays.deepToString, deepHashCode, deepEquals handle self-containing arrays? Inconsistently, apparently...
import java.util.Arrays;

public class LetsGetToTheBottomOfThis {
   
   public static Object[] selfContainingArray() {
      Object[] arr = new Object[1];
      arr[0] = arr;
      return arr;
   }
   
   public static void main(String[] args) {
      try {
         System.out.println(
            Arrays.deepToString(
               selfContainingArray()
            )
         );
      } catch (StackOverflowError e) {
         System.out.println(e);
      } // prints "[[...]]"

      try {
         System.out.println(
            Arrays.deepHashCode(
               selfContainingArray()
            )
         );
      } catch (StackOverflowError e) {
         System.out.println(e);
      } // stack overflows

      try {
         System.out.println(
            Arrays.deepEquals(
               selfContainingArray(),
               selfContainingArray()
            )
         );
      } catch (StackOverflowError e) {
         System.out.println(e);
      } // stack overflows
   }
   
}
This behavior is documented clearly, so I suppose it's not exactly a bug (which is why it's not in the database, or else I would've submitted it). It is curious, though, why deepToString is made cycle-aware and the rest aren't. I admit that deepEquals will be made quite complicated, but deepHashCode is essentially the same as deepToString.
Also, this suggests to me the need for some sort of visitor pattern as well.
Update 03/01: OK, it's a bit worse than I thought: this inconsistency is also there with List (where presumably you favor safety to efficiency).
Note: While it is permissible for lists to contain themselves as elements, extreme caution is advised: the equals and hashCode methods are no longer well defined on such a list.

Tuesday, February 23, 2010

Hoare: "Null References: The Billion Dollar Mistake""

C. A. R. Hoare (most famous for his quicksort) is now convinced that the concept of a null reference was a mistake.
I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.
I haven't done much type system analysis to be entirely convinced. I do use sentinels and/or Null Object whenever I can, but that is rather unrelated to the fundamental question of whether null itself is good or bad.
Watching the video right now. I can say already that I admire him for bravely admitting his mistakes (though I've yet to see this as one).
Eh, I'm still not convinced. Using the same argument shown in the final segment, while it's in many ways unsafe, I still think it's useful enough to have.

Wednesday, February 17, 2010

"Microcomputers: the unredeemed lollipop..."

So let us formalize the problem of finding a pair of strings (not necessarily distinct) from a dictionary of size N, such that the Java string hash code of their concatenation is Integer.MIN_VALUE or 0.

How quickly can this be done?

A quote from Joshua Bloch himself on how he found "polygenelubricants" (at 1:07:00 in the video):
I have a dictionary of 200,000 words, which means there are 40 billion word pairs, [...] and it took it 10 minutes to calculate all the hash values of every word pair
The quote suggests at least an O(N^2) solution, assuming that the hash code calculation is O(1) for each word pair (which can in fact be done).

I think you can do MUCH better than that, however. I'm still working on this as we speak, but here's a Maple snippet to shed some light:
> # "lubricants".hashCode() = -573982625
> # "lubricants".length() = 10
> msolve(x * 31^(10) + (-573982625) = 0, 2^31);
                                {x = 561786081}
> # 561786081 = "polygene".hashCode()
Thus, the problem is solvable in O(N) times the time it takes to do msolve.

My goal is to demonstrate the superiority of this technique by coming up with word triplets. If O(N^2) takes 10 minutes, then O(N^3) would take almost 4 years.

Tuesday, February 16, 2010

Why I could've bought stubbiespornographer.org and didn't

I bought polygenelubricants.com last night. Imagine that, my very first domain name.

Just in case you're wondering, the string "polygenelubricants" traces its origin to Java Puzzlers by Joshua Bloch and Neal Gafter; it is a string whose hash code in Java is Integer.MIN_VALUE. It was used to demonstrate a corner case involving the most negative number in a two-complement representation.

Going for the classic may not have been terribly original, but I think that's also precisely why it's the right choice. Lest I be accused of merely stealing other people's work, however, I will list more strings that fall into the same bucket. The first 4 of these are already known; the rest are my own discoveries.
polygenelubricants
DESIGNING WORKHOUSES
GydZG_
aluminiumsbåtentabu
stubbiespornographer.org
squirreledconcoctive.net
insecticidaltightness.com
www.livableunsoundness.net
www.cosmetologiesbattleship.org
www.nonsectarianhomology.org
www.sadistpacemakers.com
WWW.MOLLYCODDLERPOTABLES.NET
Honestly, I think some of those domain names are VERY tempting.
While we're at it, I also found some examples of the equally (if not more) important strings that hash to 0.

Monday, February 15, 2010

java.lang.String.hashCode() doesn't cache 0

public class Main {
   static void test(String s) {
      long start = System.currentTimeMillis();
      for (int N = 1 << 20; N --> 0;) {
         s.hashCode();
      }
      System.out.format("Took %d ms.%n", System.currentTimeMillis() - start);
   }
   public static void main(String[] args) {
      String z = new String(new char[1000]);
      test(z);
      test(z + "!");
   }
}
Run it!
Quiz: what percentage of all Java strings hash to 0?

Saturday, February 13, 2010

Factory method pattern for Java exceptions

I'm not sure if this is a new idea. I did a brief survey and I didn't find anything like it out there. I'm going to get right to the point, and then incrementally explain the rationale of my approach.
void before(int index, int lo, int hiX) {
   try {
      throw new IndexOutOfBoundsException(
         String.format("%d out of [%d,%d)", index, lo, hiX)
      );
   } catch (IndexOutOfBoundsException e) {
      throw (NoSuchElementException)
         new NoSuchElementException().initCause(e);
   }
}

void after(int index, int lo, int hiX) {
   try {
      throw Exceptions.indexOutOfBounds(index, lo, hiX);
   } catch (IndexOutOfBoundsException e) {
      Exceptions.translate(e, Exceptions.noSuchElement());
   }
}

Thursday, February 11, 2010

Beware when reflecting on Java arrays

From JLS 10.7 (Array Members):
The members of an array type are all of the following:
  • The public final field length, which contains the number of components of the array (length may be positive or zero).
  • The public method clone, which overrides the method of the same name in class Object and throws no checked exceptions. The return type of the clone method of an array type T[] is T[].
  • All the members inherited from class Object; the only method of Object that is not inherited is its clone method.
Unfortunately, life isn't all peaches and cream...

RFE: verbatim string literals in Java

This is very timely following my previous entry:
Bug ID: 4472509 -- Add support for vertabim[sic] string literals

I feel like there are some very good arguments in that discussion. The need is definitely real, and I've had my share of disgruntlement with the multiple strings concatenation idiom.

A syntactic sugar is not in itself bad. Besides, Java has added less useful sugars recently anyway (e.g. static import, which even the official guide cautions to use "very sparingly!").

Verbatim string literals: other modern languages have it; it's time for Java to follow suit.

Wednesday, February 10, 2010

A Concurrent Modification Exception corner case

CME has always piqued my curiosity, and today I finally went ahead and examined the java.util.* source codes to figure out how modification is detected. As can be predicted, one best-effort implementation uses a counter to facilitate the test (see: AbstractList's modCount field). I've never been a fan of such technique, and the fact that the counter is only 32-bit makes the point all the more demonstratable:
import java.util.*;

public class ConcuMod {
   public static void main(String[] args) {
      List<String> list = new ArrayList<String>();
      list.add("HA!");
      list.add("HA!");
      for (String s : list) {
         System.out.println(s);
         for (long reps = (1L << 32); reps --> 0;) {
            list.clear();
         }
      }
   }
}
The strategy is obvious, of course: modify the list exactly 2^32 times. As expected, the iterator fails to detect the modification.
HA!
Exception in thread "main" java.util.NoSuchElementException
   at java.util.AbstractList$Itr.next(Unknown Source)
   at ConcuMod.main(ConcuMod.java:9)
The decision to throw NSEE instead of CME in this case is worth studying. It would at first seem reasonable to expect CME to be thrown even though the counter test yields a (false) negative, precisely because the test is not 100% foolproof to begin with. However, translating the underlying IndexOutOfBoundsException to NSEE turns out to be better because:
  • It's not presumptuous. IOOB naturally means NSE; CM is merely a possibility. Undetected modification scenarios such as shown here is admittedly so improbable that it's really not worth considering.
  • In the more common scenario, the IOOBE is probably due to an internal bug in the implementation of an AbstractList subclass (it is, after all, documented for inheritance). Throwing NSEE would quickly distinguish this case. CME would in effect make the wrong accusation against the wrong person.

That's a sweet, sweet language!

I've found a new (probably short-lived) hobby: compiling a list of computer science eggcorns. I only have one entry so far: "synthetic syntactic sugar".

From an interview with Anders Hejlsberg:
Q: "Some people in the C academic community don’t like the idea of operator overloading. Some people call it 'synthetic poison', not only 'synthetic sugar'"
From an M.S. thesis supervised by Bertrand Meyer:
"Enumerations are nothing else than synthetic sugar for classes extending java.lang.Enum"
And various, less notable sightings:
  • "With C# 3.0 you can use synthetic sugar to write less code"
  • "Trigraph characters are 'synthetic sugar' which is handled by the preprocessor"
  • "Simply put, synthetic sugar, something JavaScript has plenty of"
  • "In the end, aren't all programming languages basically just synthetic sugar, on top of machine code (from an engineering perspective), or on top of a Turing machine/lambda calculus equivalent (from a theoretical perspective)?"

Like all eggcorns, the logic behind this error is self-evident. Dietary synthetic sugars are manufactured as nonnutritive additives to sweeten food while adding little else (most hopefully not calories!). Syntactic sugars are analogous in that they're added to the language by designers to make it "more palatable" to programmers, while adding absolutely nothing to its overall expressive power.

A parallel may also be drawn between the artificial nature of synthetic sugars and the superficial nature of syntactic sugars, and all the negative connotations that follow from such labels. This angle, though more speculative in nature, is perhaps demonstrated by the programming epigram "syntactic sugar causes cancer of the semicolon".
Submitted for your consideration!

Monday, February 8, 2010

OpenSearch for JLS

My previous blog entry has a lot of references to sections of the Java Language Specification. Instead of being hyperlinks to the official online version of the document, however, they're merely plain non-clickable text (that was copied-and-pasted from elsewhere).

I could've modified the post and added the links manually, but then:
  • Finding the target URL for each section is tedious.
  • The blog entry source will be cluttered.
  • I'd have to repeat this for all my future posts.
  • More importantly, I'd still see unlinked references in forum posts, e-mail discussions, and even some official JLS documents.
After a quick research, I settled on a simple solution for my problem: an OpenSearch engine to perform the URL lookup. Once added to Firefox's list of search engines, one can:
  • Look up any section directly from the search bar
  • Look up a highlighted section number reference in a page from the search action in the context menu.
    • By default, Firefox uses the currently selected search engine.
    • Add-ons such as Context Search allow users to select from all previously-added search engines.

Sunday, February 7, 2010

Glossary of Name Reuse

This is a one-page summary that is somewhat hidden in Java Puzzlers. I feel that knowing the proper names for these concepts is important enough that it's worth quoting for future reference. I've taken the liberties to reorder these as I see fit:
Overloading

Methods in a class overload one another if they have the same name and different signatures. The overloaded method designated by an invocation is selected at compile time [JLS 8.4.9, 15.12.2].

Shadowing

A variable, method, or type shadows all variables, methods, or types, respectively, with the same name in a textually enclosing scope. If an entity is shadowed, you cannot refer to it by its simple name; depending on the entity, you cannot refer to it at all [JLS 6.3.1].

Although shadowing is generally discouraged, one common idiom does involve shadowing. Constructors often reuse a field name from their class as a parameter name to pass the value of the named field. This idiom is not without risk, but most Java programmers have decided that the stylistic benefits outweigh the risks.

Obscuring

A variable obscures a type with the same name if both are in scope: If the name is used where variables and types are permitted, it refers to the variable. Similarly, a variable or a type can obscure a package. Obscuring is the only kind of name reuse where the two names are in different namespaces: variables, packages, methods, or types. If a type or a package is obscured, you cannot refer to it by its simple name except in a context where the syntax allows only a name from its namespace. Adhering to the naming conventions largely eliminates obscuring [JLS 6.3.2, 6.5].
Overriding

An instance method overrides all accessible instance methods with the same signature in superclasses [JLS 8.4.8.1], enabling dynamic dispatch; in other words, the VM chooses which overriding to invoke based on an instance's run-time type [JLS 15.12.4.4]. Overriding is fundamental to object-oriented programming and is the only form of name reuse that is not generally discouraged.

Hiding

A field, static method, or member type hides all accessible fields, static methods, or member types, respectively, with the same name (or, for methods, signature) in supertypes. Hiding a member prevents it from being inherited [JLS 8.3, 8.4.8.2, 8.5].
In particular, shadowing (the forementioned constructor/setter idiom) and obscuring (method name reused for return variable) are two of my own practices that I can now vernaculate.

Saturday, January 30, 2010

Lessons from Java Puzzlers

*head explodes*
*pulls hair*
*jaw drops*
Am I the only one who thinks that nested multi-line comments should be universally supported across all modern languages? It's not that much harder to parse, and it doesn't happen often but I remember specifically wishing for this language feature in the past.

And while we're at it, I think nested HTML anchor elements should be supported too.
It's quite enlightening to learn how Java matures over time. I remember when Cloneable was initially promoted as the new cool thing, but now pretty much everybody agree it's a broken mess.

Wednesday, January 27, 2010

Java!?!? NOT YOU TOO!?!?

ARGH!!! Why would a high-level language like Java behave like this!?!?

I dare any programmer to explain why this should be the preferred, or even expected, behavior.
I'm also slightly disappointed that Arrays.binarySearch() does not guarantee which value will be found in the case of multiple matches. I don't use Python, but the consistency provided by the bisect_left/right() algorithms is needed sometimes.

Oh well, at least sort() is guaranteed stable...
Also, why doesn't Math.min() do varargs? At least provide an overload for 3 arguments for my common DP needs!
I wonder if Java was the first language to allow an optional trailing comma in its array initializer syntax. It's a "feature" that I absolutely love, but I can see how that decision might've been contentious. // Nah, C has had it for a long time, apparently...
I need to get this book: Java Puzzlers: Traps, Pitfalls, and Corner Cases. I bet this shift behavior is in there somewhere. / Yep, Puzzle 27: Shifty i's.
I just found out that Java has non-short-circuiting boolean logical operators.

Unsolvable puzzle?

I will admit that I've tried to solve this puzzle twice now and failed miserably both times.
3. MATH CLASS

A high school math teacher chose three of his best students to conduct a little experiment. He said, "I have chosen a three-digit number, N, with the first digit not more than the second and the second not more than the third. I also have chosen a function, F(N), which is one of these five functions:

(1) SUM(N) = The Sum of the digits of N.
(2) PROD(N) = The Product of the digits of N.
(3) SSQ(N) = The Sum of the Squares of the digits of N.
(4) SSC(N) = The Sum of the Cubes of the digits of N.
(5) LCM(N) = The Least Common Multiple of the digits of N.

"I then calculated the value, V = F(N), and have written the three items, N, F, V, each on a separate piece of paper and will give one to each of you. You must try to determine the other 2 items not on your paper. You may use your computers, but cannot collaborate. Don't turn in your answers until I ask for them. Don't worry about any unfair disadvantage of which item you get, this is not a competition, only an experiment. Just make your lists and we'll see what happens!"

Here is what happened:
1:00 - Students begin working.
1:20 - Teacher asks if anyone had found the answers. No one had.
1:30 - Teacher asks if anyone had found the answers. No one had.
1:31 - Teacher asks if it would help if he told them if N was odd or even. All 3 say no.
1:40 - Teacher asks if anyone had found the answers. No one had.
1:41 - Teacher asks if it would help if he told them if V was odd or even. All 3 say no.
1:50 - Teacher asks if anyone had found the answers. No one had.
1:51 - Teacher asks if it would help if he told them the sum of N and V. All 3 say no.
2:00 - Teacher asks if anyone had found the answers. All three of them had!
Admitting that I've probably reached my limit, I posted it on XKCD forum (which I don't really frequent) to get input from others. This reply sums up my frustration exactly:
The fact that no reader of Mathpuzzles.com solved the final puzzle makes me reluctant to put the time into it. In theory it's the same process, but I'm not completely confident of how to parse "if it would help".

ETA: And now that I've put some time into it, it looks even more poorly defined than I had originally suspected. I'm really pretty surprised, coming from Robert Kraus and Ed Pegg, Jr.

A possible breakthrough: the envelopes may be unlabeled. A person that sees a number may not know if that's N or V. Along that line, F may also be written as a number 1..5.
Discussion on GreyLabyrinth forum | For what it's worth, the answer is supposed to be SSQ(336)=54.

Tuesday, January 26, 2010

Free e-books!

These were linked from AZSPCS's Yahoo! Groups:
- Thomas Weise: Global Optimization Algorithms - Theory and Application (see: SIGOA)
- Fred Mellender: Design Patterns for Searching in C# (PDF/code)
Speaking of which, I've dropped to 59th place in the standings, with a score of 89.57. I suppose I should go back to this contest again before it ends...
By the way, something I just discovered yesterday: sandbox pastebins (e.g. ideone.com, codepad.org). Yes, I'm kind of slow about these things...

Saturday, January 23, 2010

Facebook puzzles

Some notes about how to pass the current Facebook puzzles with the least amount of effort. I don't try to be verbose, but needless to say, SPOILERS AHEAD.

#7 hoppity: It's a FizzBuzz test, basically.
#3 meepmeep: I have absolutely no idea why this is even here. Actually, now that I think about it... Nah, even if the input file (that you're instructed to open but then ignore) contains some usable information, there's no easy way to retrieve it.
#20 liarliar: Straightforward UF.
#17 breathalyzer: The O(nm) L-d algorithm is fine. The dictionary is known in advance, but there's no need to preprocess it. No need for a fancy data structure to represent it either.
#15 gattaca: Straightforward wIS.
#13 simonsays: The hardest part is setting up the Thrift framework.
#12 dancebattle: Despite the warning, nothing insightful is necessary. Go ahead, code that obvious brute force algorithm and send it. It'll pass.
#6 smallworld: Use double precision (remember that Infinity * 0 is not 0, etc). No need to batch the k-NN retrieval.
#2 usrbincrash: Straightforward is fine.
#18 rushhour: I think this puzzle is broken, actually. Funny story: I crashed the server once. / Dear god! This puzzle is even more broken than I initially thought!
#14 battleship: Just create 2 dumb clients and have them play against each other.
#10 fridgemadness: Straightforward G-S implementation is fine.
#8 peaktraffic: Straightforward B-K implementation is fine.
#5 swarm: A layer of obnoxiously concocted math obfuscates a straightforward combinatorial problem.
#19 dinoisland: (not yet) This requires lots of investment, and honestly, I'm not sure if it's worth it. I'm not even sure if they guarantee that sufficient intelligence yields survivability. Who knows what lives on that island right now. This one time I hatched and someone ate me right away.
#11 sophie: Brute force is fine, apparently... Nope, it just got harder...
#1 facebull: (not yet) This is the Minimum Spanning Strong Subdigraph problem. <- No it's not! Anyway, somewhat surprised that I solved it, but I did. It's a computationally intractable problem, but my suggestion is to give it a try anyway.
"Hidden" ones I've found:

[hidden#1]: Noisy, buggy, this puzzle is frustrating as hell, but solvable. Use the 162x348 version. You'll get the congratulatory confirmation e-mail, but I don't think the app lets you publish that you've solved it.

[hidden#2]: I had to have someone point me in the right direction to find it, but this was a VERY satisfying puzzle.
Some prime numbers I've seen:

- The e-mail address: {0xFACEB00C>>2 in decimal} = 1051962371 is prime.
- From swarm: 654195331 = 0x26FE3A83 is prime.
- From [hidden#1]: even though the metric makes no sense for this problem, the confirmation e-mail reports that my longest running time is 73.127 ms. 73127 is prime. // Others have reported receiving different, non-prime numbers.

Tuesday, January 19, 2010

"Aber einzigartig... danke."

Are you prime? has spread a bit despite me having abandoned it for weeks now. The really cool thing is that people actually start using its publishing feature -- pretty much its only mean of spreading.

In case you're wondering, it's "but unique... thank you" in German. It's a reference to my (empty) consolation to the composites: that though it isn't prime, it's still unique (duh?). The number happens to be a squarefree semiprime (A006881). Which I think is a class special enough that it should have its own name.
Oh, and it's now approved for directory listing too. None of this really matters now, but it still makes me warm and fuzzy.
Other interesting things seen in the log:

- Record for highest number of friends seen so far: 3004. Of those, 124 are primes.
- A person had 97 friends, 6 of those prime. Apparently not satisfied, he later came back with 139 friends, 9 prime.
- One person was so amused by the app that he spammed all of his prime friends.
Hmm this is getting embarassing, actually. I should make something much bigger than this.

Saturday, January 16, 2010

Named algorithms

Obviously this is not an exhaustive list, and it omits some important algorithms as well (e.g. Lempel-Ziv family). I'm also not interested in listing algorithms whose common name is quite descriptive (e.g. Sieve of Eratosthenes, Gauss-Jordan elimination, etc).

Shortest path in a graph:
- Dijkstra's algorithm - single-source
- Bellman-Ford algorithm - single-source, negative edges
- Floyd-Warshall algorithm - all-pairs (it's neat how concise and straightforward this solution is to what at a glance looked to be a harder super-problem)

Maximum flow of a network:
- Ford-Fulkerson algorithm
- Edmonds-Karp algorithm (frankly I don't think this deserves its own name)

Minimum spanning trees:
- Kruskal's algorithm
- Prim's algorithm

String search:
- Knuth-Morris-Pratt algorithm (cool automaton!)
- Boyer-Moore algorithm (it works backward!)

Computational biology:
- Smith-Waterman algorithm - local sequence alignment
- Ukkonen's algorithm - suffix tree construction
- "Kärkkäinen's algorithm" - suffix array construction (not sure what other people think of it, but it still blows my mind how elegant it is)

Pseudorandom number generator:
- Blub Blum Shub (I don't think I can ever say it with a straight face)

Matrix multiplication:
- Strassen algorithm (it took me a few years before I gave this algorithm a chance; it's actually pretty cool, not to mention revolutionary)
- Cannon's algorithm (surprised that Wikipedia doesn't have more on this rather neat algorithm; maybe I'll contribute some day!)

Quantum:
- Shor's algorithm - prime factorization

Letter:
- Algorithm P - random shuffle
- Algorithm X - exact cover
- Algorithm W - type-inferrence (this is what inspired this entry, actually)

And I guess I'll mention this one for giggles:

- God's algorithm

Monday, January 4, 2010

Prime bits

OEIS A000788: Total number of 1's in binary expansions of 0, ..., n

One of the given formulas there is very interesting:
a(0) = 0;
a(2n) = a(n)+a(n-1)+n;
a(2n+1) = 2a(n)+n+1
I've still yet to decypher how that formula is derived, but here's mine in Maple:
full := n -> n * 2^n / 2:

flog2 := n -> floor(log[2](n)):

f := n -> `if`(n = 0, 0,
full(flog2(n)) + (1 + n - 2^flog2(n)) + f(n - 2^flog2(n))
):
My formula basically partitions the rectangle of bits into 3 parts: top (first 2^k lines, explicit formula trivial), bottom-left (a single column of 1's) and bottom-right (the rest, counted recursively).
Ah, got it. The first formula reduces the problem by separating out the rightmost column and deinterleaving what's left. In any case, this problem is inspired by a Facebook puzzle, whose exact definition I can no longer find. I can assume it to be the following:
Given a, b, count how many numbers in the interval [a, b] have prime number of 1's in their binary representation.
My solution in Maple: