Monday, March 22, 2010

Finding quines by iteration

Russ Cox has an article on his blog about quines. He examines it in zip files, and mentions his favorite Unix shell quine (you'll have to read it to find out!). I sat down this weekend and implemented a cute strategy of quine-hunting: find a fixed-point of error messages. (And I should mention that I've heard this approach credited to Guy Steele. I'd appreciate any references that confirm this).

What's a quine? To quote Russ quoting Ken Thompson (in his masterful Turing Award acceptance speech): "More precisely stated, the problem is to write a source program that, when compiled and executed, will produce as output an exact copy of its source." This statement only allows valid programs in the language. The innovation of the Steele method is to squint at this definition. Instead of "compiled and executed", let's change it to "to run through a compiler/interpreter".

A traditional quine must be a valid program in the language. But when you invoke most language runtimes with an invalid program, they aren't silent. They do produce output. Often error diagnostics to help the programmer craft their gibberish into a valid program.

So what does an invalid-program quine look like? Well, we know it has to be an error message. It has to be an error message whose error is itself. Let's take python. Fire up an interpreter and enter an invalid program:


>>> a
Traceback (most recent call last):
File "", line 1, in
NameError: name 'a' is not defined

Is that error message an invalid-program quine? Let's check.

>>> Traceback (most recent call last):
File "", line 1
Traceback (most recent call last):
^
SyntaxError: invalid syntax

Nope. A NameError isn't a NameError. It's a SyntaxError.

So how will we find it? Well, we could try traditional quine-construction techniques, but this requires an in-depth knowledge of each language runtime we want to find an invalid-program quine for. And traditional quine-construction techniques are, well, hard: they require that you work backward and forward at the same time. Let's not do that.

Instead, think back to our last revelation: a NameError isn't a NameError, it's a SyntaxError. But maybe a SyntaxError is a SyntaxError (quine!). Or maybe it's a FrobError, and a FrobError is a FrobError (quine!).

And here's the advantage studying invalid-program quines: not only are they perverse, they're also easy. Take input, run it through the language runtime, compare the error message to the input. If they're different, just repeat. But if they are the same, then the language runtime evaluated the input to the error message, and we've found an invalid-program quine.

Returning to the python example: a SyntaxError isn't a SyntaxError, and it isn't a FrobError. It's an IndentationError. And an IndentationError is an IndentationError. Here, then, is our invalid-program quine for python:


>>> File "<stdin>", line 1
File "<stdin>", line 1
File "<stdin>", line 1
^
IndentationError: unexpected indent

But if you read Russ's post, you already know a python quine. So let's find some invalid-program quines in some more languages. And let's do it faster.

#! /bin/bash

languages="python;py
awk;awk
ruby;rb
gcc cc
javac java
bash sh
ksh ksh
zsh zsh
perl;pl"

# Find an error quine for a language.
# Args:
# $1: compiler for the language
# $2: file suffix
function find_quine_for_language {
COMPILER=$1
SUFFIX=$2
echo "Searching for error quine for $COMPILER"

echo "Yields falsehood when preceded by its quotation" > input.$SUFFIX
echo "" > output.$SUFFIX

found=""
for ((n=1; n <= 24; n++)) do
$COMPILER input.$SUFFIX > output.$SUFFIX 2>&1
echo -n "."

if cmp -s input.$SUFFIX output.$SUFFIX; then
found=true
break
fi

mv output.$SUFFIX input.$SUFFIX

done
echo

if [ ! -z $found ]; then
echo "Found error quine for $COMPILER after $n steps."
echo "================================"
cat input.$SUFFIX
echo "================================"
else
echo "Could not find error quine for $COMPILER after $n steps."
echo "> du -ah input.$SUFFX"
du -ah input.$SUFFIX
fi
}

for language in $languages
do
language=$(echo $language | tr ";" " ")
find_quine_for_language $language
done


First, I've listed some language runtimes and the suffix we should use for files in that language. Then I define the function that tries to find an invalid-program quine for the language. The only added sophistication from my description is a limit to the number of iterations. Why? Most runtimes on the list tell you about the first error, or the first n errors. They think that after this many errors, any more wouldn't be useful. But one of these runtimes keeps nitpicking your by-now-obviously-not-valid program. There is no fixed point. Each error message is longer than the input, and without this limit your machine grinds to a halt. (If you want to know which language leads to a 70 megabyte text file of parse errors, run the program yourself)

The moral of the story? The next time you find yourself in a quine-speed-writing competition, type gibberish and repeat.