The new stuff for AIT is just try, which enables us to give a LISP expression binary data using read-bit and read-exp, which abort if they run off the end of the data. How can we use this to get the correct program-size complexity measure H(x) of AIT? In other words, how can we use this to measure the algorithmic information content of an arbitrary S-expression x?
Well, a simple way to do this would be to define H(x) to be the size of the smallest program for x in an actual programming language, one that human beings like to use. For example, we could define H(x) to be the size in characters of the smallest LISP expression whose value is x, when this LISP expression is printed in standard format with exactly one blank separating successive elements of a list.
Well, that's not a bad definition; it is possible to work with it up to a point. And it does have the advantage of being a very straightforward and understandable definition of program-size complexity.
But real programming languages are too redundant; that's why human beings like them! The information in real programs is not packed densely enough. They're too similar to English and other natural languages, in which a small change in a text usually can be detected and doesn't mess up its meaning too much. If information is really densely packed, then any random change in a meaningful message gives another meaningful message. Natural languages have helpful redundancy, which gives us a natural error-detection/correction mechanism.
So how can we pack information into a program as densely as possible? Well, the best way is to give raw bits; we should have absolutely no idea if the next bit is going to be a 0 or a 1. Both should be equally likely and it should be a complete surprise. Then we are really taking advantage of each bit in the program!
Also we want programs to be self-delimiting. What do I mean by this? What I mean is that we really only have a binary alphabet. If we use a blank or another special character as punctuation at the end of the program, then it's really a 3-symbol alphabet, not a binary alphabet! And then all three symbols would have to be used equally often, they would have to be equally likely, for us to take full advantage of this communications channel. In other words, if blank is only used as punctuation at the end of binary programs, then we are not using it often enough, we're wasting one of the symbols in our alphabet!
So let's stick to a truly binary alphabet, 0's and 1's, so that as a computer reads in a program bit by bit it has to decide by itself when to stop reading it. And if we can do this, then our information content measure will be (sub)additive, because programs can be concatenated and there is no ambiguity where one ends and the next begins. In other words, if a computer decides by itself when to stop reading a program, then we can build up programs by combining smaller subroutines without having to add any punctuation.
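To see why this buys us additivity, here is a toy sketch (in Python, purely illustrative, not the book's LISP; the unary length header is just an assumption for the demo, and is far more wasteful than a real self-delimiting code):

```python
# Illustrative sketch: why self-delimiting programs concatenate cleanly.
# Toy format: a "program" is its payload's length in unary (n ones, then
# a zero), followed by the n payload bits.  The decoder always knows
# where one program stops and the next begins -- no punctuation needed.

def encode(payload):
    return [1] * len(payload) + [0] + list(payload)

def decode_one(stream, i):
    """Decode one program starting at position i; return (payload, next_i)."""
    n = 0
    while stream[i] == 1:       # read the unary length header
        n, i = n + 1, i + 1
    i += 1                      # skip the terminating 0
    return stream[i:i+n], i + n

def decode_all(stream):
    """Peel self-delimiting programs off the stream, one after another."""
    out, i = [], 0
    while i < len(stream):
        payload, i = decode_one(stream, i)
        out.append(payload)
    return out

# Concatenating two programs introduces no ambiguity:
s = encode([0, 1, 1]) + encode([1, 0])
print(decode_all(s))   # -> [[0, 1, 1], [1, 0]]
```

Real minimum-size programs decide for themselves when to stop without any such header, but the parsing discipline is the same: each program knows its own end, so subroutines combine by plain concatenation.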
And this is a universal self-delimiting scheme U, because we can use U to simulate any special-purpose self-delimiting binary computer C by putting in front of C's programs a prefix indicating in LISP how C works. In other words, if p is a program for C, and σ is a LISP program for simulating C, then (append (bits σ) p) is a program for our universal self-delimiting computer U that simulates C running p.
So how do we program U(p), our universal computer U running the binary program p, in LISP? Well, that's very easy to do! Here's a definition of U in LISP:
define (U p) cadr try no-time-limit 'eval read-exp p

So the value of U is

lambda (p) cadr try no-time-limit 'eval read-exp p

That's an M-expression; this is the S-expression:

(lambda (p) (car (cdr (try no-time-limit (' (eval (read-exp))) p))))

This says, to run the program p, use a try with no time limit and with p as the binary data, and give the binary data p to (eval (read-exp)). So this will read a LISP expression from the beginning of p, eight bits per character, until it finds a newline character. Then it'll run this LISP expression, it'll evaluate it, giving it the binary data that's left, which is exactly what we wanted it to do. And we select the second element of the triple that try returns, which is the final value returned by the LISP expression that we read in from the beginning of p. [When generating infinite sets of S-expressions, which are displayed, we'd take the third element of the triple that try returns, not the second element.]
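To make the mechanism concrete, here is a rough Python analogue of U (purely illustrative; Python's eval and a list of bits stand in for the LISP interpreter and the binary program):

```python
# Illustrative sketch (not Chaitin's LISP): a toy universal machine U.
# A program p is a list of bits: first an expression, 8 bits per
# character, terminated by a newline character, then raw binary data.
# U decodes the expression and evaluates it; the expression may consume
# the remaining bits via read_bit().

def to_bits(s):
    """Encode a string as 8 bits per character (like the book's 'bits')."""
    return [int(b) for ch in s for b in format(ord(ch), '08b')]

def U(p):
    # Read characters, 8 bits at a time, until a newline is found.
    expr, i = '', 0
    while True:
        ch = chr(int(''.join(map(str, p[i:i+8])), 2))
        i += 8
        if ch == '\n':
            break
        expr += ch
    data = p[i:]          # the binary data left over for the program

    def read_bit():
        # Aborts (IndexError) if we run off the end of the data,
        # just like the book's read-bit.
        return data.pop(0)

    return eval(expr, {'read_bit': read_bit})

# Example: a program whose expression part reads two bits from its data.
p = to_bits('[read_bit(), read_bit()]\n') + [1, 0]
print(U(p))   # -> [1, 0]
```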
And to make things easier, the Java applet version of my LISP interpreter provides this as a macro. You just write run-utm-on, which stands for cadr try no-time-limit 'eval read-exp.
So how do we use U to compute an arbitrary n-bit string β? Well, the easiest way to do this is to include β as is in the LISP prefix of p and not to use the binary data at all. For example,
(U bits '(0 1 0 1 1 1))

evaluates to the 6-bit string (0 1 0 1 1 1). How long is this program? Well, it's

length bits '(0 1 0 1 1 1)

bits long, that's 8 + 8 × the number of characters in the S-expression

(' (0 1 0 1 1 1))

So in general, this gives us an 8 + 8 × (5 + 2n) bit program for the n-bit string β. The five is for the five characters ( ' ( ) ), and then it's two characters per bit, times eight bits per character. In other words, we have

H(β) ≤ 8 + 8 × (5 + 2n) = 16n + 48.
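As a sanity check, this character count can be verified mechanically; here is an illustrative Python sketch (the extra 8 bits being for the newline that terminates the expression is as described above):

```python
# Illustrative check of the 8 + 8*(5 + 2n) bit count for the literal
# program (' (b1 b2 ... bn)): 8 bits per character, plus 8 bits for the
# newline that terminates the expression.

def literal_program(bits):
    """The LISP expression whose value is the given bit string."""
    return "(' (" + ' '.join(str(b) for b in bits) + '))'

def literal_program_size_in_bits(bits):
    return 8 + 8 * len(literal_program(bits))      # +8 for the newline

beta = [0, 1, 0, 1, 1, 1]                          # the book's 6-bit example
n = len(beta)
print(literal_program(beta))                       # (' (0 1 0 1 1 1))
print(len(literal_program(beta)) == 5 + 2 * n)     # True
print(literal_program_size_in_bits(beta) == 8 + 8 * (5 + 2 * n))  # True
```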
Let's do better. To calculate the n-bit string β, let's concatenate the minimum-size program to calculate the decimal number n followed by the n bits in β. So we start by reading in and executing the program to calculate n, then once we know n, we read in the n bits in β. Let's do this in LISP. Well, to read in and run an arbitrary program for U to calculate n, we just do eval read-exp. Then we define a loop function, with one argument k. If k is 0, then we're finished, there are no bits to read, and β is just the empty list nil. If not, we cons read-bit to loop of k minus one. So let's put it all together:
define pi let (loop k) if = k 0 nil cons read-bit (loop - k 1) (loop eval read-exp)

This is the prefix π we have to put in front of an arbitrary program for n followed by the n-bit string β. For example:

(U append bits pi append bits 12 '(0 0 1 1 1 1 1 1 0 0 0 1))

yields the 12-bit string

(0 0 1 1 1 1 1 1 0 0 0 1)

What this shows is that in general,

H(β) ≤ n + H(n) + c

where c is length bits pi, which turns out to be 912.
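The two-part idea behind π can also be sketched in Python (illustrative only; here the sub-program for n is simply n in decimal, 8 bits per character, ended by a newline, rather than a true minimum-size program):

```python
# Illustrative Python analogue of the prefix pi: a two-part code for an
# n-bit string beta.  Part 1 is a self-delimiting program for n (here,
# n in decimal, 8 bits per character, ended by a newline); part 2 is the
# n raw bits of beta.  Cost: n bits plus the size of a program for n.

def to_bits(s):
    return [int(b) for ch in s for b in format(ord(ch), '08b')]

def encode(beta):
    return to_bits(str(len(beta)) + '\n') + list(beta)

def decode(program):
    # Part 1: read characters until newline, yielding the length n.
    text, i = '', 0
    while True:
        ch = chr(int(''.join(map(str, program[i:i+8])), 2))
        i += 8
        if ch == '\n':
            break
        text += ch
    n = int(text)
    # Part 2: like the loop in pi, read exactly n raw bits.
    return program[i:i+n]

beta = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1]       # the book's 12-bit example
print(decode(encode(beta)) == beta)                # True
```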
So in general, just by giving the length n as a decimal constant in LISP, we're using 8 bits for each decimal digit in n, which is about 8 log10 n bits, which shows that H(β) is bounded by n plus a constant times log n, that is, n plus something of order log n:

H(β) ≤ n + O(log n).
But of course for some values of n we can do much better, for example, if n is 10^10^10. This is a very large number with a very small program, with a very simple algorithmic description. In fact, what is H(10^10^10)? Well, here is a program for this n:

(U bits ' ^ 10 ^ 10 10)

[Don't run this!] So its complexity H(10^10^10) is bounded by

length bits ' ^ 10 ^ 10 10

which is very small. [This, you can run!]
It turns out that this n + H(n) + c upper bound on H(β) is in general the best possible, because most n-bit strings β have complexity H(β) very close to this upper bound. We'll show this at the beginning of Part III.
Of course some n-bit strings have much lower complexity. For example, the string 0^n consisting of n 0's has complexity within a bounded number of bits of the complexity of n, which in turn is bounded by a constant times log n, that is, is of order log n:

H(0^n) ≤ H(n) + c = O(log n).
[Exercise: work through the details to prove this!]
Now let's consider the joint complexity H(x, y), the size of the smallest program that makes U output the pair (x y). The first thing to notice about the joint complexity is that it's symmetrical:

H(x, y) = H(y, x) + O(1)
I'll leave that as an exercise for you, dear reader!
The second thing to notice is that it's subadditive:

H(x, y) ≤ H(x) + H(y) + c

Just consider the prefix ρ defined as follows:
define rho cons eval read-exp cons eval read-exp nil

So ρ is defined to be

(cons (eval (read-exp)) (cons (eval (read-exp)) nil))

This is

size rho

= 53 characters long, and it's 8 + 8 × 53 = 8 × 54 = 432 bits long when used as a prefix in a program for U. So that's the constant c in

H(x, y) ≤ H(x) + H(y) + c.
To see an example, try
(U append bits rho append bits pi append bits 5 append '(1 1 1 1 1) append bits pi append bits 9 '(0 0 0 0 0 0 0 0 0))

It yields, as it should, this pair:
((1 1 1 1 1) (0 0 0 0 0 0 0 0 0))
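Here is an illustrative Python analogue of ρ (with a toy unary-header self-delimiting code standing in for real minimum-size programs):

```python
# Illustrative Python analogue of the prefix rho: run two self-delimiting
# programs back to back and pair up their results.  Toy program format:
# payload length in unary (n ones and a zero) followed by the n payload
# bits -- a stand-in for real self-delimiting minimum-size programs.

def encode(payload):
    return [1] * len(payload) + [0] + list(payload)

def decode_one(stream, i):
    n = 0
    while stream[i] == 1:
        n, i = n + 1, i + 1
    i += 1
    return stream[i:i+n], i + n

def rho(program):
    """Like (cons (eval (read-exp)) (cons (eval (read-exp)) nil))."""
    x, i = decode_one(program, 0)   # first self-delimiting program -> x
    y, _ = decode_one(program, i)   # what's left is the program for y
    return [x, y]

# Concatenate a program for x with a program for y, no punctuation:
print(rho(encode([1, 1, 1, 1, 1]) + encode([0] * 9)))
# -> [[1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0, 0, 0, 0]]
```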
Is this inequality sharp? No, not at all! It's only sharp when x and y have nothing in common, when they are algorithmically independent. Now we'll take a look at the relative complexity of y given x, which we'll write like this: H(y | x). That'll give us a sharp result,

H(x, y) = H(x) + H(y | x) + O(1),

at the end of Part II. For now, all we'll be able to show is one direction of this inequality, namely that

H(x, y) ≤ H(x) + H(y | x) + c.
By the way, since joint complexity is symmetrical, at the end of Part II we'll deduce that

H(x) + H(y | x) = H(y) + H(x | y) + O(1)

and, transposing, that

H(x) − H(x | y) = H(y) − H(y | x) + O(1).
In other words, the extent to which knowing y helps us to know x is the same as the extent to which knowing x helps us to know y! But we won't know this until the end of Part II.
To get these sharp results, we need to be subtle; the obvious definition of relative complexity won't do. That would be to define H(y | x) to be the size of the smallest program for U to calculate y if it's given x for free. Instead we're given a minimum-size program for x, not x directly; there is a subtle but important difference.
The reason for picking this definition instead of the obvious one is that a minimum-size program tells us its size as well as its output, so that's really different from just being given the output.
In fact, it's not difficult to see that

H(x, H(x)) = H(x) + O(1).
I'll leave this as an exercise for the reader, but I'll show you the main idea later. And any definition of H(y | x) that satisfies the fundamental decomposition result

H(x, y) = H(x) + H(y | x) + O(1)

would, taking y = H(x), have to give us this:

H(x, H(x)) = H(x) + H(H(x) | x) + O(1).

Thus, if it's properly defined, the relative complexity of H(x) given x has got to be bounded:

H(H(x) | x) ≤ c.
Well, this is indeed the case if we're given a minimum-size program for x instead of x directly (we just measure its length instead of running it), as I'll show in a moment. It will take us all of Part II to show that this is all we need to do to get

H(x, y) = H(x) + H(y | x) + O(1)

to work in general! In other words, if we can get this to work for H(x, H(x)), then it'll work for H(x, y) with y ≠ H(x)! But this will be a lot of work to show. For now, later in this chapter, all we'll be able to prove is that

H(x, y) ≤ H(x) + H(y | x) + c.
But first I'll have to show you a new definition of U that works for relative complexity.
Well, to get relative complexity instead of absolute complexity, we just make one small change. Now the binary program for U begins with a lambda expression for a function definition, for the function that gives the result if applied to the stuff we are given for free. So here's a program for n + 10 given n for free: the prefix is
bits ' lambda (n*) + 10 run-utm-on n*

and it's followed by no binary data. And here's an H(m) + c bit program for n + m given n for free: the prefix is

bits ' lambda (n*) let m eval read-exp let n run-utm-on n* + n m

and it's followed by the H(m) bits of a minimum-size program for m. Here's how we get the complexity H(x) if we're given x: the prefix is

bits ' lambda (x*) length x*

and there's no binary data. Hence, as promised, H(H(x) | x) ≤ c; c = 8 + 8 × 25 = 208 is the number of bits in this prefix.
Two things need to be explained here. First of all, we need to use run-utm-on to get what we're being given for free, because we're given a minimum-size program for it, we're not getting it directly! And second, we use the macro run-utm-on, not the function U, because the definition of U is not visible inside a try, which always starts with a fresh environment.
So we can see if a program for our universal Turing machine U is a relative program or an absolute one. If there's a lambda expression
(lambda (x y z...) body)

in binary at the beginning of the program, then it's a relative computation, and the number of arguments (x y z...) of the function tells us how many things we're being given for free. If the S-expression in binary at the beginning of the program is not a lambda expression, then it's an ordinary "absolute-complexity" program, not a relative-complexity program.
How can we actually run one of these relative programs? Well here's how:
define (U2 p q)
   [p is the binary program for U,]
   [q is what we're given for free]
cadr try no-time-limit        [first argument of try]
   cons 'read-exp             [second argument of try]
   cons cons "' cons q nil
   nil
   p                          [third argument of try]

The expression that try evaluates is ((read-exp) (' q)), which reads in the lambda expression from the beginning of p and then gives it the free data q. If we were dealing with H(x | y, z), in which we're given two things for free, we'd form

((read-exp) (' q) (' r))

instead.
Exercise: now try this out on some examples! Of course, we can't really run genuine examples, because the free stuff should be given via its minimum-size program, and we can't be sure that we've got that. But in most cases any program for the free stuff will do, and such examples can be checked by running them.
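To get a feel for the mechanism, here is a toy analogue in Python (illustrative only, not the book's LISP): the relative program starts with a function, 8 bits per character up to a newline, and the machine applies it to the free data.

```python
# Illustrative Python analogue of U2: a relative program begins with a
# function (8 bits per character, ended by a newline); the machine
# decodes it and applies it to the free data q.

def to_bits(s):
    """Encode a string as 8 bits per character (like the book's 'bits')."""
    return [int(b) for ch in s for b in format(ord(ch), '08b')]

def U2(p, q):
    text, i = '', 0
    while True:                       # decode the leading function
        ch = chr(int(''.join(map(str, p[i:i+8])), 2))
        i += 8
        if ch == '\n':
            break
        text += ch
    return eval(text)(q)              # like evaluating ((read-exp) (' q))

# The free data is x*, a (supposedly minimum-size) program for x, so the
# constant-size prefix below recovers H(x) by measuring the length of
# x* -- the idea behind H(H(x) | x) <= c.
prefix = to_bits('lambda x_star: len(x_star)\n')
x_star = [0, 1, 1, 0, 1]              # stand-in for a program for x
print(U2(prefix, x_star))             # -> 5
```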
Now let's prove as much of this fundamental result as we can, namely, that

H(x, y) ≤ H(x) + H(y | x) + c.
So p consists of a prefix γ followed by a minimum-size program for x followed by a minimum-size program to calculate y from a minimum-size program for x, and we want to output the pair (x y). The idea is to start off by reading and running the program for x, not directly but indirectly, a bit at a time, so that we get each bit in the minimum-size program for x as well as x. [That's also the idea of the proof that H(x, H(x)) = H(x) + O(1), which I've left to the reader as an exercise.] Then we give the program for x to the program to get y from x, and then we cons up x and y.
Here's the prefix γ that does this and whose size in bits gives us c:
[ Proof that H(x,y) <= H(x) + H(y|x) + 2872. ]
[ The 2872-bit prefix gamma proves this. ]

define gamma
    [read program p bit by bit until we get it all]
    let (loop p)
        if = success car try no-time-limit 'eval read-exp p
            [then] p
            [else] (loop append p cons read-bit nil)
    let x* (loop nil)            [get x* = program for x]
    let x run-utm-on x*          [get x from x*]
    let y                        [get y from x* by running]
        eval cons 'read-exp      [((read-exp) (' x*))]
             cons cons "' cons x* nil
             nil
    [form the pair x, y]
    cons x cons y nil

That's the definition of the prefix γ. To check that γ is actually 2872 bits long, you run this:
length bits gamma

And here's an example showing how to use γ:
run-utm-on        [get pair x, y by combining]
                  [a program for x and a program to get y from x]
append bits gamma
append
    [x* = program to calculate x = 2]
    [[Supposedly x* is smallest possible,]]
    [[but this works for ANY x* for x.]]
    bits ' + 1 1
    [program to calculate y = x+1 from x*]
    bits ' lambda (x*) + 1 run-utm-on x*

This will produce the result (x y) = (2 3).
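The bit-at-a-time loop inside γ can be sketched in Python too (illustrative; a toy unary-header program format stands in for real programs, and y is simply taken to be the length of x*, i.e. the size of x's program, to show what the indirect reading buys us):

```python
# Illustrative sketch of the trick inside gamma: read the program for x
# bit by bit, testing after each bit whether what we have so far is a
# complete, successfully-running program.  That way we end up holding
# both x* (the bits of the program) and x (its output).
# Toy program format: payload length in unary (n ones, a zero), then the
# n payload bits; "running" a program means decoding its payload.

def encode(payload):
    return [1] * len(payload) + [0] + list(payload)

def run(program):
    """Return the payload if program is complete, else raise ValueError."""
    n, i = 0, 0
    while i < len(program) and program[i] == 1:
        n, i = n + 1, i + 1
    if i >= len(program) or len(program) != i + 1 + n:
        raise ValueError('incomplete or overlong program')
    return program[i+1:]

def gamma(stream):
    # Like gamma's loop: extend x_star one bit at a time until it runs.
    x_star, i = [], 0
    while True:
        try:
            x = run(x_star)
            break                      # success: x_star is all of x's program
        except ValueError:
            x_star.append(stream[i])   # otherwise read one more bit
            i += 1
    y = len(x_star)                    # stand-in for "get y from x*"
    return [x, y]

print(gamma(encode([1, 0, 1]) + [0, 1]))   # -> [[1, 0, 1], 7]
```

Note that the toy format is prefix-free, so the loop cannot stop too early; that is exactly the property self-delimiting programs give γ.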
http://www.cs.umaine.edu/~chaitin/ait/utm2.l
http://www.cs.umaine.edu/~chaitin/ait/utm2.r
http://www.cs.auckland.ac.nz/CDMTCS/chaitin/ait/utm2.l
http://www.cs.auckland.ac.nz/CDMTCS/chaitin/ait/utm2.r
and determine c.
and determine c.
and determine c.
and determine c.
and determine c.
and determine c. In other words, a minimum-size program for x tells us its size H(x) (a decimal number) as well as its output x.
Represent a function ψ via its graph, that is, as the set of ordered pairs (x ψ(x)). Show that
and determine c.
and determine c. Show that
and determine c'. Hint: merge the programs for generating X and Y in the order that their bits are needed when running both programs in parallel.