Sunday, August 26, 2007

On an open problem of Wolfram

I have something to admit. Something rather shameful. I never learnt Fuun during the contest. After all, I was a late arrival to a team with a man we call Bruce Almighty, after whom the Bruce-First Search is named. Although I had a vague understanding of the low-level details, my contributions were mostly at a higher level.

So I'm only really learning the details now, as I don my tin-foil hat and prepare for a Fuun-filled future. When I saw Jochen's ingenious 55-character junk generator, I had only an inkling of how it worked until I whipped out my spec booklet and a pencil. Jochen's string shows just how incredibly rich Fuun DNA is. And also that Jochen is a brilliant DNA hacker.

I will now, finally, progress to my point. Last week, I left you with an SHA-1 "essence" of a result. In truth, it is more of a question than a result. It is suspected by some that the following DNA sequence has a truly awe-inspiring property: that the genetic program of any being can be emulated by this sequence of Fuun genes. Is this the case? A cash prize is offered for a proof of the affirmative.



  1. Let me guess... the cash prize is US $25000? :-) Do I get a chocolate fish?

    When I first saw your post I thought - aha - at last someone (smarter than me) has posted an Endo DNA self-interpreter, and so I optimistically saved your DNA as fuunfuun.dna. You see, all I really want to do is try to lay my hands on such a beast so I can attempt to find it's "eigenratio" (assuming such a thing even exists).

    Anyway, it sounds like you might be someone who can do that for me - if you were to feel so inclined!

  2. clive,

    Well done, you get the chocolate fish.

    I am aware of your work on eigenratios, and I find it very interesting. But there are some problems with Fuun self-interpretation. For one, what is it?

    Fuun does not allow input, so we have to allow the program to be interpreted to be embedded in the DNA --- appended to the end, say, to the end. But then the empty DNA sequence is a valid self-interpreter.

    Or perhaps the input would be quoted? But then an unquoter is a self-interpreter.

    If you give me a definition, I'll have something to aim towards, but I can't guarantee I'll get there.

  3. I was thinking about that definition problem a bit and how to avoid simply cancelling a prefix. One possibility would be to require that some extra piece of RNA is produced *after* the RNA produced by the program. Since this would have to work for any program, including one that tried to delete any code you injected, it would be necessary to simulate the program rather than allowing it to execute.

  4. Hi Max

    My notion of a self-interpreter for the Fuun DNA would be simply another "prefix" in effect - so just append the DNA to be interpreted onto the tail of your interpreter and off you go.

    In a sense this is treating anything tacked onto the end of your own string as the input. And obviously this also allows for a stack of such self-interpreters to be easily constructed, but I must admit I haven't really thought about it all that much. That approach is quite similar to some other languages that I ended up playing around with in my earlier experiments with "eigenratios" so it seemed a natural approach for this case also.

    As far as I'm concerned, the empty string is a trivial case that isn't a "real" self-interpreter. Possibly I'm in danger of making up the rules to suit my own needs but an empty string isn't really "doing" anything so I'm not going to allow that!

    I'm flattered that you've already spotted "my work" with eigenratios. :-) Not that I've done much in that area lately. But at least I'd have a reason to post something new on that blog if someone could come up with an Endo DNA self-interpreter!


  5. A cunning plan, Bruce.

    I have been discussing this with a friend, and he pointed out that the trivial self-interpreter corresponds to the identity matrix, with eigenvalue of 1, which "works": each layer of self-interpretation slows the stack to 1 times its previous speed.

    So although it's trivial, it's not quite as uninteresting as it seems.

  6. I agree that a self-interpreter with a corresponding matrix equal to the identity matrix is an interesting case to think about, but I'm not sure that you can actually say this applies to an empty string as "self-interpreter". In the cases that I've played with previously, the columns and rows of the appropriate matrix corresponded to "instructions" within the language and the values to the frequency those instructions are executed in the self-interpreter, etc. Given that there is nothing in an empty DNA string to be "executed", it seems that its matrix would also be "empty" (the zero matrix) but not really equal to the identity matrix.

    Perhaps Bruce's suggestion (which I didn't see until after posting my last comment) would be sufficient to stop any "cheating". In any case, in my opinion a self-interpreter as a "reference implementation" needs to actually show us something about the semantics of the language, and not just be some kind of smoke and mirrors construction that effectively "does nothing".

    In this sense it must actually implement a very large part of what you would expect to implement if the interpreter was written in a completely different language.

    I know this isn't anywhere near a formal set of rules but I think the intent is fairly clear. However there is still room for lots of fuzzy stuff. For example, should a Fuun DNA self-interpreter include its own full implementation of (say) code to determine the length of a string, or can it "delegate" this to the next level down by constructing a pattern to match the appropriate string and template that includes the IIP operation to generate the length? It seems okay to me to allow delegation of this kind, but once again that might just be me!

    Perhaps it can be almost as simple as saying that whatever is running our self-interpreter must only ever directly execute code from our self-interpreter, and must never directly execute code from the DNA that our self-interpreter is "running"? Of course some of the target DNA must be handled as "data", but I'm suggesting it can't be interpreted. Once again I've not really worked through this so it's probably a completely flawed approach, but think of the target code (being run by our self-interpreter) as being "tainted", and any attempt to directly execute tainted code being disallowed. Just trying to think if this works if a copy of tainted code is also considered tainted? Execution in this context means "recognising" some sequence as "meaning" do this, or do that...

    I'll sleep on it!

  7. I thought a bit how one can implement the interpreter most easily. With Bruce's suggestion there is still a possibility of cheating. The interpreter could work like this:

    1. Determine start and end of pattern.
    This requires some rudimentary parsing, counting parentheses and handling meta-sequences like IPIICP.
    2. Determine start and end of template.
    This requires some rudimentary parsing.
    3. In every match code "IPxPyP" in the template increase the number for y by one. This can be done while parsing the template.
    4. Output a new command
    IPPIP length_of_interpreter ICC original_pattern
    IPPP original_template
    at the front so it gets executed.

    This way one doesn't have to implement searching, skipping, matching, computing match lengthes etc. However the rudimentary parsing is already a quite difficult task.

    The question is, what is allowed? For example is it allowed to match on the number following the IP in a pattern and then using it in a command (!ptr!match0_0) -> len_0 to increase ptr by that number? Endo's DNA uses constructs like this over and over.

  8. Clive, I think you are mistaken about the identity and zero interpreters. I'm pretty sure the empty self-interpreter corresponds to the identity matrix, as each operation at level n+1 is executed by the same operation at level n (which is, of course, identical). A self-interpreter with a zero matrix would amount to hypercomputation: it could interpret an arbitrarily long program in constant time.

    Your tainting scheme seems broken to me. One can untaint any string by repeatedly replacing each base with itself. I don't see how you can distinguish between this and matchreplaces.

    Jochen, you are right, Bruce's scheme does allow such things, and that is what I was planning to do. I didn't think it was cheating. What does Clive think?

  9. Okay, it is neither pretty nor efficient nor short, but at least working:

    "Source code" is in the same directory :)

    Known Bugs:
    - If the last command in the DNA contains RNA and is not closed, the RNA is not output
    - There may be cases where the increment of y in IPxPyP does not work, if it is at the very end in the DNA with a high value for y. Same for IIPyP. In that case the wrong command is executed.

  10. Max: my earlier objection to associating the identity matrix with the empty string self-interpreter is really based only on the fact that in my previous constructions of actual matrices corresponding to (other language) self-interpreters, the non-zero values (in particular) had a clear derivation (a count of times a specific opcode or "variation" of it was actually "executed" in the self-interpreter). All I am saying now is that I don't really see how to sensibly maintain that correspondence if you are talking about the empty string as a self-interpreter - even though at the same time I do see how the identity matrix can be considered to give the expected result otherwise.

    My tainted code approach is broken as you say. The product of a broken brain also, or perhaps I just need to properly engage my brain before opening my mouth in future! :-)

    Jochen: I think there are a few typos in your explanation of one way you could still "cheat". In step 4, as far as I can see you meant to start with IIPIP instead of IPPIP, and also to have IIC instead of ICC before "original_pattern"? But perhaps I am the one who is wrong - having not yet really acquired fluency in Fuun DNA!

    Thanks also for the your first cut at a self-interpreter. I am trying to understand the "source" and its behaviour now. This could take me some time...

    Perhaps there is no easy and consistent way to define "cheating" in this situation. In other experiments I have done with self-interpreters I think there was usually a built-in limit (inherent in the language definition itself) on how much "cheating" you could get away with - and I probably tended to "cheat" as much as possible!

    I wonder if there is some merit in an approach that starts with a written specification of the language, and a program that claims to be a self-interpreter for the language. In my ideal world, an intelligent agent examining both will discover that the self-interpreter mirrors the semantics of the specification. Obviously the empty string fails this test!

    In a "non-cheating" self-interpreter, presumably you should be able to "fairly easily" modify it to generate some kind of diagnostics at run time or maybe accept input in a different representation, etc. Perhaps the question to ask is how much of the Fuun DNA language specification can be fully and accurately exposed in a self-interpreter instead of how little?

  11. Clive,
    What I am saying is that that correspondence is there. The matrix contains counts of the form "each operation X in the interpreted program corresponds to N operation Y's in the interpreter", right? And for each search done by the interpreted program, the interpreter does one search, simply because interpreted program == interpreter.

  12. Max - I do believe I understand how you are making that correspondence.

    I guess it all comes down to if you accept that an "empty string" in front of the other code is actually a "self-interpreter", and that probably comes down to definitions! However I personally don't see that as a useful way to view "self-interpreters", except perhaps if an empty string/program explicitly carried some additional semantics, rather than just being "something" that just turns out to (conveniently!) have no effect on anything else following it, surrounding it, preceding it, etc.!

    Referring again to: "each operation X in the interpreted program corresponds to N operation Y's in the interpreter"...

    My point is that nothing is actually being executed *in* that empty string so it doesn't make sense to have non-zero counts in the corresponding matrix. Sure - things are being executed elsewhere, but in fact not "in" the supposed "self-interpreter" (unless you want to ignore that empty string and just call every program a self-interpreter for itself, but that doesn't make a lot of sense either because most programs will be a "self-interpreter" for themselves in that sense, instead of for all valid programs in the language which is what I think we are looking for).

    To reiterate - I do understand the point you are making. Perhaps it even makes some sense to view the empty string as a valid self-interpreter for this (and presumably many other languages), but I personally don't see it as a particularly useful idea. I haven't checked the Fuun DNA specs again carefully, but I'd be happy to discover they explicitly stated a DNA program consists of any *non-empty* string of I, C, F, and P "bases", thus making this particular empty "self-interpreter" invalid and ending this debate! :-)

  13. Clive: You're right with the typos.
    The program I posted does it correctly (see compile instruction).

    If you just rule out empty prefixes, this still allows nop instructions like IICIIC or IPP :)

  14. This comment has been removed by the author.

  15. I've posted some more thoughts about this here