
View Full Version : Babel's Library Reversing machine (Search "Jorge Luis Borges")


naides
July 16th, 2004, 09:29
This post is meant to stay on a purely philosophical plane. I will not touch the feasibility of what I am discussing here, but more than once computer algorithms have pulverized the barrier between impossible and simple.

Now, A PROGRAM, no matter how complex it might be, is only a linear array of bytes. Yes, it has internal structure, information is encoded in those bytes that is more than the bytes themselves, etc, but plainly,

It is a linear array of bytes, with a beginning and an end, a finite, ordered collection of bytes.

Protection is encoded in those bytes, and reversing involves changing some of those bytes. Typically only a few of them: five or ten when the reversing involves a jump inversion or something simple, a few hundred if code injection is necessary, but plainly

Reversing is changing a subset of bytes.

THE PROBLEM is to know which bytes to change and change them into what values.

The method we all use is a more or less structured attack: knowing the internal rules that govern the byte patterns of the program (also known as code), and at least partially understanding the program or the protection's overall structure, we find some guidance toward which bytes to change and to what. This is also known as a heuristic approach, and it is used not only in reversing but in a multitude of computer and non-computer knowledge activities.

But there are other approaches to the problem:

Brute force attack: we systematically replace byte after byte until we obtain the desired result (the program runs without protection).

A structured attack: we introduce certain rules of thumb about which bytes to try to change, limiting the number of combinations in the attack. For example, only change bytes in areas that contain code; or, knowing that most cracks involve changing an instruction, only change bytes that are part of a valid instruction into another valid instruction; or, knowing that a fair number of protections are based on decisions, concentrate the attack on decision instructions.

A self-learning system, which evaluates the result of a given byte change not as good/bad, black/white, but grades it against certain goals and concentrates on the changes and areas that produce the best results.

Of course this sort of combinatorial attack is best performed by a computer program, with enough speed and power to carry out the repetitive, brutal operations that I am proposing.
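
To make the idea concrete, here is a minimal sketch of the kind of mutate-and-grade loop the self-learning variant would run (Python, purely illustrative; mutate, search and the score() oracle are hypothetical names, not any real tool):

Code:
import random

def mutate(image):
    """Flip one randomly chosen byte to a random value (the blind 'brute' move)."""
    buf = bytearray(image)
    buf[random.randrange(len(buf))] = random.randrange(256)
    return bytes(buf)

def search(original, score, max_rounds=100000):
    """Hill-climbing over single-byte patches.

    score(image) is the hypothetical oracle: it must run the candidate in
    some sandbox (with a timeout) and return a number, higher meaning
    'closer to running without the protection'.  Graded scoring is what
    turns the blind brute force into the self-learning variant above.
    """
    best, best_score = original, score(original)
    for _ in range(max_rounds):
        candidate = mutate(best)
        s = score(candidate)
        if s > best_score:      # keep only improvements
            best, best_score = candidate, s
    return best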

If you think about it, such a machine could, given enough time, reverse all protections, but it would also have the most fascinating side effect: it would modify, in untoward ways, the behavior of the program it is reversing. Most of its creatures would be non-functional or dysfunctional programs, which would do silly stuff or run into endless loops, but some of them would indeed be interesting. Perhaps one would write all the variations of the story of your life, or paint your picture on the screen at all possible ages and with all possible changes, or, in summary, do everything a computer program can do, including rewrite itself in a slightly modified way, or become the program that created it. . .

Doesn’t it give you the shudders???

dELTA
July 16th, 2004, 12:26
Since you are only patching the existing bytes of the target, hence being restricted by the original size, the program would actually not be able to "do everything a computer program can do".

Also, there are of course massive problems connected to testing whether a patch was successful, among others:

1.
If you don't limit the memory space/stack space allowed to be used by the tested app, you would be subject to the well-known so-called "stopping problem" (i.e. the halting problem; look it up), which would prevent the theoretical existence of such a tester program altogether.

2.
It would be impossible to make sure that the tester program itself did not affect the behavior of the program (e.g. trace/simulation detection and such), at least without special hardware.

3.
It would be practically impossible to make the program judge the effects of a certain isolated patch as good/bad, since effects may be delayed indefinitely.

4.
The time complexity would of course be incredible, at least:

((size of program)!) ^ ((size of allowed working memory for program)!)

(note that the exclamation marks are factorial operators)
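
For a feel of the scale, forget the memory state entirely and count only the possible byte images of an n-byte program, 256^n. That is a far smaller number than the expression above, and it is already hopeless:

Code:
# Count of all possible byte images of an n-byte program: 256**n.
for n in (10, 100, 1000):
    print(n, "bytes ->", len(str(256 ** n)), "decimal digits")
# Output: 10 bytes -> 25 decimal digits
#         100 bytes -> 241 decimal digits
#         1000 bytes -> 2409 decimal digits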


Finally, I know some of these things go more toward addressing the feasibility issue, which you said you wouldn't consider, but I just felt like listing them anyway, and the first one is a theoretical showstopper no matter if you have infinite computing power, special hardware, or whatever.

Then there is of course also the issue that in some cases it might even be easier to generate a program with identical functionality from scratch instead, using the same type of brute-force method.

naides
July 17th, 2004, 12:54
Thank you for your answer, dELTA. I realize this is not a "miniproject" or anything of that nature; it is simply a thought experiment. It is a subject close to my real-life work, because the way information flows and changes in computers is only a variation of the way information flows and reproduces itself in biological systems.



Quote:
[Originally Posted by dELTA]Since you are only patching the existing bytes of the target, hence being restricted by the original size, the program would actually not be able to "do everything a computer program can do".

Well, with the current structure of computer systems, a program need not be self-contained, self-sufficient, or limited by the size of its code. It can recruit DLLs, run other programs, and make replicas of itself and use those as a source of mutation/evolution of new functionality. Not even the physical limit of memory/mass storage is absolute, because computers can be and are interconnected, so a self-propagating, interconnected program can recruit further resources.


Also, there are of course massive problems connected to testing whether a patch was successful, among others:


1.
If you don't limit the memory space/stack space allowed to be used by the tested app, you would be subject to the well-known so-called "stopping problem" (i.e. the halting problem; look it up), which would prevent the theoretical existence of such a tester program altogether.

I am familiar with Turing's stopping-problem theorem, but I do not completely grasp its relevance here. In fact, the programmer-computer-code system does exactly that: modify code following certain rules, run a test, check the results, compare them with the goals; if the goals are reached, terminate, if not, loop. Whether the programmer is a human or another computer or program, I do not see the fundamental difference.
The stopping problem deals with the limit-case scenario, but I would put forward the example of biological life, which for eons has been doing with DNA bases, genes and gene products what we are discussing here a program would do with bytes.
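
In practice the tester would simply cut each run off after a while. Here is a sketch of that bounded "run a test, check results" step (Python, illustrative only; goal_check is a hypothetical, target-specific function):

Code:
import subprocess

def bounded_test(exe_path, goal_check, time_limit=5.0):
    """Run one candidate binary and grade it within a fixed time budget.

    goal_check(returncode, stdout) is whatever target-specific test decides
    that the protection is gone.  The timeout is the practical concession to
    the stopping problem: a candidate that never terminates is simply written
    off after time_limit seconds, at the price of sometimes discarding a
    patch that only needed more time.
    """
    try:
        result = subprocess.run([exe_path], capture_output=True,
                                timeout=time_limit)
    except (subprocess.TimeoutExpired, OSError):
        return False   # hung, or would not even load: count it as a failure
    return goal_check(result.returncode, result.stdout)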


2.
It would be impossible to make sure that the tester program itself did not affect the behavior of the program (e.g. trace/simulation detection and such), at least without special hardware.

So be it, we would need special hardware. Or what if the "debugged" program were running in a virtual-machine kind of environment, in which the "programmer" code has access to the program's code and output, but the debugged program has no access to the programmer's code?

If the programmer and the debugged program modify each other, would they not evolve together?

3.
It would be practically impossible to make the program judge the effects of a certain isolated patch as good/bad, since effects may be delayed indefinitely.

Yes, some effects may be delayed; some may be immediate and measurable.
A human programmer is in no better situation; that is why there is no bug-free program apart from "Hello World!"


4.
The time complexity would of course be incredible, at least:

((size of program)!) ^ ((size of allowed working memory for program)!)

(note that the exclamation marks are factorial operators)

You are calculating the number of steps necessary to exhaustively run through all the variations. The interesting part happens in THE PROCESS of running those variations.

Finally, I know some of these things go more toward addressing the feasibility issue, which you said you wouldn't consider, but I just felt like listing them anyway, and the first one is a theoretical showstopper no matter if you have infinite computing power, special hardware, or whatever.

Then there is of course also the issue that in some cases it might even be easier to generate a program with identical functionality from scratch instead, using the same type of brute-force method.

Sure, I used the cracking example as an introduction, but modifying an already existing program rather than starting from scratch saves a great many iterations/modifications, because the most fundamental structures/functions are already in place.



This is not to be taken too seriously; I just think it is an interesting topic.

dELTA
July 18th, 2004, 06:57
Quote:

Well, with the current structure of computer systems, a program need not be self-contained, self-sufficient, or limited by the size of its code. It can recruit DLLs, run other programs, and make replicas of itself and use those as a source of mutation/evolution of new functionality. Not even the physical limit of memory/mass storage is absolute, because computers can be and are interconnected, so a self-propagating, interconnected program can recruit further resources.

Yes, but even the set of all functionality that can be recruited by a limited-size program is still constrained by its size, combined with the total sum of functionality of all programs currently existing in the universe, which is not _necessarily_ infinite. At least not if you are not at the same time allowed to manipulate the state of the surrounding universe, which was not mentioned in the premises, and which would also be equivalent to "expanding the array of original bytes", which would in turn contradict the problem definition.


Quote:

I am familiar with Turing's stopping-problem theorem, but I do not completely grasp its relevance here. In fact, the programmer-computer-code system does exactly that: modify code following certain rules, run a test, check the results, compare them with the goals; if the goals are reached, terminate, if not, loop. Whether the programmer is a human or another computer or program, I do not see the fundamental difference.
The stopping problem deals with the limit-case scenario, but I would put forward the example of biological life, which for eons has been doing with DNA bases, genes and gene products what we are discussing here a program would do with bytes.

The relevance of the stopping problem is that there are a bunch of simulations that will never stop, and hence your test program will never finish either. If your simulator program is not "infinitely multi-threaded", there are a lot of working/stoppable mutations that will never be reached at all, and the simulator will never stop.

Your argument against this, with the "programmer-computer-code system", is hence an argument against the stopping problem itself, and I refer you to the proof of the stopping problem (which can indeed be quite hard to grasp, I agree). An initial hint, though: the tested program will in some cases never reach any of the rule triggers that your tester program has defined, no matter how big the (finite) number of such triggers/situations is.
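
The core of the proof is actually short. Suppose, purely hypothetically, that a perfect halts(program, data) oracle existed (which is exactly the tester you would need); then you could write the following self-defeating program (a Python-flavored sketch, illustrative only):

Code:
def halts(program, data):
    """Hypothetical oracle: True iff program(data) eventually stops."""
    ...

def troublemaker(program):
    # Ask the oracle about the program applied to its own source,
    # then do the opposite of whatever it predicts.
    if halts(program, program):
        while True:          # predicted to stop, so loop forever
            pass
    return "done"            # predicted to loop, so stop at once

# troublemaker(troublemaker) contradicts the oracle either way, so no such
# halts() can exist, and neither can a tester that always knows whether a
# patched candidate will ever finish running.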

Also, some important differences from the biological life example you mention:

1.
The "test" of each new mutation for a DNA program is "spawned in a separate thread", in a system with seemingly unlimited resources for the number of threads (i.e. "infinitely multi-threaded", as mentioned above). Hence, the stopping problem would not affect the simulation of all possible (stoppable) DNA programs. (also, no DNA program is seemingly able to go into an infinate loop anyway, due to the external premises set up for DNA programs, but the premises for e.g. x86 programs do not prevent this).

2.
The central test program of DNA programs, which we call "the universe", has indeed neither stopped nor finished its exhaustive test of all mutations yet, and hence it may very well be subject to the stopping problem too.



Finally, I do indeed agree about not taking this too seriously; as you say, it's just an interesting thought experiment.