
View Full Version : Algorithmic difficulties


LLXX
October 8th, 2005, 04:11
Even though this board is mostly concerned with Reversing, I noticed this forum had the name "Programming" in it, so I might as well ask this question here to see if any experienced coder can help with it.

We are creating an x86 assembler. Not just another assembler, but one written in a distinct way - a simple, tiny assembler written in 16-bit x86 Asm, one that can assemble itself. Why code an assembler? This one is designed specifically for the creation of small, efficient 16-bit COM programs. (Not exactly demoscene, but competition-related.) It has distinctive syntax features that allow rapid coding, e.g. more than one instruction per line and a simplified syntax.

This assembler is also to be written in such a way that certain hardcoded limits found in existing assemblers (e.g. the length of a source line, the length of symbols) are nonexistent - it should allow lines of unbounded length, and symbols that can be as long as the symbol table memory (64k at the moment).

The difficulty is in the assembler's symbol table. It only has to store each symbol's ASCII name and a word containing its value. An implementation that I have considered and partially implemented is a double-headed linked list. Recently it occurred to me that binary trees might provide better performance, due to their logarithmic search depths. However, all the binary-tree implementations designed to store strings that I've found on the Internet rely at some point on a fixed-length buffer and, in essence, a series of strcmps to decide where in the tree the new node is to be added.

Adding entries to the binary tree isn't too difficult - the new node itself can be used as the buffer for the strcmps. Searching, however, is where the real problem is. All the example binary-tree code on the Internet seems to show fixed-length data being stored in the nodes. My nodes are not going to be fixed length - they consist of:

- 1 word pointer to Less
- 1 word pointer to More
- 1 word symbol's value
- ASCIIZ string name

As you can see, the Less and More values can be used either as pointers to previous and next entries in case of the linked list, or as Left and Right in the binary tree method. Either way the dynamic storage requirements are the same. It is only the size and efficiency of the algorithm which we are concerned with.

I do not want to have to read the entire input string into a buffer and then do strcmps (or CMPSBs, actually). In the double doubly-linked-list method (two doubly-linked lists), the lookahead-free algorithm was simple - initially, set two pointers, Low and High, to the start and end of the list; then, as successive characters of the string are read in, move them toward each other based on character matches until they converge on one entry. I've thought about the converging-pointers scheme in a binary tree, but it doesn't seem to work, since it's a tree and not a sorted list.

Does such a successive-match algorithm for searching binary trees containing arbitrarily long strings exist? If it doesn't, I'll just revert to the doubly-linked-list method, in which the insertion code is less than 200 bytes and the search code less than 100 bytes. If it does, but the algorithm is very complex or requires recursion (stack space is limited and I'm not too fond of storing the search string on the stack, lest it overflow and cause a bad crash), I'll still stick with the linked list. Otherwise, it'll be implemented.

I've also considered multiway trees. They are excellent for searching (scasb the list of branches, follow new branch, repeat until leaf) but not easy to add to. It is for this reason that the instruction table of our assembler will be implemented using multiway trees (most critical component, must have the best performance).
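For illustration, the multiway-tree lookup described above (scan the list of branches, follow the new branch, repeat until leaf) might look like this in C. This is only a sketch of the idea, not the assembler's actual tables: the node layout and names are my own assumptions, and strchr stands in for the SCASB scan.

```c
#include <assert.h>
#include <string.h>

/* One node per matched prefix: a string of branch characters and a
   parallel array of child pointers, so child[i] follows chars[i]. */
typedef struct MwNode {
    const char *chars;     /* characters that have a branch here */
    struct MwNode **child; /* child[i] is taken on chars[i] */
    int value;             /* payload at the end of a key, -1 if none */
} MwNode;

int mw_find(const MwNode *node, const char *key) {
    while (*key) {
        const char *p = strchr(node->chars, *key); /* the SCASB step */
        if (!p) return -1;                         /* no branch: not found */
        node = node->child[p - node->chars];       /* follow new branch */
        key++;
    }
    return node->value;
}

/* tiny demo tree with two hypothetical keys: "ab" -> 1, "ac" -> 2 */
static MwNode leafb = { "", NULL, 1 };
static MwNode leafc = { "", NULL, 2 };
static MwNode *a_kids[] = { &leafb, &leafc };
static MwNode node_a = { "bc", a_kids, -1 };
static MwNode *root_kids[] = { &node_a };
static MwNode mw_root = { "a", root_kids, -1 };
```

The search itself is lookahead-free: one character consumed per node, no buffering of the key.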

+ Yes, I know hash tables are another alternative. But they take up way too much space (the hash table needs to be sparse for optimal performance anyway) and are, after all, basically indexed linked lists. I'm not considering them.

Any comments, suggestions, etc. are welcome.

nikolatesla20
October 8th, 2005, 10:41
Well, I just have to say it's always a space-vs-speed tradeoff in programming. A hash table would be the fastest, in my opinion. A b-tree might be the "smallest". But probably the double link is what you need to get a mix of both.

I just don't know if a b-tree could be applied to this problem in any "useful" manner: as you scan the token stream you get arbitrary tokens in no particular order, so how can you "order" them into a b-tree? The basic idea of a b-tree is that the data is "somewhat" ordered while the tree is built, which is what makes the search faster. But I don't know what you would use to order plain ASCII strings in a tree format in a way that makes it faster (trees like numbers). A tree could of course be used to order instructions before output (a parse tree), but I don't think it can really be used for symbol-table searches and be any better than a basic linked list. That's why hash tables were invented.

-nt20

LLXX
October 10th, 2005, 17:17
We've decided to abandon hash tables completely. This is going to be a tiny assembler, one that will not create files over 65535 bytes. Because of this, hash tables take up too much space and the performance gains of such an implementation do not outweigh the increased complexity and memory usage.

The choice is between a double doubly-linked list and a binary tree. The most critical factor in the decision is the searching algorithm. Due to the overall design of the assembler, we cannot read the entire string from the input, store it in a buffer, and CMPSB it against the existing entries. The search must start with the first character of the symbol and end with the last. For a linked list, such an algorithm is simple. Initialise two pointers, one pointing at the lowest entry and the other at the highest. As each character is read in (and after that character is checked for special conditions like whitespace, end-of-record, EOF...), the pointers are moved toward each other: the Low pointer moves upward until it finds the required character in the existing entries (or else the search fails), and the High pointer moves downward until it also finds the required character. The two pointers then enclose a sublist that matches the current prefix. If the pointers have converged to a single entry, the search degenerates to a simple character-by-character comparison. After both pointers have been set to the correct positions, the search depth, which determines the character index of each string to compare, is incremented. The next character is read in, and the process of checking, moving pointers, and incrementing the search depth is repeated. Only when the input character is a null (translated from whitespace, EOF, or end-of-record by the interface routine) and the pointers have converged is the search finished.
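The converging Low/High search just described can be sketched in C. As an assumption to keep the sketch short, a sorted array of ASCIIZ strings stands in for the doubly-linked list (the pointer movements are the same, only expressed as indices), and indexing into the input stands in for successive GetcharS() reads:

```c
#include <assert.h>

/* Narrow [lo, hi] to the entries matching the prefix read so far,
   one input character at a time; no input buffer is ever needed. */
int two_ptr_find(const char **syms, int n, const char *input) {
    int lo = 0, hi = n - 1, sd = 0;      /* sd = search depth */
    for (;;) {
        char c = input[sd];              /* next character of the symbol */
        /* move Low up until its entry has c at this depth */
        while (lo <= hi && syms[lo][sd] != c) lo++;
        /* move High down likewise */
        while (hi >= lo && syms[hi][sd] != c) hi--;
        if (lo > hi) return -1;          /* range emptied: search fails */
        if (c == '\0') return lo;        /* terminator reached on a match */
        sd++;
    }
}
```

Because the list is sorted, the entries sharing the current prefix are contiguous, so the two moving endpoints are all the state the search needs.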

The question is, does a similar algorithm of successively decreasing ranges exist for searching a binary tree? I've thought about using the same two-pointers concept, but a tree is not linear; there is a right branch and a left branch, and a pointer has to be kept for both of them. Moving the pointers also seems difficult (I've also considered a doubly-linked binary tree, but it doesn't help much). I know that one can traverse the tree in a specific order to obtain a linear sorted list, but having to do that for each search is extremely inefficient. Is there anything else I have missed?

Finally, this assembler has no "parser", by definition. It processes variable-length "records" in the source file linearly, one at a time. This processing is largely composed of comparisons and jumps on the input character. It does two passes - the first, to collect symbols and determine instruction lengths; and the second, which actually assembles the file.

Maximus
October 10th, 2005, 17:27
Just use a pre-balanced alpha tree. It is damn fast, works well with strings, and it is elegant.
(Just in case: you store chars in the tree and test against them. Be sure to set up the node siblings correctly. Past a certain depth, allow full string comparison - if really needed.
Also, flatten the tree into a vector, so you can use indexes, saving some bytes.)

LLXX
October 10th, 2005, 23:05
Quote:
[Originally Posted by Maximus]Just use a pre-balanced alpha tree. It is damn fast, works well with strings, and it is elegant.
(Just in case: you store chars in the tree and test against them. Be sure to set up the node siblings correctly. Past a certain depth, allow full string comparison - if really needed.
Also, flatten the tree into a vector, so you can use indexes, saving some bytes.)

You need not reiterate. I know one of the choices is a tree, and it definitely doesn't seem to work well with strings. I've found many, many examples of algorithms for storing and searching for strings in trees on the Internet, but none of them meet my requirements. All the implementations I've found so far simply replace the single comparison with the node's value with a strcmp or similar, i.e. the entire string to be searched for must be read in and then kept in a buffer... of fixed size. This is exactly the opposite of what I require - namely, no maximum limit on the length of the string to be searched for, up to the limit of the symbol table itself (65535 bytes). None of the source code on the Internet can do this. Most of the example code just allocates a buffer of fixed length and doesn't even address the issue at all - resulting in the possibility of buffer overflow when used in real code.

I'll go experiment with the trees to see if it can really be done with two pointers.

-- Edit --
Based on in-order traversal, I managed to write this:
Code:
/* lookahead-free recursion-free algo for
lasm symbol table: binary tree impl.
in pseudo-C

* 2005 09 10 v1.0 llxx */

int FindSym() {
char c;
int pLo,pHi,sd;
pLo = pHi = &SymRoot;
sd=0;
/* push pointers to both ends */
while(pLo.l!=NULL) pLo=pLo.l;
while(pHi.r!=NULL) pHi=pHi.r;
/* main loop */
for(;;) {
c=GetcharS(); /* translates end-of-symbol chars to 0 */
/* seek left pointer */
while(pLo.c[sd]!=c) {
if(pLo==pHi) return 65535; /* pointers collided with nothing found */
pLo=pLo.p; /* go to parent node */
if(pLo.c[sd]==c) break; /* found it! */
if(pLo.r==NULL) continue; /* no right node */
pLo=pLo.r; /* go to right node */
while(pLo.l!=NULL) pLo=pLo.l; /* force pointer down left subtree */
}
/* move right pointer */
while(pHi.c[sd]!=c) {
if(pHi==pLo) return 65535; /* pointers collided with nothing found */
pHi=pHi.p; /* go to parent node */
if(pHi.c[sd]==c) break; /* found it! */
if(pHi.l==NULL) continue; /* no left node */
pHi=pHi.l; /* go to left node */
while(pHi.r!=NULL) pHi=pHi.r; /* force pointer down right subtree */
}
/* terminate search or continue */
if(c==0) return pLo.v; /* match successful to end terminator;
pHi.v is the same, since pHi==pLo */
sd++; /* increment search depth */
}
}

It's not completely C, but it should be understandable enough... (I don't like my variables typed; chars, ints and longs are good enough). This is the format of each tree node:

Code:

typedef struct {
int l;
int r;
int p;
int v;
char c[]; /* actually the ASCIIZ name, stored right in the node */
} STtreeNode;

The question is, will this algorithm work at all? Will it be more efficient (when implemented in Asm) or be too slow?

Maximus
October 11th, 2005, 07:48
mmh...
If you make an alpha char tree, like a %huff tree, you:
1) fetch a char and exit on EOS
2) step the tree from the current node (char) to the next one (next char)
3) max depth? goto 5. Wrong sym? exit.
4) goto 1
5) take the final leaf ptr and do an ASCIIZ comparison with the remaining part of all the strings that start with the same chars, stored there. (The depth is chosen to reduce this to a minimum.)

I don't see how you are limited this way by:
1) "strcmp or similar" - except for lengthy, non-unique symbols past a certain depth.
2) "no maximal limit of the length of string to be searched" - same as above, since you scan the symbols in the final leaf sequentially, if there is more than one.
3) "buffer of fixed length" - an alpha tree is a parser tree: no overflow; its maximal size is pre-determined (not true for the final leaves).

Maybe I misunderstood the problem, but you can have the pre-built tree and then insert stuff dynamically, just by fetching your symbol char by char and then managing the leaf insertion. Also note that each non-max-depth leaf can have at most 1 symbol inserted there, and you need not store its ASCIIZ, since it comes from descending the tree.
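As I read the scheme above, it might be sketched in C like this: walk one node per character up to a maximum depth, then fall back to comparing only the string tails collected in the final leaf. Everything here is my own illustrative assumption - the names, the 26-letter lowercase alphabet, and a MAX_DEPTH of 2 to keep the demo tiny:

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 2  /* demo value; a real table would go deeper */

typedef struct Leaf  { const char *key; int value; struct Leaf *next; } Leaf;
typedef struct ANode { struct ANode *next[26]; Leaf *leaf; int value; } ANode;

int alpha_find(const ANode *t, const char *key) {
    size_t m = strlen(key);
    size_t walk = m < MAX_DEPTH ? m : MAX_DEPTH;
    for (size_t d = 0; d < walk; d++) {   /* traverse: one node per char */
        t = t->next[key[d] - 'a'];
        if (!t) return -1;
    }
    if (m <= MAX_DEPTH) return t->value;  /* fully resolved by the walk */
    for (const Leaf *l = t->leaf; l; l = l->next)   /* leaf fallback: */
        if (strcmp(l->key + MAX_DEPTH, key + MAX_DEPTH) == 0)
            return l->value;              /* compare only the tails */
    return -1;
}

/* demo: "max" -> 1 and "may" -> 2 share the walked prefix "ma";
   the tails "x"/"y" are resolved by strcmp in the final leaf */
static Leaf leaf_may = { "may", 2, NULL };
static Leaf leaf_max = { "max", 1, &leaf_may };
static ANode node_ma    = { .leaf = &leaf_max, .value = -1 };
static ANode node_m     = { .next = { &node_ma }, .value = -1 };        /* 'a' = 0 */
static ANode alpha_root = { .next = { [12] = &node_m }, .value = -1 };  /* 'm' = 12 */
```

The walked prefix never needs to be stored in the leaf, matching the point about symbols being implicit in the tree-walking.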
--------------
As an alternative, you could use skiplists instead of dlists. They may be fast enough to make a list approach viable.

nikolatesla20
October 11th, 2005, 09:22
Maximus, that's an interesting technique. Overall it looks like a state machine, though - albeit one that gets built dynamically at run time. As I've stated before, the only advantage of trees is that they are "pre-sorted", and that's really what makes them faster.

So do you mean to read in the symbol and build a tree based off its characters? So one ASCII char per leaf, then?

-nt20

doug
October 11th, 2005, 10:26
No one mentioned it yet, but maybe you want to investigate the 'trie' (http://en.wikipedia.org/wiki/Trie). This data structure was specifically designed for strings and offers very good compression.

Quote:
There are three main advantages of tries over binary search trees (BSTs):

* Looking up keys is faster. Looking up a key of length m takes worst case O(m) time; c.f. a BST, where O(m log n) time is required because initial characters are examined repeatedly during multiple comparisons. Also, the simple operations tries use during lookup, such as array indexing using a character, are fast on real machines.
* Tries require less space. Because the keys are not stored explicitly, only an amortized constant amount of space is needed to store each key.
* Tries make longest-prefix matching, where we wish to find the key sharing the longest possible prefix with a given key, efficient. They also allow one to associate a value with an entire group of keys that have a common prefix.
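To make the quoted O(m) lookup concrete, here is a minimal dynamic trie in C. The 26-letter lowercase alphabet and all names are illustrative assumptions; a symbol-table trie for an assembler would use a wider alphabet and a more compact node layout:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Trie {
    struct Trie *next[26]; /* one branch per letter a-z */
    int value;             /* -1 = no key ends at this node */
} Trie;

Trie *trie_new(void) {
    Trie *t = calloc(1, sizeof *t); /* all branches start NULL */
    t->value = -1;
    return t;
}

void trie_insert(Trie *t, const char *key, int value) {
    for (; *key; key++) {
        int i = *key - 'a';
        if (!t->next[i]) t->next[i] = trie_new();
        t = t->next[i];
    }
    t->value = value;
}

int trie_find(const Trie *t, const char *key) {
    for (; *key; key++) {          /* O(m): one step per character */
        t = t->next[*key - 'a'];
        if (!t) return -1;
    }
    return t->value;
}
```

Note that the keys themselves are never stored: each key is implicit in the path from the root, which is where the quoted space savings come from.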

Regards
~doug

Maximus
October 11th, 2005, 10:52
The advantage of an alpha tree is that you perform the comparison by implicitly traversing the tree, using the symbol's chars as splitters (a 1-D lattice). If you have 1 string in the final leaf (or you stop before max depth), you need not string-compare at all.
So there is no search at all: you just traverse the tree using the symbol as a valid traversal key, and you land where you need to (or don't land at all, so you insert).
Traversal order is a valid sorting key, and the string is then implicitly a valid traversal key.
mmh... complexity for length m depends on the tree implementation. However, the average number of *char* comparisons is ln(m)[*C] if you stay within the depth, or you end up in a single final leaf.
---------------
@nikolatesla: mmh... it doesn't look to me like a DFA of the kind LL(1) parsers or yacc use. But maybe I'm wrong. The implementation is highly dependent on design constraints. When I used it, I built it over a pre-balanced binary tree, but that is not the only choice (I used it to speed up zillions of string comparisons over a dictionary). If the parser must run in 640K VM86, then many things should be taken into account.

nikolatesla20
October 11th, 2005, 11:27
Thanks for the explanation. I still think this actually resembles a state machine, which of course would be the most blazing fast way, but of course setting up the machine in the first place is the lengthy part. As the assembler makes its first pass it would have to grab the symbol and then put it into the tree dynamically. So I don't know if this is going to be a speed issue or not, especially since before a new symbol is added to the tree, one has to search the tree first to see if it already exists. And I don't understand how you then map the match to the symbol's value - where is the symbol's value stored in the tree, if you are using this method?

I'm not a compiler guru though, so I'm sure there are concepts still foreign to me (I've only built a few simple parsers and lexers over the years, including some using lex and yacc).

-nt20

dELTA
October 11th, 2005, 12:23
Quote:
No one mentionned it yet, but maybe you want to investigate the 'trie'. (http://en.wikipedia.org/wiki/Trie). This datastructure was specifically designed for strings and offers very good compression.
Hehe, actually, I did indeed mention it as the second reply to this thread, but then JMI nuked the board database and I didn't have time to re-post it.

Maximus
October 11th, 2005, 13:04
take "maximus". It's lengthy, let's say we have a depth 4: (hey, spaces gets eat on post?! '-' is space then, sorry)
--M
--+...+
------A
-+...+
-X
-+...+
------I
------[1+*'mus',2+*'mum']
key travel for 'maximus' = 'maxi1', or 'maxi' + a fast linear search on use. You can take a bit-key of traversal order + last index, or just take the node address, or add an id to each node, or make an implicit id in other ways, etc.
# insertion is not a great speed problem, in my opinion. It's the balancing that is (think of it as a 1-D BSP tree).
# you need not store symbols shorter than maxdepth at all, as they are implicit in the tree-walking, as you can see. Each node could contain a (d)word ref, if an implicit one is not used.
# the final list can be managed in many ways (hash, dlist, btree).
# non-binary trees could be used effectively too (quadtrees, octrees, a dynamic n-tree based on a [char/ptr] table, etc.) but, as you know, they can be reduced to a btree. Just think of the file system: imagine you have \a \b \c and \a\b etc.

The true problem is tree balance over unknown data. Degenerate trees would slow down performance, as with btrees - but not in the same horrid way.
As with BSP, we must choose an efficient tree splitter: one could do a %huff analysis of symbols on the first compile, then use it as a node rule for the next compile. You would then insert 'empty' nodes to maintain a balanced tree, avoiding degenerate structures when you insert nodes on the fly, and requiring no "second pass". I would use a static one and cross my fingers (yeah, I am laaaazy).
----------
mmh... the trie approach seems very similar to the alpha approach above: the trie uses groups of chars instead of single ones.

JMI
October 11th, 2005, 15:03
Oh sure, blame the Old Guy who's doing all the work. It's always his fault.

I point out, for the historical record, that dELTA did return and post several "other" posts before he noticed that this one was missing, even though he posted in the thread where I advised that the posts of October 9th had been lost, so I believe we have to blame dELTA himself for "forgetting" to re-post the link earlier.

nanner, nanner.

Regards,

SiGiNT
October 11th, 2005, 15:31
Hey! He's putting the blame where it belongs. I posted something that would have gotten me a physics or Nobel or some other kind of prize also!!! And now I can't remember!! It was one of those fleeting "inspirations".

GEESH!

SiGiNT

(all of us do our best work after its destroyed!)

dELTA
October 11th, 2005, 15:54
Hehe, that's right sigint33, let him know what's what.

And I said that I "didn't have time", which absolutely does not imply that I forgot about it even though I posted in other threads, but rather that I didn't find it important enough to re-post.

JMI
October 11th, 2005, 17:20
sigint33:

Great News. The October 9th back-up still exists, I just couldn't get it to install. Therefore your "prize winning" contribution is still there in an .sql file that can be read by any program which will read .sql files. I will immediately search all your offerings and send them off to dELTA, who, undoubtedly, by his own estimation, is the appropriate party to award those Nobel prizes anyway.

And don't you just love the way dELTA's story changes when you call him on it. First he says, "I didn't have time to re-post it", and when I pointed out that he posted several posts after the incident, he comes back with the lame "I didn't find it important enough to re-post." If it wasn't "important enough to re-post," why complain about it getting nuked in the first place? Huh! Huh!

When dELTA sends you the check, count the Miljoner Kronor and make sure he calculated the prize correctly.

Regards,

SiGiNT
October 11th, 2005, 17:47
I'll start checking my mail ASAP - maybe now I can buy some real software!!

SiGiNT

LLXX
October 11th, 2005, 18:35
Hmm... this thread has grown quite a bit.
Quote:
[Originally Posted by nikolatesla20]Thanks for the explanation. I still think this actually resembles a state machine, which of course would be the most blazing fast way but of course setting up the machine in the first place is the lengthy part. As the assembler makes its first pass it would have to grab the symbol and then put it into the tree dynamically. So I don't know if this is going to be a speed issue or not - especially since before a new symbol is added to the tree, one has to compare against the tree first to see if it already exists..and then I don't understand how you then map the match to the symbol's value - where is the symbol's value stored in the tree? - if you are using this method.

That's what we call a "multi-way tree". We're not familiar with the "proper terms" for data structures, since I am not a native English speaker and have not formally studied such structures. I believe a "trie" is the same thing: set up arrays of pointers to arrays of pointers, with accompanying lists of characters, then traverse them with SCASB and simple pointer operations. That is the exact method we've decided on for the instruction table, because it will be searched many, many times during operation. However, it would be extremely slow and inefficient to build dynamically, requiring very many passes (once to fill the root array with all possible first chars of symbols, again to fill the first entry's array in the root with its symbols, etc.) and running in exponential time. (We cannot allocate fixed root and higher-level arrays, since that is a big waste of space.)

Maximus' solution looks like a hybrid of a trie and a linked list to me. I'll use his example of a maximal depth of 4 to calculate how much storage would be needed for even such a small depth. First, the root array: it needs around 256 words (more like 240, due to characters that will never occur in a symbol, but I round up to simplify the calculation), or 512 bytes of storage. There are 256 secondary arrays, also of 512 bytes each. The calculation proceeds:

1 x 512 (root array)
256 x 512 (second-level array)
256 x 256 x 512 (third-level array)
256 x 256 x 256 x 512 (fourth-level array)

For a total of approximately
512 + 131072 + 33554432 + 8589934592 = 8623620608 bytes, i.e. about 8 GB!

Don't forget that even more uninitialised memory would be needed to store the linked lists that hang off the tree. That is a bit much for a little x86 assembler, don't you think? And 99%+ of those pointers would be completely empty.
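The totals above can be checked mechanically. This small sketch just sums the 512 bytes per pointer array over the levels of the naive 256-way layout described:

```c
#include <assert.h>

/* Bytes needed for a naive 256-way pointer tree of the given depth:
   each level has 256x the arrays of the previous one, 512 bytes each. */
unsigned long long naive_tree_bytes(int levels) {
    unsigned long long total = 0, arrays = 1;
    for (int d = 0; d < levels; d++) {
        total += arrays * 512ULL; /* arrays at this level, 512 bytes each */
        arrays *= 256;            /* each array spawns 256 children */
    }
    return total;
}
```

For four levels this gives 8623620608 bytes, agreeing with the sum in the post.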

For this reason, symbols stored within the symbol-table tree will be complete ASCIIZ strings in each node. Also, I've decided *not* to search for an existing symbol when inserting a new one, and just to exit if such a duplicate turns up in the process of insertion (our assembler won't need much in the way of error handling, since we don't want it to keep reading the rest of the file after it's already found an error).

I've done some more research, and found that if I could implement a non-recursive, non-stack-based algorithm for doing an inorder traversal of the tree, it would be possible to treat the tree like a doubly-linked list, and thus I could use the two-pointers algorithm. The sample code I posted earlier was partially correct; however, I neglected to enter the right subnode during the search. Here is corrected pseudocode for an "incremental" inorder traversal (i.e. given a node, seek the next one that would be visited in an inorder traversal).

Code:

if(node.r!=NULL) { /* visit right subnode, force down as far left as possible */
node = node.r; /* enter right subtree */
while(node.l!=NULL) node = node.l; /* force all the way left */
break; /* done, that is the next node */
}
/* otherwise climb: skip every parent reached from its right side,
since it has already been visited */
while(node.p.r==node) node = node.p;
node = node.p; /* the first parent reached from the left is next */

Strangely enough, I haven't found a single algorithm on the Internet that works like this.
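For reference, the incremental inorder-successor idea translates into runnable C along these lines (parent-linked nodes; the struct and names are illustrative). The climb back up is a loop, since several parents may be left behind when coming up from the right:

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    struct Node *l, *r, *p; /* left, right, parent */
    int v;                  /* symbol's value */
} Node;

/* Next node in inorder sequence; NULL once past the maximum. */
Node *inorder_next(Node *n) {
    if (n->r) {                  /* right subtree exists: */
        n = n->r;                /* enter it, */
        while (n->l) n = n->l;   /* then go as far left as possible */
        return n;
    }
    while (n->p && n->p->r == n) /* climb while arriving from the right */
        n = n->p;
    return n->p;                 /* first parent reached from the left */
}
```

Repeatedly calling this walks the tree in sorted order using no stack and no auxiliary storage, which is what makes the two-pointers scheme applicable.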

doug
October 11th, 2005, 20:18
Your computation for space is incorrect. You are assuming a very naive implementation.
Also tries do not need pointers. It is possible to make a pointerless trie.

google for "pointerless trie". 3rd link from top: http://www.cs.mcgill.ca/~tim/tries/trieNotes.ps is very informative.

Disclaimer:
- not easy to implement.

You seem to be looking for both best-in-class space & time complexity for both insert & search. I doubt it is possible.

naides
October 11th, 2005, 22:04
LLXX, I have a question stemming from my sheer ignorance and naïveté in this issue, so I apologize in advance if my question is indeed way too stupid:
You are very concerned with the size of the code, the efficiency of the search algo in the symbol table, etc., but you are being overly liberal with THE LENGTH of the symbols. And this no-limit policy is at the center of your problem. Why is it so important that the symbols be of unbounded length, if you want to keep your compiler simple and small?

LLXX
October 12th, 2005, 02:23
Quote:
[Originally Posted by doug]Your computation for space is incorrect. You are assuming a very naive implementation.
Also tries do not need pointers. It is possible to make a pointerless trie.

google for "pointerless trie". 3rd link from top: http://www.cs.mcgill.ca/~tim/tries/trieNotes.ps is very informative.

Disclaimer:
- not easy to implement.

You seem to be looking for both best-in-class space & time complexity for both insert & search. I doubt it is possible.

Very informative indeed, but a bit too thick on the theoretical side. Definitely way too complex to implement. Thanks for the link anyway.

Quote:
[Originally Posted by naides]LLXX, I have a question stemming from my sheer ignorance and naïveté in this issue, so I apologize in advance if my question is indeed way too stupid:
You are very concerned with the size of the code, the efficiency of the search algo in the symbol table, etc., but you are being overly liberal with THE LENGTH of the symbols. And this no-limit policy is at the center of your problem. Why is it so important that the symbols be of unbounded length, if you want to keep your compiler simple and small?

In short, because someone with the initials E.J.I. has written an assembler that does almost that, in a little 32Kb COM file.

He also claims his assembler "assembles at a rate of over one hundred thousand lines per second" on a Pentium II 400, i.e. approximately 4000 cycles per line. It also has a symbol table that accommodates "approximately 3000 10-letter symbols", suggesting a 64k symbol table not unlike what we're planning. Line lengths are essentially unlimited except by the source file's length. However, it "recognizes up to 127 letters in a symbol". The author also claims his assembler is "the best assembler available". Hmm...

His assembler is unfortunately commercial closed-source software, yours for $80. The website has an interesting "online ordering system" which, according to the text, gives you a link to download the full version after you've entered your information. (No, we're not planning to reverse that; not yet, at least...) In addition, his assembler has an interesting quirk: it uses what the author claims to be an "invisible fingerprint" so he can essentially tell if you produced a program using it. Not so invisible after all, actually: for certain combinations of two registers, it just flips the fields in the ModRM byte and uses the alternate form of the instruction (rw, uw instead of uw, rw). Alternate forms of other instructions are used as well.

What we're trying to do, essentially, is "dethrone" him with our assembler, which will be 100% open-source freeware. To make that happen, we'll have to make a better assembler, of course. The goal is to make it run at double the speed of his assembler (2000 cycles per line) and have a smaller executable as well, maybe somewhere in the < 10K range. It will also accommodate lines whose length is limited only by the source file's own size, and symbols of nearly unlimited length too. Finally, it should be able to assemble itself. We will welcome the creation of other assemblers and variants based on our code.

To do this, we're using a completely different programming paradigm - no structured programming, no data or code abstraction, no OOP. This is 100% pure Asm, designed using flowcharting (we call it the "linear" method). It will essentially behave like a giant state machine that switches on each character input from the source file. This is going to be an assembler that represents the true spirit of the Asm language: make the code the most efficient possible, and don't settle for compromises.

Once it's complete I might release it here as well.

--------
The others in our group and I have discussed prefix-searching a binary tree using limit pointers, and we've found that it is possible, as long as a non-recursive and non-stack-based inorder-traversal algorithm can be created. Needless to say, all the non-recursive inorder-traversal implementations we found on the Internet needed auxiliary storage in the form of a stack or a list. We proceeded to design our own, and came up with this pseudocode fragment, which requires no previous information - only a pointer to a node - and changes the pointer to point to the "next" node in the tree. All nodes in the tree are doubly linked, with a p member that points to the parent. (Google shows no results for "doubly-linked binary tree"+"non-recursive", however.)

if node has right subtree {
enter right subtree;
follow left pointers down until there is no more left subtree;
done; }
while node's parent's right pointer equals node {
node = node's parent; /* go up the tree while arriving from the right */
} /* coming up from the right means the parent was already visited */
node = node's parent;

By reversing the lefts and rights, it's possible to perform the reverse operation - finding the previous node in an inorder traversal.
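That mirrored routine can be written out the same way; a small C sketch of the inorder predecessor, with lefts and rights swapped (struct and names illustrative, as before):

```c
#include <assert.h>
#include <stddef.h>

typedef struct PNode {
    struct PNode *l, *r, *p; /* left, right, parent */
    int v;
} PNode;

/* Previous node in inorder sequence; NULL once before the minimum. */
PNode *inorder_prev(PNode *n) {
    if (n->l) {                  /* left subtree exists: */
        n = n->l;                /* enter it, */
        while (n->r) n = n->r;   /* then go as far right as possible */
        return n;
    }
    while (n->p && n->p->l == n) /* climb while arriving from the left */
        n = n->p;
    return n->p;                 /* first parent reached from the right */
}
```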

Time to implement this into actual code and test it, I guess...

Maximus
October 12th, 2005, 07:33
mmh... read my posts with more attention - your analysis is wrong. You need to pre-allocate only the high-percentage nodes to keep the structure balanced, and you get them from a %huff tree or from an initial static analysis. Apart from the fact that you assume a ~256-char alphabet(!), you need at most 4-16 pre-allocated nodes per level to obtain a well-balanced btree. Also, you can just use indexes instead of pointers, like free-lists. You need not de-allocate elements dynamically, so all you need is: int first_free=0; Alloc(){return first_free++;} which is fine as long as you pre-allocate a segment for it.
Alpha trees, however, do not perform well when many words share the same initials. They are meant for dictionary-like strings.
-----
I am very curious to know the solution you will choose to implement for your problem.
-------------
Don't know how good it might be, but I hope it can help:
(I wrote it quickly, so I did not take many small things into account, and it might not be perfect)
Code:

tree node: (8 bytes)
char c;
char flags;
int child, left, right;
A/slist: (4 bytes)
char * asciiz;
int slist_next;
B/slist: (6 bytes)
int quickhash;
char * asciiz;
int slist_next;


int free_wflist = 1;
alloc_treenode { int i = free_wflist; free_wflist += 4; return i; }
alloc_listnode { int i = free_wflist; free_wflist += A/2 or B/3; return i; }

process_sym(ch) {
    loop { /* three-way walk: compare ch[depth] against node[index].c */
        if (depth > max_depth || ch[depth] == 0) break;
        if (ch[depth] > node[index].c) {
            if (right == 0) { insert_at(&right, ch[depth]); }
            index = right;
            continue; }
        else if (ch[depth] < node[index].c) {
            if (left == 0) { insert_at(&left, ch[depth]); }
            index = left;
            continue; }
        else {
            ++depth;
            if (child == 0) { insert_at(&child, ch[depth]); }
            index = child;
        }
    }
    // index is a slist pointer here in any case, due to insert_at
    slist_iter = index;
    for each element in slist {
        if (slist_iter == 0) { slist_iter = insert_in_slist(); break; }
        if (length(asciiz) > 32) { push &quickhash; }
        if (slist.quickhash != 0 && slist.quickhash == eval_once_qcheck(ch)) { found }
        else { compare strings and check the result }
        slist_iter = slist_next;
    }
    if (length(asciiz) > 32) { pop all quickhash pointers }
    return the found item, inserting it if needed.
}
insert_at() {
    right = alloc_treenode; // node memory could be already zeroed, or fill it here
    node[right].c = ch[depth];
}
insert_in_slist {
    old->slist_next = alloc_listnode; // node memory could be already zeroed, or fill it here
    if (length(asciiz) > 32) { // manage quick-hashes for long strings, which reduce slowdown on longer, similar syms
        slist[quickhash] = evaluated_once_quickhash;
        for every pushed quickhash { if (pushed quickhash == slist[quickhash]) { slist[quickhash] = 0; pushed quickhash = 0; } }
    }
    set asciiz ptr;
    return old->slist_next;
}


It is essential that the most frequently used alpha nodes are created before parsing, following the percentages of a Huffman tree or a static analysis. Then you can make your compiler 'tune up' to each user's style by generating the %huff during compilations, so as to speed up subsequent compilations.
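The structure sketched above, one character per node with left/right siblings and a child link, is essentially a ternary search trie. A compact C version of the core walk, using heap allocation instead of the index pool for brevity (all names here are illustrative, not from the post):

```c
#include <stdlib.h>

/* One character per node; left/right are siblings at the same depth,
   child descends to the next character of the symbol. */
typedef struct tst_node {
    char c;
    int has_value;            /* flags: does a symbol end on this node? */
    int value;                /* the symbol's word-sized value */
    struct tst_node *child, *left, *right;
} tst_node;

static tst_node *tst_new(char c) {
    tst_node *n = calloc(1, sizeof *n);  /* zeroed, so links start null */
    n->c = c;
    return n;
}

/* Insert key with value; the key is walked in place, no length limit. */
tst_node *tst_put(tst_node *n, const char *key, int value) {
    if (!n) n = tst_new(*key);
    if (*key < n->c)      n->left  = tst_put(n->left,  key,     value);
    else if (*key > n->c) n->right = tst_put(n->right, key,     value);
    else if (key[1])      n->child = tst_put(n->child, key + 1, value);
    else { n->has_value = 1; n->value = value; }
    return n;
}

/* Return nonzero and fill *out if key is present. */
int tst_get(const tst_node *n, const char *key, int *out) {
    while (n) {
        if (*key < n->c)      n = n->left;
        else if (*key > n->c) n = n->right;
        else if (key[1])      { n = n->child; key++; }
        else {
            if (n->has_value) { *out = n->value; return 1; }
            return 0;
        }
    }
    return 0;
}
```

Note how the strcmp-per-node of an ordinary binary tree is replaced by one character comparison per node, which is what avoids both the fixed-length buffer and the repeated full-string compares discussed earlier in the thread.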

nikolatesla20
October 12th, 2005, 09:25
I know which assembly guy you are talking about - I've used his assembler before in the past. Keep up the good work!

-nt20

laola
October 13th, 2005, 22:06
@LLXX:

Don't fall for that marketing yadda yadda too much. Sure it can assemble at 4000 cycles per line, but it is pretty obvious that this cannot hold under all circumstances, like symbols with 500 chars and symbol tables with 30K entries etc. They don't even mention whether this number is for the overall performance or only for the second pass (after all the time-consuming parsing in pass one has been done already).
These raw peak numbers mean exactly nothing without a context. It's like stating a compression ratio of 99.9% for a packer, which is absolutely possible - and valid for most sophisticated packers - if you choose appropriate input. But of course it does not mean that every file can be compressed this way. Still, it makes perfect marketing "FUD"

BTW, I am sure that even if you beat every single statement, they will just make up new ones

P.S. I

@nikolatesla20:

If you have used this assembler in the past, maybe you can shed some light on the performance. Is it really that good?

LLXX
October 15th, 2005, 02:15
An assembler with a minimum of 4000 cycles per line would actually be quite slow, since it would spend that long even on a source file full of nothing but blank lines (0d 0a). I believe that number is an average. Just to get an estimate of the maximum speed of the assembler we're competing against, I created a 4MB source file, initially consisting of just blank lines (thus there were 2 million lines in the file, and I had to create it with a hex editor). I then dispersed the few lines of the traditional "Hello world!" program within that file - a "mov ah, 9", a "mov dx, offset Hello", an "int 21h", an "int 20h", and a "hello db 'Hello world!$'" - all separated by several hundred thousand linefeeds. I know this is a very contrived example, but it is nonetheless useful for seeing how fast "null records" are processed. The results were actually quite surprising, as you'll find out if you continue reading.

Unfortunately, it took less than a second for the assembler to read and assemble the whole file, although that was to be expected on a 4.17GHz CPU and with the file cached. My solution was to have the assembler run many, many times, and time the total interval. I will for the moment ignore the overhead of the OS kernel in loading and executing the assembler. The small program I created runs the assembler 253,802 times, in fact (a random number I just picked for no reason). I redirected its output to the nul device to avoid further delays related to file writing (I couldn't care less about the contents of the output file - the assembler works correctly 100% of the time). The test was done on the same 4.17GHz processor, under pure MS-DOS 7.0 in real mode, with the files being read and written to a RAMDISK to further reduce the effects of hard disk speed. I started the program and waited... and waited... went to do something else... and waited... went to do something else... and waited...

Apparently that was a bad thing to do. Some calculation reveals that if it was indeed operating at 4000 clocks per line, it would've taken five and a half days to assemble the total. I terminated the test and reduced the iterations from 253,802 down to 20,000. The result was extremely surprising - 40,000,000,000 total lines in exactly 10 seconds. That is impossible: approximately 1 clock per line, even less if you count the overhead of the OS's Exec code running 20,000 times. I know with absolute certainty that no assembler in existence can do that. WTF exactly is going on here? Once I acquire an extinct 286 or similar I'll do tests with better accuracy. A 4.17GHz processor is too fast to get any decent figures from.
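For anyone who wants to check the arithmetic, here is a small C sketch of the two estimates above; all numbers come from the test as described (20,000 runs over a 2-million-line file in 10 seconds on a 4.17GHz CPU, versus 253,802 runs at the claimed 4000 clocks per line):

```c
/* Clocks spent per source line, given clock speed, elapsed time,
   and the total number of lines processed across all runs. */
double cycles_per_line(double ghz, double seconds, double total_lines) {
    return ghz * 1e9 * seconds / total_lines;
}

/* Estimated wall time in days for a run at a given cost per line. */
double days_at(double runs, double lines_per_run, double clocks_per_line,
               double ghz) {
    return runs * lines_per_run * clocks_per_line / (ghz * 1e9) / 86400.0;
}
/* cycles_per_line(4.17, 10.0, 20000.0 * 2e6)  ->  ~1.04 clocks per line    */
/* days_at(253802.0, 2e6, 4000.0, 4.17)        ->  ~5.6 days, as estimated  */
```

So at face value the competing assembler would be disposing of each blank line in about one clock, which is what makes the claimed figure so suspicious.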

From the design of our assembler so far, it'll probably end up with an estimated minimum of 100 clocks per "record", and no maximum.

dELTA
October 15th, 2005, 05:37
Look at it from the bright side: it really shouldn't be hard to trace the parse loop to see if it uses some special trick to quickly skip empty lines. Have you done that at all?

The good thing in general with people writing extremely small and fast programs is that it's usually extremely easy to reverse them.