View Full Version : my small question.


BanMe
January 13th, 2011, 22:44
What is the best way to call the functions listed below to determine what kind of data I have: a string, decimal digits (0-9), or possibly hex (digits plus a-f)? Would you choose to recognize strings first, or hex?

Using only these functions:
Code:

int isalnum(int c) -- True if c is alphanumeric.
int isalpha(int c) -- True if c is a letter.
int isascii(int c) -- True if c is ASCII.
int iscntrl(int c) -- True if c is a control character.
int isdigit(int c) -- True if c is a decimal digit.
int isgraph(int c) -- True if c is a graphical character.
int islower(int c) -- True if c is a lowercase letter.
int isprint(int c) -- True if c is a printable character.
int ispunct(int c) -- True if c is a punctuation character.
int isspace(int c) -- True if c is a whitespace character.
int isupper(int c) -- True if c is an uppercase letter.
int isxdigit(int c) -- True if c is a hexadecimal digit.


For example: you have a table with x elements. Assume you know nothing about what the table holds.

In what order do you think calling the functions above would classify the most 'data' correctly?
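
One possible ordering, sketched here as an illustration rather than a definitive answer, is to test the narrowest class first: every decimal digit is also a hex digit, and every hex digit is also a printable character, so checking digits, then hex, then text avoids mislabelling. The enum `token_kind` and the function `classify_token` are hypothetical names made up for this sketch, and it assumes each table element is a NUL-terminated string in the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical categories for illustration; not from the original post. */
enum token_kind { TOKEN_EMPTY, TOKEN_DECIMAL, TOKEN_HEX, TOKEN_TEXT, TOKEN_BINARY };

/* Classify a NUL-terminated token, reporting the narrowest class that fits. */
enum token_kind classify_token(const char *s)
{
    int all_digit = 1, all_xdigit = 1, all_print = 1;
    size_t i;

    assert(s != NULL);
    if (s[0] == '\0')
        return TOKEN_EMPTY;

    for (i = 0; s[i] != '\0'; i++) {
        /* The ctype functions need a value representable as unsigned char. */
        unsigned char c = (unsigned char)s[i];
        if (!isdigit(c))  all_digit = 0;
        if (!isxdigit(c)) all_xdigit = 0;
        if (!isprint(c))  all_print = 0;
    }

    /* Narrowest class first: decimal is a subset of hex, hex a subset of text. */
    if (all_digit)  return TOKEN_DECIMAL;
    if (all_xdigit) return TOKEN_HEX;
    if (all_print)  return TOKEN_TEXT;
    return TOKEN_BINARY;
}
```

Note that a word such as "face" comes back as TOKEN_HEX with this ordering. The character tests alone cannot tell a hex number from a short word made of the letters a-f, so some surrounding context is still needed for those cases.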

Kayaker
January 14th, 2011, 01:03
OK, I'll bite. There might be some funky mathematical proof to come up with the most efficient scheme, but looking at the table at the link below I think one could derive a somewhat intuitive method:

http://www.cplusplus.com/reference/clibrary/cctype/


Maybe.. for the basic 128-character (7-bit) ASCII set, using iscntrl and isspace would eliminate everything from 0x00-0x20 and 0x7F. Everything left would fall into the isgraph category. Since isprint is just isgraph plus the space character (0x20), that would mean you wouldn't need to use isprint, for a start.

isgraph however is covered by isalnum and ispunct, so that means you could also omit using isgraph.

At the same time, isalnum = isalpha + isdigit, so you could omit isalnum from your algo.

And isalpha = isupper + islower, so you don't really need isalpha.
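
The identities above can be checked mechanically. The following is a minimal sketch, assuming the default "C" locale (other locales may classify characters differently); `count_identity_mismatches` is a hypothetical helper name, not a library function.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

/* Count the character codes in [lo, hi] that break any of the identities. */
int count_identity_mismatches(int lo, int hi)
{
    int c, mismatches = 0;

    /* ctype functions are only defined for unsigned char values (and EOF). */
    assert(lo >= 0 && hi <= UCHAR_MAX);

    for (c = lo; c <= hi; c++) {
        /* !! turns any nonzero "true" result into exactly 1 for comparison. */
        if (!!isprint(c) != (isgraph(c) || c == ' '))     mismatches++;
        if (!!isgraph(c) != (isalnum(c) || ispunct(c)))   mismatches++;
        if (!!isalnum(c) != (isalpha(c) || isdigit(c)))   mismatches++;
        if (!!isalpha(c) != (isupper(c) || islower(c)))   mismatches++;
    }
    return mismatches;
}
```

Calling `count_identity_mismatches(0, 0x7F)` covers the whole 7-bit ASCII range; a result of 0 means every identity held for every code.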

Etc. Distill it down in that way and you could come up with a minimum list of cctypes which will cover all bases. At that point it probably doesn't matter much how you put them together.

I might have missed something, but I think what's left to cover everything is: iscntrl, isspace, ispunct, isdigit, isupper, islower
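
As a rough sketch of how those six tests could be put together, assuming 7-bit ASCII input and the default "C" locale: the enum `char_class` and the function `classify_char` are hypothetical names for illustration. The order is mostly free, with one exception: tab through carriage return (0x09-0x0D) satisfy both iscntrl and isspace, so whichever of the two is tested first decides how those five codes are labelled.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical labels for illustration only. */
enum char_class {
    CC_CNTRL, CC_SPACE, CC_PUNCT, CC_DIGIT, CC_UPPER, CC_LOWER, CC_UNKNOWN
};

/* Label one 7-bit ASCII code using only the six "minimal" tests. */
enum char_class classify_char(int c)
{
    assert(c >= 0 && c <= 0x7F);

    /* isspace goes first so that '\t'..'\r' count as whitespace,
       even though iscntrl is also true for them. */
    if (isspace(c)) return CC_SPACE;
    if (iscntrl(c)) return CC_CNTRL;
    if (isdigit(c)) return CC_DIGIT;
    if (isupper(c)) return CC_UPPER;
    if (islower(c)) return CC_LOWER;
    if (ispunct(c)) return CC_PUNCT;

    /* Not expected for ASCII in the "C" locale; kept as a safety net. */
    return CC_UNKNOWN;
}
```

Swapping the first two tests would label tab and newline as control characters instead; the other four classes do not overlap with each other, so their relative order does not change the result.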

BanMe
January 14th, 2011, 19:52
Thanks to Kayaker for a very complete and thorough answer.

I need to do this in a decisive and dependable manner, so I have chosen to take the path of checking the opcodes around the data to try to discern just what it is, and whether I have to read from it more than once to get to a value and not an address. This is a very daunting task, but I will start simple, as I have already done, and try to keep my code simple throughout. That way I can move forward to a workable paper about "code analysis using data in a relocation table". Ahahahahah lol, after refining my search to those words I got hits!! Woohooo!!! And finally not just viral things o0

http://www.stanford.edu/~stinson/paper_notes/stat_anal/disass_exec_code.txt

http://webcache.googleusercontent.com/search?q=cache:izpOMtKYvHgJ:ftp://ftp.cs.wisc.edu/paradyn/papers/Harris05WBIA.ps+code+analysis+using+data+in+a+relocation+table&cd=6&hl=en&ct=clnk&gl=us

Kindest regards, BanMe