------------------------------------
   TEXTO - Text steganography
------------------------------------

   Texto is a rudimentary text steganography program which transforms
uuencoded or pgp ascii-armoured ascii data into English sentences.

   This program was written to facilitate the exchange of binary 
data, especially encrypted data.  Why is this necessary?  People or
programs may be reading your mail.  Recent events in the US Congress may 
_require_ Internet Service Providers to monitor incoming mail and determine 
whether or not it is "obscene" or lives up to particular parochial moral 
standards.  Since they can't scan the contents of an encrypted message, 
and probably don't have time to manually look at each uuencoded message, 
such emails will probably go into the bit bucket.  This program's output
is hopefully close enough to normal English text that it will slip by
any kind of automated scanning.
   
   Texto text files look like something between mad libs and bad poetry
(although they do sometimes contain deep cosmic truths) and should be close
enough to normal English to get past simple-minded mail scanners and to
entertain readers of talk.bizarre.

   Texto works just like a simple substitution cipher: each of the 64 ASCII
symbols used by pgp ascii armour or uuencode is replaced by an English word.
Not all of the words in the resulting English are significant; only the
nouns, verbs, adjectives, and adverbs used to fill in the preset sentence
structures carry data.  Punctuation and "connecting" words (or any other
words not in the dictionary) are ignored.
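
   The substitution idea can be sketched in a few lines of Python.  This is
an illustration only, not texto's actual code: the word list is made of
nonsense stand-ins rather than the real "words" file, a single list is used
instead of one list per part of speech, and the filler sentence is invented.

```python
import string

# The 64-symbol base64 alphabet, as used by PGP ASCII armour.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Hypothetical 64-word dictionary (nonsense stand-ins, NOT texto's "words" file).
WORDS = [a + b
         for a in ("ba", "de", "fi", "go", "lu", "ma", "ne", "po")
         for b in ("ko", "la", "mi", "no", "pa", "ra", "ti", "vu")]

SYMBOL_TO_WORD = dict(zip(ALPHABET, WORDS))
WORD_TO_SYMBOL = {word: sym for sym, word in SYMBOL_TO_WORD.items()}


def encode(symbols):
    """Replace each symbol with its word, joined by invented filler words."""
    words = [SYMBOL_TO_WORD[s] for s in symbols]
    return "The " + " and the ".join(words) + "."


def decode(text):
    """Keep only dictionary words; filler words and punctuation are ignored."""
    out = []
    for token in text.split():
        token = token.strip(".,;:!?\"'").lower()
        if token in WORD_TO_SYMBOL:
            out.append(WORD_TO_SYMBOL[token])
    return "".join(out)
```

With these definitions, decode(encode("hQEM")) gives back "hQEM", because
"The" and "and" are not in the dictionary and are simply skipped.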

   The obvious main drawback to using this program: the resulting text
is larger than the original data by a factor of 10.  This is bad to the
point of uselessness if you need to send a 5MB uuencoded file.  What
are some possible solutions to this problem?  Using shorter words would
yield only minimal improvement, as most of the words are pretty short now
and you would still need the same number of English words.  The best
solution I can think of is to use more words: one for every 2 symbols
instead of a one-to-one symbol-to-word mapping.  This requires 4096 words
(64 x 64) for each part of speech (finding that many adverbs will be a real
challenge), but search speed shouldn't become a big factor when transforming
text to data, since texto uses a hash table for the words and their lengths
in order to minimize search times.  The net result would probably be an
average expansion by ~5x instead of ~10x, which is significant enough to
warrant trying it.  Changing the code will be easy; the hard part is typing
in the dictionaries.  Look for this feature in texto 2.0, coming Real Soon
to a net near you.
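
   The arithmetic behind the two-symbols-per-word idea can be sketched in
Python.  This is only an illustration of why 4096 words are needed; the
function names are made up here, the base64 alphabet is assumed, and texto
itself does not do this.

```python
import string

# Assumed 64-symbol alphabet (base64, as in PGP ASCII armour).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def pair_to_index(pair):
    """Map two symbols to one dictionary slot in the range 0..4095."""
    first, second = pair
    return ALPHABET.index(first) * 64 + ALPHABET.index(second)


def index_to_pair(n):
    """Inverse of pair_to_index: recover the two symbols from a slot."""
    return ALPHABET[n // 64] + ALPHABET[n % 64]


def words_needed(symbol_count, symbols_per_word):
    """Number of significant words needed to carry symbol_count symbols."""
    return -(-symbol_count // symbols_per_word)  # ceiling division
```

Every pair of symbols gets its own slot, so each part of speech needs
64 * 64 = 4096 entries, and the same message needs half as many words.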

   Since words are occasionally pluralized and/or gerundized (-ing), and
they're not all regular verbs/nouns, there are plenty of strange spelling
mistakes.  While normally I despise misspelled words, they add a nice
human touch to the repetitive text, and add to the feeling that who/whatever
wrote the text was quite clearly out of his/her/its mind.  


Usage:
------
   
texto msgfile > engfile         - Transforms the contents of msgfile into 
                                  English text and places results in "engfile"
                                  msgfile must be a uuencoded or pgp ascii-
                                  armoured text file.

texto -p engfile > pgpfile      - Takes English text from engfile and produces
OR                                a pgp ascii-armoured text file, which will 
texto -p engfile | pgp -f         be readable by pgp if the original message 
                                  file was.  Alternatively, the output from
                                  texto can be piped directly into pgp.

texto -u engfile > uufile       - Takes English text from engfile and produces
OR                                a uuencoded file, which will be readable by
texto -u engfile | uudecode       uudecode if the original message file was.  
                                  Alternatively, the output from texto can 
                                  be piped directly into uudecode.  
                                  NOTE that uudecoding the results will always 
                                  produce a file called "texto.out" mode 644, 
                                  unless you redirect texto's output into a 
                                  file and hand edit that file.
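
The hand edit mentioned above amounts to changing the "begin" line at the
top of the uuencoded file, which has the form "begin MODE NAME".  A minimal
Python sketch of that edit follows; the function name is hypothetical and
this is not part of texto.

```python
def rename_uu_header(uu_text, name, mode="644"):
    """Rewrite the first 'begin MODE NAME' line of uuencoded text."""
    lines = uu_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith("begin "):
            # Preserve the line ending only if the original line had one.
            ending = "\n" if line.endswith("\n") else ""
            lines[i] = f"begin {mode} {name}{ending}"
            break  # only the first header line is changed
    return "".join(lines)
```

For example, passing the text of uufile and the name "secret.bin" replaces
"begin 644 texto.out" with "begin 644 secret.bin" and leaves the encoded
body untouched.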

Installation:
-------------

   This program has only been tested on IRIX 4.0.5, Linux kernel 1.0.x,
and Solaris 2.3.  To build it, just type "make"; on SGIs, build it with the
command "make sgi".  If you're on a Solaris machine or any other machine
whose uuencode uses spaces instead of ` characters, uncomment the
"DEFINES" line in the makefile.


Rolling your own:
-----------------

   The usually-correct English sentence structures are found in the file
"structs", which is basically a file of mad lib-type "fill in the blank"
sentences.  Feel free to add your own; just be really, really careful not
to use any word that appears in the "words" file, since such a word would
be read back as data.  You're safe if you use words that you see elsewhere
in the "structs" file.  Using varying "structs" files could at least annoy
mail scanners.  Using different "words" files as well should totally defeat
them.

   The 64 verbs, 64 adjectives, 64 adverbs, 64 places, and 64 things
which are used to fill in the blanks are in the "words" file.  Again, feel
free to add your own, but again, be careful.  Don't use words that end in
"s" or "ing" (they'll get chopped), and don't use words that are already in
there (you can double check with the command "sort words | uniq -d").  The
order of the words in each section of the file is also significant, so, for
example, rearranging the nouns will change the result.
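
   The two rules above (no duplicates, no words ending in "s" or "ing")
can also be checked mechanically.  The following Python sketch works on a
plain list of words; how to pull that list out of the "words" file is left
open, since the file's layout is not described here.  The function name is
hypothetical, and unlike "sort words | uniq -d" it ignores case.

```python
from collections import Counter


def check_words(words):
    """Return a list of problems found in a candidate word list."""
    problems = []

    # Rule 1: a word must not appear more than once (case-insensitive).
    counts = Counter(w.lower() for w in words)
    for word, n in counts.items():
        if n > 1:
            problems.append(f"duplicate: {word}")

    # Rule 2: endings that would get chopped are not allowed.
    for w in words:
        lw = w.lower()
        if lw.endswith("ing") or lw.endswith("s"):
            problems.append(f"ends in s/ing: {w}")

    return problems
```

An empty result means the list passes both checks; it says nothing about
whether the words also clash with the "structs" file.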

   If you use a modified "words" file, the person on the other end of
your communication must of course be using the same one, or the transformation
will fail miserably.  The "structs" file, however, is irrelevant to decoding
and can be modified to suit your taste or literary style, so long as it
doesn't conflict with the "words" file as mentioned above.  Because the
structs file is not used in "decoding" text, two people can still communicate
whether or not they have the same "structs" file.

BUGS
----

   uuencoded files lose the mode and filename information, which is a bummer.
   Always writing to stdout may not be the best way to go.
   The text produced by texto'ing a uuencoded file can be _really_ repetitive.
   The 64-word dictionaries thing vs. the 4096-word ones, as mentioned above.
   Texto is a dorky name, but it sort of rhymes with stego.
   Please report any other bugs or fixes to kmaher@ucsd.edu

LICENSE
-------

   Copying, modifications, improvements, etc. are highly encouraged, just
   let me know so I can incorporate them.

   All rites reversed.

AUTHOR
------
   
   Kevin Maher
   kmaher@ucsd.edu
   Underware Software Production Ltd. Inc. etc. 
   "Covering your ass since 1981"