"Whisker" by Rain Forest Puppy

Posted by DSE at 12:00 AM on July 15, 2000

cDc is pleased as punch to assist in bringing u, the 'l33t muthafuckas that u r, a clever lil' app to aid u in yer endeavors: .rain.forest.puppy.'s Whisker.  v1.4 will be available at Defcon; currently available is v1.3.0a.

---------[ Whisker: next-generation CGI scanner			doc v1.2.0

--[ by rain.forest.puppy / ADM / wiretrip  	(rfp@wiretrip.net)
								
----[ Table of Contents

	- Background
	- What whisker does/has
	= - array
	= - scan
	- Command line reference
	- Global variable list
	- Language reference
	- Advanced coding tekniq
	= - Logic evaluation
	= - Scan optimization
	- Wish list/future enhancements
	- What's to become of web scanners
	- Notes for eval/internal coding
	- Signoff

----[ Background

A CGI scanner is just a CGI scanner, right?  And they're pretty lame apps
to boot, right?  Hmmm...well, perhaps.  That's because no one has given
any thought to them.  Yeah, until I did.  Perhaps I have too much time on
my hands. ;)  After reading this, I will be surprised if you don't think I've
put way too much thought into this.

I've waded through the pile of CGI scanners found on Packetstorm (before
JP got his way; j3rk), Rootshell, etc.  Suidshell's cgichk.c (and derivatives)
are the most comprehensive....but that seems to be the 'goal' they shoot
for--try to have 'the most checks in any scanner'.  Great.  Never mind the
fact that some of the checks are completely wrong (I think it's funny to
notice how the Cold Fusion '/expeval/' has propagated to so many scanners
as '/expelval/'--one kiddie made a mistake, and they all copied.)

Wait...CGI scanning isn't that complex, is it?  Well, to do it right, yes.
Why?  Hmmm...I can think of a few reasons:

1.  /cgi-bin is pretty damn common, I'll give you that.  But I've also been
on many a hosting provider that used /cgi-local.  And I've seen people use
/cgi, /cgibin, etc.  Fact of the matter is that it could also be
/~user/cgi-bin, or /~user/cgis, etc.  Then there are some scripts that are
all over the place, like wwwboard, which may or may not have its own
directory.

Point of the point:  wouldn't it be nice to define multiple directories?

2.  You know what really irks me?  Seeing a CGI scanner thrash around through
/cgi-bin or whatnot, when /cgi-bin doesn't even exist.  Talk about noisy in the
logs.  Now, if we waste a brain cell, we can see that if we query the /cgi-bin
directory (by itself), we'll get a 200 (ok), 403 (forbidden), or 302 (for
custom error pages) if it exists, or a 404 if it doesn't.  Wow.  So if we
just do a quick check on /cgi-bin, and get a 404, we can skip however
many /cgi-bin CGI checks we were going to make.  That could save you 65
entries in the httpd logs.

Point of the point:  save noise/time by querying parent dirs
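
To illustrate (a standalone perl sketch of the idea, not whisker's actual
internals; the host name is just a placeholder):

	use IO::Socket::INET;

	# one cheap HEAD request on the parent dir before hammering its CGIs
	my $sock = IO::Socket::INET->new(PeerAddr => 'www.example.com',
	                                 PeerPort => 80, Proto => 'tcp')
	           or die "connect: $!";
	print $sock "HEAD /cgi-bin/ HTTP/1.0\r\n\r\n";
	my ($code) = <$sock> =~ m|^HTTP/\d\.\d\s+(\d{3})|;
	print "no /cgi-bin/ -- skip all those checks\n"
	    if $code and $code == 404;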

3.  If you have more to spare, let's waste another brain cell for another
obvious issue.  Why should I query for, say, test-cgi on an IIS server?
Or /scripts/samples/details.idc on Apache?  Why should I even bother checking
various httpds at all (like a firewall proxy, etc)?  When we do a request,
the server gives us its name and version.  How nice of them.  How about
we take advantage of their generosity?
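
For example, a simple HEAD request:

	HEAD / HTTP/1.0

will net you something like (the Server string shown is just an example):

	HTTP/1.0 200 OK
	Server: Apache/1.3.6 (Unix)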

Point of the point:  tailor your scan to the server you're scanning

4.  Virtual hosts.  Most webservers nowadays (especially Apache with its
VirtualHost directive, and IIS with its virtual host setup wizards) allow
you to assign many actual domain names/websites to the same IP.  Well,
hell...how does the server know which site you want when you connect?
Well, browsers give a second piece of information, the 'Host' directive.
So, a request may look like:

	GET /~rfp/index.html HTTP/1.1
	Host: www.el8.org

So say we have SlikWilly Virtual Hosting, who run off RedHat Linux using
Apache.  They set up their only IP (as that's all they could afford for
their $39.95/month shared DS0) to host the site www.slikwilly.com.  Now,
on the actual box, the location for their files is /home/httpd/html/
for html files, and /home/httpd/cgi-bin/ for, what else, their CGI apps.
So a request to www.slikwilly.com/index.html is going to be pulled from
/home/httpd/html/index.html.  So far, so cool.

Well, the powers that be at Defcon decide that they've had it with catalog.com,
since ADM hacked their webpage there.  They want to move over to
SlikWilly.com in hopes that it will keep those ADM people from changing
the site.  So Slik Willy himself hops into his httpd.conf and adds a
VirtualHost directive for www.defcon.org.  He sets up the html directory
to be /home/defcon/html/, so that those Defcon people can ftp in via his nifty
wu-ftpd 2.4.2 (beta 18).  So that means that www.defcon.org/index.html should
be pulled from /home/defcon/html/index.html.  Slik Willy also gives them
their own cgi-bin, located in /home/defcon/html/cgi-bin/ (which means it's
no silly aliased directory, since Slik doesn't understand all that stuff).

So, now, in this situation, www.defcon.org is a *virtual* site off of
www.slikwilly.com (the root site).  What exactly does that mean in
practice?  Well, let's see:

If I give the request:
	GET /index.html HTTP/1.0
I will get back the file at (assuming it exists):
	/home/httpd/html/index.html
which is Slik Willy's file (www.slikwilly.com)

If I check for:
	GET /cgi-bin/test-cgi HTTP/1.0
I will be checking for:
	/home/httpd/cgi-bin/test-cgi
which is again Slik Willy's file (www.slikwilly.com)

Now, if I check for:
	GET /index.html HTTP/1.0
	Host: www.defcon.org
I will get back:
	/home/defcon/html/index.html
which is the www.defcon.org homepage

Similarly:
	GET /cgi-bin/test-cgi HTTP/1.0
	Host: www.defcon.org
I will be checking:
	/home/defcon/html/cgi-bin/test-cgi
which is in www.defcon.org's cgi-bin.

Now, why does any of this fscking matter whatsoever?  Well, imagine you wanted
to be like ADM, and try to hack www.defcon.org again.  So you whip out
your trusty cgichk.c CGI scanner (oooh, you hacker you) and rev it up
against www.defcon.org.  Well, guess what--the scanner connects to Slik
Willy's box, does generic requests (no Host), and winds up scanning Slik
Willy's cgi-bin for CGIs, not www.defcon.org's actual cgi-bin.  And
there exists the possibility that www.defcon.org has way cooler stuff than
Slik Willy.

But lemme just make it known, this usually works in your favor.  For instance,
on IIS, the virtual hosts will *NOT* (unless specifically added) have
/scripts mapped to them--but the root site will.  So, trying to GET
/scripts will work off the main (generic) site, but if you try a virtual
host with Host directive, most likely /scripts won't be mapped over.  Same
for Slik Willy.  test-cgi comes by default in /home/httpd/cgi-bin/, not
/home/defcon/html/cgi-bin.  So scanning the root site is better to find
the 'default' install CGIs.

Point of the point:  there's a whole 'nother world out there hiding behind
			virtual hosts--and you may not be scanning who you
			think you really are

5.  Some places use custom error pages.  Unfortunately, the
implementation is such that instead of generating a 404 'not found', you
always get a 200 'success', with HTML to indicate the missing page.

Point of the point:  being able to minimize this anomaly would lessen
			false positives

6.  More wishes:  new CGI and webserver problems seem to be found at a
decent rate.  Plus, I might like to customize which scans I want to do
against a particular host.  Having to edit C code and recompile every time
could quite severely suck, especially if I'm a lousy C coder to boot.

Point of the point:  if this was all scriptable, that'd be nifty

7.  Input sources.  I dunno about you, but I'm quite tired of doing bizarre 
awk/host -l combos, dumping them to a file, and then feeding them back into 
the various scanners.  Sometimes I want to just feed in output from nmap
(after all, it has a list of the found open port 80's, right?), sometimes
just a laundry list of IPs/domains, and sometimes, I'd just like to do a
single host on the command line.

Point of the point:  flexibility of input would be nice as well.

8.  IDS/log avoidance.  Do you know how many IDS alarms you'll set off by
requesting /cgi-bin/phf?  Let alone it's easy to spot in the logs.  So
instead of just handing over the plaintext, why not URL-encode all or part of
it to break up the literal plaintext string, such as /cgi-%62in/ph%66.  It
keeps the string-matching/packet-grep IDS systems from getting a positive ID,
and the more encoded you make it, the harder it is to figure out what it is
(on the flip side, it also stands out more in the logs, even if it's unknown
what /%63%67%69%2d%62%69%6e/%66%69%6e%67%65%72 is really scanning for).

Point of the point:  being able to spoof IDSs would be a nice feature
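
The transform itself is trivial; in perl, it boils down to something like
this (a sketch of the idea, not whisker's literal -I code):

	# hex-escape the letters/numbers/dashes/dots in a URL so the literal
	# string never hits the wire ('/cgi-bin/phf' is just an example)
	my $url = '/cgi-bin/phf';
	(my $encoded = $url) =~ s/([a-zA-Z0-9.-])/sprintf('%%%02x', ord $1)/ge;
	print "$encoded\n";	# /%63%67%69%2d%62%69%6e/%70%68%66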

Well, that's enough wishes, don't you think?  Now, do they come true....

----[ Whisker has all that, plus a bonus feature or two :)

Yeah, no kidding.  Come on, I wouldn't wish for something that I didn't
actually implement.  I'd look dumb. :)  My future wishes are down below
at the end. 

Anyways, so whisker does all that.  Let's look at the two basic functions of
whisker, array and scan.  This is a reprint of the command reference below,
but a little more verbose.

-[ array {name} = {comma delimited list}

This is one of the two core commands of whisker (the other being scan).
Basically, you make an array named {name} with elements from your comma
delimited list.  This array is then referenced as @{name}, and given to
the scan function to scan the permutations of the names in the @array.
You can include another array in the list of elements...it will be added
inline.

Example:
	# let's make an array of common unix cgi locations
	array roots = cgi-bin, cgi-local, cgibin, cgis, cgi

	array first = a,b,c
	array second = d, @first, e
	# second = d,a,b,c,e

	array bigroots = cgi-bin, cgi-bin/secret, cgi-bin/rfp
	
	# this is a big NO!
	array moreroots = cgi-bin/@first, rfp/@bigroots
	# only the scan() function will parse roots like this


-[ scan ({optional server regex}) {dirs} >> {script}

This is the heart of whisker.  This command is what actually performs
the scanning.  There are a few aspects to the command.  First is the
{optional server regex}.  You can do a server specific scan one of two
ways:

	server (iis)
	scan () scripts/tools >> getdrvrs.exe
	endserver

or shorten it as:

	scan (iis) scripts/tools >> getdrvrs.exe

Scan will only do the check if the server regex is () or matches 
(similar to the server command).  Now, {dirs} and {script} are required.
{dirs} is a comma delimited list of directories to check to see if
{script} exists.  {dirs} may also contain arrays made with the array
command.  Let's see some examples:

	scan () cgi-bin, cgi-local >> my.cgi

will check for /cgi-bin/my.cgi and /cgi-local/my.cgi

	scan () a/b, a/c, a/d >> my.cgi

will check for /a/b/my.cgi, /a/c/my.cgi, /a/d/my.cgi

	array subdirs = b,c,d
	scan () a/@subdirs >> my.cgi

will check for /a/b/my.cgi, /a/c/my.cgi, /a/d/my.cgi

	scan () @subdirs >> my.cgi

will check for /b/my.cgi, /c/my.cgi, /d/my.cgi

	scan () a, a/@subdirs, f/@subdirs/g >> my.cgi

will scan for all those permutations, expanding out @subdirs into every
combo involving the elements in @subdirs.  So you see how powerful
directory arrays can be.  If we have an array of places we want to look
for CGIs
	
	array roots = cgi-bin, cgi-local, scripts
	array people = ~rfp, ~adm, ~wiretrip

we then can scan for wanted combos

	scan () @roots, @people/@roots >> my.cgi
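
will check for the twelve permutations:

	/cgi-bin/my.cgi, /cgi-local/my.cgi, /scripts/my.cgi,
	/~rfp/cgi-bin/my.cgi, /~rfp/cgi-local/my.cgi, /~rfp/scripts/my.cgi,
	/~adm/cgi-bin/my.cgi, /~adm/cgi-local/my.cgi, /~adm/scripts/my.cgi,
	/~wiretrip/cgi-bin/my.cgi, /~wiretrip/cgi-local/my.cgi,
	/~wiretrip/scripts/my.cgi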

this is nice because we only have to adjust our arrays to compensate
for different locations, and we can use the arrays for all our scans in
the program.  How centralized. :)

You can specify the root directory by using a single /, as such:

	scan () / >> index.html

whisker automatically checks each directory in {dirs} as it goes, and
caches the response.  See 'Advanced coding tekniq: Optimized scans' for
more information on how to (ab)use this command properly.



So basically, you define arrays of directories (although that's 
optional), and use scan to scan for the scripts.  Easy enough.  Plus,
there's a suite of other simple logic to help out in our scanning
endeavours.  You can take a peek at the included sample scan.db to see
usage, if you're a learn by doing/example type person.  Anyways, onto
using whisker....

----[ Commandline reference

Here is the commandline reference ripped from whisker itself:

Usage:  whisker -s script.file ((-n input.file) | (-h host) | (-H list))
		(-l log.file)

	-s specifies the script database file     **
	-n nmap output (machine format, v2.06+)   *
	-h scan single host (IP or domain)        *
	-H host list to scan (file)               *
	-V use virtual hosts when possible
	-v verbose.  Print more information
	-d debug. Print extra crud++ (to STDERR)
	-p proxy off x.x.x.x port y (HTTP proxy)
	-l log to file instead of stdout
	-u user input; pass XXUser to script
	-I IDS-spoof mode--encode URLs to bypass scanners
	-E IDS-evasive mode--more IDS obfuscation
	-i more info (exploit information and such)
	-N query Netcraft for server OS guess
	-S force server version (e.g. -S "Apache/1.3.6")

 	** required     * optional; one must exist

Now, basically, -s {script.file} is required.  This is your scan database,
your big file of whisker script code that tells whisker what to do.

Now, you have three input options, -n, -h, or -H.  You must have at least
one, but you can use multiple.  They are:

   -n nmap.file 	supply a nmap (v2.06+) *MACHINE FORMAT* output file
			you can get this by using nmap -m nmap.out
			whisker will read it in and check every host with port
			80 found to be 'open'
   -h {ip or domain}	single host.  Just supply host on commandline, such
			as "-h www.microsoft.com"
   -H host.file		this is essentially a laundry list of ips and/or domains,
			one per line, like thus:
				www.microsoft.com
				www.sun.com
				123.123.145.167
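
For example, a couple of typical invocations (file names are just examples):

	whisker -s scan.db -h www.slikwilly.com
	whisker -s scan.db -n nmap.out -l whisker.log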

-V tells whisker to attempt to use virtual host domains wherever
possible.  If you're scanning an IP address, -V won't do anything.  But
if you're scanning a domain name, whisker will include the domain name
in the Host: directive.

-v will print more verbose information (to console or logfile).  -d will
print debugging information to STDERR (usually console). -i will include
information specified by the 'info' command.

-l log.file  will redirect all information to log.file

-p x.x.x.x.y is the proxy command.  No, this is not SOCKSified, or anything
else.  Basically, the -p is for firewalls and such where you connect to the
PROXY, and then issue:

	GET http://my.target.webserver:80/page/i/want.htm HTTP/1.0

x.x.x.x is the IP address, and y is the port OF THE PROXY.

-u is just a simple way to give information on the commandline, that is
placed directly into the XXUser variable for use within the script. 
That way, you can have a configuration switch externally, like "-u 1"
will be normal scans, and "-u 2" will be extra-stealthy scans, etc.  Use
"if XXUser == ???" inside the script to query the value. 

-I will "URLify" the request line, as spoken about earlier.  It will
encode all letters, numbers, dashes and dots as their hex escaped
sequence equivalent. -E will enable more IDS 'evasion'.  Use -I and -E to
bypass tons of IDSes (I've tested it, and was able to sneak past a lot
of 'em).

-N causes whisker to query Netcraft (www.netcraft.com) and see what they
think the OS is.  Not 100% reliable, but it's a start, and fairly accurate.

-S lets you override what server whisker parses the script as.  You submit
a server string, such as -S "Apache/1.3.1 PHP/3.0.2a"

That's it, go play!  Use the included scan.db for reference.  The rest
of this is technical information and whatnot. 

----[ Global variable list

These are the variables accessible from within the script.  Why all the
prefixed XX's?  So you're less likely to clobber them. :)  I suggest you
don't poke values into these unless you know what you're doing.

** Note: this list is not complete, due to time constraints.  Check my
website for updated documentation and a full list.

Name		Default value		Description

XXPort 		(80)		port to scan...80 for normal webservers
XXRoot 		()		default prefix for URLs...
XXMeth 		(HEAD)		how to retrieve the file...HEAD preferable
XXVer		(HTTP/1.0)	http version for us to use
XXDebug		(0)		do we want debug output
XXVerbose 	(0)		do we want verbose output
XXProxy		(0)        	are we using a proxy
XXTarget	()		actual target ip 
XXBadMeth 	(1)		whether to switch methods if 400 or 500
XXSStr		()		returned server software string
XXRet		()		http return code of page
XXRetStr  	()		http return string
XXSVer		()		http version return from server
XXUser		()		given on commandline with -u
XXDirQuite	(0&&1)		whether or not to print result (internal)
XXPageSrc	()		the html of the page, if GET
XXHeaders	()		the full set of headers returned
XXCache		(0||1)		whether or not this answer was cached

Proxy info (don't recommend you play with it):
XXProxy_addy, XXProxy2port, XXP_target

Cached inet_aton() result:  XXinet_aton

----[ Language reference

Ok, here are the commands that whisker supports in its scripts.

****NOTE: all {} are visual delimiters for viewing only--they are not to be
included.  If you see something like ({variable}), that means that the ()
are required, but the {} are not.  Also, all mentions of regexes are case
insensitive; however, variable names *ARE* case sensitive; commands are not.


-[ # {comment}

Just your usual, everyday comment.  COMMENTS MUST BE ON THEIR OWN LINE!

Example:
	# this is a comment, and won't be executed.

Bad bad bad:
	server (iis)  # if the server has IIS...



-[ print {something to print}

Print out {something to print}.  Duh.  No, no embedded variables or \n, \t, etc.

Example:
	print This will be printed to screen or logfile, depending on switches



-[ printvarb {variable name}

This will print out the contents of the single variable {variable name}.
(variable name is case sensitive)

Example:
	printvarb XXRet



-[ exit

This will 'exit' the scan for the current host, and move along to the next
host to scan.

Example:
	exit



-[ exitall

This will immediately exit the program altogether, right then and there.

Example:
	exitall



-[ if {variable} {== or !=} {value}   (w/ endif)

Your standard logic test.  If {variable} is equal (==) or not equal (!=)
to the constant {value}, execute up to the first endif.

NOTE: whisker uses a quasi-equality/test system that's more convenient
in this type of situation.  If {value} is a numeric value (all numbers),
then whisker will use a pure "if variable is equal to value" test.
However, if {value} is a string (does not contain all numbers), then it
uses a regex instead, which is more along the lines of "if value is
contained within variable".  This is nicer for matching string partials,
etc.; and it's also why numeric values get a pure equality test--you don't
want whisker returning 'True' because "20" is found within "200", when
"20" is not equal to "200".

Example:
	if XXRet == 200
		print The page was found
	endif
	if XXRet != 200
		print The page was NOT found
	endif



-[ ifexist  (w/ endexist)

This command is equivalent to 'if XXRet == 200', and evaluates as true if
the resulting check came back 200 (meaning the page exists).

*Note: right now it's hardcoded to a return value of 200...this will be
changed to be user-definable in the future.

Example:
	scan () cgi-bin >> test-cgi
	ifexist
		print They have the test-cgi CGI
		# other stuff to do
	endexist



-[ server ({server regex})   (w/ endserver)

This is basically an 'if the server string contains the string {server regex},
evaluate it as true'.  {server regex} is required, and the regex is case
insensitive.  Everything up to the first 'endserver' is evaluated.

Example:
	server (iis)
		# stuff to do if server string has 'iis' in it
	endserver



-[ set {variable} = {value}

This will set the variable {variable} to {value}.  {value} can either be a
constant you supply, or another variable name that starts with '$'.  You
don't need to worry about pre-allocating a variable...it will automatically
be created on its first use.  {variable} and {value} are required.  The '$'
on {variable} is assumed, and can NOT be used.  Variable names and the
values you assign are case sensitive.

Example:
	set XXMeth = GET
	set MyReturnValue = $XXRet

Bad bad bad:
	set $MyReturnValue = Some_value_to_assign


-[ startgroup 

Reset the group counters, and start tracking group scans.  Essentially this
lets you see if a full group of files exists.  Note that a 'group' is 'true'
if all scans done since a startgroup have returned successfully.  If any
one scan in the group returns false, the 'group' is evaluated as false (used
with ifgroup, below).

Example:
	startgroup
	scan () cgi-bin >> phf
	scan () cgi-bin >> webdist
	ifgroup
	  print Wow, they have phf AND webdist!
	endifgroup


-[ ifgroup    (w/ endifgroup)

Evaluate the last scans since startgroup, and process if all scans were
successful.  See startgroup for more information and an example.


-[ info {stuff to print}

Print information if the -i switch has been used and the last scan was
successful.  This should be used to provide more information (exploit
info, informational links, notes, etc) about a successful scan.

Example:
	scan () cgi-bin >> phf
	# print this stuff if they have used -i, and phf exists
	info Oh my god! They have phf!  How lame...
	info But then again, it could be one of those phf logger traps


-[ ifinfo   (w/ endinfo)

Evaluate and process if the -i switch was supplied.  Note that ifinfo
allows you to do more than just print information (you can put any 
whisker code in the block), and it does not consider the return status 
of the last scan.

Example:
	server (Apache)
	ifinfo
	# print this stuff only if it's an Apache server and the -i switch was used
	print They're running Apache, in case you didn't notice
	# run any other commands here too
	endinfo
	endserver


-[ usehead

Sets the default method to 'HEAD', while also saving what the current
method was (which can be restored with restoremeth).

Example:
	usehead
	# this will now use HEAD
	scan () cgi-bin >> phf
	restoremeth


-[ useget

Sets the default method to 'GET', while also saving what the current
method was (which can be restored with restoremeth).

Example:
	useget
	# this will now use GET
	scan () cgi-bin >> phf
	restoremeth


-[ usepost

Sets the default method to 'POST', while also saving what the current
method was (which can be restored with restoremeth). 

*Note: whisker automatically adds the required headers for using POST
requests.  You can set what information is actually posted into the
XXPostData variable--whisker will automatically compute Content-Length.

Example:
	usepost
	# this will now use POST
	# use this if you want to submit extra post info
	set XXPostData = somevarb=crap&whatever=morecrap
	scan () cgi-bin >> phf
	restoremeth


-[ restoremeth

Restore to whatever (default) method was chosen before you ran a usehead,
useget, or usepost command.  You should note that this is not implemented
in stack fashion...if you useget, then usepost, then usehead, restoremeth
will then revert to the *PRIOR* method, in this case POST.  Therefore
you should always restoremeth before you use a different use* command, or
you will lose the default method.

Example:

	# default scan type is HEAD
	useget
	# now we're GET
	scan () cgi-bin >> phf
	restoremeth
	# we're back to HEAD
	usepost
	# now we're POST	
	scan () cgi-bin >> webdist
	restoremeth
	# we're back to HEAD

Wrong:
	
	# default scan type is HEAD
	useget
	# now we're GET
	scan () cgi-bin >> phf
	usepost
	# now we're POST	
	scan () cgi-bin >> webdist
	restoremeth
	# we're back to GET, we've lost our HEAD default.


-[ savemeth

Essentially does the save operation that useget, usepost, or usehead does
(which can be 'undone' with restoremeth).  This is here in case you want
to do more funky stuff with the XXMeth variable (for instance, use
TRACE, OPTIONS, or set it to * for the various test-cgi vulnerabilities).

Example:
	savemeth
	set XXMeth = TRACE
	restoremeth


-[ insert {file}

Insert the code found in {file} (if it exists) into the script at that
point.  Note that this is a pre-processing command, and done before
whisker even thinks of scanning a host.

Example:
	insert servers.db


-[ fingerprint .{extension} {action}

This is the initial implementation of return code/page fingerprinting
(discussed in detail below).  Basically it causes whisker to verify that
a request with the specified {extension} does not return a 200 (for 
example, Cold Fusion returns a 200 OK for any .cfm request by default
on IIS--which makes it appear as if every .cfm request does indeed 
exist).  Valid actions at this point are skip and exit.  Note that
fingerprint is a pre-processing directive for each command--this means
no matter where the fingerprint command is located in the file, it is
run *first* before anything else is run for that host.

If action is skip, and whisker determines that the scanned host returns
200 OK results for that extension, it will just skip any scan with that
extension (and fake a 404 Not Found reply).  If action is exit, it will
print a notice that it exited on fingerprint catch, and move onto the
next host.  A good example of usage would be scanning
www.harley-davidson.com--any request for practically anything (.cgi,
.pl, etc) will result in a custom error page, which comes back as 200
OK.  All other scanners will flag this as 'file exists'.  With whisker
you have the option of detecting this anomaly and being alerted to it.

How whisker fingerprints:  right now, implementation is simple.
Whisker generates a random 20-character string, slaps on your 
extension, and requests it--assuming that it won't exist.  If it comes
back 200 OK, then it figures all future requests for that extension
are tainted and implements the fingerprint action handler for that
particular extension.  In the future this will evolve and become more
robust, but for now, it's more than adequate.
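
In perl, the core of the idea looks something like this (a standalone
sketch--the host name and extension are placeholders, and whisker's real
code differs):

	use IO::Socket::INET;

	# random 20-char name + extension; if it comes back 200 OK, every
	# response for that extension is tainted
	my @c = ('a'..'z', '0'..'9');
	my $bogus = join('', map { $c[int rand @c] } 1..20) . '.cfm';
	my $sock = IO::Socket::INET->new(PeerAddr => 'www.example.com',
	                                 PeerPort => 80, Proto => 'tcp')
	           or die "connect: $!";
	print $sock "HEAD /$bogus HTTP/1.0\r\n\r\n";
	my ($code) = <$sock> =~ m|^HTTP/\d\.\d\s+(\d{3})|;
	print ".cfm results are tainted--apply the fingerprint action\n"
	    if $code and $code == 200;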

Example:
	# skip Cold Fusion files, if they all come back 200 OK
	fingerprint .cfm skip
	# skip this host if every .cgi comes back as 200 OK
	fingerprint .cgi exit


-[ eval   (w/ endeval)

Eval lets you embed raw perl code into your script to do whatever you want.
This gives your scripts unlimited functionality.  See the end of this doc
for eval/raw perl notes on whisker internals.  Note that everything between
eval and endeval is put into a variable, and then just run through perl's
eval() function.  NOTE: EVAL IS SLOW.  The perl interpreter has to do its
thing, and it is time consuming.  Just a warning.

Example:
	eval
	print STDOUT "This is a raw perl command\n";
	print "wow, you have a passwd file\n" if(-e "/etc/passwd");
	endeval


-[ array {name} = {comma delimited list}

Basically, you make an array named {name} with elements from your comma
delimited list.  This array is then referenced as @{name}, and given to
the scan function to scan the permutations of the names in the @array.
You can include another array in the list of elements...it will be added
inline.  Array name and values are case sensitive. 

Example:
	# let's make an array of common unix cgi locations
	array roots = cgi-bin, cgi-local, cgibin, cgis, cgi

	array first = a,b,c
	array second = d, @first, e
	# second = d,a,b,c,e



-[ scan ({optional server regex}) {dirs} >> {script}

This is the heart of whisker.  This command is what actually performs
the scanning.  There are a few aspects to the command.  First is the
{optional server regex}.  Scan will only do the check if the server
regex is () or matches (similar to the server command).  Now, {dirs}
and {script} are required.  {dirs} is a comma delimited list of
directories to check to see if {script} exists.  {dirs} may also contain
arrays made with the array command.  {dirs} and {script} are case
sensitive.
	
Examples:
	scan (iis) scripts/tools >> getdrvrs.exe
	array roots = cgi-bin, cgi-local, scripts
	array people = ~rfp, ~adm, ~wiretrip
	scan () @roots, @people/@roots >> my.cgi


----[ Advanced coding tekniq

-[ Logic evaluation

It's best to take a moment and explain how the 'if', 'ifexist', 'server',
etc work when evaluating logic.  Basically, whisker is not block oriented,
but line/linear oriented.  This leads to some nesting problems, but you can 
bend the rules here and there.  Now, let's say we have a simple 'if':

	if XXRet == 200
		print The page existed
	endif

Now, what whisker will do is evaluate the 'if XXRet == 200'.  If this is
true, it will just keep processing line by line.  If this is false, it 
will 'fast forward' to the first 'endif' it comes across.  Same for
'ifexist' (fast forward to first endexist) and 'server' (fast forward to
first endserver).  So you can see how the following nesting breaks:

	# if number one
	if XXRet == 200
		# if number two
		if XXRetStr == OK
			print Page exists
		# endif number one
		endif
		# if number three
		if XXRetStr == Not OK
			print Something is borked
		# endif number two
		endif
	# endif number three
	endif

Now, if 'if number one' is true, it will keep going line by line.  Same for
'if number two & three'.  But if 'if number one' fails, it will just fast 
forward to the first 'endif' it finds, in this case 'endif number one'.  This
means if XXRet does not equal 200, it *will still process* 'if XXRetStr ==
Not OK'...  Whisker will *NOT* fast forward to 'endif number three'.  So you 
can see how this can affect things.  Now, based on this, you can do some
tricks. For instance, a logical AND can be done like so:

	if XX == True
	if YY == True
			print Both XX and YY are true
	endif

Let's take a quick peek.  If XX is true, it continues.  If YY is true, it still
continues, and prints our message.  If either fails, it just fast forwards to
the first endif.  Simple enough.  Logical OR, on the other hand, kinda sucks:

	if XX == True
		set MyOR = 1
	endif
	if YY == True
		set MyOR = 1
	endif
	if MyOR == 1
		print XX or YY was True
	endif

Yeah, way more code.  I think you get the point, so I won't trace it.  Next is
the simple IF/ELSE type structure.  Whisker has an 'if', but not 'else'.  You
can emulate it like:

	if XX == 1
		print It's 1!
	endif
	if XX != 1
		print It's something else than 1!
	endif

Again, simple stuff.  I'm sure the question of "why the hell don't you just
implement AND/OR and ELSE into whisker?" will come up.  My answer is 1. you
can still do it with a bit more code, 2. I want to keep it simple (stupid?),
3. the logic of doing such would start getting out of control, and I don't
want to get a formal language thing going.  It's just a web scanner, man. :)

server() will run code if a particular server type is found.  But how do you
run code if it's *not* a particular server?  Say I wanted to run something if
the server WASN'T Apache...

	server (apache)
		set apache=1
	endserver
	if apache != 1
		# code to run if it's not apache
	endif

That's all there is to it.  Remember, the check/set variable/check again
procedure tends to work for most logic evaluation situations.

-[ Optimized scans

When I say 'optimized', I mean scans that are coded such that they
produce the minimal number of requests.  We have the obvious example:

	scan () cgi-bin >> 1.cgi
	scan () cgi-bin >> 2.cgi
	scan () cgi-bin >> 3.cgi
	scan () cgi-bin >> 4.cgi

Here scanning for cgi-bin first is valuable; if it's not found, it will
save us 4 scans (3 if you count the scan for cgi-bin).  If found, it will
cost us one additional request (5 total).  That gives us a worst case of
5 scans (if cgi-bin exists), best case of 1 scan (if cgi-bin does not exist).
Now, let's say we have:

	scan () cgi-bin/a/b/c >> 1.cgi
	scan () cgi-bin/d/e/f >> 2.cgi
	scan () cgi-local/1/2/3 >> 3.cgi
	scan () cgi-bin/g >> 4.cgi
	scan () cgi-bin >> 5.cgi

Now, the trick is, whisker will scan for all dirs *individually*.  This
means, for 1.cgi, it will scan:
	
	cgi-bin/
	cgi-bin/a/
	cgi-bin/a/b/
	cgi-bin/a/b/c/
	cgi-bin/a/b/c/1.cgi

Wow, that's a lot of scanning.  Same goes for 2, 3, and 4.cgi as well.
All together, with the above set of scans, we will be making 17 checks
(assuming everything exists).  That's worst case 17, best case 2 (2 is 
the check for cgi-bin and cgi-local, and they don't exist).

Now, optimization.  The point of checking for existence of parent dirs
is to speed up scanning of *many* scans that use that parent dir.  So,
looking in our set, scanning for the existence of cgi-bin is a good thing,
because if it's not there, it will save us the rest of the checks for
1, 2, 4, and 5.cgi.  But notice how /a/b/c of 1.cgi aren't shared.
There's no point in checking them individually, because they're not shared
with any of the other scans.  What would be nice is if we could check
to see if cgi-bin existed (since knowing ahead of time will help with
the others), and if it does, just go straight to scanning /a/b/c/1.cgi.
Well, we can.  Note the optimized scans below:

	scan () cgi-bin >> a/b/c/1.cgi
	scan () cgi-bin >> d/e/f/2.cgi
	scan () / >> cgi-local/1/2/3/3.cgi
	scan () cgi-bin >> g/4.cgi
	scan () cgi-bin >> 5.cgi

Basically, whisker will do the following:

1. scan for /cgi-bin/ (in 1.cgi)
2. if /cgi-bin/ exists, scan for /cgi-bin/a/b/c/1.cgi right away
3. if /cgi-bin/ exists, scan for /cgi-bin/d/e/f/2.cgi right away
4. scan for /cgi-local/1/2/3/3.cgi right away (since no other scans
	use /cgi-local/ or other dirs, no point in checking them
	individually)
5. if /cgi-bin/ exists, scan for /cgi-bin/g/4.cgi right away
6. if /cgi-bin/ exists, scan for /cgi-bin/5.cgi right away

Wow, we just went from 17 checks to 6 (assuming everything exists).
Granted, 5 checks went to checking for 3.cgi originally.  Since we 
don't need those parent dirs for other scans, we reduced it to one.
That's a worst case of 6, best case of 2.  Much better than 17/2.  
And think of it this way, would you rather have 6 log entries, or 17?

So you can think of the scan function as such:

scan (server) {individual dirs to scan} >> {one thing to scan as a whole}

And just remember, every dir in the 'individual dirs to scan' will
cost you a check (unless cached).  So when scans share dirs in common
(i.e. the results will be cached), put them there.  Otherwise, you
want to push them to the 'scan as a whole' column.

Here's a worst case scenario of over-optimization:

	scan () / >> scripts/tools/getdrvrs.exe
	scan () / >> scripts/samples/details.idc
	scan () / >> scripts/samples/ctguestb.idc

Now, this will force 3 scans.  Even if /scripts/ doesn't exist, it will
still make 3 scans.  Not as intelligent.  Now, one optimization would be:

	scan () scripts >> tools/getdrvrs.exe
	scan () scripts >> samples/details.idc
	scan () scripts >> samples/ctguestb.idc

Now, if /scripts exists, it will cost us 4 scans.  If /scripts does not,
it only costs us one. (that's a worst/best of 4/1)

	scan () scripts >> tools/getdrvrs.exe
	scan () scripts/samples >> details.idc
	scan () scripts/samples >> ctguestb.idc

Now, assuming scripts exists, and samples does too, it will cost us
5 scans for this, which would be:

	/scripts/
	/scripts/tools/getdrvrs.exe
	/scripts/samples (/scripts is cached)
	/scripts/samples/details.idc
	/scripts/samples/ctguestb.idc (/scripts/samples is cached)

If /scripts/samples didn't exist, but /scripts did, we'd have 3 scans.
If /scripts didn't exist at all, we'd have 1 scan.  So really, the previous
optimization would be best (worst/best 4/1).  This optimization
(worst/best 5/3 (or 1)) would be good if there are other CGIs to check for in
/scripts/samples (making the cached dir checks of /scripts and
/scripts/samples more useful).

What it really comes down to is a numbers game, and somewhat of a
psychology game as well.  Directory caching works well when the cache is
obviously hit many times...and it's actually a penalty at other times.
Look at the pros and cons of it:

You can have 10 /cgi-bin/xxx.cgi checks--just like your normal CGI
scanner.  You cause 10 log entries, even if the scripts don't exist,
which stick out like a sore thumb.  With whisker, first you have
a log entry for /cgi-bin/, which is much less obvious than, say,
/cgi-bin/test-cgi.  I mean, /cgi-bin/, while suspicious, isn't as
obvious.  Now, if that check fails, you don't have the other 10 log
entries.  You just saved yourself those 10 red flags.  If the check
passes, well, then it's worth the red flags to see, right?  After all,
that's what the scanner is for. ;)  Granted, this is a very obvious
result.  But the numbers can be tweaked for any of the optimization
cases above.  How obvious are checks for /scripts/ and /scripts/samples/
compared to checks for /scripts/samples/details.idc, etc?  BTW, the
realtime IDS systems pick off the full URL requests from the wire only.
So a raw check for /cgi-bin/phf will set off the IDS, regardless of its
existence.  A check for /cgi-bin, and a result of negative, will save
you the step of even sending the URL, and therefore keep the IDS quiet.


----[ Wish-list and future updates

Well, the most obvious one I can think of is more rigorous language
parsing.  Obviously a flex/yacc combo would be kickass, but I don't
want to port it off perl.  whisker, as it is, is a useful demonstration
of theory...but hey, if you want to port it to C, go for it.  Why
didn't I just port it to C?  Well, mostly because perl's auto-allocation
is such a blessing, especially to my nasty array permutation code
in the scan function.  Plus, eval was a nice feature, and I just like
perl all around. :)


----[ What's to become of web scanners

My hope is that rather than make *another* cgichk.c, port it to rebol,
add a few checks, etc., people will use whisker as the engine, and
just code cool suave kickass scan databases that are intelligent and
take advantage of the features.  I'd like to see a program that can
pre-process a scan database and optimize the scans--this actually wouldn't
be all that hard.  I've also had ideas/pre-code for a parallel process
front-end to scan multiple hosts at once.  But I dunno, maybe people will
just think whisker is stupid and I'll be laughed at.  So be it.


----[  whisker perl internals for eval and coding

Ok, just a few quick notes on some of the inside perl code.  This will help
if you want to effectively use 'eval', or poke into the code.

All user and global variables are in %D.  So, to reference XXRet, for
instance, it would really be $D{'XXRet'}.

All user arrays are prefixed with 'D'.  So,
	array roots = a,b,c
would become @Droots in perl's 'process space'.  Again, this is to avoid
clobbering.  Also note that whisker will also define $D{'Droots'}="--array--"
for the 'roots' array.  Making arrays named XXRet and whatnot will start
getting you into trouble, as the XXRet value will be clobbered with the
"--array--" string for a bit...

I suggest using wprint() to print stuff.  wprint() will correctly direct
to console or log file, depending on commandline options.  Use verbose()
to print stuff only when the verbose switch is used, and debugprint() to
print stuff only when the debug switch is used.

You can do http requests by using sendhttp() and sendraw(). *ALL* the
networking functionality (sockets, connects, etc) is contained *ONLY* in
sendraw().  Use rdecode() to decode the server's return value to human
readable string.

For code hackers, I have indicated within the code where to add commands
and where to add scan database pre-process code.

----[ Signoff

Well, if you haven't thought I'm crazy by now for putting this much thought
into a CGI scanner, then maybe there's hope for me. :)  Whisker really came
about because of two reasons:  1. I needed something I could easily
script web audits with (I was tired of rewriting C all the time), and 2.
I wanted to make a proof of concept of the 'next-generation' of web scanner.
So here it is.  Now granted, my perl coding isn't the best, and I'd love
for someone to recode the scan function...that directory permutation stuff
is scary code.  But it works, and that's good for me. :)

Drop me a line if you like/use whisker, and definitely send me snippets of
interesting scan scripts you make...I would like to compile a nice big
one with lots of intelligence.  Also, if you have ideas/bugs with the code,
let me know.

Till next time!
.rain.forest.puppy.		(rfp@wiretrip.net)

----------------[ whisker is GPL.  Do not steal.  Do not pass go.
----------------[ and definitely do not collect $200 for my work.

-[ Yes, this document uses a Phrack-esque layout.

----[ EOF