cDc is pleased as punch to assist in bringing u, the 'l33t muthafuckas that u r, a clever lil' app to aid u in yer endeavors: .rain.forest.puppy.'s Whisker, v. 1.4, available at Defcon.
Currently available: v. 1.3.0a.
---------[ Whisker: next-generation CGI scanner doc v1.2.0
--[ by rain.forest.puppy / ADM / wiretrip (rfp@wiretrip.net)

----[ Table of Contents

- Background
- What whisker does/has
  = array
  = scan
- Command line reference
- Global variable list
- Language reference
- Advanced coding tekniq
  = Logic evaluation
  = Scan optimization
- Wish list/future enhancements
- What's to become of web scanners
- Notes for eval/internal coding
- Signoff

----[ Background

A CGI scanner is just a CGI scanner, right? And they're pretty lame apps to boot, right? Hmmm...well, perhaps. That's because no one has given any thought to them. Yeah, until I did. Perhaps I have too much time on my hands. ;) After reading this, I will be surprised if you don't think I've put way too much thought into this.

I've waded through the pile of CGI scanners found on Packetstorm (before JP got his way; j3rk), Rootshell, etc. Suidshell's cgichk.c (and derivatives) are the most comprehensive...but that seems to be the 'goal' they shoot for--try to have 'the most checks in any scanner'. Great. Never mind the fact that some of the checks are completely wrong (I think it's funny to notice how the Cold Fusion '/expeval/' has propagated to so many scanners as '/expelval/'--one kiddie made a mistake, and they all copied.)

Wait...CGI scanning isn't that complex, is it? Well, to do it right, yes. Why? Hmmm...I can think of a few reasons:

1. /cgi-bin is pretty damn common, I'll give you that. But I've also been on many a hosting provider that used /cgi-local. And I've seen people use /cgi, /cgibin, etc. Fact of the matter is that it could also be /~user/cgi-bin, or /~user/cgis, etc. Then there are some scripts that are all over the place, like wwwboard, which may or may not have their own directory.

Point of the point: wouldn't it be nice to define multiple directories?

2. You know what really irks me? Seeing a CGI scanner thrash around through /cgi-bin or whatnot, when /cgi-bin doesn't even exist. Talk about noisy in the logs. Now, if we waste a brain cell, we can see that if we query the /cgi-bin directory (by itself), we'll get a 200 (ok), 403 (forbidden), or 302 (for custom error pages) if it exists, or a 404 if it doesn't. Wow. So if we just do a quick check on /cgi-bin, and get a 404, we can save ourselves however many /cgi-bin CGI checks we were going to make. That could save you 65 entries in the httpd logs.

Point of the point: save noise/time by querying parent dirs

3. If you have more to spare, let's waste another brain cell on another obvious issue. Why should I query for, say, test-cgi on an IIS server? Or /scripts/samples/details.idc on Apache? Why should I even bother checking various httpds at all (like a firewall proxy, etc)? When we do a request, the server gives us its name and version. How nice of them. How about we take advantage of their generosity?

Point of the point: tailor your scan to the server you're scanning

4. Virtual hosts. Most webservers nowadays (especially Apache with its VirtualHost directive, and IIS with its virtual host setup wizards) allow you to assign many actual domain names/websites to the same IP. Well, hell...how does the server know which site you want when you connect? Well, browsers give a second piece of information, the 'Host' directive. So, a request may look like:

GET /~rfp/index.html HTTP/1.1
Host: www.el8.org

So say we have SlikWilly Virtual Hosting; they run off RedHat Linux using Apache.
They set up their only IP (as that's all they could afford for their $39.95/month shared DS0) to host the site www.slikwilly.com. Now, on the actual box, the location for their files is /home/httpd/html/ for html files, and /home/httpd/cgi-bin/ for, what else, their CGI apps. So a request to www.slikwilly.com/index.html is going to be pulled from /home/httpd/html/index.html. So far, so cool.

Well, the powers that be at Defcon decide that they've had it with catalog.com, since ADM hacked their webpage there. They want to move over to SlikWilly.com in hopes that it will keep those ADM people from changing the site. So Slik Willy himself hops into his httpd.conf and adds a VirtualHost directive for www.defcon.org. He sets up the html directory to be /home/defcon/html/, so that those Defcon people can ftp in via his nifty wu-ftpd-2.4.2(beta 18). So that means that www.defcon.org/index.html should be pulled from /home/defcon/html/index.html. Slik Willy also gives them their own cgi-bin, located in /home/defcon/html/cgi-bin/ (which means it's no silly aliased directory, since Slik doesn't understand all that stuff).

So, now, in this situation, www.defcon.org is a *virtual* site off of www.slikwilly.com (the root site). What exactly does that mean will happen? Well, let's see. If I give the request:

GET /index.html HTTP/1.0

I will get back the file at (assuming it exists):

/home/httpd/html/index.html

which is Slik Willy's file (www.slikwilly.com). If I check for:

GET /cgi-bin/test-cgi HTTP/1.0

I will be checking for:

/home/httpd/cgi-bin/test-cgi

which is again Slik Willy's file (www.slikwilly.com). Now, if I check for:

GET /index.html HTTP/1.0
Host: www.defcon.org

I will get back:

/home/defcon/html/index.html

which is the www.defcon.org homepage. Similarly:

GET /cgi-bin/test-cgi HTTP/1.0
Host: www.defcon.org

will be checking:

/home/defcon/html/cgi-bin/test-cgi

which is in www.defcon.org's cgi-bin.

Now, why does any of this fscking matter whatsoever? Well, imagine you wanted to be like ADM, and try to hack www.defcon.org again. So you whip out your trusty cgichk.c CGI scanner (oooh, you hacker you) and rev it up against www.defcon.org. Well, guess what--the scanner connects to Slik Willy's box, does generic requests (no Host), and winds up scanning Slik Willy's cgi-bin for CGIs, not the actual www.defcon.org cgi-bin. And there exists the possibility that www.defcon.org had way cooler stuff than Slik Willy.

But lemme just make it known, this usually works in your favor. For instance, on IIS, the virtual hosts will *NOT* (unless specifically added) have /scripts mapped to them--but the root site will. So, trying to GET /scripts will work off the main (generic) site, but if you try a virtual host with the Host directive, most likely /scripts won't be mapped over. Same for Slik Willy: test-cgi comes by default in /home/httpd/cgi-bin/, not /home/defcon/html/cgi-bin. So scanning the root site is better for finding the 'default' install CGIs.

Point of the point: there's a whole 'nother world out there hiding behind virtual hosts--and you may not be scanning who you think you really are

5. Some places use custom error pages. Unfortunately, the implementation is such that instead of generating a 404 'not found', you always get a 200 'success', with HTML to indicate the missing page.

Point of the point: being able to minimize this anomaly would lessen false positives

6. More wishes: new CGI and webserver problems are found at a pretty decent rate, it seems.
Plus, I might like to customize which scans I want to do against a particular host. Having to edit C code and recompile every time could quite severely suck, especially if I'm a lousy C coder to boot.

Point of the point: if this was all scriptable, that'd be nifty

7. Input sources. I dunno about you, but I'm quite tired of doing bizarre awk/host -l combos, dumping them to a file, and then feeding them back into the various scanners. Sometimes I want to just feed in output from nmap (after all, it has a list of the found open port 80's, right?), sometimes just a laundry list of IPs/domains, and sometimes I'd just like to do a single host on the command line.

Point of the point: flexibility of input would be nice as well

8. IDS/log avoidance. Do you know how many IDS alarms you'll set off by requesting /cgi-bin/phf? Let alone that it's easy to spot in the logs. So instead of just handing over the plaintext, why not URL encode all or part of it to break up the literal plaintext string, such as /cgi-%62in/ph%66. It keeps the string-matching/packet-grep IDS systems from getting a positive ID, and the more encoded you make it, the harder it is to figure out what it is (on the flip side, it also stands out more in the logs, even if it's unknown what /%63%67%69%2d%62%69%6e/%66%69%6e%67%65%72 is really scanning for).

Point of the point: being able to spoof IDSs would be a nice feature

Well, that's enough wishes, don't you think? Now, do they come true....

----[ Whisker has all that, plus a bonus feature or two :)

Yeah, no kidding. Come on, I wouldn't wish for something that I didn't actually implement. I'd look dumb. :) My future wishes are down below at the end. Anyways, so whisker does all that. Let's look at the two basic functions of whisker, array and scan. This is a reprint of the command reference below, but a little more verbose.

-[ array {name} = {comma delimited list}

This is one of the two core commands of whisker (the other being scan). Basically, you make an array named {name} with elements from your comma delimited list. This array is then referenced as @{name}, and given to the scan function to scan the permutations of the names in the @array. You can include another array in the list of elements...it will be added inline.

Example:

# let's make an array of common unix cgi locations
array roots = cgi-bin, cgi-local, cgibin, cgis, cgi

array first = a,b,c
array second = d, @first, e
# second = d,a,b,c,e

array bigroots = cgi-bin, cgi-bin/secret, cgi-bin/rfp

# this is a big NO!
array moreroots = cgi-bin/@first, rfp/@bigroots
# only the scan() function will parse roots like this

-[ scan ({optional server regex}) {dirs} >> {script}

This is the heart of whisker. This command is what actually performs the scanning. There are a few aspects to the command. First is the {optional server regex}. You can do a server specific scan one of two ways:

server (iis)
scan () scripts/tools >> getdrvrs.exe
endserver

or shorten it as:

scan (iis) scripts/tools >> getdrvrs.exe

Scan will only do the check if the server regex is () or matches (similar to the server command). Now, {dirs} and {script} are required. {dirs} is a comma delimited list of directories to check to see if {script} exists. {dirs} may also contain arrays made with the array command.
Let's see some examples:

scan () cgi-bin, cgi-local >> my.cgi

will check for /cgi-bin/my.cgi and /cgi-local/my.cgi

scan () a/b, a/c, a/d >> my.cgi

will check for /a/b/my.cgi, /a/c/my.cgi, /a/d/my.cgi

array subdirs = b,c,d
scan () a/@subdirs >> my.cgi

will check for /a/b/my.cgi, /a/c/my.cgi, /a/d/my.cgi

scan () @subdirs >> my.cgi

will check for /b/my.cgi, /c/my.cgi, /d/my.cgi

scan () a, a/@subdirs, f/@subdirs/g >> my.cgi

will scan for all those permutations, expanding out @subdirs into every combo involving the elements in @subdirs. So you see how powerful directory arrays can be. If we have an array of places we want to look for CGIs

array roots = cgi-bin, cgi-local, scripts
array people = ~rfp, ~adm, ~wiretrip

we then can scan for the wanted combos

scan () @roots, @people/@roots >> my.cgi

This is nice because we only have to adjust our arrays to compensate for different locations, and we can use the arrays for all our scans in the program. How centralized. :)

You can specify the root directory by using a single /, as such:

scan () / >> index.html

whisker automatically checks each directory as it goes in {dirs}, and caches the response. See 'Advanced coding tekniq: Optimized scans' for more information on how to (ab)use this command properly.

So basically, you define arrays of directories (although that's optional), and use scan to scan for the scripts. Easy enough. Plus, there's a suite of other simple logic to help out in our scanning endeavors. You can take a peek at the included sample scan.db to see usage, if you're a learn by doing/example type person.
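Or, if you'd rather start from a skeleton, here's a minimal scan.db-style sketch. It only uses commands covered in the language reference below; the directories and scripts in it are just illustrative, so season to taste:

# a wee skeleton scan database
array roots = cgi-bin, cgi-local, cgibin

# generic checks against any server type
scan () @roots >> wwwboard/passwd.txt

scan () cgi-bin >> test-cgi
ifexist
print They have test-cgi
endexist

# shorthand server-specific check
scan (iis) scripts >> samples/details.idc

# block form, handy for grouping several checks
server (apache)
scan () @roots >> phf
endserver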
Anyways, onto using whisker....

----[ Commandline reference

Here is the commandline reference ripped from whisker itself:

Usage: whisker -s script.file ((-n input.file) | (-h host) | (-H list)) (-l log.file)

 -s specifies the script database file **
 -n nmap output (machine format, v2.06+) *
 -h scan single host (IP or domain) *
 -H host list to scan (file) *
 -V use virtual hosts when possible
 -v verbose. Print more information
 -d debug. Print extra crud++ (to STDERR)
 -p proxy off x.x.x.x port y (HTTP proxy)
 -l log to file instead of stdout
 -u user input; pass XXUser to script
 -I IDS-spoof mode--encode URLs to bypass scanners
 -E IDS-evasive mode--more IDS obfuscation
 -i more info (exploit information and such)
 -N query Netcraft for server OS guess
 -S force server version (e.g. -S "Apache/1.3.6")

 ** required
 * optional; one must exist

Now, basically, -s {script.file} is required. This is your scan database, your big file of whisker script code that tells whisker what to do. Then you have three input options: -n, -h, or -H. You must use at least one, but you can use multiple. They are:

-n nmap.file
   supply an nmap (v2.06+) *MACHINE FORMAT* output file. You can get this by using nmap -m nmap.out. whisker will read it in and check every host with port 80 found to be 'open'.

-h {ip or domain}
   single host. Just supply the host on the commandline, such as "-h www.microsoft.com"

-H host.file
   this is essentially a laundry list of IPs and/or domains, one per line, like thus:

   www.microsoft.com
   www.sun.com
   123.123.145.167

-V tells whisker to attempt to use virtual host domains wherever possible. If you're scanning an IP address, -V won't do anything. But if you're scanning a domain name, whisker will include the domain name in the Host: directive.

-v will print more verbose information (to console or logfile).

-d will print debugging information to STDERR (usually console).

-i will include information specified by the 'info' command.

-l log.file will redirect all information to log.file.

-p x.x.x.x y is the proxy command. No, this is not SOCKSified, or anything else. Basically, -p is for firewalls and such where you connect to the PROXY, and then issue:

GET http://my.target.webserver:80/page/i/want.htm HTTP/1.0

x.x.x.x is the IP address, and y is the port OF THE PROXY.

-u is just a simple way to give information on the commandline that is placed directly into the XXUser variable for use within the script. That way, you can have a configuration switch externally--say "-u 1" for normal scans, and "-u 2" for extra-stealthy scans, etc. Use "if XXUser == ???" inside the script to query the value.

-I will "URLify" the request line, as spoken about earlier. It will encode all letters, numbers, dashes and dots as their hex escape sequence equivalents.

-E will enable more IDS 'evasion'. Use -I and -E together to bypass tons of IDSes (I've tested it, and was able to sneak past a lot of 'em).

-N causes whisker to query Netcraft (www.netcraft.com) and see what they think the OS is. Not 100% reliable, but it's a start, and fairly accurate.

-S lets you override what server whisker parses the script as. You submit a server string, such as:

-S "Apache/1.3.1 PHP/3.0.2a"
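To tie the switches together, a couple of example invocations (the host and file names here are made up, obviously):

# single host, default everything
whisker -s scan.db -h www.example.com

# feed in nmap machine output, use virtual hosts, be verbose, log it
whisker -s scan.db -n nmap.out -V -v -l whisker.log

# same, but URL-encode requests to slip past string-matching IDSes
whisker -s scan.db -n nmap.out -I -E -l whisker.log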
That's it, go play! Use the included scan.db for reference. The rest of this is technical information and whatnot.

----[ Global variable list

These are the variables accessible from within the script. Why all the prefixed XX's? So you're less likely to clobber them. :) I suggest you don't poke values into these unless you know what you're doing.

** Note: this list is not complete, due to time constraints. Check my website for updated documentation and a full list.

Name        Default value  Description
XXPort      (80)           port to scan...80 for normal webservers
XXRoot      ()             default prefix for URLs...
XXMeth      (HEAD)         how to retrieve the file...HEAD preferable
XXVer       (HTTP/1.0)     http version for us to use
XXDebug     (0)            do we want debug output
XXVerbose   (0)            do we want verbose output
XXProxy     (0)            are we using a proxy
XXTarget    ()             actual target ip
XXBadMeth   (1)            whether to switch methods if 400 or 500
XXSStr      ()             returned server software string
XXRet       ()             http return code of page
XXRetStr    ()             http return string
XXSVer      ()             http version returned from server
XXUser      ()             given on commandline with -u
XXDirQuite  (0&&1)         whether or not to print result (internal)
XXPageSrc   ()             the html of the page, if GET
XXHeaders   ()             the full set of headers returned
XXCache     (0||1)         whether or not this answer was cached

Proxy info (don't recommend you play with it): XXProxy_addy, XXProxy2port, XXP_target

Cached inet_aton() result: XXinet_aton
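A quick taste of playing with the tamer ones from within a script (the port and -u values here are picked out of thin air):

# hit a webserver on an alternate port
set XXPort = 8080

# stash the last return code before the next scan clobbers it
set MyRet = $XXRet

# react to a config switch fed in on the commandline with -u
if XXUser == 2
print Doing the extra-stealthy scans
endif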
----[ Language reference

Ok, here are the commands that whisker supports in its scripts.

****NOTE: all {} are visual delimiters for viewing only--they are not to be included. If you see something like ({variable}), that means that the () are required, but the {} are not. Also, all mentions of regexes are case insensitive; however, variable names *ARE* case sensitive; commands are not.

-[ # {comment}

Just your usual, everyday comment. COMMENTS MUST BE ON THEIR OWN LINE!

Example:

# this is a comment, and won't be executed.

Bad bad bad:

server (iis) # if the server has IIS...

-[ print {something to print}

Print out {something to print}. Duh. No, no embedded variables or \n, \t, etc.

Example:

print This will be printed to screen or logfile, depending on switches

-[ printvarb {variable name}

This will print out the contents of the single variable {variable name}. (variable name is case sensitive)

Example:

printvarb XXRet

-[ exit

This will 'exit' the scan for the current host, and move along to the next host to scan.

Example:

exit

-[ exitall

This will immediately exit the program altogether, right then and there.

Example:

exitall

-[ if {variable} {== or !=} {value} (w/ endif)

Your standard logic test: if {variable} is equal (==) or not equal (!=) to the constant {value}, execute up to the first endif.

NOTE: whisker uses a quasi-equality/test system that's more convenient in this type of situation. If {value} is a numeric value (all numbers), then whisker will use a pure "if variable is equal to value" test. However, if {value} is a string (does not contain all numbers), then it uses a regex instead, which is more along the lines of "if value is contained within variable". This is nicer for matching string partials, etc.; and it's exactly why numbers get the pure equality test--you don't want whisker returning 'True' just because "20" is found within "200".

Example:

if XXRet == 200
print The page was found
endif

if XXRet != 200
print The page was NOT found
endif

-[ ifexist (w/ endexist)

This command is equivalent to 'if XXRet == 200', and evaluates as true if the resulting check came back 200 (meaning the page exists). *Note: right now it's hardcoded to a return value of 200...this will be changed to be user-definable in the future.

Example:

scan () cgi-bin >> test-cgi
ifexist
print They have the test-cgi CGI
# other stuff to do
endexist

-[ server ({server regex}) (w/ endserver)

This is basically 'if the server string contains the string {server regex}, evaluate it as true'. {server regex} is case insensitive, and required. Everything up to the first 'endserver' is evaluated.

Example:

server (iis)
# stuff to do if server string has 'iis' in it
endserver

-[ set {variable} = {value}

This will set the variable {variable} to {value}. {value} can either be a constant you supply, or another variable name that starts with '$'. You don't need to worry about pre-allocating a variable...it will automatically be created on its first use. {variable} and {value} are required. The '$' on {variable} is assumed, and can NOT be used. Variable names and the values you assign are case sensitive.

Example:

set XXMeth = GET
set MyReturnValue = $XXRet

Bad bad bad:

set $MyReturnValue = Some_value_to_assign

-[ startgroup

Reset the group counters, and start tracking group scans. Essentially this lets you see if a full group of files exists. Note that a 'group' is 'true' if all scans done since a startgroup have returned successfully. If any one scan in the group returns false, the 'group' is evaluated as false (used with ifgroup, below).

Example:

startgroup
scan () cgi-bin >> phf
scan () cgi-bin >> webdist
ifgroup
print Wow, they have phf AND webdist!
endifgroup

-[ ifgroup (w/ endifgroup)

Evaluate the last scans since startgroup, and process if all scans were successful. See startgroup for more information and an example.

-[ info {stuff to print}

Print information if the -i switch has been used and the last scan was successful. This should be used to provide more information (exploit info, informational links, notes, etc) about a successful scan.

Example:

scan () cgi-bin >> phf
# print this stuff if they have used -i, and phf exists
info Oh my god! They have phf! How lame...
info But then again, it could be one of those phf logger traps

-[ ifinfo (w/ endinfo)

Evaluate and process if the -i switch was supplied. Note that ifinfo allows you to do more than just print information (you can put any whisker code in the block), and it does not consider the return status of the last scan.

Example:

server (Apache)
ifinfo
# print this stuff only if it's an Apache server and -i was used
print They're running Apache, in case you didn't notice
# run any other commands here too
endinfo
endserver

-[ usehead

Sets the default method to 'HEAD', while also saving what the current method was (which can be restored with restoremeth).

Example:

usehead
# this will now use HEAD
scan () cgi-bin >> phf
restoremeth

-[ useget

Sets the default method to 'GET', while also saving what the current method was (which can be restored with restoremeth).

Example:

useget
# this will now use GET
scan () cgi-bin >> phf
restoremeth

-[ usepost

Sets the default method to 'POST', while also saving what the current method was (which can be restored with restoremeth). *Note: whisker automatically adds the required headers for using POST requests. You can set what information is actually posted via the XXPostData variable--whisker will automatically compute Content-Length.

Example:

usepost
# this will now use POST
# use this if you want to submit extra post info
set XXPostData = somevarb=crap&whatever=morecrap
scan () cgi-bin >> phf
restoremeth

-[ restoremeth

Restore to whatever (default) method was chosen before you ran a usehead, useget, or usepost command. You should note that this is not implemented in stack fashion...if you useget, then usepost, then usehead, restoremeth will revert to the *PRIOR* method, or in this case, POST. Therefore you should always restoremeth before you use a different use* command, or you will lose the default method.

Example:

# default scan type is HEAD
useget
# now we're GET
scan () cgi-bin >> phf
restoremeth
# we're back to HEAD
usepost
# now we're POST
scan () cgi-bin >> webdist
restoremeth
# we're back to HEAD

Wrong:

# default scan type is HEAD
useget
# now we're GET
scan () cgi-bin >> phf
usepost
# now we're POST
scan () cgi-bin >> webdist
restoremeth
# we're back to GET, and we've lost our HEAD default

-[ savemeth

Essentially does the save operation that useget, usepost, or usehead do (which can be 'undone' with restoremeth). This is here in case you want to do more funky stuff with the XXMeth variable (for instance, use TRACE, OPTIONS, or set it to * for the various test-cgi vulnerabilities).

Example:

savemeth
set XXMeth = TRACE
restoremeth

-[ insert {file}

Insert the code found in {file} (if it exists) into the script at that point. Note that this is a pre-processing command, and done before whisker even thinks of scanning a host.

Example:

insert servers.db

-[ fingerprint .{extension} {action}

This is the initial implementation of return code/page fingerprinting (discussed in detail below). Basically it causes whisker to verify that a request with the specified {extension} does not return a 200 (for example, Cold Fusion returns a 200 OK for any .cfm request by default on IIS--which makes it appear as if every .cfm request does indeed exist). Valid actions at this point are skip and exit. Note that fingerprint is a pre-processing directive for each command--this means that no matter where the fingerprint command is located in the file, it is run *first*, before anything else is run for that host.

If action is skip, and whisker determines that the scanned host returns 200 OK results for that extension, it will just skip any scan with that extension (and fake a 404 Not Found reply).
If action is exit, it will print a notice that it exited on a fingerprint catch, and move on to the next host. A good example of usage would be scanning www.harley-davidson.com--a request for practically anything (.cgi, .pl, etc) will result in a custom error page, which comes back as 200 OK. All other scanners will flag this as 'file exists'. With whisker you have the option of detecting this anomaly and being alerted to it.

How whisker fingerprints: right now, the implementation is simple. Whisker generates a random 20-character string, slaps on your extension, and requests it--assuming that it won't exist. If it comes back 200 OK, then it figures all future requests for that extension are tainted, and implements the fingerprint action handler for that particular extension. In the future this will evolve and become more robust, but for now, it's more than adequate.

Example:

# skip Cold Fusion files, if they all come back 200 OK
fingerprint .cfm skip

# skip this host if every .cgi comes back as 200 OK
fingerprint .cgi exit

-[ eval (w/ endeval)

Eval lets you embed raw perl code into your script to do whatever you want. This gives your scripts unlimited functionality. See the end of this doc for eval/raw perl notes on whisker internals. Note that everything between eval and endeval is put into a variable, and then just run through perl's eval() function.

NOTE: EVAL IS SLOW. The perl interpreter has to do its thing, and it is time consuming. Just a warning.

Example:

eval
print STDOUT "This is a raw perl command\n";
print "wow, you have a passwd file\n" if(-e "/etc/passwd");
endeval

-[ array {name} = {comma delimited list}

Basically, you make an array named {name} with elements from your comma delimited list. This array is then referenced as @{name}, and given to the scan function to scan the permutations of the names in the @array. You can include another array in the list of elements...it will be added inline. Array names and values are case sensitive.

Example:

# let's make an array of common unix cgi locations
array roots = cgi-bin, cgi-local, cgibin, cgis, cgi

array first = a,b,c
array second = d, @first, e
# second = d,a,b,c,e

-[ scan ({optional server regex}) {dirs} >> {script}

This is the heart of whisker. This command is what actually performs the scanning. There are a few aspects to the command. First is the {optional server regex}. Scan will only do the check if the server regex is () or matches (similar to the server command). Now, {dirs} and {script} are required. {dirs} is a comma delimited list of directories to check to see if {script} exists. {dirs} may also contain arrays made with the array command. {dirs} and {script} are case sensitive.

Examples:

scan (iis) scripts/tools >> getdrvrs.exe

array roots = cgi-bin, cgi-local, scripts
array people = ~rfp, ~adm, ~wiretrip
scan () @roots, @people/@roots >> my.cgi

----[ Advanced coding tekniq

-[ Logic evaluation

It's best to take a moment and explain how 'if', 'ifexist', 'server', etc work when evaluating logic. Basically, whisker is not block oriented, but line/linear oriented. This leads to some nesting problems, but you can bend the rules here and there. Now, let's say we have a simple 'if':

if XXRet == 200
print The page existed
endif

Now, what whisker will do is evaluate the 'if XXRet == 200'. If this is true, it will just keep processing line by line. If this is false, it will 'fast forward' to the first 'endif' it comes across.
Same for 'ifexist' (fast forward to the first endexist) and 'server' (fast forward to the first endserver). So you can see how the following nesting breaks:

# if number one
if XXRet == 200
# if number two
if XXRetStr == OK
print Page exists
# endif number one
endif
# if number three
if XXRetStr == Not OK
print Something is borked
# endif number two
endif
# endif number three
endif

Now, if 'if number one' is true, it will keep going line by line. Same for 'if number two & three'. But if 'if number one' fails, it will just fast forward to the first 'endif' it finds, in this case 'endif number one'. This means that if XXRet does not equal 200, it *will still process* 'if XXRetStr == Not OK'...whisker will *NOT* fast forward to 'endif number three'. So you can see how this can affect things.

Now, based on this, you can do some tricks. For instance, a logical AND can be done like so:

if XX == True
if YY == True
print Both XX and YY are true
endif

Let's take a quick peek. If XX is true, it continues. If YY is true, it still continues, and prints our message. If either fails, they just fast forward to the first endif. Simple enough. Logical OR, on the other hand, kinda sucks:

if XX == True
set MyOR = 1
endif
if YY == True
set MyOR = 1
endif
if MyOR == 1
print XX or YY was True
endif

Yeah, way more code. I think you get the point, so I won't trace it. Next is the simple IF/ELSE type structure. Whisker has an 'if', but not an 'else'. You can emulate it like:

if XX == 1
print It's 1!
endif
if XX != 1
print It's something other than 1!
endif

Again, simple stuff. I'm sure the question will come up: "why the hell don't you just implement AND/OR and ELSE into whisker?" My answer is: 1. you can still do it with a bit more code, 2. I want to keep it simple (stupid?), 3. the logic of doing such would start getting out of control, and I don't want to get a formal language thing going. It's just a web scanner, man. :)

server() will run code if a particular server type is found. But how do you run code if it's *not* a particular server? Say I wanted to run something if the server WASN'T Apache...

server (apache)
set apache=1
endserver
if apache != 1
# code to run if it's not apache
endif

That's all there is to it. Remember, the check/set variable/check again procedure tends to work for most logic evaluation situations.

-[ Optimized scans

When I say 'optimized', I mean scans that are coded such that they produce the minimal number of requests. We have the obvious example:

scan () cgi-bin >> 1.cgi
scan () cgi-bin >> 2.cgi
scan () cgi-bin >> 3.cgi
scan () cgi-bin >> 4.cgi

Here scanning for cgi-bin first is valuable; if it's not found, it will save us 4 scans (3 if you count the scan for cgi-bin). If found, it will cost us one additional one (5 total). That gives us a worst case of 5 scans (if cgi-bin exists), best case of 1 scan (if cgi-bin does not exist). Now, let's say we have:

scan () cgi-bin/a/b/c >> 1.cgi
scan () cgi-bin/d/e/f >> 2.cgi
scan () cgi-local/1/2/3 >> 3.cgi
scan () cgi-bin/g >> 4.cgi
scan () cgi-bin >> 5.cgi

Now, the trick is, whisker will scan for all dirs *individually*. This means, for 1.cgi, it will scan:

cgi-bin/
cgi-bin/a/
cgi-bin/a/b/
cgi-bin/a/b/c/
cgi-bin/a/b/c/1.cgi

Wow, that's a lot of scanning. The same goes for 2, 3, and 4.cgi as well. All together, with the above set of scans, we will be making 17 checks (assuming everything exists). That's worst case 17, best case 2 (2 being the checks for cgi-bin and cgi-local, when they don't exist). Now, optimization.
The point of checking for the existence of parent dirs is to speed up the *many* scans that use that parent dir. So, looking at our set, scanning for the existence of cgi-bin is a good thing, because if it's not there, it will save us the rest of the checks for 1, 2, 4, and 5.cgi. But notice how /a/b/c of 1.cgi aren't shared. There's no point in checking them individually, because they're not shared with any of the other scans. What would be nice is if we could check to see if cgi-bin existed (since knowing ahead of time will help with the others), and if it does, just go straight to scanning /a/b/c/1.cgi. Well, we can. Note the optimized scans below:

scan () cgi-bin >> a/b/c/1.cgi
scan () cgi-bin >> d/e/f/2.cgi
scan () / >> cgi-local/1/2/3/3.cgi
scan () cgi-bin >> g/4.cgi
scan () cgi-bin >> 5.cgi

Basically, whisker will do the following:

1. scan for /cgi-bin/ (in 1.cgi)
2. if /cgi-bin/ exists, scan for /cgi-bin/a/b/c/1.cgi right away
3. if /cgi-bin/ exists, scan for /cgi-bin/d/e/f/2.cgi right away
4. scan for /cgi-local/1/2/3/3.cgi right away (since no other scans use /cgi-local/ or the other dirs, there's no point in checking them individually)
5. if /cgi-bin/ exists, scan for /cgi-bin/g/4.cgi right away
6. if /cgi-bin/ exists, scan for /cgi-bin/5.cgi right away

Wow, we just went from 17 checks to 6 (assuming everything exists). Granted, 5 checks originally went to checking for 3.cgi; since we don't need those parent dirs for other scans, we reduced it to one. That's a worst case of 6, best case of 2. Much better than 17/2. And think of it this way: would you rather have 6 log entries, or 17?

So you can think of the scan function as such:

scan (server) {individual dirs to scan} >> {one thing to scan as a whole}

And just remember, every dir in the 'individual dirs to scan' will cost you a check (unless cached). So when scans share dirs in common (i.e. they will be cached scan results), put the dirs there. Otherwise, you want to push them to the 'scan as a whole' column. Here's a worst case scenario of over-optimization:

scan () / >> scripts/tools/getdrvrs.exe
scan () / >> scripts/samples/details.idc
scan () / >> scripts/samples/ctguestb.idc

Now, this will force 3 scans. Even if /scripts/ doesn't exist, it will still make 3 scans. Not as intelligent. One optimization would be:

scan () scripts >> tools/getdrvrs.exe
scan () scripts >> samples/details.idc
scan () scripts >> samples/ctguestb.idc

Now, if /scripts exists, it will cost us 4 scans. If /scripts does not, it only costs us one (that's a worst/best of 4/1). Next:

scan () scripts >> tools/getdrvrs.exe
scan () scripts/samples >> details.idc
scan () scripts/samples >> ctguestb.idc

Assuming /scripts exists, and /scripts/samples does too, this will cost us 5 scans, which would be:

/scripts/
/scripts/tools/getdrvrs.exe
/scripts/samples/ (/scripts is cached)
/scripts/samples/details.idc
/scripts/samples/ctguestb.idc (/scripts/samples is cached)

If /scripts/samples didn't exist, but /scripts did, we'd have 3 scans. If /scripts didn't exist at all, we'd have 1 scan. So really, the previous optimization would be best (worst/best 4/1). This optimization (worst/best 5/3 (or 1)) would be good if there were other CGIs to check for in /scripts/samples (making the cached dir checks of /scripts and /scripts/samples more useful). What it really comes down to is a numbers game, and somewhat psychology as well. Directory caching works well when the cache is obviously hit many times...and it's actually a penalty at other times.
Look at the pros and cons of it: you can have 10 /cgi-bin/xxx.cgi checks--just like your normal CGI scanner. You cause 10 log entries, even if the scripts don't exist, and they stick out like a sore thumb. With whisker, first you have a log entry for /cgi-bin/, which is much less obvious than, say, /cgi-bin/test-cgi. I mean, /cgi-bin/, while suspicious, isn't as obvious. Now, if that check fails, you don't have the other 10 log entries. You just saved yourself those 10 red flags. If the check passes, well, then it's worth the red flags to see, right? After all, that's what the scanner is for. ;) Granted, this is a very obvious result. But the numbers can be tweaked for any of the optimization cases above. How obvious are checks for /scripts/ and /scripts/samples/ compared to checks for /scripts/samples/details.idc, etc?

BTW, the realtime IDS systems pick off the full URL requests from the wire only. So a raw check for /cgi-bin/phf will set off the IDS, regardless of its existence. A check for /cgi-bin, and a result of negative, will save you the step of even sending the URL, and therefore keep the IDS quiet.

----[ Wish-list and future updates

Well, the most obvious one I can think of is more rigorous language parsing. Obviously a flex/yacc combo would be kickass, but I don't want to port it off perl. whisker, as it is, is a useful demonstration of theory...but hey, if you want to port it to C, go for it. Why didn't I just port it to C? Well, mostly because perl's auto-allocation is such a blessing, especially to my nasty array permutation code in the scan function. Plus, eval was a nice feature, and I just like perl all around. :)

----[ What's to become of web scanners

My hope is that rather than make *another* cgichk.c, port it to rebol, add a few checks, etc, people will use whisker as the engine, and just code cool suave kickass scan databases that are intelligent and take advantage of the features. I'd like to see a program that can pre-process a scan database and optimize the scans--this actually wouldn't be all that hard. I've also had ideas/pre-code for a parallel-process front-end to scan multiple hosts at once. But I dunno, maybe people will just think whisker is stupid and I'll be laughed at. So be it.

----[ whisker perl internals for eval and coding

Ok, just a few quick notes on some of the inside perl code. This will help if you want to effectively use 'eval', or poke into the code.

All user and global variables are in %D. So, to reference XXRet, for instance, it would really be $D{'XXRet'}. All user arrays are prefixed with 'D'. So, array roots = a,b,c would become @Droots in perl's 'process space'. Again, this is to avoid clobbering. Also note that whisker will also define $D{'Droots'}="--array--" for the 'roots' array. Making arrays named XXRet and whatnot will start getting you into trouble, as the XXRet value will be clobbered with the "--array--" string for a bit...

I suggest using wprint() to print stuff. wprint() will correctly direct to console or log file, depending on commandline options. Use verbose() to print stuff only when the verbose switch is used, and debugprint() to print stuff only when the debug switch is used.

You can do http requests by using sendhttp() and sendraw(). *ALL* the networking functionality (sockets, connects, etc) is contained *ONLY* in sendraw(). Use rdecode() to decode the server's return value to a human readable string.
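Putting that together, here's a tiny eval sketch using the internals above. I'm assuming wprint() takes a plain string the way perl's print does--peek at the source if it gripes:

eval
# all user/global variables live in %D
wprint("server says it is: $D{'XXSStr'}\n");
# user arrays get the 'D' prefix--assuming your script did
# 'array roots = ...', it shows up here as @Droots
wprint("first root dir: $Droots[0]\n");
endeval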
For code hackers, I have indicated within the code where to add commands and where to add scan database pre-process code.

----[ Signoff

Well, if you haven't thought I'm crazy by now for putting this much thought into a CGI scanner, then maybe there's hope for me. :) Whisker really came about for two reasons: 1. I needed something I could easily script web audits with (I was tired of rewriting C all the time), and 2. I wanted to make a proof of concept of the 'next-generation' of web scanner. So here it is.

Now granted, my perl coding isn't the best, and I'd love for someone to recode the scan function...that directory permutation stuff is scary code. But it works, and that's good enough for me. :) Drop me a line if you like/use whisker, and definitely send me snippets of interesting scan scripts you make...I would like to compile a nice big one with lots of intelligence. Also, if you have ideas for/bugs with the code, let me know. Till next time!

.rain.forest.puppy. (rfp@wiretrip.net)

----------------[ whisker is GPL. Do not steal. Do not pass go.
----------------[ and definitely do not collect $200 for my work.

-[ Yes, this document uses a Phrack-esque layout.

----[ EOF