Disclaimer
---JAPH--- ---JAPH--- ---JAPH--- ---JAPH--- ---JAPH--- ---JAPH---
---JAPH--- ---JAPH--- ---JAPH--- ---JAPH--- ---JAPH--- ---JAPH---
---JAPH--- ---JAPH--- ---JAPH--- ---JAPH--- ---JAPH--- ---JAPH---
What this program does is probably not appropriate for young'uns, so
if you wanna teach your kid Perl, this is NOT the thing to teach him with.
I take no responsibility for this code, its functionality, or its use; I can
barely vouch for its existence. In fact, this thing might format your hard drive, and you know what?
I wouldn't even want to know about it. Take it or leave it.
Intro
This is the first article I've written in several months, and it's somewhat different from what I've written before. This article will concern itself with building a complete program in Perl, and of course... Everyone lubs Purl. Perl is a programming language originally designed for analysing text files and generating reports, but it has been extended so much that it can do almost anything. (including making you a cup of coffee, if you have the right hardware for it)
To find out more about Perl you can obviously check www.perl.com and www.perl.org, plus you can get the (most) excellent port of Perl to win32 from www.activestate.com
What Does The Program Do?
There are several good articles on corns' page [plug! plug! -NRoC], and I had to find a way to make mine draw attention. I could have done this with subtlety, finesse, and great effort spent fine-tuning the document to the interests of the average reader.
Needless to say, I didn't do that.
The program I made is about pr0n. Yes, that little beast inside all of us we try to keep hidden around family and friends (except corn, who shouts his p3rviness from every rooftop in central Europe) [LIES! -NRoC]
What the program does is connect to the various password list sites around the net, collecting passwords, and then generate an .html file containing the full list. It has full dupe-checking, and fairly basic "haha - thought - you - were - gonna - see - naked - girlies - but - I - just - made - 3 - cents - cause - yer - so - dumb" link-zapping.
It generates a listing that sorts all sites alphabetically, and maybe a few other things I'll get to along the way. All this in around 120 lines of code, gotta love perl.
You should know that if you don't know Perl, this won't teach it to you, but it might encourage you to go ahead and learn it yourself, which is why I wrote this article.
Anyhow, Perl is fairly readable, so if you have experience in some
other high-level language, you'll probably be able to keep up with the
logical workings of the program.
Perl is really really good, and since I learned it just from examples,
man files and some online tutes, I'm sure you can too.
The way I'm gonna write this article is by going through the script, chunk by chunk, explaining what each part does as I go along. So let's get started...
Part One
Sets a debug var to 1 making the program spit out various info useful for debugging.
Declares its use of a package LWP::Simple, this package is in charge of getting files off of the web. Very convenient, since all you have to do is say "$page = get($url);" and the package will take care of the rest.
Opens 2 files:
"sites", a file containing a list of pages that list passwords, 1 site for each line.
"list.htm", this is where the final list will be written to, the ">" at the
beginning of the filename means it's opened for writing.
And finally, the basic HTML tags are pre-written to the out file (list.htm).
$debug = 1;
use LWP::Simple;
# open sites and out files.
open (SITE_FILE,"sites") || die "Couldn't open Sites File";
open (OUT_FILE,">list.htm");
# write HTML header for outfile
print OUT_FILE "<HTML>\n<BODY>\n";
print OUT_FILE "<H2>Sites Scanned:</H2><p><p>\n<H4>\n";
This loop goes through each line of the site list, gets the url from the file, and calls a function get_page_passes(), while the url is implicitly stored in the variable "$_" (this is just the way Perl works).
The function returns a list containing the passwords collected from the specific page; this list is then push'ed (appended) into the collective list @total_site_list, which will contain the collected passwords from all pages, and which will go through all the manipulations before being output to the out file.
# main loop, reads in a page url , gets back the passes
# and then combines them all to one list.
while (<SITE_FILE>) {
print OUT_FILE "$_<p>\n";
@current_list = get_page_passes();
push @total_site_list,@current_list;
}
Now we come to the get_page_passes function. This is the heart of the matter, so to speak.
Let's Break It down:
This part declares two variables that will be embedded in the regular expression to follow, which makes the code look a bit cleaner later on (lord knows it needs it).
It then chomps off the "\n" at the end of the url in $_ (which we got before calling the func, in the main loop). Then the program prints the url it's connecting to, to show the user what's going on.
sub get_page_passes {
my $user_chars = "a-z0-9\.\!\#\$\%\^\&\*\(\)";
my $site_chars = "a-z0-9\-\_\&\/\.";
my @page;
my @my_passes;
# chomp off url
chomp;
print "Connecting to $_...";
This next bit downloads the page from the net using "get($_)", and makes a list called @page, with one line of the downloaded file in each member of the list; this is done by splitting the full page on every "\n".
This next bit is the workhorse, and how lovely of Perl to make it possible with just one line ( and enabling that is a time-honored tradition of Perl ).
The variable $page containing the full downloaded page is searched for any pattern which resembles "<a href=http://name:pass@www.host.com", which of course, if you know HTML, is a link, and the form "http://x:x@site" means it's a link to a page requiring a password.
The match operator ( "m//" ), used in list context, returns a list of all matches, in this case the submatch inside the parentheses (), which comfortably gives us just the url: "http://name:pass@host.com". The modifiers used with this match are:
i : Case insensitive.
s : Treat the string as a single line, meaning "." also matches "\n".
This is good since an HTML tag might be spread over multiple lines;
using this makes sure those are matched too.
g : Global. This returns all the matches instead of just one.
The function then prints the number of passes found on the page, by printing out the element count of @my_passes (scalar @my_passes gives the number of entries).
# use LWP to GET the file, and split it by lines into the @page list
$page = get($_);
@page = split(/\n/,$page);
# get all links in correct form
@my_passes = $page =~ m/<[\s\n\r]*a[\s\n\r]+href[\s\n\r]*=[\s\n\r]*\"?(http\:\/\/?[$user_chars]+\:[$user_chars]+\@[$site_chars]+)/isg;
print scalar(@my_passes) . " Passes Found\n";
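One thing the script doesn't bother with: get() hands back undef when the fetch fails (dead list site, timeout, whatever), and then the split and the match just quietly come up empty. If you want to be a bit more careful, the get line could be padded out like this (just a sketch, not part of the original script):
$page = get($_);
unless ( defined $page ) {
# couldn't fetch the page, tell the user and hand back an empty list
print "couldn't fetch $_, skipping...\n";
return ();
}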
The same matching technique is used to collect all the hosts referenced by <IMG> tags; this exploits the fact that most sites include banners for their sponsors, and the hosts used to grab the images are also the ones they use for the false links.
If the debug variable is set to 1, it prints the list of collected banner sites.
Again, using the same technique, grab all sites that are linked from the page and contain "cgi" in the directory part of the url. Again, this exploits the fact that sites link to their sponsors, and usually these links are to scripts in the cgi-bin directory of the sponsor's server.
USE THEIR OWN GREED AGAINST THEM, REALITY-CRACKING!!! (hah).
@banner = $page =~ m/<[\s\n\r]*img[\s\n\r]+src[\s\n\r]*=[\s\n\r]*\"?(?:http:\/\/)([\d\w\-\_\.]+)/isg;
if ( $debug ) {
print "img tag banner sites collected:\n";
foreach(@banner) {
print "$_\n";
}
}
# get all hosts that are referenced by links and contain *cgi* in the dir
# add to @banner.
push @banner,$page =~ m/<[\s\n\r]*a\s+href[\s\n\r]*=[\s\n\r]*(?:http:\/\/)?([\d\w\-\_\.]+?)\/[\w\_\-]*cgi[\w\_\-]*\/.+?>/isg;
Last bit of the get_page_passes function.
The first part kills off any duplicate sites in the blacklist.
The kill_dupes func is fairly simple, we'll get to it soon;
we call kill_dupes with "sort @banner", since it requires a sorted list to do
its work.
Then, if the debug flag is on, it prints out the cleaned list.
The next part goes through the list of links to passworded sites collected earlier in the function, isolates the host part of each url, and compares it to the members of the blacklist. If there's a match, I've found a false link (those bastards), and I 'splice' it off the main list of sites.
The function then returns the cleaned up list of sites, and that's all there is to it.
# good thing I have a blacklist, but I need to make sure there are no dupes
@banner=kill_dupes(sort @banner);
if ( $debug == 1) {
print "final blacklist:\n";
foreach(@banner) {
print "$_\n";
}
}
# go through all the passes from this page, and kill off fakes by comparing
# to the entries of the blacklist.
foreach (@banner) {
$banner=$_;
# loop backwards so splicing an entry doesn't make us skip the one after it
for($I=$#my_passes;$I>=0;$I--){
if ($my_passes[$I] =~ /(http\:\/\/)?(.*\:.*\@)($banner)/i) {
print "Deleted Probable Fake Link:\n";
print "\t$my_passes[$I]\n";
splice @my_passes,$I,1;
}
}
}
return @my_passes;
}
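As a side note, the splice loop above is only one way to zap the fakes; Perl's grep can do the same filtering in one line per blacklist entry. Something like this should be equivalent (a sketch, and you lose the "Deleted Probable Fake Link" printout):
foreach $banner (@banner) {
# drop every pass whose host part matches this blacklist entry
# (\Q...\E keeps the dots in $banner from acting as regexp wildcards)
@my_passes = grep { !/(http\:\/\/)?(.*\:.*\@)\Q$banner\E/i } @my_passes;
}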
Back to the main code, let's look at the loop again, so we can find our bearings in the code.
while (<SITE_FILE>) {
print OUT_FILE "$_<p>\n";
@current_list = get_page_passes();
push @total_site_list,@current_list;
}
At this point, we've gone through all the sites in the "sites" file, collected all the passes and cleaned out any fake links we could find, and we now have a fully integrated list of all the sites from all the list pages in @total_site_list.
Let's Move On
Simple enough: I sort the list, which is needed for calling kill_dupes, then call the kill_dupes function itself. This is because different list pages may contain the exact same site and password (happens too often, actually, hmmm).
Now the final list is done: clean, organised, in an easy format to output to file. What a wonderful thing life is.
# sort url's
@total_site_list= sort @total_site_list;
# kill dupes
@total_site_list=kill_dupes(@total_site_list);
if( $debug ) {
print "complete sorted duped sitelist";
foreach(@total_site_list) {
print "$_\n";
}
}
This next bit is simple: it puts out some HTML code to the outfile, then goes through the list (now sorted), spitting it out. The regular expression deserves a little attention, since it really shows how powerful these things are, and it's one of the prettiest things in this whole proggy.
Most sites usually start with "www.", which means that 80% of the sites would wind up in the "W" section of the page, and that's no good, and doesn't really show good programming (which I usually try to attain). In each iteration of the loop, the $I variable contains the current letter the program is outputting; this variable is embedded in the regexp, which has 2 possible ways of matching a line.
The first being "www.$I", the other being "$I.restof_sitename.com".
The grep function goes through each member of a list and returns only those that match a regexp.
In both possibilities, the preamble to the actual sitename is stripped, so the only part the regexp tries to match is the start of the actual sitename.
Since some urls specify an IP instead of a FQDN, a separate loop is used to output any url whose hostname part starts with anything but an alphabetical letter.
The closing HTML tags are then written out to the file, and the total number of sites collected is printed to the screen.
print OUT_FILE "</H4>\n<H2>Passwords:</H2>\n<H5>\n<p><p>\n";
# print passwords to file by alphabet of hostname
for( $I = 'a' ; $I cmp 'aa' ; $I++ ) {
print OUT_FILE "\n<A2>$I:</A2><p><p>\n";
foreach( grep(/(http\:\/\/)?(.*\:.*@)(((www[\d\w]*\.?)$I)|((?!www.*\.)$I))/,@total_site_list) ) {
print OUT_FILE "<a href=$_>$_</a><p>\n";
}
}
# print out hosts not beginning with the alphabet (mostly numerical ip's)
print OUT_FILE "\n<A2>others:</A2><p><p>\n";
foreach( grep(/(http\:\/\/)?(.*\:.*@)((?!www\.?)[^a-z])/,@total_site_list) ) {
print OUT_FILE "<a href=$_>$_</a><p>\n";
}
print OUT_FILE "</BODY>\n</HTML>\n";
print scalar(@total_site_list) . " Passes Written Out\n";
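A quick side note on that loop condition: Perl's ++ does "magic" string increment, so $I walks from 'a' up through 'z' and then rolls over to 'aa', which is exactly when the $I cmp 'aa' test turns false and the loop stops. A tiny demo, if you want to convince yourself:
# demo of the magic string increment (not part of the script)
for( $I = 'a' ; $I cmp 'aa' ; $I++ ) {
print "$I ";
}
# prints: a b c ... y z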
I suppose this function deserves some explaining, although it's very simple.
It takes in a sorted list and checks each member against the one following it; if there's a match (dupe found), it splices off the dupe and checks the next one. It does this through the whole list, until there are no more dupes. It's simple but it works, just like you would expect of a JAPH, and it's fast enough for my needs (although horribly inefficient algo-wise).
sub kill_dupes {
my $I;
my (@total_site_list) = @_;
# just an easy func. if entry == entry+1, splice off next entry and reloop
for($I=0;$I<$#total_site_list;){
if( $total_site_list[$I] eq $total_site_list[$I+1] ) {
if ( $debug ) {
print "Dupe:\n$total_site_list[$I]\n$total_site_list[$I+1]\n"
}
splice @total_site_list,$I+1,1;
}
else {
$I++;
}
}
return @total_site_list;
}
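If the inefficiency ever starts to bug you, the classic Perl trick is a %seen hash, which doesn't even need the list sorted first. Something like this would do the same job (just a sketch, it keeps the first copy of each entry it sees):
sub kill_dupes {
my %seen;
# keep each entry the first time it shows up, drop it after that
return grep { !$seen{$_}++ } @_;
}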
Closing Arguments
The point of this hack is not getting free pr0n. [yeah, right. -NRoC]
Although some might take this side-effect as if it were the effect. The point is that computers should make your life easier, and even more so if you are a programmer. Programming means you can write programs that can do everything you need done. Everything you usually do by hand on your computer can be done by a semi-intelligent computer program. This means you spend more time having fun with your computer and less time just SETTING THINGS UP so you can have some fun.
That's what I like about perl.
That's what I like about being able to program.
That's what I like about my computer.
So now, instead of sitting around loading sites and finding the stuff you want, which might take a good chunk of time, have your computer run this thing at midnight every night, and load up the list whenever you feel like it.
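On a unix-ish box a crontab entry is the easy way to do the midnight thing; the path and script name below are made up, so use whatever you saved it as (ActiveState folks can do the same with the Windows task scheduler):
# run the grabber at midnight every night (add this with crontab -e)
0 0 * * * cd /home/you/passlist && perl passlist.pl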
Let the Computer do the work. After all, you're a humanoid, you have better things to do.
werd.
Written By Anon.