~ Berlin talk: 22C3 Private Investigations ~
Thursday December 29th, 2005 ~ Time: 13:00 ~ Location: 22C3 Berlin
This file dwells at http://www.searchlores.org/private.htm


"There is no shame in not knowing: the shame lies in not finding out"
  

"Private investigations for curious web-seekers: techniques, tools and magic tricks"
by Fravia+
December 2005
version 4.3

Introduction and caveats
The importance of guessing
Anonymity for beginners
Whois    IP-Related
Wikipedia    Journals
Books    Images    Music
A search    Languages/Regional
Magic tricks    Slides   
Conclusions    Assignments

Searching for disappeared sites
Maps
Searching for sites with relevant names





INTRODUCTION
Structure, Opera and Proxomitron

Excuse my English, please, which is my third language, and please note that I'm not sure I'll always be able to be politically correct. Also -as you can see- no powerpoint in my talks: there's no need to turn everything into a sales pitch, and with powerpoint even the few pre-chewed "ideas", hidden inside the bambinesque noise, are simplified to the point that they become redundant and unclear.

The purpose of this talk is to show you how to search the web effectively, and hence give you cosmic power, no more and no less. In only one hour we'll be able to examine just a broad palette of searching approaches. Let's hope that your attention span is good enough, and that you'll later be able to work by yourself, in order to learn much more on your own: Probieren geht über studieren (trying beats studying), duh, yet without your own work and application most of you will just remain the poor "one word" searchers that you are.

It would be a pity! Once you know how to search the web, the entire human knowledge will become available to you.

Enough blabbering: let's begin from the beginning. Using google, we can see how a simple "moronic" query like
"index.of" warez leaves your target signal submerged under commercial noise so heavy that the results are next to useless. In fact people create on their servers many Index of/archiv/warez or Index of/archiv/porn subdirectories just in order to attract some (moronic) traffic :-)
The index.of querystring is one of the oldest tricks used to bypass the commercial vultures, in the hope of fetching directly the targets you are seeking.
It still works, btw: playmates index.of will indeed give you some good results. But the commercial beasts and the search engines' spammers (who call themselves SEOs) incorporated this index.of string in their scam pages a long time ago, so you cannot rely on this querystring -alone- anymore.

So what should we use instead?
Well, a query like the following one should cut some more mustard (In case people should look for software on the web instead of buying it):
("wares" OR "warez" OR "appz" OR "gamez" OR "abandoned" OR "pirate" OR "war3z") ("download" OR "ftp" OR "index of" OR "cracked" OR "release" OR "full") ("nfo" OR "rar" OR "zip" OR "ace")
Of course this is useful when used with a specific target NAME: teleport.pro ("wares" OR "warez" OR "appz" OR "gamez" OR "abandoned" OR "pirate" OR "war3z") ("download" OR "ftp" OR "index of" OR "cracked" OR "release" OR "full") ("nfo" OR "rar" OR "zip" OR "ace"), which confirms the paramount importance of NAMES on the Web.



Here is another nice mp3 querystring, just paste it into google:
imagine "snd *.mp3 *-*-2005 *:* *.*m" OR "snd *.mp3 *-*-2005 *:* *.*k" OR "snd *.mp3 *-*-2005 *:* *.*"
and obtain the following link: http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=imagine++%22snd+*.mp3+*-*-2005+*%3A*+*.*m%22+OR+%22snd+*.mp3+*-*-2005+*%3A*+*.*k%22+OR+%22snd+*.mp3+*-*-2005+*%3A*+*.*%22++&btnG=Search
Change imagine to boogie, dylan, garfunkel, mendelssohn, mozart or whatnots. Change 2005 to 2004 (or earlier) for a different (but possibly more stale) search.
Look at the query: can you understand WHY this query works?
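The link above is nothing more than the querystring percent-encoded into the q parameter of the search address. A minimal sketch of that step, assuming only Python's standard library; the function name build_search_url and the sample query are hypothetical, chosen just to show how quotes, spaces, colons and wildcards end up in the URL:

```python
from urllib.parse import urlencode

# Search endpoint as it appears in the link above.
GOOGLE_SEARCH = "http://www.google.com/search"


def build_search_url(query: str, base: str = GOOGLE_SEARCH) -> str:
    """Return base?q=<query>, percent-encoded like a submitted search form."""
    # safe="*" keeps the wildcard readable; spaces become "+",
    # double quotes become "%22" and colons become "%3A".
    return base + "?" + urlencode({"q": query}, safe="*")


if __name__ == "__main__":
    # Hypothetical query using the same kind of wildcard snippet.
    print(build_search_url('monet "*.jpg *-*-2005 *:*"'))
```

Changing the target name then only means changing the first word of the query before encoding it again.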




These above are just examples, and using just one of the main search engines: Google, one of the most important, but (and hence) also one of the most abused search engines.

Always remember that the main search engines (at the moment the most important ones are google, yahoo, msn and teoma) cover -at best- just a third of the whole web...

So the problem is to wade through the slimy commercial morasses of the web, which were made specifically "ad captandum vulgus", and to quickly "cut" this useless ballast in order to find our targets.

We'll use google a lot as an example today, but we'll examine various different ways to fetch your targets "by hand". I say "by hand" because that's what we are going to do for pedagogical reasons.
Real seekers try to automate the process as much as possible and usually employ ad hoc bots to do the gritty digging. But that's for later, first the basics.

Let's first have a short look at what the web looks like from a searcher's point of view.
Outside linkers are fetched through klebing (and stalking and social engineering), the bulk and the outside linked through combing and short and long term seeking, the hidden and commercial databases through password breaking or guessing, social engineering or, more simply, just seeking databases' hardcoded passwords (à la Borland Interbase's "politically correct") on the web.
Here is for instance one of these lists: defpasslist1.htm

In fact the web was made for SHARING information, not for "hoarding" nor for "selling" it. And it was made for solidity: its structure was made in order to resist a possible nuclear attack. It will resist even the commercial beasts that have tried to bury real useful information under tons of commercial crap, aggressive commercial porn sites and an avalanche of silly and useless advertisements.
Learning how to search, you'll be able to "cut" through the commercial pudding and morasses and fetch quickly (or relatively quickly) your target jewels.

But to "cut" the Web you'll need first of all a SWORD with a sharp blade: a capable and quick browser. That's the first and foremost step. MSIE, Microsoft Internet Explorer, is a no-no-no: too buggy, bloated and prone to all sorts of nasty attacks. The two current "philosophical schools" are either Firefox or Opera... which is the one I am using now.

Whichever of the two "real" browsers you use, no sword will suffice without a SHIELD. And your shield, and a mighty one, is proxomitron.
Proxomitron is a very powerful tool. Its power lies in its ability to rewrite webpages on the fly, filter communications between your computer and the web servers of the sites you visit, and to allow easy management of external proxy use.
Here is a link to an old, but very good essay about proxomitron basic installation: anony_8.htm, and a link to another essay, Oncle Faf goes inside proxomitron about further finetuning.... Let's sum it up: "Only morons 'just do it' without Proxomitron."

A word of warning: You'll most probably forget most of the things you'll learn today rather quickly, since nowadays most young people (and many elder ones as well), after having been heavily bombarded by advertisements from their birth onwards, have an attention span of just a few minutes and a memory as weak as an autumn leaf. But hopefully you'll gather today the basics of searching the web correctly. You may even want to test your skills, afterwards, on your own, on some assignments.

So, in case you forget, for instance, how to quickly find any mp3 on the fly, you'll be able to use your combing knowledge to quickly find on the web many searchers who will teach you how to find mp3 rather quickly. Or maybe even their teachers :-)



MUST KNOW & DISCLAIMER
sine qua non

Just a short tour around the house

Main, regional and local search engines
ftp, blogs and all the various targets
usenet irc and then, of course, trolls
Again: anonymity and stalking, maybe some luring as well...

Disclaimer

The information provided during this conference should only be used for academic purposes and must not be used to infringe the patents, copyrights (or any other legal rights) of any company, organisation, government or legal entity.   The information provided during this conference must not be used to engage in any illegal activity.
On the other hand you may use the information provided in order to FIGHT any illegal activity, be it performed by private parties or by one of the aforementioned legal entities :-)



THE IMPORTANCE OF GUESSING
The Art of Guessing


GUESSING is a very important art for seekers. Here are two 'images-related' examples of the "guessing" approach:

1)



A database and a search engine for advertisements, completely free until recently: you could enlarge (and copy) any picture and watch any ad-spot.
Advertisements from Austria to Zimbabwe. Very useful for advertisement reversing purposes.

For instance: http://media3.adforum.com/zrIf58670C/B/BM/BMPD_03884/BMPD_03884_0048601T.JPG, an English volkswagen advertisement.

Alas, the clowns are no longer free. Let's see what we can do.
Let's isolate the image, and now let's play the guessing game, because we don't really want to [shudder] pay advertisers in order to see their crap, do we?
Now we notice that BMPD_03884_0048601T.JPG has a "t" inside. It may be "t" for tiny.
Then we may have "w" for wide and maybe also "a" for art :-)
http://media3.adforum.com/zrIf58670C/B/BM/BMPD_03884/BMPD_03884_0048601W.JPG ...see? http://media3.adforum.com/zrIf58670C/B/BM/BMPD_03884/BMPD_03884_0048601A.JPG ...q.e.d.


The web was made for SHARING, not for hoarding and not for selling. Its very STRUCTURE will deliver searchers whatever they are looking for.


But the web is really deep. For instance, if you study advertisement debunking, you may also want to take account of the EVOLUTION of advertisement, and here is where sites like http://scriptorium.lib.duke.edu/adaccess/browse.html (1911-1956) may come in handy.



2) http://www.acclaimimages.com/_gallery/_pages/ As you can see, when you click on one of these links, for instance on this construction crane, you get a useless small and watermarked image: http://www.acclaimimages.com/_gallery/_SM/0153-0512-1500-1119_SM.jpg.
Now note the STRUCTURE of this address: http://www.acclaimimages.com/_gallery/_SM/0153-0512-1500-1119_SM.jpg
That _SM must mean SMALL, and in this case, also, watermarked.
Some simple guessing (and some experience) will prove once again that the web was made for SHARING, not for hoarding and not for selling: as you can see the 'similar pictures', below, have the following structure: http://www.acclaimimages.com/_gallery/_TN/0153-0512-1500-1119_TN.jpg, hence _TN must be TINY.
Let's first try http://www.acclaimimages.com/_gallery/_BG/0153-0512-1500-1119_BG.jpg: BG for BIG
Ahi, ahi: "was not found on this server". In fact, I'll stop here and let you find out -I mean, guess out- the correct character sequence as an assignment :-)
But we can still eliminate the watermarks now, so that you see that you may have a three-letter combination as well as a two-letter one :-)
http://www.acclaimimages.com/_gallery/_SM2/0153-0512-1500-1119_SM2.jpg
How did we guess that "SM2" -instead of SM- would eliminate the watermark?
We didn't guess at all :-) We just searched



A combing webbit
Combing   ~   Webbits

http://www.google.com/search?&rls=en&q=%22password=rya%22&num=100
(you can also try password=riaa, fuckriaa, and so on; riaa is the "Recording Industry Association of America", a bunch of copyright enforcers)
http://www.google.com/search?&rls=en&q=%22password=tolkien%22&num=100

Simple & elegant Webbit, for Combing, rather than for fishing purposes...



e-mail
Anonymity for beginners

You'll find more about "free" email in the ad hoc section of searchlores. The most important thing is to NEVER give out real data on the web unless you are really compelled to do so (and even in that case there are many ways to avoid it).

Always choose the first option, whatever it is, when you (have to) "choose" some options from a menu ("Your income", "Your profession", "Your State" and so on): State=Afghanistan, Income=less than 15 euro per year and so on... If you want to play, there are some funny national options like "American Samoa", "Wallis and Futuna Islands" and so on.
The option "other" that you often find on these menus is also great, because you will get the wannabe sniffers thinking hard about updating their long palette of options, adding even more crap to their possible choices.

Do not feel bad while feeding only lies to anyone asking for your data on line: such people are just scum that will use EVERYTHING you tell them for profit the very moment you do, and they don't even have the decency to admit it. Screw them black and blue, such clowns deserve far worse than that: never believe for a minute that their 'privacy pledges' about how they will "never use your data" could be anything other than cheap sarcasm.
The very reason they set up such "free" email address sites (and such "free" search engines and "free" file repositories) is -of course- to READ everything you write and to have a copy of everything you upload or create.
Of course, klaro, no human being will ever read what you write, but their bots and grepping algos will do it for the owners of the "free" email services (or of the "free" search engines), presenting them nice tables built on your private data as a result.
This brings us to a very interesting contradiction: on one side "echelon" and the total big-brotherish control, on the other "wardriving" and pretty good anonymity...

Examples of "one shot" email addresses...
Mailinator http://www.mailinator.com/mailinator/Welcome.do
Another example:
http://www.pookmail.com/





How to discover whois

For instance using pookmail as an example: http://www.whois.sc/pookmail.com (scroll down for contact names and info)
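Behind web front-ends like the one above sits the plain WHOIS protocol (RFC 3912): open a TCP connection to port 43, send the name followed by CRLF, read until the server closes. A minimal sketch, assuming Python's standard library; the function names are hypothetical, and the default server whois.verisign-grs.com (the .com/.net registry server) is an assumption that is not part of this talk:

```python
import socket


def build_whois_request(domain: str) -> bytes:
    """A WHOIS query is just the domain name followed by CRLF."""
    return domain.strip().lower().encode("ascii") + b"\r\n"


def whois(domain: str, server: str = "whois.verisign-grs.com",
          timeout: float = 10.0) -> str:
    """Send one query to `server` on TCP port 43 and return the raw reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(build_whois_request(domain))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling whois("pookmail.com") should return the registry record as text; the registry reply usually names a registrar WHOIS server that can be queried the same way for contact details.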

A very powerful similar tool:
http://www.domainsdb.net/





How to discover IP-related sites


Erom pointed this gem out some time ago:
http://www.searchmee.com/web-info/ip-hunt.php
it lets you see which websites are cohosted on the same ip. Really great for hidden web private research, ahem ;)

A very powerful similar tool:
http://www.domainsdb.net/



Wikipedia: the power of good non commercial approaches 

Very useful for our in-depth "private investigations"



Useful autocompletion...
Lumrix
http://wiki.lumrix.net/en/
Of course also in German and so on: http://wiki.lumrix.net/de/

http://en.wikisource.org/wiki/Main_Page Wikibooks

How come it works?

A legitimate question would be "why does wikipedia work?". Anybody and his cat can write whatever he believes to be the truth. The arbitration committee kills only the most obnoxious kooks and wanna-be experts, and lets many incompetent buffoons write whatever they want. The whole project is open to trolls and has little defence against them. Yet it works perfectly, and this annoys all the clowns that hate any free successful project :-)
There are even various idiots and subhumans that PLANT false information in wikipedia, on purpose, in order to accuse it immediately afterwards of spreading false information. To no avail. Wikipedia works very well.
So how come it does work?
It works BECAUSE it is an open, anarchistical, non commercial, collaborative project: the power of the unwashed masses against the academical experts.

Wikipedia is about as accurate on science as the Encyclopaedia Britannica: The British journal Nature ran blind tests asking experts to compare scientific entries from both publications.
The reviewers were asked to check for errors, but were not told about the source of the information.
Only eight serious errors, such as misinterpretations of important concepts, were detected in the pairs of articles reviewed, four from each encyclopaedia.
Reviewers found 162 factual errors in the Wikipedia documents, compared to 123 in the Britannica documents.
Nature also said that its reviewers found that Wikipedia entries were often poorly structured and confused.
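To put the two error counts side by side, a tiny sketch (assuming only the figures quoted above; nothing else about the study is used):

```python
from fractions import Fraction

wikipedia_errors = 162   # factual errors reported for the Wikipedia entries
britannica_errors = 123  # factual errors reported for the Britannica entries

# Exact ratio of the two counts, then the relative difference in percent.
ratio = Fraction(wikipedia_errors, britannica_errors)
extra_percent = float(ratio - 1) * 100

print(f"Wikipedia/Britannica error ratio: {float(ratio):.2f}")
print(f"Wikipedia entries carry about {extra_percent:.0f}% more reported errors")
```

That is roughly four errors in Wikipedia for every three in Britannica, over the same set of entries.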

Wikipedia's reliability is just a byproduct of the sheer SCALE of the project. It is not due to a peer-controlling academic process (with all its strengths and weaknesses), but to the 'self-improving' nature of information that is shared on the web. That's the reason why its reliability, already better than many academic experts would be ready to admit, is IMPROVING. That's why anybody that has some expertise in some specific field should contribute. That's why I will do it myself as soon as I find the time :-)
Wikipedia is indeed "a brilliant product of open-source intellectual collaboration". Even its enemies now have obtorto collo to admit it :-)

Caveat lector, of course, but this holds true for all "established" encyclopaedias as well. Indeed much knowledge lies outside of academic study and "experts" . However -again- caveat lector: any seeker would soon be confused, inside or outside academia, without solid evaluation skills.



A small digression about scientific articles

"The contradictions of journal searching"

Now, let's imagine that for our in-depth "private investigations" we need a given COMPLETE ARTICLE, not an abstract, a complete text, and we do not want to pay anyone for that. Let's imagine we want something mathematics-related: I have chosen as examples ["polynomial"] and ["prime factorization"].

Most searchers would use the two most "common" search engines for MATHEMATICS-RELATED articles of the visible web: http://www.emis.de/ZMATH/, which you can use to start a search, and http://www.ams.org/mathscinet/search, which you SHOULD NOT use, due to its commercial crappiness.
Let's search for "polynomial".
Let's imagine we are interested in the third result: "The minimum period of the Ehrhart quasi-polynomial of a rational polytope". Alas! Now we are supposed "to pay" in order to consult/see/download it.
But we're seekers, right?
Let's use a part of the abstract in order to fetch our target in extenso: " called the Ehrhart quasi-polynomial of"... see? Let's repeat this with any other article on this database...

Of course we could also have used google scholar

So, we have seen how to bypass commercial yokes using the previously explained "long string searching" approach.

The funny thing is that the web is so deep that we do not need at all to go through such bazaars.

In fact the "open source" waves are already purifying the closed world of the scientific journals as well. Good riddance!

Let's search on The Front (arxiv.org), that is slowly beating the two "established" euroamerican commercial repositories black and blue... for instance: "prime factorization", but, to keep our previous example, also: "The minimum period of the Ehrhart"... et voilà.
On one side the Americans, who do not even let you search if you do not pay up-front (US-mathscinet) & on the other one the Europeans, who let you search, but then want you to pay in order to fetch your results (EU-ZMATH). Of course we could still find our targets starting from there, but it is refreshing to know that there is also -amazingly coexisting on the same web- a complete 'journals' search engine, with a better (& rapidly growing) database and everything you need for free: the Front ("It freed anyone from the need to be in Princeton, Heidelberg or Paris in order to do frontier research"). So -once again- the web is BOTH a bottomless cornucopia and an immense commercial garbage dump, and -of course- you need to know how to search both sides of the same mirror.
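The open repository can also be queried by script. A minimal sketch, assuming the public arXiv API endpoint at export.arxiv.org and its search_query, start and max_results parameters (this endpoint is not mentioned in the talk, and the function name is hypothetical); it only builds the address for an exact-phrase search:

```python
from urllib.parse import urlencode

# Assumed endpoint of the public arXiv API (not part of the original talk).
ARXIV_API = "http://export.arxiv.org/api/query"


def arxiv_query_url(phrase: str, max_results: int = 10) -> str:
    """Build an exact-phrase search over all fields of the arXiv records."""
    params = {
        "search_query": f'all:"{phrase}"',  # quotes request the exact phrase
        "start": 0,
        "max_results": max_results,
    }
    # safe=":" keeps the field prefix "all:" readable in the final URL.
    return ARXIV_API + "?" + urlencode(params, safe=":")


if __name__ == "__main__":
    print(arxiv_query_url("prime factorization"))
```

Fetching that address should return an Atom feed whose entries link to the freely available full texts.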



Books searching


First of all, for your 'private investigations' a good start is the ISBN finder that all major search engines provide:
isbn 0596005458 at google
isbn 0596005458 at yahoo
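Before feeding an ISBN to a search engine it is worth checking that it was copied correctly: the last character of a ten-digit ISBN is a check digit. A minimal sketch of the standard ISBN-10 rule (weights 10 down to 1, sum divisible by 11); the function name is hypothetical:

```python
def is_valid_isbn10(isbn: str) -> bool:
    """Check the ISBN-10 checksum: sum of weight * digit must be 0 mod 11."""
    chars = isbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            value = 10  # "X" stands for 10, allowed only as the check digit
        elif char in "0123456789":
            value = int(char)
        else:
            return False
        total += (10 - position) * value  # weights run 10, 9, ..., 1
    return total % 11 == 0


if __name__ == "__main__":
    print(is_valid_isbn10("0596005458"))
```

For the example above the weighted sum is 209, which is 19 times 11, so the number passes.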

All on-line repositories are quite useful for finding books:
http://www.uploadscout.com/UploadScout/newindex.aspx: rapidshare & megaupload index.
You could input -for instance- digital photography on that search mask, but you can as well search with
rapidshare digital.photography: google
rapidshare "digital photography": yahoo
{frsh=94} {mtch=69} {popl=33} rapidshare "digitalphotography": msnsearch
or whatever local/main search engine you may like...

Well, I'm using "digital photography", or "photoshop" query examples just to demonstrate that finding "photoshop-related" books is almost as easy as writing them (everyone and his dog is writing a photoshop book nowadays).
Yet maybe many of the friends in this room would prefer, instead of "digital photography", this kind of books?

Note, however, that the rapidshare search-examples above are JUST ONE EXAMPLE:
Rapidshare is one of many "upload repositories" where people can (and do with gusto) upload large files.
It's quick, it allows unlimited downloads, and it has some free-happy-hours in the morning. So you don't need, of course, to pay. But there are many similar repositories:
rapidshare.de/: 30 Mb max, forever but after 30 days unused the file is removed, daily download limit of 3,000 MB for hosted files
YouSendIt: 1 Giga max, after 7 days or 25 downloads (whichever occurs first) the file is automatically removed
mytempdir: 25 Mb max, 14 days * 1200 free downloads, after that only from 23.00 to 7.00.
Sendmefile: 30 Mb max, after 14 days the file is automatically removed
Megaupload: 500 Mb max (!), forever but after 30 days unused the file is removed (like rapidshare)
ultrashare.net/: 30 Mb max, forever but after 30 days unused the file is removed (like rapidshare)
http://www.spread-it.com/: 500Mb - Forever or after 14 days if unused
http://turboupload.com/: 70Mb - download delay in order to show pub
http://www.4shared.com/: 100Mb - 10Mb per file Forever or after 30 days if unused

and so on, a fairly complete list of many Files and images repositories is here.

Anyhow, there's a whole section regarding books searches (and a fairly complete library as well) at searchlores, and, if interested in finding any book whatsoever, you'll be able to find more pointers there.
Suffice to say that most books mankind has written are already on the web somewhere, and that while we are sitting here, hic et nunc, hundreds of fully scanned libraries are going on line... if you are attentive enough, and if your searching scripts are good, you can even hear the clinking "thud" of those huge databases going on line...

In order to fetch books you just need some correct strings.

A simple trick is to use the powerful A9 engine, for instance, for conan doyle, http://a9.com/conan%20doyle?a=obooks and then fetch the study in scarlet.
Of course once we have some arrows, it is relatively easy to fetch whole copies of a book all over the web...

Another simple trick is to use google's books' search facility. Let's search for 'The Hound of the Baskervilles': http://books.google.com/books?q=doyle&btnG=Search+Books&hl=en: now let's choose a phrase from page three (more pages we cannot see because of the "copyright dictatorship"): "The probability lies in that direction": It's more than enough: "The probability lies in that direction". q.e.d.

Of course this is also true for all kinds of copyrighted books... let's see: "I have no fitting gifts to give you at our parting,"...
and we land here, for instance: 'I have no fitting gifts to give you at our parting,' said Faramir; `but take these staves... (J.R.R. Tolkien: Two Towers)

The more "popular" a target, the easier it is to find it: "some students were standing up to get a better look at Harry as he sat, frozen, in his seat" (Harry Potter and the Goblet of Fire)

A msnsearch "index of" webbit: Nov-2005 intitle:"Index of /" {frsh=9999} , for instance Nov-2005 intitle:"Index of /" {frsh=9999} "digital photography"

A "classical" Bookish Webbit: -inurl:htm -inurl:html intitle:"index of" +("/ebooks"|"/book") +(chm|pdf|zip) +"o'reilly"



IMAGES SEARCHING APPROACHES


Target name guessing. Uhmmm. Is this a 'mature' and sufficiently politically incorrect audience? Anyway we are speaking of 'private' investigations, aren't we?

Here is a better search than the (still working) playmates index.of that we have seen at the beginning:
1981-02.jpg playmates
Of course you could try a different approach, a very nice trick to avoid those "index of" spammers & clowns:
intitle:"index of/" "Apr-2004" "jpg" playboy (note the "Apr-2004" snippet, that you can change at leisure :-)

It's often useful to (try to) find images with some ad hoc target name guessing. In this case, since we have one playmate every month, chances are that there are date-related images' URLs.
Note that you could just try 1983-03.jpg, without the specifying suffix 'playmates', or you could change that suffix to 'playboy', or you could repeat the search with 1983_03.jpg (note the underscore instead of the hyphen) and so on, or even try something like playmate6.jpg, on the (correct) assumption that where there are at least 6 jpgs, you'll have more.

Of course such searches do not need to be so frivolous or 'Pr0n' oriented: monet8.jpg, and you'll land in Monet-heavy and images-rich sites.
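The name guessing described above is easy to mechanise: given a stem or a date, list the variants worth trying as search terms. A minimal sketch; both function names are hypothetical and the patterns are only the ones mentioned above (numbered names, and dates with a hyphen or an underscore):

```python
from typing import List


def numbered_names(stem: str, count: int, ext: str = "jpg") -> List[str]:
    """monet1.jpg, monet2.jpg, ...: where there is a number 8, there are more."""
    return [f"{stem}{n}.{ext}" for n in range(1, count + 1)]


def dated_names(year: int, month: int, ext: str = "jpg") -> List[str]:
    """The same date written with a hyphen and with an underscore."""
    return [f"{year:04d}{sep}{month:02d}.{ext}" for sep in ("-", "_")]


if __name__ == "__main__":
    print(numbered_names("monet", 8))
    print(dated_names(1983, 3))
```

Each generated name is then just another term to paste, alone or with a specifying keyword, into the search engine of your choice.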


Here are some useful images' repositories (à la 'rapidshare'):
http://www.fapomatic.com/, http://www.imghost.com/, http://www.glowfoto.com/, http://www.imageshack.us/, http://www.imgspot.com/, http://www.mytempdir.com/, http://www.bestupload.com/, http://www.netpix.org/, http://www.jotapeges.com/, http://www.rapidshare.com/, http://www.filesupload.com/, http://www.updownloadserver.de/, http://www.dropload.com/, http://www.sendthisfile.com/, http://www.fireupload.com/, http://www.yousendit.com/, http://www.youshareit.com/, http://www.glintfiles.net/, http://www.paintedover.com/, http://www.2and2.com/, http://www.imagehosting.com/, http://www.xs.com/, http://www.imagehigh.com/, http://www.imagevenue.com/, http://www.shareitagain.com/, http://www.ultrashare.net/, http://www.sendmefile.com/, http://www.perushare.com/, http://www.megaupload.com/, http://www.imageranch.com/, http://www.photobucket.com/





Music searching

A completely new wave of music searching has opened up through the relatively recent mp3 blogs phenomenon, but usually it is MUCH simpler to just fetch the music you need from the web any time you need it.

See the Combing webbit above.

Your imagination is the limit!
Simply adding for instance "4.6M" (or whatever similar you may fancy) to your querystring will ensure that there are enough big and juicy MP3 in your targets: imagine lennon mp3 OR ma4 OR ogg intitle:"Index of" -metallica +"4.6M"

Most simple trick:
"index of" imagine m4a|wma

Another one (for music videos):
?intitle:index.of? "crazy frog" wmv "axel f"
or
?intitle:index.of? "madonna" wmv

Another one (for mp3 & co):
intitle:index.of + mp3 + "garfunkel" -html -htm -php -asp -txt -pls

Another one:
intitle:index.of + "mp3" + "band name" -htm -html -php -asp
Or even this, found with the previous query, so big that it may crash our browsers...
http://24.91.184.80/jserver/files/music/

or http://mensa.familia.rebello.nom.br/media/Som/MPG_RA_VQF/Mp3/,
found through
imagine lennon mp3 OR ma4 OR ogg intitle:"Index of" -metallica

Some of the webbits we use for images work for music as well: intitle:"index of/" "Apr-2004" "jpg" garfunkel

However, all these techniques are overkill. In fact the amazing thing is that even the most stupid searches, those that should NOT work, will give results:
madonna index.of mp3... q.e.d: wherever, whenever, whatever.

Some Magic 


Here is, as promised, some "private investigation related magic"...



Finding photos BY CAMERA model

http://photos.alexa.com/
This is interesting because it is part of the now public Alexa indexes: http://websearch.alexa.com/welcome.html

How to subdivide a query in manageable chunks (by Various Authors)


Do any search engines or techniques exist to get more than -say- 1000 results from a search engine?


Usually you just refine your search.

You can narrow your query in various ways:
eliminating crap (the infamous -tits example)

adding further -but relevant- terms ("digital photography": 20.400.000, +"shutter priority"= 190.000, +"tiff"= 40.500)

and with the results it is still good to jump first to -say- page five or ten, and then go backwards when evaluating the results :-)

If you have time, you can try things like these on Google (the strategy for other search engines is mutatis mutandis the same):

With this kind of strategy you can divide your SERPs in more or less four equal parts. If you use another common keyword or feature, you can double the number of equal sets, for each new keyword...
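The splitting strategy can be sketched as follows. This is a guess at the idea, not the original examples: each common keyword is either required (+) or excluded (-), so n keywords give 2^n sub-queries that do not overlap and together cover the whole result set. The function name and the sample keywords are hypothetical:

```python
from itertools import product
from typing import List


def partition_queries(base: str, features: List[str]) -> List[str]:
    """Return every +/- combination of `features` appended to `base`."""
    queries = []
    for signs in product("+-", repeat=len(features)):
        terms = " ".join(sign + feature for sign, feature in zip(signs, features))
        queries.append(f"{base} {terms}")
    return queries


if __name__ == "__main__":
    # Two keywords: four sub-queries; a third keyword would give eight.
    for query in partition_queries("blabla", ["pdf", "tiff"]):
        print(query)
```

The parts are only "more or less" equal when each keyword appears in roughly half of the results, which is why common keywords or features work best.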


There is a more direct way to achieve what you are asking, with the antipagination extension in firefox.
https://addons.mozilla.org/extensions/moreinfo.php?id=853
It flattens result pages (even works in forum pages)

There is a userjs that works in opera too and does the same only for google's result pages here:
http://userscripts.org/scripts/show/1392

Another trick:

blabla -inurl:htm 1.680.000

blabla -inurl:html 2.050.000

the differences are noticeable after the first pumped results

of course you can add and play with -/+ php or -/+ pdf or regional parameters (-/+fr -/+nl etcetera)




1) Sourceror2 (by Mordred & rai.jack)
try it right away
Right click and, in opera, select "add link to bookmarks"

javascript: z0x=document.createElement('form'); f0z=document.documentElement; z0x.innerHTML = '<textarea rows=10 cols=80>' + f0z.innerHTML + '</textarea><br>'; f0z.insertBefore(z0x, f0z.firstChild); void(0);
javascript:document.write(document.documentElement.outerHTML.replace(new RegExp("<","g"), "&lt;"));


2) Another google approach
http://www.google.com/complete/search?hl=en&js=true&qu=photography

3) Another google approach (by Mordred)
Here is a way to gather relevant info about your target
"index+of/" "rain.wav******"
Useful to see date and size that follow your target name...
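The date and size that follow a target name in a typical "Index of" listing can be pulled apart with a regular expression. A minimal sketch, assuming Apache-style lines (name, DD-Mon-YYYY, HH:MM, size); the sample line and the function name are hypothetical:

```python
import re
from typing import Dict, Optional

# name, date as DD-Mon-YYYY, time as HH:MM, size like 4.6M / 12K / 512 / -
LISTING_LINE = re.compile(
    r"^(?P<name>\S+)\s+"
    r"(?P<date>\d{2}-[A-Za-z]{3}-\d{4})\s+"
    r"(?P<time>\d{2}:\d{2})\s+"
    r"(?P<size>\d+(?:\.\d+)?[KMGkmg]?|-)$"
)


def parse_listing_line(line: str) -> Optional[Dict[str, str]]:
    """Split one directory-listing line into its fields, or return None."""
    match = LISTING_LINE.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Hypothetical line, shaped like the snippets such a query brings back.
    print(parse_listing_line("rain.wav   12-Mar-2004 10:22  4.6M"))
```

The fixed shape of these lines is also what the wildcards in the date and size snippets of the earlier querystrings rely on.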

Bookmarklets: Weapons for the seeker

4) Google's advanced operators: "aeroplane finder" and other crap
Here is one of google's easy implementations:
from berlin to helsinki
Clicking on the first link, you have an automatic price comparison.

On a similar path, there is the useful define: operator we have already seen, and all the other advanced operators (stocks and other crap).

A possibly useful one is the 'change' operator: 234 USD in euro, 234 euro in CHF, 234 french money in GBP, currency of germany in malaysian money and so on.

Other useful possibilities are the mathematical operators:
twenty miles in kilometers
45 Fahrenheit in celsius
((894151*66771)+456)/1241: 48 109 070.8
But here you should not use google, for mathematical calculations yahoo is better: ((894151*66771)+456)/1241=48,109,070.8114423826.
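When digits matter, the three examples above can be checked locally with exact integer arithmetic instead of trusting any search engine's calculator. A minimal sketch using only the standard conversion factors (1 mile = 1.609344 km, C = (F - 32) * 5/9); the function names are hypothetical:

```python
def miles_to_km(miles: float) -> float:
    """One international mile is exactly 1.609344 kilometres."""
    return miles * 1.609344


def fahrenheit_to_celsius(degrees_f: float) -> float:
    """Subtract the 32 degree offset, then rescale by 5/9."""
    return (degrees_f - 32) * 5 / 9


# Python integers have arbitrary precision, so nothing is rounded here.
numerator = 894151 * 66771 + 456
whole, remainder = divmod(numerator, 1241)

if __name__ == "__main__":
    print(miles_to_km(20))
    print(fahrenheit_to_celsius(45))
    print(whole, "remainder", remainder, "over 1241")
```

The integer part and the remainder are exact; how many decimals of remainder/1241 you then print is up to you.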


5) ElKilla bookmarklet (by ritz)
try it right away (no more clicking, press DEL to delete and ESC to cancel)
Right click and, in opera, select "add link to bookmarks"



More about bookmarklets in the javascript bookmark tricks essay.

http://fireddl.info/apps.htm: one of the many doors to the warez world



SEARCHING FOR DISAPPEARED SITES

http://webdev.archive.org/ ~ The 'Wayback' machine, explore the Net as it was!


Visit The 'Wayback' machine at Alexa, or try your luck with the form below.


Alternatively, learn how to navigate through [Google's cache]!
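Archived copies can also be addressed directly, without any form. A minimal sketch, assuming the address pattern used by the Internet Archive's web.archive.org front-end (prefix, a 14-digit timestamp, then the original URL); this pattern is not taken from the talk, and the function name is hypothetical:

```python
from datetime import datetime

# Assumed address prefix of the Wayback Machine (not part of the original talk).
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(target: str, when: datetime) -> str:
    """Build the address of the capture of `target` closest to `when`."""
    stamp = when.strftime("%Y%m%d%H%M%S")  # YYYYMMDDhhmmss
    return f"{WAYBACK_PREFIX}/{stamp}/{target}"


if __name__ == "__main__":
    print(wayback_url("http://www.searchlores.org/private.htm",
                      datetime(2005, 12, 29, 13, 0, 0)))
```

If no capture exists for that exact moment, the archive normally redirects to the nearest one it holds.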

NETCRAFT SITE SEARCH

(http://www.netcraft.com/ ~ Explore 15,049,382 web sites)

VERY useful: you find a lot of sites based on their own name, which is another possible way to get to your target...


Search: search tips
Example: site contains [searching] (a thousand sites eh!)




Maps


http://local.live.com/: pretty good ms concoction
http://maps.google.com/: google starter
http://maps.yahoo.com/: Yahoo (limited to the states and kanuk): for instance: zip 56554

Where people live

This is useful for ALL Europe:
http://www.nl.map24.com/ just input street, town and country :-)
The same in english: http://www.uk.map24.com/
Check also the ad hoc stalking section peoplesearch











SLIDES
Top

Structure of the Web   ~   WebStructure + Hidden Web   ~   Main search engines' coverage   ~   Short and long term seeking: %   ~   Short and long term seeking: noise   ~  


   Structure of the web


   Short and long term seeking: percentages
Short and long term seeking
 
Main search engines' coverage
Top


 Bulk, Hidden web and main search engines' coverage

search engines coverage


The structure of the Web, Hidden web visible
Top


da structurz of da webz

Structure of the web. Explain the bow-tie model and the diameter of 19-21: do not despair, never...
How big is the web? 24 billion? The s.e. cover between 1/3 and 1/4...



Short term searching/Long term searching: noise
Top





   Popularity versus time

Both axes on a logarithmic scale

1) 5 minutes: Harry Potter, the Goblet of Fire ~ the Half-Blood Prince (pdf or doc)
2) John Lennon "Imagine" (mp3 or m4a) (using imagine lennon mp3 OR m4a OR ogg intitle:"Index of" -metallica and jumping directly to page 3)
3) Lord of the rings trilogy (pdf, html or audiobook)
4) Nero Wolfe: The girl who cried wolf, audio (or any other earlier radioshow)
5) an old Louis Vuitton advertisement (they want us to pay, so let's enlarge it manually: http://media3.adforum.com/zrIf58670C/E/EU/EURR_01547/EURR_01547_0005730W.JPG ... you want more? http://media3.adforum.com/zrIf58670C/E/EU/EURR_01547/EURR_01547_0005730A.JPG)
6) Several years: A black-and-white Bulgarian film of the fifties, or even from the late seventies, for instance ADVANTAGE Bulgaria 1978, 142 min. Dir.: Georgi Dyulgerov
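
The querystring used for target 2 follows a pattern you can reuse for any "open directory" hunt: keywords, a chain of file extensions joined with OR, the intitle:"Index of" filter and some minus-words against the noise. A rough sketch of that pattern (the function and its parameter names are my own invention, nothing official):

```javascript
// Assemble an "open directory" query like the one used for target 2.
// terms: plain keywords; extensions: file types joined with OR;
// exclude: words to filter out with a leading minus sign.
function buildIndexQuery(terms, extensions, exclude) {
  const parts = [
    terms.join(' '),
    extensions.join(' OR '),
    'intitle:"Index of"'
  ];
  for (const word of exclude || []) {
    parts.push('-' + word);
  }
  return parts.join(' ');
}

const query = buildIndexQuery(['imagine', 'lennon'], ['mp3', 'm4a', 'ogg'], ['metallica']);
console.log(query);
// Ready to paste into a search URL:
console.log('http://www.google.com/search?q=' + encodeURIComponent(query));
```

Change the extensions to pdf and doc, or the minus-words to whatever noise you keep stumbling on, and you have a starting string for the other targets as well.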



 

What a search looks like
(Private investigations: The cranberry path)
Top


"Your nose is as red as that cranberry sauce," answered Fan, coming out of the big chair where she had been curled up for an hour or two.

Hey, what the heck is a cranberry?

NON SPECIFIC LINKS/APPROACHES (can be used for most targets, doesn't need to be a cranberry :-)


google define: cranberry

wikipedia cranberry

yahoo education cranberry

cranberry: The Columbia Encyclopedia, 6th Edition.

Assorted links (out of thin air):
cranberry institute --> cranberryinstitute;
cranberry magazine --> cranberriesmagazine
cranberry bibliography --> Maine Uni bibliography

Wisconsin Cranberry School Proceedings (browse journals)

google images & yahoo images

SPECIFIC LINKS/APPROACHES (cranberry-related: should be used only for plant targets)

plants database

 

Synecdochical searching
Top


A Synecdoche ("sin-EK-doh-kee") is the rhetorical or metaphorical substitution of a part for the whole, or vice versa. This approach is widely used in searching, because it allows you to get at your signal 'from the bottom', eliminating part of the noise.

For some specific examples see synecdoc.htm.
Here let's just have "a visual look" at a search:

The red cylinder below represents the TOTALITY of accessible web sites that could be of interest to you -in the context of your current search. The small rings show four different specific clusters of interesting sites.
Please remember that inside the cylinder the 'void' is only APPARENT! That's the part of the internet you cannot reach through the main search engines. There are interesting sites there as well (as a matter of fact MANY more than on the 'accessible' outside), but to grab them you'll have to use more advanced techniques than commercial engines :-)

latilongi
1 You land for the first time on an interesting cluster of sites through your 'clean cut'
2 You have 'synecdochically' moved horizontally, modifying your original clean-cut
3 These sites will be relatively easy to find, they are both on a horizontal and on a vertical synecdoche. Note that the signal width of the vertical synecdoches (e.g. the yellow one on the right side of the image) may vary quite a lot, while horizontal synecdoches' width seems more constant.
4 You'll never find this cluster with your current synecdochical approaches, you'll have to devise a COMPLETELY DIFFERENT cut.



Regional searching
The importance of languages and of online translation services and tools
Top


One of the main reasons why the main search engines together cover (at best) just something less than 1/2 of the web is a LINGUISTIC one. The main search engines are, in fact, "Englishcentric" if I may use this term, and in many cases - which is even worse - are subject to a heavy "Americancentric bias".

The web is truly international, to an extent that even those who did both physically travel and virtually browse a lot tend to underestimate.
Some of the pages you'll find may point to problems, ideals and aims so 'alien' from your point of view that -even if you knew their languages or if they happen to be in English- you cannot even hope to understand them.
On the other hand this multicultural and truly international cooperation may bring some fresh air into a world of cloned Euro-American zombies who drink the same coke from the same bottles, wear the same shirts, the same shoes (and the same pants), and sit ritually in the same McDonalds in order to perform their compulsory and quick "reverse shitting".

But seekers need to understand this Babel if they want to add depth to their queries.
There are MANY linguistic aids out there on the web, and many systems that allow you to translate a page, or a snippet of text, from, say, Spanish into English or vice versa. But much rarer, and much more useful for us, are sites that allow us to understand -even roughly- pages written in Japanese, Chinese, Hindi, Russian, Korean, you name the funny alphabet :-)

As an example of how powerful such services can be in order to understand, for example, a Japanese site, have a look at the following trick:

RIKAI
An incredible translator!
http://www.rikai.com/perl/Home.pl
Try it, for instance, on http://www.shirofan.com/ See? It "massages" WWW pages and places "popup translations" from the EDICT database behind the Japanese text!

for instance
http://www.rikai.com/perl/LangMediator.En.pl?mediate_uri=http%3A%2F%2Fwww.shirofan.com%2F
See?
You can use this tool to "guess" the meaning of many a Japanese page or -and especially- Japanese search engine options, even if you do not know Japanese :-)
You can easily understand how, in this way, you can -with the proper tools- explore the wealth of results that the Japanese, Chinese, Korean, you name them, search engines may (and probably will) give you.
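
The mediator address shown above is nothing more than the Rikai script followed by the URL-encoded target page, so you can build it for any site. A small sketch (the function name is mine; the endpoint is copied from the example above and may well have moved or vanished since):

```javascript
// Wrap a target page in the Rikai mediator address shown above.
// Assumption: the LangMediator.En.pl endpoint and its mediate_uri
// parameter are taken from the talk as-is.
function rikaiUrl(target) {
  return 'http://www.rikai.com/perl/LangMediator.En.pl?mediate_uri=' +
    encodeURIComponent(target);
}

console.log(rikaiUrl('http://www.shirofan.com/'));
```

The same trick (service address plus encodeURIComponent of your target) works with most "translate this page" services, as long as you find out the name of their parameter.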

Let's search for "spanish search engines"... see?
Let's now search for "buscadores hispanos"... see?

A 'portable' translator
Top

Highlight the following text: Nous sommes en 50 avant Jésus-Christ. Toute la Gaule est occupée par les Romains... Toute? Non! Un village peuplé d'irréductibles Gaulois résiste encore et toujours à l'envahisseur. Et la vie n'est pas facile pour les garnisons de légionnaires romains des camps retranchés de Babaorum, Aquarium, Laudanum et Petitbonum...
click here: translate,
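javascript:
/* Language pair: source|target. Change it to translate anything else. */
var params = '?langpair=fr|en';
var txt = '';
if (document.getSelection) {
  /* Convert the selection to a plain string, so that an empty one is falsy. */
  txt = String(document.getSelection());
}
else
  if (document.selection) {
    /* Old MSIE selection model. */
    txt = document.selection.createRange().text;
  }
if (txt)
  params += "&text=" + encodeURIComponent(txt);
/* Join everything onto one single line when you save this as a bookmarklet. */
void(window.open('http://translate.google.com/translate_t' + params,
'translate',
'location=no,status=yes,menubar=no,scrollbars=yes,resizable=yes,width=547,height=442'))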
Ok, simple and quick (and rough) javascript. French into English was easy, of course. But -again- note inside the code the params = '?langpair=fr|en'; snippet, that you can change to anything! For instance to Korean in order to translate or to browse starting from the following: http://www.japanpr.com/shimane/shimane_default.htm

click here: Translate the page ko|en,
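
Since the language pair is just a parameter, the bookmarklet above can be generalized into a tiny function. This is only a sketch: the function name is mine, and the translate_t address with its langpair and text parameters is assumed to behave exactly as in the bookmarklet.

```javascript
// Build the translate address for an arbitrary language pair.
// Assumption: the translate_t endpoint and its langpair/text parameters
// work as in the bookmarklet above; they are taken from the talk as-is.
function translateUrl(text, from, to) {
  let params = '?langpair=' + from + '|' + to;
  if (text) {
    params += '&text=' + encodeURIComponent(text);
  }
  return 'http://translate.google.com/translate_t' + params;
}

console.log(translateUrl('Toute la Gaule est occupée', 'fr', 'en'));
console.log(translateUrl('', 'ko', 'en'));
```

Called with an empty text it just opens the translator with the chosen pair preselected, which is what you want when you intend to paste a whole page address yourself.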





POSSIBLE PATHS
Top


I would also like to draw your attention to the paramount importance of names on the web...
The ethical aspect...
An unfair society...
websearch importance nowadays recognized and obvious, you'll see tomorrow :-)...
libraries and documents: frills and substance...
the guardian of the light tower, the young kid in Central Africa and the yuppie in New York...



CONCLUSIONS
Top

Ode to the seekers

Like a skilled native, the able seeker has become part of the web. He knows the smell of his forest: the foul-smelling mud of the popups, the slime of a rotting commercial javascript. He knows the sounds of the web: the gentle rustling of the jpgs, the cries of the brightly colored mp3s that chase one another among the trees, singing as they go; the dark snuffling of the m4as, the mechanical, monotone clinking of the huge, blind databases, the pathetic cry of the common user: a plaintive cooing that slides from one useless page down to the next until it dies away in a sad, little moan. In fact, to all those who do not understand it, today's Internet looks more and more like a closed, hostile and terribly boring commercial world.
Yet if you stop and listen attentively, you may be able to hear the seekers, deep in the shadows, singing a lusty chorus of praise to this wonderful world of theirs -- a world that gives them everything they want.
The web is the habitat of the seeker, and in return for his knowledge and skill it satisfies all his needs.

The seeker does not even need any more to hoard on his hard disks whatever he has found: all the various images, music, films, books and whatnot that he fetches from the web... he can just taste and leave there what he finds, without even copying it, because he knows that nothing can disappear any more: once anything lands on the web, it will always be there, available for eternity to all those who possess its secret name...

The web-quicksand moves all the time, yet nothing can sink.

In order to fetch all kinds of delicious fruits, the seeker just needs to raise his sharp searchstrings.

In perfect harmony with the surrounding internet forest, he can fetch again and again, at will, any target he fancies, wherever it may have been "hidden". The seeker moves unseen among sites and backbones, using his anonymity skills, his powerful proxomitron shield and his mighty HOSTS file.
If need be, he can quickly hide among the zombies, mimicking their behaviour and thus disappearing into the mass.

Moving silently along the cornucopian forest of his web, picking his fruits and digging up his jewels, the seeker easily avoids the many vicious traps that have been set to catch all the furry, sad little animals that happily use MSIE (and outlook), that use only one-word google "searches", and that browse and chat around all the time without proxies, bouncing against trackers and web-bugs and smearing all their personal data around.

Moreover the seeker is armed: his sharp browser will quickly cut to pieces any slimy javascript or rotting advertisement that the commercial beasts may have put on his way. His bots' jaws will tear apart any database defense, his powerful scripts will send perfectly balanced searchstrings far into the forest.



So, that was it. Any questions?





ASSIGNMENTS
Top

Your own private investigations

The power of searching at your fingertips, what are you waiting for?
Start your own private investigations! Here are two rather naïve examples.

1) Inflation
Don't you have the impression that the real inflation we all have to endure (with more and more expensive everyday prices) is waaay more than that ludicrous 2,1% (circa) that our powers that be claim year after year?
Well, there are a series of newspapers with their COMPLETE ARCHIVES on the web, searchable, for free.
Supermarket chains, Aldi, Carrefour, you name it, have also published on the web their "fabulous" offers and prices.
Or try this: http://www.google.com/catalogs :-)
You'll be able to find the older pages, as we have seen, using webarchive or similar web-snapshot repositories.

Assignment: Find the real inflation using all available data.

Some simple suggestions:
Use an average of price components that is weighted and categorized in approximately the same manner as the official Consumer Price Index.
Housing should represent the largest component at 40% with other categories having lesser impact.
The inflation rate should be calculated as a price multiplier with a base year of 1995, to represent the number of "1st January 2006" euros that are required to purchase what the equivalent of one EUR bought in 1995.
The annualized inflation rate is the equivalent average compounded yearly inflation rate over the 10 year period.
Take account of education and medical care costs, also easy to find and check on the web (some combing and social engineering will go a long way in order to find them).
Create two subgroups: 1995-1999, 2000-2005
If you do it, you will soon realize that -while the euro itself has nothing to do with it- the high inflation trend (around 6~8% real, not 2,1%) has been a (quite interesting) constant.
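
The suggestions above boil down to two small calculations: a weighted price multiplier, and the compounded yearly rate that would produce it. Here is one simple way to sketch them (a weighted average of price ratios, not necessarily how any official index is computed). The function names, the data layout and above all the numbers are mine and purely illustrative: plug in the prices you actually dig out.

```javascript
// Weighted price multiplier: how many "2006" euros buy what one euro
// (or its equivalent) bought in the base year.
// categories: [{ weight, basePrice, currentPrice }], weights summing to 1.
function priceMultiplier(categories) {
  return categories.reduce(
    (sum, c) => sum + c.weight * (c.currentPrice / c.basePrice), 0);
}

// Equivalent average compounded yearly rate over the given number of years.
function annualizedRate(multiplier, years) {
  return Math.pow(multiplier, 1 / years) - 1;
}

// Purely illustrative numbers, NOT real data: replace with what you find.
const basket = [
  { weight: 0.4, basePrice: 100, currentPrice: 220 }, // housing
  { weight: 0.6, basePrice: 100, currentPrice: 180 }  // everything else
];
const multiplier = priceMultiplier(basket);
console.log(multiplier, annualizedRate(multiplier, 10));
```

Run it once per subgroup (1995-1999 and 2000-2005, with the matching number of years) to see whether the trend is really constant.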

Purchasing power and living standards have been steadily reduced, in Europe and elsewhere, through a higher than admitted real inflation, which translates of course into an automatic salary decrease for the great majority, bar speculators.
This has been coupled with a prolonged (by law) "active life" (read shorter pension), and longer working hours (and working days) without any salary compensation whatsoever for the unwashed masses.

2) Punishing greedy and corrupt ones

D'you have in your town a station being built, a new industrial area being planned, any building permits being granted, any committee for the management of public housing?

You can bet that -in 99% of cases- someone is using law-loopholes and/or a net of political protection in order to make money illegally.
But you now have the power of the seeker! Don't underestimate it.
You can explore all newspapers' databases, you can easily find related news, you can seek in many languages...

In a more and more Internet-oriented society a seeker can find out quite a lot about his targets.
You can stalk people, lure and/or troll info out of them or about them, find out where they live, how much they earn, when, where and how they started to work (political appointment? Public competition? Father's connections?)
You can, with simple social engineering tricks, get in touch with their co-workers, enter their databases, have a look at the code of their doc format documents, where Word, often enough and by default, keeps all the corrections and changes which have been made to a document...

Your 'private investigations' may be small crumbs, but even small crumbs may grind the well-greased wheels of your own local political/commercial vermin!











