mi_struc.htm: How to search the web, by fravia+ (¯`·.¸(¯`·.¸ Milan, Linux day ¸.·´¯)¸.·´¯)

http://www.searchlores.org
http://www.fravia.org

The structure of the web (how to access)

Form
In order to search effectively you must first understand what the web looks like.
There are different "areas" of the web, characterized by their access flows: which way the links run into and out of them. We'll see how important this is for searching purposes.

"Nucleus" (& usenet, etcetera)
It's difficult to render the structure of the web in two (or three) dimensions only, but imagine a big bulk of mutually interconnected sites, which we'll call the Nucleus (or the "bulk"). The Nucleus is composed of sites, newsgroups, databases, mailing list repositories, collections of homepages, university portals, webrings... you name it, all mutually linked. That's the "ordinary" web you find and peruse when you browse around following the links you come across.
A bulk of almost 3000 million mutually interconnected pages forms the Nucleus, which, like a Moebius strip, is mostly self-referential and self-contained.

"Outside linked" (pages the Nucleus points to)
The Nucleus "points" to another area of the web, the "outside linked". The sites in this area are linked from the Nucleus but do not point back to it. A simple example is a database of images, linked from the Nucleus but not necessarily pointing back to it. This part of the web, made up mostly (but not only) of "non hidden" databases, can be searched and "combed" with the same searching techniques that we usually apply when searching the Nucleus. These pages are "outside" the Nucleus, yet not particularly difficult to find.

"Outside linkers" (pages pointing to the Nucleus)
Like matter and anti-matter, the "outside linked" pages we spoke of above have an inversely related counterpart: the "outside linkers" pages. All the pages located in this other area of the web "point" to the Nucleus but are not pointed back to from it. Imagine as an example the personal links page of a scientist: lots of interesting links to the Nucleus, yet no need to publicize its existence. A page with information you may need is there, somewhere, without any link whatsoever that could bring you to it. Indeed there are no links back from the Nucleus to these pages.
The "outside linkers" are a part of the web you cannot reach using "normal" search techniques, since no link whatsoever points to them. Yet they may hoard knowledge you need. There are, fortunately, some techniques that you can apply in order to find them.
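
The three areas described so far can be illustrated on a toy link graph. What follows is only a minimal sketch: the page names and the `LINKS` table are invented for the example, and nobody has the real web's link graph in a dictionary. Under these assumptions, the Nucleus is whatever you can both reach from a seed page and get back from, the "outside linked" pages are reachable by following links forward only, and the "outside linkers" turn up only when the links are followed backwards.

```python
from collections import deque

def reachable(graph, start):
    """Return every page reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

def reverse(graph):
    """Build the same graph with every link turned around."""
    rev = {}
    for page, targets in graph.items():
        rev.setdefault(page, set())
        for target in targets:
            rev.setdefault(target, set()).add(page)
    return rev

def classify(graph, seed):
    """Split pages into the three link-defined areas, seen from `seed`."""
    forward = reachable(graph, seed)            # what the seed leads to
    backward = reachable(reverse(graph), seed)  # what leads to the seed
    nucleus = forward & backward                # mutually linked bulk
    return {
        "nucleus": nucleus,
        "outside_linked": forward - nucleus,    # pointed to, no way back
        "outside_linkers": backward - nucleus,  # point in, never pointed to
    }

# Hypothetical toy web: page -> pages it links to.
LINKS = {
    "portal": ["newsgroup", "image_db"],
    "newsgroup": ["homepage"],
    "homepage": ["portal"],
    "image_db": [],                               # linked, does not link back
    "scientist_links": ["portal", "newsgroup"],   # links in, nobody links to it
}

areas = classify(LINKS, "portal")
for name, pages in areas.items():
    print(name, sorted(pages))
```

Note that a crawler starting anywhere in the Nucleus computes only the forward set, which is exactly why the scientist's links page never shows up in it: finding it needs some way of asking "who links to this page?".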

"Hidden databases" (goodies you are supposed to pay for)
The fourth main area of the web (in our "searching taxonomy") is made of hidden databases. These are pages that the Nucleus points to, and that may (or may not) point back to the Nucleus. Yet for commercial (or other access-restrictive) reasons, visitors to sites located here are supposed to "pay" (or join some "clan") in order to access them. As you may imagine, these pages are NOT mutually linked.
Fortunately (for us, unfortunately for the commercial bastards) the web was originally built in order to share (and neither to hoard nor to sell) knowledge. And thus the building blocks, the "basic frames" behind the structure of the web are still the same.
If I may dare a comparison: exactly as it is pretty easy to break any software protection written in a higher-level language if you know (and use) assembly, so it is easy to break any barrier that a server puts between the user and a given database if you know (and can outflank) the protocols used by browsers and servers.
As a result let's simply say that it is relatively easy to access all pages in this area by reversing the (simple) Perl or JavaScript tricks used to keep them "off limits" for zombies and lusers.
A not too small part of this area is made of "politically and strategically sensitive" data, hidden inside military or government servers, which are connected to the web (a very big mistake eo ipso IMHO).
Since this is a workshop "for the establishment", I'll limit myself to telling you that if you learn how to search you'll soon find, and without excessive fatigue, a whole plethora of effective ways to access this kind of info... should you fancy it.

Proceed to searching, combing, klebing, luring, hacking