Form
In order to search effectively you must first understand what the web looks like. There are different "areas" of the
web, characterized by the direction of their access flows (who links to whom). We'll see how
important this is for searching purposes.
"Nucleus" (& usenet, etcetera)
It's difficult to render the structure of the web in only two (or three) dimensions, but imagine a big bulk of sites, mutually
interconnected, that we'll call the Nucleus (or the "bulk").
The Nucleus is composed of sites, newsgroups, databases, mailing list repositories, collections
of homepages, university portals, webrings... you name it, all mutually linked. That's the "ordinary" web
you find and peruse when you
browse around following the links you come across. A bulk of almost 3000 million
mutually interconnected pages forms the Nucleus which, like a
Moebius strip, is mostly self-referring and self-contained.
"Outside linked" (pages the Nucleus points to)
The Nucleus "points" to another area of the web, the "outside linked". The sites in this area
are linked from the Nucleus but do not point back to it. A simple example is a
database of images, linked from the Nucleus but not necessarily pointing back to it.
This part of the web, made mostly out of "non hidden" databases (but not only), can be searched
and "combed" with the same searching techniques that we usually
apply when searching the Nucleus. These pages are "outside" the nucleus, yet not
particularly difficult to find.
"Outside linkers" (pages pointing to the Nucleus)
Like matter and anti-matter, the "outside linked" pages we spoke of above have
an inverse counterpart: the "outside linkers" pages. These are
the pages, located in yet another area
of the web, that "point" to the Nucleus but are not pointed to from it. Imagine as
an example the personal links page of a scientist: plenty of interesting links to
the Nucleus, yet no need to publicize its existence. A page with information you
may need is there, somewhere, without any
link whatsoever that could bring you to it, since there are no links from
the Nucleus to these pages.
The "outside linkers" are a part of the web you cannot reach using
"normal" search techniques, since no link whatsoever points to them. Yet they may hoard
knowledge you need. There are, fortunately, some techniques that you can apply in order
to find them.
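
The three link-defined areas described so far can be illustrated on a toy link graph. What follows is only a minimal sketch, not a description of how any real search engine works: the page names, the `links` dictionary and the helper functions are all invented for illustration, and it assumes we already know one page (the "seed") that sits inside the Nucleus. Pages reachable from the seed AND able to reach it back form the Nucleus; pages only reachable from it are "outside linked"; pages that can only reach it are "outside linkers".

```python
from collections import deque

def reachable(graph, start):
    """Return every page reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def reverse(graph):
    """Return the same graph with every link turned around."""
    rev = {}
    for src, targets in graph.items():
        rev.setdefault(src, set())
        for dst in targets:
            rev.setdefault(dst, set()).add(src)
    return rev

def classify(graph, seed):
    """Split pages into the three link-defined areas, relative to `seed`."""
    forward = reachable(graph, seed)             # pages the seed can reach
    backward = reachable(reverse(graph), seed)   # pages that can reach the seed
    nucleus = forward & backward
    return {
        "nucleus": nucleus,
        "outside_linked": forward - nucleus,
        "outside_linkers": backward - nucleus,
    }

# Hypothetical toy web: page name -> pages it links to.
links = {
    "portal": ["newsgroup", "imagedb"],
    "newsgroup": ["homepage"],
    "homepage": ["portal"],
    "imagedb": [],                                # linked from the Nucleus, no link back
    "scientist_links": ["portal", "newsgroup"],   # links in, nobody links to it
}

areas = classify(links, "portal")
for name, pages in areas.items():
    print(name, sorted(pages))
```

Note that a crawler starting at "portal" and following links forward would visit only the Nucleus and "imagedb"; it would never meet "scientist_links". The sketch can place that page only because the toy graph already lists it, which is exactly the knowledge a link-following searcher lacks.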
"Hidden databases" (goodies you are supposed to pay for)
The fourth main area of the web (in our "searching taxonomy") is made of
hidden databases. These are pages that the Nucleus points to, and that may
(or may not) point back to the Nucleus. Yet
for commercial (or other access-restricting) reasons, visitors of the sites located here
are supposed to
"pay" (or join some "clan") in order to access them. As you may imagine, these pages
are NOT mutually linked. Fortunately (for us, unfortunately for the commercial bastards)
the web was originally built
in order to share knowledge (and neither to hoard it nor to sell it).
And thus the building blocks, the
"basic frames" behind the
structure of the web, are still the same. If I may dare a comparison: just as it is
pretty easy
to break any software protection written in a higher-level language if you know
(and use) assembly,
so it is easy to break any barrier that a server puts between a user and a given
database if you know (and can outflank) the protocols used by
browsers and servers.
As a result, let's simply say that it is
relatively easy to access all pages in this area by reversing the (simple) perl
or javascript tricks used to keep them "off limits" for zombies and lusers.
A not too small part of this area is made of "politically and strategically sensitive" data, hidden inside military
or government servers which are connected to the web (a very big mistake eo
ipso IMHO). Since this is a workshop "for the establishment", I'll limit
myself to telling you that if you learn how to search you'll soon find, and
without excessive
fatigue, a whole plethora of effective ways to access this
kind of info... should you fancy it.