~ Some Oddities @ Raging ~
(Reversing a new search engine)
By ~S~ Humphrey P. and others. Courtesy of searchlore.org, May 2000
in fieri document for advanced searchers
Check [a look at the STRUCTURE of
Altavista] as well
This is an exceptional document (though, as usual with Humphrey P, not very easy to read :-) which represents an ongoing exercise in order to understand some quirks at raging.com
A little history: The advent of Google (clean interface, good algos, quick delivery of excellent results, not to mention the "cached pages" most excellent bonus) forced Altavista to try to regain its 'core' audience, which was migrating en masse towards Google. As funny as it may sound to you, most users stick to ONE search engine for whatever query they are performing :-(
Raging represents Altavista 'striking back', and it seems indeed a nice tool. But how does the "new" Raging engine really work and interface with Altavista's databases?
That's what Humphrey and other seekers try to understand (and explain) here...
Presentation (by Gregor Samsa)
I tried the new search interface altavista provides at http://www.raging.com
Man, you'll like it ! It seems to have all the features of altavista's simple search and there are NO GRAFIX - it's a good deal faster than the 'regular' altavista web interface.
You'll have to have a look at the way the customizing works. It is on a separate page and you customize your results before or after you search. Apparently, the options are kept as long as you stay connected.
+
I found something which might be a new feature. At least it was unknown to me.
Compare these two searches:
http://ragingsearch.altavista.com/cgi-bin/query?q=%2BGustav+%2B%22II%22+%2BAdolf
==> 1702 hits
http://ragingsearch.altavista.com/cgi-bin/query?q=%2BGustav+%2B%22II%22+%2BAdolf&FFF=off&wfmt=d&nbq=30&KL=en&KL=fr&KL=de&KL=es&Translate=on&prf=Submit
==> 5438 hits
The first search I performed using the default settings. On the 'old' altavista page this sets the KL-parameter to "XX", which stands for 'any language'.
The second result I got by checking the boxes for English, French, German and Spanish on the customizing page. I do not know why, but the engine finds a larger number of hits when certain languages are chosen than with the default (which should be 'any language' here as well, shouldn't it ?).
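For readers who want to repeat the comparison, here is a minimal sketch of how the two query strings can be assembled. The helper name raging_url is made up for illustration, and the extra settings from the customizing page (FFF, wfmt, nbq, Translate, prf) are left out on purpose; only q and the repeated KL pairs are taken from the URLs above.

```python
from urllib.parse import urlencode

RAGING = "http://ragingsearch.altavista.com/cgi-bin/query"

def raging_url(query, languages=()):
    """Build a query URL (hypothetical helper): one KL pair per language code."""
    pairs = [("q", query)] + [("KL", code) for code in languages]
    # urlencode escapes '+' as %2B, '"' as %22 and turns spaces into '+'
    return RAGING + "?" + urlencode(pairs)

default_url = raging_url('+Gustav +"II" +Adolf')
multi_url = raging_url('+Gustav +"II" +Adolf', ["en", "fr", "de", "es"])
print(default_url)
print(multi_url)
```

The point of the sketch is only that a key may legally appear several times in a query string; what the server does with the repeats is up to the program behind it.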
Does this also work with the 'old' interface ? No. As you can see from the following examples, only the LAST language parameter given is used.
http://www.altavista.com/cgi-bin/query?pg=q&sc=on&hl=on&q=%2BGustav+%2B%22II%22+%2BAdolf&kl=en&kl=fr&kl=de&kl=es&stype=stext
==> 27 hits in Spanish
http://www.altavista.com/cgi-bin/query?pg=q&sc=on&hl=on&q=%2BGustav+%2B%22II%22+%2BAdolf&kl=es&kl=fr&kl=de&kl=en&stype=stext
==> 2385 hits in English
Striking here, too: A search for English pages turns up more hits than a search for pages in any language. I'd really like to see the code they use...
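The "last one wins" behaviour of the old interface is easy to mimic. The following is only a sketch of how a program could end up behaving that way (we have not seen Altavista's code); effective_language is a hypothetical name.

```python
from urllib.parse import urlsplit, parse_qs

def effective_language(url, key="kl"):
    """Return the last value given for `key`, or None if the key is absent."""
    values = parse_qs(urlsplit(url).query).get(key, [])
    return values[-1] if values else None

base = ("http://www.altavista.com/cgi-bin/query?pg=q&sc=on&hl=on"
        "&q=%2BGustav+%2B%22II%22+%2BAdolf")
url_c = base + "&kl=en&kl=fr&kl=de&kl=es&stype=stext"
url_d = base + "&kl=es&kl=fr&kl=de&kl=en&stype=stext"
print(effective_language(url_c), effective_language(url_d))
```

A parser that stores each pair into a plain dictionary, overwriting earlier values, would give the same effect without anybody having designed it.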
Trouble in Bilbao City (by ~S~ Humphrey P.)
Trouble, in Bilbao City, (with a CapiTal T and that rhymes with P and that stands for POOL!)
Tabulation of presumptions, followed by results.
-a-
http://ragingsearch.altavista.com/cgi-bin/query?q=%2BGustav+%2B%22II%22+%2BAdolf
1702 pages found // word count: Adolf: 387514; Gustav: 461825; II: 33305000
-b-
http://ragingsearch.altavista.com/cgi-bin/query?q=%2BGustav+%2B%22II%22+%2BAdolf&FFF=off&wfmt=d&nbq=30&KL=en&KL=fr&KL=de&KL=es&Translate=on&prf=Submit
5438 pages found:
Searching in ONLY: • English • French • German • Spanish // word count: Adolf: 387514; Gustav: 461825; II: 33305000
Cookie: AV_RAGESETTINGS=v1:30:3:1::enfrdees Tue Dec 31 05:58:26 2013
-c-
http://www.altavista.com/cgi-bin/query?pg=q&sc=on&hl=on&q=%2BGustav+%2B%22II%22+%2BAdolf&kl=en&kl=fr&kl=de&kl=es&stype=stext
27 hits in Spanish
27 pages found. // word count: Adolf: 387514; Gustav: 461825; II: 33305000
-d-
http://www.altavista.com/cgi-bin/query?pg=q&sc=on&hl=on&q=%2BGustav+%2B%22II%22+%2BAdolf&kl=es&kl=fr&kl=de&kl=en&stype=stext
2385 hits in English
2,385 pages found. // word count: Adolf: 387514; Gustav: 461825; II: 33305000
Observations:
-1-
Same word counts everywhere: Adolf: 387514; Gustav: 461825; II: 33305000
We seem to be looking at the same database, and the same search tree.
-2-
You haven't seen all (a) 1702, (b) 5438 nor (d) 2,385 pages.
How are they alike? How are they different?
-3-
-e-
Ordinary ad filled search, where &kl=xx
http://www.altavista.com/cgi-bin/query?pg=q&sc=on&hl=on&q=%2BGustav+%2B%22II%22+%2BAdolf&kl=XX&stype=stext
1,702 pages found. // word count: Adolf: 387514; Gustav: 461825; II: 33305000
Comparing -e- with -a- I conclude: Statistics for results where &kl=xx have not changed between what the old altavista.com WAS doing and the new ragingsearch.altavista.com IS doing.
-4-
Let's pick a really esoteric international subject. Perhaps something brand new, because I want to see all the found items to figure out what &stype=stext WAS doing. Maybe that will give a clue as to what &Translate=on IS doing.
You say there WAS a difference between .es and .en so, let's add .xx to those two in a reduced field search and compare our three results.
&kl=es
http://www.altavista.com/cgi-bin/query?pg=q&sc=on&hl=on&q=%2BGustav+%2B%22II%22+%2BAdolf+%2BBilbao&kl=es&stype=stext
2 pages found // word count: Bilbao: 270771; Adolf: 387514; Gustav: 461825; II: 33305000
&kl=en
http://www.altavista.com/cgi-bin/query?pg=q&sc=on&hl=on&q=%2BGustav+%2B%22II%22+%2BAdolf+%2BBilbao&kl=en&stype=stext
13 pages found // word count: Bilbao: 270771; Adolf: 387514; Gustav: 461825; II: 33305000
&kl=xx
http://www.altavista.com/cgi-bin/query?pg=q&sc=on&hl=on&q=%2BGustav+%2B%22II%22+%2BAdolf+%2BBilbao&kl=xx&stype=stext
6 pages found. // word count: Bilbao: 270771; Adolf: 387514; Gustav: 461825; II: 33305000
The phenomenon persists. We are getting more pages with &kl=en (13) than with &kl=es (2) or &kl=xx (6).
And the pages are (trombone fanfare!):
Not the same.
If you are drawing Venn diagrams, none of the three circles overlap. There are no intersections.
Whatever [any language] (&kl=xx) means, it doesn't mean all the languages in the list OR (see ORs)
-the tardy student theory-
OR there is a hidden time limit, and three different hash tables, and the &kl=xx is the biggest hash table, so the finder doesn't get through it in time to report all there is to find...
-the superfluity theory-
OR More is less.
+Gustav +II +Adolf (2385)
+Gustav +II +Adolf +Bilbao (13)
~~
OR
&kl=XX
http://www.altavista.com/cgi-bin/query?pg=q&sc=on&hl=on&q=%2BGustav+%2B%22II%22+%2BAdolf+%2BBilbao&kl=XX
35 pages found // word count: Bilbao: 270771; Adolf: 387514; Gustav: 461825; II: 33305000
There's a difference between &kl=xx and &kl=XX.
XX does have intersection.
Delving deeper (by ~S~ Humphrey P.)
"I" quit quite too soon in our analysis of AltaVista's new search
interface.
For, by itself, choosing between using kl=XX or kl=xx doesn't
answer your original question about why kl=en [English] should find more entries
than kl=XX [any language].
For, that is what you had proven with your
example.
You had used kl=XX (big XX = inclusive OR?) and you got more
kl=en [English] than kl=XX [any language] with the +Gustav +"II" +Adolf
query.
You had not used kl=xx (little xx = exclusive OR?). I stumbled upon
that. I made that mistake. And I got fewer kl=xx than kl=en, but more kl=XX than
kl=en with the +Gustav +"II" +Adolf +Bilbao query.
Let's see. I have the
tabulations. I tried it again, the next day, when I was awake, and could see I
hadn't proven anything.
A few days later so statistics are a little bit
different, but:
+Gustav +"II" +Adolf
word count: Adolf: 374906; Gustav: 462989; II: 33338616
&kl=es
http://www.altavista.com/cgi-bin/query?pg=q&sc=on&hl=on&q=%2BGustav+%2B%22II%22+%2BAdolf&kl=es&stype=stext
27 pages found.
&kl=en
http://www.altavista.com/cgi-bin/query?pg=q&sc=on&hl=on&q=%2BGustav+%2B%22II%22+%2BAdolf&kl=en&stype=stext
2,384 pages found
&kl=xx
http://www.altavista.com/cgi-bin/query?pg=q&sc=on&hl=on&q=%2BGustav+%2B%22II%22+%2BAdolf&kl=xx&stype=stext
836 pages found.
&kl=XX
http://www.altavista.com/cgi-bin/query?pg=q&sc=on&hl=on&q=%2BGustav+%2B%22II%22+%2BAdolf&kl=XX&stype=stext
1,702 pages found.
+Gustav +"II" +Adolf +Bilbao
word count: Bilbao: 251970; Adolf: 374906; Gustav: 462989; II: 33338616
&kl=es
http://www.altavista.com/cgi-bin/query?pg=q&sc=on&hl=on&q=%2BGustav+%2B%22II%22+%2BAdolf+%2BBilbao&kl=es&stype=stext
2 pages found
&kl=en
http://www.altavista.com/cgi-bin/query?pg=q&sc=on&hl=on&q=%2BGustav+%2B%22II%22+%2BAdolf+%2BBilbao&kl=en&stype=stext
13 pages found.
&kl=xx
http://www.altavista.com/cgi-bin/query?pg=q&sc=on&hl=on&q=%2BGustav+%2B%22II%22+%2BAdolf+%2BBilbao&kl=xx&stype=stext
6 pages found
&kl=XX
http://www.altavista.com/cgi-bin/query?pg=q&sc=on&hl=on&q=%2BGustav+%2B%22II%22+%2BAdolf+%2BBilbao&kl=XX&stype=stext
35 pages found.
~
Ok, now, our observations are these:
-1-
The old www.altavista.com didn't allow several languages at a time, but took the last one in the list.
-2-
With the +Gustav +"II" +Adolf query:
&kl=es 27, [&kl=xx 836,] &kl=XX 1702, &kl=en 2384
It's odd that there are more English (en) than AnyLanguage (XX).
-3-
With the +Gustav +"II" +Adolf +Bilbao query:
&kl=es 2, &kl=xx 6, &kl=en 13, &kl=XX 35
There are fewer &kl=xx than &kl=en, but more &kl=XX than &kl=en. Apparently XX means intersection (inclusive vs exclusive OR)
-4- (Today)
However, in our first set of facts, we don't see how the XX (1702) included all the EN (2384)
~~~
Did I get the facts stated correctly, this
time?
~
All we've managed to do, so far, is to watch the shadow of
"altavista.com/cgi-bin/query?" from a distance on different terrain at two
different times of day. We're not close to making a sundial, yet.
What you
really want, is to have a look at the source.
Well, how big is it, and what
is it called?
And we are assuming there is a difference in terrain? That
"ragingsearch.altavista.com/cgi-bin/query?" is really different from
"www.altavista.com/cgi-bin/query?" -?- We just want to know how it's
different.
~
I should learn more about CGI, instead of just guessing, don't you think? Although CGI is not a language, but rather a way of doing things, there should be constraints.
Let's see, what do I want to find.
-1-
There's a page of parameters somewhere for AltaVista...
kl means language, ...
-2-
Some time ago, we were looking for original development and doctoral theses and white papers and user's manuals, but "everybody" got bored with that... I'd bet the original database design hasn't changed much from then. Might shut up and go looking again. Berkeley, wasn't it. Then Digital.com.
-3-
-3-
jeff had a section on .CGI didn't he? On his links page:
http://altern.org/wholelottarosie/sitelinks.htm#anchorcgi
http://www.cgi-resources.com/Programs_and_Scripts/Perl/ The Cgi Resource Index
http://www.servers.nu/index.html Cgi scripts
http://www.cgi101.com/ Learn CGI today
Two resource sites, and a tut.
-4-
Ok, I can add a simpler intro tut to that.
http://www.geocities.com/SiliconValley/Heights/7394/cgi.html Adam Stanislav's CGI Programming Tutorial
"Each pair is separated by an ampersand
(&). Please note I said separated, not terminated. There is no ampersand after the
last pair."
but my find doesn't go very far. Just takes away a few of my
illusions.
"The trick is in realizing that, despite appearances, the URL
has nothing to do with the command line of the program.
How, then, do you get
to the data from within your program? You read it from the environment variable
QUERY_STRING."
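To make the quoted point concrete, here is a minimal CGI-style sketch. It only shows the general mechanism (the web server puts everything after the "?" into the QUERY_STRING environment variable); it says nothing about what Altavista's own "query" program looks like. The function name read_query is an illustrative choice.

```python
import os
from urllib.parse import parse_qs

def read_query(environ=None):
    """Parse name=value pairs from QUERY_STRING; repeated names keep all values."""
    environ = os.environ if environ is None else environ
    return parse_qs(environ.get("QUERY_STRING", ""))

if __name__ == "__main__":
    # A CGI program answers on STDOUT: headers, a blank line, then the body.
    print("Content-Type: text/plain")
    print()
    for name, values in sorted(read_query().items()):
        print(name, "=", ", ".join(values))
```

Run from a shell with QUERY_STRING set by hand, it echoes the pairs back, which is a handy way to see exactly what a form or a hand-edited URL sends.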
-5-
Hmmm. Has Mammon mastered CGI and explained it somewhere?
[+mammon +cgi]
(that was foolish! Cardinal Newman. what did he know ...)
[host:eccentrica.org +mammon +cgi]
Hmmm. Java is right down there with Voting for Bush.
http://discussions.earthweb.com/cgi-bin/dnewsweb?cmd=xover&group=earthweb.cgi.general&utag= EarthWeb Discussions: earthweb.cgi.general newsgroup
http://discussions.earthweb.com/cgi-bin/dnewsweb?cmd=article&group=earthweb.cgi.general&item=541&utag= Subject: Want to learn CGI programming
http://hotwired.lycos.com/webmonkey/programming/perl_cgi/ Webmonkey
http://www.wdvl.com Web developer's Virtual Library: Encyclopedia of Web Design Tutorials, Articles and Discussions
http://www.wdvl.com/Authoring/CGI/ CGI: The Common Gateway Interface for Server-side Processing by Alan Richmond
-6-
Yahoo! Home > Computers and Internet > Internet > World Wide Web > CGI - Common Gateway Interface
http://hoohoo.ncsa.uiuc.edu/cgi/ The Common Gateway Interface
http://www.execpc.com/~keithp/bdlogcgi.htm How to use your CGI-BIN
"With so many free CGI scripts available (including http://www.execpc.com/~keithp/bdlogftr.htm Bestdam Logger Lite)..."
Hmmm. Do you suppose someone has liberated "query?" and posted it somewhere? Can you use an altavista search engine on your own pages? Who has done that and then gone fishing and is just sitting there, like a wrecked ship?
~~
Let's see. What do I want to know:
-a-
Is it always a script? Nahh. Could be Perl. Could be C. Could be any language. Anything to take information from the "QUERY_STRING" or from STDIN, and send it to STDOUT.
-b-
What am I going to find in /cgi-bin/ -?- A program? Nothing? (Everybody will want theirs to be named "query.")
-c-
Even if I know its name, will I be able to download it from /cgi-bin/ -?- (Maybe the subdirectory is protected. Maybe the actual program is named something different. Maybe sending the URL activates the program, rather than selecting it for download. Is GET or POST always implied?)
-d-
How do you put it there? Or is it even there? Maybe you just tell the sysadmin about it, and he tells the server that when you see a URL of www.altavista.com/cgi-bin/query? you look up "query" in a table and locate whatever program is associated, wherever you told him it was.
- That when you see a URL of ragingsearch.altavista.com/cgi-bin/query? you find a different entry in a table, and associate a different program with it...
OR, ragingsearch sends the URL to a separate server...
OR, ragingsearch is read out of the URL and the first simple change at the top of the program "query?" branches to the "ragingsearch" variations, wherever.
- The last one is most likely. They wouldn't penalize you for trying the new thing by confining you to one server. Besides, backup and mirroring is complicated enough, without inventing exceptions.
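The "branch at the top of the program" guess could look roughly like this. It is purely a sketch of that guess: the variant names are invented, and the only real-world fact used is that CGI servers commonly pass the request's Host header to the program as the HTTP_HOST environment variable.

```python
def choose_variant(environ):
    """Pick a code path from the host name the request was addressed to."""
    host = environ.get("HTTP_HOST", "").lower()
    if host.startswith("ragingsearch."):
        return "raging"    # hypothetical: ad-free output, cookie-based settings
    return "classic"       # hypothetical: the ordinary www.altavista.com page

print(choose_variant({"HTTP_HOST": "ragingsearch.altavista.com"}))
print(choose_variant({"HTTP_HOST": "www.altavista.com"}))
```

If something like this were the case, both host names would share one index and one query parser, which would fit the identical word counts seen in -a- and -e-.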
~~
Hmmm. I don't know what it feels like. It will feel
differently, when I know something.
~
Or, can you give me a Zen dope
slap, and fix my attention upon the truth in a minute?
Do it!
From France (by Rumsteack)
Fravia+, Some contributions (may not be very useful, but, at least, I've tried)....
http://ragingsearch.altavista.com/cgi-bin/query?q=%2Bwarez+%2Bappz&search=Search
(2,127)
http://ragingsearch.altavista.com/cgi-bin/query?q=%2Bwarez+%2Bappz&search=Search&KL=en
(31,548)
What a difference ! But do you see where I want to go ? Why have I chosen such an uninteresting query ? The huge number of results !
Scooter's manual, page 32 :
"A new option on the avs_search (timeout) and a
new api call (avs_timer) allows multi-threaded
applications to enforce maximum query processing times."
A few lines down in the manual, it's explained that the administrator can define which applications have a timer, and which ones do not.
I think you've come to the same conclusion as I have (which may be wrong, of course, but which should be logical) : a multi-language query (xx), which would necessarily take more time to execute than a specific-language query, will have a timer set. On the contrary, a specific one won't ..... (errr.. I'm not sure about this one nevertheless)
Ok, we now know that a multi-language query has a timer.... BUT, if there are few results, we can assume that the timer won't be reached... Error !
http://ragingsearch.altavista.com/cgi-bin/query?q=%2B%22The+history+of+Internet%22&search=Search
(2)
http://ragingsearch.altavista.com/cgi-bin/query?q=%2B%22The+history+of+Internet%22&search=Search&KL=en
(70)
Haha ! Can you imagine that when searching for special material you only see 2 results, although there are at least 70 !!!!
So, provided that Altavista detects a multi-language query, the timer is set, and ALWAYS reached.. Too bad for us : we do not have access to all the databases with a multi-language search.
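To see why a timer could cut a query short even when there are few results, here is a toy model of a scan with a time budget. It is an assumption-laden illustration, not the avs_search API from the manual: the function name, the linear scan and the fake clock are all invented for the demonstration.

```python
import itertools
import time

def search_with_deadline(documents, matches, budget, clock=time.monotonic):
    """Scan documents in order; stop and report a partial result when time is up."""
    deadline = clock() + budget
    hits = []
    for doc in documents:
        if clock() >= deadline:
            return hits, False   # budget exhausted: whatever was found so far
        if matches(doc):
            hits.append(doc)
    return hits, True            # whole collection was scanned

# Deterministic demo: a fake clock that advances one tick per call.
ticks = itertools.count()
pages = ["es: Bilbao", "en: Gustav II Adolf",
         "de: Gustav II Adolf", "sv: Gustav II Adolf"]
hits, complete = search_with_deadline(
    pages, lambda p: "Gustav" in p, budget=3, clock=lambda: next(ticks))
print(hits, complete)
```

In this model the time is spent walking the collection, not collecting hits, so a rare phrase does not save you: the matching pages may simply sit in the part that was never reached.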
Another example :
http://ragingsearch.altavista.com/cgi-bin/query?q=%2Bsitez+%2B%22Maple+v%22&search=Search
(0 !!!!!!!)
http://ragingsearch.altavista.com/cgi-bin/query?q=%2Bsitez+%2B%22Maple+v%22&search=Search&KL=en
(2)
I know the results have nothing to do with the subject of the search, but I don't care.... You see here that a (so-called) no-results query could actually have some. Terrifying !
As a conclusion : ALWAYS make single language queries !
One thing is still confused in my mind, nevertheless : why should the timer be reached, if there are few results ?????? Maybe people at Altavista don't want their servers to be overloaded... But I doubt this... I'm tired.
Rumsteack (from France, of course)
Some more data (by Gregor Samsa)
I checked the number of hits for each language separately:
search A was: [+Gustav +"II" +Adolf]
(word count: Adolf: 365276; Gustav: 446836; II: 31069265)
search B was: [+Gustav +"II" +Adolf +Bilbao]
(word count: Bilbao: 251970; Adolf: 365276; Gustav: 446836; II: 31069265)
language              A      B
&KL=xx              836      4
&kl=xx              836      4
&KL=XX                0      0
&kl=XX             1702     27
Chinese (zh)          2      0
Czech (cs)           87      0
Danish (da)          34      0
Dutch (nl)           41      1
English (en)       2384     13
Estonian (et)        43      0
Finnish (fi)         28      0
French (fr)          54      1
German (de)        2946      8
Greek (el)            1      0
Hebrew (he)           0      0
Hungarian (hu)       19      0
Icelandic (is)        1      0
Italian (it)         38      0
Japanese (ja)        11      0
Korean (ko)           4      0
Latvian (lv)          0      0
Lithuanian (lt)       0      0
Norwegian (no)       30      0
Polish (pl)          13      0
Portuguese (pt)      12      0
Romanian (ro)         3      0
Russian (ru)          4      0
Spanish (es)         27      2
Swedish (sv)       1426      4
all checked        6559     27
(BTW, I tried all those &kl=/&KL= combinations with yy and YY. No result)
+++
I assumed there was a limit somewhere (timeout or similar).
Did not find anything about that, but realized a pattern nevertheless:
Testing with [+Gustav +"II" +Adolf], comparing the expected number of hits with the actual one:
lang.     =>  hits  (diff.)
el + hu   =>    20  (0)
pl + pt   =>    23  (-2)
ro + ru   =>     7  (0)
ro + pt   =>    14  (-1)
ro + pl   =>    15  (-1)
da + fr   =>    76  (-12)
fi + fr   =>    70  (-12)
fi + da   =>    58  (-4)
de + lv   =>  2551  (-395)
pl + lv   =>    12  (-1)
pt + lv   =>    11  (-1)
en + lv   =>  2099  (-285)
en + de   =>  4650  (-680)
There's a system. Do you see it ?
Every language seems to lose a certain number of hits (compared to when it is used as the only language parameter) when combined with another language. I first suspected this when there were two hits fewer for [pt + pl] than I had expected, and in combination with [ro] each of these languages seemed to have one hit fewer. I tried a language without any hits (lv) and combined it with both German and English. [de + lv] showed 395 results fewer than I had expected, [en + lv] 285 fewer. If I was right, [en + de] should show 285 + 395 = 680 results fewer than the sum of each language alone. Indeed !
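The arithmetic behind that check can be replayed from the two tables above. The sketch below uses only the reported counts; the one assumption is that Latvian, having no hits at all, contributes no loss of its own, so a pairing with lv isolates the other language's loss.

```python
# Hits for [+Gustav +"II" +Adolf] with a single language parameter
single = {"el": 1, "hu": 19, "pl": 13, "pt": 12, "ro": 3, "ru": 4,
          "da": 34, "fr": 54, "fi": 28, "de": 2946, "lv": 0, "en": 2384}

# Hits reported when two languages were checked together
pairs = {("el", "hu"): 20, ("pl", "pt"): 23, ("ro", "ru"): 7,
         ("ro", "pt"): 14, ("ro", "pl"): 15, ("da", "fr"): 76,
         ("fi", "fr"): 70, ("fi", "da"): 58, ("de", "lv"): 2551,
         ("pl", "lv"): 12, ("pt", "lv"): 11, ("en", "lv"): 2099,
         ("en", "de"): 4650}

def shortfall(a, b):
    """Expected hits (sum of the single counts) minus the reported pair count."""
    return single[a] + single[b] - pairs[(a, b)]

# Pairing with lv (zero hits) isolates the loss of the other language.
loss = {lang: shortfall(lang, "lv") for lang in ("de", "en", "pl", "pt")}

predicted = loss["en"] + loss["de"]
observed = shortfall("en", "de")
print(loss, predicted, observed)
```

The same additivity holds for [pl + pt]: each loses one hit against lv, and together they come up two short.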
+++
I try not to draw conclusions too early. I really do not know how this CGI works or what their database is like (THAT would be an interesting point !)
I don't know if you can use something like Oracle with 350.000.000 records and still expect a reasonable response time for your SQL selects. Well, in the end I'm back at the beginning: Ich weiß, dass ich nichts weiß (I know that I know nothing)
I stop here and go to bed. It seems to be a good idea not to work long nights on such things - one forgets about cookies too easily... ;-)
CU
Oh, I get it (finally) (by ~S~ Humphrey P.)
Oh, I get it. (finally.)
If you were to design a search engine, you would let
the advanced search go longer, or try harder, but cut the simple searcher off
sooner.
You are assuming that the simple searcher doesn't want to wait,
or doesn't care about the last few million 'Treffe' the 'pinball' search engine
makes?
The problem with stopping before searching the whole database, is
that the newest item might be the last one in the database, and you'd come away
thinking that AltaVista never updated their database...
or, the one which
best matches your string of keywords, might be in the part of the database the
search engine never got to.
It would be better to presort, to preindex, to
presearch, to run on the fastest available machinery, anything you could do to
optimize searching, in order to make the whole database available to both the simple
searcher and the one who customized his search.
But, the presentation is
very important, and very different... In the simple search, you are making
assumptions for the searcher. Call them defaults, call them design flaws,
call them public relations...
You've been to a wonderful site before,
haven't you? You know they have something, but the site has so many pages,
and no sitemap, so you decide to use their search engine. And you give it a
two or three word search, and the search engine says: 'nothing found.'
You know it is there. But you don't know the one word which would find
it.
I've written some of those bafflers myself - design flaws.
jeff's lookime for fravia was like that... I couldn't quite get
it to give me the results I wanted. For instance, trying to find 'java,'
but not 'java script.'
Let's not pick on jeff. He grabbed what he could. There are lots of other conundrum site search engines at Big companies.
You get way too many hits with one word.
And with two words, you either get no hits, as though it had to be
a phrase (tight AND)
Or else the simple engine assumes the two
words don't need to have anything in common, nor refer to
the same topic, but just appear on the same page (lazy OR.)
Or
else the 'lucky' way of using the engine is there, and 'natural' to the
programmer, but not to anyone else. (That's the kind I write.)
AltaVista, after scanning its 'billions and billions,' will put the entries in the order of 'most found first,' ... 'least found last.'
~
Google,
since you brought it up, is running a popularity contest, and putting up
front those sites which others most often refer to. And besides that,
they consult a WWW directory.
(By the way, you should be able to
improve your site's 'popularity' by writing your section of that web
directory.)
Here's a query on [search engine optimization] which
uses that
directory:
http://search.netscape.com/cgi-bin/search?search=search+engine+optimization
(yes, of course there's more here than I am letting on, but skip it for now. ;)
(Just notice that you can't tell where Netscape ends and Google begins... Nor
later where Netscape Open Directory ends and Open Directory Project
begins.)
-"-
Web Site Categories
1 - 3 of 3
Groups of
reviewed web sites related to your search term.
,,,
Reviewed
Web Sites
1 - 10 of 94
Web sites reviewed and categorized by a
team of editors.
,,,
Get Involved
Help build the largest
human-edited directory on the web.
Become an Editor (http://home.netscape.com/escapes/search/beditor.html?cp=srpstatic)
Suggest
a site (http://home.netscape.com/escapes/search/addsite.html?cp=srpstatic)
Give
Feedback (http://home.netscape.com/escapes/search/survey.html?cp=srpstatic)
-"-
The first one whisks you off to:
"If you would like to help build Netscape Open Directory, it's easy to apply:"
and sends you back to the front door:
http://search.netscape.com/ Netscape Search.
Where we see a
very nice presentation of the Netscape Directory.
Computers > Internet >
WWW (or Web) > Searching the
Web
http://search.netscape.com/Computers/Internet/WWW/Searching_the_Web
"Searching
the Web by the Open Directory Project."
(Amazing what Netscape has
started, isn't it?)
"Discussion, help and tutorials, comparisons,
integrated search pages, mailing lists and newsletters,
submitting and positioning."
Somewhere in the 'Submitting and Positioning,' or in the 'Search Engine Optimization,' somebody has got them all figured out already... every search engine's strengths and weaknesses, each one's strategies, their peculiarities...
Think of submitting and positioning as Reverse Engineering of Search Engines.
http://search.netscape.com/Computers/Internet/WWW/Website_Promotion/Search_Engine_Submitting_and_Positioning/Comparisons_and_Discussions
(And there is Search Engine Watch, top of the list. Editor's choice.)
~
Well,
I'm not saying that Open Directory Project has considered the whole web. Nor that ODP
is better than Yahoo, (it's not.) But just that Google uses this Open Directory to
form an 'opinion' about what is important, and what's popular. And it could be
your opinion which they are considering. After that, they have their own
database.
~
Now, where in this thoughtful process would you
interrupt Google, and say, "I want your 5 second opinion, not your 20 second
one" -?-
How could you fool MetaCrawler into thinking that's what you
were giving them?
How to preprocess a query you had never seen before?
(Oh, there's one. Keep a history of queries you HAD seen before. Select one
which looks similar.)
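The "history of queries" idea can be sketched in a few lines. This is only one naive way to do it, using string similarity from Python's standard difflib module; the class name, the 0.8 cutoff and the stored results are illustrative assumptions, and nothing here is known to be what any real engine does.

```python
import difflib

class QueryHistory:
    """Remember answered queries and reuse the answer for a similar-looking one."""

    def __init__(self):
        self._results = {}

    def remember(self, query, results):
        self._results[query.lower()] = results

    def lookup(self, query, cutoff=0.8):
        """Return stored results of the closest past query, or None."""
        close = difflib.get_close_matches(
            query.lower(), list(self._results), n=1, cutoff=cutoff)
        return self._results[close[0]] if close else None

history = QueryHistory()
history.remember("search engine optimization", ["page-1", "page-2"])
print(history.lookup("Search Engine Optimisation"))
print(history.lookup("gustav adolf"))
```

The obvious weakness is that two queries can look alike and mean different things, so a real system would want something better than spelling distance.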
How to stop your ranking algorithm in the middle?
That's kind of like stopping in the middle of a sort, isn't it?
(Oh.
there's another one. If you thought you were going to be interrupted, you might
be doing a little presentation processing right along with the sorting. "Well, I
didn't get it all sorted, but here's what I've got sorted so far.")
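"Here's what I've got sorted so far" can be made literal with a heap: every item popped is guaranteed to be the best of what remains, so stopping early still leaves a correctly ordered top of the list. The sketch below counts steps instead of seconds to stay deterministic; the function name and the sample scores are invented.

```python
import heapq

def best_so_far(scored_pages, max_steps):
    """Return the highest-scoring pages in rank order, stopping after max_steps."""
    # heapq is a min-heap, so scores are negated to pop the largest first.
    heap = [(-score, page) for page, score in scored_pages]
    heapq.heapify(heap)
    ranked = []
    while heap and len(ranked) < max_steps:
        neg_score, page = heapq.heappop(heap)
        ranked.append((page, -neg_score))
    return ranked

pages = [("a", 3), ("b", 9), ("c", 5), ("d", 1)]
print(best_so_far(pages, 2))
```

An interrupted full sort gives no such guarantee about its first entries, which is why a selection-style approach suits the impatient client better.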
~
Just
remember, the client is always right.
If he wants instant gratification, give
him something.
A rattle, a hug, the first card off the bottom of the deck...
something someone else has looked up before... what he asked for.
You might
keep processing along his frame of mind, in anticipation that he will ask a further
question. You might keep processing, but with diminishing expectations that he will
ask a further question. If you can, keep his answered question in short term memory,
so you can pick up your search from there.
~~
Gahhh! This is all
theoretical...
How do they do it? How do they do it?
The maggots
know.
(they put 'fravia' in every erotic keyword string... no. -
different topic ;-)
~
Tell me what you know about Google.
Still quite in fieri, I'm afraid... what about contributing with your own suggestions
and observations?
(c) 2000: [fravia+], all rights
reserved