~ Classrooms ~
Fourth Classroom
Version June 2000
This is the fourth classroom. I have collated here a thread from my old messageboard where one of the most serious ~S~ I know of (Humphrey P.) attempts, with the help of Gregor Samsa and Iefaf, some very sound 'search engine reversing'. Just read the text below, where he goes to great lengths, with the help of various friends, to decipher the meaning of some of AltaVista's codes. I'm sure you'll enjoy it...
This thread originally appeared on my old messageboard.
Fourth classroom
Spelunking altavista's acronyms
by ~SS~ Humphrey P., Gregor Samsa & Iefaf, June 2000
Thread slightly edited by fravia+
It began with this question from 216.234.161.71:
"Does anybody know how to set search parameters for www.raging.com through the CGI string instead of cookies?"
Humphrey P., first attempt
How to run it without cookies, someone with our highest-numbered URL would like to know.
(By the way, who owns the highest-numbered URL? The lowest? The one in the middle? The median? Yours? Should we stone them?)
Is www.raging.com the same as ragingsearch.altavista.com?
Is ragingsearch.altavista.com the same as www.altavista.com?
Is ragingsearch.altavista.com/cgi-bin/query? the same as www.altavista.com/cgi-bin/query? -?-
Have you been collecting CGI strings?
Here's a bunch for:
www.altavista.com/cgi-bin/query?
& : separator
act=2007 : I have an account?
d0=1%2F1%2F99 : date from
d1=18%2F5%2F2000 : date to
hl=on : -?-
kl=en : English
kl=cs : Czech
kl=XX : any language (expect the rest of the languages to follow ISO 639, e.g. http://babel.alis.com:8080/langues/iso639.en.htm ISO 639: 1988)
mmdo=16 : (on an image search: stype=simage)
par=0 : parent equals zero? (I haven't proven this yet.)
pg=aq : page is advanced query
pg=q : page is simple query (default main page at www.altavista.com)
q=this+AND+that+AND+these+AND+those+AND+NOT+them : my query is [...]
q=the : my query is [the]
r=is : raise or so(r)t the keyword(s) [is] to the top
sc=on : show one result per Web site (see http://doc.altavista.com/adv_search/ast_as_compress.shtml, site compression)
search=Search : -?- (you'd think so...)
search.x=32 : (starting pixel of ad-filled page?)
search.y=8 : (starting pixel of ad-filled page?)
stype=simage : search type is s-image
stype=stext : search type is s-text
text=yes : don't send me so many ads
what=web : search what? The web.
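The name=value pairs above can be decoded mechanically: split on "&" and "=", turn "+" back into a space and "%XX" back into the character it encodes. A minimal sketch with Python's standard urllib.parse; the query string below is assembled from the list for illustration, not a URL captured from AltaVista, and the meanings are still only the guesses given in the list.

```python
from urllib.parse import parse_qsl

# Illustrative query string built from parameters in the list above
# (not a URL captured from AltaVista).
QUERY = "pg=aq&what=web&kl=XX&d0=1%2F1%2F99&d1=18%2F5%2F2000&q=this+AND+NOT+them&r=is&sc=on"


def decode_params(query: str) -> list[tuple[str, str]]:
    """Split on '&' and '=', then undo %XX escapes and '+'-for-space."""
    return parse_qsl(query, keep_blank_values=True)


if __name__ == "__main__":
    for name, value in decode_params(QUERY):
        print(f"{name:>4} = {value}")
```

Since %2F is an escaped "/", d0 and d1 come out as 1/1/99 and 18/5/2000, which reads as day/month/year (there is no month 18).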
~
For: ragingsearch.altavista.com/cgi-bin/query?
pg=pref : customize page
v=m :
Will Raging search run with some of those strings? AltaVista was allowing only the last language of a set: e.g. if you asked for &kl=es&kl=en you got &kl=en. So, were the cookies a kludge around the CGI parameter limitations?
Humphrey P.
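The two behaviours in question, "last value wins" and "keep every value", are easy to model on the client side. This sketch only reproduces the two policies with urllib.parse; it says nothing about how AltaVista's own CGI was actually written.

```python
from urllib.parse import parse_qs, parse_qsl

QUERY = "q=test&kl=es&kl=en"


def last_wins(query: str) -> dict[str, str]:
    # Building a dict from the pairs overwrites earlier duplicates,
    # mimicking what was seen on www.altavista.com (kl=en survives).
    return dict(parse_qsl(query))


def keep_all(query: str) -> dict[str, list[str]]:
    # parse_qs collects every value of a repeated name, in order.
    return parse_qs(query)


if __name__ == "__main__":
    print(last_wins(QUERY)["kl"])  # en
    print(keep_all(QUERY)["kl"])   # ['es', 'en']
```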
Gregor Samsa, first answer
I thought you'd take this one on, Humphrey ;)
How can Raging be run without cookies? I guess the easiest way is to disable cookies in your browser, firewall or whatever. What happens if you deny the cookie? Raging doesn't remember your settings and you get the defaults next time.
Now, Mr. Spaceproxy-Without-Name: although Humphrey said it was no answer, in fact he gave one. Only the question has to change a bit. It looks as if Raging behaves like the AltaVista simple search interface, with some differences.
Look at this URL:
http://ragingsearch.altavista.com/cgi-bin/query?q=test&FFF=off&wfmt=tau&nbq=10&KL=en&KL=de&Translate=off&prf=Submit
"q=test" The thing you are looking for
"FFF=off" Family Filter? (Set it ON if you are under age ;)
"wfmt=tau" Dunno
"nbq=10" Number of hits to show?
"KL=en" English
"KL=de" German
"Translate=off" You get no link to babelfish
"prf=Submit" ?
One of the differences is that with Raging you can use several language parameters at once, whereas www.altavista.com/cgi-bin/query? only uses the last KL parameter given. HP showed it.
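To try several languages against Raging without relying on the cookie, the URL can be assembled by hand. A sketch using urlencode(..., doseq=True), which repeats a name once per list element; the parameter names and values are copied from the URL quoted above, and nothing here tells you whether the service still honours them.

```python
from urllib.parse import urlencode

BASE = "http://ragingsearch.altavista.com/cgi-bin/query"


def raging_url(query: str, languages: list[str], hits: int = 10) -> str:
    params = {
        "q": query,
        "FFF": "off",        # family filter, as in the quoted URL
        "wfmt": "tau",       # meaning unknown; copied verbatim
        "nbq": str(hits),    # apparently the number of hits per page
        "KL": languages,     # list value -> one KL=... pair per language
        "Translate": "off",
        "prf": "Submit",
    }
    # doseq=True expands the list instead of encoding it as one string.
    return BASE + "?" + urlencode(params, doseq=True)


if __name__ == "__main__":
    print(raging_url("test", ["en", "de"]))
```

With ["en", "de"] this reproduces the URL above character for character.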
Humphrey P., second go
Yeah, gs. My little trip to the Raging cookie pusher brought me back the same ones you had:
FFF=on : family filter [on]
wfmt=tau : (I wanted compact instead of complete page information?)
nbq=20 : number of results to show per page [20]
prf=Submit : (something about my profile: it was submitted? I accepted a cookie? I came from the profile page?)
KL=zh : language to search in (in this case, Zhongwen. You'd recognize it as Chinese.)
KL=en : English
KL=fr : French
KL=de : Deutsch
enc=big5 : language encoding (in this case, for Zhongwen)
Translate=on : show the translate option
for a cookie which looked like:
AV_RAGESETTINGS=v1:20:4:3:big5:zhenrde
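Lined up against the parameters above, the cookie's colon-separated fields partly decode. A minimal sketch; the field labels are hypothetical guesses, and only two of them are backed by matching values in the thread (20 for nbq=20, big5 for enc=big5).

```python
COOKIE = "AV_RAGESETTINGS=v1:20:4:3:big5:zhenrde"

# Hypothetical labels: only "hits" and "encoding" match known parameter
# values; "field3" and "field4" are placeholders for unknowns.
LABELS = ["version", "hits", "field3", "field4", "encoding", "languages"]


def split_cookie(cookie: str) -> dict[str, str]:
    # Drop the cookie name, then split the value on ':'.
    value = cookie.partition("=")[2]
    return dict(zip(LABELS, value.split(":")))


if __name__ == "__main__":
    for label, field in split_cookie(COOKIE).items():
        print(f"{label:>9}: {field}")
```

The last field, zhenrde, is seven letters, one short of zh+en+fr+de; either a letter was lost when the cookie was copied or the field is not a plain run of two-letter codes.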
~
And don't forget these two:
pg=pref : customize or preferences page
v=m : (on the way to the preferences page)
~
Some notes and questions.
Had you noticed the capital "KL="? (It seems to me you were trying that out a few weeks ago?) Perhaps KL=en is different from kl=en -?- If you used all caps, KL=en&KL=fr&KL=de, would you get all your languages even in www.altavista.com?
www.altavista.com/cgi-bin/query?pg=aq&text=yes&KL=en&KL=fr&KL=de&q=%2BGustav+%2B%22II%22+%2BAdolf+%2BBilbao
"AltaVista found no document matching your query."
Oops. No pages found with advanced search. But notice what it does to your list of languages! [my languages(*)]
So, what does simple search think?
www.altavista.com/cgi-bin/query?pg=q&text=yes&KL=en&KL=fr&KL=de&q=%2BGustav+%2B%22II%22+%2BAdolf+%2BBilbao
About 23 pages found.
Word count: Bilbao: 239105; Adolf: 342273; Gustav: 447835; II: 28663115
Hmmm. Grumble grumble... I've got a lapse-of-thinking attack. 23 pages... where have I seen that before?
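Both Gustav/Adolf/Bilbao URLs carry the same q= value. Decoded, it is four required terms (the "+" prefix of AltaVista's simple-search syntax), the same four words that appear in the word count. A small sketch of the decoding; the term splitter is naive and assumes no spaces inside quoted phrases.

```python
from urllib.parse import unquote_plus

# The q= value from the two URLs above.
RAW_Q = "%2BGustav+%2B%22II%22+%2BAdolf+%2BBilbao"


def decode_query(raw: str) -> str:
    # '+' becomes a space; %2B becomes a literal '+' and %22 a '"'.
    return unquote_plus(raw)


def required_terms(decoded: str) -> list[str]:
    # Naive: split on whitespace, keep '+'-prefixed terms, strip quotes.
    return [t[1:].strip('"') for t in decoded.split() if t.startswith("+")]


if __name__ == "__main__":
    decoded = decode_query(RAW_Q)
    print(decoded)
    print(required_terms(decoded))
```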
~
Another thing... where is text=yes ignored? Ah, when you are off looking for images? (Where else? Where there's money to be made?)
~
This is from a while ago. Does it still work?
AltaVista [host:altavista.com link:kl=en]
3 pages:
One was a conundrum (a guy got AltaVista to index its own error page).
Two was someone's saved query:
[AltaVista] [English]
http://www.altavista.com/cgi-bin/query?pg=q&stype=stext&kl=en&sc=on&q=AltaVista&stq=10
Web Pages 518,470 pages found.
The third is new territory:
http://world.altavista.com/
What a nifty portal to the world!
http://world.altavista.com/r/x13/http://www.h2g2.com/A172685 Babel fish: Origin
http://world.altavista.com/r/x8/http://www.h2g2.com/A172685 Origin of the BabelFish
http://world.altavista.com/r/x3/http://www.systransoft.com/personal.html Personal Translation Tools (an ad for Systran)
http://world.altavista.com/r/x3/http://doc.altavista.com/help/search/language.shtml Displaying World Alphabets
http://world.altavista.com/r/x3/http://www.altavista.com/cgi-bin/query?pg=q&spage=search%2Fresults.htm&user=avworld&q=special+characters+in+html&stype=stext&x=37&y=13 Using World Alphabets on Web sites
http://world.altavista.com/r/x3/http://www.theodora.com/country_digraphs.html World Internet Domains (a pretty long list from ISO 3166 (does have TV=Tuvalu))
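The world.altavista.com links above all share one shape: /r/, a tag such as x3, x8 or x13, then the full target URL. A sketch that pulls the two apart; the pattern is inferred from these six links only, and what the xN tag means is unknown (the same h2g2 page appears under both x13 and x8).

```python
import re
from typing import Optional

# Pattern inferred from the links above: /r/<tag>/<target URL>.
REDIRECT = re.compile(r"^https?://world\.altavista\.com/r/(x\d+)/(.+)$")


def unwrap(link: str) -> Optional[tuple[str, str]]:
    """Return (tag, target) for a /r/xN/ link, or None if it doesn't match."""
    m = REDIRECT.match(link)
    return (m.group(1), m.group(2)) if m else None


if __name__ == "__main__":
    print(unwrap("http://world.altavista.com/r/x8/http://www.h2g2.com/A172685"))
```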
Now, where on this page does kl=en appear?
The search in the left frame panel is set up to default to searching in English.
Hmmm. That's not it, though. On the backside, that looks like this:
<OPTION VALUE=en SELECTED>English
Instead, I found:
http://www.altavista.com/cgi-bin/query?sc=on&user=avworld&q=Sending+Faxes+over+the+internet&kl=en&pg=q Sending faxes over the Internet
Finding 'query?' on the backside, I come up with these AltaVista parameter lines:
pg=q&spage=search%2Fresults.htm&user=avworld&q=special+characters+in+html&stype=stext&x=37&y=13
pg=aa2&stype=stext&sc=on&q=How+do+I+convert+metric+to+U%2eS%2e%3f
sc=on&user=avworld&q=Sending+Faxes+over+the+internet&kl=en&pg=q
kl=en : language is English
pg=q : page is AltaVista Main search
pg=aa : page is ??
q=special+characters+in+html
q=How+do+I+convert+metric+to+U%2eS%2e%3f
q=Sending+Faxes+over+the+internet
sc=on : site compression is on
spage=search%2Fresults.htm :
stype=stext : search type is s-text
user=avworld : (who else could be user? Are there privileges?)
x=37 :
y=13 :
Does user avworld (AltaVista World) have special privileges?
What is it that x and y are placing? Or are they trying to document the origin of a grid-mapping scheme to track your mouse position?
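One plausible reading, offered as an assumption rather than a finding: in HTML forms, a submit button of type "image" (<input type="image">) sends the pixel coordinates of the click inside the button's picture. A button named search sends search.x and search.y; an unnamed one sends plain x and y. That would fit both search.x=32&search.y=8 and x=37&y=13. The sketch below mimics how a browser serialises such a form; the field names come from the thread, and placing the button after the other fields is an assumption about the page layout.

```python
from urllib.parse import urlencode


def image_button_fields(name: str, x: int, y: int) -> list[tuple[str, str]]:
    # Named image buttons submit "<name>.x"/"<name>.y"; unnamed ones "x"/"y".
    prefix = f"{name}." if name else ""
    return [(prefix + "x", str(x)), (prefix + "y", str(y))]


def serialise(fields: list[tuple[str, str]], button_name: str, x: int, y: int) -> str:
    # Ordinary form fields first, then the click coordinates of the button.
    return urlencode(fields + image_button_fields(button_name, x, y))


if __name__ == "__main__":
    form = [("pg", "q"), ("q", "special characters in html"), ("stype", "stext")]
    print(serialise(form, "", 37, 13))
    print(serialise([("q", "the")], "search", 32, 8))
```

If this reading is right, x and y only say where the button was clicked, not where the mouse has been.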
~
Can you think of other ways to get AltaVista to index its own generated listings and error pages? Bite its own tail? (Oh, gee, somebody had the name of that hoop snake a while back... I forget what it's called.)
Gregor Samsa, second go
Hi, Humphrey !
Want some more stuff? I think most of it is new. The AltaVista main engine has a lot more holes than I thought. I didn't even get to your kl/KL question and the tail-biting stuff. But see for yourself.
+=============+
One more oddity at Raging:
http://ragingsearch.altavista.com/cgi-bin/query?pg=q&stype=stext&KL=enesfrde&sc=on&q=%2bClarissa+%2bPinkola+%2bEstes+%2dbuy+%2dbook&stq=10
I did not accept the cookie, but entered the language parameters manually after the first results were shown:
&KL=en&KL=de&KL=fr&KL=es&
After going on to page 2 of the hits I happened to see all these language identifiers with only _one_ "KL": KL=enesfrde&
No differences in what (or how much) it found, as far as I can tell.
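The packed KL=enesfrde looks like the two-letter codes simply run together, in the order Raging chose (en, es, fr, de) rather than the order they were typed (en, de, fr, es). A sketch converting between the packed and the repeated form, assuming every code is exactly two letters; that assumption is ours, not something Raging documents.

```python
from urllib.parse import urlencode


def split_languages(packed: str) -> list[str]:
    # Assumes every language code is exactly two letters (en, es, fr, de, ...).
    if len(packed) % 2:
        raise ValueError(f"odd length, not a run of two-letter codes: {packed!r}")
    return [packed[i:i + 2] for i in range(0, len(packed), 2)]


def repeated_kl(packed: str) -> str:
    # Turn the packed form back into the repeated form typed by hand.
    return urlencode([("KL", code) for code in split_languages(packed)])


if __name__ == "__main__":
    print(split_languages("enesfrde"))
    print(repeated_kl("enesfrde"))
```

Fed the seven-letter zhenrde from Humphrey's cookie, split_languages raises ValueError, which is at least consistent with a letter having gone missing there.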
_1_
http://ragingsearch.altavista.com/cgi-bin/query?pg=q&stype=stext&kl=ende&sc=on&q=%2bClarissa+%2bPinkola+%2bEstes+%2dbuy+%2dbook&stq=20
[word count: Pinkola: 4031; Clarissa: 121615; Estes: 304093]
"Raging Search found no document matching your query."
_2_
http://ragingsearch.altavista.com/cgi-bin/query?pg=q&stype=stext&kl=en&kl=de&sc=on&q=%2bClarissa+%2bPinkola+%2bEstes+%2dbuy+%2dbook&stq=20
[word count: Pinkola: 4031; Clarissa: 121615; Estes: 304093]
"36 pages found."
The first one seems not to work at all. The second one behaves like the old simple search in that it only looks up German pages.
What the heck is going on here? Has that changed lately?
+++
[raging]
sc=on
sc=off
This seems to toggle the appearance of the "Results from this site only" link.
+++
I thought I had found some differences in numbers at Raging a few weeks ago, but our dirty old friend told me he couldn't reproduce them. I haven't done any work on that since then, but I've scheduled it for the coming weekend. (I first have to set up my machine completely anew. Right now it sucks: the usual Windows problem after installing/uninstalling/playing around too much.) Don't tell me to use Linux: I use it anyway. But I like Opera far too much, and the Linux version is not very decent yet.
+++
[altavista]
I didn't even know that "text=yes" could be used with the advanced search... (Thanx for the tip ;)
Oops... surprise! Surprise!
_1_
http://www.altavista.com/cgi-bin/quer<?pg=aq&text=yes&q=test1&kl=XX&stype=ntext
Watch that typo: "quer<" instead of "query". It is simply _ignored_. Do you see any differences when using "query" compared to "quer<"? Even better:
_2_
http://www.altavista.com/cgi-bin/$$$$?pg=aq&text=yes&q=test1&kl=XX&stype=ntext
Obviously it does not matter. Does every URL that aims at the cgi-bin get processed? Or what?
_3_
http://www.altavista.com/Top?pg=aq&text=yes&q=test1&kl=XX&stype=ntext
This one I found by chance. I'm unsure what it might tell us.
Anyway, it tells you it "found no document matching your query" AND gives you ten results, but with no links to more results. It's somehow like the Hitchhiker's Guide: can you reverse this and tell me what the question was?
I suppose "stype=ntext" is Usenet, whereas "stext" means the regular text search. Let's switch back to "stext":
_4_
http://www.altavista.com/Top?pg=aq&text=yes&q=test1&kl=XX&stype=stext
"About 75,816 pages found". And what are the first ten? Aha!
It seems that if we tell AltaVista to do an advanced Usenet search, it gets confused. Fortunately I am completely ignorant of what the parameters look like in a Usenet search. So I gave this one a try:
_5_
http://www.altavista.com/Top?pg=q&text=yes&q=test1&kl=XX&stype=ntext
Again: "...found no documents..." and ten hits. But what? See the titles that are displayed in your hit list:
www.oregongrounds.com: nothing but an imagemap. No "test1" in the source of this page. But this might be chance; it could have changed recently. While I look through the others, I see that they all have "test1" as title.
I get the same results with
_6_
http://www.altavista.com/cgi-bin/query?sc=on&hl=on&kl=XX&stype=ntext&pg=q&text=yes&q=title%3Atest1&search=Search
I switch back to
_7_
http://www.altavista.com/cgi-bin/query?sc=on&hl=on&kl=XX&stype=stext&pg=q&text=yes&q=title%3Atest1&search=Search
=> 3954 hits.
OK, a "real" Usenet search looks like this:
_8_
http://www.altavista.com/cgi-bin/query?pg=q&sc=on&hl=on&act=2006&par=0&q=test1&kl=XX&stype=ntext
act=2006
par=0
???
Btw, replacing "/cgi-bin/query" with "Top" seems to be a general feature. It works here as well:
_9_
http://www.altavista.com/Top?pg=q&sc=on&hl=on&act=2006&par=0&q=test1&kl=XX&stype=ntext
+=============+
I'll look into the differences between kl and KL tomorrow, and try this tail-biting stuff as well. I need to read my old notes again. For now it is time to go to bed. If I remember right, the man who found the structure of the benzene molecule is said to have dreamt of a snake that bit its own tail. I'll try to dream of CGI params; perhaps that helps.
cu
gregor samsa
Iefaf's findings
This is just a confirmation of what you've found.
Whatever you write before the question mark is ignored by Raging and AltaVista.
http://ragingsearch.altavista.com/did-gregor-samsa-wipes-the-cgi-bin?q=test1&search=Search
http://www.altavista.com/did-gregor-samsa-wipes-the-cgi-bin?pg=q&text=yes&q=test1&kl=XX&stype=ntext
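Gregor's "quer<", "$$$$" and "Top" URLs, and Iefaf's variants, change only the path; the query part, which is all the engine seems to care about, stays the same. A small check that splits each URL into path and parameters makes this explicit. It cannot show what the server does with the path, only that the parameters it receives are identical; the first URL is the plain "query" baseline Gregor compares against.

```python
from urllib.parse import parse_qs, urlsplit

URLS = [
    "http://www.altavista.com/cgi-bin/query?pg=aq&text=yes&q=test1&kl=XX&stype=ntext",
    "http://www.altavista.com/cgi-bin/quer<?pg=aq&text=yes&q=test1&kl=XX&stype=ntext",
    "http://www.altavista.com/cgi-bin/$$$$?pg=aq&text=yes&q=test1&kl=XX&stype=ntext",
    "http://www.altavista.com/Top?pg=aq&text=yes&q=test1&kl=XX&stype=ntext",
]


def path_and_params(url: str) -> tuple[str, dict[str, list[str]]]:
    # urlsplit separates the path from the query; parse_qs decodes the query.
    parts = urlsplit(url)
    return parts.path, parse_qs(parts.query)


if __name__ == "__main__":
    for url in URLS:
        path, params = path_and_params(url)
        print(f"{path:<16} {params}")
```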
Out of curiosity I scanned from 204.152.190.1 to 204.152.190.255 and checked the reverse DNS. The next step is to view the raw websites and sniff around the chimney.
<pre>
204.152.190.1 (ns3.alta-vista.net)
204.152.190.2 (svc3.cns.alta-vista.net)
204.152.190.3 (svc4.cns.alta-vista.net)
204.152.190.4 (doc.altavista.com)
204.152.190.5 (doc.altavista.com)
204.152.190.6 (ns2.alta-vista.net)
204.152.190.7 (jump.altavista.com)
204.152.190.8 (jump.altavista.com)
204.152.190.9 (jump.altavista.com)
204.152.190.10 (jump.altavista.com)
204.152.190.11 (www.altavista.com)
204.152.190.12 (www.altavista.com)
204.152.190.13 (www.altavista.com)
204.152.190.14 (www.altavista.com)
204.152.190.15 (www.altavista.com)
204.152.190.16 (www.altavista.com)
204.152.190.17 (www.altavista.com)
204.152.190.18 (www.altavista.com)
204.152.190.19 (www.altavista.com)
204.152.190.20 (www.altavista.com)
204.152.190.21 (www.altavista.com)
204.152.190.22 (www.altavista.com)
204.152.190.23 (image.altavista.com)
204.152.190.24 (image.altavista.com)
204.152.190.25 (www.altavista.com)
204.152.190.26 (www.altavista.com)
204.152.190.27 (babelfish.altavista.com)
204.152.190.28 (babelfish.altavista.com)
204.152.190.29 (babelfish.altavista.com)
204.152.190.30 (careers.altavista.com)
204.152.190.31 (dir.altavista.com)
204.152.190.32 (dir.altavista.com)
204.152.190.33 (discovery.altavista.com)
204.152.190.34 (discovery.altavista.com)
204.152.190.35 (jump.altavista.com)
204.152.190.36 (jump.altavista.com)
204.152.190.37 (babelfish.altavista.com)
204.152.190.38 (test-gotcha5.altavista.com)
204.152.190.39 (svc1.marimba.alta-vista.net)
204.152.190.40 (svc2.marimba.alta-vista.net)
204.152.190.41 (lesite.altavista.com)
204.152.190.42 (test-gotcha6.altavista.com)
204.152.190.43 (test-gotcha10.altavista.com)
204.152.190.54 (ads.altavista.com)
204.152.190.55 (ads.altavista.com)
204.152.190.56 (ads.altavista.com)
204.152.190.57 (add-url.altavista.com)
204.152.190.58 (av-pvt.altavista.com)
204.152.190.59 (av-pvt.altavista.com)
204.152.190.60 (www.altavista.com)
204.152.190.61 (www.altavista.com)
204.152.190.62 (www.altavista.com)
204.152.190.63 (www.altavista.com)
204.152.190.64 (www.altavista.com)
204.152.190.65 (www.altavista.com)
204.152.190.66 (image.altavista.com)
204.152.190.67 (av-pvt.altavista.com)
204.152.190.68 (av-pvt.altavista.com)
204.152.190.69 (www.altavista.com)
204.152.190.70 (www.altavista.com)
204.152.190.71 (www.altavista.com)
204.152.190.72 (www.altavista.com)
204.152.190.73 (www.altavista.com)
204.152.190.74 (image.altavista.com)
204.152.190.75 (image.altavista.com)
204.152.190.76 (image.altavista.com)
204.152.190.77 (image.altavista.com)
204.152.190.78 (av-pvt.altavista.com)
204.152.190.79 (ns1.alta-vista.net)
204.152.190.80 (svc1.trip.alta-vista.net)
204.152.190.81 (svc2.trip.alta-vista.net)
204.152.190.82 (jump.altavista.com)
204.152.190.83 (image.altavista.com)
204.152.190.84 (image.altavista.com)
204.152.190.85 (image.altavista.com)
204.152.190.86 (apache.altavista.com)
204.152.190.87 (apache.altavista.com)
204.152.190.88 (iatlas.altavista.com)
204.152.190.89 (iatlas.altavista.com)
204.152.190.90 (iatlas.altavista.com)
204.152.190.91 (ragingsearch.altavista.com)
204.152.190.92 (ragingsearch.altavista.com)
204.152.190.93 (ragingsearch.altavista.com)
204.152.190.94 (ragingsearch.altavista.com)
204.152.190.95 (apache.altavista.com)
204.152.190.96 (apache.altavista.com)
204.152.190.97 (test-gotcha2.test.altavista.com)
204.152.190.126 (svc1.cns.alta-vista.net)
204.152.190.127 (svc2.cns.alta-vista.net)
204.152.190.128 (net2.pa.alta-vista.net)
204.152.190.129 (av-dev6.pa.alta-vista.net)
204.152.190.130 (redirect6.pa.alta-vista.net)
204.152.190.131 (redirect7.pa.alta-vista.net)
204.152.190.132 (mail3.pa.alta-vista.net)
204.152.190.133 (gotcha13.pa.alta-vista.net)
204.152.190.134 (gotcha14.pa.alta-vista.net)
204.152.190.135 (gotcha15.pa.alta-vista.net)
204.152.190.136 (gotcha16.pa.alta-vista.net)
204.152.190.137 (aj-dev.pa.alta-vista.net)
204.152.190.138 (gotcha8.pa.alta-vista.net)
204.152.190.139 (seeya3.pa.alta-vista.net)
204.152.190.140 (seeya4.pa.alta-vista.net)
204.152.190.141 (altavision3.pa.alta-vista.net)
204.152.190.142 (bono.pa.alta-vista.net)
204.152.190.143 (scope.pa.alta-vista.net)
204.152.190.144 (survey1.pa.alta-vista.net)
204.152.190.145 (ns1.pa.alta-vista.net)
204.152.190.146 (swamp.pa.alta-vista.net)
204.152.190.147 (surfwatch1.pa.alta-vista.net)
204.152.190.148 (test-scooter.pa.alta-vista.net)
204.152.190.149 (av-ops4-2.pa.alta-vista.net)
204.152.190.150 (ww2.altavista.com)
204.152.190.151 (avie1.pa.alta-vista.net)
204.152.190.152 (avie2.pa.alta-vista.net)
204.152.190.153 (surfwatch2.pa.alta-vista.net)
204.152.190.154 (babel4.pa.alta-vista.net)
204.152.190.155 (test-scooter2.pa.alta-vista.net)
204.152.190.156 (bee1.pa.alta-vista.net)
204.152.190.157 (bee2.pa.alta-vista.net)
204.152.190.158 (bee3.pa.alta-vista.net)
204.152.190.159 (bee4.pa.alta-vista.net)
204.152.190.160 (bee5.pa.alta-vista.net)
204.152.190.161 (bee6.pa.alta-vista.net)
204.152.190.162 (bee7.pa.alta-vista.net)
204.152.190.163 (bee8.pa.alta-vista.net)
204.152.190.164 (iatlas1.pa.alta-vista.net)
204.152.190.165 (iatlas2.pa.alta-vista.net)
204.152.190.166 (iatlas3.pa.alta-vista.net)
204.152.190.167 (firewall167.pa.alta-vista.net)
204.152.190.168 (firewall168.pa.alta-vista.net)
204.152.190.169 (noc-ext.pa.alta-vista.net)
204.152.190.170 (bee9.pa.alta-vista.net)
204.152.190.171 (scope2.pa.alta-vista.net)
204.152.190.172 (scope3.pa.alta-vista.net)
204.152.190.173 (engage1.pa.alta-vista.net)
204.152.190.174 (engage2.pa.alta-vista.net)
204.152.190.175 (engage3.pa.alta-vista.net)
204.152.190.176 (engage4.pa.alta-vista.net)
204.152.190.177 (qa-babel.pa.alta-vista.net)
204.152.190.178 (pa-install.pa.alta-vista.net)
204.152.190.248 (packet1.pa.alta-vista.net)
204.152.190.249 (packet2.pa.alta-vista.net)
204.152.190.250 (c-ns2.pa.alta-vista.net)
204.152.190.251 (baystack4u.pla.mibh.net)
204.152.190.252 (baystack5u.pla.mibh.net)
204.152.190.253 (av2.feed2.pla.mibh.net)
204.152.190.254 (av2.feed1.pla.mibh.net)</pre>
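Iefaf does not say which tool produced the listing. A minimal sketch of the same kind of sweep with Python's standard library (ipaddress to walk the range, socket.gethostbyaddr for the PTR lookup); answers today will of course differ from the June 2000 listing, and the gaps in it (.44 to .53, .98 to .125, .179 to .247) are presumably addresses that returned no name.

```python
import ipaddress
import socket
from typing import Iterator, Optional


def addresses(first: str, last: str) -> Iterator[str]:
    """Yield every IPv4 address from first to last, inclusive."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    for n in range(start, end + 1):
        yield str(ipaddress.IPv4Address(n))


def reverse_name(ip: str) -> Optional[str]:
    """PTR lookup via the system resolver; None when there is no record."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def sweep(first: str, last: str) -> list[str]:
    """Return 'ip (name)' lines, skipping addresses without a PTR record."""
    lines = []
    for ip in addresses(first, last):
        name = reverse_name(ip)
        if name:
            lines.append(f"{ip} ({name})")
    return lines
```

Calling sweep("204.152.190.1", "204.152.190.255") performs 255 sequential lookups, so expect it to take a while on a slow resolver.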
Of course, this is still a work in progress...
(c) 2000: [fravia+], all rights reserved