~ Classrooms ~
Fourth Classroom
Version June 2000
This is the fourth classroom. I have collated here a thread from my old messageboard in which Humphrey P., one of the most serious ~S~ I know of, attempts some very sound 'search engine reversing' with the help of Gregor Samsa and Iefaf. Just read the text below, where he goes to great lengths, with the help of various friends, to decipher the meaning of some of AltaVista's codes. I'm sure you'll enjoy it...
This thread appeared originally on my old messageboard.
Fourth classroom
Spelunking altavista's acronyms
by ~SS~ Humphrey P., Gregor Samsa & Iefaf, June 2000

Thread slightly edited by fravia+


It began with this question by 216.234.161.71:
Anybody know how to set search parameters to www.raging.com through CGI string instead of cookies?:
Humphrey P., first attempt


How to run it without cookies, someone with our highest numbered URL would like to know. (by the way, who owns the highest numbered URL? the lowest? the one in the middle? the median? yours? should we stone them?)

www.raging.com is the same as ragingsearch.altavista.com ?

ragingsearch.altavista.com is the same as www.altavista.com ?

ragingsearch.altavista.com/cgi-bin/query? the same as www.altavista.com/cgi-bin/query? -?-

Have you been collecting CGI strings?

Here's a bunch for: www.altavista.com/cgi-bin/query?

& : separator
act=2007 : I have an account?
d0=1%2F1%2F99 : date from
d1=18%2F5%2F2000 : date to
hl=on : -?-
kl=en : English
kl=cs : Czech
kl=XX : any language (Expect the rest of the languages to follow ISO 639, eg: http://babel.alis.com:8080/langues/iso639.en.htm ISO 639: 1988)
mmdo=16 : (on an image search: stype=simage)
par=0 : parent equals zero? (I haven't proven this yet.)
pg=aq : page is advanced query
pg=q : page is simple query (default main page at www.altavista.com)
q=this+AND+that+AND+these+AND+those+AND+NOT+them : my query is [...]
q=the : my query is [the]
r=is : raise or so(r)t to the top the keyword(s) [is]
sc=on : show one result per Web site (see: http://doc.altavista.com/adv_search/ast_as_compress.shtml site compression)
search=Search : -?- (you'd think so...)
search.x=32 : (starting pixel of ad filled page?)
search.y=8 : (starting pixel of ad filled page?)
stype=simage : searchtype is s-image
stype=stext : searchtype is s-text
text=yes : don't send me so many ads
what=web : search what? The web.
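
A list like this is easier to extend if you let a script pull the strings apart. What follows is only a minimal sketch in Python: the query string is an illustrative combination of parameters from the list above (not a URL anyone in the thread actually ran), and the standard urllib.parse module does the splitting and the %XX decoding.

```python
from urllib.parse import parse_qsl, urlencode

# Illustrative combination of parameters taken from the list above.
QUERY = "pg=aq&kl=en&sc=on&text=yes&d0=1%2F1%2F99&d1=18%2F5%2F2000&q=this+AND+that"

# parse_qsl keeps the original order and any repeated keys, and it
# undoes both %XX escapes and '+' (space) encoding.
params = parse_qsl(QUERY, keep_blank_values=True)
for name, value in params:
    print(f"{name:5} = {value}")

# Round trip: rebuild a query string from the (name, value) pairs.
rebuilt = urlencode(params)
print(rebuilt)
```

The decoded d0 and d1 values come out as 1/1/99 and 18/5/2000, which is where the "date from" and "date to" readings come from.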

~

For: ragingsearch.altavista.com/cgi-bin/query?

pg=pref : customize page
v=m :

Will raging search run with some of those strings? AltaVista was allowing only the last language of a set: eg. if you asked for &kl=es&kl=en you got &kl=en. So, were the cookies a kludge around the CGI parameter limitations?

Humphrey P
Gregor Samsa, first answer

I thought you'd take on here, Humphrey ;)

How can raging be run without cookies ? I guess, the easiest way to do so is to disable "Cookies" in your browser or firewall or whatever. What happens if you deny the cookie ? Raging doesn't remember your settings and you get the defaults next time.

Now, Mr. Spaceproxy-Without-Name, although Humphrey said it was no answer, in fact he gave one. Just the question has to change a bit. It looks as if raging behaves like the altavista simple search interface, with some differences.
Look at this URL:

http://ragingsearch.altavista.com/cgi-bin/query?q=test&FFF=off&wfmt=tau&nbq=10&KL=en&KL=de&Translate=off&prf=Submit

"q=test" The thing you are looking for
"FFF=off" Family Filter ? (Set it ON if you are under age ;)
"wfmt=tau" Dunno
"nbq=10" Number of hits to show ?
"KL=en" English
"KL=de" German
"Translate=off" You get no link to babelfish
"prf=Submit" ?

One of the differences is that raging lets you use several language parameters at once, whereas www.altavista.com/cgi-bin/query? only uses the last KL param given. HP showed it.
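
To make the difference concrete, here is a small Python sketch of the two behaviours. It is only a model of what we observe from outside; nobody here has seen the server code, and the function names last_wins and keep_all are made up for the illustration.

```python
from urllib.parse import parse_qs

def last_wins(query):
    """Model of the classic engine: a repeated key keeps only its final value."""
    return {key: values[-1] for key, values in parse_qs(query).items()}

def keep_all(query):
    """Model of raging: every value of a repeated key is kept, in order."""
    return parse_qs(query)

query = "q=test&KL=en&KL=de"
print(last_wins(query))  # {'q': 'test', 'KL': 'de'}
print(keep_all(query))   # {'q': ['test'], 'KL': ['en', 'de']}
```

Note that parse_qs treats KL and kl as different keys, so the sketch says nothing yet about whether the engines care about case.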

Humphrey P., second go

Yeah, gs. My little trip to the raging cookie pusher brought me back the same ones you had:

FFF=on : family filter [on]
wfmt=tau : (i wanted compact instead of complete page information?)
nbq=20 : number of results to show per page [20]
prf=Submit : (something about my profile - it was submitted? I accepted a cookie? I came from the profile page?)
KL=zh : language to search in (in this case, Zhongwen. You'd recognize it as Chinese.)
KL=en : English
KL=fr : French
KL=de : Deutsch
enc=big5 : language encoding (in this case, for Zhongwen)
Translate=on : show the translate option

for a cookie which looked like:

AV_RAGESETTINGS=v1:20:4:3:big5:zhenrde
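
The cookie value looks like a colon-separated record, so a few lines of Python are enough to line its fields up against the settings listed above. The labels in the sketch are guesses based only on which values visibly match (20 and nbq=20, big5 and enc=big5); the fields marked "?" are left unexplained on purpose.

```python
COOKIE = "AV_RAGESETTINGS=v1:20:4:3:big5:zhenrde"

# Split "name=value", then the value on colons.
name, _, value = COOKIE.partition("=")
fields = value.split(":")

# Guessed labels, keyed by field position. These are assumptions.
guesses = {
    1: "results per page (nbq)",
    4: "encoding (enc)",
    5: "languages, run together",
}

for position, field in enumerate(fields):
    print(position, field, guesses.get(position, "?"))
```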

~

And don't forget these two

pg=pref : customize or preferences page
v=m : (on the way to the preferences page; )

~

Some notes and questions.

Had you noticed the capital "KL=" ? (It seems to me you were trying that out a few weeks ago?)

Perhaps KL=en is different from kl=en -?- If you would use all caps KL=en&KL=fr&KL=de you would get all your languages even in www.altavista.com ?

www.altavista.com/cgi-bin/query?pg=aq&text=yes&KL=en&KL=fr&KL=de&q=%2BGustav+%2B%22II%22+%2BAdolf+%2BBilbao

"AltaVista found no document matching your query."

Oop. No pages found at advanced. But notice what it does to your list of languages! [my languages(*)]

So, what does simple think?
www.altavista.com/cgi-bin/query?pg=q&text=yes&KL=en&KL=fr&KL=de&q=%2BGustav+%2B%22II%22+%2BAdolf+%2BBilbao

About 23 pages found.
word count: Bilbao: 239105; Adolf: 342273; Gustav: 447835; II: 28663115

Hmmm. grumble grumble... I've got a lapse of thinking attack. 23 pages... where have I seen that before?

~

Another thing... text=yes is ignored where? ah, when you are off looking for images? (where else? where there's money to be made?)

~

This from a while ago. Does it still work?:

AltaVista [host:altavista.com link:kl=en]

3 pages:

One was a conundrum: (guy got altavista to index its own error page)

Two was someone's saved query:
[AltaVista] [English]
http://www.altavista.com/cgi-bin/query?pg=q&stype=stext&kl=en&sc=on&q=AltaVista&stq=10
Web Pages 518,470 pages found.

Third, is new territory:
http://world.altavista.com/

What a nifty portal to the world!
http://world.altavista.com/r/x13/http://www.h2g2.com/A172685 Babel fish: Origin
http://world.altavista.com/r/x8/http://www.h2g2.com/A172685 Origin of the BabelFish
http://world.altavista.com/r/x3/http://www.systransoft.com/personal.html Personal Translation Tools (an ad for Systran)
http://world.altavista.com/r/x3/http://doc.altavista.com/help/search/language.shtml Displaying World Alphabets
http://world.altavista.com/r/x3/http://www.altavista.com/cgi-bin/query?pg=q&spage=search%2Fresults.htm&user=avworld&q=special+characters+in+html&stype=stext&x=37&y=13 Using World Alphabets on Web sites
http://world.altavista.com/r/x3/http://www.theodora.com/country_digraphs.html World Internet Domains (a pretty long list from ISO 3166 (does have TV=Tuvalu))

Now, where on this page does kl=en?
The search in the left frame panel is set up to default to search in English.

Hmmm. That's not it, though. On the backside, that looks like this:
OPTION VALUE=en SELECTED>English
Instead, I found:
http://www.altavista.com/cgi-bin/query?sc=on&user=avworld&q=Sending+Faxes+over+the+internet&kl=en&pg=q Sending faxes over the Internet

Finding 'query?' on the backside, I come up with these AltaVista parameter lines:

pg=q&spage=search%2Fresults.htm&user=avworld&q=special+characters+in+html&stype=stext&x=37&y=13
pg=aa2&stype=stext&sc=on&q=How+do+I+convert+metric+to+U%2eS%2e%3f
sc=on&user=avworld&q=Sending+Faxes+over+the+internet&kl=en&pg=q

kl=en : language is English
pg=q : page is AltaVista Main search
pg=aa2 : page is ??
q=special+characters+in+html
q=How+do+I+convert+metric+to+U%2eS%2e%3f
q=Sending+Faxes+over+the+internet
sc=on : site compression is on
spage=search%2Fresults.htm :
stype=stext : search type is s-text
user=avworld : (who else could be user? Are there privileges?)
x=37 :
y=13 :

Does user: AltaVista World have special privileges?
What is it, that x and y are placing? Or are they trying to document the origin of a gridmapping scheme to track your mouse position?

~

Can you think of other ways to get AltaVista to index its own generated listings and error pages? Bite its own tail? (oh, gee, somebody had the name of that hoop snake a while back... I forget what it's called.)

Gregor Samsa, second go

Hi, Humphrey !

Want some more stuff ? I think most of it is new. The altavista main engine has a lot more holes than I thought. I didn't even get to your kl/KL question and the tailbiting stuff. But see for yourself.
+=============+


One more oddity at raging:

http://ragingsearch.altavista.com/cgi-bin/query?pg=q&stype=stext&KL=enesfrde&sc=on&q=%2bClarissa+%2bPinkola+%2bEstes+%2dbuy+%2dbook&stq=10

I did not accept the cookie, but entered the language parameters manually after the first results were shown: &KL=en&KL=de&KL=fr&KL=es&

After going on to page 2 of the hits I happened to see all these language identifiers with only _one_ "KL": KL=enesfrde&
No differences in what (or how much) it found, as far as I can tell.
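
If the packed form really is just two-letter codes glued together (an assumption, based on this one example), converting between the two spellings is trivial. A minimal Python sketch, with made-up helper names:

```python
def split_langs(packed):
    """Cut a packed string such as 'enesfrde' into two-letter codes."""
    if len(packed) % 2:
        raise ValueError(f"odd length, cannot split into pairs: {packed!r}")
    return [packed[i:i + 2] for i in range(0, len(packed), 2)]

def repeated_form(packed):
    """Rewrite the packed string as repeated KL= parameters."""
    return "&".join(f"KL={code}" for code in split_langs(packed))

print(split_langs("enesfrde"))    # ['en', 'es', 'fr', 'de']
print(repeated_form("enesfrde"))  # KL=en&KL=es&KL=fr&KL=de
```

That makes it easy to feed both spellings to the engine and compare the counts.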

_1_
http://ragingsearch.altavista.com/cgi-bin/query?pg=q&stype=stext&kl=ende&sc=on&q=%2bClarissa+%2bPinkola+%2bEstes+%2dbuy+%2dbook&stq=20

[word count: Pinkola: 4031; Clarissa: 121615; Estes: 304093]
"Raging Search found no document matching your query."

_2_
http://ragingsearch.altavista.com/cgi-bin/query?pg=q&stype=stext&kl=en&kl=de&sc=on&q=%2bClarissa+%2bPinkola+%2bEstes+%2dbuy+%2dbook&stq=20

[word count: Pinkola: 4031; Clarissa: 121615; Estes: 304093]
"36 pages found."

The first one seems not to work at all. The second one behaves like the old simple search in that it only looks up German pages.
What the heck is going on here ? Has that changed lately ?


+++


[raging]

sc=on
sc=off

This seems to toggle the appearance of the "Results from this site only" - link


+++


I thought I had found some differences in numbers at raging a few weeks ago, but our dirty old friend told me he couldn't reproduce it. I haven't done any work on that since then, but I scheduled it for the coming weekend. (First I have to set up my machine from scratch. Right now, it sucks - the usual win problem after installing/deinstalling/playing around too much). Don't tell me to use Linux - I use it anyway. But I like Opera far too much, and the linux version is not very decent yet.


+++

[altavista]

I didn't even know that "text=yes" could be used with the advanced search...(Thanx for the tip ;)

Uups...surprise! surprise !


_1_
http://www.altavista.com/cgi-bin/quer<?pg=aq&text=yes&q=test1&kl=XX&stype=ntext

Watch that typo: "quer<" instead of "query". It is simply _ignored_. Do you see any differences when using "query" compared to "quer<" ? Even better:

_2_
http://www.altavista.com/cgi-bin/$$$$?pg=aq&text=yes&q=test1&kl=XX&stype=ntext

Obviously it does not matter. Every URL that aims for the cgi-bin gets processed ? Or what ?

_3_
http://www.altavista.com/Top?pg=aq&text=yes&q=test1&kl=XX&stype=ntext

This one I found by chance. I'm unsure what it might tell us.


Anyway, it tells you it "found no document matching your query" AND gives you ten results, but with no links to more results. It's somehow like the Hitchhiker's Guide: can you reverse this and tell me what the question was ?

I suppose, "stype=ntext" is Usenet, whereas "stext" means the regular textsearch. Let's switch back to "stext":

_4_
http://www.altavista.com/Top?pg=aq&text=yes&q=test1&kl=XX&stype=stext

"About 75,816 pages found". And what are the first ten ? Aha !

It seems, if we tell altavista that it should do an advanced usenet search, it gets confused. Fortunately I am completely ignorant what the parameters look like in a usenet search. So I gave this one a try:


_5_
http://www.altavista.com/Top?pg=q&text=yes&q=test1&kl=XX&stype=ntext

Again: "...found no documents..." and ten hits. But what ? See the titles that are displayed in your hit list:

www.oregongrounds.com - nothing but an imagemap. No "test1" in the source of this page. But this might be chance. It could have changed recently. While I look through the others, I see that they all have "test1" as title.

I get the same results with

_6_
http://www.altavista.com/cgi-bin/query?sc=on&hl=on&kl=XX&stype=ntext&pg=q&text=yes&q=title%3Atest1&search=Search


I switch back to

_7_
http://www.altavista.com/cgi-bin/query?sc=on&hl=on&kl=XX&stype=stext&pg=q&text=yes&q=title%3Atest1&search=Search

=> 3954 hits.


OK, a "real" Usenet search looks like this:

_8_
http://www.altavista.com/cgi-bin/query?pg=q&sc=on&hl=on&act=2006&par=0&q=test1&kl=XX&stype=ntext

act=2006
par=0

???


Btw, replacing "/cgi-bin/query" with "Top" seems to be a general feature. It works here as well:

_9_
http://www.altavista.com/Top?pg=q&sc=on&hl=on&act=2006&par=0&q=test1&kl=XX&stype=ntext


+=============+
I'll look into the differences between kl and KL tomorrow, and try this tailbiting stuff as well. I need to read my old notes again. For now it is time to go to bed. If I remember right, the guy who found the structure of the benzene molecule is said to have dreamt of a snake that bit its own tail. I'll try to dream of cgi params, perhaps that helps.

cu

gregor samsa

Iefaf's findings

This is just a confirmation of what you've found.
Whatever you write in the path before the question mark is ignored by raging and altavista; only the parameters after it count.

http://ragingsearch.altavista.com/did-gregor-samsa-wipes-the-cgi-bin?q=test1&search=Search

http://www.altavista.com/did-gregor-samsa-wipes-the-cgi-bin?pg=q&text=yes&q=test1&kl=XX&stype=ntext
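
For anyone who wants to try more variations, here is a small Python sketch that swaps the path of a URL while leaving the query string untouched. It only builds and prints the URLs and fetches nothing; the helper name with_path is invented for the example, and the paths are the ones tried in this thread.

```python
from urllib.parse import urlsplit, urlunsplit

def with_path(url, new_path):
    """Return url with its path replaced and everything else unchanged."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=new_path))

BASE = "http://www.altavista.com/cgi-bin/query?pg=q&text=yes&q=test1&kl=XX&stype=ntext"

# Paths tried in this thread.
PATHS = [
    "/cgi-bin/quer<",
    "/cgi-bin/$$$$",
    "/Top",
    "/did-gregor-samsa-wipes-the-cgi-bin",
]

for path in PATHS:
    print(with_path(BASE, path))
```

Every variant carries the same query component, so any difference in the answer would have to come from the path alone.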


Out of curiosity I scanned from 204.152.190.1 to 204.152.190.255 and checked the reverse DNS.
Next step is to view the raw websites and sniff around the chimney.
<pre>
204.152.190.1 (ns3.alta-vista.net)
204.152.190.2 (svc3.cns.alta-vista.net)
204.152.190.3 (svc4.cns.alta-vista.net)
204.152.190.4 (doc.altavista.com)
204.152.190.5 (doc.altavista.com)
204.152.190.6 (ns2.alta-vista.net)
204.152.190.7 (jump.altavista.com)
204.152.190.8 (jump.altavista.com)
204.152.190.9 (jump.altavista.com)
204.152.190.10 (jump.altavista.com)
204.152.190.11 (www.altavista.com)
204.152.190.12 (www.altavista.com)
204.152.190.13 (www.altavista.com)
204.152.190.14 (www.altavista.com)
204.152.190.15 (www.altavista.com)
204.152.190.16 (www.altavista.com)
204.152.190.17 (www.altavista.com)
204.152.190.18 (www.altavista.com)
204.152.190.19 (www.altavista.com)
204.152.190.20 (www.altavista.com)
204.152.190.21 (www.altavista.com)
204.152.190.22 (www.altavista.com)
204.152.190.23 (image.altavista.com)
204.152.190.24 (image.altavista.com)
204.152.190.25 (www.altavista.com)
204.152.190.26 (www.altavista.com)
204.152.190.27 (babelfish.altavista.com)
204.152.190.28 (babelfish.altavista.com)
204.152.190.29 (babelfish.altavista.com)
204.152.190.30 (careers.altavista.com)
204.152.190.31 (dir.altavista.com)
204.152.190.32 (dir.altavista.com)
204.152.190.33 (discovery.altavista.com)
204.152.190.34 (discovery.altavista.com)
204.152.190.35 (jump.altavista.com)
204.152.190.36 (jump.altavista.com)
204.152.190.37 (babelfish.altavista.com)
204.152.190.38 (test-gotcha5.altavista.com)
204.152.190.39 (svc1.marimba.alta-vista.net)
204.152.190.40 (svc2.marimba.alta-vista.net)
204.152.190.41 (lesite.altavista.com)
204.152.190.42 (test-gotcha6.altavista.com)
204.152.190.43 (test-gotcha10.altavista.com)
204.152.190.54 (ads.altavista.com)
204.152.190.55 (ads.altavista.com)
204.152.190.56 (ads.altavista.com)
204.152.190.57 (add-url.altavista.com)
204.152.190.58 (av-pvt.altavista.com)
204.152.190.59 (av-pvt.altavista.com)
204.152.190.60 (www.altavista.com)
204.152.190.61 (www.altavista.com)
204.152.190.62 (www.altavista.com)
204.152.190.63 (www.altavista.com)
204.152.190.64 (www.altavista.com)
204.152.190.65 (www.altavista.com)
204.152.190.66 (image.altavista.com)
204.152.190.67 (av-pvt.altavista.com)
204.152.190.68 (av-pvt.altavista.com)
204.152.190.69 (www.altavista.com)
204.152.190.70 (www.altavista.com)
204.152.190.71 (www.altavista.com)
204.152.190.72 (www.altavista.com)
204.152.190.73 (www.altavista.com)
204.152.190.74 (image.altavista.com)
204.152.190.75 (image.altavista.com)
204.152.190.76 (image.altavista.com)
204.152.190.77 (image.altavista.com)
204.152.190.78 (av-pvt.altavista.com)
204.152.190.79 (ns1.alta-vista.net)
204.152.190.80 (svc1.trip.alta-vista.net)
204.152.190.81 (svc2.trip.alta-vista.net)
204.152.190.82 (jump.altavista.com)
204.152.190.83 (image.altavista.com)
204.152.190.84 (image.altavista.com)
204.152.190.85 (image.altavista.com)
204.152.190.86 (apache.altavista.com)
204.152.190.87 (apache.altavista.com)
204.152.190.88 (iatlas.altavista.com)
204.152.190.89 (iatlas.altavista.com)
204.152.190.90 (iatlas.altavista.com)
204.152.190.91 (ragingsearch.altavista.com)
204.152.190.92 (ragingsearch.altavista.com)
204.152.190.93 (ragingsearch.altavista.com)
204.152.190.94 (ragingsearch.altavista.com)
204.152.190.95 (apache.altavista.com)
204.152.190.96 (apache.altavista.com)
204.152.190.97 (test-gotcha2.test.altavista.com)
204.152.190.126 (svc1.cns.alta-vista.net)
204.152.190.127 (svc2.cns.alta-vista.net)
204.152.190.128 (net2.pa.alta-vista.net)
204.152.190.129 (av-dev6.pa.alta-vista.net)
204.152.190.130 (redirect6.pa.alta-vista.net)
204.152.190.131 (redirect7.pa.alta-vista.net)
204.152.190.132 (mail3.pa.alta-vista.net)
204.152.190.133 (gotcha13.pa.alta-vista.net)
204.152.190.134 (gotcha14.pa.alta-vista.net)
204.152.190.135 (gotcha15.pa.alta-vista.net)
204.152.190.136 (gotcha16.pa.alta-vista.net)
204.152.190.137 (aj-dev.pa.alta-vista.net)
204.152.190.138 (gotcha8.pa.alta-vista.net)
204.152.190.139 (seeya3.pa.alta-vista.net)
204.152.190.140 (seeya4.pa.alta-vista.net)
204.152.190.141 (altavision3.pa.alta-vista.net)
204.152.190.142 (bono.pa.alta-vista.net)
204.152.190.143 (scope.pa.alta-vista.net)
204.152.190.144 (survey1.pa.alta-vista.net)
204.152.190.145 (ns1.pa.alta-vista.net)
204.152.190.146 (swamp.pa.alta-vista.net)
204.152.190.147 (surfwatch1.pa.alta-vista.net)
204.152.190.148 (test-scooter.pa.alta-vista.net)
204.152.190.149 (av-ops4-2.pa.alta-vista.net)
204.152.190.150 (ww2.altavista.com)
204.152.190.151 (avie1.pa.alta-vista.net)
204.152.190.152 (avie2.pa.alta-vista.net)
204.152.190.153 (surfwatch2.pa.alta-vista.net)
204.152.190.154 (babel4.pa.alta-vista.net)
204.152.190.155 (test-scooter2.pa.alta-vista.net)
204.152.190.156 (bee1.pa.alta-vista.net)
204.152.190.157 (bee2.pa.alta-vista.net)
204.152.190.158 (bee3.pa.alta-vista.net)
204.152.190.159 (bee4.pa.alta-vista.net)
204.152.190.160 (bee5.pa.alta-vista.net)
204.152.190.161 (bee6.pa.alta-vista.net)
204.152.190.162 (bee7.pa.alta-vista.net)
204.152.190.163 (bee8.pa.alta-vista.net)
204.152.190.164 (iatlas1.pa.alta-vista.net)
204.152.190.165 (iatlas2.pa.alta-vista.net)
204.152.190.166 (iatlas3.pa.alta-vista.net)
204.152.190.167 (firewall167.pa.alta-vista.net)
204.152.190.168 (firewall168.pa.alta-vista.net)
204.152.190.169 (noc-ext.pa.alta-vista.net)
204.152.190.170 (bee9.pa.alta-vista.net)
204.152.190.171 (scope2.pa.alta-vista.net)
204.152.190.172 (scope3.pa.alta-vista.net)
204.152.190.173 (engage1.pa.alta-vista.net)
204.152.190.174 (engage2.pa.alta-vista.net)
204.152.190.175 (engage3.pa.alta-vista.net)
204.152.190.176 (engage4.pa.alta-vista.net)
204.152.190.177 (qa-babel.pa.alta-vista.net)
204.152.190.178 (pa-install.pa.alta-vista.net)
204.152.190.248 (packet1.pa.alta-vista.net)
204.152.190.249 (packet2.pa.alta-vista.net)
204.152.190.250 (c-ns2.pa.alta-vista.net)
204.152.190.251 (baystack4u.pla.mibh.net)
204.152.190.252 (baystack5u.pla.mibh.net)
204.152.190.253 (av2.feed2.pla.mibh.net)
204.152.190.254 (av2.feed1.pla.mibh.net)</pre>
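
A sweep like the one above can be sketched in a few lines of Python. This is not the tool actually used for the listing, only an illustration with the standard socket and ipaddress modules; the resolver argument is an addition of the sketch so the loop can be tried without touching the network, and names returned today will certainly differ from the ones recorded here.

```python
import ipaddress
import socket

def reverse_lookup(ip):
    """Return the reverse DNS (PTR) name for ip, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None

def sweep(network, resolver=reverse_lookup):
    """Reverse-resolve every host address in network, skipping silent ones."""
    results = []
    for host in ipaddress.ip_network(network).hosts():
        name = resolver(str(host))
        if name is not None:
            results.append((str(host), name))
    return results

# Live use (slow, one DNS query per address):
#   for ip, name in sweep("204.152.190.0/24"):
#       print(f"{ip} ({name})")
```

For a /24, hosts() walks .1 through .254, which matches the range of the listing.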


of course this is still in fieri...


(c) 2000: [fravia+], all rights reserved