www.searchlores.org main search engines: Google ~
Version 04,12 ~ March 2009

VERY useful findings
Google Web APIs    Google in depth
An easy way to search Google's cache
How big is google's index

New introduction  |  Before even beginning  |  Old Introduction
Some useful & simple googolian know hows  |  Spamming  |  Query parameters
Advanced operators  |  Various oldies about google  |  Useful findings  |  Googlette
Daterange:Shally Steckerl explains as_qdr  |  Google's daterange, by Ritz  |  How does google work?

New introduction 

The aim of this section, simply stated, is to offer for free, to anyone, some sound and ready to be used knowledge about google.
This should make it completely useless -for anyone- to search and download from the many repositories (or -Godzilla forbid- even "buy"!) books called "google hacks", "google hacking" and similar crap.
My hope, and joy, is to put out of business (and hopefully gently push towards suicide) some of the wankers that stole the free content of free sites like mine in order to scrape some money out of the spitpits (Ok, now that google is getting spammed to death this is getting less important, but still).
And if you wonder if they deserve to get spammed... yes they do

While this specific page of searchlores is already quite a powerful tool per se, I still hope, with the help of my readers, to update, enlarge and ameliorate it further...
it is, like most things in our lives, a work "in fieri".

What? Only google? Googlecentrism? Google-obsession?
This page is dedicated to google. Since the usage of google, relative to all other engines, has actually further increased (march 2009: google 80% | yahoo 11% | msn 3% | aol 2% | msn-live 2% | ask 1% | all other s.e. 1%, and google is gaining one percentage point per trimester, no matter what the other engines offer), it does make sense to examine this specific engine in depth. The educated seeker should however never forget that yahoo, ms-search and ask (to cite just the three most important search engines after google) have their own indexes (yahoo's index is in fact bigger than google's) and their own search algos, which may be better suited than google's for specific searches.
For instance when searching files inside file sharing repositories you should use yahoo, not google. Another typical case where yahoo beats google black and white :-) is when searching images in black and white: Compare google's &imgc=mono with yahoo's b&w option!

Suffice to say that casual searchers and assorted web-low life use only and/or exclusively google, while the educated seeker knows when to use all the different search engines.
To find our targets, to dig those "scattered gems in the starry web-firmament" we crave and love, we use many search engines like "feathered multicolored arrows in our seekers' quivers", knowing that each one of them is apt to hit better, or more prone to miss, some specific targets.

Before even beginning 

"One of the most annoying things in google is google mask redirection according to your nationality/language data..."

There are many ways to avoid this crap redirection, from the old (but fine) http://ww.google.com (Note: only two "w"!) to the "classical" ncr: http://www.google.com/ncr
Also some 'everenglish' googles allow further antisniffing
(Mnemo: "7 & 2" "7*2" "2 nothing 7" "7+2 twice")
Try the less intrusive mask below, without both document.f.q.focus() & window.encodeURIComponent :-)

Some more "ncr googles"

http://www.l.google.com  "Urgent service google"

"Another most annoying thing in google is that it does not respect at all "one term" searches"

In fact, if you don't use quotes or check the "this exact wording or phrase" option, google algos will arbitrarily search combinations of terms inserting SPACES in your original one-word searchterm. Since those results are less relevant than the ones without spaces they will land below the horizon, and unless you use the Yo-yo technique, you might not even notice this problem at all.
Compare fravia and "fravia". See? One reason more to avoid one-term queries.

Old Introduction 

Since most search engines were just keen on making money no matter how, Google represented a breath of fresh air, and (mostly) held the promise of delivering high relevancy results without all the extraneous and often ridiculous and annoying 'services' of the larger portals. They were the first to understand that the real money was to be made as aggregators of information, not as a search engine. Of course they do use for profit all the data they gather. Every search engine does it. But at least they deliver speedy and often useful results.

Unfortunately with success came problems. The beastly SEO spammers ("search engine optimizers", people that routinely push irrelevant sites up among the front results, tricking and tweaking the search engines' algorithms) tried and try their worst in order to push up their own (and their customers') foetid commercial crap sites. In order to fool Google's listings they have used everything, and then some, from "link farms" to "Trackback", from auto-citation features to blog noise and whatnot.

Confronted with such concerted attacks from SEO spammers and webloggers Google has taken drastic remedial action, almost abandoning its PageRank algos and even installing at one time (october-november 2003) some brutal emergency filters.
These filters were (and in part still are) activated -mostly- ONLY for the typical zombies' one-word queries. Hence making any search just a little more complex (adding any filetype or site-based searchparameter, or for instance just excluding a nonsense string à la -"qwjjqw", a term that until today did not exist in google's index) would and will deactivate such brutal anti-spammer algos.

Note also that a (small) part of the following has been purposely "ripped back" from some books about "google hacking", written by young lamers. After having ripped free knowledge sites like mine for years, some of them didn't even have the decency to cite their sources. So I'll serve them back.
Since their authors don't know zilch, most "search" books and sites are full of severe errors... the joy of the web is that you can easily find copycatters, and then punish them, exposing their copycatting (and incompetence) at the same time.

Some useful & simple googolian know hows 

There is a whole section of older tricks that is worth perusing. Here are some simple "know hows" for educated seekers.

The tilde ~

The tilde ~ is surely one of the most useful operators in google.
It allows you to search for similar keywords. This operator is often coupled with a - or a +. For instance: ~mp3 +searching
Another "classical" example is the (useful) search for ~encyclopedia -britannica
The tilde is especially useful when you are more "exploring" and "feeling the searchscape" than properly seeking.



789+13+156-82-87   |   0.91 euro per 1 litre in dollars per gallons   |   square root(3^5*34)... and so on

Weather forecast

Weather 95472 (if you live in the States)   |   weather Köln   |   weather Perugia... and so on

"What's the time?"

Time Cincinnati   |   Time Berlin   |   Time Kuala Lumpur... and so on

"Wanna go to the movies?"

Movies Boston   |   Movies Bruxelles   |   Movies Bangkok... and so on

The "better than"/"worse than" search

Search for "better than" to gather background information on your target, get some interesting results and filter some noise.
For instance: "better than ubuntu"
The contrary of course applies as well: "worse than spam"


Go and fish google's cache directly whenever a target server is too slow.
This is also useful to quickly bypass simple (and silly) school/parents/religious/states censorship attempts.

The importance of your term sequence

Paris Dakar compare with Dakar Paris
euro dollar compare with dollar euro
(Incidentally, note also how "euro dollar" gives the monetary value-relation as first result, you can also use something like "56 USD in french money" to calculate change rates)

Spamming poor google to no-end 

The evil SEOs (search engines spammers) have managed to reduce google, that was until 2006 the most important search engine, into a spammed burial ground of dead and useless farmed links, scraped dmoz content, cloned wikipedia stuff, doorway generators slime, faked blog links and so on.

There are many relatively easy ways to counter this, imho. The two most common are the following ones.

The first method is the usual google dance trick: you modify the algos, some sites decay, some disappear from the first pages... and the only idiots that immediately begin to modify their own webpages are none other than the very webmaster spammers, since no one with real content in his pages, no one with a little creativity, would continuously care to check his own 'positioning' in google (only search engines' spammers do).

The second method, also pretty efficient, is the 'find the greedy bastards' one, by hutcheson:
"Google could spot related sites that targeted collections of naturally mutually exclusive popular keywords. For instance: how many sites in the world could possibly have natural content on both "Miami Hotels" and "Las Vegas Hotels"? A couple of dozen hotel chains, two or three reservation systems, and ten million doorway spammers, that's who! Now throw in "Auto rentals", WHACK ANY SITE THAT TARGETS ALL THREE KEYWORD SETS!
Presto, twenty million spam doorways gone from the web.
Wow, the air is clearer already. Wanna think about "Fruit baskets", "Toronto", and "San Francisco"? How about "mortgages", "Idaho", and "Delaware"? "real estate", "Phoenix", and "Boston"? Every site that mentions at least three trademarked fad diet plans AND includes links or form input?
Yes, that's the same method we seekers devised long ago to counter SEOs spammers on the fly or through ad hoc bots: stop words!
In fact one wonders why some people at google allow these SEOs clowns to clog their own result pages instead of whipping them hard on their weak and stinky dollar groins.

Google Query parameters 

Parameter | Value | Description
q | $query (your query) | The search query, your target
start | 0 -- MAX hits | The point in the search results where Google should start. Result 0 is the first result on the first page
num | maxResults: 1 -- 100 | Number of results presented per page (MAX 100)
filter | 0 or 1 | False or true? If true (=1) they will "omit some entries very similar to those already displayed" and tell you that "If you like, you can repeat the search with the omitted results included" (thus setting the filter to zero)
restrict | "restrict code", for instance countryAF (Afghanistan) countryAR (Argentina) countryAU (Australia) countryBE (Belgium) countryBM (Bermuda)... | Restrict results to a specific country (using country specific IP addresses... google is notoriously unreliable in this). Google also has four topic restricts: U.S. Government: unclesam; GNU-Linux: linux; Macintosh: mac; FreeBSD: bsd
hl | interface language code | At the moment the language codes for interface language are: af, sq, am, ar, az, eu, be, bn, bh, xx-bork, bs, br, bg, ca, zh-CN, zh-TW, hr, cs, da, nl, xx-elmer, en, eo, et, fo, tl, fi, fr, fy, gl, ka, de, el, gn, gu, xx-hacker, iw, hi, hu, is, id, ia, ga, it, ja, jw, kn, xx-klingon, ko, ky, la, lv, lt, mk, ms, ml, mt, mr, ne, no, nn, oc, or, fa, xx-piglatin, pl, pt-BR, pt-PT, pa, ro, ru, gd, sr, sh, st, si, sk, sl, es, su, sw, sv, ta, te, th, ti, tr, tk, tw, uk, ur, uz, vi, cy, xh, yi, zu.
lr | language restrict | Only display pages written in this language. Codes: Arabic lang_ar; Chinese (S) lang_zh-CN; Chinese (T) lang_zh-TW; Czech lang_cs; Danish lang_da; Dutch lang_nl; English lang_en; Estonian lang_et; Finnish lang_fi; French lang_fr; German lang_de; Greek lang_el; Hebrew lang_iw; Hungarian lang_hu; Icelandic lang_is; Italian lang_it; Japanese lang_ja; Korean lang_ko; Latvian lang_lv; Lithuanian lang_lt; Norwegian lang_no; Portuguese lang_pt; Polish lang_pl; Romanian lang_ro; Russian lang_ru; Spanish lang_es; Swedish lang_sv; Turkish lang_tr
ie | UTF-8 | The input encoding of Web searches. Google suggests UTF-8
oe | UTF-8 | The output encoding of Web searches. Google suggests UTF-8
as_epq | Exact phrase | Advanced search: "with the exact phrase". The value is submitted as an exact phrase. It is no longer necessary to surround the phrase with quotes.
as_ft | i = include file type; e = exclude file type (a file extension) | Advanced search: File format: Only / Don't.... Include or exclude the file type indicated by as_filetype (see below)
as_filetype | file extension | Advanced search: File Format: ....return results of the file format. Include or exclude this file type as indicated by the value of as_ft (see above)
as_qdr | m3 = past 3 months; m6 = past 6 months; y = past year | Advanced search: Date: Return web pages updated in the.... Locate pages updated within the specified timeframe. See "Daterange: Shally Steckerl explains as_qdr" on this page.
as_nlo | low number | Find numbers between as_nlo and as_nhi
as_nhi | high number | Find numbers between as_nlo and as_nhi
as_oq | a list of words | Find at least one among the words of the list
as_occt | any = anywhere; title = title of page; body = text of page; url = in the page URL; links = in links to the page | Advanced search: Occurrences: Return results where my terms occur.... Find search term in a specific page location
as_dt | i = only include site or domain; e = exclude site or domain | Advanced search: Domain: Only / Don't.... Include or exclude searches from the domain specified by as_sitesearch (see below)
as_sitesearch | domain or site | Advanced search: Domain: ...return results from the site or domain. Include or exclude this domain or site as specified by as_dt (see above)
safe | active = enable SafeSearch; off = disable SafeSearch | Enables or disables "safe search" (Autocensoring)
as_rq | URL | Locate pages similar to this URL
as_lq | URL | Locate pages that link to this URL.
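
The parameters above can simply be glued onto a search URL. Here is a minimal sketch in Python, assuming the base URL http://www.google.com/search that this page uses in its as_qdr example. The helper name google_url is made up for illustration, and nothing here checks that google still honours a given parameter.

```python
from urllib.parse import urlencode

# Base URL as it appears in the as_qdr example on this page.
SEARCH_URL = "http://www.google.com/search"


def google_url(query, **params):
    """Build a search URL from a query plus optional parameters
    (num, start, filter, hl, lr, as_qdr and so on)."""
    fields = {"q": query}
    # Add the extra parameters in alphabetical order, skipping unset ones,
    # so that the same call always yields the same URL.
    for name in sorted(params):
        if params[name] is not None:
            fields[name] = params[name]
    return SEARCH_URL + "?" + urlencode(fields)


if __name__ == "__main__":
    # 100 results per page, duplicate filter off, English interface and pages.
    print(google_url("fravia", num=100, filter=0, hl="en", lr="lang_en"))
```

urlencode takes care of escaping spaces and quotes, so a phrase query can be passed as is.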

Google Advanced operators (Cfr google) 

Note that the syntaxes are often case-sensitive: phonebook, not "Phonebook"
Note that there can be no space between the "operator:" and the following word
intitle: / allintitle: | Search within the title of a page. Title text is not limited to the TITLE HTML tag. A Web page’s document can be generated in any number of ways, and in some cases, a Web page might not even have a title at all. The thing to remember is that the title is the text that appears at the top of the Web page, and you can use intitle to locate text in that spot. When using intitle, it’s important to pay attention to the syntax of the search string, since the word or phrase following the word intitle is considered the search phrase. Other terms may be found anywhere in the page. Allintitle, on the contrary, tells Google that every single word or phrase that follows is to be found in the title of the page. Therefore putting "intitle:" in front of every word in your query is equivalent to putting "allintitle:" at the front of your query.
inurl: / allinurl: | Search text within a given URL. This gives you the opportunity to search for specific directories or folders. Extremely useful operator, together with the site and filetype operators. Just like the allintitle search, allinurl tells Google that every single word or phrase that follows is to be found only in the URL of the page. inurl: works only on words, not URL components. In particular, it ignores punctuation and uses only the first word following the "inurl:" operator. To find multiple words in a result URL, use the inurl: operator for each word. Note: Putting inurl: in front of every word in your query is equivalent to putting allinurl: at the front of your query.
filetype: | Searches for pages that end in a particular file extension. The file extension is the part of the URL following the last period of the filename but before the question mark that begins the parameter list. Here are some of the thousand possible extensions: Adobe Portable Document Format: Pdf; Adobe PostScript: Ps; Lotus 1-2-3: wk1, wk2, wk3, wk4, wk5, wki, wks, wku; Lotus WordPro: Lwp; MacWrite: Mw; Microsoft Excel: Xls; Microsoft PowerPoint: Ppt; Microsoft Word: Doc; Microsoft Works: wks, wps, wdb; Microsoft Write: Wri; Rich Text Format: Rtf; Shockwave Flash: Swf; Text ansi: txt
allintext: | Locates a string within the text of a page. The allintext operator is perhaps the simplest operator to use since it performs the function that search engines are most known for: locating a string within the text of the page. Although this advanced operator might seem too generic to be of any real use, it is handy when you know that the text you're looking for should only be found in the text of the page. Use allintext as a type of shorthand for "find this string anywhere except in the title, the URL, and links". Since this operator starts with the word all, every search term provided after the operator is considered part of the operator's search query.
site: | Narrows a search to specific sites. A subset of inurl and allinurl. Parameters to Google’s site operator must end in a valid top-level domain name (org, com, etc).
link: | Companion to inanchor. The link operator allows you to search for pages that link to other pages. Instead of providing a search term, the link operator requires a URL or server name as an argument. It can include not only basic URLs but complete URLs that include directory names, filenames, parameters, and the like. The syntax must be a correct URL syntax, however. When an invalid link: syntax is provided, Google treats the search term not as a link, but as a phrase search.
inanchor: | Companion to link. The inanchor operator searches the text representation of a link, not the actual URL. For instance inanchor:webbits would search links like this one: webbits (that actually points to rabbits.htm)
daterange: | Search for pages published within a certain date range. Google designed the as_qdr field, for its advanced searching mask, to help you locate pages that have been updated within a given time frame (3 months, six months or one year). For example, to find pages that have been updated within the past six months and that contain the word fravia, use the query http://www.google.com/search?q=fravia&as_qdr=m6 (note that ~S~ Ritz has developed a full-fledged daterange mask for searchlores).
numrange: | The numrange operator requires two parameters, a low number and a high number, separated by a dash. As the name suggests, numrange can be used to find numbers within a range. For example, to locate the number 3008, a query such as numrange:3007-3009 will work just fine. When searching with numrange Google ignores symbols such as currency markers and commas, making it much easier to search for numbers on a page. Instead of using the numrange operator, you can of course provide a query with two numbers separated by two periods. The shortened version of the query just mentioned would be 3007..3009. Notice however the difference between numrange and "double periods" queries: with the latter the two limits (here 3007 and 3009) seem to have priority over the included values (here 3008).
cache: | Used to get to google's cached link of the results page, cache:http://www.fravia.com or cache:http://www.yahoo.com. Just as with the link operator, passing an invalid hostname or URL as a parameter to cache will submit the query as a phrase search.
info: | The info operator shows the summary information for a site and provides links to other Google searches that might pertain to that site. The parameter to this operator must be a valid URL or site name: info:www.searchlores.org. You can achieve this same functionality by supplying a site name or URL as a search query. Just as with the link and cache operators, passing an invalid hostname or URL as a parameter to info will submit the query as a phrase search.
related: | The related operator displays sites that Google has determined are related to a site. The parameter to this operator is a valid site name or URL. You can achieve this same functionality by clicking the Similar Pages link from any search results page or by using the "Find pages similar to the page" portion of the advanced search form.
phonebook: | Searches for business and residential phone listings (only for the United States). For instance you may search a guy named "buster" in Alabama: phonebook:buster al. Note that google's phonebook stops digging at 600 results (like its search engine stops digging at 1000). Wildcards don't work either. To do a reverse search, just enter the phone number with area code. Lookups without area code will not work: phonebook: (334) 636-2580. Google's "phonebook" is however a very poor way to find data about a specific person, or to stalk someone. See the ad hoc section of searchlores for more effective ways to find a telephone number or an address.
rphonebook: | White pages: residential phone listings (only for the United States). Wildcards don't work.
bphonebook: | Yellow pages: business phone listings (only for the United States). Wildcards don't work. Then again, they're not needed; the Google phonebook does all the wildcarding for you. For example, if you want to find shops in New York with "Coffee" in the title, don't bother trying to envision every permutation of "Coffee Shop," "Coffee House," and so on. Just search for bphonebook:coffee new york ny and you'll get a list of any business in New York whose name contains the word "coffee."
author: | Usenet searching. The author operator will allow you to search for the author of a newsgroup post on usenet. The parameter to this option consists of a name or an e-mail address.
group: | Usenet searching. This operator allows you to search the title of Google Groups posts for search terms. This is one of the operators that is very compatible with wildcards. For example, to search for groups that have a suffix "comp", a search such as group:comp* works well.
msgid: | Usenet searching. Locate a group post by message ID. The msgid operator refers to a specific group message identifier, a unique string that identifies a newsgroup post. The format is something like comp-sys-concurrent-intro-1-1061190083@gweep.ca, and you can see it only by checking the complete header of a given message through the "show original" option in groups. Note however that this operator does not work reliably any more in google groups.
insubject: | Usenet searching. Insubject: search google groups subject lines (like intitle:)
stocks: | Search for stock information. Allows those that like to play pyramid schemes to search for information about a particular stock market company. The parameter for this operator must be a valid stock abbreviation (stock ticker). If you provide an invalid stock ticker abbreviation, you will be taken to a YAHOO screen (sic) that allows further searching for a correct ticker symbol.
define: | Show the definition of a term. Returns definitions for a search term. Arguments to this operator may be a word or phrase. For instance: define:gross. Very anglophonic-centric feature.
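
Two of the rules above (no space after the "operator:" and lowercase operator names) are easy to get wrong when typing, so a small helper can enforce them. What follows is only a sketch in Python: the function names are invented for illustration, and it just builds query strings without checking whether google still supports a given operator.

```python
def op(operator, term):
    """Return operator:term with a lowercase operator and no space after the colon."""
    return operator.strip().lower().rstrip(":") + ":" + term.strip()


def each_word(operator, words):
    """Put the operator in front of every word, e.g. intitle:index intitle:of.
    According to the text this is equivalent to allintitle:index of."""
    return " ".join(op(operator, word) for word in words.split())


def numrange(low, high, short=False):
    """numrange:low-high, or the shortened low..high form."""
    if low > high:
        raise ValueError("low must not exceed high")
    return "%d..%d" % (low, high) if short else "numrange:%d-%d" % (low, high)


if __name__ == "__main__":
    print(op("Site:", " searchlores.org"))    # site:searchlores.org
    print(each_word("intitle", "index of"))   # intitle:index intitle:of
    print(numrange(3007, 3009))               # numrange:3007-3009
    print(numrange(3007, 3009, short=True))   # 3007..3009
```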

Various oldies about google 

Google Web APIs Reference (Must read)
serend_1.htm: Serendipity (an easy way to search Google's cache) by Shoki
The Anatomy of a Large-Scale Hypertextual Web Search Engine, by Sergey Brin and Lawrence Page

Since 2005 queries are limited to 32 words (no longer to 10). So you can now break your search into a series (two or three) of independent "main" searches that the boolean OR (to avoid the default AND) will hold together.
Here is a silly, but useful, example:
("wares" OR "warez" OR "appz" OR "gamez" OR "abandoned" OR "pirate" OR "war3z") ("download" OR "ftp" OR "index of" OR "cracked" OR "release" OR "full") ("nfo" OR "rar" OR "zip" OR "ace")
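
Queries of this shape are tedious to type, so here is a hedged sketch in Python that assembles them from lists of terms. The names or_group and build_query are made up for illustration. The 32-word limit is the one stated above; counting only the search terms (and not the OR keywords) towards it is an assumption of this sketch.

```python
MAX_WORDS = 32  # limit stated in the text above


def or_group(terms):
    """Join quoted terms with OR inside parentheses: ("a" OR "b")."""
    return "(" + " OR ".join('"%s"' % term for term in terms) + ")"


def build_query(*groups):
    """Combine several OR groups; google's default AND links the groups."""
    # Assumption: only the terms themselves count towards the limit.
    words = sum(len(term.split()) for group in groups for term in group)
    if words > MAX_WORDS:
        raise ValueError("%d words, limit is %d" % (words, MAX_WORDS))
    return " ".join(or_group(group) for group in groups)


if __name__ == "__main__":
    print(build_query(["download", "ftp", "index of"], ["nfo", "rar", "zip"]))
```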

A splendid searchform by Ritz: daterange.htm will allow you to search specific time-"slices" inside google
(and since it is in javascript, you may use it wherever :-)

With engines like google you can forget wacky unstable hyperlinks: just find your target pages selecting a set of very peculiar words that uniquely identify a given page, and just use a google query for those words in order to find that page in the future NOT the URL. So we could link to my tadimens.htm page using: "This has of course to do with both the vastness of the web and the fact that people do not know how to search".
Clearly in this case, since we are using this here as an example, you'll fetch this very page as well.

Bye bye, fragile links! (Of course you can do the same with all good search engines :-)

Daterange:Shally Steckerl explains as_qdr (november 2007)

I write to update you on something though in your mastery of such subjects I am well aware this may be like bringing firewood to the forest. Your page http://www.searchlores.org/google.htm has a section detailing the use of Google's as_qdr. I have knowledge to contribute to that as per below and wish to offer it to your gentle readers as a simple "tip" or essay, call it what you will.

Have you noticed that we now have more "Date" choices from Google's Advanced Search menu? I hardly use the advanced search because I can use more commands directly from within the main interface, plus I'm a big fan of injecting the commands directly into the URL path. But now you can limit your search to return results from only:

* Past 24 hours
* Past Week
* Past Month
* Past 2 Months
* Past 3 Months
* Past 6 Months
* Past Year

Select a search for the past 24 hours and take a look at a search results URL - (hint: copy and paste into wordpad or notepad) you will find a qdr command close to the end in the SQL sequence like this:

&as_qdr=d

The ampersand (&) means it's an "appended command" or something like that, then the "as_" part means this is an "Advanced Search" feature of course, while the "q" means query and the "dr" means "Date Return." Finally, the = precedes the time frame you want to search. But the fun doesn't end there.

In this case "d" equals one day, which is of course 24 hours. You will notice that in the new search results page you now see a dropdown with the original time frame choices.

What you can do from here though is instead of using that drop down to modify your time frame search, you can instead add stuff to the right of the d. For example, edit the URL to say &as_qdr=d3 instead, and you now have a search for the past three days which is not actually a drop down choice.

Play around with this and you will notice you can make it just about any amount of days you want.


If you select Past Week or Past Month from the drop down you will notice the "dr" changes from a "=d" modifier (for days) to a "=w" modifier for weeks or an "=m" modifier (for months).

This means you can search for the past 2 weeks of crawled data by entering =w2 instead, which is again not a choice on the drop down, and of course the same applies to months like =m4.

So you can theoretically still do this for "years" by using =y2 though that starts to return quite a few results and I'm not sure of the practicality.

WARNING: Just because you are limiting your search to a specific period of time doesn't mean those pages will be fresh. Many dynamically generated web pages appear to be as fresh as hot bread coming out of the oven but it's only because they have a javascript that changes the date and time on the page, or it could also be an old blog post with recent comments for example.
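
The URL editing described in this letter can be sketched in a few lines of Python. The helper names are invented for illustration; the unit letters d, w, m and y and the optional count are the ones described above, and whether google accepts a given combination is not checked here.

```python
UNITS = {"d": "days", "w": "weeks", "m": "months", "y": "years"}


def as_qdr(unit, count=1):
    """Return the as_qdr value: 'd' for one day, 'd3' for three days, 'w2', 'm4'..."""
    if unit not in UNITS:
        raise ValueError("unit must be one of d, w, m, y")
    if count < 1:
        raise ValueError("count must be at least 1")
    return unit if count == 1 else "%s%d" % (unit, count)


def with_date_limit(url, unit, count=1):
    """Append &as_qdr=... to an existing search URL."""
    separator = "&" if "?" in url else "?"
    return url + separator + "as_qdr=" + as_qdr(unit, count)


if __name__ == "__main__":
    # Past three days, which is not a choice in the drop down.
    print(with_date_limit("http://www.google.com/search?q=fravia", "d", 3))
```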

Useful findings 

4) GOOGLE'S BIAS:     5) "GoogleRanking" bookmarklet     6) GOOGLE's WILDCARDS
10) GOOGLE's MOST LINKED     11) GOOGLE's CACHE mysteries     12) IS GOOGLE DANCING?
13) DIFFERENT IPs, DIFFERENT DATACENTERS     14) GOOGLE direct     15) GOOGLE index is stale?
16) GOOGLE's simple success secret     17) GOOGLE cache     18) GOOGLE oddities: the AND operator
19) GOOGLE site ranking:     20) GOOGLE PRINT starting (soon)     21) GOOGLE Newsletters (full of crap, yet with some info snippets)

  1. "ARCHEOLOGICAL" DIGGING (using daterange)
    For instance: fravia daterange:2452275-2452639 (1 Jan 2002 - 31 Dec 2002)

    The Julian date is calculated by the number of days since January 1, 4713 BC. Julian dates (abbreviated JD) are simply a continuous count of days and fractions since noon Universal Time on January 1, 4713 BCE (on the Julian calendar). Almost 2.5 million days have transpired since this date. Julian dates are widely used as time variables within astronomical software. Typically, a 64-bit floating point (double precision) variable can represent an epoch expressed as a Julian date to about 1 millisecond precision.
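
Since the daterange: operator wants Julian day numbers, a small conversion helps. This is a minimal sketch in Python with made-up function names. It assumes the integers in the example above are the Julian date at 00:00 UT with the .5 dropped, which is the convention that reproduces 2452275 for 1 Jan 2002 and 2452639 for 31 Dec 2002.

```python
from datetime import date

# date.toordinal() numbers days from 0001-01-01 (proleptic Gregorian) as day 1.
# The Julian date at 00:00 UT of a day is its ordinal + 1721424.5; dropping
# the .5 gives the integers used in the daterange example above.
JD_OFFSET = 1721424


def julian_day(day):
    """Integer Julian day number for a calendar date (fraction dropped)."""
    return day.toordinal() + JD_OFFSET


def daterange(term, start, end):
    """Build a query like: fravia daterange:2452275-2452639"""
    if start > end:
        raise ValueError("start must not be after end")
    return "%s daterange:%d-%d" % (term, julian_day(start), julian_day(end))


if __name__ == "__main__":
    print(daterange("fravia", date(2002, 1, 1), date(2002, 12, 31)))
```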


  2. More GOOGLE daterange  (Nemo's useful knowledge):

    Well our webmasterworld 'friends' are having one interesting 'discussion' about this subject:


    they are too miserly to share their findings with you... but the bread crumbs are still very revealing: I've just accidentally discovered a way of getting results from Google with no query string..

    Well!, well!... so it's possible! I've tried it some time ago without any success... I looked once more to the available special syntax:

    site:, link:, inurl:, allinurl:, intitle:, allintitle:, intext:, allintext:, filetype:, ext:, inanchor:, allinanchor:, phonebook:, rphonebook:, bphonebook:, daterange:

    and I saw that I forgot to play a little with daterange:, so I tried the following query at Google:


    and bingo! I hit the jackpot! I think this query is better than the following one (see also "Google's most linked", below):


    because, I bet that in the second one, the keyword density of http should play an important role.

  3. GOOGLE'S HIGHLIGHTING TRICK  (Mordred's useful knowledge):

    A trick to make google highlight important parts of the summary with careful usage of asterisks. For example if we're looking for the biggest possible recording of rain, we'd use the "index of" trick like this:

    "index+of/" "rain.wav"

    60 results - but... which one to choose?
    Here is a better way to gather relevant info:

    "index+of/" "rain.wav******"

    Index of /rmx/impregnation
    Index of /rmx/impregnation. ... cymb.wav 14-Mar-2003 08:14 975k hh.wav 14-Mar-2003 08:14
    780k kick.wav 14-Mar-2003 08:14 780k rain.wav 14-Mar-2003 08:15 4.2M sample1 ...
    chronofixion.free.fr/rmx/impregnation/ - 4k - 26 Mar 2003 - Cached - Similar pages

    Mmmm... the six asterisks put the info we need in bold. See the size? 4.2M :)

  4. GOOGLE'S BIAS and PAGERANK (from Larry "Page", haha):
    Pagerank (the basic algo of google that haunts & obsesses all the beastly SEO clowns) seems indeed to have a bias against newly-created pages.
    In fact google seems to pay a lot of attention to the text in a link's anchor when deciding the relevance of a target page.

    Note that you can check on the fly the page rank of any page, for instance:

  5. "GoogleRanking" bookmarklet:
  6. GOOGLE's WILDCARDS  (Shally Steckerl's tip):
    The * happens to be a wildcard that replaces an entire word. This is not a documented Google command but if you use the * connected by any of the characters Google ignores like - = , ; \ / < and > then it acts as a place setting for "any word" like this: three-*-cats or   nice=*=spring or   fravia<*>site and so on. An interesting thing is that if you use another connector and another asterisk then it returns results with two words between the first and the last term like this: fravia-*-*-site.
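
The pattern described in this tip is mechanical enough to generate. Here is a small sketch in Python; the function name wildcard and its arguments are made up for illustration, and the connector list is simply the one quoted above for this undocumented behaviour.

```python
# Connector characters the tip says google ignores.
CONNECTORS = "-=,;\\/<>"


def wildcard(first, last, gap=1, connector="-"):
    """Join two terms with `gap` whole-word wildcards between them."""
    if connector not in CONNECTORS or len(connector) != 1:
        raise ValueError("connector must be one of " + CONNECTORS)
    if gap < 1:
        raise ValueError("gap must be at least 1")
    return connector.join([first] + ["*"] * gap + [last])


if __name__ == "__main__":
    print(wildcard("three", "cats"))                   # three-*-cats
    print(wildcard("nice", "spring", connector="="))   # nice=*=spring
    print(wildcard("fravia", "site", gap=2))           # fravia-*-*-site
```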
  7. TIRED OF CLICKING GOOGLE? (1) (google viewer)
    google viewer: "bettie page" and you won't have to move a finger :-)

    JANUARY 2005: The googleviewer does not seem to work anymore.
    The address was: http://labs.google.com/gviewer.html
    As usual you won't find any explanation whatsoever from google on the reasons for killing this useful feature.
    Use your keyboard to navigate through google's results (unfortunately it does not work with Opera :-(
  9. GOOGLE's WEBQUOTES  (can you find some use for this?):
    advanced searching
  10. GOOGLE's MOST LINKED  (can you find some use for this?):

  11. GOOGLE's CACHE mysteries  (can you find some use for this?):
    If you search for ÿþ<html> you'll find a list of pages that have the characters ÿþ before the first html tag. Check google's cache of these pages.
    The two characters have the hexadecimal code FE and FF: "Furthermore, to maximize chances of proper interpretation, it is recommended that documents transmitted as UTF-16 always begin with a ZERO-WIDTH NON-BREAKING SPACE character (hexadecimal FEFF, also called Byte Order Mark (BOM)) which, when byte-reversed, becomes hexadecimal FFFE, a character guaranteed never to be assigned. Thus, a user-agent receiving a hexadecimal FFFE as the first bytes of a text would know that bytes have to be reversed for the remainder of the text."

    More info here
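
    A small sketch of what is going on, assuming the stray characters are simply the UTF-16 byte order mark read as a one-byte-per-character encoding (Latin-1 here). The helper name is made up. The BOM constants come from Python's standard codecs module.

```python
import codecs


def utf16_bom(data: bytes):
    """Report the byte order announced by a leading UTF-16 BOM, if any."""
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "little-endian"
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "big-endian"
    return None


# A tiny UTF-16 (little-endian) page that starts with the BOM.
page = codecs.BOM_UTF16_LE + "<html>".encode("utf-16-le")

print(utf16_bom(page))              # little-endian
print(page[:2].decode("latin-1"))   # the two BOM bytes shown as Latin-1 characters
```

    Read as Latin-1, byte FF is ÿ and byte FE is þ, which is why a page served this way seems to carry two odd characters in front of its first tag.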
  12. IS GOOGLE DANCING?  (this is completely useless):
    Every month Google updates its indexes. These updates are known as "Google dances". The indexes are quite large and the calculations take account of a shifting palette of algorithms, and take several days to complete. During this period, the search results are not stabilized and may vary from one minute to the next. That's google's "dance" (towards the end of each month, usually between the 20th and the first days of the following month; the 25th is a good bet). When the dance starts the 'linkers' (the number of sites pointing to a specific site) are different on the three googles: www, www2, and www3 (the last two point to the San Jose datacenter, which is supposed to be the 'freshest').
    Since 2003, Google dances MUCH less wildly. The indexes are now updated every week, mostly around Monday.

    Check if google is dancing: www    www2    www3

    They have introduced a redirection from www2 and www3 to the www.google.
  13. DIFFERENT IPs, DIFFERENT DATACENTERS  (even if it does not dance: hiccups):
    You will often receive different results from google depending on the ip you search from (try the same search with a couple of proxies). When you access google from a different name server you may be sent to a different google data centre. This may also happen when you repeat a search over time.

    Today (1 february 2003):  using a proxy (4840)    searching directly (4860) (see next finding) (4850)
  14. GOOGLE direct  (whenever you are 'stuck' on a crappy MSIE-cookied PC):

    http://ww.google.com (Note: only two "w")

    This was for a while very useful whenever you were 'stuck' on a national-flawed google.
    Now you often have to use heavy artillery, either http://www.google.com/intl/en/ or (better) the "classical" ncr (no country redirection): http://www.google.com/ncr in order to bypass national crap-googles.
    Note that you can put a specific line into your host file (" g") or (" g") and *THAT datacenter* of google will show up whenever you type "g" into the location bar.
    Note that you can use the same trick with everything else, check the lore of the HOSTS scrolls.

    Direct googles

    They follow patterns: a lot have 64.233, 66.102 or 216.239, then one ODD number, and a second one that is around 100 (98/99/100/102/103/104/105/106/107)
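
    The pattern can be written down as a quick check. This is only a sketch of the rule of thumb above, not a list of real datacenters: the function name is made up, the sample addresses are arbitrary inputs chosen to exercise the rule, and the second prefix is taken as 66.102, matching "The 66.102s" list further down.

```python
import ipaddress

# First two octets seen most often, per the rule of thumb.
PREFIXES = {(64, 233), (66, 102), (216, 239)}
# Values "around 100" listed for the last octet.
LAST_OCTETS = {98, 99, 100, 102, 103, 104, 105, 106, 107}


def fits_pattern(ip: str) -> bool:
    """True if `ip` matches: known prefix, odd third octet, last octet near 100."""
    try:
        a, b, c, d = ipaddress.IPv4Address(ip).packed
    except ipaddress.AddressValueError:
        return False
    return (a, b) in PREFIXES and c % 2 == 1 and d in LAST_OCTETS


print(fits_pattern("216.239.39.99"))    # True: known prefix, 39 is odd, 99 is listed
print(fits_pattern("64.233.160.104"))   # False: third octet is even
print(fits_pattern("10.0.1.100"))       # False: prefix not in the list
```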

    Note that you can always choose between, say,

    The 64.233s... ~  Checked 2/10/2004 ~  Checked 2/10/2004 ~  Not working 2/10/2004 ~  Not working 2/10/2004 ~  Checked 2/10/2004 ~  Not working 2/10/2004 ~  Not working 2/10/2004 ~  Checked 2/10/2004 ~  Not working 2/10/2004 ~  Not working 2/10/2004 ~  Checked 2/10/2004 ~  Not working 2/10/2004

    The 66.102s... ~    Checked 20/03/2004 ~    Checked 2/10/2004 ~    Checked 20/03/2004 ~  Checked 20/03/2004

    The 216.239s... ~ www-ex.google.com ~  Not working 20/03/2004 ~ www-sj.google.com ~  Not working 20/03/2004 ~ www-va.google.com ~  Not working 20/03/2004 ~  Checked 20/03/2004 ~ www-dc.google.com ~  Not working 20/03/2004 ~  Checked 20/03/2004 ~ www-ab.google.com ~  Not working 20/03/2004 ~  Checked 27/03/2004 ~  Checked 2/10/2004 ~  Checked 2/10/2004 ~ www-in.google.com ~  Not working 20/03/2004 ~  Checked 20/03/2004 ~ www-cw.google.com ~  Not working 20/03/2004 ~  Checked 20/03/2004 ~  Checked 2/10/2004 ~  Checked 2/10/2004 ~  Checked 2/10/2004 ~  Checked 20/03/2004

  15. GOOGLE index is stale?:
    What is sure is that updating only once a month does not help much with freshness :-(
  16. GOOGLE's simple success secret:
    Google is THE ONLY main search engine without crap paid results inside SERPs :-)
  17. GOOGLE cache  (useful knowledge :-):
    Hi Fravia,
    Here is a short note concerning the use of Google's cache that may be of interest...
    When searching for files using Google, perhaps using the +"index of /" trick, you often run into the problem that the files cannot be accessed because you do not have permission... need a password or so.
    Sometimes, however, Google has cached the "forbidden" page that lists the file you want, before access restrictions were placed on the URL.  By checking the cache, you will then see the page with the file you want listed on it.
    I have found that a surprising number of times, I can simply download the file from Google's cache page.  Why?  Because the permissions were set on the directories only, and not on every single file within the directory!
    This is especially true for images, but it works for musicz too.
    For example, say you are interested in mp3s by Beck, and Google lists the following site:
    Attempt to access this site directly and you will be denied access.
      Fine.  Have a look at Google's cache and you will see a single mp3 listed: "Beck - Loser.mp3".  You will find that this file is downloadable.

  18. GOOGLE oddities: the AND operator  (doesn't kick in):

    search AND tips: 5,540,000
    search tips: 5,930,000

    This difference (not only quantitative but also qualitative) means that the AND operator forces an exact phrase search and, contrary to google's statements, that it is not provided by default.
  19. GOOGLE site ranking:

    Spammers (that call themselves SEOs) are investing incredible amounts of work in order to 'rank' in the first positions in google. Seekers could not care less (actually a good reason to skip the first 200 places in google, now heavily spammed, following the old proverb for spammed altavista, that once upon a time was the best engine around: Hic alta, hic salta) but there are possibilities for checking quickly where you are on a given search, using google's API. Hey, you may even use the 'googlette' at the bottom of this page, or visit http://www.googlerankings.com/, where you will find the following form:

    Keyword(s) to list the sites for:  
    Domain or URL of your website:  
    eg.: google.com or geocities.com/mysite

    For faster results, You may limit your search to the

    The process may take up to 15 seconds

    This is based on this php script
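
    The core of any such rank checker is trivial once you have the ordered list of result URLs. Here is a sketch of that last step only. It is not the php script mentioned above and it does not call google's API: the function names and the sample URLs are invented for illustration, and it matches on the host name only, not on a path like geocities.com/mysite.

```python
from urllib.parse import urlparse


def _host(url):
    """Lower-cased host name of `url`, tolerating a missing scheme."""
    if "//" not in url:
        url = "//" + url          # urlparse only fills the host after "//"
    return (urlparse(url).hostname or "").lower()


def rank_of(domain, result_urls):
    """1-based position of the first result on `domain` or a subdomain, else None."""
    domain = domain.lower()
    for position, url in enumerate(result_urls, start=1):
        host = _host(url)
        if host == domain or host.endswith("." + domain):
            return position
    return None


# Made-up result list standing in for what a search would return.
results = [
    "http://www.example.org/tips.htm",
    "http://lore.example.net/google.htm",
    "https://sub.example.com/",
]
print(rank_of("example.com", results))      # 3
print(rank_of("missing.example", results))  # None
```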
  20. GOOGLE PRINT starting  (soon):

    Google print

  21. GOOGLE Newsletters  (full of crap, yet with some info snippets):


How big is google's index? 

Nobody knows, google doesn't either :-)
Anyway it should be around 25 billion documents (against Yahoo's 20 billion)

Yahoo versus Google
"My index's bigger than yours, nah, nah, nah, nah"

Yahoo announced in summer 2005 that it had indexed 19 Billion (milliards) documents.
Google at that moment claimed to index 11 Billion (milliards) pages (the 'official' number was at the beginning of september still 8,168,684,336 docs). Yet a few days after Yahoo's 'bigger index' claim, google doubled its number of indexed images (at the beginning of september 2005 -allegedly- 2,187,212,422).
Now -end september 2005- google should have around 25 billion pages: "Google opened its doors in September 1998, and we’ve been pursuing one mission ever since: to organize the world’s information and make it universally accessible and useful. For our seventh birthday, we are giving you a newly expanded web search index that is 1,000 times the size of our original index". Since the original index in September 1998 was around 25 million pages, this would mean (27 September 2005) around 25 billion (milliards) documents.
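
The estimate is plain multiplication. Spelled out, using only the figures quoted above (the variable names are mine, and the 25 million starting point is itself an approximation):

```python
ORIGINAL_INDEX_1998 = 25_000_000      # "around 25 million pages"
GROWTH_FACTOR = 1_000                 # "1,000 times the size of our original index"
OFFICIAL_COUNT = 8_168_684_336        # the 'official' number, early september 2005

estimate = ORIGINAL_INDEX_1998 * GROWTH_FACTOR
print(f"{estimate:,}")                # 25,000,000,000

# How far the birthday claim sits above the 'official' counter.
jump = estimate / OFFICIAL_COUNT
print(f"about {jump:.1f} times the official count")
```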
However, index size does not mean anything.
I have prepared these data - that you won't find on the web elsewhere - for my Helsinki conference (September 2005).
They demonstrate that -for the main search engines- index size is only loosely (and perhaps inversely :-) related to the quality of results returned.
One month ago (August 2005) Yahoo suddenly announced that it had indexed 19 Billion (milliards) documents. Clearly an attempt to dwarf Google's famous "8 Billion" (Milliards) sites.
Alas! No wonder that the results of (almost) any test search you may launch remain in Google's favor: as the following data prove, the biggest increase in Yahoo's results seems to have been in "frills" domains.
For instance Yahoo now indexes 9.560.000.000 "com" domain documents, versus the 1.690.000.000 indexed by google. As you can see, the most striking differences, when regarding domains, are to be found on crap & frill domains like "com", "info", "net" & "biz".
We can clearly see that the differences are less important for more content-rich domains like "edu", "org", "gov", "mil" and "int".
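
A quick back-of-the-envelope check of the "com" figures. This is only a sketch: it divides the per-domain counts by the total index sizes quoted earlier on this page, and those numbers may well have been measured in different ways, so take the shares as rough indications.

```python
yahoo_com = 9_560_000_000        # "com" documents in Yahoo
google_com = 1_690_000_000       # "com" documents in google
yahoo_total = 19_000_000_000     # Yahoo's announced index size
google_total = 8_168_684_336     # google's 'official' counter

ratio = yahoo_com / google_com
yahoo_share = yahoo_com / yahoo_total
google_share = google_com / google_total

print(f"Yahoo indexes about {ratio:.1f} times as many com documents")
print(f"com share of the claimed index: Yahoo {yahoo_share:.0%}, google {google_share:.0%}")
```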
Here are some graphs:

Note the sad preponderance of ".com" domains among those indexed:

god image

Note the absolute preponderance of those very ".com" domains in Yahoo:

yado image

Would anyone in his right mind prefer a search engine that prefers "biz", "info", "net" and "com" domains?

diyago image

Anyway, as you can see in the following graph, the COVERAGE of the web (especially taking account of the hidden databases) is still rather meager BOTH for Yahoo and Google:

weco image



How does google work? 

In march 2005 a rare "under the hood" light was thrown on google's inner workings in an interview with Urs Hoelzle, Google vice president of operations and vice president of engineering.

To deal with the billions of Web pages and the many terabytes of information on Google's servers, the company combines cheap machines with plenty of redundancy. Its commodity servers are placed into interconnected nodes. All machines run of course on a Linux kernel. The distribution was, in 2005, Red Hat.

Google replicates the Web pages it caches by splitting them up into pieces it calls "shards".
The shards are small enough that several can fit on one machine. And they're replicated on several machines, so that if one breaks, another can serve up the information. The master index is also split up among several servers, and that set also is replicated several times. The engineers call these "chunk servers".
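
The idea of small shards, each copied onto several machines, can be sketched in a few lines. This is only an illustration: the round-robin placement, the function names and the machine names are my assumptions, and nothing in the interview says how google actually places its shards.

```python
def place_shards(num_shards, machines, replicas=2):
    """Put each shard on `replicas` different machines, round-robin."""
    if replicas > len(machines):
        raise ValueError("not enough machines for that many replicas")
    placement = {}
    for shard in range(num_shards):
        placement[shard] = [
            machines[(shard + r) % len(machines)] for r in range(replicas)
        ]
    return placement


def live_copies(placement, failed):
    """Machines still able to serve each shard after `failed` have broken."""
    return {
        shard: [m for m in hosts if m not in failed]
        for shard, hosts in placement.items()
    }


machines = ["m0", "m1", "m2", "m3"]
placement = place_shards(8, machines, replicas=2)
print(placement[0])                       # ['m0', 'm1']

after_failure = live_copies(placement, failed={"m0"})
print(all(after_failure.values()))        # True: every shard still has a copy
```

Eight shards with two copies each on four machines means four shard copies per machine, and losing any single machine still leaves one copy of every shard.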

As a search query comes into the system, it hits a Web server, then is split into chunks of service. One set of index servers contains the index; one set of machines contains one full index. To actually answer a query, Google has to use one complete set of servers. Since that set is replicated as a fail-safe, it also increases throughput, because if one set is busy, a new query can be routed to the next set, which drives down search time per box.
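
A toy version of that routing step, assuming each replica set holds one complete index and can take a fixed number of queries at once. The class, the capacity figure and the "first free set wins" policy are illustrative guesses, not google's scheduler.

```python
from dataclasses import dataclass


@dataclass
class IndexSet:
    """One complete copy of the index, spread over its own servers."""
    name: str
    capacity: int = 1
    in_flight: int = 0

    @property
    def busy(self):
        return self.in_flight >= self.capacity


def route(index_sets):
    """Hand a new query to the first replica set that is not busy."""
    for index_set in index_sets:
        if not index_set.busy:
            index_set.in_flight += 1
            return index_set
    raise RuntimeError("every replica set is busy")


sets = [IndexSet("set-A"), IndexSet("set-B")]
print(route(sets).name)   # set-A
print(route(sets).name)   # set-B, because set-A is now busy
```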

In parallel, clusters of document servers contain copies of Web pages that Google has cached. The refresh rate is from one to seven days, with an average of two days.

Each set of document servers contains one copy of the Web. These machines are responsible for delivering the content snippets that show searchers relevant text from the page.

The top 10 results get sent to the document servers, which load the 10 result pages into memory, then parse through them to find the best snippet that contains all the query words.
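
What "best" means is not explained. One plausible reading, used in this sketch, is the shortest run of words that contains every query word. The function name is invented and the sliding-window approach is my choice, not something taken from the interview.

```python
import re


def best_snippet(text, query_words):
    """Shortest run of consecutive words containing every query word, or None."""
    words = text.split()
    # Compare on lower-cased words stripped of punctuation.
    norm = [re.sub(r"\W+", "", w).lower() for w in words]
    needed = {q.lower() for q in query_words}
    if not needed:
        return None

    counts, have, left, best = {}, 0, 0, None
    for right, word in enumerate(norm):
        if word in needed:
            counts[word] = counts.get(word, 0) + 1
            if counts[word] == 1:
                have += 1
        # Shrink the window from the left while it still covers all words.
        while have == len(needed):
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            dropped = norm[left]
            if dropped in needed:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    have -= 1
            left += 1

    if best is None:
        return None
    return " ".join(words[best[0]:best[1] + 1])


page = "Google dances every month. The dance of google lasts days."
print(best_snippet(page, ["google", "dance"]))   # dance of google
```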

(c) III Millennium: [fravia+], all rights reserved, coupla wrongs reversed

Page optimised for Opera. Other browsers? Couldn't care less.