yahoo specific lore

~ Fravia's main search engines: Yahoo ~


Version March 2009

Why Yahoo
Useful findings
Nemo's essay (September 2005)
For what kind of searches is yahoo better than google?

YAHOO
Advanced
- for exclusion (but NOT if using booleans), " " for exact match
( ) for nesting, AND NOT for boolean exclusion
no case sensitivity whatsoever
Now with the VERY important NEAR operator (November 2006)
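
The rules above can be sketched as a tiny query builder. This is only an illustration of the syntax as summarised here, not anything Yahoo publishes: the helper names (phrase, group, exclude) are made up for the example, and NEAR is left out because its exact form is not given on this page.

```python
def phrase(words: str) -> str:
    """Wrap words in double quotes for an exact match."""
    return f'"{words}"'


def group(*terms: str, op: str = "AND") -> str:
    """Nest terms inside ( ), joined by a boolean operator."""
    return "(" + f" {op} ".join(terms) + ")"


def exclude(query: str, unwanted: str) -> str:
    """Exclude a term from a query.

    Follows the rule of thumb above: '-' for plain queries,
    AND NOT as soon as the query already uses booleans.
    (Hypothetical helper; the boolean detection is deliberately naive.)
    """
    uses_booleans = any(tok in ("AND", "OR", "NOT") for tok in query.split())
    if uses_booleans:
        return f"({query} AND NOT {unwanted})"
    return f"{query} -{unwanted}"


# A nested boolean query with boolean exclusion
print(exclude(group("shills", "trolls"), "kooks"))

# A plain query with an exact phrase and '-' exclusion
print(exclude(phrase("index of") + " mp3", "ringtones"))
```

The first call yields the same nested form used further down this page, ((shills AND trolls) AND NOT kooks); the second stays boolean-free and so keeps the minus sign.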


Da biggest index?

YAHOO [Once Yahoo had only 677 results viewable, now the SERPs stop at 1000]
For info on Yahoo's (Inktomi's) rich syntax, see Nemo's essay (September 2005)

Yahoo recognized the tragic mistake of going commercial and went 'back to basics' in late 2002 (better late than never). It seems to be gaining momentum as part of the inktomi factories :-)
Note that yahoo recently bought the wondrous fast/alltheweb search engine (and promptly killed it :-(
Yahoo is now one of the three "big players" (google, MSN and Yahoo) and claimed, at the beginning of September 2005, to have indexed 19 billion sites (against google's 8 billion). A few weeks later Google claimed 25 billion docs (against Yahoo's 20 billion). Since the Web runs around 500 billion docs (and growing) the 'race' is rather pointless :-)
(More on google's ad hoc section)

Advanced Yahoo search

Note that there are some direct addresses for yahoo (see google's UF, point 14), for instance: http://216.109.117.135/search.

There is an interesting "MSN-like" slider tool you should be aware of: Yahoo Mindset. Try for instance fravia
Or try "caravaggio": http://mindset.research.yahoo.com/search.php?p=caravaggio.

Why Yahoo

Because it has a big index. This index, however, is mainly due to yahoo's mining of commercial sites, see below.

Because it has a special "creative commons" search option.

Because it has its own quick yahoo javascript.

Because you can nest terms: ((shills AND trolls) AND NOT kooks)

Because it has its own slider à la MSNsearch.

Because it has its own interesting syntax.

For what kind of searches is yahoo better than google?

Short answer: local searches (only if you live in the States, though, else google beats yahoo black and blue), job searches (ditto) and site explorer (which has now also replaced the old "linkdomain:" operator).

Long answer:
The educated seeker should however never forget that yahoo, ms-search and ask (to cite just the three most important search engines after google) have their own indexes (yahoo's index is in fact bigger than google's) and their own search algos, which may be better suited than google's for specific searches.
For instance when searching files inside file sharing repositories you should use yahoo, not google. Another typical case where yahoo beats google black and white :-) is when searching images in black and white: Compare google's &imgc=mono with yahoo's b&w option!

Suffice to say that casual searchers and assorted web low-life use google exclusively, while the educated seeker knows when to use all the different search engines.
To find our targets, to dig those "scattered gems in the starry web-firmament" we crave and love, we use many search engines like "feathered multicolored arrows in our seekers' quivers", knowing that each one of them is apt to hit better, or more prone to miss, some specific targets.

Even longer answer: it is, after all, "A Question of Relevancy
Search engine users are typically most interested in the items returned on the first page of the search results. It's not often that users dig down into the ninth or tenth pages, because it takes too long and those results are simply not as relevant.
Given this fact, it seems appropriate for a ranking algorithm to spend most of its modeling efforts getting the topmost items right.
Though the current algorithms used by Yahoo! do a very good job at determining the relevance of a web page for a particular query, there is always room for improvement. That's why Olivier Chapelle, a senior research scientist at Yahoo!, has spent the last several months trying to boost the ranking quality.
His work is based on the machine learning framework of structured output learning, where the input corresponds to a set of documents and the output is a ranking. This approach is different from the regression model commonly used in current search engine technology.
In essence, the framework of structured output learning provides a new opportunity: instead of viewing the outputs of the documents in an independent fashion, they are now coupled together in order to optimize the performance measure.
Why is this potentially a better approach? Well, by considering the documents independently, the regression model is unaware of the global ranking. By contrast, structured output learning can find a rule that produces a better overall ranking by taking into account all the documents associated with a query.
Indeed, in early tests, Chapelle's algorithm has yielded 3 to 4 percent improved accuracy rates on several public and commercial ranking datasets. These results are featured in a paper entitled Large margin optimization of ranking measures, by Chapelle and his co-author.
Though excited by his initial results, Chapelle admits there is still a long way to go. "The next step is to do more systematic experiments to validate the usefulness of this method" he says. And if everything goes according to plan? "The long-term hope is that eventually this model will be put in production and used for all searches on Yahoo!," says Chapelle".
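
To make the quoted contrast concrete, here is a minimal sketch that scores the same documents twice: once with a per-document regression error, once with a measure computed over the whole ranking. It is not Chapelle's algorithm. NDCG is used as the ranking measure only as an assumption (the quoted text names none), and the labels and scores are invented.

```python
import math


def squared_error(scores, labels):
    """Regression view: each document is judged on its own."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels))


def dcg(labels_in_rank_order):
    """Discounted cumulative gain: top positions weigh the most."""
    return sum((2 ** y - 1) / math.log2(pos + 2)
               for pos, y in enumerate(labels_in_rank_order))


def ndcg(scores, labels):
    """Ranking view: sort by score, compare with the ideal ordering."""
    ranked = [y for _, y in sorted(zip(scores, labels),
                                   key=lambda pair: pair[0], reverse=True)]
    ideal = dcg(sorted(labels, reverse=True))
    return dcg(ranked) / ideal if ideal > 0 else 0.0


# Invented relevance labels for three documents of one query
labels = [3, 2, 0]

# Scores close to the labels, but the top two documents end up swapped
close_but_swapped = [2.0, 2.1, 0.0]
# Scores far from the labels, but in the right order
far_but_ordered = [5.0, 4.0, 2.0]

for name, scores in [("close_but_swapped", close_but_swapped),
                     ("far_but_ordered", far_but_ordered)]:
    print(name, squared_error(scores, labels), ndcg(scores, labels))
```

The first score list wins on squared error and loses on the ranking measure, the second does the opposite. That gap is what a model trained on the whole list at once is meant to close.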

Note that the problem of the (purportedly) "best top items" inside the SERPs' first pages (which in fact are often among the worst because of heavy spamming) can still be solved "artisanally" using the old "yo yo" searching trick.

Yahoo has its own syntax as well

Hooo, Yahoo too

Of course we are not limited to google. Each search engine has its own quirks, and Yahoo has its own syntax as well:

site: Finds all documents within a particular domain (and all its subdomains). e.g: in.fieri site:searchlores.org
hostname: Finds all documents from a particular host only. e.g: hostname:www.searchlores.org
link: Finds documents that link to a particular url. e.g: link:http://www.un.org/
url: Finds a specific document in yahoo's index (pretty useless operator). e.g: url:http://www.fravia.com/index.htm
inurl: Finds a specific keyword as part of indexed urls. e.g: inurl:index songs mp3
intitle: Finds a specific keyword as part of the indexed titles. e.g: intitle:index songs mp3

Note the difference between the last two queries.
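
A toy model may help to see that difference. It assumes (my reading of the two examples, not documented Yahoo behaviour) that the operator applies only to the keyword glued to it, while the remaining words are matched anywhere in the document. The Doc class, the matches helper and the two sample documents are all invented.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    url: str
    title: str
    body: str


def matches(doc, operator, keyword, extra_terms):
    """Toy filter: the keyword must sit in the field named by the operator,
    the extra terms may sit anywhere (assumption, plain substring matching)."""
    field = {"inurl": doc.url, "intitle": doc.title}[operator]
    everything = f"{doc.url} {doc.title} {doc.body}".lower()
    return (keyword.lower() in field.lower()
            and all(term.lower() in everything for term in extra_terms))


# Two invented documents
docs = [
    Doc("http://example.org/music/index.html", "My favourite songs",
        "a page about songs in mp3 format"),
    Doc("http://example.org/files/", "Index of /files",
        "songs mp3 directory listing"),
]

for operator in ("inurl", "intitle"):
    hits = [d.url for d in docs if matches(d, operator, "index", ["songs", "mp3"])]
    print(f"{operator}:index songs mp3 ->", hits)
```

In this model inurl:index catches the ordinary page whose address happens to contain "index", while intitle:index catches the open directory listing titled "Index of ...".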

Useful findings

"My index's bigger than yours, nah, nah, nah, nah"

I have presented these data - that you won't find elsewhere on the web - at my last Helsinki conference. They demonstrate that - for the main search engines - index size is only loosely (and perhaps inversely :-) related to the quality of results returned.
In August 2005 Yahoo suddenly announced that it had indexed 19 Billion (milliards) documents. Clearly an attempt to dwarf Google's famous "8 Billion" (Milliards) sites.
Alas! No wonder that the results of (almost) any test search you may launch remain in Google's favor: as the following data prove, the biggest increase in Yahoo's results seems to have been in "frills" domains.
For instance Yahoo now indexes 9.560.000.000 "com" domain documents, versus the 1.690.000.000 indexed by google. As you can see, the most striking differences, when regarding domains, are to be found on crap & frill domains like "com", "info", "net" & "biz".
We can clearly see that the differences are less important for more content-rich domains like "edu", "org", "gov", "mil" and "int".
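
A quick back-of-the-envelope check of the "com" figures. It uses only the four numbers quoted on this page and assumes that the per-domain counts and the announced totals are comparable, which is far from certain since they come from different sources.

```python
# Figures quoted on this page
YAHOO_COM = 9_560_000_000      # "com" documents indexed by Yahoo
GOOGLE_COM = 1_690_000_000     # "com" documents indexed by google
YAHOO_TOTAL = 19_000_000_000   # Yahoo's announced index size
GOOGLE_TOTAL = 8_000_000_000   # google's announced index size

# How many times more "com" documents Yahoo reports
com_ratio = YAHOO_COM / GOOGLE_COM

# Share of each announced index taken up by "com"
yahoo_com_share = YAHOO_COM / YAHOO_TOTAL
google_com_share = GOOGLE_COM / GOOGLE_TOTAL

# Share of the gap between the two announced totals explained by "com" alone
gap_from_com = (YAHOO_COM - GOOGLE_COM) / (YAHOO_TOTAL - GOOGLE_TOTAL)

print(f"com ratio yahoo/google: {com_ratio:.1f}")
print(f"com share of yahoo index: {yahoo_com_share:.0%}")
print(f"com share of google index: {google_com_share:.0%}")
print(f"gap explained by com: {gap_from_com:.0%}")
```

Taken at face value, Yahoo reports between five and six times as many "com" documents as google, "com" fills about half of Yahoo's announced index against roughly a fifth of google's, and "com" alone accounts for around seven tenths of the difference between the two announced totals.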
Here are some graphs:

Note the sad preponderance of ".com" domains among those indexed:

god image

Note the absolute preponderance of those very ".com" domains in Yahoo:

yado image

Would anyone in his right mind prefer a search engine that prefers "biz", "info", "net" and "com" domains?

diyago image
