Introduction Inktomi unveiled Inktomi's syntax References
Inktomi is one of the best search engines out there. Unfortunately its search syntax is not well documented, which is a pity, because Inktomi offers one of the richest search syntaxes, with lots of unique features and a ranking algo which often works quite well. The purpose of this essay is precisely to document Inktomi's search syntax and to provide examples showing its usefulness. To that end, the search FAQs of old HotBot and of Inktomi's other web partners were read. The core syntax found in them was expanded using search engines and the WayBack Machine. Finally, additional search syntax was guessed from the source code of old HotBot's advanced search pages: feature:homepage, originurlextension: and stem:.
Inktomi doesn't provide a public search engine in a way that search engines like AltaVista or Google do. Instead, it licenses its search technology and database to other portals and sites on the Internet. While some of them display results which are taken directly from Inktomi's database, others display results from Inktomi after displaying results from another source.
Inktomi also differs from other search engines in a major way: it is built from two databases, WebMap and Web Search 9.
Inktomi relies on the title, the keywords meta tag and the text to sort search results (cf. [4] page 23). The use of meta tags is a good idea: webmasters know for sure what their pages are all about! Nevertheless, Inktomi's use of titles and the keywords meta tag is prone to abuse, and examples are easy to find... just do a title search for any commercial keyword in singular and plural:
title:hotel title:hotels
where lonnnnnnnnnnnnng titles and keywords meta tags abound... let's take the very first search result: originurl:http://www.nyc-realestate.com/, which has no body text! (disable javascript if you don't want to be redirected duh:). Even worse, Inktomi still merrily matches this document for the keyword manhattan, which occupies chars 377 - 386 of the title's string... I wonder why Inktomi allows this free ride to spammers, given that these annoyances are so easy to correct:
Inktomi doesn't show its own copy of a document, aka the cache. This lack of transparency is a shot in its own foot, because it prevents spammed pages from being easily exposed and getting the punishment they deserve.
Inktomi's database is queried by four main search engines: MSN (advanced search), HotBot, PositionTech and BluWin. Each one of these has its own advantages and drawbacks:
Inktomi willingly prostitutes itself by accepting paid search results, and will merrily give them a boost:
'While the pay-for-placement search model - in which a marketer pays to have certain keywords land high in search results - has been gaining steam over the last few years, Inktomi says that represents only 30% of the paid search model. Inktomi says it will tap into the remaining 70% through its paid inclusion model, which places sites into the results of relevant searches.' (cf. [5])
Nevertheless, these pages are easy to spot: they were crawled in the past two days (cf. [6] and [7]) and, if you are using MSN, HotBot or PositionTech, the status bar shows rdrw1.inktomi.com/click?u=http when you put your mouse over the link (disable javascript so as not to be fooled duh:).
Default search Multiple search terms are processed as an AND operation.
Boolean search Inktomi offers full Boolean searching with the operators AND, OR and NOT; you can use - instead of NOT, and searches can be nested using parentheses (). Operators must be in upper case. You are well advised not to use the OR operator for keyword variants, because your query will attract irrelevant search results (Inktomi gives a higher rank to documents containing all ORed keywords); in those cases you should use stemming whenever you can. Example, compare:
Case Inktomi has no case sensitive searching. Using either lower or upper case results in the same hits.
Truncation No truncation (*, ?) is currently available, but you can use word stemming (stem:).
Stop words All words are searched. There are no known stop words.
Ranking Inktomi is the only 'search engine' which lets you change its ranking algorithm; this is done by giving each keyword a weight. Weight factors can vary between 0.0 and 9.9 and the syntax is weight*keyword. By default each keyword has weight 1.0, as you can see by comparing these two queries: 1.0*fravia and fravia. This is a very useful feature which lets you fine-tune how loud each keyword is allowed to talk: you reduce the noise level produced by noisier keywords by multiplying them by a factor below 1.0, and you give an opportunity to more bona fide keywords by multiplying them by a factor greater than 1.0. Example:
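A minimal Python sketch of the weight*keyword syntax, just to make the mechanics concrete. The helper names weighted and weighted_query and the sample keywords are my own assumptions for illustration; only the syntax and the 0.0 to 9.9 range come from the description above.

```python
from typing import Mapping


def weighted(keyword: str, weight: float = 1.0) -> str:
    """Return `weight*keyword` (hypothetical helper, not an Inktomi API)."""
    if not 0.0 <= weight <= 9.9:
        raise ValueError("weight must lie between 0.0 and 9.9")
    # One decimal place mirrors the 0.0 to 9.9 range quoted in the text.
    return f"{weight:.1f}*{keyword}"


def weighted_query(weights: Mapping[str, float]) -> str:
    """Join weighted keywords with spaces; multiple terms are ANDed by default."""
    return " ".join(weighted(word, w) for word, w in weights.items())


# Sample keywords are illustrative only: quieten two noisy words,
# let the more specific one talk louder.
query = weighted_query({"search": 0.5, "engine": 0.5, "syntax": 3.0})
print(query)
```

The resulting string can be pasted into any of the search forms that query Inktomi; whether a given partner passes the weights through untouched is something to check by comparing result lists.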
depth:[number] Designates how deep in a site's directory structure pages will be searched. The number (0, 1, 2, 3, 4) indicates the maximum number of subdirectories, relative to the host's root directory, which may appear in the URL. As a general rule (not universal! duh:) a webpage's content increases with its directory depth; besides, spammers think that webpages in the home directory get a ranking boost and are more likely to be indexed, so they often put their doorway pages there. This useful feature offers a handy way of getting rid of those annoyances... excluding the root directories' pages!
Example: title:german hear feature:audio -depth:0
As most pages are located in directories with a depth less than or equal to four, this feature gives a good estimate of how many documents are in Inktomi's database: depth:4.
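To see what depth a given URL would have under the definition above (number of subdirectories between the host's root and the file), here is a small Python sketch. It encodes my reading of that definition, not Inktomi's actual rule, and example.com is a placeholder host.

```python
from urllib.parse import urlparse


def url_depth(url: str) -> int:
    """Count the directories between the host's root and the file in `url`.

    Assumption: depth is simply the number of subdirectory names in the path.
    """
    path = urlparse(url).path
    # Drop the last segment (the file name, or "" after a trailing slash)
    # and count the non-empty directory names that remain.
    directories = [part for part in path.split("/")[:-1] if part]
    return len(directories)


for url in (
    "http://www.searchlores.org/news.htm",  # file in the root directory
    "http://example.com/a/b/page.html",     # placeholder URL, two directories
):
    print(url_depth(url), url)
```

Under this reading, pages reported with depth 0 are the ones that -depth:0 throws away.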
domain: Restricts a search to the selected domain. Domains can be specified up to three levels deep. Examples: domain:org, domain:searchlores.org, domain:www.searchlores.org. Don't take HotBot's numbers at face value... for each query HotBot only shows two/three pages per site, which is a good antispam measure. If you want to see all indexed pages from a given site, you must do your search at PositionTech: domain:searchlores.org.
feature:acrobat Searches for pages that link to a PDF file. Compare the queries:
Example: "link structure" feature:acrobat. As PDF files may not have been indexed for some reason (examples: robots.txt, robots meta tags), this feature may provide some interesting results.
feature:activex Detects pages containing embedded ActiveX, i.e. the presence of the tag <object ... classid="clsid:... >, compare:
feature:applet Detects <applet ...> tag in HTML, compare:
feature:audio Detects if a page links to an audio file. Audio files could be among others: wav, mp3, m3u, mid, midi, au, snd, ... The link could be in a:
feature:flash Contrary to what we could expect, Inktomi doesn't detect the existence of the tag <embed ...>, compare:
feature:form Inktomi's crown jewel. Detects the <form> tag in HTML. Inktomi may not index the hidden web, but it offers you a way of knowing where the front doors are! For instance you can use Inktomi to find law databases, translation services: dutch english translate url feature:form, etc.
feature:frame Detects pages containing frames.
feature:homepage Restricts your search to personal pages (identifier ~). Very useful, because the ~ is still the convention for personal pages on educational sites. Example: web search feature:acrobat feature:homepage.
feature:image Detects <img src=...> tag in HTML or a link to an image.
("bird of paradise" OR "birds of paradise") AND (papua OR "new guinea") AND feature:image -stem:travel -stem:hotel
Images are widely used for aesthetic reasons. If an HTML webpage doesn't contain images you may wonder if there's a hidden agenda... probably it's a cloaked/spammed page by a spammer putting in only keywords n' links and not taking the trouble to build a real webpage. You can often trash those annoyances using this useful feature!
feature:index Restricts your search results to the host's top page. Very useful for finding sites about a given theme! The host's homepage is the site's most valuable real estate: there the site's owner should put a summary of what his site is all about and provide links to his most important pages. Example, searching for FTP search engines: ftp search feature:index feature:form. Inktomi indexes approximately 27,927,941 webhosts, cf.: feature:index.
feature:javascript Detects pages containing the <script ...> tag with the attribute language="javascript", compare:
title:german exercises feature:javascript feature:form
Spitze!
feature:meta Detects <meta ...> tags in HTML.
feature:shockwave Detects pages containing links to files with extension dcr, dir, fla, spl or swf, compare:
feature:script Detects <script ...> tag in HTML, in particular detects other script languages than javascript (for instance VB script), compare:
feature:table Searches for pages containing the <table ...> tag. Tables are widely used to control a page's layout and... ahem, to build tables! If an HTML webpage doesn't contain tables you may wonder if there's a hidden agenda... probably it's a cloaked/spammed page by a SEO fearing that some search engines may not fully support tables, or by a spammer putting in only keywords n' links and not taking the trouble to build a real webpage. You can often trash those annoyances using this useful feature!
feature:title Detects pages containing the <title> tag. As almost all webpages contain a title, this feature gives a good estimate of how many documents are in Inktomi's database. Cf.: feature:title.
feature:video Search for pages linking to video files (file extensions: avi, mpg, mpeg, mov, etc.). Example: title:chaplin feature:video. Videos embedded with the tags <embed src=...> or <img dynsrc=...> are not detected. Compare:
feature:vrml Search for pages containing a link to a vrml file (wrl, wrz, vrml). Compare:
Inktomi is unable to see embedded vrml files. Compare:
link: Finds pages which contain hypertext links to the exact specified URL. Example: link:http://www.searchlores.org/news.htm. It could also be a link to a parent directory: link:../.
linkdomain: Search for pages linking to any page in a given site, example: linkdomain:www.searchlores.org -domain:searchlores.org.
linkextension: Very useful. Searches for pages linking to a file with a specified extension. As what characterizes a file is its extension, this feature provides a way of getting only the pages which link to the real thing.
originurl: Webmasters use this operator to see if a page is in Inktomi's database. Example: originurl:http://www.searchlores.org/news.htm (don't forget the http:// part! duh:). Can also be used to anchor a page and study what went wrong with Inktomi's ranking algo.
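Since a forgotten http:// is the classic mistake here, a tiny Python sketch that always adds it may help. The function name originurl_query is my own invention; the only thing taken from the text is that the operator wants the full URL including the scheme.

```python
def originurl_query(url: str) -> str:
    """Build an originurl: query, adding http:// when the scheme is missing.

    Hypothetical helper for illustration; it does not contact any search engine.
    """
    if "://" not in url:
        url = "http://" + url
    return "originurl:" + url


print(originurl_query("www.searchlores.org/news.htm"))
```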
originurlextension: Very useful, restricts your search to documents with a given extension. Example: web search originurlextension:pdf.
originurlpath: The same as path:.
outgoingurltype:[url_type] Search for pages linking to a certain mime type. Example: outgoingurltype:image/jpeg. Does more or less the same as linkextension:.
path: Searches for words in the URL's path. You can also search for phrases, but the syntax isn't the one we would expect (path:"keyword1 keyword2"); instead it is "path:keyword1 path:keyword2". Example: path:fravia.
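Because the phrase syntax is so counterintuitive, here is a short Python sketch that produces it from an ordinary phrase. The helper field_phrase is hypothetical, and I am assuming that a single word needs no quotes, as in the path:fravia example.

```python
def field_phrase(field: str, phrase: str) -> str:
    """Prefix every word with `field:` and quote the whole run.

    Hypothetical helper: it only formats the query string described above.
    """
    words = phrase.split()
    if not words:
        raise ValueError("phrase must contain at least one word")
    tagged = " ".join(f"{field}:{word}" for word in words)
    # A single word needs no quotes; a phrase is quoted as a whole.
    return f'"{tagged}"' if len(words) > 1 else tagged


print(field_phrase("path", "fravia"))
print(field_phrase("path", "keyword1 keyword2"))
```

The same formatting applies to any other field that follows this convention, so the field name is left as a parameter.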
region:name Restricts your search to a geographical region: africa, asia, centralamerica, downunder (Oceania), europe, mediterranean, mideast (Middle East), northamerica, southamerica, southeastasia. Example: stem:laws noise stem:levels region:europe. You can find which countries are included in each region here. This field can also be used to get an estimate of how many documents are in Inktomi's database:
region:africa | 1,821,859 documents
region:asia | 35,037,863 documents
region:centralamerica | 651,599 documents
region:downunder | 11,232,997 documents
region:europe | 190,119,720 documents
region:mediterranean | 1,611,480 documents
region:mideast | 2,351,964 documents
region:northamerica | 524,553,025 documents
region:southamerica | 13,688,268 documents
region:southeastasia | 2,113,385 documents
Total: | 783,182,160 documents
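The total is just the sum of the ten regional counts, which is easy to check with a few lines of Python. The figures are copied from the table above; the variable names are mine.

```python
# Document counts per region: query, as listed in the table above.
region_counts = {
    "africa": 1_821_859,
    "asia": 35_037_863,
    "centralamerica": 651_599,
    "downunder": 11_232_997,
    "europe": 190_119_720,
    "mediterranean": 1_611_480,
    "mideast": 2_351_964,
    "northamerica": 524_553_025,
    "southamerica": 13_688_268,
    "southeastasia": 2_113_385,
}

total = sum(region_counts.values())
print(f"{total:,} documents")  # prints "783,182,160 documents"
```

Note that the sum is only an estimate of the database size if every document is assigned to exactly one region, which is an assumption I cannot verify.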
stem: Searches for documents containing grammatical variants of a word, including plural, singular and tense. Example: web search -stem:advertise -stem:business -stem:christ -stem:game -stem:genealogy -stem:host -stem:hotel -stem:job -stem:offer -stem:position -stem:product -stem:service -stem:shop -stem:travel.
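Typing a long tail of -stem: exclusions by hand gets tedious, so here is a Python sketch that appends them to a base query. The function name exclude_stems and the idea of keeping a reusable list of commercial stems are my own; the stems themselves come from the example above.

```python
from typing import Iterable


def exclude_stems(query: str, stems: Iterable[str]) -> str:
    """Append one -stem:word exclusion per entry (hypothetical helper)."""
    return " ".join([query] + [f"-stem:{stem}" for stem in stems])


# A few of the commercial stems used in the example above.
commercial_noise = ["advertise", "business", "hotel", "travel"]
print(exclude_stems("web search", commercial_noise))
```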
title: Searches for words in the title. You can also search for phrases, but the syntax isn't the one we would expect (title:"keyword1 keyword2"); instead it is "title:keyword1 title:keyword2". Example: "title:index title:of" -originurlextension:htm -originurlextension:html