|
version 0.12a | |
Translation, by fravia+. 1st published in March 2009. How to search the Web in languages you don't know |
|
lore |
| Google Translate | Googlex | EurLex | Eurodicautom/Iate | | Babylon | Systran | MS-own | Yahoo's own | Lec Translator | rikai | |
Language barriers & language magic boots |
Přes štěstí vidění tak slepý, přes dar slyšení tak hluchý |
The luck of being able to see and, notwithstanding, so blind. The joy of being able to hear and, notwithstanding, so deaf |
Ancient Seekers' lore |
"Rather than argue about whether this algorithm is better than that algorithm,
all you have to do is get ten times more training data. And -look and behold- all of a sudden, the worst algorithm is performing
better than the best algorithm on less training data" (Peter Norvig) |
"Most state-of-the-art commercial machine translation systems in use today have been developed
using a rules-based approach and require a lot of work by linguists to define vocabularies and grammars.
Several research systems, including ours, take a different approach: we feed the computer with billions of words of text,
both monolingual text in the target language, and aligned text consisting of examples of human
translations between the languages. We then apply statistical learning techniques to build a translation model"
(Franz Och)
|
A small panoramic view |
Da real beefz
Fundamental tool for "unknown language searching". Translates from the following languages: Albanian, Arabic, Bulgarian, Catalan, Chinese (Simplified), Chinese (Traditional), Croatian, Czech, Danish, Dutch, English, Estonian, Filipino, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Maltese, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Thai, Turkish, Ukrainian, Vietnamese.
Since Autumn 2007, Google seems to have dispensed entirely with the rules-based system (which was provided by Systran) and offers translations using its own statistical method for all of its languages. With the recent additions of Turkish, Thai, Hungarian, Estonian, Albanian, Maltese (sic!) and Galician, the total number of languages of Google Translate has reached 41 (counting the two "Chineses" as one), which in March 2009 adds up to the astonishing level of 1,640 language couples (41*40).
A "translated search" option will allow you to get automatically translated pages relevant to your search: http://translate.google.com/translate_s?hl=en Obviously your choice of the target language/geographic area/community should depend on your searchstring and targets, as explained in the lore of local searching. For instance: http://translate.google.com/translate_s?hl=en&clss=&q=rapidshare+uploading+depositfile&tq=&sl=en&tl=uk
Some minor glitches are still present on this (crippled) public version of google's own Babelfish. Try for instance: moi je cherche un tas de choses bizarres sur la toile and see how the tense gets lost in "I sought a lot of strange things on the web". I personally find this tool particularly useful for languages that are really awkward (for me), like Vietnamese, Japanese, Thai, Korean or Chinese.
Also check the limits of automatically translated web pages by choosing English to French (or using "swap") and then inputting, for instance, http://www.searchlores.org/longtermsearching.htm
Finally, for your own intranets' playing pleasure, you'll find an ad-hoc google "Ajax" translation mask on its ad-hoc page. It's all simple javascript, so you can port the whole mask wherever you want. |
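For those who prefer to build such requests by hand (or by script), here is a minimal sketch of the 41*40 arithmetic and of the "translated search" URL quoted above. The function names are hypothetical, the parameter layout (hl, clss, q, tq, sl, tl) is simply copied from the 2009 example URL, and nothing here assumes that the endpoint still answers today.

```python
from urllib.parse import urlencode

# Base address copied from the example URL in the text (2009 layout).
TRANSLATE_SEARCH = "http://translate.google.com/translate_s"


def language_pairs(n_languages: int) -> int:
    """Ordered couples: every language into every other language."""
    return n_languages * (n_languages - 1)


def translated_search_url(query: str, source: str = "en", target: str = "uk") -> str:
    """Rebuild the example URL; clss and tq are left empty as in the text."""
    params = [
        ("hl", "en"),
        ("clss", ""),
        ("q", query),      # urlencode turns spaces into '+'
        ("tq", ""),
        ("sl", source),    # language you type in
        ("tl", target),    # language of the pages you want searched
    ]
    return TRANSLATE_SEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    print(language_pairs(41))
    print(translated_search_url("rapidshare uploading depositfile"))
```

Changing only `target` (say from "uk" to "vi") is the whole trick: same searchstring, different linguistic hunting ground.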
A gazillion of human made translations
Use google (instead of the slow Eur-Lex search masks) to quickly find a gazillion EU translations. I am proud to have devised this simple "googlex" (google+eur-lex) mask: despite its obvious simplicity, it turned out to be an incredibly versatile and powerful translation web-tool, useful for anyone dealing with web-searches (and, more generally, translations). In fact, once you have found your target, if you choose to fetch google's cached copy instead of the original document on the eur-lex server you'll cut two mustards with one stone :-) You'll have an automatic highlight of your original searchstring AND you'll probably get the document itself much more quickly. More detailed instructions are available on its ad-hoc page: googlex.htm
|
Have bilingual display, will understand
Following a European Parliament resolution of 19 December 2002, access to "Celex" (Communitatis Europeae LEX), the old "legal" database of the European institutions, has been free of charge since 1 July 2004. The result of its merger with the EUR-Lex portal is the new system, also named EurLex. Visit the EUR-Lex search mask, but check also the googlex tool discussion about searching the same documents through google's servers.
Official Journals can for instance quickly be gathered: http://eur-lex.europa.eu/JOIndex.do?year=2009&serieL&textfield2=11&Submit=Search&_submit=Search&ihmlang=en See? OJ L 2009/11: only those 4 parameters do change. If you want a specific section, you must know the page number in the OJ as well: http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2009:011:0083:0083:EN:PDF Self-explanatory URL, duh. Note that changing the page number will still give you the complete pdf file of the specific OJ subsection.
If you want a bilingual display, though, nothing beats the Eurlex bilingual facility. For instance OJ L 2009/11 (in English and starting at page 6): http://eur-lex.europa.eu/Notice.do?mode=dbl&lng1=en,es&lang=&lng2=bg,cs,da,de,el,en,es,et,fi,fr,hu,it,lt,lv,mt,nl,pl,pt,ro,sk,sl,sv,&val=486721:cs&page=1&hwords=Commission+Directive+2009%2F2%2FEC+of+15+January+2009+amending%7Efor+the+purpose+of+its+adaptation+to+technical+progress%7Efor+the+31st+time%7ECouncil+Directive+67%2F548%2FEEC+on+the+approximation+of+the+laws%7Eregulations+and+administrative+provisions+relating+to+the+classification%7Epackaging+and+labelling+of+dangerous+substances%7E Note the mode=dbl&lng1=en,es, here for English into Spanish. |
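Since the LexUriServ address above is so regular, it can be assembled mechanically. What follows is only a sketch: the helper names are hypothetical, and the zero-padding widths (three digits for the OJ number, four for the pages) as well as the reading of the two page fields as "first page" and "last page" are assumptions inferred from the single example OJ:L:2009:011:0083:0083:EN:PDF, not from any EUR-Lex documentation.

```python
from typing import Optional

# Base address copied from the example URL in the text.
LEXURISERV = "http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri="


def oj_pdf_url(series: str, year: int, number: int, first_page: int,
               last_page: Optional[int] = None, lang: str = "EN") -> str:
    """Build an Official Journal PDF address in the 2009 URL style.

    Padding widths and the meaning of the two page fields are guessed
    from one example only (assumption).
    """
    if last_page is None:
        last_page = first_page
    uri = (f"OJ:{series}:{year}:{number:03d}:"
           f"{first_page:04d}:{last_page:04d}:{lang}:PDF")
    return LEXURISERV + uri


def bilingual_params(left: str, right: str) -> str:
    """The query fragment that selects the two languages shown side by side."""
    return f"mode=dbl&lng1={left},{right}"


if __name__ == "__main__":
    print(oj_pdf_url("L", 2009, 11, 83))
    print(bilingual_params("en", "es"))
```

Swapping "es" for any other code from the lng2 list in the Notice.do example (bg, cs, da, de and so on) should give the corresponding language pair, provided the server still follows this scheme.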
Old but sturdy one-term translation tool
The old Eurodicautom EU-translators' database is now freely available in its "iate" (Inter-Agency Terminology Exchange) implementation. Eurodicautom covers at the moment both the "old" and "new" EU-languages (see above for a complete list). Not really a comprehensive dictionary: it is focused on technical language. The server it is hosted on is probably (as of early 2009) a pentium III held together by rubber bands and small pieces of adhesive tape: it might at times work with sufficient speed... if you are lucky.
The four original languages of Eurodicautom were Dutch, French, German and Italian, to which Danish and English were added in 1973, Greek in 1981, Portuguese and Spanish in 1986, Finnish and Swedish in 1995 and so on. Latin is also (sporadically) present. It contains millions of terms and hundreds of thousands of abbreviations. All current official EU-languages are fed in, because this free tool aims at meeting the (patently nonsensical yet politically motivated) European Union wish of conferring the same recognition on each language... as if languages were really equal :-)
Let's try as an example to fetch all the available translations for the term "search": http://iate.europa.eu/iatediff/SearchByQuery.do?method=search&saveStats=true&screenSize=1680x1050&query=search&valid=Search+&sourceLanguage=en&targetLanguages=s&domain=0&typeOfSearch=s&request= Notice: 1) &sourceLanguage=en for English; 2) &targetLanguages=s for "any" target language (but you can specify only some EU-languages); and 3) the query itself, which is here the term "search" in &query=search, a term that you can of course change on the fly inside Opera's address bar with a different query. Why the developers insisted so much on masking the real http:// request that their script sends to the server beats me :-) Also note that you could use optional useful "domain" criteria... for instance limiting the query to "3236 Information technology and data processing" if need be.
As someone (rightly) wrote many years ago: "The original logic was that Eurodicautom would ensure consistent translation and usage across European institutions, but the database is really too large to meet that goal. A great many terms go into it, but there isn't much effort to check how much those terms are actually being used. As neat as huge term databases are, quality control is the real Achilles heel of this concept. Where possible, it's better to have a small database of known good terms than a big database of terms of unknown quality - although it's better still to have both. I tend to think of this sort of database as a last-ditch resource - cheaper than original term research and better than making something up, but definitely a source of questionable quality"
Real life translators dealing with bureaucratic papers might also be interested in the eurovoc thesaurus (that you can download for free). |
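The "unmasked" iate request above lends itself to a tiny script, so that the term and the languages can be changed without touching the search mask at all. A minimal sketch follows, with a hypothetical function name; every parameter is copied from the 2009 example URL, and passing a domain number such as "3236" through the domain parameter is an assumption (the text only shows domain=0).

```python
from urllib.parse import urlencode

# Base address copied from the example URL in the text (2009 layout).
IATE_SEARCH = "http://iate.europa.eu/iatediff/SearchByQuery.do"


def iate_url(term: str, source: str = "en", targets: str = "s",
             domain: str = "0") -> str:
    """Rebuild the example request, keeping its parameter order."""
    params = [
        ("method", "search"),
        ("saveStats", "true"),
        ("screenSize", "1680x1050"),   # copied as-is from the example
        ("query", term),               # the term to translate
        ("valid", "Search "),          # trailing space becomes '+'
        ("sourceLanguage", source),    # "en" for English
        ("targetLanguages", targets),  # "s" means any target language
        ("domain", domain),            # "0" as in the example; other values are a guess
        ("typeOfSearch", "s"),
        ("request", ""),
    ]
    return IATE_SEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    print(iate_url("search"))
```

Whether the server really needs saveStats, screenSize and the other decorative parameters is not known; they are kept here only so that the result matches the original request character for character.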
A cracked proprietary dream
Babylon dictionaries (only covering "important" languages). These can be downloaded for free, and theoretically you would need the babylon program to use them. Yet theory and practice often differ, especially in our web-netherworld, and thanks to Bilbo and many others among the world's finest reversers at woodman's messageboards it is now relatively easy to port every Babylon book to GNU/Linux in its COMPLETE FORM (if necessary obtorto collo :-) using some small ad hoc scripts.
Of course you could go for the complete book instead, but in that case check the laws of your country (or the country of your proxy): some of the findings of your queries might in fact not be in the public domain, despite their massive and widespread presence on the web (some patent-obsessed clowns really seem to believe they can put toothpaste back inside the tubes). So never do anything illegal on the web: there's mostly no need whatsoever anyway. The real lesson here is for the patent-obsessed: "If you try to fence off your little corner of the Internet, you’re better off herding cats". Note in fact that some countries, like S.Marino or Somalia, and their proxies, didn't adhere to the various bogus conventions of the patent holders' mafia. |
Another proprietary-software dream
Systran is a very old translation system that works well, sometimes, for some language couples. It is a commercial proprietary solution, though, and of course there's no reason whatsoever to pay for results you can have for free -and often better- elsewhere. They should offer it for free (instead of just letting lusers leech it all over the web), with source code and everything, let it be ported to GNU/Linux (it is obscenely geared towards windows and various Microsoft-crapola products like word) and so get it improved and finetuned by better programmers than those they seem to have in-house. This would give them a (weak) chance against the google bulldozer.
Language pairs:
English - Arabic, Chinese, Dutch, French, German, Greek, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Swedish
French - Dutch, German, Greek, Italian, Portuguese, Spanish
Italian - German, Portuguese
Portuguese - German
Spanish - German, Italian, Portuguese
It's not that bad, after all. Try our text=moi+je+cherche+un+tas+de+choses+bizarres+sur+la+toile and you'll get an almost correct "me I seek a lot of odd things on the web". This said, its freely accessible mask for direct text translation OR webpage translation could come in handy, at least as a simple "poor man" proxy (see above). |
Nice try, pity it's still behind
Microsoft must have thought they HAD to compete with google in this sector too. A pity that they still seem to be far behind the horizon. There's now a new domain (delivered by akamai's sniffing and censorship-prone clowns): http://www.microsofttranslator.com/, rumored to be slightly quicker. Languages available are English to/from: Arabic, Chinese Simplified, Chinese Traditional, Dutch, French, German, Italian, Japanese, Korean, Polish, Portuguese, Russian (RUS ==> EN only) and Spanish. Also Chinese Simplified <==> Chinese Traditional.
If you click on a "Translate this page" link in the Live Search results or enter a web address in the web page translation box, translations will be presented to you in the "Bilingual Viewer," providing easy access to the original web page and its translation. There are also some interesting view-options, like hovering translation or hovering original. Alas! Not all pages will be translated: following a typically "microsoft-moronic" approach, this interesting feature is -purposely- optimised for the dangerous (in fact: masochistic) MS-Explorer browser and works badly with all real browsers (Worse! It does not work at all with the best browser on the planet: opera).
Try our "moi+je+cherche+un+tas+de+choses+bizarres+sur+la+toile" and you'll get a slightly "tarzan-like" text with the dubious choice of "look" over "search": "me look a lot of strange things on the Web". This said, its freely accessible mask for direct text translation OR webpage translation could come in handy, at least as a simple "poor man" proxy (see above). |
Average, again: yahoo style
Altavista's old machine translation service, repackaged by Yahoo, powered by Systran. Again, the "Babelfish" name was chosen. A futile attempt, by yahoo, to keep up with the google bulldozer. Try our "moi+je+cherche+un+tas+de+choses+bizarres+sur+la+toile" and you'll get a horrific "me I seek an odd heap of things on the fabric". This said, its freely accessible mask for text and webpage translation is not that bad and could come in handy, at least as a simple "poor man" proxy (see above). |
Below average commercial crap
Lec's (Language Engineering Company) translator works -somehow- for Japanese/English and English/Japanese. It also has Pashto (quite useful if you are roaming around between Kabul, Kandahar and Bahawalpur :-) For some other language combinations -frankly- it is not even funny :-( Luckily they provide a demo that will show all its limits. Our "moi je cherche un tas de choses bizarres sur la toile" example will result in the rather outlandish "me I look for a heap of bizarre things on canvas". Again this just proves how, on the web, free products (mostly) beat commercial products black and blue. This of course also happens elsewhere, in the software world (see GNU/Linux versus Windows) and in the real one (see realicra/aquafina_and_dasani.htm). |
Japanese-English hovering translation
Among "hovering translators", Rikai is (for Japanese to English) surely one of the best free scripts available. You'll just have to use Opera's excellent "right click" ==> "block content" feature in order to eliminate its obnoxious advertisements. Try it out at http://www.rikai.com/perl/Home.pl (pass the cursor over some Japanese symbols and enjoy :-) Of course the whole point is to use it on sites you happen to find during your searches. In fact the rikai perl scripts work quite well when applied to a Japanese site you want to understand: here for instance comics.shogakukan.co.jp. Note how clicking on any link will bring your "hovering rikai" feature along with you. |
(When you are in a hurry) |