Talk:Machine translation

	This article is within the scope of the WikiProject Translation Studies, a collaborative effort to expand, improve and standardise the content and structure of articles related to Translation Studies. If you would like to participate, you can edit this article, or visit the project page, where you can join the project and see a list of objectives.
Top	This article has been rated as Top-importance on the project's importance scale.

Computing Low‑importance

	This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Low	This article has been rated as Low-importance on the project's importance scale.

Linguistics: Applied Linguistics Low‑importance

	This article is within the scope of WikiProject Linguistics, a collaborative effort to improve the coverage of linguistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Low	This article has been rated as Low-importance on the project's importance scale.
	This article is supported by Applied Linguistics Task Force.

Untitled

Bill 21:12, 16 March 2006 (UTC)[reply]

If you guys don't mind, I'm going to make some changes to the history section of this article some time in the next week. I'm going to add some material about the work of David G. Hays. He led the MT effort at RAND back in the 50s and 60s, was one of the authors of the ALPAC report, wrote the first textbook in computational linguistics, was instrumental in founding the Association for Computational Linguistics and so forth. I'll be adding an article about Hays himself, then I'll link it to this article, and, as I said, make some changes to the history section.

@Diderot 2001:44C8:440D:F5BF:1:1:CC00:80AF (talk) 20:57, 20 January 2024 (UTC)[reply]

What is all this new stuff in the last two weeks? Most of it has little or no value and none of it makes a hint of sense in an encyclopedia. A lot of it is copyright violation, and even if it wasn't it still doesn't belong here in that form. Deleted most all of it, couldn't figure out exactly which version to revert to. Diderot 19:49, 6 May 2004 (UTC)[reply]

Fact tagged

The PaTrans system requires both manual pre- and post-editing, but the monthly output is still approximately 400,000 words per operator. [citation needed]

Suggestion

Add "Speech Recognition" and "Computational Linguistics" Wikimedia links under "See Also"

Create pages for each of the following organizations (with external links) under "See Also" and linked to "Machine Translation" page.

International Association for Machine Translation (IAMT) http://www.isi.edu/natural-language/organizations/IAMT-bylaws.html

Association for Machine Translation in the Americas (AMTA) http://www.amtaweb.org/

Several links on this page are not working, typically the PDF links: Machine_translation_in_foreign_language.pdf, live_3540...jair.pdf, ...LEPOR.pdf etc. It would be good if they were restored correctly. - Preceding unsigned comment added by 112.133.233.109 (talk) 04:14, 15 December 2020 (UTC)[reply]

European Association for Machine Translation (EAMT) http://www.eamt.org/

Asia-Pacific Association for Machine Translation (APAMT) http://www.aamt.info/

Association for Computational Linguistics http://www.aclweb.org/

under:See also

I added:

You may wish to remove it, but it seemed to need addressing: the whole category of language translators available freely and easily on the web, without need of so much as a download.

Cross-Translation Tool comparing SYSTRAN powered services (Babel Fish, Google Language, etc) with other translation services

My thanks for the indulgence is sincere.

Examples

If no one objects, I'm going to make the Examples into a separate page, maybe

History

does anyone else think this is a joke?

Although there is no system that provides the holy-grail of "Fully automatic high quality machine translation" (FAHQMT), many systems provide reasonable output.

FAHQ MT? Who came up with this acronym? Isn't there a less dramatic and confusing way to summarize that machine translation isn't as high quality as human translation, but produces practical results? - enjone

I think that the actual acronym is FAHQT (Fully Automated High Quality Translation) and dates back to the 1950s. At least Yehoshua Bar-Hillel uses this acronym in his 1960 article The Present Status of Automatic Translation of Languages, in which he refers to his earlier work presented in 1952 at the first machine translation conference. I couldn't get my hands on that article anywhere, but I'd wager that it is where the term was first coined, as suggested by John Hutchins in his 2001 article Machine translation over fifty years (and a number of other articles by him). - Preceding unsigned comment added by 78.27.78.126 (talk) 11:57, 31 October 2009 (UTC)[reply]

http://www.isi.edu/natural- 2806:263:8401:751:981E:5E5C:403E:E860 (talk) 09:08, 6 July 2024 (UTC)[reply]

Language Weaver

Language Weaver, developed in 2005, already translates general text from Spanish to English with a high degree of accuracy, rivaling that of weaker human translators.

I'd love to see a source for this. What metrics they used for the evaluation etc. I don't doubt that LW is a damned fine product (it has Kevin Knight working on it!) :), but for such a large claim we'd need some source. - FrancisTyers 09:11, 1 March 2006 (UTC)[reply]

Hello, please help. For 2-3 months now, machine translation into Russian has not been working on Wikipedia Rossiyanka (talk) 15:18, 27 February 2022 (UTC)[reply]

Commercial software

Edited on 20:03, 20 February 2006 FrancisTyers (→Commercial software - google trans is just repackaged systran)

Not anymore. The one I quoted: http://translate.google.com/translate_t was developed by Franz Och in Google Labs: http://googleblog.blogspot.com/2005/08/machines-do-translating.html

Galilite, 1 March 2006

Cool! :) Feel free to re-add this, it might be worth noting on the page that this is an example of a stat mt package available online? - FrancisTyers 12:50, 1 March 2006 (UTC)[reply]

I've done some testing and read the links and I remain unconvinced, at least for the language pairs that are not labelled BETA:

This is a paste from the BBC "This bill brings forward root-and-branch reform I promised ensuring we have a far more comprehensive and co-ordinated system.":

SYSTRAN:

« Cette facture apporte la réforme vers l'avant de racine-et-branche que je nous ai promis s'assurant ai un système bien plus complet et plus coordonné. »

GOOGLE:

"cette facture apporte la réforme vers l'avant de racine-et-branche que je nous ai promis s'assurant ai un système bien plus complet et plus coordonné."

BABELFISH:

"cette facture apporte la réforme vers l'avant de racine-et-branche que je nous ai promis s'assurant ai un système bien plus complet et plus coordonné."

That googleblog link you pasted doesn't specifically say that translate.google.com is using this statistical method for all translations. I strongly suspect that only the pairs labelled "BETA" on translate.google.com are using the statistical methods. Perhaps this could be pointed out. - FrancisTyers 13:03, 1 March 2006 (UTC)[reply]
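For anyone who wants to repeat this kind of side-by-side check, here is a minimal sketch in Python. It uses only the three outputs pasted above; the `normalize` helper and its rules (dropping surrounding quotation marks and ignoring case) are my own assumptions about which differences count as cosmetic, not anything the services document.

```python
import unicodedata

def normalize(text: str) -> str:
    """Hypothetical helper: drop surrounding quotes/guillemets and ignore case."""
    text = unicodedata.normalize("NFC", text)
    return text.strip(" \t\n\"«»").casefold()

# The French sentence as pasted above, minus its first letter, so the
# capitalised and lower-case variants can be rebuilt exactly as quoted.
tail = ("ette facture apporte la réforme vers l'avant de racine-et-branche "
        "que je nous ai promis s'assurant ai un système bien plus complet "
        "et plus coordonné.")

outputs = {
    "SYSTRAN": "« C" + tail + " »",
    "GOOGLE": '"c' + tail + '"',
    "BABELFISH": '"c' + tail + '"',
}

# Group the services by their normalized output.
distinct = {normalize(text) for text in outputs.values()}
print(len(distinct), "distinct translation(s) among", len(outputs), "services")
```

Identical output on one sentence is only suggestive; a proper comparison would need many sentences per language pair.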

FrancisTyers, thanks for welcoming me, correcting me and testing their system. I suspect their announcement and the examples with Al Quaeda stuff were pure publicity. Maybe their own MT hasn't left the lab yet. I suggest that, instead of giving the link to their system, we quote their blog in another section; after all, it seems a significant development. In general, great job cleaning up all this stuff, thanks! - Galilite 09:28, 2 March 2006 (UTC)[reply]

No problem, this article needs a lot of work, and when I've got some more time and experience I'm going to have a shot at it. If you are interested in machine translation, there are loads of papers at http://www.mt-archive.info :) - FrancisTyers 10:43, 2 March 2006 (UTC)[reply]

Yep, actually I'm the one who added it to the list :-) . Galilite 21:36, 2 March 2006 (UTC)[reply]

External links

Time to tidy up the external links section. More coming soon... Note Wikipedia:External links - FrancisTyers 14:51, 31 March 2006 (UTC)[reply]

I've made a start, if anyone disagrees, feel free to comment below. - FrancisTyers 15:11, 31 March 2006 (UTC)[reply]

Qwika — a multiple language search engine of Wikipedias — in beta as of Feb 17,2006

Delete, Not specifically related to machine translation. - FrancisTyers 14:54, 31 March 2006 (UTC)[reply]

Wikimedia Machine Translation Project

Delete, Not the place for advertising this. - FrancisTyers 14:54, 31 March 2006 (UTC)[reply]

Machine Translation, an introductory guide to MT by D.J.Arnold et al. (1994)

Keep, the MTBOOK is good. - FrancisTyers 14:54, 31 March 2006 (UTC)[reply]

Machine Translation Archive by John Hutchins. An electronic repository (and bibliography) of articles, books and papers in the field of machine translation and computer-based translation technology

Keep, Invaluable resource. - FrancisTyers 14:54, 31 March 2006 (UTC)[reply]

Machine translation (computer-based translation) — Publications by John Hutchins

Delete, Unless anyone can give a better reason, mt-archive has most of this stuff I think. - FrancisTyers 14:58, 31 March 2006 (UTC)[reply]

Francis, first - great job tidying it up, but - John Hutchins is (the only) de-facto chronicler of MT and deserves a special entry. This site is a separate one and many of its publications do not appear in MT archive. Galilite 00:12, 5 April 2006 (UTC)[reply]

I checked out the site again, and I agree with you, I'll restore it. I didn't realise it had different stuff from the mt-archive site :) And I agree, he should have his own article, I keep meaning to write one but am having trouble finding any biographical information. - FrancisTyers 09:46, 5 April 2006 (UTC)[reply]

European Association for Machine Translation: EAMT, non-profit org

Delete, Not really necessary, as we have a page. - FrancisTyers 14:54, 31 March 2006 (UTC)[reply]

Association for Machine Translation in the Americas: AMTA, non-profit org

Delete, As above, if we don't already have a page we should have. - FrancisTyers 14:54, 31 March 2006 (UTC)[reply]

Talking to Strangers, an article about MT from Wired; MT Past and Future (timeline); Universal Translators — A look at the hubs for machine translation R&D worldwide

Delete, Crystal ball gazing mostly :) - FrancisTyers 14:58, 31 March 2006 (UTC)[reply]

Machine translation (MT), and the future of the translation industry by Yves Champollion

Delete, Although the article probably has useful stuff that could make this article better. - FrancisTyers 14:58, 31 March 2006 (UTC)[reply]

http://www.csmonitor.com/2005/0602/p13s02-stct.html Experiment in machine learning based translation by Google

Delete, Not particularly informative. - FrancisTyers 15:11, 31 March 2006 (UTC)[reply]

http://www.nist.gov/speech/tests/mt/mt05eval_official_results_release_20050801_v3.html NIST 2005 Machine Translation Evaluation Official Results
http://reverent.org/sounds_like_faulkner.html Sounds like Faulkner

Delete, Humorous maybe, but not encyclopaedic. - FrancisTyers 14:58, 31 March 2006 (UTC)[reply]

Machine Translation and Minority Languages

Keep, For now... - FrancisTyers 14:58, 31 March 2006 (UTC)[reply]

Compendium of Translation Software — Directory of commercial machine translation systems and computer-aided translation support tools

Delete, Might be good to have an OpenDir link maybe? - FrancisTyers 14:58, 31 March 2006 (UTC)[reply]

Ehm... This is an official compendium of EAMT, the most influential MT body. I think OpenDir would be less comprehensive. - Galilite 00:12, 5 April 2006 (UTC)[reply]

I'd tend to agree, but unfortunately you have to pay for it :( Btw, will you be attending the conference in Norway? - FrancisTyers 09:46, 5 April 2006 (UTC)[reply]

Didn't notice that, sorry. Nope, I'm located down under, a bit too far. I am not affiliated with an academic institution, rather trying to get into commercial MT industry... - Galilite 00:15, 6 April 2006 (UTC)[reply]

Image

I created an image from a drawing in John Hutchins Introduction to Machine Translation, it is nice to have an image in the article, but I'm not sure how much it adds :) Btw, anyone can edit that image because it is created as an SVG, there is a free software program to edit it, see Inkscape. - FrancisTyers 10:23, 5 April 2006 (UTC)[reply]

Picture is worth a thousand words, and IMHO it belongs here. - Galilite 23:44, 5 April 2006 (UTC)[reply]

Footnotes

There are many kinds of footnotes, I prefer Footnotes3, so that is what I shall be using. If you wish to make some substantial contributions to the article please select whichever footnotes system you prefer. Please don't make edits which just change the footnotes system. Thanks >___> - FrancisTyers 08:39, 24 April 2006 (UTC)[reply]

Unfortunately, what you are saying goes against the established Wikipedia policy WP:OWN, which states that no one owns any articles, not even by virtue of being the major or even sole contributor. Please refer to the text at the bottom of every edit page on Wikipedia: If you don't want your writing to be edited mercilessly or redistributed by others, do not submit it. Although you may well have written a lot of this article, which referencing style you prefer isn't really relevant because of WP:OWN. Cite.php is demonstrably better and the majority of editors prefer it as a referencing style. --Cyde Weys 06:01, 25 April 2006 (UTC)[reply]

Thanks Cyde, I wasn't trying to be an asshole, I was just stating my preference, feel free to change it, but I will change it back when I am working on it. I don't think the majority of editors prefer it. - FrancisTyers 07:47, 25 April 2006 (UTC)[reply]

I just added an HTML comment requesting that people check on the talk page before changing the citation format; IMO, there's no need to alter the format on articles which have a regular editor who prefers another format; there are still thousands of articles where no-one actively prefers the current format, and to me, it's better to work on those first. Nice article, btw. JesseW, the juggling janitor 23:59, 26 April 2006 (UTC)

Thanks, I've rewritten the History section, and I'm hoping to do the rest when I get some more time :) - FrancisTyers 00:09, 27 April 2006 (UTC)[reply]

True, it'll be quite a while yet before we reach the situation where there are just a few "holdouts" of the older referencing style to figure out what to do with. But this article's near the top of the first page of Special:Whatlinkshere/Template:Ref, so it's probably going to see a lot of wikignomes like me stumbling across it until then (I was half a second away from clicking "save page" with the ref formatting updated when I saw the HTML comment myself). May lead to a lot of unfortunate reverting, all done in good faith. Bryan 07:22, 10 May 2006 (UTC)[reply]

Yeah, I have been in discussion with the developer to try and work a way around it, and other users have too, see the talk page on Wikipedia_talk:Footnotes. - FrancisTyers 12:19, 10 May 2006 (UTC)[reply]

Removed history section

The first attempts at machine translation were conducted after World War II. It was assumed at this time that the newly invented computers would have no trouble in translating texts. The reasoning was that computers were able to do complex mathematics quickly, something that humans did with more difficulty. On the other hand, even young children were able to learn to understand human language; therefore, computers could do the same. In actual fact, this belief was soon shown to be incorrect.

On 7 January 1954, the Georgetown-IBM experiment, the first public demonstration of a MT system, was held in New York at the head office of IBM. The demonstration was widely reported in the newspapers and received much public interest. The system itself, however, was no more than what today would be called a "toy" system, having just 250 words and translating just 49 carefully selected Russian sentences into English — mainly in the field of chemistry. Nevertheless it encouraged the view that MT was imminent — and in particular stimulated the financing of MT research, not just in the US but worldwide.

The first serious MT systems were used during the Cold War to parse texts in Russian scientific journals. The rough translations produced were sufficient to understand the "gist" of the articles. If an article discussed a subject deemed to be of security interest, it was sent to a human translator for a complete translation; if not, it was discarded. The governmental support was however cut down in 1966, after the report of ALPAC, a committee established in order to review the investments, which considered that machine translation, despite the expenses, was not likely to reach the quality of a human translator.

Although the ALPAC report had tremendous impact on research in machine translation, there were notable exceptions; SYSTRAN, for example, managed to attract commercial and defence/security customers and survived the decrease of direct governmental funding. Limited field of use systems have also been successful in a number of specialized applications, for instance the METEO System has been used in Canada since 1977 to translate weather forecasts from English to French and now translates close to 80,000 words a day or 30 million words a year.

The advent of low-cost and more powerful computers towards the end of the 20th century brought MT to the masses, as did the availability of sites on the Internet. They are of particular interest to countries in East Asia wishing to export to the North American and European markets.

Much of the effort previously spent on MT research, however, has shifted to the development of computer-assisted translation (CAT) systems, such as translation memories, which are seen to be more successful and profitable. Although the two concepts are similar, machine translation (MT) should not be confused with computer-assisted translation (CAT) (also known as machine-assisted translation (MAT)).

In machine translation, the translator supports the machine, that is to say that the computer or program translates the text, which is then edited by the translator, whereas in computer-assisted translation, the computer program supports the translator, who translates the text himself, making all the essential decisions involved.

Removed for new sub article, feel free to merge in stuff from this. - FrancisTyers 15:00, 25 April 2006 (UTC)[reply]

Removed from Users

It has been reported that in April 2003 Microsoft began using a hybrid MT system for the translation of a database of technical support documents from English to Spanish. The system was developed internally by Microsoft's Natural Language Research group. The group is currently testing an English-Japanese system as well as bringing English-French and English-German systems online. The latter two systems use a learned language generation component, whereas the first two have manually developed generation components. The systems were developed and trained using translation memory databases with over a million sentences each.

Probably true, but I'd like to see a citation. - FrancisTyers · 13:38, 7 June 2006 (UTC)[reply]

Interesting fact

I have copied the following from talk:Sans-culottes. It's not relevant there. It might be relevant here.- Jmabel | Talk 21:39, 11 September 2006 (UTC)[reply]

[Begin copied text]

"Culottes" is also french for "knickers" or "panties":

http://babelfish.altavista.com/tr?doit=done&intl=1&tt=urltext&trtext=panties&lp=en_fr&btnTrTxt=Translate ("panties" translated into French)

"sans" means "Without". —The preceding unsigned comment was added by 86.132.47.53 (talk • contribs) 8 September 2006.

[End copied text]

Standardized English Wikipedia

I have a suggestion for a machine-translatable Wikipedia. The current state of machine translation is that it is not possible to reliably translate natural language. In fact, complete grammars of natural languages have yet to be written.

However, a basic English grammar that fulfills the purposes of most communication would be easy to write in a few dozen augmented context-free rules. The syntax would be unambiguous and therefore easier to parse reliably by machine. And an unambiguous vocabulary could be developed. This has been done before, for example, by Caterpillar and Xerox, and some others.

Once this grammar and vocabulary were developed, it could be used to start a new standardized English Wikipedia, something like std.wikipedia.org. This Wikipedia would have an automatic standardized grammar checker built-in so that edits would have to be grammatical in order to be saved. That is not as difficult as it sounds, since as an automatically grammar-checked page it can provide grammar hints.

In addition, the vocabulary to the language could be extensible by user suggestion, so that proper names can be added.

This version of Wikipedia would also be somewhat more resistant to idle vandalism.

Once this new Standardized English Wikipedia were in place, it should be possible to develop a program to automatically and (mostly) reliably translate between English and other languages. That is my suggestion. LaggedOnUser 17:05, 1 October 2006 (UTC)[reply]
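To make the proposal above a little more concrete, here is a toy sketch in Python of what a save-time grammar check might look like. Everything in it is hypothetical: the three rules, the tiny lexicon and the function names are invented for illustration and are far smaller than the "few dozen augmented context-free rules" suggested; it only shows the mechanism of rejecting text that the grammar or vocabulary does not cover.

```python
# Unambiguous vocabulary: every word has exactly one category.
LEXICON = {
    "the": "Det", "a": "Det",
    "machine": "N", "text": "N", "translator": "N",
    "translates": "V", "reads": "V",
}

# A tiny context-free grammar with no left recursion.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Name"]],
    "VP": [["V", "NP"], ["V"]],
}

def parse(symbol, tags, pos):
    """Return every position reachable after deriving `symbol` from tags[pos:]."""
    if symbol not in RULES:  # a word category such as Det, N, V, Name
        return {pos + 1} if pos < len(tags) and tags[pos] == symbol else set()
    ends = set()
    for rhs in RULES[symbol]:
        positions = {pos}
        for child in rhs:
            positions = {e for p in positions for e in parse(child, tags, p)}
        ends |= positions
    return ends

def is_grammatical(sentence, lexicon=LEXICON):
    """Accept a sentence only if all words are known and the whole of it parses as S."""
    words = sentence.lower().rstrip(".").split()
    if not words or any(w not in lexicon for w in words):
        return False
    tags = [lexicon[w] for w in words]
    return len(tags) in parse("S", tags, 0)

print(is_grammatical("The machine translates the text."))
print(is_grammatical("Machine the translates."))

# Vocabulary extended by user suggestion, e.g. with a proper name.
extended = dict(LEXICON, wikipedia="Name")
print(is_grammatical("Wikipedia reads a text.", extended))
```

Rejecting a sentence is the easy part; a usable checker would also have to explain why it was rejected, which this sketch does not attempt.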

If only it were that easy… --Sabik 15:29, 7 November 2006 (UTC)[reply]

Something like the Voice of America special English broadcasts that use a carefully chosen restricted vocabulary. For general use, I think machine translation is best for someone who has had a year or two of the foreign language that the machine is translating. That way you can compare the text yourself so you won't be tricked by a "poison cookie" the machine messes up on. Good Wikipedia articles are better than average as targets for machine translation since they are usually fairly well written and don't have overly complex grammar. For example, I've tried using Google Translate's Arabic -> English translation engine. The results are mostly understandable but I wouldn't rely on it since I haven't studied Arabic. However, I did take a year of Russian and feel confident enough that I could puzzle out where the machine messes up so I would be much more confident using machine translation from Russian.DavidCowhig 06:04, 2 January 2007 (UTC)[reply]

Updated references on Google Translate and Machine Translation

Updated references on Google Translate and machine translation.DavidCowhig 05:53, 2 January 2007 (UTC)[reply]

Good afternoon, Nádia!

Tomorrow I will send you, by priority mail, a letter from Deco Pró teste following a complaint made by a customer.

The current situation is as follows:

The orders for the products that are out of stock have already been placed with the supplier by Rosa Maria and Graça Henriques; we are awaiting their arrival in order to deliver them.

If you need any further information, I am at your disposal.

Sincerely,

Marta Martins

Rare language

It would be useful to say if there is a free MT project somewhere, and what the cost of building such a system for one pair of languages could be (for instance in terms of the capital of a company doing the job). My underlying question is: "Is it reasonable to expect/promote an MT system for rare languages, and if yes under which economic model (community, proprietary)? " (I live in Mongolia.)--Henri de Solages 12:46, 27 June 2007 (UTC)[reply]

Hi, I've replied on your talk page. - Francis Tyers · 15:10, 27 June 2007 (UTC)[reply]

Incorrect theory presented

Article currently states:

> The translation process may be stated as:

  1. Decoding the meaning of the source text; and
  2. Re-encoding this meaning in the target language.

Behind this ostensibly simple procedure lies a complex cognitive operation. <

I think this is false. I can't make much sense of "decode"; what exactly it should mean is quite blurred. In reality machine translation usually works like this:

  1. Translate from input natural language A to common artificial language X
  2. Translate from common artificial language X to output natural language B

(where X may be Esperanto or another construct)

Not that it matters much, without true AI the net result will always be like "bard in, junk out". 82.131.210.162 18:47, 20 July 2007 (UTC)[reply]
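The two-step pivot described in the comment above can be sketched as follows. This is a toy in Python under loud assumptions: the concept labels, both dictionaries and the function names are invented for illustration, the mapping is strictly word-for-word, and it is not a description of how any particular MT system works.

```python
# Step 1 data: natural language A -> pivot representation X (hypothetical concept labels).
TO_PIVOT = {
    "en": {"the": "DEF", "cat": "CAT", "drinks": "DRINK", "milk": "MILK"},
}

# Step 2 data: pivot representation X -> natural language B.
FROM_PIVOT = {
    "fr": {"DEF": "le", "CAT": "chat", "DRINK": "boit", "MILK": "lait"},
}

def analyse(text, src):
    """Map each source word to its pivot concept (raises KeyError on unknown words)."""
    return [TO_PIVOT[src][word] for word in text.lower().split()]

def generate(concepts, tgt):
    """Map each pivot concept to a target-language word, keeping the source order."""
    return " ".join(FROM_PIVOT[tgt][concept] for concept in concepts)

def translate(text, src, tgt):
    """Translate via the pivot: A -> X, then X -> B."""
    return generate(analyse(text, src), tgt)

print(translate("The cat drinks milk", "en", "fr"))
```

The appeal of a pivot is that each language needs only one mapping into X and one out of it, instead of one per language pair. The output here is word-for-word and not idiomatic French, so the sketch shows only the plumbing; the hard part is whatever has to happen inside the two mapping steps.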

Does anybody know of any other MT applications which can translate English to Hebrew other than Babylon and 1-800-translate ?

If you do, please let me know. Acidburn24m 05:18, 16 November 2007 (UTC)[reply]

Problems with machine translation

This entry on Language Log gives a cautionary example of what can go wrong when translating from Chinese to English! -- Arwel (talk) 22:59, 9 December 2007 (UTC)[reply]

And here is another example of computer-mangled translation: http://www.worldlingo.com/ma/dewiki/en/Wilfrid_Michael_Voynich Jackiespeel (talk) 17:26, 8 March 2011 (UTC)[reply]

Acronyms

MT is not an acronym. It is an abbreviation. An acronym is a word that is formed from the initial letters of other words (e.g. LASER, RADAR).

Stephen Shaw (talk) 00:34, 23 March 2008 (UTC)[reply]

References

There is only one book listed in the references and that is from 1992. New doesn't mean better, but I am sure there have been books written after that. Is there a book that goes through all the steps of building a simple translator between two languages (say English and French or English and Latin)? The books seem to talk about generalities rather than a specific implementation. If I am a beginner who wants to build a translator between English and some new language, it would be really helpful to have a working program to tweak. If you know of any such book, please let me know. Thanks. Kanfoo (talk) 19:35, 24 April 2008 (UTC)[reply]

Methionine [] —Preceding unsigned comment added by 203.160.1.74 (talk) 02:24, 20 March 2009 (UTC)[reply]

The reference to "Wotran" doesn't seem to connect to any statement in the article; if that doesn't change soon, we should remove it. --Jim Henry (talk) 21:50, 19 April 2009 (UTC)[reply]

ABBYY Compreno

This removal does not seem to be well substantiated. The material is based on multiple independent secondary sources. -- Nazar (talk) 23:16, 28 May 2012 (UTC)[reply]

I moved more specific details about Compreno to the ABBYY article, as they likely do not fit so well into general MT article. A short summary of the upcoming technology based on previews in Russian IT-News magazines was added after the Applications section. I still encourage the Russian speaking users whom I addressed earlier to additionally confirm the information based on the sources. Thanks. -- Nazar (talk) 20:37, 29 May 2012 (UTC)[reply]

I am happy to believe your sources (assuming good faith) but this system does not yet exist according to your sources. Adding it to the MT page is therefore premature. In particular, I see no reason for giving it more weight than major players like Systran or Atlas. When the system actually exists, then it may be worth adding here, until then I think it fits better in the ABBYY page. Francis Bond (talk) 01:50, 30 May 2012 (UTC)[reply]

I think the general MT article should ideally speak about approaches, theory, major obstacles and ways to overcome them (both in functional systems and those being developed). From this point of view it is interesting to mention the USH approach (Russian reviewers actually provided several examples of small semantic tree fragments), as well as the information about the resulting unprecedented quality of translation (English-Russian-English examples of handling typical difficulties encountered by currently functional MT systems are given in reviews as well). The reviewers' claims actually go as far as calling this technology "a revolution", and, well, if it works as well as the examples they show, it will be a revolution indeed...

But, as for waiting until the system actually exists, ABBYY will likely launch it mostly for the Russian market in the initial phase of deployment (they usually do that with all their new products). Also, since the first versions are expected to translate only between English and Russian (and not so many international reviewers are familiar with Russian), the coverage is likely to be predominantly in Russian sources in the near future. The reviews also mention the great complexity and engineering challenges of building comprehensive semantic trees, which takes lots of time and human resources.

To summarize, I'd suggest describing here at least the general theory of the upcoming approach and mentioning the high expectations about its quality (we'll see in time if they were justified), as well as its suggested capacity to overcome the failures of existing systems. -- Nazar (talk) 08:42, 30 May 2012 (UTC)[reply]

I've rewritten the summary to neutrally address in short the approaches used and potentials claimed by reviewers. Please don't blank the whole chapter in case of further issues. Discuss, change and improve. Thanks. -- Nazar (talk) 11:12, 30 May 2012 (UTC)[reply]

Also, I'm not trying to give it more weight than Systran or Atlas. If you have reliable secondary source reviews of these or other MT providers, I encourage you to expand their description in the article, add details about used approaches, evaluations of quality of translation etc. ABBYY is not offering finalized and publicly open MT services at this moment at all. Compreno development is only in beta phase. But ABBYY is undoubtedly a major and serious player on the Russian and international market of OCR, document conversion, document capture, dictionary software, and Computer-assisted translation. In Russia ABBYY would arguably be the No.1 player in these fields (with over 5 million Russian users of its Lingvo and market leadership of FineReader), especially when considering the cross-field coverage of mentioned areas by a single company. -- Nazar (talk) 12:33, 30 May 2012 (UTC)[reply]

User:Ffbond is on point. The machine translation article has no interest in speculative reporting on upcoming products. Furthermore, WP is not a newspaper or a place for advertising.

I looked at 10 references, and only 3 or 4 were about the technology. Endowing university chairs or obtaining major funding does not go to the article's topic. Some references were interviews of ABBYY principals, so they fail as independent sources (they are essentially press releases). The affiliation of the most detailed article was not clear; if by ABBYY employees, it would fail independence. Given the level of detail, I suspect employees. In any event, at this point ABBYY controls the information sources, so even nominally independent sources would be getting their information through ABBYY. I do not see independent secondary source reviews.

Glrx (talk) 21:28, 30 May 2012 (UTC)[reply]

Oh, well... Let's then just link it to the ABBYY article. Like the other titles (many without any refs at all) in the Applications section. -- Nazar (talk) 21:40, 30 May 2012 (UTC)[reply]

I just want to really quickly agree with Nazar that there are some good sources there; I haven't had enough time to really look in detail, --Interchange88 ☢ 01:59, 2 June 2012 (UTC)[reply]

Universal Semantic Hierarchy

The Syntactic and Semantic parser based on ABBYY Compreno Linguistic technologies has now been presented at the 18th annual International Conference on Computational Linguistics, held in Moscow from May 30th to June 3rd. Would anyone still object to including some info in the article? I was thinking about something like this:

Universal Semantic Hierarchy

The Universal Semantic Hierarchy approach has been used in the upcoming ABBYY Compreno technology and presented in several reviews in the Russian IT press, as well as analyzed in a presentation during Dialog 2012, the 18th annual International Conference on Computational Linguistics, held in Moscow from May 30th to June 3rd. It is aimed at building an accurate natural-language-independent semantic model of the processed source text with the help of USH trees.^[1]^[2]^[3]^[4]^[5]^[6]^[7]^[8]^[9]^[10]^[11]^[12]^[13]^[14]^[15]^[16]

References

^ http://www.dialog-21.ru/en/dialog2012/ Dialog 2012, the 18th annual International Conference on Computational Linguistics
^ http://www.dialog-21.ru/digests/dialog2012/materials/pdf/Anisimovich.pdf Syntactic and Semantic parser based on ABBYY Compreno Linguistic technologies
^ http://www.computerra.ru/sgolub/663954/ Голубятня: Чудо Compreno [Dovecote: The Compreno miracle]
^ http://biz.cnews.ru/news/top/index.shtml?2011/02/28/429739 Abbyy получила 450 млн рублей от «Сколково» [Abbyy received 450 million rubles from Skolkovo]
^ http://www.abbyy.ru/science/technologies/business/compreno#Section_2; Синтаксический и семантический анализ текстов [Syntactic and semantic analysis of texts]
^ http://www.youtube.com/watch?v=HPlV9mzqeFQ Introducing ABBYY Compreno -- new approach to machine translation
^ http://www.3dnews.ru/software/624398 Лингвистические технологии ABBYY. От сложного — к совершенному [ABBYY linguistic technologies. From the complex to the perfect]
^ http://www.kommersant.ru/doc/1822898?stamp=634588995841938586 Программисты считают, что научили машину понимать смысл текста [Programmers believe they have taught a machine to understand the meaning of text]
^ http://www.abbyy.ru/science/technologies/business/compreno#Section_2; Синтаксический и семантический анализ текстов [Syntactic and semantic analysis of texts]
^ http://www.youtube.com/watch?v=HPlV9mzqeFQ Introducing ABBYY Compreno -- new approach to machine translation
^ http://www.it-weekly.ru/news/itnews/187189.html Компьютерная лингвистика получит пополнение [Computational linguistics to get reinforcements]
^ http://biz.cnews.ru/news/line/index.shtml?2012/05/15/489462 В РГГУ и МФТИ открыты кафедры «Компьютерной лингвистики» при поддержке Abbyy и IBM [Departments of Computational Linguistics opened at RSUH and MIPT with the support of Abbyy and IBM]
^ http://www.pcweek.ru/business/article/detail.php?ID=139282 ABBYY: через мобильность и облака к интеллектуальной лингвистике [ABBYY: through mobility and clouds to intelligent linguistics]
^ http://www.abbyy.ru/science/technologies/business/compreno#Section_3; Технология синтаксического и семантического анализа текста ABBYY Compreno [ABBYY Compreno technology for syntactic and semantic text analysis]
^ http://www.pcweek.ru/gover/article/detail.php?ID=129782 Прорывная технология машинного перевода и вокруг неё [A breakthrough machine translation technology and what surrounds it]
^ http://www.abbyy.ru/Default.aspx?DN=0b4e8dd2-c3be-45c9-bad0-8b750c7bd776 «Мы сэкономили 10% ресурсов и год работы» ["We saved 10% of resources and a year of work"]

Comments

Oppose. The proposal sounds in advertising. The proposal uses the loaded term "Universal Semantic Hierarchy". It is about something that is "upcoming" or "aimed at". WP is not a newspaper or a research journal; it should be cautious about reporting on new technology. The long list of references does not meet the standard of independent, reliable sources. WP also wants secondary sources rather than primary sources. We cannot pass judgment on the merits of new technology based on primary sources; see WP:NOR. We need some authorities to look at that new technology, compare it to other technology, and state why it is significant. WP:UNDUE Glrx (talk) 16:32, 19 June 2012 (UTC)[reply]
Would you consider joining the discussion here? I'm very doubtful about most of your points above, but let's discuss it there, to avoid double threading. Thanks. -- Nazar (talk) 17:44, 19 June 2012 (UTC)[reply]

More publications on Compreno

Почему трудно быть IT-Богом? [Why is it hard to be an IT god?] by Анна Солдатова. E-xecutive. 2 Jul 2012 http://www.e-xecutive.ru/education/adviser/1681597/ -- Nazar (talk) 22:17, 10 July 2012 (UTC)[reply]
А теперь Compreno [And now Compreno] by Александр Евдокимов. Hard-n-Soft. H814980/2012/6. Pages 18-19. Published: 10 June 2012. -- Nazar (talk) 22:35, 10 July 2012 (UTC)[reply]
Anatoly Starostin. Using ABBYY Compreno technology for solving various NLP-tasks. The Eighth Spring Researchers Colloquium on Databases and Information Systems 2012. http://syrcodis.ispras.ru/2012/invited-talks -- Nazar (talk) 22:45, 10 July 2012 (UTC)[reply]
Development of Chinese language lexical-semantic dictionary for the multi-language NLP system http://www.dialog-21.ru/digests/dialog2012/materials/pdf/160.pdf
ABBYY инвестирует в понимание [ABBYY invests in understanding] http://it-world.ru/news/itnews/182710.html

-- Nazar (talk) 23:07, 10 July 2012 (UTC)[reply]

Генеральный директор ABBYY Россия о будущем OCR и облачных сервисах. [The CEO of ABBYY Russia on the future of OCR and cloud services.] http://www.computerra.ru/interactive/703174/

-- Nazar (talk) 13:41, 27 August 2012 (UTC)[reply]

ABBYY Compreno Technology: Semantic to understand the meaning! http://www.abbyy-developers.eu/en:tech:linguistic:semanitc-intro
От автоматической обработки текста к машинному пониманию [From automatic text processing to machine understanding] http://www.polit.ru/article/2013/03/26/vladimir_selegey/
ABBYY Compreno Technology White paper http://www.abbyy-developers.eu/_media/en:tech:linguistic:en_linguistic_technologies_abbyy_compreno_technology_white_paper.pdf

-- Nazar (talk) 13:53, 28 March 2013 (UTC)[reply]

ABBYY pushes the boundaries of computer linguistics http://rbth.ru/society/2013/02/12/abbyy_pushes_the_boundaries_of_computer_linguistics_22785.html -- Nazar (talk) 18:26, 8 April 2013 (UTC)[reply]

Copyright problem removed

Prior content in this article duplicated one or more previously published sources. The material was copied from: http://www.mt-archive.info/EAMT-2003-Babych.pdf. Copied or closely paraphrased material has been rewritten or removed and must not be restored, unless it is duly released under a compatible license. (For more information, please see "using copyrighted works from others" if you are not the copyright holder of this material, or "donating copyrighted materials" if you are.) For legal reasons, we cannot accept copyrighted text or images borrowed from other web sites or published material; such additions will be deleted. Contributors may use copyrighted publications as a source of information, but not as a source of sentences or phrases. Accordingly, the material may be rewritten, but only if it does not infringe on the copyright of the original or plagiarize from that source. Please see our guideline on non-free text for how to properly implement limited quotations of copyrighted text. Wikipedia takes copyright violations very seriously, and persistent violators will be blocked from editing. While we appreciate contributions, we must require all contributors to understand and comply with these policies. Thank you. Glrx (talk) 16:03, 26 June 2012 (UTC)[reply]

Can you please provide citations of large pieces of material copied from the source? Why haven't you made attempts to work on a shorter resume of the mentioned material in the article? -- Nazar (talk) 16:36, 26 June 2012 (UTC)[reply]

Crowd sourced translation

This series of edits is not really about machine translation or the narrowly defined hybrid translation. Consequently, I believe they are off topic and should be deleted. The appropriate place would be in an article involving human translators. Glrx (talk) 19:35, 27 July 2012 (UTC)[reply]

Separate Page that might need to be incorporated into the main page

https://en.wikipedia.org/wiki/Crowdsourcing_as_Human-Machine_Translation This topic seems an odd choice to have its own page, especially due to its title. I do not know if it would be better off under the Crowdsourcing page or the Machine translation page. I will add this same post to the Crowdsourcing page. StartlingCanary2 (talk) 16:02, 17 September 2014 (UTC) — Preceding unsigned comment added by StartlingCanary2 (talk • contribs)

Redirection from "Machine Translations" means no link to the musical act of that name

As the subject says, when I enter "machine translations" into Wiki search, it redirects directly to this page. Yet there is a Wikipedia page called "Machine Translations" discussing the musical act of that name. To get to that page I have to search for ("J. Walker" "machine translations") in its entirety. Could some competent wiki editor fix this situation please? — Preceding unsigned comment added by 58.175.66.57 (talk) 12:12, 4 May 2015 (UTC)[reply]

Done — ¾-10 22:52, 4 May 2015 (UTC)[reply]

Problems with neural machine translation

I am working on neural machine translation (NMT), so I thought I'd contribute to Wikipedia by adding what I know. My section on NMT was removed from this page ten days ago because of the following reasons: no secondary sources and WP:COI. NMT is quite a specialized field with few experts, so adding my paper didn't seem like a transgression, but I decided to comply and removed the reference to my work. I also took the trouble to find a popular reference to NMT in the PC World magazine and another site.

Two days ago, my revised contribution was reverted again by the same person. This time, the explanation went: Following link tripped Antivirus software; No secondary sources; take to talk page. So I'm taking it to the talk page. It seems that my computer is not so sensitive to viruses, because those links tripped nothing on it. I guess other users could test them.

Finally, I am amazed at the strictness shown here regarding the sources. If PC World is not a secondary source, then what is? I don't think the New York Times will ever have an article on neural machine translation. Aren't you being too demanding? --Krz.wolk (talk) 00:46, 31 October 2015 (UTC)[reply]

First, triggering an antivirus warning is a bad thing.

Wikipedia is not supposed to cover all knowledge. It is an encyclopedia, and the articles that it covers need to be notable (WP:N). The article on machine translation is notable. There are lots of systems that do it, Google Translate is a winner, and there are many books. Within each article, the requirement is that any subtopic have due weight. That basically says the subtopic should be covered in some reference texts about machine translation or that some prominent authorities tell Wikipedia that the subtopic is important. If that isn't the case, then "If a viewpoint is held by an extremely small (or vastly limited) minority, it does not belong on Wikipedia, regardless of whether it is true or you can prove it, except perhaps in some ancillary article."

The PC World article[1] is not a secondary source. The author, Loek Essers, does not claim to be an expert on translation but rather just a correspondent: "Loek Essers focuses on online privacy, intellectual property, open source, and online payment issues for the IDG News Service." More significantly, Essers is not judging the merits of neural network translation but rather just quoting others: "The neural network is able to derive grammatical functions of words without having explicit knowledge of the grammar, said Ke Tran, one of the researchers." He's not providing an independent assessment, so the article is a primary source. Wikipedia needs secondary sources to tell us the work is relevant. BTW, the article reads like a reworked press release.

My impression is neural network translation is a new research field that is small and immature.

It's also clear that you have a conflict of interest. Your work may be very good, the field may be promising, but the topic is not prominent enough yet. It may be good for you to raise the profile of the neural network translation, but Wikipedia is not the place for such a campaign. When the profile is high enough, then it is reasonable for WP to cover it.

Glrx (talk) 01:46, 31 October 2015 (UTC)[reply]

While hoping I am not being too stubborn, I feel I need to point out two issues, which are unrelated to each other:

(1) Why are the other "Approaches" subtopics more relevant than mine when at least three of them have no secondary sources: Transfer-based machine translation, Interlingual machine translation and Dictionary-based machine translation? This was pointed out in 2009 already. In fact, Transfer-based machine translation has no sources at all, either primary or secondary.

(2) Let us assume that neural machine translation is not relevant enough to warrant a subtopic in this article. But the facts that it is the object of serious scientific papers and that PC World considered it serious enough to dedicate an article to it should be enough for it to be mentioned, at least. Could I not add a sentence somewhere in this article? An either/or policy, where a thing is either a subtopic or nothing at all, would be extremely rigid and discriminatory.

--Krz.wolk (talk) 01:21, 1 November 2015 (UTC)[reply]

WP:OTHERSTUFFEXISTS is not an argument that carries weight. Maybe the other approaches should be tagged and/or removed. One of the articles has been tagged. The other articles should be tagged for using primary sources. At least two of the articles make colorable claims to notability. If I search for "dictionary-based machine translation", I get lots of hits -- including a hit in an encyclopedia. If I search for "neural network machine translation", I only get two pages of hits, and they say it is a new research topic.

PC World looks like a reworked press release and quotes primary sources rather than evaluates them. It is neither an independent nor a secondary source. The issue is WP:DUE. Glrx (talk) 02:10, 1 November 2015 (UTC)[reply]

Neural machine translation is an exploding field of enormous importance. I was shocked to find no mention of it here. Is its exclusion perhaps an unintentional POV problem? Or is the problem that so many of the papers are so recent because the field is moving so fast? See: Google Scholar, "neural machine translation" 85.47.30.142 (talk) 21:39, 2 April 2016 (UTC)[reply]

The state of this article is embarrassing. There is not a single reference to Transformers [2], the most important method ever devised to tackle machine translation. Just look at the scores on [3]. The article must be written from the ground up in order to focus on recent advances. — Preceding unsigned comment added by EmilianPostolache (talk • contribs) 00:35, 3 November 2020 (UTC)[reply]

Problems with the absence of neural machine translation

Inasmuch as neural machine translation is the new state of the art, the absence of any reference to this body of research on this page is an embarrassment. Google Translate has switched over to the neural technology for its main language pairs. See: “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation” (2016) and “Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation” (2016). You can Google for them on the Arxiv. 195.77.247.14 (talk) 18:14, 17 December 2016 (UTC)[reply]

Claude Piron

His comments on MT are more or less repeated in the Disambiguation and later the Evaluation section - the first time he's 'Claude Piron, a long-time translator for the United Nations...', and the second time he's 'The late Claude Piron...', as if he hasn't already been mentioned. Looks like the article hasn't been properly edited. 213.127.210.95 (talk) 15:14, 26 July 2017 (UTC)[reply]

External links modified

Hello fellow Wikipedians,

I have just modified 2 external links on Machine translation. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 12:55, 11 January 2018 (UTC)[reply]

Missing or empty |title= (help)

Machine_translation#cite_ref-31 error, and resulting "Missing or empty |title= (help)" --36.80.170.118 (talk) 04:07, 5 April 2021 (UTC)[reply]

Léon Dostert

I find it quite strange that this name is not mentioned in the Article as almost a full chapter is dedicated to him in ^[1] as a major acting force in the development of MT. TomyDuby (talk) 15:26, 12 December 2021 (UTC)[reply]

References

^ Michael D Gordin, 'Scientific Babel'. Profile Books 2017. ISBN 978-1-781-25115-7, Chapter 8.

2000 POV

This is a weirdly outdated article, with a POV stuck in the year 2000 or thereabouts. At this point, MT is used all over the world, and systems like Google Translate and DeepL provide very high quality fully automated MT. There is no discussion of the Transformer architecture (https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)) that made this possible. I'd be happy to add material, but from the talk page it seems some of the editors are actively hostile to NMT. — Preceding unsigned comment added by SnoTraveller (talk • contribs) 21:34, 16 March 2022 (UTC)[reply]

Introduction

The first sentence: Machine translation ... is a sub-field of computational linguistics ... - is that not secondary to what it actually is: automatic translation of texts from one language into another language by a computer program? Wammes Waggel (talk) 12:32, 12 April 2023 (UTC)[reply]

Addition to Application Section

If you all don't mind, I am going to add a new subsection in the Applications section about MT use in law, since it is being investigated and I feel that it is significant due to the importance and difficulty of accurate translation in the legal sector. I'm also going to add a little bit to the medical applications subsection that goes into depth about research regarding MT use.

Arkenly (talk) 00:27, 6 December 2023 (UTC)[reply]
