Original title:
Polyglot machines
Why AI needs to learn new languages
Efforts are under way to make AI fluent in more than just English
[Paragraph 1]
ChatGPT, a chatbot developed by OpenAI, an American firm, can give passable answers to questions on everything from nuclear engineering to Stoic philosophy.
Or at least, it can in English. The latest version, ChatGPT-4, scored 85% on a common question-and-answer test.
In other languages it is less impressive. When taking the test in Telugu, an Indian language spoken by nearly 100m people, for instance, it scored just 62%.
[Paragraph 2]
OpenAI has not revealed much about how ChatGPT-4 was built. But a look at its predecessor, ChatGPT-3, is suggestive.
Large language models (LLMs) are trained on text scraped from the internet, on which English is the lingua franca. Around 93% of ChatGPT-3’s training data was in English.
In Common Crawl, just one of the datasets on which the model was trained, English makes up 47% of the corpus, with other (mostly related) European languages accounting for 38% more.
Chinese and Japanese combined, by contrast, made up just 9%. Telugu was not even a rounding error.
[Paragraph 3]
An evaluation by Nathaniel Robinson, a researcher at Johns Hopkins University, and his colleagues finds that this is not a problem limited to ChatGPT.
All LLMs fare better with “high-resource” languages, for which training data are plentiful, than with “low-resource” ones, for which they are scarce.
That is a problem for those hoping to export AI to poor countries, in the hope it might improve everything from schools to health care.
Researchers around the world are therefore working to make AI more multilingual.
[Paragraph 4]
India’s government is particularly keen. Many of its public services are already digitised, and it is keen to fortify them with AI.
In September, for instance, it launched a chatbot to help farmers get information about state benefits.
[Paragraph 5]
The bot works by welding two sorts of language model together, says Shankar Maruwada of the EkStep Foundation, a non-profit that helped build it.
Users can submit queries in their native tongues. (Eight are supported so far; five more are coming soon.)
These are passed to a piece of machine-translation software developed at IIT Madras, an Indian academic institution, which translates them into English.
The English version of the question is then fed to the LLM, and its response translated back into the user’s mother tongue.
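[Illustrative sketch] (not from the article)
The translate, ask, translate-back flow described in this paragraph can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the function name answer_in_native_language, the translate and llm callables, and the stub implementations are all hypothetical stand-ins, not the actual software built at IIT Madras or the LLM used by the Indian chatbot.

```python
from typing import Callable

# Hypothetical interfaces (assumptions, not the real system's API):
# a translator takes (text, source_lang, target_lang) and returns text;
# an LLM takes an English prompt and returns an English answer.
Translator = Callable[[str, str, str], str]
LLM = Callable[[str], str]


def answer_in_native_language(
    query: str,
    user_lang: str,
    translate: Translator,
    llm: LLM,
    pivot_lang: str = "en",
) -> str:
    """Route a native-language query through an English-centric LLM."""
    # Step 1: machine-translate the user's query into English.
    english_query = translate(query, user_lang, pivot_lang)
    # Step 2: let the LLM answer in the language it handles best.
    english_answer = llm(english_query)
    # Step 3: translate the answer back into the user's language.
    return translate(english_answer, pivot_lang, user_lang)


# Stub components so the sketch runs without any external service.
def fake_translate(text: str, src: str, dst: str) -> str:
    return f"[{src}->{dst}] {text}"


def fake_llm(prompt: str) -> str:
    return f"answer({prompt})"


print(answer_in_native_language("q", "te", fake_translate, fake_llm))
# prints: [en->te] answer([te->en] q)
```

Each step is a separate call, so translation errors in step 1 or step 3 are passed straight through to the user, and the LLM in the middle only ever sees the English rendering of the question.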
[Paragraph 6]
The system seems to work. But translating queries into an LLM’s preferred language is a rather clumsy workaround.
After all, language is a vehicle for worldviews and culture as well as just meaning, notes the boss of one Indian AI firm.
A paper by Rebecca Johnson, a researcher at the University of Sydney, published in 2022, found that ChatGPT-3 gave replies on topics such as gun control and refugee policy that aligned most with the values displayed by Americans in the World Values Survey, a global questionnaire of public opinion.
Congratulations on finishing the reading. The English text runs to about 481 words.
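The original article was published on 27 January 2024 in the Science & technology section of The Economist (TE).
Close-reading notes at: 自由英语之路.
Translated and compiled by Irene.
Edited and proofread by Irene.
For personal English-learning exchange only.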
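[Supplementary information] (from the internet)
Stoicism is a school of Greek philosophy. It flourished in Greece and Rome until about the third century AD. Stoicism is a philosophy of personal ethics, informed by its system of logic and its views on the natural world. According to Stoic teaching, humans are social beings, and the path to happiness lies in accepting life's ups and downs without being ruled by desire or fear. The Stoics held that people should use reason to understand the world, cooperate with others, and treat them fairly and justly.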
[Key sentences] (3)
Large language models (LLMs) are trained on text scraped from the internet, on which English is the lingua franca.
All LLMs fare better with “high-resource” languages, for which training data are plentiful, than with “low-resource” ones, for which they are scarce.
After all, language is a vehicle for worldviews and culture as well as just meaning, notes the boss of one Indian AI firm.