【科学种子科技英语】 自主学习的AI称霸策略游戏围棋

发布日期:2017-10-23 09:15:20

 

双语阅读

Artificial-intelligence program AlphaGo Zero trained in just days, without any human input.

人工智能程序AlphaGo Zero仅用几天时间就完成了训练,没有任何人类输入。


AlphaGo Zero came up with Go strategies that human players haven't invented in thousands of years.

AlphaGo Zero发明了人类围棋玩家几千年来从未发现的新玩法。


An artificial intelligence (AI) program from Google-owned company DeepMind has reached superhuman level at the strategy game Go — without learning from any human moves.

来自Google旗下公司DeepMind的一款人工智能(AI)程序,在没有学习任何人类棋招的情况下,在策略游戏围棋中达到了超越人类的水平。


This ability to self-train without human input is a crucial step towards the dream of creating a general AI that can tackle any task. In the nearer-term, though, it could enable programs to take on scientific challenges such as protein folding or materials research, said DeepMind chief executive Demis Hassabis at a press briefing. “We’re quite excited because we think this is now good enough to make some real progress on some real problems.”

这种无需人类输入就能自我训练的能力,是朝着创造出能够解决任何任务的通用AI这一梦想迈出的关键一步。不过,DeepMind首席执行官Demis Hassabis在新闻发布会上表示,短期内它可以让程序去挑战一些科学难题,例如蛋白质折叠或材料研究。“我们非常激动,因为我们认为它现在已经足以在一些真正的问题上取得一些真正的进展。”


Previous Go-playing computers developed by DeepMind, which is based in London, began by training on more than 100,000 human games played by experts. The latest program, known as AlphaGo Zero, instead starts from scratch using random moves, and learns by playing against itself. After 40 days of training and 30 million games, the AI was able to beat the world's previous best 'player' — another DeepMind AI known as AlphaGo Master.

位于伦敦的DeepMind此前开发的围棋程序,最初是通过学习10万多局专业棋手的对局来训练的。而他们最新的程序AlphaGo Zero则是从零开始,先随机落子,并通过与自己对弈的方式进行学习。经过40天的训练以及3000万局对弈,这个AI已经能够击败此前世界上最好的“玩家”,即另一个被称为AlphaGo Master的DeepMind AI。
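As a rough back-of-envelope check (our own arithmetic, not a figure from the article), 30 million games in 40 days works out to roughly 750,000 self-play games a day, or about nine games every second:

```python
# Back-of-envelope arithmetic on the self-play pace reported above.
# The 30 million games and 40 days come from the article; the breakdown
# below is only our own rough calculation.
games = 30_000_000
days = 40

games_per_day = games / days                       # 750,000 games per day
games_per_second = games_per_day / (24 * 60 * 60)  # ~8.7 games per second

print(f"{games_per_day:,.0f} games/day")
print(f"{games_per_second:.1f} games/second")
```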


Getting this technique, known as reinforcement learning, to work well is difficult and resource-intensive, says Oren Etzioni, chief executive of the Allen Institute for Artificial Intelligence in Seattle, Washington. That the team could build such an algorithm that surpassed previous versions using less training time and computer power “is nothing short of amazing”, he adds.

位于美国华盛顿州西雅图的Allen人工智能研究所首席执行官Oren Etzioni说,要让这种被称为强化学习的技术良好运作,难度很大,而且非常耗费资源。他补充说,该团队能够用更少的训练时间和计算能力构建出超越以前版本的算法,“简直令人惊叹”。

 

Strategy supremo 策略霸主


The ancient Chinese game of Go involves placing black and white stones on a board to control territory. Like its predecessors, AlphaGo Zero uses a deep neural network — a type of AI inspired by the structure of the brain — to learn abstract concepts from the boards. Told only the rules of the game, it learns by trial and error, feeding back information on what worked to improve itself after each game.

围棋是中国古代的棋类游戏,对弈双方将黑白棋子放在棋盘上以争夺地盘。和它之前的版本一样,AlphaGo Zero使用深层神经网络(一种受大脑结构启发的AI)从棋盘中学习抽象概念。它只被告知游戏规则,通过反复试错来学习,并在每局对弈后将哪些走法有效的信息反馈回来,用于自我改进。
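The trial-and-error loop described above can be illustrated with a deliberately tiny sketch: a tabular value function for tic-tac-toe that is improved only from the outcomes of games the program plays against itself. This is an illustrative toy, not DeepMind's code; the choice of game, the update rule, and the ALPHA and EPSILON settings are assumptions made for the example.

```python
# Toy self-play learning loop: after each game, the final result is fed
# back into every position that was visited, nudging its estimated value.
import random

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return "draw" if "." not in board else None

value = {}                 # estimated value of a position, from X's point of view
ALPHA, EPSILON = 0.1, 0.2  # learning rate and exploration rate (illustrative values)

def choose_move(board, player):
    moves = [i for i, s in enumerate(board) if s == "."]
    if random.random() < EPSILON:      # trial: sometimes just try something random
        return random.choice(moves)
    sign = 1 if player == "X" else -1  # O prefers positions that are bad for X
    return max(moves, key=lambda m: sign * value.get(board[:m] + player + board[m+1:], 0.0))

def self_play_game():
    board, player, seen = "." * 9, "X", []
    while winner(board) is None:
        m = choose_move(board, player)
        board = board[:m] + player + board[m+1:]
        seen.append(board)
        player = "O" if player == "X" else "X"
    outcome = {"X": 1.0, "O": -1.0, "draw": 0.0}[winner(board)]
    # feedback: push the final result into every position seen during this game
    for pos in seen:
        value[pos] = value.get(pos, 0.0) + ALPHA * (outcome - value.get(pos, 0.0))

for _ in range(20_000):
    self_play_game()
print(f"value estimates stored for {len(value)} positions")
```

AlphaGo Zero replaces the lookup table with a deep neural network and plays Go rather than tic-tac-toe, but the structure of learning only from the outcome fed back after each self-play game is the same idea.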


At first, AlphaGo Zero’s learning mirrored that of human players. It started off trying greedily to capture stones, as beginners often do, but after three days it had mastered complex tactics used by human experts. “You see it rediscovering the thousands of years of human knowledge,” said Hassabis. After 40 days, the program had found plays unknown to humans.

起初,AlphaGo Zero的学习过程与人类棋手相似。它像初学者常做的那样,从贪婪地吃子开始;但三天之后,它便掌握了人类围棋专家使用的复杂战术。“你会看到它重新发现了数千年来的人类知识,”Hassabis说。40天后,这个程序发现了人类未知的下法。

 

Approaches using purely reinforcement learning have struggled in AI because ability does not always progress consistently, said David Silver, a scientist at DeepMind who has been leading the development of AlphaGo, at the briefing. Bots often beat their predecessor, but forget how to beat earlier versions of themselves. This is the project's first "really stable, solid version of reinforcement learning, that’s able to learn completely from scratch," he said.

一直领导AlphaGo开发的DeepMind科学家David Silver在发布会上说,纯粹使用强化学习的方法在人工智能领域一直举步维艰,因为能力并不总是能够稳定地持续进步。机器人经常能击败上一个版本的自己,却忘记了如何击败自己更早的版本。他说,这是该项目第一个“真正稳定、可靠的强化学习版本,能够完全从零开始学习”。


AlphaGo Zero’s predecessors used two separate neural networks: one to predict the probable best moves, and one to evaluate, out of those moves, which was most likely to win. To do the latter, they used ‘roll outs’ — playing multiple fast and randomized games to test possible outcomes. AlphaGo Zero, however, uses a single neural network. Instead of exploring possible outcomes from each position, it simply asks the network to predict a winner. This is like asking an expert to make a prediction, rather than relying on the games of 100 weak players, said Silver. “We’d much rather trust the predictions of that one strong expert.”

AlphaGo Zero的前辈版本使用两个单独的神经网络:一个用于预测可能的最佳招数,另一个用于评估在这些招数中哪一步最有可能获胜。为了实现后者,它们使用“推演”(rollouts),即进行多局快速、随机的对弈来检验可能的结果。然而,AlphaGo Zero只使用单个神经网络:它不去探索每个局面之后的各种可能结果,而只是让网络直接预测谁会获胜。Silver说,这就像请一位专家做出预测,而不是依靠100个较弱棋手的对局。“我们更愿意相信那一位强大专家的预测。”
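The "single network, two outputs" idea described above can be sketched as one shared trunk feeding a policy head that scores moves and a value head that predicts the winner. This is an illustrative PyTorch model, not DeepMind's architecture; the layer sizes and the two-plane board encoding are assumptions made for the example.

```python
# Sketch of a combined policy-and-value network for a 19x19 board.
import torch
import torch.nn as nn

BOARD = 19  # standard Go board size

class PolicyValueNet(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        # shared trunk: learns features of the board position
        self.trunk = nn.Sequential(
            nn.Conv2d(2, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        # policy head: a score for every point on the board (plus pass)
        self.policy = nn.Sequential(
            nn.Flatten(),
            nn.Linear(channels * BOARD * BOARD, BOARD * BOARD + 1),
        )
        # value head: a single number in [-1, 1] predicting the winner
        self.value = nn.Sequential(
            nn.Flatten(),
            nn.Linear(channels * BOARD * BOARD, 1),
            nn.Tanh(),
        )

    def forward(self, board: torch.Tensor):
        features = self.trunk(board)
        return self.policy(features), self.value(features)

# toy usage: one position encoded as two planes (own stones, opponent stones)
net = PolicyValueNet()
position = torch.zeros(1, 2, BOARD, BOARD)
move_scores, predicted_winner = net(position)
print(move_scores.shape, predicted_winner.shape)  # (1, 362) and (1, 1)
```

Because both heads share the same trunk, the features learned for choosing moves also serve the winner prediction, which is one way to read the efficiency gain the article describes next.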


Merging these functions into a single neural network made the algorithm both stronger and much more efficient, said Silver. It still required a huge amount of computing power — four of the specialized chips called tensor processing units, which Hassabis estimated to be US$25 million of hardware. But its predecessors used ten times that number. It also trained itself in days, rather than months. The implication is that “algorithms matter much more than either computing or data available”, said Silver.

Silver说,将这些功能合并到一个单一的神经网络中,让算法变得更强大,效率也高得多。它仍然需要巨大的计算能力:四块被称为张量处理单元的专用芯片,Hassabis估计这套硬件价值约2500万美元。但它的前辈版本使用的芯片数量是这个数字的十倍;而且它的自我训练只用了几天,而不是几个月。Silver说,这意味着“算法远比可用的计算能力或数据重要得多”。

 


科技词汇

Artificial-intelligence (AI) 人工智能

Input 输入

Strategy 策略

Superhuman 超越人类的

Crucial 关键的

Tackle 解决

Protein 蛋白质

Briefing 简报

Reinforcement 强化、增强

Resource 资源

Algorithm 算法

Territory 领土

Trial and error 反复尝试

Tactic 战术

Stable 稳定的

Solid 稳固的

Scratch 从零开始(from scratch)

Randomize 随机化

Latter 后者



互动问答


本期:迷你数独

每个谜题都由一个在不同位置给与提示数字的4x4或6x6网格组成。游戏的目的是将空方格填上数字1到4(对于4x4大小的谜题)或者1到6(对于6x6的谜题),使得每一行,每一列以及每一个宫都没有重复的数字出现。
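For readers who like to program, the rules above translate directly into a brute-force backtracking solver. The sketch below handles the 4x4 case with 2x2 boxes; for a 6x6 puzzle, change SIZE to 6 and the boxes to 2 rows by 3 columns. The sample grid is made up for illustration and is not this issue's puzzle.

```python
# Mini-sudoku backtracking solver: fill empty cells so that no digit
# repeats in any row, column, or box, exactly as the rules above state.
SIZE, BOX_ROWS, BOX_COLS = 4, 2, 2

def legal(grid, r, c, digit):
    # no repeat in the row, the column, or the box containing (r, c)
    if digit in grid[r]:
        return False
    if any(grid[i][c] == digit for i in range(SIZE)):
        return False
    br, bc = r - r % BOX_ROWS, c - c % BOX_COLS
    return all(grid[i][j] != digit
               for i in range(br, br + BOX_ROWS)
               for j in range(bc, bc + BOX_COLS))

def solve(grid):
    for r in range(SIZE):
        for c in range(SIZE):
            if grid[r][c] == 0:                 # 0 marks an empty cell
                for digit in range(1, SIZE + 1):
                    if legal(grid, r, c, digit):
                        grid[r][c] = digit
                        if solve(grid):
                            return True
                        grid[r][c] = 0          # undo and try the next digit
                return False
    return True                                  # no empty cells left: solved

puzzle = [
    [1, 0, 0, 0],
    [0, 0, 3, 0],
    [0, 2, 0, 0],
    [0, 0, 0, 4],
]
solve(puzzle)
print(puzzle)
```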

 

本期难度:Diabolical(极难)

(答案见下期)

 

注:数独源自18世纪瑞士数学家欧拉所创造的拉丁方阵(Latin Square)游戏。20世纪70年代它在美国发展,被改名为数字拼图(Number Place),之后流传至日本并发扬光大,以数学智力拼图游戏的形式发表。1984年,游戏杂志《パズル通信ニコリ》正式把它命名为数独,意思是“在每一格只有一个数字”。1997年3月,曾任香港高等法院法官的新西兰人高乐德(Wayne Gould)到日本东京旅游时无意中发现了这款游戏。他用了6年时间编写了生成数独的电脑程式,并将它放在网站上;数独随后首先在英国的《泰晤士报》上发表,不久其他报纸也相继刊登,很快便风靡全英国,并迅速在全世界流行开来。