Resource Overview
The original paper describing AlphaGo Zero, published in Nature.
A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in
challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The
tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were
trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce
an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game
rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo’s own move selections and also
the winner of AlphaGo’s games. This neural network improves the strength of the tree search, resulting in higher quality
move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved
superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.
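The abstract says the network learns two targets from self-play: the tree search's own move selections and the eventual winner. The full paper's Methods section combines these into a single per-position loss, l = (z − v)² − πᵀ log p + c‖θ‖². Here z is the game outcome, v is the predicted value, π is the search probabilities, p is the predicted move probabilities, and θ is the network weights. Below is a minimal plain-Python sketch of that loss for one position. It assumes flat lists instead of tensors. The function name `alphago_zero_loss` and the default `c` are illustrative assumptions, not code from the paper.

```python
import math
from typing import Sequence


def alphago_zero_loss(
    z: float,
    v: float,
    pi: Sequence[float],
    p: Sequence[float],
    weights: Sequence[float],
    c: float = 1e-4,
) -> float:
    """Per-position loss l = (z - v)^2 - pi^T log p + c * ||theta||^2.

    Hypothetical helper for illustration only.
    z: game outcome from the current player's view (+1 win, -1 loss)
    v: network's predicted value for the position
    pi: tree-search move probabilities (the self-play "move selection" target)
    p: network's predicted move probabilities (must be > 0 wherever pi > 0)
    weights: flattened network parameters theta, for L2 regularisation
    c: regularisation coefficient (the default here is an assumption)
    """
    if len(pi) != len(p):
        raise ValueError("pi and p must have the same length")
    # Value head: squared error against the actual winner.
    value_loss = (z - v) ** 2
    # Policy head: cross-entropy against the search probabilities;
    # moves with pi == 0 contribute nothing, so skip them to avoid log(0).
    policy_loss = -sum(t * math.log(q) for t, q in zip(pi, p) if t > 0)
    # L2 penalty on the parameters.
    l2 = c * sum(w * w for w in weights)
    return value_loss + policy_loss + l2


# Usage: a perfect value prediction with a uniform policy over two moves
# leaves only the policy entropy term, log(2).
print(alphago_zero_loss(1.0, 1.0, [0.5, 0.5], [0.5, 0.5], []))
```

Summing the value and policy terms lets one network serve both roles the abstract describes: predicting the search's move choices and predicting the game's winner.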