Training an Autonomous Agent to Play Settlers of Catan using Reinforcement Learning

Introduction

The core of this software package is a MATLAB-based simulator which runs Monte Carlo simulations of games of Settlers of Catan. Initially, this Monte Carlo simulator chooses actions at each turn by selecting a board state at random from the set of possible board states based on the player's available moves. During training, the simulator may instead select actions using an epsilon-greedy policy based on a learned quality function. This quality function may be learned in an unsupervised manner by having the computer play games against itself, using the data obtained from these simulations to further refine the quality function.
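To make the epsilon-greedy selection concrete, here is a minimal sketch in Python (the package itself is MATLAB-based, so this is an illustration, not the project's code). The function name `select_board_state`, its parameters, and the toy quality table are all hypothetical. It assumes the quality function maps a candidate board state to a single number.

```python
import random


def select_board_state(candidates, quality, epsilon, rng=random):
    """Pick one reachable board state with an epsilon-greedy rule.

    candidates: list of board states reachable from the available moves
                (hypothetical representation; any objects will do here)
    quality:    callable returning an estimated value for a board state
    epsilon:    probability of exploring with a uniformly random choice
    """
    if not candidates:
        raise ValueError("no available moves")
    # Explore: with probability epsilon, pick a candidate uniformly at random.
    if rng.random() < epsilon:
        return rng.choice(candidates)
    # Exploit: otherwise pick the candidate with the highest estimated quality.
    return max(candidates, key=quality)


# Toy stand-ins for board states and their learned quality (made-up values).
toy_quality = {"build_road": 0.2, "build_settlement": 0.7, "pass": 0.1}
chosen = select_board_state(
    list(toy_quality), toy_quality.get, epsilon=0.1, rng=random.Random(0)
)
```

Setting `epsilon` to 1 reproduces the purely random selection used at the start, and lowering it shifts the simulator toward the moves the learned quality function currently rates highest.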