{"nbformat":4,"nbformat_minor":0,"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.6.1"},"colab":{"name":"PB016_lab5.ipynb","provenance":[],"collapsed_sections":[]}},"cells":[{"cell_type":"markdown","metadata":{"id":"1cKYLTEE1xFO"},"source":["# PB016: Artificial intelligence I, lab 5 - Games and game strategies\n","\n","This week's topic are games, game strategies and basic algorithms for optimal game solutions using AI. We'll focus namely on:\n","\n","1. __Minimax algorithm__\n","2. __Alpha-beta pruning__"]},{"cell_type":"markdown","metadata":{"id":"vL-HOkJH1xFQ"},"source":["---\n","\n","## 1. [Minimax](https://en.wikipedia.org/wiki/Minimax) algorithm\n","\n","__Basic facts__\n","- A concept originally based on [game theory](https://en.wikipedia.org/wiki/Game_theory).\n","- Designed for games of two or more alternating players, each with a set of strategies for each individual move in the game.\n","- The goal states of the game are evaluated by a valuation function, which assigns their corresponding gain values to each player.\n","- The minimax algorithm recursively minimizes a possible loss of a player in the worst possible scenario (i.e. if the opponent tries to reach their maximum loss in each turn by choosing the optimal strategy).\n","- In simple games, a complete evaluation of the game is possible, but for more complex games a combinatorial explosion occurs quickly and the algorithm therefore searches only a few levels of the tree of possible moves at a time.\n","\n","__Example__ - a sample of a general minimax tree:\n","\n","![minimax tree](https://www.fi.muni.cz/~novacek/courses/pb016/labs/img/minimax.png)"]},{"cell_type":"markdown","metadata":{"id":"NqbAtjo-1xFR"},"source":["### Game for these labs\n","- [Tic Tac Toe](https://en.wikipedia.org/wiki/Tic-tac-toe) on a 3x3 playing board.\n","\n","An __example__ of unlabeled game tree for Tic Tac Toe:\n","\n","![tictactoe tree](https://www.fi.muni.cz/~novacek/courses/pb016/labs/img/tictactoe.png)"]},{"cell_type":"markdown","metadata":{"id":"3nQ5z58-1xFS"},"source":["### Basic game code\n","- a class representing the playing board with functions for:\n","  - initialization of the playing board,\n","  - verification that the current state of the playing area is a goal one,\n","  - verification of valid moves,\n","  - drawing the playing board."]},{"cell_type":"code","metadata":{"scrolled":true,"id":"5joYwEmv1xFT"},"source":["# to measure the time required to evaluate the game using minimax and alpha-beta pruning\n","import time\n","\n","# class definition for game representation\n","class Game:\n","    def __init__(self):\n","        self.initialize_game()\n","\n","    # function usable both for initialization and for resetting the game\n","    def initialize_game(self):\n","        self.current_state = [['.','.','.'],\n","                              ['.','.','.'],\n","                              ['.','.','.']]\n","        self.result = None\n","        # player X always plays first\n","        self.player_turn = 'X'\n","\n","    # end of game test - if yes, returns the winner ('X' or 'O'), or '.', if it is a draw;\n","    #                    if not, returns None\n","    def is_end(self):\n","        # vertical win\n","        for i in range(0, 3):\n","            if (self.current_state[0][i] != '.' and\n","                self.current_state[0][i] == self.current_state[1][i] and\n","                self.current_state[1][i] == self.current_state[2][i]):\n","                return self.current_state[0][i]\n","\n","        # horizontal win\n","        for i in range(0, 3):\n","            if (self.current_state[i] == ['X', 'X', 'X']):\n","                return 'X'\n","            elif (self.current_state[i] == ['O', 'O', 'O']):\n","                return 'O'\n","\n","        # main diagonal win\n","        if (self.current_state[0][0] != '.' and\n","            self.current_state[0][0] == self.current_state[1][1] and\n","            self.current_state[0][0] == self.current_state[2][2]):\n","            return self.current_state[0][0]\n","\n","        # secondary diagonal win\n","        if (self.current_state[0][2] != '.' and\n","            self.current_state[0][2] == self.current_state[1][1] and\n","            self.current_state[0][2] == self.current_state[2][0]):\n","            return self.current_state[0][2]\n","\n","        # testing whether the board if full\n","        for i in range(0, 3):\n","            for j in range(0, 3):\n","                # not full, no end yet\n","                if (self.current_state[i][j] == '.'):\n","                    return None\n","\n","        # draw\n","        return '.'\n","\n","    # testing move's validity\n","    def is_valid(self, px, py):\n","        if px < 0 or px > 2 or py < 0 or py > 2:\n","            return False\n","        elif self.current_state[px][py] != '.':\n","            return False\n","        else:\n","            return True\n","    \n","    # drawing the board\n","    def draw_board(self):\n","        for i in range(0, 3):\n","            print(' | '.join(self.current_state[i]))\n","        print()"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"qDEXImrd1xFY"},"source":["### Game play functions\n","- A cycle of alternating moves of the `'X'` and`' O'` players until one of them wins, or until they draw."]},{"cell_type":"code","metadata":{"id":"teR0fhJg1xFZ"},"source":["def play(game):\n","    while True:\n","        game.draw_board()\n","        game.result = game.is_end()\n","\n","        # printing the relevant message at the end of the game\n","        if game.result != None:\n","            if game.result == 'X':\n","                print('The winner is X!')\n","            elif game.result == 'O':\n","                print('The winner is O!')\n","            elif game.result == '.':\n","                print('Draw!')\n","\n","            game.initialize_game()\n","            return\n","\n","        # human player's move\n","        if game.player_turn == 'X':\n","\n","            while True:\n","\n","                # calculate the optimal recommended move in a minimization step\n","                print('Calculating the recommended optimal move...')\n","                start = time.time()\n","                (m, qx, qy) = mini(game)\n","                end = time.time()\n","                print('Calculation took %.4f ms' % (1000*(end - start),))\n","                print('Recommended move: X = %d, Y = %d' % (qx, qy))\n","\n","                correct_format, px, py = False, -1, -1\n","                while not correct_format:\n","                    try:\n","                        px = int(input('Enter the X coordinate: '))\n","                        py = int(input('Enter the Y coordinate: '))\n","                        correct_format = True\n","                    except ValueError:\n","                        print('Invalid format, please repeat.')\n","                        correct_format = False\n","\n","                if game.is_valid(px, py):\n","                    game.current_state[px][py] = 'X'\n","                    game.player_turn = 'O'\n","                    break\n","                else:\n","                    print('Invalid move, please repeat.')\n","\n","        # AI move\n","        else:\n","            (m, px, py) = maxi(game)\n","            game.current_state[px][py] = 'O'\n","            game.player_turn = 'X'"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"WhmPih6S1xFc"},"source":["### __Exercise 1.1: AI move selection__\n","- Implement a function to select the optimal move for the AI (i.e. maximize the goal value of the game from the current position)."]},{"cell_type":"code","metadata":{"id":"4skua5Zf1xFd"},"source":["# player 'O' maximizes (in this case it's the AI)\n","def maxi(game):\n","\n","    # possible values for maximum are:\n","    # -1 - defeat\n","    # 0 - draw\n","    # 1 - win\n","\n","    # initial maximum set to -2 (worse than worst case)\n","    maxv = -2\n","\n","    px = None\n","    py = None\n","\n","    # testing whether the game's over\n","    result = game.is_end()\n","\n","    # if the game ends, the function must return the evaluation of the given state (-1 for a loss, 0 for a draw, 1 for a win)\n","    if result == 'X':\n","        return (-1, None, None)\n","    elif result == 'O':\n","        return (1, None, None)\n","    elif result == '.':\n","        return (0, None, None)\n","\n","    # selection of the move coordinates for 'O' by testing the optimality of possible moves (i.e. taking the mini function result into account)\n","\n","    # TODO - COMPLETE YOURSELVES\n","    \n","    # we return the value and coordinates of the optimal move\n","    return (maxv, px, py)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"2UKjevOT1xFk"},"source":["### __Exercise 1.2: Simulating the human move selection__\n","- Implement a function for selecting the optimal move for a human (i.e. minimizing the goal value of the game from the current position)."]},{"cell_type":"code","metadata":{"id":"qaaxf9Zq1xFl"},"source":["# player 'X' minimizes (in this case it's the human)\n","def mini(game):\n","\n","    # possible values for minimum are:\n","    # -1 - win\n","    # 0 - draw\n","    # 1 - defeat\n","\n","    # initial minimum set to 2 (worse than worst case)\n","    minv = 2\n","\n","    qx = None\n","    qy = None\n","\n","    # testing whether the game's over\n","    result = game.is_end()\n","\n","    # if the game ends, the function must return the evaluation of the given state (-1 for a win, 0 for a draw, 1 for a loss)\n","    if result == 'X':\n","        return (-1, None, None)\n","    elif result == 'O':\n","        return (1, None, None)\n","    elif result == '.':\n","        return (0, None, None)\n","\n","    # selection of the move coordinates for 'X' by testing the optimality of possible moves (i.e. taking the maxi function result into account)\n","\n","    # TODO - COMPLETE YOURSELVES\n","    \n","    # we return the value and coordinates of the optimal move\n","    return (minv, qx, qy)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"qDLcXBvQ1xFs"},"source":["### __Let's play!__"]},{"cell_type":"code","metadata":{"scrolled":true,"id":"LtcvuAkJ1xFt"},"source":["g = Game()\n","play(g)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"f6j254Jh1xFx"},"source":["---\n","\n","## 2. [Alpha-beta pruning](https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning)\n","\n","__Basic facts__\n","- Optimization of the minimax algorithm thanks to the determination of nodes (or moves) in the game tree that don't need to be searched further.\n","- The alpha-beta pruning algorithm stores two values, $\\alpha$ and $\\beta$, which represent:\n","  - the minimum score guaranteed by the maximizing player,\n","  - The maximum score guaranteed by the minimizing player\n","- At the beginning of the game, $\\alpha = - \\infty, \\beta = \\infty $ applies (i.e. both players start with their worst possible score).\n","- Whenever the minimizing player's maximum guaranteed score (\"beta\") becomes less than the maximizing player's minimum guaranteed score (\"alpha\") (i.e. $\\beta < \\alpha$), the maximizing player does not have to explore moves based on the current node because it's clear they don't represent the optimal strategy and will not be achieved in the game.\n","\n","__Example__ - a pruned minimax tree from the previous example:\n","\n","![alfa-beta tree](https://www.fi.muni.cz/~novacek/courses/pb016/labs/img/alphabeta.png)"]},{"cell_type":"markdown","metadata":{"id":"8m6HPk1R1xFx"},"source":["### Modified game play function\n","- A cycle of alternating moves of `'X'` and`' O'` players until one of them wins, or until a draw occurs.\n","- The same procedure as before, only uses versions of the `mini` and` maxi` functions extended by alpha-beta pruning.\n","- Instead of initial infinite values for $\\alpha, \\beta$, we use $-2, 2$ (for the Tic Tac Toe game defined in this way, this means de facto the same)."]},{"cell_type":"code","metadata":{"id":"vAvdUS_j1xFy"},"source":["def play_alpha_beta(game):\n","    while True:\n","        game.draw_board()\n","        game.result = game.is_end()\n","\n","        if game.result != None:\n","            if game.result == 'X':\n","                print('The winner is X!')\n","            elif game.result == 'O':\n","                print('The winner is O!')\n","            elif game.result == '.':\n","                print('Draw!')\n","\n","\n","            game.initialize_game()\n","            return\n","\n","        if game.player_turn == 'X':\n","\n","            while True:\n","                print('Calculating the recommended optimal move...')\n","                start = time.time()\n","                # updated mini function with pruning\n","                (m, qx, qy) = mini_alpha_beta(game,-2, 2)\n","                end = time.time()\n","                print('Calculation took %.4f ms' % (1000*(end - start),))\n","                print('Recommended move: X = %d, Y = %d' % (qx, qy))\n","\n","                correct_format, px, py = False, -1, -1\n","                while not correct_format:\n","                    try:\n","                        px = int(input('Enter the X coordinate: '))\n","                        py = int(input('Enter the Y coordinate: '))\n","                        correct_format = True\n","                    except ValueError:\n","                        print('Invalid format, please repeat.')\n","                        correct_format = False\n","\n","                if game.is_valid(px, py):\n","                    game.current_state[px][py] = 'X'\n","                    game.player_turn = 'O'\n","                    break\n","                else:\n","                    print('Invalid move, please repeat.')\n","\n","        else:\n","            # updated maxi function with pruning\n","            (m, px, py) = maxi_alpha_beta(game,-2, 2)\n","            game.current_state[px][py] = 'O'\n","            game.player_turn = 'X'"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"EYRwU5KV1xF1"},"source":["### __Exercise 2.1: Pruned move selection for the AI__\n","- Implement a function to select the optimal move for the AI (i.e. maximize the end value of the game from the current position).\n","- This time, however, prune the tree of possible moves to explore."]},{"cell_type":"code","metadata":{"id":"uluGdHyv1xF1"},"source":["def maxi_alpha_beta(game, alpha, beta):\n","    maxv = -2\n","    px = None\n","    py = None\n","\n","    result = game.is_end()\n","\n","    if result == 'X':\n","        return (-1, None, None)\n","    elif result == 'O':\n","        return (1, None, None)\n","    elif result == '.':\n","        return (0, None, None)\n","\n","    # TODO - COMPLETE YOURSELF (select optimal move with alpha update and ignore branches with too large maxv)\n","\n","    return (maxv, px, py)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"N6-ns3ri1xF8"},"source":["### __Example 2.2: Simulation of human stroke selection with pruning__\n","- Implement a function for selecting the optimal move for the human (i.e. minimizing the goal value of the game from the current position).\n","- This time, however, prune the tree of possible moves to explore."]},{"cell_type":"code","metadata":{"id":"nrVT4mEF1xF9"},"source":["def mini_alpha_beta(game, alpha, beta):\n","\n","    minv = 2\n","\n","    qx = None\n","    qy = None\n","\n","    result = game.is_end()\n","\n","    if result == 'X':\n","        return (-1, None, None)\n","    elif result == 'O':\n","        return (1, None, None)\n","    elif result == '.':\n","        return (0, None, None)\n","\n","    # TODO - COMPLETE YOURSELF (select optimal move with beta update and ignore branches with too small minv)\n","\n","    return (minv, qx, qy)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"BPSfloFq1xGE"},"source":["### __Let's play!__"]},{"cell_type":"code","metadata":{"scrolled":true,"id":"-Cb-6H8-1xGF"},"source":["g = Game()\n","play_alpha_beta(g)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"6MmpG84S1xGI"},"source":["### __Food for thought__\n","- How to modify the exercise code for larger playing areas, or a more difficult winning condition (i.e. a requirement to reach 4 or more fields in a row)?\n","- How will these changes affect the exploration efficiency?"]},{"cell_type":"markdown","metadata":{"collapsed":true,"id":"d1nO-4gZ1xGI"},"source":["---\n","\n","#### _Final note_ - the materials used in this notebook are original works adapted from original works as follows:\n","- Examples of game trees and code:\n","  - Based on materials from the [Stack Abuse](https://stackabuse.com/) site\n","  - Author: [Mina Krivokuća](mina.krivokuca@gmail.com)\n","  - License: N/A (adapted for internal use in PB016 at FI MU with the kind permission of the author and David Landup, the StackAbuse operator)"]}]}