CS50 Introduction to Artificial Intelligence with Python Lecture 1

最近开始学习哈佛的CS50系列课程之一：Introduction to Artificial Intelligence with Python，第一讲的主题是Search，这里总结第一讲以及第一次作业。

课程地址：https://cs50.harvard.edu/ai/

备注：图片均来自课程课件。

问题要素

agent（智能体）：感知其环境并在该环境上作用的实体
state（状态）：智能体及其环境的配置
initial state（初始状态）：智能体开始的状态
actions（动作）：在一个状态下可以做出的选择，$\text{Action}(s)$返回状态$s$可以执行的动作
transition model（转移模型）：描述在任何状态下执行任何动作会导致什么状态，$\text{RESULT}(s, a)$返回在状态$s$执行动作$a$后的状态
state space（状态空间）：通过任何动作序列可以从初始状态到达的所有状态的集合
goal test（目标测试）：确定给定状态是否为目标状态的方法
path cost（路径花费）：与给定路径相关的数值成本

Search Problems

搜索问题有如下几个元素：

initial state
actions
transition model
goal test
path cost function

目标找到从初始状态到目标状态的（最优）路径，为了完成这点，引进一个存储状态的数据结构，node：

node是一个包含如下信息的数据结构

状态
父节点（生成当前节点的节点）
动作（从父节点到当前节点的动作）
路径成本（从初始状态到当前节点）

搜索算法

搜索的通用算法如下：

从包含初始状态的边界(frontier)开始。
重复：
- 如果边界为空，则无解。
- 从边界删除节点。
- 如果节点包含目标状态，请返回解。
- 拓展节点，将结果节点添加到边界。

但上述算法可能会出现死循环，改良的算法为：

从包含初始状态的边界(frontier)开始。
从一个空的探索集开始。
重复：
- 如果边界为空，则无解。
- 从边界删除节点。
- 如果节点包含目标状态，请返回解。
- 将节点添加到探索集中。
- 拓展节点，如果结果节点不在边界或探索集中，则将它们添加到边界。

这里从边界删除节点的顺序是随机的，如果指定顺序，则会产生特定的搜索算法：

如果frontier的数据结构为stack，那么搜索算法变成DFS。
如果frontier的数据结构为queue，那么搜索算法变成BFS。

greedy best-first search

注意通用算法中拓展节点的顺序也是随机的，这一点可以稍作改进，于是产生了greedy best-first search：

搜索算法可扩展最接近目标的节点，该节点由启发式函数$h(n)$所估计

注意上述算法不一定能找到成本最低的路径。

$A^{\star}$搜索

$A^{\star}$是对greedy best-first search的改良，它考虑已经探索部分的成本加上对未探索部分的估计：

搜索算法根据$g(n) + h(n)$的最小值扩展节点
$g(n)=$到达节点的成本
$h(n)=$到达目标的估算成本

$A^{\star}$搜索能取得最优解，如果

$h(n)$是admissible（小于等于真实成本）
$h(n)$是consistent（对于相距步长成本为$c$的每个节点$n$和后继节点$n’$，$h(n)≤h(n’)+ c$）

Game

考虑二人游戏：

$S_0$：初始状态
$\text{PLAYER}(s)$：返回要在状态$s$操纵的玩家
$\text{ACTIONS}(s)$：返回状态$s$的合法动作
$\text{RESULT}(s, a)$：在状态$s$采取行动$a$后进入的状态
$\text{TERMINAL}(s)$：检查状态$s$是否为终止状态
$\text{UTILITY}(s)$：终止状态$s$的数值表示

在二人游戏中，可以假设一个玩家想获得尽可能高的分数，另一个玩家想获得尽可能低的分数。

Minimax

Minimax是处理二人游戏的经典算法，分别涉及了两个函数：

Alpha-Beta Pruning

Minimax会穷举所有状态，这会带来非常大的计算量，一个改进方法为Alpha-Beta Pruning：

（备注：该图片来自CS188第六讲）

这里简要说明一下max-value函数$v\ge \beta$部分为什么直接返回，这是因为max的父节点为min，而min的最优值为$\beta$，所以min不关心子节点大于等于$\beta$的部分，因此如果$v\ge \beta$，则直接返回即可。

Depth-Limited Minimax

该方法就是在搜索的时候增加深度限制，主要是为了减少计算量，不一定能找到最优值。

Project

Degrees

这部分比较简单，只要读懂数据结构即可：

def shortest_path(source, target):
    """
    Returns the shortest list of (movie_id, person_id) pairs
    that connect the source to the target.

    If no possible path, returns None.
    """
    # TODO
    res = []
    Queue = QueueFrontier()
    start = Node(source, None, None)
    Queue.add(start)
    Explored_set = set()
    while True:
        if Queue.empty():
            return None
        
        node = Queue.remove()
        state = node.state
        Explored_set.add(state)
        
        for movie in people[state]['movies']:
            for person in movies[movie]['stars']:
                if person not in Explored_set:
                    if person == target:
                        res.append((movie, person))
                        while node.parent is not None:
                            res.append((node.action, node.state))
                            node = node.parent
                        res.reverse()
                        
                        return res
                    else:
                        child = Node(person, node, movie)
                        Queue.add(child)

Tic-Tac-Toe

实现MiniMax以及Alpha-Beta Pruning：

def initial_state():
    """
    Returns starting state of the board.
    """
    return [[EMPTY, EMPTY, EMPTY],
            [EMPTY, EMPTY, EMPTY],
            [EMPTY, EMPTY, EMPTY]]


def player(board):
    """
    Returns player who has the next turn on a board.
    """
    x = 0
    o = 0
    for i in range(3):
        for j in range(3):
            if board[i][j] == X:
                x += 1
            elif board[i][j] == O:
                o += 1
    
    if x <= o:
        return X
    else:
        return O


def actions(board):
    """
    Returns set of all possible actions (i, j) available on the board.
    """
    action = []
    for i in range(3):
        for j in range(3):
            if board[i][j] == EMPTY:
                action.append((i, j))
                
    return action


def result(board, action):
    """
    Returns the board that results from making move (i, j) on the board.
    """
    i = action[0]
    j = action[1]

    if board[i][j] != EMPTY:
        raise BaseException
    p = player(board)
    newboard = copy.deepcopy(board)
    newboard[i][j] = p
    
    return newboard
    

def winner(board):
    """
    Returns the winner of the game, if there is one.
    """
    w = None
    #检查行
    for i in range(3):
        j = 1
        while j < 3 and board[i][j - 1] == board[i][j]:
            j += 1
        if j == 3:
            w = board[i][0]
            break
    #检查列
    for i in range(3):
        j = 1
        while j < 3 and board[j - 1][i] == board[j][i]:
            j += 1
        if j == 3:
            w = board[0][i]
            break
    #检查对角线
    i = 1
    while i < 3 and board[i][i] == board[i - 1][i - 1]:
        i += 1
    if i == 3:
        w = board[0][0]
        
    i = 1
    while i < 3 and board[i][2 - i] == board[i - 1][3 - i]:
        i += 1
    if i == 3:
        w = board[0][2]
        
    return w


def terminal(board):
    """
    Returns True if game is over, False otherwise.
    """
    action = actions(board)
    #无法再做动作
    if action == []:
        return True
    w = winner(board)
    #有胜者
    if w == X or w == O:
        return True
    
    return False


def utility(board):
    """
    Returns 1 if X has won the game, -1 if O has won, 0 otherwise.
    """
    w = winner(board)
    if w == X:
        return 1
    elif w == O:
        return -1
    else:
        return 0

def Max(board):
    if terminal(board):
        return utility(board), None
    value = -1e9
    act = None
    action = actions(board)
    for a in action:
        V, A = Min(result(board, a))
        if V > value:
            value = V
            act = a
        
    return value, act
        
def Min(board):
    if terminal(board):
        return utility(board), None
    value = 1e9
    act = None
    action = actions(board)
    for a in action:
        V, A = Max(result(board, a))
        if V < value:
            value = V
            act = a
        
    return value, act

def Max_prone(board, alpha, beta):
    if terminal(board):
        return utility(board), None
    act = None
    action = actions(board)
    for a in action:
        V, A = Min_prone(result(board, a), alpha, beta)
        if V > alpha:
            alpha = V
            act = a
        if alpha >= beta:
            return beta, act
        
    return alpha, act

def Min_prone(board, alpha, beta):
    if terminal(board):
        return utility(board), None
    act = None
    action = actions(board)
    for a in action:
        V, A = Max_prone(result(board, a), alpha, beta)
        if V < beta:
            beta = V
            act = a
        if alpha >= beta:
            return alpha, act
        
    return beta, act

'''
def minimax(board):
    """
    Returns the optimal action for the current player on the board.
    """
    if terminal(board):
        return None
    p = player(board)
    if p == X:
        return Max(board)[-1]
    else:
        return Min(board)[-1]
'''
def minimax(board):
    """
    Returns the optimal action for the current player on the board.
    """
    if terminal(board):
        return None
    p = player(board)
    alpha = -1e9
    beta = 1e9
    if p == X:
        return Max_prone(board, alpha, beta)[-1]
    else:
        return Min_prone(board, alpha, beta)[-1]