
Hidden information and solving Dominoes

Summary

Some notes on the construction of a Block Dominoes playing algorithm for a hidden-information variant of the game. I build a game simulator, learn from a heuristic algorithm, and then develop some play-out based algorithms which seem fairly good. I conjecture that the final algorithm approximates optimal play.

The final SWI-Prolog implementation is available here.

I am selling an optimised JavaScript library version, embeddable in both the browser and the backend. It can play a variety of \(\ge 2\) player domino game variants with a settable difficulty. Contact me if you are interested!

Introduction

I can’t remember why, but I started musing about how to write algorithms for games with partially hidden state. Poker is the obvious one, but it is far too complicated to analyse casually (and I don’t even know how to play Poker), so I decided on Dominoes instead. I haven’t built game-playing algorithms before, so what follows is my journey to a decent algorithm.

In Block Dominoes, a “double-6” domino set contains 28 unique tiles, showing every unique pair of numbers from 0 to 6. There are a huge number of games which can be played with such a set, and I’ve picked a very simple variant to use herein. Two players are each dealt 7 tiles at random. Players can only see their own tiles and those which have been played. The game starts with one player laying down a tile. Subsequent tiles must match an open end and are placed in a chain. For example, after laying 3|5, the opponent may play 1|3, after which the board would be 1|3 3|5 and the next tile would need to match on 1 or 5. Players alternate turns. If a player is unable to play, their turn is skipped. The game ends when a player has no more tiles or no further moves are possible by either player. The player with the fewest pips (the dots on the tiles) when play ends wins the game. The winner gets the total pips remaining in the opponent’s hand, a new game ensues, and the match continues until one player cumulatively earns more than \(N\) pips.

If the player could see the opponent’s hand I could solve for the best move by implementing some variant of the minimax algorithm. However, that does not work when I do not know what the opponent’s hand holds, because I cannot simulate play. I started by thinking about playing strategy from a probabilistic perspective. After all:

  1. The player knows his hand and what has been played on the board, which tells him something about the remaining tiles.

  2. If the opponent passes a turn, he does not have any tile matching an open end, which tells the player something about the opponent’s hand.

  3. Doubles have the same number on both ends, so they are harder to match than regular tiles.

It wasn’t obvious to me what to do with this information. I started by writing a Dominoes simulator in Prolog so I could run experiments and trial different strategies against each other.

I modelled individual tiles as a list of the form [X,Y]. A player’s hand is then a list of lists, as is the board state. That is, [[…],[…],…]. Game state is managed by the following predicates:

% Remove tile from list.
sub_tile_(_,[],Acc,Acc).
sub_tile_([A,B],[[A,B]|T],Acc,Pn) :-
    !, sub_tile_([A,B],T,Acc,Pn).
sub_tile_([A,B],[[B,A]|T],Acc,Pn) :-
    !, sub_tile_([A,B],T,Acc,Pn).
sub_tile_([A,B],[H|T],Acc,Pn) :-
    sub_tile_([A,B],T,[H|Acc],Pn).
sub_tile([A,B],P,Pn) :-
    sub_tile_([A,B],P,[],Pn).

% Merge tile into board state.
add_board(X,[],[X]) :- !.
add_board([X,Y],[[Y,X0]|Rs],[[X,Y],[Y,X0]|Rs]) :- !.
add_board([X,Y],B,Bn) :-
    last(B,[_,X]),
    append(B,[[X,Y]],Bn),!.
add_board([Y,X],B,Bn) :-
    last(B,[_,X]),
    append(B,[[X,Y]],Bn),!.
add_board([Y,X],[[Y,X0]|Rs],[[X,Y],[Y,X0]|Rs]) :- !.

where sub_tile/3 removes tiles from a player’s hand and add_board/3 merges tiles into the board. Picking lists as a data structure was a mistake I ended up stuck with: I’m constantly getting the last element, or checking a list for membership, and both of these are O(N) operations in SWI-Prolog. The predicates are written to be as efficient as possible within that constraint. Note too that add_board/3 implements an important hack. Say a player holds the tile [2,4] and the board state is [[2,1],[1,0],[0,4]]. The tile can be placed on either side, so to facilitate this add_board/3 first tries to merge the tile into the board from left to right, and then flips the tile and tries to merge it from right to left. It is therefore possible to add the tile to either side just by flipping it. I.e.:

>> add_board([2,4],[[2,1],[1,0],[0,4]],Board_next).
Board_next = [[2,1],[1,0],[0,4],[4,2]]

>> add_board([4,2],[[2,1],[1,0],[0,4]],Board_next).
Board_next = [[4,2],[2,1],[1,0],[0,4]]
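
Similarly, sub_tile/3 removes a tile regardless of its orientation in the hand. An illustrative query (note that the accumulator reverses the order of the remaining tiles):

>> sub_tile([4,2],[[6,6],[2,4],[0,3]],Hand_next).
Hand_next = [[0,3],[6,6]]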

There are a few other important auxiliary predicates shown below which I will make constant use of:

% Evaluate a player's hand.
eval_hand_([],Acc,Acc) :- !.
eval_hand_([[A,B]|Rest],Acc,Value) :-
    Acc_new is Acc+A+B,
    eval_hand_(Rest,Acc_new,Value),!.
eval_hand(P,Value) :-
    eval_hand_(P,0,Value),!.

% Evaluate a game score.
eval_score(P1,P2,Score) :-
    eval_hand(P1,Hand1),
    eval_hand(P2,Hand2),
    (Hand1<Hand2 ->
	 Score is Hand2;
     (Hand2<Hand1 ->
	  Score is -Hand1;
      Score is 0)),!.

% Find all valid moves.
valid_move(P,[],Move) :-
    member(Move,P).
valid_move(P,[[X,_]|_],[Y,X]) :-
    member([Y,X],P);member([X,Y],P).
valid_move(P,Board,[X,Y]) :-
    last(Board,[_,Y]),
    (member([Y,X],P);member([X,Y],P)).
valid_moves(P,Board,Moves) :-
    setof(Move,valid_move(P,Board,Move),Moves).

% Check if player has any valid moves.
has_moves(P,[]) :-
    P\=[],!.
has_moves(P,[[X,_]|_]) :-
    (memberchk([X,_],P);memberchk([_,X],P)),!.
has_moves(P,Board) :-
    last(Board,[_,X]),
    (memberchk([X,_],P);memberchk([_,X],P)),!.
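
As an illustrative query, note that a tile which fits both open ends comes back from valid_moves/3 in both orientations, matching the flipping behaviour of add_board/3:

>> valid_moves([[2,4],[6,6],[1,5]],[[2,1],[1,0],[0,4]],Moves).
Moves = [[2,4],[4,2]]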

A dominoes game is a relatively simple set of state transitions. I pass around the players’ hands, the board and whose turn it is. When a player passes on a turn, I collect the open ends of the board and keep them in player-specific exclusion lists, because this is game information that affects future states but is only available at specific states. The game eventually sets the “Score” variable. Scoring is always from player 1’s perspective, so a win for player 2 produces a negative score. I also pass Dec1 and Dec2, which are the decision predicates for each player: this makes it easy to test decision strategies against each other.

game(_,_,[],P2,_,1,_,_,Score) :-                   % P1 wins.
    eval_hand(P2,Hand2),
    Score is Hand2,!.
game(_,_,P1,[],_,0,_,_,Score) :-                   % P2 wins.
    eval_hand(P1,Hand1),
    Score is -Hand1,!.
game(Dec1,Dec2,P1,P2,Board,0,P1_ex,P2_ex,Score) :- % P1 can move.
    length(P2,N),
    call(Dec1,P1,N,Board,P2_ex,Pick),
    sub_tile(Pick,P1,P1_new),
    add_board(Pick,Board,Board_new),
    game(Dec1,Dec2,P1_new,P2,Board_new,1,P1_ex,P2_ex,Score),!.
game(Dec1,Dec2,P1,P2,Board,0,P1_ex,P2_ex,Score) :- % P1 can't play.
    has_moves(P2,Board),
    ends(Board,Ends),
    append(Ends,P1_ex,P1_ex_new),
    game(Dec1,Dec2,P1,P2,Board,1,P1_ex_new,P2_ex,Score),!.
game(Dec1,Dec2,P1,P2,Board,1,P1_ex,P2_ex,Score) :- % P2 can play.
    length(P1,N),
    call(Dec2,P2,N,Board,P1_ex,Pick),
    sub_tile(Pick,P2,P2_new),
    add_board(Pick,Board,Board_new),
    game(Dec1,Dec2,P1,P2_new,Board_new,0,P1_ex,P2_ex,Score),!.
game(Dec1,Dec2,P1,P2,Board,1,P1_ex,P2_ex,Score) :- % P2 can't play.
    has_moves(P1,Board),
    ends(Board,Ends),
    append(Ends,P2_ex,P2_ex_new),
    game(Dec1,Dec2,P1,P2,Board,0,P1_ex,P2_ex_new,Score),!.
game(_,_,P1,P2,_,_,_,_,Score) :-                   % Draw.
    eval_hand(P1,Hand1),
    eval_hand(P2,Hand2),
    (Hand1<Hand2 ->
	 Score is Hand2;
     (Hand2<Hand1 ->
	  Score is -Hand1;
      Score is 0)),!.
game(Score,Dec1,Dec2) :-                           % Initialisation.
    findall([A,B],(between(0,6,A),between(A,6,B)),All),
    random_permutation(All,All_rand),
    length(P1,7),length(P2,7),
    append(P1,Rest,All_rand),
    append(P2,_,Rest),
    random(0,2,Start),
    game(Dec1,Dec2,P1,P2,[],Start,[],[],Score),!.
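
One helper used above but not listed is ends/2, which collects the numbers on the two open ends of the board. A minimal sketch, consistent with how game/9 (and connect_count/4 further below) call it, would be:

% Collect the open ends of the board as a list of numbers.
% Not part of the listed code; a minimal sketch consistent
% with how it is called elsewhere.
ends([],[]) :- !.
ends(Board,[L,R]) :-
    Board=[[L,_]|_],
    last(Board,[_,R]),!.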

To play a game I call game/3, passing it the decision predicates to use, and then observe the Score variable. The decision predicates are of the form f(Player,N,Board,Exc,Pick), where Player is the player’s hand, N is the number of tiles in the opponent’s hand, Board is the current board, Exc is a list of open ends recorded when the opponent could not play, and Pick is the variable to be set by the predicate. For example, here is a random decision predicate:

decide_random(P,_,Board,_,Pick) :-
    valid_moves(P,Board,[Pick|_]).

it picks the first of the valid moves and relies on the ordering of a player’s hand being random to begin with in order to avoid explicitly randomising the moves. Here is a simple manual decision predicate (in case I want to play against the computer):

decide_manual(P,N,Board,Exc,Pick) :-
    valid_moves(P,Board,Moves),
    write("Board : "), writeln(Board),
    write("Opp # : "), writeln(N),
    write("Exc.  : "), writeln(Exc),
    write("Hand  : "), writeln(P),
    write("Legal : "), writeln(Moves),
    write("Choose: "), read(Pick).

I enter moves from the “Legal” list, in the format “[A,B].”, and the predicate conveys them to the game.

Here is a simple simulation predicate to play out \(N\) games using any pair of decision predicates:

simulate(_,_,0,Score,Score) :- !.
simulate(Dec1,Dec2,N,Acc,Score) :-
    game(S,Dec1,Dec2),
    Acc_new is Acc+S,
    N_new is N-1,
    simulate(Dec1,Dec2,N_new,Acc_new,Score).
simulate(Dec1,Dec2,N,Score) :-
    simulate(Dec1,Dec2,N,0,Score).

For example, the following would play out 10000 games with a random starting player (see the game/3 definition), using the decide_random/5 predicate for both players.

>> simulate(decide_random,decide_random,10000,Score).

Initial experiments

The simulator makes it easy to test out decision strategies. I tried many decision heuristics – both against random and against each other. The aim wasn’t to find “the best” decision heuristic but rather to get some insight into what an effective strategy might look like. For example, here is a compound heuristic decision predicate which does much better than random:

connect([_,B],[B,_]).
connect([_,B],[_,B]).
connect([A,_],[A,_]).
connect([A,_],[_,A]).

connect_count(P,Board,Pick,Count) :-
    add_board(Pick,Board,Board_new),
    ends(Board_new,Ends),
    append(P,Board,List),
    setof(T,(member(T,List),connect(T,Ends)),Result),
    length(Result,Count),!.
connect_count(_,_,_,0) :- !.

decide_heuristic(P,_,Board,_,[A,A]) :-   % play double.
    valid_moves(P,Board,Moves),
    memberchk([A,A],Moves),!.
decide_heuristic(P,_,Board,Exc,[X,Y]) :- % else cause passed ends.
    valid_moves(P,Board,Moves),
    member([X,Y],Moves),
    member(X,Exc),
    member(Y,Exc),!.
decide_heuristic(P,_,Board,_,Pick) :-    % else cause rare ends.
    valid_moves(P,Board,Moves),
    setof([S,T],(member(T,Moves),connect_count(P,Board,T,S)),Result),
    max_member([_,Pick],Result),!.

The predicate works as follows (a small example follows the list):

  1. Try to play a double. Doubles are less likely to match open ends.

  2. Else, try to play a tile that leads to open ends the opponent can’t match based on previous passes.

  3. Else, play the tile which creates ends the opponent is least likely to have.
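
For instance, when both a double and a regular tile are playable, the first rule wins (an illustrative query):

>> decide_heuristic([[5,5],[3,6]],_,[[3,5]],[],Pick).
Pick = [5,5]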

Let’s simulate the outcome against decide_random/5:

>> time(simulate(decide_heuristic,decide_random,10000,Score)).
% 21,839,751 inferences, 1.780 CPU in 1.780 seconds
Score = 50221.

That is roughly a 5 point gain per game on average against random.

Insight

I played manual games against random and against my heuristics. I also played against online bots (there are surprisingly few). I observed that there are many times – mostly towards the end of the game – where it is possible to force moves based on information gleaned from the opponent’s passes, or by creating equal open ends. An example of the latter is playing a tile that puts 5s on both ends, which guarantees you will be able to play a tile with a 5 on it on the following move. These initial experiments made it apparent that this kind of sequential reasoning is essential for an advanced player, and that it cannot be achieved by a strategy which evaluates each move in isolation.

One route to generally account for sequences would be to randomly set the opponent’s hand to something consistent with the known portion of the game state (i.e. the player’s hand, the board and the opponent’s passes) and then play the game out randomly to see what happens. I could repeat this many times, conditional on each possible move, and pick the move with the best outcomes. This made a lot of sense to me because the play-outs sample over possible sequences and take outcomes into account directly, thereby implicitly subsuming many routes to victory. I’d expect this to be particularly applicable when the moves are invariant to the opponent’s play, such as with forcing moves.

The key auxiliary for play-out based predicates is scenario/8 which picks a random hand for the opponent conditional on the next move of the player:

scenario(P1,Pick,N,Board,Exc,P1_new,P2,Board_new) :-
    sub_tile(Pick,P1,P1_new),        % update P1.
    add_board(Pick,Board,Board_new), % update Board.
    findall(
      [X,Y],
	    (between(0,6,X),
	     between(X,6,Y),
	     \+memberchk(X,Exc),
	     \+memberchk(Y,Exc),
	     \+memberchk([X,Y],P1_new),
	     \+memberchk([X,Y],Board_new),
	     \+memberchk([Y,X],Board_new)),
	    Tiles),
    random_permutation(Tiles,Tiles_rand),
    length(P2,N),
    append(P2,_,Tiles_rand),!.       % update P2.

For each such scenario I can play the rest of the game out randomly as follows:

playout_random(P1,Pick,N,Board,Exc,Score) :-
    scenario(P1,Pick,N,Board,Exc,P1_new,P2,Board_new),
    game(decide_random,decide_random,
         P1_new,P2,Board_new,1,[],[],Score),!.

I can repeat this \(M\) times for statistical power:

n_playouts(_,_,_,_,_,_,0,Score,Score) :- !.
n_playouts(Pred,P,Tile,N,Board,Exc,M,Acc,Score) :-
    call(Pred,P,Tile,N,Board,Exc,Score0),
    Acc_new is Acc+Score0,
    M_new is M-1,
    n_playouts(Pred,P,Tile,N,Board,Exc,M_new,Acc_new,Score),!.
n_playouts(Pred,P,Tile,N,Board,Exc,M,Score) :-
    n_playouts(Pred,P,Tile,N,Board,Exc,M,0,Score),!.

and finally I can wrap n_playouts/9 to produce a decision predicate based on 1000 play-outs per candidate move:

decide_playout_(P,N,Board,Exc,Moves,Pick) :-
    findall(
 	    [S,T],
 	    (member(T,Moves),
	     n_playouts(playout_random,P,T,N,Board,Exc,1000,S)),
	    Result),
    max_member([_,Pick],Result),!.
decide_playout(P,N,Board,Exc,Pick) :-
    valid_moves(P,Board,Moves),
    (Moves=[Pick]; decide_playout_(P,N,Board,Exc,Moves,Pick)).

I added a small complication to the predicate so that the play-outs are only run when there is more than one available move. Let’s compare it to the heuristic predicate:

>> time(simulate(decide_playout,decide_heuristic,1000,Score)).
% 10,641,357,968 inferences, 867.153 CPU in 867.219 seconds
Score = 3224.

That is roughly a 3 point gain per game on average against heuristic play; however, note the number of inferences compared to heuristic vs. random: 10.6B vs. 21M, for 1000 vs. 10000 games! It was apparent that computation would be a limiting factor for play-out oriented strategies.

An obvious problem with the random play-outs is that a good opponent does not make random moves, and I may be aggregating over play-outs which make no sense for an opponent to follow. This is tempered by the ability of the random play-outs to capture tactics invariant to the opponent’s next move, and by the fact that the opponent has limited knowledge with which to plan. In the section below I describe why I think minimax is the ticket to improving play-outs, but I’d note here that other intermediate strategies are also available. For example, I could expand the heuristic predicate further and then use heuristic play-outs instead, or use a mixture of random and heuristic play-outs. This would give a more challenging model of the opponent and therefore a better comparison between alternative moves.
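
As a sketch of that idea (this predicate is not part of the original code), a heuristic play-out only needs to swap the decision predicates passed to game/9; it can then be handed to n_playouts/9 in place of playout_random:

% Hypothetical variant: play each sampled scenario out with the
% heuristic opponent model rather than random moves.
playout_heuristic(P1,Pick,N,Board,Exc,Score) :-
    scenario(P1,Pick,N,Board,Exc,P1_new,P2,Board_new),
    game(decide_heuristic,decide_heuristic,
         P1_new,P2,Board_new,1,[],[],Score),!.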

A minimax conjecture

If the player knew the opponent’s tiles, then the minimax algorithm could be used to find the “optimal” move by calculating through all possible play-outs and making the best possible move for each player from the bottom up. The moves are “optimal” in the sense that, should either player deviate from the strategy, they could not achieve a better outcome. Note that the reason for making sub-optimal moves does not matter: for example, if the opponent does not know the player’s hand, they may make sub-optimal moves due to uncertainty. I do not know the opponent’s hand from the outset, but it seemed to me that I should be able to average over scenarios to both take the uncertainty into account and make use of minimax analysis.

Here is a thought experiment to clarify. Say the state of play is such that there are only two possible hands the opponent could have. It is my turn to play, and for each possible next move, I calculate the minimax outcome (the game score given a minimax play-out) for both possible hands. How should I pick the next move? I am equally ignorant about the opponent’s possible hands, so if I were to preferentially pick one over the other (based on the score, for example), I’d be sub-optimal half the time in comparable situations. Similarly, if I were to choose randomly based only on the probability of the hand occurring, I’d ignore the magnitude of the outcomes. That is, the cost of being wrong may differ significantly depending on the true hand. Expectation is the statistical way to fold this uncertainty into a single measure. In the above example this implies calculating the average minimax score over all possible hands for each move, and picking the move with the highest average.

The above scheme generalises to any number of scenarios, and I can implement it as a strategy by picking the minimax-expectation-maximising move at each turn. The strategy should be “optimal” in the sense that not picking the expectation-maximising move cannot lead to a higher expectation. Additionally, by the law of large numbers, if I cannot calculate the expectation across all possible hands, a large enough random sample is an unbiased approximation of it.
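
Written out (the notation is mine, not the original post’s), with \(h\) ranging over opponent hands consistent with the visible state and \(h_1,\dots,h_K\) a uniform sample of such hands, the decision rule is:

\[
m^{*} \;=\; \arg\max_{m \in \text{Moves}} \mathbb{E}_{h}\!\left[\operatorname{minimax}(m,h)\right]
\;\approx\; \arg\max_{m \in \text{Moves}} \frac{1}{K}\sum_{k=1}^{K} \operatorname{minimax}(m,h_k).
\]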

I conjecture that this strategy is optimal in the sense that any other strategy cannot lead to better outcomes given ignorance about the opponent’s strategy. If I knew the opponent’s strategy then it may be possible to better exploit it by taking the specifics into account.

Minimax is simple but difficult to implement well because, for this use case, it has to be efficient enough to run very many times. I did not implement alpha-beta pruning – mostly because I didn’t realise how important it is, and retro-fitting it is not straightforward – and the minimax predicate is poorly structured for effective tabling (memoising) with SWI-Prolog. The commercial version of my code takes many of these learnings into account!

The minimax decision predicate is given below:

minimax_(Depth,P1,P2,Board,Moves,0,Pick) :-
    findall(
 	[Score,Tile],
 	(member(Tile,Moves),
	 add_board(Tile,Board,Board_next),
	 sub_tile(Tile,P1,P1_next),
	 minimax(Depth,P1_next,P2,Board_next,1,Score)),
	Result),
    max_member([_,Pick],Result).
minimax_(Depth,P1,P2,Board,Moves,1,Pick) :-
    findall(
 	[Score,Tile],
 	(member(Tile,Moves),
	 add_board(Tile,Board,Board_next),
	 sub_tile(Tile,P2,P2_next),
	 minimax(Depth,P1,P2_next,Board_next,0,Score)),
	Result),
    min_member([_,Pick],Result).
minimax(0,P1,P2,_,_,Score) :-         % Max. depth reached.
    eval_score(P1,P2,Score),!.
minimax(_,[],P2,_,1,Score) :-         % P1 wins.
    eval_hand(P2,Hand2),
    Score is Hand2,!.
minimax(_,P1,[],_,0,Score) :-         % P2 wins.
    eval_hand(P1,Hand1),
    Score is -Hand1,!.
minimax(Depth,P1,P2,Board,0,Score) :- % P1 can play.
    valid_moves(P1,Board,Moves),
    (Moves=[Pick];minimax_(Depth,P1,P2,Board,Moves,0,Pick)),
    sub_tile(Pick,P1,P1_new),
    add_board(Pick,Board,Board_new),
    Depth_new is Depth-1,
    minimax(Depth_new,P1_new,P2,Board_new,1,Score),!.
minimax(Depth,P1,P2,Board,0,Score) :- % P1 can't play.
    has_moves(P2,Board),
    minimax(Depth,P1,P2,Board,1,Score),!.
minimax(Depth,P1,P2,Board,1,Score) :- % P2 can play.
    valid_moves(P2,Board,Moves),
    (Moves=[Pick];minimax_(Depth,P1,P2,Board,Moves,1,Pick)),
    sub_tile(Pick,P2,P2_new),
    add_board(Pick,Board,Board_new),
    Depth_new is Depth-1,
    minimax(Depth_new,P1,P2_new,Board_new,0,Score),!.
minimax(Depth,P1,P2,Board,1,Score) :- % P2 can't play.
    has_moves(P1,Board),
    !,minimax(Depth,P1,P2,Board,0,Score).
minimax(_,P1,P2,_,_,Score) :-         % Draw.
    eval_score(P1,P2,Score),!.

playout_minimax(P1,Pick,N,Board,Exc,Score) :-
    scenario(P1,Pick,N,Board,Exc,P1_new,P2,Board_new),
    minimax(10,P1_new,P2,Board_new,1,Score),!. % Controls depth.

decide_minimax_(P,N,Board,Exc,Moves,Pick) :-
    findall(
 	[Score,Tile],
 	(member(Tile,Moves),
	 n_playouts(
	     playout_minimax,
	     P,Tile,N,Board,Exc,200,Score)),Result),
    max_member([_,Pick],Result),!.

% Optionally make the first 2 moves by random play-out
% to save on processing.
% decide_minimax(P,N,Board,Exc,Pick) :-
%     N > 5,
%     decide_playout(P,N,Board,Exc,Pick).

decide_minimax(P,N,Board,Exc,Pick) :-
    valid_moves(P,Board,Moves),
    (Moves=[Pick]; decide_minimax_(P,N,Board,Exc,Moves,Pick)).
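
As a quick sanity check of the sign convention (an illustrative one-tile endgame), player 1 lays their final tile and collects the opponent’s 11 pips:

>> minimax(10,[[1,2]],[[5,6]],[[3,1]],0,Score).
Score = 11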

The predicate integrates a depth limit and an optional (commented-out) clause to make the first couple of decisions by other means, because that is where the processing load is greatest.

Here is a comparison of minimax vs. random play-outs over 200 simulated games at varying numbers of scenarios. The depth is fixed at 10 for minimax, and the first two moves are made by random play-out for both players to save processing time.

# scenarios   Score   % wins   \(P(X\le x)\)
         50      63       55            .93
        100     200       57            .98
        200     104       54            .89
        300     -51       51            .58
        400     260       59            .99
        500      -7       53            .82
        600     141       53            .82
       1200       7       55            .91

Minimax play-outs improve the win rate by about 4% on average, but the results do not obviously improve with the number of scenarios. This isn’t as big an improvement as I had expected. One reason could be that the number of possible hands remains high for most of the game, which has the effect of overly hedging moves. Differences between minimax and random play-outs may mostly be restricted to less frequent situations such as forcing moves.

Epilogue

What started as a musing ended up being a lengthy investigation into how to make decisions under uncertainty for this variant of Dominoes. I would summarise the valuable insights as follows:

  1. Simulation is a great way to learn about a problem. Setting up a simulator enabled me to run dozens of ad hoc experiments and to draw observational conclusions.

  2. Random play-outs are surprisingly effective. I suspect it is because they capture response-invariant tactics. Better results can be achieved by making the play-outs more realistic.

  3. Minimax play-outs are better – perhaps optimal – but not by as much as expected compared to random play-outs. They are also more complicated to implement and processor heavy.

  4. Prolog is great for giving expression to complicated logic. However, it is not so great for making the resultant procedures computationally efficient.

There were many other, less direct, practical benefits too. Perhaps most importantly, I realised that “averaging over scenarios” is probably close to what I do naturally. That is, when faced with decision making under uncertainty, I tend towards playing out possible ways things could turn out, aggregating what that implies for what I should do now, and then making a hedged decision weighted by the likelihood and importance of each scenario. It got me thinking about how I can apply random play-outs to garden-variety thinking, and when more scenarios considered randomly might be preferable to fewer scenarios considered carefully.