Two long games reached effectively the same position but finished with different scores: 1,811,320 vs 1,810,888, a gap of 432. The boards looked the same, so the scores "should" have matched. They don't, and there is no bug. Here is the full explanation, with the exact numbers.
There is a moment where both games show an identical set of 16 tiles (total mass 131,040), and the scores are exactly the quoted pair:

| | Game A (03g5vf4j) | Game B (03gasnewa) |
|---|---|---|
| board | identical tile set, mass 131,040 | identical tile set, mass 131,040 |
| score | 1,811,320 | 1,810,888 |
| 4-tiles spawned all game | 5,891 | 5,999 |

Same board, score differs by 432.
The "one board has a 4, the other has a 2" difference spotted between two screenshots is a red herring. It is just two adjacent frames, and a single small tile is worth at most a few points, not 432. The real cause is not visible on the board at all.
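The score is not a function of the board alone. Let $S$ be the score, $M = \sum_i t_i$ the total mass of the tiles $t_i$ on the board, $T = \sum_i t_i \log_2 t_i$, and $n_4$ the number of 4-tiles spawned over the whole game. Then

$$S = T - M - 4\,n_4.$$

The first two terms depend only on the tiles in front of you. The last term depends on your spawn luck. So two players A and B who reach the exact same board differ in score by precisely

$$S_A - S_B = 4\,\bigl(n_4^{B} - n_4^{A}\bigr).$$
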
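Every point of score comes from a merge: combining two tiles of value $v$ scores $2v$. Now follow $T = \sum_i t_i \log_2 t_i$, summed over the tiles on the board, through each kind of event:

- A merge of two $v$ tiles changes $T$ by

  $$2v\log_2(2v) - 2\,v\log_2 v = 2v,$$

  which is exactly the score it adds.
- A spawn of a tile of value $s$ adds $s\log_2 s$ to $T$ for free (no score). A spawned 2 adds $2$; a spawned 4 adds $8$.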
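
The two bullet points can be checked directly on a toy tile list. This is a minimal sketch, not code from either game: the helper names `potential`, `merge`, and `spawn` are hypothetical, and the board is treated as an unordered list of tile values because $T$ does not depend on position.

```python
def potential(tiles):
    """T = sum of t * log2(t) over the tiles (all tiles are powers of two)."""
    # bit_length() - 1 is the exact integer log2 of a power of two.
    return sum(t * (t.bit_length() - 1) for t in tiles)


def merge(tiles, v):
    """Replace two v-tiles with one 2v-tile; return (new_tiles, score_gained)."""
    out = list(tiles)
    out.remove(v)
    out.remove(v)
    out.append(2 * v)
    return out, 2 * v


def spawn(tiles, s):
    """Add a freshly spawned tile of value s; spawning scores nothing."""
    return list(tiles) + [s]


board = [2, 2, 4, 8, 8]  # hypothetical example position

# Merging the two 8s raises T by exactly the 16 points the merge scores.
after, gained = merge(board, 8)
assert potential(after) - potential(board) == gained == 16

# A spawned 2 adds 2 to T, a spawned 4 adds 8, and neither scores.
assert potential(spawn(board, 2)) - potential(board) == 2
assert potential(spawn(board, 4)) - potential(board) == 8
```

Using the integer bit length instead of a floating-point logarithm keeps every quantity an exact integer, which matters when the argument rests on exact equality.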
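So, with $n_2$ and $n_4$ the numbers of 2s and 4s spawned over the game, $S$ the score, and $M$ the total mass on the board,

$$T = S + 2\,n_2 + 8\,n_4, \qquad M = 2\,n_2 + 4\,n_4,$$

the second because merges conserve mass. Substituting $2\,n_2 = M - 4\,n_4$ and simplifying gives

$$S = T - M - 4\,n_4.$$
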
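A spawned 2 contributes 2 to $T$ and 2 to the mass, a net of 0; a spawned 4 contributes 8 to $T$ but only 4 to the mass, a net of 4 that no merge ever scored.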
A spawned 2 is neutral: you still have to merge your way up from it, scoring every step. A spawned 4 is 4 units of mass handed to you that you did not have to build by merging two 2s, and that merge would have scored 4 points. Every 4 the game spawns lowers your ceiling by 4 for the same final board.
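
The identity (score equals $T$ minus mass minus 4 times the number of spawned 4s) can also be exercised on simulated games. The sketch below is an illustration under stated assumptions, not the code behind either game: it assumes standard 2048 rules on a 4x4 grid, a 10% chance that a spawn is a 4, two starting tiles, and a player who picks uniformly among legal moves. All function names are hypothetical.

```python
import random


def slide_left(row):
    """Slide and merge one row to the left; return (new_row, score_gained)."""
    tiles = [t for t in row if t]
    out, gained, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            out.append(2 * tiles[i])      # each tile merges at most once per move
            gained += 2 * tiles[i]
            i += 2
        else:
            out.append(tiles[i])
            i += 1
    return out + [0] * (len(row) - len(out)), gained


def move(board, direction):
    """Apply a move (0=left, 1=right, 2=up, 3=down); return (board, gained)."""
    if direction in (2, 3):
        board = [list(col) for col in zip(*board)]   # transpose
    if direction in (1, 3):
        board = [row[::-1] for row in board]         # mirror
    rows, gained = [], 0
    for row in board:
        new_row, g = slide_left(row)
        rows.append(new_row)
        gained += g
    if direction in (1, 3):
        rows = [row[::-1] for row in rows]
    if direction in (2, 3):
        rows = [list(col) for col in zip(*rows)]
    return rows, gained


def spawn(board, rng, n4):
    """Put a 2 (90%) or a 4 (10%, assumed) on a random empty cell; return new n4."""
    empty = [(r, c) for r in range(4) for c in range(4) if board[r][c] == 0]
    r, c = rng.choice(empty)
    tile = 4 if rng.random() < 0.1 else 2
    board[r][c] = tile
    return n4 + (tile == 4)


def check_identity(board, score, n4):
    """True when score == T - M - 4 * n4 for this board."""
    tiles = [t for row in board for t in row if t]
    T = sum(t * (t.bit_length() - 1) for t in tiles)
    M = sum(tiles)
    return score == T - M - 4 * n4


def play(seed, max_moves=2000):
    """Play random legal moves, asserting the identity after every spawn."""
    rng = random.Random(seed)
    board = [[0] * 4 for _ in range(4)]
    score = n4 = 0
    n4 = spawn(board, rng, n4)
    n4 = spawn(board, rng, n4)
    for _ in range(max_moves):
        options = []
        for d in range(4):
            new, gained = move(board, d)
            if new != board:              # only moves that change the board are legal
                options.append((new, gained))
        if not options:
            break                         # game over
        board, gained = rng.choice(options)
        score += gained
        n4 = spawn(board, rng, n4)
        assert check_identity(board, score, n4)
    return board, score, n4


for seed in range(5):
    board, score, n4 = play(seed)
    print(f"seed={seed} score={score} fours_spawned={n4}")
```

Random play ends early and scores little, but that is beside the point: the assertion inside `play` holds after every single move regardless of how well the game is played, because it depends only on the merge and spawn bookkeeping.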
Game B was handed 108 more 4-tiles over its ~59,500 moves than game A (5,999 vs 5,891). That is ordinary variance in the spawn RNG; both games expected roughly 5,950. Each extra 4 costs 4 points:

$$(5{,}999 - 5{,}891) \times 4 = 108 \times 4 = 432.$$

Game A merged up from 2s more often, so it banked 432 more points for the same board.
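
The arithmetic can be replayed from the figures in the table. The first half of this sketch uses only those figures. The second half is an added plausibility check that rests on assumptions not stated in the table: that each of roughly 59,500 spawns is independently a 4 with probability 0.1, so the 4-count is binomial.

```python
from math import sqrt

# Figures quoted in the table above.
score_a, score_b = 1_811_320, 1_810_888
n4_a, n4_b = 5_891, 5_999

# Each extra spawned 4 costs 4 points for the same final board.
extra_fours = n4_b - n4_a
assert extra_fours == 108
assert 4 * extra_fours == score_a - score_b == 432

# Adding 4 * n4 back to each score recovers the board-only part of the
# score, which must be the same for two identical boards.
assert score_a + 4 * n4_a == score_b + 4 * n4_b

# Assumed model: ~59,500 spawns per game, each a 4 with probability 0.1.
spawns, p = 59_500, 0.1
sd_one_game = sqrt(spawns * p * (1 - p))   # spread of one game's 4-count
sd_difference = sd_one_game * sqrt(2)      # spread of the A-vs-B difference
z = extra_fours / sd_difference
print(f"expected 4s per game: {spawns * p:.0f}")
print(f"gap of {extra_fours} fours is {z:.2f} standard deviations")
```

Under that assumed model the 108-tile gap sits at about one standard deviation of the difference between two games, which is consistent with calling it ordinary variance.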
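The identity was checked against every sampled snapshot of both full games (hundreds of thousands of moves). The implied 4-spawn count,

$$n_4 = \frac{T - M - S}{4},$$

computed from each snapshot's board (its $T$ and mass $M$) and its score $S$, always came out a non-negative integer, never a fraction. If merges were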
miscounted or score leaked anywhere,