@ivanopagano
Last active March 29, 2017 09:14
redeploy scenario on 3-instance cluster and split-brain happens. keep-majority strategy

Base scenario

The downing policy is keep-majority. New instances are started one at a time, and on redeploy the newest instances are downed first (YOUNGEST_FIRST).

  • 3 nodes up: N1, N2, N3

Redeploy

  1. a redeploy is started, so a new node (N+) is spun up
  2. N+ is acknowledged by all the nodes, so the leader (L) is about to set it to UP
  3. a split brain separates the cluster into 2 partitions of the same size: P1 = {N1, N2}, P2 = {N3, N+}

Scenarios

What happens depends on which network partition holds the oldest node (OLD) and which holds the leader (L)

Case # | OldestIn | LeaderIn
   1   |    P1    |    P1    
   2   |    P1    |    P2    
   3   |    P2    |    P1    
   4   |    P2    |    P2    

Case 1 [Converge] P1 survives, P2 shuts down

  • P1 eventually sees N+ as UP because L is on P1; it sees that there's a tie but decides to survive because OLD is on P1 too
  • P2 will not see N+ as UP, so it will decide to shut down, being the minority

Case 2 [Converge] P1 survives, P2 shuts down

  • P1 will not see N+ as UP because L is on P2, so it will keep living because it is the majority
  • P2 will see N+ as UP because L is here, so it thinks there is a tie; it will decide to shut down because it knows OLD is on P1

Case 3 [Diverge] both P1 and P2 shut down

  • P1 eventually sees N+ as UP because L is on P1; it sees that there's a tie but decides to shut down because OLD is on P2
  • P2 will not see N+ as UP, so it will decide to shut down, being the minority

Case 4 [Diverge - Worst case] both P1 and P2 survive

  • P1 will not see N+ as UP because L is on P2, so it will keep living because it is the majority
  • P2 will eventually see N+ as UP because L is here, so it thinks there is a tie; it will decide to survive because it knows that OLD is on P2 too
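
The four cases above can be replayed with a small sketch. It is only a model of the reasoning in this gist, not of the real resolver: it assumes that only the side holding the leader ever sees N+ as UP, and that a tie is broken in favour of the side holding the oldest node. The names `decide` and `outcomes` are made up for illustration.

```python
from itertools import product

# The two halves of the split, as in the redeploy steps above.
P1 = {"N1", "N2"}
P2 = {"N3", "N+"}


def decide(partition, other, leader_here, oldest_here):
    """Return 'survive' or 'shutdown' for one side of the split."""
    members = partition | other
    # Assumption: N+ is only promoted to UP on the side that holds the leader.
    visible = members if leader_here else members - {"N+"}
    mine = len(partition & visible)
    theirs = len(other & visible)
    if mine != theirs:
        return "survive" if mine > theirs else "shutdown"
    # Assumption: on a tie, the side holding the oldest node wins.
    return "survive" if oldest_here else "shutdown"


def outcomes():
    """One row per case: (case, oldest_in, leader_in, P1 decision, P2 decision)."""
    rows = []
    combos = product(("P1", "P2"), repeat=2)  # same order as the case table
    for case, (oldest_in, leader_in) in enumerate(combos, start=1):
        p1 = decide(P1, P2, leader_in == "P1", oldest_in == "P1")
        p2 = decide(P2, P1, leader_in == "P2", oldest_in == "P2")
        rows.append((case, oldest_in, leader_in, p1, p2))
    return rows


if __name__ == "__main__":
    for row in outcomes():
        print(row)
```

A case converges only when exactly one side decides to survive; in this model that holds for cases 1 and 2 only.
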
@ivanopagano

Wow, thanks for the clarification, @rcavalcanti. This is even worse than expected: we have no way to ensure the oldest nodes survive then.

ivanopagano commented Mar 29, 2017

Considering the new information:

  • the leader is elected as the node with the lowest address
  • the partition holding the lowest address wins on a node count tie

the cases simplify a lot:

  1. the leader/lowest address is on P1
  2. the leader/lowest address is on P2

Case 1 [Converge]

  • P1 knows about N+ and will survive because, even with a tie, it holds the lowest-address node
  • P2 doesn't know about N+ and will shut down, considering itself the minority

Case 2 [Diverge]

  • P1 doesn't know about N+ and will survive, considering itself the majority
  • P2 knows about N+ and will survive because, even with a tie, it holds the lowest-address node
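
The two simplified cases can be checked the same way. This is again just a sketch of the reasoning in this comment, assuming that the leader and the lowest address are the same node, so the side holding it is both the only side that sees N+ as UP and the side that wins a tie. `decide_lowest_address` and `simplified_outcomes` are illustrative names, not part of any library.

```python
# The two halves of the split: P1 = {N1, N2}, P2 = {N3, N+}.
P1 = {"N1", "N2"}
P2 = {"N3", "N+"}


def decide_lowest_address(partition, other, holds_lowest):
    """Return 'survive' or 'shutdown' for one side of the split."""
    members = partition | other
    # Assumption: the lowest-address node is the leader, so only its side sees N+ as UP.
    visible = members if holds_lowest else members - {"N+"}
    mine = len(partition & visible)
    theirs = len(other & visible)
    if mine != theirs:
        return "survive" if mine > theirs else "shutdown"
    # Tie: the side holding the lowest address wins.
    return "survive" if holds_lowest else "shutdown"


def simplified_outcomes():
    """Map the location of the leader/lowest address to (P1 decision, P2 decision)."""
    return {
        lowest_in: (
            decide_lowest_address(P1, P2, lowest_in == "P1"),
            decide_lowest_address(P2, P1, lowest_in == "P2"),
        )
        for lowest_in in ("P1", "P2")
    }


if __name__ == "__main__":
    print(simplified_outcomes())
```

In this model the side holding the leader never shuts down, so the result depends only on whether the other side still counts itself as the majority.
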
