This experiment answers two questions: (1) how was the correctness of pomdp_py's POUCT implementation validated, and (2) does the implementation behave correctly? It was prompted by this issue.
In the Tiger domain, with initial belief [0.5, 0.5], we compare two values. The first is the value at the root of the POUCT search tree built while planning the first action. The second is the optimal value computed by pomdp-solve's vi pruning algorithm, which is an exact (optimal) solver. The POUCT root value is an estimate of the optimal value, so the two numbers should be close.
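For readers without pomdp-solve at hand, the sketch below computes a reference value in a different way. It runs an exact finite-horizon expectimax over beliefs, which is not pomdp-solve's vi pruning. With discount below 1, the horizon-limited value approaches the infinite-horizon optimum as the horizon grows. Several details are assumptions and do not come from this document: the standard Tiger parameters (listen cost -1, +10 for the correct door, -100 for the tiger door, 85% listening accuracy, belief reset to 0.5 after opening a door), the discount of 0.95, and the helper names `reward`, `transitions` and `optimal_value`.

```python
from functools import lru_cache

# ASSUMED standard Tiger parameters (not taken from this document).
LISTEN_ACC = 0.85
R_LISTEN = -1.0
R_CORRECT = 10.0
R_WRONG = -100.0
ACTIONS = ("listen", "open-left", "open-right")


def reward(p_left, action):
    """Expected immediate reward; p_left is P(tiger behind left door)."""
    if action == "listen":
        return R_LISTEN
    if action == "open-left":
        # Opening the left door is wrong exactly when the tiger is on the left.
        return p_left * R_WRONG + (1 - p_left) * R_CORRECT
    return p_left * R_CORRECT + (1 - p_left) * R_WRONG


def transitions(p_left, action):
    """Yield (observation probability, updated p_left) pairs."""
    if action == "listen":
        # Observation "hear-left": Bayes update of the belief.
        p_hl = p_left * LISTEN_ACC + (1 - p_left) * (1 - LISTEN_ACC)
        yield p_hl, p_left * LISTEN_ACC / p_hl
        # Observation "hear-right".
        p_hr = 1 - p_hl
        yield p_hr, p_left * (1 - LISTEN_ACC) / p_hr
    else:
        # ASSUMED: opening a door resets the tiger uniformly at random,
        # so the next belief is 0.5 regardless of the observation.
        yield 1.0, 0.5


@lru_cache(maxsize=None)
def optimal_value(p_left, horizon, gamma=0.95):
    """Optimal expected discounted return for the given number of steps left."""
    if horizon == 0:
        return 0.0
    best = float("-inf")
    for action in ACTIONS:
        q = reward(p_left, action)
        for prob, next_p in transitions(p_left, action):
            # Round the belief so that numerically equal beliefs share a cache entry.
            q += gamma * prob * optimal_value(round(next_p, 12), horizon - 1, gamma)
        best = max(best, q)
    return best


if __name__ == "__main__":
    for h in (1, 5, 10, 15):
        print(h, optimal_value(0.5, h))
```

As the horizon increases, the printed values settle toward the discounted optimum at the uniform belief. That settled value is the kind of number the POUCT root value would be checked against, provided the POUCT planner uses the same Tiger parameters and discount factor.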