Kaiyu Zheng zkytony

Why

To answer two questions: (1) how was the correctness of pomdp_py's POUCT implementation validated, and (2) does it behave correctly? Prompted by this issue.

How

In the Tiger domain, with initial belief [0.5, 0.5], plan for the first action with POUCT and compare the value at the root of the resulting search tree with the optimal value that pomdp-solve's vi pruning algorithm (an optimal solver) produces for the same domain. The root value of the POUCT search tree is an estimate of the optimal value, so the two should be close.
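To make the idea of a reference value concrete, here is a minimal, dependency-free sketch that computes the exact finite-horizon optimal value of the Tiger domain at a given belief by exhaustive lookahead over beliefs. It is not pomdp_py's API and not pomdp-solve's algorithm. The reward values, the 0.85 listening accuracy, the 0.95 discount, and all function names are assumptions chosen to match the commonly used Tiger parameters.

```python
from functools import lru_cache

# Assumed standard Tiger parameters (not stated in the text above).
LISTEN_ACCURACY = 0.85
R_LISTEN, R_OPEN_TIGER, R_OPEN_SAFE = -1.0, -100.0, 10.0
ACTIONS = ("listen", "open-left", "open-right")
OBSERVATIONS = ("tiger-left", "tiger-right")


def expected_reward(b_left, action):
    """Immediate expected reward; b_left is P(tiger behind the left door)."""
    if action == "listen":
        return R_LISTEN
    p_tiger = b_left if action == "open-left" else 1.0 - b_left
    return p_tiger * R_OPEN_TIGER + (1.0 - p_tiger) * R_OPEN_SAFE


def successors(b_left, action):
    """Return (probability, next belief) pairs, one per reachable observation."""
    if action != "listen":
        # Assumption: opening a door resets the problem, so the belief is uniform again.
        return [(1.0, 0.5)]
    result = []
    for obs in OBSERVATIONS:
        p_obs_if_left = LISTEN_ACCURACY if obs == "tiger-left" else 1.0 - LISTEN_ACCURACY
        p_obs_if_right = 1.0 - p_obs_if_left
        p_obs = b_left * p_obs_if_left + (1.0 - b_left) * p_obs_if_right
        if p_obs > 0.0:
            # Bayes update of the belief after hearing obs.
            result.append((p_obs, b_left * p_obs_if_left / p_obs))
    return result


@lru_cache(maxsize=None)
def optimal_value(b_left, horizon, discount=0.95):
    """Exact optimal value of belief b_left with `horizon` steps to go."""
    if horizon == 0:
        return 0.0
    return max(
        expected_reward(b_left, action)
        + discount * sum(
            p * optimal_value(round(b_next, 12), horizon - 1, discount)
            for p, b_next in successors(b_left, action)
        )
        for action in ACTIONS
    )


if __name__ == "__main__":
    for h in (1, 2, 3, 4):
        print(h, optimal_value(0.5, h))
```

With a matching planning depth and discount factor, the root value of a Monte Carlo search tree could be checked against `optimal_value(0.5, depth)` up to sampling error. Note that this sketch is finite-horizon, so its numbers are not directly interchangeable with an infinite-horizon solution.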

Reference: Installing Python scripts and modules

Package structure

├── my_pkg
│   ├── setup.py
│   ├── scripts
│   │   └── executable_script
│   └── src

	import random
	import pprint
	import pomdp_py
	import seaborn as sns
	import pandas as pd
	import matplotlib.pyplot as plt

	from pomdp_py.algorithms.value_function import expected_reward, belief_observation_model
	from pomdp_py.problems.tiger.tiger_problem import TigerProblem, TigerState

	;;; mypy-mode.el --- Navigate Mypy Output in Emacs

	;; Copyright (C) 2023 Kaiyu Zheng

	;; Author: Your Name <[email protected]>
	;; Keywords: convenience
	;; Version: 0.0.1
	;; Package-Requires: ((emacs "24.3"))

	;;; Commentary:

	// Using ROS Humble
	// /author: Kaiyu Zheng
	#include <chrono>
	#include <rclcpp/rclcpp.hpp>
	#include <message_filters/subscriber.h>
	#include <message_filters/synchronizer.h>
	#include <message_filters/sync_policies/approximate_time.h>
	#include <std_msgs/msg/header.hpp>
	#include <geometry_msgs/msg/point_stamped.hpp>

	# Using ROS Humble
	# /author: Kaiyu Zheng
	import rclpy
	import message_filters
	import geometry_msgs.msg
	import std_msgs.msg
	from rclpy.node import Node
	from rclpy.qos import QoSProfile, QoSDurabilityPolicy

	"""
	Example of defining a small, tabular POMDP and solving
	it using Cassandra's pomdp-solve value iteration solver.

	Refer to documentation:
	https://h2r.github.io/pomdp-py/html/examples.external_solvers.html
	"""
	import pomdp_py

	def cryingbaby():

	;;; package --- Summary

	;;;; Commentary:

	;;; Code:

	(require 'cl)

	;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
	;; Note taking ;;

	<?xml version="1.0"?>
	<launch>
	<arg name="limited" default="false"
	doc="If true, limits joint range [-PI, PI] on all joints." />
	<arg name="paused" default="true"
	doc="Starts gazebo in paused mode" />
	<arg name="gui" default="true"
	doc="Starts gazebo gui" />
	<!-- Robot pose -->
	<arg name="x" default="0"/>

	"""
	Computes Legendre Polynomials


	P_n(x) = sum_{m=0}^{M} (-1)^m * (2n-2m)! / (2^n * m! * (n-m)! * (n-2m)!) * x^(n-2m)

	where M = n // 2