@okumin
Last active March 25, 2025 14:48

createTable

Original Method

  1. HiveMetaStoreClient#createTable: Set the default catalog, set processor capabilities, etc., and call hook.preCreateTable
  2. SessionHiveMetaStoreClient#create_table: Bail out and create a temporary table if the given table is temporary
  3. HiveMetaStoreClient#create_table: Issue a Thrift request
  4. HiveMetaStoreClient#createTable: Call hook.commitCreateTable
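The inheritance-based flow above can be modeled with a minimal, hypothetical Java sketch. It is not Hive's code: `Table`, `BaseClient`, `SessionClient`, and the trace strings are simplified stand-ins named after the classes they imitate. It shows that the catalog and processor capabilities are set before `hook.preCreateTable`, and that the temporary-table check lives in the subclass's `create_table` override.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the ORIGINAL createTable flow (not Hive's actual code):
// one base class does the setup and hook calls, a subclass overrides create_table.
final class OriginalCreateTableSketch {
  // Simplified stand-in for the metastore Table object.
  static final class Table {
    boolean temporary;
  }

  // Plays the role of HiveMetaStoreClient.
  static class BaseClient {
    final List<String> trace = new ArrayList<>();

    // Plays the role of HiveMetaStoreClient#createTable.
    void createTable(Table tbl) {
      trace.add("set default catalog");
      trace.add("set processor capabilities");
      trace.add("hook.preCreateTable");
      create_table(tbl); // dynamically dispatched to the subclass override
      trace.add("hook.commitCreateTable");
    }

    // Plays the role of HiveMetaStoreClient#create_table.
    void create_table(Table tbl) {
      trace.add("thrift request");
    }
  }

  // Plays the role of SessionHiveMetaStoreClient.
  static final class SessionClient extends BaseClient {
    @Override
    void create_table(Table tbl) {
      if (tbl.temporary) {
        trace.add("create temp table"); // bail out: no Thrift request
        return;
      }
      super.create_table(tbl);
    }
  }

  // Runs the flow and returns the recorded order of steps.
  static List<String> run(boolean temporary) {
    SessionClient client = new SessionClient();
    Table tbl = new Table();
    tbl.temporary = temporary;
    client.createTable(tbl);
    return client.trace;
  }
}
```

In this sketch, `run(false)` records the steps in the order of the list, and `run(true)` replaces the Thrift request with the temporary-table branch while both hook calls still run, because they live in the base class.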

New Method

From the outermost wrapper inward:

  1. HiveMetaStoreClientWithHook: hook.preCreateTable
  2. HiveMetaStoreClientWithTmpTable: Bail out and create a temporary table if the given table is temporary
  3. HiveMetaStoreClientWithSessionFeature: Do nothing
  4. HiveMetaStoreClientWithLocalCache: Do nothing
  5. ThriftHiveMetaStoreClient: Set the default catalog name, set processor capabilities, etc., and then issue a Thrift request
  6. HiveMetaStoreClientWithHook: Call hook.commitCreateTable after the inner wrappers return
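The wrapper chain above can be modeled with plain delegation. This is a minimal, hypothetical Java sketch: `Client` stands in for `IMetaStoreClient` reduced to one method, and each class is named after the wrapper it imitates. Because the hook wrapper is outermost, `hook.preCreateTable` runs before the innermost client sets the catalog and processor capabilities.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the NEW createTable flow as a chain of delegating
// wrappers (simplified stand-ins, not Hive's actual classes).
final class NewCreateTableSketch {
  static final class Table {
    boolean temporary;
  }

  // Stand-in for IMetaStoreClient, reduced to a single method.
  interface Client {
    void createTable(Table tbl);
  }

  // Role of ThriftHiveMetaStoreClient: the innermost client.
  static final class ThriftClient implements Client {
    private final List<String> trace;

    ThriftClient(List<String> trace) {
      this.trace = trace;
    }

    @Override
    public void createTable(Table tbl) {
      trace.add("set default catalog");
      trace.add("set processor capabilities");
      trace.add("thrift request");
    }
  }

  // Role of WithSessionFeature / WithLocalCache: nothing to do, just forward.
  static final class PassThroughClient implements Client {
    private final Client delegate;

    PassThroughClient(Client delegate) {
      this.delegate = delegate;
    }

    @Override
    public void createTable(Table tbl) {
      delegate.createTable(tbl);
    }
  }

  // Role of HiveMetaStoreClientWithTmpTable.
  static final class TmpTableClient implements Client {
    private final Client delegate;
    private final List<String> trace;

    TmpTableClient(Client delegate, List<String> trace) {
      this.delegate = delegate;
      this.trace = trace;
    }

    @Override
    public void createTable(Table tbl) {
      if (tbl.temporary) {
        trace.add("create temp table"); // bail out before reaching Thrift
        return;
      }
      delegate.createTable(tbl);
    }
  }

  // Role of HiveMetaStoreClientWithHook: the outermost wrapper.
  static final class HookClient implements Client {
    private final Client delegate;
    private final List<String> trace;

    HookClient(Client delegate, List<String> trace) {
      this.delegate = delegate;
      this.trace = trace;
    }

    @Override
    public void createTable(Table tbl) {
      trace.add("hook.preCreateTable");
      delegate.createTable(tbl); // runs the whole inner chain
      trace.add("hook.commitCreateTable");
    }
  }

  // Builds the chain from the outermost wrapper inward and records the steps.
  static List<String> run(boolean temporary) {
    List<String> trace = new ArrayList<>();
    Client chain = new HookClient(
        new TmpTableClient(
            new PassThroughClient(          // WithSessionFeature
                new PassThroughClient(      // WithLocalCache
                    new ThriftClient(trace))),
            trace),
        trace);
    Table tbl = new Table();
    tbl.temporary = temporary;
    chain.createTable(tbl);
    return trace;
  }
}
```

The design choice shown here is that each concern owns one wrapper, so ordering is decided purely by how the chain is assembled in `run`.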

Comparison

Order  Original                    New
1      Set the default catalog     hook.preCreateTable
2      Set processor capabilities  Tmp table handling
3      hook.preCreateTable         Set the default catalog
4      Tmp table handling          Set processor capabilities
5      Thrift request              Thrift request
6      hook.commitCreateTable      hook.commitCreateTable
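  • The original implementation fills in the catalog name and processor capabilities at a very early stage, so HiveMetaHook and other downstream components can see those values. In the new method, they are set only in ThriftHiveMetaStoreClient, after hook.preCreateTable has already run.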
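The visibility difference for the catalog can be illustrated with a minimal, hypothetical Java sketch. `Table`, `DEFAULT_CATALOG` (assumed here to be `"hive"`), and the two methods are simplified stand-ins, not Hive APIs; the comment "hook.preCreateTable would run here" marks the point where the hook reads the table.

```java
// Hypothetical sketch: what a preCreateTable-style hook observes, depending on
// whether the default catalog is filled in before or after the hook runs.
final class HookVisibilitySketch {
  static final String DEFAULT_CATALOG = "hive"; // assumed default catalog name

  static final class Table {
    String catName; // null means the caller did not specify a catalog
  }

  // Original order: fill in the catalog, then run the hook.
  static String catalogSeenByHookOriginal(Table tbl) {
    if (tbl.catName == null) {
      tbl.catName = DEFAULT_CATALOG;
    }
    return tbl.catName; // hook.preCreateTable would run here
  }

  // New order: the outermost hook runs first; the innermost Thrift
  // client fills in the catalog only afterwards.
  static String catalogSeenByHookNew(Table tbl) {
    String seenByHook = tbl.catName; // hook.preCreateTable would run here
    if (tbl.catName == null) {
      tbl.catName = DEFAULT_CATALOG; // ThriftHiveMetaStoreClient's step
    }
    return seenByHook;
  }
}
```

When the caller leaves the catalog unset, both orders end with the same catalog on the table, but in the new order the hook saw `null`. With a caller-supplied catalog, both orders agree.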

getTable

Original Method

  1. SessionHiveMetaStoreClient#getTable: Return a temp table if it exists
  2. HiveMetaStoreClient#getTable: Set the default catalog name and processor capabilities
  3. SessionHiveMetaStoreClient#getTableInternal: Query cache handling
  4. HiveMetaStoreClientWithLocalCache#getTableInternal: HS2 cache handling
  5. HiveMetaStoreClient#getTableInternal: Issue a Thrift request
  6. HiveMetaStoreClient#getTable: HiveMetaHook#postGetTable, MetaStoreFilterHook#filterTable
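The original getTable path relies on overriding rather than wrapping. The following minimal, hypothetical Java sketch assumes the class hierarchy that the step order implies (the session client extends the local-cache client, which extends the base client); the class names, trace strings, and temp-table set are illustrative stand-ins, not Hive code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the ORIGINAL getTable flow: an inheritance chain in
// which subclasses override getTable and getTableInternal (not Hive's code).
final class OriginalGetTableSketch {
  // Role of HiveMetaStoreClient.
  static class BaseClient {
    final List<String> trace = new ArrayList<>();

    String getTable(String name) {
      trace.add("set default catalog");
      trace.add("set processor capabilities");
      String table = getTableInternal(name); // most-derived override runs first
      trace.add("HiveMetaHook#postGetTable");
      trace.add("MetaStoreFilterHook#filterTable");
      return table;
    }

    String getTableInternal(String name) {
      trace.add("thrift request");
      return "table:" + name;
    }
  }

  // Role of HiveMetaStoreClientWithLocalCache.
  static class LocalCacheClient extends BaseClient {
    @Override
    String getTableInternal(String name) {
      trace.add("HS2 cache handling");
      return super.getTableInternal(name);
    }
  }

  // Role of SessionHiveMetaStoreClient.
  static final class SessionClient extends LocalCacheClient {
    private final Set<String> tempTables;

    SessionClient(Set<String> tempTables) {
      this.tempTables = tempTables;
    }

    @Override
    String getTable(String name) {
      trace.add("tmp table handling");
      if (tempTables.contains(name)) {
        trace.add("return temp table"); // hooks are skipped on this path
        return "temp:" + name;
      }
      return super.getTable(name);
    }

    @Override
    String getTableInternal(String name) {
      trace.add("query cache handling");
      return super.getTableInternal(name);
    }
  }

  // Runs getTable for one name and returns the recorded steps.
  static List<String> run(String name) {
    SessionClient client = new SessionClient(Set.of("tmp1"));
    client.getTable(name);
    return client.trace;
  }
}
```

Because `getTableInternal` is virtual, the base class's `getTable` enters the most-derived override, which is how query-cache handling runs before HS2-cache handling even though the call starts in the base class.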

New Method

  1. HiveMetaStoreClientWithHook#getTable: Set the default catalog
  2. HiveMetaStoreClientWithTmpTable#getTable: Temp table handling
  3. HiveMetaStoreClientWithSessionFeature#getTable: Set the default catalog, query cache handling
  4. HiveMetaStoreClientWithLocalCache#getTable: Set the default catalog, HS2 cache handling
  5. ThriftHiveMetaStoreClient#getTable: Set the default catalog and processor capabilities, and issue a Thrift request
  6. HiveMetaStoreClientWithHook#getTable: HiveMetaHook#postGetTable, MetaStoreFilterHook#filterTable
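The wrapped getTable path can be modeled as nested delegates. This minimal, hypothetical Java sketch follows the numbered steps above, so every wrapper that lists "Set the default catalog" records it; `Client`, `Wrapper`, the trace strings, and the temp-table set are illustrative stand-ins. Processor capabilities are set only in the innermost client, right before the Thrift request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the NEW getTable flow as nested delegating wrappers
// (simplified stand-ins, not Hive's actual classes).
final class NewGetTableSketch {
  interface Client {
    String getTable(String name);
  }

  // Shared plumbing for the wrappers: a delegate and a shared trace.
  abstract static class Wrapper implements Client {
    final Client delegate;
    final List<String> trace;

    Wrapper(Client delegate, List<String> trace) {
      this.delegate = delegate;
      this.trace = trace;
    }
  }

  // Role of ThriftHiveMetaStoreClient (innermost, no delegate).
  static final class ThriftClient implements Client {
    private final List<String> trace;

    ThriftClient(List<String> trace) {
      this.trace = trace;
    }

    @Override
    public String getTable(String name) {
      trace.add("set default catalog");
      trace.add("set processor capabilities");
      trace.add("thrift request");
      return "table:" + name;
    }
  }

  // Role of HiveMetaStoreClientWithLocalCache.
  static final class LocalCacheClient extends Wrapper {
    LocalCacheClient(Client d, List<String> t) { super(d, t); }

    @Override
    public String getTable(String name) {
      trace.add("set default catalog");
      trace.add("HS2 cache handling");
      return delegate.getTable(name);
    }
  }

  // Role of HiveMetaStoreClientWithSessionFeature.
  static final class SessionFeatureClient extends Wrapper {
    SessionFeatureClient(Client d, List<String> t) { super(d, t); }

    @Override
    public String getTable(String name) {
      trace.add("set default catalog");
      trace.add("query cache handling");
      return delegate.getTable(name);
    }
  }

  // Role of HiveMetaStoreClientWithTmpTable.
  static final class TmpTableClient extends Wrapper {
    private final Set<String> tempTables;

    TmpTableClient(Client d, List<String> t, Set<String> tempTables) {
      super(d, t);
      this.tempTables = tempTables;
    }

    @Override
    public String getTable(String name) {
      trace.add("tmp table handling");
      return tempTables.contains(name) ? "temp:" + name : delegate.getTable(name);
    }
  }

  // Role of HiveMetaStoreClientWithHook (outermost).
  static final class HookClient extends Wrapper {
    HookClient(Client d, List<String> t) { super(d, t); }

    @Override
    public String getTable(String name) {
      trace.add("set default catalog");
      String table = delegate.getTable(name);
      // Post-processing runs after the inner chain returns; in this sketch
      // that includes temp tables returned by the tmp-table wrapper.
      trace.add("HiveMetaHook#postGetTable");
      trace.add("MetaStoreFilterHook#filterTable");
      return table;
    }
  }

  // Builds the chain from the outermost wrapper inward and records the steps.
  static List<String> run(String name) {
    List<String> trace = new ArrayList<>();
    Client chain = new HookClient(
        new TmpTableClient(
            new SessionFeatureClient(
                new LocalCacheClient(new ThriftClient(trace), trace), trace),
            trace, Set.of("tmp1")),
        trace);
    chain.getTable(name);
    return trace;
  }
}
```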

Comparison

Order  Original                         New
1      Tmp table handling               Set the default catalog
2      Set the default catalog          Tmp table handling
3      Set processor capabilities       Set the default catalog
4      Query cache handling             Query cache handling
5      HS2 cache handling               HS2 cache handling
6      Thrift request                   Set the default catalog
7      HiveMetaHook#postGetTable        Set processor capabilities
8      MetaStoreFilterHook#filterTable  Thrift request
9      N/A                              HiveMetaHook#postGetTable
10     N/A                              MetaStoreFilterHook#filterTable
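  • The timing of setting processor capabilities differs: the original sets them in HiveMetaStoreClient#getTable, before query cache and HS2 cache handling, whereas the new method sets them only in ThriftHiveMetaStoreClient, right before the Thrift request.
  • IMetaStoreClient#getTable is heavily overloaded. Do we have to implement every overload in each delegating wrapper?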
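One possible answer to the overload question, offered only as a hedged sketch: define the convenience overloads once as Java `default` methods that normalize their arguments into a single canonical request method, and let each wrapper override only that method. `TableRequest`, `DEFAULT_CATALOG` (assumed `"hive"`), and all class names here are hypothetical and not part of Hive's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: funnel every getTable overload into one canonical
// method so that each delegating wrapper overrides only that method.
final class OverloadFunnelSketch {
  static final String DEFAULT_CATALOG = "hive"; // assumed default catalog name

  // Simplified request object (stand-in, not Hive's GetTableRequest).
  static final class TableRequest {
    final String catName;
    final String dbName;
    final String tblName;

    TableRequest(String catName, String dbName, String tblName) {
      this.catName = catName;
      this.dbName = dbName;
      this.tblName = tblName;
    }
  }

  interface Client {
    // The only method an implementation or wrapper must provide.
    String getTable(TableRequest req);

    // Convenience overloads: written once, shared by every implementation.
    default String getTable(String dbName, String tblName) {
      return getTable(DEFAULT_CATALOG, dbName, tblName);
    }

    default String getTable(String catName, String dbName, String tblName) {
      return getTable(new TableRequest(catName, dbName, tblName));
    }
  }

  // Innermost client: just formats the request it receives.
  static final class ThriftClient implements Client {
    @Override
    public String getTable(TableRequest req) {
      return req.catName + "." + req.dbName + "." + req.tblName;
    }
  }

  // A wrapper overrides only the canonical method; every overload reaches it.
  static final class RecordingWrapper implements Client {
    private final Client delegate;
    final List<String> seen = new ArrayList<>();

    RecordingWrapper(Client delegate) {
      this.delegate = delegate;
    }

    @Override
    public String getTable(TableRequest req) {
      seen.add(req.tblName);
      return delegate.getTable(req);
    }
  }
}
```

For example, `new RecordingWrapper(new ThriftClient()).getTable("db", "t1")` passes through the wrapper's single override with the catalog already defaulted. The trade-off is that the canonical request must carry every option any overload can express; otherwise a wrapper that overrides only the canonical method would miss those arguments.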