jiawenyao / longest_chinese_tokens_gpt4o.py
Created September 21, 2024 12:52 — forked from ctlllll/longest_chinese_tokens_gpt4o.py
Longest Chinese tokens in gpt4o
import tiktoken
import langdetect
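# o200k_base is the encoding tiktoken uses for GPT-4o.
T = tiktoken.get_encoding("o200k_base")
# Maps token id -> length in characters of its decoded string.
length_dict = {}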
for i in range(T.n_vocab):
    try:
        length_dict[i] = len(T.decode([i]))
    except: