@inneroot
Created March 24, 2025 11:50
Convert wrongly encoded Cyrillic 'windows-1251' to 'utf-8'
def fix_double_encoded_cyrillic(garbled_text):
    # Step 1: Recover the original bytes. latin1 maps code points 0-255
    # one-to-one to bytes, so this preserves exact byte values.
    raw_bytes = garbled_text.encode('latin1')
    # Step 2: Convert bytes to a hex list (for debugging only; not used below)
    hex_bytes = [hex(b) for b in raw_bytes]  # ['0xc2', '0xe2', '0xe5', ...]
    # Step 3: Reinterpret the bytes as Windows-1251 (Cyrillic)
    fixed_text = raw_bytes.decode('windows-1251')
    return fixed_text
# Test it
garbled = """Ââåäåíèå"""
fixed = fix_double_encoded_cyrillic(garbled)
print(fixed)  # Output: Введение
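
The function above raises an exception if the input already contains real Cyrillic (it cannot be encoded as latin1) or if a byte is undefined in Windows-1251. A minimal sketch of a more forgiving variant follows. The name `fix_mojibake` and its `wrong`/`right` parameters are hypothetical and not part of the original gist. Returning the input unchanged on failure is an assumption about the desired behavior.

```python
def fix_mojibake(text, wrong="latin1", right="windows-1251"):
    """Hypothetical helper: undo text that was decoded with `wrong`
    when its bytes were really `right`. Returns text unchanged on failure."""
    try:
        # Re-encode with the wrongly used codec to recover the raw bytes,
        # then decode them with the codec they were actually written in.
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside `wrong` (probably not
        # garbled in this way), or the bytes are invalid in `right`.
        return text


print(fix_mojibake("Ââåäåíèå"))  # garbled input gets repaired
print(fix_mojibake("Введение"))  # already-correct input passes through
```

If the text was mis-decoded as Windows-1252 rather than latin1, which is common on Windows, passing `wrong="cp1252"` may work where latin1 fails.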