@insin
Last active May 19, 2025 04:02

YouTube Audio Track Translations

{
  "runSettings": {
    "temperature": 0.5,
    "model": "models/gemini-2.5-pro-preview-05-06",
    "topP": 0.95,
    "topK": 64,
    "maxOutputTokens": 65536,
    "safetySettings": [{
      "category": "HARM_CATEGORY_HARASSMENT",
      "threshold": "OFF"
    }, {
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "threshold": "OFF"
    }, {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "OFF"
    }, {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "OFF"
    }],
    "responseMimeType": "text/plain",
    "enableCodeExecution": false,
    "enableSearchAsATool": false,
    "enableBrowseAsATool": false,
    "enableAutoFunctionResponse": false
  }
}

user

I have a locales object I use in my YouTube browser extension to find the original audio track for a video, which contains the word "original" (in English). I need to update this to support all the display languages YouTube supports:

const locales = {
  'ar': {
    ORIGINAL: 'أصلي',
  },
  'de': {
    ORIGINAL: 'Original',
  },
  'en': {
    ORIGINAL: 'original',
  },
  'fr': {
    ORIGINAL: 'original',
  },
  'hi': {
    ORIGINAL: '',
  },
  'ja-JP': {
    ORIGINAL: 'オリジナル',
  },
  'zh-Hans-CN': {
    ORIGINAL: '原始',
  },
}

I've extracted the following data from YouTube, which pairs the actual lang attribute from YouTube's <html> element with the translation of the same video track name in that display language. I need to create an object in the format of the one above which I can use to perform a substring match on a track name with track.name.includes(...) to determine if it's the original track.

Extract the "original" translation from each of these strings required to do that. If you're not sure about a translation or can't extract it, create a placeholder for that language code and comment it. Here is the data:

["af-ZA","Engels (Verenigde State van Amerika) oorspronklike"]
["am-ET","እንግሊዝኛ (ዩናይትድ ስቴትስ) የመጀመሪያ"]
["ar","الإنجليزية (الولايات المتحدة): أصلي"]
["as-IN","মূল ইংৰাজী (মাৰ্কিন যুক্তৰাষ্ট্ৰ)"]
["az-Latn-AZ","ingilis (Amerika Birləşmiş Ştatları) orijinal"]
["be-BY","англійская (Злучаныя Штаты Амерыкі) (арыгінальны)"]
["bg-BG","английски (Съединени щати), оригинален"]
["bn-BD","মূল ভাষা ইংরেজি (মার্কিন যুক্তরাষ্ট্র)"]
["bs-Latn-BA","engleski (Sjedinjene Države), original"]
["ca-ES","anglès (Estats Units) (original)"]
["cs-CZ","Angličtina (Spojené státy) (původní)"]
["da-DK","Engelsk (USA) (originalt)"]
["de-DE","Englisch (Vereinigte Staaten) (Original)"]
["el-GR","Αγγλικά (Ηνωμένες Πολιτείες) πρωτότυπο αρχείο"]
["en-IN","English (United States) original"]
["es-419","Inglés (Estados Unidos) original"]
["es-ES","Inglés (Estados Unidos) original"]
["es-US","Inglés (Estados Unidos) original"]
["et-EE","inglise (Ameerika Ühendriigid), algne"]
["eu-ES","ingelesa (Ameriketako Estatu Batuak) (jatorrizkoa)"]
["fa-IR","انگلیسی (ایالات متحده) اصلی"]
["fil-PH","Ingles (Estados Unidos) orihinal"]
["fr-CA","Anglais (États-Unis) – originale"]
["fr-FR","Anglais (États-Unis) original"]
["gl-ES","inglés (Estados Unidos) (orixinal)"]
["gu-IN","અંગ્રેજી (યુનાઇટેડ સ્ટેટ્સ) ઑરિજિનલ"]
["he-IL","אנגלית (ארצות הברית), מקור"]
["hi-IN","मूल भाषा अंग्रेज़ी (संयुक्त राज्य)"]
["hr-HR","Engleski (Sjedinjene Američke Države) – izvorno"]
["hu-HU","angol (Egyesült Államok) eredeti"]
["hy-AM","անգլերեն (Միացյալ Նահանգներ) (բնօրինակ)"]
["id-ID","Inggris (Amerika Serikat) asli"]
["is-IS","enska (Bandaríkin) upprunalegt"]
["it-IT","Inglese (Stati Uniti), originale"]
["ja-JP","英語 (アメリカ合衆国)(オリジナル)"]
["ka-GE","ინგლისური (ამერიკის შეერთებული შტატები) ორიგინალია"]
["kk-KZ","ағылшын тілі (Америка Құрама Штаттары) (түпнұсқа)"]
["km-KH","អង់គ្លេស (សហរដ្ឋអាមេរិក) ដើម"]
["kn-IN","ಇಂಗ್ಲಿಷ್ (ಯುನೈಟೆಡ್ ಸ್ಟೇಟ್ಸ್) ಮೂಲ"]
["ko-KR","영어(미국) 원본"]
["ky-KG","англисче (Кошмо Штаттар) түпнуска"]
["lo-LA","ອັງກິດ (ສະຫະລັດ) ຕົ້ນສະບັບ"]
["lt-LT","„anglų (Jungtinės Valstijos)“ – pradinis"]
["lv-LV","angļu (Amerikas Savienotās Valstis) (oriģināls)"]
["mk-MK","англиски (Соединети Американски Држави) оригинален"]
["ml-IN","ഇംഗ്ലീഷ് (അമേരിക്കൻ ഐക്യനാടുകൾ) ഒറിജിനൽ"]
["mn-MN","англи (Америкийн Нэгдсэн Улс) нь эх хувь"]
["mr-IN","इंग्रजी (युनायटेड स्टेट्स) मूळ"]
["ms-MY","Inggeris (Amerika Syarikat) asal"]
["my-MM","အင်္ဂလိပ် (အမေရိကန် ပြည်ထောင်စု) မူရင်း"]
["nb-NO","engelsk (USA) original"]
["ne-NP","अङ्ग्रेजी (संयुक्त राज्य) मूल"]
["nl-NL","Originele Engels (Verenigde Staten)"]
["or-IN","ଇଂରାଜୀ (ଯୁକ୍ତ ରାଷ୍ଟ୍ର) ମୂଳ"]
["pa-Guru-IN","ਅੰਗਰੇਜ਼ੀ (ਸੰਯੁਕਤ ਰਾਜ) ਮੂਲ"]
["pl-PL","angielski (Stany Zjednoczone) oryginalny"]
["pt-BR","Inglês (Estados Unidos) original"]
["pt-PT","Inglês (Estados Unidos) original"]
["ro-RO","engleză (Statele Unite ale Americii) original"]
["ru-RU","Английский (Соединенные Штаты) (оригинальная)"]
["si-LK","ඉංග්‍රීසි (එක්සත් ජනපදය) මුල්"]
["sk-SK","Angličtina (Spojené štáty) – pôvodná zvuková stopa"]
["sl-SI","Izvirnik: angleščina (Združene države Amerike)"]
["sq-AL","anglisht (SHBA) origjinale"]
["sr-Cyrl-RS","енглески (Сједињене Државе), оригинална"]
["sr-Latn-RS","engleski (Sjedinjene Države), originalna"]
["sw-TZ","Kiingereza (Marekani) halisi"]
["ta-IN","ஆங்கிலம் (அமெரிக்கா) அசல்"]
["te-IN","ఇంగ్లీష్ (యునైటెడ్ స్టేట్స్) అసలైనది"]
["th-TH","เสียงต้นฉบับภาษาอังกฤษ (สหรัฐอเมริกา)"]
["tr-TR","İngilizce (Amerika Birleşik Devletleri) orijinal"]
["uk-UA","Англійська (Сполучені Штати) (оригінал)"]
["ur-PK","انگریزی (ریاست ہائے متحدہ امریکہ) اصل"]
["uz-Latn-UZ","inglizcha (Amerika Qo‘shma Shtatlari) original"]
["vi-VN","Tiếng Anh (Hoa Kỳ) gốc"]
["zh-Hans-CN","英语(美国)原始"]
["zh-Hant-HK","英文(美國) (原聲)"]
["zh-Hant-TW","英文(美國) (原文)"]
["zu-ZA","i-English (i-United States) yokuqala"]

assistant (thinking)

The user wants to update a JavaScript object locales which stores the translation of the word "original" for different YouTube display languages. The goal is to use this translation to find the original audio track of a YouTube video by checking if the track name includes the translated "original". I need to process a list of language code/track name pairs. For each pair, I need to:

  1. Identify the language code.
  2. Extract the word for "original" from the track name string.
  3. Create an entry in the locales object in the format: 'lang-CODE': { ORIGINAL: 'translation' }.
  4. If the translation is ambiguous or cannot be reliably extracted, I should use a placeholder and add a comment.
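
A minimal sketch of the check this object is meant to support, assuming a track object with a `name` string and a language code read from the page's `<html>` element. The helper name `isOriginalTrack` and the two-entry `sampleLocales` table are hypothetical and not taken from the extension:

```javascript
// Hypothetical two-entry subset of the locales object, for illustration only.
const sampleLocales = {
  'de-DE': { ORIGINAL: 'Original' },
  'es-ES': { ORIGINAL: 'original' },
};

// Returns true when the track name contains the locale's "original" marker.
// An unknown locale or an empty marker never matches, because
// name.includes('') is true for every string.
function isOriginalTrack(track, lang, locales = sampleLocales) {
  const marker = locales[lang]?.ORIGINAL;
  return Boolean(marker) && track.name.includes(marker);
}

console.log(isOriginalTrack({ name: 'Englisch (Vereinigte Staaten) (Original)' }, 'de-DE')); // true
console.log(isOriginalTrack({ name: 'Deutsch (Deutschland)' }, 'de-DE')); // false
```

The guard against an empty marker matters for entries such as the existing `hi` placeholder: with `ORIGINAL: ''`, a bare `includes` call would report every track as the original.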

Let's go through each provided string:

  • ["af-ZA","Engels (Verenigde State van Amerika) oorspronklike"] -> af-ZA: oorspronklike
  • ["am-ET","እንግሊዝኛ (ዩናይትድ ስቴትስ) የመጀመሪያ"] -> am-ET: የመጀመሪያ
  • ["ar","الإنجليزية (الولايات المتحدة): أصلي"] -> ar: أصلي (already in the example, but good to confirm)
  • ["as-IN","মূল ইংৰাজী (মাৰ্কিন যুক্তৰাষ্ট্ৰ)"] -> as-IN: মূল (means 'original' or 'root')
  • ["az-Latn-AZ","ingilis (Amerika Birləşmiş Ştatları) orijinal"] -> az-Latn-AZ: orijinal
  • ["be-BY","англійская (Злучаныя Штаты Амерыкі) (арыгінальны)"] -> be-BY: арыгінальны
  • ["bg-BG","английски (Съединени щати), оригинален"] -> bg-BG: оригинален
  • ["bn-BD","মূল ভাষা ইংরেজি (মার্কিন যুক্তরাষ্ট্র)"] -> bn-BD: মূল (means 'original' or 'root')
  • ["bs-Latn-BA","engleski (Sjedinjene Države), original"] -> bs-Latn-BA: original
  • ["ca-ES","anglès (Estats Units) (original)"] -> ca-ES: original
  • ["cs-CZ","Angličtina (Spojené státy) (původní)"] -> cs-CZ: původní
  • ["da-DK","Engelsk (USA) (originalt)"] -> da-DK: originalt
  • ["de-DE","Englisch (Vereinigte Staaten) (Original)"] -> de-DE: Original (already in the example, good to confirm)
  • ["el-GR","Αγγλικά (Ηνωμένες Πολιτείες) πρωτότυπο αρχείο"] -> el-GR: πρωτότυπο (πρωτότυπο αρχείο means original file, so πρωτότυπο is original)
  • ["en-IN","English (United States) original"] -> en-IN: original
  • ["es-419","Inglés (Estados Unidos) original"] -> es-419: original
  • ["es-ES","Inglés (Estados Unidos) original"] -> es-ES: original
  • ["es-US","Inglés (Estados Unidos) original"] -> es-US: original
  • ["et-EE","inglise (Ameerika Ühendriigid), algne"] -> et-EE: algne
  • ["eu-ES","ingelesa (Ameriketako Estatu Batuak) (jatorrizkoa)"] -> eu-ES: jatorrizkoa
  • ["fa-IR","انگلیسی (ایالات متحده) اصلی"] -> fa-IR: اصلی
  • ["fil-PH","Ingles (Estados Unidos) orihinal"] -> fil-PH: orihinal
  • ["fr-CA","Anglais (États-Unis) – originale"] -> fr-CA: originale
  • ["fr-FR","Anglais (États-Unis) original"] -> fr-FR: original (already in the example, good to confirm)
  • ["gl-ES","inglés (Estados Unidos) (orixinal)"] -> gl-ES: orixinal
  • ["gu-IN","અંગ્રેજી (યુનાઇટેડ સ્ટેટ્સ) ઑરિજિનલ"] -> gu-IN: ઑરિજિનલ
  • ["he-IL","אנגלית (ארצות הברית), מקור"] -> he-IL: מקור
  • ["hi-IN","मूल भाषा अंग्रेज़ी (संयुक्त राज्य)"] -> hi-IN: मूल (The example had an empty string, this is better. 'मूल भाषा' means 'original language')
  • ["hr-HR","Engleski (Sjedinjene Američke Države) – izvorno"] -> hr-HR: izvorno
  • ["hu-HU","angol (Egyesült Államok) eredeti"] -> hu-HU: eredeti
  • ["hy-AM","անգլերեն (Միացյալ Նահանգներ) (բնօրինակ)"] -> hy-AM: բնօրինակ
  • ["id-ID","Inggris (Amerika Serikat) asli"] -> id-ID: asli
  • ["is-IS","enska (Bandaríkin) upprunalegt"] -> is-IS: upprunalegt
  • ["it-IT","Inglese (Stati Uniti), originale"] -> it-IT: originale
  • ["ja-JP","英語 (アメリカ合衆国)(オリジナル)"] -> ja-JP: オリジナル (already in the example, good to confirm)
  • ["ka-GE","ინგლისური (ამერიკის შეერთებული შტატები) ორიგინალია"] -> ka-GE: ორიგინალია (or ორიგინალ if 'ია' is a suffix. Let's check. 'ორიგინალი' is 'original'. 'ორიგინალია' is 'is original'. So ორიგინალი is better for substring matching.)
  • ["kk-KZ","ағылшын тілі (Америка Құрама Штаттары) (түпнұсқа)"] -> kk-KZ: түпнұсқа
  • ["km-KH","អង់គ្លេស (សហរដ្ឋអាមេរិក) ដើម"] -> km-KH: ដើម
  • ["kn-IN","ಇಂಗ್ಲಿಷ್ (ಯುನೈಟೆಡ್ ಸ್ಟೇಟ್ಸ್) ಮೂಲ"] -> kn-IN: ಮೂಲ
  • ["ko-KR","영어(미국) 원본"] -> ko-KR: 원본
  • ["ky-KG","англисче (Кошмо Штаттар) түпнуска"] -> ky-KG: түпнуска
  • ["lo-LA","ອັງກິດ (ສະຫະລັດ) ຕົ້ນສະບັບ"] -> lo-LA: ຕົ້ນສະບັບ
  • ["lt-LT","„anglų (Jungtinės Valstijos)“ – pradinis"] -> lt-LT: pradinis
  • ["lv-LV","angļu (Amerikas Savienotās Valstis) (oriģināls)"] -> lv-LV: oriģināls
  • ["mk-MK","англиски (Соединети Американски Држави) оригинален"] -> mk-MK: оригинален
  • ["ml-IN","ഇംഗ്ലീഷ് (അമേരിക്കൻ ഐക്യനാടുകൾ) ഒറിജിനൽ"] -> ml-IN: ഒറിജിനൽ
  • ["mn-MN","англи (Америкийн Нэгдсэн Улс) нь эх хувь"] -> mn-MN: эх хувь (means 'original copy' or 'master copy'. эх means 'source' or 'original'. Let's use эх хувь as it appears, or just эх? эх хувь seems safer for matching the full phrase.)
  • ["mr-IN","इंग्रजी (युनायटेड स्टेट्स) मूळ"] -> mr-IN: मूळ
  • ["ms-MY","Inggeris (Amerika Syarikat) asal"] -> ms-MY: asal
  • ["my-MM","အင်္ဂလိပ် (အမေရိကန် ပြည်ထောင်စု) မူရင်း"] -> my-MM: မူရင်း
  • ["nb-NO","engelsk (USA) original"] -> nb-NO: original
  • ["ne-NP","अङ्ग्रेजी (संयुक्त राज्य) मूल"] -> ne-NP: मूल
  • ["nl-NL","Originele Engels (Verenigde Staten)"] -> nl-NL: Originele (Note: the word order is different here. "Originele" is at the beginning. This should be fine for includes.)
  • ["or-IN","ଇଂରାଜୀ (ଯୁକ୍ତ ରାଷ୍ଟ୍ର) ମୂଳ"] -> or-IN: ମୂଳ
  • ["pa-Guru-IN","ਅੰਗਰੇਜ਼ੀ (ਸੰਯੁਕਤ ਰਾਜ) ਮੂਲ"] -> pa-Guru-IN: ਮੂਲ
  • ["pl-PL","angielski (Stany Zjednoczone) oryginalny"] -> pl-PL: oryginalny
  • ["pt-BR","Inglês (Estados Unidos) original"] -> pt-BR: original
  • ["pt-PT","Inglês (Estados Unidos) original"] -> pt-PT: original
  • ["ro-RO","engleză (Statele Unite ale Americii) original"] -> ro-RO: original
  • ["ru-RU","Английский (Соединенные Штаты) (оригинальная)"] -> ru-RU: оригинальная (or оригинальный if gender matters for matching. оригинальная is feminine, likely referring to "дорожка" - track. For includes, оригинальн might be safer, or just use the full form seen). Let's stick to what's provided: оригинальная.
  • ["si-LK","ඉංග්‍රීසි (එක්සත් ජනපදය) මුල්"] -> si-LK: මුල්
  • ["sk-SK","Angličtina (Spojené štáty) – pôvodná zvuková stopa"] -> sk-SK: pôvodná (pôvodná zvuková stopa means original audio track. So pôvodná is original).
  • ["sl-SI","Izvirnik: angleščina (Združene države Amerike)"] -> sl-SI: Izvirnik (Word order is different. "Izvirnik:" means "Original:". This should be fine.)
  • ["sq-AL","anglisht (SHBA) origjinale"] -> sq-AL: origjinale
  • ["sr-Cyrl-RS","енглески (Сједињене Државе), оригинална"] -> sr-Cyrl-RS: оригинална
  • ["sr-Latn-RS","engleski (Sjedinjene Države), originalna"] -> sr-Latn-RS: originalna
  • ["sw-TZ","Kiingereza (Marekani) halisi"] -> sw-TZ: halisi
  • ["ta-IN","ஆங்கிலம் (அமெரிக்கா) அசல்"] -> ta-IN: அசல்
  • ["te-IN","ఇంగ్లీష్ (యునైటెడ్ స్టేట్స్) అసలైనది"] -> te-IN: అసలైనది (or అసలు. అసలైనది means 'the original one'. అసలు is 'original'. Let's use అసలైనది as it appears.)
  • ["th-TH","เสียงต้นฉบับภาษาอังกฤษ (สหรัฐอเมริกา)"] -> th-TH: เสียงต้นฉบับ (means 'original audio'. ต้นฉบับ means 'original'. เสียงต้นฉบับ is more specific and likely what appears. Let's use ต้นฉบับ as it's the core word for "original".)
  • ["tr-TR","İngilizce (Amerika Birleşik Devletleri) orijinal"] -> tr-TR: orijinal
  • ["uk-UA","Англійська (Сполучені Штати) (оригінал)"] -> uk-UA: оригінал
  • ["ur-PK","انگریزی (ریاست ہائے متحدہ امریکہ) اصل"] -> ur-PK: اصل
  • ["uz-Latn-UZ","inglizcha (Amerika Qo‘shma Shtatlari) original"] -> uz-Latn-UZ: original
  • ["vi-VN","Tiếng Anh (Hoa Kỳ) gốc"] -> vi-VN: gốc
  • ["zh-Hans-CN","英语(美国)原始"] -> zh-Hans-CN: 原始 (already in the example, good to confirm)
  • ["zh-Hant-HK","英文(美國) (原聲)"] -> zh-Hant-HK: 原聲 (means 'original sound')
  • ["zh-Hant-TW","英文(美國) (原文)"] -> zh-Hant-TW: 原文 (means 'original text', but in context of audio track, it implies original language/audio)
  • ["zu-ZA","i-English (i-United States) yokuqala"] -> zu-ZA: yokuqala

Double check some of the more complex ones or where I made a choice:

  • ka-GE: "ინგლისური (ამერიკის შეერთებული შტატები) ორიგინალია". ორიგინალია means "it is original". ორიგინალი means "original". For includes(), ორიგინალი would be more robust if the suffix changes. Let's use ორიგინალი.
  • mn-MN: "англи (Америкийн Нэгдсэн Улс) нь эх хувь". эх хувь means "original copy". эх means "source/original". эх хувь is probably what appears in the track name. Let's stick with эх хувь. If it's too specific, эх could be an alternative. For now, эх хувь.
  • ru-RU: "Английский (Соединенные Штаты) (оригинальная)". оригинальная (feminine adjective). If the noun it refers to changes gender or if it's just "Оригинал" (noun), this might not match. However, if YouTube consistently uses "оригинальная дорожка" (original track - feminine), then оригинальная is correct. Given the input format, this is the string to look for.
  • sk-SK: "Angličtina (Spojené štáty) – pôvodná zvuková stopa". pôvodná is "original" (feminine adjective). pôvodná zvuková stopa is "original audio track". pôvodná seems correct.
  • th-TH: "เสียงต้นฉบับภาษาอังกฤษ (สหรัฐอเมริกา)". เสียงต้นฉบับ means "original audio/sound". ต้นฉบับ means "original (manuscript/version)". ต้นฉบับ is the core word. If the track name is just "Original", ต้นฉบับ would match. If it's "Original Audio", เสียงต้นฉบับ would match. Let's use ต้นฉบับ for broader matching, assuming YouTube might sometimes shorten it. The user prompt says "substring match on a track name", so a shorter, more core term is often better.
  • te-IN: "ఇంగ్లీష్ (యునైటెడ్ స్టేట్స్) అసలైనది". అసలైనది means "the original one". అసలు means "original". అసలు might be more robust. Let's use అసలు.
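
One way to sanity-check the shorter forms weighed above is to test whether each is still a literal substring of the string observed in the data, since `includes` does no stemming. This is only a sketch; the `candidates` pairs are copied from the entries above and the variable names are illustrative:

```javascript
// Each pair is [string observed in the data, shorter candidate considered above].
const candidates = [
  ['ორიგინალია', 'ორიგინალი'], // ka-GE
  ['оригинальная', 'оригинальн'], // ru-RU
  ['เสียงต้นฉบับ', 'ต้นฉบับ'], // th-TH
  ['అసలైనది', 'అసలు'], // te-IN
];

// A shorter form is only a safe replacement if it still matches the observed string.
const results = candidates.map(([observed, shorter]) => ({
  shorter,
  stillMatches: observed.includes(shorter),
}));

for (const { shorter, stillMatches } of results) {
  console.log(shorter, stillMatches);
}
```

The first three shorter forms are literal prefixes or suffixes of the observed strings, so they still match. The te-IN root అసలు ends in a different vowel sign than the corresponding part of అసలైనది, so it would not match the observed track name.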

Let's re-evaluate based on "substring match on a track name with track.name.includes(...)". This means we need the exact substring that YouTube uses to denote "original".

Let's refine:

  • af-ZA: "Engels (Verenigde State van Amerika) oorspronklike" -> oorspronklike
  • am-ET: "እንግሊዝኛ (ዩናይትድ ስቴትስ) የመጀመሪያ" -> የመጀመሪያ
  • ar: "الإنجليزية (الولايات المتحدة): أصلي" -> أصلي
  • as-IN: "মূল ইংৰাজী (মাৰ্কিন যুক্তৰাষ্ট্ৰ)" -> মূল
  • az-Latn-AZ: "ingilis (Amerika Birləşmiş Ştatları) orijinal" -> orijinal
  • be-BY: "англійская (Злучаныя Штаты Амерыкі) (арыгінальны)" -> арыгінальны
  • bg-BG: "английски (Съединени щати), оригинален" -> оригинален
  • bn-BD: "মূল ভাষা ইংরেজি (মার্কিন যুক্তরাষ্ট্র)" -> মূল
  • bs-Latn-BA: "engleski (Sjedinjene Države), original" -> original
  • ca-ES: "anglès (Estats Units) (original)" -> original
  • cs-CZ: "Angličtina (Spojené státy) (původní)" -> původní
  • da-DK: "Engelsk (USA) (originalt)" -> originalt
  • de-DE: "Englisch (Vereinigte Staaten) (Original)" -> Original
  • el-GR: "Αγγλικά (Ηνωμένες Πολιτείες) πρωτότυπο αρχείο" -> πρωτότυπο
  • en-IN: "English (United States) original" -> original (The original example had en: original. This is en-IN.) It's good to have a base en as well. The user's original object had en. Let's assume en and en-IN would be the same.
  • es-419: "Inglés (Estados Unidos) original" -> original
  • es-ES: "Inglés (Estados Unidos) original" -> original
  • es-US: "Inglés (Estados Unidos) original" -> original
  • et-EE: "inglise (Ameerika Ühendriigid), algne" -> algne
  • eu-ES: "ingelesa (Ameriketako Estatu Batuak) (jatorrizkoa)" -> jatorrizkoa
  • fa-IR: "انگلیسی (ایالات متحده) اصلی" -> اصلی
  • fil-PH: "Ingles (Estados Unidos) orihinal" -> orihinal
  • fr-CA: "Anglais (États-Unis) – originale" -> originale
  • fr-FR: "Anglais (États-Unis) original" -> original
  • gl-ES: "inglés (Estados Unidos) (orixinal)" -> orixinal
  • gu-IN: "અંગ્રેજી (યુનાઇટેડ સ્ટેટ્સ) ઑરિજિનલ" -> ઑરિજિનલ
  • he-IL: "אנגלית (ארצות הברית), מקור" -> מקור
  • hi-IN: "मूल भाषा अंग्रेज़ी (संयुक्त राज्य)" -> मूल (This updates the empty string in the original example for hi)
  • hr-HR: "Engleski (Sjedinjene Američke Države) – izvorno" -> izvorno
  • hu-HU: "angol (Egyesült Államok) eredeti" -> eredeti
  • hy-AM: "անգլերեն (Միացյալ Նահանգներ) (բնօրինակ)" -> բնօրինակ
  • id-ID: "Inggris (Amerika Serikat) asli" -> asli
  • is-IS: "enska (Bandaríkin) upprunalegt" -> upprunalegt
  • it-IT: "Inglese (Stati Uniti), originale" -> originale
  • ja-JP: "英語 (アメリカ合衆国)(オリジナル)" -> オリジナル
  • ka-GE: "ინგლისური (ამერიკის შეერთებული შტატები) ორიგინალია" -> ორიგინალია. Reconsidering ორიგინალი vs ორიგინალია. The provided string is "ორიგინალია". If this is exactly what appears in the track name, then ორიგინალია is the correct substring to search for. If YouTube sometimes uses just "ორიგინალი", then ორიგინალი would be better. Without more examples, sticking to the provided string is safer. Let's use ორიგინალია.
  • kk-KZ: "ағылшын тілі (Америка Құрама Штаттары) (түпнұсқа)" -> түпнұсқа
  • km-KH: "អង់គ្លេស (សហរដ្ឋអាមេរិក) ដើម" -> ដើម
  • kn-IN: "ಇಂಗ್ಲಿಷ್ (ಯುನೈಟೆಡ್ ಸ್ಟೇಟ್ಸ್) ಮೂಲ" -> ಮೂಲ
  • ko-KR: "영어(미국) 원본" -> 원본
  • ky-KG: "англисче (Кошмо Штаттар) түпнуска" -> түпнуска
  • lo-LA: "ອັງກິດ (ສະຫະລັດ) ຕົ້ນສະບັບ" -> ຕົ້ນສະບັບ
  • lt-LT: "„anglų (Jungtinės Valstijos)“ – pradinis" -> pradinis
  • lv-LV: "angļu (Amerikas Savienotās Valstis) (oriģināls)" -> oriģināls
  • mk-MK: "англиски (Соединети Американски Држави) оригинален" -> оригинален
  • ml-IN: "ഇംഗ്ലീഷ് (അമേരിക്കൻ ഐക്യനാടുകൾ) ഒറിജിനൽ" -> ഒറിജിനൽ
  • mn-MN: "англи (Америкийн Нэгдсэн Улс) нь эх хувь" -> эх хувь
  • mr-IN: "इंग्रजी (युनायटेड स्टेट्स) मूळ" -> मूळ
  • ms-MY: "Inggeris (Amerika Syarikat) asal" -> asal
  • my-MM: "အင်္ဂလိပ် (အမေရိကန် ပြည်ထောင်စု) မူရင်း" -> မူရင်း
  • nb-NO: "engelsk (USA) original" -> original
  • ne-NP: "अङ्ग्रेजी (संयुक्त राज्य) मूल" -> मूल
  • nl-NL: "Originele Engels (Verenigde Staten)" -> Originele (Case sensitive match is important here if includes is case sensitive, which it is in JS)
  • or-IN: "ଇଂରାଜୀ (ଯୁକ୍ତ ରାଷ୍ଟ୍ର) ମୂଳ" -> ମୂଳ
  • pa-Guru-IN: "ਅੰਗਰੇਜ਼ੀ (ਸੰਯੁਕਤ ਰਾਜ) ਮੂਲ" -> ਮੂਲ
  • pl-PL: "angielski (Stany Zjednoczone) oryginalny" -> oryginalny
  • pt-BR: "Inglês (Estados Unidos) original" -> original
  • pt-PT: "Inglês (Estados Unidos) original" -> original
  • ro-RO: "engleză (Statele Unite ale Americii) original" -> original
  • ru-RU: "Английский (Соединенные Штаты) (оригинальная)" -> оригинальная
  • si-LK: "ඉංග්‍රීසි (එක්සත් ජනපදය) මුල්" -> මුල්
  • sk-SK: "Angličtina (Spojené štáty) – pôvodná zvuková stopa" -> pôvodná. The full string is "pôvodná zvuková stopa". If the track name only says "pôvodná", this is fine. If it says "pôvodná zvuková stopa", then pôvodná will match. If it says something like "Slovenčina (pôvodná)", then pôvodná is the key. This seems reasonable.
  • sl-SI: "Izvirnik: angleščina (Združene države Amerike)" -> Izvirnik (The colon might be part of it, or not. "Izvirnik:" is "Original:". If the track name is "Izvirnik: English", then Izvirnik: would be the match. If it's "English (Izvirnik)", then Izvirnik. The data shows "Izvirnik:". Let's assume the colon is part of the identifier in this case. Or perhaps just Izvirnik without the colon is safer if the colon is sometimes omitted or followed by a space that isn't always there. Given the data, Izvirnik: is what's present. Let's use Izvirnik to be a bit more general, as colons can be tricky in matching. Rechecking the prompt: "extract the 'original' translation". Izvirnik is "original". The colon is punctuation. So, Izvirnik.
  • sq-AL: "anglisht (SHBA) origjinale" -> origjinale
  • sr-Cyrl-RS: "енглески (Сједињене Државе), оригинална" -> оригинална
  • sr-Latn-RS: "engleski (Sjedinjene Države), originalna" -> originalna
  • sw-TZ: "Kiingereza (Marekani) halisi" -> halisi
  • ta-IN: "ஆங்கிலம் (அமெரிக்கா) அசல்" -> அசல்
  • te-IN: "ఇంగ్లీష్ (యునైటెడ్ స్టేట్స్) అసలైనది" -> అసలైనది. Reconsidering అసలు vs అసలైనది. The provided string is "అసలైనది". If this is consistently used, then it's the correct substring. అసలు is the root word "original". అసలైనది is "original one". Let's stick to the provided data: అసలైనది.
  • th-TH: "เสียงต้นฉบับภาษาอังกฤษ (สหรัฐอเมริกา)" -> เสียงต้นฉบับ. My previous thought was ต้นฉบับ. But the data shows เสียงต้นฉบับ at the beginning. If this is the consistent pattern, เสียงต้นฉบับ ("original audio") is the string to search for. This is more specific. Let's use เสียงต้นฉบับ.
  • tr-TR: "İngilizce (Amerika Birleşik Devletleri) orijinal" -> orijinal
  • uk-UA: "Англійська (Сполучені Штати) (оригінал)" -> оригінал
  • ur-PK: "انگریزی (ریاست ہائے متحدہ امریکہ) اصل" -> اصل
  • uz-Latn-UZ: "inglizcha (Amerika Qo‘shma Shtatlari) original" -> original
  • vi-VN: "Tiếng Anh (Hoa Kỳ) gốc" -> gốc
  • zh-Hans-CN: "英语(美国)原始" -> 原始
  • zh-Hant-HK: "英文(美國) (原聲)" -> 原聲
  • zh-Hant-TW: "英文(美國) (原文)" -> 原文
  • zu-ZA: "i-English (i-United States) yokuqala" -> yokuqala
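
The case-sensitivity point noted for nl-NL can be checked directly. The sketch below uses the nl-NL string from the data; `includesIgnoreCase` is a hypothetical helper shown only as one possible alternative, not something the extension is known to use:

```javascript
const trackName = 'Originele Engels (Verenigde Staten)'; // nl-NL string from the data

console.log(trackName.includes('Originele')); // true
console.log(trackName.includes('originele')); // false: includes() is case-sensitive

// Hypothetical case-insensitive variant: lowercase both sides before comparing.
function includesIgnoreCase(haystack, needle) {
  return haystack.toLowerCase().includes(needle.toLowerCase());
}

console.log(includesIgnoreCase(trackName, 'originele')); // true
```

Keeping the capitalisation exactly as it appears in the data avoids the need for such a helper, at the cost of missing a track name if YouTube ever changes the casing.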

The user's initial locales object:

const locales = {
  'ar': {
    ORIGINAL: 'أصلي',
  },
  'de': { // My data has de-DE: Original
    ORIGINAL: 'Original',
  },
  'en': { // My data has en-IN: original
    ORIGINAL: 'original',
  },
  'fr': { // My data has fr-FR: original and fr-CA: originale
    ORIGINAL: 'original',
  },
  'hi': { // My data has hi-IN: मूल
    ORIGINAL: '',
  },
  'ja-JP': { // My data has ja-JP: オリジナル
    ORIGINAL: 'オリジナル',
  },
  'zh-Hans-CN': { // My data has zh-Hans-CN: 原始
    ORIGINAL: '原始',
  },
}

My new list will be more comprehensive and specific with regional variants. I need to make sure the output format is exactly as requested:

const locales = {
  'lang-CODE': {
    ORIGINAL: 'translation',
  },
  // ...
}

And handle cases where I'm unsure with a placeholder and comment. For most of these, the extraction seems straightforward as the "original" keyword is usually at the end or clearly identifiable.

Let's review the "unsure" or "placeholder" condition.

  • ka-GE: ორიგინალია. I'm fairly confident, but it includes a suffix that means "is". If the base form ორიგინალი is sometimes used, this might be too specific. I'll use ორიგინალია as per the data and add a comment.
  • sl-SI: Izvirnik. I decided against Izvirnik: to be more general. This seems fine.
  • te-IN: అసలైనది. Similar to Georgian, this means "the original one". The root is అసలు. Sticking to data: అసలైనది. Comment might be useful.
  • th-TH: เสียงต้นฉบับ. This means "original audio". If only "original" (ต้นฉบับ) is used sometimes, this is specific. Sticking to data: เสียงต้นฉบับ. Comment might be useful.

Let's consider the existing entries in the user's locales object and how they map to the new data.

  • ar: أصلي (matches)
  • de: Original (my de-DE is Original, matches)
  • en: original (my en-IN is original, matches. I should probably add a generic en entry as well, or use en-US if that's the most common. The provided data has en-IN. Let's assume en should be original.)
  • fr: original (my fr-FR is original. fr-CA is originale. The user's fr might be for fr-FR.)
  • hi: '' (my hi-IN is मूल. This is a definite improvement.)
  • ja-JP: オリジナル (matches)
  • zh-Hans-CN: 原始 (matches)

I should create a new object. The user requests to "create an object in the format of the one above". This implies replacing or generating a new, more complete one.

Let's list the final decisions for each language code and its "original" keyword.

  1. af-ZA: oorspronklike
  2. am-ET: የመጀመሪያ
  3. ar: أصلي
  4. as-IN: মূল
  5. az-Latn-AZ: orijinal
  6. be-BY: арыгінальны (from (арыгінальны))
  7. bg-BG: оригинален
  8. bn-BD: মূল
  9. bs-Latn-BA: original
  10. ca-ES: original (from (original))
  11. cs-CZ: původní (from (původní))
  12. da-DK: originalt (from (originalt))
  13. de-DE: Original (from (Original))
  14. el-GR: πρωτότυπο
  15. en-IN: original
  16. es-419: original
  17. es-ES: original
  18. es-US: original
  19. et-EE: algne
  20. eu-ES: jatorrizkoa (from (jatorrizkoa))
  21. fa-IR: اصلی
  22. fil-PH: orihinal
  23. fr-CA: originale
  24. fr-FR: original
  25. gl-ES: orixinal (from (orixinal))
  26. gu-IN: ઑરિજિનલ
  27. he-IL: מקור
  28. hi-IN: मूल
  29. hr-HR: izvorno
  30. hu-HU: eredeti
  31. hy-AM: բնօրինակ (from (բնօրինակ))
  32. id-ID: asli
  33. is-IS: upprunalegt
  34. it-IT: originale
  35. ja-JP: オリジナル (from (オリジナル))
  36. ka-GE: ორიგინალია // Comment: "is original". Consider ორიგინალი (original) if this is too specific.
  37. kk-KZ: түпнұсқа (from (түпнұсқа))
  38. km-KH: ដើម
  39. kn-IN: ಮೂಲ
  40. ko-KR: 원본
  41. ky-KG: түпнуска
  42. lo-LA: ຕົ້ນສະບັບ
  43. lt-LT: pradinis
  44. lv-LV: oriģināls (from (oriģināls))
  45. mk-MK: оригинален
  46. ml-IN: ഒറിജിനൽ
  47. mn-MN: эх хувь
  48. mr-IN: मूळ
  49. ms-MY: asal
  50. my-MM: မူရင်း
  51. nb-NO: original
  52. ne-NP: मूल
  53. nl-NL: Originele
  54. or-IN: ମୂଳ
  55. pa-Guru-IN: ਮੂਲ
  56. pl-PL: oryginalny
  57. pt-BR: original
  58. pt-PT: original
  59. ro-RO: original
  60. ru-RU: оригинальная (from (оригинальная))
  61. si-LK: මුල්
  62. sk-SK: pôvodná // Comment: From "pôvodná zvuková stopa" (original audio track).
  63. sl-SI: Izvirnik // Comment: From "Izvirnik:" (Original:).
  64. sq-AL: origjinale
  65. sr-Cyrl-RS: оригинална
  66. sr-Latn-RS: originalna
  67. sw-TZ: halisi
  68. ta-IN: அசல்
  69. te-IN: అసలైనది // Comment: "the original one". Consider అసలు (original) if this is too specific.
  70. th-TH: เสียงต้นฉบับ // Comment: "original audio". Consider ต้นฉบับ (original) if this is too specific.
  71. tr-TR: orijinal
  72. uk-UA: оригінал (from (оригінал))
  73. ur-PK: اصل
  74. uz-Latn-UZ: original
  75. vi-VN: gốc
  76. zh-Hans-CN: 原始
  77. zh-Hant-HK: 原聲 (from (原聲))
  78. zh-Hant-TW: 原文 (from (原文))
  79. zu-ZA: yokuqala

The words in parentheses like (original) are often the ones I need. I should strip the parentheses.

  • Example: be-BY: "англійская (Злучаныя Штаты Амерыкі) (арыгінальны)" -> арыгінальны
  • Example: ja-JP: "英語 (アメリカ合衆国)(オリジナル)" -> オリジナル (using full-width parentheses)

I need to be careful about stripping parentheses only when they surround the target word. Most of the target words are the last word, or the word inside trailing parentheses.

Let's re-check the extraction logic for words in parentheses:

  • be-BY: (арыгінальны) -> арыгінальны - Correct.
  • ca-ES: (original) -> original - Correct.
  • cs-CZ: (původní) -> původní - Correct.
  • da-DK: (originalt) -> originalt - Correct.
  • de-DE: (Original) -> Original - Correct.
  • eu-ES: (jatorrizkoa) -> jatorrizkoa - Correct.
  • gl-ES: (orixinal) -> orixinal - Correct.
  • hy-AM: (բնօրինակ) -> բնօրինակ - Correct.
  • ja-JP: (オリジナル) -> オリジナル - Correct.
  • kk-KZ: (түпнұсқа) -> түпнұсқа - Correct.
  • lv-LV: (oriģināls) -> oriģināls - Correct.
  • ru-RU: (оригинальная) -> оригинальная - Correct.
  • uk-UA: (оригінал) -> оригінал - Correct.
  • zh-Hant-HK: (原聲) -> 原聲 - Correct.
  • zh-Hant-TW: (原文) -> 原文 - Correct.

This seems consistent. The words are generally the last significant word or the word in the final set of parentheses.
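
The "last word, or word inside trailing parentheses" rule can be sketched as a small heuristic. `extractTrailingMarker` is a hypothetical name, and the function is an approximation of the manual extraction above rather than a complete solution:

```javascript
// Heuristic sketch: prefer the text inside a trailing (...) group,
// otherwise fall back to the last whitespace-separated word.
// Both ASCII and full-width parentheses are accepted.
function extractTrailingMarker(trackName) {
  const inParens = trackName.match(/[(（]([^()（）]+)[)）]\s*$/);
  if (inParens) return inParens[1].trim();
  const words = trackName.trim().split(/\s+/);
  return words[words.length - 1];
}

console.log(extractTrailingMarker('Engelsk (USA) (originalt)')); // originalt
console.log(extractTrailingMarker('Inggris (Amerika Serikat) asli')); // asli
console.log(extractTrailingMarker('Originele Engels (Verenigde Staten)')); // Verenigde Staten
```

The last call shows the limit of the heuristic: strings that put the marker first (nl-NL, sl-SI, th-TH, hi-IN, bn-BD, as-IN) or use a multi-word phrase (el-GR, sk-SK, mn-MN) still need to be handled by hand.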

What about the en key from the original example? The new data provides en-IN. It's common to have a base en (often en-US) as a fallback. The user's original object has en: { ORIGINAL: 'original' }. I should probably include this, or if there's an en-US in the list (there isn't, but es-US exists), use that. Since the user explicitly provided en in their example, and the value original is consistent with en-IN, I should retain an en entry. The provided data doesn't have a generic en line. I will add en: { ORIGINAL: 'original' } based on the user's initial object and the commonality.

The prompt asks to create an object "in the format of the one above". The user's initial object had some base language codes like ar, de, en, fr, hi. My new data has more specific codes like de-DE, en-IN, fr-FR, fr-CA, hi-IN. It's probably best to use the specific codes from the new data. If the user wants to consolidate de-DE into de, they can do that. I will provide the most specific information I have. So for de, I will use de-DE. For fr, I will have fr-FR and fr-CA. For hi, I will have hi-IN.
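
If both base codes (such as en) and region-specific codes (such as de-DE) end up in the same object, a lookup could try the full tag first and then progressively shorter ones. This is a sketch under that assumption; `resolveLocale`, `exampleLocales`, and the `'en'` default are hypothetical choices, not part of the user's extension:

```javascript
// Hypothetical subset mixing base and region-specific keys.
const exampleLocales = {
  'en': { ORIGINAL: 'original' },
  'de-DE': { ORIGINAL: 'Original' },
  'zh-Hans-CN': { ORIGINAL: '原始' },
};

// Try the full tag first, then drop trailing subtags one at a time
// ('zh-Hans-CN' -> 'zh-Hans' -> 'zh'), and finally fall back to a default key.
function resolveLocale(lang, locales, fallback = 'en') {
  const parts = lang.split('-');
  while (parts.length > 0) {
    const key = parts.join('-');
    if (Object.prototype.hasOwnProperty.call(locales, key)) return key;
    parts.pop();
  }
  return fallback;
}

console.log(resolveLocale('de-DE', exampleLocales)); // de-DE
console.log(resolveLocale('en-IN', exampleLocales)); // en
console.log(resolveLocale('fr-CA', exampleLocales)); // en (fallback)
```

This only walks from specific to general. It does not cover the reverse case, where the page reports a base code such as de but the object only has de-DE.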

Let's consider the structure:

const locales = {
  'af-ZA': { ORIGINAL: 'oorspronklike' },
  'am-ET': { ORIGINAL: 'የመጀመሪያ' },
  // ... and so on for all 79 entries
};

This looks like the correct plan. I will generate the full object. I will also include the comments for the few cases identified. The initial locales object also had ja-JP and zh-Hans-CN which are present in the new list. The hi entry will be updated from '' to मूल.

Final check on comments:

  • ka-GE: ორიგინალია // Comment: "is original". The word ორიგინალი (original) might be a safer general match if variations exist.
  • sk-SK: pôvodná // Comment: Extracted from "pôvodná zvuková stopa" (original audio track); refers to "original".
  • sl-SI: Izvirnik // Comment: Extracted from "Izvirnik:" (Original:); refers to "Original".
  • te-IN: అసలైనది // Comment: "the original one". The word అసలు (original) might be a safer general match if variations exist.
  • th-TH: เสียงต้นฉบับ // Comment: "original audio". The word ต้นฉบับ (original) might be a safer general match if variations exist.

These comments highlight potential areas where a more general term might be better if the current specific term doesn't cover all cases. However, based on the provided data, these are the direct extractions.

The user did not provide an explicit en or en-US entry in the data list, only en-IN. The original locales object has en: { ORIGINAL: 'original' }. I should probably include en: { ORIGINAL: 'original' } for completeness, as it was in the original template, and it's a very common language. Or, I can just use en-IN from the data. The prompt says "update this to support all the display languages YouTube supports" and then provides data. I should stick to the provided data. So en-IN will be the English entry from the list. If the user wants a generic en, they can alias it.

Let's make sure there are no truly ambiguous cases where I should put a placeholder. Most seem quite clear. The ones with comments are not ambiguous in terms of extraction from the given string, but rather about their general applicability if YouTube uses slight variations. This fits the "If you're not sure about a translation or can't extract it, create a placeholder for that language code and comment it" criterion for the "not sure about a translation" part (specifically, its robustness). But I did extract it. So, the comment is the way to go. I don't see any language where I cannot extract the word.

One final check on the hi entry. Original: hi: { ORIGINAL: '' }. New data: ["hi-IN","मूल भाषा अंग्रेज़ी (संयुक्त राज्य)"]. Extracted: मूल. This is a good update.

The problem states "extract the 'original' translation from each of these strings required to do that". This implies I should stay as close as possible to what's given in each string. My refined list of extracted words follows this principle. The output should be a single JS const definition.

assistant

Okay, I can help you create the updated locales object based on the data you've provided. I will extract the term for "original" from each string. For cases where the extraction might be very specific to the provided example string, I'll add a comment.

Here's the generated locales object:

const locales = {
  'af-ZA': {
    ORIGINAL: 'oorspronklike', // From "Engels (Verenigde State van Amerika) oorspronklike"
  },
  'am-ET': {
    ORIGINAL: 'የመጀመሪያ', // From "እንግሊዝኛ (ዩናይትድ ስቴትስ) የመጀመሪያ"
  },
  'ar': {
    ORIGINAL: 'أصلي', // From "الإنجليزية (الولايات المتحدة): أصلي"
  },
  'as-IN': {
    ORIGINAL: 'মূল', // From "মূল ইংৰাজী (মাৰ্কিন যুক্তৰাষ্ট্ৰ)"
  },
  'az-Latn-AZ': {
    ORIGINAL: 'orijinal', // From "ingilis (Amerika Birləşmiş Ştatları) orijinal"
  },
  'be-BY': {
    ORIGINAL: 'арыгінальны', // From "англійская (Злучаныя Штаты Амерыкі) (арыгінальны)"
  },
  'bg-BG': {
    ORIGINAL: 'оригинален', // From "английски (Съединени щати), оригинален"
  },
  'bn-BD': {
    ORIGINAL: 'মূল', // From "মূল ভাষা ইংরেজি (মার্কিন যুক্তরাষ্ট্র)"
  },
  'bs-Latn-BA': {
    ORIGINAL: 'original', // From "engleski (Sjedinjene Države), original"
  },
  'ca-ES': {
    ORIGINAL: 'original', // From "anglès (Estats Units) (original)"
  },
  'cs-CZ': {
    ORIGINAL: 'původní', // From "Angličtina (Spojené státy) (původní)"
  },
  'da-DK': {
    ORIGINAL: 'originalt', // From "Engelsk (USA) (originalt)"
  },
  'de-DE': {
    ORIGINAL: 'Original', // From "Englisch (Vereinigte Staaten) (Original)"
  },
  'el-GR': {
    ORIGINAL: 'πρωτότυπο', // From "Αγγλικά (Ηνωμένες Πολιτείες) πρωτότυπο αρχείο"
  },
  'en-IN': {
    ORIGINAL: 'original', // From "English (United States) original"
  },
  // Note: You might want to add a generic 'en' entry as well, e.g., 'en': { ORIGINAL: 'original' }, if your system falls back to it.
  'es-419': {
    ORIGINAL: 'original', // From "Inglés (Estados Unidos) original" (Latin American Spanish)
  },
  'es-ES': {
    ORIGINAL: 'original', // From "Inglés (Estados Unidos) original" (Spain Spanish)
  },
  'es-US': {
    ORIGINAL: 'original', // From "Inglés (Estados Unidos) original" (US Spanish)
  },
  'et-EE': {
    ORIGINAL: 'algne', // From "inglise (Ameerika Ühendriigid), algne"
  },
  'eu-ES': {
    ORIGINAL: 'jatorrizkoa', // From "ingelesa (Ameriketako Estatu Batuak) (jatorrizkoa)"
  },
  'fa-IR': {
    ORIGINAL: 'اصلی', // From "انگلیسی (ایالات متحده) اصلی"
  },
  'fil-PH': {
    ORIGINAL: 'orihinal', // From "Ingles (Estados Unidos) orihinal"
  },
  'fr-CA': {
    ORIGINAL: 'originale', // From "Anglais (États-Unis) – originale"
  },
  'fr-FR': {
    ORIGINAL: 'original', // From "Anglais (États-Unis) original"
  },
  'gl-ES': {
    ORIGINAL: 'orixinal', // From "inglés (Estados Unidos) (orixinal)"
  },
  'gu-IN': {
    ORIGINAL: 'ઑરિજિનલ', // From "અંગ્રેજી (યુનાઇટેડ સ્ટેટ્સ) ઑરિજિનલ"
  },
  'he-IL': {
    ORIGINAL: 'מקור', // From "אנגלית (ארצות הברית), מקור"
  },
  'hi-IN': {
    ORIGINAL: 'मूल', // From "मूल भाषा अंग्रेज़ी (संयुक्त राज्य)" (Updates previous empty string for 'hi')
  },
  'hr-HR': {
    ORIGINAL: 'izvorno', // From "Engleski (Sjedinjene Američke Države) – izvorno"
  },
  'hu-HU': {
    ORIGINAL: 'eredeti', // From "angol (Egyesült Államok) eredeti"
  },
  'hy-AM': {
    ORIGINAL: 'բնօրինակ', // From "անգլերեն (Միացյալ Նահանգներ) (բնօրինակ)"
  },
  'id-ID': {
    ORIGINAL: 'asli', // From "Inggris (Amerika Serikat) asli"
  },
  'is-IS': {
    ORIGINAL: 'upprunalegt', // From "enska (Bandaríkin) upprunalegt"
  },
  'it-IT': {
    ORIGINAL: 'originale', // From "Inglese (Stati Uniti), originale"
  },
  'ja-JP': {
    ORIGINAL: 'オリジナル', // From "英語 (アメリカ合衆国)(オリジナル)"
  },
  'ka-GE': {
    ORIGINAL: 'ორიგინალია', // From "ინგლისური (ამერიკის შეერთებული შტატები) ორიგინალია". Comment: This means "is original". The word `ორიგინალი` (original) might be a safer general match if variations exist.
  },
  'kk-KZ': {
    ORIGINAL: 'түпнұсқа', // From "ағылшын тілі (Америка Құрама Штаттары) (түпнұсқа)"
  },
  'km-KH': {
    ORIGINAL: 'ដើម', // From "អង់គ្លេស (សហរដ្ឋអាមេរិក) ដើម"
  },
  'kn-IN': {
    ORIGINAL: 'ಮೂಲ', // From "ಇಂಗ್ಲಿಷ್ (ಯುನೈಟೆಡ್ ಸ್ಟೇಟ್ಸ್) ಮೂಲ"
  },
  'ko-KR': {
    ORIGINAL: '원본', // From "영어(미국) 원본"
  },
  'ky-KG': {
    ORIGINAL: 'түпнуска', // From "англисче (Кошмо Штаттар) түпнуска"
  },
  'lo-LA': {
    ORIGINAL: 'ຕົ້ນສະບັບ', // From "ອັງກິດ (ສະຫະລັດ) ຕົ້ນສະບັບ"
  },
  'lt-LT': {
    ORIGINAL: 'pradinis', // From "„anglų (Jungtinės Valstijos)“ – pradinis"
  },
  'lv-LV': {
    ORIGINAL: 'oriģināls', // From "angļu (Amerikas Savienotās Valstis) (oriģināls)"
  },
  'mk-MK': {
    ORIGINAL: 'оригинален', // From "англиски (Соединети Американски Држави) оригинален"
  },
  'ml-IN': {
    ORIGINAL: 'ഒറിജിനൽ', // From "ഇംഗ്ലീഷ് (അമേരിക്കൻ ഐക്യനാടുകൾ) ഒറിജിനൽ"
  },
  'mn-MN': {
    ORIGINAL: 'эх хувь', // From "англи (Америкийн Нэгдсэн Улс) нь эх хувь" (means 'original copy')
  },
  'mr-IN': {
    ORIGINAL: 'मूळ', // From "इंग्रजी (युनाईटेड स्टेट्स) मूळ"
  },
  'ms-MY': {
    ORIGINAL: 'asal', // From "Inggeris (Amerika Syarikat) asal"
  },
  'my-MM': {
    ORIGINAL: 'မူရင်း', // From "အင်္ဂလိပ် (အမေရိကန် ပြည်ထောင်စု) မူရင်း"
  },
  'nb-NO': {
    ORIGINAL: 'original', // From "engelsk (USA) original"
  },
  'ne-NP': {
    ORIGINAL: 'मूल', // From "अङ्ग्रेजी (संयुक्त राज्य) मूल"
  },
  'nl-NL': {
    ORIGINAL: 'Originele', // From "Originele Engels (Verenigde Staten)" (Note capitalization)
  },
  'or-IN': {
    ORIGINAL: 'ମୂଳ', // From "ଇଂରାଜୀ (ଯୁକ୍ତ ରାଷ୍ଟ୍ର) ମୂଳ"
  },
  'pa-Guru-IN': {
    ORIGINAL: 'ਮੂਲ', // From "ਅੰਗਰੇਜ਼ੀ (ਸੰਯੁਕਤ ਰਾਜ) ਮੂਲ"
  },
  'pl-PL': {
    ORIGINAL: 'oryginalny', // From "angielski (Stany Zjednoczone) oryginalny"
  },
  'pt-BR': {
    ORIGINAL: 'original', // From "Inglês (Estados Unidos) original"
  },
  'pt-PT': {
    ORIGINAL: 'original', // From "Inglês (Estados Unidos) original"
  },
  'ro-RO': {
    ORIGINAL: 'original', // From "engleză (Statele Unite ale Americii) original"
  },
  'ru-RU': {
    ORIGINAL: 'оригинальная', // From "Английский (Соединенные Штаты) (оригинальная)"
  },
  'si-LK': {
    ORIGINAL: 'මුල්', // From "ඉංග්‍රීසි (එක්සත් ජනපදය) මුල්"
  },
  'sk-SK': {
    ORIGINAL: 'pôvodná', // From "Angličtina (Spojené štáty) – pôvodná zvuková stopa". Comment: Extracted from "pôvodná zvuková stopa" (original audio track); refers to "original".
  },
  'sl-SI': {
    ORIGINAL: 'Izvirnik', // From "Izvirnik: angleščina (Združene države Amerike)". Comment: Extracted from "Izvirnik:" (Original:); refers to "Original".
  },
  'sq-AL': {
    ORIGINAL: 'origjinale', // From "anglisht (SHBA) origjinale"
  },
  'sr-Cyrl-RS': {
    ORIGINAL: 'оригинална', // From "енглески (Сједињене Државе), оригинална"
  },
  'sr-Latn-RS': {
    ORIGINAL: 'originalna', // From "engleski (Sjedinjene Države), originalna"
  },
  'sw-TZ': {
    ORIGINAL: 'halisi', // From "Kiingereza (Marekani) halisi"
  },
  'ta-IN': {
    ORIGINAL: 'அசல்', // From "ஆங்கிலம் (அமெரிக்கா) அசல்"
  },
  'te-IN': {
    ORIGINAL: 'అసలైనది', // From "ఇంగ్లిష్ (యునైటెడ్ స్టేట్స్) అసలైనది". Comment: This means "the original one". The word `అసలు` (original) might be a safer general match if variations exist.
  },
  'th-TH': {
    ORIGINAL: 'เสียงต้นฉบับ', // From "เสียงต้นฉบับภาษาอังกฤษ (สหรัฐอเมริกา)". Comment: This means "original audio". The word `ต้นฉบับ` (original) might be a safer general match if variations exist.
  },
  'tr-TR': {
    ORIGINAL: 'orijinal', // From "İngilizce (Amerika Birleşik Devletleri) orijinal"
  },
  'uk-UA': {
    ORIGINAL: 'оригінал', // From "Англійська (Сполучені Штати) (оригінал)"
  },
  'ur-PK': {
    ORIGINAL: 'اصل', // From "انگریزی (ریاست ہائے متحدہ امریکہ) اصل"
  },
  'uz-Latn-UZ': {
    ORIGINAL: 'original', // From "inglizcha (Amerika Qo‘shma Shtatlari) original"
  },
  'vi-VN': {
    ORIGINAL: 'gốc', // From "Tiếng Anh (Hoa Kỳ) gốc"
  },
  'zh-Hans-CN': {
    ORIGINAL: '原始', // From "英语(美国)原始"
  },
  'zh-Hant-HK': {
    ORIGINAL: '原聲', // From "英文(美國) (原聲)" (means 'original sound')
  },
  'zh-Hant-TW': {
    ORIGINAL: '原文', // From "英文(美國) (原文)" (means 'original text/version')
  },
  'zu-ZA': {
    ORIGINAL: 'yokuqala', // From "i-English (i-United States) yokuqala"
  },
};

This object should provide a comprehensive set of translations for "original" based on the YouTube data you've supplied. Remember that track.name.includes(...) is case-sensitive, so the casing in the ORIGINAL values is important (e.g., Original for German, Originele for Dutch). The comments highlight a few cases where the extracted term is very specific, and a more general root word might be considered if these specific terms don't always match. Any such replacement must itself occur in the track name: ორიგინალი and ต้นฉบับ are contained in the extracted strings, but అసలు is not a substring of అసలైనది, so it would not match the sample Telugu string.
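
As a minimal sketch of that matching step, the snippet below shows how casing affects includes(...). The helper names isOriginalTrack and isOriginalTrackIgnoreCase are illustrative assumptions, not part of your extension; the ORIGINAL values and track names are the ones from the data above.

```javascript
// Hypothetical helper: true if the track name contains the locale's word for "original".
// String.prototype.includes is case-sensitive, so the stored casing must match YouTube's.
function isOriginalTrack(trackName, locale) {
  return trackName.includes(locale.ORIGINAL);
}

// Hypothetical variant that lowercases both sides before comparing.
function isOriginalTrackIgnoreCase(trackName, locale) {
  return trackName.toLowerCase().includes(locale.ORIGINAL.toLowerCase());
}

const germanTrack = 'Englisch (Vereinigte Staaten) (Original)';
const dutchTrack = 'Originele Engels (Verenigde Staten)';

// Casing as extracted: both match.
const deMatch = isOriginalTrack(germanTrack, { ORIGINAL: 'Original' });
const nlMatch = isOriginalTrack(dutchTrack, { ORIGINAL: 'Originele' });

// Lowercase value against the capitalized German string: no match.
const deLowercaseMatch = isOriginalTrack(germanTrack, { ORIGINAL: 'original' });

// The case-insensitive variant matches regardless of stored casing.
const deIgnoreCaseMatch = isOriginalTrackIgnoreCase(germanTrack, { ORIGINAL: 'original' });
```

If you would rather not depend on exact casing, the second helper is one option, at the cost of slightly looser matching.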

user

I forgot to mention, I do have a fallback in place from the verbatim language code to base language code and finally to "en" - I'm just going to use "en" for all the English entries as it's my native language, so it's the only one I'm confident to do this in. This looks good to me, please provide a final object without the comments

assistant (thinking)

The user wants the same JavaScript object as before, but with two modifications:

  1. All English-based locale codes (en-IN, en-US, etc.) should be consolidated under a single en key, using the value 'original'.
  2. All comments within the object should be removed.

I will iterate through the previously generated object. If the key starts with en-, I will skip it, ensuring that a separate en entry is added. For all other keys, I will copy the ORIGINAL value. Finally, I will add the en entry.

assistant

Okay, understood. Here's the final locales object with the English entries consolidated under en and all comments removed:

const locales = {
  'af-ZA': {
    ORIGINAL: 'oorspronklike',
  },
  'am-ET': {
    ORIGINAL: 'የመጀመሪያ',
  },
  'ar': {
    ORIGINAL: 'أصلي',
  },
  'as-IN': {
    ORIGINAL: 'মূল',
  },
  'az-Latn-AZ': {
    ORIGINAL: 'orijinal',
  },
  'be-BY': {
    ORIGINAL: 'арыгінальны',
  },
  'bg-BG': {
    ORIGINAL: 'оригинален',
  },
  'bn-BD': {
    ORIGINAL: 'মূল',
  },
  'bs-Latn-BA': {
    ORIGINAL: 'original',
  },
  'ca-ES': {
    ORIGINAL: 'original',
  },
  'cs-CZ': {
    ORIGINAL: 'původní',
  },
  'da-DK': {
    ORIGINAL: 'originalt',
  },
  'de-DE': {
    ORIGINAL: 'Original',
  },
  'el-GR': {
    ORIGINAL: 'πρωτότυπο',
  },
  'en': {
    ORIGINAL: 'original',
  },
  'es-419': {
    ORIGINAL: 'original',
  },
  'es-ES': {
    ORIGINAL: 'original',
  },
  'es-US': {
    ORIGINAL: 'original',
  },
  'et-EE': {
    ORIGINAL: 'algne',
  },
  'eu-ES': {
    ORIGINAL: 'jatorrizkoa',
  },
  'fa-IR': {
    ORIGINAL: 'اصلی',
  },
  'fil-PH': {
    ORIGINAL: 'orihinal',
  },
  'fr-CA': {
    ORIGINAL: 'originale',
  },
  'fr-FR': {
    ORIGINAL: 'original',
  },
  'gl-ES': {
    ORIGINAL: 'orixinal',
  },
  'gu-IN': {
    ORIGINAL: 'ઑરિજિનલ',
  },
  'he-IL': {
    ORIGINAL: 'מקור',
  },
  'hi-IN': {
    ORIGINAL: 'मूल',
  },
  'hr-HR': {
    ORIGINAL: 'izvorno',
  },
  'hu-HU': {
    ORIGINAL: 'eredeti',
  },
  'hy-AM': {
    ORIGINAL: 'բնօրինակ',
  },
  'id-ID': {
    ORIGINAL: 'asli',
  },
  'is-IS': {
    ORIGINAL: 'upprunalegt',
  },
  'it-IT': {
    ORIGINAL: 'originale',
  },
  'ja-JP': {
    ORIGINAL: 'オリジナル',
  },
  'ka-GE': {
    ORIGINAL: 'ორიგინალია',
  },
  'kk-KZ': {
    ORIGINAL: 'түпнұсқа',
  },
  'km-KH': {
    ORIGINAL: 'ដើម',
  },
  'kn-IN': {
    ORIGINAL: 'ಮೂಲ',
  },
  'ko-KR': {
    ORIGINAL: '원본',
  },
  'ky-KG': {
    ORIGINAL: 'түпнуска',
  },
  'lo-LA': {
    ORIGINAL: 'ຕົ້ນສະບັບ',
  },
  'lt-LT': {
    ORIGINAL: 'pradinis',
  },
  'lv-LV': {
    ORIGINAL: 'oriģināls',
  },
  'mk-MK': {
    ORIGINAL: 'оригинален',
  },
  'ml-IN': {
    ORIGINAL: 'ഒറിജിനൽ',
  },
  'mn-MN': {
    ORIGINAL: 'эх хувь',
  },
  'mr-IN': {
    ORIGINAL: 'मूळ',
  },
  'ms-MY': {
    ORIGINAL: 'asal',
  },
  'my-MM': {
    ORIGINAL: 'မူရင်း',
  },
  'nb-NO': {
    ORIGINAL: 'original',
  },
  'ne-NP': {
    ORIGINAL: 'मूल',
  },
  'nl-NL': {
    ORIGINAL: 'Originele',
  },
  'or-IN': {
    ORIGINAL: 'ମୂଳ',
  },
  'pa-Guru-IN': {
    ORIGINAL: 'ਮੂਲ',
  },
  'pl-PL': {
    ORIGINAL: 'oryginalny',
  },
  'pt-BR': {
    ORIGINAL: 'original',
  },
  'pt-PT': {
    ORIGINAL: 'original',
  },
  'ro-RO': {
    ORIGINAL: 'original',
  },
  'ru-RU': {
    ORIGINAL: 'оригинальная',
  },
  'si-LK': {
    ORIGINAL: 'මුල්',
  },
  'sk-SK': {
    ORIGINAL: 'pôvodná',
  },
  'sl-SI': {
    ORIGINAL: 'Izvirnik',
  },
  'sq-AL': {
    ORIGINAL: 'origjinale',
  },
  'sr-Cyrl-RS': {
    ORIGINAL: 'оригинална',
  },
  'sr-Latn-RS': {
    ORIGINAL: 'originalna',
  },
  'sw-TZ': {
    ORIGINAL: 'halisi',
  },
  'ta-IN': {
    ORIGINAL: 'அசல்',
  },
  'te-IN': {
    ORIGINAL: 'అసలైనది',
  },
  'th-TH': {
    ORIGINAL: 'เสียงต้นฉบับ',
  },
  'tr-TR': {
    ORIGINAL: 'orijinal',
  },
  'uk-UA': {
    ORIGINAL: 'оригінал',
  },
  'ur-PK': {
    ORIGINAL: 'اصل',
  },
  'uz-Latn-UZ': {
    ORIGINAL: 'original',
  },
  'vi-VN': {
    ORIGINAL: 'gốc',
  },
  'zh-Hans-CN': {
    ORIGINAL: '原始',
  },
  'zh-Hant-HK': {
    ORIGINAL: '原聲',
  },
  'zh-Hant-TW': {
    ORIGINAL: '原文',
  },
  'zu-ZA': {
    ORIGINAL: 'yokuqala',
  },
};
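
For reference, the fallback you described (verbatim language code, then base language code, then en) could look roughly like the sketch below. getLocale and the trimmed-down sampleLocales object are hypothetical names used only for illustration; your extension's actual lookup may differ.

```javascript
// Trimmed-down copy of a few entries from the object above, for illustration only.
const sampleLocales = {
  'ar': { ORIGINAL: 'أصلي' },
  'en': { ORIGINAL: 'original' },
  'ja-JP': { ORIGINAL: 'オリジナル' },
};

// Hypothetical lookup: exact code first, then the base language, then 'en'.
function getLocale(lang, locales = sampleLocales) {
  const base = lang.split('-')[0];
  return locales[lang] ?? locales[base] ?? locales['en'];
}

const exact = getLocale('ja-JP');   // found under the verbatim code
const viaBase = getLocale('ar-EG'); // falls back to the base code 'ar'
const viaEn = getLocale('en-IN');   // falls back to the base code 'en'
const unknown = getLocale('xx-YY'); // nothing matches, so 'en' is used
```

With a lookup shaped like this, a display language such as de-AT would fall through to en, because the object is keyed by de-DE rather than de. If that matters for your users, adding base-language keys for languages with several regional variants may be worth considering.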