{
"runSettings": {
"temperature": 0.5,
"model": "models/gemini-2.5-pro-preview-05-06",
"topP": 0.95,
"topK": 64,
"maxOutputTokens": 65536,
"safetySettings": [{
"category": "HARM_CATEGORY_HARASSMENT",
"threshold": "OFF"
}, {
"category": "HARM_CATEGORY_HATE_SPEECH",
"threshold": "OFF"
}, {
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"threshold": "OFF"
}, {
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"threshold": "OFF"
}],
"responseMimeType": "text/plain",
"enableCodeExecution": false,
"enableSearchAsATool": false,
"enableBrowseAsATool": false,
"enableAutoFunctionResponse": false
}
}
I have a locales object I use in my YouTube browser extension to find the original audio track for a video, which contains the word "original" (in English). I need to update this to support all the display languages YouTube supports:
const locales = {
'ar': {
ORIGINAL: 'أصلي',
},
'de': {
ORIGINAL: 'Original',
},
'en': {
ORIGINAL: 'original',
},
'fr': {
ORIGINAL: 'original',
},
'hi': {
ORIGINAL: '',
},
'ja-JP': {
ORIGINAL: 'オリジナル',
},
'zh-Hans-CN': {
ORIGINAL: '原始',
},
}
I've extracted the following data from YouTube, which pairs the actual lang attribute from YouTube's <html> element with the translation of the same video track name in that display language. I need to create an object in the format of the one above which I can use to perform a substring match on a track name with track.name.includes(...) to determine if it's the original track.
Extract the "original" translation from each of these strings required to do that. If you're not sure about a translation or can't extract it, create a placeholder for that language code and comment it. Here is the data:
["af-ZA","Engels (Verenigde State van Amerika) oorspronklike"]
["am-ET","እንግሊዝኛ (ዩናይትድ ስቴትስ) የመጀመሪያ"]
["ar","الإنجليزية (الولايات المتحدة): أصلي"]
["as-IN","মূল ইংৰাজী (মাৰ্কিন যুক্তৰাষ্ট্ৰ)"]
["az-Latn-AZ","ingilis (Amerika Birləşmiş Ştatları) orijinal"]
["be-BY","англійская (Злучаныя Штаты Амерыкі) (арыгінальны)"]
["bg-BG","английски (Съединени щати), оригинален"]
["bn-BD","মূল ভাষা ইংরেজি (মার্কিন যুক্তরাষ্ট্র)"]
["bs-Latn-BA","engleski (Sjedinjene Države), original"]
["ca-ES","anglès (Estats Units) (original)"]
["cs-CZ","Angličtina (Spojené státy) (původní)"]
["da-DK","Engelsk (USA) (originalt)"]
["de-DE","Englisch (Vereinigte Staaten) (Original)"]
["el-GR","Αγγλικά (Ηνωμένες Πολιτείες) πρωτότυπο αρχείο"]
["en-IN","English (United States) original"]
["es-419","Inglés (Estados Unidos) original"]
["es-ES","Inglés (Estados Unidos) original"]
["es-US","Inglés (Estados Unidos) original"]
["et-EE","inglise (Ameerika Ühendriigid), algne"]
["eu-ES","ingelesa (Ameriketako Estatu Batuak) (jatorrizkoa)"]
["fa-IR","انگلیسی (ایالات متحده) اصلی"]
["fil-PH","Ingles (Estados Unidos) orihinal"]
["fr-CA","Anglais (États-Unis) – originale"]
["fr-FR","Anglais (États-Unis) original"]
["gl-ES","inglés (Estados Unidos) (orixinal)"]
["gu-IN","અંગ્રેજી (યુનાઇટેડ સ્ટેટ્સ) ઑરિજિનલ"]
["he-IL","אנגלית (ארצות הברית), מקור"]
["hi-IN","मूल भाषा अंग्रेज़ी (संयुक्त राज्य)"]
["hr-HR","Engleski (Sjedinjene Američke Države) – izvorno"]
["hu-HU","angol (Egyesült Államok) eredeti"]
["hy-AM","անգլերեն (Միացյալ Նահանգներ) (բնօրինակ)"]
["id-ID","Inggris (Amerika Serikat) asli"]
["is-IS","enska (Bandaríkin) upprunalegt"]
["it-IT","Inglese (Stati Uniti), originale"]
["ja-JP","英語 (アメリカ合衆国)(オリジナル)"]
["ka-GE","ინგლისური (ამერიკის შეერთებული შტატები) ორიგინალია"]
["kk-KZ","ағылшын тілі (Америка Құрама Штаттары) (түпнұсқа)"]
["km-KH","អង់គ្លេស (សហរដ្ឋអាមេរិក) ដើម"]
["kn-IN","ಇಂಗ್ಲಿಷ್ (ಯುನೈಟೆಡ್ ಸ್ಟೇಟ್ಸ್) ಮೂಲ"]
["ko-KR","영어(미국) 원본"]
["ky-KG","англисче (Кошмо Штаттар) түпнуска"]
["lo-LA","ອັງກິດ (ສະຫະລັດ) ຕົ້ນສະບັບ"]
["lt-LT","„anglų (Jungtinės Valstijos)“ – pradinis"]
["lv-LV","angļu (Amerikas Savienotās Valstis) (oriģināls)"]
["mk-MK","англиски (Соединети Американски Држави) оригинален"]
["ml-IN","ഇംഗ്ലീഷ് (അമേരിക്കൻ ഐക്യനാടുകൾ) ഒറിജിനൽ"]
["mn-MN","англи (Америкийн Нэгдсэн Улс) нь эх хувь"]
["mr-IN","इंग्रजी (युनायटेड स्टेट्स) मूळ"]
["ms-MY","Inggeris (Amerika Syarikat) asal"]
["my-MM","အင်္ဂလိပ် (အမေရိကန် ပြည်ထောင်စု) မူရင်း"]
["nb-NO","engelsk (USA) original"]
["ne-NP","अङ्ग्रेजी (संयुक्त राज्य) मूल"]
["nl-NL","Originele Engels (Verenigde Staten)"]
["or-IN","ଇଂରାଜୀ (ଯୁକ୍ତ ରାଷ୍ଟ୍ର) ମୂଳ"]
["pa-Guru-IN","ਅੰਗਰੇਜ਼ੀ (ਸੰਯੁਕਤ ਰਾਜ) ਮੂਲ"]
["pl-PL","angielski (Stany Zjednoczone) oryginalny"]
["pt-BR","Inglês (Estados Unidos) original"]
["pt-PT","Inglês (Estados Unidos) original"]
["ro-RO","engleză (Statele Unite ale Americii) original"]
["ru-RU","Английский (Соединенные Штаты) (оригинальная)"]
["si-LK","ඉංග්රීසි (එක්සත් ජනපදය) මුල්"]
["sk-SK","Angličtina (Spojené štáty) – pôvodná zvuková stopa"]
["sl-SI","Izvirnik: angleščina (Združene države Amerike)"]
["sq-AL","anglisht (SHBA) origjinale"]
["sr-Cyrl-RS","енглески (Сједињене Државе), оригинална"]
["sr-Latn-RS","engleski (Sjedinjene Države), originalna"]
["sw-TZ","Kiingereza (Marekani) halisi"]
["ta-IN","ஆங்கிலம் (அமெரிக்கா) அசல்"]
["te-IN","ఇంగ్లీష్ (యునైటెడ్ స్టేట్స్) అసలైనది"]
["th-TH","เสียงต้นฉบับภาษาอังกฤษ (สหรัฐอเมริกา)"]
["tr-TR","İngilizce (Amerika Birleşik Devletleri) orijinal"]
["uk-UA","Англійська (Сполучені Штати) (оригінал)"]
["ur-PK","انگریزی (ریاست ہائے متحدہ امریکہ) اصل"]
["uz-Latn-UZ","inglizcha (Amerika Qo‘shma Shtatlari) original"]
["vi-VN","Tiếng Anh (Hoa Kỳ) gốc"]
["zh-Hans-CN","英语(美国)原始"]
["zh-Hant-HK","英文(美國) (原聲)"]
["zh-Hant-TW","英文(美國) (原文)"]
["zu-ZA","i-English (i-United States) yokuqala"]
The user wants to update a JavaScript object, locales, which stores the translation of the word "original" for different YouTube display languages.
The goal is to use this translation to find the original audio track of a YouTube video by checking if the track name includes the translated "original".
I need to process a list of language code/track name pairs. For each pair, I need to:
- Identify the language code.
- Extract the word for "original" from the track name string.
- Create an entry in the locales object in the format: 'lang-CODE': { ORIGINAL: 'translation' }.
- If the translation is ambiguous or cannot be reliably extracted, I should use a placeholder and add a comment.
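The lookup these entries are meant to support could be sketched roughly as follows. This is only an illustration under assumptions: isOriginalTrack is a hypothetical helper name, and the small locales subset is made up for the example rather than taken from the extension's code. It also shows why an empty placeholder needs a guard, since every string includes the empty string.

```javascript
// Hypothetical subset of the locales object, for illustration only.
const locales = {
  'de-DE': { ORIGINAL: 'Original' },
  'en-IN': { ORIGINAL: 'original' },
  'hi-IN': { ORIGINAL: '' }, // placeholder: no keyword known yet
};

// Returns true when the track name contains the "original" keyword for the
// given display language. A missing or empty keyword never matches, because
// trackName.includes('') would otherwise be true for every track.
function isOriginalTrack(trackName, lang) {
  const keyword = locales[lang]?.ORIGINAL;
  if (!keyword) return false;
  return trackName.includes(keyword);
}

console.log(isOriginalTrack('Englisch (Vereinigte Staaten) (Original)', 'de-DE'));
console.log(isOriginalTrack('Deutsch (Deutschland)', 'de-DE'));
```

Treating an empty keyword as "no match" is a design choice for this sketch; it keeps commented placeholders from flagging every track as the original.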
Let's go through each provided string:
["af-ZA","Engels (Verenigde State van Amerika) oorspronklike"]
->af-ZA
:oorspronklike
["am-ET","እንግሊዝኛ (ዩናይትድ ስቴትስ) የመጀመሪያ"]
->am-ET
:የመጀመሪያ
["ar","الإنجليزية (الولايات المتحدة): أصلي"]
->ar
:أصلي
(already in the example, but good to confirm)["as-IN","মূল ইংৰাজী (মাৰ্কিন যুক্তৰাষ্ট্ৰ)"]
->as-IN
:মূল
(means 'original' or 'root')["az-Latn-AZ","ingilis (Amerika Birləşmiş Ştatları) orijinal"]
->az-Latn-AZ
:orijinal
["be-BY","англійская (Злучаныя Штаты Амерыкі) (арыгінальны)"]
->be-BY
:арыгінальны
["bg-BG","английски (Съединени щати), оригинален"]
->bg-BG
:оригинален
["bn-BD","মূল ভাষা ইংরেজি (মার্কিন যুক্তরাষ্ট্র)"]
->bn-BD
:মূল
(means 'original' or 'root')["bs-Latn-BA","engleski (Sjedinjene Države), original"]
->bs-Latn-BA
:original
["ca-ES","anglès (Estats Units) (original)"]
->ca-ES
:original
["cs-CZ","Angličtina (Spojené státy) (původní)"]
->cs-CZ
:původní
["da-DK","Engelsk (USA) (originalt)"]
->da-DK
:originalt
["de-DE","Englisch (Vereinigte Staaten) (Original)"]
->de-DE
:Original
(already in the example, good to confirm)["el-GR","Αγγλικά (Ηνωμένες Πολιτείες) πρωτότυπο αρχείο"]
->el-GR
:πρωτότυπο
(πρωτότυπο αρχείο means original file, so πρωτότυπο is original)["en-IN","English (United States) original"]
->en-IN
:original
["es-419","Inglés (Estados Unidos) original"]
->es-419
:original
["es-ES","Inglés (Estados Unidos) original"]
->es-ES
:original
["es-US","Inglés (Estados Unidos) original"]
->es-US
:original
["et-EE","inglise (Ameerika Ühendriigid), algne"]
->et-EE
:algne
["eu-ES","ingelesa (Ameriketako Estatu Batuak) (jatorrizkoa)"]
->eu-ES
:jatorrizkoa
["fa-IR","انگلیسی (ایالات متحده) اصلی"]
->fa-IR
:اصلی
["fil-PH","Ingles (Estados Unidos) orihinal"]
->fil-PH
:orihinal
["fr-CA","Anglais (États-Unis) – originale"]
->fr-CA
:originale
["fr-FR","Anglais (États-Unis) original"]
->fr-FR
:original
(already in the example, good to confirm)["gl-ES","inglés (Estados Unidos) (orixinal)"]
->gl-ES
:orixinal
["gu-IN","અંગ્રેજી (યુનાઇટેડ સ્ટેટ્સ) ઑરિજિનલ"]
->gu-IN
:ઑરિજિનલ
["he-IL","אנגלית (ארצות הברית), מקור"]
->he-IL
:מקור
["hi-IN","मूल भाषा अंग्रेज़ी (संयुक्त राज्य)"]
->hi-IN
:मूल
(The example had an empty string, this is better. 'मूल भाषा' means 'original language')["hr-HR","Engleski (Sjedinjene Američke Države) – izvorno"]
->hr-HR
:izvorno
["hu-HU","angol (Egyesült Államok) eredeti"]
->hu-HU
:eredeti
["hy-AM","անգլերեն (Միացյալ Նահանգներ) (բնօրինակ)"]
->hy-AM
:բնօրինակ
["id-ID","Inggris (Amerika Serikat) asli"]
->id-ID
:asli
["is-IS","enska (Bandaríkin) upprunalegt"]
->is-IS
:upprunalegt
["it-IT","Inglese (Stati Uniti), originale"]
->it-IT
:originale
["ja-JP","英語 (アメリカ合衆国)(オリジナル)"]
->ja-JP
:オリジナル
(already in the example, good to confirm)["ka-GE","ინგლისური (ამერიკის შეერთებული შტატები) ორიგინალია"]
->ka-GE
:ორიგინალია
(or ორიგინალ if 'ია' is a suffix. Let's check: 'ორიგინალი' is 'original' and 'ორიგინალია' is 'is original', so ორიგინალი is better for substring matching.)
["kk-KZ","ағылшын тілі (Америка Құрама Штаттары) (түпнұсқа)"]
->kk-KZ
:түпнұсқа
["km-KH","អង់គ្លេស (សហរដ្ឋអាមេរិក) ដើម"]
->km-KH
:ដើម
["kn-IN","ಇಂಗ್ಲಿಷ್ (ಯುನೈಟೆಡ್ ಸ್ಟೇಟ್ಸ್) ಮೂಲ"]
->kn-IN
:ಮೂಲ
["ko-KR","영어(미국) 원본"]
->ko-KR
:원본
["ky-KG","англисче (Кошмо Штаттар) түпнуска"]
->ky-KG
:түпнуска
["lo-LA","ອັງກິດ (ສະຫະລັດ) ຕົ້ນສະບັບ"]
->lo-LA
:ຕົ້ນສະບັບ
["lt-LT","„anglų (Jungtinės Valstijos)“ – pradinis"]
->lt-LT
:pradinis
["lv-LV","angļu (Amerikas Savienotās Valstis) (oriģināls)"]
->lv-LV
:oriģināls
["mk-MK","англиски (Соединети Американски Држави) оригинален"]
->mk-MK
:оригинален
["ml-IN","ഇംഗ്ലീഷ് (അമേരിക്കൻ ഐക്യനാടുകൾ) ഒറിജിനൽ"]
->ml-IN
:ഒറിജിനൽ
["mn-MN","англи (Америкийн Нэгдсэн Улс) нь эх хувь"]
->mn-MN
:эх хувь
(means 'original copy' or 'master copy'. эх means 'source' or 'original'. Use эх хувь as it appears, or just эх? эх хувь seems safer for matching the full phrase.)
["mr-IN","इंग्रजी (युनायटेड स्टेट्स) मूळ"]
->mr-IN
:मूळ
["ms-MY","Inggeris (Amerika Syarikat) asal"]
->ms-MY
:asal
["my-MM","အင်္ဂလိပ် (အမေရိကန် ပြည်ထောင်စု) မူရင်း"]
->my-MM
:မူရင်း
["nb-NO","engelsk (USA) original"]
->nb-NO
:original
["ne-NP","अङ्ग्रेजी (संयुक्त राज्य) मूल"]
->ne-NP
:मूल
["nl-NL","Originele Engels (Verenigde Staten)"]
->nl-NL
:Originele
(Note: the word order is different here. "Originele" is at the beginning. This should be fine for includes.)
["or-IN","ଇଂରାଜୀ (ଯୁକ୍ତ ରାଷ୍ଟ୍ର) ମୂଳ"]
->or-IN
:ମୂଳ
["pa-Guru-IN","ਅੰਗਰੇਜ਼ੀ (ਸੰਯੁਕਤ ਰਾਜ) ਮੂਲ"]
->pa-Guru-IN
:ਮੂਲ
["pl-PL","angielski (Stany Zjednoczone) oryginalny"]
->pl-PL
:oryginalny
["pt-BR","Inglês (Estados Unidos) original"]
->pt-BR
:original
["pt-PT","Inglês (Estados Unidos) original"]
->pt-PT
:original
["ro-RO","engleză (Statele Unite ale Americii) original"]
->ro-RO
:original
["ru-RU","Английский (Соединенные Штаты) (оригинальная)"]
->ru-RU
:оригинальная
(or оригинальный if gender matters for matching. оригинальная is feminine, likely referring to "дорожка" - track. For includes, оригинальн might be safer, or just use the full form seen). Let's stick to what's provided: оригинальная.
["si-LK","ඉංග්රීසි (එක්සත් ජනපදය) මුල්"]
->si-LK
:මුල්
["sk-SK","Angličtina (Spojené štáty) – pôvodná zvuková stopa"]
->sk-SK
:pôvodná
(pôvodná zvuková stopa means original audio track. Sopôvodná
is original).["sl-SI","Izvirnik: angleščina (Združene države Amerike)"]
->sl-SI
:Izvirnik
(Word order is different. "Izvirnik:" means "Original:". This should be fine.)["sq-AL","anglisht (SHBA) origjinale"]
->sq-AL
:origjinale
["sr-Cyrl-RS","енглески (Сједињене Државе), оригинална"]
->sr-Cyrl-RS
:оригинална
["sr-Latn-RS","engleski (Sjedinjene Države), originalna"]
->sr-Latn-RS
:originalna
["sw-TZ","Kiingereza (Marekani) halisi"]
->sw-TZ
:halisi
["ta-IN","ஆங்கிலம் (அமெரிக்கா) அசல்"]
->ta-IN
:அசல்
["te-IN","ఇంగ్లీష్ (యునైటెడ్ స్టేట్స్) అసలైనది"]
->te-IN
:అసలైనది
(or అసలు. అసలైనది means 'the original one'. అసలు is 'original'. Let's use అసలైనది as it appears.)
["th-TH","เสียงต้นฉบับภาษาอังกฤษ (สหรัฐอเมริกา)"]
->th-TH
:เสียงต้นฉบับ
(means 'original audio'. ต้นฉบับ means 'original'. เสียงต้นฉบับ is more specific and likely what appears. Let's use ต้นฉบับ as it's the core word for "original".)
["tr-TR","İngilizce (Amerika Birleşik Devletleri) orijinal"]
->tr-TR
:orijinal
["uk-UA","Англійська (Сполучені Штати) (оригінал)"]
->uk-UA
:оригінал
["ur-PK","انگریزی (ریاست ہائے متحدہ امریکہ) اصل"]
->ur-PK
:اصل
["uz-Latn-UZ","inglizcha (Amerika Qo‘shma Shtatlari) original"]
->uz-Latn-UZ
:original
["vi-VN","Tiếng Anh (Hoa Kỳ) gốc"]
->vi-VN
:gốc
["zh-Hans-CN","英语(美国)原始"]
->zh-Hans-CN
:原始
(already in the example, good to confirm)["zh-Hant-HK","英文(美國) (原聲)"]
->zh-Hant-HK
:原聲
(means 'original sound')["zh-Hant-TW","英文(美國) (原文)"]
->zh-Hant-TW
:原文
(means 'original text', but in context of audio track, it implies original language/audio)["zu-ZA","i-English (i-United States) yokuqala"]
->zu-ZA
:yokuqala
Double check some of the more complex ones or where I made a choice:
- ka-GE: "ინგლისური (ამერიკის შეერთებული შტატები) ორიგინალია". ორიგინალია means "it is original". ორიგინალი means "original". For includes(), ორიგინალი would be more robust if the suffix changes. Let's use ორიგინალი.
- mn-MN: "англи (Америкийн Нэгдсэн Улс) нь эх хувь". эх хувь means "original copy". эх means "source/original". эх хувь is probably what appears in the track name. Let's stick with эх хувь. If it's too specific, эх could be an alternative. For now, эх хувь.
- ru-RU: "Английский (Соединенные Штаты) (оригинальная)". оригинальная is a feminine adjective. If the noun it refers to changes gender or if it's just "Оригинал" (noun), this might not match. However, if YouTube consistently uses "оригинальная дорожка" (original track - feminine), then оригинальная is correct. Given the input format, this is the string to look for.
- sk-SK: "Angličtina (Spojené štáty) – pôvodná zvuková stopa". pôvodná is "original" (feminine adjective). pôvodná zvuková stopa is "original audio track". pôvodná seems correct.
- th-TH: "เสียงต้นฉบับภาษาอังกฤษ (สหรัฐอเมริกา)". เสียงต้นฉบับ means "original audio/sound". ต้นฉบับ means "original (manuscript/version)". ต้นฉบับ is the core word. If the track name is just "Original", ต้นฉบับ would match. If it's "Original Audio", เสียงต้นฉบับ would match. Let's use ต้นฉบับ for broader matching, assuming YouTube might sometimes shorten it. The user prompt says "substring match on a track name", so a shorter, more core term is often better.
- te-IN: "ఇంగ్లీష్ (యునైటెడ్ స్టేట్స్) అసలైనది". అసలైనది means "the original one". అసలు means "original". అసలు might be more robust. Let's use అసలు.
Let's re-evaluate based on "substring match on a track name with track.name.includes(...)".
This means we need the exact substring that YouTube uses to denote "original".
Let's refine:
af-ZA
: "Engels (Verenigde State van Amerika) oorspronklike" ->oorspronklike
am-ET
: "እንግሊዝኛ (ዩናይትድ ስቴትስ) የመጀመሪያ" ->የመጀመሪያ
ar
: "الإنجليزية (الولايات المتحدة): أصلي" ->أصلي
as-IN
: "মূল ইংৰাজী (মাৰ্কিন যুক্তৰাষ্ট্ৰ)" ->মূল
az-Latn-AZ
: "ingilis (Amerika Birləşmiş Ştatları) orijinal" ->orijinal
be-BY
: "англійская (Злучаныя Штаты Амерыкі) (арыгінальны)" ->арыгінальны
bg-BG
: "английски (Съединени щати), оригинален" ->оригинален
bn-BD
: "মূল ভাষা ইংরেজি (মার্কিন যুক্তরাষ্ট্র)" ->মূল
bs-Latn-BA
: "engleski (Sjedinjene Države), original" ->original
ca-ES
: "anglès (Estats Units) (original)" ->original
cs-CZ
: "Angličtina (Spojené státy) (původní)" ->původní
da-DK
: "Engelsk (USA) (originalt)" ->originalt
de-DE
: "Englisch (Vereinigte Staaten) (Original)" ->Original
el-GR
: "Αγγλικά (Ηνωμένες Πολιτείες) πρωτότυπο αρχείο" ->πρωτότυπο
en-IN
: "English (United States) original" ->original
(The original example had en: original. This is en-IN.) It's good to have a base en as well. The user's original object had en. Let's assume en and en-IN would be the same.
es-419
: "Inglés (Estados Unidos) original" ->original
es-ES
: "Inglés (Estados Unidos) original" ->original
es-US
: "Inglés (Estados Unidos) original" ->original
et-EE
: "inglise (Ameerika Ühendriigid), algne" ->algne
eu-ES
: "ingelesa (Ameriketako Estatu Batuak) (jatorrizkoa)" ->jatorrizkoa
fa-IR
: "انگلیسی (ایالات متحده) اصلی" ->اصلی
fil-PH
: "Ingles (Estados Unidos) orihinal" ->orihinal
fr-CA
: "Anglais (États-Unis) – originale" ->originale
fr-FR
: "Anglais (États-Unis) original" ->original
gl-ES
: "inglés (Estados Unidos) (orixinal)" ->orixinal
gu-IN
: "અંગ્રેજી (યુનાઇટેડ સ્ટેટ્સ) ઑરિજિનલ" ->ઑરિજિનલ
he-IL
: "אנגלית (ארצות הברית), מקור" ->מקור
hi-IN
: "मूल भाषा अंग्रेज़ी (संयुक्त राज्य)" ->मूल
(This updates the empty string in the original example forhi
)hr-HR
: "Engleski (Sjedinjene Američke Države) – izvorno" ->izvorno
hu-HU
: "angol (Egyesült Államok) eredeti" ->eredeti
hy-AM
: "անգլերեն (Միացյալ Նահանգներ) (բնօրինակ)" ->բնօրինակ
id-ID
: "Inggris (Amerika Serikat) asli" ->asli
is-IS
: "enska (Bandaríkin) upprunalegt" ->upprunalegt
it-IT
: "Inglese (Stati Uniti), originale" ->originale
ja-JP
: "英語 (アメリカ合衆国)(オリジナル)" ->オリジナル
ka-GE
: "ინგლისური (ამერიკის შეერთებული შტატები) ორიგინალია" ->ორიგინალია
Reconsidering ორიგინალი vs ორიგინალია: the provided string is "ორიგინალია". If this is exactly what appears in the track name, then ორიგინალია is the correct substring to search for. If YouTube sometimes uses just "ორიგინალი", then ორიგინალი would be better. Without more examples, sticking to the provided string is safer. Let's use ორიგინალია.
kk-KZ
: "ағылшын тілі (Америка Құрама Штаттары) (түпнұсқа)" ->түпнұсқа
km-KH
: "អង់គ្លេស (សហរដ្ឋអាមេរិក) ដើម" ->ដើម
kn-IN
: "ಇಂಗ್ಲಿಷ್ (ಯುನೈಟೆಡ್ ಸ್ಟೇಟ್ಸ್) ಮೂಲ" ->ಮೂಲ
ko-KR
: "영어(미국) 원본" ->원본
ky-KG
: "англисче (Кошмо Штаттар) түпнуска" ->түпнуска
lo-LA
: "ອັງກິດ (ສະຫະລັດ) ຕົ້ນສະບັບ" ->ຕົ້ນສະບັບ
lt-LT
: "„anglų (Jungtinės Valstijos)“ – pradinis" ->pradinis
lv-LV
: "angļu (Amerikas Savienotās Valstis) (oriģināls)" ->oriģināls
mk-MK
: "англиски (Соединети Американски Држави) оригинален" ->оригинален
ml-IN
: "ഇംഗ്ലീഷ് (അമേരിക്കൻ ഐക്യനാടുകൾ) ഒറിജിനൽ" ->ഒറിജിനൽ
mn-MN
: "англи (Америкийн Нэгдсэн Улс) нь эх хувь" ->эх хувь
mr-IN
: "इंग्रजी (युनायटेड स्टेट्स) मूळ" ->मूळ
ms-MY
: "Inggeris (Amerika Syarikat) asal" ->asal
my-MM
: "အင်္ဂလိပ် (အမေရိကန် ပြည်ထောင်စု) မူရင်း" ->မူရင်း
nb-NO
: "engelsk (USA) original" ->original
ne-NP
: "अङ्ग्रेजी (संयुक्त राज्य) मूल" ->मूल
nl-NL
: "Originele Engels (Verenigde Staten)" ->Originele
(A case-sensitive match is important here, because includes is case-sensitive in JS.)
or-IN
: "ଇଂରାଜୀ (ଯୁକ୍ତ ରାଷ୍ଟ୍ର) ମୂଳ" ->ମୂଳ
pa-Guru-IN
: "ਅੰਗਰੇਜ਼ੀ (ਸੰਯੁਕਤ ਰਾਜ) ਮੂਲ" ->ਮੂਲ
pl-PL
: "angielski (Stany Zjednoczone) oryginalny" ->oryginalny
pt-BR
: "Inglês (Estados Unidos) original" ->original
pt-PT
: "Inglês (Estados Unidos) original" ->original
ro-RO
: "engleză (Statele Unite ale Americii) original" ->original
ru-RU
: "Английский (Соединенные Штаты) (оригинальная)" ->оригинальная
si-LK
: "ඉංග්රීසි (එක්සත් ජනපදය) මුල්" ->මුල්
sk-SK
: "Angličtina (Spojené štáty) – pôvodná zvuková stopa" ->pôvodná
The full string is "pôvodná zvuková stopa". If the track name only says "pôvodná", this is fine. If it says "pôvodná zvuková stopa", then pôvodná will match. If it says something like "Slovenčina (pôvodná)", then pôvodná is the key. This seems reasonable.
sl-SI
: "Izvirnik: angleščina (Združene države Amerike)" ->Izvirnik
(The colon might be part of it, or not. "Izvirnik:" is "Original:". If the track name is "Izvirnik: English", then Izvirnik: would be the match. If it's "English (Izvirnik)", then Izvirnik. The data shows "Izvirnik:", so the colon could be part of the identifier in this case. But Izvirnik without the colon is safer if the colon is sometimes omitted or followed by a space that isn't always there. Rechecking the prompt: "extract the 'original' translation". Izvirnik is "original" and the colon is punctuation. So, Izvirnik.)
sq-AL
: "anglisht (SHBA) origjinale" ->origjinale
sr-Cyrl-RS
: "енглески (Сједињене Државе), оригинална" ->оригинална
sr-Latn-RS
: "engleski (Sjedinjene Države), originalna" ->originalna
sw-TZ
: "Kiingereza (Marekani) halisi" ->halisi
ta-IN
: "ஆங்கிலம் (அமெரிக்கா) அசல்" ->அசல்
te-IN
: "ఇంగ్లీష్ (యునైటెడ్ స్టేట్స్) అసలైనది" ->అసలైనది
Reconsidering అసలు vs అసలైనది: the provided string is "అసలైనది". If this is consistently used, then it's the correct substring. అసలు is the root word "original". అసలైనది is "original one". Let's stick to the provided data: అసలైనది.
th-TH
: "เสียงต้นฉบับภาษาอังกฤษ (สหรัฐอเมริกา)" ->เสียงต้นฉบับ
My previous thought was ต้นฉบับ. But the data shows เสียงต้นฉบับ at the beginning. If this is the consistent pattern, เสียงต้นฉบับ ("original audio") is the string to search for. This is more specific. Let's use เสียงต้นฉบับ.
tr-TR
: "İngilizce (Amerika Birleşik Devletleri) orijinal" ->orijinal
uk-UA
: "Англійська (Сполучені Штати) (оригінал)" ->оригінал
ur-PK
: "انگریزی (ریاست ہائے متحدہ امریکہ) اصل" ->اصل
uz-Latn-UZ
: "inglizcha (Amerika Qo‘shma Shtatlari) original" ->original
vi-VN
: "Tiếng Anh (Hoa Kỳ) gốc" ->gốc
zh-Hans-CN
: "英语(美国)原始" ->原始
zh-Hant-HK
: "英文(美國) (原聲)" ->原聲
zh-Hant-TW
: "英文(美國) (原文)" ->原文
zu-ZA
: "i-English (i-United States) yokuqala" ->yokuqala
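Since includes is case-sensitive (as noted for nl-NL in the list above), a case-insensitive comparison is one possible way to make the match less brittle. The following is a sketch only: includesIgnoreCase is a hypothetical helper, and it uses the locale-independent toLowerCase, which is an assumption that may not suit every language's casing rules.

```javascript
// Case-insensitive substring check. Both sides are lowercased with the
// locale-independent String.prototype.toLowerCase before comparing.
// An empty keyword is treated as "no match".
function includesIgnoreCase(trackName, keyword) {
  if (!keyword) return false;
  return trackName.toLowerCase().includes(keyword.toLowerCase());
}

const label = 'Originele Engels (Verenigde Staten)';
console.log(label.includes('originele'));            // plain includes, case-sensitive
console.log(includesIgnoreCase(label, 'originele')); // case-insensitive variant
```

With a helper like this, the capitalisation stored in the locales object would matter less, at the cost of slightly looser matching.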
The user's initial locales object:
const locales = {
'ar': {
ORIGINAL: 'أصلي',
},
'de': { // My data has de-DE: Original
ORIGINAL: 'Original',
},
'en': { // My data has en-IN: original
ORIGINAL: 'original',
},
'fr': { // My data has fr-FR: original and fr-CA: originale
ORIGINAL: 'original',
},
'hi': { // My data has hi-IN: मूल
ORIGINAL: '',
},
'ja-JP': { // My data has ja-JP: オリジナル
ORIGINAL: 'オリジナル',
},
'zh-Hans-CN': { // My data has zh-Hans-CN: 原始
ORIGINAL: '原始',
},
}
My new list will be more comprehensive and specific with regional variants. I need to make sure the output format is exactly as requested:
const locales = {
'lang-CODE': {
ORIGINAL: 'translation',
},
// ...
}
And handle cases where I'm unsure with a placeholder and comment. For most of these, the extraction seems straightforward as the "original" keyword is usually at the end or clearly identifiable.
Let's review the "unsure" or "placeholder" condition.
- ka-GE: ორიგინალია. I'm fairly confident, but it includes a suffix that means "is". If the base form ორიგინალი is sometimes used, this might be too specific. I'll use ორიგინალია as per the data and add a comment.
- sl-SI: Izvirnik. I decided against Izvirnik: to be more general. This seems fine.
- te-IN: అసలైనది. Similar to Georgian, this means "the original one". The root is అసలు. Sticking to data: అసలైనది. Comment might be useful.
- th-TH: เสียงต้นฉบับ. This means "original audio". If only "original" (ต้นฉบับ) is used sometimes, this is specific. Sticking to data: เสียงต้นฉบับ. Comment might be useful.
Let's consider the existing entries in the user's locales object and how they map to the new data.
- ar: أصلي (matches)
- de: Original (my de-DE is Original, matches)
- en: original (my en-IN is original, matches. I should probably add a generic en entry as well, or use en-US if that's the most common. The provided data has en-IN. Let's assume en should be original.)
- fr: original (my fr-FR is original. fr-CA is originale. The user's fr might be for fr-FR.)
- hi: '' (my hi-IN is मूल. This is a definite improvement.)
- ja-JP: オリジナル (matches)
- zh-Hans-CN: 原始 (matches)
I should create a new object. The user requests to "create an object in the format of the one above". This implies replacing or generating a new, more complete one.
Let's list the final decisions for each language code and its "original" keyword.
- af-ZA: oorspronklike
- am-ET: የመጀመሪያ
- ar: أصلي
- as-IN: মূল
- az-Latn-AZ: orijinal
- be-BY: арыгінальны (from (арыгінальны))
- bg-BG: оригинален
- bn-BD: মূল
- bs-Latn-BA: original
- ca-ES: original (from (original))
- cs-CZ: původní (from (původní))
- da-DK: originalt (from (originalt))
- de-DE: Original (from (Original))
- el-GR: πρωτότυπο
- en-IN: original
- es-419: original
- es-ES: original
- es-US: original
- et-EE: algne
- eu-ES: jatorrizkoa (from (jatorrizkoa))
- fa-IR: اصلی
- fil-PH: orihinal
- fr-CA: originale
- fr-FR: original
- gl-ES: orixinal (from (orixinal))
- gu-IN: ઑરિજિનલ
- he-IL: מקור
- hi-IN: मूल
- hr-HR: izvorno
- hu-HU: eredeti
- hy-AM: բնօրինակ (from (բնօրինակ))
- id-ID: asli
- is-IS: upprunalegt
- it-IT: originale
- ja-JP: オリジナル (from (オリジナル))
- ka-GE: ორიგინალია // Comment: "is original". Consider ორიგინალი (original) if this is too specific.
- kk-KZ: түпнұсқа (from (түпнұсқа))
- km-KH: ដើម
- kn-IN: ಮೂಲ
- ko-KR: 원본
- ky-KG: түпнуска
- lo-LA: ຕົ້ນສະບັບ
- lt-LT: pradinis
- lv-LV: oriģināls (from (oriģināls))
- mk-MK: оригинален
- ml-IN: ഒറിജിനൽ
- mn-MN: эх хувь
- mr-IN: मूळ
- ms-MY: asal
- my-MM: မူရင်း
- nb-NO: original
- ne-NP: मूल
- nl-NL: Originele
- or-IN: ମୂଳ
- pa-Guru-IN: ਮੂਲ
- pl-PL: oryginalny
- pt-BR: original
- pt-PT: original
- ro-RO: original
- ru-RU: оригинальная (from (оригинальная))
- si-LK: මුල්
- sk-SK: pôvodná // Comment: From "pôvodná zvuková stopa" (original audio track).
- sl-SI: Izvirnik // Comment: From "Izvirnik:" (Original:).
- sq-AL: origjinale
- sr-Cyrl-RS: оригинална
- sr-Latn-RS: originalna
- sw-TZ: halisi
- ta-IN: அசல்
- te-IN: అసలైనది // Comment: "the original one". Consider అసలు (original) if this is too specific.
- th-TH: เสียงต้นฉบับ // Comment: "original audio". Consider ต้นฉบับ (original) if this is too specific.
- tr-TR: orijinal
- uk-UA: оригінал (from (оригінал))
- ur-PK: اصل
- uz-Latn-UZ: original
- vi-VN: gốc
- zh-Hans-CN: 原始
- zh-Hant-HK: 原聲 (from (原聲))
- zh-Hant-TW: 原文 (from (原文))
- zu-ZA: yokuqala
The words in parentheses like (original) are often the ones I need. I should strip the parentheses.
Example: be-BY: "англійская (Злучаныя Штаты Амерыкі) (арыгінальны)" -> арыгінальны
Example: ja-JP: "英語 (アメリカ合衆国)(オリジナル)" -> オリジナル (using full-width parentheses)
I need to be careful about stripping parentheses only when they surround the target word. Most of the target words are the last word, or the word inside trailing parentheses.
Let's re-check the extraction logic for words in parentheses:
- be-BY: (арыгінальны) -> арыгінальны. Correct.
- ca-ES: (original) -> original. Correct.
- cs-CZ: (původní) -> původní. Correct.
- da-DK: (originalt) -> originalt. Correct.
- de-DE: (Original) -> Original. Correct.
- eu-ES: (jatorrizkoa) -> jatorrizkoa. Correct.
- gl-ES: (orixinal) -> orixinal. Correct.
- hy-AM: (բնօրինակ) -> բնօրինակ. Correct.
- ja-JP: (オリジナル) -> オリジナル. Correct.
- kk-KZ: (түпнұсқа) -> түпнұсқа. Correct.
- lv-LV: (oriģināls) -> oriģināls. Correct.
- ru-RU: (оригинальная) -> оригинальная. Correct.
- uk-UA: (оригінал) -> оригінал. Correct.
- zh-Hant-HK: (原聲) -> 原聲. Correct.
- zh-Hant-TW: (原文) -> 原文. Correct.
This seems consistent. The words are generally the last significant word or the word in the final set of parentheses.
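The parenthesis-stripping step described above might look something like the sketch below. trailingParenWord is a hypothetical helper, and accepting both ASCII and full-width parentheses is an assumption based on the ja-JP note. It also illustrates the caveat about stripping only when the parentheses surround the target word: on the nl-NL label the trailing group is the country, so the result still needs a human check.

```javascript
// Returns the text inside the last parenthesised group if that group ends
// the label, otherwise null. Accepts ASCII "()" and full-width "（）".
function trailingParenWord(label) {
  const match = label.match(/[(（]([^()（）]*)[)）]\s*$/u);
  return match ? match[1].trim() : null;
}

console.log(trailingParenWord('anglès (Estats Units) (original)'));    // keyword in trailing parentheses
console.log(trailingParenWord('Inglés (Estados Unidos) original'));    // no trailing group
console.log(trailingParenWord('Originele Engels (Verenigde Staten)')); // trailing group is the country, not the keyword
```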
What about the en key from the original example? The new data provides en-IN. It's common to have a base en (often en-US) as a fallback. The user's original object has en: { ORIGINAL: 'original' }. I should probably include this, or if there's an en-US in the list (there isn't, but es-US exists), use that. Since the user explicitly provided en in their example, and the value original is consistent with en-IN, I should retain an en entry. The provided data doesn't have a generic en line. I will add en: { ORIGINAL: 'original' } based on the user's initial object and the commonality.
The prompt asks to create an object "in the format of the one above". The user's initial object had some base language codes like ar, de, en, fr, hi. My new data has more specific codes like de-DE, en-IN, fr-FR, fr-CA, hi-IN.
It's probably best to use the specific codes from the new data. If the user wants to consolidate de-DE into de, they can do that. I will provide the most specific information I have.
So for de, I will use de-DE. For fr, I will have fr-FR and fr-CA. For hi, I will have hi-IN.
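If the table ends up mixing base codes (de) and region-specific codes (fr-FR, fr-CA), one way to reconcile them at lookup time is to fall back from the full tag to shorter prefixes. This is a sketch under assumptions: findLocale is a hypothetical helper and the four-entry locales object is illustrative only.

```javascript
// Illustrative subset mixing a base code with region-specific codes.
const locales = {
  'de': { ORIGINAL: 'Original' },
  'fr-FR': { ORIGINAL: 'original' },
  'fr-CA': { ORIGINAL: 'originale' },
  'zh-Hans-CN': { ORIGINAL: '原始' },
};

// Tries the full tag first, then progressively shorter prefixes,
// e.g. 'zh-Hans-CN' -> 'zh-Hans' -> 'zh'. Returns undefined if none exist.
function findLocale(lang) {
  const parts = lang.split('-');
  while (parts.length > 0) {
    const entry = locales[parts.join('-')];
    if (entry) return entry;
    parts.pop();
  }
  return undefined;
}

console.log(findLocale('de-DE'));  // falls back to the 'de' entry
console.log(findLocale('fr-CA'));  // exact match
console.log(findLocale('it-IT'));  // no entry at any level
```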
Let's consider the structure:
const locales = {
'af-ZA': { ORIGINAL: 'oorspronklike' },
'am-ET': { ORIGINAL: 'የመጀመሪያ' },
// ... and so on for all 79 entries
};
This looks like the correct plan. I will generate the full object.
I will also include the comments for the few cases identified.
The initial locales object also had ja-JP and zh-Hans-CN, which are present in the new list.
The hi entry will be updated from '' to मूल.
Final check on comments:
- ka-GE: ორიგინალია // Comment: "is original". The word ორიგინალი (original) might be a safer general match if variations exist.
- sk-SK: pôvodná // Comment: Extracted from "pôvodná zvuková stopa" (original audio track); refers to "original".
- sl-SI: Izvirnik // Comment: Extracted from "Izvirnik:" (Original:); refers to "Original".
- te-IN: అసలైనది // Comment: "the original one". The word అసలు (original) might be a safer general match if variations exist.
- th-TH: เสียงต้นฉบับ // Comment: "original audio". The word ต้นฉบับ (original) might be a safer general match if variations exist.
These comments highlight potential areas where a more general term might be better if the current specific term doesn't cover all cases. However, based on the provided data, these are the direct extractions.
The user did not provide an explicit en or en-US entry in the data list, only en-IN.
The original locales object has en: { ORIGINAL: 'original' }.
I should probably include en: { ORIGINAL: 'original' } for completeness, as it was in the original template, and it's a very common language. Or, I can just use en-IN from the data. The prompt says "update this to support all the display languages YouTube supports" and then provides data. I should stick to the provided data. So en-IN will be the English entry from the list. If the user wants a generic en, they can alias it.
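Such aliasing could be done once when the table is built. The sketch below adds a base-language alias only when every regional variant of that language agrees on the keyword. addBaseAliases is a hypothetical helper and the input is a small illustrative subset; whether aliases are wanted at all is a decision for the extension author.

```javascript
// Returns a copy of the table with a base-language alias (e.g. 'es') added
// whenever all regional variants share one keyword and no base entry exists.
function addBaseAliases(locales) {
  const keywordsByBase = new Map();
  for (const [tag, entry] of Object.entries(locales)) {
    const base = tag.split('-')[0];
    if (!keywordsByBase.has(base)) keywordsByBase.set(base, new Set());
    keywordsByBase.get(base).add(entry.ORIGINAL);
  }
  const result = { ...locales };
  for (const [base, keywords] of keywordsByBase) {
    if (keywords.size === 1 && !(base in result)) {
      result[base] = { ORIGINAL: [...keywords][0] };
    }
  }
  return result;
}

const withAliases = addBaseAliases({
  'en-IN': { ORIGINAL: 'original' },
  'es-419': { ORIGINAL: 'original' },
  'es-ES': { ORIGINAL: 'original' },
  'fr-CA': { ORIGINAL: 'originale' },
  'fr-FR': { ORIGINAL: 'original' },
});
console.log(Object.keys(withAliases));
```

In this subset, en and es would gain aliases, while fr would not, because fr-CA and fr-FR use different keywords.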
Let's make sure there are no truly ambiguous cases where I should put a placeholder. Most seem quite clear. The ones with comments are not ambiguous in terms of extraction from the given string, but rather about their general applicability if YouTube uses slight variations. This fits the "If you're not sure about a translation or can't extract it, create a placeholder for that language code and a comment it" criteria for the "not sure about a translation" part (specifically, its robustness). But I did extract it. So, the comment is the way to go. I don't see any language where I cannot extract the word.
One final check on the hi entry. Original: hi: { ORIGINAL: '' }. New data: ["hi-IN","मूल भाषा अंग्रेज़ी (संयुक्त राज्य)"]. Extracted: मूल. This is a good update.
The problem states "extract the 'original' translation from each of these string required to do that". This implies I should be as precise as possible to what's given in the string.
My refined list of extracted words seems to follow this principle.
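One way to check that principle mechanically would be to confirm that every extracted keyword really is a substring of the label it came from. The sketch below assumes a hypothetical findMismatches helper and uses three labels copied from the data; the nl-NL keyword is deliberately given the wrong case to show what a failure looks like.

```javascript
// A few [lang, label] pairs copied from the extracted data.
const samples = [
  ['de-DE', 'Englisch (Vereinigte Staaten) (Original)'],
  ['nl-NL', 'Originele Engels (Verenigde Staten)'],
  ['sl-SI', 'Izvirnik: angleščina (Združene države Amerike)'],
];

const locales = {
  'de-DE': { ORIGINAL: 'Original' },
  'nl-NL': { ORIGINAL: 'originele' }, // deliberately wrong case, to trigger a mismatch
  'sl-SI': { ORIGINAL: 'Izvirnik' },
};

// Returns the language codes whose keyword is missing, empty, or not
// contained in the corresponding source label.
function findMismatches(samples, locales) {
  return samples
    .filter(([lang, label]) => {
      const keyword = locales[lang]?.ORIGINAL;
      return !keyword || !label.includes(keyword);
    })
    .map(([lang]) => lang);
}

console.log(findMismatches(samples, locales)); // only nl-NL is reported
```

Running a check like this over all pairs would also catch keywords that were mistyped in a different script than the source label.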
The output should be a single JS const definition.
Okay, I can help you create the updated locales object based on the data you've provided. I will extract the term for "original" from each string. For cases where the extraction might be very specific to the provided example string, I'll add a comment.
Here's the generated locales object:
const locales = {
'af-ZA': {
ORIGINAL: 'oorspronklike', // From "Engels (Verenigde State van Amerika) oorspronklike"
},
'am-ET': {
ORIGINAL: 'የመጀመሪያ', // From "እንግሊዝኛ (ዩናይትድ ስቴትስ) የመጀመሪያ"
},
'ar': {
ORIGINAL: 'أصلي', // From "الإنجليزية (الولايات المتحدة): أصلي"
},
'as-IN': {
ORIGINAL: 'মূল', // From "মূল ইংৰাজী (মাৰ্কিন যুক্তৰাষ্ট্ৰ)"
},
'az-Latn-AZ': {
ORIGINAL: 'orijinal', // From "ingilis (Amerika Birləşmiş Ştatları) orijinal"
},
'be-BY': {
ORIGINAL: 'арыгінальны', // From "англійская (Злучаныя Штаты Амерыкі) (арыгінальны)"
},
'bg-BG': {
ORIGINAL: 'оригинален', // From "английски (Съединени щати), оригинален"
},
'bn-BD': {
ORIGINAL: 'মূল', // From "মূল ভাষা ইংরেজি (মার্কিন যুক্তরাষ্ট্র)"
},
'bs-Latn-BA': {
ORIGINAL: 'original', // From "engleski (Sjedinjene Države), original"
},
'ca-ES': {
ORIGINAL: 'original', // From "anglès (Estats Units) (original)"
},
'cs-CZ': {
ORIGINAL: 'původní', // From "Angličtina (Spojené státy) (původní)"
},
'da-DK': {
ORIGINAL: 'originalt', // From "Engelsk (USA) (originalt)"
},
'de-DE': {
ORIGINAL: 'Original', // From "Englisch (Vereinigte Staaten) (Original)"
},
'el-GR': {
ORIGINAL: 'πρωτότυπο', // From "Αγγλικά (Ηνωμένες Πολιτείες) πρωτότυπο αρχείο"
},
'en-IN': {
ORIGINAL: 'original', // From "English (United States) original"
},
// Note: You might want to add a generic 'en' entry as well, e.g., 'en': { ORIGINAL: 'original' }, if your system falls back to it.
'es-419': {
ORIGINAL: 'original', // From "Inglés (Estados Unidos) original" (Latin American Spanish)
},
'es-ES': {
ORIGINAL: 'original', // From "Inglés (Estados Unidos) original" (Spain Spanish)
},
'es-US': {
ORIGINAL: 'original', // From "Inglés (Estados Unidos) original" (US Spanish)
},
'et-EE': {
ORIGINAL: 'algne', // From "inglise (Ameerika Ühendriigid), algne"
},
'eu-ES': {
ORIGINAL: 'jatorrizkoa', // From "ingelesa (Ameriketako Estatu Batuak) (jatorrizkoa)"
},
'fa-IR': {
ORIGINAL: 'اصلی', // From "انگلیسی (ایالات متحده) اصلی"
},
'fil-PH': {
ORIGINAL: 'orihinal', // From "Ingles (Estados Unidos) orihinal"
},
'fr-CA': {
ORIGINAL: 'originale', // From "Anglais (États-Unis) – originale"
},
'fr-FR': {
ORIGINAL: 'original', // From "Anglais (États-Unis) original"
},
'gl-ES': {
ORIGINAL: 'orixinal', // From "inglés (Estados Unidos) (orixinal)"
},
'gu-IN': {
ORIGINAL: 'ઑરિજિનલ', // From "અંગ્રેજી (યુનાઇટેડ સ્ટેટ્સ) ઑરિજિનલ"
},
'he-IL': {
ORIGINAL: 'מקור', // From "אנגלית (ארצות הברית), מקור"
},
'hi-IN': {
ORIGINAL: 'मूल', // From "मूल भाषा अंग्रेज़ी (संयुक्त राज्य)" (Updates previous empty string for 'hi')
},
'hr-HR': {
ORIGINAL: 'izvorno', // From "Engleski (Sjedinjene Američke Države) – izvorno"
},
'hu-HU': {
ORIGINAL: 'eredeti', // From "angol (Egyesült Államok) eredeti"
},
'hy-AM': {
ORIGINAL: 'բնօրինակ', // From "անգլերեն (Միացյալ Նահանգներ) (բնօրինակ)"
},
'id-ID': {
ORIGINAL: 'asli', // From "Inggris (Amerika Serikat) asli"
},
'is-IS': {
ORIGINAL: 'upprunalegt', // From "enska (Bandaríkin) upprunalegt"
},
'it-IT': {
ORIGINAL: 'originale', // From "Inglese (Stati Uniti), originale"
},
'ja-JP': {
ORIGINAL: 'オリジナル', // From "英語 (アメリカ合衆国)(オリジナル)"
},
'ka-GE': {
ORIGINAL: 'ორიგინალია', // From "ინგლისური (ამერიკის შეერთებული შტატები) ორიგინალია". Comment: This means "is original". The word `ორიგინალი` (original) might be a safer general match if variations exist.
},
'kk-KZ': {
ORIGINAL: 'түпнұсқа', // From "ағылшын тілі (Америка Құрама Штаттары) (түпнұсқа)"
},
'km-KH': {
ORIGINAL: 'ដើម', // From "អង់គ្លេស (សហរដ្ឋអាមេរិក) ដើម"
},
'kn-IN': {
ORIGINAL: 'ಮೂಲ', // From "ಇಂಗ್ಲಿಷ್ (ಯುನೈಟೆಡ್ ಸ್ಟೇಟ್ಸ್) ಮೂಲ"
},
'ko-KR': {
ORIGINAL: '원본', // From "영어(미국) 원본"
},
'ky-KG': {
ORIGINAL: 'түпнуска', // From "англисче (Кошмо Штаттар) түпнуска"
},
'lo-LA': {
ORIGINAL: 'ຕົ້ນສະບັບ', // From "ອັງກິດ (ສະຫະລັດ) ຕົ້ນສະບັບ"
},
'lt-LT': {
ORIGINAL: 'pradinis', // From "„anglų (Jungtinės Valstijos)“ – pradinis"
},
'lv-LV': {
ORIGINAL: 'oriģināls', // From "angļu (Amerikas Savienotās Valstis) (oriģināls)"
},
'mk-MK': {
ORIGINAL: 'оригинален', // From "англиски (Соединети Американски Држави) оригинален"
},
'ml-IN': {
ORIGINAL: 'ഒറിജിനൽ', // From "ഇംഗ്ലീഷ് (അമേരിക്കൻ ഐക്യനാടുകൾ) ഒറിജിനൽ"
},
'mn-MN': {
ORIGINAL: 'эх хувь', // From "англи (Америкийн Нэгдсэн Улс) нь эх хувь" (means 'original copy')
},
'mr-IN': {
ORIGINAL: 'मूळ', // From "इंग्रजी (युनाईटेड स्टेट्स) मूळ"
},
'ms-MY': {
ORIGINAL: 'asal', // From "Inggeris (Amerika Syarikat) asal"
},
'my-MM': {
ORIGINAL: 'မူရင်း', // From "အင်္ဂလိပ် (အမေရိကန် ပြည်ထောင်စု) မူရင်း"
},
'nb-NO': {
ORIGINAL: 'original', // From "engelsk (USA) original"
},
'ne-NP': {
ORIGINAL: 'मूल', // From "अङ्ग्रेजी (संयुक्त राज्य) मूल"
},
'nl-NL': {
ORIGINAL: 'Originele', // From "Originele Engels (Verenigde Staten)" (Note capitalization)
},
'or-IN': {
ORIGINAL: 'ମୂଳ', // From "ଇଂରାଜୀ (ଯୁକ୍ତ ରାଷ୍ଟ୍ର) ମୂଳ"
},
'pa-Guru-IN': {
ORIGINAL: 'ਮੂਲ', // From "ਅੰਗਰੇਜ਼ੀ (ਸੰਯੁਕਤ ਰਾਜ) ਮੂਲ"
},
'pl-PL': {
ORIGINAL: 'oryginalny', // From "angielski (Stany Zjednoczone) oryginalny"
},
'pt-BR': {
ORIGINAL: 'original', // From "Inglês (Estados Unidos) original"
},
'pt-PT': {
ORIGINAL: 'original', // From "Inglês (Estados Unidos) original"
},
'ro-RO': {
ORIGINAL: 'original', // From "engleză (Statele Unite ale Americii) original"
},
'ru-RU': {
ORIGINAL: 'оригинальная', // From "Английский (Соединенные Штаты) (оригинальная)"
},
'si-LK': {
ORIGINAL: 'මුල්', // From "ඉංග්රීසි (එක්සත් ජනපදය) මුල්"
},
'sk-SK': {
ORIGINAL: 'pôvodná', // From "Angličtina (Spojené štáty) – pôvodná zvuková stopa". Comment: Extracted from "pôvodná zvuková stopa" (original audio track); refers to "original".
},
'sl-SI': {
ORIGINAL: 'Izvirnik', // From "Izvirnik: angleščina (Združene države Amerike)". Comment: Extracted from "Izvirnik:" (Original:); refers to "Original".
},
'sq-AL': {
ORIGINAL: 'origjinale', // From "anglisht (SHBA) origjinale"
},
'sr-Cyrl-RS': {
ORIGINAL: 'оригинална', // From "енглески (Сједињене Државе), оригинална"
},
'sr-Latn-RS': {
ORIGINAL: 'originalna', // From "engleski (Sjedinjene Države), originalna"
},
'sw-TZ': {
ORIGINAL: 'halisi', // From "Kiingereza (Marekani) halisi"
},
'ta-IN': {
ORIGINAL: 'அசல்', // From "ஆங்கிலம் (அமெரிக்கா) அசல்"
},
'te-IN': {
ORIGINAL: 'అసలైనది', // From "ఇంగ్లిష్ (యునైటెడ్ స్టేట్స్) అసలైనది". Comment: This means "the original one". The word `అసలు` (original) might be a safer general match if variations exist.
},
'th-TH': {
ORIGINAL: 'เสียงต้นฉบับ', // From "เสียงต้นฉบับภาษาอังกฤษ (สหรัฐอเมริกา)". Comment: This means "original audio". The word `ต้นฉบับ` (original) might be a safer general match if variations exist.
},
'tr-TR': {
ORIGINAL: 'orijinal', // From "İngilizce (Amerika Birleşik Devletleri) orijinal"
},
'uk-UA': {
ORIGINAL: 'оригінал', // From "Англійська (Сполучені Штати) (оригінал)"
},
'ur-PK': {
ORIGINAL: 'اصل', // From "انگریزی (ریاست ہائے متحدہ امریکہ) اصل"
},
'uz-Latn-UZ': {
ORIGINAL: 'original', // From "inglizcha (Amerika Qo‘shma Shtatlari) original"
},
'vi-VN': {
ORIGINAL: 'gốc', // From "Tiếng Anh (Hoa Kỳ) gốc"
},
'zh-Hans-CN': {
ORIGINAL: '原始', // From "英语(美国)原始"
},
'zh-Hant-HK': {
ORIGINAL: '原聲', // From "英文(美國) (原聲)" (means 'original sound')
},
'zh-Hant-TW': {
ORIGINAL: '原文', // From "英文(美國) (原文)" (means 'original text/version')
},
'zu-ZA': {
ORIGINAL: 'yokuqala', // From "i-English (i-United States) yokuqala"
},
};
This object should provide a comprehensive set of translations for "original" based on the YouTube data you've supplied. Remember that track.name.includes(...) is case-sensitive, so the casing in the ORIGINAL values is important (e.g., Original for German, Originele for Dutch). The comments highlight a few cases where the extracted term is very specific, and a more general root word might be considered if these specific terms don't always match.
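To illustrate the case-sensitivity point, here is a minimal sketch. The helper name includesIgnoreCase is hypothetical and not part of your extension; it only shows one possible way to make the match tolerant of casing, assuming lowercasing both sides is acceptable for your track names.

```javascript
// Hypothetical helper (not from the extension): case-insensitive substring check.
function includesIgnoreCase(trackName, needle) {
  return trackName.toLowerCase().includes(needle.toLowerCase());
}

// Track name taken from the German sample string above.
const germanTrack = 'Englisch (Vereinigte Staaten) (Original)';

// Plain includes() is case-sensitive, so a lowercase needle does not match.
console.log(germanTrack.includes('original')); // false

// Lowercasing both sides makes the same needle match.
console.log(includesIgnoreCase(germanTrack, 'original')); // true
```

If you keep the plain includes(...) check, the ORIGINAL values need to keep the exact casing shown in the object.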
I forgot to mention, I do have a fallback in place from the verbatim language code to base language code and finally to "en" - I'm just going to use "en" for all the English entries as it's my native language, so it's the only one I'm confident to do this in. This looks good to me, please provide a final object without the comments
The user wants the same JavaScript object as before, but with two modifications:
- All English-based locale codes (en-IN, en-US, etc.) should be consolidated under a single en key, using the value 'original'.
- All comments within the object should be removed.
I will iterate through the previously generated object.
If the key starts with en-, I will skip it, ensuring that a separate en entry is added.
For all other keys, I will copy the ORIGINAL value.
Finally, I will add the en entry.
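The consolidation steps can be sketched roughly as follows. The function name consolidateEnglish and the three-entry sample are assumptions made for illustration; note that this sketch appends en at the end rather than placing it alphabetically.

```javascript
// Hypothetical helper: drop every 'en-*' key and add a single 'en' entry.
function consolidateEnglish(locales) {
  const result = {};
  for (const [code, entry] of Object.entries(locales)) {
    if (code.startsWith('en-')) continue; // skip regional English variants
    result[code] = { ORIGINAL: entry.ORIGINAL }; // copy only the ORIGINAL value
  }
  result.en = { ORIGINAL: 'original' }; // single consolidated English entry
  return result;
}

// Small sample with entries taken from the object above.
const sample = {
  'de-DE': { ORIGINAL: 'Original' },
  'en-IN': { ORIGINAL: 'original' },
  'es-ES': { ORIGINAL: 'original' },
};

console.log(Object.keys(consolidateEnglish(sample))); // keys: de-DE, es-ES, en
```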
Okay, understood. Here's the final locales object with the English entries consolidated under en and all comments removed:
const locales = {
'af-ZA': {
ORIGINAL: 'oorspronklike',
},
'am-ET': {
ORIGINAL: 'የመጀመሪያ',
},
'ar': {
ORIGINAL: 'أصلي',
},
'as-IN': {
ORIGINAL: 'মূল',
},
'az-Latn-AZ': {
ORIGINAL: 'orijinal',
},
'be-BY': {
ORIGINAL: 'арыгінальны',
},
'bg-BG': {
ORIGINAL: 'оригинален',
},
'bn-BD': {
ORIGINAL: 'মূল',
},
'bs-Latn-BA': {
ORIGINAL: 'original',
},
'ca-ES': {
ORIGINAL: 'original',
},
'cs-CZ': {
ORIGINAL: 'původní',
},
'da-DK': {
ORIGINAL: 'originalt',
},
'de-DE': {
ORIGINAL: 'Original',
},
'el-GR': {
ORIGINAL: 'πρωτότυπο',
},
'en': {
ORIGINAL: 'original',
},
'es-419': {
ORIGINAL: 'original',
},
'es-ES': {
ORIGINAL: 'original',
},
'es-US': {
ORIGINAL: 'original',
},
'et-EE': {
ORIGINAL: 'algne',
},
'eu-ES': {
ORIGINAL: 'jatorrizkoa',
},
'fa-IR': {
ORIGINAL: 'اصلی',
},
'fil-PH': {
ORIGINAL: 'orihinal',
},
'fr-CA': {
ORIGINAL: 'originale',
},
'fr-FR': {
ORIGINAL: 'original',
},
'gl-ES': {
ORIGINAL: 'orixinal',
},
'gu-IN': {
ORIGINAL: 'ઑરિજિનલ',
},
'he-IL': {
ORIGINAL: 'מקור',
},
'hi-IN': {
ORIGINAL: 'मूल',
},
'hr-HR': {
ORIGINAL: 'izvorno',
},
'hu-HU': {
ORIGINAL: 'eredeti',
},
'hy-AM': {
ORIGINAL: 'բնօրինակ',
},
'id-ID': {
ORIGINAL: 'asli',
},
'is-IS': {
ORIGINAL: 'upprunalegt',
},
'it-IT': {
ORIGINAL: 'originale',
},
'ja-JP': {
ORIGINAL: 'オリジナル',
},
'ka-GE': {
ORIGINAL: 'ორიგინალია',
},
'kk-KZ': {
ORIGINAL: 'түпнұсқа',
},
'km-KH': {
ORIGINAL: 'ដើម',
},
'kn-IN': {
ORIGINAL: 'ಮೂಲ',
},
'ko-KR': {
ORIGINAL: '원본',
},
'ky-KG': {
ORIGINAL: 'түпнуска',
},
'lo-LA': {
ORIGINAL: 'ຕົ້ນສະບັບ',
},
'lt-LT': {
ORIGINAL: 'pradinis',
},
'lv-LV': {
ORIGINAL: 'oriģināls',
},
'mk-MK': {
ORIGINAL: 'оригинален',
},
'ml-IN': {
ORIGINAL: 'ഒറിജിനൽ',
},
'mn-MN': {
ORIGINAL: 'эх хувь',
},
'mr-IN': {
ORIGINAL: 'मूळ',
},
'ms-MY': {
ORIGINAL: 'asal',
},
'my-MM': {
ORIGINAL: 'မူရင်း',
},
'nb-NO': {
ORIGINAL: 'original',
},
'ne-NP': {
ORIGINAL: 'मूल',
},
'nl-NL': {
ORIGINAL: 'Originele',
},
'or-IN': {
ORIGINAL: 'ମୂଳ',
},
'pa-Guru-IN': {
ORIGINAL: 'ਮੂਲ',
},
'pl-PL': {
ORIGINAL: 'oryginalny',
},
'pt-BR': {
ORIGINAL: 'original',
},
'pt-PT': {
ORIGINAL: 'original',
},
'ro-RO': {
ORIGINAL: 'original',
},
'ru-RU': {
ORIGINAL: 'оригинальная',
},
'si-LK': {
ORIGINAL: 'මුල්',
},
'sk-SK': {
ORIGINAL: 'pôvodná',
},
'sl-SI': {
ORIGINAL: 'Izvirnik',
},
'sq-AL': {
ORIGINAL: 'origjinale',
},
'sr-Cyrl-RS': {
ORIGINAL: 'оригинална',
},
'sr-Latn-RS': {
ORIGINAL: 'originalna',
},
'sw-TZ': {
ORIGINAL: 'halisi',
},
'ta-IN': {
ORIGINAL: 'அசல்',
},
'te-IN': {
ORIGINAL: 'అసలైనది',
},
'th-TH': {
ORIGINAL: 'เสียงต้นฉบับ',
},
'tr-TR': {
ORIGINAL: 'orijinal',
},
'uk-UA': {
ORIGINAL: 'оригінал',
},
'ur-PK': {
ORIGINAL: 'اصل',
},
'uz-Latn-UZ': {
ORIGINAL: 'original',
},
'vi-VN': {
ORIGINAL: 'gốc',
},
'zh-Hans-CN': {
ORIGINAL: '原始',
},
'zh-Hant-HK': {
ORIGINAL: '原聲',
},
'zh-Hant-TW': {
ORIGINAL: '原文',
},
'zu-ZA': {
ORIGINAL: 'yokuqala',
},
};
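For reference, here is a minimal sketch of how the fallback you described (verbatim language code, then base language code, then en) might combine with the substring match. The names sampleLocales, resolveLocale and isOriginalTrack are hypothetical, the lang value 'ar-EG' is only an illustrative input, and the lookup assumes the base code is everything before the first hyphen.

```javascript
// Small excerpt of the locales object, enough to run the example.
const sampleLocales = {
  'ar': { ORIGINAL: 'أصلي' },
  'de-DE': { ORIGINAL: 'Original' },
  'en': { ORIGINAL: 'original' },
};

// Hypothetical helper: verbatim code first, then base language, then 'en'.
function resolveLocale(lang) {
  const base = lang.split('-')[0];
  return sampleLocales[lang] ?? sampleLocales[base] ?? sampleLocales.en;
}

// Hypothetical helper: substring match against the resolved ORIGINAL value.
function isOriginalTrack(trackName, lang) {
  return trackName.includes(resolveLocale(lang).ORIGINAL);
}

console.log(isOriginalTrack('Englisch (Vereinigte Staaten) (Original)', 'de-DE')); // true
console.log(isOriginalTrack('English (United States) original', 'en-IN')); // true, via the 'en' base code
console.log(resolveLocale('ar-EG').ORIGINAL === 'أصلي'); // true, via the 'ar' base code
```

Because most keys in the object are full codes such as de-DE, a display language with a different region (for example a hypothetical de-AT) would fall through to en in this sketch, so the base-code step only helps for keys stored without a region.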