Last active
April 10, 2025 08:25
Quick bash slugify
echo "$STRING" | iconv -t ascii//TRANSLIT | sed -r 's/[^a-zA-Z0-9]+/-/g' | sed -r 's/^-+|-+$//g' | tr A-Z a-z
Best to replace tr A-Z a-z at the end with tr "[:upper:]" "[:lower:]" to support accented characters such as É.
These characters are handled by iconv. I thought that, if they were not, the sed replacement would handle them, but at least in GNU sed 4.8 most of them fall within the a-z range.
╰─➤ echo É | iconv -t ascii//TRANSLIT
E
# not every diacritic is contained in a-z
╰─➤ echo "ā, ä, ǟ, ḑ, ē, ī, ļ, ņ, ō, ȯ, ȱ, õ, ȭ, ŗ, š, ț, ū, ž." | sed -r 's/[^a-zA-Z0-9]+/-/g' | sed -r 's/^-+\|-+$//g' | tr A-Z a-z
ā-ä-ǟ-ḑ-ē-ī-ļ-ņ-ō-ȯ-ȱ-õ-ȭ-ŗ-š-ț-ū-
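One way to take the locale out of the picture is to run sed under LC_ALL=C, where the ranges a-z, A-Z and 0-9 cover only ASCII letters and digits. This is a minimal sketch, not something from the thread above; the helper name `ascii_dashes` is made up for illustration.

```shell
# ascii_dashes: hypothetical helper. LC_ALL=C makes sed treat its input as
# plain bytes, so the ranges below match only ASCII letters and digits.
ascii_dashes() {
    LC_ALL=C sed -E 's/[^a-zA-Z0-9]+/-/g'
}

# \304\201 is the UTF-8 encoding of "ā", written as octal escapes.
printf '\304\201, b\n' | ascii_dashes   # prints -b
```

Here the accented letter is replaced together with the punctuation that follows it, instead of slipping through as part of a-z.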
It's good to replace multiple sed processes with a single one using multiple -e parameters.
It's good to use [^[:alnum:]] instead of [^a-zA-Z0-9].
It's good to use tr "[:upper:]" "[:lower:]" instead of tr A-Z a-z as a matter of principle for the goal of lowercasing input. To know that tr A-Z a-z is good enough requires verifying what comes before in the pipeline, and knowing how iconv works. That's added mental burden.
Putting it together:
iconv -t ascii//TRANSLIT | sed -E -e 's/[^[:alnum:]]+/-/g' -e 's/^-+|-+$//g' | tr '[:upper:]' '[:lower:]'
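As a sketch, the combined pipeline can be wrapped in a small shell function. The function name `slugify` and the use of printf instead of echo are choices made here for illustration, not part of the one-liner above. How iconv transliterates non-ASCII input still depends on the platform and locale, so the example sticks to ASCII input.

```shell
# slugify: hypothetical wrapper around the combined pipeline.
# Takes the string as its first argument and prints the slug.
slugify() {
    printf '%s\n' "$1" |
        iconv -t ascii//TRANSLIT |
        sed -E -e 's/[^[:alnum:]]+/-/g' -e 's/^-+|-+$//g' |
        tr '[:upper:]' '[:lower:]'
}

slugify 'Hello, World!'   # prints hello-world
```

Passing the string as an argument keeps the quoting in one place; the function could just as well read from standard input by dropping the printf stage.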