This requires `recode` to be installed (`brew install recode` or `apt install recode`).
The example shows HTML files only. Adjust as required.
- (Recursively) find all files that end with `*.htm` or `*.html`.
- For each match, check its file type using `file`.
- Keep only the replies that carry the `ISO-8859` tag, and extract the filename from them, using `grep` and `cut`. (Note: if you're using a more sophisticated version of `grep`, such as `ugrep`, you might be able to format the result directly and skip the piping to `cut` to show only selected fields; "modern" `grep` versions may also have some formatting options these days, but I have not checked.)
- This will give you a list of the full paths (starting from the current working directory) of all files currently detected as Latin-1.
- Pipe the result to `cat` (why exactly this is needed is a bit beyond me, but it is some sort of shell-y requirement which baffled me for quite a while).
- Feed the generated list through the usual `while read line; do ...; done` shell loop, taking each filename in turn.
- Feed each filename to `recode` to convert it from Latin-1 to UTF-8, while preserving all timestamps and other attributes.
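The detection steps can be tried out as a read-only dry run before anything is converted. The following is a minimal sketch and not part of the original command: the function name `list_latin1_html` is made up for illustration, and it assumes that your `file` supports `-b` and `--mime-encoding` and reports Latin-1 text as `iso-8859-1`.

```shell
#!/bin/sh
# list_latin1_html DIR
# Hypothetical helper: prints the HTML files under DIR (default: current
# directory) whose encoding file(1) reports as iso-8859-1.
# Read-only: nothing is converted or modified.
list_latin1_html() {
    find "${1:-.}" -type f \( -name '*.htm' -o -name '*.html' \) -exec sh -c '
        for f in "$@"; do
            # -b drops the "name:" prefix, so the reply is just the encoding
            enc=$(file -b --mime-encoding "$f")
            if [ "$enc" = "iso-8859-1" ]; then
                printf "%s\n" "$f"
            fi
        done
    ' sh {} +
}
```

Calling `list_latin1_html .` prints one path per line, which makes it easy to check the selection before any file is touched.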
Is this the best solution? Probably not. Its advantage is that the work stays linear, roughly 2*N steps for N files in the directory tree: `find` does a single pass to extract all filenames; these are then fed (as if they were just one list) into a loop that does the conversion, one by one. By that stage the list has already been filtered (i.e. no binaries, only text files with ISO-8859-1 encoding, etc.).
    find . \( -name "*.htm" -o -name "*.html" \) -exec sh -c 'file "$1" | grep ISO-8859 | cut -d: -f1' sh {} \; | cat | while IFS= read -r line; do recode Latin-1..UTF-8 "$line"; done

It's possible to do everything in a single loop (i.e. N steps rather than 2*N), but the exact command eluded me.
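A single-loop variant might look like the sketch below. It is only one possible arrangement, not a tested replacement for the command above: the function name `convert_latin1_tree` is hypothetical, and it assumes that `file -b` prints a description containing `ISO-8859` for Latin-1 text. `find` hands the matching files in batches to one inline script, which checks each file and calls `recode` (with the same request as above) right away, so there is no intermediate list and no `cut` or `cat`.

```shell
#!/bin/sh
# convert_latin1_tree DIR
# Hypothetical helper: converts every *.htm / *.html file under DIR
# (default: current directory) that file(1) describes as ISO-8859.
# recode rewrites the files in place.
convert_latin1_tree() {
    find "${1:-.}" -type f \( -name '*.htm' -o -name '*.html' \) -exec sh -c '
        for f in "$@"; do
            # -b prints only the description, without the "name:" prefix
            case $(file -b "$f") in
                *ISO-8859*) recode Latin-1..UTF-8 "$f" ;;
            esac
        done
    ' sh {} +
}
```

Run it as `convert_latin1_tree .` from the top of the tree. Because the conversion happens in place, trying it on a copy of the directory first is a sensible precaution.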
You can also take a different approach: use `find` just to retrieve the directory names, giving you a tree of directories. Then feed those to `grep`, which will evaluate all the entries in each directory. The theory here is that `grep` (and especially `ugrep`!) might be considerably faster than `find` on each directory. It's even possible that a few tweaks might allow `ugrep` (which works recursively by default!) to do all the work and pipe the results to the `while` loop, or even execute `recode` directly. Hmm. I should look more into that possibility...