@digitalist
Created March 21, 2021 08:07
Quick and dirty Linux word count from an HTML mirror
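# TODO: strip stray UTF-8 characters with tr
# Convert each mirrored page to text, split it into one lowercase word per line, then count and rank by frequency.
# The output redirect fails unless ~/temp already exists.
find . -name '*.html' -exec html2text {} \; | tr -s '[:punct:][:space:]' '\n' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -bnr > ~/temp/words.txt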
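To sanity-check the counting stages without a mirror or html2text, here is a rough sketch that wraps the same tr/sort/uniq steps in a function and feeds it one line of sample text. The name count_words and the sample sentence are made up for illustration, and the tr set is written without the outer brackets, which tr would treat as literal characters anyway.

```shell
# count_words is a hypothetical helper name, not part of the original one-liner.
# It reads text on stdin and prints "count word" lines, most frequent first.
count_words() {
  tr -s '[:punct:][:space:]' '\n' |   # punctuation and whitespace become single newlines
    tr '[:upper:]' '[:lower:]' |      # fold case so "The" and "the" count together
    sort |                            # group identical words for uniq
    uniq -c |                         # prefix each distinct word with its count
    sort -bnr                         # highest count first
}

# Made-up sample input standing in for html2text output.
printf 'The cat. The dog, the CAT!\n' | count_words
```

For the sample line this should list "the" with a count of 3, "cat" with 2 and "dog" with 1. Input that starts with punctuation or whitespace will also produce one empty "word" in the list, which is easy to drop with grep -v '^$' before sorting.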