@mohnkhan
Created August 21, 2024 08:53
Clone a website for scraping from the command line
wget --server-response \
--no-verbose \
--adjust-extension \
--convert-links \
--force-directories \
--backup-converted \
--compression=auto \
-e robots=off \
--restrict-file-names=unix \
--timeout=60 \
--page-requisites \
--no-check-certificate \
--no-hsts \
--span-hosts \
--no-parent \
--recursive \
--level=2 \
--warc-file=$(date +%s) \
--user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36" \
https://example.com
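
To reuse the command for more than one site, it can be wrapped in a small shell function. The following is a minimal sketch, not part of the original gist: the function name `clone_site`, its `DEPTH` argument, and the `WGET_BIN` override variable are all hypothetical, and only a subset of the options above is kept for brevity.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (an assumption, not from the gist).
# Usage: clone_site URL [DEPTH]
# WGET_BIN is an assumed override so the final command can be previewed,
# e.g. WGET_BIN=echo prints the arguments instead of running wget.
clone_site() {
  local url="$1"
  local depth="${2:-2}"        # default recursion depth matches --level=2 above
  local warc_name
  warc_name="$(date +%s)"      # epoch seconds, the same WARC naming as above

  "${WGET_BIN:-wget}" \
    --recursive \
    --level="$depth" \
    --no-parent \
    --page-requisites \
    --convert-links \
    --adjust-extension \
    --warc-file="$warc_name" \
    "$url"
}
```

Calling `clone_site https://example.com 3` would then crawl three levels deep and write a WARC file named after the current epoch time. Setting `WGET_BIN=echo` first prints the assembled arguments without touching the network.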