wget2 | Cheatsheet
GNU Wget2 is the successor of GNU Wget, a file and recursive website downloader. Designed and written from scratch, it wraps around libwget, which provides the basic functions needed by a web client. Wget2 is multi-threaded and uses several features to speed up operation.
In many cases Wget2 downloads much faster than Wget 1.x thanks to HTTP/2, HTTP compression, parallel connections, and use of the If-Modified-Since HTTP header.
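The speed features listed above roughly correspond to the options in the sketch below. The function name `fast_fetch` is hypothetical, and the option values are illustrative assumptions rather than tuned recommendations.

```bash
# fast_fetch is a hypothetical helper name, not part of wget2.
# --http2=on          use HTTP/2 when the server supports it
# --compression=gzip  ask the server for gzip-compressed responses
# --max-threads=5     download with up to 5 parallel threads
# --timestamping      send If-Modified-Since for files already on disk
fast_fetch() {
  wget2 --http2=on --compression=gzip --max-threads=5 --timestamping "$@"
}
```

Call it with one or more URLs, for example `fast_fetch https://example.com/` (a placeholder URL).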
Installation
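The original install commands are not shown here. As a sketch, assuming a Debian/Ubuntu or Fedora system where the package is named `wget2`, a helper like the following could be used; `install_wget2` is a hypothetical name, and other distributions will need their own package manager or a source build.

```bash
# install_wget2 is a hypothetical helper; the package name "wget2" is an
# assumption that holds on Debian/Ubuntu and Fedora.
install_wget2() {
  if command -v apt-get >/dev/null 2>&1; then
    sudo apt-get install -y wget2
  elif command -v dnf >/dev/null 2>&1; then
    sudo dnf install -y wget2
  else
    echo "No supported package manager found; build wget2 from source instead." >&2
    return 1
  fi
}
```

Run `install_wget2` and then `wget2 --version` to confirm the binary is on your PATH.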
Crawl website and download only if file is valid from a file named 5.txt
Grab header of website
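The original command for this entry is not shown. One way to do it, as a sketch, is to combine `--spider` and `--server-response`, both of which also appear in the mirror command further down; the function name `grab_headers` is hypothetical.

```bash
# grab_headers is a hypothetical helper name.
# --spider           check the URL without saving the response body
# --server-response  print the HTTP headers sent by the server
grab_headers() {
  wget2 --spider --server-response "$1"
}
```

Usage: `grab_headers https://www.nr1.nu`.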
Read exported bookmark config and crawl ALL bookmarked sites
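The original command for this entry is not shown. A minimal sketch, assuming the bookmarks were exported from the browser as an HTML file: `--force-html` tells wget2 to parse the input file as HTML and follow the links in it. The function name `crawl_bookmarks`, the file name in the usage line, and the recursion depth of 1 are all assumptions.

```bash
# crawl_bookmarks is a hypothetical helper name.
# --force-html    treat the input file as HTML and extract its links
# --input-file    read the URLs to fetch from the given file
# --recursive     follow links on each bookmarked site
# --level=1       assumed depth limit; raise it to crawl deeper
crawl_bookmarks() {
  wget2 --force-html --input-file="$1" --recursive --level=1
}
```

Usage: `crawl_bookmarks bookmarks.html`.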
Download specific file type
wget2 https://www.nr1.nu \
--method=GET \
--http-user='fbi@info.gov' \
--http-password='hidden@mail.gov' \
--referer='https://fbi.gov/secr3t/crawler' \
--user-agent='(FBI Crawler/v1.0.1|ForRealSeriousCrime|WeCrawlingForObtainingEvidence) AppleIsMalware/v1.0)' \
--save-headers \
--auth-no-challenge \
--header="Accept-Encoding: all" \
--secure-protocol=auto \
--http2=on \
--https-enforce=soft \
-A '*.html' -r
??? Note "Mirror any website as a pro - wuseman edition"
```bash
# URL is a placeholder (not in the original): set it to the site you want to crawl.
wget2 --method=GET --password=yourFriend --user=yourFriend \
--http-user=yourFriend --http-password=yourFriend \
--referer='https://random.gov/secr3t/crawler' \
--user-agent='(random Crawler/v1.0.1) Hunter)' \
--adjust-extension -o ~/logs/wget2/wget2.log \
--stats-site=h:~/logs/wget2/stats-site.log \
--stats-server=h:~/logs/wget2/stats-server.log \
--stats-tls=h:~/logs/wget2/stats-tls.log \
--stats-ocsp=h:~/logs/wget2/stats-ocsp.log \
--stats-dns=h:~/logs/wget2/stats-dns.log \
--progress=bar --backups=backups --force-progress \
--server-response --quota=0 -e robots=off \
--inet4-only --tcp-fastopen --chunk-size=10M \
--local-encoding=encoding --remote-encoding=encoding \
--verify-save-failed --header='Accept-Charset: iso-8859-2' \
--max-redirect=250 --dns-caching --http2-request-window=250 \
--cut-dirs=100 --unlink --spider --limit-rate=20k --random-wait \
"${URL:?set URL to the site you want to crawl}"
```