wget2 | Cheatsheet


Installation

emerge --ask net-misc/wget2

Crawl the URLs listed in a file named 5.txt and save content only for responses with status 200

wget2 --max-threads=250 --spider -i 5.txt --save-content-on=200
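
A minimal sketch of preparing the input list before handing it to `-i`, assuming one URL per line. The file name `raw-urls.txt`, the sample URLs and the `check_links` helper are hypothetical; only `5.txt` comes from the command above.

```bash
# Hypothetical raw list; replace with your own URLs, one per line.
printf '%s\n' 'https://www.nr1.nu' '' 'https://example.org/' 'https://www.nr1.nu' > raw-urls.txt

# Drop blank lines and duplicates so each URL is requested only once.
grep -v '^[[:space:]]*$' raw-urls.txt | sort -u > 5.txt

# Hypothetical helper: spider every URL in the given list file.
check_links() {
    wget2 --spider --max-threads=10 -i "$1"
}
```

Call it as `check_links 5.txt`. The lower thread count is only there to be gentle on the remote servers.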

Grab the response headers of a website

wget2 https://www.nr1.nu -S
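
If only one header is of interest, `-S` can probably be combined with `--spider` so that nothing is saved. A rough sketch under assumptions: `filter_header` and `show_header` are hypothetical helpers, and stderr is merged into stdout because the stream the headers are printed on may differ between versions.

```bash
# Hypothetical helper: keep only the lines for one header name (case-insensitive).
filter_header() {
    grep -i "^[[:space:]]*$1:"
}

# Hypothetical helper: request a URL without saving it and show one header.
show_header() {
    wget2 --spider -S "$1" 2>&1 | filter_header "$2"
}
```

Usage: `show_header https://www.nr1.nu content-type`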

Read an exported bookmarks file and crawl ALL bookmarked sites

wget2 --spider --force-html -i bookmarks_5_1_22.html 
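
Before crawling, it can help to preview which links the export actually contains. A rough sketch, assuming the bookmarks file stores links as `HREF="..."` attributes, as browser exports commonly do; `list_bookmark_urls` is a hypothetical helper, not part of wget2.

```bash
# Hypothetical helper: list the unique http(s) links found in a bookmarks export.
list_bookmark_urls() {
    grep -oiE 'href="https?://[^"]+"' "$1" | cut -d'"' -f2 | sort -u
}
```

Usage: `list_bookmark_urls bookmarks_5_1_22.html | wc -l`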

Download a specific file type

wget2 https://www.nr1.nu \
    --method=GET  \
    --http-user='fbi@info.gov' \
    --http-password='hidden@mail.gov' \
    --referer='https://fbi.gov/secr3t/crawler' \
    --user-agent='(FBI Crawler/v1.0.1|ForRealSeriousCrime|WeCrawlingForObtainingEvidence) AppleIsMalware/v1.0)'  \
    --save-headers \
    --auth-no-challenge \
    --header="Accept-Encoding: all" \
    --secure-protocol=auto \
    --http2=on \
    --https-enforce=soft \
    -A '*.html' -r  

??? Example "Mirror any website as a pro - wuseman edition"

```bash
wget2 --method=GET --password=yourFriend --user=yourFriend \
     --http-user=yourFriend --http-password=yourFriend \
     --referer='https://random.gov/secr3t/crawler' \
     --user-agent='(random Crawler/v1.0.1) Hunter)' \
     --adjust-extension -o ~/logs/wget2/wget2.log \
     --stats-site=h:~/logs/wget2/stats-site.log \
     --stats-server=h:~/logs/wget2/stats-server.log \
     --stats-tls=h:~/logs/wget2/stats-tls.log \
     --stats-ocsp=h:~/logs/wget2/stats-ocsp.log \
     --stats-dns=h:~/logs/wget2/stats-dns.log \
     --progress=bar --backups=backups --force-progress \
     --server-response --quota=0 -e robots=off \
     --inet4-only --tcp-fastopen --chunk-size=10M \
     --local-encoding=encoding --remote-encoding=encoding \
     --verify-save-failed --header='Accept-Charset: iso-8859-2' \
     --max-redirect=250 --dns-caching --http2-request-window=250 \
     --cut-dirs=100 --unlink --spider --limit-rate=20k --random-wait \
     <url>
```

Download all files into the current directory without creating folders

wget2 --progress=bar --mirror --level=1 --max-threads=50 --robots=off --no-directories <url>