Using wget as an offline browser to download all MP3 files from a website

rupeshforu3

Hi, I am Rupesh from India. I want to download a website with wget for offline viewing; that is, I want to mirror the website and keep an exact copy of it on my hard disk. I have installed openSUSE Leap 42.3 with wget and its GUI.

I previously downloaded the website with an offline browser called Extreme Picture Finder. 90% of the files I want were downloaded successfully, so now I want to download the remaining 10%.

I have read the wget manual page and some wget tutorials I found on the web. I tried what the tutorials suggested, and I am including the commands and their output below.

I ran the following command:

Code:
wget -c -t 0 --recursive --force-directories   -o logfile.txt ‐‐recursive ‐‐no-clobber ‐‐accept jpg,gif,png,jpeg,mp3,MP3,pdf





It produced the following output:

Code:
idn_encode failed (-304): ‘string contains a disallowed character'
idn_encode failed (-304): ‘string contains a disallowed character'
--2017-09-29 18:08:43--  http://%E2%80%90%E2%80%90recursive/
Resolving ‐‐recursive (‐‐recursive)... failed: Name or service not known.
wget: unable to resolve host address ‘‐‐recursive'
idn_encode failed (-304): ‘string contains a disallowed character'
idn_encode failed (-304): ‘string contains a disallowed character'
--2017-09-29 18:08:43--  http://%E2%80%90%E2%80%90no-clobber/
Resolving ‐‐no-clobber (‐‐no-clobber)... failed: Name or service not known.
wget: unable to resolve host address ‘‐‐no-clobber'
idn_encode failed (-304): ‘string contains a disallowed character'
idn_encode failed (-304): ‘string contains a disallowed character'
--2017-09-29 18:08:43--  http://%E2%80%90%E2%80%90accept/
Resolving ‐‐accept (‐‐accept)... failed: Name or service not known.
wget: unable to resolve host address ‘‐‐accept'
--2017-09-29 18:08:43--  http://jpg,gif,png,jpeg,mp3,mp3,pdf/
Resolving jpg,gif,png,jpeg,mp3,mp3,pdf (jpg,gif,png,jpeg,mp3,mp3,pdf)... failed: Name or service not known.
wget: unable to resolve host address ‘jpg,gif,png,jpeg,mp3,mp3,pdf'
idn_encode failed (-304): ‘string contains a disallowed character'
idn_encode failed (-304): ‘string contains a disallowed character'
--2017-09-29 18:08:43--  http://%E2%80%90%E2%80%90directory-prefix=/mnt/source/downloads/lectures/
Resolving ‐‐directory-prefix= (‐‐directory-prefix=)... failed: Name or service not known.
wget: unable to resolve host address ‘‐‐directory-prefix='
--2017-09-29 18:08:43--  http://www.pravachanam.com/categorybrowselist/20
Resolving www.pravachanam.com (www.pravachanam.com)... 162.144.54.142
Connecting to www.pravachanam.com (www.pravachanam.com)|162.144.54.142|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘www.pravachanam.com/categorybrowselist/20'

     0K .......... .......... .......... .......... .......... 31.3K
    50K .......... ....                                        1.54M=1.6s

2017-09-29 18:08:46 (40.0 KB/s) - ‘www.pravachanam.com/categorybrowselist/20' saved [65802]

Loading robots.txt; please ignore errors.
--2017-09-29 18:08:46--  http://www.pravachanam.com/robots.txt
Reusing existing connection to www.pravachanam.com:80.
HTTP request sent, awaiting response... 404 Not Found
2017-09-29 18:08:48 ERROR 404: Not Found.

--2017-09-29 18:08:48--  http://www.pravachanam.com/sites/default/files/favicon.ico
Reusing existing connection to www.pravachanam.com:80.
HTTP request sent, awaiting response... 404 Not Found
2017-09-29 18:08:51 ERROR 404: Not Found.

--2017-09-29 18:08:51--  http://www.pravachanam.com/modules/system/system.base.css?owgg5m
Reusing existing connection to www.pravachanam.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 5428 (5.3K) [text/css]
Saving to: ‘www.pravachanam.com/modules/system/system.base.css?owgg5m'

     0K .....                                                 100% 16.2K=0.3s

2017-09-29 18:08:51 (16.2 KB/s) - ‘www.pravachanam.com/modules/system/system.base.css?owgg5m' saved [5428/5428]

--2017-09-29 18:08:51--  http://www.pravachanam.com/modules/system/system.menus.css?owgg5m
Reusing existing connection to www.pravachanam.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 2035 (2.0K) [text/css]
Saving to: ‘www.pravachanam.com/modules/system/system.menus.css?owgg5m'

     0K .                                                     100%  236K=0.008s

2017-09-29 18:08:52 (236 KB/s) - ‘www.pravachanam.com/modules/system/system.menus.css?owgg5m' saved [2035/2035]

--2017-09-29 18:08:52--  http://www.pravachanam.com/modules/system/system.messages.css?owgg5m
Reusing existing connection to www.pravachanam.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 961 [text/css]
Saving to: ‘www.pravachanam.com/modules/system/system.messages.css?owgg5m'

     0K                                                       100%  255M=0s

2017-09-29 18:08:52 (255 MB/s) - ‘www.pravachanam.com/modules/system/system.messages.css?owgg5m' saved [961/961]

--2017-09-29 18:08:52--  http://www.pravachanam.com/modules/system/system.theme.css?owgg5m
Reusing existing connection to www.pravachanam.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 3711 (3.6K) [text/css]
Saving to: ‘www.pravachanam.com/modules/system/system.theme.css?owgg5m'

     0K ...                                                   100%  374K=0.01s

2017-09-29 18:08:52 (374 KB/s) - ‘www.pravachanam.com/modules/system/system.theme.css?owgg5m' saved [3711/3711]

--2017-09-29 18:08:52--  http://www.pravachanam.com/sites/all/libraries/mediaelement/build/mediaelementplayer.min.css?owgg5m
Reusing existing connection to www.pravachanam.com:80.
HTTP request sent, awaiting response... 404 Not Found
2017-09-29 18:08:54 ERROR 404: Not Found.

--2017-09-29 18:08:54--  http://www.pravachanam.com/sites/all/modules/views_slideshow/views_slideshow.css?owgg5m
Reusing existing connection to www.pravachanam.com:80.
HTTP request sent, awaiting response... 404 Not Found
2017-09-29 18:08:56 ERROR 404: Not Found.

--2017-09-29 18:08:56--  http://www.pravachanam.com/modules/comment/comment.css?owgg5m
Reusing existing connection to www.pravachanam.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 184 [text/css]
Saving to: ‘www.pravachanam.com/modules/comment/comment.css?owgg5m'



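From this output it is clear that wget is treating the options as website addresses. The dashes in front of recursive, no-clobber, accept and directory-prefix are not ASCII hyphens but the Unicode character U+2010 (its UTF-8 bytes appear as %E2%80%90 in the URLs above), so wget does not recognise them as options and tries to resolve each one as a host name. Because the accept option is not parsed either, its value jpg,gif,png,jpeg,mp3,MP3,pdf is treated as one more address.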
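The look-alike characters can be replaced mechanically before a pasted command is run. A minimal sketch, assuming U+2010 is the only look-alike dash involved (as in this log; other dash characters would need their own byte sequences). fix_dashes is a hypothetical helper name; it runs sed under LC_ALL=C so the three bytes E2 80 90 are matched one by one and replaced with an ASCII -.

```shell
#!/bin/sh
# Replace every U+2010 HYPHEN (UTF-8 bytes E2 80 90, shown percent-encoded
# as %E2%80%90 in the wget log) with an ASCII '-'.
# Assumption: U+2010 is the only look-alike dash in the pasted text.
fix_dashes() {
  # printf with octal escapes emits the raw bytes \342\200\220 (= E2 80 90);
  # LC_ALL=C makes sed compare single bytes, so the sequence matches literally
  LC_ALL=C sed "s/$(printf '\342\200\220')/-/g"
}

# Usage: pipe a suspicious command through the filter and inspect the result.
# The demo input is built with printf so it does not depend on how this
# script file itself is encoded.
printf 'wget \342\200\220\342\200\220recursive \342\200\220\342\200\220no-clobber\n' | fix_dashes
# prints: wget --recursive --no-clobber
```

Retyping the two dashes by hand in the terminal has the same effect.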



After that I ran this command:

Code:
wget ‐‐level=1 ‐‐recursive ‐‐no-parent ‐‐no-clobber   ‐‐accept mp3,MP3  http://www.pravachanam.com/categorybrowselist/20



Running this command created a file named outfile.txt and a directory called "About Pravachanam.Com | Pravachanam.com" in my current directory. wget created some directories, but not the same ones as on the source website; it did not preserve the website's directory structure.

In outfile.txt I found some lines ending in .mp3, but when I looked for the corresponding files in the directories wget created, I could not find them, or even any directory structure related to the MP3 files.
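To see where the MP3 files mentioned in the log actually ended up, the log and the disk can be listed side by side. A rough sketch with hypothetical function names, assuming the log is the outfile.txt mentioned above; local file names may not match the URLs character for character (percent-encoded characters, for instance), so the two lists are meant to be compared by eye.

```shell
#!/bin/sh
# Sketch: list the .mp3 URLs a wget log mentions and the .mp3 files on disk,
# so the two lists can be compared. Function names are hypothetical.
mp3_urls_in_log() {
  # -o prints only the matched URL, -i also catches .MP3, -E enables ERE;
  # sort -u removes URLs that appear on several log lines
  grep -oiE 'https?://[^[:space:]]+\.mp3' "$1" | sort -u
}

mp3_files_on_disk() {
  # every regular file named *.mp3 or *.MP3 below the given directory
  find "${1:-.}" -type f \( -name '*.mp3' -o -name '*.MP3' \) | sort
}

# Usage, assuming the log is outfile.txt in the download directory:
#   mp3_urls_in_log outfile.txt
#   mp3_files_on_disk .
```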



I also installed gwget, the GNOME GUI for wget, and tried a number of its options and settings, but it failed: it downloaded the home page, stopped, and then reported that the website had been downloaded successfully. The GUI also does not offer all of the options available in the command-line version of wget.

Please suggest how to download the MP3 files from a website with wget, using the following options:

1) An option to keep the same directory structure as the source website.
2) An option to skip files that have already been downloaded.
3) An option to skip files and folders whose names contain certain words, such as xyz, while still downloading all the other MP3 files.
4) An option to download files recursively without visiting other websites.
5) An option to retry downloads indefinitely after a network failure.
6) An option to resume files that were only partially downloaded before.
7) An option to download only MP3 files and, if possible, reject all other file types, including HTML, PHP and CSS files.
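The list above maps fairly directly onto wget options. A sketch, untested against this site: it assumes GNU wget 1.14 or newer (needed for --reject-regex; wget --version shows the installed release), uses xyz as the placeholder word from (3), and wraps the call in a hypothetical shell function mirror_mp3s that logs to a made-up file wget-mp3.log. Setting WGET=echo prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: one wget call covering requirements (1)-(7) from the question.
# Assumptions: GNU wget 1.14+ (for --reject-regex); "xyz" is the placeholder
# word from (3); mirror_mp3s and wget-mp3.log are hypothetical names.
# Set WGET=echo to print the command instead of running it (dry run).
mirror_mp3s() {
  # (1) --force-directories: recreate the remote directory tree locally
  # (2)+(6) --continue: resume partial files; a local file of the same size
  #         as the remote one is not downloaded again
  # (3) --reject-regex: skip every URL that contains "xyz"
  # (4) --recursive --level=inf: follow links with no depth limit; wget stays
  #     on the start host unless --span-hosts is given. --no-parent is left
  #     out on purpose, since it would also exclude MP3s stored outside the
  #     start URL's directory.
  # (5) --tries=0 --retry-connrefused --waitretry=10: retry forever, waiting
  #     up to 10 s between attempts
  # (7) --accept=mp3,MP3: keep only .mp3/.MP3 files
  ${WGET:-wget} \
    --force-directories \
    --continue \
    --reject-regex='.*xyz.*' \
    --recursive --level=inf \
    --tries=0 --retry-connrefused --waitretry=10 \
    --accept=mp3,MP3 \
    --output-file=wget-mp3.log \
    "$1"
}

# Usage: dry run first, then the real download
#   WGET=echo mirror_mp3s http://www.pravachanam.com/categorybrowselist/20
#   mirror_mp3s http://www.pravachanam.com/categorybrowselist/20
```

Some caveats. --no-clobber (-nc) is the more literal answer to (2), but it leaves partially downloaded files untouched, which works against (6), so this sketch relies on --continue for both. --continue only recognises files sitting at the path wget itself would use, so MP3s saved earlier by Extreme Picture Finder under a different layout would presumably be fetched again. With --accept, wget still downloads HTML pages to find links and deletes them afterwards; if the crawl stops after the first page, one possible cause is that the accept list is also applied to page URLs without an .html ending (such as categorybrowselist/20), in which case a run without --accept, followed by deleting the non-MP3 files, is a fallback worth trying.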



Many of you may suggest that I read the wget manual page and experiment on my own, but advice from experts like you is the key to success. I am reading the wget manuals and guides in the meantime, but your help would be most valuable. I would be grateful if as many people as possible replied to this thread and helped me.

Regards,

Rupesh.
 