webbot

A Java-based "web browser" that extracts all links from a web page and displays them.
8.6K Downloads
Updated 15 Oct 2003

WEBBOT is a Java-based browser with download support and Perl-style regular expressions. The function extracts all links from a web page and displays them. The linked documents can then be downloaded.

WEBBOT(URL)
URL is a string giving the base page address; it must point to an HTML file. The function lists all links in that file. URL can also be a cell array of URL strings.
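
As a minimal sketch of the two call forms described above, assuming webbot.m is on the MATLAB path. The example.com addresses are placeholders, not pages from the original submission:

```matlab
% Single page: list every link found in the HTML file.
webbot('http://www.example.com/index.html');

% Several pages: pass a cell array of URL strings.
pages = {'http://www.example.com/index.html', ...
         'http://www.example.com/archive.html'};
webbot(pages);
```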

WEBBOT(URL, WHAT)
displays only specific links. WHAT is a string:
'all_links': displays all links (default).
'page_links': displays all links to an html web page*.
'local_links': displays all local links on the server*.
'external_links': displays all links to external websites.
'image_links': displays all links to an image file**.
'image_tags': displays all image tags <img src="xxx">.
'.xxx.yyyy.zz': displays all links to files with any of the listed extensions; case is ignored ('zip' will find 'ZiP'); e.g. '.zip.gz.gzip.tar.Z'.
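
The case-insensitive extension filter can be illustrated with a short sketch. This is not taken from webbot's source; it only shows one way such a filter could be written with regexpi, and the link list is made up for the example (strjoin requires a reasonably recent MATLAB release):

```matlab
% Hypothetical list of links, as might be found on a page.
links = {'http://www.example.com/a.ZiP', ...
         'http://www.example.com/b.html', ...
         'http://www.example.com/c.tar.gz'};

% WHAT string in the '.xxx.yyyy.zz' form.
what = '.zip.gz';

% Split the WHAT string into its extensions: {'zip', 'gz'}.
exts = regexp(what, '[^.]+', 'match');

% Build a pattern that matches any of the extensions at the end of a link.
pattern = ['\.(' strjoin(exts, '|') ')$'];

% regexpi ignores case, so 'zip' also matches 'ZiP'.
hits = regexpi(links, pattern, 'match', 'once');
isMatch = ~cellfun(@isempty, hits);

% Keep only the links whose extension is in the list.
selected = links(isMatch);
disp(selected(:));
```

With webbot itself, the equivalent call would simply pass the extension string as WHAT, e.g. webbot(url, '.zip.gz').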

WEBBOT(URL, WHAT, ACT)
performs an action on found links. ACT is a string:
'noaction': just display links (default)
'download': downloads all found links to the local machine.
'cartoons': downloads all image tags found on linked pages. This is useful for cartoon websites where each cartoon (e.g. "01.gif") is on its own HTML page (e.g. "c01.html").
'follow.x': follows links to HTML pages and recursively performs the same action on the resulting page. 'x' is an integer indicating the recursion depth (0 is equivalent to 'noaction').
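
A few calls combining WHAT and ACT, as a sketch based only on the option strings listed above. The address is a placeholder, and the specific combinations are assumptions about typical use rather than examples from the author:

```matlab
base = 'http://www.example.com/index.html';   % placeholder address

% Just list all links (both arguments at their documented defaults).
webbot(base, 'all_links', 'noaction');

% Download every linked archive file.
webbot(base, '.zip.gz', 'download');

% Follow links to HTML pages, two levels deep.
webbot(base, 'page_links', 'follow.2');
```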

lks = WEBBOT(URL, ...)
returns a cell array with the links of URL{end}.
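
A brief sketch of working with the returned value, assuming each cell of the output holds one link as a string. The addresses are placeholders:

```matlab
pages = {'http://www.example.com/index.html', ...
         'http://www.example.com/gallery.html'};

% Per the description above, lks holds the links of the last page, pages{end}.
lks = webbot(pages, 'image_links');

% Print the links one per line.
for k = 1:numel(lks)
    fprintf('%d: %s\n', k, lks{k});
end
```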

Notes: * Links explicitly pointing to a .htm or .html URL.
** Image links are recognized by the following file types:
.jpg .jpeg .gif .pict .bmp .tif .tiff .ras .png (.giff)

Try it with:
webbot('http://www.unitedmedia.com/comics/dilbert/archive/', ...
'local_links', 'cartoons');

Written by L.Cavin, 28.09.2003, (c) CSE
This code is free to use and modify for non-commercial purposes.
Web address: http://ltcmail.ethz.ch/cavin/CSEDBLib.html#WEBBOT

Cite As

Laurent Cavin (2025). webbot (https://www.mathworks.com/matlabcentral/fileexchange/4023-webbot), MATLAB Central File Exchange. Retrieved.

MATLAB Release Compatibility
Created with R13
Compatible with any release
Platform Compatibility
Windows macOS Linux
Version Published Release Notes
1.0.0.0

Major update:
Much, much faster downloads with the MathWorks object "com.mathworks.mlwidgets.io.InterruptibleStreamCopier".
The old code using "java.net.URL" is still included for demonstration purposes.
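
For illustration, a download through the stream copier might look like the sketch below. It is not webbot's actual code. InterruptibleStreamCopier is an undocumented internal MathWorks class, so its availability and behavior in any given release are assumptions; the address and file name are placeholders:

```matlab
% Open an input stream on the remote file (placeholder address).
src = java.net.URL('http://www.example.com/file.zip');
inStream = src.openStream();

% Open an output stream on a local file in the temporary folder.
target = fullfile(tempdir, 'file.zip');
outStream = java.io.FileOutputStream(target);

% Copy the whole stream in one call instead of reading it piece by piece
% in MATLAB code. Undocumented class: may change or disappear.
copier = com.mathworks.mlwidgets.io.InterruptibleStreamCopier.getInterruptibleStreamCopier();
copier.copyStream(inStream, outStream);

% Release both streams.
outStream.close();
inStream.close();
```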