Extract data from HTML file stored in C drive of Laptop

Pranav Balasaheb Mohite on 4 Aug 2022
Answered: Saffan on 30 Aug 2023
Hello Everyone,
I want to extract data from a local HTML file stored on the C drive of my laptop.
Can anyone guide me on how to extract the data from the HTML file, convert it into a char array, and use it further?
The file is in HTML format and its link looks something like file:///C:/Users/Pranav/OneDrive/Desktop/.....................................
Commands I have already used: 1) str=fileread('xxxxxxxxxxxxxxxxx.html') ---> data=extractHTMLString (str)
But the output data is a 1 x 1000000 array in which every single character is a separate element.
I am looking forward to some good advice.
Thanks in advance!
  1 Comment
Walter Roberson on 6 Sep 2022
Are you using extractHTMLText?
As an experiment, what happens if you fileread() the file directly and process that?
You have two separate issues:
  1. making sure that the text can be pulled out of a URL;
  2. processing the text.
Reading the file without the URL lets you test the processing part separately from reading from the URL.
To test reading from the URL, you could fileread() from the URL and fileread() from the local file without the URL, and compare the two.
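The experiment described above could be sketched roughly as follows. This is only an illustration, not the commenter's own code: the sample file, the variable names, and the assumption that a file:/// address maps directly onto a local path are all mine. It also shows why the question reports a 1 x 1000000 result: fileread returns the whole file as a single char row vector, one element per character.

```matlab
% Hypothetical sample file so the sketch runs on its own; substitute the real path.
localPath = [tempname '.html'];
fid = fopen(localPath, 'w');
fprintf(fid, '%s', '<html><body><h1>Title</h1><p>Some text.</p></body></html>');
fclose(fid);

% Browser-style address for the same file (assumed form: file:/// followed by the path).
fileUrl = ['file:///' strrep(localPath, '\', '/')];

% Issue 1: get from the address back to something fileread can open.
pathFromUrl = strrep(fileUrl, 'file:///', '');
raw = fileread(pathFromUrl);

% Read the same file by its plain local path and compare the two results.
rawDirect = fileread(localPath);
sameContent = isequal(raw, rawDirect);

% Issue 2 starts from here: raw is a 1-by-N char vector of unprocessed markup.
disp(class(raw));
disp(size(raw));

delete(localPath);  % remove the hypothetical sample file
```

If sameContent is true, reading is not the problem and the remaining work is purely text processing.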


Answers (1)

Saffan on 30 Aug 2023
Hi,
To accomplish this, you can add a step to your code that creates an HTML tree with the "htmlTree" function. It parses the HTML code in the string and returns the resulting tree structure. You can then extract the text from the tree with "extractHTMLText" (both functions are part of Text Analytics Toolbox), as shown in the following code snippet:
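% Path to the local HTML file (placeholder from the question; use the real path)
filePath = 'xxxxxxxxxxxxxxxxx.html';
% Read the HTML file into a char vector
htmlContent = fileread(filePath);
% Create an HTML tree from the content
tree = htmlTree(htmlContent);
% Extract the text from the HTML tree
data = extractHTMLText(tree);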
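Since the question asks for a char array to use afterwards, the extracted text may still need a conversion step. The sketch below is an assumption-laden illustration: it uses a hand-written string as a stand-in for the output of extractHTMLText (which I assume to be a string scalar), and the variable names are hypothetical.

```matlab
% Stand-in for the output of extractHTMLText (assumed to be a string scalar).
data = "Title" + newline + newline + "Some text.";

% Convert to a char row vector, as asked in the question.
dataChar = char(data);

% For line-by-line processing: split on newlines, trim, and drop empty lines.
textLines = splitlines(data);
textLines = strtrim(textLines);
textLines = textLines(strlength(textLines) > 0);

disp(size(dataChar));
disp(textLines);
```

Working with the string array textLines is often more convenient than indexing into one long char vector, but char(data) is available if later code expects a char array.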
Refer to the documentation for htmlTree and extractHTMLText for more information.
