How to extract formatted hex data from a text file?

4 Ansichten (letzte 30 Tage)
I have a text file with tens of thousands of lines formatted like so:
...
[11:52:30.739] : Value: [BC-7F-00-00-F7-F6-2A-D7-24-D9-81-EE-6A-DD-08-D4-BD-D5-09-E1-10-F5-22-DB-20-DE-55-EA-21-D9-22-D1-EA-D5-45-E5-95-F8-2A-D7-22-DA-12-EF-95-DD-2A-D3-85-D4-5A-E1-EB-F6-2A-DB-22-DE-54-EA-A2-D8-7D-CF-15-D4-0A-E5-FB-FA-88-D6-A5-D9-D1-EA-80-DC-95-D1-24-D2-2D-E1-D5-F9-AA-DA-DA-DD-AA-E6-BD-D7-56-CE-D5-D1-5A-E5-5B-FD-D5-D5-55-DA-A8-E9-DA-DB-EF-CF-11-D0-D5-E1-15-FC-EA-D9-21-DE-D5-E6-A0-D6-B5-CD-0A-D0-EE-E6-55-FF-20-D5-55-DB-DE-E9-D5-DA-96-CE-2A-CE-5A-E3-DD-FC-EA-D8-F6-DD-55-E7-D5-D5-8A-CE-16-D0-A8-E8-15-FE-D4-D5-6A-DC-D5-EA-AA-DA-2A-CF-22-CE-A5-E4-B5-FC-6A-D9-4A-DE-AA-E8-FA-D5-A2-CF-12-D1-AA-E9-A0-FD-4A-D6-55-DC-EA-EB-95-DA-F5-CF-2A-CF-94-E4-EA-FB-CA-D9-40-DE-D5-E8-5A-D6-44-D0-2A-D2-B5-E9-2A-FD-40-D6-AA-DC-DB-EB-AB-DA-E8-CF-55-CF-6A-E4]
[11:52:30.777] : Value: [CB-7F-00-00-FF-FB-2A-D5-2A-DA-2D-EC-95-DB-B6-D0-2A-D1-B5-E5-DA-FA-15-D9-2A-DF-B6-E8-7F-D7-02-CD-55-D1-45-E9-4A-FE-AA-D4-D5-DA-D5-EB-EA-DB-0A-CF-20-D0-AA-E5-AF-FC-69-D9-DA-DE-AA-E6-55-D7-2A-CC-55-D0-56-E9-95-FF-00-D5-81-DA-55-E8-95-DB-95-CE-22-CF-56-E6-F6-FD-D5-D8-A4-DE-55-E5-5B-D6-B5-CC-D2-CF-16-EB-2A-01-85-D4-29-DB-AA-EA-AA-DA-BF-CE-D5-CE-44-E7-2A-FC-90-D9-4A-DD-2A-E5-A8-D6-AA-CD-55-D1-AD-E9-AA-FE-35-D4-EA-D8-24-E8-CA-DA-6D-D0-6A-D0-B5-E5-DD-FC-B6-D8-6A-DD-56-E5-85-D6-D2-CD-A5-D0-AB-E9-11-00-82-D4-15-DA-55-EA-0A-DB-15-D0-55-CF-6A-E6-50-FA-EA-D9-40-DC-AA-E5-28-D7-AA-CD-6E-D1-A4-E8-55-FB-7A-D5-55-D7-2A-E8-B6-DA-95-D1-7A-D1-5D-E5-57-F8-7D-D9-20-DD-0A-E7-DE-D7-20-CE-1A-D2-D5-E7-94-FC-5A-D5-89-D8-4A-EA-94-DB-AB-D1-D6-D1-AA-E4]
...
I need to extract the hex data in the following way: The first 32 bits are an index, and the following words are actual data.
Currently I'm parsing the file line-by-line and extracting the hex data using regexp. This is slow. In the end I'm getting a matrix with doubles that I can manipulate quickly.
There must be a faster method to extract this data and I don't know it. Perhaps you can help?
  2 Kommentare
dpb
dpb am 4 Jul. 2018
What's the data encoding scheme?
Paolo
Paolo am 4 Jul. 2018
What does your regexp look like? Perhaps it could be improved for efficiency.

Melden Sie sich an, um zu kommentieren.

Akzeptierte Antwort

Guillaume
Guillaume am 4 Jul. 2018
Bearbeitet: Guillaume am 4 Jul. 2018
How about:
filecontent = fileread(yourfile); %read everything at once
hexstrings = regexp(filecontent, '(?<=Value: \[)[^\]]*(?=\])', 'match');
decvalues = cell2mat(cellfun(@(hex) sscanf(hex, '%x-'), hexstrings, 'UniformOutput', false))';
This may be faster:
filecontent = fileread(yourfile); %read everything at once
hexstrings = regexp(filecontent, '(?<=Value: \[)[^\]]*(?=\])', 'match');
decvalues = reshape(sscanf(strjoin(hexstrings, '-'), '%x-'), [], numel(hexstrings))';
Unfortunately, textscan doesn't have a '%x' format specifier.
  2 Kommentare
dpb
dpb am 4 Jul. 2018
A major foobar, too... :( I added the enhancement request not long after it was introduced; like many simple but apparently not sexy-enough things, it's never made the cut of what gets attention.
Stanislav Steinberg
Stanislav Steinberg am 5 Jul. 2018
Guillaume, brilliant! So far it looks much faster than what I previously coded. Good to learn. Thanks.

Melden Sie sich an, um zu kommentieren.

Weitere Antworten (0)

Produkte


Version

R2018a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by