How to extract formatted hex data from a text file?

Question

0 votes

I have a text file with tens of thousands of lines formatted like so:

...
[11:52:30.739] :   Value: [BC-7F-00-00-F7-F6-2A-D7-24-D9-81-EE-6A-DD-08-D4-BD-D5-09-E1-10-F5-22-DB-20-DE-55-EA-21-D9-22-D1-EA-D5-45-E5-95-F8-2A-D7-22-DA-12-EF-95-DD-2A-D3-85-D4-5A-E1-EB-F6-2A-DB-22-DE-54-EA-A2-D8-7D-CF-15-D4-0A-E5-FB-FA-88-D6-A5-D9-D1-EA-80-DC-95-D1-24-D2-2D-E1-D5-F9-AA-DA-DA-DD-AA-E6-BD-D7-56-CE-D5-D1-5A-E5-5B-FD-D5-D5-55-DA-A8-E9-DA-DB-EF-CF-11-D0-D5-E1-15-FC-EA-D9-21-DE-D5-E6-A0-D6-B5-CD-0A-D0-EE-E6-55-FF-20-D5-55-DB-DE-E9-D5-DA-96-CE-2A-CE-5A-E3-DD-FC-EA-D8-F6-DD-55-E7-D5-D5-8A-CE-16-D0-A8-E8-15-FE-D4-D5-6A-DC-D5-EA-AA-DA-2A-CF-22-CE-A5-E4-B5-FC-6A-D9-4A-DE-AA-E8-FA-D5-A2-CF-12-D1-AA-E9-A0-FD-4A-D6-55-DC-EA-EB-95-DA-F5-CF-2A-CF-94-E4-EA-FB-CA-D9-40-DE-D5-E8-5A-D6-44-D0-2A-D2-B5-E9-2A-FD-40-D6-AA-DC-DB-EB-AB-DA-E8-CF-55-CF-6A-E4]
[11:52:30.777] :   Value: [CB-7F-00-00-FF-FB-2A-D5-2A-DA-2D-EC-95-DB-B6-D0-2A-D1-B5-E5-DA-FA-15-D9-2A-DF-B6-E8-7F-D7-02-CD-55-D1-45-E9-4A-FE-AA-D4-D5-DA-D5-EB-EA-DB-0A-CF-20-D0-AA-E5-AF-FC-69-D9-DA-DE-AA-E6-55-D7-2A-CC-55-D0-56-E9-95-FF-00-D5-81-DA-55-E8-95-DB-95-CE-22-CF-56-E6-F6-FD-D5-D8-A4-DE-55-E5-5B-D6-B5-CC-D2-CF-16-EB-2A-01-85-D4-29-DB-AA-EA-AA-DA-BF-CE-D5-CE-44-E7-2A-FC-90-D9-4A-DD-2A-E5-A8-D6-AA-CD-55-D1-AD-E9-AA-FE-35-D4-EA-D8-24-E8-CA-DA-6D-D0-6A-D0-B5-E5-DD-FC-B6-D8-6A-DD-56-E5-85-D6-D2-CD-A5-D0-AB-E9-11-00-82-D4-15-DA-55-EA-0A-DB-15-D0-55-CF-6A-E6-50-FA-EA-D9-40-DC-AA-E5-28-D7-AA-CD-6E-D1-A4-E8-55-FB-7A-D5-55-D7-2A-E8-B6-DA-95-D1-7A-D1-5D-E5-57-F8-7D-D9-20-DD-0A-E7-DE-D7-20-CE-1A-D2-D5-E7-94-FC-5A-D5-89-D8-4A-EA-94-DB-AB-D1-D6-D1-AA-E4]
...

I need to extract the hex data in the following way: The first 32 bits are an index, and the following words are actual data.
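As an illustration of that layout, here is a minimal sketch that splits the first eight bytes of the first sample line into an index and data words. The question does not state the byte order or the word size, so little-endian order and unsigned 16-bit words are assumptions here, and the variable names are only illustrative.

```matlab
% First eight byte values of the first sample line, as a 1x8 row of doubles.
bytes = sscanf('BC-7F-00-00-F7-F6-2A-D7', '%x-')';

% ASSUMPTION: little-endian byte order (not stated in the question).
% The first 4 bytes (32 bits) form the index.
index = sum(bytes(1:4) .* 256.^(0:3));     % 188 + 127*256 = 32700

% ASSUMPTION: the remaining bytes are unsigned 16-bit little-endian words.
payload = reshape(bytes(5:end), 2, []);    % one column per word: [low; high]
words   = [1 256] * payload;               % low + 256*high -> [63223 55082]
```

If the data turn out to be big-endian or signed, only the weights and the final conversion would change; the split into "first four bytes" and "the rest" stays the same.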

Currently I'm parsing the file line-by-line and extracting the hex data using regexp. This is slow. In the end I'm getting a matrix with doubles that I can manipulate quickly.

There must be a faster method to extract this data and I don't know it. Perhaps you can help?

2 Comments

dpb on 4 Jul 2018

What's the data encoding scheme?

Paolo on 4 Jul 2018

What does your regexp look like? Perhaps it could be improved for efficiency.


Answer 1

Guillaume on 4 Jul 2018

Edited: Guillaume on 4 Jul 2018

0 votes

How about:

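filecontent = fileread(yourfile);  % read the whole file at once into a char vector
% grab the text between 'Value: [' and the closing ']' of every line
hexstrings = regexp(filecontent, '(?<=Value: \[)[^\]]*(?=\])', 'match');
% parse each string into a column of byte values, then build one row per line
decvalues = cell2mat(cellfun(@(hex) sscanf(hex, '%x-'), hexstrings, 'UniformOutput', false))';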

This may be faster:

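filecontent = fileread(yourfile);  % read the whole file at once into a char vector
hexstrings = regexp(filecontent, '(?<=Value: \[)[^\]]*(?=\])', 'match');
% join all lines and parse them with a single sscanf call, then reshape to one row per line
decvalues = reshape(sscanf(strjoin(hexstrings, '-'), '%x-'), [], numel(hexstrings))';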

Unfortunately, textscan doesn't have a '%x' format specifier.
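To try both variants without the original file, the following sketch runs them on two shortened sample lines. The shortened lines are made up for illustration (real lines are much longer), and `filecontent` is built in memory instead of being read with `fileread`.

```matlab
% Two shortened sample lines (hypothetical; stands in for fileread(yourfile)).
filecontent = sprintf('%s\n%s\n', ...
    '[11:52:30.739] :   Value: [BC-7F-00-00-F7-F6]', ...
    '[11:52:30.777] :   Value: [CB-7F-00-00-FF-FB]');

% Text between 'Value: [' and the closing ']' of each line.
hexstrings = regexp(filecontent, '(?<=Value: \[)[^\]]*(?=\])', 'match');

% Variant 1: convert each line separately, then assemble the matrix.
a = cell2mat(cellfun(@(hex) sscanf(hex, '%x-'), hexstrings, 'UniformOutput', false))';

% Variant 2: one sscanf call on the joined text, then reshape.
b = reshape(sscanf(strjoin(hexstrings, '-'), '%x-'), [], numel(hexstrings))';

% Both give the same 2x6 matrix of doubles, one row per line.
assert(isequal(a, b));
```

Both variants rely on every line holding the same number of bytes. If the counts differ, `cell2mat` raises an error, and `reshape` either raises an error or, when the total happens to divide evenly, returns misaligned rows, so a quick check of the string lengths in `hexstrings` is worthwhile on real data.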

2 Comments

dpb on 4 Jul 2018

A major foobar, too... :( I added the enhancement request not long after it was introduced; like many simple but apparently not sexy-enough things, it's never made the cut of what gets attention.

Stanislav Steinberg on 5 Jul 2018

Guillaume, brilliant! So far it looks much faster than what I previously coded. Good to learn. Thanks.
