How do I split a string using only a single whitespace delimiter so any multiple whitespaces are also "split"?

Question

deathtime on 30 May 2022

Votes: 0

Direct link to this question

https://de.mathworks.com/matlabcentral/answers/1730140-how-do-i-split-a-string-using-a-only-single-whitespace-delimiter-so-any-multiple-whitespaces-are-a

Commented: dpb on 30 May 2022

I am aware of how to use strsplit to split a string using whitespace delimiters. However, if there are multiple whitespaces between two strings of text, the whole run of whitespace is treated as a single delimiter (as it should be).

Consider the example in which I have a string:

string_line = 'Time Aero Inertial          Total'

In this example, each individual string of text in string_line is separated by one whitespace, except between "Inertial" and "Total". In this instance, there are 10 whitespaces.

If I use:

split_string = strsplit(string_line)

I end up with a 1x4 cell, consisting of the strings of text in string_line separated by a whitespace i.e. "Time", "Aero", "Inertial" and "Total".

How do I create a 1x5 cell in which the 8 whitespaces left between "Inertial" and "Total" (the 10 whitespaces minus one delimiter on each side) are included in split_string as a blank cell? Thus, split_string should consist of: "Time", "Aero", "Inertial", "        " and "Total".

2 Comments

dpb on 30 May 2022

I presume the general use case will not be fixed-length strings???

I'm certain (well almost certain) a whizard could write a regular expression for the purpose; that someone isn't me, however.

I'd probably take the expedient of introducing delimiters into the string first.

The new(ish) pattern function might be able to help build an appropriate expression to pass to split; it's a poor man's way to build regular expressions. I've used it for some very trivial cases, but not for anything specifically apropos to this. I think you can ask it for patterns of at least N characters, which could possibly let you return the blank field.

Alternatively, how did you get such a string to begin with? Can it be read in such that it maintains the fields one presumes it had at one time? Or, if it is a header line in a computer-generated output file, it might be fixed length, and perhaps one could use an import object to define the format so as to read the five variables, one of them being interpreted as a missing variable.

deathtime on 30 May 2022

The example I used was for the header line of a data file; I thought there might be a straightforward way to do it using delimiters, which is why I made up that example.

The strings that need to be split are actually the rows of data under each of those headings. The empty header in my example is not actually an empty header in my data file, but the column below it is empty. Upon your suggestion of fixed length separation, I just checked that each column has a width of 15 characters/spaces.

However, I need to split the strings on a line by line basis. The file consists of other data with different layout before and after the columns of interest - and this happens multiple times in the file.
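Since each column turned out to be 15 characters wide, a fixed-width split may be simpler than any delimiter logic. The following is only a minimal sketch, assuming every field is exactly 15 characters wide and left-aligned. The sample row and the names width, row, fields, and values are made up for illustration and do not come from the actual data file:

```matlab
% Hypothetical data row: five 15-character fields, the fourth one blank.
width = 15;                                   % column width reported above
row = [sprintf('%-15s', '0.10', '1.25', '3.50'), blanks(width), sprintf('%-15s', '5.00')];

% Pad the row to a whole number of fields (assumption: a short last
% field has only lost its trailing blanks).
nFields = ceil(numel(row) / width);
row(end+1:nFields*width) = ' ';

% reshape puts one field in each column; transpose so each row is a field.
% cellstr strips trailing blanks, so the blank column becomes an empty char.
fields = cellstr(reshape(row, width, nFields).');

% Numeric view of the same row; the blank field becomes NaN.
values = str2double(fields);
```

Because the split relies on character positions only, it can be applied to each line of interest on its own, whatever the rest of the file looks like.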


Answer 1

Stephen23 on 30 May 2022

Votes: 2

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/1730140-how-do-i-split-a-string-using-a-only-single-whitespace-delimiter-so-any-multiple-whitespaces-are-a#answer_974470

Edited: Stephen23 on 30 May 2022

Here are a few approaches using regular expressions. Note that 'split', 'match', etc. may behave differently when there are leading or trailing delimiter characters.

Tx = 'Time Aero Inertial          Total';
C1 = regexp(Tx,'\S+|(?<=\s)\s+(?=\s)','match')
C1 = 1×5 cell array
    {'Time'}    {'Aero'}    {'Inertial'}    {'        '}    {'Total'}
C2 = regexp(Tx,'(?<=\S)\s|\s(?=\S)','split')
C2 = 1×5 cell array
    {'Time'}    {'Aero'}    {'Inertial'}    {'        '}    {'Total'}
C3 = regexp(Tx,'\s(\s+)\s|(\S+)','tokens');
C3 = [C3{:}]
C3 = 1×5 cell array
    {'Time'}    {'Aero'}    {'Inertial'}    {'        '}    {'Total'}

Checking:

isequal(C1,C2,C3)
ans = logical
   1
cellfun(@numel,C1)
ans = 1×5
     4     4     8     8     5
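The question mentions splitting on a line-by-line basis. As a rough sketch, the first expression above can be applied to several lines in one call, because regexp accepts a cell array of character vectors. The two sample lines and the names txt, expr, parts, header, and values are assumptions for illustration; blanks(10) is used only to make the 10-space gap explicit:

```matlab
% Hypothetical input: a header line and one data line, each with a
% 10-space gap where the fourth field would be.
txt = {['Time Aero Inertial', blanks(10), 'Total'], ...
       ['0.1 2.5 3.75',       blanks(10), '9']};

% Either a run of non-whitespace, or the whitespace that has at least one
% whitespace character on both sides (the gap minus its two delimiters).
expr = '\S+|(?<=\s)\s+(?=\s)';

% With cell input, regexp returns one cell of matches per line.
parts = regexp(txt, expr, 'match');

header = parts{1};               % 1x5 cell, fourth entry is 8 blanks
values = str2double(parts{2});   % the blank field converts to NaN
```

Converting with str2double is one way to turn the blank field into a missing value; strtrim on the cell array would give an empty character vector instead.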

2 Comments

Jan on 30 May 2022

Edited: Jan on 30 May 2022

Your C2 considers trailing spaces, the others don't.

dpb on 30 May 2022

Those who have delved into the regexp world and come out functional are a different breed! :)

I've never been able to get anywhere with the attempts I've made when I needed anything out of the ordinary, and I don't have the patience to learn the depth of understanding needed to have even the rudiments down to start with. Fortunately, now being older than most topsoil, I'm past the point of ever having to have done so... :)


Answer 2

DGM on 30 May 2022

Votes: 1

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/1730140-how-do-i-split-a-string-using-a-only-single-whitespace-delimiter-so-any-multiple-whitespaces-are-a#answer_974450

I'm totally not the one to ask for elegant regex, but there are always workarounds.

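thischar = 'Time Aero Inertial          Total';
C = regexp(thischar,'(\s*)','tokenextents'); % [start end] of each whitespace run
C = unique([C{:}]); % first and last index of every run (one index for a single space)
fdelim = char(10); % pick some delimiter character that does not occur in the text
thischar(C) = fdelim; % overwrite those positions with the delimiter
splitchar = split(thischar,fdelim) % split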
splitchar = 5×1 cell array
    {'Time'    }
    {'Aero'    }
    {'Inertial'}
    {'        '}
    {'Total'   }

I suppose you could always use the indices in C to split the vector directly, but this approach was easy enough.
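For what it's worth, splitting directly on the indices might look something like the sketch below. It is not part of the answer above: the names str, runStart, runEnd, delim, starts, stops, and fields are made up for illustration, and the sketch assumes the text has no leading or trailing whitespace:

```matlab
% Same sample text as above; blanks(10) makes the 10-space gap explicit.
str = ['Time Aero Inertial', blanks(10), 'Total'];

% Start and end index of every whitespace run.
[runStart, runEnd] = regexp(str, '\s+');

% The first and last character of each run act as delimiters
% (a single space contributes just one index).
delim = unique([runStart, runEnd]);

% Each field lies between two consecutive delimiter positions.
starts = [1, delim + 1];
stops  = [delim - 1, numel(str)];
fields = arrayfun(@(a, b) str(a:b), starts, stops, 'UniformOutput', false);
```

This avoids having to pick a delimiter character that is guaranteed not to appear in the text.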

1 Comment

Jan on 30 May 2022

For trailing spaces, empty matrices are appended to the output.


Answer 3

Jan on 30 May 2022

Votes: 0

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/1730140-how-do-i-split-a-string-using-a-only-single-whitespace-delimiter-so-any-multiple-whitespaces-are-a#answer_974515

Edited: Jan on 30 May 2022

The dull method:

s = 'Time Aero Inertial          Total';
d = diff(s == ' ');    % +1: non-space to space, -1: space to non-space
s(d == -1)     = '*';  % Last space before each run of non-spaces
s([0, d] == 1) = '*';  % First space after each run of non-spaces
s                      % Just for demonstration:
s = 'Time*Aero*Inertial*        *Total'
t = strsplit(s, '*')
t = 1×5 cell array
    {'Time'}    {'Aero'}    {'Inertial'}    {'        '}    {'Total'}

With 1 trailing space, a [] is appended to the output. With more trailing spaces, a block of spaces is returned there as well.
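If leading or trailing spaces should not show up as fields, one option is to trim the line first. A minimal sketch of the same approach, assuming '*' never occurs in the data and that surrounding whitespace can simply be discarded. The 'CollapseDelimiters' setting is an assumption about the desired result for a two-space gap, where both spaces get marked and an empty field is then kept between them:

```matlab
% Sample text with 3 trailing spaces added; blanks() makes the gaps explicit.
s = ['Time Aero Inertial', blanks(10), 'Total', blanks(3)];

s = strtrim(s);               % drop leading/trailing whitespace first

d = diff(s == ' ');           % +1: non-space to space, -1: space to non-space
s(d == -1)     = '*';         % last space before each run of non-spaces
s([0, d] == 1) = '*';         % first space after each run of non-spaces

% Keep empty fields between adjacent markers instead of merging them.
t = strsplit(s, '*', 'CollapseDelimiters', false);
```

For this sample the result has the same five fields as before, with no extra entry at the end.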
