textanalytics.unicode.UTF32

Unicode UTF-32 string representation

Since R2021a

    Description

    The 32-bit Unicode transformation format (UTF-32) is a fixed-length Unicode encoding that uses exactly 32 bits for every code point.
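
    To illustrate what "fixed-length" means in practice, the following sketch compares the storage needed by UTF-32, UTF-16, and UTF-8 for the code points that appear in the examples on this page. It uses only base MATLAB arithmetic; the variable names are illustrative.

```matlab
% Code points taken from the Data property shown in the examples on this
% page: "Hello! " followed by U+1F600.
codePoints = uint32([72 101 108 108 111 33 32 128512]);

% UTF-32: always one 32-bit code unit (4 bytes) per code point.
utf32Bytes = 4 * numel(codePoints);

% UTF-16: one 16-bit code unit, or two for code points above U+FFFF.
utf16Units = numel(codePoints) + nnz(codePoints > 65535);

% UTF-8: one to four bytes, depending on the code point value.
utf8Bytes = sum(1 + (codePoints > 127) + (codePoints > 2047) + (codePoints > 65535));

fprintf("UTF-32: %d bytes, UTF-16: %d bytes, UTF-8: %d bytes\n", ...
    utf32Bytes, 2*utf16Units, utf8Bytes);
```

    Only the UTF-32 size can be computed from the number of code points alone; the other two depend on the code point values.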

    Creation

    Description

    str32 = textanalytics.unicode.UTF32(str) returns the Unicode UTF-32 representation of str. If str is an array, then str32(i) is the Unicode UTF-32 representation of the string str(i).

    Input Arguments

    str: Input text, specified as a string array, character vector, or cell array of character vectors.

    Example: ["An example of a short sentence."; "A second short sentence."]

    Data Types: string | char | cell
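
    As a sketch of the array behavior described under Creation, the following code converts the two-element string array from the example value above and inspects the second element. It requires Text Analytics Toolbox. Using the code point count as a check relies on the assumption that the text contains no surrogate pairs.

```matlab
% Requires Text Analytics Toolbox.
str = ["An example of a short sentence."; "A second short sentence."];
str32 = textanalytics.unicode.UTF32(str);

% str32(i) is the UTF-32 representation of str(i).
second = str32(2);

% This text contains no surrogate pairs, so the number of code points
% should equal the string length.
numCodePoints = numel(second.Data);
disp(numCodePoints)
```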

    Properties

    Data: UTF-32 code points, specified as a vector of integers with type uint32.

    If the input string contains surrogate pairs, then the vector of code points is shorter than the input string, because each surrogate pair (two UTF-16 code units) maps to a single code point.

    Data Types: uint32
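
    The length difference comes from how UTF-16 encodes code points above U+FFFF. The following sketch applies the standard UTF-16 decoding arithmetic to the surrogate pair for U+1F600, using only base MATLAB. The variable names are illustrative.

```matlab
% UTF-16 code units of U+1F600, written as a surrogate pair.
high = 55357;   % 0xD83D, high (leading) surrogate
low  = 56832;   % 0xDE00, low (trailing) surrogate

% Standard UTF-16 decoding: each surrogate contributes 10 bits,
% and the result is offset by 0x10000 (65536).
codePoint = (high - 55296) * 1024 + (low - 56320) + 65536;

% In MATLAB, the pair occupies two char elements but is one code point.
pairText = string(char([high low]));
fprintf("char elements: %d, code point: %d (U+%X)\n", ...
    strlength(pairText), codePoint, codePoint);
```

    The result, 128512, matches the last entry of the Data property in the examples on this page.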

    Object Functions

    characterCategories    Unicode character categories
    hex                    Convert UTF-32 representation to hexadecimal values
    string                 Convert UTF-32 representation to string

    Examples

    Convert the string "Hello! 😀" to its Unicode UTF-32 string representation using the textanalytics.unicode.UTF32 function.

    str = "Hello! 😀";
    str32 = textanalytics.unicode.UTF32(str)
    str32 = 
      UTF32 with properties:
    
        Data: [72 101 108 108 111 33 32 128512]
    
    

    Get the Unicode character categories of str32 using the characterCategories function.

    ucats = characterCategories(str32)
    ucats = 1x1 cell array
        {[L    L    L    L    L    P    Z    S]}
    
    

    The Unicode character categories "L", "P", "Z", and "S" correspond to "letter", "punctuation", "separator", and "symbol", respectively.

    Get the Unicode character categories of str32 using the characterCategories function. To return detailed Unicode character categories, set the 'Granularity' option to 'detailed'.

    ucats = characterCategories(str32,'Granularity','detailed')
    ucats = 1x1 cell array
        {[Lu    Ll    Ll    Ll    Ll    Po    Zs    So]}
    
    

    The Unicode character categories "Lu", "Ll", "Po", "Zs", and "So" correspond to "uppercase letter", "lowercase letter", "other punctuation", "space separator", and "other symbol", respectively.
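
    The detailed and general categories are related: the first letter of a detailed category is its general category. The sketch below shows this with the detailed categories from the output above, entered by hand as a string array. The actual type returned by characterCategories may differ, so treat the variable contents as an assumption.

```matlab
% Detailed categories copied from the example output, entered as strings.
detailed = ["Lu" "Ll" "Ll" "Ll" "Ll" "Po" "Zs" "So"];

% The general category is the first letter of the detailed category.
general = extractBefore(detailed, 2);

disp(general)
```

    The result matches the general categories "L", "P", "Z", and "S" returned without the 'Granularity' option.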

    Convert str32 to hexadecimal values using the hex function.

    hexStr = hex(str32)
    hexStr = 
    " 0048  0065  006C  006C  006F  0021  0020 1F600"
    

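    To go the other way, the hexadecimal text can be parsed back into numeric code points with base MATLAB functions. The following sketch uses the output shown above, entered by hand, and also formats each value in the common U+XXXX notation. The variable names are illustrative.

```matlab
% Output of hex(str32) from the example, entered by hand.
hexStr = " 0048  0065  006C  006C  006F  0021  0020 1F600";

% sscanf skips whitespace and reads each hexadecimal field.
codePoints = sscanf(char(hexStr), '%x');

% Format each value in U+XXXX notation, padded to at least four digits.
labels = compose("U+%04X", codePoints);

disp(codePoints')
disp(labels')
```
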
    Convert str32 to string using the string function.

    str = string(str32)
    str = 
    "Hello! 😀"
    

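    A round trip through UTF-32 is a quick consistency check. The sketch below requires Text Analytics Toolbox and assumes that converting well-formed text to UTF-32 and back returns the original string; that is an expectation, not a guarantee stated on this page. The emoji is built from its UTF-16 surrogate pair so that the code does not depend on the file encoding.

```matlab
% Requires Text Analytics Toolbox.
% Build "Hello! " followed by U+1F600 from its UTF-16 surrogate pair.
original = "Hello! " + string(char([55357 56832]));

str32 = textanalytics.unicode.UTF32(original);
roundTrip = string(str32);

% Assumed to be true for well-formed input text.
tf = (roundTrip == original);
disp(tf)
```
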
    References

    [1] Unicode Standard Annex #19: UTF-32. https://www.unicode.org/reports/tr19/tr19-9.html

    Version History

    Introduced in R2021a