HMM-based model for named entity recognition (NER)

Since R2023a


    An hmmEntityModel object is a named entity recognition (NER) model that is based on a hidden Markov model (HMM).

    The addEntityDetails function automatically detects person names, locations, organizations, and other named entities in text. To train a custom model that predicts different tags, or to train a model on your own data, use the trainHMMEntityModel function.
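As a minimal sketch of the default behavior described above, the following code calls addEntityDetails without a Model option, so it does not use a custom hmmEntityModel object. The example sentence is made up for illustration, and the tags that appear in the Entity column depend on the default model in your release.

```matlab
% Tokenize an illustrative sentence (the text is a made-up example).
documents = tokenizedDocument("Mary moved to Natick, Massachusetts.");

% Detect entities with the default model (no Model option specified).
documents = addEntityDetails(documents);

% The Entity variable of the token details holds the predicted tags.
details = tokenDetails(documents);
disp(details(:,["Token" "Entity"]))
```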


    Train an HMM-based NER model using the trainHMMEntityModel function.



    Entities: Named entities, specified as a categorical array.

    Data Types: categorical

    Object Functions

    predict    Predict entities using named entity recognition (NER) model
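A rough sketch of calling predict directly on a trained model might look like the following. The training table here is a hypothetical toy example that is far too small to produce a useful model, and the call form predict(mdl,document) and its return format are assumptions not confirmed on this page, so the code only displays whatever is returned. Check the predict reference page for the exact syntax.

```matlab
% Hypothetical toy training data with the Token and Entity variables
% that trainHMMEntityModel expects; not representative of real data.
Token = ["Analyze";"text";"in";"MATLAB";"using";"Simulink";"."];
Entity = ["non-entity";"non-entity";"non-entity";"product"; ...
    "non-entity";"product";"non-entity"];
tbl = table(Token,Entity);

% Train the HMM-based NER model.
mdl = trainHMMEntityModel(tbl);

% Predict entities for a new tokenized document.
% Assumption: predict accepts the model and a tokenizedDocument.
document = tokenizedDocument("MathWorks develops MATLAB and Simulink.");
result = predict(mdl,document);
disp(result)
```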



    Read the example entity data from exampleEntities.csv into a table.

    tbl = readtable("exampleEntities.csv",TextType="string");

    View the first few rows of the table. The table has two columns, Token and Entity, that contain the tokens and their entities, respectively.

    head(tbl)

                 Token                 Entity   
        ________________________    ____________
        "Analyze"                   "non-entity"
        "text"                      "non-entity"
        "in"                        "non-entity"
        "MATLAB"                    "product"   
        "using"                     "non-entity"
        "Text Analytics Toolbox"    "product"   
        "."                         "non-entity"
        "Engineers"                 "non-entity"

    Train an HMM-based NER model using the trainHMMEntityModel function.

    mdl = trainHMMEntityModel(tbl)
    mdl = 
      hmmEntityModel with properties:
        Entities: [3×1 categorical]

    View the entities of the model.

    mdl.Entities

    ans = 3×1 categorical

    To add entity details to documents using the trained hmmEntityModel object, use the addEntityDetails function and set the Model option to the trained NER model.

    Create a tokenized document containing text data.

    str = "MathWorks develops MATLAB and Simulink.";
    document = tokenizedDocument(str);

    Add entity details using the trained hmmEntityModel object and view the updated token details using the tokenDetails function. The Entity column contains the predicted entities.

    document = addEntityDetails(document,Model=mdl);
    details = tokenDetails(document)
    details=6×8 table
           Token       DocumentNumber    SentenceNumber    LineNumber       Type        Language      PartOfSpeech          Entity   
        ___________    ______________    ______________    __________    ___________    ________    _________________    ____________
        "MathWorks"          1                 1               1         letters           en       proper-noun          organization
        "develops"           1                 1               1         letters           en       verb                 non-entity  
        "MATLAB"             1                 1               1         letters           en       proper-noun          product     
        "and"                1                 1               1         letters           en       coord-conjunction    non-entity  
        "Simulink"           1                 1               1         letters           en       proper-noun          product     
        "."                  1                 1               1         punctuation       en       punctuation          non-entity  

    Extract the tokens that are named entities.

    idx = details.Entity ~= "non-entity";
    details(idx,["Token" "Entity"])
    ans=3×2 table
           Token          Entity   
        ___________    ____________
        "MathWorks"    organization
        "MATLAB"       product     
        "Simulink"     product     
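The same logical indexing extends to other summaries of the token details. The sketch below rebuilds only the Token and Entity variables from the output shown above by hand, so it runs without a trained model. The variable names products and counts are illustrative.

```matlab
% Rebuild the Token and Entity variables from the example output.
Token = ["MathWorks";"develops";"MATLAB";"and";"Simulink";"."];
Entity = categorical(["organization";"non-entity";"product"; ...
    "non-entity";"product";"non-entity"]);
details = table(Token,Entity);

% Select the tokens tagged with one specific entity.
products = details.Token(details.Entity == "product");
disp(products)

% Count how many tokens carry each tag.
counts = groupcounts(details,"Entity");
disp(counts)
```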

    Version History

    Introduced in R2023a