
CNN for text classification

5 views (last 30 days)
christian
christian on 6 Jul 2023
Commented: christian on 8 Jul 2023
Hi, I am working with textual data, and the main objective is to classify text strings into different classes. I have referred to the example in the link Classify Text Data Using Convolutional Neural Network - MATLAB & Simulink - MathWorks Italia.
I have a question regarding the role of the word embedding layer when the input is already represented as numbers due to the encoding process. Since the input is numerical, I am uncertain how the word embedding layer contributes to the classification task. Could you help me understand this?
Thank you in advance for your reply!

Accepted Answer

Pratyush
Pratyush on 7 Jul 2023
Edited: Pratyush on 7 Jul 2023
Hi Christian. The reason for applying a word embedding layer to the texts is to give them semantic meaning. You are right that the words are already represented as numbers after encoding, but those encodings are simply tokens and carry no semantic sense.
For example, we can represent the words "apple" and "fruit" with numbers, but there is no way to express the similarity between the two words through the encoded numbers alone. A word embedding assigns a vector to each word such that semantically similar words have vectors that are close together, and vice versa.
So after the word embedding layer, the vectors assigned to "fruit" and "apple" will be close. On the other hand, two words that are not semantically close, such as "fruit" and "car", will be assigned vectors that are far apart.
Applying the word embedding layer therefore makes training easier for the network. Hope this resolves your doubt.
You may refer to the resources below to learn more about the word embedding layer.
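The distance intuition above can be sketched in MATLAB with made-up vectors (these numbers are purely illustrative, not the weights of any trained model):

```matlab
% Hypothetical embedding vectors; in practice these would come from the
% trained weights of a wordEmbeddingLayer.
vApple = [0.9 0.8 0.1];
vFruit = [0.8 0.9 0.2];
vCar   = [-0.7 0.1 0.9];

% Euclidean distances: semantically similar words sit closer together.
dSimilar    = norm(vApple - vFruit)   % small distance: "apple" ~ "fruit"
dDissimilar = norm(vApple - vCar)     % large distance: "apple" vs "car"
```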
  3 Comments
Pratyush
Pratyush on 7 Jul 2023
Yes, you are right. The embedding layer provides a semantically meaningful vector for each word even though the words arrive in encoded format; encoding alone does not provide meaningful semantic representations.
The vocabulary is simply the set of unique words or tokens in the training dataset. Say the training dataset has 1023 unique words. The word embedding layer takes the number of unique words as a parameter (here 1023), and for each word index from 1 to 1023 it learns a semantically meaningful vector. Here is an example from the documentation you provided earlier.
wordEmbeddingLayer(embeddingDimension,numWords,Name="emb")
The second parameter, numWords, was computed earlier as the number of unique words or tokens in the training dataset, as shown below.
enc = wordEncoding(documentsTrain); % Encoding of words
numWords = enc.NumWords; % Total number of unique words
So the required vocabulary information is provided to the network through the numWords parameter.
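Putting those pieces together, a minimal sketch (assuming documentsTrain holds the tokenized training documents from the linked example; the embeddingDimension of 100 is an illustrative choice):

```matlab
% Assumes documentsTrain contains tokenizedDocument objects, as in the
% "Classify Text Data Using Convolutional Neural Network" example.
enc = wordEncoding(documentsTrain);   % assign an integer index to each unique word
numWords = enc.NumWords;              % vocabulary size, e.g. 1023
embeddingDimension = 100;             % length of each learned word vector (illustrative)

% The layer maps each of the numWords word indices to a trainable
% vector of length embeddingDimension.
emb = wordEmbeddingLayer(embeddingDimension,numWords,Name="emb");
```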
christian
christian on 8 Jul 2023
Thank you very much. Have a nice day!


More Answers (0)

