What is the difference between test data and training data?

Question

0 votes

What is the difference between test data and training data?

Answer 1

Thomas Koelen on 12 May 2015

1 vote

In a dataset, the training set is used to build a model, while the test (or validation) set is used to validate the model that was built. Data points in the training set are excluded from the test (validation) set. Usually, in each iteration, a dataset is divided either into a training set and a validation set (some people say 'test set' instead), or into a training set, a validation set and a test set.
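
As a rough illustration of such a split, here is a minimal Python sketch. Python is used purely for illustration; the function name `split_dataset`, the 60/20/20 fractions and the fixed seed are assumptions of this sketch, not something prescribed above. It shuffles the data and cuts it into three sets that share no data points.

```python
import random

def split_dataset(items, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle items and cut them into disjoint train / validation / test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever is left over
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
# No data point appears in more than one set.
print(set(train) & set(val), set(train) & set(test), set(val) & set(test))
```

Changing `seed` gives a different division of the same data, which is one way to get a new split "in each iteration".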

1 comment

ALAMGIR SARDAR on 27 Aug 2019

Thanks

Answer 2

Walter Roberson on 12 May 2015

1 vote

To expand on this a small bit:

You run calculations on the training set to determine various coefficients.

You can then use the testing set to check how well the predictions do on a wider set of data, and that gives you information about false positives and false negatives.
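
A minimal Python sketch of those two steps, under stated assumptions: the "coefficient" is a single threshold on one feature, placed midway between the class means of the training set, and the data values are made up for illustration. The names `fit_threshold` and `confusion_counts` are hypothetical.

```python
def fit_threshold(train):
    """Training step: put the threshold midway between the two class means."""
    pos = [x for x, label in train if label == 1]
    neg = [x for x, label in train if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def confusion_counts(threshold, test):
    """Testing step: count true/false positives and negatives on unseen data."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for x, label in test:
        predicted = 1 if x > threshold else 0
        if predicted == 1:
            counts["tp" if label == 1 else "fp"] += 1
        else:
            counts["tn" if label == 0 else "fn"] += 1
    return counts

# Made-up (value, label) pairs; the test set is never used for fitting.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
test = [(1.5, 0), (4.0, 0), (5.5, 0), (4.2, 1), (6.5, 1), (9.0, 1)]

threshold = fit_threshold(train)
counts = confusion_counts(threshold, test)
print(threshold, counts)
```

With this data the threshold comes out at 4.5, and the test set yields one false positive (5.5 labelled 0) and one false negative (4.2 labelled 1), neither of which could be seen from the training data alone.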

You can use those accuracy figures to go back and re-train. You do not need to use the same division of training and test data each time: there is a common technique called "leave one out" where you deliberately drop one item at a time from the training set and re-calculate, in case that one was an outlier that was preventing getting a good overall result.
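
One common form of this is leave-one-out cross-validation: each data point in turn is held out, the model is built from all the remaining points, and the held-out point is then predicted. The Python sketch below assumes a deliberately simple model (the label of the nearest remaining point) and made-up data; `nearest_label` and `leave_one_out` are hypothetical names.

```python
def nearest_label(train, x):
    """Predict the label of the training point closest to x."""
    return min(train, key=lambda item: abs(item[0] - x))[1]

def leave_one_out(data):
    """Hold out each point once; return accuracy and indices of the misses."""
    misses = []
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]   # everything except point i
        if nearest_label(rest, x) != label:
            misses.append(i)
    return 1 - len(misses) / len(data), misses

# Made-up (value, label) pairs; (6.5, 0) sits among the class-1 points.
data = [(1.0, 0), (2.0, 0), (3.0, 0), (6.5, 0), (7.0, 1), (8.0, 1), (9.0, 1)]

accuracy, misses = leave_one_out(data)
print(accuracy, misses)
```

Here the misses are indices 3 and 4: the odd point at 6.5 is itself mispredicted, and it also drags its neighbour at 7.0 to the wrong class, which is the kind of outlier effect this technique can expose.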

There is a nasty problem in doing classification called "overtraining": the calculations might fit the data you have on hand extremely well but be useless for anything else. Dividing into training and testing reduces this risk: if the algorithm has not seen a bunch of data in its calculations, then it is not going to adjust itself to be exactly right for that data and bad for other things. Using all of your data to train with is therefore not a good idea.
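
To make the effect concrete, here is a deliberately extreme Python sketch, assuming made-up data and two hypothetical models: a "memorizer" that stores every training point exactly, and a simple threshold rule. Both are perfect on the training set; only the held-out test set shows which one is useless elsewhere.

```python
def fit_memorizer(train):
    """Overtrained model: a lookup table of the exact training inputs."""
    table = {x: label for x, label in train}
    return lambda x: table.get(x, 0)   # unseen inputs fall back to class 0

def fit_threshold(train):
    """Simple model: threshold midway between the two class means."""
    pos = [x for x, label in train if label == 1]
    neg = [x for x, label in train if label == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > threshold else 0

def accuracy(model, data):
    """Fraction of points whose predicted label matches the true label."""
    return sum(model(x) == label for x, label in data) / len(data)

# Made-up (value, label) pairs; no test value appears in the training set.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
test = [(1.5, 0), (2.5, 0), (3.5, 0), (5.5, 1), (6.5, 1), (7.5, 1)]

for name, fit in [("memorizer", fit_memorizer), ("threshold", fit_threshold)]:
    model = fit(train)
    print(name, accuracy(model, train), accuracy(model, test))
```

The memorizer scores 1.0 on the training data but only 0.5 on the test data, while the threshold rule scores 1.0 on both. Had all twelve points been used for training, the two models would have looked equally good.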

After the program has gone back and forth on training sets and validation sets (the stage where the data was allowed to affect the algorithm) and has decided on the best coefficients, it is time to run it on the remaining data and produce a report. The rest of the data might or might not have a known classification. If the classifications are known, then when the programmer looks at the report, the programmer might decide it is time to change the program. Or might not. The report is the kind of thing that gets written up in a paper: we did this and that, and with a limited subset of data to train and test with, we did this well on real data. Or perhaps you send it to the people designing the equipment and experiments so they can see what needs to be improved on their end. Eventually you publish the paper or write a report or the like, and other people read it and want to use your program too. But they aren't going to do that if you haven't established evidence that it is not overtraining on the particular data you gave it. Seeing how well it did on data that was not used to design the details of the algorithm is that evidence.
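
A minimal Python sketch of that last stage, under stated assumptions: the candidate "coefficients" are three hypothetical thresholds assumed to come from earlier training runs, the validation set picks among them, and the remaining held-out data is scored once for the report. All values are made up.

```python
def accuracy(threshold, data):
    """Fraction of points classified correctly by the rule x > threshold."""
    return sum((1 if x > threshold else 0) == label for x, label in data) / len(data)

def select_threshold(candidates, validation):
    """Pick the candidate with the highest validation accuracy (first wins ties)."""
    return max(candidates, key=lambda t: accuracy(t, validation))

# Hypothetical thresholds, assumed to come from earlier training runs.
candidates = [2.0, 4.5, 7.0]
validation = [(1.5, 0), (3.0, 0), (4.0, 0), (5.0, 1), (6.0, 1), (8.0, 1)]
# Remaining data with known labels, never used to choose anything.
held_out = [(2.5, 0), (3.5, 0), (5.5, 1), (6.5, 1), (4.8, 0)]

best = select_threshold(candidates, validation)
print("chosen threshold:", best)
print("validation accuracy:", accuracy(best, validation))
print("reported accuracy on remaining data:", accuracy(best, held_out))
```

With these numbers the chosen threshold is 4.5, which is perfect on the validation set but reaches 0.8 on the remaining data. The second figure is the one that belongs in the report, because that data had no influence on the choice.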

2 comments

Isabel Hostettler on 15 Feb 2017

I've just read your answer. May I ask for advice? I've come across the sentence: "quality of prediction was estimated to be good if the difference between the training and test dataset was <5 and acceptable if it was <10%". My question is: how did the person choose these differences as good or acceptable, respectively? Is that the difference one always takes, or is there a rule? Is there a reference to relate to? Advice would be much appreciated. Isabel

NN on 9 Mar 2021

About the leave-one-out part: how is it done? Is it by leaving out one data point and taking the rest again as test data?
