> Missingness in training data & the submission

I'm trying to upload my first submission and I've seen that I need to upload a prediction for every observation (at least for the variables I am predicting). I am wondering what to do about missingness in the training set.

For my submission I am including the true values of the outcomes where they are available, and the predicted values for observations that are in the prediction set OR that are in the training set but have a missing outcome. Is this the approach others have taken?
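To make the approach concrete, here is a minimal sketch in plain Python. The IDs, the values, the set of tokens treated as missing, and the function names `is_missing` and `fill_submission` are all made up for illustration; they are not part of the competition's tooling.

```python
import math

def is_missing(value):
    """Treat None, NaN, empty strings and 'NA'-like tokens as missing (an assumed convention)."""
    if value is None:
        return True
    if isinstance(value, float):
        return math.isnan(value)
    return str(value).strip() in {"", "NA", "NaN", "nan"}

def fill_submission(observed, predicted):
    """Use the observed outcome where it exists, otherwise fall back to the prediction."""
    submission = {}
    for obs_id, pred in predicted.items():
        true_value = observed.get(obs_id)  # None if the id is not in the training set
        submission[obs_id] = pred if is_missing(true_value) else true_value
    return submission

# Hypothetical data: ids 1-3 are in the training set, id 4 is in the prediction set only.
observed = {1: 3.5, 2: None, 3: float("nan")}
predicted = {1: 3.1, 2: 2.8, 3: 2.9, 4: 3.0}

combined = fill_submission(observed, predicted)
print(combined)
```

Because the loop runs over `predicted`, every observation with a prediction ends up with exactly one value, so the result has no gaps as long as the model predicts for every id.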

Posted by: tdavidson @ June 17, 2017, 6:28 p.m.

Alternatively, I could simply submit the predicted values for all observations... perhaps this makes more sense.

Posted by: tdavidson @ June 17, 2017, 6:32 p.m.

I think the advice is to use the predicted value for all observations - see the earlier thread about evaluation.

The 'scoring' for the competition is only based on the non-missing values of the test set - it doesn't matter whether you use the actual value of the training data or the predicted value, though the organisers did express some interest in seeing the predictions made for the known data.

I also think that you cannot have missing values in the prediction.csv file without creating an error.
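If that is right, a quick check before uploading might save a failed submission. The sketch below uses only the standard library; the column names, the sample rows, and the set of tokens counted as missing are assumptions for illustration, and I don't know exactly what the submission system itself rejects.

```python
import csv
import io

# Assumed markers for a missing cell; adjust to whatever your export produces.
MISSING_TOKENS = {"", "NA", "NaN", "nan"}

def find_missing_cells(csv_file):
    """Return (row_number, column_name) pairs for every empty or NA-like cell."""
    problems = []
    reader = csv.DictReader(csv_file)
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column, value in row.items():
            # DictReader fills short rows with None, so count that as missing too.
            if value is None or (isinstance(value, str) and value.strip() in MISSING_TOKENS):
                problems.append((row_number, column))
    return problems

# Hypothetical file contents standing in for prediction.csv.
sample = io.StringIO("id,outcome_a,outcome_b\n1,3.1,0\n2,,1\n3,2.9,NA\n")
missing = find_missing_cells(sample)
print(missing)
```

For a real file you would pass an open file handle instead of the sample, for example `open("prediction.csv", newline="")`, and fix any reported cells before uploading.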

Hope that helps.

Posted by: the_Brit @ June 19, 2017, 6:24 a.m.