The effect of sample size and disease prevalence on supervised machine learning of narrative data