Date Added: Jan 2011
Most learning methods assume that the training set is drawn randomly from the population to which the learned model is to be applied. However in many applications this assumption is invalid. For example, lending institutions create models of who is likely to repay a loan from training sets consisting of people in their records to whom loans were given in the past; however, the institution approved loan applications previously based on who was thought unlikely to default. Learning from only approved loans yields an incorrect model because the training set is a biased sample of the general population of applicants.