Less is More: Selecting Sources Wisely for Integration
It is tempting to be thrilled by the abundance of information available to users and to integrate data from as many sources as possible. However, understanding, analyzing, and using these data is often hard. Integrating too much data can incur a large cost, including the expense of purchasing data and the resources needed for integration and cleaning. Furthermore, including low-quality data can degrade the quality of the integration results instead of delivering the desired quality gain. Thus, "the more the better" does not always hold for data integration; often, "less is more".