In the domain of data science, solving problems and answering questions through data analysis is standard practice. Often, data scientists construct a model to predict outcomes or
discover underlying patterns, with the goal of gaining insights. Organizations can then use these insights to take actions that ideally improve future outcomes.
Technologies for analyzing data and building models are numerous and evolving rapidly. In a remarkably short time, they have progressed from desktop tools to massively parallel data warehouses that handle huge data volumes, and to in-database analytic functionality in relational databases and Apache Hadoop. Text analytics on unstructured or semi-structured data is becoming increasingly important as a way to incorporate sentiment and other useful information from text into predictive models, often leading to significant improvements in model quality and accuracy.
Emerging analytics approaches seek to automate many of the steps in model building and application, making machine learning technology more accessible to those who lack deep
quantitative skills. Also, in contrast to the “top-down” approach of first defining the business problem and then analyzing the data to find a solution, some data scientists may use a
“bottom-up” approach. With the latter, the data scientist looks into large volumes of data to see what business goal might be suggested by the data and then tackles that problem. Since
most problems are addressed in a top-down manner, the methodology in this paper reflects that view.