Enhancing Analytical Modeling for Large Data Sets With Variable Reduction
Of late, technology makes it possible to store huge volumes of data at little or no additional cost compared with earlier days. A business today keeps its information in separate tables, each with a suitable structure. A financial business, for instance, may have account data, transaction data, customer demographic data, payment data, inbound and outbound call data, campaign data, account history data, and so on. For analytical purposes, one collates all this information to create a single view of the customer, which may contain a huge number of variables. The challenge is to identify which few of them to use for modeling. In high-dimensional data sets, identifying irrelevant variables is more difficult than identifying redundant ones. It is therefore suggested that the redundant variables be removed first, and the irrelevant ones looked for afterwards.
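The suggested order (redundant variables first, irrelevant ones second) can be illustrated with a minimal sketch. It rests on assumptions that are not part of the text above: Pearson correlation serves as a simple proxy for both redundancy (between two variables) and relevance (between a variable and the target), the thresholds 0.95 and 0.1 are arbitrary, and the function names and the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear signal
    return sxy / sqrt(sxx * syy)


def drop_redundant(columns, threshold=0.95):
    """Step 1: keep a variable only if it is not highly correlated
    with a variable that has already been kept (threshold is an assumption)."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def drop_irrelevant(columns, names, target, min_abs_corr=0.1):
    """Step 2: among the surviving variables, keep those with at least a
    weak linear relationship to the target (cut-off is an assumption)."""
    return [n for n in names if abs(pearson(columns[n], target)) >= min_abs_corr]


# Hypothetical toy data: 'balance_copy' duplicates 'balance',
# and 'noise' is unrelated to the target.
data = {
    "balance": [1, 2, 3, 4, 5, 6],
    "balance_copy": [2, 4, 6, 8, 10, 12],
    "noise": [1, 2, 1, 1, 2, 1],
}
target = [0, 0, 0, 1, 1, 1]

non_redundant = drop_redundant(data)
selected = drop_irrelevant(data, non_redundant, target)
print(non_redundant)
print(selected)
```

Removing redundant variables first shrinks the candidate set before the harder relevance check is applied. In practice, correlation only captures linear relationships, so a real analysis would likely pair this with other measures.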