Institute of Electrical & Electronic Engineers
Attribute reduction for big data is viewed as an important preprocessing step in the areas of pattern recognition, machine learning and data mining. In this paper, a novel parallel method based on MapReduce for large-scale attribute reduction is proposed. By using this method, several representative heuristic attribute reduction algorithms in rough set theory have been parallelized. Further, each of the improved parallel algorithms can select the same attribute reduct as its sequential version, therefore, owns the same classification accuracy.