Institute of Research and Journals (IRAJ)
Data is generated daily at enormous speed, and a single machine is insufficient to store and process it. Most of this data is unstructured, and most of it (about 90%) was generated in the last few years. Such data, characterized by high volume, velocity, and veracity, is known as big data and is stored in clusters. An efficient framework is required to manage it. Existing frameworks include Apache Hadoop, Apache Spark, Apache Flink, and Microsoft REEF (Retainable Evaluator Execution Framework). Flink is a new framework with built-in optimization techniques for serialization and deserialization. It also has a built-in program optimizer that selects suitable runtime operations for each program.