DAME: A Distributed Data Mining & Exploration Framework within the Virtual Observatory

Date Added: Apr 2010
Format: PDF

Scientific areas share the same broad requirements. They must handle massive, distributed datasets and, where possible, integrate with services and applications. Closing the growing gap between the incremental generation of data and our understanding of it requires knowing how to access, retrieve, analyze, mine and integrate data from disparate sources. Any new-generation data mining tool or package that aims to become a genuine service for the community must be usable within complex workflows, which each user can fine-tune to match the specific demands of his or her scientific goal. These workflows often need to access different resources (data, providers, computing facilities and packages) and require strict interoperability, at least on the client side. The DAME (Data Mining & Exploration) project arises from these requirements. It provides a distributed, web-based data mining infrastructure specialized in exploring Massive Data Sets with Soft Computing methods. DAME was originally designed for astrophysical use cases, where the first scientific applications demonstrated its effectiveness. The DAME Suite has since become a multidisciplinary, platform-independent tool fully compliant with modern KDD (Knowledge Discovery in Databases) requirements and Information & Communication Technology trends. Generally speaking, KDD applications will come not from computer programs, nor from machine learning experts, nor from the data itself. They will come from the people and communities who work with the data and the problems from which it arises. That is why the DAME infrastructure was designed: to empower those who are not machine learning experts to apply these techniques to the problems that arise in their daily work.
The DAME project emerged as an astrophysical data exploration and mining tool, originating from the simple consideration that, with data obtained by the new generation of instruments,