Towards Efficient Data Search and Subsetting of Large-Scale Atmospheric Datasets
Discovering the correct dataset efficiently is critical for effective simulations in the atmospheric sciences. Unlike text-based web documents, many large scientific datasets contain binary or numerically encoded data that is hard to discover through popular search engines. Public data hosting in the atmospheric sciences has grown significantly, but the ability to index and search these datasets has been limited by the metadata that the data host provides. The authors have developed an infrastructure, the Atmospheric Data Discovery System (ADDS), that provides an efficient data discovery environment for observational datasets in the atmospheric sciences. To support complex querying capabilities, they automatically extract and index fine-grained metadata.
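
To make the idea of indexing fine-grained metadata concrete, the following is a minimal sketch and not the ADDS implementation. It assumes each data file has already been reduced to a small record holding its variable names and its spatial and temporal extent, and it uses an in-memory index keyed by variable name. The class names, field names, and file names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FileMetadata:
    """Hypothetical fine-grained metadata extracted from one data file."""
    path: str
    variables: frozenset   # names of the variables stored in the file
    lat_range: tuple       # (min_lat, max_lat) in degrees
    lon_range: tuple       # (min_lon, max_lon) in degrees
    time_range: tuple      # (start, end) as datetime.date


def overlaps(a, b):
    """Return True if closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


class MetadataIndex:
    """Assumed in-memory index: variable name -> set of file records."""

    def __init__(self):
        self._by_variable = defaultdict(set)

    def add(self, meta):
        for name in meta.variables:
            self._by_variable[name].add(meta)

    def query(self, variable, lat_range, lon_range, time_range):
        """Paths of files holding `variable` that intersect the given extent."""
        candidates = self._by_variable.get(variable, ())
        hits = [
            m for m in candidates
            if overlaps(m.lat_range, lat_range)
            and overlaps(m.lon_range, lon_range)
            and overlaps(m.time_range, time_range)
        ]
        return sorted(m.path for m in hits)


index = MetadataIndex()
index.add(FileMetadata(
    "obs_a.nc", frozenset({"temperature", "humidity"}),
    (30.0, 45.0), (-110.0, -90.0), (date(2010, 1, 1), date(2010, 12, 31)),
))
index.add(FileMetadata(
    "obs_b.nc", frozenset({"temperature"}),
    (50.0, 60.0), (-110.0, -90.0), (date(2010, 1, 1), date(2010, 12, 31)),
))

# Temperature observations over a small region during June 2010.
result = index.query(
    "temperature", (35.0, 40.0), (-100.0, -95.0),
    (date(2010, 6, 1), date(2010, 6, 30)),
)
print(result)
```

In this sketch a query is answered from the extracted records alone, without opening the binary files, which is the property that metadata provided only at the host or collection level cannot offer.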