The Turnip: Visualizing Statistical Data Cleaning
Source: University of California
Databases containing measured or manually entered data can be full of erroneous values. Standard statistical methods exist for detecting such errors, but applying them can be time-consuming, and the data can be difficult to clean manually. Current solutions are often large, complicated, or require extensive knowledge to operate. This paper presents The Turnip, a lightweight tool for visually cleaning data based on statistical methods, without requiring the user to have knowledge of statistical or mathematical analysis. Users can analyze data visually through a spreadsheet interface and graphs, detect outliers both automatically and manually, navigate through them, and collaboratively add metadata and descriptions to the data in the form of comments.
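To make the idea of automatic, statistics-based outlier detection concrete, the following is a minimal sketch in Python. The abstract does not state which statistical methods The Turnip uses, so the interquartile-range rule (Tukey's fences) shown here is an assumption chosen for illustration, and the function name `iqr_outliers`, the parameter `k`, and the sample readings are all hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the indices of values outside the fences [Q1 - k*IQR, Q3 + k*IQR].

    Illustrative only: the IQR rule is an assumed method, not necessarily
    the one implemented in The Turnip.
    """
    # quantiles() sorts a copy of the data and returns the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical column of measurements containing one likely entry error.
readings = [10.1, 9.8, 10.3, 10.0, 55.0, 9.9, 10.2]
print(iqr_outliers(readings))  # [4]
```

Returning row indices rather than the values themselves would let a spreadsheet-style interface highlight the suspect cells and leave the decision to correct, annotate, or keep each one to the user.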