Tip

  • #4213883

    How to share a dataset

    Locked

    by somebodysmart ·

    I have built some genealogy websites to help genealogists locate specific facts when building and documenting their family trees. Some of the data come from public record requests, while other data come from website downloads.

    One frustration I encounter is when developers take a perfectly good dataset and make it “available” on their website as a .pdf, which requires OCR to extract the data into a form I can use in developing my site. A .pdf is perfectly good for humans to read (though a .txt file is simpler because it needs no special plug-in to open), but for sharing data with those who would help others find what they need, the best format is a tab-delimited .csv file. Separating fields with commas and wrapping fields that contain commas in quotation marks gets complicated when a field contains both quotation marks and commas. Really, a tab-delimited file is the easiest.
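To make the quoting problem concrete, here is a minimal Python sketch that writes the same records both ways. The records, the field names, and the helper functions are all hypothetical, invented for illustration, and the tab-delimited writer assumes that no field contains a tab or a line break.

```python
import csv
import io

# Hypothetical burial records, invented for illustration only.
HEADER = ["name", "born", "died", "plot"]
ROWS = [
    ['Smith, John "Jack"', "1871", "1940", "B-12"],
    ["Doe, Mary", "1880", "1952", "C-03"],
]


def to_comma_csv(header, rows):
    """Write comma-separated text; the csv module adds quoting where needed."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def to_tab_delimited(header, rows):
    """Join fields with tabs, with no quoting at all."""
    lines = []
    for row in [header, *rows]:
        for field in row:
            # Assumption: fields never contain tabs or line breaks.
            if any(ch in field for ch in "\t\r\n"):
                raise ValueError(f"field contains a tab or line break: {field!r}")
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"


def from_tab_delimited(text):
    """Parse tab-delimited text back into a list of rows."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final newline
    return [line.split("\t") for line in lines]


if __name__ == "__main__":
    print(to_comma_csv(HEADER, ROWS))
    print(to_tab_delimited(HEADER, ROWS))
```

In the comma version the first name has to be wrapped in quotation marks with its inner quotation marks doubled, so a reader needs a real CSV parser. In the tab version every field appears exactly as typed, and splitting each line on tabs is enough to read it back.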

    Let’s say you run a cemetery and you put your burial list on the website. I find it somewhat frustrating when a site provides only a query-based search box. Unless the data are proprietary, you should also provide a link to download the whole dataset as a .csv file and another link to a file that is easier for humans to read. If your burial list is not public, be aware that I have learned a few tricks for downloading the whole file anyway. Then I can add the contents to a website where search engines will slurp it down and where end-users can learn where their great-uncle was buried. The cemetery benefits because the end-user then visits the cemetery website and perhaps even the cemetery itself. Search engines will not enter query terms to acquire data that are trapped behind a search box. You worry about running your cemetery and its website, and let other webmasters download the data for bigger compilations in which relatives can find it more quickly.

    The same is true for websites about anything else. Don’t just provide a query-based search box; also provide a download link, because somebody might want the whole dataset, both in human-friendly form and in tab-delimited .csv form. If it is proprietary, make sure it is secure. Some search boxes will return the whole dataset when the end-user simply hits SEARCH without entering any search terms. Others require more work. Some I cannot download with my limited knowledge, but other people can.
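The empty-search leak is easy to see in code. Below is a minimal Python sketch of a name search that hands over the full list on an empty query only when the dataset is marked public. The `search` function, its `public` flag, and the record layout are assumptions made up for this illustration, not any particular site's code.

```python
def search(records, term, *, public=False):
    """Return records whose name contains the term, ignoring case.

    An empty term matches every record, because the empty string is a
    substring of every string. That is how a plain substring search ends
    up returning the whole dataset when someone just hits SEARCH.
    """
    needle = term.strip().lower()
    if not needle:
        if public:
            # Public data: an empty search may return everything.
            return list(records)
        # Proprietary data: refuse instead of leaking the full list.
        raise ValueError("a search term is required")
    return [r for r in records if needle in r["name"].lower()]


if __name__ == "__main__":
    # Hypothetical records, invented for illustration only.
    burials = [{"name": "Doe, Mary"}, {"name": "Smith, John"}]
    print(search(burials, "doe"))
    print(search(burials, "", public=True))
```

A guard like this closes only the most obvious hole. Someone can still walk through the data one query at a time, so truly proprietary data needs real access control as well.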

All Comments

    • #4227004

      How to share a dataset

      by cassharper030 ·

      In reply to How to share a dataset

      Sharing data shouldn’t be a chore! Ditch the PDFs and hidden data! Websites should offer downloadable datasets in both human-friendly formats (like .txt) and computer-readable formats (like .csv) alongside search boxes. This makes data discoverable by search engines and allows others to build on it, benefiting everyone.
      Even cemeteries with burial lists can see a boost by offering downloadable data – after all, searchable info means more visitors to their site!

    • #4251025

      Reply To: How to share a dataset

      by alincejohn42 ·

      In reply to How to share a dataset

      Sharing data should be effortless and accessible! Instead of relying on cumbersome PDFs and obscured information, websites should provide downloadable datasets in user-friendly formats like .txt and machine-readable formats like .csv. Alongside robust search functionality, this approach enhances data discoverability by search engines and facilitates its use by others, fostering innovation and collaboration. Even cemeteries can benefit from this by offering downloadable burial lists, making their information more searchable and attracting more visitors to their sites. Embracing these practices can significantly improve the accessibility and utility of data for everyone.
