• Creator
  • #4213883

    How to share a dataset

    by somebodysmart ·

    I have built some genealogy websites to help genealogists locate specific facts when building and documenting their family trees. Some of the data come from public record requests, while other data come from website downloads.

    One frustration I encounter is when developers take a perfectly good dataset and make it “available” on their website as a .pdf, which often means I have to run OCR on it to get the data into a form I can use in developing my site. Now, a .pdf is perfectly good for humans to read (though a .txt file is simpler because it does not require a special plug-in to open it), but to share the data with those who would help others find what they need, the best format is a tab-delimited text file (often saved as .csv or .tsv). Separating fields with commas means putting quotation marks around any field with an embedded comma, which gets more complicated when a field has both embedded quotation marks and commas. Really, a tab-delimited file is the easiest.
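To make the comma-versus-tab point concrete, here is a minimal Python sketch, not anything the poster supplied. The burial record and the helper names `to_tab_delimited` and `to_comma_delimited` are made up for illustration. It assumes fields never contain a tab or a line break, and it rejects any that do.

```python
import csv
import io


def to_tab_delimited(rows):
    """Join fields with tabs, one record per line.

    Assumes no field contains a tab or line break; raises ValueError otherwise.
    """
    lines = []
    for row in rows:
        for field in row:
            if "\t" in field or "\n" in field or "\r" in field:
                raise ValueError(f"field contains a tab or line break: {field!r}")
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"


def to_comma_delimited(rows):
    """Write the same rows with Python's default comma-separated dialect."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Hypothetical burial records, invented for this example.
rows = [
    ["name", "died", "plot"],
    ['Smith, John "Jack"', "1902", "B-14"],
]

tab_text = to_tab_delimited(rows)
comma_text = to_comma_delimited(rows)

print(tab_text)    # the name field appears exactly as typed
print(comma_text)  # the name field is wrapped in quotes, with inner quotes doubled
```

In the comma version the name has to be quoted and its inner quotation marks doubled, while the tab version leaves it untouched. That is the extra complication described above.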

    Let’s say you run a cemetery and you put your burial list on the website. I find it somewhat frustrating when a site only provides a query-based search box. Unless the data are proprietary, you should also provide a link to download the whole dataset as a .csv file and another link to a file that is easier for humans to read. If your burial list is not public, beware that I have learned a few tricks for downloading the whole file anyway. Then I can add the contents to a website where search engines will slurp it down and where end-users can learn where their great-uncle was buried. This benefits the cemetery, because the end-user visits the cemetery website and perhaps even visits the cemetery. Search engines will not enter query terms to acquire data that are trapped behind a search box. You worry about running your cemetery and its website, and let other webmasters download the data for bigger compilations in which relatives can find it more quickly.

    This is true for websites about anything else. Don’t just provide a query-based search box; also provide a download link, because somebody might want the whole dataset, both in human-friendly form and in tab-delimited .csv form. If it is proprietary, make sure it is secure. Some search boxes will give the whole dataset when the end-user simply hits SEARCH without entering any search terms. Others require more work. Some of them I cannot download with my limited knowledge, but other people can.

    • This topic was modified 3 months, 2 weeks ago by somebodysmart.


All Comments

  • Author
    • #4227004

      How to share a dataset

      by cassharper030 ·

      In reply to How to share a dataset

      Sharing data shouldn’t be a chore! Ditch the PDFs and hidden data! Websites should offer downloadable datasets in both human-friendly formats (like .txt) and computer-readable formats (like .csv) alongside search boxes. This makes data discoverable by search engines and allows others to build on it, benefiting everyone.
      Even cemeteries with burial lists can see a boost by offering downloadable data – after all, searchable info means more visitors to their site!

    • #4228182

      How to share a dataset

      by cassharper030 ·

      In reply to How to share a dataset

      Provide a downloadable file in a format like .txt, since most devices can open it easily. Also share the data in a structured format like a .csv file with clear separation between data points (tabs are preferred over commas). This allows researchers and developers to easily import and analyze the data.
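As a rough illustration of how easily a developer can import a tab-delimited download, here is a short Python sketch. The sample text, the column names, and the `read_tab_delimited` helper are hypothetical. It assumes the first line is a header row and that fields were written without any quoting.

```python
import csv
import io

# Hypothetical tab-delimited download; the column names are invented.
SAMPLE = (
    "name\tdied\tplot\n"
    'Smith, John "Jack"\t1902\tB-14\n'
    "O'Neil, Mary\t1911\tC-03\n"
)


def read_tab_delimited(text):
    """Parse tab-delimited text into a list of dicts keyed by the header row.

    QUOTE_NONE tells the reader to treat quotation marks as ordinary characters.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


records = read_tab_delimited(SAMPLE)
for record in records:
    print(record["name"], record["died"], record["plot"])
```

For a real download, the same reader can be given an open file object in place of the `io.StringIO` wrapper.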

      Search engines can’t access data hidden behind search boxes. When you offer a downloadable file, your data becomes searchable online, increasing its reach and potential benefits.

    • #4230272

      Reply To: How to share a dataset

      by cchambers1.1974 ·

      In reply to How to share a dataset

      Sharing a dataset responsibly involves a few key steps: first, ensure the data is appropriately anonymized to protect privacy. Next, select a suitable platform or repository with proper data management policies in place. Document the dataset thoroughly, including its source, structure, and any relevant metadata. Lastly, consider licensing options to clarify how others can use and redistribute the data while respecting your rights and intentions. Transparent and ethical sharing fosters collaboration and innovation while safeguarding individuals’ privacy and data integrity.
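One possible way to handle the documentation step is a small metadata file published next to the dataset. The Python sketch below is only an illustration. The `build_metadata` helper, the field names, the sample values, and the license identifier are all placeholders, not a standard the thread prescribes.

```python
import json


def build_metadata(title, source, license_name, columns):
    """Collect source, structure, and license notes into one dictionary."""
    return {
        "title": title,
        "source": source,
        "license": license_name,
        "columns": [
            {"name": name, "description": description}
            for name, description in columns
        ],
    }


# Hypothetical values, invented for this example.
metadata = build_metadata(
    title="Example burial list",
    source="Transcribed from cemetery office records",
    license_name="CC0-1.0",
    columns=[
        ("name", "Name of the deceased as written in the register"),
        ("died", "Year of death"),
        ("plot", "Section and plot number"),
    ],
)

metadata_json = json.dumps(metadata, indent=2)
print(metadata_json)
```

Saving this output alongside the data file tells anyone who downloads it where the data came from, what each column means, and how it may be reused.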

      Note: unrelated link removed by moderator.
