Simple data storage with Ruby

Data storage does not always need a complete SQL RDBMS like PostgreSQL. In fact, sometimes small is beautiful, and plain text data formats can be even easier to use with the right libraries.

A lot of software development today involves a user-friendly front end for a massive SQL database. This pattern for development — sometimes referred to as CRUD applications (CRUD is an abbreviation for Create, Read, Update, and Delete) — has become so prevalent and such a solved problem in many ways that it never occurs to many people to do things differently. The truth is that a feature-rich SQL database management system such as PostgreSQL is massive overkill for a lot of use cases, like swatting a fly with an atom bomb. For simpler tools, a simpler approach to data storage may be more appropriate.

One of the benefits of using simpler data storage methodologies is that they often use plain text storage, which means it is relatively easy to read the data in a text editor rather than having to use complex software such as pgAdmin (a database management client for PostgreSQL).

CSV

One of the most venerable plain text data formats people use on a regular basis is comma separated value (or CSV) format. It is nothing more than plain text where each line is a record, row, or tuple (depending on your preferred jargon), and each of these is divided by commas into fields, columns, or elements. This is very convenient for storing tabular data without a complex database engine — that is, to represent data in a storable form that conforms to the characteristics of a two-dimensional table. As long as none of your fields need to contain any commas, programmatically parsing CSV data is trivially easy. With commas within some fields, however, the data format needs to use a different separator than commas, account for escape characters, or account for a quoting character. It may require more than one of these solutions, depending on how complex the data gets.

In its simplest form, a CSV file might contain lines of text like the following:

Perrin,Chad,Simple Data Storage With Ruby,Programming
Perrin,Chad,Review: The Best Linux Book Available,Open Source
Perrin,Chad,How To Get People To Use Strong Passwords,Security

For quoted values, you might see this in a line of the same file:

Perrin,Chad,"NetworkManager, The Fifth Horseman Of The Apocalinux",Open Source

Alternatively, the same can be achieved with an escape character:

Perrin,Chad,NetworkManager\, The Fifth Horseman Of The Apocalinux,Open Source
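To see why embedded commas need special handling, here is a minimal sketch that compares a naive split on commas with the quote-aware CSV.parse_line method from Ruby's standard library, using the quoted form of the line shown above. The variable names are illustrative only, and the sketch covers the quoted form, not the backslash-escaped one.

```ruby
require 'csv'

line = 'Perrin,Chad,"NetworkManager, The Fifth Horseman Of The Apocalinux",Open Source'

# Naive approach: split on every comma and ignore the quotes entirely.
naive_fields = line.split(',')

# Quote-aware approach: let the standard library parse the line.
parsed_fields = CSV.parse_line(line)

puts naive_fields.length   # 5: the quoted title is cut in two
puts parsed_fields.length  # 4: the title survives as one field
puts parsed_fields[2]
```

The naive split also leaves the quote characters inside the broken pieces, which is one more reason to leave parsing to a library.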

For cases where the field separator character might also be used within your data, the easiest approach is to use a different character as your field separator, one you know will not be used within the data itself. In some cases this might be a semicolon; in others, it might be a caret or tab. Using tabs as field separators is common enough to have given rise to its own initialism similar to CSV: tab separated value (or TSV) format. To account for other separator characters, some have taken to calling CSV a "character separated value" format.
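As a rough sketch of the alternate-separator approach, Ruby's CSV library accepts a col_sep option for both writing and reading. The example below uses a tab, so the comma inside the title is ordinary data rather than a field boundary. The variable names are made up for illustration.

```ruby
require 'csv'

record = [
  'Perrin',
  'Chad',
  'NetworkManager, The Fifth Horseman Of The Apocalinux',
  'Open Source'
]

# Write one record using a tab as the field separator.
tsv_line = CSV.generate_line(record, col_sep: "\t")

# Read it back, telling the parser to expect the same separator.
round_trip = CSV.parse_line(tsv_line, col_sep: "\t")

puts round_trip == record
```

The same col_sep option can be passed to CSV.open and CSV.foreach when working with files. The separator has to match on both the writing and reading side, or the fields will not line up.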

The smart way to handle CSV files in most cases is to use code that translates data to and from that format as needed, and to never "hand-edit" the data in a text editor yourself. For this reason, and because of the prevalence of the CSV data format (often used as an export format for SQL databases and spreadsheets, in fact), pretty much every modern programming language offers a fairly standard CSV library implementation and at least a few alternate implementations. Ruby is no exception, and its standard library offers a CSV class:

#!/usr/bin/env ruby

require 'csv'

# Each inner array is one record: last name, first name, title, category.
articles = [
  [
    'Perrin',
    'Chad',
    'Simple Data Storage With Ruby',
    'Programming'
  ],
  [
    'Perrin',
    'Chad',
    'Review: The Best Linux Book Available',
    'Open Source'
  ],
  [
    'Perrin',
    'Chad',
    'NetworkManager, The Fifth Horseman of the Apocalinux',
    'Open Source'
  ]
]

# Write one CSV line per record; fields containing commas are quoted.
CSV.open('articles.csv', 'w') do |csv|
  articles.each {|record| csv << record }
end

This produces a file containing the following data:

Perrin,Chad,Simple Data Storage With Ruby,Programming
Perrin,Chad,Review: The Best Linux Book Available,Open Source
Perrin,Chad,"NetworkManager, The Fifth Horseman of the Apocalinux",Open Source

Reading data like this into a Ruby program is similarly easy:

#!/usr/bin/env ruby

require 'csv'

articles = []

# Each line of the file arrives as an array of field strings.
CSV.foreach('articles.csv') do |record|
  articles << record
end

Assuming the articles.csv file in this example is the same file written in the previous example, an articles array will be created that contains exactly the same data as the articles array in that previous example.
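The round trip described here can be checked without touching the filesystem. The following is a minimal sketch that uses CSV.generate and CSV.parse to work on an in-memory string instead of articles.csv; the shortened two-record array and the variable names are only for illustration.

```ruby
require 'csv'

articles = [
  ['Perrin', 'Chad', 'Simple Data Storage With Ruby', 'Programming'],
  ['Perrin', 'Chad', 'NetworkManager, The Fifth Horseman of the Apocalinux', 'Open Source']
]

# Serialize the records to CSV text in memory rather than to a file.
csv_text = CSV.generate do |csv|
  articles.each { |record| csv << record }
end

# Parse the text back into an array of arrays.
restored = CSV.parse(csv_text)

puts restored == articles
```

The comparison holds here because every field is a string. Without converters, the CSV library returns every field as a string, so numeric fields written to CSV would come back as strings too.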

YAML

Another plain text data format is YAML, pronounced similarly to "camel." Originally, YAML was said to be an acronym for "Yet Another Markup Language," but as development continued, the format grew more complex and its uses expanded beyond its origins, so its developers renamed it "YAML Ain't Markup Language." In its simplest form, YAML is a plain text format for representing two types of data: named elements and unnamed list elements.
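In Ruby terms, named elements correspond roughly to a hash and unnamed list elements to an array. The sketch below shows both in one small structure; the keys and the topics list are hypothetical and chosen only for illustration.

```ruby
require 'yaml'

# A hash supplies the named elements; the array under 'topics'
# supplies unnamed list elements.
article = {
  'author' => 'Chad Perrin',
  'title'  => 'Simple Data Storage With Ruby',
  'topics' => ['CSV', 'YAML']
}

yaml_text = article.to_yaml
puts yaml_text

# Loading the text gives back an equivalent Ruby structure.
restored = YAML.load(yaml_text)
puts restored == article
```

In the printed YAML, each named element appears as a "key: value" line and each list element as a line beginning with a hyphen.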

In the Ruby community, YAML has become something of a de facto standard for data marshalling — a fancy term for translating data in memory into a format suitable for storage and sharing with other programs, the same things we do with CSV format data. YAML is not a perfect format:

  • For the majority of purposes for which it is used, the sophistication of the format is a case of overkill (though less so than an SQL relational database). In fact, cat-v.org considers YAML "harmful" for that very reason.
  • YAML depends on significant whitespace for much of its syntax, and in some circles syntactically significant whitespace is considered harmful.
  • Performance for various YAML implementations varies substantially, but suffers somewhat simply because of the complexity of the language specification. Worse, the Ruby 1.8 standard library's implementation of YAML (using the Syck engine) is probably the slowest major implementation available in Ruby, though a faster implementation replaces it in the Ruby 1.9 standard library (using the Psych engine). The primary reasons for replacing Syck appear to be its bugginess and a lack of steady maintainership since its creator vanished.

YAML does offer some benefits that help keep it popular:

  • There are quite a few implementations, which — while not exactly an advantage over other popular formats — at least eliminates a lack of choice as a reason to use something else.
  • It is generally easier for humans to quickly read and understand than competing formats such as CSV, JSON, and XML (in increasing order of difficulty reading the format by eye).
  • Many YAML files actually end up looking a lot like the syntax of markup languages such as Markdown, which means that for certain limited purposes the formats can be treated as interchangeable.

Using the YAML library in Ruby is about as easy as using the CSV library:

#!/usr/bin/env ruby

require 'yaml'

# Each inner array is one record: last name, first name, title, category.
articles = [
  [
    'Perrin',
    'Chad',
    'Simple Data Storage With Ruby',
    'Programming'
  ],
  [
    'Perrin',
    'Chad',
    'Review: The Best Linux Book Available',
    'Open Source'
  ],
  [
    'Perrin',
    'Chad',
    'NetworkManager, The Fifth Horseman of the Apocalinux',
    'Open Source'
  ]
]

# Serialize the whole nested array to the open file as YAML.
File.open('articles.yaml', 'w') do |out|
  YAML.dump articles, out
end

This produces a file called articles.yaml that contains the following data:

---
- - Perrin
  - Chad
  - Simple Data Storage With Ruby
  - Programming
- - Perrin
  - Chad
  - "Review: The Best Linux Book Available"
  - Open Source
- - Perrin
  - Chad
  - NetworkManager, The Fifth Horseman of the Apocalinux
  - Open Source

Reading data from a file like articles.yaml into an articles array is even easier:

#!/usr/bin/env ruby

require 'yaml'

articles = YAML.load_file('articles.yaml')

By chaining CSV and YAML operations, it is a trivial exercise to create a YAML file representing the same data as a CSV file, or vice versa. For applications that need persistent data stores that are not too big or complex, CSV and YAML formats offer readable plain text formats, and Ruby's CSV and YAML libraries offer simple programmatic access to those formats.
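The chaining mentioned above might look something like the following sketch, which converts CSV text to YAML and back again entirely in memory. The sample data and variable names are assumptions for illustration; with real files, CSV.read and File.write could stand in for the in-memory strings.

```ruby
require 'csv'
require 'yaml'

csv_text = <<~CSV_DATA
  Perrin,Chad,Simple Data Storage With Ruby,Programming
  Perrin,Chad,"NetworkManager, The Fifth Horseman of the Apocalinux",Open Source
CSV_DATA

# CSV text -> array of arrays -> YAML text
records   = CSV.parse(csv_text)
yaml_text = YAML.dump(records)

# YAML text -> array of arrays -> CSV text
restored_records = YAML.load(yaml_text)
csv_again = CSV.generate do |csv|
  restored_records.each { |record| csv << record }
end

# Both directions should preserve the same records.
puts restored_records == records
puts CSV.parse(csv_again) == records
```

The shared intermediate form is a plain Ruby array of arrays, which is why neither library needs to know anything about the other.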

Notes:

For more about how to use CSV from Ruby's standard library, see Ruby-Doc.org's Class: CSV page.

For more about how to use YAML from Ruby's standard library, see Ruby-Doc.org's Module: YAML page.

About

Chad Perrin is an IT consultant, developer, and freelance professional writer. He holds both Microsoft and CompTIA certifications and is a graduate of two IT industry trade schools.
