One of the first casualties in IT during a down economy is technical documentation. However, embarking on Big Data projects means that you may have to reconsider how your organization approaches documentation during product development.

Developing documentation as part of a Big Data project is about finding balance in your team resources. Adding documentation to the Data Scientist’s responsibilities isn’t completely appropriate, and the same is true of the data analysts and system administrators. Documentation efforts as part of a Big Data project also face the challenge of tackling large unstructured data, which means an experienced technical writer could be the next key member of your Big Data team.

The technical writer and Big Data projects

My non-scientific survey of technical writer job descriptions in the Big Data sector was very telling about the role. I found some common patterns in the job descriptions, such as the following:

  • API documentation experience
  • Working knowledge of open technologies such as JSON, Python, and REST
  • Enterprise RDBMS experience with Oracle, PostgreSQL, SQL Server, or other similar databases
  • Ability to read C, C++, and Java is starting to show up on job postings
  • Working knowledge of Big Data foundational technologies such as Cassandra, Hadoop, and MapReduce
  • Working knowledge of Linux/Unix

The recurring job requirements I found during the survey point to a more experienced and self-sufficient technical writer, one who leans toward the technical side of technical writing, as the choice for a Big Data team. While the usual technical writer experience such as online help, user interface, and editing remains, the other job requirements point to companies seeking a technical writer who can be a full contributor on a Big Data team rather than a passive asset brought in at the end of a development project.

Based on the job descriptions I came across, these Big Data projects also require documentation of their core technologies and infrastructure. Big Data is ripe with documentation requirements for both internal and external audiences, especially for organizations seeking to build Big Data expertise in house before outsourcing any components of their Big Data development or operations.

Automated documentation tools and Big Data projects

The right technical writer for the Big Data project isn’t always the whole solution. Automated documentation tools can help, although the unstructured nature of Big Data may bump right into the structured nature of those tools. A guest post by Peter Gruenbaum, founder of SDK Bridge, entitled “Automated Documentation for REST APIs” and published on ProgrammableWeb, outlines three options for automating REST API documentation:

  • Automated documentation from code
  • Automated documentation from structured data
  • Create your own solution
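To make the first option concrete, here is a minimal Python sketch of documentation generated from code. It is only an illustration under stated assumptions: the two handler functions (get_customer and list_events) are hypothetical stand-ins for real API code, and the output is a bare plain-text reference rather than anything a dedicated tool would produce.

```python
import inspect


def get_customer(customer_id: int) -> dict:
    """Return the profile for a single customer.

    Hypothetical handler used only to illustrate the technique.
    """
    return {"id": customer_id}


def list_events(since: str, limit: int = 100) -> list:
    """List events recorded after the given ISO 8601 timestamp."""
    return []


def document_functions(functions):
    """Build a plain-text reference entry per function from its signature and docstring."""
    entries = []
    for func in functions:
        signature = inspect.signature(func)  # parameters, defaults, return annotation
        summary = inspect.getdoc(func) or "(undocumented)"  # cleaned-up docstring
        entries.append(f"{func.__name__}{signature}\n{summary}\n")
    return "\n".join(entries)


if __name__ == "__main__":
    print(document_functions([get_customer, list_events]))
```

The point of the sketch is the division of labor: developers keep signatures and docstrings accurate, and the reference pages are regenerated rather than retyped.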

While Gruenbaum does an excellent job of transferring API documentation best practices to a Big Data project, some challenges remain. I do expect more automated documentation tools to launch to meet the demands of the nascent Big Data industry.

Unfortunately, API documentation is, in my experience, usually one of the first casualties of overscheduling, shifting priorities, and good intentions on software development projects, but it becomes a necessity once you move into Big Data. Big Data means multiple apps that may access the data, plus new intellectual property and internal knowledge-building considerations, and API documentation is integral to the success of each of those project elements.

Planning for automating Big Data documentation needs to start in the initial stages of the project (I’m thinking the requirements phase, if not the inception of the Big Data team) because, at the time of writing, your Big Data team may very well have to roll its own automated documentation solution, and that requires adequate preparation time.
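A home-grown solution does not have to start large. The following Python sketch shows one possible starting point, generating a plain-text reference from structured data. Everything specific in it is an assumption made for illustration: the JSON layout, the service title, and the /datasets and /jobs endpoints are hypothetical and do not follow any particular specification format.

```python
import json

# Hypothetical endpoint descriptions. In practice this JSON might be kept
# in version control alongside the code it describes.
API_SPEC = json.loads("""
{
  "title": "Example Data Service",
  "endpoints": [
    {"method": "GET", "path": "/datasets",
     "summary": "List available datasets.",
     "params": [{"name": "limit", "type": "integer",
                 "description": "Maximum number of results."}]},
    {"method": "POST", "path": "/jobs",
     "summary": "Submit an analytics job.",
     "params": []}
  ]
}
""")


def render_reference(spec):
    """Render the structured endpoint descriptions as a plain-text reference page."""
    lines = [spec["title"], ""]
    for endpoint in spec["endpoints"]:
        lines.append(f'{endpoint["method"]} {endpoint["path"]}')
        lines.append(f'  {endpoint["summary"]}')
        for param in endpoint.get("params", []):
            lines.append(
                f'  - {param["name"]} ({param["type"]}): {param["description"]}'
            )
        lines.append("")  # blank line between endpoints
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_reference(API_SPEC))
```

Even a script this small needs someone to agree on the data layout and keep it current, which is why the planning belongs at the start of the project.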

The Big Data documentation set

While the documentation set you produce for a Big Data project doesn’t diverge much from what a development team might produce for any other large-scale enterprise project, Big Data certainly points to a return of larger-scale documentation efforts, and that requires more planning and resources to deliver.

For example, the Analytics Plan, while certainly the responsibility of the Data Scientist, could also benefit from a professional technical writer, especially if the plan is a deliverable to external clients.

Another thing that I found during my survey of Big Data technical writer job openings is that the definition of Big Data documentation is broadening from the standard old API, developer, user, and online help documentation. A technical writer working on a Big Data project is increasingly being asked to develop the following:

  • Blog posts
  • White papers
  • Technical articles
  • Marketing collateral

Such writing requirements point to a role that must cooperate with Data Scientists, project managers, marketing, and sales in order to spread the good word about the Big Data project.

Technical documentation and Big Data projects

Documenting the Big Data project begins with the core experience and skills that technical writers use on other IT and software development life cycle projects. But like many other jobs, the role may gain new dimensions once an organization establishes its own Big Data team, with project documentation and communications requirements abounding for some time to come.