By Duane Craig

Fiverr is a marketplace for online services with millions of users in 200 countries around the globe. Every day the company collects millions of rows of semi-structured and unstructured data that reside in several sources, such as a MySQL relational database, MongoDB, Redis, and others. Fiverr also uses web-based services like Google Analytics. All told, its big data workload consists of approximately 80% semi-structured traffic data and 20% structured data.

Maximum value

Seeking maximum value from big data, many companies are tempted to try a solution such as a standard relational database management system (RDBMS). It is convenient because it uses regular SQL, making it attractive to analysts. It also integrates well, offering reporting and supported extract, transform, and load (ETL) tools. Then there’s the price: MySQL, for example, is free and open source. But because Fiverr practices agile development, a standard RDBMS is very difficult to use, explained Slava Borodovsky, the company’s director of business intelligence.

“In agile development there are so many changes to a given product on a daily basis that using schema-based tools is not efficient. Each time there are new parameters in production, you need to change the schema of your database. That’s a painful procedure, especially with a large amount of data,” Borodovsky said.
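To see why an open schema sidesteps that pain, consider a minimal sketch; the event fields, including the new "promo_id" parameter, are hypothetical, not Fiverr’s actual tracking data. With JSON stored as plain text, a record that gains a new parameter can sit next to older records, and nothing in storage has to be altered.

```python
import json

# Two hypothetical traffic events: the second carries a brand-new parameter
# ("promo_id") that production only started emitting today. With schema-less
# JSON storage, no ALTER TABLE is needed to accept it.
events = [
    '{"user_id": 1, "page": "/gigs", "ts": "2014-06-01T10:00:00Z"}',
    '{"user_id": 2, "page": "/gigs", "ts": "2014-06-01T10:01:00Z", "promo_id": "A7"}',
]

for line in events:
    event = json.loads(line)
    # .get() tolerates records written before the new parameter existed.
    print(event["page"], event.get("promo_id", "n/a"))
```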

That challenge inspired Fiverr to look for a big data solution that would support an open schema, allowing it to make changes on the fly. Hadoop became the focal point, but not without a careful assessment of the potential issues that can arise with it. For example, while Hadoop was deemed optimal for Fiverr’s type of data environment and scale, it requires special skills and attention, according to Borodovsky.

“It’s a powerful system, but ‘regular’ business
intelligence folks, such as analysts and even developers, can’t deal with it,”
he said. “It requires special programming knowledge like Java, and a very
technical orientation. It is something very different from the regular SQL
world. In most cases if a company wants to use Hadoop it needs to hire
employees with special knowledge and skills, who are also very costly. In
addition to headcount, they’ll need to create a distributed environment, which
also has extra costs.”

Fiverr initially sought to address the challenges of Hadoop implementation by using a columnar database to store traffic data. However, that approach ran into problems of its own, and the company needed a better solution. What if all the benefits of Hadoop were available in the cloud, leaving most of the challenges behind? Enter Xplenty, or Hadoop as a service.


Xplenty

[Screenshot: Xplenty’s GUI enables the user to create complex data flows in just minutes]

After signing up, Fiverr implemented Xplenty’s solution in a few days, and the company began seeing positive results very quickly. Xplenty’s cloud architecture made it simple to implement Hadoop for BI needs.

“The biggest surprise was the speed of implementation,” said Borodovsky. “It was up and running after a few days. The cloud infrastructure of Xplenty makes the implementation process very easy, and it required only minimal IT effort. The biggest challenge had to do with the format we store our data in, since at that time Xplenty didn’t support the JSON format. However, we solved that problem in a day or two. We also made small changes to our data file structure by splitting the files into smaller ones to increase performance. The implementation process was very transparent and easy.”
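The file-splitting step Borodovsky mentions can be as simple as chunking one large JSON-lines file into smaller pieces so the cluster can read them in parallel. Below is a minimal sketch under assumed conditions; the chunk size and file-naming scheme are arbitrary illustrations, not Fiverr’s actual values.

```python
CHUNK_LINES = 100_000  # assumption; tune to the cluster's preferred split size

def split_file(path: str) -> None:
    """Split one large JSON-lines file into numbered smaller files."""
    out, part, count = None, 0, 0
    with open(path) as src:
        for line in src:
            if out is None or count >= CHUNK_LINES:
                if out is not None:
                    out.close()
                out = open(f"{path}.part{part:04d}", "w")  # e.g. events.json.part0000
                part += 1
                count = 0
            out.write(line)
            count += 1
    if out is not None:
        out.close()
```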

The company now stores all its traffic data as text files in JSON format and processes it with Xplenty. Fiverr analysts can now create Hadoop clusters and run complex analytical tasks in a few clicks, with no need for a technical specialist to take care of Hadoop maintenance and optimization. The solution keeps Fiverr up to date with new changes on the site while keeping it very responsive to new metrics.

Fiverr was intent on mining its traffic data for funnel, conversion, and trend analysis. Those complex analytical tasks, along with click-through analysis, typically run over large, semi-structured datasets such as data stored in JSON format. The duration of those BI processes, from business request to analytical insight, shrank dramatically.
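As an illustration of the kind of funnel analysis involved, here is a minimal sketch over JSON click events. The event names and fields ("visit", "gig_view", "order", "user_id") are hypothetical stand-ins, not Fiverr’s actual tracking schema.

```python
import json

# Ordered funnel steps (hypothetical).
FUNNEL = ["visit", "gig_view", "order"]

def funnel_counts(lines):
    """Count how many users reached each funnel step from JSON-lines events."""
    furthest = {}  # user_id -> index of the deepest step reached
    for line in lines:
        event = json.loads(line)
        if event["event"] in FUNNEL:
            step = FUNNEL.index(event["event"])
            if step > furthest.get(event["user_id"], -1):
                furthest[event["user_id"]] = step
    # A user at step i is assumed to have passed every earlier step.
    return [sum(1 for s in furthest.values() if s >= i) for i in range(len(FUNNEL))]

sample = [
    '{"user_id": 1, "event": "visit"}',
    '{"user_id": 1, "event": "gig_view"}',
    '{"user_id": 2, "event": "visit"}',
    '{"user_id": 1, "event": "order"}',
]
print(dict(zip(FUNNEL, funnel_counts(sample))))  # {'visit': 2, 'gig_view': 1, 'order': 1}
```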

Less schema manipulation

Using Hadoop, Fiverr doesn’t need to change the schema of its database or data warehouse. Schema changes are typically very time-consuming and involve IT resources, which all too often creates additional bottlenecks in the BI process flow. The company can now start using new parameters added in production right after they go live. As an example, Borodovsky cites the process of measuring the performance of a new feature added to production.

“In the typical database world we would need to change the structure of our data warehouse and add columns to tables to store the new parameters,” he explained. “Then we’d need to change the ETL processes to parse the new parameters and insert them into a table. Next, we’d have to write queries, create reports, and analyze the feature.”
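To make the contrast concrete, here is a minimal sketch of that schema-bound workflow, using an in-memory SQLite database and hypothetical table and column names ("page_events", "feature_variant") purely for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
cur = conn.cursor()
cur.execute("CREATE TABLE page_events (user_id TEXT, url TEXT)")

# Step 1: change the warehouse schema to hold the new parameter.
cur.execute("ALTER TABLE page_events ADD COLUMN feature_variant TEXT")

# Step 2: update the ETL job so it parses and loads the new parameter.
def load_event(raw_line: str) -> None:
    event = json.loads(raw_line)
    cur.execute(
        "INSERT INTO page_events (user_id, url, feature_variant) VALUES (?, ?, ?)",
        (event["user_id"], event["url"], event.get("feature_variant")),
    )

load_event('{"user_id": "u1", "url": "/gigs", "feature_variant": "B"}')

# Only after both steps can analysts write queries and build reports on the
# new feature; these are the steps an open schema lets Fiverr skip.
print(cur.execute("SELECT * FROM page_events").fetchall())
```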

“This process usually ranges from a day in small companies and start-ups to several days, or even weeks, in big companies. With Xplenty’s Hadoop solution we can skip the first two steps. We can create a new process in Xplenty with a few clicks and get the insights very quickly. For processes related to traffic analysis, the average BI process now completes at least twice as fast as before.”

“With Xplenty, we save time dealing with data, as it’s not necessary to change the schema constantly. We are also independent of IT: we’ve saved on headcount and can put more attention on analytics and business insights than on the technical maintenance of Hadoop. As with many things in IT, finding the right solution sometimes takes a while. We met Xplenty at just the right time.”

[Screenshot: Instantly provision more cluster nodes to scale up and provide more compute power]