Mastodon C is a big data company in London, England. They
provide data science and big data technology services to organizations with too
much data and not enough resources to analyse it. Mastodon C has been around
for about 18 months and runs big data systems for clients including media
firms, the Energy Saving Trust (EST), and the Technology
Strategy Board’s Future Cities program.

Francine Bennett, CEO of Mastodon C, has a background at
Google as a data analyst. Bennett has spoken about data science at London’s Women in Data group, major conferences and
the European Commission.

The founders of the company – Bennett and Bruce Durling,
Mastodon C’s CTO –  built a data
processing platform using cloud infrastructure, open source tools and

And they work in a sustainable way: not by assuming cloud
infrastructure is better than on-premises infrastructure, and not by joining a
carbon offsetting scheme. Instead, Mastodon C performed a data analysis of
the IaaS industry.

Mastodon C’s technical stack

Mastodon C’s data processing platform runs on cloud
infrastructure – usually AWS EC2 machines running
the Ubuntu OS. The platform’s principal components
are Hadoop and Cassandra, wrapped in Clojure
code. It’s a combination that runs on commodity boxes, so different cloud
providers can be used for the infrastructure layer.

Bennett said they tailor this platform to meet customer
requirements. “The way that we work with clients is we’ve got this common core
of Hadoop and Cassandra technology,
with bits we have added around that to make it easier to deal with. It’s
loosely a platform that we customize to each customer”.

Clojure is used to customize the platform.
Clojure is a functional language that is heavy on mathematical functions and
expressions. Bennett explained why Mastodon C’s programmers are all Clojure
programmers. “That’s good for putting together data pipes. It’s Lisp and good
for expressing data flows”.
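
To make the idea of a "data pipe" concrete, here is a minimal sketch of a data flow built by chaining small functions, similar in spirit to threading a value through a series of forms in Clojure. It is written in Python rather than Clojure, and it is not Mastodon C's code: the `pipeline` helper, the step functions and the sample readings are all hypothetical names and values made up for illustration.

```python
from functools import reduce


def pipeline(*steps):
    """Build one function that passes data through each step, left to right."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)


# Hypothetical sensor readings (illustrative values, not real data).
readings = [
    {"sensor": "a", "kwh": 1.5},
    {"sensor": "b", "kwh": None},
    {"sensor": "a", "kwh": 0.5},
]


def drop_missing(rows):
    """Discard rows that have no energy reading."""
    return [row for row in rows if row["kwh"] is not None]


def total_by_sensor(rows):
    """Sum the energy readings for each sensor."""
    totals = {}
    for row in rows:
        totals[row["sensor"]] = totals.get(row["sensor"], 0.0) + row["kwh"]
    return totals


# The whole flow reads as a list of steps, in the order the data moves.
clean_and_total = pipeline(drop_missing, total_by_sensor)

if __name__ == "__main__":
    print(clean_and_total(readings))
```

Each step takes data in and hands data on, so steps can be added, removed or reordered without touching the others. That is the property that makes a functional style a natural fit for pipelines.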

The C in Mastodon C

Mastodon C makes their workloads greener – not by
offsetting, but by moving computation to greener data centers. It’s something
Mastodon C have always done – the C in the name stands for ‘Carbon Cloud
Compute’. Bennett analyzed data center locations, their PUE (Power Usage
Effectiveness) and the power grids that feed them.

A company like AWS manages a global infrastructure. How
do you figure out if Oregon is greener than West Virginia? Bennett’s analyses
worked out “where to place processing jobs and analysis jobs. Because there
really isn’t public information out there we used public information on power
grids and some other analyses and pasted it all together, to figure out where
was best”. And best is not all about PUE. “It turns out the power grid you’re
feeding off is really important – more important than the efficiency itself”.

“PUE is important but if you think about it, say you have a
really good PUE of one and a half. A bad PUE might be three. That’s a two times
difference. If you actually look at the power grids of different locations
– look at West Virginia, which is where the biggest AWS site is – that region is
mainly powered by coal”.

“That’s tens of times worse than Switzerland, which is
mostly nuclear. The multiples are much bigger in terms of national grids than
they are in terms of PUE”.
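
Bennett's argument can be put as simple arithmetic: the emissions of a job are roughly the IT energy it uses, multiplied by the data center's PUE, multiplied by the carbon intensity of the grid that feeds it. The sketch below assumes that simplified model. The site names and the grid intensity figures are illustrative placeholders, not measurements and not Mastodon C's data; only the PUE values of 1.5 and 3 come from the quote above.

```python
def emissions_g(it_energy_kwh, pue, grid_g_per_kwh):
    """Estimate grams of CO2 for a job: IT energy x PUE x grid carbon intensity."""
    return it_energy_kwh * pue * grid_g_per_kwh


# Hypothetical sites. The grid figures are placeholders chosen only to show
# a "tens of times" gap between a coal-heavy grid and a low-carbon grid.
SITES = {
    "efficient-dc-coal-grid": {"pue": 1.5, "grid_g_per_kwh": 900.0},
    "inefficient-dc-clean-grid": {"pue": 3.0, "grid_g_per_kwh": 30.0},
}


def greenest_site(sites, it_energy_kwh=1.0):
    """Return the name of the site with the lowest estimated emissions."""
    return min(
        sites,
        key=lambda name: emissions_g(it_energy_kwh, **sites[name]),
    )


if __name__ == "__main__":
    for name, site in SITES.items():
        print(name, emissions_g(1.0, **site), "g CO2 per kWh of IT load")
    print("greenest:", greenest_site(SITES))
```

With these placeholder numbers, the site with the good PUE on a coal-heavy grid comes out at 1350 g per kWh of IT load, against 90 g for the site with the bad PUE on a low-carbon grid. A twofold difference in PUE is swamped by a thirtyfold difference in the grid.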

How do customers react when Mastodon C suggest running
workloads in a less polluting IaaS data center? “Most people say ‘Oh, that’s
nice, yes, you do that, that’s fine’. If there is cost or effort involved then it
doesn’t tend to happen – I think just because IT and CSR [Corporate Social
Responsibility] do not tend to meet that much in the middle. They are slightly different worlds
in big companies. It’s difficult for an IT person to make decisions and justify
them to their boss when it comes to green considerations”.

It’s an exciting time for data scientists

Bennett is positive about the future for Mastodon C. “The
potential is just so big and so interesting that it is quite an exciting time to
be doing this stuff”.

Cloud IaaS and big data software can be applied by analysts
to help customers, in a way that was difficult in the past. “The infrastructure
is very solid now and commoditized. Much of the software is commoditized,
increasingly so. And there’s loads of data around – lots more than there was a
few years ago. It’s automatically produced by systems and by sensors and by all
sorts of other things. Suddenly the pieces are in place to do useful things
with data, which makes it very exciting”.