Mastodon C is a big data company in London, England. It provides data science and big data technology services to organizations that have too much data and not enough resources to analyze it. Mastodon C has been around for about 18 months and runs big data systems for clients including media firms, the Energy Saving Trust (EST), and the Technology Strategy Board's Future Cities program.

Francine Bennett, CEO of Mastodon C, has a background at Google as a data analyst. Bennett has spoken about data science at London's Women in Data group, at major conferences and at the European Commission.
The company's founders, Bennett and Bruce Durling (Mastodon C’s CTO), built a data processing platform from cloud infrastructure, open source tools and their own customizations.
They also aim to work sustainably. Rather than assuming cloud infrastructure is greener than on-premise infrastructure, or joining a carbon offsetting scheme, Mastodon C performed a data analysis of the IaaS industry.
Mastodon C’s technical stack
Mastodon C’s data processing platform runs on cloud infrastructure, usually AWS EC2 machines running Ubuntu. The platform’s principal components are Hadoop and Cassandra, wrapped in Clojure code. Because the combination runs on commodity boxes, different cloud providers can be used for the infrastructure layer.
Bennett said they tailor this platform to meet customer requirements. “The way that we work with clients is we’ve got this common core of Hadoop and Cassandra technology, with bits we have added around that to make it easier to deal with. It’s loosely a platform that we customize to each customer”.
Clojure is used to customize the platform. Clojure is a functional language, a dialect of Lisp, that is heavy on mathematical functions and expressions. Bennett explained why Mastodon C’s programmers are all Clojure programmers: “That’s good for putting together data pipes. It’s Lisp and good for expressing data flows”.
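Bennett's point about “putting together data pipes” refers to writing a job as a chain of small transformations, each feeding the next. The sketch below mimics that style in Python rather than Clojure, roughly the way Clojure's `->>` threading macro chains forms. It is not Mastodon C's code: the step functions (`drop_missing`, `to_mwh`, `total_mwh`) and the `kwh` record field are hypothetical examples.

```python
# A minimal data-flow sketch in Python. Mastodon C's pipelines are written in
# Clojure; this only mirrors the style (a chain of small transformation steps).
# The step names and the "kwh" record field are hypothetical.
from functools import reduce
from typing import Callable, Iterable

Step = Callable[[Iterable[dict]], Iterable[dict]]


def pipeline(*steps: Step) -> Step:
    """Compose steps left to right, roughly like Clojure's ->> threading macro."""
    return lambda records: reduce(lambda acc, step: step(acc), steps, records)


def drop_missing(records):
    """Keep only records that have a reading."""
    return (r for r in records if r.get("kwh") is not None)


def to_mwh(records):
    """Add the reading converted from kWh to MWh."""
    return ({**r, "mwh": r["kwh"] / 1000} for r in records)


def total_mwh(records):
    """Reduce the stream to a single summary record."""
    return [{"total_mwh": sum(r["mwh"] for r in records)}]


# The whole job is just the ordered list of steps.
energy_report = pipeline(drop_missing, to_mwh, total_mwh)
```

For example, `energy_report([{"kwh": 500}, {"kwh": None}, {"kwh": 1500}])` returns `[{"total_mwh": 2.0}]`. Each step is a plain function over a stream of records, so steps can be reordered, tested in isolation or reused across pipelines.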
The C in Mastodon C
Mastodon C makes its workloads greener, not by offsetting but by moving computation to greener data centers. It’s something Mastodon C has always done: the C in the name stands for ‘Carbon Cloud Compute’. Bennett analyzed data center locations, their PUE (Power Usage Effectiveness, the ratio of a facility's total energy use to the energy used by its IT equipment) and the power grids that feed them.
A company like AWS manages a global infrastructure. How do you figure out whether Oregon is greener than Virginia? Bennett’s analyses worked out “where to place processing jobs and analysis jobs. Because there really isn’t public information out there, we used public information on power grids and some other analyses and pasted it all together, to figure out where was best”. And best is not all about PUE: “It turns out the power grid you’re feeding off is really important, more important than the efficiency itself”.
“PUE is important, but if you think about it, say you have a really good PUE of one and a half. A bad PUE might be three. That’s a two times difference. If you actually look at the power grids of different locations, look at Virginia, which is where the biggest AWS site is: that region is mainly powered by coal”.
“That’s tens of times worse than Switzerland, which is mostly nuclear. The multiples are much bigger in terms of national grids than they are in terms of PUE”.

How do customers react when Mastodon C suggests running workloads in a less polluting IaaS data center? “Most people say ‘Oh, that’s nice, yes, you do that, that’s fine’. If there is cost or effort involved then it doesn’t tend to happen; I think just because IT and CSR [Corporate Social Responsibility] do not tend to meet that much in the middle. They are slightly different worlds in big companies. It’s difficult for an IT person to make decisions and justify them to their boss when it comes to green considerations”.
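Bennett's comparison of PUE against grid mix comes down to simple multiplication: the energy a job draws from the grid is its IT energy times the facility's PUE, and the carbon emitted is that energy times the grid's carbon intensity. The Python sketch below is illustrative only. The site labels and intensity figures are hypothetical placeholders chosen to mirror her “two times” versus “tens of times” contrast, not measured values for any real data center, and `job_emissions` and `greenest_site` are hypothetical helper names.

```python
# Illustrative comparison: facility efficiency (PUE) vs. grid carbon intensity.
# All site names and numbers are hypothetical placeholders, not measured values.


def job_emissions(it_energy_kwh: float, pue: float, grid_kg_per_kwh: float) -> float:
    """Estimated kg CO2 for a job: IT energy scaled by PUE, times grid intensity."""
    facility_energy_kwh = it_energy_kwh * pue  # total energy drawn by the facility
    return facility_energy_kwh * grid_kg_per_kwh


def greenest_site(sites: dict[str, tuple[float, float]], it_energy_kwh: float) -> str:
    """Return the site with the lowest estimated emissions for this job."""
    return min(sites, key=lambda name: job_emissions(it_energy_kwh, *sites[name]))


# Hypothetical sites: (PUE, grid intensity in kg CO2 per kWh).
SITES = {
    "efficient_coal_site": (1.5, 0.9),            # good PUE, coal-heavy grid
    "inefficient_low_carbon_site": (3.0, 0.03),   # bad PUE, low-carbon grid
}

if __name__ == "__main__":
    for name, (pue, grid) in SITES.items():
        print(name, round(job_emissions(100.0, pue, grid), 2))
    print("best:", greenest_site(SITES, 100.0))
```

With these placeholder numbers, a 100 kWh job emits about 135 kg CO2 at the efficient coal-fed site and about 9 kg at the inefficient low-carbon site: the 2x penalty from a worse PUE is swamped by a 30x difference in grid intensity. Real placement decisions would need sourced grid data, which is the gap Bennett describes filling from public power-grid information.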
It’s an exciting time for data scientists
Bennett is positive about the future for Mastodon C. “The potential is just so big and so interesting that it is quite an exciting time to be doing this stuff”.
Analysts can now apply cloud IaaS and big data software to help customers in ways that were difficult in the past. “The infrastructure is very solid now and commoditized. Much of the software is commoditized, increasingly so. And there’s loads of data around, lots more than there was a few years ago. It’s automatically produced by systems and by sensors and by all sorts of other things. Suddenly the pieces are in place to do useful things with data, which makes it very exciting”.