Global auction marketplace eBay has revealed the principles it developed for its own internal cloud, underlining how it’s often non-tech issues that hamper the transition.
The auction site wanted a cloud to give more flexibility to developers and to a business that has to cope with two billion page views per day, nine petabytes of data storage, 6,000 application servers and 23 million lines of code, according to eBay cloud architect Jean-Christophe Martin.
“If you have a complex organisation, you’re going to have a complex system,” Martin told a London audience at the recent Cloud Expo.
“The first fallacy of the cloud is that people think moving to the cloud will solve all their processing problems, but that’s not true. If you want to build your internal cloud or if you want to move your organisation to the cloud, you have to look at all the aspects of the processes that will prevent, for example, self-service,” he said.
“Everything where a user is waiting and you have a complex process, apart from approvals or assignments, will prevent you from providing self-service in some way.”
Martin said eBay had arrived at five principles when developing its own internal cloud.
eBay principle 1: Simplify
Simplification is the most important issue, according to Martin. “When you look at most internal environments, you find there is a lot of complexity that will prevent you moving to the cloud or will make your life much more difficult,” he said.
He suggested organisations split the simplification issue into three areas: process, technological, and organisational simplification.
“The first thing is look at all your steps and try to remove some. You can eliminate them because you can automate or because you realise they are not useful in this new environment,” he said.
But if you have a necessary process that is too time-consuming and you cannot eliminate it, you can move it out of your path.
“So, for example, at eBay we moved fulfilment, which is the acquisition of hardware, out of the request for resource. So when a developer wants a VM, we are not acquiring the hardware at the time the developer wants it. We have acquired the hardware before and so we’ve moved the process of hardware fulfilment into the background,” Martin said.
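The separation Martin describes, where hardware is acquired in the background and a developer's request only draws on capacity that already exists, can be sketched roughly as below. This is an illustrative sketch, not eBay's implementation: the `CapacityPool` class, the `low_watermark` parameter and the host names are all assumptions made for the example.

```python
from collections import deque


class CapacityPool:
    """Hypothetical pool of hardware acquired ahead of demand."""

    def __init__(self, low_watermark):
        self.free_hosts = deque()
        self.low_watermark = low_watermark
        self.pending_orders = 0

    def add_host(self, host_id):
        # Background fulfilment: runs when hardware arrives, not on request.
        self.free_hosts.append(host_id)

    def request_vm(self, owner):
        # Request path: only allocates from capacity that already exists.
        if not self.free_hosts:
            raise RuntimeError("pool exhausted; fulfilment has fallen behind")
        host = self.free_hosts.popleft()
        if len(self.free_hosts) < self.low_watermark:
            # Record a replenishment order instead of making the user wait.
            self.pending_orders += 1
        return {"owner": owner, "host": host}


pool = CapacityPool(low_watermark=2)
for host_id in ("rack1-h1", "rack1-h2", "rack1-h3"):
    pool.add_host(host_id)

vm = pool.request_vm("dev-alice")
print(vm, pool.pending_orders)
```

The point of the design is that `request_vm` never waits on procurement; the slow step only shows up as a counter that a background process would act on.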
He said when eBay started building its cloud, it had about 250 different hardware models and versions. “You can imagine your capacity for testing and fulfilment if you have 250 items in the cloud to try,” he said.
“So now we have five logical models that we can source from different vendors. You can forecast demand for those models much more easily because we only have five to track.” eBay also moved to a commodity hardware model.
But consolidation should also happen in software. “At eBay we must have had one version of each operating system in existence, from Solaris, OpenSolaris, Ubuntu, CentOS to all the versions of Windows. So we tried to reduce the variations because you can’t really have a menu that’s too big and manage those six, 10, 50 operating system versions that you have to offer in your cloud,” Martin said.
Consolidation must also occur in the management layer. “You might have different monitoring tools, one for the network, one for the servers, one for the database and you have to consolidate,” he said.
The ticketing system at eBay was also too complex. “Central deployment of a new application to 100 servers might have required before something like 30 or 40 tickets,” Martin said.
“Really, what you should look at is eliminating approvals and assignments so that you can simplify the user experience.”
eBay principle 2: Automate everything
Martin believes it’s important to look at the end-to-end lifecycle of servers and appliances to see what you can automate and what the impediments to automation are.
“Sometimes you can’t automate something - such as racking and wiring. There’re no robots yet that can do that for you,” he said.
“So now we acquire racks of servers that are pre-cabled and pre-wired so that we can roll the rack or container of racks into our datacentres and we just have to wire back the top of the rack switches or the container top of the racks to our infrastructure. It’s not really eliminating it - it’s really just moving it out of the way.”
You also have to spot where there’s no support for automation. “We had, for example, some firewalls that we could not automate because the vendor did not provide any way to amend the configuration of the firewall so what we did was change firewall vendor,” said Martin.
“In this case there was an API, and even though you could configure the firewall with the API you still had to go to the UI to apply it, which was completely insane.”
The last point is a lack of control of the device or the software. “For example, if you’re in colocation and you’re using a router or a switch from the provider, they’re not going to let you log on the switch or log on the firewall to change the configuration,” he said.
“So the way we solved the problem was we virtualised the network. You have to find solutions so that you’re not blocked by those issues.”
Martin said if you go to the big four system-management vendors, they will sell you service catalogues, run-book automation, monitoring, configuration management databases and financial management. “These don’t work when you have a cloud environment,” he said.
In their place you want RESTful APIs, state-driven closed-loop automation, big data and machine learning, distributed state management and pay-as-you-go.
eBay uses a system based on Hadoop, which contains all the site metrics. “We can do alerts and post-processing, dynamic threshold detection instead of having users entering thresholds. You need to get the metrics out of those systems and in a place where you can actually process them,” Martin said.
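Martin did not say how eBay computes its dynamic thresholds. One common approach, shown here purely as an assumed illustration, is to derive the alert level from recent samples (mean plus a multiple of the standard deviation) rather than from a number a user types in. The function names, the multiplier `k` and the sample data are hypothetical.

```python
from statistics import mean, stdev


def dynamic_threshold(history, k=3.0):
    """Return mean + k * sample standard deviation of recent samples."""
    return mean(history) + k * stdev(history)


def is_anomalous(history, value, k=3.0):
    # Flag a sample that exceeds the threshold derived from past data.
    return value > dynamic_threshold(history, k)


# Made-up latency samples in milliseconds, for illustration only.
latency_ms = [100, 102, 98, 101, 99, 100, 103, 97]

print(is_anomalous(latency_ms, 104))
print(is_anomalous(latency_ms, 150))
```

Because the threshold is recomputed from the data, it moves with the metric instead of needing manual tuning per server or per application.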
The auction site picked OpenStack for its cloud platform. “We now have around 80 percent of our site running on that first version of our cloud, which we developed ourselves, and we are migrating to an open-source cloud,” he said.
According to Martin, the main reason why eBay is migrating to an open-source cloud is the community. “OpenStack is a great community and ecosystem. There are a lot of providers that support OpenStack,” he said.
“It’s also very close to what system administrators understand: it’s in Python. It’s easier to train system admins in Python than in Java. A lot of big vendors are putting a lot of money in OpenStack and supporting the OpenStack APIs.”
eBay principle 3: Applications anywhere
Martin believes it’s important to be able to deploy any application anywhere in the datacentre.
“Sometimes you build one cloud for developers and one for production and what you do is create silos. You have this infrastructure for each silo and that wastes a lot of resources because you have spare capacity in every one of those silos and you can’t share it,” he said.
So eBay created a single infrastructure that is shared across all the use cases. On top of that sit virtual environments, which replicate what people had before. “But we tried to make it so that it’s only by configuration. It’s not like a physical isolation in your datacentre,” Martin said.
The advantage of virtualisation and shared infrastructure is that it is easier to automate and requires fewer skillsets.
eBay principle 4: Plan for failure
Applications have the potential for failure and you must plan accordingly, Martin said. At eBay, applications such as My eBay are partitioned by function.
“Then each one of those functions is split horizontally. So the My eBay application is deployed on, say, 500 servers. So now we have multiple instances serving the same function so it’s much easier to recover from failure,” he said.
On top of that, eBay has created availability zones across its datacentres to add extra resiliency.
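A rough sketch of why horizontal splitting plus availability zones eases recovery: if instances of one function are spread across zones, losing a zone removes only a share of the capacity. The placement scheme (round-robin), the zone names and the function name below are assumptions for illustration, not details from Martin's talk.

```python
from itertools import cycle


def place_instances(function, count, zones):
    """Spread `count` instances of one function round-robin across zones."""
    zone_cycle = cycle(zones)
    return [
        {"function": function, "id": i, "zone": next(zone_cycle), "healthy": True}
        for i in range(count)
    ]


def healthy_instances(instances):
    # A router would send traffic only to instances that pass health checks.
    return [inst for inst in instances if inst["healthy"]]


instances = place_instances("my-ebay-summary", 6, ["zone-a", "zone-b", "zone-c"])

# Simulate losing an entire zone.
for inst in instances:
    if inst["zone"] == "zone-a":
        inst["healthy"] = False

print(len(healthy_instances(instances)))
```

With six instances over three zones, four remain after one zone fails, so the function keeps serving at reduced capacity.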
eBay principle 5: Proportional security
eBay used to have a physical environment that was partially designed to meet security requirements, with one environment isolated from another.
“What we did was look at all those requirements of each physical environment and defined them logically,” Martin said.
“So if the secure environment means that one zone should be isolated and users should not have access to those machines, we can translate that into policies and infrastructure configurations,” he said.
“As a result we can build classes of service, like airlines offer, with a very well defined view of what you get and what you don’t get with each one.”
According to Martin, the result is much finer control of security and a reduction in the size of environments.
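One way to picture security requirements defined logically rather than physically is as a small set of named classes of service, each translated into configuration. The sketch below is an assumption-laden illustration: the class names, fields and the generated settings are hypothetical and do not describe eBay's actual policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceClass:
    name: str
    network_isolated: bool
    user_login_allowed: bool


# Hypothetical classes of service; the names and fields are illustrative.
CLASSES = {
    "standard": ServiceClass("standard", network_isolated=False, user_login_allowed=True),
    "secure": ServiceClass("secure", network_isolated=True, user_login_allowed=False),
}


def to_infrastructure_config(service_class):
    """Translate a logical class of service into example settings."""
    return {
        "firewall_default": "deny" if service_class.network_isolated else "allow",
        "user_login": service_class.user_login_allowed,
    }


print(to_infrastructure_config(CLASSES["secure"]))
```

Each class states plainly what a tenant gets and does not get, and the isolation lives in configuration rather than in separate physical environments.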