Scott Ellis, product manager of Google Cloud Platform, shared proactive steps users can take to prevent data stored on Google's infrastructure from falling into the wrong hands.
TechRepublic's Dan Patterson spoke to Scott Ellis, product manager on the Google Cloud Platform, about data protection:
Patterson: The cloud, undoubtedly, is powering digital transformation for companies of all sizes, whether they're SMB startups or enterprise companies. And with the GDPR, every company is thinking about exposure or accidental exposure of cloud data to the public or to other business entities. Scott, thanks a lot for your time today. I wonder if we could start with, I know it may sound obvious, but what is accidental exposure of PII (personally identifiable information) or data, versus intentional exposure or any other type of exposure?
Ellis: Sure. Accidental exposure is when you have data that, let's say, has a certain level of sensitivity that is overexposed, or exposed to somebody who doesn't need to see it, doesn't have a business purpose to see it, or it wasn't intended for them to see it. So, that covers the case where they may have inadvertently gotten access to it, or gotten access to a share, or something where the settings were exposing it to people who didn't need to see it. That's where we look at it as a misconfiguration or an accidental exposure, as opposed to an intentional exposure, where somebody maliciously or with intent tried to take data or expose it. This is more of an accidental or incidental exposure.
Patterson: Again, this may sound 101 to you, and to many of our listeners, but just to set a baseline, when we say data, how do we define data? It's a little bit like defining what "is" is, but in this case, there's all kinds of information that could potentially be exposed. PII, other types of information. So, how do we define data in this context?
Ellis: That's a great question. Data could be anything about your business, or your customers, often called users or user data, which you mentioned one of them, PII, or personally identifiable information. Some of that is just sensitive by default, maybe someone's credit card number, social security number, or other types of identifiers that are sensitive. And in other cases, it may be something that is known, like an email address or phone number, but maybe tied to other data that is sensitive.
So, you can imagine, if there is information about a user, their behavior, how they're interacting with someone's business or an application, and that's tied to their identity, that exposes more information about that user, and so generally that's something sensitive. And so, we talk about sensitive data, that's a lot of what we're talking about, is identifiable information about an individual, or something sensitive tied to their identity.
Patterson: When we think about cloud platforms like the Google Cloud platform, often we think about developers creating applications, but there are a number of uses for the cloud platform. SMBs could use it. Carol's Carpet Cleaning could use the cloud platform. So, when data exposure occurs, I know you're not an attorney, and I'm not asking for a legalistic answer, but where is the liability and responsibility? Is Carol's Carpet Cleaners liable? Is Google liable? And who is responsible for making sure that the information stays safe, or if accidental exposure occurs, who is responsible for making sure that data is locked up after the fact?
Ellis: It's a shared-responsibility model, and what that means, is that the customer takes on responsibility to make sure the data is properly managed, that it is being used in proper business practices, and exposed only when there is a valid reason to do so. And it's really -- on the cloud platform -- it's giving users, customers, the tools so they can properly manage their data to meet those requirements that they have, or their best practices that they have, to the expectation of their users, or any other sort of regulation that they have. And so, it's really a shared responsibility model, where ultimately, the customer makes the decision of how that data can be used, shared, exposed, processed, and so forth.
Patterson: So, again, with digital transformation, more and more companies are using the cloud, and for all intents and purposes, becoming technology firms, no matter what their core competency is. Is there a set of not just best practices, but policies that you advise companies follow to make sure that accidental exposure doesn't occur?
Ellis: We can. We've looked at some best practices, and policies can change depending on the customer, but there are some things out there, like least privilege, or need to know. If you look at best practices in general, it's about knowing, really understanding where your data is, understanding what is sensitive, what isn't sensitive, and really setting the best access control policies, the best sharing policies, making sure both internally and externally, with maybe partners, or out on public sharing of data, that you're only sharing the data that is appropriate to share, and with the people who have the right control, so that it doesn't get overexposed.
So, generally, when you're looking at something like need-to-know or least privilege, it primarily looks at giving people only the data they need to do their job, or just to perform that business function; and, so to do that, you really want to understand your data. We provide a lot of tools to help customers get better insights into their data, [and] understand what's sensitive. Tools like our Data Loss Prevention API and platform allow them to discover and classify their data, redact information if that's appropriate, as well as a whole suite of access control and management and monitoring tools, to make sure that data is only going where they intend it to go.
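As a rough illustration of the need-to-know idea described above, here is a minimal Python sketch. It is not Google's Identity and Access Management API; the role names, the data classes, and the helpers `can_access` and `unused_grants` are all hypothetical and exist only to show a deny-by-default check.

```python
# Hypothetical role-to-data mapping: each role lists only the data classes
# it needs for its business function. None of these names come from a real API.
ROLE_GRANTS = {
    "support_agent": {"contact_info"},
    "billing_clerk": {"contact_info", "payment_data"},
    "analyst": {"usage_metrics"},
}


def can_access(role, data_class):
    """Deny by default: allow only what the role explicitly grants."""
    return data_class in ROLE_GRANTS.get(role, set())


def unused_grants(role, used):
    """Return grants the role holds but did not use (candidates for removal)."""
    return ROLE_GRANTS.get(role, set()) - set(used)


print(can_access("billing_clerk", "payment_data"))   # True
print(can_access("support_agent", "payment_data"))   # False
print(can_access("intern", "contact_info"))          # False (unknown role)
print(unused_grants("billing_clerk", ["contact_info"]))  # {'payment_data'}
```

Reviewing the output of something like `unused_grants` periodically is one simple way to keep grants from drifting beyond what a role actually needs.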
Patterson: And what about notification of users, I like to call them people. But if your data is leaked, you have a right, or it seems like you should have a right to know. How do companies, or what are the best practices for companies notifying users, and what happens if they don't?
Ellis: I'm not sure on all the details of what happens if they don't. But I think, generally, you want to have very robust monitoring in place, looking for ... Just to give an example, looking at the various layers of defense. You might have things set up properly, but you also want to monitor that somebody doesn't accidentally, let's say, turn something off. Maybe you have a policy in place, and you have some enforcement. You also want to monitor that that's actually happening. You want to monitor that the enforcement wasn't turned off, or that there wasn't some other behavior that looked out of the normal.
And so, really having layers of defense and monitoring for all of the automated or defense mechanisms that you have in place helps customers create this kind of comprehensive view, and that's going to allow them to not only have the policy in place on what you should do, but also have some checks and balances in place to make sure that these things are happening, or catch things when they're not happening, and then take the appropriate action.
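The kind of check described above, monitoring that an enforcement setting was not turned off, can be sketched as a comparison between an approved baseline and the current configuration. This is only an illustration under stated assumptions: the setting names, the `ABSENT` marker, and the `detect_drift` function are hypothetical and do not correspond to any Google Cloud tool.

```python
# Marker used when a setting exists in one snapshot but not the other.
ABSENT = "<absent>"


def detect_drift(baseline, current):
    """Return (setting, expected, actual) for every setting that differs."""
    drift = []
    for key in sorted(set(baseline) | set(current)):
        expected = baseline.get(key, ABSENT)
        actual = current.get(key, ABSENT)
        if expected != actual:
            drift.append((key, expected, actual))
    return drift


# Hypothetical settings: the approved policy versus what is live right now.
baseline = {"audit_logging": True, "public_sharing": False, "dlp_scan": True}
current = {"audit_logging": False, "public_sharing": False}

for setting, expected, actual in detect_drift(baseline, current):
    print(f"ALERT {setting}: expected {expected}, found {actual}")
# ALERT audit_logging: expected True, found False
# ALERT dlp_scan: expected True, found <absent>
```

Running a comparison like this on a schedule gives a second layer: the policy says what should be on, and the check catches the case where something was switched off.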
Patterson: Let's talk about some of those tools that allow you to have insights into the data. You mentioned a moment ago, the APIs. What are some of ... I know there's a ton of APIs, but what are some of the most useful APIs, for especially SMBs and startups, companies that may not have access to the resources of an enterprise organization?
Ellis: Sure. One of those tools that we have is the Data Loss Prevention API. It's an extension of our data loss prevention infrastructure that we also have in Gmail and Drive, and some of our G Suite applications. In a more traditional sense, data-loss prevention helps customers identify, classify, and sort of manage or govern their data, so they can do things like, I see data here that's sensitive. Somebody's trying to perform an action that might share it, let's say, email it out to somebody. We can take some action, maybe quarantine that data.
We've extended that to our cloud platforms in the form of an API, so that customers can build this into their development workflows, their production workflows, and really look at the entire data lifecycle, of when they're collecting data, all the way to when they're storing, sharing, and using that data. They can use our API to scan their data, understand what's in it, understand things like personally identifiable information, so they really have that intelligence and knowledge of where their data is, and what level of sensitivity it has. We've also added some additional features to do things like redaction and masking, that may be appropriate for a customer who needs to sort of reduce some of these sensitive elements or identifying elements, to help them still do their business, but also kind of do it at a reduced risk.
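To make the scan, classify, redact, and mask steps concrete, here is a minimal Python sketch. It does not call the Data Loss Prevention API. The regular expressions are simplistic stand-ins for a real classifier, and the names `PATTERNS`, `find_sensitive`, `redact`, and `mask` are illustrative assumptions.

```python
import re

# Hypothetical detectors: simple regexes standing in for a real classifier.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
}


def find_sensitive(text):
    """Scan text and return sorted (start, end, label) tuples for each match."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((match.start(), match.end(), label))
    return sorted(findings)


def redact(text, findings):
    """Replace each finding with a [LABEL] placeholder."""
    parts, pos = [], 0
    for start, end, label in findings:
        if start < pos:  # skip a finding that overlaps one already redacted
            continue
        parts.append(text[pos:start])
        parts.append(f"[{label}]")
        pos = end
    parts.append(text[pos:])
    return "".join(parts)


def mask(value, keep_last=4, char="*"):
    """Hide all but the last keep_last characters of a value."""
    if keep_last <= 0:
        return char * len(value)
    return char * max(len(value) - keep_last, 0) + value[-keep_last:]


sample = "Contact jane@example.com, SSN 123-45-6789, card 4111 1111 1111 1111."
print(redact(sample, find_sensitive(sample)))
# Contact [EMAIL], SSN [US_SSN], card [CREDIT_CARD].
print(mask("4111111111111111"))
# ************1111
```

Redaction removes the value entirely, while masking keeps a small part of it (here the last four characters) so the record is still usable, which is the reduced-risk trade-off described above.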
Patterson: Scott, I always try to stay nonpartisan, and never take sides, but right now, the world is thinking about Facebook and data exposure. I know they're a competitor of yours, but also, the GDPR, which is going to change the way data is stored and managed in regions all over the world. So, when you talk to your customers, and you get feedback from them, what do they tell you their concerns are? What are cloud customers worried about? What information do they not have, and they would like to have? And does that help you point the ship in a different direction, or do you change policies based on what you learn from your customers?
Ellis: Yeah, we definitely engage with customers and hear their feedback, and one of the main things that we hear is, to properly manage their data, the first thing that they need to start with is really understanding it, to know where it is. And in some cases, companies have programs in place that govern their data, but there are cases where they need additional insights, additional intelligence about that data. Maybe they have cases where they might inadvertently collect that data, and they want to protect against that, and other cases where they just want to have an extra check.
So, that's why we've really focused on giving them tools to really understand their data, so they can properly manage it, as well as the tools to actually go in and restrict access, lock down access, and properly label, annotate, manage data using things like our Identity and Access Management platform, or our Resource Management platform, as well as tools that allow you to also monitor and look for changes. An open source tool we support called Forseti, as well as some tools like our recently announced Cloud Security Command Center, allow you to really go in and see a lot of these signals, look at changes in configuration that might be detrimental to your security infrastructure, and monitor for these things so that you can protect against inadvertent or possibly malicious change, or if somebody tries to turn off the security control.
Patterson: That's fascinating. Scott, thanks a lot for your time today. Last question for you. We are already living in a post-GDPR world. How do you anticipate the way data is stored and managed will change over the next, say, 18 to 36 months?
Ellis: Following some of our beliefs around really understanding your data, really having intelligence about your data, customers will have more governance, more understanding of their data. And this will both help them manage it better from a security and privacy perspective, as well as really understand what they're collecting, where it's going, and provide better, more valuable services. Data intelligence across the board is going to help make more valuable products, but specifically here, it's going to help them understand what their data is, where it's going, and make sure that they have the proper configuration, the proper sharing and exposure controls. That's going to continue to increase. We're going to see more and more control of data, with more automation and more enforcement of these monitoring policies. It will just get more and more automated, and more controlled.