Data on premises, control in the cloud – will that work?

02May19

All three of the major cloud service providers have (or have announced) ‘have your cake and eat it’ versions of their services, where data resides on premises whilst everything is managed from a control plane in the cloud.

All of these services are predicated on a notion that data needs to reside on premises, whilst at the same time providing a subset of the services available in the public cloud, using the same management interface and underlying APIs.

Server huggers gonna hug

We have to ask why organisations (or at least the people working in them) might think that they need to keep their data on premises, and there are essentially two reasons that come up time after time:

  1. Sensitivity – this label covers the plethora of security, privacy and regulatory related things that ostensibly get in the way of data being put into the ‘public’ cloud.
  2. Latency – this label covers cases where the round trip from the customer’s on premises location to the cloud and back takes too long for the application.

The latency argument is pretty clear cut

If the milliseconds it takes to get data from your factory sensor to the cloud and back to the robot are too many, then cutting that round trip out by having the kit close by is clearly going to work. This is a good reason for adopting this type of hybrid model. Of course other hybrid models that deal with ‘edge’ compute are also available, so there are choices to filter through.

The sensitivity argument is much more murky

In principle there’s a clear separation of concerns between the data, which is sensitive and stays on premises; and the control plane metadata, which isn’t sensitive and can happily go back and forth to that public cloud that we were unwilling to trust with our sensitive data.

In practice there’s an administrative level back door wired up from the kit hosting our sensitive data going right into that public cloud that we were unwilling to trust with our sensitive data. Awkward. Of course we can spend some due diligence time picking over controls and monitoring; and some lawyer time picking over contracts to settle who gets blamed for what.

Things get much murkier if you ship logs

If the control plane is just about turning stuff on and off then we can claim a separation between control metadata (not sensitive) and app data (sensitive), and the lines around that claim stay pretty sharp and clean. But once we start throwing logging across that line it’s no longer sharp and clean, especially when we get to exception handling.

Exceptions contain things like stack traces, and stack traces have a nasty habit of carrying with them the in-memory plain text of all that sensitive stuff you’ve so carefully encrypted at rest and in motion.
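To make that concrete, here is a minimal Python sketch of how it can happen. The `charge` function and the card number are made up for illustration, and the sketch assumes the error reporting has been set up to capture local variables (which the standard library's `traceback` module supports, and which some error reporting tools do in their own way). Note that the error message itself is perfectly clean; it's the captured frame state that gives the game away.

```python
import traceback


def charge(card_number: str, amount_pence: int) -> None:
    # Hypothetical payment step. The error message contains nothing sensitive.
    if amount_pence <= 0:
        raise ValueError("amount must be positive")


def render_trace_with_locals() -> str:
    try:
        charge("4111111111111111", 0)  # made-up test card number
    except ValueError as exc:
        # capture_locals=True records each frame's local variables,
        # so the rendered trace includes card_number in plain text.
        rich = traceback.TracebackException.from_exception(
            exc, capture_locals=True
        )
        return "".join(rich.format())
    return ""


trace_text = render_trace_with_locals()
```

If `trace_text` is what gets shipped to a cloud log service, the card number has left the building along with it.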

For sure developers can be asked to write code that doesn’t leak sensitive data to logs, and that’s just as easy to police as every other aspect of code security.

This can also become the province of ‘data loss prevention’ (DLP) technologies, though they’ve tended to focus on human driven channels like email and file sharing rather than system stuff like logs.
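A crude, DLP-flavoured approach for logs might look something like the sketch below: a formatter that scrubs anything resembling a card number from the finished log line, traceback included, before it goes anywhere. The `RedactingFormatter` name, the `CARD_LIKE` pattern and the logger name are all my own inventions for illustration, and pattern matching like this only catches the shapes of data it has been told about.

```python
import io
import logging
import re

# Assumption: 13 to 19 consecutive digits is "card-like". Real rules need more care.
CARD_LIKE = re.compile(r"\b\d{13,19}\b")


class RedactingFormatter(logging.Formatter):
    """Redact card-like numbers from the fully formatted record."""

    def format(self, record: logging.LogRecord) -> str:
        # super().format() appends the traceback text, so redacting its
        # output covers both the message and any exception details.
        return CARD_LIKE.sub("[REDACTED]", super().format(record))


def demo() -> str:
    buffer = io.StringIO()  # stand-in for whatever ships the logs
    handler = logging.StreamHandler(buffer)
    handler.setFormatter(RedactingFormatter("%(levelname)s %(message)s"))
    logger = logging.getLogger("redaction_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        raise ValueError("card declined: 4111111111111111")
    except ValueError:
        logger.exception("payment failed")
    finally:
        logger.removeHandler(handler)
    return buffer.getvalue()


shipped = demo()
```

Doing the redaction in the formatter rather than in a filter is deliberate: the traceback text is only generated at format time, so a filter that looks at the message alone would miss it.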

An approximation that emerges here is that if the data is so sensitive that it needs to be kept on premises then it’s likely also the case that the logs and any associated log management need to stay on premises too. Log shipping to take advantage of cloud based log management tools seems to puncture any clean line between sensitive app data that must be kept on premises and control metadata that can be allowed into the public cloud.
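One way to sketch that line in code is an allow list: everything is logged on premises, and only records explicitly marked as control metadata are allowed anywhere near the cloud. The `plane` attribute, the class names and the in-memory handlers below are hypothetical stand-ins rather than any provider's actual mechanism.

```python
import logging


class ControlPlaneOnly(logging.Filter):
    """Pass only records explicitly marked as control metadata."""

    def filter(self, record: logging.LogRecord) -> bool:
        return getattr(record, "plane", None) == "control"


class ListHandler(logging.Handler):
    """Stand-in for a real destination; collects formatted messages."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


on_prem = ListHandler()  # receives everything
cloud = ListHandler()    # receives only allow-listed records
cloud.addFilter(ControlPlaneOnly())

logger = logging.getLogger("routing_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(on_prem)
logger.addHandler(cloud)

logger.info("node started", extra={"plane": "control"})
logger.info("lookup failed for customer record")  # unmarked, so stays local
```

Defaulting to ‘stays on premises’ means a forgotten tag fails safe, though it does nothing about a developer who tags the wrong record as control metadata.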

Conclusion

The latency argument for these ‘data on premises, management in the cloud’ models stands up well to scrutiny; the sensitivity argument (which seems far more prevalent) isn’t quite so robust. It’s clear that the cloud service providers want to lure the server huggers in with a ‘have your cake and eat it’ model, but it’s less clear that the model is robust in the face of security, privacy and regulatory demands that customers insist can only be dealt with using on premises infrastructure. Of course the cloud service providers know this, and have chosen to launch these services anyway, so they must see some profitable middle ground.

Fundamentally the issue here is all about control. Do the server huggers just want control of their data, in which case these approaches might appease; or are they trying to hold onto control of the whole infrastructure?


