Workload factoring

15Dec09

One of the things that IGT2009 got me thinking about again was workload. It’s clear when listening to people’s stories of what they’ve put ‘on the cloud’ (both public and private) that certain workloads fit more easily than others.

The easiest pickings seem to be embarrassingly parallel high performance compute (HPC) tasks that have high CPU demands and low data needs, such as Monte Carlo simulation. These are the same things we were running on our ‘grids’ 4-5 years ago, and on large ‘farms’ of machines for plenty of years before the label ‘grid’ came along. I feel pretty safe in making a prediction here – when any new computing fad comes along (such as whatever follows ‘cloud’) then HPC workloads will be the first thing that people do seriously and declare a success.
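
To show what ‘embarrassingly parallel’ means in practice, here’s a minimal sketch (my own toy example, nothing from IGT2009): a Monte Carlo job where each worker estimates π from its own random samples, needs no shared data, and the only coordination is averaging the answers at the end.

```python
# Toy embarrassingly parallel Monte Carlo job: independent workers,
# no shared data, results combined only at the very end.
import random
from multiprocessing import Pool


def estimate_pi(samples: int) -> float:
    """Estimate pi by sampling random points in the unit square."""
    inside = sum(
        1 for _ in range(samples)
        if random.random() ** 2 + random.random() ** 2 <= 1.0
    )
    return 4.0 * inside / samples


if __name__ == "__main__":
    workers, samples_each = 8, 1_000_000
    with Pool(workers) as pool:
        # Each worker runs completely independently of the others.
        estimates = pool.map(estimate_pi, [samples_each] * workers)
    print(sum(estimates) / workers)  # average the independent estimates
```

Because the workers never talk to each other, you can scatter them across as many boxes as you can get your hands on – which is exactly why this sort of workload moves so easily from farm to grid to cloud.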

The problem that beset grid computing was that it gained little awareness in the minds of developers beyond the HPC clique. There were many things that made grids attractive in the same ways as today’s various ‘as a service’ offerings (quick provisioning, scale, easier management), but mostly people stuck with what they knew (or what they thought would look good on their CV at the end of the project).

This time around things are different. After HPC, the second easiest type of workload to deploy to a cloud is one that’s been developed on a cloud, and all over the industry developers are now being eased into positions where this is the path of least resistance (you can have a self-service virtual server today and get on with things, or you can wait 6 months for your own box). It seems that many of the resource sharing issues that acted as a barrier to grid adoption evaporate when resources have been shared with others since day 1 of development. Prediction 2 – ‘cloud’ adoption will continue to grow, as pretty much all new development will be done in some form of cloudy environment.

So… with the trivial stuff and the new stuff out of the way, what does that leave? It mostly leaves a smelly pile of legacy apps. Some of them will run nice and self-contained on a single box that’s an easy target for P2V migration, but others will have a spider’s web of nasty dependencies – things that might just break if something gets changed. This is the enterprise IT equivalent of toxic waste, and toxic waste costs big $ to deal with.

I suspect that a relatively small quantity of VERY toxic IT will continue to skew IT costs over the coming years (and make enterprise IT people look bad in comparison to various cloudy/SaaS alternatives). The problem is the server as the denominator. Most IT shops have a bunch of fixed costs, and need some mechanism to carve them up between end users. Through various ‘cost transparency’ initiatives what normally happens is that a bunch of costs get lumped together and divided through by the total number of servers (or whatever, though it is usually servers), and then end user departments are billed pro rata for the number of servers that they use. This is a really bad deal for those that aren’t the toxic waste users.
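
To make the arithmetic concrete, here’s a minimal sketch of the two ways of carving up the bill (the department names, server counts and costs are made-up illustrations, not real figures from anywhere):

```python
# Made-up numbers showing how per-server chargeback hides the cost of
# 'toxic' legacy systems: the cheap-to-run estate subsidises the expensive one.

total_costs = 1_200_000  # annual shared IT cost pool (illustrative)

departments = {
    # name: (servers, cost actually driven by those servers)
    "grid_hpc":    (400, 200_000),    # rack 'em and stack 'em, little care and feeding
    "legacy_apps": (100, 1_000_000),  # the toxic waste: fragile, labour-intensive
}

total_servers = sum(servers for servers, _ in departments.values())

print("dept         naive (per-server)   actual-cost based")
for name, (servers, actual) in departments.items():
    naive = total_costs * servers / total_servers  # lump it all together, divide by servers
    print(f"{name:12} {naive:18,.0f} {actual:19,.0f}")
```

With numbers like these the cheap-to-run grid estate gets billed nearly five times its real cost, which is exactly the distortion the story below ran into.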

I was once involved in a cost comparison exercise for a bunch of HPC servers (where the business need was growing rapidly ahead of Moore’s law, meaning that we kept needing to buy more kit). At one stage it looked like we needed to outsource the whole hardware/data centre piece, as the internal chargeback cost way more than a service provider. But that wasn’t because we were rubbish at running a data centre and the servers within it; it was because the chargeback for the grid users was full of toxic waste from the non-grid users. The grid machines were the cheapest in the whole organisation to run – rack ’em and stack ’em, throw on a standard build, done – no care and feeding, no project managers, no cruft. Once we figured out the actual costs for the grid machines (and adjusted the chargeback accordingly) the service providers couldn’t come close (and still make any decent profit). Of course the other users weren’t too happy when their bill was adjusted to reflect their real costs, but surely that’s the whole point of cost transparency.

This brings me to prediction 3 – users of IT will increasingly be faced with bills for the toxic waste they’ve accumulated. This will give them a stark choice of paying up and shutting up, or moving to something new and cheaper (and likely cloudy in some way). The process will be determined entirely by economics (rather than technical considerations). IT organisations that do a poor job of presenting clear economics to their users will be unpleasant places to work.


