T-Shirt Sizes and the Copycloud

19Mar18

TL;DR

T-Shirt sizes are frequently used to create the VM types and cost structure for private clouds, but if the sizing isn’t informed by data this can lead to stranded resources and inefficient capacity management. It’s the antithesis of dynamic capacity management, where every VM is sized according to the resources it actually consumes, ensuring that as much workload as possible fits onto the minimal physical footprint.

Background

A little while ago I wrote about Virtual Machine Capacity Management, and I won’t repeat the stuff about T-shirt sizing and fits. That post was aimed at the issues with public clouds. This post is about what happens if the idea of T-Shirt sizing is applied to private environments.

Copying T-shirt sizes is one of the ways that private clouds pretend that they’re like public clouds, but it’s a move that throws away the inherent flexibility that’s available to fit allocations to actual usage and make best use of the physical capacity. The problem is exacerbated because those private clouds generally lack the T-shirt ‘fits’ found in the different instance type families of public clouds. A further complicating factor is that the T-shirt sizes are often arbitrary, aligned to simple units rather than based on practical sizing data.

A look at the potential issue

If we size on RAM and say that Medium is 1GB and Large is 2GB then any app running (say) a 1GB Java VM is going to need a Large even though in practice it will be using something like 1.25GB RAM (1GB for the Java VM + a little overhead for the OS and embedded tools). In that case every VM is swallowing 0.75GB more than it needs. Five Larges take up 10GB, which would hold eight VMs at 1.25GB each, so effectively we could do buy 5 get 3 free.
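The ‘buy 5 get 3 free’ arithmetic can be checked with a short sketch. The sizes come from the example above; the 10GB pool (exactly five Larges) is an assumption chosen to make the comparison easy to read, not a figure from any real estate.

```python
# Sizes in MB so the arithmetic stays in integers.
LARGE_MB = 2048          # Large T-shirt size from the example (2GB)
ACTUAL_MB = 1280         # 1GB Java VM plus ~0.25GB overhead (1.25GB)
POOL_MB = 5 * LARGE_MB   # assumed pool: room for exactly five Larges (10GB)

tshirt_vms = POOL_MB // LARGE_MB        # VMs that fit when each takes a Large
right_sized_vms = POOL_MB // ACTUAL_MB  # VMs that fit when sized to actual use
free_vms = right_sized_vms - tshirt_vms
wasted_per_vm_mb = LARGE_MB - ACTUAL_MB

print(f"T-shirt sized: {tshirt_vms} VMs, right-sized: {right_sized_vms} VMs")
print(f"Extra VMs for the same RAM: {free_vms}")
print(f"Stranded per VM: {wasted_per_vm_mb / 1024:.2f}GB")
```

The same RAM that holds five Large VMs holds eight right-sized ones, which is where the three ‘free’ VMs come from.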

Could a rogue app eat all my resources?

The ‘rogue app’ thing is nothing but fear, uncertainty and doubt (FUD) to justify not using dynamic capacity management. Such apps would be showing up now as badly performing (due to exhausting the CPU or RAM allocations that constrain them), so if there were potential rogues in a given estate they’d be obvious already. Even if we do accept that there might be a population of rogues, if we take the example above we’d have to have more than 3/8 of the VMs being rogue to ruin the overall outcome, and that’s a frankly ridiculous proposition.

Showing a better way

The way ahead here is to show the savings, and this can be done using the ‘watch and see’ mode present in most dynamic capacity management platforms. This allows data to be captured to model the optimum allocation and associated savings, which is the way to put a $ figure onto how arbitrary T-shirt sizing steals from the available capacity pool.
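As a rough illustration of the modelling step, the sketch below totals the RAM stranded across a handful of VMs and turns it into a monthly cost. Everything in it is an assumption for illustration: the `VmObservation` record, the 10% headroom, the sample figures and the price per GB are all invented, and it doesn’t represent the API or output of any particular capacity management platform.

```python
from dataclasses import dataclass


@dataclass
class VmObservation:
    """Hypothetical record of what a 'watch and see' period might capture."""
    name: str
    allocated_mb: int   # RAM given to the VM by its T-shirt size
    peak_used_mb: int   # highest RAM use seen during observation


def stranded_mb(observations, headroom_pct=10):
    """Sum the RAM allocated beyond observed peak plus a safety headroom."""
    total = 0
    for obs in observations:
        # Ceiling division keeps the target on the safe (larger) side.
        target = -(-obs.peak_used_mb * (100 + headroom_pct) // 100)
        total += max(0, obs.allocated_mb - target)
    return total


# Invented sample data: three VMs all placed in a 2GB Large.
observed = [
    VmObservation("app-a", 2048, 1280),
    VmObservation("app-b", 2048, 1100),
    VmObservation("app-c", 2048, 1900),  # needs more than it has once headroom is added
]

COST_PER_GB_MONTH = 5.0  # assumed internal charge, purely illustrative

stranded = stranded_mb(observed)
monthly_cost = stranded / 1024 * COST_PER_GB_MONTH
print(f"Stranded RAM: {stranded}MB, roughly ${monthly_cost:.2f} per month")
```

Note that a VM already running close to its allocation (like `app-c` here) contributes nothing to the stranded total; it would show up as a candidate for a larger allocation rather than a saving.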

T-Shirts can be made to fit better

If T-Shirts still look desirable to simplify a billing/cost structure then the dynamic capacity management data can be used to determine a set of best (or at least better) fit sizes rather than unit-based ones. So returning to the example above, Medium might be 1.25GB rather than 1GB, and then every VM running a 1GB Java heap can fit in a Medium.
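To see what a better fitting Medium buys, here is a minimal sketch that places each VM in the smallest size that holds it and totals the waste for two candidate size sets. The demand figures are invented, and the 4GB top size is an assumption added so that every VM has somewhere to go.

```python
from bisect import bisect_left


def total_waste_mb(demands_mb, sizes_mb):
    """Place each VM in the smallest size that fits and sum the unused RAM."""
    sizes = sorted(sizes_mb)
    waste = 0
    for demand in demands_mb:
        idx = bisect_left(sizes, demand)  # first size >= demand
        if idx == len(sizes):
            raise ValueError(f"no size large enough for {demand}MB")
        waste += sizes[idx] - demand
    return waste


# Invented demands in MB: three 1GB Java VMs with overhead, plus two others.
demands = [1280, 1280, 1280, 1900, 900]

unit_based = [1024, 2048, 4096]      # Medium = 1GB, Large = 2GB
data_informed = [1280, 2048, 4096]   # Medium = 1.25GB, as suggested above

unit_waste = total_waste_mb(demands, unit_based)
informed_waste = total_waste_mb(demands, data_informed)
print(f"Unit-based waste: {unit_waste}MB")
print(f"Data-informed waste: {informed_waste}MB")
```

The smaller workload (900MB) wastes a little more in the bigger Medium, but that’s outweighed by the three Java VMs no longer being pushed up to a Large.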

So why do public clouds stick to standard sizes?

Not all public clouds force arbitrary sizing – Google Compute Engine has custom machine types. It’s also likely that the sizes and fits elsewhere are based on extensive usage data to provide VMs that mostly fit most of the time.

That said, one of the early issues with public cloud was ‘noisy neighbour’ and so as IaaS became more sophisticated the instance types became about carving up a chunk of physical memory so that it was evenly spread across the available CPU cores (or at the finest grain, hyperthreads).

Functions as a Service (FaaS) (aka ‘Serverless’) changes the game by charging for usage rather than allocation, but it achieves that by taking control of the capacity management bin packing problem. Containers as a Service models have so far mostly shown through the cost structure of underlying VMs, but as they get finer grained it’s possible that a model that carves between allocation and usage might emerge.

Conclusion

T-Shirt sizes are a blunt instrument versus the surgical precision of good quality dynamic capacity management tools. At their worst they can lead to substantial stranded capacity and corresponding wasted resources. There can be a place for them to simplify billing, but even then finer grained capacity management leads to finer grained billing and savings.


