After some reflection on my recent series of posts about Paremus ServiceFabric on EC2, I realise that I never provided a high-level commentary on what each of the moving parts does, and why they’re important.

  • Paremus ServiceFabric – this is a distributed OSGi runtime framework. The point is that you can package an application as a set of OSGi bundles, and deploy them onto the fabric without having to care too much about underlying resources – that’s what the fabric management is there to take care of. What’s especially neat about the Paremus implementation of this is that the fabric management itself runs on the fabric, so it gets to benefit from the inherent scalability and robustness that’s there (and avoids some of the nasty single points of failure that exist in many other architectures).
    • OSGi is a good thing, because it provides a far more dynamic deployment mechanism for applications (making it easier to design for maintenance).
    • ServiceFabric also makes use of Service Component Architecture (SCA), which allows better abstraction of components from underlying implementation details. This allows parts of the overall architecture to be swapped out without having to reach in and change everything. Jean-Jacques Dubray from SAP provides an excellent explanation of how this improves upon older approaches on his blog.
  • CohesiveFT Elastic Server on Demand – this is a factory for virtual appliances. I used it to build the Amazon Machine Images (AMIs) that I needed. A bit like OSGi it uses a concept of bundles, and for some of the software that wasn’t already there in the factory (e.g. the Paremus stuff) I had to create my own. Once I had the bundles that I needed I was then able to choose an OS, and build a server to my recipe (aka a ‘bill of materials’). The factory would send me an email once a server was ready (and optionally deploy and start it for me straight away).
  • CohesiveFT VPNcubed – this was the overlay network that ensured that I had consistent network services (that supported multicast) covering the private and public pieces of the project. It basically consists of two parts:
    • A manager – which can exist in the private network or the cloud (or both). For simplicity I went with a pre-packaged AMI hosted on EC2.
    • A set of clients. These are basic OpenVPN clients. For my AMIs I used a pre-packaged bundle. For the machines on my home network I just downloaded the latest version of OpenVPN. The manager provides ‘client packs’ containing certificates and configuration files, which need a little customisation to specify the manager location.
  • CohesiveFT ContextCubed – this provides the ability to start and customise a bunch of virtual appliances (AMIs) automatically. With the help of their CTO, Pat Kerpan, I was working with a pre-release of this service (hence no link). ContextCubed (which I accidentally called ConfigCubed in my post about it) provides an init.d style mechanism that sits outside of the virtual machine itself. I used it to download and install VPNcubed client packs, start the VPN, stop some services I didn’t want, reconfigure the firewall to allow multicast, and add binding config to the Paremus Atlas service (before starting it up) – there’s a rough sketch of those steps just after this list. I could have also used it to create hosts files to work around some of the naming issues I encountered, but I think I’ll wait for Pat to fix things up with DNScubed or whatever he ends up calling it. Hopefully in due course the *cubed services will all find their way onto the same virtual appliance, so there can be a one-stop shop for stuff that makes an application work in a hybrid cloud (or whatever suits your taste from private to public).
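For the curious, those ContextCubed steps boiled down to something like the following. Treat it as a sketch rather than the real script – the client pack URL and format, service names and Atlas paths are all placeholders here:

# fetch and unpack the VPNcubed client pack from the manager (address, filename and format are placeholders)
wget http://<vpncubed-mgr-private-ip>/<client-pack>.tar.gz
tar xzf <client-pack>.tar.gz -C /etc/openvpn
# join the overlay network using the config from the client pack
openvpn --config /etc/openvpn/vpncubed.conf --daemon
# let multicast (and everything else) flow over the tunnel interface
/sbin/iptables -I OUTPUT -o tun0 -j ACCEPT
/sbin/iptables -I INPUT -i tun0 -j ACCEPT
# stop anything unwanted, drop in the binding config and start Atlas (all placeholders)
/etc/init.d/<unwanted-service> stop
echo '<binding-property>' >> <atlas-install-dir>/etc/config.ini
<atlas-install-dir>/bin/<atlas-start-script>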

One thing that would have been fun to try (but that I didn’t attempt) is closing the loop between the PaaS management layer in ServiceFabric, and the IaaS management layer in ContextCubed. This would allow (for instance) extra machines to be deployed dynamically to satisfy peaky workloads (or deal with failure) running on ServiceFabric. I’ll leave that for another day.


I spent a couple of hours tinkering with this over the holidays, but mostly put it down and got on with eating, drinking and being merry.

The first breakthrough was that ContextCubed just worked once I had the right Ruby Gems installed (and in fact day 5 had got me to within one line of installation). This means that I can now start up a cluster of ServiceFabric Atlas nodes that automagically join the VPNcubed overlay network with a single command line – nice :)

The final hurdle turned out to be a mix of name resolution and binding. The former issue I fudged with some hosts entries, and the latter was a simple one-liner in the Atlas config.ini (which is trivial to automate with ContextCubed). Once ContextCubed gets its DNS appendage (DNScubed?) then there should be a decent mechanism for name resolution that ties into the VPN overlay.
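The hosts fudge itself was nothing cleverer than pinning names to the VPN overlay addresses on each box – something along these lines (the names and addresses here are made up purely for illustration):

# /etc/hosts entries mapping fabric members to their VPNcubed overlay IPs (illustrative values)
172.31.1.10   atlas-node1
172.31.1.11   atlas-node2
172.31.1.12   home-fabric-box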

When I embarked on this project it was supposed to be simple, and if I look back at all the problems that I encountered and lessons learned then there wasn’t really anything fundamental standing in my way. From a personal perspective the 2-3 man days that I’ve sunk into this have been very useful and educational (and it’s been nice to get my hands dirty with some real techie stuff); and the great thing is that I can now achieve in seconds stuff that was taking me hours when I set out on this journey.

For the sake of completeness here are links to the previous entries (and I know I should have done a better job of chaining them, but I can go back and fix that now). Day: 1, 2-3, 4, 5

If you’ve found this series interesting, then you might also want to take a look at my corp blog post ‘Designed for cloud’. PaaS in general, and ServiceFabric in particular (with its ability to handle OSGi at scale), are just the sort of thing that software developers need to help them move from design for purpose/manufacture to design for maintenance.


I almost certainly won’t end up buying one of these, but it has the look of something that I’ve been hoping for since giving up my X60T tablet for my s10e netbook.

I have missed the ability to draw stuff on the screen. That gap has kind of been filled now, as Santa brought me a Wacom Bamboo Pen tablet for Christmas, which is absolutely wonderful for photo editing and such. It would just be brilliant though if that was built into the screen rather than being another piece of kit to carry around (and if the screen was sufficiently high resolution to make a decent ebook reader). The Bamboo does almost look like it was made for my black s10e though, as they have very similar aesthetics, and go together very well size-wise.

I suspect that the U1 will be cursed by lots of first-of-type fatal flaws, but hope that by the time CES rolls around next year there will be some decent options in this area. Meanwhile I’m on the lookout for a Pine Trail netbook with a 1280×720 screen and integrated 3G, and if one comes along with a touch screen then that would be awesome.


Between snow, getting some prerequisite scripts and docs a bit too late and various other stuff getting in the way, there hasn’t been too much progress today. I think I have everything set up to launch a complete cluster of Atlas agents in the sky, and get them to attach to the overlay VPN and call home, but the mechanics aren’t quite working yet (I’m drowning in Ruby dependency issues).

Today was supposed to be my last working day of the year (though I’ll be in the office on Monday for a client meeting that couldn’t be moved), so this saga will draw to a halt for the time being. I may get some time to hack away over Christmas, but no promises.

Two small victories:

  1. I figured out the command line I need to kill the SSL Elastic Server manager (which conflicts with Atlas in wanting port 4433):
    ps -ef | grep ssl_server | awk '{ print $2; }' | xargs kill -9
  2. I also figured out why my security groups between vpncubed-client and vpncubed-mgr weren’t working – I was using public IPs for wget and in vpncubed.conf rather than private IPs – doh! (There’s a quick way to grab the right IP just below.)
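On the first point, pkill -9 -f ssl_server does the same job in a single command if you prefer fewer moving parts. On the second, the private IP of an EC2 instance can be pulled straight from the instance metadata service rather than being looked up by hand:

# returns the instance's private (internal) IP address
curl http://169.254.169.254/latest/meta-data/local-ipv4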

Hopefully this story will have a quick and happy ending in the New Year.

Previous post – Paremus ServiceFabric on EC2 day 4

Following post – Paremus ServiceFabric on EC2 – declaring victory


The multicast woes are now behind me (thanks Dimitriy), and I now have a fabric that spans my home network and EC2. The problem with multicast turned out to be firewall related, and the simple fix was:

/sbin/iptables -I OUTPUT -o tun0 -j ACCEPT
/sbin/iptables -I INPUT -i tun0 -j ACCEPT
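If you’re applying these rules by hand (before handing the job over to ContextCubed), it’s worth checking that they actually took, and optionally persisting them across reboots – the persistence mechanism varies by distro, so treat the second line as a sketch:

/sbin/iptables -L -v -n | grep tun0      # confirm the ACCEPT rules are in place for the tunnel interface
iptables-save > /etc/iptables.rules      # save the current rules (the restore mechanism is distro specific)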

Tomorrow I’ll try to get something running on the fabric, and will also take a look at automating the deployment process for members of the fabric.

Previous post – Paremus ServiceFabric on EC2 days 2/3

Following post – Paremus ServiceFabric on EC2 day 5


I didn’t get to spend my full attention on this over the last couple of days, and somewhat as expected I’ve run into trouble with multicast. Right now it seems that whenever I put a node into the VPN overlay network it stops being capable of doing multicast.

I’ll report back once these issues are resolved, and hopefully getting a fabric up and running that spans my home network and the cloud will be a few simple steps beyond.

Previous post – Paremus ServiceFabric on EC2 day 1

Following post – Paremus ServiceFabric on EC2 day 4


One of the things that IGT2009 got me thinking about again was workload. It’s clear when listening to people’s stories of what they’ve put ‘on the cloud’ (both public and private) that certain workloads fit more easily than others.

The easiest pickings seem to be embarrassingly parallel high performance compute (HPC) tasks that have high CPU demands and low data needs, such as Monte Carlo simulation. These are the same things that we were running on our ‘grids’ 4-5 years ago, and the same things that were running on large ‘farms’ of machines for plenty of years before the label ‘grid’ came along. I feel pretty safe in making a prediction here – when any new computing fad comes along (such as whatever follows ‘cloud’) then HPC workloads will be the first thing that people do seriously and declare as a success.

The problem that beset grid computing was that it gained little awareness in the minds of developers beyond the HPC clique. There were many things that made grids attractive in the same way as today’s various ‘as a service’ offerings – quick provisioning, scale and management – but mostly people stuck with what they knew (or what they thought would look good on their CV at the end of the project).

This time around things are different. After HPC the second easiest type of workload to deploy to a cloud is one that’s been developed on a cloud; and all over the industry developers are now being eased into positions where this is the path of least resistance (you can have a self service virtual server today and get on with things, or you can wait 6 months for your own box). It seems that many of the resource sharing issues that acted as a barrier to grid adoption evaporate when resources have been shared with others since day 1 of development. Prediction 2 – ‘cloud’ adoption will continue to grow, as pretty much all new development will be done in some form of cloudy environment.

So… with trivial stuff and new stuff out of the way what does that leave? It mostly leaves a smelly pile of legacy apps. Some of them will run nice and self contained on a single box that’s an easy target for P2V migration, but others will have a spider’s web of nasty dependencies – things that might just break if something gets changed. This is the enterprise IT equivalent of toxic waste, and toxic waste costs big $ to deal with.

I suspect that a relatively small quantity of VERY toxic IT will be something that continues to skew IT costs over coming years (and make enterprise IT people look bad in comparison to various cloudy/SaaS alternatives). The problem is the server as the denominator. Most IT shops have a bunch of fixed costs, and need some mechanism to carve them up between end users. Through various ‘cost transparency’ initiatives what normally happens is that a bunch of costs get lumped together and divided through by the total number of servers (or whatever, though it is usually servers), and then end user departments are billed pro rata according to the number of servers that they use. This is a really bad deal for those that aren’t the toxic waste users.

I was once involved in a cost comparison exercise for a bunch of HPC servers (where the business need was growing rapidly ahead of Moore’s law, meaning that we kept on needing to buy more kit). At one stage it looked like we needed to outsource the whole hardware/data centre piece, as the internal charge back cost way more than a service provider. But that wasn’t because we were rubbish at running a data centre and the servers within it; it was because the charge back for the grid users was full of toxic waste from the non grid users. The grid machines were the cheapest in the whole organisation to run – rack ’em and stack ’em, throw on a standard build, done – no care and feeding, no project managers, no cruft. Once we figured out what the actual costs were for the grid machines (and adjusted the charge back accordingly) the service providers couldn’t come close (and still make any decent profit). Of course the other users weren’t too happy when their bill was adjusted to reflect their real costs, but surely that’s the whole point of cost transparency.

This brings me to prediction 3 – users of IT will increasingly be faced with bills for the toxic waste they’ve accumulated. This will give them a stark choice of paying up and shutting up, or moving to something new and cheaper (and likely cloudy in some way). The process will be determined entirely by economics (rather than technical considerations). IT organisations that do a poor job of presenting clear economics to their users will be unpleasant places to work.


I’ve known the chaps at Paremus since shortly after they set up shop, and I’ve watched the evolution of ServiceFabric since its earliest days. Since it has all the makings of a killer PaaS offering I thought I’d sharpen up my practical cloud skills by getting it running on EC2.

The first challenge is that ServiceFabric uses multicast to communicate between nodes in the fabric, and this isn’t something supported by EC2 (or any other IaaS that I’m aware of). This isn’t a problem though, as I set up CohesiveFT’s VPNcubed, which supports multicast. It also has the side benefit of allowing me to create a network topology that spans cloud and non cloud machines, so I can throw in some boxes from my home network to try out hybrid configurations. I kept things simple, and set up a single manager for the VPN-Cubed for EC2 Free Edition, which went pretty much as described in the step by step guide.

The next stage was to create some workload, so I used Elastic Server to create an AMI that had Ubuntu 9.04 as the base, along with the VPN-Cubed client, Sun Java 6 and Paremus’s Nimble. Nimble wasn’t there already, but it was a few minutes’ work to upload the package and enrol it into the build system, which then created and provisioned an EC2 instance for me automatically.

Once the Nimble-enabled AMI was up and running I got it connected into the VPN overlay, and started up Nimble with:

./posh -sc "repos -l springdm;add org.springframework.osgi.samples.simplewebapp@active"

I recommend giving this a go yourself if you have 5 minutes to spare – it’s a wonderful demo of dynamic provisioning.

Once Nimble had done its stuff it was then just a question of browsing to http://nimble-machine-vpn-addr:8080/simple-web-app and I could see that the plumbing was working.

Snags along the way:

  • Firewalls – maybe stating the obvious, but it really is crucial to define exactly which end points are allowed to talk to each other, and security groups didn’t quite seem to cut it as expected.
  • OpenVPN throwing its toys out of the pram over an SSL verification error because the date was wrong on one of my home VMs. This stuff is much easier to diagnose when using OpenVPN straight from the command line (openvpn vpncubed.conf) rather than via its daemon – a couple of handy commands for this are below.
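For anyone who hits the same wall: syncing the clock sorts out the certificate validity check (ntpdate is the quick and dirty way, assuming it’s installed), and running OpenVPN in the foreground with the verbosity turned up makes this sort of thing far easier to see:

ntpdate pool.ntp.org                        # fix the clock so the certificate dates check out
openvpn --config vpncubed.conf --verb 4     # run in the foreground with extra logging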

So, that’s it for day one: a working, dynamically provisioned web application running within a VPN overlay network.

For day two I’m moving on to full fat ServiceFabric, and will join battle properly with multicast and VPN binding issues. Wish me luck.

Following post – Paremus ServiceFabric on EC2 days 2/3


I spent most of last week at the IGT 2009 ‘World Summit of Cloud Computing’. There were some great speakers there, but the session that sticks in my mind was Alistair Croll’s piece at the end where he talked about the future of cloud. One of the most thought provoking statements that he made was something like ‘this won’t end like it has started, there will probably be an inversion’. This got me thinking about the oscillations that we see all around us, and particularly the tendency to move between centralised and distributed models for all kinds of things. This is something that we’ve seen a few times already in IT (mainframe – mini – PC – client server – n-tier …), and similar things happen with IT organisations (centralised infrastructure -> business aligned infrastructure and back again).

Nick Carr in his book The Big Switch frequently uses an analogy of the electricity industry for how IT is developing. In the early days of electricity, generation was distributed (to the point of use), and there was a need for substantial organic expertise in electricity. Over time electricity became a utility, where generation became somebody else’s problem; and yet after a century or so we might be on the verge of the next oscillation for electricity. There are tremendous losses between the original source of energy (whether it’s coal, natural gas, nuclear or whatever) and the point of consumption. It’s not atypical for only 25% of the original energy to actually make it out of the wall socket. This is why huge users of energy, like aluminium plants and cloud provider data centres, try to get very close to sources of cheap electricity. It’s also why there’s a small but growing trend towards local generation (particularly with emerging ‘green’ sources such as solar and wind).

One of the IT megatrends that receives constant attention is Moore’s law (and its close cousin Kryder’s law), and the consequent doubling in capacity of various things every 18-24 months. One of the issues that’s discussed far less frequently is that different pieces of the architecture started at different places – so the gaps in absolute performance become more severe over time. On a log scale chart the lines keep on rising, but they never cross. Network is always the ‘thin straw’, which is why it makes sense to manage large data sets locally where storage is cheap (where I have to agree with Cory Doctorow and what he said here – http://www.guardian.co.uk/technology/2009/sep/02/cory-doctorow-cloud-computing).

By far the most vigorous debate in cloud computing is around the consequences of ceding control to the providers of centralised services (aka the ‘public cloud’ providers) like Amazon, Google and Microsoft. This is why people talk about ‘private clouds’ (regardless of how nonsensical that term is). What this debate often seems to miss is that what we’ve come to call ‘cloud’ is really all about management, and has little to do with location. The naming is screwed up, because ‘cloud’ comes from what we drew on white boards to represent stuff on the internet, but the ideas and principles are sound. For now the easiest way to get great management (and hence quicker and cheaper provisioning of stuff that you need/want) is to go to the people that sell this stuff over the Internet, but the inversion is coming, the oscillation is changing phase. Things like Ubuntu’s Enterprise Cloud (UEC) can deliver all of that management goodness, and let you run it on your own machines. Stuff like CohesiveFT’s Elastic Server lets you build your ‘machines’ to work with anybody’s IaaS layer and management tools, then their ‘cubed’ stuff abstracts away the network, config and other services so that you’re isolated from annoying detail.

Even then, a couple of clicks during the installation of an OS, or packaging stuff up from a bill of materials, is beyond the desire or capabilities of the mass market. People want to just buy stuff that works. They want appliances. They want their virtual appliances to just happen on a device of their choosing, and this is where we see convergence of ‘cloud’ and what’s happening in enterprise IT… the oscillations will move into phase (at least for a while).

For some time complex software has been sold to enterprise IT in the shape of appliances. This was done to stop the IT people from doing dumb stuff with that software that would add months to the roll out time and maximise the chances of stuff breaking and leading to support calls. One of the problems of enterprise IT as it stands today is a tendency to smash things down to their constituent parts, and then rebuild things in a way that even their mother wouldn’t love. I’ve heard it said recently that ‘cloud is for everybody except the Fortune 500’, and ‘everybody in the Fortune 500 is married to Oracle’, but as Larry makes his stuff into appliances like Exadata 2 then the worlds are aligning. A data warehousing appliance is just as much about canned management as an EC2 instance.

I said recently that I didn’t want to own any servers, and wondered how large my company would have to grow before the economics tipped towards a move away from pure play SaaS and towards on site stuff? What I realise now is that the question of ownership is ancillary. The real point is that I don’t want to manage any servers… ever, and that’s fine, as when the cloud turns itself inside out, and I find that my data has returned home, I’ll still be benefiting from the canned management expertise of people that can do this stuff better and cheaper than me.


Earlier this year I gave a talk on cloud security at the e-Crime congress. One of the other speakers was John Suffolk, who, when he wasn’t struggling with some very badly formatted PowerPoint [1], asked the audience ‘who in this room thinks they are keeping up with technology?’. I think I ruined his script a little by sticking my hand up, as apparently the normal response by an entire audience at a technology conference is to passively accept that they’re not keeping up.

This raises a fascinating question for me – why is it that people in the technology industry feel that they are constantly slipping behind? I could perhaps blame it on traditional British reserve, but there were plenty of US and other visitors in the audience. It might also be ‘head above the parapet’ syndrome, where others feel that they are competent in keeping up, but don’t feel like putting that to the test in public (and for what it’s worth John didn’t give me a hard time over it, though he also didn’t catch up afterwards to get the story behind the action).

What really scares me is the prospect that we have hundreds of people in senior positions within the IT industry who basically accept that they are to some degree clueless about what’s going on. People who’ve given up on keeping up, somehow overwhelmed by the consequences of Moore’s law and all that it brings down upon us.

Has it always been like this, or are things getting worse over time? If we’d asked a conference full of microcomputer enthusiasts in the early 80s the same question would their answer have been the same (there was after all a bewildering array of new machines, languages, applications and accessories emerging on the scene at the time)?

I put my hand up because I consider that it’s my job to keep up with this stuff. I may not know everything about anything, but I try hard to have a broad (and necessarily superficial) knowledge of as much as possible. I know that plenty of others will say that they’re too busy with their ‘day job’ to spend the necessary time, but what does it actually take? I probably spend an hour or so a day in my RSS aggregator (and Twitter) catching up on what people who’ve passed some kind of arbitrary interest threshold have to say (and reading the stories that they have to tell, or following the links they’ve exposed for me). I get through a lot more stuff in that hour than I used to in the days before RSS and Twitter – just as Moore brings us an exponentially growing problem, Classen brings us logarithmic utility to deal with it.

The broader point here is that as a civilisation we must be keeping up with this stuff. Unless there’s some secret warehouse in the Bay Area that’s shipping in alien technology then we’re dealing with a closed loop. People create all of this new stuff, and so the knowledge about what it does and how it can be useful is simply distributed; and the web (2.0) and all of the collaboration tools that sit on it not only give us the means to create new stuff (and introduce new complexity) but they also give us the means to harness and understand it.

My prediction – individuals will continue to feel left behind, whilst society as a whole continues to plough forward.

[1] almost certainly not his fault. I bet that he (or more likely an assistant or somebody else that does slide mongering for him) has to contend with some ancient version of MS Office running on an obsolete build of Windows on top of crumbling neutered hardware. If my impressions of public sector IT (even amongst its highest operatives), formed by my own somewhat dated experience, are wrong then please correct me with a comment below.