Originally posted internally 18 Nov 2016:

Background

Late yesterday I got an email from my colleagues in France asking me to review a pitch deck about our Docker capabilities for a financial services client. I’m repeating my answers here as they probably deserve some broader sharing…

Docker in the Enterprise

Their presentation hit most of Ian Miell’s Docker in the Enterprise Checklist, which I’ve used with other customers.

The question it left me with is ‘what are they trying to achieve’, by which I mean what is the point of containerising (existing) applications? The entire thing seemed to be aimed at a mass migration exercise – what’s that supposed to accomplish?

Faster testing

For me the point of containerisation is faster cycle time for testing, which is super useful if you can insert containers into a critical bottleneck in the idea->production pipeline where existing tests are too slow. This is particularly the case in development, where containers can help to provide much quicker feedback, and save developers repeating steps that they know work. Of course once a container is the output of a development process it then makes sense to have some means to take containers to production, which then drives the need to provide for a wide variety of operational considerations.
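To make that concrete, here’s a minimal sketch (in Python, using the Docker SDK) of the kind of test loop that containers enable. The image tag, the assumption that there’s a Dockerfile in the current directory, and the pytest command are all hypothetical placeholders rather than anything from a real pipeline.

```python
# Minimal sketch: build an image from a local Dockerfile and run the test
# suite inside a throwaway container. Assumes Docker is running locally and
# the Docker SDK for Python is installed (pip install docker).
import docker

client = docker.from_env()

# Build the image from the current directory (hypothetical tag)
image, build_logs = client.images.build(path=".", tag="myapp:test")

# Run the tests in a fresh container; remove=True throws the container away
# afterwards, so every run starts from the same known-good image.
output = client.containers.run("myapp:test", "pytest -q", remove=True)
print(output.decode())
```

The specific tooling matters less than the effect: the developer gets the same environment every time, and feedback arrives quickly rather than after a wait for a shared test server to be rebuilt.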

Connecting agile infrastructure to business needs

We shouldn’t make the same mistake we made with cloud, where customers have bought a story of agility, agility, agility: agile business needs agile software needs an agile infrastructure. Far too many customer conversations run along the lines of ‘so I bought a cloud to be my agile infrastructure, when do I get my business agility?’. The business agility only happens by providing a connective tissue between business needs and the agile infrastructure. Generally this will take the form of a continuous integration (CI) pipeline, and when we join that to a cloud we have the ability to do continuous delivery (CD)[1].

Containers can help speed up flow and feedback within a CI/CD pipeline, and containers in production can be the new cloud target; but we must help our customers build those pipelines, and not just repackage old apps in new wrapping (because although that might put hours on the clock for us, it doesn’t really provide much value to them).

Other container benefits

I’ll conclude by adding that containers can offer benefits in resource usage (versus VMs) with RAM, CPU and startup time, but those are only achievable in an environment where there’s a good understanding and close control over how shared libraries are used. Containers also provide application portability between environments, but making apps portable before you need to move them is a premature optimisation.

Who’s doing this already?

There was also a question about customer references, so I noted that: In terms of existing customers we have a large OpenShift deployment, but it’s not reference-able (sensitive government account). Most of our other customers are at an exploratory/experimental stage (or using containers just in their ‘mode 2’ division, often with poor consideration for operational issues). I’d also note that most of the quick wins are in the development area.

Note

[1] Where continuous delivery means the ability for any commit that passes tests to be pushed to production. I don’t generally expect our customers (especially those in regulated industries) to be shooting for CD meaning continuous deployment, where every commit that passes tests is pushed directly to production.

Retrospective

This was the blog that got the most comments, so I was very pleased to see a lively discussion on the topic, and different areas of the company starting to work together.

Original Comments

CN

So for DevOps use cases, and in particular containers (although the same could be applied to serverless/loosely-coupled apps), are there certain customers or types of customers CSC should be targeting?

I’ve worked on some PoCs, and small parts of big orgs may have “a product” and a dev team actively developing a software solution that would fit with fast testing cycles, CI and the whole ethos of these new tools. Many others though (some newer, but typically the older, larger, legacy customers) have very large estates where they run legacy apps, some COTS, or where we provide VDI to a large part of their estate, and they do not have a focal app that they develop in house. This makes “selling” the idea or benefits of these new tools harder, as it is more difficult to demonstrate the tangible benefits, whether directly to the customer or for our own internal use.

Should we be moving away from big monolithic customers?

Is it simply that in GIS we don’t see the app side (GBS) as much, and some of our bigger customers have huge development farms that we don’t see?

Are we just setting up servers or storage for them and never questioning why or what they wanted it for, and failing to spot the pattern of “5 VMs this week, 5 more next week, then decom them” because we are treating the provisioning/ordering requests in a disconnected way?

Is there room/time/space for the PMs or PE/DEs to look holistically at a request for what is a dev/test environment, as you reference above, and give architecture the chance or incentive to suggest that rather than just standing up 5 VMs a week for the customer (ramping up to 200, or rebuilding the same 20 every week) we could offer them a solution such as the OpenShift one you mentioned, or a VIC deployment?

Lastly, would that be frowned upon when it comes to billing? If you sell/bill 200 VMs versus 40 VMs running 5 Docker instances each… how is that efficiency recognised in revenue, e.g. how do we drive the good behaviour and new innovations?

Lots of questions, I know, but I really want to look at some practical implementations, appreciating that some are sensitive, but in particular any innovative uses which aren’t the traditional “We are company X and software is our business so we were already doing Docker/Jenkins etc., can you just sell us VIC/OpenShift so we don’t have to build it ourselves?”

AC

I was directed here via Distinguished Architect TD regarding Docker questions and capability in our organisation.

Your blog brings up all the same points that the client is making, although we are not the only vendor entrenched on the account, and the design and implementation are being driven by a 3rd party, not CSC.

What I am trying to find out is where we sit in maturity for Docker design and support, container support and design, and agreements and SLAs that may have been utilised in other Docker installations for development and production workloads.

Do we have any do’s and don’ts with Docker that are red flags?

Our client is looking to run Docker in VMs on a Vblock, which technically I see little issue with, but CN raises very similar points of concern to my own.

Have we developed a Docker container with a CSC SOE contained in it, either Windows 2012 or RHEL?

Would we support our SOE or an O/S in a container (are we geared up and trained in it, and do we know the pitfalls and constraints that may be apparent in a highly virtualised and duplicated environment)?

CN

Interesting questions too, AC.

Obviously VMware’s preferred option is to use the lightweight Photon OS as the platform for hosting your containers (possibly circling back to Chris’s earlier post about VIC)

However I have heard a reasonably hard and fast “we only support RHEL” from the cloud delivery org. That in itself is reasonable based on the scaling back we are doing pre-merger. Is there bandwidth for people to cover other non-SOE’d environments? Personally I can’t see why CentOS, Ubuntu and Photon couldn’t be supported, but I’m not aware of any internal training Cloud or Platform engineers have had on supporting containers.

I know CS and CK devised the Infrastructure-As-Code workshops which have been run multiple times here in Chorley and, I believe, in Chesterfield.  That is a great jump-off point for engineers to start thinking about and learning about containers, but most engineers leave that and their day to day work doesn’t utilise that learning.

It’s a wider catch-22 question.

Sales: “Do we support containers, I have a customer that wants them?”

Operations: “We haven’t had any customers that wanted containers before so we aren’t able to support them”

I believe in theory the progression is

1. Identify Market opportunity
2. Specify and design an offering
3. Build that offering
4. Deliver the offering and industrialise/automate the delivery support
5. Provide operational support for the offering

Where does the training of the delivery/support staff come in?

Is it the responsibility of the offering to assign part of its development budget to training Operational Engineering (OE) to be able to support the offering?

This has been mentioned on CS/SH PLM/ODF calls recently but I would be interested to see how it’s implemented in practice where sales and customer zero traditionally fund training of staff.

AC

Thanks CN,

I will reach out to our PODs and service teams to advise on their current capabilities; however, I suspect that this time it will pass us by.

CS

A few points from an offerings perspective:

1. We have SOEs for Ubuntu (and OEL and CentOS) in addition to RHEL.

1a. I’m not sure that it makes any sense to bake Docker into the SOEs as it’s too much of a moving target (something that’s been brought up in a rash of ‘Docker’s not ready for production’ blog posts over the past few months).

2. We’ve been working on integration of Agility with Kubernetes (which implies Docker underneath)

3. VMs on a vBlock is still fine, but we’re moving now to Modern Platform, which provides a much more flexible (and lower cost) infrastructure

#2 there is probably the key one in terms of answering the question of what we have from an offering perspective for Docker/containers. Having an offering for Docker itself makes about as much sense as having an offering for ‘ps’ or ‘grep’ or any other user space Unix tool – we need to manage at a higher level, and bringing together our Cloud Management Platform and Kubernetes does that (though it still leaves a bunch of holes to fill with other aspects of service management).
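As a flavour of what managing at a higher level can look like, here’s a small hedged sketch using the official Kubernetes Python client rather than anything Agility-specific; it assumes a working kubeconfig, and the deployment name and namespace are made-up examples.

```python
# Sketch only: take an inventory of deployments and declaratively scale one,
# working against the Kubernetes API rather than poking at individual
# containers or VMs. Assumes 'pip install kubernetes' and ~/.kube/config.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Inventory: what's running, and is it healthy?
for d in apps.list_deployment_for_all_namespaces().items:
    print(d.metadata.namespace, d.metadata.name,
          d.status.ready_replicas, "/", d.spec.replicas)

# Declarative scaling of a hypothetical deployment
apps.patch_namespaced_deployment_scale(
    name="example-app", namespace="default",
    body={"spec": {"replicas": 3}})
```

These are the sort of calls a cloud management platform ends up making under the covers; the value of bringing Agility and Kubernetes together is the governance and service management wrap around them, which is exactly where the remaining holes are.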

Going back to the original post… anybody who’s buying into a container management solution without thinking about how it connects to their CI to become CD probably hasn’t thought things through completely, and we need to provide more help with that process.

NS

If someone says the only answer is RHEL for the cloud, then talk to someone else, as they don’t understand cloud. Take a look at RHEL Atomic and Alpine Linux for some other alternative container OSs. The container OS should be the responsibility of the application developer though, and no-one should expect support from the IT support function / CSC IMO.

Running Docker on top of your Linux of choice is fine, but that’s a very limited approach and isn’t suitable for containers in production. For that, as CS mentions, you need a container manager. Agility isn’t there yet and until we can see what it does and what limitations it may have it’s difficult to know whether it will be a great tool or just a tick box. Shout out to the Agility team – share your design plans and get some feedback!

Docker/containers are just one aspect of the emerging cloud native application development/hosting options. If you take into account PaaS and serverless computing paradigms then at present I can’t see how Agility will support this wider capability. Agility + support for containers seems a little like the old argument that if I have a VM and can automatically deploy some middleware on it I have a PaaS platform! I therefore think CSC needs to get a little opinionated in this space and start to think about re-using some of the account knowledge around OpenShift, Cloud Foundry, BlueMix, etc. that is surely out there.

CN

Interesting point about the RACI for the container OS, NS.

I agree everything should be self service, and neither the Dev nor CSC should look after the container OS, over and above dropping it and spinning up a new one if required; but that “should” all be handled by Agility/OpenShift/vRealize/Bluemix etc.

I know that currently the Agility devs are focussing on delivering pre-promised roadmap items and updates/fixes; I’m not sure where Docker integration/management sits on their roadmap.

PC

Hi AC,

Docker containers are also implemented in Windows Server 2016, which is now GA. The feature is called Windows Containers. I believe we also have a Windows 2016 SOE that is available. Please reach out to PT for more information.

Here is a great quickstart article:  Windows Containers Quick Start

We’ll be looking at Windows Containers for new MyWorkStyle offering deployments in the new year, so watch this space! :-)

CN

Interesting to hear that, PC. It would be a great Lunch+Learn in AEC or GIS town hall topic. I’d sign up to see that use case!

PC

Thanks for the feedback. My initial thoughts are that containers would help us constrain the sprawl of virtual machine hogs in an environment, helping us to control costs but also better control the predictability of resource usage over time. It’s no more than a theory at this stage. We need some lab time to see if it has wings and to truly understand any other benefits. I’m very open to ideas and new ways. :-)

AC

Hi PC,

I think for me it’s the service, support and availability aspect from an application perspective.

If the base O/S becomes almost the equivalent of a hypervisor for the Docker layer and its components, it is almost a commodity item. (Take it with a grain of salt, but we don’t need to patch monthly; we roll out new VMs with the pre-patched SOE instead.) A standard sized VM with a standard OS, spread over multiple Vblocks or cloud environments, just means far lower support costs for CSC.

It should also mean we have the capability to report on performance metrics, not just at the CPU layer but now, more importantly, at the Docker and application layers.

End user experience can be a higher focus (how long does it really take a user to carry out these tasks, and where is the constraint or bottleneck?).

There’s also improved service for our clients, with higher application uptime far simpler to achieve, and simpler DR scenarios that, if not automated, are far swifter to trigger and spin up.

I haven’t even started to dig into the real potential of Client PC docker containers yet.  That could be a whole world of fun.


Originally posted internally 9 Nov 2016:

TL;DR – the 3 ways

  1. Improve flow
  2. Improve feedback
  3. Improve our ability to learn by experimentation

Why this is for you

When you see the word ‘DevOps’ please keep reading – this post is still for you. I deliberately left the word DevOps out of the title of this post, because I don’t want people to filter based on assumptions. This post is for you.

This post is for you if you’re a developer that doesn’t do ops – because it’s about how we help ourselves and our customers improve development.

This post is for you if you’re an ops person that doesn’t do dev – because it’s about how we help ourselves and our customers improve operations.

This post is for you if you’re a project manager – because it’s about how we make our delivery better.

This post is for you if you’re an architect – because it’s all about design thinking, and how we can make things better by design.

This post is for you if you’re a manager – because your help is needed in leading the necessary changes to our culture – the way we do things around here.

What is DevOps anyway?

DevOps is the set of practices that emerge from organisations that have designed for operations, which is a more mature level of design than design for purpose, or design for manufacture. Pretty much every product or service goes through design evolution, and we can now see many software based services that have effectively designed for operations. DevOps is the label we’ve given to the stuff they’ve done to accomplish that.

What are the ‘3 ways’?

  1. Flow
  2. Feedback
  3. Continuous Learning by Experimentation

The 3 ways originate from the total quality management movement in manufacturing, and over the course of the last few years those lessons from the late 20th century are being relearned and applied to software. They were first articulated in a DevOps context by Gene Kim and his co-authors in The Phoenix Project, and have recently been much more clearly explained in practical terms in The DevOps Handbook.

The rest of this post will look at each of the 3 ways, and what they mean to us at DXC.

Continuous Learning by Experimentation

The tip of our spear here is Operational Data Mining (ODM), which is run out of GD’s Operations Engineering (OE) team. ODM takes the data from our service management and ancillary systems and allows us to perform analysis, modelling, hypothesis extraction and experiment design for things happening in our delivery.

ODM helps us identify the constraints in our operational environment that are holding us back, and it helps us figure out what to do about them.

Pretty much every experiment we’ve run so far has succeeded, because the data has led us to obvious stuff, and generally it’s clear what to do about it. It’s not uncommon for front line staff to say, ‘I knew that, I knew we needed to do that’. The way we do things around here, our culture, has stopped those front line people from fixing those problems. ODM is providing the empowerment to change how we do things around here, to change our culture (to more data driven decision making rather than decisions made by ‘HiPPOs‘ – the Highest Paid Person’s Opinion).
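To give a feel for how simple the first pass of that analysis can be, here’s a hedged sketch in Python/pandas against a hypothetical export of incident tickets; the file name, column names and the 4 hour SLA are all made up for illustration.

```python
# Sketch: find where tickets pile up and where SLA breaches cluster, from a
# hypothetical CSV export of the ITSM system.
import pandas as pd

df = pd.read_csv("incidents.csv", parse_dates=["opened_at", "resolved_at"])
df["hour_opened"] = df["opened_at"].dt.hour
# Assumed 4 hour resolution SLA, purely for illustration
df["breached"] = (df["resolved_at"] - df["opened_at"]) > pd.Timedelta(hours=4)

by_hour = df.groupby("hour_opened").agg(
    tickets=("breached", "size"),
    breach_rate=("breached", "mean"),
)
print(by_hour.sort_values("breach_rate", ascending=False).head())
```

A table like that is usually enough to surface a ‘start of day overload’ type of constraint, and to frame an experiment (such as a shift pattern change) whose effect can be measured the same way.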

It’s worth highlighting here that the true measure of ODM working for us will be when we move past the obvious ‘I knew that’ stuff and start finding the real edge of our organisation. We’ll know that’s happening when experiments fail to produce the expected results, and that’s just fine. Learning by failing needs to sit alongside learning by doing and learning by using in our overall learning approach.

Flow

Flow is about moving things through our ‘factory’ as quickly as possible. It means breaking things down so that they’re smaller, easier to manage, and more likely to succeed.

Flow is the opposite of ‘Death Star’ projects, where every conceivable requirement and every possible risk mitigation gets pushed into something that takes years to deliver and $MMs to get there.

Just about every struggling project that I encounter is one where we’ve piled up a bunch of requirements, and pushed back delivery. Those projects get later and more complex, and more complex and later. We fix that by delivering as little as possible, as early as possible, as quickly as possible – then following with the next incremental delivery.

Feedback

Feedback is about finding out whether something works or not as soon as possible, so that any corrective action can be early and inexpensive.

Practices like continuous integration (CI) improve feedback when developing software, but there are many opportunities to create and improve feedback besides CI.

Feedback can only work if we’re willing to accept that things can go wrong, and be ready and willing to change what we’re doing (a little) and try again.

Many of the customers I talk to expected more from the cloud they bought from us. They bought into ‘agility, agility, agility’ – that business agility needed agile software that needed an agile infrastructure. Having bought the agile infrastructure they waited for the magic to happen, but there is no magic. We need to help those customers create the connective tissue between their business needs and their cloud infrastructure, and that’s done with delivery pipelines. Pipelines that embed testing and quality assurance. Pipelines that provide feedback.

What *you* can do

Think about what you’re doing:

  1. Does it improve flow?
  2. Does it improve feedback?
  3. Does it provide an opportunity for learning by experimentation?

If you can’t answer yes to at least one of those then think again, and ask for help if you’re struggling on your own.

Personal learning

We can’t become a learning organisation without individual learning, and we shouldn’t conflate learning with training, or learning with education (as Andrew Clay Shafer explains in his ‘There is No Talent Shortage‘).

IT is a complex space, so nobody comes into an IT career without a desire to learn, but for some of us that desire gets crushed by burnout. An early statement in The DevOps Handbook is:

When people are trapped in this downward spiral for years, especially those who are downstream of Development, they often feel stuck in a system that pre-ordains failure and leaves them powerless to change the outcomes. This powerlessness is often followed by burnout, with the associated feelings of fatigue, cynicism, and even hopelessness and despair.

If your work at DXC is making you feel like that then firstly take heart that we’re trying to fix things, and secondly please try to find again that thirst for learning that brought you into an IT career in the first place.

Conclusion

DevOps is just a name that’s been coined for the stuff that organisations that are good at delivery do.

We want to be good at delivery too, which means we need to embrace DevOps.

There are 3 ways: Flow, Feedback and Continuous Learning Through Experimentation – everything that we do should aim to improve at least one of those.

Here’s a nice visual from the folks at Pivotal:


Retrospective

This post more than any from the back catalogue embodies why I took the job in the first place and what I’m trying to achieve. Things don’t change overnight, and the arrival of DXC means that I’ve gone from a ‘bigger train set’ to an environment that’s three times bigger still; but the weekly ODM reviews show the power of incremental gains. If the UK can use incremental gains to come 2nd in the Olympics and Paralympics then we can do the same to come 1st in the global services integrator competitive landscape.

Original Comments

MB

Great post – well done. Having been an advocate of Agile + DevOps + Lean Change in Consulting (ANZ) for some time I am totally with you.

The big question is how can this lead to real, tangible action within CSC? That is, what 3 things would you do if you were Steve Hilton, Jim Smith, or Carlos Lopez to foster a culture of Flow, Feedback, and Faith (Trust, Experimentation and Learning)?? I ask because culture is the challenge before anything else.

For me I would:

– Introduce Lean Change techniques into the way we run our business (see Lean Change | CSC Consulting — CSC Consulting Australia and Lean Change: a unique approach to managing change at speed — CSC Consulting) – we use it in our consulting practice

– Foster a culture of trust by setting clear expectations around what is required from employees, THEN remove bureaucracy, approvals, red-tape and THEN manage the exceptions (the 5% who get it wrong not the 95% who get it right)

– Invest in better business intelligence so we all gain more situational awareness which leads to us improving what we do.

Of course I totally understand that it’s not as simple as what I’ve written – the leaders have the tough job of keeping the business running well, while driving change through it – that’s not easy stuff!

CS

Thanks MB – I wasn’t familiar with the Lean Change material, and it’s great stuff.

The good news here is that the process to change our culture is already underway. I should explain that with reference to my preferred definition of culture – ‘the way we do things around here’, and so we’re changing the way we do things around here.

As I mentioned in the post the Operational Data Mining from the Operations Engineering (OE) team is where we’re leading the change, by identifying the constraints then capturing the data, doing the analysis, building the model, forming the hypothesis and running the experiment; rinse and repeat (and share lessons across the org). In recent weeks we (as a management team) have been looking for ways to push that forward (from the Global Delivery Network) into the broader picture of how we engage on delivery.

The bad news is that we have a psychological safety problem that touches many of our people, it’s been a problem for a while, but it’s particularly bad as we head towards DXC. I’m getting the ball rolling now to figure out how we establish better psychological safety for more of our people once we’re past the merger. I don’t think we can do much to foster trust until we’re trustworthy as an organisation – so the first step has to be ‘Make Safety a Prerequisite‘ (if I may borrow from Modern Agile).

The investment in better business intelligence is happening – that’s why we have a data science team in OE, and why we’re working with our colleagues in Big Data & Analytics to build robust and contemporary flow based analytics platforms for the data exhaust generated by our operations.

SS

“- Foster a culture of trust by setting clear expectations around what is required from employees, THEN remove bureaucracy, approvals, red-tape and THEN manage the exceptions (the 5% who get it wrong not the 95% who get it right)”

Makes so much sense as a guiding principle. Hopefully DXC will provide a great opportunity for a clean slate approach rather than a mishmash of legacy thinking and workarounds on top of workarounds…


Originally posted internally 22 Sep 2016, and it’s another post where I took an email reply to a broader audience.

I got an email question about switching from Red Hat Enterprise Linux (RHEL) to Oracle Linux in order to save cost, and I thought the answer would be worth sharing more broadly:

With Linux it’s important to be clear that you’re paying for curation of the distribution, support, and the patches/updates that go with that. Although there is a key used on the OS to provide access to updates, it’s not really a license in the traditional sense.

What Oracle have done is copied the Red Hat Enterprise Linux (RHEL) distribution, and they’re selling a support contract at a lower cost than Red Hat. They’ve also done a pretty good job of making the process of switching from Red Hat’s update infrastructure to their update infrastructure fairly painless – so the in place switch is more than a license key change, but not much more. I’d also not be too bothered about Oracle ramping prices later – firstly switching back would be easy, and secondly Oracle seem much more interested in undermining Red Hat than they are in building their own open source based business.

It’s also important to be clear on what the value proposition of a distribution is, which (beyond support/updates) is generally tied to the certification of third party software that will run on top of it. This is where things get tricky for Oracle Linux, as most software vendors (besides Oracle themselves) certify RHEL but not Oracle Linux. It shouldn’t matter, given that the only change is around where updates get downloaded from, but it’s this sort of petty detail that can cause unnecessary noise and problems when faced with an outage.

Personally I’d question the value of paying for Linux support at all, particularly as paying for Linux often becomes a cost barrier to other changes that would otherwise make sense. When did this customer last have an incident with Red Hat, and how did that work out for them (is the myth of software support at play here)?

If you look at the cloud most people use Ubuntu (without paying Canonical for support) or CentOS (a RHEL like distribution).

I’d also highlight that the Canonical support model is much more cost effective than Red Hat’s. This is something that I wrote about in ‘Banking on Ubuntu‘ a little while ago after meeting Ubuntu/Canonical founder Mark Shuttleworth.

We have standard operating environments (SOEs) for Ubuntu.

Retrospective

Before moving the infrastructure as code training to Katacoda it was run out of Docker containers on AWS. Since RHEL is the most common Linux in our customer environments I went with a CentOS Amazon Machine Image (AMI) to keep things similar. This was painful, because there are no official public AMIs for CentOS (like there are for Ubuntu), just marketplace AMIs, and if you build an AMI from the marketplace you can’t make it public. I wanted to have public AMIs so that staff could run the training materials in their own AWS accounts, and that initially drove me to the unofficial Bashton CentOS. But I fear there might have been a vulnerability in those AMIs, as any VM left running too long turned into a compromise notification from AWS. Only later did I realise that I could just use AWS Linux, which is also EL/yum flavoured.
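For what it’s worth, picking the current official Amazon Linux image programmatically is straightforward; the sketch below uses boto3, and the name filter is an assumption on my part rather than a documented contract, so check it before relying on it.

```python
# Sketch: find the most recent Amazon Linux HVM AMI in a region via boto3.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")
images = ec2.describe_images(
    Owners=["amazon"],
    Filters=[
        # Name pattern is assumed for illustration
        {"Name": "name", "Values": ["amzn-ami-hvm-*-x86_64-gp2"]},
        {"Name": "state", "Values": ["available"]},
    ],
)["Images"]

latest = max(images, key=lambda i: i["CreationDate"])
print(latest["ImageId"], latest["Name"])
```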

I’m still shocked by how many organisations think that (for ‘insert specious regulatory reason’) they must pay for Linux support, and generally in a way that has them coughing up by the socket like the bad ole days of proprietary Unix like SCO.

Original Comments

NS

Paying for Linux support is not a technical decision, but purely a commercial one. In a support contract you have to ask who is ultimately responsible for fixing a bug in the Linux kernel – CSC or a vendor? Are customers happy with a CSC reasonable efforts approach, or do they want some certainty? It’s the latter approach that the commercial people often insist upon.

CS

It’s not a black and white question of pay or don’t pay – it’s a question of who do we pay, and how much in order to get the right level of risk mitigation?

NS

It might be an interesting option for some customers if we could offer a global Linux support offering around CentOS and Ubuntu.


Originally posted internally 16 Aug 2016:

The title of this post comes from a tweet by Evan Goer I saw last week. It’s derived from Michael Pollan’s statement from In Defence of Food: ‘Eat food. Not too much. Mostly plants.’

Write code

Just as food fuels our bodies, code fuels our industry. The world is moving to infrastructure as code, which I’ve written about before. I noted yesterday that GE’s CEO Jeff Immelt is quoted as saying ‘We hire 4,000 to 5,000 college grads every year, and whether they join in finance or I.T. or marketing, they’re going to code.’ – so whatever you’re doing it’s likely that you can do it better/faster/cheaper by writing some code, and if you don’t know how to do that it’s never too late to learn.

Not too much

Everything in moderation. Quality beats quantity with code just as it does with food.

Mostly docs

Because people need to understand what the code does, and ‘use the source Luke’ just doesn’t cut it. Good docs give your code the superpower of enabling other people to (re)use it.

Collaborate

Finally share, share, share – put the code and docs on GitHub where your colleagues can find it, use it, improve it.

Retrospective

This was a reminder of my earlier ‘The value of artifacts‘, where I first started banging the drum about the importance of documentation (and samples and examples).

Along with the infrastructure as code Katacoda scenarios I’ve published a screencast on getting set up with Git Bash and Atom, and one of the things I highlight there is Atom’s built in Markdown preview, which is super helpful to anybody writing docs that are going to live in/on GitHub.

Original Comments

GS

This is a simple yet good write up to promote movement from sys admins to sys engineers in the organization.

MN

I believe his point was that people working in fields that use computers (not just Information Technology) will need to understand coding just a little more than they do today.

IP

I think we really need to go for infrastructure as code and start cranking it out, but I wonder how long it will be before serverless overtakes.

Incidentally, are we building infrastructure as code libraries for our partners’ clouds? It would be really something if we one day had them available to clients in a catalogue, such that they could build their own CSC supported environment, then just press a button to hand over support to an SLA. Too much of a dream?

MN

I’ve been working a bit with containers and I think there’s some legs to it.

There seems to be a need for a universal mediator or we’re just going to be continually tying together pieces.  With SDN, it actually seems like it is getting worse, where vendor specific technology/implementations are driving the need to create zones of intermediary relationships.

Function based serverless looks pretty nifty.  It is certainly another tool in the toolbox.  I’m not really sure how it differs all that much from container based workload chaining though.


Originally posted internally 12 Jan 2016:

PC sends out a weekly update for things happening in the Workplace offering, and this week he touched on the topic of programming languages. Rather than replying to him 1:1 by email I’m writing here because I’d like to have a more open debate. Before we start… take a look at the RedMonk Programming Language Rankings (latest is June 2016), which PC referred to in his newsletter.

A couple of the languages that PC touched on are Scala and R. They’re both interesting, though for different reasons.

Scala caused my friend Kirk Wylie to write I Want A New Programming Language where he introduced a very interesting concept of a ‘Journeyman Language’. His general point was that Scala was too high end, and if he let (some of) his team develop in Scala then he’d end up with a part of the OpenGamma code base that could only be maintained by Scala experts.

Kirk brought up the concept of a ‘Journeyman Language’ before Go (or GoLang) was launched, but Go is now celebrating its 5th birthday, and in its short life just about all modern infrastructure software has been written or rewritten in Go (e.g. Docker, Cloud Foundry, Terraform, Kubernetes). I asked Kirk if Go fitted his requirements for a Journeyman Language, and he said it didn’t; though it seems to me that although it might not have all the bells and whistles Kirk was after it’s doing exactly that job at Google and elsewhere – allowing relatively inexperienced developers to be productive, and create code that’s comprehensible to others (after all the point of a programming language is to create stuff that’s understandable by other humans – compilers take care of making things work on the machines).

There are two other interesting things about Go:

  1. It’s the first language since C that’s both a systems language and an application development language.
  2. It has great support for concurrency, having adopted the communicating sequential processes (CSP) approach that debuted in Occam – a language developed to work on Transputers, which were one of the first truly parallel processing substrates. Former Netflix Chief Architect (and all round tech superhero) Adrian Cockcroft does an excellent job of connecting the present to the past in his Gophercon 2016 presentation Communicating Sequential Goroutines. The key here is that Go (and its idioms) has made parallel processing easy and accessible, and since we now have multi core CPUs (and GPUs) all over the place that’s a good thing.

The language wonks’ language that’s appeared alongside Go is Rust, which comes from Mozilla (the Firefox browser people). Rust has a smaller footprint than Go (which might be a big deal for IoT applications), but it seems to have a steeper learning curve – so it’s less of a ‘Journeyman Language’. The Chef people recently launched their Habitat automation tool, which was written in Rust, but it’s a nightmare of complexity that will prevent the kind of grass roots uptake we’ve seen with Docker. Rust might not be to blame for that, but it’s a bad start for it in terms of mind share and impact.

The point to this post is that learning programming languages is an investment – it takes time to become proficient in a language, but that investment pays off when the language lets you solve a problem more quickly and efficiently. It’s easy to be reductionist about this and think too much about individual productivity, but Kirk’s point was about team productivity (and longevity) and there’s a broader point these days with open source with respect to community productivity (and longevity) – if you can bring other people into your project as collaborators then that becomes a force multiplier – so pick a language that’s popular and effective.

The final point is about context. R is great for statistics, which is why our data scientists use it as part of Operational Data Mining, though they also use a ton of Python. Scripting languages like Bash used to be where the action was at with systems operations, but the game has moved to Go and configuration management tools like Ansible (which is based on Python), Puppet and Chef (which are based on Ruby). The fact that Ruby has been squeezed out of tools like Cloud Foundry in favour of Go shows a meaningful direction of travel. For application development the RedMonk team have a saying that when apps grow up they get rewritten in Java, though it’s now clear that many adopters of microservices are doing so at least in part to allow some developers to use Go, Node.js or whatever else takes their fancy. I can see some reason for arguing that beginners should learn JavaScript, because it has utility on the client and server side (making it just about ubiquitous), but JavaScript has some awful idiosyncrasies and weirdness, which brings truth to the saying ‘In Ruby, everything is an object. In Clojure, everything is a list. In Javascript, everything is a terrible mistake.’

Retrospective

Another RedMonk Programming Language Rankings came along in January, which means we’re not that far from the next cut. Last time around it seems that Swift and Typescript were most worthy of comment.

I’m inclined towards thinking that the most important thing to learn is Markdown, but that’s a topic for another post.

Meanwhile we ran the DXC Codes competition for school kids, which we based around Scratch – a language that can trace its origins back to LISP via LOGO. There were some amazing entries.

Original Comments

NS

I can see where Kirk is coming from, but it appears he’s nuts. Who writes their own RDBMS?

“if you could write a reasonably high performance RDBMS system in the language, it has enough features. If you couldn’t, it’s not good enough. I like this particular test because I’ve done it several times”

These days we operate in a landscape where developers should use the best programming language / framework (very important) for the job, rather than what happens to be the current flavour of the month.

HS

R (and R Studio) is absolutely the way to go as an entrée to data science, with a useful programming by-product. Programmers might come at it from the other direction, Python and Scikit-Learn, a popular machine learning library.

HS

It has always been somewhat amazing to me how many new languages emerge over time. Evolution of computer programming languages (original link lost on C3). And I know the dilemma of trying to decide which ‘next’ one to learn for those still active in the development of their software engineering career. I suppose it all depends where you start? An Excel macro person might like to try Python or R. A professional Java or C# programmer might be interested in understanding why they might be interested in Go, Scala, Haskell, Julia, Lua, Erlang, Rust, or Clojure. The ‘Full Stack’ developer will need to know how to mesh a bunch of approaches. For those who specialise in SQL, they might want to explore how that segues into machine learning frameworks. Etc.

I must have a stab at writing that ‘Why I should learn to program in <X> or can afford to ignore it?’  for all these languages, and more.


Originally posted internally 15 Jul 2016:

This is something I’ve been meaning to do since TechCom, so it’s a little overdue. My intention going forward is to have a monthly cadence.

Since I have some involvement in build, sell, deliver I’ll organise along those lines:

Build

‘Modern Platform’ is due before the Technical Design Council (TDC) next week, which will be the culmination of months of work with our partners. The intention is to have a basis for all of our offerings (cloud, workplace, big data and analytics, and the platform itself) that can start small (and cheap) and scale well. Our original target was to be able to do a 100 Virtual Machine (VM) minimum footprint at $1,200 per VM. We’ve not quite hit the low end on scale (it’s 150 VMs), but we’ve exceeded the price objective as they come in at $800/VM, so arguably the target is hit: 150 VMs at $800 is the same $120k entry point as 100 VMs at $1,200, it just comes with spare capacity. This is going to get us into smaller accounts that we couldn’t reach before, but also allows us to start small and scale up in larger opportunities.

Some good progress has been made in our open source efforts around our offerings with the release of AWS and Azure adapters for Agility and also a Terraform plugin. Coming soon will be adapters for Kubernetes and AWS CloudFormation. Open Sourcing these things allows our customers to engage without having to ask permission, and creates an opportunity for co-creation with a community of customers forming around our offerings.

Sell

The first version of our Key Transformation Shift (KTS) documents (aka Digital Shifts) happened at the end of last year. Since TechCom the team has been busy revising all of the documents with the main emphasis being on highlighting the interlock between each shift, and also the interlock to cyber. The version 2 documents are now in the final stages of review before publication, and there will be a big marketing push around the Digital Shifts starting in September. Along the way there will be Town Halls for each of the shifts. The Integrated Digital Services Management (IDSM) Town Hall happened already, but if you missed it then catch up with the recording on YouTube.

Over the past few weeks I’ve been pulled into a number of pursuits, ranging from a services company in Helsinki to a comms provider in London. They’re all very interesting, and each presents its own challenges. The biggest issue however seems to be (as SJ put it) ‘if only CSC knew what CSC knows’. The digital shifts and offerings are all great, but sometimes our sales force needs stuff that’s easier to digest for customers, and we can all help make that easier to come by.

Deliver

The big news in delivery is the roll out of the new GIS organisational structure and operating model (to align with build, sell, deliver). In particular the Offering Delivery Function (ODF) is taking shape under JH’s leadership. The formation of the Operations Engineering (OE) group has been keeping PF and me busy, but the new OE lead will be starting next month. OE will bring together automation skills from the Automation Centre of Excellence (ACoE) and data skills from Governance Analytics Metrics and Business Intelligence (GAMBI) along with a team of data scientists that we’re recruiting to replace the services we’ve been getting from CKM Advisors with an organic capability.

As we shift our delivery to a more infrastructure as code based model it means that ops people need to get (or sharpen up) dev skills – using source code management, configuration management and continuous integration tools. To that end we’ve created the Infrastructure as Code boot camp, which is an instructor facilitated workshop aimed at POD staff (and GIS managers). I’ve personally been getting around the UK PODs, though I was very pleased to see DE and MH running a second generation course in Chorley and then CN volunteering to run a third generation course – that’s how it’s supposed to work so that we can get scale across the organisation. Meanwhile CK has been busy getting around US PODs, and is now part way through a tour around APAC visiting the PODs there. If you’re in India watch out for him at a POD near you next week.

Retrospective

Sadly I didn’t get into a regular cadence for these newsletters as other commitments got in the way of writing time.

I ended up taking on the x86 and Distributed Compute P&L as Offering General Manager, which allowed me to shepherd Modern Platform through the Platform Lifecycle Management (PLM) process to early release. It was very gratifying to see VMware CEO Pat Gelsinger talking about one of the first Modern Platform deployments in his Dell EMC World keynote last week; and the ‘turnkey on day 1’ mantra has become a big part of what I’m working on between build and deliver in the new organisation.

The Agility adapters for Kubernetes and Cloud Formation were released, and I noted an improvement in the quality of README.md documents where I wasn’t having to do a pull request against every one before we switched the repo from private to public.

The second set of digital shifts papers were published, with much better integration between each of them, and a cyber/security thread running through them all. They’re now undergoing another revision to align with the new DXC Technology offering families, and with a bunch of new in job CTOs holding the pen.

We replaced the offering delivery function (ODF) with offering delivery and transformation (OD&T) as we became DXC Technology, but the general direction remains the same. Similarly operations engineering (OE) became operations engineering and excellence (OE&E).

I ran the last infrastructure as code boot camp in the old workshop format in the Vilnius POD in the final week of CSC, and tried the new Katacoda based approach on attendees there. The whole workshop has now been migrated to Katacoda to improve reach and scale, and I hope to have everybody in delivery (and elsewhere in the company) complete basic courses in collaborative source control, config management and CI/CD (managers included – the EVP for Global Delivery made his first pull request whilst running through a Katacoda scenario a few days ago).



Originally posted internally 26 May 2016:

Last week I called in on Brad Meiseles, the senior director of engineering responsible for VIC. It’s a product I’ve been watching since the earliest rumblings around what Project Bonneville did with the VMfork technology that had originally been envisaged as something for quicker launching virtual desktop infrastructure (VDI).

VIC has taken a while to hatch, as it’s a complete rewrite of what was done with Bonneville, but it might be one of the most important things to emerge from VMware this year. Containers are a big threat to VMware and all of the vSphere (and associated management) licenses they sell, but VIC gives VMware the opportunity to set up an enterprise toll gate to containers because it gives the security of VMs with hardware trust anchors at the same time as giving the quick launch, low footprint and packaging ecosystem of containers.

There’s just one problem. The underlying technology that VIC uses for its quick launching magic comes with VMfork in vSphere 6, and most of the world is still running vSphere 5.5. Furthermore containers have swept through the industry so fast because most people already have a Linux that can run them, so they haven’t had to wait for the hardware refresh cycles that went along with most ESX/vSphere deployments/upgrades. I’m hopeful that VIC will be released with vSphere 5.5 compatibility; after all quick launch matters a great deal in dev/test environments for quick cycle times, but it’s less of an issue in production, and VIC is very much aimed at being the secure production destination for containers that get developed elsewhere.

Retrospective

VIC hasn’t become as big a part of the enterprise containers conversation as I expected it to be. I think this is down to companies taking a bimodal approach (or preferably pioneers, settlers, town planners [PST]), and so there’s little mixing of existing VM environments with new container environments (which generally tend to be using Kubernetes, quite often under the auspices of Red Hat’s OpenShift).

Original Comments

VK

It is an interesting update. I would like to know if they will extend the support to other Docker ecosystem components/options, like Swarm or Kube, networking, and volume mapping options.

This is more a question. IMO, a container solution needs a container ecosystem of components for its POV, scale and future roadmap. Support for Docker alone is a limited option. What do you think? Cattle needs a helmsman, as Kubernetes would say!

CS

Since they’re doing an implementation of the Docker APIs it will work with many other parts of the ecosystem.

It’s explicitly intended to work with Kubernetes, Swarm etc., and if you look at VMware’s broader strategy around cloud native applications (including things wearing the Photon badge) then there’s explicit support for multiple orchestration and scheduling systems.

Things get a little more tricky with aspects like networking. Firstly VMware is trying to position products like NSX as offering better capability than networking based on a host Linux kernel, and secondly any networking that assumes multiple containers sharing the same kernel (which is most Docker networking right now) will run into trouble in an environment where each container has its own kernel.

VK

Thanks.
Anything that works with Kube is and should be good (read: bias).
NSX is good and rightly complex, and solves more than container networking. It addresses the entire enterprise L3 overlay architecture and so on.
But… it is imperative for VMware to share the licensing model easily and upfront, if we are to solution with them mixing with open source.
Please correct me if I am alone in saying this. The whole solutioning gets tedious if the licensing info is not available, simple and easy. I call this the Oracle-fuss.
Asking for ‘deal size’, ‘we can work it out’ responses.

To circle back to your main post, Project Lightwave from VMware is supposed to address the LDAP requirement of Photon and in turn their container solution.
If VIC/vSphere level REST support is published, it will pave the way for Agility container support.


Originally posted internally 12 Feb 2016, this was the first of what became a series of posts where I took an email reply to a broader audience. Global Infrastructure Services (GIS) was the half of CSC that I worked in before the creation of the Global Delivery Organisation (GDO) in DXC Technology:

JP Morgenthal sent out a note asking a bunch of people for their views on DevOps at DXC Technology. Here’s my reply to him:


In my IPexpo presentation some 16 months ago ‘What is DevOps, and why should infrastructure operations care?‘ I started out by saying ‘DevOps is an artefact of design for operations’, and highlighting the need for culture change (and only now do I see the typo in that deck).

With that in mind I’d say that GIS is at the start of the journey. We have a plan to redesign the organisation for operations, and we have the intent to change our culture (the way we do things around here), but activity on the ground is only just moving from planning to execution.

In my first sit down with Steve Hilton his list of priorities was: automation, automation, automation. Breaking this down a little:

1/ Operational Data Mining (ODM), which is the process by which we’re taking the data exhaust from the IT Service Management (and ancillary) systems that we look after and applying big data analysis tools to get insight into experiments that we should run to improve our operations across people, process and tools. It’s early days, but I’d say that so far we’ve been most successful in the people and process areas because we’ve found it hard to get out of our own way when it comes to tool deployment. This is a salutary reminder that DevOps is not (just) about a bunch of tools, because we’ve been able to have real impact without any tool changes (e.g. changing shift patterns to avoid a start of day overload has greatly improved SLA conformance without requiring any additional staff [or any other changes]).

2/ Becoming more responsive – this is the part where the Automation 1.5 programme kicks in, and the tools start to be meaningful. Some of it’s about streamlining how we do ITSM, and I’d note that there’s still a bunch of ITIL happening there. The rest is about ‘every good systems administrator should replace themselves with a script’, and providing the framework to create, curate, share and deploy those scripts. The SLAM.IO team in KH’s group have made great progress on wrapping triage scripts (that use Ansible) so that operators don’t have to yak shave their way through the same command line interactions every time they look at a box (see the sketch after this list for the general shape of such a wrapper). This is a precursor to API driven integration, where the triage gets done before a human even sees the ticket.

3/ Becoming more proactive – once we step beyond incidents and problems we’re looking at deliberate change, and making that a push button repeatable operation rather than manually ploughing through runbooks. Agility takes centre stage here from a tooling perspective, but we also have Hanlon to help us deal with bare metal (before it becomes a cloud that Agility can work with) along with the output from DB’s Automation 2.0 team (which is in the process of being partly open sourced).
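By way of illustration, the sketch below is the general shape of that kind of triage wrapper, not the actual SLAM.IO code; it uses the ansible-runner Python library, and the playbook name, variable and hostname are hypothetical.

```python
# Sketch of a triage wrapper: run a standard Ansible triage playbook against
# the host named in a ticket, so the operator sees the results rather than
# typing the same ad-hoc commands every time.
# Assumes 'pip install ansible-runner' and a hypothetical triage.yml playbook
# that gathers disk, memory and service state.
import ansible_runner

def triage(host: str) -> dict:
    run = ansible_runner.run(
        private_data_dir=".",
        playbook="triage.yml",
        extravars={"target": host},
    )
    return {"host": host, "status": run.status, "rc": run.rc}

if __name__ == "__main__":
    print(triage("app-server-01.example.com"))
```

The interesting next step is wiring the same call into the ticketing system’s API, so the triage output is already attached by the time a human looks at the ticket.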

Upskilling our organisation will be a key part of the culture change, which is why I’ve been writing about the ‘Four Pillars of Modern Infrastructure‘ and encouraging people to learn Git/GitHub/Ansible/Docker and AWS. I’m now working with CK and GS on a DevOps bootcamp workshop that we’ll start delivering to the PODs next month. This follows on from work by GR, DE and others before I arrived.

We also need to be more data driven. ODM is a part of that, but more broadly I’m challenging people to ask ‘What would Google do?‘.

It’s also important to recognise that the shift to infrastructure as code doesn’t end with the code. Our code must be supported by great documentation, samples and examples to be truly effective, which is why I’ve been very clear about what I value.

This is of course a very high level view. You’re going to find some great examples of agile development and CI in the development centric parts of GIS, and I’ll leave it to the individual leaders in those areas to explain in more detail.

Retrospective

This was the first time I used ‘Design for Operations’ in the context of a relatively wide internal audience, and I’m super pleased that it’s since become a common part of how we talk internally and externally about what we do and how we’re changing. The post came shortly before the publication of The DevOps Handbook, so although I was familiar with the ‘3 DevOps ways’ of ‘Flow, Feedback and Continuous Learning by Experimentation’ from The Phoenix Project I didn’t call them out specifically. As I was later setting up the Operations Engineering (OE) group I took an approach of ‘All in on ODM’, making continuous learning by experimentation applied to operation constraints the tip of our change spear; and I generally think that improvements to flow and feedback come naturally from that. Automation 1.5 was a Death Star that we stopped building, and Automation 2.0 became part of the overall OE direction. 15 months later, and we now have a delivery organisation that’s no longer divided into Dev and Ops, so with respect to Conway’s Law things are now pointed in the right direction. I’ve also found some awesome (DevOps) engineers in our Newcastle Digital Transformation Centre who have a can do attitude, and the skills to make stuff happen.

Original Comments:

MN

This is exactly the type of aspirational view we need for the digital enterprise.

GIS solutions have to provide the foundation for these capabilities in order to elevate the value chain.

Many parts of the traditional role of GIS (equipment integration) are approaching utility if not already there.  If you look at AWS alone, they are attacking some very mature and integrated business models with their offerings, following plans that look very much like this evolution.

Very well put Chris.


MH

I blogged about automation a while back in ‘Manual tasks of today should be the Automated tasks of tomorrow’ (link lost to the demise of C3). A couple of interesting takes on it.


Originally published internally 26 Jan 2016:

Last week Docker Inc acquired Cambridge based Unikernel Systems Ltd, which has got a lot of people asking ‘what’s a unikernel?’, a question that’s well covered in the linked piece. Going back a few years I covered the launch of Mirage OS, which is the basis for what Unikernel Systems do – they’ve since interfaced it with the Docker API so that unikernels could be managed as if they’re containers.

The acquisition caused Joyent’s CTO Bryan Cantrill to write that Unikernels are unfit for production, where he restates some points that he made when I interviewed him at QCon SF in November. Bryan makes a good point about debugging, but I think there are cases where Unikernels don’t really need to be debugged (and Bryan pretty much made the point when talking about ‘correct software’ when we spoke), which is the essence of ‘Refereeing the Unikernels Slamdown‘.

It’s worth noting that DXC Technology has a (very specialised) dog in this fight with our open source Hanlon project. It’s not actually a unikernel (as it works with a regular Linux kernel), but it might be argued that it’s on the unikernel spectrum (and for further exploration of that space take a look at some of the presentations from OperatingSystems.io on topics like rump kernels).

Retrospective

Unikernels haven’t taken over the world, but they’re usefully doing the ‘correct software’ job in things like Docker for Mac. The recent release of LinuxKit also shows that Docker Inc is investing in other places along the ‘unikernel spectrum’ that I referred to, making it easy to build stripped down containers that sit on top of Linux, but aren’t strictly unikernels.

Original Comments

MH

There are a number of slides and videos around unikernels from Docker at SCALE 14x (Linux meetup), posted at Recap: Docker at SCALE 14x | Docker Blog

TM

In my view, Chris, the Hanlon-Microkernel project is an example of a Docker container that we deploy (and run) dynamically during the process of iPXE-booting a perfectly normal (albeit small) Linux kernel. To provide a bit more detail, we use the RancherOS Linux distribution (a Docker-capable Linux kernel that has a total size, for both the kernel image and its RAM disk, of approximately 22MB) as our iPXE-boot kernel and dynamically inject the Hanlon-Microkernel Docker container image into that Linux kernel at boot using a cloud-config that is supplied by the Hanlon server.

A unikernel (from my understanding) is really just a Linux kernel that has been stripped down to the minimal packages and services that are necessary to run a single application. In my mind, that is quite different from the approach that is taken by RancherOS (or TinyCore Linux) where a “regular” Linux distribution is compressed to boot quickly (and often run in memory). In those operating systems you typically have all of the same processes available to you (including standard Linux commands and even services like SSH), giving you much greater access to the system if you need to debug something that has gone wrong in that system. I guess you could make the argument that it’s in the “unikernel spectrum”, but I tend to think of the approaches taken by unikernels as being quite different from the approaches we’ve taken for years now to make small kernels (which are typically intended to run multiple services, not just one service). Just my 0.02 (in your favorite local currency)…

MN

Cantrill was belaboring the use case with points that would fit in the early days of VMware.  It evolved though.  As will Unikernel.

Thing is, it’s in the toolbag now.

NB

‘Unikernels will send us back to the DOS era’ – DTrace guru Bryan Cantrill speaks out • The Register

MN

I enjoyed the 4th paragraph quotes.

CS

Some great stuff from Brendan Gregg on Unikernel Profiling


Originally posted internally 12 Jan 2016:

What Would Google Do? It’s a good generic question when considering any problem in the IT space.

Often the answer is pretty obvious, where Google’s already doing something (and better still if it’s published the hows, whats and whys). Other times there’s a shape of an answer, where Google can be seen to be doing something, but it’s less clear how they’re doing it.

There is also a generic answer. Google is a data driven organisation (arguably often to the point of damaging itself and its users), so the answer to all questions is driven by data. If there’s no data then the first job is to get the data – making the mechanism to source the data if need be.


The alternative to WWGD is the HiPPO – the Highest Paid Person’s Opinion. There are a few problems with HiPPOs, which is why they’re best consigned to 60s fictional characters like Don Draper rather than the decision making processes of modern organisations:

  • HiPPOs are expensive
  • HiPPOs are a bottleneck to decision making
  • HiPPOs are subject to all kinds of human frailties that might misalign their opinions with the realities of the world around them, not least the ‘tyranny of expertise’

Google have a saying for dealing with this, ‘don’t bring an opinion to a data fight’.

Retrospective

As we built out operational data mining (ODM) and built the operations engineering team (OE, now OE&E) to support it, there was a palpable shift in the culture of the organisation from being opinion driven to data driven. This has been empowering for front line staff, and generally made DXC Technology less political and hence a nicer place to work. As we set about building OE there were two aspects of Google practice that we borrowed from heavily. The first was Site Reliability Engineering (SRE) and the second was Google’s ‘data bazaar’ Goods.

Original comments

NB:

HiPPO is the road to irrelevance.

‘The best ideas win, independent of titles: In a social business, ideas and information flow horizontally, vertically, from the bottom and from the top; throughout the business. Ideas are like sounds, and they should be heard through the seams of the social fabric. In the absence of sound, ideas die. The most damaging syndrome is the HIPPO (highest paid person’s opinion) syndrome, whereby all the decisions are ultimately dictated by the biggest title. The best ideas must win. That’s the biggest benefit of being social.’

Recognizing Good Ideas (link broken by demise of C3, referenced HBR’s ‘Innovation Isn’t an Idea Problem‘)

Re-examine how you tackle tough problems, and make important decisions. “Decision Making By Hippo”, that is, following the lead of the most highly paid person simply because they are in that position, is a very bad idea. Instead, the intelligence and capability of all the organization members can, and should, be tapped. – Andrew McAfee

If only CSC knew what CSC knows (link broken by demise of C3, referenced HP’s former CEO Lew Platt, “If only HP knew what HP knows, we would be three times more productive.”)

LEF paper ‘Energizing and Engaging Employees – Social media as a source of management innovation‘ (page 35)