LessOps

09Aug17

JeffConf have posted the video from my talk there on LessOps (or should that be ’LessOps?), which is how I see operations working out in a world of ‘serverless’ cloud services:

The full playlist is here, and I’ve also published the slides:


In a note to my last post ‘Safety first‘ I promised more on this topic, so here goes…

TL;DR

As software learns from manufacturing by adopting the practices we’ve called DevOps, we’ve got better at catching mistakes earlier and more often in our ‘production lines’ to reduce their cost; but what if the whole point of software engineering is to make mistakes? What if the mistake is the unit of production?

Marginal cost

Wikipedia has a pretty decent definition of marginal cost:

In economics, marginal cost is the change in the opportunity cost that arises when the quantity produced is incremented by one unit, that is, it is the cost of producing one more unit of a good. Intuitively, marginal cost at each level of production includes the cost of any additional inputs required to produce the next unit.

This raises the question: what is a ‘unit of a good’ when it comes to software?

What do we make?

Taking the evolutionary steps of industrial design maturity that I like to use when explaining DevOps, it seems that we could say the following:

  • Design for purpose (software as a cottage industry) – we make a bespoke application. Making another one isn’t an incremental thing, it’s a whole additional dev team.
  • Design for manufacture (packaged software) – when software came in boxes this stuff looked like traditional manufactured goods, but the fixed costs associated with the dev team would be huge versus the incremental costs of another cardboard box, CD and set of manuals. As we’ve shifted to digital distribution marginal costs have tended towards zero, so thinking about marginal cost isn’t really useful if we’re thinking that the ‘good’ is a given piece of packaged software.
  • Design for operations (software as a service/software based services) – as we shift to this paradigm the unit of good becomes more meaningful – a paying user, or a subscription. These are often nice businesses to be in as the marginal costs of adding more subscribers are generally small and can scale well against underlying infrastructure/platform costs that can also be consumed as services (the toy calculation below puts some illustrative numbers on this).
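To make that last point concrete, here’s a deliberately simple sketch with entirely made-up numbers (nobody’s real economics):

```python
# Hypothetical SaaS cost model, purely to illustrate marginal cost.
fixed_monthly_cost = 250_000          # dev team, platform, overheads
infra_cost_per_subscriber = 0.12      # metered cloud spend per subscriber per month

def monthly_cost(subscribers):
    return fixed_monthly_cost + subscribers * infra_cost_per_subscriber

# Marginal cost of subscriber 100,001 is just the incremental infrastructure spend:
print(round(monthly_cost(100_001) - monthly_cost(100_000), 2))  # -> 0.12
```

The fixed costs dominate the average cost, but the marginal cost of one more subscriber is tiny – which is exactly why the ‘unit of good’ starts to become meaningful at this stage.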

The cost of mistakes

Mistakes cost money, and the earlier you eliminate a mistake from a value chain the less money you waste on it. This is the thinking that lies at the heart of economic doctrine from our agricultural and industrial history. We don’t want rotten apples, so better to leave them unpicked versus spending effort on harvesting, transportation etc. just to get something to market that won’t be sold. It’s the same in manufacturing – we don’t want a car where the engine won’t run, or the panels don’t fit, so we’ve optimised factory floors to identify and eliminate mistakes as early as possible, and we’ve learned to build feedback mechanisms to identify the causes of mistakes and eliminate them from designs (for the product itself, and how it’s made).

What we now label ‘DevOps’ is largely the software industry relearning the lessons of 20th century manufacturing – catch mistakes early in the process, and systematically eliminate their causes.

Despite our best efforts, mistakes make it through, and in the software world they become ‘bugs’ or ‘vulnerabilities’. For any sufficiently large code base we can start building statistical models for the probability and impact of those mistakes, and we can even use the mistakes we’ve found already to build a model for the mistakes we’ve not found yet[1].
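As a toy illustration of that last point (and not the method used in the paper referenced in note [1]), the Lincoln–Petersen capture–recapture estimator borrowed from ecology gives a feel for it: if two independent inspections of the same code base find partially overlapping sets of defects, the size of the overlap lets you estimate how many defects remain unfound. A minimal sketch with made-up numbers:

```python
def lincoln_petersen(found_by_a, found_by_b, found_by_both):
    """Estimate the total defect population from two independent
    inspections of the same code base (capture-recapture)."""
    if found_by_both == 0:
        raise ValueError("no overlap between inspections - estimator undefined")
    estimated_total = (found_by_a * found_by_b) / found_by_both
    already_found = found_by_a + found_by_b - found_by_both
    return estimated_total, estimated_total - already_found

# Hypothetical numbers: inspection A finds 40 defects, inspection B finds 30,
# and 12 defects are found by both.
total, unfound = lincoln_petersen(40, 30, 12)
print(f"estimated total defects: {total:.0f}, estimated still unfound: {unfound:.0f}")
```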

Externality and software

Once again I can point to a great Wikipedia definition for externality:

In economics, an externality is the cost or benefit that affects a party who did not choose to incur that cost or benefit. Economists often urge governments to adopt policies that “internalize” an externality, so that costs and benefits will affect mainly parties who choose to incur them.

Externalities, where the cost of a mistake doesn’t affect the makers of the mistake, happen a lot with software, and particularly with packaged software and the open source that’s progressively replaced it in many areas. It’s different at the other extremes. If I build a trading robot that goes awry and kills my fund then the cost of that mistake is internalised. Similarly, if subscribers can’t watch their favourite show, then although that might initially look like an externality (the service has their money, and the subscriber has to find something else to do with their time), it quickly gets internalised if it impacts subscriber loyalty.

Exploring the problem space

Where we really worry the most about mistakes in software is when there’s a potential real world impact – we don’t want planes falling out of the sky, or nuclear reactors melting down etc. This is the cause of statements like, ‘that’s fine for [insert thing I’ll trivialise here], but I wouldn’t build a [insert important thing here] like that’.

Software as a service (or software based services) can explore their problem space all the way into production using techniques like canary releases[2]. People developing industrial control systems don’t have that luxury (as impact is high, and [re]release cycles are long), so they necessarily need to spend more time on simulation and modelling, thinking through what could go wrong and figuring out how to stop it. This dichotomy can easily distil down to a statement on the relative merits of waterfall versus agile design approaches, which Paul Downey nailed as:

Agile: make it up as you go along.
Waterfall: make it up before you start, live with the consequences.

It can be helpful to look at these through the lens of risk. ‘Make it up as you go along’ can actually make a huge amount of sense if you’re exploring something that’s unknown (or a priori unknowable), which is why it makes so much sense for ‘genesis’ activities[3]. ‘Live with the consequences’ is fine if you know what those consequences might be. In each case the risk appetite can be balanced against an ability to absorb or mitigate risk.

This can be where the ‘architecture’ thing breaks down

We frequently use ‘architecture’ when talking about software, but it’s a word taken from the building industry, and professional architects get quite upset about their trade moniker being (ab)used elsewhere. When you pour concrete, mistakes get expensive, because fixing the mistake involves physical labour (with picks and shovels) to smash down what was done wrong before fresh concrete can be poured again.

Fixing a software mistake (if it’s caught soon enough) is nothing like smashing down concrete, which is why as an industry we’ve invested so much in moving towards continuous integration (CI) and related techniques in order to catch mistakes as quickly and cheaply as possible.

Turning this whole thing around

What if the unit of production is the mistake?

What then if we make the cost per unit as low as possible?

That’s an approach that lets us discover our way through a problem space as cheaply as possible: to test what works and find out what doesn’t – experimentation on a massive scale, or as Edison put it:

I’ve not failed. I’ve just found 10,000 ways that won’t work.

What we see software as a service and software based services companies doing is finding ways that work by eliminating thousands of ways that don’t work as cheaply and quickly as possible. The ultimate point is that their approach isn’t limited to those types of companies. When we simulate and model we can discover our way through almost any problem space. This is what banks do with the millions of ‘bump runs’ through Monte Carlo simulation of their financial instruments in every overnight risk analysis, and similar techniques lie at the heart of most science and engineering.
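To give a feel for what one of those runs involves, here’s a minimal, purely illustrative Monte Carlo sketch (nothing like a real bank’s risk engine) that estimates the one-day 99% value at risk of a single hypothetical position:

```python
import random

def monte_carlo_var(position_value, mu, sigma, confidence=0.99, runs=100_000):
    """Estimate one-day value at risk by simulating many random daily
    returns and taking the loss at the chosen percentile of outcomes."""
    pnl = sorted(position_value * random.gauss(mu, sigma) for _ in range(runs))
    cutoff = int((1 - confidence) * runs)
    return -pnl[cutoff]

# Hypothetical position: £1m notional, zero drift, 2% daily volatility.
print(f"99% one-day VaR: £{monte_carlo_var(1_000_000, 0.0, 0.02):,.0f}")
```

The real thing repeats something like this across millions of instruments and scenarios every night; the point is simply that the losing scenarios – the mistakes – are deliberately cheap to generate and throw away.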

Of course there’s still scope for ‘stupid’ mistakes – mistakes made (accidentally or intentionally) when we should know better. This is why a big part of the manufacturing discipline now finding its way into software is ‘it’s OK to make mistakes, but try not to make the same mistake twice’.

Wrapping up

As children we’re taught not to make mistakes – for our own safety, and throughout our education the pressure is to get things right. With that deep cultural foundation it’s easy to characterise software development as a process that seeks to minimise the frequency and cost of mistakes. That’s a helpful approach to some degree, but as we get to the edges of our understanding it can be useful to turn things around. The point of software can be to make mistakes – lots of them, as quickly and cheaply as possible, because it’s often only by eliminating what doesn’t work that we find what does.

Acknowledgement

I’d like to thank Open Cloud Forum’s Tony Schehtman for making me re-examine the whole concept of marginal cost after an early conversation on this topic – it’s what prompted me to go a lot deeper and figure out that the unit of production might be the mistake.

Notes

[1] ‘Milk or Wine: Does software security improve with age?’
[2] I’d highly recommend Roy Rapoport’s ‘Canary Analyze All The Things: How We Learned to Keep Calm and Release Often‘, which explains the Netflix approach.
[3] Oblique reference to Wardley maps, where I’d recommend a look at:
  • The OSCON video
  • The CIO magazine article
  • The blog intro
  • The (incomplete) book (as a series of Medium posts by chapter) – Chapter 2 has the key stuff about mapping
  • The online course


Safety first

27Jul17

Google’s Project Aristotle spent a bunch of time trying to figure out what made some teams perform better than others, and in the end they identified psychological safety as the primary factor[1]. It’s why one of the guiding principles of Modern Agile is ‘Make Safety a Prerequisite’.

The concept of safety comes up in Adrian Cockcroft’s comment on innovation culture that I referenced in my Wage Slaves post:

Here’s an analogy: just about everyone knows how to drive on the street, but if you take your team to a racetrack, sit them in a supercar and tell them to go as fast as they like, you’ll get three outcomes.

  1. Some people will be petrified, drive slowly, white knuckles on the steering wheel, and want to get back to driving on the street. Those are the developers that should stay in a high process, low risk culture.
  2. Some people will take off wildly at high speed and crash on the first corner. They also need process and structure to operate safely.
  3. The people that thrive in a high performance culture will take it easy for the first few laps to learn the track, gradually speed up, put a wheel off the track now and again as they push the limits, and enjoy the experience.

Numbering added by me for easier referencing

This unpacks to being about risk appetite and approach to learning, and it’s all a bit Goldilocks and the three bears:

  1. The risk appetite of the go slow racer is too cold, which means that they don’t create opportunities for learning.
  2. The risk appetite of the crash and burn racer is too hot, they too don’t create opportunities for learning.
  3. The risk appetite of the progressive racer is just right. They create a series of learning opportunities as they explore the limits of the car, the track and their skill.

This is where I think I’m going to diverge from Adrian’s view, and I’m somewhat taking what he says at face value, so there will inevitably be nuance that I’ve missed… I read Adrian as saying that forward leaning companies (like Netflix and Amazon) will set up their hiring, retention and development to favour type 3 people – the ‘natural’ racers.

I have a problem with that, because ‘natural’ talent is mostly a myth. If I think back to my own first track day (thanks Borland) I’d have been picked out as a type 2. I don’t know how many times I spun that VX220 out backwards from a corner on the West track at Bedford Autodrome, but it was a lot, and the (very scared looking) instructor would surely have said that I wasn’t learning (at least not quickly enough).

I returned to the same track(s) a few years later and had a completely different experience. The lessons from the first day had sunk in, I’d altered the way I drove on ordinary roads, I’d bought a sports car and learned to be careful with it, I’d spent time playing racing games on PCs and consoles. Second time out I’d clearly become a type 3 – maybe not the fastest on the track, but able to improve from one lap to the next, and certainly not a danger to myself and others.

So it seems that the easy path here is to pick out the type 3s; but there’s another approach that involves getting the type 1s to take on more risk, and getting the type 2s to rein it in a little. Both of these activities happen away from the track, in a safer environment – the classroom and video games (or their equivalents) that let people explore their risk envelope and learning opportunities without threat to actual life or limb; somewhere that the marginal cost of making mistakes is low[2].

The story doesn’t end there. Once we have our type 3s (either by finding them or converting them) there’s still plenty that can be done for safety, and the racing world is a rich source for analogy. Bedford Autodrome is reputed to be one of the safest circuits in the world. It’s been purpose designed for teaching people to race rather than to be used as a venue for high profile competitions. Everywhere that you’re likely to spin out has been designed so that you won’t crash into things, or take off and crash land or whatever. So we can do things to the environment that ensure that a mistake is a learning experience and not a life ending, property destroying tragedy.

Some thought should also be given to the vehicles we drive and the protective clothing we wear. Nomex suits, crash helmets, tyre tethers, roll over bars – there have been countless improvements in racing safety over the years. When I watched F1 back in the days of James Hunt it felt like every race was a life or death experience. We lost Ayrton Senna, and Niki Lauda still wears the scars from his brush with death; it’s much better that I can watch Lewis Hamilton take on Sebastian Vettel pretty sure that everybody will still be alive and uninjured as the chequered flag gets waved. It’s the same with software, as agile methodologies, test driven development (TDD), chaos engineering and continuous integration/delivery (CI/CD) have converged on bringing us software that’s less likely to crash, and crashes that are less likely to injure. It’s generally easier to be safer if we use the right ‘equipment’.

This connects into the wider DevOps arc because the third DevOps way is continuous learning by experimentation. Learning organisations need to be places where people can take risk, and most people will only take risk when they feel safe. There may be some people out there who are ‘naturals’ at calibrating their approach to risk and learning from taking risks, but I expect that most people who seem to be ‘naturals’ are actually people who’ve found a safe environment to learn. So if we want learning organisations we must create safe organisations, and do everything we can to change the environment and ‘equipment’ to make that so.

Notes

[1] For more on Aristotle and its outcome check out Matt Sakaguchi’s QCon presentation ‘What Google Learned about Creating Effective Teams‘ and/or the interview he did with Shane Hastie on the ‘Key to High Performing Teams at Google‘.
[2] This is a huge topic in its own right, so I’ll cover it in a future post.


Wage Slaves

26Jul17

I recently had the good fortune of meeting Katz Kiely and learning about the Behavioural Enterprise Engagement Platform (BEEP) that she’s building. After that meeting I listened to Katz’s ‘Change for the Better‘ presentation, which provided some inspiring food for thought.

Katz’s point is that so much human potential is locked away by the way we construct organisations and manage people. If we change things to unlock that potential we have a win-win – happier people, and more productive organisations. It’s not hard to see the evidence of this at Netflix, Amazon (especially their Zappos acquisition), Apple etc.

The counterpoint hit home for me on the way home as I read an Umair Haque post subtitled ‘Slavery, Segregation and Stagnation’. His observation is that the US economy started based on slavery, then moved to a derivative of slavery, then moved to a slightly different derivative of slavery. Student debt and the (pre-existing) conditions associated with health insurance might not be anywhere near as bad as actual slavery, but they’re still artefacts of a systemically coercive relationship between capital and labour. Coercion might have seemed necessary in a world of farm hands and factory workers (though likely it was counterproductive even then), but it’s the wrong way to go in a knowledge economy.

Adrian Cockcroft puts it brilliantly in his response to (banking) CIOs asking where Netflix gets its amazing talent from, “we hired them from you and got out of their way”. He goes on to comment:

An unenlightened high overhead culture will drag down all engineers to a low level, maybe producing a third of what they would do, working on their own.

Steve Jobs similarly said:

It doesn’t make sense to hire smart people and then tell them what to do; we hire smart people so they can tell us what to do.

So the task at hand becomes to build organisations based on empowerment rather than coercion, and that starts with establishing trust (because so many of the things that take power away sprout from a lack of trust).

 


In a footnote to yesterday’s application intimacy post I said:

in time there will be services for provisioning, monitoring and logging, and all that will remain of ‘infrastructure’ will be the config of those services; and since we might treat that config as code then ultimately the NoOps ‘just add code – we’ll take care of the rest’ dream will become a reality. Barring any surprises, that time is likely something in the region of 5 years away.

That came from an extensive conversation with my colleague Simon Wardley on whether NoOps is really a thing. The conversation started at Serverlessconf London where I ended up editorialising the view that Serverless Operations is Not a Solved Problem. It’s worth pointing out a couple of things about my take on Simon’s perspective:

  1. Simon sees DevOps as a label for the (co-evolved) practices emerging from IaaS utilisation, and hence it’s not at the leading edge as we look to a more PaaS/FaaS future.
  2. Simon is a great visionary, so what he expects to come true isn’t the same as what’s actually there right now.

This whole debate was due to come up once again at London CloudCamp on 6 July at an event titled “Serverless and the death of DevOps“. Sadly I’m going to miss CloudCamp this time around, but in the meantime the topic has taken on a life of its own in a post from James Governor:

it’s a fun event and a really vibrant community, but the whole “death of devops” thing really grinds my gears. I blame Simon Wardley. 😉

Whilst not explicitly invoking Gene Kim and the ‘3 Ways’ of DevOps (Flow, Feedback and Continuous Learning by Experimentation), it seems that James and I are on the same page about the ongoing need to apply what manufacturing learned from the 50s onwards to today’s software industry (including Serverless).

Meanwhile Paul Johnston steps in with an excellent comment and follows up with a complete post ‘Serverless is SuperOps‘. In his conclusion Paul says:

Ops becomes your primary task, and Dev becomes the tool to deliver the custom business logic the system needs.

I think that’s a sentiment born from the fact that (beyond trivial use cases) using Serverless right now is just the opposite of NoOps; the ops part is really hard, and ends up being the majority of the overall effort. There may no longer be a need to worry about VMs and OSes and patching and all of those IaaS concerns (that have in many cases been automated to the point of triviality); but there’s still a need to worry about provisioning, config management, logging and monitoring.

Something that Paul and I dived into recently are some of the issues around testing. Paul suggests ‘The serverless approach to testing is different and may actually be easier‘, but concludes:

we’re currently lacking the testing tools to really drive home the value – looking forward to when they arrive.

I asked him, “How do you do canarying in Serverless?“, which led to a well thought through response in ‘Serverless and Deployment Issues‘. TL;DR canarying is pretty much impossible right now unless you build your own content router, which is something that’s right up there on the stupid and dangerous list; this is stuff that the platform should do for you.
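To make that concrete, the DIY content router that Paul (rightly) warns against boils down to something like the sketch below – the names and the 5% weighting are mine and purely illustrative, not anybody’s real implementation:

```python
import random

# Hypothetical function versions - the names are illustrative only.
STABLE_VERSION = "orders-v1"
CANARY_VERSION = "orders-v2"
CANARY_WEIGHT = 0.05  # send 5% of traffic to the canary

def route(request, invoke):
    """Send a small random slice of traffic to the canary version and the
    rest to the stable version, returning whatever the function returns."""
    target = CANARY_VERSION if random.random() < CANARY_WEIGHT else STABLE_VERSION
    return invoke(target, request)

# `invoke` would wrap whatever FaaS invocation API is in use; rolling back is
# just setting CANARY_WEIGHT to 0, and promoting is setting it to 1.
```

Every line of that is undifferentiated traffic shaping that the platform should own, along with the health comparison between the two versions – hence ‘stupid and dangerous’ when every team rolls its own.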

Things will be better in the future. As Simon keeps pointing out the operational practices will co-evolve with the technologies. Right now Serverless is only being used by the brave pioneers, and behind them will come the settlers and town planners. Those later users won’t come until stuff like canarying has been sorted out, so the scope of what a Functions as a Service (FaaS) platform does will expand, and the effort to make things work will correspondingly contract. In due course it’s possible that if we look at it just right (and squint a bit) we could call that NoOps. Of course to do that we will have had to learn how to encode everything we want to do with provisioning, logging and monitoring into the (infrastructure as code) config management; we will have had to teach the machine (or at least the machine learning behind it) how to care on our behalf. Until then, as Charity Majors says – ‘you can’t outsource caring‘.


This is another post that’s a recycled email, one which started out with the title: ‘Our share of the cloud shared responsibility model (and the need for application intimacy)’

The original email came from a number of discussions in the run up to the DXC merger, and I must thank many of my CTO and Leading Edge Forum (LEF) colleagues for their input. It was addressed to the Offering General Managers covering Cloud (for the infrastructure centric bottom up view) and Application Services (for the more top down connection of business need to application, which is nicely summed up in JP’s Business Experience Management paper).

TL;DR

The IT landscape in front of us will be all about the apps, and we need to get intimacy with those apps to earn our share of the cloud shared responsibility model.

Assumptions

To set the scene let me lay out some assumptions:

  • The march of public cloud might look unstoppable right now because it’s in the early linear section of its adoption S curve, but it will stop. When it does stop it will have absorbed much of the workload that presently runs in our data centres and our customers’ data centres. The infrastructure revenue that we have associated with that is going away, never to return, and we’ll get crumbs from the AWS/Azure/GCP table where we can as the partner for the billing process.
  2. We will willingly engage in the movement of that workload to win our place as the partner in the billing process[1].
  3. The place that we earn our keep going forward is bridging the gap between what our customers want, and what the clouds actually deliver. For the time being there might be dreams of ‘NoOps’, but even in a world of Serverless functions it turns out that there’s still a need for provisioning, config management, monitoring and logging[2] – the things that we wrap up today as (integrated digital) service management. Our customers want NoOps, but it will be some time yet before they get it straight from a cloud provider, which is why they’ll turn to us.

Our Share of Shared Responsibility

The key here is something that we might call ‘application intimacy’. Doing what it takes to run an application (any application) requires a high touch contextual understanding of what the app is and does, when it’s performing well and performing badly, how it’s helping a business win or holding them back. Cloud service providers don’t do application intimacy, which is the whole point of their shared responsibility model. We generally talk about the shared responsibility model in the context of security, but it also extends into pretty much every aspect of operations. The shared responsibility model is the line where the cloud service provider says ‘our work here is done – your problem now’, and that’s where we step in because our customers still want that problem to be somebody else’s problem.

Clearly we can gain application intimacy forensically – it’s what we do within transition and transformation (T&T) for any IT Outsourcing (ITO); but there’s also an obvious opportunity to gain application intimacy organically – as we build apps from scratch or help customers (re)define their application construction and testing pipelines (aka CI/CD).

Application Intimacy

So… the call to action here is to orient our new company around application intimacy – it needs to be in the heart of our strategy, our messaging, our organisation, our offerings, our culture. If we can win at application intimacy then we take our share of the shared responsibility model, and earn a rightful place at the table alongside the cloud service providers.

Notes

[1] and right now the cloud providers see that they need our help – AWS is looking to drive 60%+ of their global business to GSIs who have differentiated offerings in the marketplace (e.g. app intimacy), strong vertical/ industry expertise, and c-suite relationships.
[2] in time there will be services for provisioning, monitoring and logging, and all that will remain of ‘infrastructure’ will be the config of those services; and since we might treat that config as code then ultimately the NoOps ‘just add code – we’ll take care of the rest’ dream will become a reality. Barring any surprises, that time is likely something in the region of 5 years away.


I just had to Google ‘outro‘ to confirm it’s actually a word (as the Chrome spellcheck thinks not).

Yesterday marked the end of the back catalogue of posts that I’d originally made on C3, so it’s back to normal service here on the blog.

If you’d like a list for this series of posts then check out the pingbacks to the intro post.


Originally posted internally 13 Dec 2016, this post marks the end of my journey through the back catalogue:

The dust has hardly settled from re:Invent, but today brings the first big public launch since AWS’s big annual event – AWS Managed Services.

This will be one of those blog posts that I hope saves me having to repeat myself in many emails, as I’ve already had people asking, “how we plan to partner / compete with this new service”, as “It will seem to people like direct competition”.

For me the key passage in the launch blog is:

Designed for the Fortune 1000 and the Global 2000, this service is designed to accelerate cloud adoption. It simplifies deployment,  migration, and management using automation and machine learning, backed up by a dedicated team of Amazon employees. AWS MS builds on AWS and provides a set of integration points (APIs and a set of CLI tools) for connection to your existing service management system. We’ve been working with a representative set of AWS enterprise customers and partners for the last couple of years in order to make sure that this service meets a very wide range of enterprise requirements.

So:

  • It’s for large companies (as SMEs don’t really do ITIL etc.).
  • It’s about integration with existing service management via APIs.
  • Even though there’s console functionality over the top of the APIs (to provide minimal viable accessibility) this couldn’t displace a service management system entirely.

We have been working with AWS on this, so we’re one of those partners mentioned above. This gives us the services at their end to properly bring AWS into the Integrated Digital Services Management (IDSM) fold. In many cases this is just giving us a uniform way to do something (and have it reported) where there were previously hundreds of different variations on the approach that could be taken (e.g. patch management).

Overall I don’t think this is AWS eating our lunch – it’s AWS making it easier for our customers to use them (and easier and more consistent for us to help them do that).

Original Comments

CN

I was being a bit facetious with the “eating our lunch” comment :-)  Perhaps not enough coffee after being stuck trying to get into Chorley after a truck spontaneously combusted on the M6.

Workloads are going to be deployed on AWS/Azure more and more (not so much on Cisco’s InterCloud Services…)

So it’s good to know the answer to “how do you provide operational support, patching etc. of workload OSes in your hybrid solutions?” is “exactly the same way we do for private cloud/on-premises workloads*”

*from the client/users’ perspective

MH

Re Cisco Interconnect – Article on The Register this morning

Cisco to kill its Intercloud public cloud on March 31, 2017 • The Register

NB

Is CSC part of this also CS?

What’s the Role of AWS Consulting Partners in AWS MS?

‘APN Partners were key in the development of this service, and play an active role in the deployment and use of AWS MS. Having a standard operating environment not only fast tracks customer onboarding, but creates many different opportunities for APN Partners to enable and add value for AWS MS customers. In the coming weeks, we will also be launching a new AWS Managed Services designation as part of the AWS Services Delivery Program for APN Partners (stay tuned to the APN Blog for more information to come).

Key to the integration and deployment of AWS MS, AWS Consulting Partners enable Enterprises to migrate their existing applications to AWS and integrate their on-premises management tools with their cloud deployments. Consulting Partners will also be instrumental in building and managing cloud-based applications for customers running on the infrastructure stacks managed by AWS MS. Onboarding to AWS MS typically requires 8-10 weeks of design/strategy, system/process integration, and initial app migration, all of which can be performed by qualified AWS Consulting Partners. In order to participate, APN Partners will need to complete either the AWS Managed Service Provider Program validation process, and/or earn the Migration or DevOps Competency, as well as complete the specialized AWS MS partner training.’

Introducing AWS Managed Services

CS

Yes – specifically our AWS services under BR

HS

(Since I appear to have been invited to comment)

It’s like in the old days of B2B e-commerce hubs … there was always that ‘on ramp’ work. Some B2B Exchanges made that easy for customers. Others did not. The ones that did further cemented their position. The warning for CSC, at the time (mid 90s), was this: We did supply chain consulting, but supply chain became a service (the B2B hubs). Since CSC’s consulting supply chain practice defined itself as “selling people to do supply chain work” we never got into the game of *being* the B2B hub … even if we did do B2B implementation work. It’s the same with clouds, but on a much larger scale. If we define ourselves, and therefore limit ourselves, in a narrow role, we’ll be like the horse buggy whip makers in the era when the first auto cars were coming in.

It was inevitable that AWS would launch such Managed Services of course. The long long history of ‘cloud impacts on our business’ documented in C3 is a legend in its lifetime. Let’s move on. We cannot remain an infrastructure operations services company, surely? Surely that message is now several years overdue? Surely the declining part of the traditional market is plain to see? Why would anyone be surprised?

Markets evolve like this:

– first Commodities
– then Products
– then Services
– then Experiences
– then Amenities

Make money at one level, and competition eventually pushes you to the next level, where you need to find new blue oceans. So AWS is moving from ‘product’ to ‘service’ in some areas. This might sound theoretical, and it is a broad generalisation, but the commoditization AND servitization (look it up) together combine to cause us to change our offers. It’s like the pet shop that no longer sells pet grooming products but does the pet grooming as a service.

We can keep on trying to hang onto old models, to retain some revenue and opportunity, but the time will come when a complete break must be taken. And one of the best clean break strategies there is is this: *be* the element of transformation. Take the clients on the journey. That’s an ‘experience’ to give them and it often positions yourself for whatever they ask for next.

Had the supply chain consultants at CSC in the mid 90s realised this they would still be here. But they are not. They let a trend destroy them for they defined their role too narrowly. They did not understand that much of what they did could be automated away via the function of the new B2B hubs. As a result, the practice in CSC started to decline. At first they found it harder to find work, which they interpreted as a downturn in the market for their skills, so they did the logical thing, reduced the size of the supply chain practice. Over time, it eroded to very little, a few people hanging on doing some niche work on the edges of the market.

NF

backed up by a dedicated team of Amazon employees

Is this just an additional revenue earner for AWS rather than bringing in their partner network more?

CS

I read that as firstly pandering to a group that expects people to be massaging their egos, but the reality is that they put together a two-pizza team to maintain the cohesion of this on top of an ever-growing services estate.

So this isn’t a ‘staffed by Amazon’ help desk.


Originally posted internally 28 Nov 2016:

Background

This is another one of those blog posts where the same question has come up multiple times in the past few weeks, so there’s probably a broader audience for the discussion.

Where do I anchor my trust?

The point of a blockchain is to anchor trust against proof of work (Bitcoin style) or proof of stake (Ethereum style) – if you already have a trust anchor then you don’t really need a blockchain. Examples of trust anchors that we frequently do have are TPMs, HSMs and CAs[1] (including Active Directory) – in fact anywhere that there’s an existing identity ecosystem there will be existing trust anchors for that identity ecosystem.

So why all the fuss about Blockchain?

Blockchains seem to be driving a wedge into places where it’s been difficult to federate trust/identity, but I’d suggest that Santander and Goldman Sachs walking away from R3 might at least in part be because they’ve figured out that fancy crypto doesn’t solve political problems.

But I still want a secure audit trail..

Signed audit trails are of course still a good idea, and yet they’re nowhere near widely used. The question here is whether the root for that signing needs to be a distributed trust mechanism, or a simple key store.
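As an illustration of the ‘simple key store’ end of that spectrum, here’s a minimal sketch of a tamper-evident audit trail: each entry is chained to the hash of the previous entry and signed with an HMAC key that lives in whatever key store you already trust (the key material shown is obviously a placeholder):

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"key-material-from-your-existing-hsm-or-kms"  # placeholder

def append_entry(trail, event):
    """Append an event to the audit trail, chaining it to the previous entry's
    hash and signing it with the key held by the existing trust anchor."""
    prev_hash = trail[-1]["hash"] if trail else "genesis"
    body = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    trail.append({
        "body": body,
        "hash": hashlib.sha256(body.encode()).hexdigest(),
        "sig": hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest(),
    })

trail = []
append_entry(trail, "user alice granted admin role")
append_entry(trail, "config change applied to production")
# Verification is the same walk in reverse: recompute each hash and HMAC and
# check that every entry's 'prev' field matches the hash of the entry before it.
```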

Note

[1] The public CAs (as trusted by popular web browsers) are right now a total mess. There are far too many examples of negligence and malfeasance. This is in fact one area where Blockchain could be really useful (e.g. for authenticity validation and certificate revocation).

Retrospective

I think people are starting to get the whole trust anchor point here, and the fact that there’s generally little need to establish fresh trust. The conversation seems to be moving on to ‘distributed transaction ledger’, which to be fair to R3 is exactly the language used for the Corda launch.

Tim Bray recently wrote ‘I Don’t Believe in Blockchain‘, which provides further food for thought around the (lack of) geek ecosystem emerging around the technology. That said, some of the smartest people I know are presently beavering away at R3…

Original Comments

MW

You’ve probably seen this CS – interesting insight into government thinking for leveraging distributed ledger technology for delivery of government services, potentially significantly disruptive… as well as proof of work and proof of stake, they’re arguably harder to compromise versus centralised systems, and currency of shared data is assured.

https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/492972/gs-16-1-distributed-ledger-technology.pdf

Estonia are leading the way.

http://www.bbc.co.uk/news/technology-36276673


TL;DR

Greater automation is the future for the IT industry, and we’ve called DXC’s automation programme ‘Bionics’. It’s about being data driven with a flexible tool kit, rather than being all in on a particular vendor or product. To understand what we’re trying to achieve with Bionics (besides reading the rest of this post) I recommend reading The DevOps Handbook, and to get the foundation skills needed to contribute please run through the Infrastructure as Code Boot Camp [DXC only link].

Introduction

‘Bionics’ is the name that we’ve given to DXC Technology’s automation programme that brings together CSC’s ‘Operational Data Mining’ (ODM) and HPE ES’s ‘Project Lambroghini’. This post is written for DXC employees, and some of the links will go behind our federated identity platform, but it’s presented here on a public platform in the interest of ‘outside in’[1] communication that’s inclusive of customers and partners (and anybody else who’s interested in what we’re doing). What I’ll present here is part reading list, and part overview, with the aim of explaining the engineering and cultural foundations of Bionics, and where it’s headed.

Not a vendor choice, not a monoculture

The automation programme I found on joining CSC can best be described as strategy by vendor selection, and as I look across the industry it’s a pretty common anti-pattern[2]. That’s not how we ended up doing things at CSC, and it’s not how we will be working at DXC. Bionics is not a label we’re applying to somebody else’s automation product, or a selection of products that we’ve lashed together. It’s also not about choosing something as a ‘standard’ and then inflicting it on every part of the organisation.

Data driven

Bionics uses data to identify operational constraints, and then further uses data to tell us what to do about those constraints through a cycle of collection, analysis, modelling, hypothesis forming and experimentation. The technology behind Bionics is firstly the implementation of data analysis streams[3] and secondly a tool bag of automation tools and techniques that can be deployed to resolve constraints. I say tools and techniques because many operational problems can’t be fixed purely by throwing technology at them; it’s generally necessary to take an holistic approach across people, process and tools.
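A toy version of the ‘use data to identify constraints’ step might look like the sketch below – the stage names and numbers are made up for illustration, and bear no relation to the real Bionics tooling:

```python
# Hypothetical per-stage metrics from an incident-resolution pipeline:
# average hours spent in the stage and average queue depth in front of it.
stages = {
    "triage":          {"avg_hours": 0.5,  "avg_queue": 3},
    "diagnosis":       {"avg_hours": 4.0,  "avg_queue": 27},
    "change_approval": {"avg_hours": 18.0, "avg_queue": 41},
    "implementation":  {"avg_hours": 2.0,  "avg_queue": 5},
}

def find_constraint(stages):
    """Pick the stage where work piles up the most - a crude proxy for the
    constraint in Theory of Constraints terms."""
    return max(stages, key=lambda s: stages[s]["avg_hours"] * stages[s]["avg_queue"])

print(find_constraint(stages))  # -> change_approval
# The rest of the cycle follows: form a hypothesis ("approvals are the
# bottleneck"), run an experiment (e.g. pre-approved standard changes),
# then collect data again and check whether the constraint has moved.
```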

Scaleable

The constraints that we find are rarely unique to a given customer (or industry, or region) so one of the advantages we get from the scope and scale of DXC is the ability to redo experiments in other parts of the organisation without starting from scratch. We can pattern match to previous situations and what we learned, and move forward more quickly.

Design for operations

Data driven approaches are fine for improving the existing estate, but what about new stuff? The key here is to take what we’ve learned from the existing estate and make sure those lessons are incorporated into anything new we add (because there’s little that’s more frustrating and expensive than repeating a previous mistake just so that you can repeat the remedy). That’s why we work with our offering families to ensure that so far as possible what we get from them is turnkey on day 1 and integrated into the overall ‘Platform DXC’ service management environment for ongoing operations (day 2+). Of course this all takes a large amount of day 0 effort.

Required reading

What the IT industry presently calls ‘DevOps’ is largely the practices emerging from software as a service (SaaS) and software based services companies that have designed for operations (e.g. Netflix, Uber, Yelp, Amazon etc.). They in turn generally aren’t doing anything that would be surprising to those optimising manufacturing from Deming’s use of Statistical Process Control onwards.

Theory of constraints lies at the heart of the Bionics approach, and that was introduced in Goldratt‘s The Goal, which was recast as an IT story in Gene Kim (et al’s) The Phoenix Project. I’d suggest starting with Kim’s later work in the more prescriptive DevOps Handbook, which is very much a practitioner’s guide (and work back to the earlier stuff if you find it inspiring[4]).

The DevOps Handbook does a great job of explaining (with case study references) how to use the ‘3 DevOps ways’ of flow, feedback and continuous learning by experimentation[5].

Next after The DevOps Handbook is Site Reliability Engineering: ‘How Google Runs Production Systems’, aka The SRE Book. It does just what it says on the jacket, and explains how Google runs systems at scale, which has brought the concepts and practices of Site Reliability Engineering (SRE) to many other organisations.

Learning the basics of software engineering

The shift to automated operations versus the old ways of eyes on glass and hands on keyboards means that we need to write more code[6], so that means getting ops people familiar with the practices of software engineering. To that end we have the Infrastructure as Code Boot Camp, which provides introductory material on collaborative source code management (with GitHub), config management (with Ansible) and continuous integration/continuous delivery (CI/CD) (with Jenkins). More material will come to provide greater breadth and depth on those topics, but if you can’t wait check out some of the public Katacoda courses.

Call to action

Read The DevOps Handbook to understand the context, and do the Infrastructure as Code Boot Camp to get foundation skills. You’ll then be ready to start contributing; there’s plenty more reading and learning to do afterwards to level up as a more advanced contributor.

Notes

[1] My first ‘outside in’ project here was the DXC Blogs series, where I republished a number of (edited) posts that had previously been internal (as explained in the intro). I’ll refer to some of those past posts specifically.
[2] I’ve been a huge fan of understanding anti-patterns since reading Bruce Tate’s ‘Bitter Java’. Anti-patterns are just so much less numerous than patterns, and if you can avoid hurting yourself by falling down well understood holes it’s generally pretty easy to reach the intended destination.
[3] It’s crucial to make the differentiation here between streams and lakes. Streams are about working with data now in the present, whilst lakes are about trawling through past data. Lakes and streams both have their uses, and of course we can stream data into a lake, but much of what we’re doing needs to have action in the moment, hence the emphasis on streams.
[4] If you want to go even further back then check out Ian Miell’s Five Books I Advise Every DevOps Engineer to Read
[5] More on this at 3 Ways to Make DXC Better
[6] Code is super important, but it’s of little use if we can’t share and collaborate with it, which is why I encourage you to Write code. Not too much. Mostly docs.