TL;DR

The hardware and software for my Raspberry Pi sous vide setup had remained the same for over 5 years, but a failed remote controlled socket forced me to update almost everything.

Background

The Maplin remote control socket would turn on, and briefly supply power to the slow cooker, but then it would appear to trip. This wasn’t the first time, as the original socket failed after a few years, but this time there was no chance of getting a replacement from Maplin, as they’ve gone out of business.

A1210WH

After a bit of hunting around I tried ordering a Lloytron A1210WH socket set, as it looked identical to the Maplin one (and hence was likely to have come from the same original equipment manufacturer), but the package I received contained the newer A1211WH model.

A1211WH

I could have returned them in the hope of getting some of the older ones from elsewhere, but I decided to bite the bullet and just make things work with the new ones.

Blind Alley

I had a go at using 433Utils, which was able to read codes from the new remote with RFSniffer, but sending those codes using codesend just didn’t work (and didn’t even show the same code being played back on RFSniffer).
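
For the record, the round trip I was attempting looked something like this (a sketch rather than a transcript; the code value is a placeholder for whatever RFSniffer reported, and the tools live in the RPi_utils directory of 433Utils):

# from 433Utils/RPi_utils, with the receiver and transmitter wired to the Pi
sudo ./RFSniffer          # prints 'Received NNNNNN' when a remote button is pressed
sudo ./codesend 1234567   # replay a captured code (1234567 is a placeholder)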

pilight

The author of How I Automated My Home Fan with Raspberry Pi 3, RF Transmitter and HomeBridge had a similar issue with 433Utils, and used pilight to read codes from his remote and then send them out from a Pi controlled transmitter.

Here began a bit of a struggle to get things working. My ancient Raspbian didn’t have the dependencies needed by pilight, so I burned a new SD card with Raspbian Stretch Lite (and then enabled WiFi and SSH for headless access).
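
For anyone else going headless with Stretch, the usual trick is to drop a couple of files onto the SD card’s boot partition before first boot (the SSID and passphrase below are obviously placeholders):

# with the freshly written card's boot partition mounted at /boot
touch /boot/ssh                          # an empty 'ssh' file enables the SSH server on first boot
cat > /boot/wpa_supplicant.conf <<'EOF'
country=GB
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1
network={
    ssid="MyNetwork"
    psk="MyPassphrase"
}
EOF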

My initial attempts to use pilight-debug crashed on the rocks of missing config:

pilight no gpio-platform configured

In retrospect the error message was pretty meaningful, but Google didn’t help much with solutions, and none of the (pre version 8) example configs I’d seen had the crucial line:

gpio-platform: "raspberrypi1b1"

That Raspberry Pi version maps to one of the platform options for wiringX, which sits within pilight.

With the config.json sorted for my setup (GPIO 0 to the transmitter, and GPIO 7 to the receiver [as a temporary replacement for the 1Wire temperature sensor]) I was able to capture button presses from the remote control. It quickly became apparent that there was no consistency between captures, and I’m guessing the timing circuits just aren’t that accurate. But the patterns of long(er) and short(er) pulses were consistent, so I extracted codes that were a mixture of 1200us, 600us and a 7000us stop bit (gist with the captures, my simplification, and generated commands).
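
For completeness, the relevant bits of my config.json ended up looking something like this (a trimmed sketch rather than the whole file; sender and receiver are the wiringX pin numbers mentioned above):

{
  "devices": {},
  "rules": {},
  "gui": {},
  "settings": {
    "gpio-platform": "raspberrypi1b1"
  },
  "hardware": {
    "433gpio": {
      "sender": 0,
      "receiver": 7
    }
  },
  "registry": {}
}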

Success

With timings in hand I was able to turn the new sockets on and off with pilight-send commands wrapped in a pair of shell scripts. It was then just a question of updating my control script to invoke those rather than the previous strogonanoff scripts (having migrated my entire sousvide directory from the old SD card to the new one via a jump box with a bit of tar and scp).
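
The wrapper scripts are nothing more than one-liners around pilight-send with the raw protocol, something like the sketch below (the script name and pulse train are illustrative; the real values are the generated commands in the gist above):

#!/bin/sh
# socket_on.sh - placeholder pulse train, substitute the captured timings
pilight-send -p raw -c "1200 600 600 1200 600 600 1200 600 600 600 1200 600 7000"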


In Plain Sight

22Jul18

“The future is already here — it’s just not very evenly distributed.” – William Gibson

This post is about a set of powerful management techniques that have each been around for over a decade, but that haven’t yet diffused into everyday use, and hence still appear novel to the uninitiated.

Wardley Maps

Simon Wardley developed his mapping technique whilst he was CEO at Fotango.

A Wardley map is essentially a value stream map, anchored on user need, and projected onto an X axis of evolution (from genesis to commodity) and a Y axis of visibility.

The primary purpose of a Wardley map is to provide situational awareness, but they have a number of secondary effects that shouldn’t be ignored:

  • Maps provide a communication medium within a group, with a predetermined set of rules and conventions that help eliminate ambiguity[1].
  • Activities evolve over time, so map users can determine which activities in their value chain will evolve anyway due to the actions of third parties, and which activities they choose to evolve themselves (by investment of time/effort/money).
  • Clusters of activities can be used to decide what should be done organically within an organisation, and what can be outsourced to others.

Working Backwards

Amazon’s CTO Werner Vogels wrote publicly about their technique of working backwards in 2006, and the origin stories of services like EC2 suggest that it was well entrenched in the Amazon culture before then[2].

The technique involves starting with a press release (and FAQ) in order to focus attention on the outcome that the organisation is trying to achieve. So rather than the announcement being written at the end to describe what has been built, it’s written at the start to describe what will be built, thus ensuring that everybody involved in the building work understands what they’re trying to accomplish.

A neat side effect of the technique is that achievability gets built in. People don’t tend to write press releases for fantastical things they have no idea how to make happen.

Site Reliability Engineering (SRE)

class SRE implements DevOps

SRE emerged from Google as an opinionated approach to DevOps, eventually documented in a book. Arguably SRE is all about Ops, complementing Dev as practised by Software Engineers (SWEs); but the formalisation of error budgets and Service Level Objectives (SLOs) provides a very clean interface between Dev and Ops to create an overall DevOps approach (a 99.9% availability SLO over a 30 day month, for example, leaves an error budget of roughly 43 minutes that Dev can spend on the risk of change).

SRE isn’t the only way of getting software into production and making sure it continues to meet expectations, but for organisations starting from scratch it’s a well thought through and thoroughly documented approach that’s known to work (with a ready-made market of practitioners); so making up an alternative seems fraught with danger. It’s no accident that Google has put SRE at the heart of its Customer Reliability Engineering (CRE) approach, where it crosses the traditional cloud service provider shared responsibility line to work more closely with its customers[3].

Pulling it all together

These techniques don’t exist in isolation. Whilst each is powerful on its own, they can be used in combination to greatly improve organisational performance. Daniel Pink’s Drive talks about Autonomy, Mastery and Purpose in terms of the individual, but at an organisational level[4] they might fit like this:

  • Autonomy – Wardley maps provide a way to focus on the evolution of a specific activity, and with that determined the team can be left to figure out their own way of achieving it.
  • Mastery – SRE gives us a canned way to get software into a production environment, making it clear which skills need to be mastered and brought to bear.
  • Purpose – the outcome orientation that comes from working backwards provides clarity of purpose, so nobody is in doubt about what they’re trying to accomplish.

Notes

[1] I commonly find that when I introduce Wardley mapping to senior execs their initial take is ‘but that’s obvious’, because they internally use something like the mapping technique as part of their thought process. I then ask them ‘do you think your entire team shares your views of what’s obvious?’.
[2] Arguably a common factor for many of these approaches is that they become public at a point where the companies they emerged from have determined that there’s nothing to lose by talking about them. In part that’s down to inevitable leakage as staff move on and take ways of working with them, and in part it’s because it does take so long for these techniques to find widespread use amongst potential competitors.
[3] A central argument here is that achieving ‘4 nines’ availability on a cloud platform is only possible when the cloud service provider and customer have a shared operations model, and sharing operations means having a mutually agreed upon mechanism for how operations should be done.
[4] An organisation might be an Amazon ‘2 pizza’ team, or an entire company.


Silent PC

16Jul18

TL;DR

I’ve been very happy with the silence of my passively cooled NUC for the past 4 years, but it was starting to perform poorly. So when I came across a good looking recipe for a silent PC with higher performance I put one together for myself.

Background

I’ve been running my NUC in an Akasa silent case since shortly after I got it, and it’s been sweet, until it wasn’t. Silence is golden, but having a PC that’s constantly on the ragged edge of thermal limiting for the CPU and/or SSD[1] became pretty painful[2]. When I came across this Completely Silent Computer post a few weeks back I knew it was exactly what I wanted[3].

Parts

I pretty much followed Tim’s build, with a few exceptions:

  • I went for the black DB4 case
  • In line with his follow up Does Pinnacle Ridge change anything? I went with the Ryzen 5 2600 CPU
  • The MSI GeForce GTX 1050 Ti Aero ITX OC 4GB was available, so I went with that
  • A Samsung 970 SSD (rather than the 960)

Unfortunately I wasn’t able to get everything I needed from one place, so I ended up placing three orders:

  1. QuietPC for all the Streacom stuff (DB4 case, CPU and GPU cooling kits and PSU)
  2. Overclockers for the motherboard, SSD and CPU
  3. Scan for the RAM and GPU

By some miracle everything showed up the following day (with the Scan and Overclockers boxes coming in the same DPD van). The whole lot came to £1551.34 inc delivery, which is a bit better than the AUD3000 total mentioned in the original post. I didn’t exhaustively shop on price, so it’s possible I could have squeezed things a little more.

Testing, testing

There aren’t that many components, and they only work as a whole, so I put it all together and (of course) it didn’t work first time. The machine would power on, but there was no output from the graphics card.

There was nothing to go on for diagnostics other than that the power button and LED were apparently working.

So I had to pretty much start over, with everything laid out on a bench, and I found that the CPU wasn’t seated properly. I guess having such a complex heat pipe system attached puts a fair bit of mechanical force onto things that can dislodge what seemed like a sound fit.

As I was checking things out I also noticed that the SSD was perilously close to a raised screw hole on the motherboard holder, which I chose to drill out – better safe than sorry.

Putting it back together I retested after each stage in the construction (each side of the case in terms of heat transfer arrangements), and everything went OK through to completion.

It’s fast

Geekbench is showing a single core score of 4329 and a multi core score of 20639.

That’s way ahead of my NUC which managed 2420 and 4815 respectively. It even beats my son’s i5-6500 based gaming rig that clocked 3208 and 8045.

Cool, but not super cool

As I type the system is pretty much idle, but I’m seeing a CPU temperature in the range of 57-67C, which is nothing like the figures Tim got when measuring Passively-cooled CPU Thermals. The GPU is telling me it’s at 48C. There are a few factors that come into play here:

  • It’s baking in the UK at the moment, so my ambient temperature is 28C rather than 20C.
  • One of the Streacom heat pads was either missing or got lost during my build, so I’m waiting on another to arrive. Thus the thermal efficiency of the CPU cooling isn’t presently all it could be.

I’d also note that I went with the LH6 CPU cooling kit despite having no plans to overclock as I’d like to keep everything as cool as possible.

The case temperature is around 40C, so hot to the touch, but not burning hot. In the winter I might appreciate the warmth it radiates, but right now I’d rather have it off my desk.

Cable management

The DB4 case design has everything emerging from the bottom, which might look amazing for photos when it’s not plugged into anything, but is far from ideal for actual usage. I’ve bundled the cables together and tied them off to the stand, but this is not a machine that makes it easy to pop things into. There are a couple of USB ports on one corner (which I’ve arranged as front right), but using them is a fiddle.

I’m pleased to have USB ports on my keyboard and a little hub sat on my monitor pedestal.

Conclusion

After using a silent PC for over 4 years there’s no way I’d go back to the whine of fan noise, so I was pleased to find an approach that kept things quiet whilst offering better performance. The subjective user experience is amazing (this is easily the fastest PC I’ve ever used), so my fingers are crossed that it stays that way.

Notes

[1] There’s not much talk about thermal throttling of SSDs, but it is a thing, and it can badly hurt user experience when your writes get queued up. I do worry that my new M.2 drive is sat baking at the bottom of the new rig, and if I find myself taking it apart again I might stick a thermal pad in place so that it can at least conduct directly onto the motherboard tray.
[2] I suspect that over the years the Thermal Interface Material (TIM) in the CPU degraded, leading to the whole rig running hotter, leading to a spiral of poor performance. When it was new it ran quick enough, and (relatively) cool, but it seems over time things got worse.
[3] I considered another NUC, and the Hades Canyon looks like it would have met my needs, but Akasa don’t yet do a silent case for it.


#2 of jobs that should exist but don’t in most IT departments (#1 was The Application Portfolio Manager).

What’s a constraint?

From Wikipedia:

The theory of constraints (TOC)[1] is an overall management philosophy introduced by Eliyahu M. Goldratt in his 1984 book titled The Goal

It’s the idea that in a manufacturing process there will be a constraint (or bottleneck) and that:

  • there’s no point in doing any optimisation work before the constraint, because that will just make work in progress stack up even quicker
  • there’s no point in doing any optimisation work after the constraint, because the work in progress is still stuck upstream

TOC drives us towards a singular purpose – identify the present constraint and fix it.

Of course this becomes a game of ‘Whac-A-Mole’: just as soon as one constraint is dealt with, another lies waiting to be discovered. But it’s an excellent way of ensuring that time, money and other resources are focused in the right place, and the starting point for continuous improvement that takes advantage of incremental gains.

The constraint unblocker

Is an individual who’s empowered to work across an organisation, identifying its constraints and leading the efforts to fix them.

James Hamilton

One of my industry heroes is Amazon’s constraint unblocker – James Hamilton[2]. He has:

  • Reinvented data centre cooling (and many other aspects of data centre design)
  • Reinvented servers
  • Reinvented storage
  • Reinvented networks
  • Modified power switching equipment

Take a look at his AWS Innovation at Scale presentation for some depth, or the Wired article Why Amazon Hired a Car Mechanic to Run Its Cloud Empire.

The consequence of that list above shouldn’t be underestimated. Where Hamilton (and his like at Google, Facebook etc.) have led, the entire industry has followed.

But that list shouldn’t imply that there’s no point in doing this elsewhere. This approach isn’t just the preserve of hyperscale operators. All IT shops have their constraints, and so all IT shops should have a leader who’s focused on unblocking them.

TOC and DevOps

There’s a close relationship between TOC and DevOps. The Goal inspired The Phoenix Project, and the ‘3 DevOps Ways’ of Flow, Feedback and Continuous Learning by Experimentation are all about dealing with constraints.

That isn’t however to say that organisations doing DevOps have everything covered. The 3 ways make sure that constraints are addressed in the context of a single continuous delivery pipeline for a single product, but as soon as there’s more than one product there’s most likely a global constraint that can’t be dealt with at a local level.

Amazon may be doing DevOps up and down the organisation, and they very effectively organise themselves into ‘2 pizza’ teams ‘working backwards’ building microservices to power their ever expanding service portfolio. But they still need James and his team working top down to get the big roadblocks out of their way as they spend billions of dollars scaling their infrastructure.

Data (science) required

Notionally this stuff was easy with manufacturing. Look down on the factory floor and you can see the workstation where the work in progress is stacking up. Pop down there and figure out how to fix it.

Of course the reality was much messier than that, which is why Goldratt quickly found himself revising The Goal, and a whole consulting industry sprang up around TOC. But with software we have to acknowledge from the outset that we’re not going to see work in progress physically piling up; and beyond DevOps it’s entirely possible that the constraint may have little to do with ‘work in progress’.

Thus in IT we need data to find our constraints, and we usually need that same data (or more) to inform the model-hypothesise-experiment process that determines what to do about a constraint. In my own work (that we now brand Bionix) that’s why we start with the data science team and their analytics.

Why bother?

My personal observations of TOC in action over the past few years have generally found a 20% improvement in efficiency/effectiveness on the first iteration. That’s not ‘move the decimal point’ disruption, but it’s a realistic first approximation of what’s achievable in a six week cycle.

Of course because this is Whac-A-Mole you never get the same pay off again. The next iteration might be 15%, then 12%, then 9% and quickly off into the weeds. But stack those gains on top of each other and you’re quickly into completely different territory.

Conclusion

As we can see from Amazon even the best organisations have constraints, and they can benefit from having a leader focused on identifying and fixing them. That way they can achieve continuous improvement and the fruits of incremental gains across the organisation, and not just in a product silo.

Notes

[1] I find it somewhat frustrating that ‘theory’ is used here as it makes the approach seem ‘academic’ and thus easily dismissed by those claiming that they only care about practical outcomes.
[2] James starred in my ScotCloud keynote last year, “Our problems are easy”.


#1 of jobs that should exist but don’t in most IT departments

What should we do about all the legacy stuff?

This was a question that came up at the closing panel of the Agile Enterprise Rome conference I was at in May. The context was ‘we’ve spent a couple of days hearing about this great stuff with microservices and containers and serverless, but what should we do about our legacy?’.

I’ve heard this question, or some variant of it, many times over my career.

My answer in Rome was something like this:

The very reason that legacy exists is that it satisfies a business need at a price point that’s better than migrating to something new.

There are some important implications to that statement:

  1. You’ve actually figured out what the migration costs are
  2. Those costs are regularly re-evaluated to take account of industry changes

Those things imply a portfolio management approach where each application has a value and a cost to trade out of a given position, and where the portfolio is reappraised on a regular basis. This isn’t something I see being done in a particularly structured way in (m)any organisations[1].

Step functions and gravity wells

A big part of the problem here turns out to be non-linearities in the (license) cost of many legacy systems.

How much do you need to reduce your mainframe MIPS to cut your mainframe spend by 50%?

It turns out that the answer to that isn’t anything like 50%, or even 75% or 90%. In most cases it’s essentially impossible to cut mainframe spend by reducing usage unless the mainframe is completely eliminated. The same is roughly true for many classes of legacy software, driving an ‘all or nothing’ approach.

This picture is further complicated by bundling within Enterprise License Agreements (ELAs), where account managers will hold firm on well established revenue (their cash cows) but happily throw in some of their shinier new stuff[2]. There’s also the issue of ‘where software goes to die’ vendors that acquire and hoard legacy assets, giving them multiple points of leverage when it comes to ELA (re)negotiation time – they’re good at playing the portfolio management game.

5 Rs

There are multiple options for what happens to an application when it’s moved off a legacy system. Gartner suggests the 5 Rs[3] in its ‘Five Ways to Migrate Applications to the Cloud’:

  1. Rehost
  2. Refactor
  3. Revise
  4. Rebuild
  5. Replace

Broadly this has approximately nothing to do with ‘the Cloud’. Each path implies a different cost/value trade off that needs to be assessed.

For most applications it will be simple to eliminate most of the Rs as viable potential courses of action, leaving one or two to be properly considered and priced.

Who’s your head of application portfolio management?

Becomes the pertinent question. If this isn’t somebody’s job, then it’s probably nobody’s job, and it won’t be getting done. If organisations aren’t active about this portfolio management approach then inertia will take charge of their direction.

Conclusion

Applications are an investment, and like any other investment they should be managed. A portfolio approach, and the tools to evaluate trade-offs and options, naturally follow; and of course the process has to be iterative, because the world keeps changing. If organisations aren’t active about this, then their direction gets determined by inertia.

Notes

[1] I’ve seen IT Portfolio Management tools like Alfabet (now owned by Software AG) implemented in some organisations, but even then I’ve seen little evidence of the tools being used in a rigorous way (or having much impact on overall IT strategy).
[2] Aka the ‘drug dealer model’.
[3] With thanks to Johan Minnaar for bringing my attention to the model and my colleague Jim Miller for highlighting its ubiquity.


TL;DR

If you can persuade people that their side is going to win without their vote, then perhaps just enough of them won’t bother to show up that you can steal the win.

Background

The two countries that I spend most of my time in (the UK and US) continue to recoil from the effects of narrowly won campaigns that didn’t turn out how the pundits predicted. Social media is credited (by which I mean blamed) for much of this. But the narrative that I’m seeing seems incomplete, and hence doesn’t ring true – no wonder there’s so much cognitive dissonance around this issue.

Activating voters

The role of social media in bringing people into a campaign first came to light during Obama’s run in 2008. Widespread use of social media itself was pretty new then, but the ability for politicians to connect with voters without intermediaries was and remains hugely powerful. I have no doubts that Trump connected better with his base as a consequence of his positive use of social media, and I also think Leave were more savvy than Remain in the Brexit referendum[1].

I use the term ‘positive’ here without any value judgement of a particular side or campaign, but rather for the ability of a politician to connect with their voters in a direct and authentic way that activates them to vote in their favour.

Depressing voters

Michael Moore used the term ‘depressed voter’ in his 5 reasons Trump is going to win:

… it will be what’s called a “depressed vote” – meaning the voter doesn’t bring five people to vote with her. He doesn’t volunteer 10 hours in the month leading up to the election. She never talks in an excited voice when asked why she’s voting for…

This becomes the negative side of influencing the electorate:

  • You’re going to win anyway – so treat yourself to that lie in
  • They’re all as bad as each other – what’s the point in voting

It doesn’t need to appeal to anything besides apathy and indifference, and it’s negative because it stops a voter from voting. Whatever their intention might have been, it doesn’t show up at the ballot box.

Conclusion

As we continue to pick over the outcome of these votes there’s a ton of analysis about who voted which way, and why, and how they might have been influenced by social media campaigns. And then things start getting murky over how those campaigns were orchestrated and financed.

But things get even murkier if we look at who didn’t vote, and why, and how they might have been influenced by social media campaigns. And how those campaigns were orchestrated and financed.

But wait… there’s more

The role of polls and pollsters, and the interplay with social media is only just starting to be examined. The simple lesson here seems to be that the only poll that matters is the actual vote, and anything else might well be part of a disinformation campaign or an elaborate con.

Update 5 Jul 2018 – A couple of days after I posted this Cory Doctorow published Zuck’s Empire of Oily Rags on the same topic. He doesn’t focus on the negative aspects I note above, but the general narrative is (in my opinion) spot on. The line that I expect will be quoted most is:

Cambridge Analytica didn’t convince decent people to become racists; they convinced racists to become voters.

What may also have happened here is that they convinced decent people to be apathetic about voting.

Note

[1] This observation extends to just about everything to do with modernity. Remain ran a campaign that wouldn’t have been out of place in the 19th century, and were completely outplayed by Dominic Cummings and his understanding of stochastic processes (branching histories) and OODA loops.


I first used this analogy at an Open Cloud Forum event in Zurich a couple of months back, and I just used it again in a panel discussion at DevSecOps Days London. I’ve been meaning to incorporate it into a DevOps presentation, but until then…

Jenga

The ‘traditional’ Enterprise IT approach to stability is a game of Jenga – don’t touch anything in case the tower falls over. Each change feels like it brings us closer to calamity; and eventually it does all fall down and you have to pick up the pieces, put them back in place, and start over.

Riding a Bike

The agile/DevOps approach to stability is to keep moving forward, like riding a bike – if you have enough velocity, you’re stable.


Laser Printers

16Jun18

My family prints a lot[1] – about 1200 pages/year, which is why I made the decision almost a decade ago to switch from inkjet to laser. Inkjets weren’t just costing me a fortune in ink; they were also costing me a fortune in printers because they kept clogging up and failing in various ways. I worked my way through a variety of Epsons and Canons before giving up on the genre[2].

Black and White

My first buy back at the end of 2008 was an HP LaserJet 2420DN (the Duplex, Networked version) that was made around 2006 and that I picked up on eBay for £75. It was barely run in with a page count of just under 150,000, which is just 2 months usage at its advertised duty cycle. The toner that came with it had a little life left, but I lucked into a brand new HP toner on eBay for £6.26 that I’ve been using ever since – 7683 pages printed so far, with a forecast of over 2000 still to come. Over the years it’s needed some new rollers (£10.40) and a new fuser sleeve (£3.54), but it’s otherwise been a trouble free workhorse.

The ratio of simplex:duplex has worked out at around 2:3, leading to an average page cost (inc paper and the printer itself amortised over usage so far) of 1.66p/page.

Colour

For a while I hung on to an inkjet just for colour printing, but inkjets hate infrequent use, and so reliability and print quality worsened. When a deal came along in 2010 for Dell’s 1320CN with extra toners for £133.90 I grabbed it.

Colour printing is a less frugal endeavour altogether, but at least the 1320CN is a popular model with a plentiful supply of cheap(er) generic toners. Sadly it only does one sided printing, which has come out at 5.5p/page over the 4000 or so pages printed so far.

If I was starting over

I’d probably go for a Color LaserJet with Duplex and Network so that I could get everything from one unit rather than running two printers. Something like the 3600dn[3] seems to fit the bill as it uses decent capacity toners.

Update 18 Jun 2018

I spent a bit more time modelling costs over the weekend. As things stand the cost per page breaks down to:

  • B&W – 75% Hardware, 4% Toner, 21% Paper – I’m obviously benefiting from ridiculously cheap toner here, but the ‘right’ printer is one with cheap consumables, and there seems to be no better way of getting that than using older laser printers that are (or have been) popular. A quick look at eBay shows that I could easily get another 6000 page toner for about £10.
  • Colour – 61% Hardware, 28% Toner, 11% Paper – once again cheap toner makes a huge difference. When I first bought a replacement toner multi-pack (CMYK) in 2012 it was £18.94, but I’ve since got them as cheaply as £9.99.

If I project usage out a bit further (2x,3x,4x) I quickly get below 1p/page for B&W and 4p/page for Colour as the hardware is amortised and the costs become dominated by toner (more so for colour) and paper (more so for B&W).

I also analysed toner usage… I seem to be getting about 10,000 pages from an HP toner rated at 6,000, which is great (though not uncommon from what I’ve seen in forums). On the other hand I’m getting more like 666 pages for Dell toners rated at 2,000, which is pretty miserable (but probably a reflection of the fact that the colour printer gets used a fair bit for photos, which obviously use tons more toner than a normal page of text with a few words in colour).

Update 9 Jul 2018

Chris Neale pointed me to this Twitter thread from Paul Balze about inkjets.

Notes

[1] Hardly surprising given that my wife is a school teacher and both of my kids are still at school; though I think the bulk of the printing in the household comes from my wife.
[2] I’ve never owned an HP DeskJet myself due to the cost of consumables. Something I’d note from family members running these things is that they last pretty well, but ultimately fall victim to drivers not being available for newer versions of Windows, which has never been an issue for workhorse HP LaserJets.
[3] Or the newer 3800dn or CP3505.


I’m starting to see companies abandon Pivotal Cloud Foundry (PCF) in favour of Kubernetes distributions such as Red Hat’s OpenShift; and it’s almost certainly just a matter of time before we see traffic in the opposite direction.

My suspicion is that this is nothing to do with the technology itself[1], but rather that early implementations have failed to turn out as hoped, and people are blaming the platform rather than their inability to change the culture[2]. So they wheel in an alternative platform (and some fresh faces) and have another go.

We’ve seen this movie before with mobile development[3]. The native developers switched to cross platform frameworks just as the cross platform framework folk switched to native. It wasn’t that one approach was better or worse; as ever with these things there are trade offs that need to be balanced. It was just that v1 sucked, because the organisation that had built v1 hadn’t completed its cultural transformation; so the people making v2 wanted to change things up a bit.

Notes

[1] I could (and may) write an entirely separate post on the pros and cons of PCF and K8s, but the most important point is that they’re both platforms inspired by Google’s Borg that people can run outside of Google (or even on Google Cloud). Meanwhile this post ‘Comparing Kubernetes to Pivotal Cloud Foundry—A Developer’s Perspective‘ by Oded Shopen covers most of the key points.
[2] I’ll use ‘the way we do things around here’ as my definition for culture.
[3] And NoSQL.


The Spectre and Meltdown bugs have been billed as a ‘failure of imagination’, where the hardware designers simply didn’t conceive of the possibility that a performance optimisation might lead to a security vulnerability.

I personally find this a little hard to swallow. The very first time I came across side-channel attacks the first thing I thought of was CPU caches. I just naively assumed that the folk at Intel etc. were smart enough to have figured out the potential problems and already designed in the countermeasures.

Regardless of whether Spectre and Meltdown genuinely were caused by a failure of imagination (and I have my doubts about ARM here, given that the CSDB instruction was already in the silicon of their licensees), it’s a class of problem we collectively need to think harder about. There seem to be a few valid approaches here:

  1. Adopting a more adversarial mindset – think about how an attacker might try to exploit a new feature or performance optimisation – the ‘red team‘ approach.
  2. ‘Chicken bits'[1] to allow features/optimisations to be disabled if they’re discovered to be vulnerable.
  3. Use of artificial intelligence (AI) to imagine harder/differently. When Google’s DeepMind team created AlphaGo it played Go like a human but a bit better; when they created AlphaGo Zero it came up with entirely different plays. I’d therefore expect that similar approaches could be applied to security validation.

Note

[1] Hat tip to Moritz Lipp for this term from the Q&A section of his QCon London presentation ‘How Performance Optimisations Shatter Security Boundaries’.