I’ve continued tinkering with my OpenELEC media player, and there’s too much stuff to do as just updates or comments to the original post.

Somebody gave me a nice laser-cut Raspberry Pi logo at the last OSHUG meeting.

Build

I started out with a canned build[1], but discussion on the OpenELEC thread on the Raspberry Pi Forums suggested that I was missing out on some features and fixes. I therefore did another build (in the OpenELEC directory):

git pull
PROJECT=RPi ARCH=arm make

I’m presently running r10979, which seems to be behaving OK. I’ve uploaded some later builds to GitHub, but not had the time to test them out myself. To get some of the later builds to compile properly I needed to delete the build directory:

rm -rf build.OpenELEC-RPi.arm-devel/

To use these binaries, simply copy the OpenELEC-RPi.arm-devel-datestamp-release.kernel file over kernel.img and OpenELEC-RPi.arm-devel-datestamp-release.system over system on a pre-built SD card. As these files sit on the FAT partition this can easily be done on a Windows machine (even though it can’t see the ext4 Storage partition). The files can’t be copied into place on the Raspberry Pi itself, as they’re locked whilst the system is running.
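
If you happen to be doing the copy from a Linux box instead, it’s just a case of mounting the FAT partition and copying the files across. A minimal sketch (the device name is an assumption – substitute wherever your card’s first partition shows up, and use your build’s actual datestamp-release in the filenames):

mount /dev/sdb1 /mnt                                                # FAT boot partition of the SD card
cp OpenELEC-RPi.arm-devel-datestamp-release.kernel /mnt/kernel.img
cp OpenELEC-RPi.arm-devel-datestamp-release.system /mnt/system
umount /mnt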

Config.txt

This is the file that’s used to set up the Raspberry Pi as it boots. The canned build that I’m using didn’t have one, so I created my own:

mount /flash -o remount,rw
touch /flash/config.txt

I’ve set mine to start at 720p 50Hz:

echo 'hdmi_mode=19' >> /flash/config.txt

There are loads of other options that can be explored such as overclocking the CPU.
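
For example, a modest overclock can be appended to the same file in the same way. The values below are purely illustrative rather than a recommendation (overclocking is at your own risk), but they show the sort of settings available:

echo 'arm_freq=800' >> /flash/config.txt
echo 'core_freq=300' >> /flash/config.txt
echo 'sdram_freq=450' >> /flash/config.txt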

Remotes

The cheap MCE clone that I bought still isn’t working entirely to my satisfaction, but I’m less bothered about that as there are other good options. I already raved a little about XBMC Commander for the iPad in an update to my original post (it also works on the iPhone, and presumably a recent iPod Touch). I’ve also tried out the Official XBMC Remote for Android, which is a little less shiny but pretty much as functional; best of all it’s free.

NFS

When I first set up CIFS to my Synology NAS I meant to try out NFS as well. At the time I didn’t, as things weren’t working properly on my NAS, which turned out to be down to a full root partition preventing any config changes from being written. Having sorted that out I’m now using .config/autostart.sh to mount using NFS thus:

#! /bin/sh
(sleep 30; \
mount -t nfs nas_ip:/volume1/video /storage/videos -r; \
mount -t nfs nas_ip:/volume1/music /storage/music -r; \
mount -t nfs nas_ip:/volume1/photo /storage/pictures -r \
) &

Conclusion

That’s it for now. The dev build I’m on seems stable enough and functional enough for everyday use, so I’ll probably stick with that rather than annoying the kids with constant interruptions to their viewing. Hopefully I won’t have to wait too long for an official stable release.

Notes

[1] The original canned build is now ancient history, so I’m now linking to the latest official_images.

Updates

Update 1 (4 Jun 2012) – r11211 release bundle and image (900MB when unzipped so should fit onto 1GB and larger SD cards).
Update 2 (4 Jun 2012) – I’ve put r11211, and will put subsequent bundles and images that I make, into this Box.net folder.
Update 3 (5 Jun 2012) – My Box.net bandwidth allowance went pretty quickly, so I’ve now put up the latest release bundles and image files on a VPS.
Update 4 (26 Jan 2013) – Release candidates should be used rather than dev builds in most cases, so links modified to point to those.


My old Kiss DP-600 media player has been getting progressively less reliable, so for a little while I’ve been telling the kids that I’d replace it with a Raspberry Pi. Of course getting hold of one has proven far from simple.

Some time ago the prospect of using XBMC on the Raspi was confirmed, leading me to consider that this spells the end for media player devices (or at least a change in price point). Perhaps I should have done more prep work, but in the end I waited for the device to arrive before getting started. My first search immediately took me to OpenElec and a post about building for the Raspi. I downloaded the sources and, after some toolchain-related hiccups[1], kicked off the build process on an Ubuntu VM. This turned out to be entirely unnecessary, as I was able to download a binary image[2].

The next step was to copy the image onto an SD card. This was fairly straightforward using the Windows Image Writer, which is the same tool used to write the standard Debian images for the Raspi. In my case I couldn’t quite squeeze the image onto a handy 2GB SD card[3], but I had a larger card to hand that seems to work fine.
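
For anybody doing this from Linux rather than Windows, writing the image is just a dd to the card’s device node. A sketch, assuming the card appears as /dev/sdX and using a placeholder image name – double check the device first, as dd will cheerfully overwrite the wrong disk:

sudo dd if=OpenELEC-RPi.arm-devel.img of=/dev/sdX bs=4M   # write the whole image to the card
sync                                                      # make sure all writes have hit the card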

I was now able to boot into XBMC and use the cheap MCE remote I’d bought on eBay a little while ago. After fiddling with some settings I’ve been able to get things so that everything plays OK (with sound). I’m using some mount commands in .config/autostart.sh[4] to connect to CIFS shares on my NAS for videos, music and photos:

#! /bin/sh
(sleep 30; \
mount -t cifs //nas_ip/video /storage/videos -o username=foo,password=S3cret; \
mount -t cifs //nas_ip/music /storage/music -o username=foo,password=S3cret; \
mount -t cifs //nas_ip/photo /storage/pictures -o username=foo,password=S3cret \
) &
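
As an aside, if you’d rather not have the password sitting in plain text in the script, mount.cifs can take a credentials file instead (the path below is just an example):

# /storage/.smbcredentials (chmod 600), containing:
#   username=foo
#   password=S3cret
mount -t cifs //nas_ip/video /storage/videos -o credentials=/storage/.smbcredentials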

Stuff that I’d still like to change:

  • SPDIF – The Raspi doesn’t have SPDIF out via its 3.5mm jack, so I have no way of piping digital audio to my AV receiver (sadly my TV doesn’t have a digital audio output). Maybe I’ll be able to use a cheap USB sound card to fix this.
  • Resolution – I’ve got things going pretty well at 720p, but I haven’t found a reliable way to get 1080p output. My TV might be partly to blame here. I bought a 37″ LCD about a year too early, and the best choice at the time was Sharp’s ‘PAL Perfect’ screen. It has a resolution of 960×540, which makes downscaling of 720p and 1080p very simple.
  • Reboots – don’t seem to be reliable at all. I’ve not yet managed to get a clean restart after doing ‘reboot now’ from the command line. Even pulling power seems like a hit and miss affair. I can see this being a problem for the inevitable time that the system fails whilst I’m away for a week travelling[5].
  • Remote – when I first tested the MCE remote on a Windows laptop most of the buttons seemed to do sensible/expected stuff. On OpenElec/XBMC the key buttons (arrows, select and back) seem to work, along with the mouse, but many of the other buttons don’t seem to work at all.

Conclusion

Getting OpenElec going with the Raspberry Pi was pretty straightforward. It feels a little rough around the edges, but it’s early days. Even at this stage I’m reasonably confident that I can replace the DP-600. It’s also cool to be able to SSH into my media player knowing that it’s a tiny little computer running inside a business card box.

Updates

Update 1 (14 May 2012) – The reboot issue turned out to be SD card related. It seems that the Raspi is fussy about these things, and the PNY 8GB Class 4 card that I was using didn’t cut it. The 2GB SanDisk Extreme III that I’m now using seems much more reliable (and no slower).
Update 2 (14 May 2012) – I got XBMC Commander for my iPad. It’s worth every penny of the £2.49 that I spent on it as it totally transforms the user experience. Using a remote to navigate a large media library is a pain. Using a touch screen lets you zoom around it – recommended.
Update 3 (20 May 2012) – I’ve done a Pt.2 post.
Update 4 (31 May 2012) – binary image link updated to r11170.
Update 5 (3 Jun 2012) – binary image link changed from github to Dropbox.
Update 6 (4 Jun 2012) – Dependencies in [1] updated to add libxml-parser-perl as this has caused the build to fail when I’ve used fresh VPSes.
Update 7 (5 Jun 2012) – binary image link changed to a VPS.
Update 8 (26 Jan 2013) – binary image link changed to official_images, as most people should be using a release candidate rather than a dev build. Anybody wanting to upgrade an older build should get their binary from the OpenELEC.tv downloads page (Raspberry Pi is near the bottom) and follow the upgrade instructions.

Notes

[1] On first running ‘PROJECT=RPi ARCH=arm make’ I hit some dependency errors:

./scripts/image
./scripts/image: 1: config/path: -dumpmachine: not found
make: *** [system] Error 127

This was fairly easily fixed by following the instructions for compiling from source, which in my case running Ubuntu 10.04 meant invoking:

sudo apt-get install g++ nasm flex bison gawk gperf autoconf \
automake m4 cvs libtool byacc texinfo gettext zlib1g-dev \
libncurses5-dev git-core build-essential xsltproc libexpat1-dev \
libxml-parser-perl

[2] Thank you marshcroft for your original image – much appreciated. Now replaced by a much newer build.
[3] Clearly some 2GB SD cards have a few more blocks than others.
[4] Thanks to this thread for showing the way.
[5] There have been times that I’ve suspected the old DP-600 of subscribing to my TripIt feed – failure seemed to be always timed to the first days of a long business trip.


After months of waiting, my Raspberry Pi finally arrived on Friday[1]. Somehow I resisted the temptation to dash straight home and start playing with it, and went along to my daughter’s summer concert at school. This one has been earmarked to replace our decrepit Kiss DP-600 streaming media player – more on that later. First though, it needed a box. Since the form factor is the size of a credit card, and credit cards are the same size as business cards, I reckoned one of the plastic boxes that business cards come with might work. It does:

I could have done a better job with the RCA hole – it’s a bit too high. Hopefully somebody will come up with a nice paper template to do this properly (and I expect a laser cutter could do a much better job than I managed with a steak knife and a tapered reamer).

I’ve not done anything with the box lid yet, but it’s probably a good idea to keep dust out. I’m guessing that at a max power draw of 3.5W heat dissipation shouldn’t be too much of a worry.

[1] My order number was 2864, so it looks like I just missed the first batch of 2000. If there’s a next time I need to remember to fill out the interest form before tweeting about it :(


I’ve recently had a couple of laptops on loan that have got me thinking about what the perfect enterprise laptop might feature.

Business – Lenovo X220

I was loaned this to try out a super secret new security product. Regular readers here will know that I have a fondness for Lenovo laptops, and this is probably the best one I’ve ever used – an almost perfect balance of performance, size, weight and endurance.

What I like:

  • Performance – SSD and 8GB RAM are a potent combination, and I never threw anything at the Core i5 CPU that would cause it to sweat.
  • DisplayPort and VGA connectors – so no snags getting hooked up to a variety of displays.
  • Great keyboard.
  • Wired Gigabit Ethernet – making the transfer of some videos to watch on the plane painless.
  • Size and weight – 1.5kg.
  • Removable SSD – so I could take a snapshot of the machine as it was given to me, and revert to that later.

What could be even better:

  • The screen is OK at 1366 x 768, but I’ve used finer pitch screens before and liked them.
  • I didn’t have a chance to test a docking station, so it’s not clear to me whether or not it can drive two displays digitally (like my HP laptop does at work)[1].
  • Waking from sleep is a bit hit and miss, and I’ve more than once ended up shutting it down completely rather than waking it up :(

Consumer – Dell XPS13

This is Dell’s first Ultrabook, and they have been kind enough to lend me one for a week.
What I like:
  • Looks great
  • Trackpad is surprisingly nice to use (I generally prefer a TrackPoint) – this one is really big, and supports a range of multi-touch gestures that ease navigation.
  • The chiclet keyboard is also good, and is backlit – making typing in the dark a breeze.
  • No fan or exhaust grille – so no heating up the bottle of water on my airline tray table. The quid pro quo here is that the base can get somewhat warm.
What could be even better:
  • The screen is once again 1366 x 768, and whilst the Gorilla Glass looks great it’s also a little too reflective at times.
  • A single mini DisplayPort socket means carrying around a bagful of dongles to get hooked up to external displays, and limits connectivity to one screen. There’s also WiDi[2], but I’ve yet to find something that I can use that with.
  • It feels heavy even though it’s 100g lighter than the X220 – clearly density matters.

Enterprise Ultrabook – the best of both worlds?

So what might happen if the best features of these two laptops were brought together, or, put another way, what would I change to make an Ultrabook an Enterprise Ultrabook?
  • Removable drive – so that sensitive corporate data (and precious user config/state) can be easily held onto.
  • Some means to drive two external screens and the rest of the stuff on a typical work desk – keyboard, mouse, webcam, headset etc.[3]
  • A means to connect to a wired network, as using WiFi with a VPN can be a time consuming and frustrating process. (This should be incorporated into the solution for connectivity for the point above).
  • An optional smartcard slot[4].

The Apple Alternative

I’ll stave off the comments about the MacBook Air right here. Yes, with its Thunderbolt port it has many of the capabilities I’m asking for here. I don’t even think it’s that big a deal that it doesn’t come with Windows[5]. The inaccessibility of the hard drive and the lack of a smartcard option count against it a little, but it’s a machine that probably sets the benchmark that Ultrabooks (and by extension Enterprise Ultrabooks) will be measured against.

Conclusion

The Lenovo X220 is a great business laptop, and I’ll much regret having to give it back next week. I’m sure to be tempted by its Ivy Bridge based successor (which if the rumours are to be believed will be branded the X230). The Dell shows that the Ultrabook spec for consumer machines can bring some nice packaging into the mix, and it would be great to see a best of both worlds machine. The hard part is connectivity, and it seems to me that Thunderbolt based docking will be the way ahead.
Update 1 (10 May 2012) – I’ve been very impressed with the battery life on the Dell. First impressions suggest that it might be capable of managing half a day without wall power.
Update 2 (10 May 2012) – The more I see of Windows 8 the more I think it will be best on devices with touch screens. This could of course hinder enterprise adoption (at least of the Metro interface, and it remains to be seen how optional that is – at this stage I’m guessing that where enterprises do adopt Windows 8 it will be hard to distinguish from Windows 7).

[1] It seems that the typical office still hasn’t caught on to having proper large monitors (by which I mean 27″ or 30″), having instead pairs of 19″, 22″ or 24″ screens.
[2] The first time I heard this talked about I heard ‘Wide Eye’ rather than ‘WiDi’ – somehow this made sense for a system able to wirelessly project images.
[3] Traditionally this has been done with a docking station. I think the truly modern way would be a Thunderbolt connection to a monitor that’s then the hub for everything else, but it’s not reasonable to expect new machines to drive an upgrade of everything else so perhaps some sort of (Thunderbolt) docking strip is the way ahead.
[4] The X220 doesn’t seem to have one of these (though it does have a fingerprint reader, which might be an alternative for some, and it also has an ExpressCard slot that I’m sure could be put to work in this area).
[5] Large enterprises have a habit of smashing operating systems down to their constituent parts and (at great cost) reassembling them. They also generally have enterprise licensing deals with Microsoft. I hence can’t see Boot Camp being that much of an obstacle to adoption.


For a little while I’ve been experiencing lousy service from my credit card providers, and judging by what I hear from others I’m far from alone on this. The level of false positives from card company fraud detection systems has reached a point where it’s creating a bad customer experience, and it often seems that ‘common sense’ has been thrown out the window.

A personal example

I recently went on a family holiday to the US. Within hours of arriving my card was blocked and I found myself having to call the fraud department.

What should have happened

I booked my airline tickets using the card. It shows clearly on my statement that one of the tickets was in my name, and the destination of the flight. It should therefore have been no surprise when I showed up in Tampa on the appointed day and started spending money[1]. The key point here is that the card company had very specific data about my future movements.

What actually happened

I picked up my hire car in Tampa (card transaction for future fuelling fees etc.) and headed off towards my ultimate destination of Kissimmee. On nearing my destination I needed to fill up with fuel [2], so I stopped at a 7-11 to gas up. I tried to pay at the pump [3], but this failed, so I went into the store to pre authorise a tank full of fuel. My card was declined and I had to use another. When I checked my email shortly afterwards there was a fraud alert, and when I switched on my UK mobile it immediately got a text saying the same [4]. I called the fraud line (and got through straight away, as it was early in the UK morning), and explained that the (attempted) transactions were genuine, and that I would remain in the US for another couple of weeks. The card was unblocked and I continued spending… for a while at least.

What happened next

Two weeks later, another gas station, another transaction that I had to use another card for, another text asking me to call the fraud department. This time it took almost 9 minutes to get through, as it was Easter Saturday and still the middle of the shopping afternoon back at home. I was pretty angry – at the wait, and because it had happened again within the time that I’d specified I’d be using the card in the US. There were apparently three suspicious transactions, with the last one causing my card to be blocked:

  1. Buying gas at the same 7-11 that had caused the problems last time.
  2. Some groceries from Super Target [5].
  3. Another attempt to buy gas (at a place on the road back to Tampa).

Clearly buying gas is a red flag – every attempt I made during the whole stay was considered fraudulent or potentially fraudulent. At the same time, the hundreds of dollars that I spent in theme parks, restaurants, shops and even a gun club were all just fine.

What’s going on here?

I necessarily need to speculate here a little, as the card company can’t/won’t explain how its fraud detection algorithms work[6]. It’s a classic case of ‘computer says no’. Likely there are a bunch of heuristics about transaction types that are more likely to be fraudulent[7]. My guess would be that convenience stores rank as pretty high risk, and the problem in my case is that it’s almost impossible to buy gas anywhere that isn’t also a convenience store. Somewhere else somebody has done a cold analysis of the cost of dealing with false positives (which mainly falls on me, the customer) versus the costs of fraud. There is no doubt a lot of analysis going on here.

Doing better

So data and analysis are at the heart of this, but is it the right data leading to the right analysis? I think not. As a customer I think the experience is lousy precisely because things that seem obvious to me are being apparently ignored by the card company:

  • Location – if I’m buying airline tickets with my card then the card company knows in advance where I should be. These data points should take precedence over heuristics about ‘normal’ spending locations.[8]
  • Inference – if I rent a car for 2 weeks then I’m pretty likely to buy some gas to go in it.
  • Explicit overrides – if I tell the company where I’m going to be, and what I’m likely to be spending on, then the fraud pattern matching should adjust to suit.

Conclusion

The costs of dealing with fraud false positives have been largely passed to the customer, and this (unsurprisingly) is leading to poor customer experience. To customers like me it’s obvious how card companies could make better use of the data at hand to fix this, but the fix will entail getting beyond some pretty blunt heuristic approaches in order to focus on the individual and what’s ‘normal’ for them in very specific circumstances – not just what’s ‘normal’ across a giant data set.

[1] In fact the fraud bells should have started ringing if I’d started making in-person transactions somewhere other than Tampa.
[2] Why the car wasn’t supplied full like I’d paid for is another story.
[3] The pumps always ask for a zip code, which is likely where the problems begin for anybody outside of the US. Perhaps the card companies should allow users to register fake ZIP codes for such purposes (I always input the ZIP for an office where I used to work – which is generally OK for buying MTA cards in New York, and a variety of goods that are delivered online).
[4] Luckily for me it seems that I wasn’t charged some extortionate roaming rate for receiving that text.
[5] Why that transaction was flagged and the other 4-5 times I bought there weren’t is a total mystery.
[6] Presumably in the belief of security by obscurity – if the bad guys don’t know how the system works then they can’t engineer around it.
[7] Possibly some big data type tools have been used behind the scenes here.
[8] For bonus points the companies should provide an easy way for me to notify travel plans for when I’m buying tickets with another card e.g. my TripIt feed has all of this data. There’s a huge opportunity here for companies to become ‘friends’ in social networks that brings utility beyond just better targeting of ads.


Three screens

17Apr12

I’ve had a run of bad luck with screens recently…

Laptop

The first casualty was my son’s X121e. He brought it home from school one evening saying ‘the screen on my laptop is broken … I didn’t drop it’. My response was of course ‘why are you telling me you didn’t drop it?’, and ‘what caused the mark on this corner?’.

The constantly flickering screen, which could be made to change if I wiggled it, seemed like symptoms of a loose connector. Luckily Lenovo provides service manuals for its laptops[1], so I set about stripping it down. After about half an hour I reached the point where I was able to reseat the LCD connector and confirm that it was working as it should. Then it was just a matter of another half an hour to put it back together.

Chris 1 : Murphy 0

Smartphone

A few days later my wife came home one evening saying ‘can you take a look at my phone’. The screen had just vertical lines on it, so my first suspicion was another loose connector. Sadly this time around it wasn’t that simple. After swapping things around with my identical (except for the outer shell colour) ZTE Blade I was able to confirm that a replacement screen would sort things out. I’m keeping an eye on eBay for a phone offered for parts/repair with a good screen.

Chris 1 : Murphy 1

iPad

Angry Birds Space came out whilst I was in Florida on holiday. I think it’s a great way of teaching kids about zero-G physics. Sadly my son managed to test Earth gravity whilst playing it by pushing the iPad out of its SmartCover – over a stone floor. Here’s the result:

My immediate thoughts were of having to buy a new iPad (perhaps the new version), and having to make an insurance claim.

After calming down a bit I found out that 1. the iPad was still working and 2. it was possible to get the touch screen replaced. With some more searching I discovered an entire cottage industry of iPad screen fixers, and identified one reasonably close by that could do the job for $100. A couple of days later I spent an hour and a half in the company of Ruben at RCMA watching him pick bits of glass out of my iPad. Although I’m a keen maker and fixer it’s definitely one of those jobs you don’t want to do without some help (and the iFixit guides like this are good, but don’t show the full horror of screen removal – especially when it’s so badly broken. Ruben said mine was the worst he’d ever seen, and there was proof in his bin that he’d seen plenty).

I’m not entirely convinced that the WiFi is all it used to be – maybe the antenna suffered from the drop, but at least I have a whole iPad again.

Chris 2 : Murphy 1

Conclusion

Each of these issues looked pretty terminal at first – rendering an expensive gadget useless. In each case though the problem has proven to be fixable at low to moderate cost in time and parts. Not all modern electronics need to be treated as disposable.

[1] X121e service manual.


This should work for any service that only supports POP3S, not just gmail. You’ll need a Linux box/VM (I generally use Ubuntu).

Background

Since the mid 90s I’ve used Ameol to retrieve email. When I started using gmail I forwarded mail on to my ISP’s POP3 service and collected it with Ameol so that I’d have a local copy of my mail. These days I use Ameol pretty infrequently, and sometimes my mailbox fills its quota. This causes gmail to start spewing out retry-timeout messages. Most recently this happened whilst I was on holiday, and I was unable to remotely connect to my PC at home to run Ameol. To clear my email out of the ISP server I configured a Google Apps account to fetch it, expecting to simply switch Ameol to that temporarily when I got home. Unfortunately Ameol is an ancient POP3 client, and doesn’t support SSL connections. I needed some way to convert between regular POP3 on port 110 and POP3S on port 995.

First attempt – perdition

The first POP3 proxy that my searching turned up was perdition. I was able to install this on Ubuntu without trouble, but sadly unable to get it suitably configured. Man pages for documentation are all very well, but it would be great if there were some more obvious examples of how to use the tool for various typical scenarios.

Success – stunnel

Whilst troubleshooting perdition I did a manual connection to gmail’s POP3S service using openssl. This worked fine, and suggested that I needed a very simple proxy application; and that’s what stunnel does.
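
For reference, that manual test is just a case of opening an SSL connection with openssl and speaking POP3 by hand:

openssl s_client -connect pop.gmail.com:995
# then type POP3 commands directly, e.g.:
#   USER yourname@gmail.com
#   PASS yourpassword
#   STAT
#   QUIT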

Installation

I tried just running ‘stunnel’ in case it was already installed. Ubuntu very helpfully said:

The program ‘stunnel’ is currently not installed. You can install it by typing:
apt-get install stunnel4

Configuration

I created the following config file in /etc/stunnel/stunnel.conf:

client = yes
debug = debug
cert = /etc/ssl/certs/stunnel.pem

[pop3s]
accept = host_ip:110
connect = pop.gmail.com:995

Don’t forget to substitute your own host_ip above.

I then had to create the certificate referenced above:

openssl req -new -out /etc/ssl/certs/stunnel.pem -keyout /etc/ssl/certs/stunnel.pem -nodes -x509 -days 365

Starting

Before starting I had to edit /etc/default/stunnel4 and change the following line:

ENABLED=1

It was then just a case of running:

/etc/init.d/stunnel4 start

A quick ‘netstat -an’ confirmed that stunnel was listening on port 110.

Use

Once stunnel was running on my Ubuntu VM I was then able to configure my POP3 client (Ameol) to connect to host_ip using my Gmail username and password. Job done :)

Update (10 Jan 2020)

Alexander Traud emailed me to note:

With the latest version of stunnel, “client” and “cert” should not be in the nameless (global) section but within the section pop3s.

Finally, I changed all “host_ip” to “::” because of <https://serverfault.com/q/666712>
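
Taking both of those points together, I’d expect the config file to end up looking something like this (I haven’t tested it myself, so treat it as a sketch):

debug = debug

[pop3s]
client = yes
cert = /etc/ssl/certs/stunnel.pem
accept = :::110
connect = pop.gmail.com:995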


This post was inspired by a conversation I had with a VC friend a few weeks back, just as he was about to head out to the Structure conference that would be covering this topic.

Big Data seems to be one of the huge industry buzz phrases at the moment. From the marketing it would seem like any problem can be solved simply by having a bigger pile of data and some tools to do stuff with it. I think that’s manifestly untrue – here’s why…

If we look at data/analysis problems there are essentially three types:

Simple problems

Low data volume, simple algorithm(s)

This is the stuff that people have been using computers for since the advent of the PC era (and in some cases before that). It’s the type of stuff that can be dealt with using a spreadsheet or a small relational database. Nothing new or interesting to see here…

Of course small databases grow into large databases, especially when individual or department scale problems flow into enterprise problems; but that doesn’t change the inherent simplicity. If the stuff that once occupied a desktop spreadsheet or database can be rammed into a giant SQL database then by and large the same algorithms and analysis can be made to work. This is why we have big iron, and when that runs out of steam we have the more recent move to scale out architectures.

Quant problems

Any data volume, complex algorithm(s)

These are the problems where you need somebody that understands algorithms – a quantitative analyst (or quant for short). I spent some time before writing this wondering if there was any distinction between ‘small data’ quant problems and ‘big data’ quant problems, but I’m pretty sure there isn’t. In my experience quants will grab as much data (and as many machines to process it) as they can lay their hands on. The trick in practice is achieving the right balance between computational tractability and time-consuming optimisation in order to minimise the systemic costs[1].

Solving quant problems is an expensive business both in computation and brain power, so it tends to be confined to areas where the pay-off justifies the cost. Financial services is an obvious example, but there are others – reservoir simulation in the energy industry, computational fluid dynamics in aerospace, protein folding and DNA sequencing in pharmaceuticals, and even race strategy optimisation in Formula 1.

Big Data problems

Large data volume, simple algorithm(s)

There are probably two sub types here:

  1. Inherent big data problems – some activities simply throw off huge quantities of data. A good example is security monitoring, where devices like firewalls and intrusion detection sensors create voluminous logs for analysis. Here the analyst has no choice over data volume, and must simply find a means to bring appropriate algorithms to bear. The incursion of IT into more areas of modern life is naturally creating more instances of this sub type. As more things get tagged and scanned and leave a data trail behind them we get bigger heaps of data that might hold some value.
  2. Big data rather than complex algorithm – there are cases when the overall performance of a system can be improved by using more data rather than a more complex algorithm. Google are perhaps the masters of this, and their work on machine translation illustrates the point beautifully.

So where’s the gap between the marketing hype and reality?

If Roger Needham were alive today he might say:

Whoever thinks his problem is solved by big data, doesn’t understand his problem and doesn’t understand big data[2]

The point is that Google’s engineers are able to make an informed decision between a complex algorithm and using more data. They understand the problem, they understand algorithms and they have access to massive quantities of data.

Many businesses are presently being told that all they need to gain stunning insight that will help them whip their competition is a shiny big data tool. But there can be no value without understanding, and a tool on its own doesn’t deliver that (and it would be foolish to believe that a consultancy engagement to implement a tool helps matters much).

What is good about ‘big data’?

This post isn’t intended to be a dig at the big data concept, or the tools used to manage it. I’m simply pointing out that it’s not a panacea. Some problems need big data, and others don’t. Figuring out the nature of the problem is the first step. We might call that analysis, or we might call that ‘data science’ – perhaps the trick is figuring out where the knowledge border lies between the two.

What’s great is that we now have a ton of (mostly open source) tools that can help us manage big data problems when we find them. Hadoop, Hive, HBase and Cassandra are just some examples from the Apache stable; there are plenty more.

What can be bad about big data?

Many organisations now have in place automated systems based on simple algorithms that process vast data sets – credit card fraud detection being one good example. This has consequences for process visibility, and the ignoring of specific data points can ruin user experience and customer relationships. I’ll take this up further in a subsequent post, but I’m sure we’ve all at some stage been a victim of the ‘computer says no’ problem, where nobody can explain why the computer said no, and it’s obviously a bad call given a common sense analysis.

Conclusion

For me big data is about a new generation of tools that allow us to work more effectively with large data sets. This is great for people who have inherent (and obvious) big data problems. For cases when it’s less obvious there’s a need for some analysis work to understand whether analysis of a larger data set might deliver more value versus using more complex algorithms.

[1] I’ve come across many IT people who only look at the costs of machines and the data centres they sit within. Machines in small numbers are cheap, but lots of them become expensive. Quants (even in small numbers) are always expensive, so there are many situations where the economic optimum is achieved by using more machines and fewer quants.
A good recent example of this is the news that Netflix never implemented the winning algorithm for its $1m challenge.
[2] The original: ‘Whoever thinks his problem is solved by encryption, doesn’t understand his problem and doesn’t understand encryption.’


Firstly let me say that I like Linode a lot. They had a promotion running a little while ago which got me going with my first virtual private server (VPS), and I only moved off to somewhere I found on lowendbox after the promotion because my needs are small (and I wanted to match my spend accordingly)[1]. This tweet probably sums things up perfectly:

I first heard about the incident via Hacker News, where somebody had posted a blog post from one of the victims. The comments on both sites make for some interesting reading about security and liability. Shortly afterwards Linode posted a statement. The true details of what went down, and whether it was an inside job as some speculate, or ‘hackers’, remain to be seen. Clearly whoever perpetrated the attack knew exactly what they were after and went straight for it – what law enforcement would normally call a ‘professional job’.

So… what can we learn from this? Here are some of my initial thoughts:

Physical access === game over

You have to be able to trust your service provider with your data. If that data has a cash equivalent of thousands of dollars then you have to be able to trust them a lot. There’s a special sort of service provider that we normally use for this – one that’s heavily regulated and where the customer (normally) gets reimbursed if mistakes are made – we call these banks. Of course regular banks haven’t got into servicing novel cryptocurrencies like bitcoin. Ian Grigg does a pretty good job of explaining why (and indeed why Bitcoin will slip into a sewer of criminality where this incident is but one example).

Bottom line – if you can’t trust the people that have physical access, then don’t do it.

Admin access === game over

If a service provider offers out of band management tools that give the equivalent of physical access then you also need to trust whoever has access to those. Whilst it’s not clear yet who perpetrated the attack, it is pretty clear that it was done by subverting the management tools. This is particularly true when the management tool has direct control over security functionality, such as the ability to reset the root password.

In this case mileage may vary. Some VPS admin tools provide the ability to reset passwords, whilst others don’t.

Bottom line: Management tools might be convenient for service providers and their users, but can present a massive security back door for any measures taken on the machine itself.

Passwords === game over

If I look in the logs of any of my VPSs then I see a constant flood of password guessing attacks. This is why I either turn off passwords altogether with passwd -l account or disable password login in the SSH daemon.
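
Turning password logins off at the daemon is only a couple of lines in /etc/ssh/sshd_config plus a restart – a minimal sketch (service name as on Ubuntu):

# /etc/ssh/sshd_config
PasswordAuthentication no
ChallengeResponseAuthentication no

# then restart the daemon to pick up the change
/etc/init.d/ssh restart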

It seems that at least some of the victims had chosen long, hard to guess (or brute force) passwords, which can raise the bar on how long an attack takes. Few people are that disciplined though. Passwords (on their own[2]) are evil, and should be avoided at all costs.

Of course SSH keys aren’t a panacea. Private keys need to be looked after very carefully.

Bottom line: Disable passwords, use SSH keys, look after the private key.
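
For completeness, getting a key pair in place is a one-off job – a sketch (the key comment and host are placeholders, and do give the key a strong passphrase):

ssh-keygen -t rsa -b 4096 -C 'me@laptop'       # generate the key pair, protected by a passphrase
ssh-copy-id -i ~/.ssh/id_rsa.pub user@vps_ip   # install the public key on the server
ssh user@vps_ip                                # should now log in using the key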

Conclusion

It seems to have become best practice in the IaaS business to build machines that only work with SSH keys, and where the management console (and the API under it) doesn’t have security features that can be used to subvert anything done to secure the machine[3]. These two steps go a long way towards ensuring that security in a VM/VPS has a sound foundation.

There is nothing that can be done about physical access (at least until homomorphic encryption becomes a reality) – so if you can’t trust your service provider (or at least get contractual recompense for any incident) then think again.

[1] Apart from some experimentation I never actually do much with any VPS that I run. They’re just used as end points for SSH and OpenVPN tunnels for when I want to swerve around some web filters (or keep my traffic from the prying eyes of those running the WiFi).
[2] When I was running a VPS on Linode I took the precaution of adding two factor authentication (2FA) using Google Authenticator.
[3] Most management functions have serious implications for integrity and availability, but if they can’t hurt confidentiality then that’s a good start.


I’ve been on the road now for a week and a half, which has brought me into contact with some of the slowest, most expensive Internet access I’ve suffered in some time. I’m used to mobile Internet being expensive and slow, but this has been even worse.

My problems started in an airport lounge in Singapore. I’d forgotten to check before leaving home that my replacement iPhone[1] had my audiobooks on it, and it turned out that they’d been missed from my iTunes sync. I have the Audible app, so I needed to download a book segment of around 90MB. This would take a few minutes on my home broadband (which isn’t stunningly fast at around 4.5Mb/s). Unfortunately it was much slower than that, the connection kept dropping, and when it dropped the download wouldn’t always resume – more time and bandwidth wasted as I started over. I didn’t get to listen to my audiobook on the flight – that’s OK, I managed to get some sleep.

When I checked into my hotel I thought my troubles would be over, and indeed I was able to download a few audiobooks. I woke early the following morning (a little before 4am) and thought I’d catch up with Google Reader and Twitter. Things were painfully slow so I ran a quick Speedtest:

Wow! I guess a Skype call home would be out of the question then. This is on an Internet service billed at AUD25/day (plus applicable taxes), and at that time in the morning I can hardly believe that other hotel residents were swamping their pipe. I’d also note that matters didn’t improve over the following hours/days. I complained at checkout, and thankfully the Internet charges were dropped from my bill.

For the purpose of comparison I’d note that AUD30 had got me a PAYG SIM that included 500MB of data on an HSPA network (and 500 voice minutes and unlimited SMS)[2]. So… looks like hotels are up to the same game they play with phones, offering a price point that’s even worse than mobile providers.

Another day, another hotel. This time the performance isn’t too bad:

Unfortunately there’s a catch… AUD24/day only gets you 100MB. After that you can pay AUD0.10/MB (up to a total daily cap of 1000MB) or switch to a throttled service that performs like this[3]:

Wow again! That’s even worse than the first speed test I did.

For a further comparison I tested the WiFi at the meeting I was attending (no charge, no caps):

That looks to me like a decent ADSL2 service. I’m not too familiar with local broadband pricing, but I expect that costs the same for a month as my hotel broadband is costing for a day or two.

It’s not news that hotels ream their customers for extras like this. But the cost, quality and limitations are pretty shocking. AUD104 for a maximum of 1000MB of data looks like it’s explicitly designed to make movie streaming cost prohibitive (to protect an in room movie distribution monopoly?). Of course (as the SOPA/PIPA advocates continuously fail to appreciate) the Internet isn’t just a medium for media distribution. These limits preclude the downloading of larger apps, and get in the way of desktop video conferencing.

I think the hotels can and should do better than this. What’s kind of perverse here is that the high end places seem to be the worst culprits for this kind of behaviour (whilst many cheaper hotels offer free access to fast pipes). The same is probably also true for many airline lounges.

[1] The original developed an ever growing yellow blotch on the screen, so I sent it back.
[2] The Aussie mobile carriers seem to make it super easy for visitors to buy their services. There were a number of providers with shops right at arrivals in SYD. I wish it were the same elsewhere (but I guess roaming tariffs provide perverse incentives where it’s better to keep somebody as another firm’s customer rather than make them your own).
[3] I was told at check in that Internet was complimentary with my room rate, so I’m not expecting to see the AUD24/day charge, but after only a morning of emailing and reading (no serious video or app usage) I’ve already blown past my 100MB quota, and with work to do I’ve selected the faster more expensive option – it’s unclear whether that will be charged.