
A Big Cloud of Data

“Our move into the cloud”, said the CFO to her fellow board members, “was a huge success”. She went on to explain the financials in simple terms, as was her skill. The cost savings they had made by moving infrastructure to Amazon AWS had been leveraged, she informed them, into growing the business and expanding into new markets.

“However”, she admonished, looking over her spectacles straight into the eyes of the CTO, “a couple of acquisitions later and we now have half of our infrastructure on AWS, half on Azure, and the other half on Google”. She was used to using accountancy tricks to fudge the numbers, and every corporation does it, so no eyebrows were raised by her miscalculation.

The CTO visibly shrank into his seat but was called to explain the situation. “Well, er”, he began hesitantly, “she is right”, adding the obligatory “of course”.

“We now require three IT teams to manage the systems, in spite of not even having the budget for one team. The teams we’ve acquired are each working to their own, frequently conflicting, plans and nobody really knows what is going on in IT anymore. The database consultant we brought in told us, for example, that our three customer databases are incompatible with each other. She gave the example of the city of ‘Geneva’ having 124 different spellings in our files.”

“But can you fix it?”, interjected the CEO.

“No. I cannot fix it, Sir. We first need to plan how we are going to clean up the data, before we can consider a merge. Merging any two of the databases into the third is not, the consultant advised, an option. We need to create an entirely new company-wide schema, clean up the data, and only then migrate. This will have a knock-on effect on all our existing applications, each of which will need to be validated against the new schema and most of which will require extensive refactoring. But before we do any of that, we will have to decide which platform we want to use. Technically this is all possible, but the short answer is that it’s going to take years and we don’t have a fraction of the budget for it”.

“Couldn’t we hire some data scientists?”, piped up the head of HR, drawing bemused looks from the board.

“What’s a data scientist?”

“I read about them on LinkedIn”, she began, but didn’t get any further as the members immediately lost interest and spoke over her.

“I don’t understand”, admitted the company lawyer, “isn’t this the same problem as if we’d acquired a company that used Unix? We’d still have to migrate them to Windows”.

“No, it is not the same, it is in addition. We still have all those old problems and choices to make, as well as deciding which of the clouds, if any, we are going to continue to use. Before that, we have to establish exactly what we have in our clouds”.

“You mean you don’t know what our, how do you call it, application landscape looks like anymore?”

“That’s about the size of it. We don’t even know how much data we’ve got, and we’re certainly in no position to make statements about its consistency.”

“Couldn’t you just do a full data scan and filter out the duplicates?”

“That’s not possible anymore. Some of our IT teams have gone NoSQL, deciding for themselves, erroneously, that ‘No’ means ‘no’, and they have refused to use any kind of relational database, making it impossible to identify duplicates. This is what the DBA meant with her example of the different spellings of Geneva.”

“So how do these guys store their data?”

“They dump it willy-nilly on any hard drive in the company network that happens to have free space available”.

“How do they ever access it?”

“With difficulty.”

“But they can access it?”

“Partially. But they have to reinvent many wheels each time they want to run a query, and they get different results each time they run it. But, I’m told, it’s really fast for doing simple queries. Almost as fast as MySQL, he said. So long as we don’t try to do any transactions or require reliable results from our searches.”

“Who said this?”

“Scrummaster Flash of Kanban B, the certified master data bibliotarian from the lean devops team that we acquired in the merger with Troglodyte Systems in Q2 last year.”

“What kind of queries can we run?”

“He didn’t say.”

“There is one thing we could do”, suggested the CFO.

All eyes turned to her. “Which is?”

“Instead of investing in IT, we could use the bail-out money to buy back our own stock. This will push up the price of our shares, in spite of the dire, unprofitable state of our IT systems, and ensure we all get our big bonuses.”

“Sounds great”, the board members nodded in broad agreement, apart from the CTO, who asked, “but how are we to manage this mess with the staff we’ve got?”

“Fuck ’em”, advised the CEO. “We’ll outsource it all to some backward country where the staff either do whatever they are told or they don’t eat”, and the board members nodded in broad agreement.