Thursday, 1 January 2015

Future of Programming - Rise of the Scientific Programmer (and fall of the craftsman)

Level [C3]

[Disclaimer: I am by no means a Scientific Programmer but I am striving to become one] It is the turn of yet another year, and the time is ripe for reviews of the past year, predictions for the new one and resolutions. Last year I made some bold statements and some radical decisions to start transitioning. I picked up a Mac and learnt some Python and Bash, and a year on, I think it was a good move and I really enjoyed it. Still (as I predicted), I spent most of my time writing C# [working on a Reactive Cloud Actor micro-Framework, in case it interests you]. Now, a year on, Microsoft is a different company: a new CEO, a move towards Open Source and an embrace of non-Windows operating systems. How this will shift the innovation imbalance remains to be seen. But anyway, that was last year and it is behind us.

Now let's talk about 2015, and perhaps about programming in general. Are you sick of hearing Big Data buzzwords? Do you believe Data Science is a pile of mumbo jumbo meant to bamboozle us, actually used by a teeny tiny number of companies and producing value for even fewer? Is IoT just another hype? I hope that by reading what follows, you will find my answer. Sorry, no TL;DR.

*     *     *

It was a warm, sunny and all-round really nice day in June. The year is 2007 and I am on a university day trip (and punting) to Cambridge along with my classmates, many of whom are at least 15 years younger than me. Punting is fun, but as a part-time student this is one of the few times I have leisurely access to our Image Processing lecturer, a bright young guy who is, again, younger than me. I open the discussion by arguing that we have not moved much since the 80s in the field of Artificial Intelligence: we improve and optimise algorithms, but there is no game-changing giant leap. He argues that the state of the art usually improves little by little.


"Day out punting in Cambridge"

The following year, we work on a project that uses some machine learning to recognise road markings. I spend a lot of time on feature extraction and use a 2-layer neural network, since it gives me better results than a 3-layer one. I am told not to use many layers of neurons, as training usually gets stuck in a local minimum; I actually tried it and saw it happen. Overall the result was OK, but it took many pre- and post-processing techniques to achieve acceptable recognition.
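For readers who have never seen one up close, a network like the one described above fits in a few dozen lines of plain Python. This is not the original project's code: it is a minimal, hypothetical sketch of a 2-layer network (one hidden layer of sigmoid neurons plus one sigmoid output) trained with per-sample backpropagation on XOR, a toy problem picked here only because a single layer cannot solve it. The names `TwoLayerNet` and `mean_squared_error`, the hidden-layer size, the learning rate and the epoch count are all illustrative assumptions. Depending on the random seed, runs like this can stall on a plateau or in a local minimum, which is the behaviour described above.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TwoLayerNet:
    """Hypothetical toy network: inputs -> one hidden sigmoid layer -> one sigmoid output."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # One weight per input plus a bias (the last entry) for each hidden neuron.
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        # One weight per hidden neuron plus a bias for the single output neuron.
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]  # append the constant bias input
        hidden = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden + [1.0])))
        return hidden, out

    def train_step(self, x, target, lr):
        """One backpropagation step on the squared error 0.5 * (out - target)**2."""
        hidden, out = self.forward(x)
        delta_out = (out - target) * out * (1 - out)
        # Hidden deltas are computed with the output weights *before* they are updated.
        delta_hidden = [delta_out * self.w_out[j] * h * (1 - h)
                        for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden + [1.0]):
            self.w_out[j] -= lr * delta_out * h
        xb = list(x) + [1.0]
        for j, ws in enumerate(self.w_hidden):
            for i, v in enumerate(xb):
                ws[i] -= lr * delta_hidden[j] * v


def mean_squared_error(net, data):
    return sum((net.forward(x)[1] - y) ** 2 for x, y in data) / len(data)


# XOR is not linearly separable, so the hidden layer is genuinely needed.
XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

if __name__ == "__main__":
    net = TwoLayerNet(n_inputs=2, n_hidden=3, seed=42)
    print("MSE before training:", round(mean_squared_error(net, XOR), 3))
    for epoch in range(5000):
        for x, y in XOR:
            net.train_step(x, y, lr=0.5)
    print("MSE after training: ", round(mean_squared_error(net, XOR), 3))
```

Trying a few different seeds or hidden-layer sizes is an easy way to see for yourself how often a small network like this converges and how often it gets stuck.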

*     *     *

I wake up and it is 2014. Many universities, research organisations (and companies) across the world have successfully implemented Deep Learning using Deep Neural Networks, which have many layers of neurons. Watson has beaten the best human champions at Jeopardy!. Object recognition from images is almost a solved problem, with essentially no hand-crafted feature extraction.

A Deep Neural Network
Perhaps my lecturer was right: with improved training algorithms and lots and lots of labelled data, we suddenly have a big leap in science (or was I right?!). It seems that for the first time, implementation has got ahead of the mathematics: we do not fully understand why Deep Learning works, but it works. And when these networks fail, we still don't know why.

And guess what, industry and academia have not been this close for a long time.

And what has all this got to do with us? The rise of machine intelligence is going to change programming. Forever.

*     *     *

Honestly, I am sick of the amount of bickering and fanboyism that goes on today in the programming world. The culture of "nah... I don't like this" or "ahhh... that is s..t" or "ah, that is a killer" is what has plagued our community. One week Angular is super hot, the next it is the worst thing. Be it zsh or Bash. Be it vim vs. Emacs vs. Sublime Text vs. Visual Studio. Be it Ruby, Node.js, Scala, Java, C#, you name it. And the same goes for technologies such as MongoDB and Redis: subjectivism instead of facts. As if we forgot that we come from a line of scientists.

Like children, we get attached to new toys and, with the attention span of a goldfish, instead of solving real-world problems we ruminate over how we can improve our coding experience. We are ninjas, and what we do no one else can do. And we can do whatever we want to do.

"I have got power"

Yes, we are lucky. A 23-year-old kid with a couple of years of programming experience can earn double what a 45-year-old retail manager with 20 years of experience earns annually. And what do we do with that money? Spend all of it on booze, specialty burgers, travelling, conferences and gadgets: basically whatever we want to.

But those who remember the first .com crash can tell you it has not always been like this. In fact, back in 2001-2002 it was really hard to get a job, because there were many really good candidates. The IT industry became almost impenetrable because of the catch-22 of needing job experience to get job experience. But anyway, the good ones, the stubborn ones and those with little talent but a lot of passion (myself included) stayed on for the good days that we have now. The reality was that many programmers of the time had read "Access in 24 hours" and landed a fat salary at a big company. On the other hand, projects were failing because we spent most of our time writing documentation. The industry had to weed out bad coders and inefficient practices.

And so we got the software craftsmanship movement and agile practices.

*     *     *

The opposition has already started. You might have seen the discussions DHH has had with Kent Beck and Martin Fowler on TDD. I do not agree 100% with what Erik Meijer has said on the subject (only 90%), but there is a lot of truth in it. We have replaced a fact-based, data-backed attitude with a faith-based, wishy-washy, peace-hug-freedom hippie agile way that forces us to follow steps mechanically and believe they will be good for us. Agile has taken us a long way from where we started at the turn of the century, but there are problems. From personal experience, I see no difference in quality between developers who do TDD and those who do not. To be frank, I actually see a negative effect: people who do TDD do not think hard enough about the consequences of the code they write. I know this could be inflammatory but, hand on heart, that is my experience. I think TDD and agile have given us a safety net, and like tightrope walkers who stop working on their walking technique, we keep improving the net instead. As long as we go through the motions, we are safe: unit tests, coverage, planning poker, retrospectives, definition of done, stories, tasks, creating tickets, moving tickets. How many bad programmers have you seen who are masters of agile?

You know what? It is the same mediocrity we have always been fighting against. The mediocre developers who got into the market in the first .com boom by taking a class or reading a book are back in a different shape: those who know how to be opinionated, look cool, play the game and take the paycheck. We are in another .com boom now, and if there is a crash, sadly they will be out, even if that includes me.


*     *     *

I think we have neglected the scientific side of our jobs. Our maths is rusty, and those of us who did study computer science do not remember much of what we read. We cannot work out the complexity of our code and fall into the trap of thinking that machines are fast now. Yes, it didn't matter for a while, but what about when you are dealing with petabytes of data and paying by the processing hour? When our team first started working on recommendations, the naive implementation took 1000 nodes for 2 days; now the implementation uses 24 nodes for a few hours, and perhaps even this is still way, way too much.
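To make the complexity point concrete, here is a hypothetical illustration; it is not our team's recommendation code, which is not shown here. It counts how often pairs of items appear in the same basket, a common building block of "people who bought X also bought Y" style recommendations. The naive version enumerates every possible pair of items and scans every basket for each one, roughly O(I² × B) for I items and B baskets; the second version only visits pairs that actually co-occur, roughly O(B × k²) for baskets of at most k items. Both return the same counts, but with a catalogue of a million items the naive version considers about 5 × 10¹¹ pairs before it even looks at the data. For scale, the numbers quoted above go from 1000 × 48 = 48,000 node-hours to 24 nodes for a few hours, which is well under 1% of the original compute. The function names are made up for this sketch.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_naive(baskets):
    """Check every possible item pair against every basket: roughly O(I^2 * B)."""
    items = sorted({item for basket in baskets for item in basket})
    counts = Counter()
    for a, b in combinations(items, 2):
        for basket in baskets:
            if a in basket and b in basket:
                counts[(a, b)] += 1
    return counts


def cooccurrence_per_basket(baskets):
    """Only visit pairs that actually occur together: roughly O(B * k^2)."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair a canonical (a, b) order, matching the naive version.
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
    return counts


baskets = [{"milk", "bread"}, {"milk", "eggs", "bread"}, {"eggs", "tea"}]
assert cooccurrence_naive(baskets) == cooccurrence_per_basket(baskets)
print(cooccurrence_per_basket(baskets).most_common(1))  # [(('bread', 'milk'), 2)]
```

On three tiny baskets the two versions are indistinguishable; the difference only shows once the item catalogue grows, which is exactly when you start paying by the processing hour.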

"we are craftsmen and craftswomen"

But really, since when did our job look like that of a craftsman (a carpenter)? We are ninjas? And we do code katas to keep our skills (or swords) sharp? This has all gone too far into the world of fantasy. The World of Warcraft. This is now a full-blown New Age religion.

What utter rubbish.

*     *     *

Now, back on earth: the languages of the 90s and early 2000s are on the decline. Java, C# and C++ are all on the decline. But they are being replaced by other languages such as Scala, right? I leave that to you to decide based on the diagram below.
Google Trends for "Java", "Scala", "C#" and "Python Programming" (the last so that it does not get mixed up with Python the snake) - Source: Google
The only counter-trend is Python. The recent rise in Python's popularity is what I call the "rise of the scientific programmer", and that is just one of the signs. Python is a very popular language in academia. It is easy to pick up, works everywhere and has some functional features that make it terse. But that is not all: it sits on top of a huge wealth of scientific libraries, and it can talk to Java and C as well. Industry innovations have started to come straight from universities. We have gone from the early 2000s, when academia seemed completely irrelevant, to now, when it leads innovation. Spark, and with it PySpark, came out of UC Berkeley's AMPLab. Many of the contributors to Hadoop and its wide ecosystem are in academia.

We are now in need of people who can argue scientifically about algorithms and data (is coding anything but code + data?) and who can implement an algorithm given the paper or its mathematical notation. And guess what, this is the trend for jobs mentioning "Machine Learning":
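As a small, hypothetical example of going from mathematical notation to code, take cosine similarity, a measure often used in recommendation work: cos(u, v) = Σᵢ uᵢvᵢ / (√(Σᵢ uᵢ²) · √(Σᵢ vᵢ²)). The formula is chosen here purely for illustration; the point is that each term of the notation maps onto one line of the Python below, and that the edge case the notation hides (a zero vector makes the denominator zero) has to be handled explicitly.

```python
import math


def cosine_similarity(u, v):
    """cos(u, v) = sum_i u_i * v_i / (sqrt(sum_i u_i**2) * sqrt(sum_i v_i**2))"""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(u_i * v_i for u_i, v_i in zip(u, v))  # numerator: sum_i u_i * v_i
    norm_u = math.sqrt(sum(u_i ** 2 for u_i in u))  # sqrt(sum_i u_i^2)
    norm_v = math.sqrt(sum(v_i ** 2 for v_i in v))  # sqrt(sum_i v_i^2)
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


print(cosine_similarity([1, 2, 0], [2, 4, 0]))  # parallel vectors: approximately 1.0
print(cosine_similarity([1, 0], [0, 1]))  # 0.0 (orthogonal vectors)
```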
Trend of jobs containing "Machine Learning" - Source: ITJobsWatch

And this is really not just Hadoop. According to the source above, machine learning jobs rose 41% from 2013 to 2014, while Hadoop jobs rose only 16%.

This Deep Learning thing is real. It is already here. All the existing algorithms need to be polished and integrated with the new concepts, and some will simply be replaced. If you feed a person's interactions with a site to a deep network, it can predict with high confidence whether they are going to buy, leave or remain undecided. It can find patterns in diseases that we as humans cannot. This is what we were waiting for (and were afraid of?). Machine intelligence is here.

The scientific Programmer [And yes, it has to know more]


Now, one might say that the answer is Data Scientists. True. But first, we don't have enough of them and, second, based on first-hand experience, we need people with engineering rigour to produce production-ready software, something that some Data Scientists certainly have but not all. So I feel that a programmer turned statistician can build more robust software than a statistician turned programmer. We need people who understand what it takes to build software that you can put in front of millions of customers. People who understand linear scalability, SLAs, monitoring and architectural constraints.

*     *     *

The horizon is shifting.

We can pick a new language (be it Go, Haskell, Julia, Rust, Elixir or Erlang) and start reinventing the wheel, starting again from pretty much the same scratch because, hey, this is easy now: we have done it before and don't have to think. We can pick a new, albeit cleaner, abstraction and re-implement thousands of hours of hard work and sweat that we and the community have put in, because, hey, we can. We can rewrite the same HTTP pipeline in thousands of different ways, be it Ruby on Rails, Sinatra, Nancy, ASP.NET Web API or Flask, and never be happy with what we have achieved. And stay happy that we are striving for perfection, that unicorn. We can argue about how to version APIs and about how RESTful or un-RESTful a service is. We can mull over the pettiest of things, such as semicolons or the gender of a pronoun, and let insanely clever people leave our community. We can exchange the worst of words over "females in the industry" while we are more or less saying the same thing. Too much drama.

But soon this will be no good. Not good enough. We have got to grow up, go back to school and relearn maths, statistics and scientific reasoning in general. We need to man up and relearn that being a good coder has nothing to do with the number of stickers on the back of your Mac. It is all science: we come from a long line of scientists, and we have got to live up to our heritage.

We need to go out and build the novelties of the second half of the decade. This is what I hope to be able to do.