Tuesday, 14 June 2016

After all, it might not matter - A commentary on the status of .NET


Do you know what the most menacing nightmare was for a peasant soldier in Medieval wars? The approach of a knight.

The approach of a knight - a peasant soldier's nightmare [image source]

Famous for gallantry and bravery, armed to the teeth and having many years of training and battle experience, knights were the ultimate war machine for the better part of the Medieval era. The likelihood of survival for a peasant soldier in an encounter with a knight was very small: he had to somehow deflect or evade the attack of the knight's sword or lance while wielding a heavy sword of his own, landing the blow at exactly the right moment as the knight passed. Not many peasants had the training or prowess to do so.


Appearing around 1000 AD, the dominance of knights started following the conquest of William of Normandy in the 11th century and reached its height in the 14th century:
“When the 14th century began, knights were as convinced as they had always been that they were the topmost warriors in the world, that they were invincible against other soldiers and were destined to remain so forever… To battle and win renown against other knights was regarded as the supreme knightly occupation” [Knights and the Age of Chivalry,1974]
And then something happened. Something that changed the military combat for the centuries to come: the projectile weapons.
“During the fifteenth century the knight was more and more often confronted by disciplined and better equipped professional soldiers who were armed with a variety of weapons capable of piercing and crushing the best products of the armourer’s workshop: the Swiss with their halberds, the English with their bills and long-bows, the French with their glaives and the Flemings with their hand guns” [Arms and Armor of the Medieval Knight: An Illustrated History of Weaponry in the Middle Ages, 1988]
The development of the longsword had made the knight's attack more effective, but no degree of training or improved plate armour could stop the rise of projectile weapons:
“Armorers could certainly have made the breastplates thick enough to withstand arrows and bolts from longbows and crossbows, but the knights could not have carried such a weight around all day in the summer time without dying of heat stroke.”
And the final blow was the handguns:
“The use of hand guns provided the final factor in the inevitable process which would render armor obsolete” [Arms and Armor of the Medieval Knight: An Illustrated History of Weaponry in the Middle Ages, 1988]
And with the advent of arbalests, the importance of lifelong training disappeared, since "an inexperienced arbalestier could use one to kill a knight who had a lifetime of training".

Projectile weapons [image source]

Over the course of the century, knighthood gradually disappeared from the face of the earth.

A paradigm shift. A disruption.

*       *       *

After the big promise of web 1.0 was not delivered, resulting in the .com crash of 2000-2001, the development of robust RPC technologies combined with better languages and tooling gradually rose to fulfill the same promise in web 2.0. On the enterprise front, the need to reduce cost by automating business processes led to the growth of IT departments in virtually any company that hoped to survive the 2000s.

In small-to-medium enterprises, the solutions almost invariably involved some form of a database in the backend, storing the results of CRUD operations performed on data entry forms. The need for reporting on those databases resulted in the creation of Business Intelligence functions employing more and more SQL experts.

With the rise of e-Commerce, most companies needed an online presence and the ability to offer some form of shopping experience online. On the other hand, to reduce the cost of postage and paper, companies started offering account management online.

Whether SOA or not, these systems functioned pretty well for the limited functionality they were offering. The important skills the developers of these systems needed were a good command of the language used, object-oriented design principles (e.g. SOLID), TDD and knowledge of agile principles and processes. In terms of scalability and performance, these systems were rarely, if ever, pressed hard enough to break - even sticky sessions could work as long as you had enough servers (it was often said "we are not Google or Facebook"). Obviously availability suffered, but downtime was something businesses had got used to, and it was accepted as the general failure of IT.

True, some of these systems were actually “lifted and shifted” to the cloud, but in reality not much had changed from the naive solutions of the early 2000s. And I call these systems The Simpleton Swamps.

Did you see what was lacking in all of the above? Distributed Computing.

*       *       *

It is a fair question that we need to ask ourselves: what were we, as the .NET community, doing during the last 10 years of innovation? The first wave of innovation was the publication of the revolutionary papers on BigTable and Dynamo, which later resulted in the emergence of the NoSQL movement with Apache Cassandra, Riak and Redis (and later Elasticsearch). [During this time I guess we were busy with WPF and Silverlight. Where are they now?]

The second wave was the Big Data revolution with the Apache Hadoop ecosystem (HDFS, Pig, Hive, Mahout, Flume, HBase). [I guess we were doing Windows Phone development and building Metro UIs back then. Where are they now?]

The third wave started with Kafka (and the streaming solutions that followed), Grid Computing platforms such as YARN and Mesos, and the extended Big Data family such as Spark, Storm, Impala and Drill - too many to name. In the meantime, Machine Learning became mainstream and the success of Deep Learning brought yet another dimension to the industry. [I guess we were rebuilding our web stack with the Katana project. Where is it now?]

And finally we have the Docker family and extended Grid Computing (registry, discovery and orchestration) software such as DCOS, Kubernetes, Marathon, Consul, etcd… Also the logging/monitoring stacks such as Kibana, Grafana, InfluxDB, etc, which emerged along the way as an essential ingredient of any such serious venture. The point is that neither the creators nor the consumers of these frameworks could do any of this without in-depth knowledge of Distributed Computing. These platforms are not built to shield you from it, but merely to empower you to make the right decisions without having to implement a consensus algorithm from scratch or deal with the subtleties of building a gossip protocol.


And what is it that we have been doing recently? Well, I guess we were rebuilding our stacks again with #vNext aka #DNX aka #aspnetcore. Where are they now? Well, actually a release is coming soon: the 27th of June to be exact. But anyone who has been following the events closely knows that, due to recent changes in direction, we are still - give or take - 9 to 18 months away from a stable platform that can be built upon.

So a big storm of paradigm shifts swept the whole industry and we are still tinkering with our simpleton swamps. Please just have a look at this big list: only a single one of them is in C#, Greg Young's EventStore. And by looking at the list you see the same pattern, the same shifts in focus.

The .NET ecosystem is dangerously oblivious to distributed computing. True, we have recent exceptions such as Akka.NET (a port from the JVM) or Orleans, but they have not really penetrated and infused the ecosystem. If all we want to do is simply build front-end APIs (akin to nodejs) or cross-platform native apps (using Xamarin Studio), that is not a problem. But if we are not supposed to build a sizeable chunk of the backend services, let's make that clear here.

*       *       *

Actually, there is a fair amount of distributed computing happening in .NET. Over the last 7 years Microsoft has built a significant number of services that are out to compete with the big list mentioned above: Azure Table Storage (arguably a BigTable implementation), Azure Blob Storage (Amazon Dynamo?) and EventHub (rubbing shoulders with Kafka). Also a highly-available RDBMS (SQL Azure), a Message Broker (Azure Service Bus) and a consensus implementation (Service Fabric). There is plenty of Machine Learning as well, and, although slowly, Microsoft is picking up on Grid Computing - witness the alliance with Mesosphere and the DCOS offering on Azure.

But none of these have been open sourced. True, Amazon does not open source its bread-and-butter cloud either. But AWS has mainly been an IaaS offering, while Azure is banking on its PaaS capabilities, making Distributed Computing easy for its predominantly .NET consumers. It feels as if Microsoft is saying: you know, let me deal with the really hard stuff, but for sure, I will leave a button in Visual Studio so you can deploy it to Azure.


At points it feels as if Microsoft, as the Lord of the .NET stack fiefdom, having discovered gunpowder, is sending us knights and peasant soldiers to attack with our lances, axes and swords while keeping the gunpowder weapons and the science behind them safely locked up for the protection of the castle. The .NET community is, to a degree, contributing to #dotnetcore while also waiting for it to be the Silver Bullet it has been promised to be, revolutionising and disrupting the entire stack. But ask yourself: when was the last time that better abstractions and tooling brought about a disruption? The knight is dead, gunpowder has changed the horizon, yet there seem to be no ears to hear.

Fiefdom of .NET stack
We cannot fault any business entity for keeping its trade secrets. But if the soldiers fall, ultimately the castle will fall too.

In fact, a single company is not able to pull the weight of re-inventing all the emerging innovations. While the quantity of technologies that have emerged from Azure is astounding, quality has not always followed. After complaining to Microsoft about the performance of Azure Table Storage, I saw others finding the same problem, and some abandoning the Azure ship completely.


No single company is big enough to do it all by itself. Not even Microsoft.

*       *       *

I remember when we used to make fun of Java and Java developers (uninspiring, slow, Eclipse was a nightmare). They actually built most of the innovations of the last decade, from Hadoop to Elasticsearch to Storm to Kafka... In fact, looking at the top 100 Java repositories on github (minus Android Java), you find 24 distributed computing projects, 4 machine learning repos and 2 languages. In C# you get only 3 with claims to distributed computing: ServiceStack, Orleans and Akka.NET.

But maybe it is fine, we have our jobs and we focus on solving different kinds of problems? Errrm... let's look at some data.

The market share of the IIS web server has halved over the last 6 years - according to multiple independent sources [this source confirms the share was >20% in 2010].

IIS share of the market has almost halved in the last 6 years [source]

Now the market share of C# ASP.NET developer jobs is also dropping towards half of its peak of around 4%:

Job trend for C# ASP.NET developer [source]
And if you do not believe that, see another comparison with other stacks from another source:

Comparing trend of C# (dark blue) and ASP.NET (red) jobs with that of Python (yellow), Scala (green) and nodejs (blue). C# and ASP.NET dropping while the rest growing [source]

OK, that was actually nothing; what I care about more is OSS. The Open Source revolution in .NET, which had been growing at a steady pace since 2008-2009, almost reached a peak in 2012 with the ASP.NET Web API excitement and then grew at a slower pace (almost a plateau, visible on the 4M chart - see appendix). [By the way, I have had my share of these repos: 7 of those are mine.]

OSS C# project creation in Github over the last 6 years (10 stars or more). Growth slowed since 2012 and there is a marked drop after March 2015 probably due to "vNext". [Source of the data: Github]

What is worse is that the data shows that, with the announcement of #vNext aka #DNX aka #dotnetcore, there was a sharp decline in new OSS C# projects - the community is in limbo waiting for the release: people find it pointless to create OSS projects on the current platform, and the future platform is so much in flux that it is not stable enough for innovation. With the recent changes announced, it will practically take another 12-18 months for it to stabilise (some might argue 6-12 months; fair enough, take what you like). For me this is the most alarming of all.

So all is lost?

All is never lost. You can still find good COBOL or FoxPro developers, and since it is a niche market, they are usually paid very well. But the danger is losing relevance…

Practically, can Microsoft pull it off? Perhaps. I do not believe it is hopeless; I feel that with a radical change, by taking the steps below, Microsoft could materially reverse the decay:
  1. Your best community brains in Distributed Computing and Machine Learning are in the F# community; they have already built many OSS projects in both areas - sadly remaining obscure and used by only a few. Support and promote F# not just as a first-class language but as THE preferred language of the .NET stack (and by the way, wherever I said .NET stack, I meant C# and VB). Ask everyone to gradually move. I don't know why you have not done it already. I think someone somewhere in Redmond does not like it, and he/she is your biggest enemy.
  2. Open source a good part of Azure's distributed services. Let the community help you improve them. Believe me, you are behind the state of the art; frankly, no one will be looking to copy them. Who would copy Azure Table Storage rather than Cassandra?!
  3. Stop promoting one-click deployment to Azure from Visual Studio, which makes Distributed Computing look trivial. Tell them the truth, tell them it is hard, tell them few succeed and hence they need to go back and study, and forever forget about the one-button-click stuff. You are not doing them, or yourself, a favour. No one should be entrusted to deploy anything in a distributed fashion without sound knowledge of Distributed Computing.

Last word

So when I am asked whether I am optimistic about the future of .NET or about the progress of dotnetcore, I usually keep silent: we seem to be missing the point of where we need to go with .NET - a paradigm shift has been ignored by our ecosystem. True, dotnetcore will be released on the 27th but, after all, it might not matter as much as we care to think. One of the reasons we are losing to other stacks is that we are losing our relevance. We do not have all the time in the world. Time is short...

Appendix

Github Data

Gathering the data from github is possible, but due to search results being limited to 1000 and to rate-limiting, it takes a while to process. The best approach I found was to list repos by update date and keep moving up. I used a python script to gather the data.
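
For what it is worth, here is a rough C# sketch of that approach (the actual data was gathered with a Python script); it queries the GitHub search API year by year, since each search is capped at 1000 results, and the query and pacing here are only illustrative:

```csharp
using System;
using System.Net.Http;
using System.Threading;

class GithubOssCounter
{
    static void Main()
    {
        using (var client = new HttpClient())
        {
            // The GitHub API requires a User-Agent header.
            client.DefaultRequestHeaders.UserAgent.ParseAdd("oss-stats-sketch");
            for (var year = 2010; year <= 2016; year++)
            {
                // One query per year keeps each result set under the 1000-result cap.
                var url = "https://api.github.com/search/repositories" +
                          "?q=language:csharp+created:" + year + "-01-01.." + year + "-12-31" +
                          "&sort=updated&per_page=100";
                var json = client.GetStringAsync(url).Result;
                // In practice, parse total_count (and page through items) rather than printing raw JSON.
                Console.WriteLine(year + ": " + json.Substring(0, Math.Min(80, json.Length)));
                Thread.Sleep(TimeSpan.FromSeconds(10)); // stay well under the search rate limit
            }
        }
    }
}
```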

It is sensible to use the number of stars as the bar for the quality and importance of Github projects. But choosing the threshold is not easy, and there is usually a lag between the creation of a project and its gaining popularity. That is why the threshold has been chosen very low. But if you think the drop in the creation of C# projects on Github was due to this lag, think again. Here is the chart of all C# projects regardless of their stars (0 stars and more):


All C# projects in github (0 stars and more) - marked drop in early 2015 and beyond

F# is showing healthy growth, but the number of projects and stars is much smaller than that of C#. Hence here we look at projects with 3 stars or more:


OSS F# projects in Github - 3 stars or more
Projects with 0 stars or more (possibly showing people starting to pick it up and play with it) are looking very healthy:


All F# projects regardless of stars - steady rise.


Data is available for download: C# here and F# here

My previous predictions

This is actually my second post of this nature. I wrote one 2.5 years ago, raising alarm bells about the lack of innovation in .NET and predicting 4 things that would happen within 5 years (that is, 2.5 years from now):
  1. All Data problems will be Big Data problems
  2. Any server-side code that cannot be horizontally scaled is gonna die
  3. Data locality will still be an issue so technologies closer to data will prevail
  4. We need 10x or 100x more data scientists and AI specialists
Judge for yourself...


Deleted section

For the sake of brevity, I had to delete this section, but it puts into context how we came to have many more hyperscale companies:

"In the 2000s, not many had the problem of scale. We had Google, Yahoo and Amazon, and later Facebook and Twitter. These companies had to solve serious computing problems in terms of scalability and availability that on one hand lead to the Big Data innovations and on the other hand made Grid Computing more accessible.

By commoditising hardware, cloud computing allowed companies to experiment with problems of scale and innovate to achieve high availability. The results have been completely re-platformed enterprises (such as Netflix) and the emergence of a new breed of hyperscale startups such as LinkedIn, Spotify, Airbnb, Uber, Gilt and Etsy. The rise of companies building software to solve problems related to these architectures, such as Hashicorp, Docker and Mesosphere, has added another dimension to all this.

And last but not least is the importance of a close relationship between academia and the industry, which seems to be happening after a long (and sad) hiatus. This has led to many academic lecturers acting as Chief Scientists and the like, influencing the science underlying the disruptive changes.

There was a paradigm shift here. Did you see it?"

Friday, 13 May 2016

XML or JSON, and that is not the question

So in the last couple of days, our .NET community has shown some strong reactions to the announcements in the ASP.NET team stand-up. While the Ruby and, more recently, node communities are known for endless dramas over arguably petty issues, it felt that the .NET community was also capable of throwing tantrums. For those who are outside the .NET community or have not caught up with the news: the .NET/ASP.NET team has decided to revert project.json (JSON) in #DotNetCore back to *.csproj/*.vbproj (XML) and resurrect msbuild. So was it petty in the end?

Some believed it was: they argued that all that had changed was the format of the project file and that the drama associated with it was excessive. They also pointed out that all the goodness of project.json would be ported to the familiar yet different *.csproj. I call this group the loyalists:

On the other hand, some were upset by the return of msbuild to the story of .NET development. This portion of the community argued that the 15-year-old msbuild has no place in modern development. They had been celebrating the death of this technology, not knowing it was never really dead - I call them msbuild-antagonists. The first group (the loyalists), on the other hand, were flagging that msbuild would be improved and the experience would be modernised.

Then there was another group of people who were frustrated that this decision had been made despite the community feedback, and solely based on the feedback of "some customers" behind closed doors. I call them OSS-apologetics, and their main issue was the seemingly low weight of community feedback when it comes to the internal decisions that Microsoft takes as a commercial enterprise - especially in light of the fact that project.json was announced almost 2 years ago and it was very late to change it.

And there was yet another group who had invested time and effort (== money?) in building projects and tooling (some of it commercial), and they felt that the rug had been pulled from underneath them and all those hours had gone to waste - for lack of a better phrase I call them loss-bearers. They were even more upset to see their loss accounted for as a learning process:
Obviously there is not a great answer for them, but it is usually said that only a very minor part of the whole community has been living on the bleeding edge, and they knew a change like this could be coming any minute, as mentioned on the stand-up:


Where do I stand?

I stand somewhere in between. I cannot quite agree with the loyalists, since it is not just a question of format. On the other hand, I do not bear any losses, since I decided a long time ago that I would skip the betas and pick it up when the train of changes slows down - something not yet in sight.

But I do not think any of the above captures the essence of what has been happening recently. I am of the belief that this decision, along with the previous disruptive ones, has been an important and shrewd business decision to save the day and contain losses for Microsoft as a commercial platform - and no one can blame Microsoft for doing that.

I had warned time and time again that the huge amount of change in the API and tooling, with no easy migration path, would divide the community into a niche progressive #DotNetCore minority and a mainstream commercial majority who would stay on .NET Fx and need years (not months) to move on to #DotNetCore - if at all. And this could potentially create a Python-2-vs-3-like divide in the community.

The crossing from the old .NET to the new #DotNetCore (seemingly similar on the surface yet wildly different at heart) would not be dissimilar to the crossing from VB6 to .NET. And what makes it worse is that, unlike then, there are many viable alternative OSS stacks (back then there were only Java and C/C++). This could have meant that the mainstream majority might in fact decide to try an altogether different platform.

So Microsoft as a business entity had to step in and, albeit late, fix the few yet key mistakes made at the start of the project and along the way during the last 2 years:
  • Letting the ASP.NET team make platform/language decisions and implement features with clever tricks, rather than having .NET Fx bake such features into the framework itself. An example was Assembly Neutral Interfaces.
  • Ignoring the importance of an upgrade path for existing projects and customers
  • Inconsistent, confusing and ever changing layering of the stack
  • Poor and conflicting scheduling messages
  • Using Silverlight's CoreCLR for ASP.NET, resulting in a dichotomy of the runtime - something that, as far as I know, has no parallel in any other language/platform. In the most recent slides I do not see CoreCLR mentioned anymore, yet it might still be there. If it is, it will remain a technical debt to be paid later.
All in all, it has been a rough ride both for the drivers and the passengers of this journey, but I feel that clarity and cohesion are back and the long-standing issues have now been addressed.

Where could I be wrong?

My argument naturally brings these counterarguments:
  • Perhaps, had the ASP.NET team not pushed the envelope this far by single-handedly crusading to bring in modern ideas and courageous undertakings such as cross-platform support, we would have .NET 5 now instead of #DotNetCore.
  • By carrying baggage from the past (msbuild), Microsoft is extending the lifespan of its stacks, which in the short term will be beneficial to the corporation; but since it is not a clean break, in the long term it will result in dispersion of the community and the need for another redux.
It is hard to answer these arguments, since one is a hypothetical situation and the other looks well into the uncertainty of the future. I will leave it to the readers to weigh the arguments.

Last word

It is not possible to hide that none of this has been without casualties. Some confidence has been lost, the community has at times been upset, and overall it has not been all rosy as far as Microsoft's image in its OSS ventures goes. I did mention the old and new Microsoft coming head-to-head, which might not be correct, but as Satya Nadella said, culture does not change overnight.

Monday, 8 February 2016

Future of Information: the Good, Bad and Ugly of it

We are certainly at the cusp of a big revolution in human civilisation - caused by Information Technology and Machine Intelligence. There are golden moments in history that have fundamentally changed us: the late Dark Ages for Astronomy, the early Renaissance for Physics, the 1700-1800s for Chemistry, the late 1800s for Microbiology, the 1950s for transistors… and the periods get more and more compressed. It looks like a labyrinth that gets narrower as you get closer to its centre.

Without speculating on what the centre could look like, and considering that this could still be a flat line of constant progress, we need to start thinking about what the future could look like - not because it is fun, but because action could be warranted now. There is no shortage of speculation or commentary; one man can dream and fathom a far future which might or might not be close to the distant reality. And that is not the point. The point is, as I will outline below, it could be getting late to do what we need to DO. Yes, this is not a sci-fi story…

On one hand, there is nothing new under the sun, and the cycle of change has been with mankind since the beginning. We have always had the reluctant establishment fighting the wind of change promoted by the new generation.

Figure 1 - Accelerating change [Source: Wikipedia]

On the other hand, this is the first time in history that the cycle of change has been reduced to less than a generation (a generation is normally considered 20-25 years). You see, the politicians of the past had time to grow up with the changes, feel the needs, brew new ideas and come up with solutions. Likewise, nations had time to assimilate and react to the changes in terms of aspirations, occupation and direction, as the changes would not fully take effect within one person's lifetime. What about now? Only a decade ago (pre-iPhone) looks like a century back. The cycle of change already looks to be around 5-10 years [see Figure 1]. And look at our politicians: it is not a coincidence that someone like Trump can capture the imagination of a nation in the absence of visionary contenders. Politics as we know it has reached its end of life - IMHO due to the lack of serious left-wing ideas - but that is not the topic of this post. The point I am trying to make is that politicians are no longer able to propose anything but the most trivial changes, since their view of the world is limited by their lack of understanding of a whole new virtual world being created alongside this physical one, whose rules do not exist in any books.

And it is not just politics that is dropping far behind. Economics in the face of a fast cycle of change will be different too. First of all, today's financial malaise experienced in many developed countries might still be around for years to come. In an age of Keynesian economics and central intervention, characterised by low inflation, low growth and an abundance of money printed by central banks, it seems the banks are no longer relevant. The current economy is sometimes referred to as Japanisation, which was spotted back in 2011 and 5 years on feels no different. And it is no coincidence that an IMF report finds decreasing efficiency of capital in the Japanese economy - and that can be applied elsewhere. Looking at the value of bank stocks provides the glaring fact that they are remnants of institutions from the past. True, they are probably still financing mine and your mortgages, but their importance as the cornerstone of development during the previous centuries is gone. Why? Because the importance of capital is overrated in a world where there is so much of it around without finding a suitable investment. With 10-yr US Bonds at around 1.8% and the yield on the 2-yr German bund at -0.5% (!), an investment with a 2% annual return is a bargain. In fact, today's banking is characterised by piling up losses year on year (for example this and this). Looking at Citigroup's or Bank of America's 10-year chart is another witness to the same decline. In an environment where money is cheap (because of ZIRP), it cannot be the main driver of business, as money (and hence banks) is not the scarce commodity anymore. See? We did not even have to mention bitcoin, blockchain or crowdfunding.

Figure 2 - Deutsche Bank Stock since year 2000 [Source: Yahoo Finance]
But beyond our myopic view of the economy, focused on the current climate, there is a line of argument looking at it from a different angle and far into the future, and seeing the same pattern. In one interesting essay on the economics of the future, the authors find an ever-decreasing role for capital. While mentioning the importance of suitable labour (in terms of the geek labour force, currently the scarcest resource, resulting in companies not reaching their full potential) could be helpful, it is evident that capital is no longer the issue.

In essence, what all this means is that if historically the banks, as the institutions controlling capital, had the upper hand, in the days to come it will be those controlling Information. The future of our civilisation will revolve around the conflicts to control Information: on one hand by the state, on the other by the institutions, and finally by us and NGOs fighting for privacy.

The Good

The rate of data acquisition has been described as exponential. This has mainly been with regard to the virtual world and our surroundings, but very soon it will be about us. From our exact whereabouts to our blood pressure to various hormone levels and perhaps our emotional status, all will be available very soon. A lot of this is already possible, also known as the quantified self. But it is only a matter of time before this applies to everyone.

It is not difficult to think what this can do to promote health and disease prevention. Even now, those who suffer from heart arrhythmias carry devices that can defibrillate their heart if a deadly ventricular fibrillation occurs. Blood pressure, sugar levels, various hormone levels and all sorts of measurable elements can be tracked. Cancerous cells can be detected in the blood (and their source identified) well before they can grow and spread. Plaques in the blood vessels will be identified by micro devices circulating in the bloodstream, and any serious stenosis can be spotted. Clots in the heart or brain vessels (resulting in stroke) can be detected at the time of formation by a device releasing thrombolytic agents, immediately alleviating the problem. Going for an extra medical diagnosis could be very similar to how our cars are serviced today: a device gets connected to the myriad of micro devices in your body and a full picture of your health status is immediately visible to the medical staff. You could be walking on a road, or travelling in a car, witnessing a rare yet horrific accident (would there be accidents?), and the medical team would know whether you would suffer from PTSD and whether you would need certain therapies for it - they would know from your various measurements where you were and whether you witnessed the incident.

And of course, this is only the medical side. The way we work, entertain ourselves and interact with the outside world will be completely different. It is not very hard to imagine what it will be like: one cheesy way is to just take everything that you do at home and think of adding automation, scheduling or verbal commands to it - from making coffee, to checking information, to entertainment. But I will refrain from limiting your view by my imagination. What is clear is that the presence and impact of the virtual world will be much more pronounced.

At the end of the day, it is all about the extra information coupled with machine intelligence.

The Bad

This section is not speculating on what it could look like. We can all go and read any of the dystopian books; there are many to choose from, and the future could be like any of them or none.

But instead it is about simple reasoning: taking what we know, projecting the rate of change and looking at what we might get. It is very reasonable to think that machine intelligence will reach a point where it can reason very efficiently with a pretty good rate of success. On the other hand, it is reasonable to think that there will be many, many data points for every person. If we as humans can be represented as intelligent machines that turn data plus our characters into decisions, then with our characters (historical data) known to the machines and the input we perceive already available via the many agents present in and around us, it is not unreasonable to think that such systems could estimate our decisions. So when you think of advertising, this gets really frightening, since with enough information you would know pretty well what the reaction will be. And it is about how much - how much money do you have to decide on…

You see, the fight for your disposable income (the part of your income that you can choose how to spend) could not be more fierce: it can make or break companies in the future. The future of advertising and the fight for this disposable income is what makes Eric Schmidt come out and almost say there won't be online privacy in the future:
"Some governments will consider it too risky to have thousands of anonymous, untraceable and unverified citizens - hidden people - they'll want to know who is associated with each online account... Within search results, information tied to verified online profiles will be ranked higher than content without... even the most fascinating content, if tied to an anonymous profile simply won't be seen because of its excessive low ranking." - The New Digital Age / page 33
And when you see how the top four companies have already moved into the media industry, you get it. Your iPhone selects a handful of news items for you to see, Facebook controls your timeline, Amazon is a full-blown media company and Google controls YouTube, which has overtaken conventional media for the entertainment of the millennials. We must reiterate that none of these companies are by nature evil, but when it is a choice between you and their income, it is natural that they will pick the latter. And guess what: they have what the state wants too.

Let's revisit banking for a moment to clarify the point. Banks have what politicians need: capital to fund ever more expensive political campaigns. And the state has what the banks need: regulation, or rather de-regulation, which the banks thought would help them prosper because they could enter the stock market's casino with high street bank deposits (which ironically have been the source of their losses). And above all, the state catches the banks if they fall, as it did in 2008. The ECB uses its various funds (EFSF, ESM, etc) to keep the banks in Greece and Italy (and others - soon Germany?) afloat. And it catches the stock market when it falls, as it has constantly done with various QE measures, interest rate cuts, money printing, etc. In such a financial milieu, where there are cushions all along the path, there is no real risk anymore, leading to irresponsible behaviour by the banks. And the party should never end: no wonder Obama could not move an inch towards bringing back some regulation. The heads of the state's financial institutions come from the ranks of ex-CEOs of the likes of Goldman Sachs. This alliance of state and banking has contributed to the growth of inequality (ultimately leading to modern slavery) and no wonder the state is not bothered: the state is made up of politicians in alliance with the bankers - and of the bankers themselves.

And what does this have to do with the future of information? Exactly the same thing can happen in the future, only with the state and the heads of the companies owning the information. If capital no longer holds the power and information does, then the alliance of state and info bosses will lead to modern slavery. States control legislation, and information companies own private data and control the media: each one has what the other needs.

The Ugly

Why ugly? Because we are already there, almost. First of all, states have started gathering and controlling information - the NSA is just an example. States have started requesting that the companies owning the data hand it over. Legislation is under way to prevent effective encryption. This could all look harmless while we are busy checking our twitter and facebook timelines, but it has already started to freak me out: companies have already started thinking and acting in this area. As we saw, Google's Eric Schmidt is portraying a future where anonymity has little value; either you agree, or otherwise you need to speak out.

Going back to politics, we do not have politicians or lawyers with a correct understanding of the technology and its implications, and it is not their fault: they were not prepared for it. But soon, very soon, we will have heads of companies turning politicians - very much like the CEOs of Goldman Sachs, and I do not mean it necessarily in a bad way. Why? Because the power will be in the hands of the geeks, and by the same token we need strong opposition; we need politicians among us to rise to the occasion and lead us safely into a future where we have meaningful legislation protecting our privacy while allowing safe data sharing. The problem is, we have had 2500 years or so to think about democracy and government in the physical world (from the Greek and Roman philosophers to now), but we are confronted with a virtual world where the ethics and philosophy are not well-defined and do not quite map to the physical world we live in - yet every lawmaker is trying to shoehorn it into the only thing they know about. Enough is enough.

But where do we start? My point in this post/essay has been to ask the questions; I do not claim to have the answers. We have not yet explored the problems well enough to come up with the right answers... we need the help of think tanks, many of which I see rising amongst us.

We are surrounded by questions whose answers (like all other aspects of our industry) are tangled up with our personal opinions. But when it comes to the court of law, what doesn't matter is your or my opinion. Is Edward Snowden a hero or a traitor? Was Julian Assange a visionary or an opportunist? What is ethical hacking, and how is it different from unethical - in fact, could hacking ever be legitimate? Is Anonymous a bunch of criminals or a collection of selfless vigilantes working for the betterment of the virtual world in the absence of a legal alternative? What is the right to privacy, and is there a right to be anonymous?

Needless to say, there could be some quick wins. I think defining privacy and data sharing is one of the key elements. One improvement could be turning the small-print legal mumbo jumbo of terms and conditions into bullet-point fact sheets. Similar to the "Key Facts Sheet" for mortgages, where the APR and various fees are clearly defined, we could enforce a privacy fact sheet where answers to questions such as "My anonymised/non-anonymised data might/might not be sold", "I can ask for my records to be physically erased", "My personal information can/cannot be given to third parties", etc. are clearly stated for non-technical consumers, as well as for the rest of us who rarely read the terms and conditions.

Whatever the solutions, we need to start… now! And it could already be late.

Tuesday, 24 November 2015

Interactive DataViz: Rock albums by the genre since 1960


Interactive DataViz here: http://wiki-rock.azurewebsites.net/top10-album-genres.html
Last week I presented a talk at #BuildStuffLT titled "From Power Chords to the Power of Models", which was a study of Rock Music by way of Data Mining, Mathematical Modelling and Machine Learning. It is such a fun subject to explore, especially for me, as Rock Music has been one of my passions since I was a kid.

The slides from the talk are available and the videos will be available soon (although my performance during the talk was suboptimal due to lack of sleep, a problem which seemed to be shared by many at the event). BuildStuffLT is a great event, highly recommended if you have never been. It is a software conference with well-known speakers such as Michael Feathers, Randy Shoup, Venkat Subramaniam and Pieter Hintjens, and this year it hosted Melvin Conway (yeah, the visionary who came up with Conway's law in 1968) with really mind-stimulating talks. You also get a variety of other speakers with very interesting talks.

I will be presenting my talk at CodeMash 2016 so I cannot share all of the material yet, but I think this interactive DataViz alone is worth many, many slides in a single representation. I can see myself spending hours just looking at the trends, artist names and their album covers - yeah, this is how much I love Rock Music and its history - but even for others this could be fun, and it might also help you discover something new to listen to.

DataViz

This is an interactive, percentage-based stacked area chart of the top 10 genres per year, since 1960, when Rock Music as we know it started to appear. That is a mouthful, but basically for every year the top 10 genres are selected, so the dataset contains only those Rock (or related) genres that at some point were among the top 10. You can access it here or simply clone the GitHub repo (see below) and host your own.


The data was collected from Wikipedia by capturing Rock albums and then processing their genres, finding the top 10 in every year and presenting them in a chart - I am using Highcharts, which is really powerful, simple to use and has a non-commercial license too. The data itself I have shared, so you can run your own DataViz if you want to. The license for the data is of course Wikipedia's, which covers these purposes.
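
The top-10-per-year computation is simple enough; here is a small sketch of the idea (not the actual processing code from the repo), assuming each album record carries its year and its list of genres:

```csharp
using System.Collections.Generic;
using System.Linq;

class Album
{
    public int Year;
    public string[] Genres; // e.g. { "Blues Rock", "Rock" } - an album counts once per genre
}

static class TopGenres
{
    // For every year, count albums per genre and keep the 10 most frequent genres.
    public static Dictionary<int, List<string>> Top10PerYear(IEnumerable<Album> albums)
    {
        return albums
            .SelectMany(a => a.Genres.Select(g => new { a.Year, Genre = g }))
            .GroupBy(x => x.Year)
            .ToDictionary(
                byYear => byYear.Key,
                byYear => byYear.GroupBy(x => x.Genre)
                                .OrderByDescending(g => g.Count())
                                .Take(10)
                                .Select(g => g.Key)
                                .ToList());
    }
}
```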



I highly recommend you start with the visualisation with "All Unselected" (Figure 2) and then select a genre and visualise its rise and fall through history.


Then you can click on a point (year/genre) to list all albums of that genre for that year (Figure 3). Please note that even when the chart shows 0%, there could be some albums for that genre - they are from a year in which that genre was not among the top 10.

Looking at the data in a different way

Here are 50 years of Rock (starting from 1965) with selected albums:



Things to bear in mind

  • The data has been captured by collecting all albums from all links found in documents traversed from the list of rock genres through to the artist pages. As far as I know, the list includes all albums by the major (and minor) rock artists - according to Wikipedia. If you find a missing album (or artist), please let me know.
  • Every album contributes all its genres to the list. This means that if it has the genres "Blues Rock" and "Rock", it will be counted once for each of its genres and you can find it under both Rock and Blues Rock.
  • The data has some oddities: sometimes an album occurs more than once, mainly due to nuances of the data in Wikipedia, multiple entries (URLs) for the same document, etc. The data has already been cleansed through several processes and these oddities do not materially change the results. In the future, however, there are things that can be done to remove the remaining oddities.
  • Again, it is highly recommended that you click the "Unselect All" button and then click on the genres you are interested in, one by one, and explore the names of the albums.
  • Clicking "Select All" or "Unselect All" takes a bit too much time. I am sure there is an easy solution (turning rendering off while changing the state) but I have not been able to find it. Your PRs are welcome!
  • There are some genres in the list which are not really Rock genres. These genres would have been mentioned alongside a rock genre on the album cover, or belong to a not-so-much-rock album by an otherwise Rock artist.

Code and Data

All code and data are published on GitHub. The code uses Highcharts, knockoutjs and the Foundation UI framework. Have fun!

Saturday, 19 September 2015

The Rule of "The Most Accessible" and how it can make you feel better

I remember when I was a kid, I watched a documentary on how to catch a monkey. Basically you dig a hole in a tree, big enough for a stretched monkey hand to go in but not so big that a fisted hand can come out - then you sit and watch.

Source: http://www.tarekcoaching.com/blog/dont-fall-in-the-monkey-trap/


Apart from holes, buttons and levers (things that can be pushed) are concepts that are very easy for animals to learn. Without getting too Freudian, furrows and protrusions (holes and buttons) are among the first concepts we learn.

This is nice when dealing with animals. On the other hand, it can be dangerous - especially for kids. A meat mincer machine has exactly these two: a hole and a button. Without referring to the disturbing images of its victims on the internet, it is imaginable what can happen - many children sadly lose their fingers or hands this way. The safety of these machines is much better now, but I grew up with a kid whose right hand was left as pretty much a claw after such an accident.

Now, the point is: when confronted with entities that we encounter for the first time, or whose complexity we do not fully appreciate, we approach them from the most accessible angle we can understand. If this phenomenon did not have a name (and it is not bikeshedding - that is different), now it has one: the Rule of The Most Accessible (TMA). The problem is, as the examples tried to illustrate, it is dangerous. Or it can be a sign of mediocrity.

*    *    *

Now what does it have to do with our geeky world?

Have you noticed that in some projects critical bugs go unnoticed, yet half a dozen bugs are raised for the layout being one pixel out? Have you written a document and the main feedback you got was about the font being used? Have you attended a technical review meeting in which the only comment you got was on the icons of your diagram? Have you seen a performance test project that focuses only on testing the API because it is the easiest to test? Have you witnessed a code review that results only in puny little comments on naming?

When I say it can be a sign of mediocrity, I think you know now what I am talking about. I cannot describe my frustration when we replace the most critical with the most accessible. And I bet you feel the same.

Resist TMA in you

You know how bad it is when someone falls into the TMA trap? Then don't do it yourself. Take your time, and approach from the angle that matters most. If you cannot comment on anything worthwhile, then don't. Don't be a hypocrite.

Ask for more time, break down the complexity and get a sense of the critical parts. And then comment.

Fight TMA in others

Someone does TMA to you? Call it out to their face. Remind them that we need to focus on the critical aspects first. Ask them not to waste time on petty aspects of the problem.

If it cannot be fought, laugh inside

And I guess we all have cases where the person committing TMA is a manager so high up that fighting TMA can have unpleasant consequences. Then you know what? Just remember the face of the monkey in the cartoon above and laugh inside. It will certainly make you feel better :)



Thursday, 27 August 2015

No-nonsense Azure Monitoring in 20 Minutes (maybe 21) using ECK stack

The Azure platform has been around for 6 years now and is going from strength to strength. With the release of many different services and options (and sometimes too many services), it is now difficult to think of a technology, tool or paradigm which is not "there" - albeit perhaps not exactly in the shape that you had wished for. Having said that, monitoring - even by the admission of some of the product teams - has not been the strongest of Azure's features. Sadly, when building cloud systems, monitoring/telemetry is not a feature: it is a must.

I do not want to rant for hours about why and how a product that is mainly built for external customers differs from an internal one which, on the strength of its success, gets packaged up and released (as is the case with AWS), but a consistent and working telemetry option in Azure is pretty much missing - there are bits and pieces here and there but not a consolidated story. I am informed that even internal teams within Microsoft had to build their own monitoring solutions (something similar to what I am about to describe further down). And as the last piece of rant, let me tell you: whoever designed this chart with this puny level of data resolution must be punished with the most severe penalty ever known to man - actually having to use it, to investigate a production issue.

A 7-day chart, with 14 data points. Whoever designed this UI should be punished with the most severe penalty known to man ... actually using it - to investigate a production issue.

What are you on about?

Well, if you have used Azure to deliver any serious solution and then tried to do any sort of support, investigation or root cause analysis without using one of the paid telemetry solutions (and even with them), painfully browsing through gigs of data in Table Storage, you will know the pain. Yeah, that's what I am talking about! I know you have been there; me too.

And here, I am presenting a solution to the telemetry problem that can give you these kinds of sexy charts, very quickly, on top of your existing Azure WAD tables (and other data sources) - tried, tested and working, requiring some setup and very little maintenance.


If you are already familiar with the ELK (Elasticsearch, LogStash and Kibana) stack, you might be saying you have already got that. True. But while LogStash is great and has many groks, it has very much been designed with the Linux mindset: just a daemon running locally on your box/VM, reading your syslog and delivering it over to Elasticsearch. The way Azure works is totally different: the local monitoring agent running on the VM keeps shovelling your data to durable and highly available storage (Table or Blob) - which I quite like. With VMs being essentially ephemeral, it makes a lot of sense to master your logging data outside the boxes and to read it from those storages. Now, that is all well and good, but when you have many instances of the same role (say you have scaled to 10 nodes) writing to the same storage, the data is usually much bigger than what a single process can handle, and the shovelling needs to be scaled, which requires centralised scheduling.

The gist of it is that I am offering ECK (Elasticsearch, ConveyorBelt and Kibana), an alternative to LogStash that is Azure-friendly (it typically runs in a Worker Role), can tap into your existing WAD logs (as well as custom ones) out of the box, and with a push of a button can be horizontally scaled to N to handle the load for all your projects - and for your enterprise, if you work for one. And it is open source, and can be extended to shovel data from other sources too.

At its core, ConveyorBelt employs a clustering mechanism that can break down the work into chunks (scheduling), keep a pointer to the last scheduled point, push data to Elasticsearch in parallel and in batches, and gracefully retry the work if it fails. It is headless, so any node can fail, be shut down, restarted, added or removed - without affecting the integrity of the cluster. All of this without waking you up at night, and basically, after a few days, making you forget it ever existed. In the enterprise I work for, we use just 3 medium instances to power analytics from 70 different production Storage Tables (and blobs).

Basic Concepts

Before you set up your own ConveyorBelt (CB), it is better to know a few concepts and facts.

First of all, there is a one-to-one mapping between an Elasticsearch cluster and a ConveyorBelt cluster. ConveyorBelt has a list of DiagnosticSources, typically stored in Azure Table Storage, which contain all the data (and state) pertaining to a source. A source is typically a storage Table, or a blob folder containing diagnostic (or other) data - but CB is extensible to accept other data stores such as SQL, files or even Elasticsearch itself (yes, if you ever wanted to copy data from one ES cluster to another). A DiagnosticSource contains the connection information CB needs to connect. CB continuously breaks down the work (schedules) for its DiagnosticSources and keeps updating the LastOffset.

Once the work is broken down into bite-size chunks, the chunks are picked up by actors (CB internally uses BeeHive) and the data within each chunk is pushed up to your Elasticsearch cluster. There is usually a delay between data being captured and being copied to storage (something that you typically set in the Azure configuration: how often to copy the data), so you set a Grace Period after which, if the data isn't there, it is assumed it never will be. Your Elasticsearch data will usually be behind realtime by the Grace Period. If you left everything at the defaults, Azure copies data every minute, in which case a Grace Period of 3-5 minutes is safe. For IIS logs this is usually longer (I use 15-20 minutes).
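
To make the scheduling idea concrete, here is a minimal sketch (not the actual ConveyorBelt code) of how the span between the last offset and "now minus the Grace Period" could be broken into minute-sized chunks for actors to pick up:

```csharp
using System;
using System.Collections.Generic;

public static class ChunkScheduler
{
    // Breaks the span between the last offset and (now - grace period) into
    // minute-sized chunks; each chunk is an independent unit of work for an actor.
    public static IEnumerable<Tuple<DateTimeOffset, DateTimeOffset>> Schedule(
        DateTimeOffset lastOffset, TimeSpan gracePeriod)
    {
        var ceiling = DateTimeOffset.UtcNow - gracePeriod; // data older than this is assumed complete
        var from = lastOffset;
        while (from.AddMinutes(1) <= ceiling)
        {
            var to = from.AddMinutes(1);
            yield return Tuple.Create(from, to); // one bite-size chunk
            from = to;                           // becomes the new LastOffset once the chunk succeeds
        }
    }
}
```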

The data that is pushed to Elasticsearch requires:
  • An index name: by default the date in the yyyyMMdd format is used as the index name (but you can provide your own index)
  • The type name: default is PartitionKey + _ + RowKey (or the one you provide)
  • An Elasticsearch mapping: the Elasticsearch equivalent of a schema, which defines how to store and index the data for a source. These mappings are stored at a URL (a web folder or a public read-only Azure Blob container) - schemas for typical Azure data (WAD logs, WAD Perf data and IIS Logs) are already available by default and you just need to copy them to your site or public Blob container.

Set up your own monitoring suite

OK, now it is time to create our own ConveyorBelt cluster! Basically the CB cluster will shovel the data to an Elasticsearch cluster, and you will need Kibana to visualise your data. I explain how to set up Elasticsearch and Kibana in a Linux VM box further below. But ...

if you are just testing the waters and want to try CB, you can create a Windows VM, download Elasticsearch and Kibana, run their batch files and then move on to setting up CB. But after you have seen it working, come back to the instructions and set it up in a Linux box, its natural habitat.

So setting this up in Windows is just a matter of downloading the files from the links below, unzipping them and running the batch files elasticsearch.bat and kibana.bat. Make sure you expose ports 5601 and 9200 from your VM by creating endpoints.

https://download.elastic.co/kibana/kibana/kibana-4.1.1-windows.zip
https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.1.zip

Set up ConveyorBelt

As discussed above, ConveyorBelt is typically deployed as an Azure Cloud Service. In order to do that, you need to clone the Github repo, build it and then deploy it with your own credentials and settings - all of which should be pretty easy. Once deployed, you need to define various diagnostic sources and point them to your Elasticsearch cluster, and then just relax and let CB do its work. We will look at the steps now.

Clone and build ConveyorBelt repo

You can use command line:
git clone https://github.com/aliostad/ConveyorBelt.git
Or use your tool of choice to clone the repo. Then open an administrative PowerShell window, move to the build folder and execute .\build.ps1

Deploy mappings

Elasticsearch is able to guess the data types of your data and index them in a format that is usually suitable. However, this is not always the case, so we need to tell Elasticsearch how to store each field, and that is why CB needs to know this in advance.

To deploy the mappings, create a Blob Storage container with the "Public Container" option - this allows the content to be publicly available in a read-only fashion.

You would need the URL for the next step. It is in the format:
https://<storage account name>.blob.core.windows.net/<container name>/

Also, use the tool of your choice to copy the mapping files from the mappings folder under the ConveyorBelt directory into that container.
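
If you prefer to do this in code rather than with a storage explorer, here is a hedged sketch using the classic Microsoft.WindowsAzure.Storage SDK; the connection string, container name and local path are placeholders:

```csharp
using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class DeployMappings
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("<your storage connection string>");
        var container = account.CreateCloudBlobClient().GetContainerReference("mappings");
        container.CreateIfNotExists();
        container.SetPermissions(new BlobContainerPermissions
        {
            PublicAccess = BlobContainerPublicAccessType.Container // public, read-only access
        });

        // Copy every mapping file (e.g. WADLogsTable.json) from the local mappings folder.
        foreach (var file in Directory.GetFiles(@"C:\src\ConveyorBelt\mappings", "*.json"))
        {
            var blob = container.GetBlockBlobReference(Path.GetFileName(file));
            using (var stream = File.OpenRead(file))
                blob.UploadFromStream(stream);
        }
    }
}
```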

Configure and deploy

Once you have built the solution, rename the tokens.json.template file to tokens.json and edit it (if you need more info, find the instructions here). Then, in the same PowerShell window, run the command below, replacing the placeholders with your own values:
.\PublishCloudService.ps1 `
  -serviceName <name your ConveyorBelt Azure service> `
  -storageAccountName <name of the storage account needed for the deployment of the service> `
  -subscriptionDataFile <your .publishsettings file> `
  -selectedsubscription <name of subscription to use> `
  -affinityGroupName <affinity group or Azure region to deploy to>
After running the command, you should see PowerShell deploy CB to the cloud with a single Medium instance. In the storage account you defined, you should now find a new table whose name you set in the tokens.json file.

Configure your diagnostic sources

Configuring the diagnostic sources can differ wildly depending on the type of the source, but for the standard tables such as WADLogsTable, WADPerformanceCountersTable and WADWindowsEventLogsTable (whose mapping files you just copied) it is straightforward.

Now choose an Azure diagnostic Storage Account with some data, and in the diagnostic source table, create a new row and add the entries below:

  • PartitionKey: whatever you like - commonly <top level business domain>_<mid level business domain>
  • RowKey: whatever you like - commonly <env: live/test/integration>_<service name>_<log type: logs/wlogs/perf/iis/custom>
  • ConnectionString (string): connection string to the Storage Account containing WADLogsTable (or others)
  • GracePeriodMinutes (int): depends on how often your logs get copied to the Azure table. If that is every 10 minutes then 15 should be OK; if it is every minute then 3 is fine.
  • IsActive (bool): True
  • MappingName (string): WADLogsTable. ConveyorBelt looks for the mapping at the URL "X/Y.json", where X is the mappings path you defined in your tokens.json and Y is the TableName (see below).
  • LastOffsetPoint (string): set to the ISO date (seconds and milliseconds MUST be zero) from which you want the data to be copied, e.g. 2015-02-15T19:34:00.0000000+00:00
  • LastScheduled (datetime): set it to a date in the past, the same as LastOffsetPoint. Why do we have both? Each serves a different purpose, so both are needed.
  • MaxItemsInAScheduleRun (int): 100000 is fine
  • SchedulerType (string): ConveyorBelt.Tooling.Scheduling.MinuteTableShardScheduler
  • SchedulingFrequencyMinutes (int): 1
  • TableName (string): WADLogsTable, WADPerformanceCountersTable or WADWindowsEventLogsTable
And save. OK, now CB will start shovelling your data into Elasticsearch and you should start seeing some data. If you do not, look at the entries you created in the Table Storage: you will find an Error column which tells you what went wrong. To investigate further, just RDP to one of your ConveyorBelt VMs and run DebugView with "Capture Global Win32" enabled - you should see some activity similar to the picture below. Any exceptions will also show up there.
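If you would rather check from the comfort of your own terminal first, a couple of curl calls against your Elasticsearch endpoint (the host is a placeholder) will confirm whether indices are being created and documents are flowing in:
curl 'http://<es-host>:9200/_cat/indices?v'
curl 'http://<es-host>:9200/_search?size=3&pretty'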


OK, that is it... you are done! ... well barely 20 minutes, wasn't it? :)


Now in case you are interested in setting up ES+Kibana in Linux, here is your little guide.

Set up your Elasticsearch in Linux

You can run Elasticsearch on Windows or Linux - I prefer the latter. To set up an Ubuntu box on Azure, you can follow the instructions here. Ideally you should add a data disk, as the VM's local disks are ephemeral - all you need to know is outlined here; make sure you follow the instructions to re-mount the drive after reboots. An alternative, especially for dev and test environments, is to go with D-series machines (SSD disks) and use the ephemeral disks - they are fast, and if you lose the data you can always get ConveyorBelt to re-add it, which it does quickly. As I said before, never use Elasticsearch as the master store for your logging data, so that losing it is always recoverable.
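If you do go down the data disk route, the gist of it is below - treat this as a rough sketch only, since the device name (/dev/sdc here) varies and the Azure article linked above is the authoritative guide:
sudo fdisk /dev/sdc                       # create a single partition, e.g. /dev/sdc1
sudo mkfs.ext4 /dev/sdc1
sudo mkdir -p /mounted
sudo mount /dev/sdc1 /mounted
echo '/dev/sdc1 /mounted ext4 defaults 0 2' | sudo tee -a /etc/fstab   # better: use the UUID from blkid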

Almost all of the commands and settings below need to be run in an SSH session. If you are a geek with a lot of Linux experience, you might find some of the details below obvious and unnecessary - in which case just skip ahead.

SSH is your best friend

Anyway, back to setting up ES. Once your VM is provisioned, SSH to the box and install the Oracle JDK:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
And then install Elasticsearch:
wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.1.deb
sudo dpkg -i elasticsearch-1.7.1.deb
Now you have installed ES v1.7.1. To make Elasticsearch start on reboot (the equivalent of a Windows service), run these commands in your SSH session:
sudo update-rc.d elasticsearch defaults 95 10
sudo /etc/init.d/elasticsearch start
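A quick check that both the JVM and Elasticsearch are alive (the root endpoint of ES 1.x returns a small JSON banner):
java -version
curl 'http://localhost:9200/'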
Now, ideally you want to move the data and logs to the durable drive you have mounted. Just edit the Elasticsearch config in vim:
sudo vim /etc/elasticsearch/elasticsearch.yml
and change the paths (note the uncommented lines):
path.data: /mounted/elasticsearch/data
# Path to temporary files:
#
#path.work: /path/to/work

# Path to log files:
#
path.logs: /mounted/elasticsearch/logs
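One thing worth doing before the restart (a small sketch, assuming the paths above): create the directories and hand ownership to the elasticsearch user that the .deb package created, otherwise the service will not be able to write to them.
sudo mkdir -p /mounted/elasticsearch/data /mounted/elasticsearch/logs
sudo chown -R elasticsearch:elasticsearch /mounted/elasticsearch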
Now you are ready to restart Elasticsearch:
sudo service elasticsearch restart
Note: Elasticsearch is memory, CPU and IO hungry. SSD drives really help, but if you do not have them (D-class VMs), make sure you provide plenty of RAM and enough CPU. Searches are CPU-heavy, so the requirements will depend on the number of concurrent users.
If your machine has a lot of RAM, make sure you set the ES memory settings, as the defaults are small. Update the file below and set the heap to 50-60% of the total memory of the box:
sudo vim /etc/default/elasticsearch
Uncomment this line and set the heap size to about half of your box's memory (14GB here is just an example!):
ES_HEAP_SIZE=14g
There are potentially other changes you might want to make. For example, depending on the number of nodes you have, you will want to set index.number_of_replicas in your elasticsearch.yml - if you have a single node, set it to 0. You will also want to turn off multicast/Zen discovery, since it will not work in Azure - see the sketch below. But these are things you can start learning about once you are completely hooked on the power of the information this solution provides. Believe me, it is more addictive than narcotics!
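For the curious, a hedged sketch of those two single-node tweaks (ES 1.x setting names; append to the config, restart, and drop replicas on any indices that already exist):
echo '
index.number_of_replicas: 0
discovery.zen.ping.multicast.enabled: false' | sudo tee -a /etc/elasticsearch/elasticsearch.yml
sudo service elasticsearch restart
curl -XPUT 'http://localhost:9200/_settings' -d '{"index": {"number_of_replicas": 0}}'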

Set up the Kibana in Linux

Up until version 4, Kibana was simply a set of static HTML+CSS+JS files that ran locally in your browser by just opening the root HTML file. This model was not really sustainable, and with version 4, Kibana runs as a service on a box, most likely separate from your ES nodes. But for PoCs and small use cases it is absolutely fine to run it on the same box.
Installing Kibana is straightforward. You just need to download and unpack it:
wget https://download.elastic.co/kibana/kibana/kibana-4.1.1-linux-x64.tar.gz
tar xvf kibana-4.1.1-linux-x64.tar.gz
Kibana will now be downloaded to your home directory and unpacked into the kibana-4.1.1-linux-x64 folder. If you want to see where that folder is, run pwd to get the full path.
Now, to run it, just execute the commands below to start Kibana:
cd kibana-4.1.1-linux-x64/bin
./kibana
That will do for testing that it works, but you need to configure it to start at boot. We can use upstart for this. Just create a file in the /etc/init folder:
sudo vim /etc/init/kibana.conf
and copy the below (path could be different) and save:
description "Kibana startup"
author "Ali"
start on runlevel [2345]
stop on runlevel [!2345]
exec /home/azureuser/kibana-4.1.1-linux-x64/bin/kibana
Now run this command to make sure there is no syntax error:
init-checkconf /etc/init/kibana.conf
If good then start the service:
sudo start kibana
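To double-check, upstart should report the job as running and Kibana should answer on its port:
sudo status kibana
curl -I 'http://localhost:5601/'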
If you have installed Kibana on the same box as Elasticsearch and left all the ports as they are, you should now be able to browse to the server on port 5601 (make sure you expose this port on your VM by configuring an endpoint) and see the Kibana screen (obviously with no data yet).