Operations vs Development

Ever since I graduated from college (actually, a little before that), I have always felt like I was in between worlds. I had a lot of friends at the university who had very defined profiles and skill sets. Generally, they were either developers (some of them very good coders) or oriented towards networking and systems administration (many of them literally hated programming).1

Nowadays, I still feel like a hybrid between a sysadmin and a software developer. Even though I am now a "Software Architect" in a software development department, I do a lot of work that is related to operations. As my team generally defines the best practices and the technology to use company-wide, it is crucial for us to be well versed in sysadmin tasks.

As I mentioned above, my team has to define or develop many of the technologies that the other development teams use. This includes SCM procedures and tools, integration technologies, application servers, shared libraries, development environments, common functionality libraries and so on. As you can see, many of these activities require a lot of interaction with operations people. So we generally end up working, or at least talking, with the operations crew much more frequently than the average software developer does. Sometimes we even have to perform "operations" activities on our development environment first, and then send a "how-to" manual to the operations people so they can perform them too.

As is common in many enterprises, Operations is clearly separated from Development and from QA. Moreover, QA is itself split into two teams: the "Technical QA", which "belongs" to Operations, and the "Functional QA", which belongs to the Software Development department 2. So my company is no different from a typical enterprise: both areas have different agendas, and both claim to be "aligned" with the "strategic objectives" of the business. However, when it comes to getting a new application into production (which includes getting the hardware, testing, preparing the environments and other related tasks), or simply releasing a new version of an existing application, all the processes (even in our development department) are either manual, duplicated or extremely tedious, and generally a combination of all three.

Currently, we are in the process of improving our Software Development Methodology. I am in charge of the SCM process definition, but only for the dev team. This is no easy task, given the broad scope of the process and the fact that we are basically still in diapers when it comes to CM. Some might think that this is a great opportunity for us to automate and improve these processes, including Release Management and Build Engineering (namely continuous integration, unit testing and other industry best practices). However, although we would certainly do better if we applied these practices to our processes, in the end we would hit the operations "wall" nevertheless.

In the beginning I believed operations was a bottleneck because they were just not as agile, or not as good, as they should be. But there was another factor I did not take into account. Operations inherently wants to prevent change, given that they are responsible for the stability of the system and are also rewarded for it. If you make changes (e.g., deploys, new applications, etc.), the chances of making the whole system unstable increase. But if you stop changes (or make the process of releasing them extremely cumbersome), the changes accumulate, and big changes are far more dangerous than small ones.3

http://dev2ops.org/storage/WallOfConfusion.png

Figure: the "wall of confusion" (image from dev2ops.org).

So, what specific problems do I think we have in our overall SDLC process?
  • Information silos: We have different systems to handle user requirements, task planning, the bug database, test results and deployment requirements. That is not necessarily bad, but none of those systems are connected, so each of them has its own workflow and database. Therefore, there is no way to trace a change to a release to production or to a bug (issue).
  • Cumbersome Processes: The process of getting a new application online is cumbersome, filled with a lot of forms that ask questions that are impossible to answer.
  • Manual Build & Release Process: Once the application is "online" and we want to release a new version of the software, everything is done "manually", from compilation and testing to the configuration of environments. Some of those tasks are requested through a ticketing system that asks for the same information over and over (we even have two different processes for asking for the same things and performing the same task). In other words, a Build and Release process should be created.
  • Manual Environment Management: Environment administration is also done manually, and a lot of problems occur when a change is requested. Finding the origin of a change is difficult, if not impossible. Environment configuration should be baselined and handled using appropriate tools (such as Chef or Puppet, to name just a couple of open source alternatives). That is, environments should be first-class citizens, and an automated process for managing their changes should be put in place.
  • Dependency Management: Dependencies are not managed at all by operations. I had to create a Subversion repository with some manual procedures in order to stabilize the system. (See my older post.) We have minimized the "it works on my machine" problem. However, there are still problems with configuration files, which are not managed at all.
  • Split QA Teams: The QA department is split in two, technical and "functional", each managed differently and with different tooling and objectives. Many of the tests are manual, which makes them difficult to repeat. There is no code coverage or automated unit testing, only system testing. This not only makes the QA process slow but may also prevent proper identification of application issues. I also believe that communication between the teams (QA, Ops and Dev) can be improved.
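To make the idea of a baselined environment a bit more concrete, here is a minimal sketch in Python. It is not what Chef or Puppet do internally, and the setting names and values are made up for illustration. It only shows that once a baseline is stored (ideally under version control), finding out what changed in an environment becomes a mechanical comparison instead of detective work.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Return a stable hash of a configuration mapping."""
    # Sorting the keys makes the hash independent of insertion order.
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_config(baseline: dict, current: dict) -> dict:
    """List the keys added, removed or changed relative to the baseline."""
    added = {k: current[k] for k in current.keys() - baseline.keys()}
    removed = {k: baseline[k] for k in baseline.keys() - current.keys()}
    changed = {
        k: (baseline[k], current[k])
        for k in baseline.keys() & current.keys()
        if baseline[k] != current[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical settings of an application server environment.
baseline = {"jvm.heap": "2g", "db.url": "jdbc:db://qa-db/app", "log.level": "INFO"}
current = {"jvm.heap": "4g", "db.url": "jdbc:db://qa-db/app", "cache.enabled": "true"}

drift = diff_config(baseline, current)
has_drifted = fingerprint(baseline) != fingerprint(current)
```

In this sketch `has_drifted` is a cheap yes/no check, and `drift` tells you which settings to go and ask about. A real tool would also record who applied the change and when.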

We now have a more or less broad view of the problems, and it is evident that a lot of them could be tackled if Operations and Development worked together. In the last few years, a new "approach" called DevOps has gained strength. Wikipedia (and the original article) defines DevOps as

[…] a set of processes, methods and systems for communication, collaboration and integration between departments for development (applications/software engineering), technology operations and quality assurance (QA).

DevOps Approach vs Traditional Thinking:

Based on this presentation by John Allspaw (from Flickr), here is the traditional thinking (slightly modified):

Developers' job is to create software and add features.
Operations' job is to keep the site stable and fast.

But the truth is that

Operations' job is NOT to keep the site stable and fast.
Operations' job is to ENABLE the business.
Change is the root of all failures and outages

http://upload.wikimedia.org/wikipedia/commons/4/4e/Devops.png

Basically, what DevOps is saying is that both areas should communicate more, or at least more assertively, and that tools and methodologies should be used to automate everything from the build process to the release process, which is an area that "belongs" to the Configuration Management set of processes. In other words, it creates new ways of integrating QA, Operations and Development by creating a culture, a set of processes and automation tools that lower the risk of change and support it, instead of avoiding it. Right now the definition is still somewhat unclear; in the end, however, what is being proposed is a pattern for managing IT areas.

Let me be clear: I am not arguing that my company should jump into the DevOps approach in order to succeed. However, I do believe that what the DevOps people propose is valuable, and that these things need to be taken into consideration urgently in my organization. They are:

  • Serious Change in the culture in both areas.4
  • Unified processes: The SDLC should be unified and the appropriate metrics should be defined.
  • Unified Tooling: The tools used on both sides should be the same, including the ticketing system and the version control system.
  • Version controlled Software Library.
  • Version controlled Environments.
  • Automation of manual tasks, including testing and deployment.
  • Assertive communication (whatever that means in context).
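As a rough illustration of what unified tooling would buy us, here is a minimal sketch in Python of the traceability that our disconnected systems cannot provide today. The class names, revision numbers and ticket identifiers are all hypothetical, and a real setup would pull this data from the version control and ticketing systems instead of hard-coding it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Change:
    revision: str  # revision in the version control system
    issue_id: str  # ticket or bug the change belongs to


@dataclass
class Release:
    version: str
    changes: list[Change] = field(default_factory=list)


def releases_for_issue(releases: list[Release], issue_id: str) -> list[str]:
    """Return the versions of every release that shipped a change for the issue."""
    return [
        release.version
        for release in releases
        if any(change.issue_id == issue_id for change in release.changes)
    ]


def issues_in_release(release: Release) -> set[str]:
    """Return the set of issues addressed by a release."""
    return {change.issue_id for change in release.changes}


# Hypothetical history: two releases built from three revisions.
releases = [
    Release("1.0.0", [Change("r101", "REQ-1")]),
    Release("1.1.0", [Change("r117", "BUG-7"), Change("r120", "REQ-2")]),
]

shipped_in = releases_for_issue(releases, "BUG-7")
fixed_by_latest = issues_in_release(releases[-1])
```

With the link between revision, ticket and release recorded in one place, "which release fixed this bug?" and "what went into this release?" become simple lookups.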

Some folks in my organization argue that they are just following ITIL, to counter my argument about their responsibilities in some of the tasks I just mentioned. Either they forget completely about the ITIL Service Transition process, or their interpretation is limited to the minimum effort needed to get it done. Either way, you can implement the process in any way you like. Following ITIL by means of some archaic process does not make you better than an organization that does not know what ITIL is but has highly automated and optimized processes. I mean, it is just a matter of common sense.

To summarize, I believe that Suramericana should rethink the whole SDLC process, with a strong emphasis on defining and applying a good Configuration Management process across the areas (including Operations), with specific attention to the Build Engineering, Release Management and Environment Management processes, and to inter-team communication, in order to solve the issues I just described.

Footnotes:

1 There were also those who did not think too hard about that; they were more attracted to the "management" side of software development.

2 I really believe that the QA department should be a completely separate area, as that seems to be the best practice.

3 10 Deploys Per Day - Dev and Ops Cooperation at Flickr, by John Allspaw

4 I really like the "hard way" recipe from Ted Dziuba's DevOps-bashing article, which includes putting developers on call rotations, making them develop on the same operating system they deploy to, and making sure a downtime never happens again.
