Since I graduated from college (actually a little before that), I have always
felt like I was between two worlds. I had a lot of friends from university who had very
well-defined profiles and skill sets. Generally, they were either developers (some
of them very good coders) or oriented towards networking and systems
administration (many of them literally hated programming).
Nowadays, I still feel like a hybrid between a sysadmin and a software
developer. Even though I am now a "Software Architect" in a software
development department, I do a lot of work related to operations. Since my team generally
defines the best practices and the technology used company-wide, it is
crucial for us to be well versed in sysadmin tasks.
As I mentioned above, my team has to define or develop many of the technologies that the other
development teams use, including SCM procedures and tools, integration
technologies, application servers, shared libraries, development environments, common functionality
libraries and so on. As you can see, many of these activities require a lot of
interaction with operations people, so we generally end up working (or at
least talking) with the operations crew much more frequently than the typical software developer does. Sometimes we even have to perform "operations"
activities in our development environment first, and then send a "how-to" manual to the operations people so they can repeat them.
As is common in many enterprises, Operations is clearly separated from
Development and from QA. Moreover, QA is itself split into two
teams: "Technical QA", which "belongs" to Operations, and "Functional
QA", which belongs to the Software Development department. In this respect my company is no different from a typical enterprise: both areas have
different agendas, and both claim to be "aligned" with the "strategic objectives" of the business. However, when it comes to getting a new
application into production (which includes getting the hardware, testing,
preparing the environments and other related tasks), or simply releasing a new
version of an existing application into production, all the processes (even in
our development department) are manual, duplicated or extremely tedious, and
usually a combination of all three.
Currently, we are in the process of improving our Software Development
Methodology. I am in charge of defining the SCM process, but only for the
development team. This is no easy task, given the broad scope of the process and the fact that we are basically in diapers when it comes to CM. Some might think
that this is a great opportunity for us to automate and improve these processes, including
Release Management and Build Engineering (namely continuous integration, unit
testing and other industry best practices). However, although we would certainly
do better by applying these practices to our processes, in the end we would still hit
the operations "wall".
In the beginning I believed Operations was a bottleneck because they were
just not as agile, or as good, as they should be. But there was another factor I had not taken
into account: Operations inherently wants to prevent change, because they are
responsible for the stability of the system and are rewarded for it. Every change
(e.g. deployments, new applications, etc.) increases the chance of making the whole
system unstable. But if you stop changes (or make the
process of releasing them extremely cumbersome), changes accumulate into
bigger releases, and big changes are far more dangerous than
small ones.
[Image: The "wall of confusion"]
So, these are the specific problems I think we have in our overall
SDLC process:
-
Information Silos: We have different systems to handle user requirements,
task planning, bugs, test results and deployment requirements. That is not necessarily
bad, but none of those systems are connected; each has its own
workflow and database. Therefore, there is no way to trace a change back to the production
release that shipped it or to the bug (issue) that motivated it.
-
Cumbersome Processes: The process of getting a new application online is cumbersome, full of
forms that ask questions that are impossible to answer.
-
Manual Build & Release Process: Once the application is "online" and we want to release a new version of
the software, everything is done manually, from compilation and testing to the
configuration of environments. Some of those tasks are requested through a
ticketing system that asks for the same information over and over (we even have two
different processes for requesting the same thing and performing the same
task). In other words, a proper Build and Release process needs to be created.
-
Manual Environment Management: Environment administration is also done manually, and a lot of problems
occur when a change is requested. Finding the origin of a change is
difficult, if not impossible. Environment configuration should be baselined
and handled using appropriate tools (like Chef or Puppet, to name just two
open source alternatives). That is, environments should be first-class
citizens, and automated processes for managing changes to them should be put
in place.
-
Dependency Management: Dependencies are not managed at all by Operations, so I had to create
a Subversion repository with some manual procedures in order to stabilize the
system (see my older post). This has minimized the "it works on
my machine" problem. However, there are still problems with configuration files, which are not
managed at all.
-
Split QA Teams: The QA department is split in two, "technical" and "functional", each managed
differently and with different tooling and objectives. Many of the tests are
manual, which makes them difficult to re-run on schedule. There is no code coverage measurement or
automated unit testing, only system testing. This not only slows down the QA
process but may also prevent application issues from being properly
identified. I also believe the communication between teams (QA, Ops and Dev)
can be improved.
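To make the traceability gap under Information Silos concrete, here is a minimal sketch of what connecting the systems would allow: tracing a ticket through the commits that mention it to the releases that shipped those commits. It assumes a hypothetical convention where commit messages reference ticket IDs such as "BUG-7", and it models commits and releases as plain in-memory records; the names `Commit`, `Release` and `releases_for_issue` are illustrative, not part of any real tool.

```python
import re
from dataclasses import dataclass

# Hypothetical convention: every commit message mentions the ticket it
# addresses, e.g. "BUG-7" or "REQ-3". The prefixes are illustrative only.
ISSUE_REF = re.compile(r"\b[A-Z]+-\d+\b")


@dataclass(frozen=True)
class Commit:
    revision: str  # e.g. a Subversion revision such as "r101"
    message: str


@dataclass(frozen=True)
class Release:
    name: str                   # release label, e.g. "1.4.0"
    revisions: tuple[str, ...]  # revisions shipped in this release


def issues_in(commit: Commit) -> set[str]:
    """Return the ticket IDs referenced in a commit message."""
    return set(ISSUE_REF.findall(commit.message))


def releases_for_issue(issue_id: str, commits: list[Commit],
                       releases: list[Release]) -> list[str]:
    """Trace a ticket to every release containing a commit that references it."""
    revs = {c.revision for c in commits if issue_id in issues_in(c)}
    return [r.name for r in releases if revs.intersection(r.revisions)]


if __name__ == "__main__":
    commits = [
        Commit("r101", "BUG-7 fix null pointer in login"),
        Commit("r102", "REQ-3 add export to CSV"),
        Commit("r103", "BUG-7 add regression test"),
    ]
    releases = [Release("1.4.0", ("r101",)), Release("1.4.1", ("r102", "r103"))]
    print(releases_for_issue("BUG-7", commits, releases))  # ['1.4.0', '1.4.1']
```

The point is less the code than the data it needs: this lookup is only possible when the ticketing system, version control and release records share identifiers, which is exactly what disconnected systems prevent.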
We now have a broad view of the problems, and it is evident that
many of them could be tackled if Operations and Development worked
together. In the last few years, a new "approach" called DevOps has gained
strength. Wikipedia (and the original
article) defines DevOps as
[…] a set of processes, methods and systems for communication,
collaboration and integration between departments for development
(applications/software engineering), technology operations and quality assurance (QA).
DevOps Approach vs Traditional Thinking:
Based on this presentation by John Allspaw (from Flickr), here is the traditional
thinking (slightly modified):
Developers' job is to create software and add features.
Operations' job is to keep the site stable and fast.
But the truth is that
Operations' job is NOT to keep the site stable and fast.
Operations' job is to ENABLE the business.
Change is the root of all failures and outages
Basically, what DevOps is saying is that the areas should communicate more,
or at least more assertively, and that tools and methodologies should be used to
automate everything from the build process to the release process, an area that
"belongs" to the Configuration Management set of processes. In other words, it creates new ways of integrating QA, Operations and
Development through a culture, a set of processes and automation tools that lower risk and support change instead of avoiding it.
Right now the definition is still somewhat unclear; in the end, however,
what is being proposed is a pattern for managing IT areas.
Let me be clear: I am not arguing that my company should jump into the DevOps approach in
order to succeed. However, I do believe that what DevOps people propose is
valuable, and that the following points need to be taken into consideration urgently in my
organization:
-
A serious change in the culture of both areas.
-
Unified Processes: The SDLC should be unified and the appropriate
metrics should be defined.
-
Unified Tooling: Both sides should use the same tools, including the
versioning scheme, the ticketing system and the version control system.
-
A version-controlled software library.
-
Version-controlled environments.
-
Automation of manual tasks, including testing and deployment.
-
Assertive communication (whatever that means in context).
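The "version-controlled environments" point can be illustrated with a minimal sketch of configuration drift detection: an environment's actual settings are compared against a baseline that would be kept under version control, so an unexplained change shows up as a concrete difference instead of a mystery. It assumes a hypothetical flat key/value model of configuration, and the keys and values below are invented for illustration. Tools such as Chef or Puppet go further, declaring the desired state and converging machines toward it; this sketch only detects the difference.

```python
import hashlib
import json


def fingerprint(config: dict[str, str]) -> str:
    """Stable SHA-256 of a configuration, independent of key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drift(baseline: dict[str, str], actual: dict[str, str]) -> dict[str, list[str]]:
    """Compare an environment's actual settings against its baseline."""
    return {
        "missing": sorted(k for k in baseline if k not in actual),
        "unexpected": sorted(k for k in actual if k not in baseline),
        "changed": sorted(k for k in baseline
                          if k in actual and baseline[k] != actual[k]),
    }


if __name__ == "__main__":
    # Hypothetical settings; in practice the baseline would live in version control.
    baseline = {"db.host": "db01", "heap.max": "1024m", "log.level": "INFO"}
    actual = {"db.host": "db01", "heap.max": "2048m", "debug.port": "8787"}
    if fingerprint(actual) != fingerprint(baseline):
        print(drift(baseline, actual))
        # {'missing': ['log.level'], 'unexpected': ['debug.port'], 'changed': ['heap.max']}
```

The fingerprint gives a cheap "has anything changed?" check that could run on every deployment, and the detailed diff is only computed when it fails.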
Some folks in my organization counter my argument about their responsibilities
for some of the tasks mentioned above by saying that they are just following ITIL.
Either they completely forget about the ITIL Service Transition process, or their interpretation of it is limited to the
minimum effort needed to get it done. Either way, ITIL lets you implement the process
however you like. The fact that you follow ITIL through some archaic process does not make you
better than an organization that does not know what ITIL is but has
highly automated and optimized processes. I mean, it is just a matter of
common sense.
To summarize, I believe that Suramericana should rethink the whole
SDLC process, with a strong emphasis on defining and applying a good
Configuration Management process across all areas (including Operations), particularly
the Build Engineering, Release Management and Environment Management processes,
and on inter-team communication, in order to solve the issues
I have just described.