Saturday, April 24, 2010

MISE in the Attic

Over the past 30 years, I have learned the hard way that some IT professionals view softwarephysics as being a little too theoretical for practical use. This is all very understandable. My mind used to go blank in lots of my physics classes in college, as I spent the whole hour busily writing down the equations that the professor chalked upon the blackboard, with little understanding at the time of what it all meant in practical terms. A tangible application of softwarephysics makes a much more interesting and compelling case, so let me relate a recent one.

Presently, I am in the IT Middleware Operations group (MidOps) for my current employer. The challenge was how to apply softwarephysics to this non-development type of work. For the most part, my job involves installing application and infrastructure software and keeping it up and running. In MidOps we do not make the light bulbs; we just screw them in and try to keep them lit. However, early in the development of softwarephysics, I realized that all IT jobs really just boil down to one thing – pushing buttons. Now the funny thing about pushing buttons for a living is that, sadly, not much has changed in IT since I started programming in 1972. At that time, I was pushing buttons on an IBM 029 keypunch machine, a machine that was the size of an office desk, and which was not much smarter. The IBM 029 was a purely mechanical machine that only used electricity to run motors to turn gears, belts and camshafts to punch little square holes into IBM punch cards.

Figure 1 - An IBM 029 keypunch machine like the one I first learned to program on at the University of Illinois in 1972.

Figure 2 - Each card could hold a maximum of 80 bytes. Normally, one line of code was punched onto each card.

Figure 3 - The cards for a program were held together into a deck with a rubber band, or for very large programs, the deck was held in a special cardboard box that originally housed blank cards. Many times the data cards for a run followed the cards containing the source code for a program. The program was compiled and linked in two steps of the run and then the generated executable file processed the data cards that followed in the deck.

Figure 4 - To run a job, the cards in a deck were fed into a card reader, as shown on the left above, to be compiled, linked, and executed by a million-dollar mainframe computer with a clock speed of about 750 KHz and about 1 MB of memory.

Figure 5 - Now I push these very same buttons that I pushed on IBM 029 keypunch machines. The only difference is that I now push them on a $500 machine with 2 CPUs running with a clock speed of 1.8 GHz and 2 GB of memory.

Now I basically push this same set of buttons, only now I push them on a laptop that is less than 1% the size of an IBM 029, but which has a fast dual-core processor running at 1.8 GHz and 2 GB of memory, about 1,000 times the memory of the mainframes that used to process my keypunch cards back in 1972. For more commentary on the dismally slow pace of progress we have seen on the software side of IT over the past 70 years, see my posting So You Want To Be A Computer Scientist?. Anyway, to perform just about any IT job, all you have to do is push the right buttons, in the right sequence, at the right time and with zero errors. In MidOps you also have to be able to push these buttons rather quickly in tense situations during website outages, with impatient upper-level managers listening in on the conference call through the whole thing! As you can imagine, this can make an IT professional rather tense.

Figure 6 - As depicted back in 1962, George Jetson was a computer engineer in the year 2062, who had a full-time job working 3 hours a day, 3 days a week, pushing the same buttons that I have been pushing for the past 31 years as an IT professional.

But it was not supposed to be this way. As a teenager growing up in the 1960s, I was led to believe that in the 21st century, I would be leading the life of George Jetson, who first appeared on ABC-TV Sunday nights from September 23, 1962, to March 3, 1963, in 24 episodes that were later replayed for many decades. George Jetson was a computer engineer in the year 2062, who had a full-time job working 3 hours a day, 3 days a week, pushing buttons. This was three years before the IBM OS/360 was introduced in 1965, so who knew things would turn out quite differently in the 21st century? The Jetsons did get some things right, but certainly not the 9-hour IT workweek! Anyway, little did I realize back in 1962 that, like George Jetson, I would be spending most of my adult life just pushing buttons for a living!

Now, why is pushing buttons for a living so hard? Suppose you are paged into a website outage conference call. In order to resolve the problem, I am going to let you push a maximum of 1,000 buttons. This includes pressing the buttons to find the necessary support documents, logging into a large number of servers, looking at many log files, running diagnostics and health checks, and finally punching in the necessary Unix commands to resolve the problem. Now 1,000 buttons might seem like a lot, but it really is not. This posting itself comes to pushing 18,736 buttons in the right sequence. There are 90 buttons on my laptop, so I can push 1,000 buttons in 90^1000 different ways. That comes to:

1.75 x 10^1954 = 175 with 1952 zeroes behind it!
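
That number is far too large for ordinary floating-point arithmetic, but it can be checked by working with base-10 logarithms instead. Here is a minimal sketch, assuming a POSIX awk is on hand; the function name count_sequences is made up for this illustration and is not part of MISE:

```shell
# count_sequences KEYS PUSHES
# Prints KEYS^PUSHES in scientific notation. The value itself would
# overflow a double, so only its base-10 logarithm is computed.
count_sequences() {
    awk -v keys="$1" -v pushes="$2" 'BEGIN {
        exponent = pushes * log(keys) / log(10)   # log10(keys^pushes)
        whole = int(exponent)                      # power of ten
        mantissa = exp((exponent - whole) * log(10))
        printf "%.2f x 10^%d\n", mantissa, whole
    }'
}

count_sequences 90 1000    # 90 keys, 1,000 button pushes
```

For 90 keys and 1,000 pushes, this prints 1.75 x 10^1954.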

Remember, as an IT professional, all you have to do is to quickly push the right buttons, in the right sequence, at the right time and with zero errors. How hard can that be? But as I pointed out in Entropy - the Bane of Programmers and The Demon of Software, it is very difficult indeed, and the problem lies with the second law of thermodynamics. There are only a very few sequences of button pushes in the vast number of possible button pushes that will actually get the job done properly, and the odds of you doing that are quite small indeed. Worse yet, in Software Chaos I showed that pushing just one button in the sequence incorrectly can lead to catastrophic results. Now the right sequence of button pushes already exists, all you have to do is find it and execute it perfectly! As we saw in The Demon of Software, in order to accomplish this, we have to turn some unknown information, called entropy, into known information that we can use, and the only way we can do that is to turn an ordered form of energy into the disordered form of energy we call heat. Some of this heat generation will take place in your brain as you think through the problem. A programmer on a 2400 calorie diet (2400 kcal/day) produces about 100 watts of heat sitting at her desk and about 20 – 30 watts of that heat comes from her brain. The remainder of this heat generation will come from pushing the buttons themselves. This puts us in a bit of a bind. The second law of thermodynamics states that it is impossible to turn unknown information, entropy, into useable known information without an accompanying increase in entropy someplace else in the Universe because the total amount of entropy in the Universe must always increase whenever a change is made. So we are forced to extract the correct sequence of button pushes from the vast number of possible button pushes by dumping some entropy into heat as we degrade the high-grade chemical energy in our brains and fingers into heat energy. 
However, there is another possibility open to us. Why not let the computers dump some high-grade electrical energy into heat energy instead? Why not let the computers push most of the buttons for us?

That is the purpose of the Middleware Integrated Support Environment – MISE (pronounced like “Mice”). In How to Think Like a Softwarephysicist, I explained how I use a great many Unix aliases, Korn shell scripts, and Perl programs to do my job. The problem is that over the years I built up a large number of these utilities and I was having a hard time locating the proper ones to use in tough situations when I was under a lot of pressure to perform quickly. Also, it was hard for the other members of MidOps to use this software because they had no way of knowing what aliases were available or what they did. Since the first step in troubleshooting a website problem is to get your hands on the required information, the first thing I did was to include a brief comment about each Unix alias by adding an echo of what the alias did. This killed two birds with one stone. First, the aliases would now display what they were doing when executed. Second, it became easier to find the aliases via the strings that were echoed out. Again, for security purposes, I will pretend that MISE is installed at the mythical UnitedAmoco corporation in support of their www.unitedamoco.com website. So at UnitedAmoco, I could set the following MISE aliases:

alias apc="echo 'cd to the Apache configurations directory';cd /opt/apache/VER302/configs; ls"
alias apcon="echo 'cd to the Apache static content directory';cd /www/content; ls -l"
alias aps="echo 'cd to the Apache servers directory';cd /opt/apache/VER302/servers; ls -l"
alias apb="echo 'cd to the Apache Start/Stop scripts directory';cd /opt/apache/VER302/all_scripts; ls"
alias apl="echo 'cd to the Apache log files for today';cd /www/logs; ls -l | grep '$day'"
alias vs="echo 'cd to the Visual Sciences Start directory';cd /opt/apache/VER310/visualsciences/; ls -l"

Now when I used my “a” script to look for aliases, I could do so by feeding it the strings that I was interested in:

chit23l1[/home/zscj03]> a apache

apb='echo '\''cd to the Apache Start/Stop scripts directory'\'';cd /opt/apache/VER302/all_scripts; ls'
apc='echo '\''cd to the Apache configurations directory'\'';cd /opt/apache/VER302/configs; ls'
apcon='echo '\''cd to the Apache static content directory'\'';cd /www/content; ls -l'
apl='echo '\''cd to the Apache log files for today'\'';cd /www/logs; ls -l | grep '\''May 02'\'
aps='echo '\''cd to the Apache servers directory'\'';cd /opt/apache/VER302/servers; ls -l'
vs='echo '\''cd to the Visual Sciences Start directory'\'';cd /opt/apache/VER310/visualsciences/; ls -l'

I also modified my “a” script to look for multiple strings to narrow the searches:

chit23l1[/home/zscj03]> a apache log

apl='echo '\''cd to the Apache log files for today'\'';cd /www/logs; ls -l | grep '\''May 02'\'
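
For readers who want to try something similar, here is a minimal sketch of how such a multi-string search could work. This is not the actual MISE “a” script. The helper name filter_all is hypothetical, and a shell function stands in for the script so that it can see the aliases of the current shell:

```shell
# filter_all TERM...
# Reads lines on standard input and prints only those that contain
# every TERM, ignoring case. Terms are plain substrings, not regexes.
filter_all() {
    awk '
        BEGIN {
            # Copy the search terms out of ARGV, then set ARGC to 1 so
            # awk reads standard input instead of treating them as files.
            for (i = 1; i < ARGC; i++) terms[i] = tolower(ARGV[i])
            nterms = ARGC - 1
            ARGC = 1
        }
        {
            line = tolower($0)
            for (i = 1; i <= nterms; i++)
                if (index(line, terms[i]) == 0) next
            print
        }
    ' "$@"
}

# Hypothetical stand-in for the "a" script: list the aliases of the
# current shell that match all of the given strings.
a() {
    alias | filter_all "$@"
}
```

With this in place, a apache log lists only the aliases whose definitions mention both strings. Because the echoed description is part of each alias definition, the search covers the description as well as the commands.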

Many of the MISE aliases are used to log in to the 300+ servers that MidOps supports. MidOps is a bit of a misnomer. MidOps actually supports nearly all of the infrastructure software that lies upon the Unix operating systems of our servers, which includes WebSphere, Apache, JBoss, Tomcat, ColdFusion, MQ, DB2Connect, CTG, and many third-party applications purchased from outside. As you can imagine, it is rather hard to just keep track of all those servers and what they do! So I modified all the aliases that are used to log in to a server by including the string “Login” for each. That made it very easy to separate the login aliases from the aliases that did other things. For example, MISE sets the following aliases for a group of Apache webservers:

alias ra1="echo 'Login to EXT CH Prod Apache server1';ssh rchi11l1.ext.unitedamoco.com" #EXT
alias ra2="echo 'Login to EXT CH Prod Apache server2';ssh rchi12l1.ext.unitedamoco.com" #EXT
alias ra3="echo 'Login to EXT CH Prod Apache server3';ssh rchi13l1.ext.unitedamoco.com" #EXT
alias ra4="echo 'Login to EXT CH Prod Apache server4';ssh rchi14l1.ext.unitedamoco.com" #EXT

Now I can find the login aliases for a specific group of Apache webservers using the MISE “al” alias:

chit23l1[/home/zscj03]> al ext ch prod

ra1='echo '\''Login to EXT CH Prod Apache server1'\'';ssh rchi11l1.ext.unitedamoco.com'
ra2='echo '\''Login to EXT CH Prod Apache server2'\'';ssh rchi12l1.ext.unitedamoco.com'
ra3='echo '\''Login to EXT CH Prod Apache server3'\'';ssh rchi13l1.ext.unitedamoco.com'
ra4='echo '\''Login to EXT CH Prod Apache server4'\'';ssh rchi14l1.ext.unitedamoco.com'

MISE beats the second law of thermodynamics in two ways. First, it lets you quickly find the correct alias to use. Second, by typing in the alias or, better yet, doing a Copy/Paste of the alias, it reduces the errors associated with typing in long strings of commands. This also saves valuable time during an outage, when you might have 20 windows open to various servers and need to do many things all at the same time.

For Korn shell scripts and Perl programs, just set an alias to them with the same name. Then include an echo that tells the user what the Korn shell script or Perl program does:

alias lds="echo 'List WAS Datasources in a Cell - run on any node in the Cell';/home/zscj03/bin/lds"

So whenever I find myself doing repetitive button pushing operations, I just create an alias, Korn shell script, or Perl program to perform the same operation. I now have 690 aliases that I use on a regular basis. I also have software that lets me quickly log in to the 300+ servers that MidOps supports and push new MISE code to them. I can do a code push in less than 2 minutes, so I can easily do several code pushes each day with little manpower. The MISE “mim” alias points the user to an online MISE Manual that is just a .txt file that I can easily distribute in the same manner.
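
A code push of this kind can be as simple as a loop over a list of hosts. The following is only a sketch under stated assumptions, not the actual MISE push software. The function name push_mise, the hosts file format, and the COPY_CMD variable are all hypothetical, and it assumes that scp works to every host without a password prompt:

```shell
# push_mise HOSTS_FILE PAYLOAD [REMOTE_DIR]
# Copies PAYLOAD to REMOTE_DIR on every host listed in HOSTS_FILE
# (one host per line; blank lines and lines starting with # are skipped).
# COPY_CMD defaults to scp and exists only so that the loop can be
# dry-run, for example with COPY_CMD=echo.
push_mise() {
    hosts_file=$1
    payload=$2
    remote_dir=${3:-.}
    status=0
    while IFS= read -r host; do
        case $host in
            ''|'#'*) continue ;;
        esac
        # </dev/null keeps the copy command from reading the host list.
        if ${COPY_CMD:-scp} "$payload" "$host:$remote_dir/" < /dev/null; then
            echo "pushed: $host"
        else
            echo "FAILED: $host" >&2
            status=1
        fi
    done < "$hosts_file"
    return "$status"
}
```

Setting COPY_CMD=echo before calling push_mise prints the copy commands instead of running them, which is a cheap way to check the host list before a real push.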

One of the problems we have in MidOps is the enormity of the infrastructure. We have a large number of WebSphere Cells, and each WebSphere Cell can contain 4-5 servers or nodes, and each node can have hundreds of log files constantly spewing out messages. How can you possibly look at all those log files at the same time during an outage? So many of the MISE aliases are concerned with listing the ends of log files or looking for specific strings in the ends of log files. From a single server, it is possible to run a MISE alias that logs into each of the five servers in a WebSphere Cell and lists a specified number of lines in all the log files or looks for a string in the last lines of all the log files. For example:

rlal 50

would list the last 50 lines in all the application log files on five different WebSphere nodes. Similarly,

rlsys 1000 "ERROR – Exception"

would look for the string “ERROR – Exception” in the last 1000 lines of all the WebSphere SystemOut.log files on the same five servers.
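
The core of such a search is just tail piped into grep for each log file. Here is a minimal local sketch of that step. It is not the actual rlal or rlsys code, and the name tail_grep is hypothetical:

```shell
# tail_grep LINES STRING FILE...
# Prints every line among the last LINES lines of each FILE that contains
# STRING, prefixed with the file name. STRING is matched literally (grep -F).
tail_grep() {
    lines=$1
    string=$2
    shift 2
    for log in "$@"; do
        tail -n "$lines" "$log" | grep -F -- "$string" |
            while IFS= read -r hit; do
                printf '%s: %s\n' "$log" "$hit"
            done
    done
}

# Hypothetical use across a cell, assuming a tail_grep script is on the
# PATH of each node and CELL_NODES holds the node names:
#   for node in $CELL_NODES; do
#       ssh "$node" 'tail_grep 1000 "ERROR" /path/to/logs/SystemOut.log'
#   done
```

Limiting the search to the last lines of each file keeps the output short during an outage, when only the most recent messages matter.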

As I have pointed out in numerous postings, the IT meme-complex is an extraordinarily conservative meme-complex that is very resistant to new memes or ideas, and I believe this resistance to new ideas helps to explain the glacial pace of software evolution over the past 70 years, compared to the great strides that have occurred with the evolution of hardware over this same period. Many of the software approaches that we are just stumbling upon today could have been done back in the 1960s if we had only adopted a biological approach to software from the start. So some very conservative IT professionals might object that MISE will cause MidOps to forget how to do things manually! For such individuals, the slow but sure manual approach of pushing lots of buttons is preferred. Pushing lots of buttons manually is certainly slow, but is it also “sure”? Again, I would suggest that such conservative thinking stems from the hazards of IT common sense, while MISE is a product of softwarephysics and its very useful effective theories of software behavior. In IT we are in a constant battle with the second law of thermodynamics and nonlinearity. As I pointed out in SoftwareBiology, living things face these same challenges. Instead of pushing lots of buttons, living things must arrange lots of simple atoms into complex organic molecules, and they must do so with nearly zero defects or catastrophic events will ensue. To do this, living things use many complex biochemical pathways to break down large organic molecules into smaller molecules called monomers, and then they use other biochemical pathways to reassemble these same monomers back into other large organic molecules required for the functions of life. Think of the monomers as “bricks” that can be recycled from an old brick building to later be re-laid into a new pattern for a new brick building. This happens every time you eat something. 
These biochemical pathways are billions of years old and many are shared by nearly all living things because they are time-tested processes that work!

So living things do not code on the fly - that is far too risky. Similarly, Unix is a very powerful operating system that can do fantastic things by stringing together a few simple Unix commands into a disposable one-time-only program that can be run a single time on the fly to do things, but I cringe whenever I see IT professionals perform such tricks while using production IDs that can easily erase all the installed Applications on a server and WebSphere itself with a single command! I have been using many of the Korn shell scripts and Perl programs called by the MISE aliases for 15 – 20 years, so they are a safe and effective alternative to the risky business of pushing lots of buttons on the fly during tense situations when disastrous mistakes can easily be made. Softwarephysics contends that it is much safer to use well-tested software to push lots of buttons, rather than having error-prone human beings do so on the fly. That alone would be a good enough reason to use MISE, beyond the fact that many times MISE lets you push buttons 10,000 times faster than any error-prone human can.

I hope this provides a simple example of how softwarephysics can be used in an IT Operations setting. Many times IT professionals are like the shoemaker’s children; we are so busy pushing buttons for other people that we have little time to push some buttons for ourselves. But a simple tool like MISE can have a huge payoff in that pushing two or three buttons can effectively do the same work as pushing a million buttons, and all this can be done with just a few lines of Korn shell or Perl code called by an alias or with simple aliases themselves. For example, here is a very simple one:

alias e=exit

With that alias, you can just push the “e” button to exit a Unix session. How many times have you pushed four buttons instead? Just start spending a few minutes each day working on some software for yourself, and you won’t impact any of your projects. Plus, as your library of productivity software grows, you will find there will be more minutes available for such work. Also, try to share your tools with the members of your team and be sure to see what other members of your team have come up with too. Several MISE aliases point to Korn shell scripts and Perl programs that were developed by other people at my company, some of whom have already departed. It would have been a shame if that software had left with them.

Comments are welcome at scj333@sbcglobal.net

To see all posts on softwarephysics in reverse order go to:
https://softwarephysics.blogspot.com/

Regards,
Steve Johnston