Tuesday, February 15, 2011

Cyber Defense

I just finished Cyber War (2010) by Richard Clarke, which I would strongly recommend for all IT professionals. Richard Clarke is most famous for being the counter-terrorism expert whose valiant efforts to warn the Bush Administration of the impending al-Qaeda attack upon the United States by Osama bin Laden went tragically unheeded, as outlined in his book Against All Enemies: Inside America's War on Terror—What Really Happened (2004). Richard Clarke is one of those forward-thinking individuals rarely found in human history. These rare individuals always seem to suffer the fate of Cassandra, who was given the gift of prophecy by Apollo, but also the curse that nobody would ever believe her. A good example is the court-martial of Billy Mitchell in 1925 for his vociferous promotion of naval airpower in the face of a military high command seemingly obsessed with fighting the last war again, and consequently, in love with the gigantic battleships of the day that later never played a significant role in World War II. This time Richard Clarke is describing the dangers of conducting cyber war in cyberspacetime, and this time we need to listen to him before his prophecies come true, rather than after.

The Threat of Cyber War
For me, there is a personal reason for taking Richard Clarke’s warnings seriously. On the morning of September 11, 2001, I was working in the IT department of United Airlines supporting their Customer Interaction Database (CIDB), which was a large Oracle database of customer information fronted by a collection of Tuxedo services. I was in the group that supported the Tuxedo service middleware of the CIDB. Our initial client was the J2EE WebLogic Appservers used to run the www.united.com website. That morning, I was coming back from a walkthrough with Change Management for a large install planned for the coming weekend. When I returned to my cube, I heard some co-workers talking about an airplane crashing into one of the twin towers of the World Trade Center. I was concentrating on the upcoming install and did not pay much attention at the time because I thought it must have been a small private aircraft. Only later did I learn that it really was American Airlines Flight 11 that had crashed into the north tower at 8:46 AM ET. Seventeen minutes later the second tower was hit by our United Airlines Flight 175 at 9:03 AM ET. Then American Airlines Flight 77 crashed into the Pentagon at 9:37 AM ET and United Airlines Flight 93 went down in Shanksville, Pennsylvania at 10:03 AM ET. A little later that morning I was in a preparation meeting planning for the upcoming weekend install when our manager walked into the room and told us that both towers of the World Trade Center had collapsed and that all scheduled installs had been canceled by our CIO. For the first time in history, all domestic flights were grounded and international flights were turned back. United Airlines lost about $700 million the first week after the attack, and within about two weeks about half of United’s IT department was laid off. I was one of the survivors since the CIDB was critical to keeping www.united.com up and running. The FBI took up residency in a war room down the hall from me.
Over the next few months, I ran SQL queries against the CIDB database to help try to figure out what had happened.

Cyber-Attacks Are Already Happening
Clarke opens with a description of the September 6, 2007 destruction of a secret Syrian nuclear facility designed and constructed by the North Koreans. The Israeli Air Force destroyed the site with F-15 Eagles and F-16 Falcons attacking from the airspace of Turkey. This was a fairly typical event in the real world of human affairs, but there was one very unusual aspect to the attack. The Syrians had spent several billion dollars on a Russian air defense system, and the 1970s-era F-15s and F-16s should have lit up the Syrian air defense radar screens like the Big Board in the War Room of Dr. Strangelove (1964), but they did not. That night the Syrian air defense system simply showed normal air traffic over Syria as the Israeli Air Force obliterated the site. Over time, the Syrians slowly realized that the Israelis had hacked their multibillion-dollar air defense system! So what the Syrians saw on the night of September 6, 2007, was actually a computer simulation of a peaceful night over Syria, rather than a country under air attack.

A more recent cyber-attack was the Stuxnet worm that destroyed about 3,000 Iranian centrifuges that were enriching U-235 from natural uranium hexafluoride gas. Apparently, the Stuxnet worm caused the centrifuges to spin in an erratic manner, ultimately destroying the centrifuges, and their ability to work properly in a train of centrifuges to produce enriched U-235. Recall that natural uranium is about 0.7% U-235. Nuclear power plants need pellets with a U-235 content of about 3%, while nuclear weapons require an enrichment of 90% U-235 to be efficient. The problem is that once you have a train of 10,000 centrifuges, all you have to do is keep feeding the uranium hexafluoride gas through the train until you get to the 90% U-235 level. Once you have U-235 at a purity of 90%, all you have to do is slam two chunks together in a pipe bomb like the Little Boy atomic bomb that destroyed Hiroshima.

The Nations of the World Are Already in a Cyber Arms Race
Clarke goes on to explain that just about every defense department of every nation upon the face of the Earth has created a Cyber War department. After all, recruiting a small army of cyber-hackers is a comparatively low-budget affair for the world’s defense departments, even for the lowliest of nations. These cyber-warriors go about creating botnet networks of PCs infected with malware primed to commence DOS (Denial of Service) attacks upon designated websites on command from a central server, and they are also inserting trapdoors and logic bombs into the critical software of foreign nations that is used to run their national infrastructure, like their electrical power grids and banking systems. In the past, Richard Clarke had a lot to do with the nuclear defense policies of the United States, but he points out that the main problem with a massive cyber-attack is that you don’t know for sure who originated the attack. This greatly complicates the problem of deterrence by means of massive retaliation. If a large number of transformers and generators in the United States power grid were to all burn out at the same time, how would you ever know where the cyber-attack came from? As Clarke points out, it might be possible to determine that the attack originated from a small server in Estonia, but who was really behind the attack? If the United States were in a shooting war with China at the time, the most likely suspect would be China, but what if it were really a mischievous North Korea trying to exacerbate the situation, and have the United States unleash its cyber arsenal upon China? Not being able to accurately tell where a cyber-attack originated from is a very destabilizing factor, and puts pressure on combatants to launch a first strike before their software infrastructure goes down.

A Possible Cyber Repeat of World War I
Clarke also points out that one of the greatest dangers of cyber war is that the nations of the world may not know what they have already wrought. As I pointed out in The Fundamental Problem of Everything, it seems that a necessary condition for civilization is the rise of the Powers That Be in societies. All civilized societies need the Powers That Be to run things, and all civilizations and societies necessarily have a mechanism for creating their Powers That Be, whether that be through the heredity of individuals within a monarchical line of descent, free and democratic elections of representatives, or through the infighting amongst a number of individuals within the central committee of a ruling oligarchical class. Now there is nothing wrong with having the Powers That Be run things; they just seem to be a necessary condition of civilization. But we must always remember that the Powers That Be are also human, and therefore, are subject to the limitations of human nature. For example, the Powers That Be who ran the world just prior to World War I did not recognize the game-changing effects of the mechanization of warfare. The development of high-volume rail systems, capable of quickly transporting large numbers of troops and munitions, the invention of the machine gun, and the arrival of mechanized transport vehicles and tanks, greatly increased the killing power of nation-states. This was not generally recognized by the Powers That Be prior to the catastrophe of World War I, which resulted in 40 million casualties and the deaths of 20 million people for apparently no particular reason at all. Even so, the Powers That Be of the time did correctly recognize that they would not be part of the fighting and dying, wartime services that traditionally were always delegated to the little people in past centuries.
Similarly, the Powers That Be who initiated World War II were of course fully aware of the horrors of mechanized warfare, but again, were quite confident that they would not be part of the fighting and dying, so the horrors of World War II even outdid those of World War I. This all changed with the Cold War. Suddenly, with large stockpiles of nuclear weapons on both sides, the Powers That Be realized that this time they would indeed be part of the fighting and dying, and miracle of miracles, World War III never happened! We came close a few times, but thanks to MAD (Mutually Assured Destruction), the Powers That Be always withdrew from the brink. Yes, this may be a very cynical way of looking at things, but given the limitations of human nature, I believe it is a valid model. If the Powers That Be of the world are convinced that they will personally be big losers in a worldwide cyber war, it may never happen.

So how can we extend the strategic concept of MAD to cyber war? I think the worldwide IT community has to let the Powers That Be of the world realize that once a full-scale global cyber war breaks out, no matter what cyber war peace conventions are in place at the time, it will probably spill over and take out the world financial and banking systems. As I explained in MoneyPhysics, money is not “real”; it is just a collection of bits in cyberspacetime, and every maniacal dictator has his private Swiss bank accounts to protect. Going back to my definition of physical reality:

Physical Reality - Something that does not go away even when you stop believing in it.

Once people lose faith in money and who owns what, the world economy will quickly unravel. After all, as I also pointed out in MoneyPhysics, we nearly did this to ourselves already in the fall of 2008 without a cyber war even taking place! So it is important for IT organizations, like the ACM and IEEE, to acknowledge the possibility of a full-scale global cyber war and its likely effects upon the world economy and financial systems, in order to avoid the equivalent catastrophe of World War I in cyberspacetime. If you Google for “Cyber War” or “Cyber Defense” you will get many millions of hits and find a large number of organizations already devoted to cyber defense, so we have a good start at this. Now a strategic policy of Cyber-MAD would likely dissuade the Powers That Be of nation-states, like an economic version of the Doomsday Machine found in Dr. Strangelove, but what about the would-be cyber-terrorists of the world who would be more than happy to see the economies of the Western world in tatters?

Make Softwarephysics Part of Your Cyber Defense
I think this is where softwarephysics can be of use. Just as a good understanding of nuclear physics was essential during the Cold War, a good understanding of softwarephysics is essential to mount an effective cyber defense against cyber-attacks in cyberspacetime. In The Fundamental Problem of Software, I laid down the three laws of software mayhem:

1. The second law of thermodynamics tends to introduce small bugs into software that are never detected through testing.

2. Because software is inherently nonlinear these small bugs cause general havoc when they reach production.

3. But even software that is absolutely bug-free can reach a critical tipping point and cross over from linear to nonlinear behavior, with disastrous and unpredictable results, as the load on software is increased.
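
One way to picture the tipping point in the third law is with a textbook queueing formula. The sketch below is only an illustration under stated assumptions: it treats a server as a simple M/M/1 queue, which is a standard model from queueing theory and not something taken from the softwarephysics posts, and the function name and the rates used are hypothetical.

```python
def mean_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time a request spends in an M/M/1 queue: 1 / (mu - lambda).

    arrival_rate (lambda) and service_rate (mu) are requests per second.
    Assumption: a single server with random arrivals and service times.
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be non-negative and service_rate positive")
    if arrival_rate >= service_rate:
        # At or beyond capacity the queue grows without bound.
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


if __name__ == "__main__":
    capacity = 100.0  # hypothetical: the server can handle 100 requests/second
    for load in (10.0, 50.0, 90.0, 99.0, 100.0):
        print(load, mean_response_time(load, capacity))
```

In this toy model the response time barely moves at light load, then climbs steeply as the load nears capacity, which is the kind of linear-to-nonlinear crossover the third law describes.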

The first two laws of software mayhem are certainly to the advantage of nations planning for a national cyber defense because they predict that software is a very fragile and brittle substance indeed, so trapdoors and logic bombs are subject to the same frailties as all other production software, and are likely to fail just at the moment that you need them. This means that there is a very good chance that many of the trapdoors and logic bombs deployed in advance will fail just when they are being called upon to bring down a nation’s software infrastructure. As all IT professionals in Applications Development or Operations know, the slightest change to the IT infrastructure, like operating system upgrades of servers, patches to database management systems, or changes to firewall rules, load balancers, switches, routers, and domain name servers, as well as changes to infrastructure software like Apache configuration files, J2EE Appserver configuration changes, and changes to messaging and gateway software can all quickly bring the whole thing down like a collapsing house of cards. I am in the Middleware Operations group for my present employer and all day and night long we are in a constant battle to keep it all up and running. We are constantly troubleshooting problems and getting things back online by restarting software that has spontaneously gone into a bad state. And whenever we do an install we are always prepared to back it out when problems arise, and about 10% of the time we do back out installs, even though all the software to be installed went through rigorous unit and integration testing by Applications Development, followed by UAT testing by end-users, and Production Assurance testing by professional testers. So fielding trapdoors and logic bombs that work is not an easy task. 
Trapdoors can be periodically tested by an enemy, but logic bombs would be very hard to test in advance, and the constant churn of the IT infrastructure will mean that the half-life of trapdoors and logic bombs will most likely be on the order of several months or less. Just like all other production software, trapdoors and logic bombs require constant maintenance to keep them operational, and this is a great advantage to the defender.
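
To get a feel for what a half-life of a few months would mean, here is a minimal sketch. It assumes that planted trapdoors and logic bombs fail at a steady rate, so that the surviving fraction follows simple exponential decay; that decay model and the three-month figure are assumptions chosen for illustration, not measurements.

```python
def surviving_fraction(months_elapsed: float, half_life_months: float) -> float:
    """Fraction of planted code still working after months_elapsed.

    Assumption: failures follow exponential decay with the given half-life.
    """
    if half_life_months <= 0:
        raise ValueError("half_life_months must be positive")
    if months_elapsed < 0:
        raise ValueError("months_elapsed must be non-negative")
    return 0.5 ** (months_elapsed / half_life_months)


if __name__ == "__main__":
    # Hypothetical half-life of 3 months.
    for months in (0, 3, 6, 12):
        print(months, surviving_fraction(months, 3.0))
```

Under these assumptions only about 6% of what was planted would still work after a year without maintenance.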

On the other hand, the third law of software mayhem is to the advantage of an attacker because it means that it is possible to crash websites with botnet DOS (Denial of Service) attacks by simply overwhelming the website with 10 times its normal level of traffic. The good news is that DOS attacks require an active network of botnet PCs to maintain the necessary high levels of incoming traffic and this allows the defending website to do traceroutes on the incoming packets to find their source and possibly block their traffic. Also DOS attacks are much less damaging than trapdoors and logic bombs, since they primarily are used to bring down websites, but cannot affect the internal software used to run critical infrastructure like the electrical grid, air traffic control systems, financial and banking systems, nuclear power plants, refineries, chemical plants, and the phone systems.
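
As a rough sketch of the "find their source and possibly block their traffic" step, the code below counts requests per source address inside a time window and reports the sources that exceed a threshold. The function name, the record layout of (timestamp, source address), and the threshold are all hypothetical. A real botnet spreads its traffic over many machines, so a per-source count like this would be only a first filter.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def heavy_sources(
    requests: Iterable[Tuple[float, str]],
    window_start: float,
    window_seconds: float,
    max_per_window: int,
) -> Dict[str, int]:
    """Return sources with more than max_per_window requests in the window.

    requests is an iterable of (timestamp_in_seconds, source_address) pairs,
    a made-up layout standing in for whatever the web server actually logs.
    """
    window_end = window_start + window_seconds
    counts = Counter(
        source
        for timestamp, source in requests
        if window_start <= timestamp < window_end
    )
    return {source: n for source, n in counts.items() if n > max_per_window}


if __name__ == "__main__":
    sample = [
        (0.0, "192.0.2.10"),
        (1.0, "192.0.2.10"),
        (2.0, "192.0.2.10"),
        (3.0, "192.0.2.20"),
        (70.0, "192.0.2.20"),  # outside the 60-second window
    ]
    print(heavy_sources(sample, 0.0, 60.0, 2))
```

The addresses reported would then be candidates for a firewall block rule, subject to review so that legitimate customers are not locked out.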

Softwarephysics provides some additional insights into how to mount a cyber defense of a nation’s cyberspacetime. As we saw in Self-Replicating Information and The Fundamental Problem of Everything, the “real world” of human affairs is largely shaped by self-replicating information. It is all about the interplay of the self-replicating information found in genes, memes, and software, and with the fact that software is rapidly becoming the dominant form of self-replicating information on the planet. Since genes, memes, and software are all forms of self-replicating information, much can be learned about defending software against cyber-attacks by looking to the defense mechanisms of genes and memes. Since genes have been around much longer than memes, and consequently, have had much more time to evolve defense mechanisms against invasions by the foreign genes found in the DNA survival machines of bacteria, viruses, and parasites, softwarephysics would suggest starting there. In How to Think Like a Softwarephysicist, I explained that the solution to most IT problems can be found by looking to biology and adopting a biological approach, and this certainly applies to cyber defense as well. First of all, genes do not follow a strategy of “the best defense is a good offense” because genes, like the potential victims of a cyber-attack, do not know where the attacks by foreign genes will be coming from, so trying to take out the enemy in advance with a first strike is meaningless. Instead, genes create an immune system that constantly scans for potential attacks from foreign genes and focuses on defending critical biological infrastructure. Immune systems are composed of billions of specialized cells all working in parallel to defend an organism against invading foreign genes, and the same approach could be applied to the defense of software running in cyberspacetime.

In a similar manner, Richard Clarke suggests that the United States focus on defending a triad of critical infrastructure: the power grid, the Tier 1 ISPs like AT&T, Verizon, Level 3, Qwest, and Sprint that form the backbone of the Internet, and the internal Department of Defense networks that carry sensitive military information. But how do you do that? Richard Clarke suggests that the power grid should be unplugged from the Internet. That means that a separate secured internal network, run by the Federal government, would need to be created to allow the software that electrical utilities use to communicate with each other for load balancing and the exchange of electrical generation capacity to operate. Disconnecting the electrical power grid from the Internet would make cyber-attacks much harder to mount because it would eliminate the innumerable points of unsecured entry allowed by the Internet. If you have ever lost electrical power for an extended period, you know just how important electrical power is. I lost power for three days back in the 1980s when an ice storm hit the Chicago area in the middle of winter. It quickly takes you back to the 18th century with very dim candlelight and no central heat. I at least had natural gas to cook food with and a functional city water supply, but in a full-blown attack on the electrical grid, there would be no power to run the compressors that deliver natural gas in pipelines and no pumps to lift water in city water towers. There would be no gasoline to pump at service stations and very little food on grocery store shelves either. Clarke also recommends defending the core Tier 1 ISPs by allowing them to do deep-packet scans of packets, looking for malware as the packets pass through their networks. Finally, Clarke recommends that the secured Department of Defense networks be totally isolated from the Internet and further hardened to prevent cyber-attacks upon these critical networks.

Use the Y2K Experience as a Guide
As IT professionals, we can all assist in cyber defense of our nations, since we are the ones on the front lines of defense. This is something the IT community can do. In many respects, it is much like the Y2K challenge of the late 1990s that the worldwide IT community successfully met. My son was born early in 1981, and later that year I was having a discussion with my stock broker about funding his college education. My broker explained that I could buy discounted stripped U.S. Treasury bonds for something like $125 that would pay out $1,000 in my son’s college years. Because inflation was raging at over 11% per year in those days, I could essentially lock in zero coupon bonds that were insured by the Federal government at a guaranteed rate of 12% interest and which could not be called before they matured in the far distant future. So I bought a bunch of stripped U.S. Treasury bonds that matured in 1999, 2000, 2001, and 2002. While I was on the phone with my broker, I commented that it seemed so strange to be talking about years like 2000, 2001 and 2002 because the both of us had only dealt with 20th century years like 1965 for our entire lives. When I got off the phone, I had one of those “uh-oh” moments, as I thought about all the code that I had written with two-digit years that read and wrote files containing dates like 101265 for October 12, 1965. Doing arithmetic in my code, like 65 – 51 = 14 years, worked just great in the 20th century, but would not work for 2002 – 1998 because 02 – 98 was going to yield -96 years instead of 4 years! That’s when the Y2K problem first hit me, but this was back in 1981, so I figured that somebody else would surely fix it all in the far distant future, so not to worry. Now scroll forward to late 1996. I got a call from Amoco’s first Y2K Czar asking me if I would like to join Amoco’s Y2K project. 
I had written Amoco’s Application Portfolio System with BSDE back in the mid-1980s, so I was familiar with collecting lots of data on Amoco’s systems, and that’s how I got drafted for the Y2K project. Amoco’s first Y2K Czar brought in a consulting company to scan our source code libraries and the initial scans revealed that we indeed had a very serious problem. Like all the other IT departments throughout the entire world, we had strewn our systems with millions of logic bombs all set to go off at the same time as we approached the year 2000. The consulting company we brought in racked up some pretty serious bills scanning our code, but unfortunately was totally clueless about how to fix the code because all of the affected systems were intimately tied together into a huge processing knot. You could not simply fix one system at a time because the systems exchanged data with each other via files and databases, so you had to carefully remediate groups of related systems at the same time and that required an intimate knowledge of the systems that the consulting company did not have. Meanwhile, our Y2K group could not get any help from the Applications Development groups because they were all just trying to survive through the day and were not thinking much past their next install weekend. Besides, our Y2K group had that pricy consulting company that was going to do all the work and fix everything for them! People were still pretty much in denial about the Y2K problem back in 1996. All along our first Y2K Czar kept telling our CIO that the glass was half full, but that we were making steady progress. But in 1997 the Y2K problem began to appear in the IT trade rags, and that got our CIO’s attention. Suddenly our CIO realized that his glass was not half full, it was actually half empty! So our first Y2K Czar was summarily terminated for cause and escorted from the building! Things were a little tense back in those days on the Y2K projects around the world. 
A few blocks away from Amoco’s headquarters, at CNA insurance, two members of their Y2K team got into a fistfight outside of an elevator and were immediately terminated under CNA’s zero-tolerance policy! Amoco’s second Y2K Czar came in with both barrels blazing. He immediately fired the old consulting company and hired one of the big-gun consulting companies to take its place and come in and finish (really start) the job. Suddenly there were hundreds of young kids swarming all over our Applications Development groups trying to apply the consulting company’s brand-new Y2K methodology. The burn rate for this effort was a little over $2 million/month, and after a few months of that, our CIO decided that our second Y2K Czar should take early retirement. Amoco’s third Y2K Czar came in with a completely different attitude. Instead of charging off in a mad rush in the arms of a consulting company, we spent about a month just sitting around trying to figure out how we could get ourselves out of this mess all by ourselves, without using consulting companies at all. By then we had figured out that the consulting companies really did not know how to fix the Y2K problem at all. Out of these brain-storming sessions we developed an overall strategy to push the Y2K problem down to the grass-roots level of the Applications Development groups within Amoco because they were the only ones who knew Amoco’s systems. So we split up Amoco along its subsidiary lines. I had the Amoco Corporation holding company and its Amoco Production Company subsidiary, for a total of about one third of Amoco’s total number of applications. I had about a dozen Y2K sub-coordinators under me and each of them had a Y2K sub-coordinator under them in each of the Applications Development groups. 
Our Y2K group then set up the policies and procedures to do the Y2K remediation of Amoco’s software and provided the tools to do the work, but it was the responsibility of each Applications Development group to remediate the software that they supported. One of the tools our Y2K group provided was a database application to keep track of the Y2K remediation efforts for each application. The first thing we did was to classify each application by criticality (High, Medium, or Low) and then to focus on the High and Medium applications first. For Y2K certification testing we created a mainframe LPAR and a Y2K lab filled with Unix and Windows servers. Applications were then run through a two-week flight in the Y2K lab in which we ran the remediated software through a series of 32 critical dates by changing the system clocks on the mainframe LPAR and the Unix and Windows servers. The first date was for the infamous January 1, 1999 or “010199” problem. You see, lots of programmers came up with the brilliant trick of using a year date of “99” to signify something special, like the last record in a file, so we had to test for that condition. Naturally, the date January 1, 2000 (010100) was in the list of dates. This new Y2K strategy of Amoco doing its own Y2K remediation and using Applications Development to do the work in a massively parallel manner really worked and Amoco finished up its Y2K remediation by early 1999, just in time for its takeover by BP! Again, doing things in a massively parallel manner is a hallmark of taking a biological approach to solve IT problems as outlined in How to Think Like a Softwarephysicist.
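
The two-digit year arithmetic described earlier in this section can be sketched in a few lines. The first function reproduces the bug. The "windowing" fix that follows is one common remediation technique and is shown here only as an assumption; it is not necessarily what was done at Amoco, and the pivot value of 50 and all function names are hypothetical.

```python
from typing import Tuple


def naive_years_between(start_yy: int, end_yy: int) -> int:
    """Subtract two-digit years the way much pre-Y2K code did."""
    return end_yy - start_yy


def expand_year(yy: int, pivot: int = 50) -> int:
    """Expand a two-digit year to four digits using a fixed window.

    Assumption: years below the pivot are 20xx, all others are 19xx.
    """
    if not 0 <= yy <= 99:
        raise ValueError("yy must be between 0 and 99")
    return 2000 + yy if yy < pivot else 1900 + yy


def windowed_years_between(start_yy: int, end_yy: int, pivot: int = 50) -> int:
    """Subtract two-digit years after expanding both to four digits."""
    return expand_year(end_yy, pivot) - expand_year(start_yy, pivot)


def parse_mmddyy(text: str, pivot: int = 50) -> Tuple[int, int, int]:
    """Split an MMDDYY string such as '101265' into (year, month, day)."""
    if len(text) != 6 or not text.isdigit():
        raise ValueError("expected six digits in MMDDYY order")
    month, day, yy = int(text[0:2]), int(text[2:4]), int(text[4:6])
    return expand_year(yy, pivot), month, day


if __name__ == "__main__":
    print(naive_years_between(51, 65))    # 14, fine within the 20th century
    print(naive_years_between(98, 2))     # -96, the Y2K bug
    print(windowed_years_between(98, 2))  # 4, after windowing
    print(parse_mmddyy("101265"))         # (1965, 10, 12)
    print(parse_mmddyy("010100"))         # (2000, 1, 1)
```

Windowing only postpones the problem, since any fixed pivot eventually misreads some year, which is why many remediation efforts expanded stored dates to four-digit years instead where the file formats allowed it.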

So I think there are some lessons that can be learned from the worldwide Y2K effort of the past decade that can be applied to defending against a cyber-attack. First, the nations of the world should set the agenda by defining a national cyber defense policy for dealing with cyber-attacks upon their citizens. This would include raising the awareness of their native IT community to the dangers of cyber-attacks upon the nation and providing defensive software weapons where possible. But ultimately the work needs to be pushed down to the IT professionals of each nation who know and live with the national software infrastructure on a daily basis, especially Applications Development, Operations and IT Security. Just as few IT professionals were thinking about the Y2K problem in 1996, very few IT professionals today are worrying about defending their nation against a worldwide cyber war.

In the meantime, there are a few things we can all do to help defend against cyber-attacks. First of all, as in any dangerous urban situation, be aware of your surroundings and be suspicious of strange behaviors. For example, back in the early 1980s, Amoco sent me out to Denver to convince the manager in charge of the Accounting Department for the Amoco Pipeline company to migrate his accounting processing from the Data General minicomputers that he had in Denver to the Amoco Corporate Timesharing System running on VM/CMS. Amoco was paying a fortune to rent these old Data General minicomputers, and I showed him how he could save a bundle by migrating to the Corporate Timesharing System. For me, it was a real no-brainer decision, like can you tell the difference between a big number and a little number, but no matter how I tried, I just could not convince the manager to migrate his processing. Several years later I learned that the accounting manager had been arrested for arson! Apparently, he had been using the Data General minicomputers to cook the books and embezzle funds from Amoco. He was facing an audit of the Data Generals and got the brilliant idea to set his data center on fire with an incendiary device to burn up the Data Generals and their data. Luckily, the incendiary device failed. So when things don’t seem to make sense, be suspicious of sabotage and contact your IT Security department. Many IT professionals in Applications Development and Operations tend to view IT Security as more of a nuisance than an ally. After all, there is nothing worse than being called into a conference call for an outage, only to find that you cannot log in to a server or that your sudo access to production IDs is no longer working because of an IT Security problem. For some reason, the people on the conference call still seem to blame you for not being able to log in and fix the problem.
But we really need to work with IT Security if outsiders are planting trapdoors and logic bombs in the software we support. So be vigilant and practice good IT security hygiene. Be sure to change your passwords frequently, make sure that your hard drive is encrypted, ensure that the Unix permissions on your files are restricted to allow no more access than is essential, watch for strange executables in your production libraries or updated versions of executables that seem to appear outside of normal install operations, don’t kill the weekly virus scans of your work PC even though they slow down your work when you are under the gun, when designing applications think hard about security and take seriously the idea of a foreign nation assaulting your code, help your IT Security department conduct penetration tests and vulnerability scans of your servers, and be sure to alert your IT Security department whenever you are suspicious of foul play. These are all good IT practices, and since defending your company or organization against a national cyber-attack also defends against cyber-crime, your employer should be very supportive of your efforts.
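
Two of the habits above, restricting Unix permissions and watching for executables that change outside of normal installs, lend themselves to a small script. What follows is a minimal sketch under stated assumptions, not a replacement for whatever tooling your IT Security department provides. All function names are hypothetical, and it assumes a Unix-like system where the permission bits are meaningful.

```python
import hashlib
import os
import stat
from typing import Dict, List


def world_accessible_files(root: str) -> List[str]:
    """Return regular files under root that grant any access to 'other'."""
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode  # lstat: do not follow symlinks
            if stat.S_ISREG(mode) and mode & stat.S_IRWXO:
                flagged.append(path)
    return sorted(flagged)


def sha256_of(path: str) -> str:
    """Hash a file in chunks so large executables do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root: str) -> Dict[str, str]:
    """Map each regular file under root to its SHA-256 digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                result[path] = sha256_of(path)
    return result


def unexpected_changes(baseline: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """List files that are new, modified, or missing compared to the baseline."""
    new_or_changed = [p for p, h in current.items() if baseline.get(p) != h]
    removed = [p for p in baseline if p not in current]
    return sorted(new_or_changed + removed)
```

The idea would be to take a snapshot right after each approved install, store it somewhere an intruder cannot easily rewrite, and compare against it periodically. Anything reported by unexpected_changes between installs is worth a call to IT Security.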

Comments are welcome at scj333@sbcglobal.net

To see all posts on softwarephysics in reverse order go to:
https://softwarephysics.blogspot.com/

Regards,
Steve Johnston
