The CERT Coordination Center (CERT-CC)
reports that, despite increased awareness, the first time many
organizations start thinking about how to handle a computer security
incident is after an intrusion has occurred.
Obviously, this isn't a great approach. You need a plan for how
you're going to respond to a computer security incident at your
site, and you need to develop that plan well before an incident
occurs.
There isn't room here to detail everything you need to know to
deal with a security incident: attacks are many and varied and change
constantly; responding to them can involve a byzantine assortment of
legal and technical issues. This chapter is intended to give you an
outline of the issues involved and the practical steps you can take
ahead of time to smooth the process. Appendix A, "Resources",
lists resources that may provide additional help.
27.1. Responding to an Incident
This section discusses a number of steps you'll need to take
when you respond to a security incident. You won't necessarily
need to follow these steps in the order they're given, and not
all of these steps are appropriate for all incidents. But we
recommend that you at least contemplate each of them when you find
yourself dealing with an incident.
In Section 27.4, "Planning Your Response", later in this
chapter, we'll look again at each of these steps and help you
figure out how to work them into the overall response plan that you
should develop before an incident actually occurs.
Rules for Incident Response
In their book Practical UNIX & Internet
Security, Simson Garfinkel and Gene Spafford provide two
excellent, overriding rules for incident response. Keep these rules
in mind as you read this chapter and during any real-life incident
response:
- Rule 1: Don't Panic!
- Rule 2: Document!
27.1.1. Evaluate the Situation
The first step in responding to a security incident is to decide what
response, if any, needs to be made immediately. Ask these questions:
- Has an attacker succeeded in getting into your systems?
  If so, you have a genuine emergency on your hands, whether or not the
  attacker is currently active.
- Is the attack currently in progress?
  If so, you need to decide how you're going to react right now.
  If the attack isn't currently in progress, you may not be in
  such a hurry.
If the incident looks like an aggressive attack on your system, you
probably want to take strong steps quickly. These steps might include
shutting down the system or your Internet connection until you figure
out how to deal with the situation.
On the other hand, if the incident is a less aggressive one --
perhaps someone has just opened a Telnet connection to your machine
and is trying various login/password pairs -- then you may want
to move more slowly. If you're reasonably confident that the
attack won't succeed (e.g., you can see that the attacker is
trying passwords that consist of all lowercase letters, and you know
for certain that no account on the system has such a password), you
might want to leave things alone and just watch for a while to see
what the attacker does. This may give you an opportunity to trace the
attack. (However, see Section 27.3, "Pursuing and Capturing the Intruder",
later in this chapter, for a discussion of the issues
involved in tracing an attack.)
Whatever you do, remember Rule 1: Don't panic!
27.1.2. Start Documenting
As soon as you determine that you actually have a problem that you
need to respond to, start documenting what's going on. You
don't need to get fancy at this point (you don't have
time to, until you've taken the next step), but you should at
least start a log by making a note of what time it is.
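A log of this kind can be as simple as a file of timestamped notes. The following is a minimal sketch in Python, assuming you keep the log on a machine other than the compromised one. The function name `log_entry` and the file name in the usage note are hypothetical, and a paper notebook serves just as well.

```python
from datetime import datetime, timezone


def log_entry(path, note, when=None):
    """Append one timestamped note to the incident log at `path`."""
    # Use the supplied time if given; otherwise take the current UTC time.
    when = when or datetime.now(timezone.utc)
    line = f"{when.isoformat(timespec='seconds')}  {note}\n"
    # Open in append mode so earlier entries are never overwritten.
    with open(path, "a", encoding="utf-8") as log:
        log.write(line)
    return line
```

A first call might look like `log_entry("incident.log", "Noticed repeated failed logins; starting log.")`. Recording times in UTC is a design choice of this sketch; it avoids confusion if you later compare notes with another site.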
27.1.3. Disconnect or Shut Down, as Appropriate
Once you've evaluated the situation, your next priority is to give
yourself the time to respond without risking your systems further.
The least disruptive alternative is usually to disconnect the
affected machine from all networks; this will shut down any active
connections. Shutting down active connections may make it harder to
trace the intruder, but disconnecting the machine lets the rest of the
people at your site continue to do their work, and it leaves the
intruder's programs running. Those running programs may help you to
identify who the intruder might be.
If you're afraid that other machines have been compromised or
are vulnerable to the same attack, you'll probably want to
disconnect as many machines as you can as a unit. This may mean
taking down your connection to the Internet, if possible. If your
Internet connection is managed elsewhere in your organization, you
may need to detach just your portion of the network, but you'll
also need to talk to other parts of your organization as soon as
possible to let them know what's happening.
In some situations, you may want to shut down the compromised system.
However, this action should be a last resort for a number of reasons:
- It destroys information you may need.
- You won't be able to analyze or fix the machine while
it's down; you'll have to disconnect it from the network
eventually anyway to bring it back up again.
- It's even more disruptive to legitimate users than removing the
network connection.
- It protects only one machine at a time. (It's much easier to
cleanly disconnect a set of systems than to cleanly shut them down.)
Even if you're responding to an incident that has already
ended, you still might want to disconnect or shut down the system, or
at least close it to users, while you analyze what happened and make
any changes necessary to keep it from happening again. This will keep
you from being confused by things users are doing, and it will
prevent the intruder from returning before you're done.
27.1.4. Analyze and Respond
Your next priority is to start to fix what's gone wrong. The
first step in actually correcting the problem is to relax, think for
a while, and make sure you really understand what's happening
and what you're dealing with. The last thing you want to do is
make the situation worse by doing something rash and ill considered.
Whatever corrective actions you're contemplating, think them
through carefully. Will they really solve the problem? Will they, in
turn, cause other problems?
When you're working in an unusual, high-stress situation like
this, the chances of making a major error increase. Because
you're probably going to be working with system privileges (for
example, working as root on a Unix system), the consequences of an
error could be serious.
There are several ways you can reduce the chances of making an error.
One good way is to work with a partner; each of you can check the
other's commands after they're typed but before
they're executed. Even if you're working alone, many
people find that reading commands aloud and checking the arguments in
reverse order before executing them helps avoid mistakes. Resist the
temptation to try to work fast. You will go home sooner if you work
slowly and carefully.
Try not to let your users get in the way of your response. You may
want to give someone the specific job of dealing with user inquiries
so the rest of your response team can concentrate on responding to
the incident.
Also, try to keep your responders from tripping over each other. Make
it clear which system managers and investigators are working on which
task, so they won't step on each other's toes (or wind up
unintentionally chasing each other as part of the investigation!).
27.1.5. Make "Incident in Progress" Notifications
You're not the only person who
needs to know what's going on. A number of other people --
in a number of different places -- have to be kept informed.
27.1.5.1. Your own organization
Within your own organization are people who need to know that
something is happening: management, users, and staff. At the very
least, let them know that you are busy responding to an incident and
that you may not be available to them for other matters. They usually
need to know why they're being inconvenienced and what they
should do to speed recovery (even if the only thing they can do is to
go away and leave you alone).
It is particularly important that management and other staff know
what's going on. Otherwise, you risk having them act in
opposition to you. For instance, if you've disconnected the
Internet connection, the chances are high that somebody's going
to notice the service outage and try to fix it. That's a
problem if it's another staff member, but it can be a disaster
if it turns into a management requirement.
If people call management to complain about some side effect of your
response, and the manager they get has been briefed about
what's going on, the chances are that the manager will defend
your need to make a response. At worst, the manager will make a
reasoned decision about the importance of incident response versus
other needs of the company. However, if the manager doesn't
know what's going on, he or she will probably respond the same way
he or she would to any other network outage: "Gee,
that's terrible, we'll fix it as soon as possible."
The manager has then promised the user something, and the chances are
very small that the manager will go back on that promise. Instead,
your response will be curtailed by the need to restore service as
soon as possible.
Depending on the nature of your site and the incident in question,
you may also need to inform your legal, audit, public relations, and
security departments. You will always want to contact the security
department if:
If multiple computer facilities are at your site, you'll need
to inform the other facilities as soon as possible; they are likely
sources and future targets for similar attacks.
27.1.5.2. CERT-CC or other incident response teams
If your organization is served by an
incident response team such as CERT-CC, or has its own such team, let
them know what's going on and try to enlist their aid. (For
instructions on how to contact CERT-CC or another response team, see
Appendix A, "Resources".) What steps response teams can take to
help you will depend on the charter and resources of the response
team. Even if they can't help you directly, they can tell you
whether the attack on your site looks as if it is part of a larger
pattern of incidents. In that case, they may be able to coordinate
your response with the responses of other sites.
27.1.5.3. Vendors and service providers
You might want to get in touch with your vendor support contacts or
your Internet service provider(s) if you think they might be able to help or should be
aware of the situation. For example, if the attackers appear to be
exploiting an operating system bug, you should probably contact the
vendor to see if they know about it and have a fix for it. At the
very least, they'll be able to warn other sites about the bug.
Similarly, your Internet provider is unlikely to be able to do much
about your immediate problem, but they may be able to warn other
customers. There is also a possibility that your Internet provider
has itself been compromised, in which case, they need to know
immediately. Your vendors and service provider may have special
contacts or procedures for security incidents that will yield much
faster results than going through normal support channels.
You may get little or no visible response when you make these
reports. This might be because you're being ignored or because
companies are putting self-defense before the interests of their
customers. On the other hand, it's often due to sensible
precautions that are intended to make certain that problems are not
publicized before fixes are available (jeopardizing places not yet
under attack), that the fixes that are made are appropriate to the
problem, and that attackers don't get valuable information by
pretending to be sites under attack. You might as well give your
suppliers the benefit of the doubt, since it's almost
impossible to tell which of these is going on.
27.1.5.4. Other sites
Finally, if the incident appears to involve other sites -- that
is, if the attack appears to be coming from a particular site, or if
it looks as if the attackers have gone after that site after breaking
into yours -- you should inform those other sites. These sites
are usually easy to identify as the sources or destinations of
connections. It's often much harder to figure out how to find
an actual human being with some responsibility for the computer in
question, who is awake and reachable and has a common language with
you.
Once again, you may get little or no apparent response for any number
of different reasons, some of them annoying and reprehensible, and
some of them perfectly sensible. The other site may not care whether
their users are attacking you, or they may care desperately but have
no way of telling you about it without revealing information to the
attackers. While it's always nice to get somebody who makes an
immediate, visibly effective response and thanks you promptly for the
information, don't expect it and don't be upset when you
don't get it.
If you don't know who to inform, talk to your response team (or
CERT-CC). They will probably either know or know how to find out, and
they have experience in calling strangers to tell them they have
security problems.
27.1.6. Snapshot the System
Another early step to take is to
make a "snapshot" of each compromised system. You might
do so by doing a full backup to tape or by copying the whole system
to another disk. In the latter case, if your site maintains its own
spare parts inventory, you might consider using one of the spares for
this purpose, instead of a disk that is already in use and might
itself turn out to have been compromised.
The snapshot is important for several reasons:
- If you misdiagnose the problem or blow the recovery, you can always
get back to the time of the snapshot.
- The snapshot may be vital for investigative and legal proceedings. It
lets you get on with the work of recovering the system without fear
of destroying evidence.
- You can examine the snapshot later, after you're back in
operation, to determine what happened and why.
Because the snapshot may become important for legal proceedings, you
need to secure the evidence trail. Here are some
guidelines: [187]
- Uniquely identify (label) the snapshot media and put the date, time,
your name, and your signature on it.
- Write-protect the media -- permanently, if possible.
- Safeguard the media against tampering (for example, put it in a
locked container) so that if and when you hand it over to
law-enforcement or other authorities, you can tell them whose custody
the media has been in and why you're certain it hasn't
been tampered with since it was first created.
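If you also keep an electronic copy of the custody information, the guidelines above can be modeled roughly as follows. This is only a sketch: the class name `CustodyRecord` and its fields are hypothetical, and it supplements the signed label on the media itself rather than replacing it.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class CustodyRecord:
    """Custody record for one piece of snapshot media."""
    label: str            # unique identifier written on the media
    created: datetime     # date and time the snapshot was made
    created_by: str       # person who made and signed the snapshot
    transfers: list = field(default_factory=list)

    def hand_over(self, when, to_whom, reason):
        """Record that the media passed into someone else's custody."""
        self.transfers.append((when, to_whom, reason))

    def current_holder(self):
        """Return whoever holds the media after the latest transfer."""
        return self.transfers[-1][1] if self.transfers else self.created_by

    def history(self):
        """List every holder in order, starting with the creator."""
        lines = [f"{self.created:%Y-%m-%d %H:%M}  {self.created_by}  (created)"]
        for when, to_whom, reason in self.transfers:
            lines.append(f"{when:%Y-%m-%d %H:%M}  {to_whom}  ({reason})")
        return lines
```

Each call to `hand_over` adds one line to the history, so the record can answer the question of whose custody the media has been in, and why, at any point.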
It's a good idea to set aside an adequate supply of fresh media
just for snapshots because you never know when you're going to
need to produce one. It's very frustrating to respond to an
incident, and be ready to do the snapshot, only to discover that the
last blank tape got used for backups the day before and the new order
hasn't come in yet.
27.1.7. Restore and Recover
Finally, you're at the point of
actually dealing with the incident. What do you do? It depends on the
circumstances. Here are some possibilities:
- If the attacker didn't succeed in compromising your system, you
may not need to do much. You may decide not to bother reacting to
casual attempts. You may also find that your incident was actually
something perfectly innocent, and you don't need to do anything
at all.
- If the attack was a particularly determined one, you may want to
increase your monitoring (at least temporarily), and you'll
probably want to inform other people to watch out for future
attempts.
- If the attacker became an intruder (that is, he or she actually
managed to get into your computers), you're going to need to at
least plug the hole the intruder used, and check to make certain the
intruder hasn't damaged anything or left anything behind.
At worst, you may need to rebuild your system from scratch. Sometimes
you end up doing this because the intruder damaged things,
purposefully or accidentally. More often, you'll rebuild your
system because it's the only way to ensure you have a clean
system that hasn't been booby-trapped. Most intruders start by
making sure they'll be able to get back into your system, even
if you close their initial entry point. As a result, your systems may
be compromised even if the intruder was present for only a short
time.
TIP:
Always assume that intruders have created back doors into your system
so that they can get back in again easily. It's one of the
first things many intruders do when they break into a system.
If you need to rebuild your system, first ensure that your hardware is
working properly. You want to make sure it passes all relevant
self-tests and diagnostics; you don't want to restore onto a
flaky system. A reinstall may reveal previously unnoticed hardware
problems. For instance, a disk may have bad spots that are in unused
files. When you reinstall the operating system, you will attempt to
write over the bad parts, and the problem will suddenly become
apparent.
Next, make sure you are using trusted
media and programs, not necessarily your last backup, to restore the
system. Unless you are absolutely sure that you can accurately date
the first time the intruder accessed your system, you don't
know whether or not programs had already been modified at the time
the backups happened. It's often best to rebuild your system
from vendor distribution media (that is, the tapes or CD-ROM your
operating system release came on) and then reload only user data (not
programs that multiple users share) from your backup tapes.
If you need programs you didn't get from your vendor (for
instance, packages from the Internet), then do one of the following:
- Rebuild and reinstall these programs from a trusted backup (one
you're absolutely positive contains a clean copy).
- Obtain and install fresh copies from the site you got the packages
from in the first place.
Do not recompile software until you've reinstalled the
operating system, including the compiler; you don't know
whether the compiler itself, and the libraries it depends on, have
been compromised.
This implies that if you're heavily customizing your system or
installing a lot of extra software beyond what your vendor gives you,
you need to work out a way of archiving those customizations and
packages that you're sure can't be tampered with by an
attacker. This way, you can easily restore those customizations and
packages if you need to. One good way is to make a special backup
tape of new software immediately after it's installed and
configured, before an attacker has a chance to modify it.
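A complementary check, which is our suggestion here rather than part of the backup procedure itself, is to record a checksum of every archived file at the time you make the special backup, and compare against that list before you trust a restored copy. The sketch below assumes the manifest is kept on write-protected media and that the comparison is run from a system you know to be clean; the function names are hypothetical.

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to `root`) to its digest."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def changed_files(trusted, current):
    """Return sorted paths that were added, removed, or modified."""
    paths = set(trusted) | set(current)
    return sorted(p for p in paths if trusted.get(p) != current.get(p))
```

You would call `build_manifest` once right after installation and again on the copy you are about to restore; an empty result from `changed_files` suggests that the two sets of files match. A manifest stored where an intruder could rewrite it proves nothing.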
You may have programs that were locally written, and in these cases,
you may not be able to find even source code that's guaranteed
to be uncontaminated. In this situation, someone -- preferably
the original author -- will need to look through the source
code. Intruders rarely bother to modify source code, and when they do,
they aren't particularly subtle most of the time. That's
because they don't need to be; almost nobody actually bothers
to look at the source before recompiling it.
In one case, a programmer installed a back door into code he expected
would run on only one machine, as a personal convenience. The program
turned out to be fairly popular and was adopted in a number of
different sites within his university. Years after he wrote it, and
long after the original machine was running a version without the
back door, he discovered that the back door was still present on all
the other sites, despite the fact that it was clearly marked and
commented and within the first page of code. You can't make a
comprehensive search of a large program, but you can at least avoid
humiliation by looking for obvious changes.
27.1.8. Document the Incident
Life gets very confusing when you're discovering,
investigating, and recovering from a security incident. A good chain
of communication is important in keeping people informed and
preventing them from tripping over each other. Keeping a written
(either hardcopy or electronic) record of your activities during the
incident is also important. Such a record serves several purposes:
- It can help keep people informed (and thereby help them to resolve
the incident more quickly).
- It tells you what you did and when, in responding, so that you can
analyze your response later on (and maybe do better next time).
- It will be vital if you intend to pursue any legal action.
From a legal standpoint, the best
records are hardcopy records generated and identified at the time of
occurrence. Just about anything else (particularly anything kept
online) could be tampered with or falsified fairly easily -- or
at least a judge and jury could be convinced of that. You need to
produce records on paper, and then label, date, and sign them.
Furthermore, unless the pages are actually bound together, so that
pages can't be inserted or removed without indication,
you'll need to date and sign every page. (And you thought
continuous tractor-feed paper was useless these days!)
You need to have legal documentation even if you aren't
completely certain you're going to need it. An incident that
initially looks fairly simple may turn out to be serious. Don't
assume it isn't going to be worth bringing in the police.
For both legal and practical reasons, it's useful to put in
exact times when things occurred. Legally, this helps to show that
entries were being made in order. Practically, it's extremely
helpful when you need to correlate multiple sources of information
(for instance, when you need to compare your logs against event logs
on computers or against somebody else's actions).
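Once entries carry exact times, correlating them is mostly a matter of sorting. The following minimal sketch merges entries from several sources into one timeline. It assumes every timestamp has already been converted to the same time zone and that the clocks involved roughly agree, which you should verify; the source names in the docstring are hypothetical.

```python
from datetime import datetime


def build_timeline(sources):
    """Merge timestamped entries from several sources into one ordered list.

    `sources` maps a source name (hypothetical examples: "notebook",
    "syslog") to a list of (datetime, text) pairs.
    """
    merged = [
        (when, name, text)
        for name, entries in sources.items()
        for when, text in entries
    ]
    # Sort by time only; entries with equal times keep their input order.
    merged.sort(key=lambda entry: entry[0])
    return merged
```

Each item in the result keeps the name of the source it came from, so you can see at a glance whether, say, a phone call in your notebook came before or after a particular log message.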
Here are several useful documentation methods you might want to
consider:
- Notebooks -- carbon copy lab notebooks are especially useful
because you can write a note, tear it out and give it to someone and
still have a copy of the note. Another benefit is that the pages are
usually numbered, so you can determine later on whether any pages
have been removed or added.
- Terminals running with attached printers or old-fashioned printing
terminals.
- A shell running under the Unix script command,
with the resulting typescript immediately printed and identified.
- A personal computer terminal program running in "capture"
mode, with the resulting typescript immediately printed and
identified.
- A microcassette recorder for verbal notes.
You will probably want to use multiple methods, one to record
what's happening online and one to record what's
happening outside of the computer. For example, you might have a
typescript of the commands you were typing, but a handwritten log for
phone calls.
It's easy to decide what to record online; you simply record
everything you do. Remember to use the terminal or session
that's being recorded. (With some methods, like
script, you can record every session
you've got going; just make sure you record each session in a
separate file.) It's harder to decide what to record of the
events that don't just get automatically captured. You
certainly want to record at least this much:
In addition to the journal, a log of time spent for everyone working
on an incident can be invaluable. You may need to justify some level
of "loss" in order for some law enforcement agencies to
be able to open an investigation, and if the intruder didn't do
any damage to the machines, the time that was spent cleaning up is
the main loss.
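Turning a time log into a loss figure is simple arithmetic: total each person's hours and multiply by what that person's time costs. The sketch below assumes the log is a list of (person, hours) entries; the function names and the hourly rates are hypothetical, and you should use whatever figures your organization accepts.

```python
from collections import defaultdict


def hours_by_person(time_log):
    """Total the hours each person spent, from (person, hours) entries."""
    totals = defaultdict(float)
    for person, hours in time_log:
        totals[person] += hours
    return dict(totals)


def estimated_loss(time_log, hourly_rates):
    """Estimate the cost of staff time spent on the incident.

    `hourly_rates` maps each person to a cost per hour; the figures are
    whatever your organization uses, not anything standard.
    """
    totals = hours_by_person(time_log)
    return sum(hours * hourly_rates[person] for person, hours in totals.items())
```

For example, a log with 5 hours for one person at 50 per hour and 2 hours for another at 40 per hour gives an estimated loss of 330.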
Time logs may also be useful if you are having difficulty in
convincing management that the organization needs to allocate
additional resources to be prepared to deal with incidents.
It's a way of showing how much these incidents cost. It's
particularly helpful if you can show which areas could have been
anticipated and mitigated by planning.