I have an SCO OpenServer system that is experiencing
one of the two following problems:
1. The system panics but does not save a valid dump, either due to
insufficient swap/dump space or some other source of corruption.
2. The system hangs and does not provide any information on the
cause of the hang.
Please Note: If the system is producing a valid dump, you won't need
to follow the instructions below; instead see Technical Article 105619, "Panic/Crash
Analysis if a dump is available".
CAUSE:
Although there can be several reasons why a dump is not saved,
often it is due to insufficient swap space. Increasing swap
for the purpose of obtaining a valid system dump, however,
depends on additional disk space being available and, usually,
on re-dividing the root disk partition into differently-sized
"divvy(ADM)" divisions.
***********************IMPORTANT********************************
The information that follows refers to creating more space to
hold a system dump.
You _MUST_ have a FULL SYSTEM BACKUP (using either the backup
utility provided with SCO OpenServer or a third party backup program
with appropriate emergency boot/root floppies) before attempting
the following.
To create emergency boot/root floppies, see the man page
"mkdev(ADM)" (the "fd" section of the man page refers to creating
the emergency floppies). For information on creating a backup using
the Server backup utility, see the online "Help" documentation,
accessed through the GUI, "System Administration Guide, Chapter 3,
Backing up filesystems" or the SCO System Administration Guide Book.
On most systems, swap and dump are the same device, but this is not
necessarily the case. To determine if swap and dump are the same
device for your system, do the following:
# cd /etc/conf/cf.d <Enter>
View the "sassign" file.
Below is an example of the sassign file where the swap and
dump devices are the same:
swap hd 41
dump hd 41
The value "41" may be different on your system; however, it is the
default value. If both swap and dump have the same number, they
are the same device. If the numbers are different, they are
different devices.
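The comparison described above can be sketched as a small shell function. This is only an illustration: the function name same_swap_dump is hypothetical, it assumes a POSIX-style shell and awk, and it assumes the sassign entries have the three-field layout shown in the example (keyword, device name, number).

```shell
# same_swap_dump FILE
# Reads an sassign-style file and compares the "swap" and "dump" lines.
# Both the device name (field 2) and the number (field 3) are compared.
# Prints "same", "different", or "unknown" if either line is missing.
same_swap_dump() {
    awk '
        $1 == "swap" { swap = $2 " " $3 }
        $1 == "dump" { dump = $2 " " $3 }
        END {
            if (swap == "" || dump == "") { print "unknown"; exit 2 }
            if (swap == dump) print "same"; else print "different"
        }
    ' "$1"
}
```

On a live system this would be called as: same_swap_dump /etc/conf/cf.d/sassign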
The appropriate action to take to obtain a larger dump device
depends on what situation applies to you:
1. If the swap device is the dump device and you want to increase
its size, you must use the divvy(ADM) utility to alter the
division layout of the drive on which the swap area is usually
located. Since this is usually the root drive and the swap area
is usually located physically on the drive before the root
(and perhaps other) filesystem, the procedure involves
overwriting root, and involves the same steps as an emergency
restore procedure from a set of up-to-date backup tapes.
See Technical Article 104767, "Recover OpenServer 5 root filesystem after crash
if backups, boot/root exist" in order to perform this procedure;
when running divvy(ADM) to create the disk layout (under section
II of the aforementioned Technical Article) you can modify the
divisions in order to accommodate a larger swap area.
2. If you have an additional drive on which to allocate a separate
dump device, you must initialize that drive and alter the default
bootstring to recognize the new dump device. See Technical Article 105920,
"In OpenServer 5.0.x, I don't have enough swap space to complete
a panic dump" for the detailed procedure.
***************************************************************
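As a rough way to judge whether the dump device discussed above is large enough, the sketch below compares two sizes given in kilobytes. It rests on an assumption not stated in this article: that a full dump needs at least as much room as the system has physical memory. The function name dump_fits and its arguments are illustrative only; obtain the real figures from your own system.

```shell
# dump_fits MEM_KB DUMP_KB
# Assumption: a full panic dump needs at least as much space as
# physical memory. Prints "ok", or the shortfall in kilobytes.
dump_fits() {
    mem_kb=$1
    dump_kb=$2
    if [ "$dump_kb" -ge "$mem_kb" ]; then
        echo "ok"
    else
        echo "too small by $((mem_kb - dump_kb)) KB"
    fi
}
```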
When a system hangs, there may be no indication of the reason for
the hang and getting information from the system at this point may
be extremely difficult or impossible. However, in some cases, it
may be possible to use the SCO Kernel Debugger, scodb, to
troubleshoot the problem. The information below describes how to
configure and use scodb.
As always, for system hangs or panics, refer to the following
logfiles for information that may be relevant to the problems a
system is experiencing:
/usr/adm/messages
/usr/adm/syslog
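A quick first pass over these logfiles might look like the sketch below. The function name scan_log is hypothetical, and the search word "panic" is only an assumed starting point; the messages relevant to your problem may be worded differently.

```shell
# scan_log FILE
# Prints every line of FILE that mentions "panic" (in any letter case),
# prefixed with its line number. Returns non-zero if FILE cannot be
# read or if nothing matches.
scan_log() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
    grep -i -n 'panic' "$1"
}
```

For example: scan_log /usr/adm/messages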
The SCO Kernel Debugger, scodb, may be an extremely valuable tool
for analyzing panics and hangs on your system. The primary features
of scodb, when used in these situations, are:
1. When in scodb debug mode, kernel execution is suspended,
allowing the system administrator to observe snapshots of
important system information, such as the system stack
trace of the currently active process. You can manually
enter debug mode at any time by typing "<ctrl> x" on the
console keyboard on a 5.0.4 or 5.0.5 system, or by typing
"<cntl-alt> d" on a 5.0.6 system, after configuring scodb
on the system.
2. After scodb is configured on the system, debug mode is
automatically entered when a panic occurs on the system.
3. When used in combination with a serial console, scodb
enables the System Administrator to troubleshoot
software-related system hangs.
When OpenServer 5.0.x is installed, scodb is also installed, but
it is not active until it is linked into the kernel.
To link scodb into the kernel:
Change directories to:
# cd /etc/conf/sdevice.d <Enter>
Edit the following file with vi or your favorite text editor:
scodb
Once the scodb file is open it should look like this:
scodb N 1 0 0 0 0 0 0 0
To allow scodb to be linked into the kernel, change the "N" to
a "Y" so that it looks like this:
scodb Y 1 0 0 0 0 0 0 0
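If you prefer not to edit the file by hand, the same change can be sketched with awk. This is an illustration under assumptions: the function name scodb_flag is hypothetical, an awk that accepts the -v option is assumed, and the fields of the rewritten line are separated by tabs. The function writes the result to standard output and does not modify the original file.

```shell
# scodb_flag Y|N FILE
# Prints FILE with the second field of the "scodb" line set to the
# given flag. Other lines are printed unchanged.
scodb_flag() {
    awk -v flag="$1" '
        BEGIN { OFS = "\t" }
        $1 == "scodb" && ($2 == "Y" || $2 == "N") { $2 = flag }
        { print }
    ' "$2"
}
```

A cautious way to use it is to write the output to a temporary file, compare it with the original, and only then copy it into place, for example: scodb_flag Y scodb > /tmp/scodb.new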
Once the file has been changed and saved you can relink the
kernel by typing:
# /etc/conf/cf.d/link_unix -y
Please Note: It is necessary to relink the kernel twice, because the
kernel symbol table size is adjusted during the first relink.
You may get the following message after this first relink:
db_symtable in unix would only fit 200000 out of 216984 bytes
Corrected table sizes in
/var/opt/K/SCO/link/1.1.1Eb/etc/conf/pack.d/scodb/tune.h
ready for next re-link
This is normal; after relinking the second time you will not receive
the error.
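To decide whether the second relink is still needed, you could save the output of the first relink and look for the notice quoted above. The sketch below assumes that the output of link_unix can be captured to a file; the function name needs_relink and the file name /tmp/link.out are hypothetical.

```shell
# needs_relink LOGFILE
# Returns 0 (true) if the saved relink output contains the
# "ready for next re-link" notice, and non-zero otherwise.
#
# Possible use (assumes the notice can be captured this way):
#   /etc/conf/cf.d/link_unix -y 2>&1 | tee /tmp/link.out
#   needs_relink /tmp/link.out && /etc/conf/cf.d/link_unix -y
needs_relink() {
    grep 'ready for next re-link' "$1" >/dev/null 2>&1
}
```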
After the kernel is relinked, restart the system.
*********************** IMPORTANT ****************************
The X Desktop should not be the active console multiscreen
while running scodb. If the X Desktop is active when scodb is
started, whether by a panic or manually, you will not be
able to get to a character tty session to run scodb. (scodb
will not run in an X Window.)
Thus, it may be good practice to not run the X Desktop at all
while scodb is configured in the kernel. For instructions
on how to disable scologin, which controls Desktop startup, see
Technical Article 109072, "Can the console be the default
screen after bootup instead of scologin?"
**************************************************************
Manually Starting the SCO Kernel Debugger (scodb):
To manually enter scodb you will need to be at a character tty.
The key sequence to enter scodb on a 5.0.4 or 5.0.5 system is
# <ctrl> x
and on a 5.0.6 system,
# <cntl-alt> d
Please Note: If you have an application that uses the "<ctrl> x"
sequence to run a command on a 5.0.4 or 5.0.5 system, or the
"<cntl-alt> d" sequence to run a command on a 5.0.6
system, scodb will start and the application's "<ctrl> x" or
"<cntl-alt> d", in effect, will no longer be active
while scodb is configured into the kernel.
Reconfiguring the scodb Command Key Sequence:
If you need to change the "<ctrl> x" key sequence on OpenServer
5.0.4 or 5.0.5 to another key sequence, see Chapter 1 of the SCODB
User's Guide under "Running SCODB" and "Modifying tuneable
parameters", and see the examples that follow. On an OpenServer
5.0.6 system, see the comments at the bottom of the file
/etc/conf/pack.d/scodb/tune.h
in order to modify the key sequence for entering scodb.
To see the currently configured parameters for scodb, type:
# cd /etc/conf/pack.d/scodb <Enter>
# bin/modtunes -f tune.h <Enter>
The key sequence for entering scodb is next to the "DBKEY" value.
The value is the decimal ASCII code of the key sequence. For
example, "<ctrl> x" is represented by "24" on the ASCII table.
To change the key sequence for entering scodb, type:
# cd /etc/conf/pack.d/scodb <Enter>
# bin/modtunes -f tune.h DBKEY=26
This example will set the key sequence to be "<ctrl> z" (an
ASCII value of "26"), instead of "<ctrl> x", to enter scodb
manually.
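The decimal values used for DBKEY follow a simple rule: in ASCII, a control character has the same number as its letter's position in the alphabet, so "<ctrl> a" is 1 and "<ctrl> z" is 26. The sketch below computes that number; the function name ctrl_code is hypothetical and an awk that provides tolower() is assumed.

```shell
# ctrl_code LETTER
# Prints the decimal ASCII code produced by <ctrl> plus LETTER.
# Only the first character of the argument is used; anything that
# is not a letter a-z prints 0.
ctrl_code() {
    echo "$1" | awk '{
        print index("abcdefghijklmnopqrstuvwxyz", tolower(substr($0, 1, 1)))
    }'
}
```

For example, ctrl_code x prints 24 and ctrl_code z prints 26, matching the values given above.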
After making the above change the kernel will have to be
relinked for the change to take effect. To relink the kernel,
type:
# /etc/conf/cf.d/link_unix -y <Enter>
Once the kernel is relinked the system will have to be shut
down and restarted.
==============================================================
System Panics:
When a system panics, scodb should start up automatically and
you will see the following prompt:
debug0:1>
where "0" is the entry level of the debugger and the "1" is
the command number in the history.
Once in scodb there are several commands you can use.
The command that should show where the system panicked is
stack (that is, at the debug prompt you would type in "stack"
and <Enter> as in the example below):
debug0:1> stack <Enter>
The "stack" command does not list the processes that were running;
rather, it shows the kernel stack of the active process in a
readable format.
For multiprocessor (SMP) computers the next command would be:
debug0:2> eps() <Enter>
The above command will give information for each processor on
the system.
Please Note: You should write down the information from the "stack"
and eps() commands and have that available for your SCO Technical
Support provider.
To exit out of scodb, type:
quit <Enter>
In some cases you may need to press <Enter> a second time.
Once scodb is exited, the system will attempt to save a system dump
image to the dump device. Whether or not this dump finishes will
depend on the factors listed in the "Cause" section of this article.
Once the dump completes or quits, you will see the "Safe to Power
Off" message and the system will need to be restarted.
====================================================================
System Hangs:
System hangs can be more problematic. Depending on the type
of system hang, scodb may not be useful, since the keyboard may
be "locked up" and not respond. If the keyboard is responding,
you can enter the following to manually start scodb:
# <ctrl> x
Press the <Ctrl> and <x> keys at the same time, or use the key
sequence that you have changed it to. (On a 5.0.6 system, the
default key sequence is "<cntl-alt> d".)
If the keyboard responds this will start scodb and you will see
the debug prompt as above.
Using the stack command, gather the information that is displayed,
as in the section entitled "System Panics" above.
For system hangs, a dump will not be run when you exit scodb.
If you have enough space to dump, you can do so by entering the
following at the scodb debug prompt:
debug0:1> sysdump() <Enter>
Please Note: The debug prompt may have different numeric values than
in the example above.
Whether or not a dump will complete will depend on several factors
such as having enough room to dump to, the state of the system and
whether or not it can dump, and so on.
If a system dump is generated, see Technical Article 105935, "How do I create
customized system dump images on demand?"
To extract the dump from swap, you could use the /etc/sysdump
command:
# /etc/sysdump -i /dev/swap -n /unix -fumbo my.minidump
Again, to exit out of scodb type in "quit" <Enter> <Enter>.
If the keyboard does not respond, there is still a chance that the
hang can be analyzed the next time it occurs by configuring a serial
console to take the place of the default console device. This is
valuable in those cases where the system is hung inside the
operating system, and can be interrupted by a device driver running
at a sufficiently high priority. Since the serial driver runs at the
highest priority level (7) on the system, these types of hangs can
be analyzed by typing <Ctrl><x> (or the remapped key sequence) at
the serial console keyboard when the hang occurs.
A serial console is used as if it were the "normal" console. Once
again, use the stack command at the debug prompt to gather
information as in the steps above.
For information on how to set up a serial terminal, see Technical Article 109287,
"How do I set up a serial console on SCO OpenServer 5?"
===================================================================
Removing the SCO Kernel Debugger (scodb):
To remove or deconfigure the SCO Kernel Debugger, scodb, do the
following:
Change directories to:
# cd /etc/conf/sdevice.d <Enter>
Edit the following file with vi or your favorite text editor:
scodb
Once the scodb file is open it should look like this:
scodb Y 1 0 0 0 0 0 0 0
To remove scodb from the kernel, change the "Y" to an "N" so it
looks like this:
scodb N 1 0 0 0 0 0 0 0
Once the file has been changed and saved you can relink the
kernel by typing:
# /etc/conf/cf.d/link_unix -y
After the kernel relinks, shut down and restart the system and
scodb will no longer be functional on the system.
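Before relinking, you may want to confirm what the configuration file currently says. The sketch below is one way to do that; the function name scodb_enabled is hypothetical. Note that it only reports the setting in the file, not whether the running kernel actually contains scodb.

```shell
# scodb_enabled FILE
# Returns 0 (true) if the "scodb" line in FILE has "Y" in its second
# field, and 1 otherwise.
scodb_enabled() {
    awk '$1 == "scodb" && $2 == "Y" { found = 1 } END { exit !found }' "$1"
}
```

For example: scodb_enabled /etc/conf/sdevice.d/scodb && echo "scodb is configured"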
************************* IMPORTANT ****************************
The SCO kernel debugger, scodb, is a powerful tool and should be
used only when necessary. The kernel debugger should not be used
on systems (particularly production systems) unless there is no
other method of finding out information about what is causing
system panics/hangs. If possible, scodb should be installed on a
test system so that the user can become familiar with it before
attempting to configure it on a system that contains important,
critical or unrecoverable data.
****************************************************************
One possible problem of running scodb on a production system is
the use of the "<ctrl> x" key sequence to start scodb. If an
application on the system also uses "<ctrl> x" to save a file or
perform some other operation, pressing "<ctrl> x" will start scodb,
freezing the kernel; the system may need to be restarted if exiting
scodb does not restore connectivity or other functionality. See the
section "Reconfiguring the scodb Command Key Sequence" in this
technical article for reconfiguring this key sequence for scodb.
This technical article is not meant to be a tutorial for scodb
nor is it a comprehensive overview of scodb; rather, it is
intended for use on a system that is panicking or hanging and
not providing either a dump or information regarding the hang.
If the system is producing valid dumps and the crash utility can
be run against the dumps, scodb does not have to be used.
Running scodb should have a minimal effect on system performance.
SEE ALSO:
man pages scodb(ADM), crash(ADM), divvy(ADM), mkdev(ADM)
SCO System Administration Guide book or the Online Help for
the System Administration Guide.
Online Help for scodb
To access the online help, double-click on the "Help" icon
on the SCO OpenServer GUI Desktop, select "Navigate" then
"Search". When the new window opens, click on "Entire
Library" and enter "scodb" in the "Search for" field and
click on the "Search" button.
Online Help for configuring scodb: Chapter 1 of the SCODB
User's Guide.
See:
http://osr507doc.sco.com/en/SCODB/CONTENTS.html
http://osr507doc.sco.com/en/man/html.ADM/scodb.ADM
Technical Article 105935, "How do I create customized system dump images on demand?"
Technical Article 105411, "Filesystem Repair Utilities for SCO OpenServer 5.0.0,
5.0.2 and 5.0.4."
Technical Article 105619, "Panic/Crash Analysis if a dump is available."
Technical Article 105840, "Techniques to help identify the failing function
of a kernel panic."
Technical Article 105920, "In OpenServer 5.0.x, I don't have enough swap space
to complete a panic dump."
Technical Article 109072, "Can the console be the default screen after bootup
instead of scologin?"
Technical Article 104767, "Recover OpenServer 5 root filesystem after crash if
backups, boot/root exist"
Technical Article 109287, "How do I set up a serial console on OpenServer 5?"