The system is failing to allocate streams resources. This is
evidenced by one or more of the following:
- Failures in the system logs or on the console that say "Out of
streams resources" or "Out of streams memory (NSTRPAGES = XXXX
exceeded)" or "allocb failed"
- netstat -m shows non-zero numbers in the fail column
- ndstat -l shows a non-zero number in "No STREAMS Buffers"
column
- crash -> strstat shows non-zero numbers in the FAIL column
- Mysterious system hangs or lockups
Note: A few occasional failures reported by netstat, ndstat, or crash
do not necessarily indicate a serious problem and can most likely be
ignored.
CAUSE:
There can be many reasons why the system fails to allocate streams
resources. The usual causes are:
1. Improper kernel tuning
2. Streams leak in the network card driver
3. Streams leaks in 3rd party serial card drivers or management
drivers
4. Failing hardware
5. External network hardware misbehaving
6. Extremely high network traffic
7. Streams leak in a base operating system driver
8. Improper synchronization of data transfer between the client
and server components of a network application
Any of the above factors can lead to exhaustion of the configured
amount of STREAMS memory available for use by kernel drivers and
modules.
Background information: The kernel tunable NSTRPAGES controls the
number of pages of memory available for streams. However, not all of
these resources are immediately available. The streams daemon (strd)
reserves a subset of NSTRPAGES in a few pools of memory for streams
allocation during interrupt time.
These pools are defined in /etc/conf/pack.d/str/space.c:
unsigned int str_pool_size = 20; /* size of interrupt pool in pages */
unsigned int mblk_pool_size = 70; /* size of mblk interrupt reserve */
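To get a feel for how small the interrupt pool is, the following
minimal sketch converts str_pool_size from pages to bytes and to a
count of 2KB buffers. It assumes a 4KB page size, which this article
does not state, and it ignores message headers and other buffer
classes, so the buffer count is only a rough upper bound. The helper
names pool_bytes and buffers_that_fit are made up for illustration.

```python
# Rough sizing of the interrupt pool.
# ASSUMPTION: 4 KB pages. The article does not state the page size.
PAGE_SIZE_BYTES = 4096


def pool_bytes(pages, page_size=PAGE_SIZE_BYTES):
    """Size of a pool of `pages` pages, in bytes."""
    return pages * page_size


def buffers_that_fit(pages, buffer_bytes, page_size=PAGE_SIZE_BYTES):
    """Upper bound on how many buffers of `buffer_bytes` fit in the pool."""
    return pool_bytes(pages, page_size) // buffer_bytes


str_pool_size = 20  # default from space.c, in pages

print(pool_bytes(str_pool_size) // 1024, "KB in the interrupt pool")
print(buffers_that_fit(str_pool_size, 2048), "buffers of 2KB at most")
```

Under these assumptions the default pool is far smaller than the
total configured streams memory, which is why allocations at
interrupt time can fail while NSTRPAGES is nowhere near exhausted.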
So, one important distinction to make when debugging streams failures
is whether you have actually exceeded the maximum streams parameter,
NSTRPAGES, or whether there were not enough resources in the
available pools at interrupt time to satisfy the requests. A simple
way to determine this is with netstat -m.
Consider the following output:
streams allocation:
                        config   alloc    free     total     max    fail
stream                    7200     250    6950      6016     259       0
queues                    1248     552     696     14327     577       0
mblks                     5732    1442    4290   3560707     699       0
buffer headers            6458    6319     139    460399    6336       0
class 1, 64 bytes          128      67      61   1206187      99       0
class 2, 128 bytes          96       0      96    329540      91       0
class 3, 256 bytes         352      30     322     33563    1351       0
class 4, 512 bytes          16       6      10     10696      13       0
class 5, 1024 bytes         20       0      20     10258      18       0
class 6, 2048 bytes       5394    1032    4362    749789    5394   41784
class 7, 4096 bytes        123     123       0      2911     123       0
class 8, 8192 bytes          6       0       6      1172       6       0
class 9, 16384 bytes         0       0       0        31       3       0
class 10, 32768 bytes        0       0       0         0       0       0
class 11, 65536 bytes        0       0       0         0       0       0
class 12, 131072 bytes       0       0       0         0       0       0
class 13, 262144 bytes       0       0       0         0       0       0
class 14, 524288 bytes       0       0       0         0       0       0
total configured streams memory: 32000.00KB
streams memory in use: 2754.73KB
maximum streams memory used: 11792.11KB
As you can see, the fail column shows 41784 failed allocations of 2KB
buffers (class 6). However, the total configured streams memory
(NSTRPAGES) was not exceeded: the maximum used was 11792.11KB out of
32000.00KB.
This could indicate that a large amount of data has been coming in
from the network and that the resulting NIC interrupts are being
handled by attempting to allocate STREAMS messages at interrupt
time, to pass the data upstream. These allocations commonly occur in
2KB chunks. Such failures, as explained above, could very likely be
caused by an inadequate value of str_pool_size. In situations like
this, you can increase str_pool_size and/or mblk_pool_size to attempt
to stop the failures (relink the kernel and reboot for the change to
take effect).
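The check described above can be sketched as a small script. This is
only an illustration: it assumes netstat -m prints its rows in the
layout shown in this article (a resource name followed by the six
columns config, alloc, free, total, max, fail, then the three memory
summary lines), and the function names parse_netstat_m and diagnose
are made up for this example. Treating "maximum used has reached the
configured total" as "NSTRPAGES exceeded" is a heuristic, not a
documented rule.

```python
import re

# A resource row: a name followed by six integer columns.
ROW = re.compile(r"^(.*?)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)$")
# A summary line such as "streams memory in use: 2754.73KB".
MEM = re.compile(r"^(.*streams memory.*?):\s*([\d.]+)KB$")
COLUMNS = ("config", "alloc", "free", "total", "max", "fail")


def parse_netstat_m(text):
    """Parse netstat -m text into (rows, totals) dictionaries."""
    rows, totals = {}, {}
    for line in text.splitlines():
        line = line.strip()
        m = ROW.match(line)
        if m:
            rows[m.group(1)] = dict(zip(COLUMNS, map(int, m.groups()[1:])))
            continue
        m = MEM.match(line)
        if m:
            totals[m.group(1)] = float(m.group(2))
    return rows, totals


def diagnose(rows, totals):
    """Return (resources with failures, whether configured memory was hit)."""
    failing = {name: r["fail"] for name, r in rows.items() if r["fail"] > 0}
    configured = totals.get("total configured streams memory")
    peak = totals.get("maximum streams memory used")
    # Heuristic: peak usage reaching the configured total suggests that
    # NSTRPAGES itself was the limit rather than the interrupt pools.
    exceeded = (configured is not None and peak is not None
                and peak >= configured)
    return failing, exceeded


# A few lines taken from the sample output in this article.
SAMPLE = """\
streams allocation:
config alloc free total max fail
mblks 5732 1442 4290 3560707 699 0
class 6, 2048 bytes 5394 1032 4362 749789 5394 41784
class 7, 4096 bytes 123 123 0 2911 123 0
total configured streams memory: 32000.00KB
streams memory in use: 2754.73KB
maximum streams memory used: 11792.11KB
"""

rows, totals = parse_netstat_m(SAMPLE)
failing, exceeded = diagnose(rows, totals)
print("resources with failures:", failing)
print("configured streams memory reached:", exceeded)
```

For the sample above, the script reports failures only for the 2KB
class and reports that the configured memory was not reached, which
points at the interrupt pools rather than at NSTRPAGES.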
Please Note: Often tuning streams resources upward (NSTRPAGES,
str_pool_size, mblk_pool_size, ...) will only delay or mask the
actual problem.
If the failures continue no matter how many resources you allocate
to the streams subsystem, the problem is probably not tuning but one
of the other causes listed above.