When you list an ordinary file with the l(C) command, it looks
something like this:
    # l /bin/true
    -rwxr-xr-x   4 bin      bin          477 Jun 11 1990 /bin/true
    ^\-------/   ^ ^_owner  ^            ^   \---------/ \-------/
    |  modes     |          |__group     |      date       file
    |          number                    |                 name
    type       of links                  size
               to this
               file
The first character in this output (i.e., the ``-'') indicates
that this is an ordinary file. A device file, when listed with
the l(C) command, will have a ``b'' or a ``c'' in place of the
``-'' to signify that the device file is a block or character
device, respectively.
Block devices use the kernel's buffer cache. This means that
any data written to or read from a block device is passed
to or from the device driver in 512-byte blocks rather than
one character at a time. Because the data is blocked, the
device driver has the opportunity to process the data for
optimal throughput.
Character devices, which bypass the kernel's buffer cache,
send and receive data one character at a time. Character devices
are commonly called "raw" devices because they do not process
the data.
There is a device file on all SCO UNIX Systems called /dev/root.
This device is used in reading or writing data on the hard disk.
The minor number tells the device driver specifically where to
write on the disk (i.e., in the "root filesystem"). When l(C) is
used to list a device file, such as /dev/root, it looks something
like this:
    # l /dev/root
    br--r-----   1 root     backup   1, 40 Jan 01 09:17 /dev/root
    ^\-------/   ^ ^_owner  ^        ^  ^  \----------/ \-------/
    |  modes     |          |__group |  |      date       file
    |          number                |  |                 name
    type       of links          major  minor
               to this          number  number
               file
Notice, as discussed earlier, that the first character in the
output above is a ``b'' signifying that this is a block device.
There is a similar device, /dev/rroot, which is a character
device. When we use l(C) to list /dev/rroot, it looks something
like this:
    # l /dev/rroot
    crw-r-----   1 sysinfo  sysinfo  1, 40 Jun 12 1990 /dev/rroot
    ^\-------/   ^ ^_owner  ^        ^  ^  \---------/ \--------/
    |  modes     |          |__group |  |     date        file
    |          number                |  |                 name
    type       of links          major  minor
               to this          number  number
               file
Notice that the first character in the output above is a ``c''
signifying that this is a character device. Earlier, it was
pointed out that character devices are commonly called raw
devices. The initial ``r'' in the names of character device files
such as /dev/rroot stands for "raw". This is a common naming
convention used for device files when the driver provides both
block and character devices (e.g., the floppy device driver
provides the block device /dev/fd0 and the character device
/dev/rfd0).
This brings up an important point -- device drivers needn't
provide both a block device and character device. Usually,
a device driver need only provide a character interface. For
instance, the cartridge tape driver only provides character
devices (e.g., /dev/rct0). As stated earlier, the ``r'' in
"/dev/rct0" implies that this is a raw or character device.
Note, however, that not all character devices have names
beginning with ``r''. For instance, /dev/lp0 looks something
like this when listed with l(C):
# l /dev/lp0
crw------- 2 bin bin 6, 0 Jun 13 1990 /dev/lp0
Again, the ``r'' as the first character of a device file's name
implies that the device is a raw device. However, it is the
``c'' as the first character in the output of the l(C) command
that positively identifies a device as a raw device.
Returning to our earlier example of /dev/rroot, we see that l(C)
reports the following:
    # l /dev/rroot
    crw-r-----   1 sysinfo  sysinfo  1, 40 Jun 12 1990 /dev/rroot
    ^\-------/   ^ ^_owner  ^        ^  ^  \---------/ \--------/
    |  modes     |          |__group |  |     date        file
    |          number                |  |                 name
    type       of links          major  minor
               to this          number  number
               file
Note that where l(C) would have printed the size of an ordinary
file (i.e., to the left of the date), l(C) lists the major and
minor numbers of the device file. In the case of /dev/rroot,
the major number is 1 and the minor number is 40.
Now note that the major and minor numbers for /dev/rroot and
/dev/root are identical. The only (significant) difference in
the l(C) listing between the two devices is the block versus
character distinction (i.e., one says ``b'' and the other says
``c'').
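To make the idea of a major/minor pair concrete, here is a small
sketch that packs the two numbers into one value and pulls them
apart again. The 16-bit layout (major in the high byte, minor in
the low byte) and the demo_ names are assumptions made for this
illustration only; a real system supplies its own macros for this
and may use a different layout.

```c
/* HYPOTHETICAL layout, for illustration only: a 16-bit device number
 * with the major number in the high byte and the minor number in the
 * low byte. */
typedef unsigned short demo_dev_t;

/* Combine a major and a minor number into one device number. */
demo_dev_t demo_makedev(unsigned maj, unsigned min)
{
    return (demo_dev_t)(((maj & 0xFFu) << 8) | (min & 0xFFu));
}

/* Extract the major number (high byte). */
unsigned demo_major(demo_dev_t dev)
{
    return (dev >> 8) & 0xFFu;
}

/* Extract the minor number (low byte). */
unsigned demo_minor(demo_dev_t dev)
{
    return dev & 0xFFu;
}
```

With this layout, demo_makedev(1, 40) describes the numbers shown
for both /dev/root and /dev/rroot; the packed value carries no
block or character flag, which is why the file type is needed as
well.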
Let's begin understanding major numbers with the following
simplified definition: The major number is simply an index into
an array of device drivers in the kernel. For those readers who
have used arrays in computer programming, this explanation should
be fairly revealing. But for the others, we now offer the
following explanation of what an array is.
An array is a data structure that can hold multiple elements of
a similar nature. For instance, we could define an array called
"fruit". As its name suggests, the items that this array will
hold will be different types of fruit. Arbitrarily, we'll say
that this array called "fruit" can hold up to five items.
Pictorially, this might be represented as follows:
             fruit
            ________
           |        |
        0  | cherry |
           |________|
           |        |
        1  |        |
           |________|
           |        |
        2  | banana |
           |________|
           |        |
        3  | apple  |
           |________|
           |        |
        4  | cherry |
           |________|
Notice the numbers to the left of the picture of the "fruit"
array. These number the different "slots". In programming, we
refer to any of these numbers as an "index". For instance, the
index for the slot in fruit that contains "banana" is 2. In
programs, we refer to the contents of any given slot by saying
"fruit[n]" where ``n'' is the slot number. In our example:
fruit[0]=cherry fruit[2]=banana
fruit[3]=apple fruit[4]=cherry
When talking about arrays such as the "fruit" array, it is
common to refer to the array as fruit[].
Notice that in our example, the numbering of the indices begins
at 0. This is common with arrays. For reasons beyond the scope
of this article, having 0 as the first index is advantageous
because accessing any element of the array is faster than if you
started numbering with 1 (or any other number for that matter).
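For readers who would like to see the "fruit" array written as
code, the following is one way it might look in C. The names
FRUIT_SLOTS and fruit_at() are invented for this sketch; the empty
slot 1 is represented with NULL.

```c
#include <stddef.h>

#define FRUIT_SLOTS 5

/* Slot 1 is deliberately empty (NULL), as in the picture. */
const char *fruit[FRUIT_SLOTS] = {
    "cherry", NULL, "banana", "apple", "cherry"
};

/* Return the contents of slot n, or NULL if the slot is empty
 * or n is not a valid index (valid indices are 0 through 4). */
const char *fruit_at(int n)
{
    if (n < 0 || n >= FRUIT_SLOTS)
        return NULL;
    return fruit[n];
}
```

Here fruit[2] (or fruit_at(2)) refers to "banana", and an index
of 5 is out of range because numbering starts at 0.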
The original major number definition was: The major number is
simply an index into an array of device drivers in the kernel.
Pictorially, this might be represented as follows:
          _____
         |     |
      0  |  o--|-->console driver
         |_____|
         |     |
      1  |  o--|-->hard disk driver
         |_____|
         |     |
      2  |  o--|-->floppy disk driver
         |_____|
         |     |
      3  |  o--|-->tty driver
         |_____|
         |     |
      4  |  o--|-->memory driver
         |_____|
         |     |
      5  |  o--|-->serial I/O driver
         |_____|
We now return to the distinction between block and character
devices. Our previous major number definition was an
over-simplification. Namely, it is inaccurate to say that the
major number is an index into a (single) array in the kernel.
There are actually two arrays: one lists the device drivers for
block devices and the other lists the device drivers for
character devices. These arrays are called bdevsw[] and
cdevsw[], which stand for "block device switch table" and
"character device switch table" respectively.
So a more accurate pictorial representation of the arrays in
the kernel might be represented as follows:
         bdevsw                              cdevsw
          _____                               _____
         |     |                             |     |
      0  |     |                          0  |  o--|-->console driver
         |_____|                             |_____|
         |     |                             |     |
      1  |  o--|-->hard disk driver       1  |  o--|-->hard disk driver
         |_____|                             |_____|
         |     |                             |     |
      2  |  o--|-->floppy disk driver     2  |  o--|-->floppy disk driver
         |_____|                             |_____|
         |     |                             |     |
      3  |     |                          3  |  o--|-->tty driver
         |_____|                             |_____|
         |     |                             |     |
      4  |     |                          4  |  o--|-->memory driver
         |_____|                             |_____|
         |     |                             |     |
      5  |     |                          5  |  o--|-->serial I/O driver
         |_____|                             |_____|
Again, notice that every driver in this picture provides a
character interface, whereas a block interface is optional.
Our major number definition is still an over-simplification.
It is true that the major number is simply an index into either
bdevsw[] or cdevsw[]. However, it is an over-simplification to
say that each element of these arrays is a device driver.
Rather, each element of the arrays is a structure.
A structure is a collection of not necessarily similar objects.
First we'll study the structure of each element of bdevsw[].
Each element is a structure of type "bdevsw" (in the C programming
language, it is legal to have a structure which has the same name
as an array). The bdevsw structure is defined in the
file /usr/include/sys/conf.h as follows:
struct bdevsw {
int (*d_open)();
int (*d_close)();
int (*d_strategy)();
int (*d_print)();
char *d_name;
struct iobuf *d_tab;
};
extern struct bdevsw bdevsw[];
Note: Each device driver provides the following routines
amongst others: open(), close(), read(), write().
Because each driver provides these, they are generically
referred to as xxopen(), xxclose(), xxread(), and
xxwrite(). For a particular device driver, the "xx" is
replaced by the driver's handle. We'll return to the
notion of a "handle" later. For instance, the serial I/O
device driver's handle is "sio". So the serial I/O
driver provides sioopen(), sioclose(), sioread(), and
siowrite().
You needn't be concerned with understanding the exact meaning of
the source code for the bdevsw structure. What is important to
see is that the bdevsw structure has a pointer to a particular
driver's xxopen() and its xxclose().
Also note that the bdevsw structure has a function pointer to a
routine called xxstrategy(). Earlier we said that block drivers
process the data passed to them. It is the xxstrategy() routine
that does this processing. In fact, providing an xxstrategy()
routine is what distinguishes a block device from a character
device.
So we now return to character devices. Specifically, we now
examine the format of the cdevsw structure.
struct cdevsw {
int (*d_open)();
int (*d_close)();
int (*d_read)();
int (*d_write)();
int (*d_ioctl)();
struct tty *d_ttys;
struct streamtab *d_str;
char *d_name;
};
extern struct cdevsw cdevsw[];
Notice that the cdevsw structure has function pointers to each
device driver's xxopen(), xxclose(), xxread(), and xxwrite().
In addition, the cdevsw structure also has a function pointer
to a previously undiscussed routine called xxioctl(). We will
return to xxioctl() later. For now, just notice that only
character devices provide the xxioctl().
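The following sketch shows how a table of function pointers,
indexed by major number, can be used to reach a driver's
routines. It is a simplified stand-in and not the kernel's code:
the demo_cdevsw structure keeps only three members, the routines
take a single minor-number argument, and the "xx" driver and
demo_chr_read() are made up for the illustration. As in conf.h,
the structure and the array share one name.

```c
#include <stddef.h>

/* Simplified stand-in for a character switch entry. */
struct demo_cdevsw {
    int (*d_open)(int minor);
    int (*d_read)(int minor);
    const char *d_name;
};

/* Made-up routines for a driver whose handle is "xx". The return
 * values only make it visible which routine was called. */
static int xxopen(int minor) { (void)minor; return 0; }
static int xxread(int minor) { return 100 + minor; }

/* The array has the same name as the structure type, which C allows.
 * Slot 0 is empty; slot 1 holds the "xx" driver. */
struct demo_cdevsw demo_cdevsw[] = {
    { NULL,   NULL,   NULL },
    { xxopen, xxread, "xx" },
};

#define DEMO_NCHRDEV ((int)(sizeof demo_cdevsw / sizeof demo_cdevsw[0]))

/* Use the major number as an index and call that driver's read
 * routine, passing the minor number along.
 * Returns -1 if no driver is configured for that major number. */
int demo_chr_read(int major, int minor)
{
    if (major < 0 || major >= DEMO_NCHRDEV)
        return -1;
    if (demo_cdevsw[major].d_read == NULL)
        return -1;
    return demo_cdevsw[major].d_read(minor);
}
```

A call such as demo_chr_read(1, 40) selects slot 1 and hands the
minor number 40 to xxread().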
Again, we present a more accurate pictorial representation of
the bdevsw[] and cdevsw[] arrays:
         bdevsw                              cdevsw
          _____                               _____
         |     |                             |     |   cnopen(), cnclose(),
      0  |     |                          0  |  o--|-->cnread(), cnwrite(),
         |_____|                             |_____|   cnioctl()
         |     |                             |     |
      1  |  o--|-->hdopen(), hdclose(),   1  |  o--|-->hdopen(), hdclose(),
         |_____|   hdstrategy()              |_____|   hdread(), hdwrite(),
                                                       hdioctl()
Now that the bdevsw[] and cdevsw[] arrays have been explained
fairly accurately, let us recall the original definition of a
major number: The major number is simply an index into an array
of device drivers in the kernel. We'll continue to use this
over-simplified explanation, but you now understand what's
really going on at a detailed level.
If you use a command such as cat(C) to examine (read) a file,
the cat(C) command does three things: (1) it opens the requested
file; (2) it then reads the data from that file; and (3) it closes
the file. In C source code, this would look something like this:
fd=open("file_name", O_RDONLY);
read(fd, buffer, sizeof(buffer));
close(fd);
The first line of this over-simplified program sets a variable
called "fd". "fd" stands for "file descriptor". Each user
process has an array known as the "User File Descriptor Table".
This array lists how to access each file that a process has
open. "fd" is an index into this array.
Consequently, when a statement such as the second in our example
program is executed, the operating system will read the data
from the file pointed to by "fd" and will put the data into a
variable called "buffer".
The final line of our example program causes the operating
system to close the file pointed to by "fd".
While it is true to say that each element of the User File
Descriptor Table lists how to access each file that a process
has open, there is more to the story than just that.
Files are a basic concept in any operating system. Although a
file's data may be scattered across a hard disk drive, the
operating system is able to present it to the user in one piece.
When a user opens a file, the operating system records this
information in two places. The first is the "User File
Descriptor Table". Simply put, this table records the files
that a user's process has open. This is in contrast to another
table, the "File Table", which records all the files that are
open on the system.
As explained earlier, the User File Descriptor Table gets its
name from the fact that when you open a file with the open(S)
system call, you get back a "file descriptor". A file descriptor
is simply an index into the User File Descriptor Table for a
particular user. We'll return to the concept of a file descriptor
shortly.
        User File
       Descriptor           File
     Table (user #1)        Table
          _____             _____
         |     |           |     |
      0  |     |           |     |
         |_____|           |_____|   Note that there is only one File Table in
         |     |           |     |   the system.  This table records all open
      1  |     |           |     |   files, system wide.
         |_____|           |_____|
         |     |           |     |   In contrast, there is a User File Descriptor
      2  |     |        -->|     |   Table for each user logged into the system.
         |_____|       /   |_____|   This table records the files that each user
         |     |      /    |     |   has opened.  A user's table cannot be
      3  |  o--|------     |     |   accessed by any other user.
         |_____|           |_____|
         |     |           |     |
      4  |  o--|---------->|     |
         |_____|           |_____|
         |     |           |     |
      5  |     |           |     |
         |_____|           |_____|
                           |     |
        User File          |     |
       Descriptor          |_____|
     Table (user #2)       |     |
          _____            |     |
         |     |           |_____|
      0  |     |           |     |
         |_____|           |     |
         |     |           |_____|
      1  |     |           |     |
         |_____|           |     |
         |     |           |_____|
      2  |     |           |     |
         |_____|       --->|     |
         |     |      /    |_____|
      3  |  o--|------     |     |
         |_____|           |     |
         |     |           |_____|
      4  |     |           |     |
         |_____|       --->|     |  <--- In the example below, we assume that this is
         |     |      /    |_____|       the File Table entry for the file "my_file".
      5  |  o--|------     |     |
         |_____|           |     |
                           |_____|
Returning to our discussion of the "file descriptor" concept,
consider the illustration above. Let's suppose that we are the
user using the second User File Descriptor Table. If that user
runs a program which opens a file, it will issue a statement
such as:
fd=open("my_file", O_RDONLY);
As stated earlier, the operating system will open the file and
record this fact in the File Table. This is illustrated above.
The open(S) system call will return a file descriptor. In our
example above, the open(S) system call returned a file descriptor
of 5.
Now when the user's program wants to read from the file, it
passes the file descriptor as an argument to the read(S) system
call:
count=read(fd,buffer,size);
The statement above would read from the file referred to by file
descriptor number 5. At most "size" bytes would be read. The
data read would be placed into "buffer", and the variable "count"
would be set to the number of bytes successfully read.
The write(S) system call works in a similar fashion.
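As a fuller illustration of open(S), read(S), write(S) and
close(S) working together, here is a sketch of a routine that
copies one file to another. The name copy_file(), the 512-byte
buffer and the 0644 creation mode are choices made for this
example; for simplicity it treats a short write as an error.

```c
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/* Copy the file src to dst using read(S) and write(S).
 * Returns the total number of bytes copied, or -1 on error. */
long copy_file(const char *src, const char *dst)
{
    char buffer[512];
    long total = 0;
    ssize_t count;
    int in, out;

    in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    /* read() returns the number of bytes it placed in buffer;
     * 0 means end of file and a negative value means an error. */
    while ((count = read(in, buffer, sizeof buffer)) > 0) {
        if (write(out, buffer, (size_t)count) != count) {
            count = -1;     /* short or failed write */
            break;
        }
        total += count;
    }

    close(in);
    close(out);
    return count < 0 ? -1 : total;
}
```

Each descriptor returned by open() is simply passed back to
read(), write() and close(); the program never needs to know
where the data physically resides.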
The notion of a file, however, is only provided as a convenience
to the user. UNIX implements a layer under the notion of a file.
This mechanism is called the inode. Simply put, an inode is a
data structure that contains (amongst other things) the size of
a file, its permissions, and pointers to the data blocks on the
disk. However, the name of the file is not contained in the inode.
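One visible consequence of the name living outside the inode is
that two different names can refer to the same inode. The sketch
below, which assumes a POSIX-style stat(S) with st_dev, st_ino and
st_nlink members, checks for this. The function names same_inode()
and link_count() are invented for the example.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Return 1 if both paths refer to the same inode on the same
 * filesystem, 0 if they do not, and -1 if either stat() fails. */
int same_inode(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Return the number of links (names) recorded in the inode,
 * or -1 if stat() fails. */
long link_count(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}
```

The value returned by link_count() is the same "number of links to
this file" that l(C) prints after the modes.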
Each time a file is opened, the system locates the file in the
directory tree, finds the inode associated with that file, and
uses the data block pointers listed in the inode to reach the
file's data. So a more complete version of the diagram above
follows:
        User File
       Descriptor           File              Inode
     Table (user #1)        Table             Table
          _____             _____             _____
         |     |           |     |           |     |
      0  |     |           |     |           |     |
         |_____|           |_____|           |_____|
         |     |           |     |           |     |
      1  |     |           |     |        -->|     |
         |_____|           |_____|       /   |_____|
         |     |           |     |      /    |     |
      2  |     |        -->|  o--|------     |     |
         |_____|       /   |_____|           |_____|
         |     |      /    |     |           |     |
      3  |  o--|------     |     |           |     |
         |_____|           |_____|           |_____|
         |     |           |     |           |     |
      4  |  o--|---------->|  o--|---------->|     |
         |_____|           |_____|           |_____|
         |     |           |     |           |     |
      5  |     |           |     |           |     |
         |_____|           |_____|           |_____|
                           |     |           |     |
        User File          |     |           |     |
       Descriptor          |_____|           |_____|
     Table (user #2)       |     |           |     |
          _____            |     |           |     |
         |     |           |_____|           |_____|
      0  |     |           |     |           |     |
         |_____|           |     |           |     |
         |     |           |_____|           |_____|
      1  |     |           |     |           |     |
         |_____|           |     |           |     |
         |     |           |_____|           |_____|
      2  |     |           |     |           |     |
         |_____|       --->|  o--|---------->|     |
         |     |      /    |_____|           |_____|
      3  |  o--|------     |     |           |     |
         |_____|           |     |        -->|     |  <-- In our example, this Inode Table
         |     |           |_____|       /   |_____|      entry would contain the inode
      4  |     |           |     |      /    |     |      information for the file "my_file".
         |_____|       --->|  o--|------     |     |
         |     |      /    |_____|           |_____|
      5  |  o--|------     |     |           |     |
         |_____|           |     |           |     |
                           |_____|           |_____|
We are now able to give an accurate explanation of how the kernel
does I/O to hardware via device drivers. One of the powerful
aspects of the design of UNIX is that it treats just about
everything as a file. This is true of device drivers.
When the kernel receives a request to do an open() on a file,
the file is opened and entries are put in the User File Descriptor
Table, the File Table and the Inode Table as described above.
If, when reading the file's inode, we find that this is a device
file, then the kernel calls xxopen(). The kernel knows which
xxopen() to call by looking in the inode (the major and minor
numbers are recorded in the inodes of device files).
In addition, the kernel indicates in the File Table that the
file which has been opened is a device file. At the same time,
it records the major and minor number in the File Table. All
subsequent references to that file will flag the kernel that
this is a device file. If the user tries to do a read(), then
the kernel will call xxread(). If the user tries to do a
write(), then the kernel will call xxwrite(). The same is true
of close() and ioctl().
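The routing just described can be sketched in a few lines of C.
Everything here is hypothetical and heavily simplified: the
demo_file structure stands in for a File Table entry, demo_table[]
stands in for a switch table, and the driver routines are made up.
The sketch handles only device files; for an ordinary file it
simply returns -1.

```c
#include <stddef.h>

/* HYPOTHETICAL, heavily simplified File Table entry. */
struct demo_file {
    int is_device;  /* set at open time if the inode is a device file */
    int major;      /* copied from the inode */
    int minor;      /* copied from the inode */
};

/* Simplified switch entry holding only read and write routines. */
struct demo_switch {
    int (*d_read)(int minor);
    int (*d_write)(int minor);
};

/* Made-up driver routines; the return values only mark which ran. */
static int demo_drv_read(int minor)  { return 1000 + minor; }
static int demo_drv_write(int minor) { return 2000 + minor; }

static const struct demo_switch demo_table[] = {
    { NULL, NULL },                     /* major 0: not configured */
    { demo_drv_read, demo_drv_write },  /* major 1: made-up driver */
};

#define DEMO_NDEV ((int)(sizeof demo_table / sizeof demo_table[0]))

/* Return the switch entry for an open device file, or NULL. */
static const struct demo_switch *demo_lookup(const struct demo_file *fp)
{
    if (!fp->is_device || fp->major < 0 || fp->major >= DEMO_NDEV)
        return NULL;
    return &demo_table[fp->major];
}

/* A read on a device file ends up in the driver's read routine. */
int demo_read(const struct demo_file *fp)
{
    const struct demo_switch *sw = demo_lookup(fp);
    return (sw && sw->d_read) ? sw->d_read(fp->minor) : -1;
}

/* A write on a device file ends up in the driver's write routine. */
int demo_write(const struct demo_file *fp)
{
    const struct demo_switch *sw = demo_lookup(fp);
    return (sw && sw->d_write) ? sw->d_write(fp->minor) : -1;
}
```

The major number chooses the driver, and the minor number is
handed to that driver untouched.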
We now return to the notion of xxioctl(). Not all I/O that occurs
to a piece of hardware falls under the categories of open, close,
read and write. For instance, one operation that a device driver
should provide for a tape drive is the rewind operation.
ioctls are device-dependent I/O commands. "ioctl" stands
for I/O control.
Understand that each device driver provides whatever ioctls
it chooses. For example, the tape driver provides a rewind
ioctl, whereas the hard disk driver does not, which makes sense
since you can't rewind a hard disk drive.
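Inside a driver, an xxioctl() routine is commonly little more
than a switch on the command code. The sketch below illustrates
the idea for an imaginary tape device. The command codes
DEMO_TAPE_REWIND and DEMO_TAPE_STATUS, the demo_tape structure
and the argument convention are all invented here and do not
correspond to any real driver.

```c
/* HYPOTHETICAL command codes for an imaginary tape driver. */
#define DEMO_TAPE_REWIND 1
#define DEMO_TAPE_STATUS 2

/* Imaginary per-device state: the current position on the tape. */
struct demo_tape {
    long position;
};

/* Device-dependent control routine. Returns 0 on success and -1
 * for a bad argument or a command this driver does not provide. */
int demo_tape_ioctl(struct demo_tape *tp, int cmd, long *arg)
{
    switch (cmd) {
    case DEMO_TAPE_REWIND:
        tp->position = 0;       /* back to the start of the tape */
        return 0;
    case DEMO_TAPE_STATUS:
        if (arg == 0)
            return -1;
        *arg = tp->position;    /* report the current position */
        return 0;
    default:
        return -1;              /* not supported by this driver */
    }
}
```

A hard disk driver would simply have no "rewind" case in its own
switch, so such a request would fall through to the error return.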
We now turn to the notion of a minor number. A minor number is
simply a piece of data that is passed to the driver indicated by
the major number. Each device driver is free to interpret a
given minor number any way it wants. For instance, the tape
driver interprets certain minor numbers to mean "don't rewind
the tape when you're done." On the other hand, minor numbers
to the hard disk driver specify which physical disk, which fdisk
partition, and which divvy division to access.
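To show what "interpreting" a minor number can mean, here is a
sketch in which an imaginary tape driver treats individual bits of
the minor number as separate pieces of information. The bit
assignments below are invented for this example and are NOT the
scheme used by any actual tape or hard disk driver.

```c
/* HYPOTHETICAL bit layout for an imaginary tape driver's minor
 * numbers: the low two bits select the unit, and bit 2 means
 * "don't rewind the tape when you're done". */
#define DEMO_UNIT_MASK  0x03
#define DEMO_NOREWIND   0x04

/* Which tape unit does this minor number refer to? */
int demo_tape_unit(int minor)
{
    return minor & DEMO_UNIT_MASK;
}

/* Should the tape be rewound when the device is closed?
 * Returns 1 for yes, 0 for no. */
int demo_rewind_on_close(int minor)
{
    return (minor & DEMO_NOREWIND) == 0;
}
```

Under this made-up layout, minor numbers 1 and 5 would both refer
to unit 1, differing only in whether the tape is rewound on close.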
For a detailed explanation of hard drive minor numbers, see the
article in this database titled "Explanation of the hard drive
minor number scheme." You are strongly encouraged to read this
article as it will increase your understanding of major and
minor numbers.
The last issue covered in this article is how to tell, given a
major number, what device driver is associated with that major
number. The answer lies in the file /etc/conf/cf.d/mdevice which
is documented under mdevice(F). The mdevice file looks something
like this:
cn pIocrwi iHrcst cn 0 0 1 1 -1
hd hoc irobCcGk hd 1 1 1 1 -1
fd Iocrwih iHODbrcC fl 2 2 1 2 2
sy orwi icor sy 0 3 1 1 -1
mm rw irsco mm 0 4 1 1 -1
sio Iocrwip iHctk sio 0 5 1 100 -1
Each line of the mdevice file is a description of a device
driver. And each of these lines has nine fields. Note that we
say "fields" and not "columns". Fields are separated by white
space (i.e., spaces or tabs) and do not necessarily line up in
nice columns. If you look at the lines for "hd" and "fd",
you'll see that the third field in each is so long that the
columns get out of alignment.
The fifth field is the major number for the block interface of
the device driver. The sixth field is the major number for the
character interface of the device driver.
As an example, suppose that we wanted to know what device driver
the device /dev/rct0 uses. First we would need to know what the
major number is for /dev/rct0:
# l /dev/rct0
crw-rw-rw- 1 root other 10, 0 Dec 04 1991 /dev/rct0
From the output of the l(C) command, we see that the major
number for /dev/rct0 is 10. We also see that /dev/rct0 is a
character device. Knowing this, look in the sixth field of
the mdevice file for the line which says the character major
number is 10. We find the following line:
ct Iocrwi iHcsk ct 0 10 1 1 1
So we find that the driver name is "ct" which stands for
"cartridge tape".
It is recommended that you read the mdevice(F) manual page.
It describes the format of the mdevice file. Armed with the
knowledge that you've gained from this article, you should be
able to understand most aspects of this file.
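The lookup performed by hand above (find the line whose sixth
field matches the character major number, then read off the first
field) can also be sketched in C. The function name
match_char_major() is made up, and the sketch assumes that no
field is longer than 31 characters; fields two through four are
read but not interpreted.

```c
#include <stdio.h>
#include <string.h>

/* Examine one line in mdevice format. If its character major number
 * (sixth field) equals cmajor, copy the driver name (first field)
 * into name and return 1. Otherwise return 0. */
int match_char_major(const char *line, int cmajor,
                     char *name, size_t namelen)
{
    char drv[32], f2[32], f3[32], f4[32];
    int bmaj, cmaj;

    if (namelen == 0)
        return 0;
    /* Fields are separated by white space, which %s and %d skip. */
    if (sscanf(line, "%31s %31s %31s %31s %d %d",
               drv, f2, f3, f4, &bmaj, &cmaj) != 6)
        return 0;
    if (cmaj != cmajor)
        return 0;
    strncpy(name, drv, namelen - 1);
    name[namelen - 1] = '\0';
    return 1;
}
```

To search the whole file, a program would read mdevice one line at
a time and call this routine on each line until it returns 1. For
a block device, the same approach applies with the fifth field.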
SEE ALSO:
mdevice(F), chmod(C), l(C), open(S), close(S), read(S),
write(S), hd(HW), tape(HW), fd(HW), serial(HW)