We were debugging an issue of missing boot drives on an emulated CPU board, which left the bootloader unable to find an OS to boot into. Evidently, the bootloader could not find any of the boot partitions specified in code, e.g. /dev/sda or /dev/nvme0n1p1. In particular, the device files were missing. So what went wrong? One idea was to figure out where and when Linux is supposed to identify the boot drives and create the relevant device files under /dev. So I went digging... and the following is a summary of what I learned.
What is it? Files under /dev/, like /dev/mem, are special files, or device nodes. The two main types are character and block devices. Block devices handle data in fixed-size blocks (commonly 512 bytes or 4 KiB on current hardware). Character devices handle data as an unstructured stream of characters (bytes). Their drivers implement read() and write() directly, and by default these calls block until the operation finishes. Block device drivers, by contrast, do not implement read() and write() themselves. Instead, they provide block access functions that read and write whole blocks, and the kernel's block layer turns ordinary reads and writes into block requests. See the kernel's record of all devices, devices.txt.
How are they created? Device files normally get created during installation*. In a booted OS, however, you can create them with the mknod utility. For a simple example, you can run mknod [-F format] name [c | b] major minor (see the man page for more; note that this is the BSD/macOS syntax, and GNU mknod on Linux has no -F option). So you can create a new device file like this:
✗ sudo mknod tstdevice c 0 9
✗ ls -l tstdevice
0x9 Dec 16 00:41 tstdevice
✗
You have now created a character device file named tstdevice, with major number 0 and minor number 9. Basically, the major number allows Linux to identify which driver to talk to, and the minor number tells that driver which device it is talking to. For example, you can have multiple RAM disk block devices, all served by the same driver, that need different minor numbers to identify which RAM disk is being accessed:
1 block RAM disk
0 = /dev/ram0 First RAM disk
1 = /dev/ram1 Second RAM disk
...
250 = /dev/initrd Initial RAM disk
So the 1 in "1 block" on the left is the major number identifying the RAM disk block driver, while minors 0 to 249 identify the individual RAM disks. The exception is minor 250, which is reserved for the "Initial RAM disk" (/dev/initrd), a RAM disk preloaded by the boot loader and used during early boot. A RAM disk (also called a RAM drive) is a block of random-access memory that the computer treats as if it were a disk drive.
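Under the hood, the kernel packs a major and a minor number into a single device number. A minimal Python sketch, using the standard library's `os.makedev`, `os.major` and `os.minor` (available on Unix-like systems), shows that the RAM disks above share a major (the driver) and differ only in the minor (the instance). The `ram_disk_dev` helper is hypothetical and exists only for illustration.

```python
import os

RAMDISK_MAJOR = 1  # "1 block RAM disk" in devices.txt

def ram_disk_dev(minor: int) -> int:
    """Hypothetical helper: build the device number for a RAM disk minor."""
    return os.makedev(RAMDISK_MAJOR, minor)

# Same driver (major), different devices (minor)
for name, minor in [("/dev/ram0", 0), ("/dev/ram1", 1), ("/dev/initrd", 250)]:
    dev = ram_disk_dev(minor)
    print(name, "->", os.major(dev), os.minor(dev))
# /dev/ram0 -> 1 0
# /dev/ram1 -> 1 1
# /dev/initrd -> 1 250
```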
A device file can be seen as an interface to a device driver that appears in a file system as if it were an ordinary file.
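Because a device file is just a filesystem entry carrying a type and a device number, a program can tell what kind of device a path refers to with an ordinary `stat()` call. The following is a minimal sketch, assuming a Unix-like system. `describe_device` is a hypothetical helper name.

```python
import os
import stat

def describe_device(path: str):
    """Return (kind, major, minor) for a device file, or None for other files."""
    st = os.stat(path)
    if stat.S_ISCHR(st.st_mode):
        kind = "char"
    elif stat.S_ISBLK(st.st_mode):
        kind = "block"
    else:
        return None  # regular file, directory, etc.: no driver behind it
    # st_rdev holds the device number the node points at
    return kind, os.major(st.st_rdev), os.minor(st.st_rdev)

print(describe_device("/dev/null"))  # typically ('char', 1, 3) on Linux
print(describe_device("/tmp"))       # a directory, so None
```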
Examples:
Block Devices
SCSI disk device files are named sd followed by a letter: a for the first discovered device (/dev/sda), and b for the second discovered device (/dev/sdb). The major number is 8, which tells Linux to use the SCSI disk driver. The minor numbers 0, 16, 32, and so on identify each SCSI disk (why increment by 16? Think about it for a moment; there is a hint below), with 240 identifying the sixteenth SCSI disk.
8 block SCSI disk devices (0-15)
0 = /dev/sda First SCSI disk whole disk
16 = /dev/sdb Second SCSI disk whole disk
32 = /dev/sdc Third SCSI disk whole disk
...
240 = /dev/sdp Sixteenth SCSI disk whole disk
Partitions are handled in the same way as for IDE
disks (see major number 3) except that the limit on
partitions is 15.
When you list the device, you can see the matching major and minor numbers (8, 0):
$HOST:~# ls -la /dev/sda
brw-rw---- 2 root disk 8, 0 Dec 15 16:10 /dev/sda
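The increments of 16 follow from the partition limit in the devices.txt excerpt above. Each disk gets one minor for the whole disk plus up to 15 for its partitions. The following is a minimal sketch of that arithmetic for major 8 only. Disks beyond /dev/sdp use other majors and are not handled here. `sd_name` is a hypothetical helper.

```python
import string

SCSI_DISK_MAJOR = 8
MINORS_PER_DISK = 16  # 1 whole disk + up to 15 partitions

def sd_name(minor: int) -> str:
    """Map a major-8 minor number to its /dev/sd* name (sda..sdp only)."""
    if not 0 <= minor < 16 * MINORS_PER_DISK:
        raise ValueError("major 8 only covers minors 0-255")
    disk, partition = divmod(minor, MINORS_PER_DISK)
    name = "/dev/sd" + string.ascii_lowercase[disk]
    # partition 0 means the whole disk, so no number suffix
    return name if partition == 0 else f"{name}{partition}"

for m in (0, 1, 16, 35, 240):
    print(f"({SCSI_DISK_MAJOR}, {m}) -> {sd_name(m)}")
# (8, 0) -> /dev/sda
# (8, 1) -> /dev/sda1
# (8, 16) -> /dev/sdb
# (8, 35) -> /dev/sdc3
# (8, 240) -> /dev/sdp
```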
NVMe device files are named nvme, followed by a number for the device controller: nvme0 for the first controller, nvme1 for the second, and so on. Due to the way NVMe devices are connected, each device found under a controller then gets a namespace number in the order it is identified, and this order is not deterministic across boots (Yikes! It depends on the speed of each device, and other aspects I don't know). So you will have /dev/nvme0n1 for the first discovered device on the first discovered controller, and /dev/nvme0n2 for the second discovered device on the first discovered controller.
$HOST:~$ ls -la /dev/nvme0
crw------- 1 root root 247, 0 Dec 15 22:12 /dev/nvme0
$HOST:~$ ls -la /dev/nvme0n1
brw-rw---- 1 root disk 259, 0 Dec 18 22:12 /dev/nvme0n1
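The naming scheme described above can be parsed mechanically. The following is a minimal sketch using a regular expression. `parse_nvme_name` and the `NvmeName` fields are hypothetical names. The sketch only covers the plain nvme<controller>, nvme<controller>n<namespace> and ...p<partition> forms discussed here, and rejects any other variant.

```python
import re
from typing import NamedTuple, Optional

class NvmeName(NamedTuple):
    controller: int
    namespace: Optional[int]  # None for the controller node itself
    partition: Optional[int]  # None for a whole namespace

# nvme<ctrl>, optionally n<ns>, optionally p<part>; /dev/ prefix optional
_NVME_RE = re.compile(r"^(?:/dev/)?nvme(\d+)(?:n(\d+)(?:p(\d+))?)?$")

def parse_nvme_name(name: str) -> NvmeName:
    m = _NVME_RE.match(name)
    if m is None:
        raise ValueError(f"not an NVMe device name: {name!r}")
    ctrl, ns, part = m.groups()
    return NvmeName(int(ctrl),
                    int(ns) if ns is not None else None,
                    int(part) if part is not None else None)

print(parse_nvme_name("/dev/nvme0"))
# NvmeName(controller=0, namespace=None, partition=None)
print(parse_nvme_name("/dev/nvme0n1p1"))
# NvmeName(controller=0, namespace=1, partition=1)
```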
You may have noticed that the major number of the controller device file differs from that of the first discovered device under that controller. Note that 247 falls in the range for LOCAL/EXPERIMENTAL USE, while 259 is the Block Extended Major, which is "used dynamically to hold additional partition minor numbers and allow large numbers of partitions per device", per Linux devices.txt. I couldn't reason why 247 is used for the controller (why not 248? 249? etc.). One likely explanation is that devices.txt also reserves character majors 234-254 for dynamic assignment, handed out from 254 downward, so 247 may simply be whichever major was free when the driver registered. Using 259 for namespaces seems to make more sense: judging by the listings, the namespace (259, 0) and each of its partitions get dynamically allocated minors under 259, even though namespaces themselves are not partitions. So for each device under each discovered controller, we expect partitions suffixed `p1`, `p2`, and so on. That is, `/dev/nvme0n1p1`, `/dev/nvme0n1p2`, and so on.
$HOST:~$ ls -la /dev/nvme0n1p1
brw-rw---- 1 root disk 259, 1 Dec 15 22:12 /dev/nvme0n1p1
$HOST:~$ ls -la /dev/nvme0n1p2
brw-rw---- 1 root disk 259, 2 Dec 15 22:12 /dev/nvme0n1p2
$HOST:~$ ls -la /dev/nvme0n1p3
brw-rw---- 1 root disk 259, 3 Dec 15 22:12 /dev/nvme0n1p3
$HOST:~$ ls -la /dev/nvme0n1p4
brw-rw---- 1 root disk 259, 4 Dec 15 22:12 /dev/nvme0n1p4
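Back to the original debugging question: before blaming /dev, it can help to check whether the kernel registered the disk at all. On Linux, each block device the kernel knows about appears under /sys/class/block/<name>/, and its `dev` attribute holds the number as MAJOR:MINOR, which a node under /dev should match. The following is a minimal sketch, assuming sysfs is mounted at /sys. The helper names are hypothetical. If a disk shows up in sysfs but its /dev node is missing, the problem is more likely in how /dev gets populated. If the disk is absent from sysfs too, the kernel probably never detected it.

```python
import os
import stat
from typing import Dict, Tuple

SYS_CLASS_BLOCK = "/sys/class/block"

def parse_dev_attr(text: str) -> Tuple[int, int]:
    """Parse a sysfs 'dev' attribute such as '259:1' into (major, minor)."""
    major, minor = text.strip().split(":")
    return int(major), int(minor)

def block_devices(root: str = SYS_CLASS_BLOCK) -> Dict[str, Tuple[int, int]]:
    """Map each block device listed in sysfs to its (major, minor)."""
    devices: Dict[str, Tuple[int, int]] = {}
    if not os.path.isdir(root):
        return devices  # sysfs not mounted here (e.g. in some containers)
    for name in sorted(os.listdir(root)):
        try:
            with open(os.path.join(root, name, "dev")) as f:
                devices[name] = parse_dev_attr(f.read())
        except OSError:
            continue  # entry vanished or has no readable 'dev' attribute
    return devices

def node_status(name: str, major: int, minor: int) -> str:
    """Compare /dev/<name> against the numbers the kernel reports."""
    # sysfs writes '/' in device names as '!', e.g. cciss!c0d0
    path = "/dev/" + name.replace("!", "/")
    try:
        st = os.stat(path)
    except (FileNotFoundError, NotADirectoryError):
        return "missing"
    if not stat.S_ISBLK(st.st_mode):
        return "not a block device"
    if (os.major(st.st_rdev), os.minor(st.st_rdev)) != (major, minor):
        return "number mismatch"
    return "ok"

for name, (major, minor) in block_devices().items():
    print(f"{name:12} {major}:{minor:<5} {node_status(name, major, minor)}")
```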