Hire me if you like: blake.irvin@gmail.com

Thursday, August 09, 2007

Solaris 10 Hangs with Multiple SATA Drives on a PCI SATA Controller

(Note: this bug has been fixed in build 72 of Solaris Nevada, snv_72.)

It seems there is an unpatched bug in the generic ATA driver for Solaris 10/OpenSolaris. The bug shows up only on older x86 systems that have few interrupt lines: boards that typically use a pair of 8259 Programmable Interrupt Controllers, each of which handles only 8 interrupt lines. Newer boards (such as those supporting x64 architectures) are more likely to use an APIC (Advanced Programmable Interrupt Controller), which can theoretically provide 255 physical hardware IRQ lines, though most systems don't use more than 24.
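To put numbers on "fewer interrupt lines": assuming the classic PC/AT wiring, in which the second 8259 is cascaded into input 2 of the first, the pair delivers 15 usable IRQs rather than 16. This small sketch only does that arithmetic; the 24-line figure is the typical APIC count quoted above.

```python
from typing import List

# Assumed classic PC/AT wiring: two 8259 PICs with 8 inputs each,
# the slave's output feeding input 2 of the master.
PIC_INPUTS = 8
CASCADE_INPUT = 2          # master input consumed by the slave PIC
TYPICAL_APIC_LINES = 24    # "most systems don't use more than 24"


def usable_irqs() -> List[int]:
    """IRQ numbers a cascaded master/slave 8259 pair can deliver."""
    total = PIC_INPUTS * 2
    # The cascade input cannot carry a device interrupt of its own.
    return [irq for irq in range(total) if irq != CASCADE_INPUT]


irqs = usable_irqs()
print(f"8259 pair: {len(irqs)} usable IRQs")          # 8259 pair: 15 usable IRQs
print(f"typical APIC: {TYPICAL_APIC_LINES} lines")    # typical APIC: 24 lines
```

With so few lines, PCI cards on these boards often end up sharing an IRQ, which is why timing of handler registration matters.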

The symptom is that the machine hangs while probing the SATA controller if more than one device is attached to it. The hang occurs because Solaris, originally designed for machines that are not limited to as few interrupts as older x86 hardware, does not yet have an interrupt handler registered when the SATA device starts raising interrupt requests. With no way to service those requests, the system hangs.
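The following is a toy model of that failure, not Solaris code. It assumes the controller's interrupt is level-triggered, so the line stays asserted until a handler services the device; with no handler registered nothing clears the request and the interrupt fires again immediately. `MAX_DISPATCHES` is an arbitrary cutoff so the simulation terminates, and IRQ 11 is a hypothetical line number.

```python
from typing import Callable, Dict

# Toy model only; assumes a level-triggered interrupt line.
MAX_DISPATCHES = 1000  # hypothetical cutoff standing in for "forever"


class Device:
    """A device whose interrupt line stays asserted until serviced."""

    def __init__(self) -> None:
        self.pending = False

    def raise_irq(self) -> None:
        self.pending = True

    def ack(self) -> None:
        self.pending = False


def run(device: Device, irq: int,
        handlers: Dict[int, Callable[[Device], None]]) -> str:
    """Dispatch the IRQ until the device stops asserting it."""
    dispatches = 0
    while device.pending:
        dispatches += 1
        if dispatches > MAX_DISPATCHES:
            return "hang: interrupt never cleared"
        handler = handlers.get(irq)
        if handler is not None:
            handler(device)  # servicing the device deasserts the line
        # With no handler, 'pending' stays set and the IRQ fires again.
    return f"ok after {dispatches} dispatch(es)"


dev = Device()
dev.raise_irq()
print(run(dev, 11, {}))                  # hang: interrupt never cleared

dev = Device()
dev.raise_irq()
print(run(dev, 11, {11: Device.ack}))    # ok after 1 dispatch(es)
```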

The bug itself is filed here.

OpenSolaris developer Jürgen Keil has provided an unofficial patch that I was able to use successfully on snv_65 running on 32-bit x86. I outline the procedure for applying it below. (Please note that this patch can render your system unbootable. I have not tested it on versions of Solaris other than snv_65, nor on architectures other than 32-bit x86. I recommend backing up your system beforehand with 'flarcreate'.)

PROCEDURE:

Boot your machine.

At the GRUB boot options screen, use the 'e' key to edit your primary boot option.

At the GRUB edit screen for your boot command, append '-kd'. Your new boot command should look something like this:

kernel$ /platform/i86pc/kernel/$ISADIR/unix -kd

Press the 'enter/return' key to temporarily apply your edit to the boot command. Then press the 'b' key to boot the system using the edited GRUB boot command.
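The edit itself is just a string change to the `kernel$` line: `-k` loads the kernel debugger (kmdb) and `-d` stops in it early in boot. The hypothetical helper below only illustrates the resulting line; in this procedure the change is typed at the GRUB edit screen and is never written back to `menu.lst`. The starting line is assumed to be the default shown above, minus the flags.

```python
def add_kernel_flags(kernel_line: str, flags: str = "-kd") -> str:
    """Append boot flags to a GRUB kernel line unless already present.

    Hypothetical helper for illustration only.
    """
    tokens = kernel_line.split()
    if not tokens or tokens[0] not in ("kernel", "kernel$"):
        raise ValueError("not a GRUB kernel line: " + kernel_line)
    if flags in tokens:
        return kernel_line  # already edited; leave it alone
    return kernel_line.rstrip() + " " + flags


original = "kernel$ /platform/i86pc/kernel/$ISADIR/unix"  # assumed default
print(add_kernel_flags(original))
# kernel$ /platform/i86pc/kernel/$ISADIR/unix -kd
```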

Solaris will begin to boot, but will halt at the kernel debugger prompt, which looks like this:

Loading kmdb...

Welcome to kmdb
Loaded modules: [ unix krtld genunix ]
[0]>

At this point you will enter some commands for the kernel debugger to act on. You are not yet making any permanent changes; you are only modifying the ATA kernel module in memory as it loads, and the change will not persist across a reboot. Enter the following commands, pressing 'enter/return' after each one:

::bp ata`ata_id_common
:c
::delete 1
ata_id_common+0x3c?w a6a
:c
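In adb/mdb syntax the `w` format writes a 2-byte value, and numbers are read as hexadecimal by default, so `a6a` means 0x0a6a. On little-endian x86 those two bytes are stored low byte first. The snippet below only shows that byte layout at `ata_id_common+0x3c`; it makes no claim about what the patched instruction does.

```python
import struct

offset = 0x3C     # from "ata_id_common+0x3c"
value = 0x0A6A    # "a6a", read as hex by adb/mdb

patch = struct.pack("<H", value)  # '<H': little-endian unsigned 16-bit
for i, byte in enumerate(patch):
    print(f"ata_id_common+{offset + i:#x}: {byte:#04x}")
# ata_id_common+0x3c: 0x6a
# ata_id_common+0x3d: 0x0a
```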

[Screen capture supplied by Jürgen Keil]

This procedure boots Solaris, applying changes directly to the ATA driver as it is loaded into memory. The driver in question is located at /kernel/drv/ata on your boot disk.

If the system boots properly and functions as expected, you can proceed to make the change permanent with the 'adb' tool.


(STOP. Before proceeding, back up the generic Solaris ATA driver. In my case, this meant these commands:

# mkdir -p /backups_directory/kernel
# cp -pr /kernel/drv/ata /backups_directory/kernel/ata

If something goes wrong, the backup 'ata' file can be copied back to the /kernel/drv directory after booting into single-user mode.)
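For readers who prefer to script the backup, here is a minimal Python sketch of the same `mkdir -p` plus `cp -p` step. `shutil.copy2` keeps the file's permission bits and timestamps but, unlike `cp -p` run as root, not its ownership. `/backups_directory` is just the example path used above, and the refusal to overwrite an existing backup is a design choice of this sketch, not part of the original procedure.

```python
import os
import shutil


def backup_file(src: str, backup_root: str) -> str:
    """Copy src to backup_root/kernel/, preserving mode and timestamps.

    Refuses to overwrite an existing backup so a known-good copy
    is never clobbered by a later, possibly patched, file.
    """
    dest_dir = os.path.join(backup_root, "kernel")
    os.makedirs(dest_dir, exist_ok=True)            # like 'mkdir -p'
    dest = os.path.join(dest_dir, os.path.basename(src))
    if os.path.exists(dest):
        raise FileExistsError(dest)
    shutil.copy2(src, dest)                          # like 'cp -p', minus ownership
    return dest


def restore_file(backup: str, original: str) -> None:
    """Put the saved copy back over the original file."""
    shutil.copy2(backup, original)


# Example (as root on the Solaris machine):
#   backup_file("/kernel/drv/ata", "/backups_directory")
#   -> "/backups_directory/kernel/ata"
```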

To apply the patch directly to the ATA driver file, use this sequence of commands (comments are marked with ##):

# adb -w /kernel/drv/ata ## The 'adb -w' command allows editing of the 'ata' driver file - this is followed by the path to the driver.

Press 'enter/return'

ata_id_common+0x3c?w a6a ## This line is the actual edit command that makes changes to hex values in the driver binary.

Press 'enter/return'

$q ## The '$q' command exits the adb tool after the change has been made.
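The same file edit can be sketched in Python, with one large assumption. adb resolves `ata_id_common+0x3c` to a position in the file through the driver's ELF symbol table; the hypothetical `patch_word` helper below cannot do that and needs the file offset worked out separately (it is not simply 0x3c). It also needs the bytes you expect to find there, which the original post does not give, so both appear only as placeholders. Checking the current bytes before writing means a wrong offset aborts rather than corrupting the driver.

```python
import struct


def patch_word(path: str, file_offset: int, expected: bytes, value: int) -> None:
    """Overwrite 2 bytes at file_offset with value (little-endian).

    Hypothetical helper: file_offset and expected must be determined
    separately. Nothing is written unless the current bytes match.
    """
    new = struct.pack("<H", value)
    if len(expected) != len(new):
        raise ValueError("expected must be exactly 2 bytes")
    with open(path, "r+b") as f:          # read/write without truncating
        f.seek(file_offset)
        current = f.read(len(new))
        if current != expected:
            raise ValueError(
                f"unexpected bytes {current.hex()} at {file_offset:#x}")
        f.seek(file_offset)
        f.write(new)


# Placeholder usage; FILE_OFFSET and EXPECTED_BYTES are not real values:
#   patch_word("/kernel/drv/ata", FILE_OFFSET, EXPECTED_BYTES, 0x0A6A)
```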


Reboot your machine and test using multiple SATA devices on a single PCI SATA controller that uses the generic ATA driver. In my case, these were cards based on Silicon Image chipsets (SiI 3112 and 3114).

All technical credit for this fix goes to Jürgen Keil.

If he were in the U.S. instead of Germany, I would buy him a lot of beer.
