AHCI SATA hotswap broken? [Update: Workaround]

Postby maxtog » Jun 7th, '12, 05:12

I have a system on which I was running 64 bit Mandriva 2010.0 and had zero problems using AHCI SATA hotswap. I could remove a SATA drive, device node would disappear, and plug in another one and the device node would appear.

On this same machine, I have now loaded 64 bit Mageia 2 and SATA hotswap is totally broken. When I remove a drive, there is /var/log/messages chatter like this:
Code:
Jun  6 22:38:38 kram kernel: [177009.212215] ata6: exception Emask 0x10 SAct 0x0 SErr 0x10202 action 0xe frozen
Jun  6 22:38:38 kram kernel: [177009.212218] ata6: irq_stat 0x00400000, PHY RDY changed
Jun  6 22:38:38 kram kernel: [177009.212220] ata6: SError: { RecovComm Persist PHYRdyChg }
Jun  6 22:38:38 kram kernel: [177009.212224] ata6: hard resetting link
Jun  6 22:38:39 kram kernel: [177009.935102] ata6: SATA link down (SStatus 0 SControl 300)
Jun  6 22:38:44 kram kernel: [177014.935073] ata6: hard resetting link
Jun  6 22:38:44 kram kernel: [177015.240027] ata6: SATA link down (SStatus 0 SControl 300)
Jun  6 22:38:44 kram kernel: [177015.240035] ata6: limiting SATA link speed to 1.5 Gbps
Jun  6 22:38:49 kram kernel: [177020.240093] ata6: hard resetting link
Jun  6 22:38:49 kram kernel: [177020.545109] ata6: SATA link down (SStatus 0 SControl 310)
Jun  6 22:38:49 kram kernel: [177020.545117] ata6.00: disabled
Jun  6 22:38:49 kram kernel: [177020.556065] ata6: EH complete
Jun  6 22:38:49 kram kernel: [177020.556073] ata6.00: detaching (SCSI 5:0:0:0)
Jun  6 22:38:49 kram kernel: [177020.556561] sd 5:0:0:0: [sdd] Synchronizing SCSI cache
Jun  6 22:38:49 kram kernel: [177020.556585] sd 5:0:0:0: [sdd]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jun  6 22:38:49 kram kernel: [177020.556588] sd 5:0:0:0: [sdd] Stopping disk
Jun  6 22:38:49 kram kernel: [177020.556593] sd 5:0:0:0: [sdd] START_STOP FAILED
Jun  6 22:38:49 kram kernel: [177020.556594] sd 5:0:0:0: [sdd]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK


and the device disappears. When I plug another drive in, not only is there no device node created, there is NOTHING written to /var/log/messages. I have to reboot the computer to gain access to the connected drive (which is very irritating).

Is there something I am supposed to do to enable AHCI in Mageia 2 or the newer 3.x kernels?

Thanks for any ideas.
Last edited by maxtog on Jun 13th, '12, 23:42, edited 1 time in total.
maxtog
 
Posts: 38
Joined: Jun 7th, '12, 05:05

Re: AHCI SATA hotswap broken?

Postby wintpe » Jun 7th, '12, 07:55

I have not tried this in mga2 yet, but it definitely works in mga1.

Regards peter
Redhat 6 Certified Engineer (RHCE)
Sometimes my posts will sound short or snappy; however, it's really not my intention to offend, so accept my apologies in advance.
wintpe
 
Posts: 1204
Joined: May 22nd, '11, 17:08
Location: Rayleigh,, Essex , UK

Re: AHCI SATA hotswap broken?

Postby wilcal » Jun 8th, '12, 14:41

maxtog wrote:......Thanks for any ideas.

What is the format of the drive you are hot plugging?
FAT-32, EXT2/3/4??
OS is 32 or 64 bit?
Have you tried hot plugging using the Live-CD?
"DISK BOOT FAILURE - INSERT SYSTEM DISK AND PRESS ENTER"
is my friend
wilcal
 
Posts: 567
Joined: Jun 20th, '11, 02:01
Location: San Diego CA

Re: AHCI SATA hotswap broken?

Postby maxtog » Jun 8th, '12, 18:58

wilcal wrote:What is the format of the drive you are hot plugging?
FAT-32, EXT2/3/4??

ext4

OS is 32 or 64 bit?

64, same as Mandriva

Have you tried hot plugging using the Live-CD?

no

Re: AHCI SATA hotswap broken?

Postby wilcal » Jun 9th, '12, 04:48

I'm interested in this because I am kind of
doing the same thing. I have all my important
data on a 1TB FAT32 drive, for compatibility
everywhere. All the data is backed up on
another 1TB ext4 drive on the LAN.

For this test I used the Mageia 2 32-bit
KDE Live-CD. The test system has a removable
HD tray so I can A<->B drives all I want.
Using GParted I formatted a drive entirely as ext4.
One partition.

So, I installed the drive but turned power off
to it. I then brought up the M2 KDE Live-CD.
Once the OS had settled down I turned power
on to the ext4 drive. It mounted but in a
read only mode. Opening a terminal I typed
su - and opened Dolphin. Dolphin told me
that I, as a user, could see the drive
but could not write to it. So
using Dolphin in "Admin" mode I changed
the permissions on the drive to anyone can
read or write to the drive, saved that,
turned off the power to the drive and
rebooted. Once the OS had settled down
I turned on power to the ext4 drive.
It mounted, in read/write mode, just fine.

So open a terminal and type "su -". Enter
your Admin password then open Dolphin.
Then go to /media, find your mounted drive
and change the permissions to everything
so anyone can read and write to the drive.
Then reboot and I think you'll be able
to hot plug the drive.

I don't know whether this qualifies as a bug or not.

Re: AHCI SATA hotswap broken?

Postby maxtog » Jun 11th, '12, 23:59

Can't ANYONE confirm that AHCI SATA hotswap is broken or working??? I would have thought other people use this regularly.

All my drives are mounted in hotswap trays, and I leave two spun down (/sbin/hdparm -S 6) and unmounted as backup drives. Not only is hotswap broken, but even worse, something in or around udisksd now leaves the drives completely hung a while after mounting and unmounting them, without my even trying to swap or remove them:

Code:
udisksd[5232]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/ST32000542AS_9XW0AWPM: Error updating SMART data: sk_disk_check_sleep_mode: Operation not supported (udisks-error-quark, 0)
 udisksd[5232]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/ST32000542AS_9XW0AWPM: Error updating SMART data: sk_disk_check_sleep_mode: Operation not supported (udisks-error-quark, 0)


I come back to my system and the red activity light on the drive is on solid and I have to reboot the entire system just to regain access to the drives so I can make a backup or restore a file. It is turning into a major problem for me with Mageia 2. Should I be filing a bug report?
Last edited by maxtog on Jun 12th, '12, 13:30, edited 1 time in total.

Re: AHCI SATA hotswap broken?

Postby wilcal » Jun 12th, '12, 01:22

maxtog wrote:Should I be filing a bug report?

I have always found that if I gather all the facts about
what I consider a problem and file them as best I can
in a bug report, the issue does get carefully reviewed.
I always expect to be asked for more testing and
technical details. Be prepared for your issue to
possibly morph into something you did not expect.
There is also the possibility that your issue has
already been reported by someone else, or in a different
way. Do search https://bugs.mageia.org/ for something,
anything, that may resemble your issue.

File the bug, then follow the process.

Re: AHCI SATA hotswap broken?

Postby wintpe » Jun 12th, '12, 13:15

I'll see if I can test this tonight on my system for you.

regards peter

Re: AHCI SATA hotswap broken?

Postby ralf » Jun 12th, '12, 17:32

I've just hit the same problem after upgrading from Mandriva 2010.2 to Mageia 2. My online research indicates that the following patch (which must not have made it into the Mageia 2 release) should fix the regression introduced in the Linux 3.3 kernel:
http://www.lingrok.org/xref/linux-linus ... 94eebd50fd

A one-liner version was posted just before the full patch:
http://www.spinics.net/lists/linux-ide/msg43233.html
ralf
 
Posts: 5
Joined: Jun 12th, '12, 17:25

Re: AHCI SATA hotswap broken?

Postby ralf » Jun 12th, '12, 18:04

I believe I've found a work-around, which was mentioned in
http://www.spinics.net/lists/linux-ide/msg43237.html

You need to find the proper sysfs entry for the port you want to hotplug:
Code:
find /sys/devices -name ataN

(replace the N as appropriate). On my system, this yields
/sys/devices/pci0000:00/0000:00:11.0/ata5
/sys/devices/pci0000:00/0000:00:11.0/ata5/ata_port/ata5

Now add "/power/control" to the first of those lines, and write "on" to the resulting path:
Code:
echo on | sudo tee /sys/devices/pci0000:00/0000:00:11.0/ata5/power/control >/dev/null

At least for me, this reactivated the port after it became non-responsive following an unplugging, and has kept it active through a couple of plug/unplug cycles so far. I've added the command to /etc/rc.d/rc.local so that I won't have to apply it manually after booting.

Re: AHCI SATA hotswap broken?

Postby maxtog » Jun 12th, '12, 23:22

It seems like they are talking only about after suspend/resume (which I am not doing; not a laptop either). Still, I am willing to play with it. But how do you know which ata number? I tried searching for "sdc" and "sdd" in /var/log/messages... nothing. Trying with dmesg is ambiguous. So I am left with:

Code:
# find /sys/devices -name 'ata[0-9]*' | grep port
/sys/devices/pci0000:00/0000:00:04.0/0000:05:00.0/ata7/ata_port/ata7
/sys/devices/pci0000:00/0000:00:04.0/0000:05:00.0/ata8/ata_port/ata8
/sys/devices/pci0000:00/0000:00:04.0/0000:05:00.1/ata1/ata_port/ata1
/sys/devices/pci0000:00/0000:00:04.0/0000:05:00.1/ata2/ata_port/ata2
/sys/devices/pci0000:00/0000:00:11.0/ata3/ata_port/ata3
/sys/devices/pci0000:00/0000:00:11.0/ata4/ata_port/ata4
/sys/devices/pci0000:00/0000:00:11.0/ata5/ata_port/ata5
/sys/devices/pci0000:00/0000:00:11.0/ata6/ata_port/ata6
/sys/devices/pci0000:00/0000:00:14.1/ata9/ata_port/ata9
/sys/devices/pci0000:00/0000:00:14.1/ata10/ata_port/ata10


And no way to narrow it down! So my machine is currently sitting with a dead /dev/sdc drive and the light is on solid. I tried this:

Code:
for FILE in $(find /sys/devices -name 'ata[0-9]*' | grep port)
 do DEST="$FILE/power/control"
 if test -e "$DEST"
   then echo "Trying $DEST"
   echo on > "$DEST"
 fi
done


Then I tried "fdisk -l /dev/sdc" and it came back with nothing. But there was tons of output spewed to /var/log/messages, like:

Code:
Jun 12 17:14:05  kernel: [266399.382570] sd 4:0:0:0: [sdc] Unhandled error code
Jun 12 17:14:05  kernel: [266399.382573] sd 4:0:0:0: [sdc]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jun 12 17:14:05  kernel: [266399.382576] sd 4:0:0:0: [sdc] CDB: Read(10): 28 00 00 00 00 00 00 00 20 00
Jun 12 17:14:05  kernel: [266399.382580] end_request: I/O error, dev sdc, sector 0
Jun 12 17:14:05  kernel: [266399.382582] Buffer I/O error on device sdc, logical block 0
Jun 12 17:14:05  kernel: [266399.382590] Buffer I/O error on device sdc, logical block 1
Jun 12 17:14:05  kernel: [266399.382594] Buffer I/O error on device sdc, logical block 2
Jun 12 17:14:05  kernel: [266399.382596] Buffer I/O error on device sdc, logical block 3
Jun 12 17:14:05  kernel: [266399.382627] sd 4:0:0:0: [sdc] Unhandled error code
Jun 12 17:14:05  kernel: [266399.382630] sd 4:0:0:0: [sdc]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jun 12 17:14:05  kernel: [266399.382632] sd 4:0:0:0: [sdc] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Jun 12 17:14:05  kernel: [266399.382636] end_request: I/O error, dev sdc, sector 0

(Goes on and on like that)
So I unplugged the drive (first time on this boot). That resulted in this:
Code:
Jun 12 17:17:58  kernel: [266633.059161] ata5: exception Emask 0x10 SAct 0x0 SErr 0x10200 action 0xe frozen
Jun 12 17:17:58  kernel: [266633.059163] ata5: irq_stat 0x00400000, PHY RDY changed
Jun 12 17:17:58  kernel: [266633.059166] ata5: SError: { Persist PHYRdyChg }
Jun 12 17:17:58  kernel: [266633.059170] ata5: hard resetting link
Jun 12 17:17:59  kernel: [266633.782102] ata5: SATA link down (SStatus 0 SControl 300)
Jun 12 17:17:59  kernel: [266633.793076] ata5: EH complete
Jun 12 17:17:59  kernel: [266633.793083] ata5.00: detaching (SCSI 4:0:0:0)
Jun 12 17:17:59  kernel: [266633.795099] sd 4:0:0:0: [sdc] Synchronizing SCSI cache
Jun 12 17:17:59  kernel: [266633.795134] sd 4:0:0:0: [sdc]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jun 12 17:17:59  kernel: [266633.795137] sd 4:0:0:0: [sdc] Stopping disk
Jun 12 17:17:59  kernel: [266633.795142] sd 4:0:0:0: [sdc] START_STOP FAILED
Jun 12 17:17:59  kernel: [266633.795143] sd 4:0:0:0: [sdc]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK


At least I now know sdc = ata5. Plugged the drive back in. Nothing. Nothing in messages AND the /dev/sdc nodes are GONE. So I ran the power control script thing again. Nothing. Nothing in messages AND the /dev/sdc nodes are still just GONE. :(
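
On the question of which ataN a given sdX sits on: one way that might narrow it down without unplugging anything is to resolve the disk's sysfs link and pick out the ataN component. This is only a sketch. It assumes the resolved path of /sys/block/sdX contains an ataN directory (that matches the layout in the find output above, but is not confirmed here for every controller), and `ata_port_of` is a made-up helper name.

```shell
# Sketch: print the ataN port that a block device such as sdc hangs off.
# Assumption: the resolved sysfs path of the disk contains an "/ataN/" component.
# The second argument is a hypothetical override of /sys/block, handy for dry runs.
ata_port_of() {
    dev="$1"
    sysblock="${2:-/sys/block}"
    # Resolve the symlink, then keep only the first /ataN/ path component.
    readlink -f "$sysblock/$dev" \
        | grep -o '/ata[0-9][0-9]*/' \
        | head -n 1 \
        | tr -d '/'
}
```

If the assumption holds, `ata_port_of sdc` would be expected to print ata5 on the system described above.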

Re: AHCI SATA hotswap broken?

Postby maxtog » Jun 13th, '12, 01:09

Someone (one of you) reported the broken AHCI hotswap as a bug right when I was preparing to do the same:
https://bugs.mageia.org/show_bug.cgi?id=6433

I also filed a related bug about the drives hanging for some reason:
https://bugs.mageia.org/show_bug.cgi?id=6440

Re: AHCI SATA hotswap broken?

Postby ralf » Jun 13th, '12, 01:55

maxtog wrote:It seems like they are talking only about after suspend/resume (which I am not doing; not a laptop either).

I'm on a desktop, not using suspend.

At least I now know sdc = ata5. Plugged the drive back in. Nothing. Nothing in messages AND the /dev/sdc nodes are GONE. So I ran the power control script thing again. Nothing. Nothing in messages AND the /dev/sdc nodes are still just GONE. :(

Do the "echo on" command while the drive is unplugged, then plug it back in. That's what worked for me. You won't get any messages from the "echo on" command.

Also, your script is writing to the wrong path -- use "grep -v port", because /power/control has to be appended to the path that ends in "ataN" directly under the PCI device, not to the one under ata_port.
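
A corrected version of the loop might look like the sketch below. It writes "on" to power/control under each top-level ataN directory and skips the nested ata_port/ataN entries. The function name `enable_ata_ports` and its optional root argument are only for illustration, and the function has to run as root to be able to write under /sys.

```shell
# Sketch: set power/control to "on" for every top-level ataN port directory.
# The optional first argument is a hypothetical override of /sys/devices,
# so the loop can be tried against a scratch directory first.
enable_ata_ports() {
    root="${1:-/sys/devices}"
    # Quote the pattern so the shell does not expand it before find sees it,
    # and drop the .../ataN/ata_port/ataN entries, which are the wrong place.
    find "$root" -name 'ata[0-9]*' | grep -v ata_port | while read -r port; do
        ctl="$port/power/control"
        if [ -e "$ctl" ]; then
            echo "Setting $ctl to on"
            echo on > "$ctl"
        fi
    done
}
```

Calling `enable_ata_ports` with no argument from a root shell touches every port it finds; to limit it to one port, writing to that port's power/control directly is simpler.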
Last edited by ralf on Jun 13th, '12, 02:02, edited 1 time in total.

Re: AHCI SATA hotswap broken?

Postby maxtog » Jun 13th, '12, 02:02

ralf wrote:
At least I now know sdc = ata5. Plugged the drive back in. Nothing. Nothing in messages AND the /dev/sdc nodes are GONE. So I ran the power control script thing again. Nothing. Nothing in messages AND the /dev/sdc nodes are still just GONE. :(

Do the "echo on" command while the drive is unplugged, then plug it back in. That's what worked for me. You won't get any messages from the "echo on" command.


Just tried that. Nope. No device node is created. Plus, I find it hard to believe a missing device node would be re-created without anything appearing in messages...

Re: AHCI SATA hotswap broken?

Postby ralf » Jun 13th, '12, 02:03

Crossing replies -- I edited my previous response to point out that your script is trying to write "on" to the wrong path.

Re: AHCI SATA hotswap broken?

Postby maxtog » Jun 13th, '12, 03:35

ralf wrote:Crossing replies -- I edited my previous response to point out that your script is trying to write "on" to the wrong path.


Eeeek! You are so right, I was using the wrong path.... "my bad". When I tried it again, sure enough, it worked just like it did for you. When I ran the echo command I got this:

Code:
Jun 12 21:32:07 kernel: [281881.393100] ata5: SATA link down (SStatus 0 SControl 300)


Then this when I plugged the drive back in:

Code:
Jun 12 21:32:16  kernel: [281890.691059] ata5: exception Emask 0x10 SAct 0x0 SErr 0x40c0000 action 0xe frozen
Jun 12 21:32:16  kernel: [281890.691061] ata5: irq_stat 0x00000040, connection status changed
Jun 12 21:32:16  kernel: [281890.691064] ata5: SError: { CommWake 10B8B DevExch }
Jun 12 21:32:16  kernel: [281890.691069] ata5: hard resetting link
Jun 12 21:32:26  kernel: [281900.691026] ata5: softreset failed (1st FIS failed)
Jun 12 21:32:26  kernel: [281900.691030] ata5: hard resetting link
Jun 12 21:32:28  kernel: [281902.538109] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 12 21:32:28  kernel: [281902.574126] ata5.00: ATA-8: ST32000542AS, CC34, max UDMA/133
Jun 12 21:32:28  kernel: [281902.574128] ata5.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Jun 12 21:32:28  kernel: [281902.575522] ata5.00: configured for UDMA/133
Jun 12 21:32:28  kernel: [281902.586074] ata5: EH complete
Jun 12 21:32:28  kernel: [281902.586152] scsi 4:0:0:0: Direct-Access     ATA      ST32000542AS     CC34 PQ: 0 ANSI: 5
Jun 12 21:32:28  kernel: [281902.586336] sd 4:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Jun 12 21:32:28  kernel: [281902.586460] sd 4:0:0:0: [sdf] Write Protect is off
Jun 12 21:32:28  kernel: [281902.586495] sd 4:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 12 21:32:28  kernel: [281902.586641] sd 4:0:0:0: Attached scsi generic sg3 type 0
Jun 12 21:32:28  kernel: [281902.613082]  sdf: sdf1
Jun 12 21:32:28  kernel: [281902.613405] sd 4:0:0:0: [sdf] Attached SCSI disk


Of course, it didn't come back as sdc, it came back as sdf, which is pretty annoying :)
Thanks for your help, I hope the Mageia team will fix this soon.
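
About the drive coming back as sdf instead of sdc: sdX names are not guaranteed to stay the same across a re-plug. A possible way around that is to refer to the disk through the udev-maintained symlinks under /dev/disk/by-id (or by-uuid, by-label) rather than the sdX name. A small sketch, assuming that directory exists on the system; `stable_names_for` is a hypothetical helper and its second argument is only there so it can be pointed at a test directory:

```shell
# Sketch: list the persistent symlinks that currently point at a given sdX node.
# Assumption: udev populates /dev/disk/by-id on this system.
stable_names_for() {
    dev="$1"
    dir="${2:-/dev/disk/by-id}"
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        # Compare the name of the link target with the requested device name.
        if [ "$(basename "$(readlink -f "$link")")" = "$dev" ]; then
            echo "$link"
        fi
    done
}
```

Running `stable_names_for sdf` right after plugging the drive in should list names that stay the same the next time, whatever letter the drive gets.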

Re: AHCI SATA hotswap broken?

Postby isadora » Jun 13th, '12, 08:24

Maxtog, glad your problem is somewhat solved right now.
Would you please be so kind to mark the topic as solved, if really so.
Thanks!!!
..........bird from paradise..........

Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.
—Antoine de Saint-Exupéry
isadora
 
Posts: 2763
Joined: Mar 25th, '11, 16:03
Location: Netherlands

Re: AHCI SATA hotswap broken?

Postby maxtog » Jun 13th, '12, 13:32

isadora wrote:Maxtog, glad your problem is somewhat solved right now.
Would you please be so kind to mark the topic as solved, if really so.
Thanks!!!


Should it be considered solved? Hotswap is still broken, we just have a workaround for it. Also, I don't see a "Solved" or "Workaround Found" button or function; is it just editing the subject or something? Thanks

Re: AHCI SATA hotswap broken?

Postby isadora » Jun 13th, '12, 14:22

maxtog wrote:
isadora wrote:Maxtog, glad your problem is somewhat solved right now.
Would you please be so kind to mark the topic as solved, if really so.
Thanks!!!


Should it be considered solved? Hotswap is still broken, we just have a workaround for it. Also, I don't see a "Solved" or "Workaround Found" button or function; is it just editing the subject or something? Thanks

Exactly that. ;)

Re: AHCI SATA hotswap broken?

Postby wilcal » Jun 13th, '12, 16:40

maxtog wrote:Should it be considered solved? Hotswap is still broken, we just have a workaround for it.

I for one can rightly be accused of sidestepping problems
with "workarounds". This hot-swap thing is a good example.
I'm not convinced that it's a problem; to my mind it's
just a cumbersome operation. I've dealt with things like
this all the way back to the early days of Mandriva.
It's kind of like sweeping the problem under the rug.

Another example: in the early alpha/beta days of
Mageia 2, attaching a USB printer was a very complex
and cumbersome process. But in the end it worked for
me. Ultimately it got simplified and now it's as easy
as it gets.

For me the hot swap thing has always been a bit of a
wrinkle, but it can be worked around.

Re: AHCI SATA hotswap broken?

Postby ralf » Jun 13th, '12, 18:51

Thanks for your help, I hope the Mageia team will fix this soon.

Since there's already an official patch for the regression, it will hopefully be included in the next kernel update; the question is whether we have to wait for a security update to get it.

