(Bug Report Submitted) BtrFS Boot failures ...

This forum is dedicated to testing early releases and Cauldron: how-tos, tips, tricks, and general user feedback and thoughts.

Helpful tip:
For bug tracking we use https://bugs.mageia.org, the Mageia Bug Tracker.
There you'll find bugs that have already been reported, and you'll be able to report the ones you find.


Post by ghmitch » Apr 4th, '13, 07:05

I am seeing repeated early-stage boot failures with the latest Cauldron kernel. It's sporadic: sometimes the boot gets through OK and sometimes it doesn't. I had no problem at all with the two earlier kernels. I get "dracut: signal caught" and then instructions to copy off data from /run. But I use a USB keyboard, which works fine in the BIOS but obviously not in dracut. Can this be solved with a USB module for GRUB 2? Any suggestions? - George
Last edited by ghmitch on Apr 13th, '13, 17:30, edited 2 times in total.
ghmitch
 
Posts: 325
Joined: Mar 30th, '11, 03:05
Location: Eureka California USA

Re: Boot failures with latest kernel ...

Post by ghmitch » Apr 8th, '13, 17:38

I FINALLY got so annoyed with this that I did the obvious thing and unsilenced the boot sequence. Everything proceeds normally until the btrfs root filesystem gets mounted, and then comes the infamous "dracut: signal caught". Any ideas from this clue? Many times, once the root filesystem has been mounted, the startup sequence suddenly jackrabbits ahead so fast I would need to video the sequence to be able to decipher it. But the system does come up normally if I can get past the initial root file system mount OK. - George

Re: Boot failures with latest kernel ...

Post by gohlip » Apr 8th, '13, 19:37

I don't have your problem, but...
o Is USB support enabled in the BIOS? If yes, insmod usb_keyboard and insmod uhci may help, but check the BIOS first and try without insmod uhci.
o insmod btrfs? (if using grub2 only)
o Still running grub-legacy alongside grub2? Then put /boot on a separate ext2 partition and / on btrfs.

Good luck.
Why do we live? To prove not everything in nature has a purpose.
gohlip
 
Posts: 573
Joined: Jul 9th, '12, 10:50

Re: Boot failures with latest kernel ...

Post by ghmitch » Apr 8th, '13, 20:07

gohlip wrote: I don't have your problem, but...
o Is USB support enabled in the BIOS? If yes, insmod usb_keyboard and insmod uhci may help, but check the BIOS first and try without insmod uhci.

I thought that because I have USB keyboard access to the BIOS itself, it must be enabled. Now that you bring it up, I realize that doesn't necessarily hold true. I need to check on this one. Thanks!

gohlip wrote: o insmod btrfs? (if using grub2 only)


Yup, got this one covered AND verified that draktools handles it fine as well.

gohlip wrote: o Still running grub-legacy alongside grub2? Then put /boot on a separate ext2 partition and / on btrfs.


At this point I have decided to be brave and have gotten rid of Mageia 2 completely. Mageia 3 on btrfs boots fine MOST OF THE TIME, but it fails unpredictably on an intermittent basis. My entire main Mageia 3 system is now on btrfs RAID1 partitions. I finally figured out how to stop the incessant fscks at boot time by modifying fstab, and that made it boot up like a rocket. The next step will be to replace the old 3ware RAID cards (which I am now using in JBOD mode) with a simple SATA2 controller. The only other OS on this box at this point is a basic Mageia 3 64-bit system on a single ext4 partition for troubleshooting purposes. It has saved my neck on multiple occasions, and it works fine with GRUB 2. AND ... just to be safe, I have the GRUB 2 bootloader installed on multiple drives. So far I am REALLY happy with the way everything is running.

Good luck.
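The post does not say exactly what was changed in fstab. A common way to stop boot-time fsck runs is to set the sixth fstab field (fs_passno) to 0, which tells the boot scripts to skip that filesystem. Below is a minimal sketch of that idea, assuming the standard six-field fstab layout; btrfs_nofsck is a hypothetical helper name, and it only prints a modified copy rather than editing the file in place.

```shell
#!/bin/bash
# Sketch: print an fstab with fs_passno (field 6) forced to 0 on btrfs lines,
# so the boot scripts skip fsck for them. Output goes to stdout; the input
# file is not modified. btrfs_nofsck is a hypothetical helper name.
btrfs_nofsck() {
    awk '
        /^[ \t]*(#|$)/ { print; next }   # pass comments and blank lines through
        $3 == "btrfs" {
            if (NF < 5) $5 = 0           # supply a missing dump field first
            $6 = 0                       # fs_passno 0 = no boot-time fsck
        }
        { print }
    ' "${1:-/etc/fstab}"
}

# Usage: btrfs_nofsck /etc/fstab > /tmp/fstab.new && diff /etc/fstab /tmp/fstab.new
```

Note that awk rewrites any line it changes with single spaces between fields, so column alignment on the btrfs lines is lost; review the diff before copying the result into place.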

Re: Boot failures with latest kernel ...

Post by gohlip » Apr 9th, '13, 05:45

Finally, since you've "modified/raided/moved/btrfsed" your partitions after installation, and if you've updated to the latest upgrades (beta4), "grub2-install /dev/sda" will now work. I suggest you do just that.
grub2-install /dev/sda
update-grub


[edit]
Drats! It's "update-grub2" (I had an alias 'update-grub' from before beta4, which works).
grub2-install /dev/sda
update-grub2

Re: Boot failures with latest kernel ...

Post by ghmitch » Apr 12th, '13, 19:51

FINALLY, I *think* I have this thing pinned down. It seems to be related to how dracut interacts with btrfs. I am running Mageia 3 beta 4 on btrfs RAID 1 spread over multiple hard drives. When all of those drives are on the host controller, everything works fine. If any of them are on a controller card, things start to go wrong. With two drives out of a set of three on a 3ware controller in JBOD mode, the problem was intermittent. With one drive out of a set of four on a Silicon Image controller, the problem is constant and the system is unbootable. This has to be some sort of timing issue, where the initramfs tries to mount the root btrfs filesystem before the controller card is ready, and the result is an immediate failure and a drop to the dracut shell. At that point my USB keyboard suddenly started working in the dracut shell, and I was able to pull out a sosreport.txt file. I have submitted all of this info in a bug report, so we will see what happens. In the meantime I have removed the offending drive and will not be able to use the backup controller until this issue is resolved. That may take a while, since I'm sure there are much more pressing issues at this point, but at least it is documented and on the table. - George
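When dracut drops to its emergency shell it writes a report to /run/initramfs/rdsosreport.txt (most likely the "sosreport.txt" mentioned above) and suggests saving it to a USB stick or /boot for the bug report. A minimal sketch of that step as a shell function; save_rdsosreport is a hypothetical name, and /dev/sdb1 in the usage comment is a placeholder for wherever the stick actually shows up.

```shell
#!/bin/bash
# Sketch: copy dracut's emergency-shell report somewhere that survives a reboot.
# save_rdsosreport is a hypothetical helper name.
save_rdsosreport() {
    local dest=$1 src=${2:-/run/initramfs/rdsosreport.txt}
    if [ -z "$dest" ]; then
        echo "usage: save_rdsosreport DESTDIR [REPORT]" >&2
        return 2
    fi
    if [ ! -r "$src" ]; then
        echo "no readable report at $src" >&2
        return 1
    fi
    mkdir -p "$dest" && cp "$src" "$dest/" && sync   # flush before pulling the stick
}

# Usage from the dracut shell (/dev/sdb1 is a placeholder for the USB stick):
#   mkdir -p /mnt/usb && mount /dev/sdb1 /mnt/usb
#   save_rdsosreport /mnt/usb && umount /mnt/usb
```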

Re: (Bug Report Submitted) Boot failures with latest kernel

Post by isadora » Apr 12th, '13, 20:47

Ghmitch, it would be nice to add a link to the mentioned bug report.

Thank you! :)
..........bird from paradise..........

Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.
—Antoine de Saint-Exupéry
isadora
 
Posts: 2765
Joined: Mar 25th, '11, 16:03
Location: Netherlands

Re: (Bug Report Submitted) Boot failures with latest kernel

Post by ghmitch » Apr 13th, '13, 03:45

Isadora, unfortunately, as soon as I *thought* I had this solved because of the pattern that seemed to emerge, my machine proved me wrong and began to fail even with all drives on the host controller. The good thing is that I was able to get a dump of the failure this time. I have updated the bug report with this information, which might make it a duplicate of another, similar bug report. But I am determined to get to the root of this. I have now moved /usr off to its own partition, as I suspect the size of the root partition might be a factor in the dracut failures. /usr is hands down the 900-ton gorilla on the root tree. After doing that, my first boot attempt succeeded uneventfully, but that could be a fluke, of course. Right now I am balancing the partitions across the two controller groups and will see what happens from here. I am really determined to make btrfs work on a production machine, and I have the hardware to give me the options to make it happen somehow. I will keep you updated as I work through this.

Re: (Bug Report Submitted) Boot failures with latest kernel

Post by ghmitch » Apr 13th, '13, 04:54

Well, there is an interesting pattern emerging. Initially boot was failing at the point of the initial root mount. Now I have moved /usr off onto a separate partition, and NOW boot fails at the point of mounting the /usr partition. Interesting, no? I suspect that dracut and the resulting initrd are having trouble mounting large filesystems. That would explain why the problem seemed to worsen when additional drives were added.

Re: (Bug Report Submitted) Boot failures with latest kernel

Post by ghmitch » Apr 13th, '13, 05:30

Doing "balances" and "scrubs" on the /usr filesystem is consistently helpful. Filesystem "shows" before and after indicate that they tend to reduce the size of the filesystems in terms of payload, which I suspect explains the underlying benefit. Something in the booting kernel is very sensitive to the size of the filesystems. I have noticed that with btrfs, unlike ext4 and other filesystems, the time it takes to mount a filesystem increases significantly as it grows larger. That could be causing a timing issue if the booting kernel assumes a mount has failed simply because it takes longer than expected.
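One way to check the mount-time theory is to time the mount by hand from a rescue system, before and after a balance. A minimal sketch, assuming bash and GNU date (for the %N nanosecond field); time_ms is a hypothetical helper name, and the device and mount point in the usage comment are placeholders.

```shell
#!/bin/bash
# Sketch: run a command, print its wall-clock duration in milliseconds on
# stderr, and return the command's own exit status. Needs GNU date (%N).
# time_ms is a hypothetical helper name.
time_ms() {
    local start end status
    start=$(date +%s%N)
    "$@"
    status=$?
    end=$(date +%s%N)
    echo "$(( (end - start) / 1000000 )) ms: $*" >&2
    return "$status"
}

# Usage from a rescue system (device and mount point are placeholders):
#   btrfs device scan
#   time_ms mount -t btrfs /dev/sdb2 /mnt/usr
```

Interactively, bash's own time keyword gives the same information; the function form is simply easier to drop into a script that logs several mounts for a bug report.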

Re: (Bug Report Submitted) Boot failures with latest kernel

Post by isadora » Apr 13th, '13, 09:30

But a link? ;)

Re: (Bug Report Submitted) Boot failures with latest kernel

Post by ghmitch » Apr 13th, '13, 16:50

Aha! Yes ... the link. Here is the link.

https://bugs.mageia.org/show_bug.cgi?id=9714

The good news is that, aside from this glitch, this system is working EXTREMELY smoothly and extremely well. I am VERY VERY satisfied with btrfs. I have inadvertently crashed this machine multiple times, and btrfs has never let me down so far. Just yesterday I was in the process of removing two drives from a filesystem and accidentally touched the power button, forcing a shutdown. On the way down it spewed complaints about not being able to unmount the filesystem in question and forced it down anyway. I rebooted with much fear and trepidation, but everything came up normally. It really seems like they have btrfs in a pretty watertight condition at this point.

The boot problem is a serious one, though, because when you add a new drive, btrfs WILL grow to fill it. That is because it habitually takes snapshots and saves old data. Every time you change a file, in a package update for example, btrfs does not overwrite the changed files but adds new ones and changes the pointers, so the amount of data in megabytes just grows and grows until it fills the filesystem. When you run it you can watch it happen. Rebalancing a filesystem erases all of this backup data and cuts the size of the filesystem dramatically. So that is my interim solution at this point: when I am no longer able to boot, I boot up a maintenance/rescue system and rebalance the /usr filesystem partitions. That has worked for me on multiple occasions so far, and now I think I understand more of the why.

Re: (Bug Report Submitted) BtrFS Boot failures ...

Post by ghmitch » Apr 14th, '13, 17:29

It seems that this is actually related to a known problem in the kernel btrfs architecture and there are some fixes out there. On the Mageia side there will now be an effort to get these fixes backported into the Mageia kernel until they get finalized into the upstream kernel. So it looks like relief is on the way. - George

Re: (Bug Report Submitted) BtrFS Boot failures ...

Post by ghmitch » Apr 20th, '13, 18:58

Just as an update for anyone else who might be using or contemplating btrfs on root: this problem is being actively studied by both the Mageia devs and upstream. Several updates have been released as a result, and I have seen some improvement. What I am currently realizing is that the problem appears to have multiple causes, as the failures manifest differently at various times. At least one of these scenarios appears fixed at this point, and I can only assume the others are being looked at. I am trying to get into the habit of capturing whatever data I can and attaching it to the initial bug report as quickly as possible. Part of the problem is that I often cannot get keyboard access, even with an old PS/2 keyboard, when the boot failure occurs. I get the dracut prompt but no keyboard response, which indicates that the system at that point is either dead or has lost I/O capability. So I will probably resort to capturing a video of the boot process and attaching that to the bug report. One way or another this is going to get fixed, but it is going to take a while. I will update this topic periodically. - George

Re: (Bug Report Submitted) BtrFS Boot failures ...

Post by ghmitch » May 5th, '13, 05:05

I have just received a tip on the btrfs mailing list to try adding the kernel option "rootdelay=1". I am trying that now and will edit this post as to how this works out.

Well, nice try, but doesn't seem to be working for me. A new btrfs fix just came down for dracut. Hopefully that may help.
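For a one-off test, rootdelay can be added by pressing e at the GRUB menu and appending it to the linux line. To make it persistent, here is a minimal sketch assuming the grub2 defaults live in /etc/default/grub with the kernel options in a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line; that layout is an assumption, so check your own file first. set_rootdelay is a hypothetical helper name; it replaces any existing rootdelay= value rather than adding a second one.

```shell
#!/bin/bash
# Sketch: set rootdelay=SECONDS in GRUB_CMDLINE_LINUX_DEFAULT, replacing any
# existing rootdelay= value. Assumes a double-quoted value; edits in place
# (GNU sed -i). set_rootdelay is a hypothetical helper name.
set_rootdelay() {
    local secs=$1 file=${2:-/etc/default/grub}
    if ! [[ $secs =~ ^[0-9]+$ ]]; then
        echo "usage: set_rootdelay SECONDS [FILE]" >&2
        return 2
    fi
    sed -i -E \
        -e '/^GRUB_CMDLINE_LINUX_DEFAULT=/ s/ ?rootdelay=[0-9]+//' \
        -e "/^GRUB_CMDLINE_LINUX_DEFAULT=/ s/\"\$/ rootdelay=$secs\"/" \
        "$file"
}

# Usage (then regenerate the GRUB config, e.g. with update-grub2 as noted
# earlier in this thread):
#   set_rootdelay 10
```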

Re: (Bug Report Submitted) BtrFS Boot failures ...

Post by sander85 » May 5th, '13, 21:32

ghmitch wrote: I have just received a tip on the btrfs mailing list to try adding the kernel option "rootdelay=1". I am trying that now and will edit this post as to how this works out.

Well, nice try, but doesn't seem to be working for me. A new btrfs fix just came down for dracut. Hopefully that may help.

Did you try a longer delay? 10 or 15?
Stand for something, or you will fall for nothing.
-- Richard Stallman
sander85
 
Posts: 88
Joined: Jan 28th, '12, 20:41
Location: Estonia

Re: (Bug Report Submitted) BtrFS Boot failures ...

Post by ghmitch » May 5th, '13, 22:15

Actually, I tried "rootdelay=10". It seemed to make it worse. - George

