[SOLVED] Mageia 4 btrfs issue

Postby mla » Feb 8th, '14, 20:21

Having upgraded both my machines to Mageia 4 a few days ago, I now have a btrfs problem which was absent in Mageia 3. The slower of the two machines fails to boot roughly 50% of the time. The faster machine fails to boot very occasionally. Why do I blame btrfs? Because boot is aborted due to fsck reporting errors on the btrfs partition. Trouble is, there are actually no problems with that partition.

What I have is the actual btrfs partition itself, which mounts unimaginatively as /btrfs, and a number of subvolumes, which are also mounted as file systems -- e.g. /btrfs/home gets mounted as /home. What happens is that all the subvolumes get mounted just fine, but /btrfs itself fails to mount because fsck fails its check. If boot fails, I can run fsck on /btrfs as soon as I get to the command prompt, and it passes with no problem. It mounts with no problem either, of course.

What appears to be happening is that there is a timing issue due to parallelisation of the boot process. If fsck on /btrfs happens to overlap in time with fsck on one of its subvolumes, it fails. And indeed, if I do not automatically mount /btrfs on boot, the problem goes away, but it is a serious nuisance.

Has anybody seen anything like that, or am I barking up the wrong tree? :-) I have both boot.log and the output of journalctl -b, if anybody wants to see them (about 15000 lines altogether :-) ).
Last edited by mla on Feb 9th, '14, 01:57, edited 1 time in total.
mla
 
Posts: 292
Joined: Sep 16th, '11, 16:10

Re: Mageia 4 btrfs issue

Postby doktor5000 » Feb 8th, '14, 20:30

You may want to look at viewtopic.php?f=15&t=4659
Does that sound like your issue?
Cauldron is not for the faint of heart!
Caution: Hot, bubbling magic inside. May explode or cook your kittens!
----
Disclaimer: Beware of allergic reactions in answer to unconstructive complaint-type posts
doktor5000
 
Posts: 18018
Joined: Jun 4th, '11, 10:10
Location: Leipzig, Germany

Re: Mageia 4 btrfs issue

Postby mla » Feb 8th, '14, 20:41

Nope. It's definitely an fsck failure. No "dracut: signal caught" message. Just zillions of lines of fsck complaints culminating in

systemd-fsck[525]: Errors found in extent allocation in tree or chunk allocation

Re: Mageia 4 btrfs issue

Postby doktor5000 » Feb 8th, '14, 23:04

Did you really read through the thread, and especially the linked bugreport, including the workarounds and remedies used?
Maybe you should ask George to take a look at your thread.

Re: Mageia 4 btrfs issue

Postby mla » Feb 8th, '14, 23:58

I have read through both, and looked at the supplied logs. As far as I can tell, the only similarity is that (a) the boot fails and (b) btrfs is involved. I get no dracut complaints or reports of "open c_tree" failing. He gets no complaints from fsck.

Should I upload the boot.log and the journalctl -b output, both harvested at the point of boot failure?

BTW, my setup is way simpler. No RAID (I am running btrfs for snapshots), no additional controller and the / partition is an ext4 filesystem.

Re: Mageia 4 btrfs issue

Postby doktor5000 » Feb 9th, '14, 00:26

Well, best report it directly upstream at kernel bugzilla https://bugzilla.kernel.org/ after looking through the wiki: http://btrfs.wiki.kernel.org/
Especially this one as it mentions your error message: https://btrfs.wiki.kernel.org/index.php/Gotchas

Debugging that looks fun: https://btrfs.wiki.kernel.org/index.php ... UML_Kernel

Re: Mageia 4 btrfs issue

Postby mla » Feb 9th, '14, 00:51

OK. Will do. Bed time now. :-) And a busy day tomorrow.

But looking at the Gotchas, I see nothing about "my" error message. If you mean "open c_tree" failing mentioned at the very end, that was George's problem, which I *don't* get. And FWIW, my feeling is that the problem is with systemd effectively running two fsck processes on the same btrfs partition.

Re: Mageia 4 btrfs issue

Postby ghmitch » Feb 9th, '14, 00:55

There should NEVER be an fsck on btrfs filesystems at boot. NEVER! Unlike other filesystems, btrfs does not rely on fsck to maintain filesystem integrity. Most filesystem integrity issues with btrfs are resolved ONLINE rather than OFFLINE. So the solution is to edit the entry for EVERY btrfs volume and subvolume in /etc/fstab and set the sixth field (fs_passno, the fsck-on-boot flag) to zero, like so:

Code:

[ghmitch@localhost ~]$ cat /etc/fstab
LABEL=MAGEIA3BTR / btrfs relatime 1 0
LABEL=MAGEIA3BTR-SUBS /usr btrfs subvol=USR,relatime 1 0
LABEL=MAGEIA3BTR-SUBS /var btrfs subvol=VAR,relatime 1 0
LABEL=MAGEIA3BTR-SUBS /opt btrfs subvol=OPT,relatime 1 0
LABEL=MAGEIA3BTR-BOOT /boot btrfs relatime 1 0
LABEL=HOME /home btrfs relatime 1 0
LABEL=COMMON /common btrfs relatime 1 0
/dev/cdrom /media/cdrom auto umask=0,users,iocharset=utf8,noauto,ro,exec 0 0
none /proc proc defaults 0 0
none /tmp tmpfs defaults 0 0



Note the relevant paragraph from the manual on /etc/fstab:

The sixth field (fs_passno).
This field is used by the fsck(8) program to determine the order in which filesystem checks are done at reboot
time. The root filesystem should be specified with a fs_passno of 1, and other filesystems should have a
fs_passno of 2. Filesystems within a drive will be checked sequentially, but filesystems on different drives
will be checked at the same time to utilize parallelism available in the hardware. If the sixth field is not
present or zero, a value of zero is returned and fsck will assume that the filesystem does not need to be
checked.
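
As a quick way to spot entries that would still trigger a boot-time check, something like the sketch below could be used. The function name btrfs_fsck_entries is made up for illustration, and the sketch assumes a plain whitespace-separated fstab. It prints the mount point and fs_passno of every btrfs line whose sixth field is present and non-zero.

```shell
# Hypothetical helper (not part of any package): list btrfs entries in an
# fstab-style file whose sixth field (fs_passno) is non-zero.
# Usage: btrfs_fsck_entries [FILE]   (defaults to /etc/fstab)
btrfs_fsck_entries() {
    awk '
        /^[[:space:]]*#/ { next }   # skip comment lines
        NF < 3           { next }   # skip blank or malformed lines
        # A missing sixth field evaluates to 0, i.e. "do not check"
        $3 == "btrfs" && $6 + 0 != 0 { print $2, $6 }
    ' "${1:-/etc/fstab}"
}
```

Running btrfs_fsck_entries with no argument reads /etc/fstab; no output means no btrfs entry is set to be checked at boot.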




This should solve your problem. If it doesn't, let us know. - George
ghmitch
 
Posts: 325
Joined: Mar 30th, '11, 03:05
Location: Eureka California USA

Re: Mageia 4 btrfs issue

Postby ghmitch » Feb 9th, '14, 01:19

What's the difference between btrfsck and fsck.btrfs?

btrfsck is the actual utility that is able to check and repair a filesystem
fsck.btrfs is a utility that should exist for any filesystem type and is called during system setup when the corresponding /etc/fstab entries contain non-zero value for fs_passno. (See fstab(5) for more.)

Traditional filesystems need to run their respective fsck utility in case the filesystem was not unmounted cleanly and the log needs to be replayed before mount. This is not needed for btrfs. You should set fs_passno to 0.

Note, if the fsck.btrfs utility is in fact btrfsck, then the filesystem is unnecessarily checked upon every boot and slows down the whole operation. It is safe to and recommended to turn fsck.btrfs into a no-op, eg. by cp /bin/true /sbin/fsck.btrfs.


From btrfs wiki https://btrfs.wiki.kernel.org/index.php/FAQ#When_will_Btrfs_have_a_fsck_like_tool.3F.

Actually I think there is an error in the above wiki, at least in the case of Mageia. /usr/sbin/fsck.btrfs is a symlink that points to /usr/sbin/btrfsck. If you copy /bin/true to the symlink (/usr/sbin/fsck.btrfs), cp follows the link and you will overwrite its target, btrfsck, with the contents of /usr/bin/true, which is NOT what you want to do. What you actually want to do is to unlink /usr/sbin/fsck.btrfs from /usr/sbin/btrfsck and relink it to /usr/bin/true.

Code:
[ghmitch@localhost ~]$ ls -l /usr/sbin/fsck.btrfs
lrwxrwxrwx 1 root root 22 Jan 31 12:43 /usr/sbin/fsck.btrfs -> ../../usr/sbin/btrfsck*
[root@localhost ghmitch]# ls -l /usr/sbin/btrfsck
-rwxr-xr-x 1 root root 458008 Dec 27 11:16 /usr/sbin/btrfsck*
[ghmitch@localhost ~]$ ls -l /usr/bin/true
-rwxr-xr-x 1 root root 28240 May  3  2013 /usr/bin/true*
[root@localhost ghmitch]# rm /sbin/fsck.btrfs
rm: remove symbolic link ‘/sbin/fsck.btrfs’? y
[root@localhost ghmitch]# ln -s /usr/bin/true /usr/sbin/fsck.btrfs
[root@localhost ghmitch]# ls -l /usr/sbin/fsck.btrfs
lrwxrwxrwx 1 root root 13 Feb  8 15:33 /usr/sbin/fsck.btrfs -> /usr/bin/true*
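
For anyone who wants to see the difference without touching system files, here is a minimal sketch that reproduces both behaviours in a scratch directory. The file names only mimic the real ones and the file contents are placeholders. It assumes a cp that writes through an existing destination symlink, as GNU cp does by default.

```shell
# Scratch-directory demo: nothing under /usr is touched.
demo=$(mktemp -d)
printf 'real checker\n' > "$demo/btrfsck"   # stand-in for the real btrfsck
printf 'no-op\n'        > "$demo/true"      # stand-in for /usr/bin/true
ln -s btrfsck "$demo/fsck.btrfs"            # fsck.btrfs -> btrfsck

# cp follows the destination symlink, so this overwrites btrfsck itself.
cp "$demo/true" "$demo/fsck.btrfs"
after_cp=$(cat "$demo/btrfsck")

# Restore the stand-in, then repoint the link instead of copying over it.
printf 'real checker\n' > "$demo/btrfsck"
ln -sfn true "$demo/fsck.btrfs"
after_ln=$(cat "$demo/btrfsck")
link_target=$(readlink "$demo/fsck.btrfs")

printf 'after cp: %s\nafter ln: %s\nlink -> %s\n' "$after_cp" "$after_ln" "$link_target"
rm -rf "$demo"
```

After the cp step the stand-in btrfsck holds the no-op contents, whereas after ln -sfn it is untouched and only the link target has changed.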

Re: Mageia 4 btrfs issue

Postby mla » Feb 9th, '14, 01:56

Um... I was about to reply that I set 0 0 on all entries in fstab other than /, /var and /boot, but then I thought I'd double-check, and blow me, you are right. I blush! The entry for /btrfs was 1 2, though 0 0 for all of its subvolumes.

Have now fixed it and done 6 quick reboots. Not a single failure. I declare the problem solved -- with many thanks!

Mike

