kernel stalls on boot [Solved]

Postby jiml8 » Apr 12th, '22, 18:43

This past weekend, I just got around to migrating to Mageia 8 from Mageia 7.

Yeah, yeah. I know. But the fact is that I have been frantically busy for the last year and, given the nightmare that my Mageia 6->Mageia 7 transition was, I was very reluctant to make the move because I was afraid of the downtime that might result if the migration went bad. This workstation isn't alone; over the last year I have fallen behind on updates on all my systems, and now I am getting all that sorted out.

Over the time I have run Mageia 7, I have done a motherboard/processor/memory swap (within the AMD family, and I now am running a 5800X processor with 128 GB RAM on an Asus motherboard), and I installed a 2 TB Samsung 980 Pro SSD, which I made into the boot volume. I migrated from grub to grub2 and I made an attempt to get this system to boot using UEFI, but I failed. So my boot setup is supposed to be UEFI but does not work. I have not figured out why; might be mobo firmware. Don't know.

Anyway, this means the box actually boots using grub 2 in the legacy fashion.

So, this system has some legacy stuff on it (there is still a lot of grub stuff on it, and I never uninstalled grub though it is no longer used) and conceivably has some misconfiguration associated with the UEFI stuff. I don't know if any of that is relevant, but I provide the info just in case.

The update was not too bad. I had an immediate problem, where I downloaded all the packages and did a test install using the --test flag on urpmi. Initially, it failed with the message:
Code: Select all
Installation failed:  file /boot/EFI/EFI/mageia conflicts between attempted installs of efi-filesystem-4-1.mga8.noarch and fwupdate-efi-12-2.mga8.x86_64


I got past this by manually installing/forcing fwupdate-efi-12-2.mga8.x86_64. I then got another error message (again using the --test flag) that was similar involving grub2-common-2.06-1.1.mga8.x86_64 which I resolved by manually installing/forcing that package.

After I resolved those two errors (and confirmed the box would reboot with the changed packages), I went ahead with the install.

The install went more or less OK; I did have to force a couple of packages and I had to run urpmi several times to get everything installed. Finally, everything went in and I tried to boot into Mageia 8.

Boot stalled; the new kernel wouldn't fully boot and the reason why was not obvious; it had scrolled off of the console by the time things stalled.

So, some fiddling around showed that my system WOULD boot using the last mga7 kernel, and everything seems to work. So, presently I am booted into a complete Mageia 8 environment using the last Mageia 7 5.10 kernel.

Further fiddling indicated that I had to blacklist the nouveau driver on the kernel command line; the blacklists in /etc/modprobe.d/ were being ignored.
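
One quick way to confirm that a blacklist given on the kernel command line actually reached the running kernel is to look at /proc/cmdline. A minimal sketch follows; the helper name has_param is made up for illustration and is not part of any existing tool.

```shell
# Hypothetical helper: succeed if parameter $1 appears as a whole word
# in the kernel command line string $2.
has_param() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example use on a running system (reads the live command line):
#   has_param rd.driver.blacklist=nouveau "$(cat /proc/cmdline)" \
#       && echo "blacklist was passed to the kernel"
```

This only shows that the option was passed. Whether the module was actually kept out can then be checked with lsmod.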

Also, I determined that I had to require the nvidia-drm module to be loaded at start time (in /etc/modules); it was not being loaded and consequently my X session was not starting (at least, on the mga7 kernel).

I should point out that I get my nvidia drivers from the nvidia site and install them manually; this is for me something I have done for 20 years. I no longer remember why I did that originally, but it is part of my process now.

So, anyway, there is something wrong with my mga8 kernel startup.

The startup section that works using the mga7 kernel (in grub2.cfg) is this:
Code: Select all
        menuentry 'Mageia (5.10.46-desktop-1.mga7) 8' --class mageia --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-5.10.46-desktop-1.mga7-advanced-6a458599-d90a-4264-8540-497672301635' {
                set gfxpayload=text
                insmod gzio
                insmod part_gpt
                insmod ext2
                search --no-floppy --fs-uuid --set=root 6a458599-d90a-4264-8540-497672301635
                linux   /boot/vmlinuz-5.10.46-desktop-1.mga7 root=UUID=6a458599-d90a-4264-8540-497672301635 ro acpi_enforce_resources=lax vga=788 splash
                initrd  /boot/initrd-5.10.46-desktop-1.mga7.img
        }


And the startup section that doesn't work using the mga8 kernel is this:
Code: Select all
menuentry 'Mageia' --class mageia --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-simple-6a458599-d90a-4264-8540-497672301635' {
        set gfxpayload=text
        insmod gzio
        insmod part_gpt
        insmod ext2
        search --no-floppy --fs-uuid --set=root 6a458599-d90a-4264-8540-497672301635
        linux   /boot/vmlinuz-5.15.32-desktop-1.mga8 root=UUID=6a458599-d90a-4264-8540-497672301635 ro acpi_enforce_resources=lax rd.driver.blacklist=nouveau vga=788 splash
        initrd  /boot/initrd-5.15.32-desktop-1.mga8.img
}


Note that I have built my own initrd using dracut, against the possibility that the one built at install time was not right for some reason; no effect. Also, as I mentioned, I did add the nouveau blacklist command. I have also tried it with and without the acpi_enforce_resources option set, and did not observe any difference.

I am sure that there is some missing or incorrect option in this boot section, but I have no idea what and my (fairly quick) scan of this site and the search engines didn't show me the answer. I will bet someone here knows. If no one knows, my next step will be to do a clean install on another SSD and see what I get. But that's a lot of extra work, though it might also give me my answer to the UEFI problem.
Last edited by jiml8 on May 8th, '22, 20:02, edited 1 time in total.
jiml8
 
Posts: 1254
Joined: Jul 7th, '13, 18:09

Re: kernel stalls on boot

Postby morgano » Apr 12th, '22, 20:53

For tests:

Have you tried to boot with nouveau instead of nvidia?

Have you tried Mageia 8 Live? (and at boot you can select to use nvidia or nouveau)
At home & work Mandriva since 2006, Mageia 2011. Thinkpad T40, T43, T60, T400, T510, Dell M4400, M6300, Acer Aspire 7. Workstation using LVM, LUKS, VirtualBox, BOINC
morgano
 
Posts: 1491
Joined: Jun 15th, '11, 17:51
Location: Kivik, Sweden

Re: kernel stalls on boot

Postby benmc » Apr 12th, '22, 22:02

disclaimer: I have only one Nvidia system, and I have only ever used the Mageia supplied driver.

for me I need to have the -devel supplemental kernel and dkms-nvidia packages installed for the Nvidia graphics. As you install and use the Nvidia supplied driver, I am unsure if you need them.

to check, see if you have a kernel-desktop-devel version for your 5.10.46-desktop-1.mga7. if so you will also need one for your 5.15.32-desktop-1.mga8 kernel.
benmc
 
Posts: 1214
Joined: Sep 2nd, '11, 12:45
Location: Pirongia, New Zealand

Re: kernel stalls on boot

Postby jiml8 » Apr 12th, '22, 22:45

morgano wrote:For tests:

Have you tried to boot with nouveau instead of nvidia?

Have you tried Mageia 8 Live? (and at boot you can select to use nvidia or nouveau)

I learned I needed to blacklist nouveau in the command line because my first attempts to boot did start nouveau. And it did hang. I think that whatever is hanging it is not related to the graphics driver, but I am not certain of that, so I reported on the things I did WRT graphics.

I have not tried Mageia 8 live, but that is a good idea. If it works (and I would expect it to) then it would give me a working grub2.cfg command section.

Re: kernel stalls on boot

Postby jiml8 » Apr 12th, '22, 22:47

benmc wrote:disclaimer: I have only one Nvidia system, and I have only ever used the Mageia supplied driver.

for me I need to have the -devel supplemental kernel and dkms-nvidia packages installed for the Nvidia graphics. As you install and use the Nvidia supplied driver, I am unsure if you need them.

to check, see if you have a kernel-desktop-devel version for your 5.10.46-desktop-1.mga7. if so you will also need one for your 5.15.32-desktop-1.mga8 kernel.


I need the kernel-desktop-devel in order to compile the nvidia driver, and to compile vmware workstation, and a number of other things. So, yes, it is there.

As I said in my previous post, I don't really think the graphics is the problem; if it was, I would expect to get to a command prompt. Once I get to a command prompt, solving the problem is just a thing. But I am not getting that far.

Re: kernel stalls on boot

Postby jiml8 » Apr 12th, '22, 22:48

So what is in your grub2.cfg file?

Re: kernel stalls on boot

Postby benmc » Apr 12th, '22, 23:53

jiml8 wrote:So what is in your grub2.cfg file?


I will have to dig the machine out; it is an old test machine that I don't use very often.

Re: kernel stalls on boot

Postby jiml8 » Apr 13th, '22, 04:41

benmc wrote:
jiml8 wrote:So what is in your grub2.cfg file?


I will have to dig the machine out; it is an old test machine that I don't use very often.


Again, I seriously doubt this is graphics related. What is the section in your daily driver?

Re: kernel stalls on boot

Postby benmc » Apr 13th, '22, 08:44

as I mentioned an old test system: Compaq 8510w mobile workstation.

Code: Select all
Graphics:  Device-1: NVIDIA G84GLM [Quadro FX 570M] vendor: Hewlett-Packard driver: nouveau v: kernel
           bus ID: 01:00.0 chip ID: 10de:040c
           Display: server: Mageia X.org 1.20.10 compositor: kwin_x11 driver: nouveau,v4l
           resolution: 1920x1200~60Hz s-dpi: 96
           OpenGL: renderer: NV84 v: 3.3 Mesa 20.3.0 direct render: Yes


Code: Select all
uname -r
5.9.12-desktop-1.mga8


from /boot/grub2/grub.cfg:
Code: Select all
menuentry 'Mageia' --class mageia --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-simple-f95429ec-e9ec-41aa-b848-49ad87d1e22c' {
   savedefault
   set gfxpayload=text
   insmod gzio
   insmod part_gpt
   insmod ext2
   set root='hd0,gpt2'
   if [ x$feature_platform_search_hint = xy ]; then
     search --no-floppy --fs-uuid --set=root --hint-ieee1275='ieee1275//disk@0,gpt2' --hint-bios=hd0,gpt2 --hint-efi=hd0,gpt2 --hint-baremetal=ahci0,gpt2  f95429ec-e9ec-41aa-b848-49ad87d1e22c
   else
     search --no-floppy --fs-uuid --set=root f95429ec-e9ec-41aa-b848-49ad87d1e22c
   fi
   linux   /boot/vmlinuz-5.9.12-desktop-1.mga8 root=UUID=f95429ec-e9ec-41aa-b848-49ad87d1e22c ro  splash quiet noiswmd resume=UUID=88f91e71-f1af-4037-94eb-cbdf1825a039 audit=0 vga=791
   initrd   /boot/initrd-5.9.12-desktop-1.mga8.img
}


hope this is helpful.

any useful info from editing out " splash quiet" from the kernel boot line?
for even more info you can add "verbose" to the kernel boot line after removing "splash quiet", but I suspect I might be guilty of attempting "to teach one's grandmother to suck eggs" ;)

Re: kernel stalls on boot

Postby jiml8 » Apr 13th, '22, 10:16

benmc wrote:hope this is helpful.

any useful info from editing out " splash quiet" from the kernel boot line?
for even more info you can add "verbose" to the kernel boot line after removing "splash quiet", but I suspect I might be guilty of attempting "to teach one's grandmother to suck eggs" ;)


So you do not use Mageia in your day to day life? OK.

Actually, though I make my living developing, for the last 8 years I have been pretty much a linux user and a freebsd developer. Linux has moved a long way since the last time I did anything serious with the kernel.

There are about a golzillion different kernel command line options, and I am just now re-familiarizing myself with the linux kernel. So I really do appreciate the ideas and the help; this world has changed a lot.

Last July, we decided to migrate our product from FreeBSD to Linux for a number of technical and business reasons, and I had been working on a totally new generation product (written in C rather than the PHP that our older product uses), so since July another person and I have been working on that migration. We now have it completed and a first-generation product is in the marketplace (available on Amazon; if you are a gamer, look for the gaming edge bullet), so I have been learning the OpenWRT dev environment. This is also getting me into the current linux kernel, but as I say, my experience is musty and this system has moved a long way.

So I do thank you for your ideas and assistance.

Re: kernel stalls on boot

Postby benmc » Apr 13th, '22, 21:03

jiml8 wrote:So you do not use Mageia in your day to day life? OK.


work is windows, due to office accounting system requiring windows (QuickBooks).
workshop comp - not used very often, is on linux. its only use is transcribing from audio cassette to cd. not networked.

at home, Linux only.

Re: kernel stalls on boot

Postby doktor5000 » Apr 13th, '22, 21:31

You mentioned the mga8 kernel boot stalls, but at what point does it stall in particular? During initrd/early boot, or after pivotroot to the installed system?

I'd also say add something like rd.break to kernel options and remove splash & quiet and let it boot for a few minutes and see where it stalls. Maybe attach journalctl -ab logs if you're fine with sharing those.
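
As a rough illustration of that suggestion, the edit to the kernel options can be written as a small shell function. This is a sketch only: debug_cmdline is a hypothetical name, and it works on a copy of the option string rather than touching grub.cfg.

```shell
# Hypothetical helper: take a kernel option string, drop "splash" and
# "quiet", and append rd.break so dracut drops to a shell before the
# switch to the installed system.
debug_cmdline() {
    out=""
    for word in $1; do
        case "$word" in
            splash|quiet) ;;                      # drop these so messages stay visible
            *) out="${out:+$out }$word" ;;        # keep everything else, in order
        esac
    done
    printf '%s rd.break\n' "$out"
}

debug_cmdline "ro acpi_enforce_resources=lax rd.driver.blacklist=nouveau vga=788 splash"
```

Fed the options from the failing entry earlier in the thread, this prints "ro acpi_enforce_resources=lax rd.driver.blacklist=nouveau vga=788 rd.break", which can be typed in after pressing e on the GRUB menu entry. After a failed boot, running journalctl -b -1 from the working kernel shows the previous boot's log, provided the journal is stored persistently.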

FWIW my daily driver is also an nvidia box, and it boots just fine with the proprietary driver and with nouveau, and I'd say the grub configuration is not your issue.
Cauldron is not for the faint of heart!
Caution: Hot, bubbling magic inside. May explode or cook your kittens!
----
Disclaimer: Beware of allergic reactions in answer to unconstructive complaint-type posts
doktor5000
 
Posts: 18054
Joined: Jun 4th, '11, 10:10
Location: Leipzig, Germany

Re: kernel stalls on boot

Postby jiml8 » Apr 13th, '22, 22:52

I think it is stalling after the pivot root, but I am not sure. I was getting the password requester to decrypt my volumes, which means the filesystems were being mounted, and I think that happens after the pivot root, but I am not sure.

The journald gets started after the pivot root, but how early? Though it is something to look at, and I don't know why sharing them would be a problem. So I will check.

I have the workstation in production now; I have work to do. But obviously I must solve this issue and soon. So, likely this weekend I will take it down again and try the suggestions from this thread. I will report back when I know more. Thanks.
Last edited by doktor5000 on Apr 13th, '22, 23:06, edited 1 time in total.
Reason: removed fullquote

Re: kernel stalls on boot

Postby doktor5000 » Apr 13th, '22, 23:18

Mounting is still early boot, and the journal is also running during early boot; it passes its entries on to the running system later, so you don't lose those from early boot.
You may want to have a look at https://www.freedesktop.org/software/sy ... otup.html# which shows some details.

Re: kernel stalls on boot

Postby filip » Apr 16th, '22, 17:10

benmc wrote:
Code: Select all
uname -r
5.9.12-desktop-1.mga8


This is very odd. The mga8 release kernel was 5.10.16.
filip
 
Posts: 478
Joined: May 4th, '11, 22:10
Location: Kranj, Slovenia

Re: kernel stalls on boot

Postby jiml8 » May 5th, '22, 01:36

OK I finally got back to this thing. I've been running without rebooting since the last time I posted here. Getting busy again...

Now, you may recall that I said that last summer I transferred my system from an old SSD (Samsung 840 Pro) to a new SSD (Samsung 980 Pro). Well, since making that move, I have been booting from the 980 Pro, and it is the installation on the 980 Pro that I upgraded to Mageia 8, but the installation on that 840 Pro has been there and untouched - and recognized by the system as a bootable drive.

So, I booted into the Mageia 8 Live system, then installed and updated that system on the 840 Pro, overwriting the old Mageia 7 system. I then booted into the Mageia 8 system on the 840 Pro, mounted the /boot from the 980 Pro, and copied over the kernel, map, and initrd from the 840 Pro to the 980 Pro. I then manually edited the grub.cfg file on the 980 Pro to match the information in the 840 Pro (on the boot line, that is, including the command line options used), and the system started right up.

So I still have some things to do. I need to edit the grub config file in /etc to make sure the right options get put into grub.cfg on an upgrade, and I need to take a careful look at how dracut is being built. This shouldn't be too difficult given that I have a working dracut configuration in the /etc on the 840 Pro so I can probably just copy that over (though I will study it first).
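
A possible shape for that follow-up, as a hedged sketch: the persistent kernel options usually live in GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, and the initrd and grub.cfg are then regenerated. The run and rebuild_boot helpers are hypothetical and only print the commands by default. The option string simply mirrors the kernel line quoted earlier in the thread, not the values actually copied from the 840 Pro.

```shell
# Assumed line for /etc/default/grub ("ro" and root= are added by grub2-mkconfig itself):
GRUB_CMDLINE_LINUX_DEFAULT="acpi_enforce_resources=lax rd.driver.blacklist=nouveau vga=788 splash"

# Print each command; execute it only when RUN=1 is set (dry run by default).
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Rebuild the initrd for kernel version $1, then regenerate grub.cfg.
rebuild_boot() {
    kver="$1"
    run dracut --force "/boot/initrd-${kver}.img" "$kver"
    run grub2-mkconfig -o /boot/grub2/grub.cfg
}
```

Calling rebuild_boot "$(uname -r)" prints the two commands for the running kernel. Setting RUN=1 first, as root, would execute them, so it is worth reading the printed commands before doing that.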

Also, the latest kernel on my 980 Pro was 5.15.32-1 and I manually installed 5.15.35-2 so my rpm database now must have an issue. I would imagine that will work itself out in time, but I'm aware of it so if anything goes wrong I know what to look for.
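
One way to see which kernel images the rpm database does not know about is to ask rpm which package owns each file under /boot. The sketch below assumes that rpm -qf exits non-zero for a file no package owns. list_unowned is a hypothetical helper, and the ownership check can be swapped for another command.

```shell
# list_unowned DIR [CHECK...]: print kernel images in DIR that CHECK rejects.
# CHECK defaults to "rpm -qf" (assumed to fail for files no package owns).
list_unowned() {
    dir="${1:-/boot}"
    [ "$#" -gt 0 ] && shift
    [ "$#" -gt 0 ] || set -- rpm -qf
    for img in "$dir"/vmlinuz-*; do
        [ -e "$img" ] || continue                 # skip an unmatched glob
        "$@" "$img" >/dev/null 2>&1 || printf '%s\n' "$img"
    done
}
```

Running list_unowned /boot on the upgraded system would list any image that was copied in by hand rather than installed from a package. Installing the matching kernel package with urpmi afterwards should bring the database back in line.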

Does anyone else know of anything I need to look at to make this changeover complete?