kernel stalls on boot [Solved]

This past weekend, I just got around to migrating to Mageia 8 from Mageia 7.
Yeah, yeah. I know. But the fact is that I have been frantically busy for the last year and, given the nightmare that my Mageia 6->Mageia 7 transition was, I was very reluctant to make the move because I was afraid of the downtime that might result if the migration went bad. This workstation isn't alone; over the last year I have fallen behind on updates on all my systems, and now I am getting all that sorted out.
Over the time I have run Mageia 7, I have done a motherboard/processor/memory swap (within the AMD family, and I now am running a 5800X processor with 128 GB RAM on an Asus motherboard), and I installed a 2 TB Samsung 980 Pro SSD, which I made into the boot volume. I migrated from grub to grub2 and I made an attempt to get this system to boot using UEFI, but I failed. So my boot setup is supposed to be UEFI but does not work. I have not figured out why; might be mobo firmware. Don't know.
Anyway, this means the box actually boots using grub 2 in the legacy fashion.
So, this system has some legacy stuff on it (there is still a lot of grub stuff on it, and I never uninstalled grub though it is no longer used) and conceivably has some misconfiguration associated with the UEFI stuff. I don't know if any of that is relevant, but I provide the info just in case.
The update was not too bad. I had an immediate problem, where I downloaded all the packages and did a test install using the --test flag on urpmi. Initially, it failed with the message:
Installation failed: file /boot/EFI/EFI/mageia conflicts between attempted installs of efi-filesystem-4-1.mga8.noarch and fwupdate-efi-12-2.mga8.x86_64
I got past this by manually installing/forcing fwupdate-efi-12-2.mga8.x86_64. I then got a similar error message (again using the --test flag) involving grub2-common-2.06-1.1.mga8.x86_64, which I resolved by manually installing/forcing that package.
After I resolved those two errors (and confirmed the box would reboot with the changed packages), I went ahead with the install.
The install went more or less OK; I did have to force a couple of packages and I had to run urpmi several times to get everything installed. Finally, everything went in and I tried to boot into Mageia 8.
Boot stalled; the new kernel wouldn't fully boot and the reason why was not obvious; it had scrolled off the console by the time things stalled.
So, some fiddling around showed that my system WOULD boot using the last mga7 kernel, and everything seems to work. So, presently I am booted into a complete Mageia 8 environment using the last Mageia 7 5.10 kernel.
Further fiddling indicated that I had to blacklist the nouveau driver on the kernel command line; the blacklists in /etc/modprobe.d/ were being ignored.
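If the blacklist files look right but still seem to be ignored, one possible explanation (an assumption, not something confirmed in this thread) is that nouveau gets loaded from the initrd before the /etc/modprobe.d/ on the root filesystem is ever consulted. A small sketch for checking which files actually carry the blacklist line follows; `find_blacklist` is a made-up helper name, and the paths in the comments are the usual defaults rather than anything verified on this box:

```shell
# Hypothetical helper: print the modprobe.d-style files that blacklist a module.
# Usage: find_blacklist MODULE FILE...
find_blacklist() {
    module=$1
    shift
    # Match lines such as "blacklist nouveau"; commented-out lines and
    # longer module names (e.g. nouveau_extra) do not match.
    grep -lE "^[[:space:]]*blacklist[[:space:]]+${module}([[:space:]]|\$)" "$@" 2>/dev/null
}

# On the installed system (assumed default path):
#   find_blacklist nouveau /etc/modprobe.d/*.conf
# To see whether any modprobe.d files were copied into the initrd
# (lsinitrd ships with dracut):
#   lsinitrd /boot/initrd-5.15.32-desktop-1.mga8.img | grep modprobe.d
```

If the file is on disk but missing from the initrd listing, that would at least be consistent with the blacklist only taking effect when given on the kernel command line.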
Also, I determined that I had to require the nvidia-drm module to be loaded at start time (in /etc/modules); it was not being loaded and consequently my X session was not starting (at least, on the mga7 kernel).
I should point out that I get my nvidia drivers from the nvidia site and install them manually; this is something I have done for 20 years. I no longer remember why I did that originally, but it is part of my process now.
So, anyway, there is something wrong with my mga8 kernel startup.
The startup section that works using the mga7 kernel (in grub2.cfg) is this:
menuentry 'Mageia (5.10.46-desktop-1.mga7) 8' --class mageia --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-5.10.46-desktop-1.mga7-advanced-6a458599-d90a-4264-8540-497672301635' {
set gfxpayload=text
insmod gzio
insmod part_gpt
insmod ext2
search --no-floppy --fs-uuid --set=root 6a458599-d90a-4264-8540-497672301635
linux /boot/vmlinuz-5.10.46-desktop-1.mga7 root=UUID=6a458599-d90a-4264-8540-497672301635 ro acpi_enforce_resources=lax vga=788 splash
initrd /boot/initrd-5.10.46-desktop-1.mga7.img
}
And the startup section that doesn't work using the mga8 kernel is this:
menuentry 'Mageia' --class mageia --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-simple-6a458599-d90a-4264-8540-497672301635' {
set gfxpayload=text
insmod gzio
insmod part_gpt
insmod ext2
search --no-floppy --fs-uuid --set=root 6a458599-d90a-4264-8540-497672301635
linux /boot/vmlinuz-5.15.32-desktop-1.mga8 root=UUID=6a458599-d90a-4264-8540-497672301635 ro acpi_enforce_resources=lax rd.driver.blacklist=nouveau vga=788 splash
initrd /boot/initrd-5.15.32-desktop-1.mga8.img
}
Note that I have built my own initrd using dracut against the possibility that the one built at install time was not right for some reason; no effect. Also, as I mentioned, I did add the nouveau blacklist option. I have also tried it with and without the acpi_enforce_resources option set, and did not observe any difference.
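To rule out a typo in either entry, the option lists on the two `linux` lines can be compared mechanically. The sketch below is only illustrative: `cmdline_opts` and `cmdline_diff` are hypothetical helper names, and the two strings are the `linux` lines copied from the entries above, each written on a single line.

```shell
# Print the options of a GRUB "linux" line, one per line, sorted.
# The first two fields ("linux" and the kernel image path) are dropped.
cmdline_opts() {
    printf '%s\n' "$1" | tr -s ' \t' '\n\n' | sed '/^$/d' | tail -n +3 | LC_ALL=C sort
}

# Show options that appear in only one of two "linux" lines.
# Column 1: only in the first line; column 2 (tab-indented): only in the second.
cmdline_diff() {
    old_file=$(mktemp) || return 1
    new_file=$(mktemp) || return 1
    cmdline_opts "$1" > "$old_file"
    cmdline_opts "$2" > "$new_file"
    LC_ALL=C comm -3 "$old_file" "$new_file"
    rm -f "$old_file" "$new_file"
}

old='linux /boot/vmlinuz-5.10.46-desktop-1.mga7 root=UUID=6a458599-d90a-4264-8540-497672301635 ro acpi_enforce_resources=lax vga=788 splash'
new='linux /boot/vmlinuz-5.15.32-desktop-1.mga8 root=UUID=6a458599-d90a-4264-8540-497672301635 ro acpi_enforce_resources=lax rd.driver.blacklist=nouveau vga=788 splash'

cmdline_diff "$old" "$new"
```

For these two entries the only difference reported is rd.driver.blacklist=nouveau on the mga8 side, so apart from the kernel and initrd file names the command lines are otherwise identical.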
I am sure that there is some missing or incorrect option in this boot section, but I have no idea what and my (fairly quick) scan of this site and the search engines didn't show me the answer. I will bet someone here knows. If no one knows, my next step will be to do a clean install on another SSD and see what I get. But that's a lot of extra work, though it might also give me my answer to the UEFI problem.