Kernel building problems & nvidia driver conflicts.


Postby shp3 » Sep 13th, '14, 03:18

I am having several problems building a new kernel on a clean installation of 64 bit Mageia 4.1 while using
the nvidia vendor video driver. I suspect that there may be a common factor causing these problems.

First the background:
I am not new to building kernels; I did it successfully many times between 1996 and 2010.
I am running the 64 bit 3.12.25-desktop-3.mga4 kernel. I've configured Mageia to
boot to run level 3 rather than deal with the X problems at boot. After logging in I execute "startx"
to start X for easier use.
The hardware is an ASUS P5B motherboard with 6GB of RAM, an Intel(R) Core(TM)2 Quad CPU Q8200
@ 2.33GHz, a 320GB SATA drive, and an ASUS GeForce GT 630. All disk partitions were formatted by the
Mageia installer to give a clean base for testing.

I installed the kernel.org tarball for linux-3.16.2.
I used the Mageia control center software management tool to add and install kernel-3.12.25-3.mga4,
nvidia-current-331.79-1.mga4.nonfree, and kernel-3.12.25-desktop-3.mga4.

==============================
First problem:

When installing Mageia 4.1 from the DVD, X only starts if the nonfree nvidia driver is installed.
If I install Mageia without enabling the nonfree driver, it fails to boot because X cannot start.
Also, the final video configuration step of the installation generates errors when I
try to select the Xorg driver, and it refuses to switch away from the nonfree Nvidia driver even when that driver wasn't installed.

I suspect that the Mageia nvidia driver installation is flawed and is causing a lot of these problems.

==============================
Second problem:
I attempted to build a kernel identical to the running one supplied by Mageia 4.1, but with a custom
"local version" string (_SHP_), and it failed to build. Something seems to be broken in the Mageia
kernel source package kernel-3.12.25-3.mga4.

First I created a directory to build the kernel at /usr/src/Build/kernel_3.12.25 .

Then as the regular user:
    cd /usr/src/kernel-3.12.25-3.mga4/
    make O=/usr/src/Build/kernel_3.12.25/ oldconfig
    make O=/usr/src/Build/kernel_3.12.25/ menuconfig
    make -j8 O=/usr/src/Build/kernel_3.12.25
This last step failed about halfway through with the error:
Makefile:130: recipe for target 'sub-make' failed
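The "recipe for target 'sub-make' failed" line is only make unwinding after an earlier failure, so the useful message is further up the log. A minimal sketch, assuming the build output was saved first with something like "make -j8 O=/usr/src/Build/kernel_3.12.25 2>&1 | tee build.log"; the function name first_build_error and its grep patterns are illustrative choices, not part of any existing tool:

```shell
# first_build_error LOGFILE
# Print the first line (with its line number) of a saved build log that
# looks like a compiler, preprocessor or linker error. The patterns are a
# heuristic chosen for this sketch; make's own "recipe for target ... failed"
# lines do not match them, so the generic sub-make message is skipped.
first_build_error() {
    grep -n -E 'error:|undefined reference|No such file or directory' "$1" | head -n 1
}
```

Usage:
    first_build_error build.log
With -j8 the output of parallel jobs is interleaved, so rerunning the failed build with plain make (no -j) usually puts the real error directly before make's abort messages.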


==============================
Third problem:

I built a kernel using the stable tarball linux-3.16.2.tar.xz but it only runs X with the nouveau driver.

First I created a directory to build the kernel at /usr/src/Build/kernel_3.16.2 .

Then as the regular user:
    cd /usr/src/linux-3.16.2
    make O=/usr/src/Build/kernel_3.16.2 oldconfig
    make O=/usr/src/Build/kernel_3.16.2 menuconfig
    make -j8 O=/usr/src/Build/kernel_3.16.2
    make -j8 O=/usr/src/Build/kernel_3.16.2 modules
Then as root
    make -j8 O=/usr/src/Build/kernel_3.16.2 modules_install install
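The out-of-tree procedure above can be written down as a small dry-run helper. This is a sketch, not the poster's script: the name kbuild_plan is hypothetical, and it only prints the commands. It reflects two kbuild rules: every make call must use the same O= directory, and kbuild refuses an O= build if the source tree itself already holds a .config or generated files, asking for "make mrproper" first (which deletes any .config kept in the source tree). On x86 a plain make also builds the modules, so a separate "make modules" step is redundant but harmless.

```shell
# kbuild_plan SRC OUT CONFIG
# Print (do not run) the commands for an out-of-tree kernel build:
# SRC is the unpacked source tree, OUT the separate build directory and
# CONFIG an existing config file to start from.
kbuild_plan() {
    src=$1 out=$2 cfg=$3
    # kbuild refuses O= builds when SRC itself holds a .config or generated
    # files; mrproper cleans it (and deletes any .config kept in SRC).
    echo "make -C $src mrproper"
    echo "mkdir -p $out"
    # Start from a known config instead of relying on kconfig's fallback.
    echo "cp $cfg $out/.config"
    # Every make call must use the same O= directory.
    echo "make -C $src O=$out oldconfig"
    # On x86 the default target builds vmlinux, bzImage and the modules.
    echo "make -C $src O=$out -j8"
    echo "make -C $src O=$out modules_install install   # as root"
}
```

Usage (review the printed commands before piping them to sh):
    kbuild_plan /usr/src/linux-3.16.2 /usr/src/Build/kernel_3.16.2 /boot/config-3.12.25-desktop-3.mga4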
I tuned up GRUB with Mageia control center to include the new kernel in the boot menu and
eliminate the "quiet" option so that I could monitor the startup process.

When the new 3.16 kernel booted, the startup process attempted to build and install the
nvidia-current-331.79 driver. This build failed and the installation was skipped. After logging in
and running "startx", the X Window System started successfully. On checking Xorg.0.log
I found that it was using the nouveau driver.

Rebooting into the original Mageia 4 kernel hangs at the message: "switching to nouveaufb from simple".
This can be repaired by booting into safe mode and running XFdrake to reinstall the "nvidia" 331.79 driver.

As root the command "dkms status" returns
nvidia-current, 331.79-1.mga4.nonfree: added

nvidia-current, 331.79-1.mga4.nonfree, 3.12.25-desktop-3.mga4, x86_64: installed-binary from 3.12.25-desktop-3.mga4

xtables-addons, 2.3-3.mga4, 3.12.25-desktop-3.mga4, x86_64: installed-binary from 3.12.25-desktop-3.mga4
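To check quickly whether dkms has a module built for a given kernel, the status output above can be filtered. A sketch assuming the "module, version, kernel, arch: state" line format shown here (other dkms versions may format status differently); dkms_state is a hypothetical helper name:

```shell
# dkms_state MODULE KERNEL
# Read "dkms status" output on stdin and print the state dkms recorded for
# MODULE on KERNEL (the text after ": "), or "not built" when there is no
# matching line. Assumes the "module, version, kernel, arch: state" layout.
dkms_state() {
    awk -F': ' -v m="$1" -v k="$2" '
        { split($1, f, ", ") }
        f[1] == m && f[3] == k { print $2; found = 1; exit }
        END { if (!found) print "not built" }
    '
}
```

Usage:
    dkms status | dkms_state nvidia-current 3.16.2_SHP_
With the status shown above this prints "not built", while asking about 3.12.25-desktop-3.mga4 prints "installed-binary from 3.12.25-desktop-3.mga4".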

After booting the new 3.16 kernel I attempted to use dkms to build & install the nvidia driver with the following command:
dkms build -m nvidia-current -v 331.79-1.mga4.nonfree -k 3.16.2_SHP_
This failed with the following message.
Error! Bad return status for module build on kernel: 3.16.2_SHP_ (x86_64)

The dkms build log had several errors and the following seem to be the most important pointers:
nvidia_uvm_linux.h:153:2

nvidia_uvm_linux.h:165:27: fatal error: asm/semaphore.h: No such file or directory


At this point neither the old nor the new kernel would boot. Only repeating the safe mode XFdrake fix would
repair the system so it could boot the old kernel and start X.

==============================
Fourth problem:

I downloaded "NVIDIA-Linux-x86_64-340.32.run" from the Nvidia web site.

I booted the new 3.16 kernel and ran the nvidia installer, which aborted because of an incompatibility with the nouveau driver. I allowed it to add the /etc/modprobe.d/nvidia-installer-disable-nouveau.conf file and rebooted the new kernel.
Rerunning the installer failed again but this time with several
conftest failed!
errors and
/usr/src/linux-3.16.2/include/linux/cputime.h:4:25: fatal error: asm/cputime.h: No such file or directory
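One plausible reading of the asm/cputime.h error (an assumption, not confirmed in this thread): with "make O=", generated headers such as arch/x86/include/generated/asm/*.h are written to the output directory, not to /usr/src/linux-3.16.2, so a driver build that only looks at the source tree cannot find them. The NVIDIA .run installer accepts --kernel-source-path and --kernel-output-path for such split trees. The helper below is hypothetical; it only prints a command line and warns if the header is not where this sketch assumes it would be for a 3.16 x86 build:

```shell
# nv_install_cmd SRC OUT RUNFILE
# Print an NVIDIA .run installer command for a kernel configured and built
# with "make O=OUT" from the source tree SRC, passing both trees.
nv_install_cmd() {
    src=$1 out=$2 run=$3
    # Assumption for this sketch: on 3.16/x86 asm/cputime.h is a generated
    # wrapper under OUT; older layouts kept it in SRC's arch directory.
    if [ ! -e "$out/arch/x86/include/generated/asm/cputime.h" ] &&
       [ ! -e "$src/arch/x86/include/asm/cputime.h" ]; then
        echo "warning: asm/cputime.h not found under $src or $out" >&2
    fi
    echo "sh $run --kernel-source-path=$src --kernel-output-path=$out"
}
```

Usage (run the printed command as root with X stopped):
    nv_install_cmd /usr/src/linux-3.16.2 /usr/src/Build/kernel_3.16.2 NVIDIA-Linux-x86_64-340.32.run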

Again, at this point neither the old 3.12 nor the new 3.16 kernel would boot. Only repeating the safe mode XFdrake fix would repair the system so it could boot the old kernel and start X.

After restarting the old kernel I again ran the nvidia installer which seemed to complete successfully.
However, rebooting the old kernel fell back to the nouveau driver and hung as before.
The safe mode XFdrake reported that the nvidia driver had not been installed properly. Running "nvidia-uninstall" before rerunning XFdrake made it possible to boot the old kernel and run X with the old nvidia 331.79 driver.
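When the system falls back to nouveau, it is worth confirming that the blacklist file the NVIDIA installer offered to write is actually in place. A sketch with a hypothetical helper name; it only inspects the *.conf files of a modprobe.d directory. An initrd generated before the file was added may still contain and load nouveau, so the initrd may need regenerating after the blacklist changes.

```shell
# nouveau_blacklisted DIR
# Succeed if any *.conf file in DIR (normally /etc/modprobe.d) has a
# "blacklist nouveau" line; fail otherwise.
nouveau_blacklisted() {
    cat "$1"/*.conf 2>/dev/null |
        grep -Eq '^[[:space:]]*blacklist[[:space:]]+nouveau([[:space:]]|$)'
}
```

Usage:
    nouveau_blacklisted /etc/modprobe.d && echo "nouveau is blacklisted"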

Again, this looks like the Mageia nvidia driver installation is flawed and is causing a lot of these problems.
shp3
 
Posts: 4
Joined: Sep 13th, '14, 02:12

Re: Kernel building problems & nvidia driver conflicts.

Postby martinw » Sep 14th, '14, 12:45

What exactly is it you are trying to achieve? If it is just that you want to use the nouveau driver instead of the proprietary driver, it would be better to focus on debugging that problem rather than building/installing custom kernels. Once you've gone away from a standard Mageia installation, it's very hard for anyone else to help you.
martinw
 
Posts: 608
Joined: May 14th, '11, 10:59

Re: Kernel building problems & nvidia driver conflicts.

Postby doktor5000 » Sep 14th, '14, 16:20

Martin is right there, especially with the last part. Apart from that, it seems your approach is flawed. The dkms driver packages are provided for those Mageia kernels where no binary modules are provided. They do not magically fix any other issues, like incompatibility with newer kernels.

A few other notes:

- as a general hint and golden rule: Please only one problem per thread
- startx is deprecated and not supported anymore
- for kernel-3.16 you need at least one of the newer nvidia 340.xx drivers, earlier versions are not compatible
- for your error message in step 2, that's only the result of previous compile/linker issues, which you didn't post
- for the failed nvidia driver build with 3.16, see the third bullet point above
- for the fourth problem, as you built your own kernel and use upstream nvidia driver, you're on your own there
Cauldron is not for the faint of heart!
Caution: Hot, bubbling magic inside. May explode or cook your kittens!
----
Disclaimer: Beware of allergic reactions in answer to unconstructive complaint-type posts
doktor5000
 
Posts: 18020
Joined: Jun 4th, '11, 10:10
Location: Leipzig, Germany

Re: Kernel building problems & nvidia driver conflicts.

Postby shp3 » Sep 15th, '14, 00:09

Sorry that I didn't properly express my goals.
My goal was to rebuild the original Mageia 64 bit version 3.12.25-desktop-3.mga4 kernel and continue using the nvidia driver.
This would be my stable base and proof that I had a working process to build a kernel.
I could then start trying to update a vendor driver for an old raid card that hasn't been supported since the 2.6 kernel. That card is not installed in the system and that driver is not involved with this thread.
My building of the functional 3.16 kernel proved that I had a valid process to build a kernel and that something was wrong with the 3.12.25-desktop-3.mga4 kernel source rather than my procedure.

I'd not been aware of dkms before and it was brought to my attention by the "NVIDIA-Linux-x86_64-340.32.run" installer. Since dkms recompiles driver source to link to a new kernel it is sensitive to flaws in the kernel package. dkms seems to be invoked in the boot process so I thought that running it manually would expose more error messages that might be useful.

I'm still not sure that the problems aren't traceable back to a common cause. Otherwise I agree that one problem per thread is golden. I believed that giving a complete view of what I've done would help point out where I've made a mistake and prevent recommendations to try something I've already done.
The problem with the nvidia driver when installing Mageia seemed to be an early symptom of the later problems. After running into the conflicts with the Nvidia 340.32 installer, I discovered that problem when I attempted a clean install of Mageia without the nvidia driver.
If I were to split this into two threads: One would be the nvidia driver problems and the other would be the failed Mageia kernel source package.

I've had no problem running startx . What has replaced it?

I appreciate learning that kernel 3.16 is not compatible with the nvidia 331.79 driver. So far I've not been able to get it to work with the 340.32 one either.

I assume that for the error message referred to as step 2 you meant
Makefile:130: recipe for target 'sub-make' failed

This is the most damning error message, since the make oldconfig step should reproduce a .config identical to the one that produced the kernel shipped by Mageia. No error messages were reported from this process.
The only change I made in the make menuconfig step was to add a local identifier string and again got no error messages.
If every file used to build the Mageia 3.12.25-desktop was included correctly in the kernel source package it should have built the same kernel with no problems.
A quick glance through the make log found complaints about missing files. This problem is so fundamental that I didn't think to add more details.
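The expectation that oldconfig plus a new local version gives an otherwise identical configuration can be checked directly. A sketch: config_diff and cfg_norm are hypothetical names; they compare only "CONFIG_...=" lines, ignore CONFIG_LOCALVERSION, and treat "# CONFIG_... is not set" comments as absent.

```shell
# cfg_norm FILE: the "CONFIG_...=" lines of FILE, minus CONFIG_LOCALVERSION,
# sorted so two files can be compared with comm.
cfg_norm() {
    grep -E '^CONFIG_[A-Za-z0-9_]+=' "$1" | grep -v '^CONFIG_LOCALVERSION=' | sort
}

# config_diff A B: print settings present in only one of the two files
# (lines only in B are indented with a tab, as comm prints them).
config_diff() {
    ta=$(mktemp) tb=$(mktemp)
    cfg_norm "$1" > "$ta"
    cfg_norm "$2" > "$tb"
    comm -3 "$ta" "$tb"
    rm -f "$ta" "$tb"
}
```

Usage (no output means only the local version differs):
    config_diff /boot/config-3.12.25-desktop-3.mga4 /usr/src/Build/kernel_3.12.25/.config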

I was lazy and simply grabbed the latest stable kernel and Nvidia driver installer to run the test of building a kernel without the failed Mageia sources. I still can't get them to work together so I will rerun the process with versions closer to the Mageia ones to run further tests.

Re: Kernel building problems & nvidia driver conflicts.

Postby martinw » Sep 15th, '14, 01:34

OK, I understand. I've just built the kernel from the Mageia kernel source package, using the following commands (all as the root user):
Code:
    urpmi kernel-source-latest
    cd /usr/src/kernel-3.12.25-3.mga4
    cp /boot/config-3.12.25-desktop-3.mga4 .config
    make oldconfig
    make -j8 all

I've not yet installed and tested it (because I've run out of space on the root partition, and it's getting late), but so far I see nothing wrong with the kernel source package.

Re: Kernel building problems & nvidia driver conflicts.

Postby martinw » Sep 15th, '14, 21:13

I've now reproduced your problem by using the O= option to locate output in a different directory. The problem is caused by the unofficial 3rd party extensions (ndiswrapper in particular). Providing you don't need these extensions, you can work round the problem by running 'make menuconfig' and disabling the 3rd party extensions. I haven't checked whether the problem exists with all extensions, or whether it is just ndiswrapper.

Might be worth opening a bug report for this.
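As a non-interactive alternative to disabling the 3rd party extensions in menuconfig, the .config in the output directory can be edited and then refreshed. Sketch only: the symbol name NDISWRAPPER used in the usage line is a placeholder, since the real Kconfig symbols of Mageia's 3rd party extensions are not given in this thread (menuconfig's "/" search shows them). The kernel's own scripts/config --disable makes the same kind of edit.

```shell
# disable_symbol CONFIG_FILE SYMBOL
# Rewrite any "CONFIG_SYMBOL=..." line (for example =y or =m) as
# "# CONFIG_SYMBOL is not set", the form kconfig uses for disabled options.
# The original file is kept as CONFIG_FILE.bak.
disable_symbol() {
    sed -i.bak "s/^CONFIG_$2=.*/# CONFIG_$2 is not set/" "$1"
}
```

Usage (the symbol name is hypothetical; rerun oldconfig afterwards so dependent options are updated):
    disable_symbol /usr/src/Build/kernel_3.12.25/.config NDISWRAPPER
    make O=/usr/src/Build/kernel_3.12.25 oldconfig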

Re: Kernel building problems & nvidia driver conflicts.

Postby martinw » Sep 16th, '14, 22:13

I've now rediscovered an issue with the dkms build system when using full kernel sources. See my writeup in https://forums.mageia.org/en/viewtopic.php?f=8&t=3981#p28821. I'd completely forgotten about this.

There's another bug in the fglrx build script that stops it working if you build the kernel in a different directory, but hopefully that won't be present in the nvidia build script.
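The dkms workaround referred to here, as a command line. A sketch that prints the commands for the versions in this thread (dkms_build_cmd is a hypothetical name); --no-prepare-kernel and --no-clean-kernel are the dkms options quoted later in the thread. This only stops dkms from touching the kernel tree; per doktor5000 above, the 331.79 driver still will not build against 3.16.

```shell
# dkms_build_cmd MODULE VERSION KERNEL
# Print dkms build and install commands that tell dkms not to prepare or
# clean the kernel tree, which, as reported in this thread, otherwise
# removes files such as Module.symvers from a self-built kernel.
dkms_build_cmd() {
    echo "dkms build -m $1 -v $2 -k $3 --no-prepare-kernel --no-clean-kernel"
    echo "dkms install -m $1 -v $2 -k $3"
}
```

Usage:
    dkms_build_cmd nvidia-current 331.79-1.mga4.nonfree 3.16.2_SHP_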

Re: Kernel building problems & nvidia driver conflicts.

Postby shp3 » Oct 14th, '14, 04:14

Thank you martinw. I've gotten much farther with your suggestions.

Removing the unofficial 3rd party extensions does allow the mageia kernel package to compile successfully.
This solves my "second problem".

Your fix for the rediscovered dkms issue, which recommends using the dkms options " --no-prepare-kernel --no-clean-kernel ", solves the problem of dkms removing files before trying to use them. This eliminates the error messages complaining about a missing Module.symvers. However, I still have a problem with dkms failing to build the nvidia driver, which seems to be caused by an incomplete set of #defines. The reported fatal error comes from NV_LINUX_SEMAPHORE_H_PRESENT not being defined, so the build looks for semaphore.h in the "include/asm" directory instead of the "include/linux" directory of the kernel source. At this point I have no idea where this should have been defined, although the "NV_" prefix suggests that nvidia might be responsible. I think there is progress on my "third problem", but it is still not solved.
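A rough check of which semaphore header the kernel tree actually provides, with a hypothetical helper name. Unlike the driver's own conftest, which (as far as I know) compiles small test programs, this just looks for files. If it reports linux/semaphore.h while the build still falls back to asm/semaphore.h, the NV_LINUX_SEMAPHORE_H_PRESENT test itself probably failed, for example because it compiled against the wrong or an incomplete tree (an assumption, not verified here).

```shell
# semaphore_header KDIR
# Report which semaphore header the kernel source tree KDIR provides:
# "linux/semaphore.h" (current kernels), "asm/semaphore.h" (the older
# location), or "none".
semaphore_header() {
    if [ -e "$1/include/linux/semaphore.h" ]; then
        echo "linux/semaphore.h"
    elif [ -e "$1/arch/x86/include/asm/semaphore.h" ] ||
         [ -e "$1/include/asm/semaphore.h" ]; then
        echo "asm/semaphore.h"
    else
        echo "none"
    fi
}
```

Usage (for an O= build, pass the source tree, not the output directory):
    semaphore_header /usr/src/linux-3.16.2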

Re: Kernel building problems & nvidia driver conflicts.

Postby jiml8 » Oct 14th, '14, 21:24

I use the nvidia driver from nvidia, and I compile it whenever I upgrade drivers.

I have never gotten dkms to work with this driver; running the nvidia installer package and selecting dkms always results in failure. I have also never debugged the problem because it was not that important to me.

However, I very strongly suspect you are right; the problem is an nvidia problem rather than something with the kernel source. I've built highly modified kernels many times in Mageia without difficulty, though I was never trying to compile to a different directory the way you apparently were.
jiml8
 
Posts: 1254
Joined: Jul 7th, '13, 18:09

Re: Kernel building problems & nvidia driver conflicts.

Postby pierreleonard » Oct 15th, '14, 20:36

Hi,

I understand the frustration of Martin.
I have bought a big 2560*1440 screen and a GTX 750 Ti. I found a forum where a guy compiled the 3.15 kernel, installed the nvidia driver and, just like that, it works.
I tried it, and got "bad exec format" when loading the nvidia module.
So I will try with the 3.17 kernel, but on a separate disk installation, because once you install the advanced nvidia drivers and libraries you cannot go back to the older kernel.

So I hope that Mageia 5, which is still delayed, will solve the problem and recognize my graphics card.

Sincerely.

Pierre Léonard
pierreleonard
 
Posts: 19
Joined: Sep 11th, '14, 21:02

Re: Kernel building problems & nvidia driver conflicts.

Postby shp3 » Oct 16th, '14, 03:10

In the past I had no problems using the Nvidia installer to add their driver to a newly compiled kernel.
My attempts to use their installer after building a kernel on my Mageia system seem to remove Module.symvers before using it just like dkms. I attempted to use dkms to eliminate the Nvidia installer as a source of problems since the installer has a very complex list of things it is trying to do at the same time. The documentation for dkms promises a simple and straight-forward way to automatically update drivers for a new kernel. The need for dkms options " --no-prepare-kernel --no-clean-kernel " suggests that reality is not delivering on this promise.
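Before pointing dkms or the NVIDIA installer at a self-built kernel, it can help to confirm that the files they rely on are still present. A sketch with a hypothetical name; the file list (.config, Module.symvers, include/generated/autoconf.h, include/config/auto.conf) is what a configured and fully built 3.x tree contains, and for an O= build DIR is the output directory.

```shell
# kbuild_ready DIR
# Report files an external-module build expects in a configured and built
# kernel tree; print one "missing: ..." line per absent file and return
# non-zero if any is missing.
kbuild_ready() {
    missing=0
    for f in .config Module.symvers include/generated/autoconf.h include/config/auto.conf; do
        if [ ! -e "$1/$f" ]; then
            echo "missing: $1/$f"
            missing=1
        fi
    done
    return $missing
}
```

Usage:
    kbuild_ready /usr/src/Build/kernel_3.16.2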

I vaguely remember that before Nvidia provided their installer, the driver source was just patched into the kernel source before compiling. I have not found any current instructions for doing this. I would appreciate a pointer if anyone knows the procedure. I wish Nvidia had included this process as a readme in the source directory delivered by their installer.

Re: Kernel building problems & nvidia driver conflicts.

Postby doktor5000 » Oct 16th, '14, 07:05

pierreleonard wrote:I have bought a big 2560*1440 screen and a GTX 750 Ti. I found a forum where a guy compiled the 3.15 kernel, installed the nvidia driver and, just like that, it works.
I tried it, and got "bad exec format" when loading the nvidia module.
So I will try with the 3.17 kernel, but on a separate disk installation, because once you install the advanced nvidia drivers and libraries you cannot go back to the older kernel.

So I hope that Mageia 5, which is still delayed, will solve the problem and recognize my graphics card.

Seems unrelated. Why do you need kernel 3.15? Your card should be supported by latest drivers, which are installable, you should even be able to use the normal Mageia nvidia packages for that. Please open a separate thread for your issue.

Re: Kernel building problems & nvidia driver conflicts.

Postby pierreleonard » Oct 18th, '14, 22:29

Hi doktor5000,

I am very sorry to say that in every forum you find posts saying that these cards, with NVIDIA's latest Maxwell chip, are only supported starting with the 3.15 kernel.
When I install Mageia 4 or Mageia 5A2 on another disk, the install process doesn't recognize the card, and therefore not the screen either.
If I install a 3.15 or 3.16 kernel and the dedicated driver from nvidia, modprobe answers that it can't exec the module.

That card is interesting because it is affordable, quiet, low in energy consumption and has new instructions for computing 3D objects, and it is the new nvidia core that will take the place of Kepler.

If you have tested it, your advice would be very interesting for me.

Many thanks

Pierre Léonard

Re: Kernel building problems & nvidia driver conflicts.

Postby doktor5000 » Oct 19th, '14, 00:40

doktor5000 wrote:Please open a separate thread for your issue.

