Hardware: NVIDIA GP108M / GeForce MX150
Driver: 580.xx (Mageia nonfree)
Symptom: The GPU never reaches 0.0% in powertop, or won't return to it after nvidia-modprobe -u -c=0 has been run.
Update:
After running a series of power tests, I have to admit I was surprised by the results. On my machine, there’s no measurable difference in power usage between the NVIDIA card being “suspended” (Powertop showing 0.0%) and it being fully active (Powertop showing 100%), suggesting the NVIDIA driver is already managing its own power-down behavior internally. So this post turns out to be largely irrelevant 😄
Using the mageia-prime package (https://wiki.mageia.org/en/Mageia-prime_for_Optimus) with GPU offloading achieves essentially the same outcome in terms of overall power draw.
Feel free to keep reading if you’re still interested.
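If you want to repeat this kind of comparison on your own machine, one rough way is to read the battery's discharge rate from sysfs while running on battery, once with the card suspended and once with it active. This is only a sketch and not necessarily how the tests above were done. It assumes a battery named BAT0 that exposes either power_now or current_now plus voltage_now. The battery_watts name and the optional second argument (an alternative sysfs root, there only so the function can be tried against a fake tree) are made up for this example.

```shell
# Print the current battery discharge power in watts.
# $1 = battery name (assumed BAT0), $2 = sysfs root (hypothetical override for testing)
battery_watts() {
    bat_dir="${2:-/sys}/class/power_supply/${1:-BAT0}"
    if [ -r "$bat_dir/power_now" ]; then
        # power_now is reported in microwatts
        awk '{ printf "%.2f\n", $1 / 1000000 }' "$bat_dir/power_now"
    elif [ -r "$bat_dir/current_now" ] && [ -r "$bat_dir/voltage_now" ]; then
        # current_now is in microamps, voltage_now in microvolts
        awk -v ua="$(cat "$bat_dir/current_now")" \
            -v uv="$(cat "$bat_dir/voltage_now")" \
            'BEGIN { printf "%.2f\n", (ua / 1000000) * (uv / 1000000) }'
    else
        echo "no power reading available for ${1:-BAT0}" >&2
        return 1
    fi
}
```

Running battery_watts a few times in each state (unplugged, with the machine otherwise idle) should show whether the two states differ at all on your hardware.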
Issue
We want the NVIDIA discrete graphics card to be completely powered off when not in use, but still be able to turn it on when needed for offloading — without rebooting.
This assumes the system is already set up for PRIME render offload (not using NVIDIA for the whole desktop).
I'm posting this as I have spent several days trying to get the nvidia card to power off like it used to in the bumblebee bbswitch days. Hoping this post may save others some time.
Nvidia hardware
My system uses: PCI Device NVIDIA Corporation GP108M [GeForce MX150]
It's possible this isn't an issue for later nvidia cards, which might support a 'true off' suspend without this hack.
bbswitch: Doesn't work. It says the card is OFF, but powertop still shows it at 100%.
All my code examples assume:
- nvidia gpu located at pci:0000:01:00.0 (you may need to change this)
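- everything is run as root (if you don't 'su' into root you will need 'sudo'; note that 'sudo echo ... > file' does not work because the redirect still runs as your own user, so for those lines pipe into 'sudo tee file' instead)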
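If you are not sure of the PCI address on your machine, it can be looked up in sysfs. The following is a minimal sketch, not part of the original setup: it lists every device whose vendor ID is NVIDIA's (0x10de) and whose class is a display controller (0x03xxxx). The find_nvidia_gpus name and its optional sysfs-root argument are invented for this example.

```shell
# List the PCI addresses of NVIDIA display-class devices found in sysfs.
# $1 = sysfs root (hypothetical override, only useful for testing; defaults to /sys)
find_nvidia_gpus() {
    for dev in "${1:-/sys}"/bus/pci/devices/*; do
        [ -r "$dev/vendor" ] && [ -r "$dev/class" ] || continue
        # 0x10de is NVIDIA's PCI vendor ID
        [ "$(cat "$dev/vendor")" = "0x10de" ] || continue
        # class 0x03xxxx is a display controller (VGA, 3D, ...)
        case "$(cat "$dev/class")" in
            0x03*) echo "${dev##*/}" ;;
        esac
    done
}

find_nvidia_gpus
```

The output is in the same form as the address used in the rest of this post, so it can be pasted straight into the sysfs paths.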
My minimum working install was:
Code:
rpm -qa | grep nvidia
nvidia-current-doc-html-580.119.02-1.mga9.nonfree
lib64nvidia-egl-wayland1-1.1.13.1-2.mga9
dkms-nvidia-current-580.119.02-1.mga9.nonfree
nvidia-current-utils-580.119.02-1.mga9.nonfree
x11-driver-video-nvidia-current-580.119.02-1.mga9.nonfree
nvidia-current-cuda-opencl-580.119.02-1.mga9.nonfree
nvidia-current-devel-580.119.02-1.mga9.nonfree
The package mageia-prime (https://wiki.mageia.org/en/Mageia-prime_for_Optimus) should also work, and I recommend you use it. If I had seen it before I started down this path, I would have used it myself. I praise the Mageia team for building it! Please post if this works with it, and ask questions if it doesn't. I'm happy to install it and get this going with it, but I have spent too many days on this to try right now.
Note: if you are using mageia-prime, at first the nvidia card cannot be turned off. You can skip to 'mageia-prime users can start here.' (below) to truly turn the nvidia card off.
The symptom
After boot, the GPU can suspend correctly, assuming the nvidia drivers weren't loaded at boot (for this setup that's as simple as not doing anything to load them).
If you are using mageia-prime, skip this bit. I'm pretty sure you won't be able to get the power to 0% yet:
Note: Toggling bad -> good for the NVIDIA card in powertop 'Tunables' does the same as the first line in this code, and may help you easily find the PCI address of your card.
Code:
echo "auto" > /sys/bus/pci/devices/0000:01:00.0/power/control
cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
Result: suspended (Great!)
Powertop shows: 0.0% PCI Device: NVIDIA (totally off!)
Then to start the nvidia driver:
(I'm pretty sure mageia-prime already does this at boot, so skip this as well)
Code:
nvidia-modprobe -u -c=0
mageia-prime users can start here.
The GPU becomes permanently stuck in active (no suspend) mode:
Code:
echo "auto" > /sys/bus/pci/devices/0000:01:00.0/power/control
cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
Result: active
Powertop shows: 100.0% PCI Device: NVIDIA (wasting power if we're not using it)
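For a little more detail than the single runtime_status word, the kernel keeps runtime-PM time counters (in milliseconds) next to it in the same power directory. The sketch below simply prints them together. The gpu_pm_report name and the optional sysfs-root argument are invented for this example, and the address is the one assumed throughout this post.

```shell
# Print the runtime-PM details for one PCI device.
# $1 = PCI address, $2 = sysfs root (hypothetical override for testing; defaults to /sys)
gpu_pm_report() {
    pm_dir="${2:-/sys}/bus/pci/devices/$1/power"
    for pm_file in control runtime_status runtime_active_time runtime_suspended_time; do
        if [ -r "$pm_dir/$pm_file" ]; then
            printf '%s: %s\n' "$pm_file" "$(cat "$pm_dir/$pm_file")"
        else
            printf '%s: (not available)\n' "$pm_file"
        fi
    done
}

gpu_pm_report "0000:01:00.0"
```

Running it twice a few seconds apart shows which of the two counters is growing, which is a quick cross-check against what powertop reports.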
Other things I tried to get it working (I literally spent days researching, going down 'wrong' rabbit holes, giving up, and starting again):
- modprobe -r nvidia*
- bbswitch
- remove/rescan
- blacklisting modules
- reboots to totally turn off and on again (this worked, but was less satisfying)
The GPU will not power off again until reboot.
I believe this is a PCI driver binding problem.
When you run:
Code:
nvidia-modprobe -u -c=0
the nvidia kernel module is loaded and the kernel binds the PCI device to the NVIDIA driver:
Code:
/sys/bus/pci/devices/0000:01:00.0/driver → /sys/bus/pci/drivers/nvidia
As long as the nvidia driver is bound to the PCI device, the device stays in the active state on this system and runtime power management never suspends it.
In my tests, trying to unload the modules did not release the device.
That is why all normal methods fail.
The actual fix (no reboot)
Check who owns the device:
Code:
readlink /sys/bus/pci/devices/0000:01:00.0/driver
You will see it points to nvidia.
Now do this:
Code:
echo 0000:01:00.0 > /sys/bus/pci/drivers/nvidia/unbind
echo auto > /sys/bus/pci/devices/0000:01:00.0/power/control
Immediately afterwards, check the status:
Code:
cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
Result: suspended (Great!)
Powertop shows: 0.0% PCI Device: NVIDIA (totally off!)
No module unload. No remove/rescan. No reboot.
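On my machine the status flips straight away, but if you script this it may be safer not to rely on timing. Here is a small sketch that polls runtime_status until it reads the wanted value or a timeout runs out. The wait_for_runtime_status name, its arguments, and the 10 second default are all invented for this example.

```shell
# Poll runtime_status until it reads the wanted state or the timeout expires.
# $1 = PCI address, $2 = wanted state (e.g. suspended),
# $3 = timeout in seconds (default 10), $4 = sysfs root (hypothetical override for testing)
wait_for_runtime_status() {
    status_file="${4:-/sys}/bus/pci/devices/$1/power/runtime_status"
    remaining="${3:-10}"
    while :; do
        # Success as soon as the file contains the wanted word
        [ "$(cat "$status_file" 2>/dev/null)" = "$2" ] && return 0
        # Give up once the timeout is used up
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
}
```

For example, right after the unbind you could run: wait_for_runtime_status 0000:01:00.0 suspended 10 && echo "GPU is off"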
Simple on/off scripts
Make a file (to be run as root), e.g.:
Code:
touch /usr/local/sbin/nvidia-suspend.sh
chmod +x /usr/local/sbin/nvidia-suspend.sh
Contents [updated]:
Code:
#!/bin/bash
# PCI address of the NVIDIA GPU (change to match your system)
GPU="0000:01:00.0"
if [ "${1}" == "status" ]
then
    cat "/sys/bus/pci/devices/${GPU}/power/control"
    cat "/sys/bus/pci/devices/${GPU}/power/runtime_status"
elif [ "${1}" == "on" ]
then
    if [ -L "/sys/bus/pci/devices/$GPU/driver" ]
    then
        # A driver is already bound to the device
        echo "Already on? Try nvidia-smi as user"
    elif ! lsmod | grep -q nvidia
    then
        # Module not loaded yet: loading it binds the device
        nvidia-modprobe -u -c=0
    else
        # Module loaded but device unbound (after "off"): bind it again
        nvidia-modprobe -u -c=0
        echo "$GPU" > /sys/bus/pci/drivers/nvidia/bind
    fi
elif [ "${1}" == "off" ]
then
    if [ -L "/sys/bus/pci/devices/$GPU/driver" ]
    then
        # Release the device from the nvidia driver
        echo "$GPU" > /sys/bus/pci/drivers/nvidia/unbind
        sleep 1
    fi
    # Let runtime power management suspend the now driverless device
    echo "auto" > "/sys/bus/pci/devices/$GPU/power/control"
else
    echo "usage: ${0##*/}: on|off|status"
fi
Now you can run (as root):
Code:
nvidia-suspend.sh status
nvidia-suspend.sh on
nvidia-suspend.sh off
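One limitation of the 'off' branch is that it always writes to the nvidia driver's unbind file. If some other driver (nouveau, for example) owned the card, that write would presumably fail. A variant that follows the device's own driver link instead is sketched below. I have only needed the nvidia case myself, so treat it as an untested idea. The unbind_current_driver name and the optional sysfs-root argument are invented for this example.

```shell
# Unbind a PCI device from whichever driver currently owns it.
# $1 = PCI address, $2 = sysfs root (hypothetical override for testing; defaults to /sys)
unbind_current_driver() {
    dev_dir="${2:-/sys}/bus/pci/devices/$1"
    if [ ! -L "$dev_dir/driver" ]; then
        echo "$1: no driver bound"
        return 0
    fi
    echo "$1: unbinding from $(basename "$(readlink "$dev_dir/driver")")"
    # The driver link points at the driver's sysfs directory, which holds its unbind file
    echo "$1" > "$dev_dir/driver/unbind"
}
```

In the script, the line that writes to /sys/bus/pci/drivers/nvidia/unbind could then become unbind_current_driver "$GPU". The 'on' branch would still bind to nvidia, which is what we want for offloading.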
You can check results using:
Code:
powertop
Then check that the nvidia card is working as a normal user, e.g.:
Code:
nvidia-smi
or, if you installed the mageia-prime package:
Code:
mageia-prime-offload-run glxspheres64
Important notes
You do not need:
- blacklist rules
- initramfs edits
- bbswitch
- module unloading
- remove/rescan tricks
