[Solved] Bad update


Postby jiml8 » May 6th, '16, 05:50

In the last day or two, I have received updates for java, perl, and python. One of these updates, and I do not know which, broke my system rather painfully. I think I have the problem solved, but I am not certain yet.

Symptoms are these:

With screen locker on, enter password to unlock screen, and get a message (I forget the exact wording) that the screen is not unlocked because the screen unlocker is not working. Switch to console window (ctrl-alt-Fn) and log in as root. Run top, discover kwin is running 100% of one core. Kill kwin, switch back to desktop window and try to restart kwin using alt-F2, or using any available shell window that can be reached.

At this point, alt-F2 does not respond, no widgets in the taskbar respond to the mouse, and a cursor cannot be activated in a shell window. In other words, cannot restart kwin by any means.

Switch back to console window, and restart dm (service dm restart). I should also note that killing dm also killed several running virtual machines (which is not supposed to happen...they are supposed to keep running). Killing those VMs caused me a significant amount of separate trouble, but that is off of this topic.

KDE desktop restarts after dm is restarted, and I can log in. I do so, and things seem to be working. I switch back to console window, log out, switch back to desktop...and desktop display stays black (with working mouse). Switch back to console window, log in again, run top, and see kwin is again running 100% of one core. Switch back to desktop, watch black screen for a while waiting to see if it will come to life. It doesn't. Switch back to console and restart dm again.

Upshot is that if I switch away from the desktop, kwin runs away, and if I allow screen locker to activate on desktop, screen unlocker does not reliably work (it did work sometimes).

After a considerable amount of aggravation, I finally solved the problem (I think) by using rsync to restore my system to the day before yesterday (love my backup procedures...they can get me out of anything). I observed what files were changed, and the ones that appear relevant were the aforementioned python, perl, and java. So one of those updates has a problem.
Last edited by jiml8 on May 8th, '16, 23:42, edited 1 time in total.
jiml8
 
Posts: 1254
Joined: Jul 7th, '13, 18:09

Re: Bad update

Postby jiml8 » May 6th, '16, 10:11

Well, the problem is NOT solved. I just experienced the runaway kwin again, after switching to a console and back. Grrrrr.....

Re: Bad update

Postby jiml8 » May 6th, '16, 18:12

...and, after some sleep, I have looked at this again. I realize that I replaced files in /usr/lib and /usr/lib64, but not in /usr/share or /usr/bin. So I just rsync'ed all of /usr, and my problem went away. The changes included these files in /usr/bin:
bin/abs2rel
bin/build-classpath
bin/build-classpath-directory
bin/build-jar-repository
bin/check-binary-files
bin/clean-binary-files
bin/create-jar-links
bin/diff-jars
bin/find-jar
bin/jvmjar
bin/rebuild-jar-repository
bin/shade-jar
bin/showchange.pl
bin/svn
bin/svnlook
bin/svnversion
bin/xmvn-builddep


I doubt the changes to subversion are relevant, and some of these files are clearly associated with java. Not sure about all of them, particularly showchange.pl and abs2rel.

Re: Bad update

Postby jiml8 » May 8th, '16, 23:41

This problem sent me up the wall for a few days, and I kept thinking I had it solved, but it kept coming back at odd intervals.

Finally, though, I DO have it solved...and I was wrong; it was not a bad update from Mageia.

Basically, in my normal usage pattern on my workstation, I have as many as 9 virtual machines running at one time on my Mageia host. I have 32 GB of RAM, but I have been experiencing some issues with the VM environment that have made portions of my life rather unpleasant. So, I have been doing some kernel tweaks in Mageia in order to see if I can make this environment behave better.

One set of tweaks I made about two weeks ago seemed to have greatly improved the VM environment performance. These tweaks affect how Mageia allocates virtual memory, and for several days after I made the tweaks I not only noticed improved VM performance, but I also noticed no downside in day to day host performance...until, suddenly, I did start having problems, which I describe in this thread.

Log monitoring and log searching were turning up no clues that I could make sense of, except that the problems appeared to have started after an update, so I blamed the update. But, ultimately, backing the update out did not solve the problem. I finally found the problem when I had most of the VM environment shut down (because of all the trouble I was having) and suddenly Chromium would start, then crash. A search of xsession-errors turned up an "out of memory" error when Chromium tried to do things.

This was a link to my virtual memory tweaks...which I promptly rolled back. Rolling back those tweaks solved the problem; the system has been solid for two days now.

So, the update was not at fault. Now, I have to figure out how to change this system so that I can put the tweak back in place (to make the VM environment work well) while not having problems with "out of memory" in the host. As configured, I only have a 2 GB swap partition in place; I figured with 32 GB RAM I didn't need much. But I think I will add a 30 to 40 GB swap file on one of my hard drives to see if this prevents the OOM consequence. It should... I hope... :D
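A rough sketch of one way to spot this condition earlier: the kernel exposes its commit accounting in /proc/meminfo as the CommitLimit and Committed_AS fields, and the gap between them is the headroom left before allocations start failing in strict mode. The helper names and the sample numbers below are made up for illustration; they are not measurements from this system.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # values are in kB for memory fields
    return fields


def commit_headroom_kb(fields):
    """Address space (kB) still available before the commit limit is reached."""
    return fields["CommitLimit"] - fields["Committed_AS"]


# Hypothetical sample, shaped like a 32 GB host with 2 GB of swap.
SAMPLE = """\
MemTotal:       32000000 kB
SwapTotal:       2000000 kB
CommitLimit:    34000000 kB
Committed_AS:   33500000 kB
"""

info = parse_meminfo(SAMPLE)
headroom = commit_headroom_kb(info)
print(f"commit headroom: {headroom} kB")
```

On a real host the text would come from reading /proc/meminfo instead of the sample string. A small headroom like the one in this made-up sample would suggest that new allocations are close to being refused.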

Re: [Solved] Bad update

Postby doktor5000 » May 9th, '16, 00:40

Yep. The kernel's OOM killer rarely strikes during casual use, but when it does, it usually hits either VMs that consume big chunks of memory and have a long runtime or, in most other cases, the X server, which together with its children uses the most memory. That usually takes down all your work, including any VMs running as children of the X server, unless you start them headless outside the domain of the X server.
doktor5000
 
Posts: 18058
Joined: Jun 4th, '11, 10:10
Location: Leipzig, Germany

Re: [Solved] Bad update

Postby jiml8 » May 9th, '16, 01:06

Well, I have just placed a 33 GB swapfile on one hard drive (giving me a total of about 35 GB of swap), and I have returned the tweaks to a slightly more conservative setting than I was using before. Hopefully, the defined swap is now large enough that memory allocations won't fail at all, while placing me in a position where my VM environment will behave.

Time will tell.

I did have it tweaked like this:
vm.overcommit_ratio=100
vm.overcommit_memory=2

which seemed to make my VM environment well-behaved and happy, but which caused problems after a while with the host, given there was only 2 GB swap. Now it is tweaked like this:
vm.overcommit_ratio=80
vm.overcommit_memory=2

...and, along with 35 GB of swap, I am hoping that now all my goals will be met.
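To make the effect of these numbers concrete, here is a minimal sketch of the arithmetic. With vm.overcommit_memory=2 the kernel refuses allocations beyond a commit limit of roughly RAM * overcommit_ratio / 100 + swap (huge pages, if any, are subtracted from RAM first). The function name is hypothetical, and the sizes are the round figures quoted in this thread, not exact values read from the machine.

```python
def commit_limit_gb(ram_gb, swap_gb, overcommit_ratio, hugetlb_gb=0.0):
    """Approximate commit limit in GB when vm.overcommit_memory=2."""
    return (ram_gb - hugetlb_gb) * overcommit_ratio / 100.0 + swap_gb


# Earlier tweak: ratio 100 with only the 2 GB swap partition.
before = commit_limit_gb(ram_gb=32, swap_gb=2, overcommit_ratio=100)

# Current tweak: ratio 80 with about 35 GB of swap in total.
after = commit_limit_gb(ram_gb=32, swap_gb=35, overcommit_ratio=80)

print(f"ratio=100, 2 GB swap:  about {before:.1f} GB may be committed")
print(f"ratio=80, 35 GB swap: about {after:.1f} GB may be committed")
```

Under these assumptions, the lower ratio still leaves a much larger limit because the extra swap counts in full toward it.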

For reference, the default kernel setting is this:
vm.overcommit_ratio=50
vm.overcommit_memory=0

This setting led to no problems at all in day-to-day usage on the host, but after a few days of uptime the VM environment became tangled up so that I could not safely suspend or close some VMs (notably, Linux VMs...FreeBSD and Windows did not seem to be affected) without possibly locking up the entire machine, forcing me to punch the reset button. Also, I could not stop and restart the vmware services reliably; commonly the various daemons would not shut down and I would have to kill them.

Re: [Solved] Bad update

Postby jiml8 » May 15th, '16, 23:57

To follow up on this, after 6 days of uptime since my previous post on this thread, the system is performing flawlessly with the tweaks and the big swapfile. The behavior of my VM environment is greatly improved; there are no discrepancies at all in its behavior.

Much to my surprise, though my swappiness is set to 10 (which discourages swapping), my swap usage is presently about 5 GB. This, even though vmstat tells me that some 20 GB of RAM is being used for cache.

I guess the moral is that, even in a big memory system, Linux really wants its swap. I have seen many posts on many forums that say you don't need swap if you have so much RAM, but this is clearly shown to be false in my case.

Also, I think vmstat is wrong. I believe vmstat is reporting much of the RAM that is assigned to various virtual machines as being cache RAM. This would be false; the RAM that is assigned to a VM probably cannot be released for other purposes the way cache RAM could be. This misinformation has been at the heart of my troubles; I never looked at swap because my Linux tools told me that I had plenty of RAM available.

While I am not certain of this, it looks to me like the way to actually measure memory usage in a system that has virtual machines running is to take the in-use memory described by vmstat (or top, or any other tool on the host) and add to it the memory assigned to each virtual machine that is running. This should approximate the actual in-use memory on that system.
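The estimate described above can be sketched as follows. This only illustrates the heuristic from this post and is not a verified accounting method: it may over- or under-count depending on how the hypervisor backs guest memory, which this thread does not establish. The function name and all numbers are hypothetical.

```python
def estimated_in_use_gb(host_in_use_gb, vm_assigned_gb):
    """Heuristic: host-reported in-use memory plus memory assigned to each running VM."""
    return host_in_use_gb + sum(vm_assigned_gb)


# Hypothetical figures: 6 GB reported in use by the host tools,
# plus four running VMs with these memory assignments (GB).
vms = [2.0, 2.0, 4.0, 1.0]
total = estimated_in_use_gb(6.0, vms)
print(f"estimated in-use memory: {total:.1f} GB")
```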

