[SOLVED] System freezing now and then for 20..30 seconds


Postby morgano » Apr 23rd, '16, 15:26

We have several Mageia 5 systems.
Only my son's desktop has a bad habit of taking a nap a couple of times per hour.
Extremely irritating, especially when he is gaming online...

But this also happens with, e.g., only Firefox or Dolphin running.

I need some advice on tracking this down.

The machine dual-boots Mageia 5 / SteamOS, and SteamOS does not have this issue.

System
Mageia 5, 64 bit, KDE
CPU: AMD six-core, RAM: 16GB, disk: SSD, GPU: Nvidia 760 (borrowed from my machine, where it worked fine)

What happens
* The machine seems to stop completely for 20 to 30 seconds
* The only thing that still seems to work is the mouse pointer
* Not only the graphics freeze; the programs seem frozen too. E.g. if my daughter is logged into Minecraft on my son's machine, the connection gets lost when my son's machine pauses.

Tried without result
* Could not find anything in xsession-errors, the journal or the X log that I can directly say is wrong
* Tried with and without desktop effects enabled
* Tried the Nvidia and nouveau drivers
* Tried GNOME instead of KDE

Any idea what exactly to look for, and in which log?

Any idea what more to try?

I see there is a new kernel etc. in testing... maybe try that...

Or maybe we should try Cauldron, unless it has problems... (I already noticed we would then have to use nouveau, not the Nvidia proprietary driver, but that is OK)
Last edited by morgano on Apr 29th, '16, 11:17, edited 1 time in total.
Mandriva since 2006, Mageia 2011 at home & work. Thinkpad T40, T43, T400, T510, Dell M4400, M6300, Acer Aspire 7. Workstation using LVM, LUKS, VirtualBox, BOINC
morgano
 
Posts: 1314
Joined: Jun 15th, '11, 17:51
Location: Kivik, Sweden

Re: System freezing now and then for 20..30 seconds

Postby doktor5000 » Apr 23rd, '16, 16:19

Freezes are usually pretty hard to debug. It would be helpful if you could do a clean reboot into Mageia, not do anything out of the ordinary and use it normally,
but keep track of the times at which the freezes occur. Then get a full journalctl -ab log and attach it here, along with the timestamps of the freezes.
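With the freeze times noted by hand, it can help to cut the journal down to a window around each one. A minimal sketch, assuming GNU `date` and systemd's `journalctl` (whose `--since`/`--until` accept `YYYY-MM-DD HH:MM:SS`); the function names and the 60-second default padding are hypothetical choices.

```shell
# Hypothetical helpers: show the journal around a freeze time noted by hand.

# Print the start and end of a window of +/- PAD seconds around TIME,
# one per line, in a format journalctl's --since/--until accept.
freeze_range() {
    t=$(date -d "$1" +%s) || return 1   # GNU date: TIME -> epoch seconds
    pad=${2:-60}                        # seconds of context on each side
    date -d "@$((t - pad))" '+%Y-%m-%d %H:%M:%S'
    date -d "@$((t + pad))" '+%Y-%m-%d %H:%M:%S'
}

# Show the full (-a) journal of the current boot (-b) for that window.
freeze_journal() {
    range=$(freeze_range "$1" "${2:-60}") || return 1
    start=$(printf '%s\n' "$range" | sed -n 1p)
    end=$(printf '%s\n' "$range" | sed -n 2p)
    journalctl -a -b --since "$start" --until "$end"
}
```

Usage: `freeze_journal "2016-04-24 19:22:33" > freeze-1922.txt`. If the machine has been rebooted since, `-b` would have to become `-b -1` for the previous boot.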

From the tests you did, it seems to be some more low-level issue, like processor power management, some interrupts at chipset level, or something related to firmware.
Could you also post the output of the following, please?
Code:
lspcidrake -v
Cauldron is not for the faint of heart!
Caution: Hot, bubbling magic inside. May explode or cook your kittens!
----
Disclaimer: Beware of allergic reactions in answer to unconstructive complaint-type posts
doktor5000
 
Posts: 17659
Joined: Jun 4th, '11, 10:10
Location: Leipzig, Germany

Re: System freezing now and then for 20..30 seconds

Postby jiml8 » Apr 24th, '16, 07:45

If the machine is doing a lot of I/O to a slow device (and for this purpose, a network device could very well be called a slow device, particularly if there is a home router with limited capability in the path), then you could have a problem with how Linux handles memory caching of dirty pages, which are pages that need to be written to disk but have not been written yet. This can cause exactly the symptoms you are seeing, and can also cause lockups and crashes. The problem is more likely to occur on a big-memory system than on one with limited memory. If the system has 16 GB or more of RAM, then you could very well encounter this problem, particularly with heavy network I/O or with slow hard drives.

Without going into all the dirty details, the default Mageia configuration is sometimes inappropriate and the kernel needs to be tuned.

You need to investigate the following kernel configuration items:
vm.dirty_background_bytes
vm.dirty_background_ratio
vm.dirty_bytes
vm.dirty_expire_centisecs
vm.dirty_ratio
vm.dirty_writeback_centisecs

In the Mageia system, the default vm.dirty_ratio is 20 and vm.dirty_background_ratio is 10 (these are percentages of available memory, i.e. free plus reclaimable pages, that may be used for caching dirty pages). When the ratio hits 10, writes MUST start in the background if they have not been happening already. When the ratio hits 20, processes that are writing get throttled and have to write out dirty data themselves, so writes effectively become synchronous until enough of the cache is flushed; with gigabytes of dirty data, that long I/O delay can halt everything. Try turning these ratios down (I use 5 and 10 respectively) to require more frequent flushes, and to minimize the I/O delay if the thing goes synchronous.
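To put those percentages into numbers: on 16 GB, 10% and 20% are roughly 1.6 GB and 3.2 GB of dirty data. The sketch below computes the same from /proc/meminfo. It is only an approximation: per the kernel's vm sysctl documentation the ratios apply to available (free plus reclaimable) memory rather than MemTotal, so the real thresholds are somewhat lower. The function name is made up for illustration.

```shell
# Hypothetical helper: approximate the dirty-page thresholds, in MiB,
# that the given ratios produce, using MemTotal as an upper bound.
# Usage: dirty_thresholds [MEMINFO_FILE] [BACKGROUND_RATIO] [RATIO]
dirty_thresholds() {
    meminfo=${1:-/proc/meminfo}
    bg=${2:-10}      # vm.dirty_background_ratio
    hard=${3:-20}    # vm.dirty_ratio
    awk -v bg="$bg" -v hard="$hard" '
        $1 == "MemTotal:" {            # value is in kB
            printf "background: %d MiB\n", $2 * bg / 100 / 1024
            printf "throttling: %d MiB\n", $2 * hard / 100 / 1024
        }' "$meminfo"
}
```

Usage: `dirty_thresholds` for the defaults, or `dirty_thresholds /proc/meminfo 5 10` to see what the suggested 5/10 would mean on the same machine.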

You can watch these caches in action with the following command:
Code:
grep -E "dirty|writeback" /proc/vmstat

Run the command repeatedly to see the pattern of behavior and to understand what is happening.
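Instead of re-running the command by hand, a small loop can print the two most telling counters with a timestamp, so a freeze can be lined up with a spike in nr_dirty or nr_writeback (both are counts of pages). A minimal sketch; the function name and its arguments are hypothetical.

```shell
# Hypothetical helper: print nr_dirty and nr_writeback from /proc/vmstat
# every INTERVAL seconds, COUNT times (COUNT 0 = until Ctrl-C).
watch_dirty() {
    interval=${1:-1}
    count=${2:-0}
    n=0
    while [ "$count" -eq 0 ] || [ "$n" -lt "$count" ]; do
        # Join the matching lines onto one timestamped output line.
        printf '%s %s\n' "$(date +%T)" \
            "$(grep -E '^nr_(dirty|writeback) ' /proc/vmstat | tr '\n' ' ')"
        n=$((n + 1))
        sleep "$interval"
    done
}
```

Usage: `watch_dirty 1 | tee dirty.log`, then compare the timestamps in dirty.log with the moments the machine froze.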

To see what the variables are currently set to, enter as root:
Code:
sysctl -a | grep dirty

To alter the variables in a running system, as root enter the following commands:
Code:
sysctl vm.dirty_background_ratio=5
sysctl vm.dirty_ratio=10

To make the change persist across a reboot, edit the file /etc/sysctl.conf to add these settings.
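An alternative to editing the main file is a separate drop-in: on systemd-based releases such as Mageia 5, .conf files in /etc/sysctl.d/ should be applied at boot as well. A minimal sketch, assuming that convention; the file name 90-dirty-pages.conf and the helper are hypothetical, and the values are the 5/10 suggested above.

```shell
# Hypothetical helper: write the dirty-page ratios to a sysctl drop-in file.
# Usage (as root): write_dirty_conf [FILE] [BACKGROUND_RATIO] [RATIO]
write_dirty_conf() {
    conf=${1:-/etc/sysctl.d/90-dirty-pages.conf}
    bg=${2:-5}
    hard=${3:-10}
    printf 'vm.dirty_background_ratio = %s\nvm.dirty_ratio = %s\n' \
        "$bg" "$hard" > "$conf"
}
```

Usage: `write_dirty_conf && sysctl -p /etc/sysctl.d/90-dirty-pages.conf` writes the file and loads it right away, without waiting for a reboot.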

I don't know that this is the cause of your freezes, but it is a likely one, and very difficult to diagnose if you are not aware of these settings. You should research this; there are many subtleties to these settings, and they have a very profound impact on system efficiency. The settings I suggested are safe and have always been sufficient for me to solve problems like yours. You can turn those caches down further still, but there is an efficiency trade-off. The default settings are not appropriate for a big-memory system, and might not be appropriate for a smaller-memory system that is doing heavy I/O to a slow channel.

YMMV, but give it a try.
jiml8
 
Posts: 1253
Joined: Jul 7th, '13, 18:09

Re: System freezing now and then for 20..30 seconds

Postby jiml8 » Apr 24th, '16, 07:53

You may also get some benefit from adjusting /proc/sys/vm/swappiness. Again, you should research this; the explanation is involved and has some subtleties. The system default is 60, which makes the system moderately aggressive about swapping programs out of memory. I have lots of RAM and run SSDs, so I have turned swappiness down to 10 to minimize program swapping, which is better for the SSDs and doesn't affect my day-to-day use because I have enough RAM.

For a gaming environment with a hard drive, you should probably turn swappiness down to increase responsiveness for the user. If your system runs SSDs, you definitely should turn it down to maximize SSD life. If your machine is heavy on throughput and does lots of I/O, you should probably turn swappiness up to optimize for I/O caching, particularly if you have limited RAM.
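To record the current values before changing anything, the sketch below prints them straight from /proc/sys/vm (the same values `sysctl` reports). The helper name is hypothetical, and the directory argument only exists so it can be pointed at a copy for testing.

```shell
# Hypothetical helper: print the VM tunables discussed in this thread.
# Usage: show_vm_tunables [DIR]   (DIR defaults to /proc/sys/vm)
show_vm_tunables() {
    dir=${1:-/proc/sys/vm}
    for k in swappiness dirty_background_ratio dirty_ratio; do
        printf 'vm.%s = %s\n' "$k" "$(cat "$dir/$k")"
    done
}
```

After checking, `sysctl vm.swappiness=10` (as root) changes it on the running system, and a `vm.swappiness = 10` line in /etc/sysctl.conf makes it persist.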

Re: System freezing now and then for 20..30 seconds

Postby morgano » Apr 26th, '16, 00:43

Thanks for all the hints.

I took logs yesterday and am now reading them offline; that computer is inaccessible at the moment.
lspcidrake -v
journalctl -ab (freeze at 19:22:33 to :44)

Additional info
§ When the system freezes, audio sometimes does not freeze immediately, but only after a few seconds (buffered somehow)
§ Freezes happen even after very little and simple work, and with 16GB of RAM there is no need to swap and very little data to write to disk.
§ This system has had its mainboard and CPU replaced without reinstalling the system.
Before: two dual Opterons on a server-grade mainboard (my workstation, built in 2006); now a hexa-core AMD on a cheap mainboard (most bang for a reasonable buck).
As it just kept working, we praised the compatibility and let it be (and it runs several times faster while consuming much less power).
The freezing was not there initially, but has been going on for a couple of weeks.
I cannot say it started when we changed something, so I was hoping it would go away again... ;)
§ Audio goes through an external USB dongle, simply because the old mainboard had broken audio and the dongle just continued to work.
(Some time I will try to get the onboard audio working; a first attempt strangely failed, but that is another issue.)
§ Graphics: we have rotated three Nvidia cards and an AMD card between the family's computers, and this one ended up with an Nvidia GTX 760
§ Sleep and hibernate both work
§ Drives: we use the BIOS menu to choose which drive to boot. When Mageia runs, it sees:
sda: an 80GB Intel X25 SSD on which mga5 runs entirely, with partitions in LVM
sdb: a mechanical drive on which SteamOS is installed. SteamOS / and /home are mounted under /mnt

Reading the log
It seems I was wrong that I could not see problems in the log; now I see:

1) BIOS bug warning
Code:
kernel: ACPI BIOS Warning (bug): Optional FADT field Pm2ControlBlock has zero address or length: 0x0000000000000000/0x1 (20150410/tbfadt-654)

No idea if I should worry.

2) Wrong order of loading modules? I spotted this line in the log:
Code:
kernel: Warning! ehci_hcd should always be loaded before uhci_hcd and ohci_hcd, not after

What can I do about that?

3) The SteamOS installer seems to have set something wrong about its /home partition (Mageia mounts it under /mnt)
Code:
kernel: EXT4-fs (sdb3): mounted filesystem with ordered data mode. Opts: (null)
kernel: usb 7-2: Warning! Unlikely big volume range (=4096), cval->res is probably wrong.

I do not know whether I should try to fix it or let it be...

4) festival segfaults
Code:
kernel: sd_festival[3188]: segfault at 2d0 ip 00007f09d5956860 sp 00007fff80728468 error 4 in libpthread-2.20.so[7f09d5949000+17000]
apr 24 19:06:47 silver systemd[1]: speech-dispatcherd.service: main process exited, code=exited, status=1/FAILURE

festival does not seem to be needed...

5) lots of "WARN Successful completion on short TX"
Code:
kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?

But from a quick internet search, that should be a USB3 device?
I need to check which ports are used for what...
What is the "XHCI_TRUST_TX_LENGTH quirk" anyway, and how do I apply it?

6) Stupid SMART-incompatible drive? <slaps face>
Code:
apr 24 19:06:25 silver smartd[2897]: Device: /dev/sda [SAT], found in smartd database: Intel X18-M/X25-M/X25-V G2 SSDs
apr 24 19:06:25 silver smartd[2897]: Device: /dev/sda [SAT], WARNING: This drive may require a firmware update to
apr 24 19:06:25 silver smartd[2897]: fix possible drive hangs when reading SMART self-test log:
apr 24 19:06:25 silver smartd[2897]: http://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=18363
apr 24 19:06:25 silver smartd[2897]: Device: /dev/sda [SAT], enabled SMART Attribute Autosave.
apr 24 19:06:25 silver smartd[2897]: Device: /dev/sda [SAT], can't monitor Current_Pending_Sector count - no Attribute 197
apr 24 19:06:25 silver smartd[2897]: Device: /dev/sda [SAT], can't monitor Offline_Uncorrectable count - no Attribute 198
apr 24 19:06:25 silver smartd[2897]: Device: /dev/sda [SAT], SMART Automatic Offline Testing unsupported...
apr 24 19:06:25 silver smartd[2897]: Device: /dev/sda [SAT], enabled SMART Automatic Offline Testing.
apr 24 19:06:25 silver acpid[2980]: starting up with netlink and the input layer
apr 24 19:06:46 silver kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
apr 24 19:06:46 silver kernel: ata1.00: failed command: SMART
apr 24 19:06:46 silver kernel: ata1.00: cmd b0/d5:01:06:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
                                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
apr 24 19:06:46 silver kernel: ata1.00: status: { DRDY }
apr 24 19:06:46 silver kernel: ata1: hard resetting link

And yes: booting pauses for about 20 seconds, about 10 seconds into the boot; around 19:06:25 in this case.
I think I updated its firmware, but that was ages ago, in the laptop it served in then. I need to check which version I have...
Yes... I know we installed SMART just a month or so ago (to check the drive we then installed SteamOS on...)
Quick "fix": I will uninstall smart.
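To see which firmware the X25 runs before and after an update, `smartctl -i` prints the drive's identity information, including a "Firmware Version:" line, which can be compared with what Intel lists for the download in the warning. As far as I know `-i` only queries the identity data, not the self-test log the warning is about. A minimal sketch; the helper name is hypothetical.

```shell
# Hypothetical helper: extract the firmware version from `smartctl -i`
# output read on stdin.
firmware_version() {
    sed -n 's/^Firmware Version:[[:space:]]*//p'
}
```

Usage (as root): `smartctl -i /dev/sda | firmware_version`.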

7) Color manager problem
Code:
apr 24 19:06:54 silver systemd[1]: Started Manage, Install and Generate Color Profiles.
apr 24 19:06:54 silver colord[4227]: Profile added: canon-silver-Gray..
apr 24 19:06:54 silver colord[4227]: Profile added: canon-silver-CMYK..
apr 24 19:06:54 silver colord[4227]: (colord:4227): Cd-WARNING **: failed to get session [pid 4158]: Unknown error -2


What happened when it froze
There was a freeze from 19:22:33 to :44. There are no lines in the log for that period,
but a few seconds later it reports that SMART failed, and hard-resets ata1.
Here I am a bit confused. ata1 = sda? That is the Intel SSD on which Mageia runs!

Even later it timed out waiting for mounting of /mnt/sdb1 and /mnt/sdb2, which is where we reach the SteamOS system and home.
Weird, we have not noticed that malfunctioning (from Mageia we check logs and load game extensions into Steam on SteamOS that way).
sdb is an older mechanical disk where SteamOS is installed, while mga5 is entirely on sda, which is an SSD.

Code:
apr 24 19:22:25 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
apr 24 19:22:46 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
apr 24 19:22:46 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
- here I cut out many similar lines -
apr 24 19:23:06 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
apr 24 19:23:06 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
apr 24 19:23:06 silver kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
apr 24 19:23:06 silver kernel: ata1.00: failed command: SMART
apr 24 19:23:06 silver kernel: ata1.00: cmd b0/d5:01:06:4f:c2/00:00:00:00:00/00 tag 12 pio 512 in
                                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
apr 24 19:23:06 silver kernel: ata1.00: status: { DRDY }
apr 24 19:23:06 silver kernel: ata1: hard resetting link
apr 24 19:23:06 silver kernel: handle_tx_event: 934 callbacks suppressed
apr 24 19:23:06 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
apr 24 19:23:06 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
apr 24 19:23:06 silver kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
apr 24 19:23:06 silver kernel: ata1.00: configured for UDMA/133
apr 24 19:23:06 silver kernel: ata1: EH complete
apr 24 19:23:06 silver kernel: ata1.00: Enabling discard_zeroes_data
apr 24 19:23:06 silver kernel: handle_tx_event: 928 callbacks suppressed
apr 24 19:23:06 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
- here I cut out many similar lines -
apr 24 19:23:06 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
apr 24 19:23:06 silver kernel: ata1.00: NCQ disabled due to excessive errors
apr 24 19:23:06 silver kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
apr 24 19:23:06 silver kernel: ata1.00: failed command: SMART
apr 24 19:23:06 silver kernel: ata1.00: cmd b0/d5:01:09:4f:c2/00:00:00:00:00/00 tag 1 pio 512 in
                                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
apr 24 19:23:06 silver kernel: ata1.00: status: { DRDY }
apr 24 19:23:06 silver kernel: ata1: hard resetting link
apr 24 19:23:06 silver kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
apr 24 19:23:06 silver kernel: ata1.00: configured for UDMA/133
apr 24 19:23:06 silver kernel: ata1: EH complete
apr 24 19:23:06 silver kernel: ata1.00: Enabling discard_zeroes_data
apr 24 19:23:10 silver kernel: handle_tx_event: 932 callbacks suppressed
apr 24 19:23:10 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
- here I cut out many similar lines -
apr 24 19:23:10 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
apr 24 19:23:15 silver systemd[1]: Job dev-disk-by\x2duuid-f6c06b85\x2d3498\x2d435f\x2d934e\x2d17cc4fefe4bc.device/start timed out.
apr 24 19:23:15 silver systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-f6c06b85\x2d3498\x2d435f\x2d934e\x2d17cc4fefe4bc.device.
apr 24 19:23:15 silver systemd[1]: Dependency failed for /mnt/sdb2.
apr 24 19:23:15 silver systemd[1]: Job dev-disk-by\x2duuid-25121562\x2d1d6d\x2d41c2\x2da0da\x2de96341f81524.device/start timed out.
apr 24 19:23:15 silver systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-25121562\x2d1d6d\x2d41c2\x2da0da\x2de96341f81524.device.
apr 24 19:23:15 silver systemd[1]: Dependency failed for /mnt/sdb1.
apr 24 19:23:15 silver kernel: handle_tx_event: 934 callbacks suppressed
apr 24 19:23:15 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?


As a quick test I will uninstall smart, unplug unnecessary USB devices, and report back tomorrow.
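Two small helpers for the questions above, as a sketch: the first filters libata error-handling lines out of a kernel log, so the link resets can be lined up with the freeze times; the second answers "ata1 = sda?" from sysfs, assuming the usual layout where a libata disk's resolved device path contains its ataN port. All names are hypothetical.

```shell
# Hypothetical helper: keep only libata error-handling lines from stdin.
# Usage: journalctl -b -k | ata_errors
ata_errors() {
    grep -E 'ata[0-9]+(\.[0-9]+)?: (exception|failed command|hard resetting link|NCQ disabled)'
}

# Hypothetical helper: extract the ataN component from a sysfs device path.
ata_port_from_path() {
    grep -oE '/ata[0-9]+/' | tr -d '/'
}

# Hypothetical helper: print the ata port a block device hangs off.
# Usage: ata_port_of sda
ata_port_of() {
    readlink -f "/sys/block/$1" | ata_port_from_path
}
```

If `ata_port_of sda` prints `ata1`, the resets in the log are indeed on the SSD Mageia runs from.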

Re: System freezing now and then for 20..30 seconds

Postby doktor5000 » Apr 26th, '16, 19:25

morgano wrote:6) Stupid SMART incompatible drive ? <slaps face>
Code:
apr 24 19:06:25 silver smartd[2897]: Device: /dev/sda [SAT], found in smartd database: Intel X18-M/X25-M/X25-V G2 SSDs
apr 24 19:06:25 silver smartd[2897]: Device: /dev/sda [SAT], WARNING: This drive may require a firmware update to
apr 24 19:06:25 silver smartd[2897]: fix possible drive hangs when reading SMART self-test log:
apr 24 19:06:25 silver smartd[2897]: http://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=18363
apr 24 19:06:25 silver smartd[2897]: Device: /dev/sda [SAT], enabled SMART Attribute Autosave.
apr 24 19:06:25 silver smartd[2897]: Device: /dev/sda [SAT], can't monitor Current_Pending_Sector count - no Attribute 197
apr 24 19:06:25 silver smartd[2897]: Device: /dev/sda [SAT], can't monitor Offline_Uncorrectable count - no Attribute 198
apr 24 19:06:25 silver smartd[2897]: Device: /dev/sda [SAT], SMART Automatic Offline Testing unsupported...
apr 24 19:06:25 silver smartd[2897]: Device: /dev/sda [SAT], enabled SMART Automatic Offline Testing.
apr 24 19:06:25 silver acpid[2980]: starting up with netlink and the input layer
apr 24 19:06:46 silver kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
apr 24 19:06:46 silver kernel: ata1.00: failed command: SMART
apr 24 19:06:46 silver kernel: ata1.00: cmd b0/d5:01:06:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
                                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
apr 24 19:06:46 silver kernel: ata1.00: status: { DRDY }
apr 24 19:06:46 silver kernel: ata1: hard resetting link

[...]
Quick "fix": i will uninstall smart


You can do that for testing purposes, but I'd say it's definitely the wrong thing to do; monitoring SMART data never hurts. Better to fix the underlying issue, which is even spelled out:

Device: /dev/sda [SAT], found in smartd database: Intel X18-M/X25-M/X25-V G2 SSDs
apr 24 19:06:25 silver smartd[2897]: Device: /dev/sda [SAT], WARNING: This drive may require a firmware update to
apr 24 19:06:25 silver smartd[2897]: fix possible drive hangs when reading SMART self-test log:
apr 24 19:06:25 silver smartd[2897]: http://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=18363



morgano wrote:What happened when it freeze
There was a freeze from 19:22:33 to :44. there are no lines in log for that period,
but a few seconds later it reports SMART failed, and hard resets ata1.
Here i am a bit confused. ata1 = sda ? : the INTEL SSD on which mageia runs!

I'd be very wary about the link hard reset, which is most probably the cause of your freezes.

Take backups, do the SSD firmware upgrade, and then run a full diagnosis and possibly also a full read/write pass over the whole SSD.
Maybe you can use https://downloadcenter.intel.com/downlo ... ve-Toolbox on some Windows live media for that.


How old is that SSD, and can you provide the complete SMART data? Even if it may not be reliable due to the firmware issue, it might provide some hints:
Code:
smartctl -a /dev/sda

Re: System freezing now and then for 20..30 seconds

Postby morgano » Apr 28th, '16, 10:33

As a test I uninstalled a couple of SMART packages, and the freeze did not happen once in five hours of fairly heavy use.
I have prepared a drive for backup and a USB stick with a bootable SSD firmware upgrade.
Then we will re-enable smart and watch the logs again.
And read the SMART data again; I remember reading it before but finding nothing worrying.

Thanks for the input and the hint on the SSD Toolbox, will test the drive using that :)

Any idea what the fix is for this?
Code:
WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?

Re: System freezing now and then for 20..30 seconds

Postby doktor5000 » Apr 28th, '16, 18:20

morgano wrote:Any idea what the fix is for the
Code:
WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?

Counter-question: why do you think this needs a fix, and what particular USB problem do you have that is related to this message?
FWIW, xhci_hcd is the driver for USB3 host controllers, not for an endpoint device. And sometimes those host controllers also control USB2 ports.

From your lspcidrake output, trimmed down to relevant lines:
Code:
xhci_pci        : Renesas Technology Corp.|uPD720202 USB 3.0 Host Controller [SERIAL_USB] (vendor:1912 device:0015 subv:1462 subd:7693) (rev: 02)
xhci_pci        : Renesas Technology Corp.|uPD720202 USB 3.0 Host Controller [SERIAL_USB] (vendor:1912 device:0015 subv:1462 subd:7693) (rev: 02)
hub             : Linux 4.1.15-desktop-2.mga5 xhci-hcd|xHCI Host Controller [Hub|Unused|Full speed (or root) hub] (vendor:1d6b device:0003)
hub             : Linux 4.1.15-desktop-2.mga5 xhci-hcd|xHCI Host Controller [Hub|Unused|Full speed (or root) hub] (vendor:1d6b device:0002)
hub             : Linux 4.1.15-desktop-2.mga5 xhci-hcd|xHCI Host Controller [Hub|Unused|Full speed (or root) hub] (vendor:1d6b device:0003)
hub             : Linux 4.1.15-desktop-2.mga5 xhci-hcd|xHCI Host Controller [Hub|Unused|Full speed (or root) hub] (vendor:1d6b device:0002)


See also some related links for that message:
http://comments.gmane.org/gmane.linux.usb.general/68206
http://www.spinics.net/lists/linux-usb/msg111798.html
http://lists.kernelnewbies.org/pipermai ... 12998.html
https://bugs.debian.org/cgi-bin/bugrepo ... bug=797471
https://bugs.launchpad.net/ubuntu/+sour ... ug/1039478
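To find which physical ports the warnings come from, the controller's PCI address from the message (0000:03:00.0) can be mapped to its USB bus numbers via sysfs and then looked up in `lsusb -t`. A minimal sketch, assuming the usual sysfs layout where each root hub of a PCI host controller appears as a usbN directory under the PCI device; the helper name is hypothetical.

```shell
# Hypothetical helper: list the USB buses (usbN) provided by a PCI host
# controller. Usage: usb_buses_of 0000:03:00.0 [SYSFS_PCI_DIR]
usb_buses_of() {
    dir=${2:-/sys/bus/pci/devices}/$1
    for d in "$dir"/usb*; do
        [ -e "$d" ] || continue    # glob matched nothing
        basename "$d"
    done
}
```

If it prints e.g. usb7 and usb8, the "Bus 07" and "Bus 08" trees in `lsusb -t` show what is plugged into that controller.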

Re: System freezing now and then for 20..30 seconds

Postby morgano » Apr 29th, '16, 11:16

Drive updated, smart is active, problem gone :)
journalctl -ab now
smartctl says the drive has been writing/reading for a total of about 6500 years, that is what I call quality!
( Workload_Minutes : 3426279902 )
... yes, I may connect it to my Windows 7 laptop and see what the Intel tool says about it...
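For reference, the arithmetic behind the "6500 years": 3426279902 minutes / (60 * 24 * 365) is about 6518 years, so the raw value of this attribute is probably not meant to be read as a plain minute count. The sketch below pulls a raw value out of `smartctl -A` output by attribute name and does that conversion; the helper names are hypothetical, and it assumes the usual 10-column attribute table with RAW_VALUE last.

```shell
# Hypothetical helper: print the raw value of a SMART attribute by name,
# reading `smartctl -A` output on stdin. Assumes RAW_VALUE is field 10;
# only its first word is printed if the raw value contains spaces.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Hypothetical helper: whole 365-day years in a number of minutes.
minutes_to_years() {
    echo $(( $1 / (60 * 24 * 365) ))
}
```

Usage (as root): `minutes_to_years "$(smartctl -A /dev/sda | smart_raw Workload_Minutes)"`.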

The XHCI_TRUST_TX_LENGTH messages disappeared when I moved a USB2 hub from a USB3 host port to a USB2 host port.
It probably ended up in the "wrong" port after the last dust cleaning. We did not expect a USB2 hub to be incompatible with a USB3 host.
My son said he had some trouble with which device went into which port, and whether directly into the host or into the hub. Whatever...
I see from your links there have been some discussions; let's leave it here, accepting that USB2 is not entirely compatible with USB3 on all systems.

The high rate of messages scared me a bit, as I have had my workstation go down when /var filled up a while after I attached a USB device...
( viewtopic.php?f=8&t=4864, Mageia bug 10038, upstream https://bugzilla.kernel.org/show_bug.cgi?id=43191 )
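To put a number on "high rate", matching kernel messages can be counted per minute. A minimal sketch, assuming journalctl's default short output format (month, day and time as the first three fields); the helper name is hypothetical. Lines such as "handle_tx_event: 934 callbacks suppressed" mean the kernel was already rate-limiting these warnings, so the count understates the real rate.

```shell
# Hypothetical helper: count log lines containing PATTERN per minute.
# Usage: journalctl -b -k | count_per_minute XHCI_TRUST_TX_LENGTH
count_per_minute() {
    grep -F -- "$1" | awk '{ print $1, $2, substr($3, 1, 5) }' | uniq -c
}
```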

Marking as solved.

There are still a couple of strange things, not important; I just note them for reference:
* When the on-screen logging is at about 4 seconds, it pauses for about five seconds. I see no correlation in the journal.
* Some problems with sdb during boot, but it works OK once running
* I am pretty sure I always unmount the USB stick with the KDE/Plasma system tray eject function, yet the log contains:
Code:
apr 29 09:25:08 silver kernel: FAT-fs (sdc1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
apr 29 09:25:08 silver udisksd[4654]: Mounted /dev/sdc1 at /run/media/fabian/OrangeFAT32 on behalf of uid 10704
apr 29 09:25:43 silver udisksd[4654]: Cleaning up mount point /run/media/fabian/OrangeFAT32 (device 8:33 is not mounted)

Re: System freezing now and then for 20..30 seconds

Postby doktor5000 » Apr 29th, '16, 21:37

morgano wrote:* I am pretty sure i always unmount the USB stick by the KDE/Plasma system tray eject function, yet there is in the log:
Code:
apr 29 09:25:08 silver kernel: FAT-fs (sdc1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
apr 29 09:25:08 silver udisksd[4654]: Mounted /dev/sdc1 at /run/media/fabian/OrangeFAT32 on behalf of uid 10704
apr 29 09:25:43 silver udisksd[4654]: Cleaning up mount point /run/media/fabian/OrangeFAT32 (device 8:33 is not mounted)


Well, simply run an fsck when it's not mounted yet. Maybe it has been unplugged by someone else?
If you connected it to a Windows box, it would offer to run a filesystem check; Linux notices it too, but only tells you about it in the log.
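A sketch of that check which refuses to touch the stick while it is still mounted: it looks the device up in /proc/mounts and, only if it is not listed, runs `fsck.vfat` from dosfstools in check-only mode (`-n`); `-a` would repair automatically instead. The helper names are hypothetical, and the device name has to be verified (e.g. with `lsblk`) before repairing anything.

```shell
# Hypothetical helper: succeed if DEVICE is listed as a mount source.
# Usage: is_mounted /dev/sdc1 [MOUNTS_FILE]
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit (found ? 0 : 1) }' \
        "${2:-/proc/mounts}"
}

# Hypothetical helper: check a FAT filesystem only if it is unmounted.
check_fat() {
    if is_mounted "$1"; then
        echo "$1 is mounted; unmount it first" >&2
        return 1
    fi
    fsck.vfat -n "$1"    # -n: report problems, write nothing
}
```

Usage (as root): `check_fat /dev/sdc1`.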

Re: [SOLVED] System freezing now and then for 20..30 seconds

Postby morgano » May 1st, '16, 10:44

Yes, it seems it was. I forgot that my son uses that stick too; I will tell him.
fsck found the dirty bit set, an orphaned name, and a wrong free cluster summary (fixed now); nothing else in the filesystem needed changing.