Thanks for all the hints.
I took logs yesterday and am now reading them offline; that computer is inaccessible at the moment.
lspcidrake -v
journalctl -ab (freeze at 19:22:33 to :44)
Additional info
§ When the system freezes, the audio sometimes does not freeze immediately, but only after a few seconds (buffered somehow).
§ Freezes also happen after very little and simple work, and there is 16GB RAM, so there is no need to swap and very little data to write to disk.
§ This system has had its mainboard and CPU replaced without reinstalling the system.
Before: two dual Opterons on a server-grade mainboard (my workstation, built 2006); now a hexacore AMD on a cheap mainboard (most bang for a reasonable buck).
As it just continued working, we praised the compatibility and let it be (and it runs several times faster while consuming much less power).
The freezing problem was not there initially but has been going on for a couple of weeks.
I cannot say it started when we changed something. So I was hoping it would go away again...
§ The audio is from an external USB dongle, simply because the old mainboard had broken audio and the dongle just continued to work.
(Some time I will try to get onboard audio to work; the first attempt strangely failed, but that is another issue.)
§ Graphics: we have rotated three Nvidia cards and one AMD card between the family's computers; this one ended up with an Nvidia GTX760.
§ Sleep and hibernate both work.
§ Drives: we use the BIOS menu to choose which drive to boot. When Mageia runs it sees:
sda: an 80GB Intel X25 SSD on which mga5 runs entirely, partitions in LVM
sdb: a mechanical drive on which SteamOS is installed. SteamOS / and home are mounted in /mnt
Reading the log
It seems I was wrong when I said I could not see problems in the log. Now I see:
1) BIOS bug warning:
kernel: ACPI BIOS Warning (bug): Optional FADT field Pm2ControlBlock has zero address or length: 0x0000000000000000/0x1 (20150410/tbfadt-654)
No idea if I should worry.
2) Wrong order of loading modules? I spotted this line in the log:
kernel: Warning! ehci_hcd should always be loaded before uhci_hcd and ohci_hcd, not after
What can I do about that?
3) The SteamOS installer seems to have set something wrong on its /home partition (Mageia mounts it in /mnt):
kernel: EXT4-fs (sdb3): mounted filesystem with ordered data mode. Opts: (null)
kernel: usb 7-2: Warning! Unlikely big volume range (=4096), cval->res is probably wrong.
I do not know whether I should try to fix it or let it be...
4) festival segfaults:
kernel: sd_festival[3188]: segfault at 2d0 ip 00007f09d5956860 sp 00007fff80728468 error 4 in libpthread-2.20.so[7f09d5949000+17000]
apr 24 19:06:47 silver systemd[1]: speech-dispatcherd.service: main process exited, code=exited, status=1/FAILURE
festival does not seem to be needed...
5) Lots of "WARN Successful completion on short TX":
kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
From a quick internet search, that should point to a USB3 device?
I need to check which ports are used for what...
What is the "XHCI_TRUST_TX_LENGTH quirk" anyway and how do I apply it?
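To check which devices actually hang off the controller named in the warning (0000:03:00.0), the sysfs tree below that PCI device can be walked. This is only a minimal sketch, assuming the usual Linux sysfs layout where every USB device directory has an idVendor file; the function name usb_devices_under is made up for illustration.

```python
import os

def usb_devices_under(pci_dir):
    """List USB devices found below a PCI controller's sysfs directory.

    Returns sorted (sysfs name, "vendor:product" id, product string) tuples.
    Assumption: a directory holding an idVendor file is a USB device.
    """
    found = []
    for root, dirs, files in os.walk(pci_dir):
        if "idVendor" not in files:
            continue  # not a USB device directory (interface, endpoint, ...)

        def read(name):
            # Some attributes (e.g. product) are optional; fall back to "?".
            try:
                with open(os.path.join(root, name)) as fh:
                    return fh.read().strip()
            except OSError:
                return "?"

        ident = read("idVendor") + ":" + read("idProduct")
        found.append((os.path.basename(root), ident, read("product")))
    return sorted(found)

if __name__ == "__main__":
    # PCI address taken from the xhci_hcd warning in the log.
    for name, ident, product in usb_devices_under("/sys/bus/pci/devices/0000:03:00.0"):
        print(name, ident, product)
```

The sysfs names (such as 7-2) are bus-port paths, so they can be compared with the "usb 7-2" style prefixes that appear in kernel messages.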
6) Stupid SMART-incompatible drive? <slaps face>
apr 24 19:06:25 silver smartd[2897]: Device: /dev/sda [SAT], found in smartd database: Intel X18-M/X25-M/X25-V G2 SSDs
apr 24 19:06:25 silver smartd[2897]: Device: /dev/sda [SAT], WARNING: This drive may require a firmware update to
apr 24 19:06:25 silver smartd[2897]: fix possible drive hangs when reading SMART self-test log:
apr 24 19:06:25 silver smartd[2897]: http://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=18363
apr 24 19:06:25 silver smartd[2897]: Device: /dev/sda [SAT], enabled SMART Attribute Autosave.
apr 24 19:06:25 silver smartd[2897]: Device: /dev/sda [SAT], can't monitor Current_Pending_Sector count - no Attribute 197
apr 24 19:06:25 silver smartd[2897]: Device: /dev/sda [SAT], can't monitor Offline_Uncorrectable count - no Attribute 198
apr 24 19:06:25 silver smartd[2897]: Device: /dev/sda [SAT], SMART Automatic Offline Testing unsupported...
apr 24 19:06:25 silver smartd[2897]: Device: /dev/sda [SAT], enabled SMART Automatic Offline Testing.
apr 24 19:06:25 silver acpid[2980]: starting up with netlink and the input layer
apr 24 19:06:46 silver kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
apr 24 19:06:46 silver kernel: ata1.00: failed command: SMART
apr 24 19:06:46 silver kernel: ata1.00: cmd b0/d5:01:06:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
apr 24 19:06:46 silver kernel: ata1.00: status: { DRDY }
apr 24 19:06:46 silver kernel: ata1: hard resetting link
And yes: booting pauses for about 20 seconds, about 10 seconds into the boot; around 19:06:25 in this case.
I think I have updated the firmware, but that was ages ago in the laptop the drive served in then. I need to check which version I have...
Yes... I know we installed smart just a month or so ago (in order to check the drive we then installed SteamOS on...)
Quick "fix": I will uninstall smart.
7) Color manager problem:
apr 24 19:06:54 silver systemd[1]: Started Manage, Install and Generate Color Profiles.
apr 24 19:06:54 silver colord[4227]: Profile added: canon-silver-Gray..
apr 24 19:06:54 silver colord[4227]: Profile added: canon-silver-CMYK..
apr 24 19:06:54 silver colord[4227]: (colord:4227): Cd-WARNING **: failed to get session [pid 4158]: Unknown error -2
What happened when it froze
There was a freeze from 19:22:33 to :44. There are no lines in the log for that period, but a few seconds later it reports that SMART failed and hard resets ata1.
Here I am a bit confused. ata1 = sda? That is the Intel SSD on which Mageia runs!
Even later it timed out waiting to mount /mnt/sdb1 and /mnt/sdb2, which is where we reach the SteamOS system and home.
Weird, we have not noticed that malfunctioning (we check logs and load game extensions that way into Steam on SteamOS from Mageia).
sdb is an older mechanical disk where SteamOS is installed, while mga5 is entirely on sda, which is an SSD.
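On the "ata1 = sda?" question: on reasonably recent kernels the resolved sysfs path of a disk contains the ATA port name, so the mapping can be read out rather than guessed. A minimal sketch, assuming that path layout; ata_port_of and map_disks are names invented here.

```python
import os
import re

def ata_port_of(sysfs_device_path):
    """Return the ATA port number embedded in a resolved sysfs path, or None."""
    m = re.search(r"/ata(\d+)/", sysfs_device_path)
    return int(m.group(1)) if m else None

def map_disks(sys_block="/sys/block"):
    """Map each sdX name to its ATA port number (None for e.g. USB disks)."""
    mapping = {}
    if not os.path.isdir(sys_block):
        return mapping
    for name in sorted(os.listdir(sys_block)):
        if not name.startswith("sd"):
            continue
        # /sys/block/sdX is a symlink; the real path shows the controller chain.
        real = os.path.realpath(os.path.join(sys_block, name))
        mapping[name] = ata_port_of(real)
    return mapping

if __name__ == "__main__":
    for disk, port in map_disks().items():
        print(disk, "->", "ata%d" % port if port is not None else "no ATA port in path")
```

If sda comes out as port 1, the "ata1: hard resetting link" lines in the log do refer to the SSD.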
apr 24 19:22:25 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
apr 24 19:22:46 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
apr 24 19:22:46 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
- here i cut out many similar lines -
apr 24 19:23:06 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
apr 24 19:23:06 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
apr 24 19:23:06 silver kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
apr 24 19:23:06 silver kernel: ata1.00: failed command: SMART
apr 24 19:23:06 silver kernel: ata1.00: cmd b0/d5:01:06:4f:c2/00:00:00:00:00/00 tag 12 pio 512 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
apr 24 19:23:06 silver kernel: ata1.00: status: { DRDY }
apr 24 19:23:06 silver kernel: ata1: hard resetting link
apr 24 19:23:06 silver kernel: handle_tx_event: 934 callbacks suppressed
apr 24 19:23:06 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
apr 24 19:23:06 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
apr 24 19:23:06 silver kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
apr 24 19:23:06 silver kernel: ata1.00: configured for UDMA/133
apr 24 19:23:06 silver kernel: ata1: EH complete
apr 24 19:23:06 silver kernel: ata1.00: Enabling discard_zeroes_data
apr 24 19:23:06 silver kernel: handle_tx_event: 928 callbacks suppressed
apr 24 19:23:06 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
- here i cut out many similar lines -
apr 24 19:23:06 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
apr 24 19:23:06 silver kernel: ata1.00: NCQ disabled due to excessive errors
apr 24 19:23:06 silver kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
apr 24 19:23:06 silver kernel: ata1.00: failed command: SMART
apr 24 19:23:06 silver kernel: ata1.00: cmd b0/d5:01:09:4f:c2/00:00:00:00:00/00 tag 1 pio 512 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
apr 24 19:23:06 silver kernel: ata1.00: status: { DRDY }
apr 24 19:23:06 silver kernel: ata1: hard resetting link
apr 24 19:23:06 silver kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
apr 24 19:23:06 silver kernel: ata1.00: configured for UDMA/133
apr 24 19:23:06 silver kernel: ata1: EH complete
apr 24 19:23:06 silver kernel: ata1.00: Enabling discard_zeroes_data
apr 24 19:23:10 silver kernel: handle_tx_event: 932 callbacks suppressed
apr 24 19:23:10 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
--- here i cut out many similar lines ---
apr 24 19:23:10 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
apr 24 19:23:15 silver systemd[1]: Job dev-disk-by\x2duuid-f6c06b85\x2d3498\x2d435f\x2d934e\x2d17cc4fefe4bc.device/start timed out.
apr 24 19:23:15 silver systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-f6c06b85\x2d3498\x2d435f\x2d934e\x2d17cc4fefe4bc.device.
apr 24 19:23:15 silver systemd[1]: Dependency failed for /mnt/sdb2.
apr 24 19:23:15 silver systemd[1]: Job dev-disk-by\x2duuid-25121562\x2d1d6d\x2d41c2\x2da0da\x2de96341f81524.device/start timed out.
apr 24 19:23:15 silver systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-25121562\x2d1d6d\x2d41c2\x2da0da\x2de96341f81524.device.
apr 24 19:23:15 silver systemd[1]: Dependency failed for /mnt/sdb1.
apr 24 19:23:15 silver kernel: handle_tx_event: 934 callbacks suppressed
apr 24 19:23:15 silver kernel: xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
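Since a freeze shows up as a stretch with no log lines at all, the silent periods can be searched for mechanically instead of by eye. A rough sketch, assuming the "apr 24 19:22:25 host ..." timestamp layout seen above and that the log does not cross midnight; find_gaps and hms are hypothetical helper names.

```python
import re

# Matches the start of lines such as "apr 24 19:22:25 silver kernel: ..."
STAMP = re.compile(r"^\S+ +\d{1,2} (\d{2}):(\d{2}):(\d{2}) ")

def find_gaps(lines, min_gap=10):
    """Return (previous, current) times in seconds since midnight for
    every pair of consecutive stamped lines at least min_gap seconds apart."""
    gaps = []
    prev = None
    for line in lines:
        m = STAMP.match(line)
        if m is None:
            continue  # continuation lines ("res 40/00:...") carry no timestamp
        h, mi, s = (int(x) for x in m.groups())
        now = h * 3600 + mi * 60 + s
        if prev is not None and now - prev >= min_gap:
            gaps.append((prev, now))
        prev = now
    return gaps

def hms(seconds):
    """Format seconds since midnight as HH:MM:SS."""
    return "%02d:%02d:%02d" % (seconds // 3600, seconds % 3600 // 60, seconds % 60)
```

With the journal saved to a text file (the name journal.txt is just an example), `for a, b in find_gaps(open("journal.txt")): print(hms(a), "->", hms(b))` lists each silent stretch.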
As a quick test I will uninstall smart, unplug unnecessary USB devices, and report back tomorrow.
Mandriva since 2006, Mageia 2011 at home & work. Thinkpad T40, T43, T400, T510, Dell M4400, M6300, Acer Aspire 7. Workstation using LVM, LUKS, VirtualBox, BOINC