EXT4-fs errors after upgrading

Postby RagingRaven » Feb 18th, '22, 13:12

I recently upgraded to Mageia 8 and have since been getting multiple EXT4-fs errors:

EXT4-fs error (device md3): htree_dirblock_to_tree:1028: inode #213779159: comm smbd: Directory block failed checksum
EXT4-fs error (device md3): htree_dirblock_to_tree:1028: inode #213779159: comm smbd: Directory block failed checksum
EXT4-fs error (device md3): __ext4_find_entry:1623: inode #213779159: comm smbd: checksumming directory block 0
EXT4-fs error (device md3): __ext4_find_entry:1623: inode #213779159: comm smbd: checksumming directory block 0


These lines then repeat a couple of times.
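
To see at a glance whether the errors are spread over the filesystem or concentrated in one place, the log lines can be grouped by device, inode and process. The following is only a minimal sketch: the regular expression is modelled on the four lines quoted above (other kernel versions may format these messages differently), and the function name `summarize_ext4_errors` is made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the log lines quoted in this post (an assumption,
# not a guaranteed kernel log format).
EXT4_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\): "
    r"(?P<function>\w+):(?P<line>\d+): "
    r"inode #(?P<inode>\d+): comm (?P<comm>\S+): (?P<message>.*)"
)


def summarize_ext4_errors(log_text):
    """Count EXT4-fs errors per (device, inode, comm, message)."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = EXT4_ERROR.search(raw)
        if match:
            key = (
                match["device"],
                int(match["inode"]),
                match["comm"],
                match["message"].strip(),
            )
            counts[key] += 1
    return counts


# The four lines from the post, used as sample input.
SAMPLE = """\
EXT4-fs error (device md3): htree_dirblock_to_tree:1028: inode #213779159: comm smbd: Directory block failed checksum
EXT4-fs error (device md3): htree_dirblock_to_tree:1028: inode #213779159: comm smbd: Directory block failed checksum
EXT4-fs error (device md3): __ext4_find_entry:1623: inode #213779159: comm smbd: checksumming directory block 0
EXT4-fs error (device md3): __ext4_find_entry:1623: inode #213779159: comm smbd: checksumming directory block 0
"""

if __name__ == "__main__":
    for (device, inode, comm, message), n in summarize_ext4_errors(SAMPLE).items():
        print(f"{device} inode {inode} ({comm}): {message} x{n}")
```

For the lines above, every entry refers to the same directory inode (#213779159) on md3 and to the smbd process, which suggests a single affected directory rather than widespread damage.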

From what I've read online, it could be due to a known issue with mdadm? Apparently only the metadata is corrupt, not the data?
However, I have no idea what to do to fix this.

Re: EXT4-fs errors after upgrading

Postby doktor5000 » Feb 18th, '22, 17:00

Well, do you actually use mdadm or any other form of software or hardware RAID? Some more context would also help: which filesystem this appears on, which device/partition it is located on, whether you have already run fsck on that filesystem and with what result, and how old the underlying disks are.
Also, do you run a Samba server on top of that filesystem?

It would also be helpful if you could post the smartctl -a output for the disk(s) underlying that filesystem.

Re: EXT4-fs errors after upgrading

Postby RagingRaven » Feb 19th, '22, 10:30

I do indeed use mdadm for software RAID on multiple partitions.
Here is the output of cat /proc/mdstat:

Code:
Personalities : [raid6] [raid5] [raid4]
md3 : active raid5 sdc7[2] sde7[5] sdd7[3] sda7[0] sdb7[1]
      3885397504 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
      bitmap: 4/8 pages [16KB], 65536KB chunk

md2 : active raid5 sdc6[2] sda6[0] sde6[5] sdd6[3] sdb6[1]
      4193536 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]

md1 : active raid5 sdd5[3] sda5[0] sdc5[2] sde5[5] sdb5[1]
      12641024 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]

unused devices: <none>


md1 (/) = ext4
md2 (swap) = swap
md3 (/var) = ext4
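
For anyone who wants to check this kind of output automatically, here is a rough sketch that parses /proc/mdstat text and reports whether any array is degraded. It assumes the layout shown above (an "mdN : active <level> <members>" header followed by a status line ending in "[n/m] [UUUUU]"); inactive arrays and other layouts are not handled, and the function name `parse_mdstat` is just illustrative.

```python
import re

# Header such as: "md3 : active raid5 sdc7[2] sde7[5] ..."
ARRAY_HEADER = re.compile(
    r"^(?P<name>md\d+) : (?P<state>\S+) (?P<level>\S+) (?P<members>.+)$"
)
# Status such as: "[5/5] [UUUUU]"; "_" marks a missing member.
STATUS = re.compile(r"\[(?P<total>\d+)/(?P<active>\d+)\] \[(?P<flags>[U_]+)\]")


def parse_mdstat(text):
    """Return {array_name: info} for each md array found in mdstat text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = ARRAY_HEADER.match(line)
        if header:
            current = header["name"]
            arrays[current] = {
                "state": header["state"],
                "level": header["level"],
                "members": header["members"].split(),
                "degraded": None,  # unknown until a status line is seen
            }
            continue
        status = STATUS.search(line)
        if status and current:
            total = int(status["total"])
            active = int(status["active"])
            arrays[current]["total"] = total
            arrays[current]["active"] = active
            arrays[current]["degraded"] = "_" in status["flags"] or active < total
    return arrays


# Sample input copied from the output above.
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md3 : active raid5 sdc7[2] sde7[5] sdd7[3] sda7[0] sdb7[1]
      3885397504 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
      bitmap: 4/8 pages [16KB], 65536KB chunk

md2 : active raid5 sdc6[2] sda6[0] sde6[5] sdd6[3] sdb6[1]
      4193536 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]

md1 : active raid5 sdd5[3] sda5[0] sdc5[2] sde5[5] sdb5[1]
      12641024 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]

unused devices: <none>
"""

if __name__ == "__main__":
    for name, info in sorted(parse_mdstat(SAMPLE).items()):
        state = "DEGRADED" if info["degraded"] else "ok"
        print(f"{name}: {info['level']} {info['active']}/{info['total']} {state}")
```

On a live system the text could be read from /proc/mdstat instead of SAMPLE. Note that "[5/5] [UUUUU]" only says that all members are present; it says nothing about whether the filesystem on top of the array is consistent.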

The disks are a few years old, enterprise-level WD disks, and have all passed the latest S.M.A.R.T. health check (smartctl /dev/sd[a-e] -H).
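
Since the -H result is only a summary, it can help to look at a few raw attribute values from the smartctl -a attribute table as well. Below is a small sketch that does this. The list of "watched" attributes is my own assumption about which counters are commonly considered interesting, not an official rule, and the helper names are made up. The sample rows are taken from the sda output in this post.

```python
# Attributes picked for illustration only (an assumption, not a standard).
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
)


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from a smartctl -a attribute table."""
    attributes = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attributes[fields[1]] = int(fields[9])
            except ValueError:
                continue  # skip raw values that are not plain integers
    return attributes


def nonzero_watched(attributes):
    """Return the watched attributes whose raw value is above zero."""
    return {name: attributes[name] for name in WATCHED if attributes.get(name, 0) > 0}


# A few rows copied from the sda attribute table.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       22615
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1
"""

if __name__ == "__main__":
    attrs = parse_smart_attributes(SAMPLE)
    flagged = nonzero_watched(attrs)
    print("watched attributes above zero:", flagged or "none")
```

For the sample rows, none of the watched counters is above zero, which matches the PASSED result.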

I tried fsck on /dev/md3, but it can't run while the filesystem is mounted, and I'm not sure whether I can/should just unmount an mdadm RAID device without issues.

I am running Samba on the device and it seems to work without issue.

I'll post smartctl -a output of all drives below:

Code (sda):
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.18-desktop-2.mga8] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, http://www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital RE4
Device Model:     WDC WD1003FBYZ-010FB0
Serial Number:    WD-WCAW37704250
LU WWN Device Id: 5 0014ee 20a484864
Firmware Version: 01.01V03
User Capacity:    1.000.204.886.016 bytes [1,00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Feb 19 09:22:54 2022 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (16800) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 165) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   174   172   021    Pre-fail  Always       -       4266
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       138
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       22615
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       138
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       28
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       129
194 Temperature_Celsius     0x0022   120   098   000    Old_age   Always       -       27
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     22477         -
# 2  Extended offline    Completed without error       00%     22411         -
# 3  Extended offline    Completed without error       00%     22312         -
# 4  Extended offline    Completed without error       00%     22190         -
# 5  Extended offline    Completed without error       00%     22023         -
# 6  Extended offline    Completed without error       00%     21855         -
# 7  Extended offline    Completed without error       00%     21687         -
# 8  Extended offline    Completed without error       00%     21519         -
# 9  Extended offline    Completed without error       00%     21353         -
#10  Extended offline    Completed without error       00%     21186         -
#11  Extended offline    Completed without error       00%     21018         -
#12  Extended offline    Completed without error       00%     20850         -
#13  Extended offline    Completed without error       00%     20682         -
#14  Extended offline    Completed without error       00%     20436         -
#15  Extended offline    Completed without error       00%     20184         -
#16  Extended offline    Completed without error       00%     20114         -
#17  Extended offline    Completed without error       00%     19946         -
#18  Extended offline    Completed without error       00%     19778         -
#19  Extended offline    Completed without error       00%     19611         -
#20  Extended offline    Completed without error       00%     19443         -
#21  Extended offline    Completed without error       00%     19274         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Code (sdb):
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.18-desktop-2.mga8] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, http://www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital RE4
Device Model:     WDC WD1003FBYZ-010FB0
Serial Number:    WD-WCAW3RX9PLRX
LU WWN Device Id: 5 0014ee 20aa1db49
Firmware Version: 01.01V03
User Capacity:    1.000.204.886.016 bytes [1,00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Feb 19 09:26:14 2022 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (16860) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 166) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   173   169   021    Pre-fail  Always       -       4350
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       138
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       22615
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       138
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       27
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       132
194 Temperature_Celsius     0x0022   119   095   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       19

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     22477         -
# 2  Extended offline    Completed without error       00%     22411         -
# 3  Extended offline    Completed without error       00%     22312         -
# 4  Extended offline    Completed without error       00%     22190         -
# 5  Extended offline    Completed without error       00%     22022         -
# 6  Extended offline    Completed without error       00%     21855         -
# 7  Extended offline    Completed without error       00%     21687         -
# 8  Extended offline    Completed without error       00%     21519         -
# 9  Extended offline    Completed without error       00%     21353         -
#10  Extended offline    Completed without error       00%     21186         -
#11  Extended offline    Completed without error       00%     21018         -
#12  Extended offline    Completed without error       00%     20850         -
#13  Extended offline    Completed without error       00%     20682         -
#14  Extended offline    Completed without error       00%     20436         -
#15  Extended offline    Completed without error       00%     20184         -
#16  Extended offline    Completed without error       00%     20114         -
#17  Extended offline    Completed without error       00%     19946         -
#18  Extended offline    Completed without error       00%     19778         -
#19  Extended offline    Completed without error       00%     19611         -
#20  Extended offline    Completed without error       00%     19443         -
#21  Extended offline    Completed without error       00%     19274         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Code (sdc):
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.18-desktop-2.mga8] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, http://www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital RE4
Device Model:     WDC WD1003FBYZ-010FB0
Serial Number:    WD-WCAW3RX9PNK8
LU WWN Device Id: 5 0014ee 20aa194b6
Firmware Version: 01.01V03
User Capacity:    1.000.204.886.016 bytes [1,00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Feb 19 09:27:00 2022 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (16800) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 165) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   173   170   021    Pre-fail  Always       -       4333
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       138
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       22615
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       138
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       27
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       129
194 Temperature_Celsius     0x0022   118   095   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     22476         -
# 2  Extended offline    Completed without error       00%     22411         -
# 3  Extended offline    Completed without error       00%     22312         -
# 4  Extended offline    Completed without error       00%     22190         -
# 5  Extended offline    Completed without error       00%     22022         -
# 6  Extended offline    Completed without error       00%     21855         -
# 7  Extended offline    Completed without error       00%     21687         -
# 8  Extended offline    Completed without error       00%     21519         -
# 9  Extended offline    Completed without error       00%     21353         -
#10  Extended offline    Completed without error       00%     21185         -
#11  Extended offline    Completed without error       00%     21018         -
#12  Extended offline    Completed without error       00%     20850         -
#13  Extended offline    Completed without error       00%     20682         -
#14  Extended offline    Completed without error       00%     20436         -
#15  Extended offline    Completed without error       00%     20184         -
#16  Extended offline    Completed without error       00%     20114         -
#17  Extended offline    Completed without error       00%     19946         -
#18  Extended offline    Completed without error       00%     19778         -
#19  Extended offline    Completed without error       00%     19610         -
#20  Extended offline    Completed without error       00%     19443         -
#21  Extended offline    Completed without error       00%     19274         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Code (sdd):
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.18-desktop-2.mga8] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, http://www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital RE4
Device Model:     WDC WD1003FBYZ-010FB0
Serial Number:    WD-WCAW3RX9P46V
LU WWN Device Id: 5 0014ee 2b54cef0f
Firmware Version: 01.01V03
User Capacity:    1.000.204.886.016 bytes [1,00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Feb 19 09:27:47 2022 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (16260) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 160) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   178   174   021    Pre-fail  Always       -       4091
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       138
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       22615
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       138
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       27
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       128
194 Temperature_Celsius     0x0022   117   095   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     22477         -
# 2  Extended offline    Completed without error       00%     22411         -
# 3  Extended offline    Completed without error       00%     22312         -
# 4  Extended offline    Completed without error       00%     22190         -
# 5  Extended offline    Completed without error       00%     22022         -
# 6  Extended offline    Completed without error       00%     21855         -
# 7  Extended offline    Completed without error       00%     21687         -
# 8  Extended offline    Completed without error       00%     21519         -
# 9  Extended offline    Completed without error       00%     21353         -
#10  Extended offline    Completed without error       00%     21186         -
#11  Extended offline    Completed without error       00%     21018         -
#12  Extended offline    Completed without error       00%     20850         -
#13  Extended offline    Completed without error       00%     20682         -
#14  Extended offline    Completed without error       00%     20436         -
#15  Extended offline    Completed without error       00%     20184         -
#16  Extended offline    Completed without error       00%     20114         -
#17  Extended offline    Completed without error       00%     19946         -
#18  Extended offline    Completed without error       00%     19778         -
#19  Extended offline    Completed without error       00%     19611         -
#20  Extended offline    Completed without error       00%     19443         -
#21  Extended offline    Completed without error       00%     19274         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Code (sde):
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.18-desktop-2.mga8] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, http://www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital RE4
Device Model:     WDC WD1003FBYZ-010FB0
Serial Number:    WD-WCAW3RX9PLAP
LU WWN Device Id: 5 0014ee 20aa1d66d
Firmware Version: 01.01V03
User Capacity:    1.000.204.886.016 bytes [1,00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Feb 19 09:28:37 2022 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (16800) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 165) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   177   172   021    Pre-fail  Always       -       4141
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       138
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       22615
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       138
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       27
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       128
194 Temperature_Celsius     0x0022   118   097   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     22477         -
# 2  Extended offline    Completed without error       00%     22411         -
# 3  Extended offline    Completed without error       00%     22312         -
# 4  Extended offline    Completed without error       00%     22190         -
# 5  Extended offline    Completed without error       00%     22022         -
# 6  Extended offline    Completed without error       00%     21855         -
# 7  Extended offline    Completed without error       00%     21687         -
# 8  Extended offline    Completed without error       00%     21519         -
# 9  Extended offline    Completed without error       00%     21353         -
#10  Extended offline    Completed without error       00%     21186         -
#11  Extended offline    Completed without error       00%     21018         -
#12  Extended offline    Completed without error       00%     20850         -
#13  Extended offline    Completed without error       00%     20682         -
#14  Extended offline    Completed without error       00%     20436         -
#15  Extended offline    Completed without error       00%     20184         -
#16  Extended offline    Completed without error       00%     20114         -
#17  Extended offline    Completed without error       00%     19946         -
#18  Extended offline    Completed without error       00%     19778         -
#19  Extended offline    Completed without error       00%     19611         -
#20  Extended offline    Completed without error       00%     19443         -
#21  Extended offline    Completed without error       00%     19274         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Last edited by isadora on Feb 19th, '22, 12:24, edited 1 time in total.
Reason: Please place command-output between [CODE]-tags, to improve readability, thanks ahead!!! ;)
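
With several disks' worth of output, the handful of attributes that usually matter for failing media or cabling are easy to miss. A minimal sketch, assuming the attribute table layout shown above (ten columns, raw value last); the function name `smart_key_attrs` is made up for illustration, and the choice of IDs 5, 197, 198 and 199 is a common rule of thumb, not something smartctl itself prescribes:

```shell
# Read `smartctl -A` (or `smartctl -a`) output on stdin and print the raw
# values of the attributes most often tied to bad sectors or bad cabling.
# Hypothetical helper; assumes the 10-column attribute table format.
smart_key_attrs() {
    awk '$1 ~ /^(5|197|198|199)$/ && NF >= 10 {
        # Column 10 is RAW_VALUE; anything other than 0 deserves a closer look.
        status = ($10 == 0) ? "ok" : "CHECK"
        printf "%s %s raw=%s %s\n", $1, $2, $10, status
    }'
}

# Typical use as root:
#   smartctl -A /dev/sde | smart_key_attrs
```

The `NF >= 10` guard keeps the selective self-test rows (which also start with a bare number) out of the result.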

Re: EXT4-fs errors after upgrading

Postby doktor5000 » Feb 21st, '22, 15:18

One slightly similar issue with ext4 corruption on top of mdadm, although RAID1, was https://bbs.archlinux.org/viewtopic.php ... 1#p1979271
which links to https://wiki.archlinux.org/title/RAID#Scrubbing which is what I'd recommend.

Apart from that, from the log messages you posted, this only occurs on md3 which is /var, correct?
You can also check the logs easily as root via e.g.
Code: Select all
journalctl -a|grep -iE "EXT4-fs error"
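
To see at a glance which device, inode and process the errors cluster on, the matching lines can be condensed into counts. A rough sketch, assuming the message format quoted in the first post; `summarize_ext4_errors` is a made-up helper name, and messages that carry no inode or comm field are simply skipped:

```shell
# Read log lines on stdin and count EXT4-fs errors per device, inode and
# process name. Hypothetical helper; assumes messages of the form
#   EXT4-fs error (device md3): <func>:<line>: inode #<n>: comm <name>: <text>
summarize_ext4_errors() {
    grep -E 'EXT4-fs error' |
        sed -nE 's/.*\(device ([^)]+)\).*inode #([0-9]+): comm ([^:]+):.*/\1 \2 \3/p' |
        sort | uniq -c | sort -rn
}

# Typical use as root (kernel messages only):
#   journalctl -k | summarize_ext4_errors
```

Each output line is a count followed by device, inode number and command name, so a single inode hit repeatedly by smbd stands out immediately.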

Re: EXT4-fs errors after upgrading

Postby RagingRaven » Feb 21st, '22, 19:05

Correct, I've only seen the error on md3 (/var).
One thing I have noticed, though, is that the error only seems to occur when transferring files using samba, so there seems to be a relation (which the error message also seems to suggest).

I'll have a look at the RAID scrubbing, because that seems like it could be a solution for corrupt metadata.
Will be tomorrow though, workday is done here ;) I'll report back on the findings.

Re: EXT4-fs errors after upgrading

Postby doktor5000 » Feb 22nd, '22, 14:57

RagingRaven wrote:Correct, I've only seen the error on md3 (/var).
One thing I have noticed, though, is that the error only seems to occur when transferring files using samba, so there seems to be a relation (which the error message also seems to suggest).

From what I remember, /var contains the session logs for samba, although that shouldn't be an issue for the RAID (writing/updating several smaller logfiles frequently).

Re: EXT4-fs errors after upgrading

Postby RagingRaven » Feb 23rd, '22, 10:58

Took a bit longer to get back to this, but I've just initiated a data scrub, so we'll see how that goes.

As for /var, we actually have samba data shares on /var/samba/data/ and /var/www/html/, so it's not strange to me that the errors refer to md3.
I'm also pretty sure there is a connection with those data shares, because I haven't done any data transfers in the last few days and haven't had any more errors so far.
So the errors only seem to pop up when I'm working with data through samba on those shares.

After the data scrub is done, I'll do some more testing and report back.

Edit: scrubbing has finished. It doesn't really show if anything was found, so I guess I'll just start testing by copying/moving files.
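
A finished scrub does not print a summary; the md driver exposes the result through sysfs instead. A minimal sketch, assuming the array is md3 as in the error messages; the function name `md_scrub_report` and its optional second argument (an alternative sysfs root, useful only for dry runs) are made up for illustration:

```shell
# Print the current scrub state and mismatch count of one md array.
# $1 = array name (e.g. md3)
# $2 = sysfs root, defaults to /sys (hypothetical knob for testing)
md_scrub_report() {
    md_dir="${2:-/sys}/block/$1/md"
    if [ ! -r "$md_dir/sync_action" ]; then
        echo "$1: no md sysfs entry under $md_dir" >&2
        return 1
    fi
    # sync_action reads "idle" once a check/repair has finished.
    action=$(cat "$md_dir/sync_action")
    # mismatch_cnt is the number of sectors found inconsistent by the last check.
    mismatches=$(cat "$md_dir/mismatch_cnt")
    echo "$1: sync_action=$action mismatch_cnt=$mismatches"
}

# Typical use as root:
#   echo check > /sys/block/md3/md/sync_action   # start a read-only scrub
#   cat /proc/mdstat                             # watch progress
#   md_scrub_report md3                          # read the result afterwards
```

A `check` pass only counts mismatches, while writing `repair` to the same file rewrites them. How much weight a non-zero count deserves depends on the RAID level, which the thread has not stated.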

