I/O error, dev sda, sector

I have a server at Hetzner (Debian Wheezy) with two 3 TB HDDs.
One of the disks fails the extended self-test (smartctl --test=long /dev/sdb).

The sectors are not being reallocated. I had the same problem before: I removed the drive from the array and filled it with zeros, after which attribute 198 Offline_Uncorrectable became 1. The drive then ran for more than three months and the tests passed normally. Then the tests started failing again, so I removed the drive again and zeroed it; 198 Offline_Uncorrectable went back to 0 and the drive worked normally for a month. Then I had to abort a test (Saturday was a working day and there were some slight slowdowns), so I started the test again in the evening, and it failed.

What bothers me is attribute 200 Multi_Zone_Error_Rate: its value changes from time to time.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   173   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       4
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       10232
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       4
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       3
194 Temperature_Celsius     0x0022   117   112   000    Old_age   Always       -       35
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     10222         -
# 2  Extended offline    Completed: read failure       90%     10219         34492202
# 3  Extended offline    Completed: read failure       90%     10209         34551059
# 4  Short offline       Completed without error       00%     10198         -
# 5  Extended offline    Completed: read failure       90%     10190         34551059
# 6  Extended offline    Aborted by host               20%     10182         -
# 7  Short offline       Completed without error       00%     10175         -
# 8  Short offline       Completed without error       00%     10151         -
# 9  Short offline       Completed without error       00%     10127         -
#10  Short offline       Completed without error       00%     10103         -
#11  Short offline       Completed without error       00%     10079         -
#12  Short offline       Completed without error       00%     10055         -
#13  Short offline       Completed without error       00%     10031         -
#14  Extended offline    Completed without error       00%     10021         -
#15  Short offline       Completed without error       00%     10007         -
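The self-test log can also be scanned programmatically to see which LBAs the extended tests keep stopping at. The minimal Python sketch below parses rows in the format shown above; the helper names are hypothetical, and lba_to_byte_offset assumes 512-byte logical sectors, which should be checked against the sector size reported by smartctl -i.

```python
import re
from typing import List, NamedTuple, Optional

# One row of a SMART self-test log, e.g.
# "# 2  Extended offline    Completed: read failure       90%     10219         34492202"
ROW = re.compile(
    r"^#\s*(\d+)\s+(\S+ (?:offline|captive))\s+(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


class SelfTest(NamedTuple):
    num: int
    description: str
    status: str
    remaining_pct: int
    lifetime_hours: int
    first_error_lba: Optional[int]  # None when the log shows "-"


def parse_selftest_log(text: str) -> List[SelfTest]:
    """Parse self-test rows; headers and other lines are skipped."""
    tests = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue
        lba = m.group(6)
        tests.append(SelfTest(
            num=int(m.group(1)),
            description=m.group(2),
            status=m.group(3),
            remaining_pct=int(m.group(4)),
            lifetime_hours=int(m.group(5)),
            first_error_lba=int(lba) if lba.isdigit() else None,
        ))
    return tests


def failing_lbas(tests: List[SelfTest]) -> List[int]:
    """Distinct LBAs at which a self-test stopped with an error, sorted."""
    return sorted({t.first_error_lba for t in tests if t.first_error_lba is not None})


def lba_to_byte_offset(lba: int, logical_sector_size: int = 512) -> int:
    """Byte offset of an LBA; 512 is an ASSUMED logical sector size."""
    return lba * logical_sector_size


if __name__ == "__main__":
    sample = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 2  Extended offline    Completed: read failure       90%     10219         34492202
# 3  Extended offline    Completed: read failure       90%     10209         34551059
# 4  Short offline       Completed without error       00%     10198         -
# 5  Extended offline    Completed: read failure       90%     10190         34551059
"""
    for lba in failing_lbas(parse_selftest_log(sample)):
        print(lba, lba_to_byte_offset(lba))
```

For this log it reports LBAs 34492202 and 34551059 (the latter failed twice). Assuming 512-byte sectors, those are byte offsets 17660007424 and 17690142208, roughly 30 MB apart, so the failures sit in one small region of the disk.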

The log from a month ago:

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   173   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       4
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       9857
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       4
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       3
194 Temperature_Celsius     0x0022   117   112   000    Old_age   Always       -       35
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      9853         -
# 2  Short offline       Completed without error       00%      9839         -
# 3  Short offline       Completed without error       00%      9815         -
# 4  Short offline       Completed without error       00%      9791         -
# 5  Extended offline    Completed without error       00%      9767         -
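To see at a glance which attributes moved between the two snapshots, a small Python sketch can parse the rows of smartctl -A output and compare raw values by attribute ID. The helper names are hypothetical, and the sketch only compares the RAW_VALUE strings as printed; it does not interpret them, since the meaning of raw values is vendor-specific.

```python
import re
from typing import Dict, List, Tuple

# One attribute row:
# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ATTR_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(.+?)\s*$"
)


def parse_attributes(text: str) -> Dict[int, Tuple[str, str]]:
    """Map attribute ID -> (name, raw value as printed)."""
    attrs = {}
    for line in text.splitlines():
        m = ATTR_ROW.match(line)
        if m:
            attrs[int(m.group(1))] = (m.group(2), m.group(10))
    return attrs


def changed_raw_values(old_text: str, new_text: str) -> List[Tuple[int, str, str, str]]:
    """(id, name, old raw, new raw) for every attribute whose raw value differs."""
    old, new = parse_attributes(old_text), parse_attributes(new_text)
    changes = []
    for attr_id in sorted(old.keys() & new.keys()):
        name, old_raw = old[attr_id]
        new_raw = new[attr_id][1]
        if old_raw != new_raw:
            changes.append((attr_id, name, old_raw, new_raw))
    return changes


if __name__ == "__main__":
    month_ago = """\
  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       9857
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
"""
    now = """\
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       10232
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2
"""
    for change in changed_raw_values(month_ago, now):
        print(change)
```

On the two full tables in this question, the only raw values that differ are Power_On_Hours (9857 to 10232) and Multi_Zone_Error_Rate (0 to 2).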

Could Hetzner replace this drive? Or is there some other solution?

I just ordered a new server with a 1 TB Samsung SSD and installed Ubuntu 14.04.5 LTS.

After booting into the newly installed system, I see the following in dmesg and /var/log/syslog. Output of grep error /var/log/syslog:

I am mostly concerned about these entries: blk_update_request: I/O error, dev sda, sector xxxxxxxxxxx
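Because badblocks and SMART both came back clean, one way to narrow this down is to check whether the kernel errors keep hitting the same sectors or are spread across the disk. The Python sketch below counts blk_update_request I/O error messages per device and sector; it assumes the message text looks like the one quoted above, the function name is hypothetical, and the sample lines are made up for illustration (the real sector numbers are not given in the question).

```python
import re
from collections import Counter

# Matches kernel messages like the one quoted in the question:
# "blk_update_request: I/O error, dev sda, sector <N>"
IO_ERROR = re.compile(r"blk_update_request: I/O error, dev (\w+), sector (\d+)")


def io_error_sectors(log_text: str) -> Counter:
    """Count I/O error messages per (device, sector) pair."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        m = IO_ERROR.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts


if __name__ == "__main__":
    # HYPOTHETICAL sample lines; feed in dmesg output or /var/log/syslog instead.
    sample = """\
[   12.345678] blk_update_request: I/O error, dev sda, sector 1000
[   12.456789] blk_update_request: I/O error, dev sda, sector 1000
[   13.000000] blk_update_request: I/O error, dev sda, sector 2048
"""
    for (dev, sector), n in sorted(io_error_sectors(sample).items()):
        print(f"{dev} sector {sector}: {n} time(s)")
```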

I ran badblocks -v /dev/sda, which returned no errors.

I then ran smartctl --all /dev/sda, which also returned no errors. See the output below; it includes a short self-test.

My question is simple: what do you think might be wrong? The SSD should be brand new. I can't in good conscience put this server into production with those errors in the logs, even though the box is otherwise behaving normally.

For the past few days, my computer has frozen whenever I sync my HDD with an external USB 3 drive. When the keyboard briefly becomes responsive, I can switch to another virtual terminal (Ctrl+Alt+F2), where I see the following repeating continuously. I am on Ubuntu 11 and use LuckyBackup.

What is going on? Is my hdd defective? Is it the external one? Something else?

You need to check whether your disks are actually failing. There are command-line tools for monitoring SMART data (the data a drive reports about its own health).

GSmartControl is a GUI version of the same tool and is very easy to use. Select the disk and run a short or long test, and look at any errors the drive is reporting.

To run it from the command line, do the following:

Make sure SMART is turned on. If not,

or, if it's a Serial ATA drive:
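The steps this answer describes (make sure SMART is on, run a short or long test, then read the results) can be sketched in Python as a list of smartctl invocations. The flags used are standard smartctl options (-s on, -t short or -t long, -l selftest, -a, -d TYPE); the helper names are hypothetical, and by default the sketch only prints the commands instead of running them. The -d option selects the device type (for example ata or sat) and is usually needed only when autodetection fails, such as behind some USB bridges. smartctl generally has to be run as root.

```python
import shlex
import subprocess
from typing import List, Optional


def smartctl_commands(device: str, test: str = "short",
                      device_type: Optional[str] = None) -> List[List[str]]:
    """Build smartctl invocations: enable SMART, start a self-test,
    read the self-test log, print the full report."""
    if test not in ("short", "long"):
        raise ValueError("test must be 'short' or 'long'")
    base = ["smartctl"]
    if device_type is not None:
        base += ["-d", device_type]  # e.g. "ata" or "sat"
    return [
        base + ["-s", "on", device],        # enable SMART on the drive
        base + ["-t", test, device],        # start the test; it runs inside the drive
        base + ["-l", "selftest", device],  # read the self-test log once it finishes
        base + ["-a", device],              # full SMART report
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # smartctl's exit status is a bit mask, so a non-zero code does
            # not necessarily mean the command failed; don't raise on it.
            subprocess.run(argv, check=False)


if __name__ == "__main__":
    run(smartctl_commands("/dev/sdX", test="long"))
```

A short test takes a few minutes; a long test can take hours on a large drive, and its result only appears in the self-test log after it completes.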

Your HDD definitely looks like it's reaching its end of life (EOL). To elaborate on Javier's comment, I would suggest booting an Ubuntu Live CD rather than using the Disk Utility in your current Ubuntu installation (for the obvious reason that the OS becomes unusable when you plug in your external HDD). From the live session, run the following command to check for disk errors:

where sdX is the device name of your external HDD. The external HDD should be plugged in but not mounted; you can unmount it with the Disk Utility. As Javier pointed out, you can also use the Disk Utility to check the drive's SMART status if it supports SMART and has it enabled.
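Before running the check, it helps to confirm that neither the disk nor any of its partitions is still mounted. The minimal Python sketch below looks for /dev/sdX and its numbered partitions in the contents of /proc/mounts; it assumes the classic sdX naming (partitions sdX1, sdX2, and so on), the function names are hypothetical, and mount points containing spaces appear octal-escaped (\040) in /proc/mounts and are returned as-is.

```python
import re
from typing import List


def mounted_partitions(device: str, mounts_text: str) -> List[str]:
    """Mount points of /dev/<device> or its partitions (e.g. sdb, sdb1)."""
    pattern = re.compile(r"^/dev/" + re.escape(device) + r"\d*$")
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()  # device, mount point, fstype, options, dump, pass
        if len(fields) >= 2 and pattern.match(fields[0]):
            points.append(fields[1])
    return points


def read_proc_mounts(path: str = "/proc/mounts") -> str:
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    # HYPOTHETICAL /proc/mounts contents; use read_proc_mounts() on a real system.
    sample = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /media/backup ext4 rw,relatime 0 0
proc /proc proc rw 0 0
"""
    still_mounted = mounted_partitions("sdb", sample)
    print("unmount first:" if still_mounted else "not mounted", still_mounted)
```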

Again, do all of this from the Ubuntu Live CD. While you are at it, check your primary HDD for errors and overall health as well.