Summary
This post describes how to shorten the time and lessen the wear on the drives of a Synology NAS when replacing all the disks. My method delays the “reshaping” phase until the last drive has been inserted (from that point you can just let the NAS do its thing), by manually removing the new SHR-expansion partition that DSM creates each time a new disk is inserted.
Disk replacement in DS918
Time to fix this device: it is a long time since one of the drives in this unit failed (everything is backed up, so I could simply have replaced all the disks and started over with it).
This unit was built from the surviving 3TB drives of my DS1517 after two of them had failed, and I later replaced all the disks in that unit with 14TB ones to get more storage space. This is documented in Inner secrets of Synology Hybrid RAID (SHR) – Part 1.
I have written a short summary as a reply in this thread on Reddit: Replacing all drives with larger drives, should I expect it to progressively take longer for repairs with each new larger drive that is swapped in?
Another post on this topic in the Synology community forum: Replacing all Disks: Hot Swap & Rebuild or Recreate
Replacing the first drive (the failed one)
To make sure I correctly identified the drive that had to be replaced, I checked the logs, the RAID status and the disk status before pulling the drive. I already knew that /dev/sdd was the one that had failed, so I needed to find out which slot it was fitted in (as expected, the fourth slot, but this should always be checked):
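In practice the checks boil down to something like the following (a rough sketch; /dev/sdd is the suspect drive on this unit and will differ elsewhere):

dmesg | grep -i sdd          # kernel log entries for the suspect drive
cat /proc/mdstat             # current state of the md arrays
smartctl --all /dev/sdd      # identity and SMART health of the suspect drive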
dmesg output (filtered)
This confirms the problems with /dev/sdd:
[ 5.797428] sd 3:0:0:0: [sdd] 5860533168 512-byte logical blocks: (3.00 TB/2.73 TiB)
[ 5.797439] sd 3:0:0:0: [sdd] 4096-byte physical blocks
[ 5.797656] sd 3:0:0:0: [sdd] Write Protect is off
[ 5.797666] sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[ 5.797767] sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 5.869271] sdd: sdd1 sdd2 sdd5
[ 5.870466] sd 3:0:0:0: [sdd] Attached SCSI disk
[ 7.851964] md: invalid raid superblock magic on sdd5
[ 7.857051] md: sdd5 does not have a valid v0.90 superblock, not importing!
[ 7.857169] md: adding sdd1 ...
[ 7.857175] md: sdd2 has different UUID to sda1
[ 7.857205] md: bind
[ 7.857336] md: running:
[ 7.857368] md: kicking non-fresh sdd1 from array!
[ 7.857376] md: unbind
[ 7.862026] md: export_rdev(sdd1)
[ 7.890854] md: adding sdd2 ...
[ 7.893244] md: bind
[ 7.893365] md: running:
[ 33.692736] md: bind
[ 33.693189] md: kicking non-fresh sdd5 from array!
[ 33.693209] md: unbind
[ 33.696096] md: export_rdev(sdd5)
/proc/mdstat
The content of /proc/mdstat also confirms that /dev/sdd is no longer part of the main storage array (md2) or of md0 (the DSM system partition):
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sda5[0] sdc5[2] sdb5[1]
      8776306368 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

md1 : active raid1 sda2[0] sdb2[1] sdc2[2] sdd2[3]
      2097088 blocks [16/4] [UUUU____________]

md0 : active raid1 sda1[0] sdb1[1] sdc1[3]
      2490176 blocks [16/3] [UU_U____________]
As seen above, the last device of md2 is indicated as missing. The line “md2 : active raid5 sda5[0] sdc5[2] sdb5[1]” gives each device’s slot number in brackets, so the status “[UUU_]” translates to [sda5 sdb5 sdc5 -].
The same goes for the md0 status, where the order is different: “md0 : active raid1 sda1[0] sdb1[1] sdc1[3]” translates to [sda1 sdb1 – sdc1].
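If the compact /proc/mdstat notation feels error-prone, mdadm can print the slot-to-device mapping explicitly (a sketch, not part of the original session; the output lists each RAID slot together with its member device and marks missing slots as removed):

mdadm --detail /dev/md2
mdadm --detail /dev/md0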
smartctl output
I used smartctl to find out which physical drives are mapped to /dev/sd[a-d]:
root@DS918:~# smartctl --all /dev/sda
smartctl 6.5 (build date May 7 2020) [x86_64-linux-4.4.59+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:        HGST
Product:       HDN724030ALE640
...
Serial number: PK2238P3G3B8VJ

root@DS918:~# smartctl --all /dev/sdb
smartctl 6.5 (build date May 7 2020) [x86_64-linux-4.4.59+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:        HGST
Product:       HDN724030ALE640
...
Serial number: PK2234P9JGDEXY

root@DS918:~# smartctl --all /dev/sdc
smartctl 6.5 (build date May 7 2020) [x86_64-linux-4.4.59+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:        HGST
Product:       HDN724030ALE640
...
Serial number: PK2238P3G343GJ

root@DS918:~# smartctl --all /dev/sdd
smartctl 6.5 (build date May 7 2020) [x86_64-linux-4.4.59+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Seagate Barracuda 7200.14 (AF)
Device Model:  ST3000DM001-1ER166
Serial Number: W50090JM
As the fourth drive was the only Seagate, it was easy to shut down the unit and pick it out, but in the general case smartctl lets you identify each drive by matching the reported serial number against the label on the drive.
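A quick way to list the serial numbers for all bays in one go is a small loop over the devices (a hedged sketch; adjust the device range to the number of bays in your unit):

for d in /dev/sd[a-d]; do
    echo "== $d =="
    smartctl -i "$d" | grep -iE 'model|product|serial'
done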
The full smartctl output for the failed drive:
root@DS918:~# smartctl --all /dev/sdd smartctl 6.5 (build date May 7 2020) [x86_64-linux-4.4.59+] (local build) Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Seagate Barracuda 7200.14 (AF) Device Model: ST3000DM001-1ER166 Serial Number: W50090JM LU WWN Device Id: 5 000c50 07c46d0aa Firmware Version: CC43 User Capacity: 3,000,592,982,016 bytes [3.00 TB] Sector Sizes: 512 bytes logical, 4096 bytes physical Rotation Rate: 7200 rpm Form Factor: 3.5 inches Device is: In smartctl database [for details use: -P show] ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s) Local Time is: Fri May 2 15:45:31 2025 CEST SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 113) The previous self-test completed having the read element of the test failed. Total time to complete Offline data collection: ( 122) seconds. Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 1) minutes. Extended self-test routine recommended polling time: ( 332) minutes. Conveyance self-test routine recommended polling time: ( 2) minutes. SCT capabilities: (0x1085) SCT Status supported. 
SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 086 084 006 Pre-fail Always - 221839714 3 Spin_Up_Time 0x0003 092 092 000 Pre-fail Always - 0 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 84 5 Reallocated_Sector_Ct 0x0033 098 098 010 Pre-fail Always - 1968 7 Seek_Error_Rate 0x000f 090 060 030 Pre-fail Always - 998592914 9 Power_On_Hours 0x0032 051 051 000 Old_age Always - 43677 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 34 183 Runtime_Bad_Block 0x0032 099 099 000 Old_age Always - 1 184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0 187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 2714 188 Command_Timeout 0x0032 100 097 000 Old_age Always - 4 7 8 189 High_Fly_Writes 0x003a 099 099 000 Old_age Always - 1 190 Airflow_Temperature_Cel 0x0022 070 062 045 Old_age Always - 30 (Min/Max 27/38) 191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0 192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 8 193 Load_Cycle_Count 0x0032 065 065 000 Old_age Always - 70652 194 Temperature_Celsius 0x0022 030 040 000 Old_age Always - 30 (0 16 0 0 0) 197 Current_Pending_Sector 0x0012 001 001 000 Old_age Always - 49760 198 Offline_Uncorrectable 0x0010 001 001 000 Old_age Offline - 49760 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0 240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 38460h+46m+12.675s 241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 15195564747 242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 1092464909408 SMART Error Log Version: 1 ATA Error Count: 2713 (device log contains only the most recent five errors) CR = Command Register [HEX] FR = Features Register [HEX] SC = Sector Count Register [HEX] SN = Sector Number Register [HEX] CL = Cylinder Low Register [HEX] CH = Cylinder High Register [HEX] DH = Device/Head Register [HEX] DC = Device Command Register [HEX] ER = Error register [HEX] ST = Status register [HEX] Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days. Error 2713 occurred at disk power-on lifetime: 32907 hours (1371 days + 3 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 53 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 60 00 08 ff ff ff 4f 00 33d+13:49:54.056 READ FPDMA QUEUED ef 10 02 00 00 00 a0 00 33d+13:49:54.048 SET FEATURES [Enable SATA feature] 27 00 00 00 00 00 e0 00 33d+13:49:54.048 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3] ec 00 00 00 00 00 a0 00 33d+13:49:54.047 IDENTIFY DEVICE ef 03 46 00 00 00 a0 00 33d+13:49:54.047 SET FEATURES [Set transfer mode] Error 2712 occurred at disk power-on lifetime: 32907 hours (1371 days + 3 hours) When the command that caused the error occurred, the device was active or idle. 
After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 53 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 60 00 08 ff ff ff 4f 00 33d+13:49:49.959 READ FPDMA QUEUED 60 00 08 ff ff ff 4f 00 33d+13:49:49.958 READ FPDMA QUEUED ef 10 02 00 00 00 a0 00 33d+13:49:49.949 SET FEATURES [Enable SATA feature] 27 00 00 00 00 00 e0 00 33d+13:49:49.949 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3] ec 00 00 00 00 00 a0 00 33d+13:49:49.949 IDENTIFY DEVICE Error 2711 occurred at disk power-on lifetime: 32907 hours (1371 days + 3 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 53 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 60 00 08 ff ff ff 4f 00 33d+13:49:46.267 READ FPDMA QUEUED 60 00 08 ff ff ff 4f 00 33d+13:49:46.267 READ FPDMA QUEUED 60 00 08 ff ff ff 4f 00 33d+13:49:46.267 READ FPDMA QUEUED 60 00 08 ff ff ff 4f 00 33d+13:49:46.266 READ FPDMA QUEUED ef 10 02 00 00 00 a0 00 33d+13:49:46.258 SET FEATURES [Enable SATA feature] Error 2710 occurred at disk power-on lifetime: 32907 hours (1371 days + 3 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 53 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 60 00 08 ff ff ff 4f 00 33d+13:49:41.370 READ FPDMA QUEUED 60 00 08 ff ff ff 4f 00 33d+13:49:41.370 READ FPDMA QUEUED 60 00 08 ff ff ff 4f 00 33d+13:49:41.370 READ FPDMA QUEUED 60 00 08 ff ff ff 4f 00 33d+13:49:41.370 READ FPDMA QUEUED 60 00 08 ff ff ff 4f 00 33d+13:49:41.369 READ FPDMA QUEUED Error 2709 occurred at disk power-on lifetime: 32907 hours (1371 days + 3 hours) When the command that caused the error occurred, the device was active or idle. 
After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 53 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 60 00 08 ff ff ff 4f 00 33d+13:49:36.656 READ FPDMA QUEUED 60 00 08 ff ff ff 4f 00 33d+13:49:36.656 READ FPDMA QUEUED 60 00 08 ff ff ff 4f 00 33d+13:49:36.656 READ FPDMA QUEUED 60 00 08 ff ff ff 4f 00 33d+13:49:36.656 READ FPDMA QUEUED 60 00 08 ff ff ff 4f 00 33d+13:49:36.656 READ FPDMA QUEUED SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Short offline Completed: read failure 10% 43133 4084368632 # 2 Short offline Completed: read failure 10% 42391 4084368632 # 3 Short offline Completed: read failure 40% 41719 4084368632 # 4 Short offline Completed: read failure 10% 40975 4084368632 # 5 Short offline Completed: read failure 80% 40231 4084368632 # 6 Short offline Completed: read failure 10% 39511 4084368632 # 7 Short offline Completed: read failure 10% 38766 4084368632 # 8 Short offline Completed: read failure 10% 32938 4084368632 # 9 Short offline Completed without error 00% 32193 - #10 Short offline Completed without error 00% 31449 - #11 Short offline Completed without error 00% 30743 - #12 Short offline Completed without error 00% 29998 - #13 Short offline Completed without error 00% 29278 - #14 Short offline Completed without error 00% 28534 - #15 Short offline Completed without error 00% 27790 - #16 Short offline Completed without error 00% 27070 - #17 Short offline Completed without error 00% 26328 - #18 Short offline Completed without error 00% 25608 - #19 Short offline Completed without error 00% 24865 - #20 Short offline Completed without error 00% 24196 - #21 Short offline Completed without error 00% 23452 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay.
I powered down the unit (using the DSM UI), identified and removed the broken drive, and then started it up again without the replacement drive inserted. When I later inserted the replacement drive, DSM didn’t immediately see it, so I just rebooted the unit to make it appear as an unused drive and then selected “Repair” in “Storage Manager/Storage Pool”.
The rebuilding process – first drive
I monitored the rebuilding process a few times, but did not note the times, so I can’t say exactly how long it took. I just let it finish during the night:
root@DS918:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sdd5[4] sda5[0] sdc5[2] sdb5[1]
      8776306368 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
      [>....................]  recovery =  0.5% (15458048/2925435456) finish=534.6min speed=90708K/sec

md1 : active raid1 sdd2[3] sdc2[2] sdb2[1] sda2[0]
      2097088 blocks [16/4] [UUUU____________]

md0 : active raid1 sdd1[2] sda1[0] sdb1[1] sdc1[3]
      2490176 blocks [16/4] [UUUU____________]

unused devices:

root@DS918:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sdd5[4] sda5[0] sdc5[2] sdb5[1]
      8776306368 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
      [=>...................]  recovery =  9.6% (282353128/2925435456) finish=434.7min speed=101335K/sec

md1 : active raid1 sdd2[3] sdc2[2] sdb2[1] sda2[0]
      2097088 blocks [16/4] [UUUU____________]

md0 : active raid1 sdd1[2] sda1[0] sdb1[1] sdc1[3]
      2490176 blocks [16/4] [UUUU____________]

unused devices:

root@DS918:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sdd5[4] sda5[0] sdc5[2] sdb5[1]
      8776306368 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
      [=======>.............]  recovery = 38.2% (1118697376/2925435456) finish=525.1min speed=57343K/sec

md1 : active raid1 sdd2[3] sdc2[2] sdb2[1] sda2[0]
      2097088 blocks [16/4] [UUUU____________]

md0 : active raid1 sdd1[2] sda1[0] sdb1[1] sdc1[3]
      2490176 blocks [16/4] [UUUU____________]

unused devices:

root@DS918:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sdd5[4] sda5[0] sdc5[2] sdb5[1]
      8776306368 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
      [=======>.............]  recovery = 39.4% (1152686672/2925435456) finish=402.3min speed=73435K/sec

md1 : active raid1 sdd2[3] sdc2[2] sdb2[1] sda2[0]
      2097088 blocks [16/4] [UUUU____________]

md0 : active raid1 sdd1[2] sda1[0] sdb1[1] sdc1[3]
      2490176 blocks [16/4] [UUUU____________]

unused devices:

root@DS918:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sdd5[4] sda5[0] sdc5[2] sdb5[1]
      8776306368 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
      [=========>...........]  recovery = 49.3% (1443636996/2925435456) finish=297.2min speed=83074K/sec

md1 : active raid1 sdd2[3] sdc2[2] sdb2[1] sda2[0]
      2097088 blocks [16/4] [UUUU____________]

md0 : active raid1 sdd1[2] sda1[0] sdb1[1] sdc1[3]
      2490176 blocks [16/4] [UUUU____________]

unused devices:

root@DS918:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sdd5[4] sda5[0] sdc5[2] sdb5[1]
      8776306368 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md1 : active raid1 sdd2[3] sdc2[2] sdb2[1] sda2[0]
      2097088 blocks [16/4] [UUUU____________]

md0 : active raid1 sdd1[2] sda1[0] sdb1[1] sdc1[3]
      2490176 blocks [16/4] [UUUU____________]

unused devices:
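If you want to follow the progress without re-running the command by hand, something like this works (a sketch; the plain loop is the fallback if watch is not included in your DSM build):

watch -n 60 cat /proc/mdstat
while true; do cat /proc/mdstat; sleep 60; done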
Partition layout before and after first disk swap
With the broken disk removed, the partition layout of the remaining disks looked like this:
root@DS918:~# sfdisk -l
/dev/sda1        2048     4982527     4980480  fd
/dev/sda2     4982528     9176831     4194304  fd
/dev/sda5     9453280  5860326239  5850872960  fd
/dev/sdb1        2048     4982527     4980480  fd
/dev/sdb2     4982528     9176831     4194304  fd
/dev/sdb5     9453280  5860326239  5850872960  fd
/dev/sdc1        2048     4982527     4980480  fd
/dev/sdc2     4982528     9176831     4194304  fd
/dev/sdc5     9453280  5860326239  5850872960  fd
When the rebuild process had started, the new disk (/dev/sdd) had been given the same partition layout as the others, plus an extra partition covering the remaining space (for now unused/unusable):
root@DS918:~# sfdisk -l
/dev/sda1        2048     4982527     4980480  fd
/dev/sda2     4982528     9176831     4194304  fd
/dev/sda5     9453280  5860326239  5850872960  fd
/dev/sdb1        2048     4982527     4980480  fd
/dev/sdb2     4982528     9176831     4194304  fd
/dev/sdb5     9453280  5860326239  5850872960  fd
/dev/sdc1        2048     4982527     4980480  fd
/dev/sdc2     4982528     9176831     4194304  fd
/dev/sdc5     9453280  5860326239  5850872960  fd
/dev/sdd1        2048     4982527     4980480  fd
/dev/sdd2     4982528     9176831     4194304  fd
/dev/sdd5     9453280  5860326239  5850872960  fd
/dev/sdd6  5860342336 15627846239  9767503904  fd
Second disk pulled out
Now that the first disk had been replaced and the RAID was rebuilt, I just pulled out the second disk to be replaced.
root@DS918:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sdd5[4] sda5[0] sdb5[1]
      8776306368 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UU_U]

md1 : active raid1 sdd2[3] sdb2[1] sda2[0]
      2097088 blocks [16/3] [UU_U____________]

md0 : active raid1 sdd1[2] sda1[0] sdb1[1]
      2490176 blocks [16/3] [UUU_____________]

unused devices:
When I inserted the replacement disk, this time it was detected by the unit (since the slot had been in use right up until I pulled the old drive):
[53977.141054] ata3: link reset sucessfully clear error flags
[53977.157449] ata3.00: ATA-9: ST8000AS0002-1NA17Z, AR17, max UDMA/133
[53977.157458] ata3.00: 15628053168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[53977.157462] ata3.00: SN: Z841474Z
[53977.158764] ata3.00: configured for UDMA/133
[53977.158779] ata3.00: Write Cache is enabled
[53977.163030] ata3: EH complete
[53977.164533] scsi 2:0:0:0: Direct-Access ATA ST8000AS0002-1NA17Z AR17 PQ: 0 ANSI: 5
[53977.165256] sd 2:0:0:0: [sdc] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB)
[53977.165273] sd 2:0:0:0: [sdc] 4096-byte physical blocks
[53977.165298] sd 2:0:0:0: Attached scsi generic sg2 type 0
[53977.165534] sd 2:0:0:0: [sdc] Write Protect is off
[53977.165547] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[53977.165662] sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[53977.217123] sdc: sdc1 sdc2
[53977.218062] sd 2:0:0:0: [sdc] Attached SCSI disk
Full dmesg output since the unit was started. The timestamps show that the rebuild of md2 took about 10 hours (recovery done at roughly 36561 seconds of uptime, i.e. just over 10 hours after boot), more or less as the early /proc/mdstat output predicted.
[ 323.429093] perf interrupt took too long (5018 > 5000), lowering kernel.perf_event_max_sample_rate to 25000 [36561.042407] md: md2: recovery done. [36561.200565] md: md2: set sdd5 to auto_remap [0] [36561.200576] md: md2: set sda5 to auto_remap [0] [36561.200581] md: md2: set sdc5 to auto_remap [0] [36561.200585] md: md2: set sdb5 to auto_remap [0] [36561.405942] RAID conf printout: [36561.405954] --- level:5 rd:4 wd:4 [36561.405959] disk 0, o:1, dev:sda5 [36561.405963] disk 1, o:1, dev:sdb5 [36561.405967] disk 2, o:1, dev:sdc5 [36561.405971] disk 3, o:1, dev:sdd5 [53370.783902] ata3: device unplugged sstatus 0x0 [53370.783962] ata3: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen [53370.791503] ata3: irq_stat 0x00400040, connection status changed [53370.797628] ata3: SError: { PHYRdyChg DevExch } [53370.802258] ata3: hard resetting link [53371.525046] ata3: SATA link down (SStatus 0 SControl 300) [53371.525054] ata3: No present pin info for SATA link down event [53373.531047] ata3: hard resetting link [53373.836045] ata3: SATA link down (SStatus 0 SControl 300) [53373.836054] ata3: No present pin info for SATA link down event [53373.841917] ata3: limiting SATA link speed to 1.5 Gbps [53375.841041] ata3: hard resetting link [53376.146048] ata3: SATA link down (SStatus 0 SControl 310) [53376.146056] ata3: No present pin info for SATA link down event [53376.151920] ata3.00: disabled [53376.151928] ata3.00: already disabled (class=0x2) [53376.151933] ata3.00: already disabled (class=0x2) [53376.151958] ata3: EH complete [53376.151980] ata3.00: detaching (SCSI 2:0:0:0) [53376.152704] sd 2:0:0:0: [sdc] tag#21 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 [53376.152717] sd 2:0:0:0: [sdc] tag#21 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00 [53376.152730] blk_update_request: I/O error, dev sdc, sector in range 4980736 + 0-2(12) [53376.153061] md: super_written gets error=-5 [53376.153061] syno_md_error: sdc1 has been removed [53376.153061] raid1: Disk failure on sdc1, disabling device. Operation continuing on 3 devices [53376.177112] sd 2:0:0:0: [sdc] Synchronizing SCSI cache [53376.177232] sd 2:0:0:0: [sdc] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00 [53376.177238] sd 2:0:0:0: [sdc] Stopping disk [53376.177269] sd 2:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00 [53376.183106] RAID1 conf printout: [53376.183118] --- wd:3 rd:16 [53376.183125] disk 0, wo:0, o:1, dev:sda1 [53376.183130] disk 1, wo:0, o:1, dev:sdb1 [53376.183135] disk 2, wo:0, o:1, dev:sdd1 [53376.183140] disk 3, wo:1, o:0, dev:sdc1 [53376.184338] SynoCheckRdevIsWorking (11054): remove active disk sdc5 from md2 raid_disks 4 mddev->degraded 0 mddev->level 5 [53376.184376] syno_md_error: sdc5 has been removed [53376.184387] md/raid:md2: Disk failure on sdc5, disabling device. md/raid:md2: Operation continuing on 3 devices. [53376.196472] SynoCheckRdevIsWorking (11054): remove active disk sdc2 from md1 raid_disks 16 mddev->degraded 12 mddev->level 1 [53376.196491] syno_md_error: sdc2 has been removed [53376.196497] raid1: Disk failure on sdc2, disabling device. Operation continuing on 3 devices [53376.198033] RAID1 conf printout: [53376.198035] --- wd:3 rd:16 [53376.198038] disk 0, wo:0, o:1, dev:sda1 [53376.198040] disk 1, wo:0, o:1, dev:sdb1 [53376.198042] disk 2, wo:0, o:1, dev:sdd1 [53376.206669] syno_hot_remove_disk (10954): cannot remove active disk sdc2 from md1 ... 
rdev->raid_disk 2 pending 0 [53376.330347] md: ioctl lock interrupted, reason -4, cmd -2145908384 [53376.446860] RAID conf printout: [53376.446869] --- level:5 rd:4 wd:3 [53376.446874] disk 0, o:1, dev:sda5 [53376.446879] disk 1, o:1, dev:sdb5 [53376.446883] disk 2, o:0, dev:sdc5 [53376.446886] disk 3, o:1, dev:sdd5 [53376.454062] RAID conf printout: [53376.454072] --- level:5 rd:4 wd:3 [53376.454077] disk 0, o:1, dev:sda5 [53376.454082] disk 1, o:1, dev:sdb5 [53376.454086] disk 3, o:1, dev:sdd5 [53376.460958] SynoCheckRdevIsWorking (11054): remove active disk sdc1 from md0 raid_disks 16 mddev->degraded 13 mddev->level 1 [53376.460968] RAID1 conf printout: [53376.460972] --- wd:3 rd:16 [53376.460978] disk 0, wo:0, o:1, dev:sda2 [53376.460984] disk 1, wo:0, o:1, dev:sdb2 [53376.460987] md: unbind[53376.460992] disk 2, wo:1, o:0, dev:sdc2 [53376.460998] disk 3, wo:0, o:1, dev:sdd2 [53376.467047] RAID1 conf printout: [53376.467056] --- wd:3 rd:16 [53376.467062] disk 0, wo:0, o:1, dev:sda2 [53376.467066] disk 1, wo:0, o:1, dev:sdb2 [53376.467070] disk 3, wo:0, o:1, dev:sdd2 [53376.470067] md: export_rdev(sdc1) [53376.475613] md: unbind [53376.480044] md: export_rdev(sdc5) [53377.207047] SynoCheckRdevIsWorking (11054): remove active disk sdc2 from md1 raid_disks 16 mddev->degraded 13 mddev->level 1 [53377.207072] md: unbind [53377.212034] md: export_rdev(sdc2) [53958.581765] ata3: device plugged sstatus 0x1 [53958.581811] ata3: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xe frozen [53958.589278] ata3: irq_stat 0x00000040, connection status changed [53958.595322] ata3: SError: { DevExch } [53958.599069] ata3: hard resetting link [53964.371031] ata3: link is slow to respond, please be patient (ready=0) [53968.757039] ata3: softreset failed (device not ready) [53968.762111] ata3: SRST fail, set srst fail flag [53968.766667] ata3: hard resetting link [53974.538032] ata3: link is slow to respond, please be patient (ready=0) [53977.141041] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [53977.141054] ata3: link reset sucessfully clear error flags [53977.157449] ata3.00: ATA-9: ST8000AS0002-1NA17Z, AR17, max UDMA/133 [53977.157458] ata3.00: 15628053168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA [53977.157462] ata3.00: SN: Z841474Z [53977.158764] ata3.00: configured for UDMA/133 [53977.158779] ata3.00: Write Cache is enabled [53977.163030] ata3: EH complete [53977.164533] scsi 2:0:0:0: Direct-Access ATA ST8000AS0002-1NA17Z AR17 PQ: 0 ANSI: 5 [53977.165256] sd 2:0:0:0: [sdc] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB) [53977.165273] sd 2:0:0:0: [sdc] 4096-byte physical blocks [53977.165298] sd 2:0:0:0: Attached scsi generic sg2 type 0 [53977.165534] sd 2:0:0:0: [sdc] Write Protect is off [53977.165547] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00 [53977.165662] sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [53977.217123] sdc: sdc1 sdc2 [53977.218062] sd 2:0:0:0: [sdc] Attached SCSI disk
Even though the new drive had been detected, I rebooted the unit “just to be sure”, then initiated the repair process from DSM again. After letting it run for a while, I checked the status:
root@DS918:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sdc5[5] sda5[0] sdd5[4] sdb5[1]
      8776306368 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
      [=>...................]  recovery =  9.1% (266365608/2925435456) finish=564.1min speed=78549K/sec

md1 : active raid1 sdc2[3] sdd2[2] sdb2[1] sda2[0]
      2097088 blocks [16/4] [UUUU____________]

md0 : active raid1 sdc1[3] sda1[0] sdb1[1] sdd1[2]
      2490176 blocks [16/4] [UUUU____________]

unused devices:
Partition layout after the second disk swap
After changing the second drive (/dev/sdc) and starting the rebuild process, the new disk got the same partition layout as the first replaced one (/dev/sdd):
root@DS918:~# sfdisk -l
/dev/sda1        2048     4982527     4980480  fd
/dev/sda2     4982528     9176831     4194304  fd
/dev/sda5     9453280  5860326239  5850872960  fd
/dev/sdb1        2048     4982527     4980480  fd
/dev/sdb2     4982528     9176831     4194304  fd
/dev/sdb5     9453280  5860326239  5850872960  fd
/dev/sdc1        2048     4982527     4980480  fd
/dev/sdc2     4982528     9176831     4194304  fd
/dev/sdc5     9453280  5860326239  5850872960  fd
/dev/sdc6  5860342336 15627846239  9767503904  fd
/dev/sdd1        2048     4982527     4980480  fd
/dev/sdd2     4982528     9176831     4194304  fd
/dev/sdd5     9453280  5860326239  5850872960  fd
/dev/sdd6  5860342336 15627846239  9767503904  fd
The rebuild will again take about 10 hours to finish.
What’s expected to happen next
Because there are now two new drives with unused space, the storage will be expanded: the existing RAID5 on sd[a-d]5 is kept, and the extra space becomes a RAID1 of sdc6 and sdd6 that is added to the storage volume. There seems to be no way of stopping this stupidity, even though it will all have to be redone after swapping the next disk. Normally you would just sit back and wait for the expansion of the mdraid volume.
Unless…
It might be a time saver to delete the unused partition on the first replaced disk, so that the storage cannot be expanded yet (what happens will depend on whether DSM notices the unpartitioned space and still creates that mirror of sdd6 + sdc6).
There’s only one way to find out:
root@DS918:~# parted /dev/sdd
GNU Parted 3.2
Using /dev/sdd
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) rm 6
rm 6
(parted) print
print
Model: ATA ST8000AS0002-1NA (scsi)
Disk /dev/sdd: 8002GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system     Name  Flags
 1      1049kB  2551MB  2550MB  ext4                  raid
 2      2551MB  4699MB  2147MB  linux-swap(v1)        raid
 5      4840MB  3000GB  2996GB                        raid

(parted) quit
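The same removal can be done non-interactively, which is handy when it has to be repeated after each swap (a sketch; only safe as long as the partition is not yet a member of any md array, which /proc/mdstat will tell you):

grep sdd6 /proc/mdstat || echo "sdd6 is not in use by any array"
parted -s /dev/sdd rm 6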
10 hours later…
Right about 10 hours later, md2 was almost rebuilt. No problems so far, but what follows will be interesting, since I removed that extra partition (which would otherwise have become part of the LV used for storage). I really hope the NAS will be ready to accept the next disk in the replacement procedure right after the sync has finished.
root@DS918:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sdc5[5] sda5[0] sdd5[4] sdb5[1]
      8776306368 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
      [==================>..]  recovery = 94.4% (2763366352/2925435456) finish=35.3min speed=76346K/sec

md1 : active raid1 sdc2[3] sdd2[2] sdb2[1] sda2[0]
      2097088 blocks [16/4] [UUUU____________]

md0 : active raid1 sdc1[3] sda1[0] sdb1[1] sdd1[2]
      2490176 blocks [16/4] [UUUU____________]

unused devices:
Results after the second disk swap
As hoped, no automatic LV change was initiated, saving me a lot of hours (at least for now) by skipping a reshape operation that would otherwise have to be repeated at least once more after swapping the remaining disks.
root@DS918:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sdc5[5] sda5[0] sdd5[4] sdb5[1]
      8776306368 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md1 : active raid1 sdc2[3] sdd2[2] sdb2[1] sda2[0]
      2097088 blocks [16/4] [UUUU____________]

md0 : active raid1 sdc1[3] sda1[0] sdb1[1] sdd1[2]
      2490176 blocks [16/4] [UUUU____________]

unused devices:
Partition layout at this stage (sdc still has its extra partition, while the one on sdd was removed above):
root@DS918:~# sfdisk -l
/dev/sda1        2048     4982527     4980480  fd
/dev/sda2     4982528     9176831     4194304  fd
/dev/sda5     9453280  5860326239  5850872960  fd
/dev/sdb1        2048     4982527     4980480  fd
/dev/sdb2     4982528     9176831     4194304  fd
/dev/sdb5     9453280  5860326239  5850872960  fd
/dev/sdc1        2048     4982527     4980480  fd
/dev/sdc2     4982528     9176831     4194304  fd
/dev/sdc5     9453280  5860326239  5850872960  fd
/dev/sdc6  5860342336 15627846239  9767503904  fd
/dev/sdd1        2048     4982527     4980480  fd
/dev/sdd2     4982528     9176831     4194304  fd
/dev/sdd5     9453280  5860326239  5850872960  fd
Replacing the third disk
I’m doing it exactly the same way as when I replaced the second disk (a rough command-level sketch follows the list):
Pull out the drive
Replace and check
Reboot just to be sure
Rebuild
Remove the extra partition on sdc to prevent reshaping after rebuild
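On the command line, the post-rebuild part of that routine looks roughly like this (a hedged sketch; sdc carries the extra partition in this round):

cat /proc/mdstat            # 1. confirm the rebuild has finished ([UUUU] on md2)
sfdisk -l /dev/sdc          # 2. check for the extra partition (sdc6)
grep sdc6 /proc/mdstat      # 3. make sure it has not been pulled into an array yet
parted -s /dev/sdc rm 6     # 4. remove it before DSM can reshape onto it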
After the 3rd disk change had been accepted (and the resync started), some unexpected things happened. Even though the extra partition had been removed from sdd, DSM decided that it could make partition changes to make the most of the available disks:
root@DS918:~# sfdisk -l
/dev/sda1        2048     4982527     4980480  fd
/dev/sda2     4982528     9176831     4194304  fd
/dev/sda5     9453280  5860326239  5850872960  fd
/dev/sdb1        2048     4982527     4980480  fd
/dev/sdb2     4982528     9176831     4194304  fd
/dev/sdb5     9453280  5860326239  5850872960  fd
/dev/sdb6  5860342336 15627846239  9767503904  fd
/dev/sdc1        2048     4982527     4980480  fd
/dev/sdc2     4982528     9176831     4194304  fd
/dev/sdc5     9453280  5860326239  5850872960  fd
/dev/sdc6  5860342336 15627846239  9767503904  fd
/dev/sdd1        2048     4982527     4980480  fd
/dev/sdd2     4982528     9176831     4194304  fd
/dev/sdd5     9453280  5860326239  5850872960  fd
/dev/sdd6  5860342336 15627846239  9767503904  fd
The removed partition on sdd was recreated, and now sdb6, sdc6 and sdd6 will become a RAID5 that is striped onto the storage LV. Not what I hoped for, but probably nothing could have been done to prevent it (I think all three extra partitions would have been created even if I had also removed the one on sdc).
Checking the mdraid status, I noticed that there might still be some hope (again by removing the extra partition from each of the disks that have already been completely replaced):
root@DS918:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sdb5[6] sda5[0] sdd5[4] sdc5[5]
      8776306368 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [U_UU]
      [=>...................]  recovery =  8.5% (249970824/2925435456) finish=696.8min speed=63988K/sec

md1 : active raid1 sdb2[3] sdd2[2] sdc2[1] sda2[0]
      2097088 blocks [16/4] [UUUU____________]

md0 : active raid1 sdb1[1] sda1[0] sdc1[3] sdd1[2]
      2490176 blocks [16/4] [UUUU____________]

unused devices:
As the new partitions are not in use yet, I just remove them from the disks (sdc and sdd) using parted.
After removing these partitions, the disks look the way I want them for now:
root@DS918:~# sfdisk -l
/dev/sda1        2048     4982527     4980480  fd
/dev/sda2     4982528     9176831     4194304  fd
/dev/sda5     9453280  5860326239  5850872960  fd
/dev/sdb1        2048     4982527     4980480  fd
/dev/sdb2     4982528     9176831     4194304  fd
/dev/sdb5     9453280  5860326239  5850872960  fd
/dev/sdb6  5860342336 15627846239  9767503904  fd
/dev/sdc1        2048     4982527     4980480  fd
/dev/sdc2     4982528     9176831     4194304  fd
/dev/sdc5     9453280  5860326239  5850872960  fd
/dev/sdd1        2048     4982527     4980480  fd
/dev/sdd2     4982528     9176831     4194304  fd
/dev/sdd5     9453280  5860326239  5850872960  fd
On the next disk replacement (the last), I will let it expand the storage pool to use the free space from the new disks (as they are 8TB each and the old ones were 3TB, this will add 15TB to the volume).
Snapshots from DSM web UI
The first snapshot of the UI was taken after replacing the third disk, when something unexpected happened, but I include the story up to that point for the few readers interested in my stuff 🙂
These snapshots (taken while disk 3 was being rebuilt) are still a valid representation of how the unit was configured before the changes (disks 4, 3 and 2, as I began from the bottom with the broken one).
I began with a total volume of 8TB and replaced the failing drive with a new 8TB one. This left the volume size unchanged (redundancy cannot be provided for the roughly 5TB of unused space on a single new drive).
When changing the second drive, DSM told me the new size of the storage pool would be about 12TB, which is the old 8TB (RAID5 across the four disks) plus the roughly 5TB of free space on the new drives (partition 6 mirrored). This was not what I wanted, so I deleted partition 6 from one of the drives, and that worked, preventing the storage pool from being expanded.
Replacing the third disk (as detailed just above) made DSM assume that I really wanted to use the free space from the two other new drives plus the third of the same kind (even with the extra partition removed from sdd). This time I was notified that the storage pool would grow to about 17TB. Still not what I wanted, so after checking that nothing had actually been changed yet, I went on to remove the 5TB partitions from sdc and sdd.
11.7 hours later…
Storage pool untouched.
root@DS918:~# cat /etc/lvm/backup/vg1 # Generated by LVM2 version 2.02.132(2)-git (2015-09-22): Sat May 3 05:20:41 2025 contents = "Text Format Volume Group" version = 1 description = "Created *after* executing '/sbin/pvresize /dev/md2'" creation_host = "DS918" # Linux DS918 4.4.59+ #25426 SMP PREEMPT Mon Dec 14 18:48:50 CST 2020 x86_64 creation_time = 1746242441 # Sat May 3 05:20:41 2025 vg1 { id = "jkiRc4-0zwx-ye9v-1eFm-OL0u-7oSS-x51FA8" seqno = 4 format = "lvm2" # informational status = ["RESIZEABLE", "READ", "WRITE"] flags = [] extent_size = 8192 # 4 Megabytes max_lv = 0 max_pv = 0 metadata_copies = 0 physical_volumes { pv0 { id = "yu1P7E-7o1a-8CsP-mbaR-mye5-N4pk-1fAk8O" device = "/dev/md2" # Hint only status = ["ALLOCATABLE"] flags = [] dev_size = 17552611584 # 8.17357 Terabytes pe_start = 1152 pe_count = 2142652 # 8.17357 Terabytes } } logical_volumes { syno_vg_reserved_area { id = "3YdjJW-zkx6-DoKs-jEz0-kTXo-rpke-eYIw8P" status = ["READ", "WRITE", "VISIBLE"] flags = [] segment_count = 1 segment1 { start_extent = 0 extent_count = 3 # 12 Megabytes type = "striped" stripe_count = 1 # linear stripes = [ "pv0", 0 ] } } volume_1 { id = "BFxwgA-3pr2-3BHr-AXo3-rJ6r-F7tP-vC7Te7" status = ["READ", "WRITE", "VISIBLE"] flags = [] segment_count = 1 segment1 { start_extent = 0 extent_count = 2142649 # 8.17356 Terabytes type = "striped" stripe_count = 1 # linear stripes = [ "pv0", 3 ] } } } }
mdraid volumes untouched:
root@DS918:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sdb5[6] sda5[0] sdd5[4] sdc5[5]
      8776306368 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md1 : active raid1 sdb2[3] sdd2[2] sdc2[1] sda2[0]
      2097088 blocks [16/4] [UUUU____________]

md0 : active raid1 sdb1[1] sda1[0] sdc1[3] sdd1[2]
      2490176 blocks [16/4] [UUUU____________]

unused devices:
LV also untouched, just as I wanted.
root@DS918:~# lvdisplay --- Logical volume --- LV Path /dev/vg1/syno_vg_reserved_area LV Name syno_vg_reserved_area VG Name vg1 LV UUID 3YdjJW-zkx6-DoKs-jEz0-kTXo-rpke-eYIw8P LV Write Access read/write LV Creation host, time , LV Status available # open 0 LV Size 12.00 MiB Current LE 3 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 768 Block device 252:0 --- Logical volume --- LV Path /dev/vg1/volume_1 LV Name volume_1 VG Name vg1 LV UUID BFxwgA-3pr2-3BHr-AXo3-rJ6r-F7tP-vC7Te7 LV Write Access read/write LV Creation host, time , LV Status available # open 1 LV Size 8.17 TiB Current LE 2142649 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 4096 Block device 252:1
Replacing the last drive
I follow the same procedure as with the other drives, with one exception: I let the Synology do its magic and expand the storage pool, by leaving the 5TB partitions on the drives.
Pull out the drive
Replace and check
Reboot just to be sure
Rebuild
Let the Synology expand the storage pool
After the reboot, I just did a “Repair” on the pool again and confirmed that the new size would be about 21TB: the old 8TB plus the extra space from a RAID5 over the four 5TB partitions (roughly 15TB, which DSM reports as about 13.6TB since it counts in binary units):
Partition layout on the disks after starting the rebuild:
root@DS918:~# sfdisk -l
/dev/sda1        2048     4982527     4980480  fd
/dev/sda2     4982528     9176831     4194304  fd
/dev/sda5     9453280  5860326239  5850872960  fd
/dev/sda6  5860342336 15627846239  9767503904  fd
/dev/sdb1        2048     4982527     4980480  fd
/dev/sdb2     4982528     9176831     4194304  fd
/dev/sdb5     9453280  5860326239  5850872960  fd
/dev/sdb6  5860342336 15627846239  9767503904  fd
/dev/sdc1        2048     4982527     4980480  fd
/dev/sdc2     4982528     9176831     4194304  fd
/dev/sdc5     9453280  5860326239  5850872960  fd
/dev/sdc6  5860342336 15627846239  9767503904  fd
/dev/sdd1        2048     4982527     4980480  fd
/dev/sdd2     4982528     9176831     4194304  fd
/dev/sdd5     9453280  5860326239  5850872960  fd
/dev/sdd6  5860342336 15627846239  9767503904  fd
Now I just have to wait…
Something unexpected happened
After that reboot (before initiating the rebuild), “md2” for some reason changed to “md4”. The reason could be that the names “md2” and “md3” were unavailable because the last disk came from an older, FreeBSD-based Buffalo NAS, so mdraid detected its leftover metadata and assembled the main array as “md4” instead.
For reference only, here are the partition tables just after inserting the disk that would become the last replacement:
root@DS918:~# sfdisk -l
/dev/sda1        2048     2002943     2000896  83
/dev/sda2     2002944    12003327    10000384  83
/dev/sda3    12003328    12005375        2048  83
/dev/sda4    12005376    12007423        2048  83
/dev/sda5    12007424    14008319     2000896  83
/dev/sda6    14008320  7814008319  7800000000  83
/dev/sda7  7814008832 15614008831  7800000000  83
/dev/sdb1        2048     4982527     4980480  fd
/dev/sdb2     4982528     9176831     4194304  fd
/dev/sdb5     9453280  5860326239  5850872960  fd
/dev/sdb6  5860342336 15627846239  9767503904  fd
/dev/sdc1        2048     4982527     4980480  fd
/dev/sdc2     4982528     9176831     4194304  fd
/dev/sdc5     9453280  5860326239  5850872960  fd
/dev/sdd1        2048     4982527     4980480  fd
/dev/sdd2     4982528     9176831     4194304  fd
/dev/sdd5     9453280  5860326239  5850872960  fd
So at least until the next reboot, the output from /proc/mdstat would look like this:
root@DS918:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md4 : active raid5 sda5[7] sdb5[6] sdd5[4] sdc5[5]
      8776306368 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [_UUU]
      [==============>......]  recovery = 73.4% (2148539264/2925435456) finish=112.3min speed=115278K/sec

md1 : active raid1 sda2[3] sdd2[2] sdc2[1] sdb2[0]
      2097088 blocks [16/4] [UUUU____________]

md0 : active raid1 sda1[0] sdb1[1] sdc1[3] sdd1[2]
      2490176 blocks [16/4] [UUUU____________]

unused devices:
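To see why mdraid might consider some array names taken, the leftover superblocks on the reused disk can be inspected before the repair (a sketch; these commands are read-only and were not part of my procedure):

mdadm --examine /dev/sda[1-7] 2>/dev/null    # look for old RAID superblocks on the reused disk
sfdisk -l /dev/sda                           # and its old Buffalo partition layout (shown above)

If old superblocks are found, wiping them (for example with mdadm --zero-superblock on each affected partition) before starting the repair might let mdraid keep the md2 name, but that is untested here.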
Thinking…
The expansion of the storage should not take a long time, thanks to my method of preventing the expansion between every disk swap.
The manual method of doing this expansion would be to create an mdraid RAID5 over the four new partitions, add it to the LVM configuration as a PV, and then add that PV to the “volume_1” stripe list. Unless the Synology decides to merge md2 and md3 (which I assume will be created from the 4x5TB partitions)…
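Sketched out with standard mdadm/LVM commands it would look roughly like this (hedged; DSM does all of this by itself, the array name md3 is only an example, and vg1/volume_1 are the names used on this unit):

mdadm --create /dev/md3 --level=5 --raid-devices=4 /dev/sd[abcd]6   # RAID5 over the four 5TB partitions
pvcreate /dev/md3                                                   # turn it into an LVM physical volume
vgextend vg1 /dev/md3                                               # add it to the existing volume group
lvextend -l +100%FREE /dev/vg1/volume_1                             # grow the logical volume
resize2fs /dev/vg1/volume_1                                         # and the ext4 filesystem on it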
Expanding the storage volume
When the resync of md4 (previously named md2) had finished, a new mdraid array using the four 5TB partitions was created and a resync of it was initiated (as this isn’t ZFS, that is needed even though there is “no data” to sync yet). As it looks right now, this step will take about 52 hours (much slower than the previous resyncs, so it might just be a temporarily low speed).
root@DS918:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sda6[3] sdd6[2] sdc6[1] sdb6[0]
      14651252736 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      [=>...................]  resync =  5.1% (249940160/4883750912) finish=3159.2min speed=24445K/sec

md4 : active raid5 sda5[7] sdb5[6] sdd5[4] sdc5[5]
      8776306368 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md1 : active raid1 sda2[3] sdd2[2] sdc2[1] sdb2[0]
      2097088 blocks [16/4] [UUUU____________]

md0 : active raid1 sda1[0] sdb1[1] sdc1[3] sdd1[2]
      2490176 blocks [16/4] [UUUU____________]

unused devices:
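The resync speed is partly governed by the kernel's md speed limits, so on an otherwise idle unit it can help to raise the minimum (a hedged sketch; the value is an example, and DSM or its scheduled tasks may adjust these settings on their own):

cat /proc/sys/dev/raid/speed_limit_min      # usually 1000 (KiB/s) by default
cat /proc/sys/dev/raid/speed_limit_max
echo 50000 > /proc/sys/dev/raid/speed_limit_min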
mdadm --detail /dev/md2 gives some more information:
root@DS918:~# mdadm --detail /dev/md2 /dev/md2: Version : 1.2 Creation Time : Sun May 4 20:17:34 2025 Raid Level : raid5 Array Size : 14651252736 (13972.52 GiB 15002.88 GB) Used Dev Size : 4883750912 (4657.51 GiB 5000.96 GB) Raid Devices : 4 Total Devices : 4 Persistence : Superblock is persistent Update Time : Sun May 4 23:45:30 2025 State : active, resyncing Active Devices : 4 Working Devices : 4 Failed Devices : 0 Spare Devices : 0 Layout : left-symmetric Chunk Size : 64K Resync Status : 5% complete Name : DS918:2 (local to host DS918) UUID : cc2a3e88:4f844ebd:2fbf5461:f29bbaf0 Events : 43 Number Major Minor RaidDevice State 0 8 22 0 active sync /dev/sdb6 1 8 38 1 active sync /dev/sdc6 2 8 54 2 active sync /dev/sdd6 3 8 6 3 active sync /dev/sda6
I also found out that the storage pool (but not the volume) has now been expanded to its final size of 21.7TB:
On the “Volume” page, I can go on creating a new volume, which is not what I want. I suppose expanding the current volume will be possible after the resync of the newly added space is done.
I cancelled at the last step, where the new volume was about to be created, as I want to expand the main storage volume instead.
On the “Linux” side (mdraid and LVM), I found out that the “Physical Volume” had been created and that volume had been added to the “Volume Group” vg1:
When md2 was fully synced
At the end of the resync of md2, which took about 79 hours (the estimate was 52 hours, but the speed dropped during the resync and the estimated time kept increasing over the following two days), I was still not able to extend the storage volume from the place I expected (the “Action” drop-down under “Volume” in “Storage Manager”). My mistake was not checking “Configure” in that same drop-down.
I added new drives to my Synology NAS, but the available capacity didn’t increase. What can I do?
So for DSM 6.2 (for the Fakenology), this is where it’s done:
From the “Configuration” page, the volume size can be changed to any size greater than the current size, or to “max” which will add the newly created storage to the volume.
This option to change the volume size might have been there all along (even during synchronization), but in any case it was probably better to leave it alone until the first sync had finished.
Now the mdraid volumes look like this:
root@DS918:/volume1# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sda6[3] sdd6[2] sdc6[1] sdb6[0]
      14651252736 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md4 : active raid5 sda5[7] sdb5[6] sdd5[4] sdc5[5]
      8776306368 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md1 : active raid1 sda2[3] sdd2[2] sdc2[1] sdb2[0]
      2097088 blocks [16/4] [UUUU____________]

md0 : active raid1 sda1[0] sdb1[1] sdc1[3] sdd1[2]
      2490176 blocks [16/4] [UUUU____________]

unused devices:
At this stage, the storage volume itself is still untouched, but as shown in the images above, another PV has been added to vg1:
root@DS918:/volume1# cat /etc/lvm/backup/vg1 # Generated by LVM2 version 2.02.132(2)-git (2015-09-22): Thu May 8 03:03:03 2025 contents = "Text Format Volume Group" version = 1 description = "Created *after* executing '/sbin/pvresize /dev/md2'" creation_host = "DS918" # Linux DS918 4.4.59+ #25426 SMP PREEMPT Mon Dec 14 18:48:50 CST 2020 x86_64 creation_time = 1746666183 # Thu May 8 03:03:03 2025 vg1 { id = "jkiRc4-0zwx-ye9v-1eFm-OL0u-7oSS-x51FA8" seqno = 7 format = "lvm2" # informational status = ["RESIZEABLE", "READ", "WRITE"] flags = [] extent_size = 8192 # 4 Megabytes max_lv = 0 max_pv = 0 metadata_copies = 0 physical_volumes { pv0 { id = "yu1P7E-7o1a-8CsP-mbaR-mye5-N4pk-1fAk8O" device = "/dev/md4" # Hint only status = ["ALLOCATABLE"] flags = [] dev_size = 17552611584 # 8.17357 Terabytes pe_start = 1152 pe_count = 2142652 # 8.17357 Terabytes } pv1 { id = "YZWW7p-8HaZ-9kDy-7hVv-v2Sk-Vlyu-LkkhXU" device = "/dev/md2" # Hint only status = ["ALLOCATABLE"] flags = [] dev_size = 29302504320 # 13.645 Terabytes pe_start = 1152 pe_count = 3576965 # 13.645 Terabytes } } logical_volumes { syno_vg_reserved_area { id = "3YdjJW-zkx6-DoKs-jEz0-kTXo-rpke-eYIw8P" status = ["READ", "WRITE", "VISIBLE"] flags = [] segment_count = 1 segment1 { start_extent = 0 extent_count = 3 # 12 Megabytes type = "striped" stripe_count = 1 # linear stripes = [ "pv0", 0 ] } } volume_1 { id = "BFxwgA-3pr2-3BHr-AXo3-rJ6r-F7tP-vC7Te7" status = ["READ", "WRITE", "VISIBLE"] flags = [] segment_count = 1 segment1 { start_extent = 0 extent_count = 2142649 # 8.17356 Terabytes type = "striped" stripe_count = 1 # linear stripes = [ "pv0", 3 ] } } } }
The next (and last) step is to add the new space to the storage volume (Volume 1). This is done by adding a second segment to “volume_1”, with pv1 in its stripe list. When the segment has been added, the file system on “volume_1” is resized using the resize2fs command (this took a couple of minutes to finish).
root@DS918:/volume1# cat /etc/lvm/backup/vg1 # Generated by LVM2 version 2.02.132(2)-git (2015-09-22): Sun May 11 21:15:43 2025 contents = "Text Format Volume Group" version = 1 description = "Created *after* executing '/sbin/lvextend --alloc inherit /dev/vg1/volume_1 --size 22878208M'" creation_host = "DS918" # Linux DS918 4.4.59+ #25426 SMP PREEMPT Mon Dec 14 18:48:50 CST 2020 x86_64 creation_time = 1746990943 # Sun May 11 21:15:43 2025 vg1 { id = "jkiRc4-0zwx-ye9v-1eFm-OL0u-7oSS-x51FA8" seqno = 8 format = "lvm2" # informational status = ["RESIZEABLE", "READ", "WRITE"] flags = [] extent_size = 8192 # 4 Megabytes max_lv = 0 max_pv = 0 metadata_copies = 0 physical_volumes { pv0 { id = "yu1P7E-7o1a-8CsP-mbaR-mye5-N4pk-1fAk8O" device = "/dev/md4" # Hint only status = ["ALLOCATABLE"] flags = [] dev_size = 17552611584 # 8.17357 Terabytes pe_start = 1152 pe_count = 2142652 # 8.17357 Terabytes } pv1 { id = "YZWW7p-8HaZ-9kDy-7hVv-v2Sk-Vlyu-LkkhXU" device = "/dev/md2" # Hint only status = ["ALLOCATABLE"] flags = [] dev_size = 29302504320 # 13.645 Terabytes pe_start = 1152 pe_count = 3576965 # 13.645 Terabytes } } logical_volumes { syno_vg_reserved_area { id = "3YdjJW-zkx6-DoKs-jEz0-kTXo-rpke-eYIw8P" status = ["READ", "WRITE", "VISIBLE"] flags = [] segment_count = 1 segment1 { start_extent = 0 extent_count = 3 # 12 Megabytes type = "striped" stripe_count = 1 # linear stripes = [ "pv0", 0 ] } } volume_1 { id = "BFxwgA-3pr2-3BHr-AXo3-rJ6r-F7tP-vC7Te7" status = ["READ", "WRITE", "VISIBLE"] flags = [] segment_count = 2 segment1 { start_extent = 0 extent_count = 2142649 # 8.17356 Terabytes type = "striped" stripe_count = 1 # linear stripes = [ "pv0", 3 ] } segment2 { start_extent = 2142649 extent_count = 3576903 # 13.6448 Terabytes type = "striped" stripe_count = 1 # linear stripes = [ "pv1", 0 ] } } } }
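The same result can also be verified from the command line (a sketch; the output layout may differ slightly with DSM's LVM build), and df now shows the grown filesystem:

pvs                    # both /dev/md4 and /dev/md2 should be listed as PVs in vg1
lvs --segments vg1     # volume_1 should show two segments, the new ~13.6T one on pv1 (/dev/md2)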
root@DS918:/volume1# df -h
Filesystem         Size  Used Avail Use% Mounted on
/dev/md0           2.3G  987M  1.2G  45% /
none               983M     0  983M   0% /dev
/tmp               996M  944K  995M   1% /tmp
/run               996M  8.2M  988M   1% /run
/dev/shm           996M  4.0K  996M   1% /dev/shm
none               4.0K     0  4.0K   0% /sys/fs/cgroup
cgmfs              100K     0  100K   0% /run/cgmanager/fs
/dev/vg1/volume_1   22T  7.2T   15T  33% /volume1
root@DS918:/volume1#