Another Seagate drive failed
Yesterday, when I started the JottaCloud client to back up the content of the shares on one of my LS220s, I noticed a slowdown in reading files from that device.
Checking the dmesg output, I found a repeating pattern of I/O errors:
end_request: I/O error, dev sda, sector 1231207416
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata1.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable
ata1.00: failed command: READ DMA EXT
ata1.00: cmd 25/00:08:f8:bb:62/00:00:49:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:f8:bb:62/40:00:49:00:00/00 Emask 0x9 (media error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1: hard resetting link
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
ata1.00: configured for UDMA/133
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata1.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable
ata1.00: failed command: READ DMA EXT
ata1.00: cmd 25/00:08:f8:bb:62/00:00:49:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:f8:bb:62/40:00:49:00:00/00 Emask 0x9 (media error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1: hard resetting link
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
ata1.00: configured for UDMA/133
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata1.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable
ata1.00: failed command: READ DMA EXT
ata1.00: cmd 25/00:08:f8:bb:62/00:00:49:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:f8:bb:62/40:00:49:00:00/00 Emask 0x9 (media error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1: hard resetting link
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
ata1.00: configured for UDMA/133
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata1.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable
ata1.00: failed command: READ DMA EXT
ata1.00: cmd 25/00:08:f8:bb:62/00:00:49:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:f8:bb:62/40:00:49:00:00/00 Emask 0x9 (media error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1: hard resetting link
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
ata1.00: configured for UDMA/133
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata1.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable
ata1.00: failed command: READ DMA EXT
ata1.00: cmd 25/00:08:f8:bb:62/00:00:49:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:f8:bb:62/40:00:49:00:00/00 Emask 0x9 (media error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1: hard resetting link
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
ata1.00: configured for UDMA/133
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata1.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable
ata1.00: failed command: READ DMA EXT
ata1.00: cmd 25/00:08:f8:bb:62/00:00:49:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:f8:bb:62/40:00:49:00:00/00 Emask 0x9 (media error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1: hard resetting link
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
ata1.00: configured for UDMA/133
sd 0:0:0:0: [sda] Unhandled sense code
sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
sd 0:0:0:0: [sda] Sense Key : 0x3 [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c
        00 0a 80 00 00 00 00 00
        49 62 bb f8
sd 0:0:0:0: [sda] ASC=0x11 ASCQ=0x4
sd 0:0:0:0: [sda] CDB: cdb[0]=0x28: 28 00 49 62 bb f8 00 00 08 00
end_request: I/O error, dev sda, sector 1231207416
ata1: EH complete
So, this time it was sda (the first of the two drives) that was failing; sdb had been replaced earlier. As with my other LS220s, I run the drives in RAID0 (stripe) mode, so both are needed for proper operation. I probably have most of the content backed up, between what was saved after the last failure and the regular backups to JottaCloud.
As usual, I fired up the computer I use for data rescuing, and as usual it complained that it hadn’t been used for too long, so the BIOS settings had been forgotten. I had to change it back to booting from CD (Trinity Rescue Kit), but probably forgot to enable the internal SATA connectors this time.
Running both the source and destination drives on the same controller makes the rescuing a bit slow, but I’ll just let it run.
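The first pass was simply a copy of the whole source disk from the start; something along these lines, with the same device names and logfile as the restarted run shown further down:

# First pass: copy the failing disk (sdb on the rescue system) to the replacement (sdc),
# using direct disc access and keeping a logfile so the run can be resumed later
ddrescue -d /dev/sdb /dev/sdc /sda1/ddrescue/buff6-1.log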
I stumbled upon errors early on (while copying the system partitions), so I stopped and investigated what was going on:
dmesg output on my rescue-system
sd 7:0:0:0: [sdb] CDB: Read(10): 28 00 00 72 b5 e7 00 00 80 00
end_request: I/O error, dev sdb, sector 7517744
This is the partition layout, taken from another LS220:
Number  Start      End          Size         File system  Name     Flags
 1      34s        2000000s     1999967s     ext3         primary
 2      2000896s   11999231s    9998336s                  primary
 3      11999232s  12000000s    769s                      primary  bios_grub
 4      12000001s  12000001s    1s                        primary
 5      12000002s  14000000s    1999999s                  primary
 6      14000128s  7796875263s  7782875136s               primary
As seen in the dmesg output, the problem was within the second partition, so I restarted ddrescue with the -i parameter to set the start position to the beginning of partition 3. I'll deal with partition 2 later; it is part of the root filesystem (md1, which consists of sda2 and sdb2) and should be mirrored on the other drive (although I have had problems with broken mirrors before, so it may even be that I have no valid copy of this partition).
ddrescue -i 11999232b -d /dev/sdb /dev/sdc /sda1/ddrescue/buff6-1.log
About 12 hours later, I'm almost halfway through the data partition (partition 6) of the mdraid volume. A few errors so far, but ddrescue will get back to those bad parts and try splitting them into smaller pieces later on.
Initial status (read from logfile)
rescued:        0 B,  errsize:      0 B,  errors:       0
Current status
rescued:  1577 GB,  errsize:   394 kB,  current rate:  36765 kB/s
   ipos:  1583 GB,   errors:        3,   average rate:  38855 kB/s
   opos:  1583 GB
I will eventually lose some files here, but the primary goal is to get the drive recognized as a direct replacement for the failed one.
Getting closer…
About 30 hours later, most of the drive had been copied over to the replacement disk. I saved the errors on the root partition for the last step (the final, but most time-consuming, part, which is taking place now): I first copied as much as possible from around where the early errors occurred, then got closer to the problematic section on each additional run.
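Those additional runs are just ddrescue started again with the same logfile; only the areas still marked as bad are read again, and a few extra retry passes can be requested. A rough sketch:

# Re-run against the existing logfile: only the remaining bad areas are attempted,
# with up to 3 retry passes over each of them
ddrescue -d -r3 /dev/sdb /dev/sdc /sda1/ddrescue/buff6-1.log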
Initial status (read from logfile)
rescued:  4000 GB,  errsize:  4008 kB,  errors:      83
Current status
rescued:  4000 GB,  errsize:   518 kB,  current rate:      0 B/s
   ipos:  3828 MB,   errors:       86,   average rate:    254 B/s
   opos:  3828 MB
Splitting error areas...
I gave it another couple of hours and was ready to abort and test if it would boot.. Then it just finished.
Current status
rescued:  4000 GB,  errsize:   493 kB,  current rate:      0 B/s
   ipos:  6035 MB,   errors:       94,   average rate:      2 B/s
   opos:  6035 MB
Finished
Options from here
As I have many backup copies of the content from the NAS this disk is part of, only a few files (if any at all) will be missing once I restore what I have (I'll have to check for and remove duplicates in the backups, but that's another story). So this rescue has, from the beginning, mostly been for educational purposes.
Ignore the 518kB (finished at 493kB) of errors and test if it boots
Before going on, I will create a partial backup image from the drive I recovered the data to, covering all partitions up to the gap between the system partitions and the beginning of the data partition. The non-data partitions only amount to about 7GB (14 million blocks), as seen in the partition table:
Number  Start      End          Size         File system
 1      34s        2000000s     1999967s     ext3
 2      2000896s   11999231s    9998336s
 3      11999232s  12000000s    769s
 4      12000001s  12000001s    1s
 5      12000002s  14000000s    1999999s
 6      14000128s  7796875263s  7782875136s
It’s probably a good idea not to include the start block of the data partition (14000128s), but it is safe to stop halfway into the gap between partitions 5 and 6, which is why the count below is 14000064.
dd if=/dev/sdc of=/sda1/ddrescue/buffa6-1-p1-5 bs=512 count=14000064
This way, I can easily restore the system partitions whenever something goes wrong.
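Restoring would then just be the reverse dd, something like this (writing back over the same replacement disk):

# Write the saved image back over the first 14000064 sectors (system partitions plus half the gap)
dd if=/sda1/ddrescue/buffa6-1-p1-5 of=/dev/sdc bs=512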
Trying to boot the NAS with the recreated disk and the working one
This is the easiest thing I can try. After running ddrescue from the third partition onwards to the end of the drive, there was only about 40kB of unrecoverable data (which may or may not have held any content). This means that I can at least connect the disks to a Linux machine and mount the mdraid volume there for recovery.
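If it comes to that, assembling and mounting the striped data volume on another Linux box would look roughly like this (device names as seen on the NAS; they will likely differ on another machine, and the mount point is just an example):

# Assemble the RAID0 data array from the two data partitions and mount it read-only
mdadm --assemble /dev/md10 /dev/sda6 /dev/sdb6
mkdir -p /mnt/recovery
mount -o ro /dev/md10 /mnt/recovery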
But booting it up in the NAS is the “better” solution in this case.
If booting the disks in the NAS fails
If the NAS won’t boot, my next step will be to try to recover more of the missing content from the root partition (I actually ended up doing this before trying to boot, and was able to recover 25kB more).
Use root partitions from another Buffalo
This would have been my next idea to try out. I have a few more of these 2-disk devices, so I could shut one down, clone the system partitions of both its disks, then dd them back onto the disks for “Buffalo 6”. That would give it the same IP as the cloned one, but that's easy to change if it makes it boot again.
I didn’t have to try this. Can save this for the next crash 🙂
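If I ever do, it would be the same trick as the partial image above, just reading from a donor disk; a sketch with hypothetical device names (/dev/sdd for the donor disk, /dev/sdc for the target):

# Image everything before the data partition on the donor's disk 1
dd if=/dev/sdd of=/sda1/ddrescue/donor-disk1-sys.img bs=512 count=14000064
# Write it back onto the corresponding "Buffalo 6" disk, then repeat for disk 2
dd if=/sda1/ddrescue/donor-disk1-sys.img of=/dev/sdc bs=512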
The boot attempt..
Mounted the “new” disk 1 in the caddy, then started up the NAS.. Responds to ping on its IP..
And I was also able to log in to it as “root” (previous modifications to allow SSH root access).. Looking good..
[root@BUFFALO-6 ~]# df -h
Filesystem      Size      Used  Available  Use%  Mounted on
udev           10.0M         0      10.0M    0%  /dev
/dev/md1        4.7G    784.9M       3.7G   17%  /
tmpfs         121.1M     84.0K     121.0M    0%  /tmp
/dev/ram1      15.0M    108.0K      14.9M    1%  /mnt/ram
/dev/md0      968.7M    216.4M     752.2M   22%  /boot
/dev/md10       7.2T    707.2G       6.6T   10%  /mnt/array1
[root@BUFFALO-6 ~]#
mdraid looks OK, except that (as I already suspected) the mirrors for the system partitions were broken. I forgot to fix that the last time I replaced a disk (the other one) in this unit..
[root@BUFFALO-6 ~]# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md10 : active raid0 sda6[0] sdb6[1]
      7782874112 blocks super 1.2 512k chunks

md0 : active raid1 sda1[0]
      999872 blocks [2/1] [U_]

md1 : active raid1 sda2[0]
      4995008 blocks super 1.2 [2/1] [U_]

md2 : active raid1 sda5[0]
      999424 blocks super 1.2 [2/1] [U_]
Another easy fix..
mdadm --manage /dev/md0 --add /dev/sdb1
mdadm --manage /dev/md1 --add /dev/sdb2
mdadm --manage /dev/md2 --add /dev/sdb5
All mirrored partitions OK now:
[root@BUFFALO-6 ~]# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md10 : active raid0 sda6[0] sdb6[1]
      7782874112 blocks super 1.2 512k chunks

md0 : active raid1 sdb1[1] sda1[0]
      999872 blocks [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md1 : active raid1 sdb2[2] sda2[0]
      4995008 blocks super 1.2 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md2 : active raid1 sdb5[2] sda5[0]
      999424 blocks super 1.2 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk