Buffalo LS220D – lost drive (hiccup)

Yesterday I noticed that the LEDs were blinking amber on one of my LS220D boxes. My initial thought was that a disk had failed (it’s just a backup of my backup). Checked with the “NAS Navigator” application, and it stated that it was unable to mount the data array (md10) (I have not logged the full error message here, as I continued the attempts to solve the situation).

dmesg output

I logged in as root (see other posts) to check what had gone wrong.
‘dmesg’ revealed that a disk had been lost during smartctl (multiple repeats of the below content):

program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Unable to handle kernel NULL pointer dereference at virtual address 000000a4
pgd = c27d4000
[000000a4] *pgd=0fe93831, *pte=00000000, *ppte=00000000
Internal error: Oops: 817 [#50]
Modules linked in: usblp usb_storage ohci_hcd ehci_hcd xhci_hcd usbcore usb_common
CPU: 0    Tainted: G      D       (3.3.4 #1)
PC is at sg_scsi_ioctl+0xe0/0x374
LR is at sg_scsi_ioctl+0xcc/0x374
pc : []    lr : []    psr: 60000013
sp : cafb5d58  ip : 00000000  fp : 00000024
r10: 00000006  r9 : c41d1860  r8 : 00000012
r7 : 00000000  r6 : 00000024  r5 : beee5550  r4 : beee5548
r3 : cafb4000  r2 : cafb5d58  r1 : 00000000  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c5387d  Table: 027d4019  DAC: 00000015
Process smartctl (pid: 1027, stack limit = 0xcafb42e8)
Stack: (0xcafb5d58 to 0xcafb6000)
5d40:          c057c2b8 60000013
5d60: c21f27f0 beee5548 c2274800 0000005d cafb5de4 00000000 c998edcc 00000004
5d80: c99800c8 c00a6e64 c9d034e0 00000028 c998edc8 00000029 c27d4000 c00a8fc0
5da0: 00000000 00000000 00000000 c998ed08 c2274800 56e6994b beee5a48 beee5548
5dc0: 0000005d 0000005d c2274800 c21f27f0 cafb4000 56e6994b beee7e34 beee5548
5de0: 0000005d 0000005d c2274800 c21f27f0 cafb4000 ffffffed beee7e34 c0245494
5e00: 00000053 fffffffd 00002006 00000024 beee5af8 beee5ae0 beee5ab8 00004e20
5e20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
5e40: c27d4000 00000000 c27d4000 cb0023e0 c87f3d30 00000028 beee7e34 c00be67c
5e60: c27d4000 00000028 cafb5fb0 56e6994b 00000001 0000005d c8014040 beee5548
5e80: 0000005d c0245530 beee5548 0000005d 00000001 00000001 beee5548 c222c000
5ea0: c8014040 c02a6284 beee5548 beee5548 c8014040 c2274800 00000001 0000005d
5ec0: 00000000 c02422a0 beee5548 c0242be0 00000000 cafb5f78 00000001 c2949000
5ee0: ffffff9c c8014040 00000000 00000007 c054ff34 00039db8 cafb5fb0 beee5548
5f00: c21e0470 00000003 00000003 c000e3c8 cafb4000 00000000 beee7e34 c00e0060
5f20: 00000000 00000000 cf34be00 2c1b812a 5e6a6136 2c1b812a cf1a2548 00000000
5f40: 00000000 00000000 00000003 00000003 c95a2ec0 c2949000 c95a2ec8 00000020
5f60: 00000003 c95a2ec0 beee5548 00000001 00000003 c000e3c8 cafb4000 00000000
5f80: beee7e34 c00e010c 00000003 00000000 beee5548 beee5548 0006d614 beee5a8c
5fa0: 00000036 c000e200 beee5548 0006d614 00000003 00000001 beee5548 00000000
5fc0: beee5548 0006d614 beee5a8c 00000036 00000000 00000003 00000006 beee7e34
5fe0: beee5ae0 beee5540 00039688 b6da5cec 80000010 00000003 cfcfcfcf 00000014
[] (sg_scsi_ioctl+0xe0/0x374) from [] (scsi_cmd_ioctl+0x39c/0x3fc)
[] (scsi_cmd_ioctl+0x39c/0x3fc) from [] (scsi_cmd_blk_ioctl+0x3c/0x44)
[] (scsi_cmd_blk_ioctl+0x3c/0x44) from [] (sd_ioctl+0x8c/0xb8)
[] (sd_ioctl+0x8c/0xb8) from [] (__blkdev_driver_ioctl+0x20/0x28)
[] (__blkdev_driver_ioctl+0x20/0x28) from [] (blkdev_ioctl+0x670/0x6c0)
[] (blkdev_ioctl+0x670/0x6c0) from [] (do_vfs_ioctl+0x49c/0x514)
[] (do_vfs_ioctl+0x49c/0x514) from [] (sys_ioctl+0x34/0x58)
[] (sys_ioctl+0x34/0x58) from [] (ret_fast_syscall+0x0/0x30)
Code: e1a0200d e7d3a2a8 e3c23d7f e3c3303f (e1c0aab4)
---[ end trace 660c9d3c9b4a9034 ]---

fdisk output

Using ‘fdisk’ (incorrect for this NAS), I listed the partitions on /dev/sda and /dev/sdb (nothing about /dev/sda):

[root@BUFFALO-4 ~]# fdisk -l /dev/sda
[root@BUFFALO-4 ~]# fdisk -l /dev/sdb

WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util fdisk doesn't support GPT. Use GNU Parted.

Disk /dev/sdb: 4000.8 GB, 4000787030016 bytes
255 heads, 63 sectors/track, 486401 cylinders, total 7814037168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1  4294967295  2147483647+  ee  GPT
Partition 1 does not start on physical sector boundary.

smartctl output

[root@BUFFALO-4 ~]# smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device

[root@BUFFALO-4 ~]# smartctl --all /dev/sda
smartctl 6.3 2014-07-26 r3976 [armv7l-linux-3.3.4] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

Segmentation fault

[root@BUFFALO-4 ~]# smartctl --all /dev/sdb
smartctl 6.3 2014-07-26 r3976 [armv7l-linux-3.3.4] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model:     WDC WD40EZRX-22SPEB0
Serial Number:    WD-WCC4E1UUZH74
LU WWN Device Id: 5 0014ee 2b768eeb4
Firmware Version: 80.00A80
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Jul 14 12:10:33 2022 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
          was completed without error.
          Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
          without error or no self-test has ever
          been run.
Total time to complete Offline
data collection: (52320) seconds.
Offline data collection
capabilities:     (0x7b) SMART execute Offline immediate.
          Auto Offline data collection on/off support.
          Suspend Offline collection upon new
          command.
          Offline surface scan supported.
          Self-test supported.
          Conveyance Self-test supported.
          Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
          power-saving mode.
          Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
          General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 523) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x7035) SCT Status supported.
          SCT Feature Control supported.
          SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   196   187   021    Pre-fail  Always       -       7183
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       36
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   055   054   000    Old_age   Always       -       33525
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       36
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       28
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       7866202
194 Temperature_Celsius     0x0022   113   103   000    Old_age   Always       -       39
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         8         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Nothing more to do than to reboot.

After reboot

The storage array was still not mounted, smartctl could now find /dev/sda:

[root@BUFFALO-4 ~]# df -h
Filesystem Size      Used Available Use% Mounted on
udev      10.0M         0     10.0M   0% /dev
/dev/md1   4.7G    766.8M      3.7G  17% /
tmpfs    121.1M     84.0K    121.0M   0% /tmp
/dev/ram1 15.0M    100.0K     14.9M   1% /mnt/ram
/dev/md0 968.7M    216.4M    752.2M  22% /boot

[root@BUFFALO-4 ~]# smartctl --all /dev/sda
smartctl 6.3 2014-07-26 r3976 [armv7l-linux-3.3.4] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model:     WDC WD40EZRX-22SPEB0
Serial Number:    WD-WCC4E1XUDU4T
LU WWN Device Id: 5 0014ee 20cbde2d7
Firmware Version: 80.00A80
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Jul 14 12:13:56 2022 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
          was suspended by an interrupting command from host.
          Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
          without error or no self-test has ever
          been run.
Total time to complete Offline
data collection: (52560) seconds.
Offline data collection
capabilities:     (0x7b) SMART execute Offline immediate.
          Auto Offline data collection on/off support.
          Suspend Offline collection upon new
          command.
          Offline surface scan supported.
          Self-test supported.
          Conveyance Self-test supported.
          Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
          power-saving mode.
          Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
          General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 526) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x7035) SCT Status supported.
          SCT Feature Control supported.
          SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   250   204   021    Pre-fail  Always       -       4500
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       38
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   053   051   000    Old_age   Always       -       34713
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       38
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       30
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       7823449
194 Temperature_Celsius     0x0022   122   106   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       13
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       11
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       14

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         8         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Partition table after reboot

Now when both disks are in place again, I ran the (correct) command to list the partitions on all drives:

[root@BUFFALO-4 ~]# parted -l /dev/sdb
Model: ATA WDC WD40EZRX-22S (scsi)
Disk /dev/sda: 4001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name     Flags
 1      17.4kB  1024MB  1024MB  ext3         primary
 2      1024MB  6144MB  5119MBprimary
 3      6144MB  6144MB  394kB primary  bios_grub
 4      6144MB  6144MB  512B  primary
 5      6144MB  7168MB  1024MBprimary
 6      7168MB  3992GB  3985GBprimary


Model: ATA WDC WD40EZRX-22S (scsi)
Disk /dev/sdb: 4001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name     Flags
 1      17.4kB  1024MB  1024MB  ext3         primary
 2      1024MB  6144MB  5119MBprimary
 3      6144MB  6144MB  394kB primary  bios_grub
 4      6144MB  6144MB  512B  primary
 5      6144MB  7168MB  1024MBprimary
 6      7168MB  3992GB  3985GBprimary

...

Looks ok, so I tried mounting /dev/md10:

root@BUFFALO-4 ~]# mount /dev/md10 /mnt/array1/
[root@BUFFALO-4 ~]# df -h
Filesystem Size      Used Available Use% Mounted on
udev      10.0M         0     10.0M   0% /dev
/dev/md1   4.7G    766.8M      3.7G  17% /
tmpfs    121.1M     84.0K    121.0M   0% /tmp
/dev/ram1 15.0M    100.0K     14.9M   1% /mnt/ram
/dev/md0 968.7M    216.4M    752.2M  22% /boot
/dev/md10  7.2T      5.7T      1.6T  79% /mnt/array1
[root@BUFFALO-4 ~]# ls /mnt/array1/
backup/         buffalo_fix.sh* share/          spool/
[root@BUFFALO-4 ~]# ls /mnt/array1/share/
acp_commander/    buff4_public.txt  buff4_share.txt   buff4_web.txt

Checking the file system for errors

As I was able to mount the partition, I did a file system check after unmounting it:

[root@BUFFALO-4 ~]# xfs_repair /dev/md10
Phase 1 - find and verify superblock...
Not enough RAM available for repair to enable prefetching.
This will be _slow_.
You need at least 1227MB RAM to run with prefetching enabled.
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
...
        - agno = 30
        - agno = 31
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
...
        - agno = 30
        - agno = 31
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
doubling cache size to 1024
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
[root@BUFFALO-4 ~]# mount /dev/md10 /mnt/array1
[root@BUFFALO-4 ~]# ls /mnt/array1/
backup/         buffalo_fix.sh* share/          spool/

Another reboot, then checking to find out that md10 was still not mounted.
The error in NAS Navigator is: “E14:RAID array 1 could not be mounted. (2022/07/14 12:36:18)”

Time to check ‘dmesg’ again:

md/raid1:md2: active with 1 out of 2 mirrors
md2: detected capacity change from 0 to 1023410176
md: md1 stopped.
md: bind
md/raid1:md1: active with 1 out of 2 mirrors
md1: detected capacity change from 0 to 5114888192
md: md0 stopped.
md: bind
md/raid1:md0: active with 1 out of 2 mirrors
md0: detected capacity change from 0 to 1023868928
 md0: unknown partition table
kjournald starting.  Commit interval 5 seconds
EXT3-fs (md0): using internal journal
EXT3-fs (md0): mounted filesystem with writeback data mode
 md1: unknown partition table
kjournald starting.  Commit interval 5 seconds
EXT3-fs (md1): using internal journal
EXT3-fs (md1): mounted filesystem with writeback data mode
kjournald starting.  Commit interval 5 seconds
EXT3-fs (md1): using internal journal
EXT3-fs (md1): mounted filesystem with writeback data mode
 md2: unknown partition table
Adding 999420k swap on /dev/md2.  Priority:-1 extents:1 across:999420k
kjournald starting.  Commit interval 5 seconds
EXT3-fs (md0): using internal journal
EXT3-fs (md0): mounted filesystem with writeback data mode

The above shows that md0, md1 and md2 went up, but are missing its mirror partition (this from /dev/sda that disappeared).
Further down in dmesg output

md: md10 stopped.
md: bind
md: bind
md/raid0:md10: md_size is 15565748224 sectors.
md: RAID0 configuration for md10 - 1 zone
md: zone0=[sda6/sdb6]
      zone-offset=         0KB, device-offset=         0KB, size=7782874112KB

md10: detected capacity change from 0 to 7969663090688
 md10: unknown partition table
XFS (md10): Mounting Filesystem
XFS (md10): Ending clean mount
XFS (md10): Quotacheck needed: Please wait.
XFS (md10): Quotacheck: Done.
udevd[3963]: starting version 174
md: cannot remove active disk sda6 from md10 ...
[root@BUFFALO-4 ~]# mount /dev/md10 /mnt/array1/
[root@BUFFALO-4 ~]# ls -l /mnt/array1/
total 4
drwxrwxrwx    3 root     root            21 Dec 14  2019 backup/
-rwx------    1 root     root           571 Oct 14  2018 buffalo_fix.sh*
drwxrwxrwx    3 root     root            91 Sep 16  2019 share/
drwxr-xr-x    2 root     root             6 Oct 21  2016 spool/

What the h… “cannot remove active disk sda6 from md10”

Checking md raid status

[root@BUFFALO-4 ~]# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md10 : active raid0 sda6[0] sdb6[1]
      7782874112 blocks super 1.2 512k chunks

md0 : active raid1 sdb1[1]
      999872 blocks [2/1] [_U]

md1 : active raid1 sdb2[1]
      4995008 blocks super 1.2 [2/1] [_U]

md2 : active raid1 sdb5[1]
      999424 blocks super 1.2 [2/1] [_U]

unused devices: 
[root@BUFFALO-4 ~]# mdadm --detail /dev/md10
/dev/md10:
        Version : 1.2
  Creation Time : Fri Oct 21 15:58:46 2016
     Raid Level : raid0
     Array Size : 7782874112 (7422.33 GiB 7969.66 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Fri Oct 21 15:58:46 2016
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 512K

           Name : LS220D896:10
           UUID : 5ed0c596:60b32df6:9ac4cd3a:59c3ddbc
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8        6        0      active sync   /dev/sda6
       1       8       22        1      active sync   /dev/sdb6

So here, md10 is fully working and md0, md1 and md2 are missing their second device. Simple to correct, just adding them back:

[root@BUFFALO-4 ~]# mdadm --manage /dev/md0 --add /dev/sda1
mdadm: added /dev/sda1
[root@BUFFALO-4 ~]# mdadm --manage /dev/md1 --add /dev/sda2
mdadm: added /dev/sda2
[root@BUFFALO-4 ~]# mdadm --manage /dev/md2 --add /dev/sda5
mdadm: added /dev/sda5
[root@BUFFALO-4 ~]# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md10 : active raid0 sda6[0] sdb6[1]
      7782874112 blocks super 1.2 512k chunks

md0 : active raid1 sda1[0] sdb1[1]
      999872 blocks [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md1 : active raid1 sda2[2] sdb2[1]
      4995008 blocks super 1.2 [2/1] [_U]
      [====>................]  recovery = 24.2% (1212672/4995008) finish=1.2min speed=48506K/sec

md2 : active raid1 sda5[2] sdb5[1]
      999424 blocks super 1.2 [2/1] [_U]
        resync=DELAYED

unused devices: 

Some time later, sync was finished, and I rebooted again. Finally, after this reboot /dev/md10 is automatically mounted to /mnt/array1 again.

Problem solved 🙂

smartctl notes

The values of attributes 5, 197 and 198 should be zero for a healthy drive, so one disk in the NAS is actually failing, but the cause of the hiccup (disconnect) was a core dump by smatctl weekly scan.

  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       13
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      11

Windows for UNIX users

Command examples written for manipulating files on an old drive with remaining crap from windows.

chown, rm: Delete old windows / program files from second drive

chown -r root:root <Directory>

takeown /F "F:\ProgramData" /A /R /D Y
icacls "F:\ProgramData" /T /grant administrators:F

rm -rf <Directory>
(after chown above)

rd /s /q "F:\ProgramData"

ln
https://www.howtogeek.com/howto/16226/complete-guide-to-symbolic-links-symlinks-on-windows-or-linux/

cp -rp
xcopy <source>\*.* /s/e/f <dest>

Inner secrets of Synology Hybrid RAID (SHR) – Part 2b – My Synology case

At about 30% into the reshaping phase (after the first disk swap), my NAS went unresponsive (disconnected both shell and GUI), and I had to wait all day until I came home and did a hard reset on it and hoped everything went well..

In the meantime, I logged a case to the Synology support. They were not of any direct help, and the hard reset did take the NAS back to continuing the reshaping process.

My case with Synology support

==
2020-12-01 13:51:37
==
Replaced one of the smallest drives in my NAS yesterday (SHR) as a first step for later expansion (I will replace all drives with larger ones before expanding – if possible to delay any automatic expansion until then).

About 80% finished with rebuilding yesterday, but for some reason it started over after the first round.

Today about 30% finished when I lost the connection to the NAS (over ssh and the web interface). It does not auto-reboot and does not respond to ping.

To lessen the risk of data loss, what should my first step be ? Can I just pull the plug and hard-reboot the NAS with the current disks mounted (14TB, 3TB, 3TB, 8TB, 8TB in a SHR config), or is it better to replace or remove the disk that I recently replaced (in slot 1: 14TB in place of the previous still untouched 3TB) ?

What are the steps to getting the volume back online if it does not mount automatically ?

As the NAS is down, I am not able to upload any logs, but attached is the rebuild status before the crash.

==
2020-12-01 15:28:58
Synology response (besides the auto response “send us logs”)
Not useful at all, exactly what I did, “Mark” who replied did not read anything..
==
Hello,

Thank you for contacting Synology.

If you wish to replace a drive in your unit, please perform these steps one by one allowing for the repair to complete before replacing any further drives.
1. Pull out the drive in question.
2. Insert a replacement drive.
3. Proceed to the Storage Manager > Storage Pool > select the volume in question and click “Manage/Action”
4. Run through the wizard to repair the volume in question with the replacement drive.
5. Once complete, proceed to the Storage Manager > Volume and Configure/Edit the volume to configure the volume to have additional size.
Please see the link below for more help.
https://www.synology.com/en-uk/knowledgebase/DSM/help/DSM/StorageManager/storage_pool_expand_replace_disk

Please bare in mind that you benefit from the additional space from the drives you will need to replace at least 2 drives for larger ones in RAID 5/SHR or 3 drives in RAID6/SHR2.
You can see the type of RAID used via – DSM > Storage Manager > Storage Pool.

If you have any further questions please do not hesitate to get in touch.

Best Regards,
Mark

==
2020-12-01 16:02:14
My reply
==
Ok, so I restart the problem description then:

I did (yesterday):
0. Power down Synology
1. Pull out the drive in question.
2. Insert a replacement drive.
3. Proceed to the Storage Manager > Storage Pool > select the volume in question and click “Manage/Action”
4. Run through the wizard to repair the volume in question with the replacement drive.

THEN, today:
4b. Today about 30% finished when I lost the connection to the NAS (over ssh and the web interface). It does not auto-reboot and does not respond to ping.

SO what now ?
As the NAS is unresponsive I will never reach step 5:

To lessen the risk of data loss, what should my first step be ? Can I just pull the plug and hard-reboot the NAS with the current disks mounted (14TB, 3TB, 3TB, 8TB, 8TB in a SHR config), or is it better to replace or remove the disk that I recently replaced (in slot 1: 14TB in place of the previous still untouched 3TB) ?

What are the steps to getting the volume back online if it does not mount automatically ?

Also, is there an option to DELAY the expansion until all drives have been replaces, as you replied changeing the first drive will not expand the volume, but I’m not there yet since I’m stuck in a crash (unresponsive system)

==
2020-12-02 23:25:46
My reply on Synologys’ suggestion to collect logs using the support centre
==
How do I launch “Support Center” on the device when it is unresponsive (which was my initial question – what to do when it hangs in the middle of repairing/reshaping) ?

I forced it off and restarted and hoped for the best – reshaping continued and the second disk is now in reshaping mode.

My other question has not yet been answered:

Is it possible to delay the time consuming step of reshaping until all disks have been replaced ?

Initial configuration: 3TB 3TB 3TB 8TB 8TB

After replacement of the first disk: 14TB 3TB 3TB 8TB 8TB, after reshaping the first disk got a partition to match the 8TB disks.

After replacement of the second disk: 14TB 14TB 3TB 8TB 8TB, while reshaping again, now disk 1 and 2 looks similar with one partition matching the largest of the remaining 3TB disk, one matching the largest on the 8TB disks and the remainder (roughly about 6TB) the same on both 14TB disks.

When replacing the third 3TB disk, I assume the following would happen:
(14TB 14TB 14TB 8TB 8TB)

On the first and second disk, the (about) 3TB partition will be replaced with a partition to match the 8TB disks. Then the remainder (3 disks with 6TB unallocated space) will be used for another raid5 (after yet another reshape)

So my question again; is it possible to delay reshaping until I have had all the disks replaced. I understand that the “rebuild” is needed in between every replacement, but “reshape” should be needed only once.

==
2020-12-03 12:19:07
Synology response
==
Hello,

Thank you for the reply.

I’m afraid you cannot delay or prevent this process, once it starts it needs to run until fruition.

I would suggest to leave this running for now, if the volume does crash fully in the mean time I can take a look at what we can do to recover the volume, but there is not much I can do currently I’m afraid.

If you have any further question please do not hesitate to get in touch.

Best Regards,
Mark
==

The crash

https://unix.stackexchange.com/questions/299981/recover-from-raid-5-to-raid-6-reshape-and-crash-mdadm-reports-0k-sec-rebuild
https://www.google.com/search?q=restart+synology+while+rebuilding
https://community.synology.com/enu/forum/17/post/20414

General SHR and mdraid links

https://www.youtube.com/results?search_query=synology+shr
https://bobcares.com/blog/raid-resync/
https://www.google.com/search?q=mdraid+reshape

Running sites on different versions of PHP on the same server

This quick guide explains the steps needed to be able to run different versions of PHP on the same server (different virtual hosts or even different folders within the same site).

Prepare

If the ‘add-apt-repository’ command is missing, you need to install the package “software-properties-common” first:

apt install -y software-properties-common

Add the ondrej/php repository to your system:

add-apt-repository ppa:ondrej/php

apt update, upgrade:

apt update
apt upgrade

Install Apache fastCGI module:

apt install -y libapache2-mod-fcgid

For each version of PHP

As of 1 Sep 2022, PHP versions 8.0 (php8.0) and 8.1 (php8.1) are those with active support and updates. PHP version 7.4 (php7.4) will continue to receive critical updates until 28 Nov 2022. See Supported PHP Versions.

Install required packages

apt install -y php7.4 php7.4-fpm php7.4-cgi php7.4-mysql libapache2-mod-php7.4

Install other wanted modules for the same PHP version, for example these commonly used:

apt install -y php7.4-curl php7.4-gd php7.4-imagick php7.4-intl php7.4-mbstring php7.4-readline php7.4-json
apt install -y php7.4-soap php7.4-xml php7.4-xmlrpc php7.4-zip

Start the php-fpm service:

systemctl start php7.4-fpm

Check that the service is running:

systemctl status php7.4-fpm

For each version, also any custom configuration in php.ini has to be duplicated. In Ubuntu the php.ini files are located in the subfolders of /etc/php/x.x/ (one subfolder for each run environment, “apache2”, “cgi”, “cli”, “fpm”).

With the understanding of the above, here are the needed commands for installing the required and extra packages for PHP 7.4, 8.0, 8.1 and 8.2:

#
# 7.4
#
apt install -y php7.4 php7.4-fpm php7.4-cgi php7.4-mysql libapache2-mod-php7.4
apt install -y php7.4-curl php7.4-gd php7.4-imagick php7.4-intl php7.4-mbstring php7.4-readline php7.4-json
apt install -y php7.4-soap php7.4-xml php7.4-xmlrpc php7.4-zip
#
# 8.0
#
apt install -y php8.0 php8.0-fpm php8.0-cgi php8.0-mysql libapache2-mod-php8.0
apt install -y php8.0-curl php8.0-gd php8.0-imagick php8.0-intl php8.0-mbstring php8.0-readline
apt install -y php8.0-soap php8.0-xml php8.0-xmlrpc php8.0-zip
#
# 8.1
#
apt install -y php8.1 php8.1-fpm php8.1-cgi php8.1-mysql libapache2-mod-php8.1
apt install -y php8.1-curl php8.1-gd php8.1-imagick php8.1-intl php8.1-mbstring php8.1-readline
apt install -y php8.1-soap php8.1-xml php8.1-xmlrpc php8.1-zip
#
# 8.2 currently (as of 1 Dec 2022) missing imagick and xmlrpc packages
#
apt install -y php8.2 php8.2-fpm php8.2-cgi php8.2-mysql libapache2-mod-php8.2
apt install -y php8.2-curl php8.2-gd php8.2-intl php8.2-mbstring php8.2-readline
apt install -y php8.2-soap php8.2-xml php8.2-zip

Set new default PHP version used by Apache

To set the default PHP version to use when not overridden by SetHandler below, use the commands a2dismod and a2enmod.
Set default to PHP 8.1

a2dismod php7.4
a2enmod php8.1
systemctl restart apache2

Set default to PHP 7.4

a2dismod php8.1
a2enmod php7.4
systemctl restart apache2

Configuring PHP version for virtual host or subfolder

Activate necessary Apache modules and restart Apache:

a2enmod actions fcgid alias proxy_fcgi
systemctl restart apache2

Use FilesMatch directive to set the PHP version:
FilesMatch is valid in both the virtualhost configuration and inside a Directory section.
To set PHP version globally for a virtual host, use it outside a Directory section.
The default PHP version can be set using ‘a2enmod php8.1’ (or any other version)

<FilesMatch \.php$>
  SetHandler "proxy:unix:/run/php/php7.4-fpm.sock|fcgi://localhost"
</FilesMatch>

Check the configuration for errors:

apachectl configtest

If result is “Syntax OK”, restart Apache:

systemctl restart apache2

Overriding PHP version using .htaccess

For this to work, “AllowOverride FileInfo” must be present for the directory (or above) in which the .htaccess file will be used to set the PHP version.
For the default virtual host, the DocumentRoot is set to /var/www/html, so to allow PHP version to be set by .htaccess at that level or below, the following must be present in the vhost configuration:

  <Directory /var/www/html>
    AllowOverride FileInfo
  </Directory>

When this has been set, FilesMatch and SetHandler (as described above) can be used within the .htaccess file. The .htaccess method have higher priority than what is set for the virtualhost, or the subfolder within the DocumentRoot of the virtual host.

Testing

Create a file named ‘i.php’ in the locations with the different PHP versions (can be different virtualhosts or folders)

<?php
phpversion();
?>

Access these locations on the virtualhosts or their directory locations to verify that they are using different PHP versions.

References

Most useful resource for this guide
How To Run Multiple PHP Versions on One Server Using Apache and PHP-FPM on Ubuntu 18.04 (DigitalOcean Community)

Linux: how to run commands by keypress on the local console

This is probably not the only way to do this, and just something I had to dig up to be able to control the piStorm from the console keyboard without being logged in.

After trying to find some built-in way of doing it, I ended up using ‘lirc’ (‘inputlircd’) to fetch the keystrokes and execute appropriate commands in the background. The guide is not intended to be complete, and it’s not even re-tested because of the trial-and-error attempt on getting this working the first time, and not taking any notes.

The most useful resource for the success was:
How to run script on keypress? (superuser.com)

Required steps to getting started with lirc

Become root

sudo su -

Install required packages:

apt -y install lirc inputlirc input-utils socat

Configuration
Find the input device you wish to capture keypresses from:

lsinput

The not really necessary step
Examine the inputlirc start/stop script “/etc/init.d/inputlirc” to see where it looks for configuration:

...
DAEMON="/usr/sbin/inputlircd"
NAME="inputlirc"
DESC="inputlirc"

test -x $DAEMON || exit 0

[ -r /etc/default/$NAME ] && . /etc/default/$NAME
...

The marked lines in the partial content of “/etc/init.d/inputlirc” reveals that a file “/etc/default/inputlirc” is sourced.

Change startup parameters for inputlircd
“/etc/default/inputlirc” contains parameters for running inputlircd, including the input device to capture events from and the parameters to the service looking for keystrokes.

Read the inputlircd manpage (man 8 inputlircd) to find out which parameters you need/want to use. The below is what I had to put in the file:

# Options to be passed to inputlirc.
EVENTS="/dev/input/event0"
OPTIONS=-m 0 -c

-m 0 = -m sets the lowest keycode to pass to the daemon
I also use -c to allow to capture the modifier keys (CTRL, SHIFT, ALT) so they will be part of a keystroke instead of generating their own events. This will make it possible to use combinations like SHIFT + F1 for command execution.

After editing and saving the file, enable and (re)start the inputlirc service:

systemctl enable inputlirc
systemctl restart inputlirc

Then check that it’s running:

systemctl status inputlirc

Snooping for keypress events
Unless you know all the keycodes you are going to use for running commands, now is a good time to check what lircd receives on specific keypresses. Run the command to snoop for keypresses in the shell, and press keys on the keyboard connected to the computer (this could be connected through USB, PS/2, Bluetooth, IR, whatever)

socat UNIX-CONNECT:/var/run/lirc/lircd STDOUT

Sample output (F12 and modifier keys)

root@raspberrypi:~# socat UNIX-CONNECT:/var/run/lirc/lircd STDOUT
58 0 SHIFT_KEY_F12 /dev/input/event0
58 0 CTRL_SHIFT_KEY_F12 /dev/input/event0

The irexec service
To make the irexec service restart when inputlirc is restarted (due to a key configuration change), the service startup file has to be slightly modified:

/lib/systemd/system/irexec.service:

[Unit]
Documentation=man:irexec(1)
Documentation=http://lirc.org/html/configure.html
Documentation=http://lirc.org/html/configure.html#lircrc_format
Description=Handle events from IR remotes decoded by lircd(8)
After=inputlirc.service
Requires=inputlirc.service
...

Add the lines marked above, then rebuild the systemd service configuration file and enable and start the irexec service:

systemctl daemon-reload
systemctl enable irexec
systemctl start irexec

Check that the irexec.service is running:

systemctl status irexec

Configuring what to run on keypresses
The file “/etc/lirc/irexec.lircrc” contains the configuration for what commands to run when selected key(combinations) are used. Wipe out all the defaults in there and add something useful. Below is the updated, more generic configuration I use on my PiOS for the piStorm now, just mapping some keys to a script with a similar name:

begin
    prog   = irexec
    button = SHIFT_KEY_F1
    config = /home/pi/irexec/shift_f1.sh
end

begin
    prog   = irexec
    button = SHIFT_KEY_F2
    config = /home/pi/irexec/shift_f2.sh
end

begin
    prog   = irexec
    button = SHIFT_KEY_F3
    config = /home/pi/irexec/shift_f3.sh
end

begin
    prog   = irexec
    button = SHIFT_KEY_F4
    config = /home/pi/irexec/shift_f4.sh
end

begin
    prog   = irexec
    button = SHIFT_KEY_F5
    config = /home/pi/irexec/shift_f5.sh
end

begin
    prog   = irexec
    button = SHIFT_KEY_F6
    config = /home/pi/irexec/shift_f6.sh
end

begin
    prog   = irexec
    button = SHIFT_KEY_F7
    config = /home/pi/irexec/shift_f7.sh
end

begin
    prog   = irexec
    button = SHIFT_KEY_F8
    config = /home/pi/irexec/shift_f8.sh
end

begin
    prog   = irexec
    button = SHIFT_KEY_F9
    config = /home/pi/irexec/shift_f9.sh
end

begin
    prog   = irexec
    button = SHIFT_KEY_F10
    config = /home/pi/irexec/shift_f10.sh
end

begin
    prog   = irexec
    button = SHIFT_KEY_F11
    config = /home/pi/irexec/shift_f11.sh
end

begin
    prog   = irexec
    button = SHIFT_KEY_F12
    config = /home/pi/irexec/shift_f12.sh
end

begin
    prog   = irexec
    button = CTRL_SHIFT_KEY_F12
    config = /home/pi/irexec/ctrl_shift_f12.sh
end

Whenever you have made a change to /etc/lirc/irexec.lircrc, you need to restart inputlirc (which automatically restarts liexec):

systemctl restart inputlirc

Action scripts in /home/pi/irexec
These scripts can be updated without having to restart inputlirc. Be sure to set the execute flag on them (chmod 755 /home/pi/irexec/*.sh)
For the piOS installation for the piStorm, the content of my configuration-switching scripts are as follows:

/home/pi/irexec/shift_f1.sh (the F-keys 1-10 with the SHIFT key held down):

#!/bin/sh
cp /home/pi/cfg/a1200_4068_os31.cfg /home/pi/default.cfg
sudo systemctl restart pistorm

In a similar way, I have set up the other shift-f-key combinations as shown in the video.

I have used SHIFT+F12 for a safe reboot, and CTRL+SHIFT+F12 for a shutdown of the pi. If running piStorm in RTG mode there can be a delay of about 1 minute before something happens.

/home/pi/irexec/shift_f12.sh:

#!/bin/sh
sudo systemctl stop pistorm
sudo reboot

/home/pi/irexec/ctrl_shift_f12.sh:

#!/bin/sh
sudo systemctl stop pistorm
sudo halt -p

You can check the status of the piStorm service to see that it received the shutdown command:

root@raspberrypi:~# systemctl status pistorm
● pistorm.service - Start piStorm 68k emulator
   Loaded: loaded (/lib/systemd/system/pistorm.service; disabled; vendor preset: enabled)
   Active: deactivating (final-sigterm) since Sat 2021-04-24 23:48:43 BST; 1min 6s ago
  Process: 1023 ExecStart=/home/pi/start-emulator.sh (code=killed, signal=TERM)
 Main PID: 1023 (code=killed, signal=TERM)
    Tasks: 10 (limit: 873)
   CGroup: /system.slice/pistorm.service
           ├─1024 sudo ./emulator --config /home/pi/default.cfg
           └─1025 ./emulator --config /home/pi/default.cfg

Apr 24 13:23:53 raspberrypi start-emulator.sh[1023]: [MUSASHI] Mapped read range 4: 40000000-48000000
Apr 24 13:23:53 raspberrypi start-emulator.sh[1023]: [MUSASHI] Mapped write range 3: 40000000-4800000
Apr 24 13:23:53 raspberrypi start-emulator.sh[1023]: Platform custom range: 00E90000-80010000
Apr 24 13:23:53 raspberrypi start-emulator.sh[1023]: Platform mapped range: 00200000-48000000
Apr 24 13:23:53 raspberrypi start-emulator.sh[1023]: RTG display disabled.
Apr 24 13:23:53 raspberrypi start-emulator.sh[1023]: Pitch: 800 (800 bytes)
Apr 24 13:23:53 raspberrypi start-emulator.sh[1023]: RTG thread running
Apr 24 13:23:53 raspberrypi start-emulator.sh[1023]: error: XDG_RUNTIME_DIR not set in the environmen
Apr 24 23:48:43 raspberrypi systemd[1]: Stopping Start piStorm 68k emulator...
Apr 24 23:48:43 raspberrypi systemd[1]: pistorm.service: Main process exited, code=killed, status=15/
root@raspberrypi:~# 

Revised start-emulator.sh script
Because I want to run the wip-crap version of the emulator at some points, I have added a check for the mentioning of “wip-crap” in the configuration file that is going to be used, then depending on its existance or not, launching the emulator from the correct directory:

#!/bin/sh
if grep -q wip-crap "/home/pi/default.cfg"; then
  echo "wip-crap"
  cd /home/pi/pistorm-bnu/
else
  echo "main"
  cd /home/pi/pistorm/
fi
sudo ./emulator --config /home/pi/default.cfg
exit 0

To enable the wip-crap version, just add a comment in the beginning of the configuration such as:

# using wip-crap functions

Inner secrets of Synology Hybrid RAID (SHR) – Part 2

Changing the first disk and my case to Synology support

Now it was time to replace the first disk. As I assumed this would never go wrong (!) and did not plan to document the upgrade, I did not take out any information about the partitions, mdraids and volumes during this first disk swap.

The instructions from Synology are quite good for this (until something breaks down):
Replace Drives to Expand Storage Capacity

Basically it says: replace the disks one by one, start with the smallest and wait until completion before replacing the next.

For the first disk swap, I actually shut down my DS1517 before replacing the disk (many models, including DS1517, supports hot swapping the disks). When the disk was replaced and I powered up the DS1517, and as expected I got the “RAID degraded” beep.
Did a check that the new drive was recognized, and then started the repair of the storage pool. As this will usually take many hours, and this was done in the evening, I have no idea of the actual time spent for repairing (rebuilding) the pool. This was about 90% finished when I stopped looking at the status around midnight that day.

The next day, I see that it had “restarted” (lower percentage than yesterday), but this is actually the next step that is initiated directly after repairing the pool. It’s called “reshaping” and during that process other mdraids are changed and adjusted (if possible) against the new disk.

Changes during the first disk swap

These are only assumptions, because I did not take enough info in between swapping the disk and until about a third into reshaping.

At the point of changing the first disk (refer to the previous part of my article), my storage pool/volume consisted of two mdraids joined together:
md2: RAID 5 of sda5, sdb5, sdc5, sdd5, sde5: total size about 11.7TB
md3: RAID 1 of sdd6, sde6: total size of about 4.8TB

When I pulled the first drive (3TB) and replaced it with a 14TB drive, I assume the partition table on that disk was created like this (status pulled from the mid of reshaping after first disk swap, so I’m pretty sure this is correct):

/dev/sda1                  2048         4982527         4980480  fd
/dev/sda2               4982528         9176831         4194304  fd
/dev/sda5               9453280      5860326239      5850872960  fd
/dev/sda6            5860342336     15627846239      9767503904  fd

sda5 was matched up with the size of the old sda5 (and the ‘5’-partitions on the other disks)
sda6 was also created in either the step before rebuild, or right before reshaping (this partition match the size with the ‘6’-partitions on sdd and sde.
Because the (14T) disk is larger than the previous largest (8TB) one, there are some non-partitioned wasted space (about 5.8TB which will come into use after the next disk swap).

Reshaping

Again, I have not taken any full status dumps so that my information can be confirmed, but this is what I see afterwards, and adding my guesses to it because of the better logging of later disk swaps.

After the storage pool was repaired, reshaping started automatically. During this step, the RAID1 consisting of sdd6 and sde6 (md3) were changed into RAID5 consisting of sda6, sdd6 and sde6.

At about 30% into the reshaping phase, my NAS went unresponsive (disconnected both shell and GUI), and I had to wait all day until I came home and did a hard reset on it and hoped everything went well..

In the meantime, I logged a case to the Synology support (see “Part 2b” of this article). They were not of any direct help, and the hard reset did take the NAS back to continuing the reshaping process.

Inner secrets of Synology Hybrid RAID (SHR) – Part 1

Inner workings of Synology Hybrid RAID

Maybe a too much promising title for this post, but this is my guesswork on how SHR works when replacing drives. If anyone have a spare DS1517 (or later device, with at least 4 slots) to donate, I will investigate this further, cannot afford to do it on my primary NAS because of risk of loosing data – and now even not possible without upgrading the disks again to larger ones).

I will also post here my case (more or less in full) sent to Synology when the NAS got unresponsive (crashed) during the rebuild/reshaping process.

What is Synology Hyrbrid RAID ?

This is in fact the only thing Synology themselves have briefly explained in their documentation:
What is Synology Hybrid RAID (SHR)

My short explanation is that it is a software RAID that is able to maximize the utilization of mixed sized hard drives. For simplicity, Synology illustrates this with drives varying of 500GB to 2TB (in 500GB increments), possibly fooling some people to think that the disks are always split into 500GB partitions.

My findings while expanding my DS1517 (from 3TB, 3TB, 3TB, 8TB, 8TB to all 14TB) is that the remaining space of the drives are splitted in as few parts as possible to obtain the maximum available space (after setting aside about 2.5GB for the DSM (operating system) and 2GB for swap).

Replacing disks and rebuilding the RAID

Before I replaced the first disk, I actually forgot to view and save down the info about the partitions, mdraid volumes and logical volumes (I might have that somewhere else, but I will not look for it now). Based on how it looked after the first disk had been replaced, and the rebuild was done (in the process of reshaping) it should have been something like this:

# sfdisk -l
/dev/sda1                  2048         4982527         4980480  83
/dev/sda2               4982528         9176831         4194304  82
/dev/sda5               9453280      5860326239      5850872960  fd

/dev/sdb1                  2048         4982527         4980480  83
/dev/sdb2               4982528         9176831         4194304  82
/dev/sdb5               9453280      5860326239      5850872960  fd

/dev/sdc1                  2048         4982527         4980480  83
/dev/sdc2               4982528         9176831         4194304  82
/dev/sdc5               9453280      5860326239      5850872960  fd

/dev/sdd1                  2048         4982527         4980480  fd
/dev/sdd2               4982528         9176831         4194304  fd
/dev/sdd5               9453280      5860326239      5850872960  fd
/dev/sdd6            5860342336     15627846239      9767503904  fd

/dev/sde1                  2048         4982527         4980480  fd
/dev/sde2               4982528         9176831         4194304  fd
/dev/sde5               9453280      5860326239      5850872960  fd
/dev/sde6            5860342336     15627846239      9767503904  fd

Note: The partition types for sd[a-c][1-2] seems incorrect as these where changed to “fd” later on during the process, or it might have been something changed by Synology on later DSM versions (but not at the point of updating DSM).

Partitions 1-2 are the system and swap partitions on all the drives, sized 2.5GB respectively 2GB.
Partition 5 is a part of the storage space available in the volume on the NAS. In this case it is about 2.9TB in size (the maximum available on the smallest disks).
Partition 6 is the second part of the total storage space. At this time those partitions are about 4.8TB in size.

mdraid volumes

Out of the partitions above, the Synology creates these mdraid volumes:
md0: RAID 1 of sda1, sdb1, sdc1, sdd1, sde1: total size 2.5GB used for DSM
md1: RAID 1 of sda1, sdb2, sdc2, sdd2, sde2: total size 2GB used for swap
md2: RAID 5 of sda5, sdb5, sdc5, sdd5, sde5: total size about 11.7TB
md3: RAID 1 of sdd6, sde6: total size of about 4.8TB

LVM logical disk

md2 and md3 are joined together into a logical disk using LVM, which gives about 16.5TB space in total for the storage volume on the NAS (Synology DSM says 15.5TB, but the difference is only because of how I estimate the space and how Synology does – I just take the block count, divide by two, then use a one decimal precision – which is adequate enough for this description).

DSM Storage Manager before replacing the first disk

… to be continued in part 2 …

Synology NAS – PHP

Enabling extensions in PHP CLI

https://blackswan.ch/archives/594

Become root
Find out which configuration file is used:
php --ini

I get:
Configuration File (php.ini) Path: /usr/local/etc/php70
Loaded Configuration File: /usr/local/etc/php70/php.ini

Edit the php.ini file, check/change extensions dir to where it is located. If PHP was installed through DSM it should be something like ‘/volume1/\@appstore/PHP7.0’
extension_dir = "/volume1/@appstore/PHP7.0/usr/local/lib/php70/modules/"

Enable the extensions you wish to use:
extension = mysqli.so
extension = phar.so
extension = openssl.so
extension = zip.so
extension = curl.so

Installing Composer

Composer documentationhttps://medium.com/unhandled-code/installing-composer-on-synology-6-1-4-eebd1a1c4891

Become root
Enable the extensions as above (phar, openssl and zip are needed for Composer).

Get and install Composer:
cd /usr/local/bin
curl -s http://getcomposer.org/installer | php70

Create shortcut script ‘composer’ in /usr/local/bin:
#!/bin/bash
php70 /usr/local/bin/composer.phar $*

Set permissions:
chmod --reference=composer.phar composer

Test:
composer --version