Inner secrets of Synology Hybrid RAID (SHR) – Part 2

Changing the first disk and my case to Synology support

Now it was time to replace the first disk. Since I assumed this would never go wrong (!) and did not plan to document the upgrade, I did not record any information about the partitions, mdraids and volumes during this first disk swap.

The instructions from Synology are quite good for this (until something breaks down):
Replace Drives to Expand Storage Capacity

Basically it says: replace the disks one by one, starting with the smallest, and wait until the rebuild completes before replacing the next.

For the first disk swap, I actually shut down my DS1517 before replacing the disk (many models, including the DS1517, support hot swapping the disks). When the disk was replaced and I powered up the DS1517, I got the expected “RAID degraded” beep.
I checked that the new drive was recognized, and then started the repair of the storage pool. As this usually takes many hours, and it was started in the evening, I have no idea of the actual time spent repairing (rebuilding) the pool. It was about 90% finished when I stopped looking at the status around midnight.
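The repair can also be followed from an SSH shell on the NAS instead of the GUI. A minimal sketch, assuming root access and the md device names from my setup:

# cat /proc/mdstat
# mdadm --detail /dev/md2

The first command shows a progress bar and percentage for the array being rebuilt; the second shows a “Rebuild Status” line while the repair is running.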

The next day, I saw that it had “restarted” (a lower percentage than the day before), but this is actually the next step, which is initiated directly after the pool repair. It is called “reshaping”, and during that process the other mdraids are changed and adjusted (where possible) to make use of the new disk.
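The reshape is reported separately from the rebuild, which explains the apparent “restart” of the percentage. A sketch, assuming md3 is the array being reshaped as on my NAS:

# mdadm --detail /dev/md3

While the reshape is running, the output contains a “Reshape Status” line with its own percentage.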

Changes during the first disk swap

These are only assumptions, because I did not record enough information between swapping the disk and about a third of the way into the reshaping.

At the point of changing the first disk (refer to the previous part of my article), my storage pool/volume consisted of two mdraids joined together:
md2: RAID 5 of sda5, sdb5, sdc5, sdd5, sde5: total size about 11.7TB
md3: RAID 1 of sdd6, sde6: total size of about 4.8TB

When I pulled the first drive (3TB) and replaced it with a 14TB drive, I assume the partition table on that disk was created like this (status pulled midway through the reshaping after the first disk swap, so I am pretty sure this is correct):

/dev/sda1                  2048         4982527         4980480  fd
/dev/sda2               4982528         9176831         4194304  fd
/dev/sda5               9453280      5860326239      5850872960  fd
/dev/sda6            5860342336     15627846239      9767503904  fd

sda5 was matched up with the size of the old sda5 (and the ‘5’ partitions on the other disks).
sda6 was also created either in the step before the rebuild or right before the reshaping (this partition matches the size of the ‘6’ partitions on sdd and sde).
Because the new (14TB) disk is larger than the previously largest (8TB) one, there is some unpartitioned, unused space (about 5.8TB, which will come into use after the next disk swap).
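That unused tail can be estimated directly from the sector counts: the size of the whole disk minus the end sector of the last partition. A sketch, assuming blockdev is available on DSM (it is a standard util-linux tool); the end sector is taken from my listing above:

# TOTAL=$(blockdev --getsz /dev/sda)
# echo $(( (TOTAL - 15627846239) / 2 / 1000000 ))

blockdev --getsz prints the number of 512-byte sectors; with the roughly 27.3 billion sectors of a 14TB drive, the second line prints about 5800 (decimal GB), i.e. the ~5.8TB left unpartitioned.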

Reshaping

Again, I have not taken any full status dumps that could confirm my information, but this is what I saw afterwards, with my guesses added based on the better logging of the later disk swaps.

After the storage pool was repaired, the reshaping started automatically. During this step, the RAID1 consisting of sdd6 and sde6 (md3) was changed into a RAID5 consisting of sda6, sdd6 and sde6.
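DSM drives this conversion itself, but on a plain Linux box the equivalent mdadm steps would look roughly like this (a sketch under the assumption that DSM uses the standard md reshape mechanism; I have not seen the exact commands DSM runs):

# mdadm --grow /dev/md3 --level=5
# mdadm --manage /dev/md3 --add /dev/sda6
# mdadm --grow /dev/md3 --raid-devices=3

The first command converts the two-disk RAID1 into a two-disk RAID5 (the on-disk layout is compatible), the second adds the partition on the new disk, and the last one triggers the actual reshape, the long-running step, since every stripe has to be rewritten across all three partitions.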

At about 30% into the reshaping phase, my NAS went unresponsive (both shell and GUI disconnected), and I had to wait all day until I came home, did a hard reset on it and hoped everything would go well.

In the meantime, I logged a case with Synology support (see “Part 2b” of this article). They were not of any direct help, but the hard reset did bring the NAS back, and it continued the reshaping process.

Inner secrets of Synology Hybrid RAID (SHR) – Part 1

Inner workings of Synology Hybrid RAID

Maybe too promising a title for this post, but this is my guesswork on how SHR works when replacing drives. If anyone has a spare DS1517 (or a later device with at least 4 slots) to donate, I will investigate this further; I cannot afford to do it on my primary NAS because of the risk of losing data (and it is now not even possible without upgrading the disks to larger ones again).

I will also post here my case (more or less in full) sent to Synology when the NAS became unresponsive (crashed) during the rebuild/reshaping process.

What is Synology Hybrid RAID?

This is in fact the only thing Synology themselves have briefly explained in their documentation:
What is Synology Hybrid RAID (SHR)

My short explanation is that it is a software RAID that is able to maximize the utilization of hard drives of mixed sizes. For simplicity, Synology illustrates this with drives ranging from 500GB to 2TB (in 500GB increments), possibly fooling some people into thinking that the disks are always split into 500GB partitions.

My finding while expanding my DS1517 (from 3TB, 3TB, 3TB, 8TB, 8TB to all 14TB) is that the remaining space on the drives is split into as few parts as possible to obtain the maximum available space (after setting aside about 2.5GB for DSM (the operating system) and 2GB for swap).
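As a worked example with my own starting drives: all five disks can contribute a ~2.9TB partition (limited by the 3TB disks), and the two 8TB disks can contribute a ~4.8TB partition each. RAID5 keeps n-1 of n partitions for data and RAID1 keeps one copy, so:

4 x 2.9TB (RAID5 over five partitions) ~= 11.7TB
1 x 4.8TB (RAID1 over two partitions)  ~=  4.8TB
total                                  ~= 16.5TB

Two partitions per drive are enough; no 500GB slicing is involved.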

Replacing disks and rebuilding the RAID

Before I replaced the first disk, I actually forgot to view and save the information about the partitions, mdraid volumes and logical volumes (I might have that somewhere else, but I will not look for it now). Based on how it looked after the first disk had been replaced and the rebuild was done (in the process of reshaping), it should have been something like this:

# sfdisk -l
/dev/sda1                  2048         4982527         4980480  83
/dev/sda2               4982528         9176831         4194304  82
/dev/sda5               9453280      5860326239      5850872960  fd

/dev/sdb1                  2048         4982527         4980480  83
/dev/sdb2               4982528         9176831         4194304  82
/dev/sdb5               9453280      5860326239      5850872960  fd

/dev/sdc1                  2048         4982527         4980480  83
/dev/sdc2               4982528         9176831         4194304  82
/dev/sdc5               9453280      5860326239      5850872960  fd

/dev/sdd1                  2048         4982527         4980480  fd
/dev/sdd2               4982528         9176831         4194304  fd
/dev/sdd5               9453280      5860326239      5850872960  fd
/dev/sdd6            5860342336     15627846239      9767503904  fd

/dev/sde1                  2048         4982527         4980480  fd
/dev/sde2               4982528         9176831         4194304  fd
/dev/sde5               9453280      5860326239      5850872960  fd
/dev/sde6            5860342336     15627846239      9767503904  fd

Note: The partition types for sd[a-c][1-2] seem incorrect, as these were changed to “fd” later on during the process; it might also be something Synology changed in later DSM versions (but not at the point of updating DSM).

Partitions 1-2 are the system and swap partitions on all the drives, sized 2.5GB and 2GB respectively.
Partition 5 is a part of the storage space available in the volume on the NAS. In this case it is about 2.9TB in size (the maximum available on the smallest disks).
Partition 6 is the second part of the total storage space. At this time those partitions are about 4.8TB in size.
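The sizes follow directly from the sector counts in the sfdisk listing, since two 512-byte sectors make 1KB:

# echo $(( 5850872960 / 2 ))
# echo $(( 9767503904 / 2 ))

The first prints 2925436480 (KB), about 2.9TB for each ‘5’ partition; the second prints 4883751952 (KB), about 4.8TB for each ‘6’ partition.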

mdraid volumes

Out of the partitions above, Synology creates these mdraid volumes:
md0: RAID 1 of sda1, sdb1, sdc1, sdd1, sde1: total size 2.5GB used for DSM
md1: RAID 1 of sda2, sdb2, sdc2, sdd2, sde2: total size 2GB used for swap
md2: RAID 5 of sda5, sdb5, sdc5, sdd5, sde5: total size about 11.7TB
md3: RAID 1 of sdd6, sde6: total size of about 4.8TB
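This layout can be verified from an SSH shell on the NAS (a sketch; these are standard mdadm commands, though I am assuming they behave on DSM as on any other Linux):

# cat /proc/mdstat
# mdadm --detail --scan

/proc/mdstat lists every array with its RAID level and member partitions, and mdadm --detail --scan gives a one-line summary per array.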

LVM logical disk

md2 and md3 are joined together into a logical disk using LVM, which gives about 16.5TB of space in total for the storage volume on the NAS (Synology DSM says 15.5TB, but the difference is only down to how the space is counted: I just take the block count, divide by two and use one decimal of precision, giving decimal TB, while DSM counts in binary (TiB-style) units; my way is adequate for this description).
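The LVM side can be inspected with the standard tools (a sketch; the volume group and logical volume names are whatever DSM created, often something like vg1000 and lv, but treat those names as an assumption):

# pvs
# vgs
# lvs

pvs should list /dev/md2 and /dev/md3 as the two physical volumes, vgs the single volume group built from them, and lvs the one logical volume that becomes the storage volume.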

DSM Storage Manager before replacing the first disk

… to be continued in part 2 …