How to Merge VHDS Checkpoint Files

Backstory - how I accidentally made 3.5 TB of data take up 5 TB of space

OK, technically Veeam created the checkpoints on my VHDS due to a misconfiguration on my part, and I didn't notice they were there for a few months. Oops. Then one day PRTG alerted me that the cluster shared volume (CSV) on the cluster was running out of space, even though the volume inside the guest cluster still showed plenty of free space.

And then I saw it: 3.5 TB of data taking up 5 TB of my available space.

So I did some research. Sure, some people have complained about similar issues. Their solution? Copy all the data off to new storage and try again, or revert back to a VHDX. That wasn't really an option for me, so I started digging into how a VHDS works.

How a VHDS works

To quote Microsoft:

VHD Set files are a new shared Virtual Disk model for guest clusters in Windows Server 2016. VHD Set files support online resizing of shared virtual disks, support Hyper-V Replica, and can be included in application-consistent checkpoints.

... That definition makes it sound pretty cool, but it also gives the impression that a VHDS is a fancy new thing. In reality, at its core, it looks like it's simply a binary file with a .vhds extension that holds a pointer to a VHDX with an .avhdx file extension.

Don't believe me? Opening the .vhds file in Notepad is messy, but if you dig around towards the bottom of the file, you will see the reference to the .avhdx file.

[Screenshot: vhds-avhdx-reference]
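If scrolling through Notepad isn't your thing, here is a rough sketch that pulls the .avhdx name out of the file for you. The function name Find-AvhdxReference is made up for this post, and it rests on an assumption I have not confirmed against any format documentation: that the name is stored as plain UTF-16 or ASCII text somewhere in the file. It also reads the whole file into memory, which assumes the .vhds itself is small.

```powershell
# Hypothetical helper: list .avhdx file names embedded in a binary file.
# Assumption: the names are stored as UTF-16LE or ASCII text.
function Find-AvhdxReference {
    param([Parameter(Mandatory)][string]$Path)

    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)

    # Decode as ASCII and as UTF-16LE; the second UTF-16 pass starts one byte
    # later so that strings sitting at an odd offset are still readable.
    $candidates = @(
        [System.Text.Encoding]::ASCII.GetString($bytes)
        [System.Text.Encoding]::Unicode.GetString($bytes)
    )
    if ($bytes.Length -gt 1) {
        $candidates += [System.Text.Encoding]::Unicode.GetString($bytes, 1, $bytes.Length - 1)
    }

    # Match only plain ASCII file-name characters followed by .avhdx
    $pattern = '[A-Za-z0-9_\-\{\}]+\.avhdx'
    $names = foreach ($text in $candidates) {
        foreach ($m in [regex]::Matches($text, $pattern, 'IgnoreCase')) { $m.Value }
    }

    if ($names) { $names | Sort-Object -Unique }
}

# Usage (the file name is a placeholder):
# Find-AvhdxReference -Path .\temp.vhds
```

If this returns nothing, the assumption about the text encoding is probably wrong for your file, and Notepad is still your friend.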

The Problem

So the real problem is that I have a VHDS pointing to an .avhdx, but that .avhdx has a chain of parent .avhdx files that I need to merge together. Here is what an example directory reflecting the problem looks like (sorry, no production screenshots here!):

[Screenshot: vhds-dir]

Also, given my basic understanding of how a VHDS works behind the scenes, we have to take some potential problems into account:

  • The VHDS might somehow be aware of the other checkpoints (though after some tests I doubt it!)
  • Depending on the size of the checkpoints and speed of the underlying storage, it could take a VERY long time to merge
  • This will break any replication going on, and a re-seed will be necessary
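To get a feel for how long the merge might take, it helps to know how much checkpoint data is actually sitting in the folder. This is just a sketch: Get-AvhdxFootprint is a name I made up, and it simply totals the size of every .avhdx file in one directory, which includes the base disk as well as the checkpoints.

```powershell
# Hypothetical helper: summarize how much space the .avhdx files in a folder use.
function Get-AvhdxFootprint {
    param([string]$Directory = '.')

    # Collect only the .avhdx files; the .vhds file itself is ignored
    $files = @(Get-ChildItem -LiteralPath $Directory -Filter '*.avhdx' -File)
    $totalBytes = [long](($files | Measure-Object -Property Length -Sum).Sum)

    [pscustomobject]@{
        FileCount  = $files.Count
        TotalBytes = $totalBytes
        TotalGB    = [math]::Round($totalBytes / 1GB, 2)
    }
}

# Usage: run it against the folder that holds the VHDS
# Get-AvhdxFootprint -Directory .
```

Compare the total against what the guest thinks it is using, and you have a rough idea of how much the checkpoints are costing you.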

The Solution

For every problem, a solution exists. And that solution probably can be written in PowerShell!

Disclaimer: Your results may vary. TEST THIS IN A LAB FIRST! Also, double-check your backups. And then triple-check them. I take no responsibility if you lose all of your data. Merging checkpoints is dangerous, and if done wrong, it can and will lose all your data.

Our tasks

Here are the basic tasks we will need to accomplish:

  1. Power down the VMs referencing this VHDS file, and pause any replication going on
  2. Sort the .avhdx files based on the parent-child relationship
  3. Merge the files without breaking the parent-child relationship
  4. This step has a few possibilities:
    a. According to this article, you can convert a VHDX to a VHDS using Convert-VHD. Since the .avhdx file being used is technically just a VHDX, we could rename it and run the PowerShell conversion. However, the conversion creates a DUPLICATE of the original VHDX file, even if you specify -DeleteSource when running the command. If you have a 5 TB volume, you will need 5 TB of scratch space.
    b. If you are confident that the VHDS does not have any knowledge of the checkpoints, you can rename the final merged file to the name of the .avhdx file the VHDS references.
    c. If you want to be tricky, you can create a new VHDS, delete the newly created .avhdx file, and rename our merged file to that name.

Step 1

I'm leaving this to you to handle!
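That said, if you want a starting point, here is a minimal sketch. It assumes the replication in question is Hyper-V Replica (if you replicate some other way, this part won't apply), and the function name Stop-VhdsGuest and the VM names in the usage line are placeholders I made up.

```powershell
# Hypothetical helper: pause Hyper-V Replica (if enabled) and shut down each VM
# that has the VHDS attached. Run it on the Hyper-V host that owns the VMs.
function Stop-VhdsGuest {
    param([Parameter(Mandatory)][string[]]$VMName)

    foreach ($name in $VMName) {
        # Assumption: replication is Hyper-V Replica; skip VMs that are not replicated
        if (Get-VMReplication -VMName $name -ErrorAction SilentlyContinue) {
            Suspend-VMReplication -VMName $name
        }

        # Ask the guest to shut down
        Stop-VM -Name $name
    }
}

# Usage (VM names are placeholders for your guest cluster nodes):
# Stop-VhdsGuest -VMName 'node1', 'node2'
```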

Step 2: Sorting the AVHDX Files

The explanation for each step of this is in the comments of the code!

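# Get the VHD files, excluding the .vhds file itself
$disks = Get-VHD ".\temp*" | Select-Object -Property Path,ParentPath | Where-Object Path -NotLike "*.vhds"

# Create an ArrayList. Appending to it avoids rebuilding an array on every add
$list = New-Object System.Collections.ArrayList

# Add the top-most parent (the disk with no parent) first.
# [void] keeps the index returned by Add() out of the output
[void]$list.Add(($disks | Where-Object { -not $_.ParentPath }).Path)

# Cycle through the rest, and add each disk once its parent is the last entry in the list
while ($list.Count -lt $disks.Count)
{
    $before = $list.Count
    foreach ($x in $disks)
    {
        if ($x.ParentPath -eq $list[$list.Count - 1]) {
            [void]$list.Add($x.Path)
        }
    }
    # Stop instead of looping forever if the chain is broken
    if ($list.Count -eq $before) { throw "Broken parent-child chain after $($list[$before - 1])" }
}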
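If you want to try the ordering logic before pointing it at real disks, the sketch below does the same job with a lookup table and runs against mock objects. Get-VhdChainOrder is a hypothetical name and the paths are invented; the only thing it assumes about its input is that each object has Path and ParentPath properties, like the output of Get-VHD.

```powershell
# Hypothetical helper: order disks from the top-most parent to the newest child.
function Get-VhdChainOrder {
    param([Parameter(Mandatory)][object[]]$Disk)

    # Map each parent path to the child that points at it
    $childOf = @{}
    foreach ($d in $Disk) {
        if ($d.ParentPath) { $childOf[$d.ParentPath] = $d.Path }
    }

    # The top-most parent is the only disk without a ParentPath
    $root = @($Disk | Where-Object { -not $_.ParentPath })
    if ($root.Count -ne 1) {
        throw "Expected exactly one disk without a parent, found $($root.Count)."
    }

    # Walk from parent to child until no further child exists
    $ordered = New-Object System.Collections.Generic.List[string]
    $current = $root[0].Path
    while ($current) {
        $ordered.Add($current)
        $current = $childOf[$current]
    }

    # Any disk left over means the chain has a gap or a fork
    if ($ordered.Count -ne $Disk.Count) {
        throw "Chain is broken: ordered $($ordered.Count) of $($Disk.Count) disks."
    }
    $ordered
}

# Mock input in scrambled order (paths are made up)
$mock = @(
    [pscustomobject]@{ Path = 'temp_3.avhdx'; ParentPath = 'temp_2.avhdx' }
    [pscustomobject]@{ Path = 'temp_1.avhdx'; ParentPath = '' }
    [pscustomobject]@{ Path = 'temp_2.avhdx'; ParentPath = 'temp_1.avhdx' }
)
Get-VhdChainOrder -Disk $mock   # temp_1.avhdx, temp_2.avhdx, temp_3.avhdx
```

The lookup table means each disk is visited once, and the two checks turn a gap or a fork in the chain into an error instead of a wrong merge order.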

Step 3: Merging the AVHDX Files

So, in case you did not know this, you do NOT have to merge every single .avhdx file into its immediate parent one at a time. If you pass the lowest child as the path and the top-most parent as the destination, Merge-VHD merges all of the in-between files too.

Merge-VHD -Path $list[$list.Count - 1] -DestinationPath $list[0]
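Before pulling the trigger on a merge that can't be undone, it may be worth running Test-VHD against every file in the chain and doing a dry run first. The wrapper below is only a sketch; Test-VhdChainHealthy is a name I made up, and it assumes $list from Step 2 holds the chain in parent-to-child order.

```powershell
# Hypothetical helper: returns $true only if every disk in the list passes Test-VHD.
function Test-VhdChainHealthy {
    param([Parameter(Mandatory)][string[]]$Path)

    foreach ($p in $Path) {
        if (-not (Test-VHD -Path $p)) {
            Write-Warning "Test-VHD failed for $p"
            return $false
        }
    }
    return $true
}

# Usage, assuming $list from Step 2. -WhatIf reports what would be merged
# without changing anything; remove it to perform the real merge.
# if (Test-VhdChainHealthy -Path $list) {
#     Merge-VHD -Path $list[$list.Count - 1] -DestinationPath $list[0] -WhatIf
# }
```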

Step 4

Okay, pick your poison on this one.

Step 4a: Converting the file

One more reminder: the conversion writes a full copy of the disk, so make sure you have enough scratch space before going this route.

# Save the folder path, and build the new file name from everything before the first '_'
$path = Split-Path $list[0]
$leaf = Split-Path $list[0] -Leaf
$baseFileName = $leaf.Substring(0, $leaf.IndexOf('_'))

# Rename the file
Rename-Item $list[0] "$baseFileName.vhdx"

# Perform the conversion
Convert-VHD "$path\$baseFileName.vhdx" "$path\$baseFileName.vhds"
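One thing to watch in the name handling above: it cuts at the first underscore, so a disk whose own name contains an underscore gets truncated, and a name with no underscore at all makes Substring throw. Here is a slightly more defensive sketch. Get-BaseDiskName is a hypothetical helper, and cutting at the LAST underscore is my own assumption: that the generated suffix (which looked like a GUID in my case) never contains an underscore itself.

```powershell
# Hypothetical helper: derive the base disk name from an .avhdx file name.
# Assumption: the generated suffix after the last '_' contains no underscores.
function Get-BaseDiskName {
    param([Parameter(Mandatory)][string]$Path)

    $leaf = Split-Path $Path -Leaf
    $index = $leaf.LastIndexOf('_')
    if ($index -lt 1) {
        throw "No '_' separator found in '$leaf'; cannot derive a base name."
    }
    $leaf.Substring(0, $index)
}

# Usage, assuming $list from Step 2:
# $baseFileName = Get-BaseDiskName -Path $list[0]
```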

Step 4b: Renaming the file

# Rename the merged file to the file name (not the folder) of the .avhdx the VHDS references
Rename-Item $list[0] $(Split-Path $list[$list.Count - 1] -Leaf)

Step 4c: New VHDS

# Create the new VHDS with the same size as previous VHDS
New-VHD .\new.vhds -SizeBytes $(Get-VHD $list[0]).Size

# Rename newly created avhdx to .old
$newAvhdxName = Split-Path $(Get-VHD .\new_*).Path -Leaf
Rename-Item .\$newAvhdxName "$newAvhdxName.old"

# Rename old avhdx to new avhdx
Rename-Item $list[0] $newAvhdxName

Results

Well, I ended up testing all three of these in a lab environment, and all of them worked as far as I can tell.

To fix my actual problem, though, I ended up going with option 4c. I didn't have enough scratch space to perform option 4a, and I just didn't feel comfortable running option 4b.

Good luck and good day to you!
