One of the things that I have not spent a great deal of time on in my career is diving deeper into the things that I take for granted in ConfigMgr. I’ve certainly done a couple of investigatory deep dives before, such as my investigation into the new cumulative update architecture, but I feel like I could do more. So, I figured a fantastic way to encourage myself to do more of these is to start blogging them.
For this post we’ll cover Single Instance Storage (SiS). “What is SiS,” you ask? Simply put, SiS is a whole-file level deduplication of content based on the hash of the contents of the file. Even more simply put, SiS is a way to gain significant storage savings when you have many copies of the same file existing in the same store - in our case the ConfigMgr content library.
A Need for Something New
In the before time, the long ago, ConfigMgr stored packages and their contents in an easily accessed share. You could just browse to the \\configmgr\SMSPKGx$ share, find the package ID, and boom there was all your content. What this offered in simplicity, it lacked in optimization. Say you had two different packages both hosting the Visual C++ redistributable installer - you would now have two copies of the vcredist.exe file stored on the drive. Additionally, you would have to transfer the file over the network at least twice.
Enter ConfigMgr 2012, the new hotness (at the time). We needed a better way to optimize storage consumption of content for packages and applications. Other Microsoft technologies were already using the SiS concept (Exchange Server 4.0 in 1996 for example, although it was dropped in Exchange 2010), so it made sense to adopt the same concept in ConfigMgr.
Now that we know the history of the change, let’s talk about the basics. You’ve likely already inferred most of this from the description in the introduction. When you create and distribute a package or application in ConfigMgr, ConfigMgr takes all of the files and captures a SHA-256 (by default) hash of each and then sends the content of that file to the distribution point’s content library IF the distribution point does not already have a file with the same hash stored in the content library. If the distribution point already has this content, it does not need to receive it. The distribution point additionally creates a set of reference files and folders representing the content structure pointing to the instance of the stored file.
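To make the idea concrete, here is a minimal in-memory sketch of whole-file deduplication keyed by SHA-256. It is only an illustration of the concept, not how ConfigMgr is implemented: the SingleInstanceStore class, the package IDs, and the file contents are all made up for the example.

```python
import hashlib


class SingleInstanceStore:
    """Toy whole-file deduplicating store keyed by SHA-256 (illustrative only)."""

    def __init__(self):
        self.blobs = {}     # hash -> file bytes, stored exactly once
        self.packages = {}  # package ID -> {relative path: hash}

    def add_package(self, package_id, files):
        """Add a package; return how many files actually had to be transferred."""
        transferred = 0
        manifest = {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest().upper()
            if digest not in self.blobs:  # only unseen content is sent and stored
                self.blobs[digest] = data
                transferred += 1
            manifest[path] = digest       # the package keeps a reference either way
        self.packages[package_id] = manifest
        return transferred


store = SingleInstanceStore()
vcredist = b"pretend this is vcredist.exe"
first = store.add_package("PKG00001", {"vcredist.exe": vcredist, "app1.msi": b"app one"})
second = store.add_package("PKG00002", {"vcredist.exe": vcredist, "app2.msi": b"app two"})
print(first, second, len(store.blobs))  # 2 1 3
```

The second package only transfers one file because the store already holds vcredist.exe under its hash, so four files across two packages end up as three stored blobs.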
Getting a First Look
Your single instance store is in one or more folders named “SCCMContentLib” on your distribution point(s) in addition to a copy on the primary site server even if it is not a distribution point. The distribution point (DP) or primary site server prioritizes the location of the folder(s) based on:
The existence of the “NO_SMS_ON_DRIVE.SMS” file (ConfigMgr will not create a folder on a drive that contains it)
Primary and secondary drive selected when you created the DP (this cannot be selected for a primary site without the DP role or a CAS)
The drive with largest amount of free space remaining
Inside the SCCMContentLib folder you will see either one or three folders, depending on whether you are looking at a secondary drive or the primary drive of the content library:
DataLib - This folder contains metadata for each package or application distributed to the DP. For each package or application there is one INI file at the root of the DataLib folder containing the hash of the package content and the hash algorithm used. Additionally, there is a folder replicating the structure of the package, with an individual INI file for each file in the package containing the file attributes, size, time modified, and hash. *NOTE* The DP only creates this folder on the primary drive - if you have your content library spanning multiple drives, your secondary drives will not contain this folder.
FileLib - This folder contains folders named with the first four characters of a hash (e.g. EF43). Inside these folders there will be one or more sets of three files - one INI, one SIG, and one with no extension. The INI file contains a list of packages currently using this file, the SIG file is a signature of the file, and the one with no extension is the actual file itself. Each of the files in a set has the same base name - the SHA-256 (by default) hash of the stored file.
PkgLib - This folder contains INI files for each package currently distributed to the DP, or in the case of the primary site server all packages and applications created on the site. The INI file contains a list of the associated packages stored in the content library (in most cases this will be a single package, but in the case of applications it will contain all revisions of the content in chronological order, and in the case of patches it will reference all the updates that are part of the package). *NOTE* The DP only creates this folder on the primary drive - if you have your content library spanning multiple drives, your secondary drives will not contain this folder.
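As a rough sketch of the FileLib layout described above, the function below turns a file hash into the three paths of its file set. The function name and the D:\SCCMContentLib location are hypothetical, chosen just for the example; on a real DP the content library may also span more than one drive.

```python
import hashlib
from pathlib import PureWindowsPath


def file_set_paths(content_lib, file_hash):
    """Return the three paths that make up one FileLib file set."""
    file_hash = file_hash.upper()
    # The containing folder is named after the first four characters of the hash.
    folder = PureWindowsPath(content_lib) / "FileLib" / file_hash[:4]
    return {
        "content": folder / file_hash,           # the stored file, no extension
        "ini": folder / (file_hash + ".INI"),    # packages currently using the file
        "sig": folder / (file_hash + ".SIG"),    # signature of the file
    }


# Hypothetical content, used only to get a real SHA-256 value to work with.
example_hash = hashlib.sha256(b"example file contents").hexdigest()
paths = file_set_paths(r"D:\SCCMContentLib", example_hash)
for kind, path in paths.items():
    print(kind, path)
```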
Regarding Storage on the Primary Site
If you didn’t happen to catch it earlier, it’s important to know that your Primary Site will also store the content for any package or application you create, at the time you create it. You do not ever need to distribute the content for ConfigMgr to create it in the SCCMContentLib on the primary site server. So when you build a primary site server without the DP role thinking you will offload all of your content to a distribution point, be aware that the content library is still created and needs a space to live on your primary site server.
Regarding Storage on a Central Administration Site
When working with a central administration site (CAS), content is added to the CAS SCCMContentLib in two cases: when you create the content at the CAS level, or when you migrate content from another ConfigMgr site and assign the CAS as the site that will manage that content. If you distribute content created at one primary site to another primary site, or to a secondary site under a different primary, the CAS will temporarily store the content but will not add it to its own SCCMContentLib.
Content Distribution and Updates
You might be asking yourself, “how do all these files and folders get on the DP then?” No? Well, I’m going to tell you anyways.
When you distribute content to a DP, the Package Transfer Manager on the site server starts a thread with the DP to send the content. File by file, the DP checks the hash to determine whether it already has a copy of the file in the content library. If it does not, it receives a copy from the Package Transfer Manager into the root of the FileLib, and the Package Transfer Manager then instructs the DP to add the file to the content library. This repeats until ConfigMgr has processed every file, transferring only those the content library does not already hold. Finally, the thread reports to the Distribution Manager that sending is complete and whether it succeeded or failed.
The DP adds the files to the content library under the FileLib folder, in a folder named with the first four characters of the hash of the file. For example, if the hash of a file is: 0F1A06C55B…D1C7FF926, it creates three files under the 0F1A folder, each named with the full hash: one INI file, one SIG file, and the stored file itself with no extension.
After processing the files, the DP creates a new INI file in the DataLib named one of three ways:
Content_GUID.VERSION: for application deployment types
GUID: for updates belonging to packages
PackageID.Version: for legacy packages
This file contains the hash of the entire package and the algorithm used to create the hash.
A folder with the same name is also created. Inside this folder is a recreation of the folder and file structure of the content source, with the actual files replaced by INI files bearing the same name as the source file (e.g. ccmexec.exe becomes ccmexec.exe.ini). These INI files, as explained earlier, contain metadata about each file including the hash of the file.
Finally, the DP creates a new INI file in the PkgLib using the Package ID of the package or application. This file contains the list of associated DataLib “packages” that the package or application contains.
Pull DPs work slightly differently in the sense that the Pull DP initiates the transfer of data from the site rather than the site pushing the data to the DP.
When updating content, the same distribution logic applies. However, this time ConfigMgr sends updates to all distribution points where the content is already distributed. ConfigMgr only redistributes the changed or added files. Files that are removed from the package or application are removed from the content library, provided that no other package or application uses them. When using binary differential replication (which I will cover in a future post), only the bits of the file that have changed are distributed (except for every fifth revision of a file, which we’ll save again for a later post). A new copy of the DataLib folder is created with a new version (or in the case of applications, a new GUID matching the updated deployment type GUID), and the PkgLib file is updated to reflect the new version (or in the case of applications, a new entry is appended to the list).
In the case of potential corruption of a package, you can instruct the site to redistribute content to a DP. In doing this you instruct the DP to overwrite any copies of the files it currently has in the content library. Effectively this becomes the same thing as distributing content, with the exception that the thread doesn’t care whether the file is already there; it will still distribute the content.
When you remove a package from a distribution point (or remove it entirely from the site), the DP removes the associated content from the content library, provided that the files are not shared with any other package. This also applies to application content or revisions, although not as quickly as it does for packages. If you suspect you have many orphaned files on a distribution point, you can use the “Content Library Cleanup Tool”, which shipped with ConfigMgr 1702, to clean up distribution points.
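The "only if no other package uses it" rule is essentially reference tracking. Below is a small sketch of that idea, loosely modelled on the list of packages kept in each file set's INI file. The FileSetIndex class, the placeholder hash names, and the package IDs are all invented for the example, and it says nothing about when ConfigMgr actually performs the cleanup.

```python
class FileSetIndex:
    """Tracks which packages reference each stored file (illustrative only)."""

    def __init__(self):
        self.users = {}  # file hash -> set of package IDs using that file

    def add_reference(self, file_hash, package_id):
        self.users.setdefault(file_hash, set()).add(package_id)

    def remove_package(self, package_id):
        """Drop a package's references; return the hashes that are now unused."""
        deleted = []
        for file_hash in list(self.users):
            self.users[file_hash].discard(package_id)
            if not self.users[file_hash]:  # nobody else needs this file any more
                del self.users[file_hash]
                deleted.append(file_hash)
        return deleted


index = FileSetIndex()
index.add_reference("HASH_VCREDIST", "PKG00001")
index.add_reference("HASH_APP1", "PKG00001")
index.add_reference("HASH_VCREDIST", "PKG00002")
index.add_reference("HASH_APP2", "PKG00002")

removed_first = index.remove_package("PKG00001")
removed_second = sorted(index.remove_package("PKG00002"))
print(removed_first)   # ['HASH_APP1']
print(removed_second)  # ['HASH_APP2', 'HASH_VCREDIST']
```

The shared file survives the first removal because PKG00002 still references it, and only goes away once the last package using it is removed.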
If you’re like me, you may have missed the fact that the four-character folders in the FileLib can contain more than one file set. The reason is that while there are roughly 10^77 possible SHA-256 values, there are only 65,536 possible combinations of four hex characters, so if each folder could only contain one file set you would have folder name collisions very quickly. ConfigMgr solves this by allowing multiple file sets in the same FileLib folder. And don’t worry about hashes colliding on files - 10^77 is just slightly shy of the total number of atoms in the known universe; we won’t be having any hash collisions anytime soon.
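If you want to check those numbers yourself, the arithmetic is quick. The last part uses the standard birthday approximation (roughly n^2 divided by twice the number of possible hashes) to estimate the chance of any collision among n files; the one billion files figure is just a number I picked for illustration.

```python
folder_names = 16 ** 4     # four hex characters
hash_values = 2 ** 256     # possible SHA-256 outputs

print(folder_names)           # 65536
print(f"{hash_values:.3e}")   # 1.158e+77

# Birthday approximation for the chance of at least one collision
# among n distinct files: about n^2 / (2 * number of possible hashes).
n = 10 ** 9  # hypothetical: one billion distinct files in one library
collision_chance = n * n / (2 * hash_values)
print(collision_chance)
```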
Reconstructing a Package
Now we have all the data that we need to accurately reconstruct a package from just the content library. So let’s do it:
First let’s find a package we want to reconstruct - we’ll look in the PkgLib at one of the INI files there. In my case I see that TPL00003 references DataLib TPL00003.2:
So now we navigate to TPL00003.2 in the DataLib and see that it only has one file we need to locate - ccmsetup.exe whose hash begins with 8BB8:
Next we go digging in the FileLib for the 8BB8 folder and find the file set with the matching hash. Remember to match the entire hash of the file if there is more than one file set contained in the folder. We take the file without an extension, copy it out somewhere, and rename it to ccmsetup.exe.
Voila! You’ve recreated the package from the content library! Of course, this is the boring manual way to do it. You can also cheat and use the easy method by opening the “Content Library Explorer” tool and copying the package from a distribution point.
Hopefully now you have a complete understanding of Single Instance Storage in ConfigMgr! If you still feel like you need to fill your brain hole with even more information, I’d suggest you check out the following link, which goes into greater detail on the components involved in content distribution:
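For the curious, the manual steps can also be scripted. The sketch below is a rough take under some loud assumptions: that each DataLib INI file has a [File] section with a Hash key (the section and key names are my assumption and are exposed as parameters), that the INI files read cleanly with Python's configparser, and that the whole content library lives on a single drive. The reconstruct function and the demo data are made up for the example, and the demo builds a tiny fake library in a temp folder rather than touching a real one.

```python
import configparser
import hashlib
import shutil
import tempfile
from pathlib import Path


def reconstruct(content_lib, content_id, destination, section="File", key="Hash"):
    """Rebuild the files of one DataLib entry from the FileLib (single-drive sketch)."""
    content_lib, destination = Path(content_lib), Path(destination)
    data_root = content_lib / "DataLib" / content_id
    restored = []
    for ini_path in sorted(data_root.rglob("*")):
        if not ini_path.is_file() or ini_path.suffix.lower() != ".ini":
            continue
        parser = configparser.ConfigParser()
        parser.read(ini_path)
        file_hash = parser[section][key].upper()  # assumed layout: [File] / Hash=
        # The stored file has no extension and is named with the full hash.
        stored = content_lib / "FileLib" / file_hash[:4] / file_hash
        # Dropping the trailing ".ini" recovers the original relative path.
        target = destination / ini_path.relative_to(data_root).with_suffix("")
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(stored, target)
        restored.append(target)
    return restored


# Demo against a fake, throwaway library (not real ConfigMgr data).
with tempfile.TemporaryDirectory() as tmp:
    lib = Path(tmp) / "SCCMContentLib"
    data = b"pretend this is ccmsetup.exe"
    digest = hashlib.sha256(data).hexdigest().upper()
    (lib / "FileLib" / digest[:4]).mkdir(parents=True)
    (lib / "FileLib" / digest[:4] / digest).write_bytes(data)
    (lib / "DataLib" / "TPL00003.2").mkdir(parents=True)
    ini = lib / "DataLib" / "TPL00003.2" / "ccmsetup.exe.ini"
    ini.write_text("[File]\nHash=" + digest + "\n")

    restored = reconstruct(lib, "TPL00003.2", Path(tmp) / "out")
    restored_names = [p.name for p in restored]
    restored_bytes = restored[0].read_bytes()
    print(restored_names)
```

If your INI files use different section or key names, pass them in via the section and key parameters rather than editing the function.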
But as always - until next time, Happy Admining!