A tool to quickly update permissions on Azure Data Lake Storage Gen2 files
Recently I needed to change the permissions on a large number of files and directories in an Azure Data Lake Storage Gen2 account. At first, I tried the CLI and PowerShell commands available through Cloud Shell, but after some experimenting they turned out to be quite limited and did not fit my needs. So I decided to write some code that recursively traverses the directory structure and applies the permission changes I needed.
The tool communicates with the Data Lake through its REST API, wrapped by the Azure.Storage.Files.DataLake library. Every permission update on a file or directory requires two API calls: one to retrieve the current ACL and one to send back the updated ACL. To speed up the process, I parallelized the HTTPS REST API calls (the degree of parallelism is configurable). With parallelism set to 10, I was able to sustain a rate of 90 files/s (180 API calls/s).
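The tool itself is built on the .NET Azure.Storage.Files.DataLake library; the Python sketch below only illustrates the read-modify-write pattern and the bounded parallelism described above, and is not the tool's code. To keep it runnable without an Azure account, `FakeDataLake`, `list_paths`, `get_acl`, `set_acl` and `update_all` are hypothetical names backed by an in-memory dictionary, not SDK calls. Because every item costs exactly two calls, the call counter ends at twice the number of paths, which is the same ratio as 90 files/s to 180 calls/s.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeDataLake:
    """Hypothetical in-memory stand-in for the storage service.

    Maps each path to its ACL string and counts every get/set as one
    API call. It is NOT the Azure SDK; the method names are made up.
    """

    def __init__(self, acls):
        self.acls = dict(acls)
        self.calls = 0
        self._lock = threading.Lock()

    def list_paths(self, start):
        # Stand-in for the recursive directory walk. Listing calls are
        # not counted, matching the two-calls-per-item figure.
        prefix = start.rstrip("/") + "/"
        return sorted(p for p in self.acls if p == start or p.startswith(prefix))

    def get_acl(self, path):
        with self._lock:
            self.calls += 1
            return self.acls[path]

    def set_acl(self, path, acl):
        with self._lock:
            self.calls += 1
            self.acls[path] = acl


def update_all(lake, start, transform, parallelism):
    """Rewrite the ACL of every path under `start` with transform(old_acl).

    Each path costs two calls (read, then write back); at most
    `parallelism` paths are processed concurrently.
    """

    def update_one(path):
        old_acl = lake.get_acl(path)            # call 1: fetch current ACL
        lake.set_acl(path, transform(old_acl))  # call 2: send updated ACL
        return path

    paths = lake.list_paths(start)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(update_one, paths))


if __name__ == "__main__":
    lake = FakeDataLake({
        "folderx/subfolder": "user::rwx",
        "folderx/subfolder/a.csv": "user::rw-",
        "folderx/subfolder/b.csv": "user::rw-",
        "foldery/c.csv": "user::rw-",
    })
    done = update_all(lake, "folderx/subfolder",
                      lambda acl: acl + ",user:amber@contoso.com:r--",
                      parallelism=10)
    print(len(done), lake.calls)  # 3 paths updated, 6 API calls
```

A thread pool fits here because the work is dominated by waiting on HTTPS round trips rather than CPU, so a small, fixed number of concurrent requests raises throughput without flooding the service.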
The tool reads a JSON configuration file containing all the parameters required to connect to the storage account, the new permissions to be merged, and the users to be removed. Permissions are expressed as POSIX-like ACL entries, such as user:amber@contoso.com:r--. Here is a sample configuration:
```json
{
  "AccountName": "storagename",
  "AccountKey": "1wb6X...",
  "FileSystem": "myfs",
  "StartingPath": "folderx/subfolder",
  "ExitAfter": -1,
  "LogVerbose": false,
  "Parallelism": 8,
  "ACL": [
    "user:amber@contoso.com:r--",
    "default:user:amber@contoso.com:r-x",
    "group:11111e4b-964b-46e4-af2e-aaaaacfa0ca07:rwx",
    "default:group:11111e4b-964b-46e4-af2e-aaaaacfa0ca07:rwx"
  ],
  "RemoveList": [
    "jane@contoso.com",
    "amanda@contoso.com"
  ]
}
```
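The post does not spell out exactly how the "ACL" entries are merged or how "RemoveList" is applied, so the following Python sketch shows one plausible interpretation rather than the tool's actual logic. It assumes that an entry is identified by its scope (access or `default:`), its type and its principal, that a new entry with the same identity replaces the old permissions, and that removal drops both access and default entries for a listed user. `parse_entry`, `format_entry` and `merge_acl` are hypothetical helpers, and the "current" ACL is made-up data.

```python
import json


def parse_entry(entry):
    """Split '[default:]type:qualifier:perms' into ((is_default, type, qualifier), perms)."""
    parts = entry.split(":")
    is_default = parts[0] == "default"
    if is_default:
        parts = parts[1:]
    if len(parts) != 3:
        raise ValueError(f"malformed ACL entry: {entry!r}")
    kind, qualifier, perms = parts
    return (is_default, kind, qualifier), perms


def format_entry(key, perms):
    is_default, kind, qualifier = key
    return ("default:" if is_default else "") + f"{kind}:{qualifier}:{perms}"


def merge_acl(current, additions, remove_list):
    """Merge `additions` into the comma-separated ACL `current`, then drop
    every entry (access or default) whose qualifier is in `remove_list`.

    An addition with the same scope, type and principal as an existing
    entry replaces its permissions in place; new entries are appended.
    """
    entries = {}  # dicts keep insertion order, so the output order is stable
    for raw in filter(None, current.split(",")):
        key, perms = parse_entry(raw)
        entries[key] = perms
    for raw in additions:
        key, perms = parse_entry(raw)
        entries[key] = perms
    removed = set(remove_list)
    return ",".join(format_entry(k, p) for k, p in entries.items()
                    if k[2] not in removed)


# Two fields taken from the sample configuration above.
config = json.loads("""
{
  "ACL": ["user:amber@contoso.com:r--", "default:user:amber@contoso.com:r-x"],
  "RemoveList": ["jane@contoso.com", "amanda@contoso.com"]
}
""")

# Hypothetical current ACL of one file.
current = ("user::rwx,user:jane@contoso.com:rwx,"
           "user:amber@contoso.com:---,group::r-x,other::---")
print(merge_acl(current, config["ACL"], config["RemoveList"]))
# user::rwx,user:amber@contoso.com:r--,group::r-x,other::---,default:user:amber@contoso.com:r-x
```

Keying entries on scope, type and principal makes the merge idempotent: running it a second time over the same files leaves their ACLs unchanged, which is convenient when a long run has to be restarted.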
The source code and the README file are available here:
https://github.com/fhtino/azure-datalake2-stuff/tree/master/ADLSApplyPerms