Automation Of Archiving Large Libraries

Introduction 

 
Hi guys, let's explore a powerful automation for the archival of large libraries using PnP with CSOM PowerShell. This approach can be used for an automatic/semi-automatic/manual way, as per the business needs on a weekly/monthly/quarterly basis, as per the archival needs.
 
Pre-Requisites
 
Install all the necessary DLL files by just going to this link >> Click Download >> Select the Latest File. Once the installation is done, you can see a few DLL files automatically reflected on your Local Path, like:
 
"C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\16\ISAPI\Microsoft.SharePoint.Client.dll"
"C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\16\ISAPI\Microsoft.SharePoint.Client.Runtime.dll"
 
Also, install the SharePointPnPPowerShellOnline.msi by going to this link.
 
Maintain the same Columns, Data Types, Views both in Source Library and Target Library, mostly alldocuments.aspx 
 
PowerShell Scripts Used
 
The library scan is performed with all Nested Folders and Files, whichever has the Archive Flag manually set to 'True' OR scanned if they are 1 year old/n number of days old as per your Archival Strategy[LibScan.ps1]. 
  1. #Load SharePoint CSOM Assemblies  
  2. Add - Type - Path "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\16\ISAPI\Microsoft.SharePoint.Client.dll"  
  3. Add - Type - Path "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\16\ISAPI\Microsoft.SharePoint.Client.Runtime.dll"  
  4. #Config Parameters  
  5. $SiteURL = "https://sampleharenet.sharepoint.com/sites/classictest"  
  6. $ListName = "PnPCopytoLib"  
  7. $CSVPath = "D:\LibraryDocumentsInventory.csv"  
  8. #Get Credentials to connect  
  9. $Cred = Get - Credential  
  10. Try {  
  11.     #Setup the context  
  12.     $Ctx = New - Object Microsoft.SharePoint.Client.ClientContext($SiteURL)  
  13.     $Ctx.Credentials = New - Object Microsoft.SharePoint.Client.SharePointOnlineCredentials($Cred.UserName, $Cred.Password)  
  14.     #Get the Document Library  
  15.     $List = $Ctx.Web.Lists.GetByTitle($ListName)  
  16.     #Define CAML Query to Get All Files  
  17.     $Query = New - Object Microsoft.SharePoint.Client.CamlQuery  
  18.     $Query.ViewXml = "@<View Scope='RecursiveAll'> < Query > < Where > < And > < Eq > < FieldRef Name = 'FSObjType' / > < Value Type = 'Integer' > 0 < /Value></Eq > < Or > < Eq > < FieldRef Name = 'ArchivalFlag' / > < Value Type = 'Choice' > Yes < /Value></Eq > < Lt > < FieldRef Name = 'Created' / > < Value Type = 'DateTime'  
  19.     IncludeTimeValue = 'True' > " + (get-date).adddays(-365).ToString("  
  20.     yyyy - MM - ddTHH: mm: ssZ ") + " < /Value></Lt > < /Or> < /And> < /Where> < /Query> < /View>"  
  21.     #powershell sharepoint online list all documents  
  22.     $ListItems = $List.GetItems($Query)  
  23.     $Ctx.Load($ListItems)  
  24.     $Ctx.ExecuteQuery()  
  25.     $DataCollection = @()  
  26.     #Iterate through each document in the library  
  27.     ForEach($ListItem in $ListItems) {  
  28.         #Collect data  
  29.         $Data = New - Object PSObject - Property([Ordered] @ {  
  30.             FileName = $ListItem.FieldValues["FileLeafRef"]  
  31.             RelativeURL = $ListItem.FieldValues["FileRef"]  
  32.             CreatedBy = $ListItem.FieldValues["Created_x0020_By"]  
  33.             CreatedOn = $ListItem.FieldValues["Created"]  
  34.             ModifiedBy = $ListItem.FieldValues["Modified_x0020_By"]  
  35.             ModifiedOn = $ListItem.FieldValues["Modified"]  
  36.             FileSize = $ListItem.FieldValues["File_x0020_Size"]  
  37.         })  
  38.         $DataCollection += $Data  
  39.     }  
  40.     $DataCollection  
  41.     #Export Documents data to CSV  
  42.     $DataCollection | Export - Csv - Path $CSVPath - Force - NoTypeInformation  
  43.     Write - host - f Green "Documents Data Exported to CSV!"  
  44. }  
  45. Catch {  
  46.     write - host - f Red "Error:"  
  47.     $_.Exception.Message  
  48. }  
Output
 
We shall get all the List of Files to be Archived as per the above-highlighed conditions using a Where Clause at your Local Path:  D:\LibraryDocumentsInventory.csv 
 
It should contain the following columns:
  • FileName
  • RelativeURL
  • CreatedBy
  • CreatedOn
  • ModifiedBy
  • ModifiedOn
  • FileSize
Copying all the Scanned Files with their Relative Folder Paths from the Source Library to the Archive Target Library[CopyFiles.ps1]:
  1. #Load SharePoint CSOM Assemblies  
  2. CLS  
  3. Add - Type - Path "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\16\ISAPI\Microsoft.SharePoint.Client.dll"  
  4. Add - Type - Path "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\16\ISAPI\Microsoft.SharePoint.Client.Runtime.dll"  
  5. #Function to Copy a File  
  6. Function Copy - SPOFile([String] $SourceSiteURL, [String] $SourceFileURL, [String] $TargetFileURL) {  
  7.     Try {  
  8.         #Setup the context  
  9.         $Ctx = New - Object Microsoft.SharePoint.Client.ClientContext($SourceSiteURL)  
  10.         $Ctx.Credentials = New - Object Microsoft.SharePoint.Client.SharePointOnlineCredentials($Credentials.Username, $Credentials.Password)  
  11.         #Copy the File  
  12.         $MoveCopyOpt = New - Object Microsoft.SharePoint.Client.MoveCopyOptions  
  13.         $Overwrite = $True[Microsoft.SharePoint.Client.MoveCopyUtil]::CopyFile($Ctx, $SourceFileURL, $TargetFileURL, $Overwrite, $MoveCopyOpt)  
  14.         $Ctx.ExecuteQuery()  
  15.         Write - host - f Green $TargetFileURL " - File Copied Successfully!"  
  16.     }  
  17.     Catch {  
  18.         write - host - f Red "Error Copying the File!"  
  19.         $_.Exception.Message  
  20.     }  
  21. }  
  22. $RootSiteURL = "https://samplesharenet.sharepoint.com"  
  23. $SourceSiteURL = "https://samplesharenet.sharepoint.com/sites/classictest"  
  24. $TargetSiteURL = "https://samplesharenet.sharepoint.com/sites/testsitearchival"  
  25. $SourceSitePath = "/sites/classictest/"  
  26. $TargetSitePath = "/sites/testsitearchival/"  
  27. $SourceDocLibURL = "/PnPCopytoLib"  
  28. $TargetDocLibURL = "/FlowArchiveLib2"  
  29. #$TargetDocLibURL = "/FlowArchiveLib2/April_3rdWeek"  
  30. for aby Batch wise Archivals  
  31. $Credentials = Get - Credential  
  32. Connect - PnPOnline - Url $TargetSiteURL - Credentials $Credentials  
  33. $CSVPath = "D:\LibraryDocumentsInventory.csv"  
  34. Import - Csv $CSVPath | ForEach - Object {  
  35.     $SourceFileURL = $RootSiteURL + $_.RelativeURL  
  36.     $temp = ($_.RelativeURL).Replace($SourceDocLibURL, $TargetDocLibURL)  
  37.     $temp = ($temp).Replace($SourceSitePath, $TargetSitePath)  
  38.     $TargetFileURL = $RootSiteURL + $temp  
  39.     $temp = ($temp).Replace($TargetSitePath, "")  
  40.     $temp = ($temp).Replace("/" + $_.FileName, "")  
  41.     if ($TargetDocLibURL - ne "/" + $temp) {  
  42.         Resolve - PnPFolder - SiteRelativePath $temp  
  43.     }  
  44.     #Call the  
  45.     function to Copy the File  
  46.     Copy - SPOFile $SourceSiteURL $SourceFileURL $TargetFileURL  
  47. }  
Just give your inputs as per the above-highlighted areas.
 
Precautions:
 
Use Only Site Collection Admin/Global Admin Login details for Logging in while the script running is on progress. 
 
Try to hard code with password-protected security string oriented Token Management on the above scripts for no End User manual inputting involvement. 
 
Output
 
You will find all the listed files from that CSV report which have been selected for Archival created with Meta Data properties preserved on the Target Archive Library.
 
Creative Idea
 
You can merge both the above scripts for Automation using Flow/Azure Functions and make them run on a Weekly/Monthly scheduled basis that promotes automation without Manual Intervention.
 
You can apply the above-discussed process with Large Lists too that needs to be Archived.
 
Note
Since we have a buffer size limit of 100 MB for a Complete Flow-based Copying of Files we are using PnP + CSOM PowerShell to run evergreen unlimited calls no pricing/no size limitation/metadata preserving, etc. 
 
Cheers!