Locating PST files on a network

In order to size any mail archiving solution, it is important to understand the amount of archive data currently in use. For many companies this takes the form of Outlook Data Files (PSTs). Unfortunately, the only resource Microsoft provide is a VBScript dating back to 2005 on the TechNet Script Center.

I decided to have a go at implementing two methods of locating PST files on the network using PowerShell:

  • Enumerate the Outlook settings in the registry of client computers to find the PST files loaded in Outlook;
  • Use WMI to search for the files on remote computers.

In testing, the WMI search located twice as many files as relying on the information in the Windows registry alone. Both scripts also read the first 11 bytes of each PST file to determine its format: ANSI or Unicode.
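Both scripts classify a file by the byte at offset 10 of its header. As a minimal sketch (`Get-PstFormat` and `Read-PstHeader` are hypothetical names, not part of the scripts below), the same check can be written as standalone functions that also verify the `!BDN` signature at the start of the file and combine both bytes of the little-endian version field. It assumes the header layout the scripts rely on: values 14 and 15 indicate an ANSI (Outlook 97-2002) file and 23 a Unicode (Outlook 2003 and later) file; treating values above 23 as Unicode is an assumption.

```powershell
# Hypothetical helper: classify a PST from its first 12 header bytes.
# Assumes "!BDN" at offset 0 and the 2-byte little-endian wVer at offset 10.
function Get-PstFormat {
    param([byte[]]$HeaderBytes)

    if ($null -eq $HeaderBytes -or $HeaderBytes.Length -lt 12) { return 'Unknown' }

    # Signature bytes 0x21 0x42 0x44 0x4E spell "!BDN".
    $magic = [System.Text.Encoding]::ASCII.GetString($HeaderBytes, 0, 4)
    if ($magic -cne '!BDN') { return 'NotPst' }

    # Combine both bytes of wVer instead of looking at the low byte only.
    $wVer = [int]$HeaderBytes[10] + ([int]$HeaderBytes[11] -shl 8)
    if ($wVer -eq 14 -or $wVer -eq 15) { return 'ANSI' }   # Outlook 97-2002 format
    if ($wVer -ge 23) { return 'Unicode' }                 # 23 = Outlook 2003+; higher values assumed Unicode
    return 'Unknown'
}

# Hypothetical helper: read the first 12 bytes of a file, sharing access so a
# file that is already open elsewhere can still be read where Windows allows it.
function Read-PstHeader {
    param([Parameter(Mandatory)][string]$Path)
    $stream = [System.IO.File]::Open($Path, 'Open', 'Read', 'ReadWrite')
    try {
        $buffer = New-Object byte[] 12
        $read = $stream.Read($buffer, 0, $buffer.Length)
        # A short file cannot be a PST; hand back an empty array.
        if ($read -lt $buffer.Length) { return ,(New-Object byte[] 0) }
        return ,$buffer
    }
    finally {
        $stream.Dispose()
    }
}
```

With a readable file, `Get-PstFormat (Read-PstHeader -Path $tmpPath)` returns `ANSI` or `Unicode`; if the file cannot be opened, `Read-PstHeader` throws, which is the case the scripts record as `Error`.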

Using the registry

The advantage of using the data in the Windows registry is speed: we can quickly enumerate user profiles and identify the PST files loaded in Outlook. Once we have that information, the files themselves can be checked over SMB via the administrative share (c$). This does, however, require the Remote Registry service to be enabled and running, and file sharing (SMB on TCP port 445, or 139 where NetBIOS is used) to be reachable on client computers.
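The scripts map a local path to the admin share with `-Replace "C:\\", "\\$ComputerName\c$\"`, which only covers the C: drive. A hypothetical helper, `Convert-ToAdminSharePath`, sketches the same mapping for any drive letter, assuming the default administrative shares (C$, D$, and so on) are enabled and the account running the scan is an administrator on the remote computer:

```powershell
# Hypothetical helper: map a local path reported by a remote computer to the
# matching administrative share, e.g. C:\x.pst on PC01 -> \\PC01\C$\x.pst.
function Convert-ToAdminSharePath {
    param(
        [Parameter(Mandatory)][string]$ComputerName,
        [Parameter(Mandatory)][string]$LocalPath
    )
    # Only rewrite drive-letter paths (X:\...); UNC and relative paths return $null.
    if ($LocalPath -match '^([A-Za-z]):\\(.*)$') {
        # Single-quoted format string: backslashes and '$' are literal.
        return '\\{0}\{1}$\{2}' -f $ComputerName, $Matches[1].ToUpper(), $Matches[2]
    }
    return $null
}
```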

This script uses PowerShell background jobs, checking 5 computers at a time. To check more computers at once, increase the $MaxThreads variable.
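The throttling pattern is: start a job per computer, and whenever the number of unfinished jobs reaches `$MaxThreads`, sleep and poll until one completes. A minimal, self-contained sketch of that pattern (the `Invoke-Throttled` name and its parameters are hypothetical):

```powershell
# Hypothetical sketch: run $ScriptBlock once per input item as a background
# job, keeping at most $MaxThreads jobs active at any time.
function Invoke-Throttled {
    param(
        [Parameter(Mandatory)][object[]]$InputItems,
        [Parameter(Mandatory)][scriptblock]$ScriptBlock,
        [int]$MaxThreads = 5,
        [int]$SleepMilliseconds = 500
    )
    $jobs = @()
    foreach ($item in $InputItems) {
        # Count only the jobs started here that have not finished yet.
        while (@($jobs | Where-Object { $_.State -eq 'NotStarted' -or $_.State -eq 'Running' }).Count -ge $MaxThreads) {
            Start-Sleep -Milliseconds $SleepMilliseconds
        }
        $jobs += Start-Job -ScriptBlock $ScriptBlock -ArgumentList $item
    }
    # Wait for the remaining jobs, collect their output, then clean up.
    $results = $jobs | Wait-Job | Receive-Job
    $jobs | Remove-Job
    $results
}
```

Unlike the script, the sketch tracks only the jobs it started rather than calling `Get-Job`, so it does not need to remove other jobs in the session before it runs.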

cls
#
# PST Scanning Utility (Registry)
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Retrieves a list of computers (recursively) from an OU in Active
# Directory then uses Remote Registry calls to locate PST files for each
# user profile.
#
# The advantage to querying the registry is speed; however, accuracy is
# potentially reduced since PST files not loaded in Outlook are missed.
#
# Changelog
# ~~~~~~~~~
# 2012.03.28	Dave Hope		Initial version.
# 2012.03.30	Dave Hope		Added try/catch for Get-Item $tmpPath to
#								handle missing files.
#
# ======================================================================
# SETTINGS
# ======================================================================
$cfgOU = "LDAP://DC=nwtraders,DC=msft"
$cfgInterval = -30			# Difference to lastLogonTimestamp (days)
$cfgOutpath = "H:\Registry.CSV"
$MaxThreads = 5			# Maximum number of checks to run at once.
$SleepTimer = 500			# Wait between checks.

# ======================================================================
# STOP CHANGING HERE.
# ======================================================================


#
# Uses CIFS/SMB and the remote registry APIs to determine remote PST
# files and their sizes for a given computername.
# ======================================================================
$GetPSTInfo = {
	Param(
		[string]$ComputerName = $(throw "ComputerName required.")
		)
	$ReturnArray = @()

	# Test connection (ICMP) first rather than relying on the slow
	# OpenRemoteBaseKey call to fail.
	if( (Test-Connection -ComputerName $ComputerName -Count 1 -Quiet) -eq $false )
	{
		Write-Host "Failed communicating with $ComputerName - ICMP Unreachable"
		return;
	}

	# Connect to remote system.
	try
	{
		$RegHive = [Microsoft.Win32.RegistryKey]::OpenRemoteBaseKey( "Users", $ComputerName )
	}
	catch {
		Write-Host "Failed communicating with $ComputerName - OpenRemoteBaseKey failed"
		return;
	}

	# Get the list of user profiles.
	$RegUsers = $RegHive.getSubKeyNames()

	# Iterate over user profiles on the computer
	foreach( $RegUser in $RegUsers )
	{
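		# Get list of Outlook data files in use for this profile. 12.0 is
		# Outlook 2007; other Outlook versions use a different version number
		# in this path (and may not keep the list here), so adjust as needed.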
		$RegPath = "$RegUser\Software\Microsoft\Office\12.0\Outlook\Catalog"
		$Catalogs = $RegHive.OpenSubKey( $RegPath );
		if( $Catalogs -eq $null )
		{
			continue;
		}
	
		# Iterate over the data files, skipping any whose name doesn't end
		# in .pst or that reside somewhere other than C:.
		$Archives = $Catalogs.GetValueNames()
		foreach( $Archive in $Archives )
		{
			if(
				$Archive.ToLower().EndsWith(".pst") -And	$Archive.ToLower().StartsWith("c:") )
			{
				# Replace the local path with that of a remote path.
				$tmpPath = $Archive -Replace "C:\\", "\\$ComputerName\c$\"
				try
				{
					$FileInfo = Get-Item $tmpPath -ErrorAction Stop
				}
				Catch
				{
					# PST file doesn't exist, or we can't reach it,
					# continue on with the next one.
					continue;
				}
				
				# Add file info to array.
				$FileReturn = "" | Select Computer, Owner, Path, Size, Modified, Version
				$FileReturn.Computer = $ComputerName
				$FileReturn.Owner = (Get-Acl $tmpPath | select Owner).Owner
				$FileReturn.Path = $Archive
				$FileReturn.Size = $FileInfo.Length
				$FileReturn.Modified = $FileInfo.LastWriteTime


				#
				# PST version: the low byte of wVer at offset 10 gives the
				# format (14/15 = ANSI, 23 = Unicode), so read 11 bytes.
				$fileStream = $null
				try
				{
					$fileStream = [System.IO.File]::Open( $FileInfo.FullName, 'Open', 'Read', 'ReadWrite' )
					[byte[]]$fileBytes = New-Object byte[] 11 # Length we need.
					[void]$fileStream.Read( $fileBytes, 0, 11 );
					if ($fileBytes[10] -eq 23 )
					{
						$FileReturn.Version = "2003";
					}
					elseif ( ($fileBytes[10] -eq 14) -or ($fileBytes[10] -eq 15) )
					{
						$FileReturn.Version = "1997";
					}
					else
					{
						$FileReturn.Version = "Unknown";
					}
				}
				catch
				{
					# File locked (e.g. open in Outlook) or unreadable.
					$FileReturn.Version = "Error";
				}
				finally
				{
					if( $fileStream -ne $null ) { $fileStream.Close(); }
				}

				$ReturnArray += $FileReturn
			}
		}
	}
	return $ReturnArray;
}


#
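# Gets the distinguished names of objects of a given class from AD,
# recursing into child OUs (plain containers such as CN=Computers are
# not searched).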
# ======================================================================
Function GetAdObjects
{
	Param(
		[string]$Path = $(throw "Path required."),
		[string]$desiredObjectClass = $(throw "DesiredObjectClass required.")
		)
	$ReturnArray = $null

	# Bind to AD using the provided path.
	$objADSI = [ADSI]$Path

	# Iterate over each object and add its name to the array.
	foreach( $obj in $objADSI.Children )
	{
		$thisItem = $obj | select objectClass,distinguishedName,name
		if (
			$thisItem.objectClass.Count -gt 0 -And
			$thisItem.objectClass.Contains( $desiredObjectClass)
			)
		{
			$ReturnArray += $thisItem.distinguishedName
		}
		elseif(
			$thisItem.objectClass.Count -gt 0 -And
			$thisItem.objectClass.Contains("organizationalUnit")
			)
		{
			# Init to null rather than @() so we don't add empty
			# values.
			$RecurseItems = $null
			$RecurseItems += GetAdObjects "LDAP://$($thisItem.distinguishedName.ToString())" $desiredObjectClass
			if( $RecurseItems.Count -gt 0 )
			{
				$ReturnArray += $RecurseItems
			}
		}
	}

	# Make sure we have items to return, otherwise we'll push
	# empty items to the array.
	if( $ReturnArray.Count -gt 0)
	{
		return $ReturnArray;
	}
}


#
# Converts an IADsLargeInteger COM object to an Int64
# ======================================================================
function Convert-IADSLargeInteger([object]$LargeInteger)
{
	$type = $LargeInteger.GetType()
	$highPart = $type.InvokeMember("HighPart","GetProperty",$null,$LargeInteger,$null)
	$lowPart = $type.InvokeMember("LowPart","GetProperty",$null,$LargeInteger,$null)
	$bytes = [System.BitConverter]::GetBytes($highPart)
	$tmp = New-Object System.Byte[] 8
	[Array]::Copy($bytes,0,$tmp,4,4)
	$highPart = [System.BitConverter]::ToInt64($tmp,0)
	$bytes = [System.BitConverter]::GetBytes($lowPart)
	$lowPart = [System.BitConverter]::ToUInt32($bytes,0)
	$lowPart + $highPart
}

#
# Evaluate the lastLogonTimestamp attribute for accounts and keep only
# those that logged on within the configured interval ($cfgInterval).
# ======================================================================
Function GetObjectsLoggedIntoSince
{
	Param(
		[Array] $Computers = $(throw "Computers required"),
		[int] $LoginDays = $(throw "LoginDays required")
		)

	$earliestAllowedLogon = [DateTime]::Today.AddDays($LoginDays)

	foreach( $Computer in $Computers )
	{
		$objADSI = [ADSI]"LDAP://$Computer"
		if( $objADSI.Properties.Contains("lastLogonTimeStamp") -eq $false )
		{
			continue;
		}

		$lastLogon = [DateTime]::FromFileTime(
			[Int64]::Parse(
				$(Convert-IADSLargeInteger $objADSI.lastlogontimestamp.value)
				)
			)
		if( [DateTime]::Compare( $earliestAllowedLogon , $lastLogon) -eq -1 )
		{
			$objADSI.name
		}
		continue;
	}
}

#
# Get computer accounts from Active Directory.
$OutArray = @()
$Computers = GetAdObjects "$cfgOU" "computer"
$Computers = GetObjectsLoggedIntoSince $Computers $cfgInterval

#
# Remove any previous jobs.
$jobsTotal = $(Get-Job).Count
$i = 0
if( $jobsTotal -gt 0)
{
	foreach( $job in Get-Job)
	{
		Write-Progress -Activity "Locating PST files" -Status "Removing existing jobs" -CurrentOperation "$i of $jobsTotal" -PercentComplete ($i / $jobsTotal * 100)
		$job | Remove-Job -Force
		$i++
	}
}

#
# If we have no computers to check, just exit.
if( $Computers.Count -le 0 )
{
	return;
}

#
# Create all the jobs.
$i = 0
ForEach ($Computer in $Computers)
{
	#
	# We're currently running at $MaxThreads, wait for one to close.
	While ((Get-Job -state running).count -ge $MaxThreads)
	{
		$statTotal = $computers.count
		$statComplete = $((Get-Job -state completed).count)
		$statInProgress = $((Get-Job -state running).count)
		Write-Progress -Activity "Locating PST files" -Status "Waiting for a scan to finish before starting another" -CurrentOperation "Total: $statTotal , Complete: $statComplete , In Progress: $statInProgress" -PercentComplete ($i / $Computers.count * 100)
		Start-Sleep -Milliseconds $SleepTimer
		$JobsRunning = (Get-Job -state running).count
	}

	#
	# Start job.
	$i++
	Start-Job -ScriptBlock $GetPSTInfo -ArgumentList $Computer -Name $Computer | out-null
	$statTotal = $computers.count
	$statComplete = $((Get-Job -state completed).count)
	$statInProgress = $((Get-Job -state running).count)
	Write-Progress -Activity "Locating PST files" -Status "Starting a scan" -CurrentOperation "Total: $statTotal , Complete: $statComplete , In Progress: $statInProgress" -PercentComplete ($i / $Computers.count * 100)
}


#
# Finished creating all jobs; wait for the remaining running jobs to
# complete.
While (@(Get-Job -State Running).count -gt 0)
{
	$statTotal = @(Get-Job).count
	$statComplete = $((Get-Job -state completed).count)
	$statInProgress = $((Get-Job -state running).count)
	Write-Progress -Activity "Locating PST files" -Status "Waiting on final scans to complete" -CurrentOperation "Total: $statTotal , Complete: $statComplete , In Progress: $statInProgress" -PercentComplete ($statComplete / $statTotal * 100)
	Start-Sleep -Milliseconds $SleepTimer
}

#
# Handle completed jobs
ForEach($Job in Get-Job)
{
	$retVal = (Receive-Job $Job)
	if( $retVal -ne $null)
	{
		$OutArray += $retVal
	}
}
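# Select the report columns explicitly; Receive-Job adds PSComputerName,
# RunspaceId and PSShowComputerName properties to each job result.
$OutArray | Select-Object Computer, Owner, Path, Size, Modified, Version | Export-Csv "$cfgOutpath" -NoClobber -NoTypeInformation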
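GetAdObjects only recurses into objects of class organizationalUnit, so computer accounts in a plain container such as the default CN=Computers are not returned; this may explain environments where far fewer computers are scanned than expected. As a hedged alternative, a single subtree search with `System.DirectoryServices.DirectorySearcher` can return recently active computers directly, filtering on lastLogonTimestamp (a Windows FILETIME: 100-nanosecond intervals since 1601, UTC) on the server. `New-RecentComputerFilter` and `Get-RecentComputerName` are hypothetical names, the default search root is the example domain from the settings, and the search itself needs Windows and access to a domain controller:

```powershell
# Hypothetical helper: LDAP filter for computer accounts whose
# lastLogonTimestamp (a UTC FILETIME) falls within the last $Days days.
function New-RecentComputerFilter {
    param(
        [int]$Days = 30,
        [datetime]$Now = (Get-Date)
    )
    $cutoff = $Now.AddDays(-$Days).ToFileTimeUtc()
    "(&(objectCategory=computer)(lastLogonTimestamp>=$cutoff))"
}

# Hypothetical helper: one paged subtree search that descends into OUs and
# plain containers alike. Windows only (System.DirectoryServices).
function Get-RecentComputerName {
    param(
        [string]$SearchRoot = 'LDAP://DC=nwtraders,DC=msft',
        [int]$Days = 30
    )
    $searcher = New-Object System.DirectoryServices.DirectorySearcher
    $searcher.SearchRoot = New-Object System.DirectoryServices.DirectoryEntry($SearchRoot)
    $searcher.Filter = New-RecentComputerFilter -Days $Days
    $searcher.SearchScope = 'Subtree'
    $searcher.PageSize = 1000   # page through results instead of stopping at the server size limit
    [void]$searcher.PropertiesToLoad.Add('name')
    $results = $searcher.FindAll()
    try {
        foreach ($result in $results) { $result.Properties['name'][0] }
    }
    finally {
        $results.Dispose()
        $searcher.Dispose()
    }
}
```

`$Computers = @(Get-RecentComputerName -SearchRoot $cfgOU -Days 30)` could then replace the two lines that build `$Computers` in either script. lastLogonTimestamp is only updated periodically, so the cut-off is approximate either way.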

Using WMI

Using WMI is slow compared to reading the registry, but it will locate files that are not open in Outlook. The Group Policy setting “Windows Firewall: Allow remote administration exception” should be enabled on client computers so that WMI can be accessed remotely.

Unfortunately, I couldn’t get PowerShell jobs to work well with Get-WmiObject, so computers are checked one at a time, which also slows things down.
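Two hedged ideas for the WMI approach, neither tested against the script below: `Get-WmiObject` has its own `-AsJob` and `-ThrottleLimit` parameters, which query many computers in parallel without `Start-Job`, and a single WQL query can cover several drive letters instead of only 'c:'. `New-PstQuery` and `Start-PstSearchJob` are hypothetical names, and `Get-WmiObject` is only available in Windows PowerShell (5.1 and earlier):

```powershell
# Hypothetical helper: a WQL query for .pst files on one or more drives.
function New-PstQuery {
    param([Parameter(Mandatory)][string[]]$DriveLetter)
    # Normalise 'c', 'C:' etc. to "Drive = 'C:'" and OR the clauses together.
    $clauses = foreach ($letter in $DriveLetter) {
        "Drive = '{0}:'" -f $letter.TrimEnd(':').ToUpper()
    }
    "SELECT CSName, Name, FileSize, LastModified FROM CIM_DataFile WHERE Extension = 'pst' AND ({0})" -f ($clauses -join ' OR ')
}

# Hypothetical helper: run the query on many computers in parallel.
# Get-WmiObject exists only in Windows PowerShell 5.1 and earlier.
function Start-PstSearchJob {
    param(
        [Parameter(Mandatory)][string[]]$ComputerName,
        [string[]]$DriveLetter = @('C'),
        [int]$ThrottleLimit = 16
    )
    # -AsJob returns one parent job with a child job per computer;
    # -ThrottleLimit caps how many computers are queried at once.
    Get-WmiObject -Namespace 'root\CIMV2' -Query (New-PstQuery -DriveLetter $DriveLetter) `
        -ComputerName $ComputerName -AsJob -ThrottleLimit $ThrottleLimit
}
```

For example, `$job = Start-PstSearchJob -ComputerName $Computers -DriveLetter C, D` followed by `$job | Wait-Job | Receive-Job` would return the matching files, with CSName identifying the computer each came from. The fixed drives on a computer can be listed with `Get-WmiObject Win32_LogicalDisk -Filter "DriveType = 3"`.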

#
# PST Scanning Utility (WMI)
# ~~~~~~~~~~~~~~~~~~~~~~~~~~
# Retrieves a list of computers (recursively) from an OU in Active
# Directory then uses WMI to search for PST files on the remote 'C:'
# drive, saving the results to CSV format with location, size and owner
# details.
#
# The advantage to performing a search is that PST files in non-default
# locations will be found, enumerating the registry only shows files in
# use by Outlook.
#
# This script doesn't make use of threading (Jobs) due to hangs/locks
# experienced when they were implemented.
#
# Changelog
# ~~~~~~~~~
# 2012.03.28	Dave Hope		Initial version.
# 2012.04.24	Dave Hope		Added PST file version information.
# 2012.04.26	Dave Hope		Added try/catch around file owner check.
#
# ======================================================================
# SETTINGS
# ======================================================================
$cfgOU = "LDAP://DC=nwtraders,DC=msft"
$cfgInterval = -30
$cfgOutpath = "H:\WMI.CSV"

# ======================================================================
# STOP CHANGING HERE.
# ======================================================================

#
# Scans the specified hostname for PST files, returning an array of data.
# Much of this is inline due to the nature of job functionality in PS.
# ======================================================================
Function GetPSTInfo
{
	Param( [string]$ComputerName = $(throw "ComputerName required.") )
	$ReturnArray = @()
	
	# Test connection first rather than relying on the slow
	# Get-WMiObject call to fail.
	if( (Test-Connection -ComputerName $ComputerName -Count 1 -Quiet) -eq $false )
	{
#		Write-Host "Failed communicating with $ComputerName - ICMP Unreachable"
		return;
	}

	# Connect and execute query.
	try
	{
		#Path,FileSize,LastModified,LastAccessed,Extension,Drive
		$PstFiles = Get-Wmiobject -namespace "root\CIMV2" -computername $computerName -ErrorAction Stop -Query "SELECT * FROM CIM_DataFile WHERE Extension = 'pst' AND Drive = 'c:'"
	}
	Catch
	{
#		Write-Host "Failed communicating with $ComputerName - Get-WMIObject failed"
		return;
	}
	# Iterate over the found PST files.
	foreach ($file in $PstFiles)
	{ 
		if($File.FileName)
		{ 
			$FileReturn = "" | select Computer,Owner,Path,FileSize,LastModified,LastAccessed,Version
			$filepath = $file.description 
						
			#
			# Try and find the owner of the file.
			$Owner = "Unknown";
			try
			{
				$query = "ASSOCIATORS OF {Win32_LogicalFileSecuritySetting=`'$filepath`'} WHERE AssocClass=Win32_LogicalFileOwner ResultRole=Owner" 
				$Owner = @(Get-Wmiobject -namespace "root\CIMV2" -computername $computerName -Query $query) 
				$Owner = "$($Owner[0].ReferencedDomainName)\$($Owner[0].AccountName)" 
			}
			catch
			{
#				Write-Host "Unable to determine the owner of a PST File on $ComputerName"
			}
			
			$FileReturn.Computer = $computerName
			$FileReturn.Path = $filepath 
			# FileSize is stored in KB here (the registry script reports bytes).
			$FileReturn.FileSize = $file.FileSize/1KB
			$FileReturn.Owner = $Owner
			$FileReturn.LastModified = [System.Management.ManagementDateTimeConverter]::ToDateTime($($file.LastModified))
			$FileReturn.LastAccessed = [System.Management.ManagementDateTimeConverter]::ToDateTime($($file.LastAccessed))

			#
			# Here, we're examining the start of the PST file header: the
			# low byte of wVer at offset 10 gives the format (14/15 = ANSI,
			# 23 = Unicode), so we read the first 11 bytes.
			$tmpPath = $filepath -Replace "C:\\", "\\$ComputerName\c$\"
			$fileStream = $null
			try
			{
				$fileStream = [System.IO.File]::Open( $tmpPath, 'Open', 'Read', 'ReadWrite' )
				[byte[]]$fileBytes = New-Object byte[] 11 # Length we need.
				[void]$fileStream.Read( $fileBytes, 0, 11 );
				if ($fileBytes[10] -eq 23 )
				{
					$FileReturn.Version = "2003";
				}
				elseif ( ($fileBytes[10] -eq 14) -or ($fileBytes[10] -eq 15) )
				{
					$FileReturn.Version = "1997";
				}
				else
				{
					$FileReturn.Version = "Unknown";
				}
			}
			catch
			{
				# File locked (e.g. open in Outlook) or the admin share is unreachable.
				$FileReturn.Version = "Error";
			}
			finally
			{
				if( $fileStream -ne $null ) { $fileStream.Close(); }
			}

			$ReturnArray += $FileReturn
		} 
	}
	return $ReturnArray;
}


#
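# Gets the distinguished names of objects of a given class from AD,
# recursing into child OUs (plain containers such as CN=Computers are
# not searched).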
# ======================================================================
Function GetAdObjects
{
	Param(
		[string]$Path = $(throw "Path required."),
		[string]$desiredObjectClass = $(throw "DesiredObjectClass required.")
		)
	$ReturnArray = $null

	# Bind to AD using the provided path.
	$objADSI = [ADSI]$Path

	# Iterate over each object and add its name to the array.
	foreach( $obj in $objADSI.Children )
	{
		$thisItem = $obj | select objectClass,distinguishedName,name
		if (
			$thisItem.objectClass.Count -gt 0 -And
			$thisItem.objectClass.Contains( $desiredObjectClass)
			)
		{
			$ReturnArray += $thisItem.distinguishedName
		}
		elseif(
			$thisItem.objectClass.Count -gt 0 -And
			$thisItem.objectClass.Contains("organizationalUnit")
			)
		{
			# Init to null rather than @() so we don't add empty
			# values.
			$RecurseItems = $null
			$RecurseItems += GetAdObjects "LDAP://$($thisItem.distinguishedName.ToString())" $desiredObjectClass
			if( $RecurseItems.Count -gt 0 )
			{
				$ReturnArray += $RecurseItems
			}
		}
	}

	# Make sure we have items to return, otherwise we'll push
	# empty items to the array.
	if( $ReturnArray.Count -gt 0)
	{
		return $ReturnArray;
	}
}


#
# Converts an IADsLargeInteger COM object to an Int64
# ======================================================================
function Convert-IADSLargeInteger([object]$LargeInteger)
{
	$type = $LargeInteger.GetType()  
	$highPart = $type.InvokeMember("HighPart","GetProperty",$null,$LargeInteger,$null)  
	$lowPart = $type.InvokeMember("LowPart","GetProperty",$null,$LargeInteger,$null)  
	$bytes = [System.BitConverter]::GetBytes($highPart)  
	$tmp = New-Object System.Byte[] 8  
	[Array]::Copy($bytes,0,$tmp,4,4)  
	$highPart = [System.BitConverter]::ToInt64($tmp,0)  
	$bytes = [System.BitConverter]::GetBytes($lowPart)  
	$lowPart = [System.BitConverter]::ToUInt32($bytes,0)  
	$lowPart + $highPart  
} 

#
# Evaluate the lastLogonTimestamp attribute for accounts and keep only
# those that logged on within the configured interval ($cfgInterval).
# ======================================================================
Function GetObjectsLoggedIntoSince
{
	Param(
		[Array] $Computers = $(throw "Computers required"),
		[int] $LoginDays = $(throw "LoginDays required")
		)

	$earliestAllowedLogon = [DateTime]::Today.AddDays($LoginDays)

	foreach( $Computer in $Computers )
	{
		$objADSI = [ADSI]"LDAP://$Computer"
		if( $objADSI.Properties.Contains("lastLogonTimeStamp") -eq $false )
		{
			continue;
		}

		$lastLogon = [DateTime]::FromFileTime(
			[Int64]::Parse(
				$(Convert-IADSLargeInteger $objADSI.lastlogontimestamp.value)
				)
			)
		if( [DateTime]::Compare( $earliestAllowedLogon , $lastLogon) -eq -1 )
		{
			$objADSI.name
		}
		continue;
	}
}

#
# Get computer accounts from Active Directory.
$OutArray = @()
$Computers = GetAdObjects "$cfgOU" "computer"
$Computers = GetObjectsLoggedIntoSince $Computers $cfgInterval

#
# If we have no computers to check, just exit.
if( $Computers.Count -le 0 )
{
	return;
}

#
# Scan each computer in turn (this script doesn't use jobs).
$statTotal = $computers.count
$statComplete = 0
ForEach ($Computer in $Computers)
{
	Write-Progress -Activity "Locating PST files" -Status "Scanning $Computer" -CurrentOperation "Total: $statTotal , Complete: $statComplete" -PercentComplete ($statComplete/$statTotal * 100)
	$RetVal = GetPSTInfo $Computer
	if( $RetVal -ne $null)
	{
		$OutArray += $retVal
	}
	$statComplete++
}

$OutArray | Export-Csv "$cfgOutpath" -NoClobber -NoTypeInformation

Which method is best?

In the environments where I’ve tested these scripts, the WMI method returned significantly more PST files, simply because it runs a search on each client computer. If anyone has feedback on running these in their environments, I’d love to hear it.
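Because the two methods overlap, adding both reports together would double-count files that each of them found. For a sizing estimate, a minimal sketch (the hypothetical `Merge-PstResult` assumes the column names written by the two scripts above: `Size` in bytes in the registry report and `FileSize` in KB in the WMI report) de-duplicates by computer and path before totalling:

```powershell
# Hypothetical helper: merge rows from the registry and WMI reports, keeping
# one row per computer + path and normalising sizes to bytes.
function Merge-PstResult {
    param(
        [object[]]$RegistryRows = @(),
        [object[]]$WmiRows = @()
    )
    $byKey = @{}
    foreach ($row in $RegistryRows) {
        $key = ('{0}|{1}' -f $row.Computer, $row.Path).ToLowerInvariant()
        $byKey[$key] = [pscustomobject]@{
            Computer = $row.Computer; Path = $row.Path; Bytes = [double]$row.Size; Source = 'Registry'
        }
    }
    foreach ($row in $WmiRows) {
        $key = ('{0}|{1}' -f $row.Computer, $row.Path).ToLowerInvariant()
        # The WMI report stores FileSize in KB; when both reports list the
        # same file, this assignment replaces the registry row.
        $byKey[$key] = [pscustomobject]@{
            Computer = $row.Computer; Path = $row.Path; Bytes = [double]$row.FileSize * 1KB; Source = 'WMI'
        }
    }
    $byKey.Values
}
```

For example, `$all = Merge-PstResult (Import-Csv H:\Registry.CSV) (Import-Csv H:\WMI.CSV)` followed by `($all | Measure-Object Bytes -Sum).Sum / 1GB` gives the combined total in GB.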

Published by

Dave Hope

Dave is a Principal Software Analyst for a UK-based retirement developer; in his spare time he enjoys digital photography and rock climbing.

19 thoughts on “Locating PST files on a network”

  1. Hello,

    Great script, thank you very much for providing this.  One question, any way to exclude the PSComputerName and RunspaceId properties from the results?  I have tried searching online and have failed to come up with the answer.  Thanks again.

    Sean

    1. Hi Sean, You could simply remove them using Excel or perhaps do something like:

      $OutArray | Select Computer,Owner,Path,FileSize,LastModified | Export-Csv "$cfgOutpath" -NoClobber -NoTypeInformation

  2. Great Script!
    I am using the WMI script. A request for the next person wanting to do this PST search and destroy project. Is it possible to search all drives (non-removable) in the computer in one sweep? The current script does a great job at searching a single drive. If to search a different drive letter it is needed to change the drive letters, c:, d:, e:….z: on line 54 and output file on 24. This eliminates the issue of unknown/random drive letters on target computers. Otherwise, thank you again. Favorite wine (currently)?

  3. So I just ran the WMI version and it only found 91 computers… I have over 900 on my network. Do I need to specify an OU?

  4. This hasn’t been looked at in a while, so I doubt I’ll get a response. I’m using your WMI version, but altered to look at a different file extension. I’m wondering if there is a way to set it up to log ALL machines that it scans, not just the ones that have the specific file extension. Just so we have a way to report which machines were actually scanned. Thanks.

  5. Hi, I couldn’t get it working. I changed these 2 strings:

    $cfgOU = “LDAP://DC=nwtraders,DC=msft” (replace to my OU)
    $cfgOutpath = “H:\Registry.CSV” (replace to my C:\)

    Run Script, script ends without any errors or results (file C:\Registry.CSV wasn’t created).
    Can you help me to determine what’s the problem?

      1. Sure. Are you running the script as a user that has WMI access to remote hosts? – Also, can you browse to the admin shares on the computers? – I.E. c$ and admin$

  6. Hi, great work. Is there a way that we can identify who owns the PSTs using a script or investigating the files attributes (scripted hopefully)!!

    1. Hi Darren,

      The script already returns the owner of the file (as per NTFS). It doesn’t go as far as looking into the PST file however, I’m not sure how best that could be done as someone could have mail in there from shared mailboxes etc which would make it unreliable.

  7. Hi Dave, mine also returns no results. Most of our users have their pst files on Network shares, not on the C: drive. I’m specifically looking for users who have their pst files connected to V: drive paths.
    Is there any way to do this?

  8. Hi Dave,

    I know this is an old thread, but could you advise how to output all machines scanned and also display the OU’s they are currently in.

  9. Dave,
    Thanks for this script! It’s been very helpful for identifying people dragging their feet during our transition from On-Prem to O365. I made a couple changes to the WMI script, that may help someone else to generalize this a little more (rather than restrict it to a single OU).

    # SETTINGS
    #=========================================================
    $cfgInterval = -30
    $cfgOutpath = “C:\utils\log.CSV”

    And

    # Get computer accounts from Active Directory.
    $OutArray = @()
    $Computers = Get-ADComputer -filter *
    $Computers = GetObjectsLoggedIntoSince $Computers $cfgInterval

    If you really want to dig for .pst’s, make the change Dave mentioned above to Scott for searching all drives as well.
