Instructions for Event 5, the Logfile Labyrinth:
First let’s see how my entry http://scriptinggames.org/entrylist_.php?entryid=1069 performs:
- Using the 1 MB log files from the zip file in the instructions
- Using 4 GB of files located on my laptop, which doesn't have an SSD 😦
Although it wasn't a requirement, I chose to write a function that could parse many files, including large files (25 MB, 125 MB, ...), as quickly as possible without a negative impact on memory.
There are many ways to read file content and IIS log files. Here's a non-exhaustive list I found on the web:
- Using a SQL DB
- Using the MSUtil.LogQuery COM object from LogParser (New-Object -ComObject MSUtil.LogQuery)
- Using the built-in Select-String cmdlet
- Using the built-in switch statement that has a FilePath parameter
- Using the built-in Get-Content cmdlet, which reads lines one by one by default, all lines at once with -ReadCount 0, or in batches of a specific number of lines (-ReadCount 1000 for example)
- Using the .Net methods of the class [IO.File]
- Using the .Net methods associated with the [IO.StreamReader] class
- Using the [regex] object (my favorite)
These methods perform differently, and some can consume a lot of RAM.
To select the method, I tested the following:
Get-ChildItem -Path .\LogFiles\W3SVC1 | % {
    New-Object -TypeName psobject -Property @{
        FileName       = $_.FullName
        'FileSize(MB)' = '{0:N2}' -f ($_.Length/1MB)
        GCMethod       = (Measure-Command { Get-Content $_.FullName | Out-Null }).ToString()
        GC1000Method   = (Measure-Command { Get-Content $_.FullName -ReadCount 1000 | % { $_ } | Out-Null }).ToString()
        StreamMethod   = (Measure-Command {
            $reader = New-Object System.IO.StreamReader -ArgumentList $_.FullName
            # Compare against $null so that an empty line does not end the loop early
            while ($null -ne ($line = $reader.ReadLine())) { $line }
            $reader.Close()
        }).ToString()
    }
}
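Some of the other approaches from the list can be timed in the same way. The following is only a sketch: the sample file, its name and its content are made up for illustration, and the numbers depend heavily on the machine and on file size.

```powershell
# Hypothetical sample file written to the temp folder; any IIS log would do.
$sample = Join-Path ([IO.Path]::GetTempPath()) 'sample_u_ex.log'
1..5000 | ForEach-Object {
    "2013-05-01 00:00:0$($_ % 10) 10.0.0.1 GET /page$_ - 80 - 192.168.1.$($_ % 255) Agent 200 0 0 15"
} | Set-Content -Path $sample

# Time four of the methods listed above against the same file (milliseconds).
$timings = [ordered]@{
    SwitchFile   = (Measure-Command { switch -File $sample { default { $_ } } }).TotalMilliseconds
    ReadLines    = (Measure-Command { foreach ($line in [IO.File]::ReadLines($sample)) { $line } }).TotalMilliseconds
    ReadAllLines = (Measure-Command { [IO.File]::ReadAllLines($sample) }).TotalMilliseconds
    SelectString = (Measure-Command { Select-String -Path $sample -Pattern '^[^#]' }).TotalMilliseconds
}
New-Object -TypeName psobject -Property $timings

Remove-Item -Path $sample
```

[IO.File]::ReadAllLines loads the whole file into memory, whereas [IO.File]::ReadLines and switch -File enumerate it line by line, so the memory footprint differs even when the timings are close.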
Here are a few points that you may notice in my entry.
- The regular expression in the ValidatePattern attribute allows both IPv4 and IPv6 addresses. This means that you can type, for example:
127.0.*
192.168.*
*1
::1
2001:*
fe80*
I didn't use the following code because it gave strange results:
try { [IPAddress]::Parse($_) | Out-Null } catch { throw "An invalid IP address was specified." }
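To see what the ValidatePattern expression lets through, it can be tried against a few inputs with the -match operator. This is a small sketch: the first six inputs are the examples listed above, and the last two are made-up values that should be rejected.

```powershell
# The pattern from the ValidatePattern attribute of the entry.
$rx = '^(\d|[a-f]|:|\*|\.|\%)*$'

# Hypothetical test inputs.
$candidates = '127.0.*', '192.168.*', '*1', '::1', '2001:*', 'fe80*', 'localhost', '10.0.0.1/24'

$results = foreach ($candidate in $candidates) {
    # -match is case-insensitive, like ValidatePattern with its default options
    [pscustomobject]@{
        Input    = $candidate
        Accepted = ($candidate -match $rx)
    }
}
$results
```

The expression only restricts which characters may appear (digits, a to f, colon, dot, percent sign and asterisk). It does not check that the value is a well-formed address, so an input such as 999.999 also passes.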
Because I use the -like operator to filter the resulting array of unique IP addresses, I needed to know whether the Pattern parameter of my function was used, so that I can make sure it ends with a wildcard.
if ($PSBoundParameters.ContainsKey('Pattern')) {
    if (-not($Pattern.EndsWith("*"))) {
        $Pattern = "$Pattern*"
    }
    Write-Verbose "Using pattern $Pattern to filter the output"
}
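The reason for the trailing wildcard is that -like without a wildcard behaves like an equality test. A short illustration, using a made-up list of addresses:

```powershell
# Hypothetical list of unique client IPs, as the function might collect them.
$ips = '192.168.1.10', '192.168.1.100', '10.0.0.1', '::1'

# Without a wildcard, -like only returns the exact value.
$exact  = @($ips | Where-Object { $_ -like '192.168.1.10' })

# With the trailing wildcard, every address starting with the pattern is returned.
$prefix = @($ips | Where-Object { $_ -like '192.168.1.10*' })

"Exact : $($exact -join ', ')"
"Prefix: $($prefix -join ', ')"
```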
- I used the following MSDN page to find out the different file names and understand the different formats of IIS log files.
- I also extracted the first four lines of a W3C log file to determine which column holds the "c-ip" field. I used the -TotalCount parameter of the built-in Get-Content for this purpose.
- You may notice that I also used the new -notin operator of PowerShell version 3.0 to avoid creating a huge array in memory when reading big files containing millions of lines.
- The total length of my progress bar represents the total size of the files, so the progress is proportional to the size of the files read so far.
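The column lookup for "c-ip" can be illustrated on its own. The sketch below uses [array]::IndexOf rather than the counter used in the entry, and the header and data lines are invented samples; the actual field list depends on how logging is configured in IIS.

```powershell
# Hypothetical first four lines of a W3C log file.
$first4Lines = @(
    '#Software: Microsoft Internet Information Services 7.5'
    '#Version: 1.0'
    '#Date: 2013-05-01 00:00:00'
    '#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status time-taken'
)

# Drop the "#Fields:" prefix so that positions line up with the columns of a data line.
$fields = ($first4Lines[-1] -replace '^#Fields:\s*', '') -split '\s+'
$index  = [array]::IndexOf($fields, 'c-ip')

# A made-up data line with the same layout, split the same way.
$line = '2013-05-01 00:00:01 10.0.0.1 GET /default.htm - 80 - 192.168.1.25 Mozilla/5.0 200 0 0 15'
$clientIp = ($line -split '\s+')[$index]

"c-ip is column $index, value $clientIp"
```

[array]::IndexOf returns -1 when the field is missing, so the result is best tested with -ge 0, because 0 is itself a valid column position.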
Here’s my full entry 😎
#Requires -Version 3
Function Get-IPFromIISLog {
    [CmdletBinding()]
    Param(
        [Parameter()]
        [ValidatePattern("^(\d|[a-f]|:|\*|\.|\%)*$")]
        [string]$Pattern="*",

        [Parameter()]
        [ValidateScript({ Test-Path -Path $_ -PathType Container })]
        [string]$FilePath = ".\"
    )
    Begin {
        if ($PSBoundParameters.ContainsKey('Pattern')) {
            if (-not($Pattern.EndsWith("*"))) {
                $Pattern = "$Pattern*"
            }
            Write-Verbose "Using pattern $Pattern to filter the output"
        }
        Function Get-LineStream {
            [CmdLetBinding()]
            Param(
                [int32]$Index,
                [string]$Separator,
                [String]$Path
            )
            Begin {
                $arIP = @()
            }
            Process {
                try {
                    $StreamReader = New-object System.IO.StreamReader -ArgumentList (Resolve-Path $Path -ErrorAction Stop).Path
                    Write-Verbose "Reading Stream of file $Path"
                    while ( $StreamReader.Peek() -gt -1 ) {
                        $Line = $StreamReader.ReadLine()
                        if ($Line.length -eq 0 -or $Line -match "^#") {
                            continue
                        }
                        $result = ($Line -split $Separator)[$Index]
                        if ($result -notin $arIP) {
                            $arIP += $result
                        }
                    }
                    $StreamReader.Close()
                    $arIP
                } catch {
                    Write-Warning -Message "Failed to read $Path because $($_.Exception.Message)"
                }
            }
            End {}
        } # end of function
    }
    Process {
        try {
            $allFiles = Get-ChildItem -Path $FilePath -Filter *.LOG -Recurse -ErrorAction Stop
        } catch {
            Write-Warning -Message "Failed to enumerate files under $FilePath because $($_.Exception.Message)"
            break
        }
        if ($allFiles) {
            $IPCollected = @()
            $Count = 1
            $FileSizeSum = 0
            $TotalSize = (($allFiles | ForEach-Object { $_.Length }) | Measure-Object -Sum).Sum
            $allFiles | ForEach-Object {
                $File = $_
                $FileSizeSum += $File.length
                $WPHT = @{
                    Activity = "Reading file $($File.Name) of size $('{0:N2}'-f ($File.Length/1MB))MB" ;
                    Status = '{0} over {1}' -f $Count,($allFiles).Count ;
                    PercentComplete = ($FileSizeSum/$TotalSize*100) ;
                }
                Write-Progress @WPHT
                $Count++
                # Based on the file name we know the IIS Format
                Switch -Regex ($File.Name) {
                    '^u_ex.*\.log' { $IISLogFormat = 'W3C' ; break}
                    '^ex.*\.log'   { $IISLogFormat = 'W3C' ; break}
                    '^in.*\.log'   { $IISLogFormat = 'IIS' ; break}
                    '^nc.*\.log'   { $IISLogFormat = 'NCSA' ; break}
                    default        { $IISLogFormat = 'Custom' ; break}
                }
                Switch ($IISLogFormat) {
                    'W3C' {
                        Write-Verbose "Reading W3C formatted file $($File.FullName)"
                        try {
                            $First4Lines = Get-Content -Path $($File.FullName) -TotalCount 4 -ErrorAction Stop
                        } catch {
                            Write-Warning "Failed to read the content of the file $($File.Name) because $($_.Exception.Message)"
                        }
                        if ($First4Lines) {
                            $i = -1
                            $Index = ($First4Lines[-1] -split "\s" | ForEach-Object {
                                [PSObject]@{ Index=$i ; FieldName = $_}
                                $i++
                            } | Where-Object { $_.FieldName -eq "c-ip"}).Index
                        }
                        if ($Index) {
                            [array]$IPCollected += (Get-LineStream -Path $File.FullName -Separator "\s" -Index $Index)
                        } else {
                            Write-Warning "Could not find the c-ip field in the W3C log file $($File.FullName)"
                        }
                    }
                    IIS {
                        Write-Verbose "Reading IIS formatted file $($File.FullName)"
                        [array]$IPCollected += (Get-LineStream -Path $File.FullName -Index 0 -Separator ",")
                    }
                    NCSA {
                        Write-Verbose "Reading NCSA formatted file $($File.FullName)"
                        [array]$IPCollected += (Get-LineStream -Path $File.FullName -Index 0 -Separator "\s")
                    }
                    default {
                        Write-Warning "Cannot parse a custom log file $($File.FullName)"
                    }
                }
                $IPCollected = ($IPCollected | Sort -Unique)
            }
            Write-Verbose ("A total of {0} unique IP were collected" -f $($IPCollected.Count))
            $IPCollected | Where-Object { $_ -like $Pattern }
        } else {
            Write-Warning "No file with .LOG extension found in this folder and subtree"
        }
    }
    End {}
}
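The Get-LineStream helper above keeps its unique values in an array, checked with -notin and extended with +=. A different technique, not used in the entry, is a .NET HashSet, which stores each value once and does not copy the collection every time a value is added. A minimal sketch with made-up addresses:

```powershell
# Hypothetical client IPs as they might be read line by line, with duplicates.
$seenInFile = '10.0.0.1', '192.168.1.25', '10.0.0.1', '::1', '192.168.1.25'

$unique = New-Object 'System.Collections.Generic.HashSet[string]'
foreach ($ip in $seenInFile) {
    # Add() returns $false when the value is already present; the result is discarded here.
    [void]$unique.Add($ip)
}

"Unique addresses: $($unique.Count)"
$unique | Sort-Object
```

Note that a HashSet of strings compares values case-sensitively by default, unlike -notin, which could matter for IPv6 addresses written with mixed-case hex digits.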