2013 Scripting Games Event 5

Instructions for Event 5, the Logfile Labyrinth:

First let’s see how my entry http://scriptinggames.org/entrylist_.php?entryid=1069 performs:

  • Using the 1MB log files from the zip file in the instructions
  • Using 4GB of files located on my laptop, which doesn’t have an SSD 😦

Although it wasn’t a requirement, I chose to write a function that could parse many files, including large files of 25MB, 125MB and more, as quickly as possible without having a negative impact on memory.

There are many ways to read file content and IIS log files. Here’s a non-exhaustive list I found on the web:

  • Using a SQL DB
  • Using the New-Object -ComObject MSUtil.LogQuery from LogParser
  • Using the built-in Select-String cmdlet
  • Using the built-in switch statement that has a FilePath parameter
  • Using the built-in Get-Content cmdlet, which sends lines down the pipeline one at a time by default, all at once with -ReadCount 0, or in batches of a given size (-ReadCount 1000, for example)
  • Using the .Net methods of the class [IO.File]
  • Using the .Net methods associated with the [IO.StreamReader] class
  • Using the [regex] object (my favorite)
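
As an illustration of the last item, here is a minimal sketch of the [regex] approach. It is not taken from my entry: the sample text is made up, and the IPv4-only pattern is a simplification assumed for the example (it does not check octet ranges and ignores IPv6).

```powershell
# Made-up sample content standing in for a log file read in one go
$Text = @'
2013-05-27 10:00:01 GET /default.htm 192.168.0.10 200
2013-05-27 10:00:02 GET /default.htm 10.0.0.5 200
2013-05-27 10:00:03 GET /logo.png 192.168.0.10 200
'@

# Simplified IPv4-only pattern (an assumption for this sketch)
$IPv4 = [regex]'\b(?:\d{1,3}\.){3}\d{1,3}\b'

# Matches() scans the whole string; Select-Object -Unique drops the repeats
$Found = $IPv4.Matches($Text) | ForEach-Object { $_.Value } | Select-Object -Unique
$Found
```

The trade-off is that the whole text has to be in memory before Matches() runs, which matters for the large files discussed here.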

All of these methods perform differently, and some may consume a lot of RAM.
To select the method, I tested the following:

Get-ChildItem -Path .\LogFiles\W3SVC1 | ForEach-Object {
    $File = $_
    New-Object -TypeName psobject -Property @{
        FileName = $File.FullName
        'FileSize(MB)' = '{0:N2}' -f ($File.Length / 1MB)
        # Get-Content with the default -ReadCount of 1
        GCMethod = (Measure-Command {
            Get-Content $File.FullName | Out-Null
        }).ToString()
        # Get-Content in batches of 1000 lines
        GC1000Method = (Measure-Command {
            Get-Content $File.FullName -ReadCount 1000 | ForEach-Object { $_ } | Out-Null
        }).ToString()
        StreamMethod = (Measure-Command {
            $reader = New-Object System.IO.StreamReader -ArgumentList $File.FullName
            # Compare against $null so that an empty line does not end the loop early
            while ($null -ne ($line = $reader.ReadLine())) {
                $line
            }
            $reader.Close()
        }).ToString()
    }
}
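
The same kind of comparison can be extended to other methods from the list. The following is a minimal sketch, not part of the test above: it writes a small synthetic log to a temporary file (the content and all variable names are made up for the illustration) and times the switch statement with its -File parameter against [IO.File]::ReadLines.

```powershell
# Build a small synthetic log in a temporary file (sample content is made up)
$TempLog = [IO.Path]::GetTempFileName()
$Lines = 1..5000 | ForEach-Object { "2013-05-27 10:00:00 GET /default.htm 192.168.0.$($_ % 250) 200" }
Set-Content -Path $TempLog -Value $Lines

# switch -File reads the file line by line; the default clause runs for every line
$SwitchCount = 0
$SwitchTime = Measure-Command {
    switch -File $TempLog {
        default { $SwitchCount++ }
    }
}

# [IO.File]::ReadLines enumerates lazily instead of loading the whole file
$ReadLinesCount = 0
$ReadLinesTime = Measure-Command {
    foreach ($Line in [IO.File]::ReadLines($TempLog)) {
        $ReadLinesCount++
    }
}

Remove-Item -Path $TempLog

[pscustomobject]@{
    SwitchMethod    = $SwitchTime.ToString()
    SwitchLines     = $SwitchCount
    ReadLinesMethod = $ReadLinesTime.ToString()
    ReadLinesLines  = $ReadLinesCount
}
```

Timings on a 5000-line file say little by themselves; the sketch is only meant to show how each method plugs into Measure-Command.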

Here are a few points that you may notice in my entry.

  • The regular expression in the ValidatePattern attribute allows both IPv4 and IPv6 addresses, including wildcards. It means that you can type, for example:
    127.0.*
    192.168.*
    *1
    ::1
    2001:*
    fe80*

    I didn’t use the following code because it gave some strange results:

    try {            
            [IPAddress]::Parse($_) | Out-Null            
        } catch {            
            throw "An invalid IP address was specified."            
        }

    As I use the -like operator to filter the resulting array of unique IP addresses, I needed to know whether the Pattern parameter of my function was used so that I could make sure it ends with a wildcard.

    if ($PSBoundParameters.ContainsKey('Pattern')) {            
        if (-not($Pattern.EndsWith("*"))) {            
            $Pattern = "$Pattern*"            
        }            
        Write-Verbose "Using pattern $Pattern to filter the output"            
    }
  • I’ve used the following MSDN page to find out the different file names and understand the different formats of IIS log files
  • I’ve also extracted the first four lines of a W3C log file to determine which column holds the “c-ip” field. I used the -TotalCount parameter of the built-in Get-Content cmdlet for this purpose.
  • You may notice that I also used the new -notin operator of PowerShell version 3.0 so that only IP addresses not yet seen are added, which avoids creating a huge array in memory when reading big files containing millions of lines.
  • The total length of my progress bar represents the total size of the files, so the progress is proportional to the file size.
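
The header lookup and the de-duplication described in the points above can be sketched in isolation. This is a simplified illustration under stated assumptions, not the code of my entry: the sample log is made up, the “c-ip” column is found by matching the #Fields line rather than taking the fourth line, and a HashSet is swapped in for the -notin test.

```powershell
# Sample W3C header and entries (made-up data, written to a temporary file)
$SampleLog = [IO.Path]::GetTempFileName()
Set-Content -Path $SampleLog -Value @(
    '#Software: Microsoft Internet Information Services 7.5'
    '#Version: 1.0'
    '#Date: 2013-05-27 10:00:00'
    '#Fields: date time cs-method cs-uri-stem c-ip sc-status'
    '2013-05-27 10:00:01 GET /default.htm 192.168.0.10 200'
    '2013-05-27 10:00:02 GET /default.htm ::1 200'
    '2013-05-27 10:00:03 GET /logo.png 192.168.0.10 200'
)

# Read only the header lines, then drop the leading "#Fields:" token
$FieldsLine = Get-Content -Path $SampleLog -TotalCount 4 |
    Where-Object { $_ -like '#Fields:*' }
$FieldNames = @(($FieldsLine -split '\s+') | Select-Object -Skip 1)
$CipIndex = [array]::IndexOf($FieldNames, 'c-ip')   # -1 when the field is absent

# A HashSet ignores duplicates on Add, which stands in for the -notin test
$UniqueIP = New-Object -TypeName 'System.Collections.Generic.HashSet[string]'
if ($CipIndex -ge 0) {
    foreach ($Line in [IO.File]::ReadLines($SampleLog)) {
        # Skip empty lines and the "#" header lines
        if ($Line.Length -eq 0 -or $Line.StartsWith('#')) { continue }
        [void]$UniqueIP.Add(($Line -split '\s+')[$CipIndex])
    }
}

Remove-Item -Path $SampleLog
$UniqueIP
```

Checking the index against -1 rather than testing its truthiness matters because 0 is a valid column position.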

Here’s my full entry 😎

#Requires -Version 3            
            
Function Get-IPFromIISLog {            
            
[CmdletBinding()]            
Param(            
[Parameter()]            
[ValidatePattern("^(\d|[a-f]|:|\*|\.|\%)*$")]            
[string]$Pattern="*",            
            
[Parameter()]            
[ValidateScript({            
    Test-Path -Path $_ -PathType Container            
})]            
[string]$FilePath = ".\"            
            
)            
Begin {            
            
    if ($PSBoundParameters.ContainsKey('Pattern')) {            
        if (-not($Pattern.EndsWith("*"))) {            
            $Pattern = "$Pattern*"            
        }            
       Write-Verbose "Using pattern $Pattern to filter the output"            
    }            
            
    Function Get-LineStream {            
    [CmdLetBinding()]            
    Param(            
        [int32]$Index,            
        [string]$Separator,            
        [String]$Path            
    )            
    Begin {            
        $arIP = @()            
    }            
    Process {            
        try {            
            $StreamReader =  New-object System.IO.StreamReader -ArgumentList (Resolve-Path $Path -ErrorAction Stop).Path            
            Write-Verbose "Reading Stream of file $Path"            
            while ( $StreamReader.Peek() -gt -1 )  {            
                $Line = $StreamReader.ReadLine()            
                if ($Line.length -eq 0 -or $Line -match "^#") {            
                    continue            
                }            
                $result = ($Line -split $Separator)[$Index]            
                if ($result -notin $arIP) {            
                    $arIP += $result            
                }            
            }            
            $StreamReader.Close()            
            $arIP            
        } catch {            
            Write-Warning -Message "Failed to read $Path because $($_.Exception.Message)"            
        }            
    }            
    End {}            
    } # end of function            
            
}            
Process {            
                
    try {            
        $allFiles = Get-ChildItem -Path $FilePath -Filter *.LOG -Recurse -ErrorAction Stop            
    } catch {            
        Write-Warning -Message "Failed to enumerate files under $FilePath because $($_.Exception.Message)"            
        return  # 'break' outside a loop would also stop the calling script
    }            
            
    if ($allFiles) {            
        $IPCollected = @()            
        $Count = 1            
        $FileSizeSum = 0            
        $TotalSize = (($allFiles | ForEach-Object { $_.Length }) | Measure-Object -Sum).Sum            
        $allFiles | ForEach-Object {            
            $File = $_            
            $FileSizeSum += $File.length            
            $WPHT = @{            
                Activity = "Reading file $($File.Name) of size $('{0:N2}'-f ($File.Length/1MB))MB" ;            
                Status = '{0} over {1}' -f $Count,($allFiles).Count ;            
                PercentComplete  = ($FileSizeSum/$TotalSize*100) ;            
            }            
            Write-Progress @WPHT            
            $Count++            
            
            # Based on the file name we know the IIS Format            
            Switch -Regex ($File.Name) {            
                '^u_ex.*\.log'  { $IISLogFormat = 'W3C'    ; break}            
                '^ex.*\.log'    { $IISLogFormat = 'W3C'    ; break}            
                '^in.*\.log'    { $IISLogFormat = 'IIS'    ; break}            
                '^nc.*\.log'    { $IISLogFormat = 'NCSA'   ; break}            
                default         { $IISLogFormat = 'Custom' ; break}            
            }            
            Switch ($IISLogFormat) {            
                'W3C' {            
                    Write-Verbose "Reading W3C formatted file $($File.FullName)"            
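                    # Reset per-file state so a failed read cannot reuse values from the previous file
                    $First4Lines = $Index = $null
                    try {
                        $First4Lines = Get-Content -Path $($File.FullName) -TotalCount 4 -ErrorAction Stop
                    } catch {
                        Write-Warning "Failed to read the content of the file $($File.Name) because $($_.Exception.Message)"
                    }
                    if ($First4Lines) {
                        # Start at -1 because the first token of the line is "#Fields:"
                        $i = -1
                        $Index = ($First4Lines[-1] -split "\s" | ForEach-Object {
                            [PSObject]@{ Index=$i ; FieldName = $_}
                            $i++
                        } | Where-Object { $_.FieldName -eq "c-ip"}).Index
                    }
                    # Compare against $null because 0 is a valid column index
                    if ($null -ne $Index) {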
                        [array]$IPCollected += (Get-LineStream -Path $File.FullName -Separator "\s" -Index $Index)            
                    } else {            
                        Write-Warning "Could not find the c-ip field in the W3C log file $($File.FullName)"            
                    }            
                }            
                IIS {            
                    Write-Verbose "Reading IIS formatted file $($File.FullName)"            
                    [array]$IPCollected += (Get-LineStream -Path $File.FullName -Index 0 -Separator ",")            
                }            
                NCSA {            
                    Write-Verbose "Reading NCSA formatted file $($File.FullName)"            
                    [array]$IPCollected += (Get-LineStream -Path $File.FullName -Index 0 -Separator "\s")            
                }            
                default {            
                    Write-Warning "Cannot parse a custom log file $($File.FullName)"            
                }            
            }            
            $IPCollected = ($IPCollected | Sort -Unique)            
        }            
        Write-Verbose ("A total of {0} unique IP were collected" -f $($IPCollected.Count))            
        $IPCollected | Where-Object { $_ -like $Pattern }            
    } else {            
        Write-Warning "No file with .LOG extension found in this folder and subtree"            
    }            
}            
End {}            
}            
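
The ValidatePattern expression used in the entry can be tried on its own. Below is a minimal sketch, assuming a made-up list of sample values (the patterns listed earlier plus the hypothetical invalid value 'localhost'). As a side note it also calls [System.Net.IPAddress]::TryParse on each value: wildcard patterns are not parseable addresses, which is one plausible reason, though only an assumption, why address parsing does not fit this parameter.

```powershell
# The expression from the ValidatePattern attribute of the entry
$PatternRegex = '^(\d|[a-f]|:|\*|\.|\%)*$'

# Sample inputs: the patterns listed earlier plus one hypothetical invalid value
$Samples = '127.0.*', '192.168.*', '*1', '::1', '2001:*', 'fe80*', 'localhost'

$Report = foreach ($Sample in $Samples) {
    $Address = $null
    [pscustomobject]@{
        Value            = $Sample
        # -match is case-insensitive, like ValidatePattern by default
        PassesValidation = $Sample -match $PatternRegex
        # TryParse returns $false instead of throwing on invalid input
        ParsesAsAddress  = [System.Net.IPAddress]::TryParse($Sample, [ref]$Address)
    }
}
$Report | Format-Table -AutoSize
```

Note that the expression only restricts the set of allowed characters; it does not check that the value is a well-formed address.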

5 thoughts on “2013 Scripting Games Event 5”

  1. Really nice shot. I gave you five stars for this script because I found it extremely interesting from a technical point of view, both for the way you access big files really fast and for the way you handle all the different types of IIS logs. One question I have in this regard: you say you prefer regex methods to read log files, and then you use IO.StreamReader instead. What’s the explanation for this choice?

    I also liked the progress bar based on file size. Great!

    If I can suggest something, since you are using v3, why not replace $_ with $PSItem and keep your code uniform?
    Also, the task asked for no sorting whatsoever, so why sort $IPCollected instead of just selecting?
    Carlo
