Analysing files with virustotal.com public API

I’ve sometimes some forensics tasks to do at work. A member of the network team gave me a USB stick with the incoming mail server traffic.
My task was to get files’ thumbprints and query virustotal.com using its public API to see if these file hashes are known. I’d have used the private API of virustotal if I’d the right to upload the file to virustotal.com.

Before plugging-in the USB stick, I uninstalled my antivirus and then disabled Windows Defender.

I recycled some obsolete code I already published in January 2013 that you can find on this page.

One of the problems with my previous code was that the URL https://www.virustotal.com/search/ it used is not valid anymore. Now virustotal supports multilanguage and the correct URL is https://www.virustotal.com/en/search/

The second problem with my code was that the core regular expression is obsolete and doesn’t match the updated HTML code returned by a search.

To fix the code, I had to shift back to the methodology I used in the first place in 2013 before coming up with the PowerShell code.
In other words, I had to analyse the web queries (hearders and redirection) and their HTML code.
I submitted manually both a known safe SHA2 thumbprint of a file, a24f400c4fc6b7d4085f9e264f6d3f70c13d678c0e34b3e31f9163cf10e423ec , a known malware thumbprint 7e8bb57ad97ace3aa4a8f3ecaf5538e84b58a06e16569f3f60f190fa3e83f80b and a totally unknown thumbprint that has never been submitted to virustotal.com
…and started a network capture in IE11

Then with the above SHA2 checksum, I did:

$res = (Invoke-WebRequest -Uri 'https://www.virustotal.com/en/search/' -Method Post -Body "query=$hash" -MaximumRedirection 0 -ErrorAction SilentlyContinue)            
$page = Invoke-WebRequest -Method GET -Uri $res.Headers.Location -ErrorAction SilentlyContinue -MaximumRedirection 0

and I started to use the excellent Get-Matches function provided by Dr. Tobias Weltner: http://powershell.com/cs/blogs/tobias/archive/2011/10/27/regular-expressions-are-your-friend-part-1.aspx

I started to find a regular expression that matches each case:

$page.AllElements.FindById('antivirus-results').outerHTML | 
Get-Matches -Pattern '<TD\sclass=ltr>(?<ID>.*)\s</TD>'

$page.AllElements.FindById('antivirus-results').outerHTML | 
Get-Matches -Pattern '<TD\sclass="ltr\stext\-green"><I\stitle="(?<ID>.*)"\sclass=icon\-ok\-sign\sdata\-toggle="tooltip"></I></TD>'

for the following HTML code:
and

$page.AllElements.FindById('antivirus-results').outerHTML | 
Get-Matches -Pattern '<TD\sclass="ltr\stext-red">(?<ID>.*)\s</TD>'

for the following HTML code:

The last step consisted in merging all the 3 regular expressions into a single one like this:

$page.AllElements.FindById('antivirus-results').outerHTML | 
Get-Matches -Pattern '<TD\sclass=("?)ltr(>|\stext\-green"><I\stitle="|\stext-red">)(?<ID>.*)("\sclass=icon\-ok\-sign\sdata\-toggle="tooltip"></I></TD>|\s</TD>)'

Here’s the full code I used to obtain at the end an Excel file containing the results:

#Requires -version 4.0

$allfiles = @()
# We first clear all errors in the automatic variable
$Error.Clear()
# We capture all files and let error happen silently and being logged into the $error automatic variable
$allfiles = Get-ChildItem -Path C:\ZIP -Recurse -Force -Include * -File -ErrorAction SilentlyContinue

# Let us know what happen
$Error | Where { $_.CategoryInfo.Reason -eq "PathTooLongException" } | ForEach-Object -Begin{
    Write-Warning -Message "The following folders contain a file longer than 260 characters"
    # Get-ChildItem : The specified path, file name, or both are too long. 
    # The fully qualified file name must be less than 260 characters, and the directory name must be less than 248 characters.
} -Process {
    $_.CategoryInfo.TargetName
}
Write-Verbose -Message ('There {0} other type of errors' -f ($Error | Where { $_.CategoryInfo.Reason -ne "PathTooLongException" }).Count) -Verbose
Write-Verbose -Message ("There's a total of {0} files" -f $allfiles.Count) -Verbose

# Show extensions by occurence
$allfiles | Group -NoElement -Property Extension | Sort -Property Count -Descending

Start-Sleep -Seconds 1

$totalzip = ($allfiles | Where { $_.Extension -eq '.zip' }).Count
$filecount = 0

# Select only files with a zip extension
$results = @()
$allfiles | Where { $_.Extension -eq '.zip' } | 
ForEach-Object {
   
    $filecount++

    $hash = $res = $page = $checksum = $obj = $outtext = $null
    
    Start-Sleep -Milliseconds (Get-Random -Maximum 750 -Minimum 500)

    $hash = (Get-FileHash -Path $_.FullName -Algorithm SHA256).Hash
    
    Write-Verbose -Message ('Searching file {2}/{3} on {0} with sha256 {1}' -f $_.FullName,$hash,$filecount,$totalzip) -Verbose
    
    # Append a SHA256
    $_ | Add-Member -MemberType NoteProperty -Name SHA256 -Value $hash -Force

    # Search virustotal by SHA256
    $res = (Invoke-WebRequest -Uri 'https://www.virustotal.com/en/search/' -Method Post -Body "query=$hash" -MaximumRedirection 0 -ErrorAction SilentlyContinue)

    if ($res.StatusCode -eq 302 ) {
        if ($res.Headers.Location) {
            try {
                $page = Invoke-WebRequest -Method GET -Uri $res.Headers.Location -ErrorAction SilentlyContinue -MaximumRedirection 0
            } catch {
                Write-Warning -Message "The request on $($res.Headers.Location) returned $($_.Exception.Message)"
            }
            if ($page.Headers.Location -notmatch "file/not/found/") {
                try {
                    $obj = New-Object -TypeName PSObject
                    $outtext = ($page.AllElements | Where { ($_.TagName -eq 'TBODY') -and ($_.outerHTML -match "$hash") -and ($_.outerText -match "Detection\sratio") }).OuterText
                    if ($outtext) {
                        $outtext -split "`n" |  ForEach-Object {
                            if ($_ -match ":") {
                            $obj | Add-Member -MemberType NoteProperty -Name ($_ -split ":")[0] -Value (-join($_ -split ":" )[1..($_ -split ":" ).count]) -Force
                            } else {
                                Write-Warning -Message "Mismatch with ':'"
                                $_
                            }
                        }
                        # Analysis tab
                        $count = 0
                        $analysisar = @()
                        $AVName = $DetectionDate = $DetectionRate = $null
   
                        ([regex]'<TD\sclass=("?)ltr(>|\stext\-green"><I\stitle="|\stext-red">)(?<ID>.*)("\sclass=icon\-ok\-sign\sdata\-toggle="tooltip"></I></TD>|\s</TD>)').Matches(
                        $page.AllElements.FindById('antivirus-results').outerHTML) | ForEach-Object -Process {
                            $count++
                            switch ((@($_.Groups))[-1]) {
                                {$_ -match '\d{8}'} { $DetectionDate = $_ ; break}
                                {$_ -match '(^\-$|^File\snot\sdetected$)'}{ $DetectionRate = '-' ; break }
                                {$_ -match '.*'} {
                                    if ($count -eq 1) {
                                        $AVName = $_
                                    } else {
                                        $DetectionRate = $_
                                    }
                                    ; break
                                }
                            }
                            if ($count -eq 3) {
                                $count = 0
                                $analysisar += New-Object -TypeName PSObject -Property @{
                                    Update = $DetectionDate
                                    Result = $DetectionRate
                                    Antivirus = $AVName
                                }
                            }
                        }
                        $obj | Add-Member -MemberType NoteProperty -Name Analysis -Value $analysisar -Force
                        $obj
                        $_ | Add-Member -MemberType NoteProperty -Name VTResults -Value $obj -Force
                    } else {
                        # Write-Warning -Message "$outtext # is because the file has probably been never submitted"
                        # it shouldn't happen but... who knows
                        $_ | Add-Member -MemberType NoteProperty -Name VTResults -Value ([string]::Empty) -Force
                    }
                } catch {
                    $_
                }
            } else {
                Write-Warning -Message "the file was not found"
                $_ | Add-Member -MemberType NoteProperty -Name VTResults -Value "Unknown by VT" -Force
            }
        } else {
            Write-Warning -Message "the location in the header is empty"
            $_ | Add-Member -MemberType NoteProperty -Name VTResults -Value "Header empty issue" -Force
        }
    } else {
        Write-Warning -Message "the page returned a $($res.StatusCode) status code"
        $_ | Add-Member -MemberType NoteProperty -Name VTResults -Value "Status code issue" -Force
    }
    $results += $_
}

Write-Verbose -Message ("There's a total of {0} results" -f $results.Count) -Verbose

Write-Verbose -Message ("There's a total of {0} unknown files" -f (
    $results | Where 'VTResults' -eq "Unknown by VT").Count
) -Verbose

# Export results to CSV

# First unknown files
($results | Where 'VTResults' -eq "Unknown by VT") | 
Select Name,FullName,SHA256,
    @{l='Ratio';e={'Unknown by VT'}},
    @{l='MalwareName';e={[string]::Empty}} | 
Export-Csv -Path "$($env:USERPROFILE)\Documents\VT.Zipfiles.analysis.csv" -Encoding UTF8  -NoTypeInformation -Delimiter ";"

# Then export files that are identified as malware
Write-Verbose -Message ("There's a total of {0} known files" -f (
    $results | Where 'VTResults' -notin @("Unknown by VT","Header empty issue","Status code issue")).Count
) -Verbose

($results | Where 'VTResults' -notin @("Unknown by VT","Header empty issue","Status code issue")) |
Select Name,FullName,SHA256,
    @{l='Ratio';e={
        $_.VTResults.'Detection Ratio' -replace '\s',''
    }},
    @{l='MalwareName';e={
        ($_.VTResults.Analysis | Where { $_.Result -notmatch "^\-$"}).Result -as [string[]]
    }} |
Export-Csv -Path "$($env:USERPROFILE)\Documents\VT.Zipfiles.analysis.csv" -Encoding UTF8 -Append -NoTypeInformation -Delimiter ";"

With the above code, I think it took about an hour to complete the scan of ~2500 files (I went to lunch and didn’t measure it).
Notice the Start-Sleep cmdlet in the foreach loop to avoid being too aggressive with the virustotal public API.
Here are some samples of the screen output I had while processing files.



And here is the Excel file after 2 hours of work ๐Ÿ˜€

PowerShell rocks! ๐Ÿ˜Ž

Advertisements

4 thoughts on “Analysing files with virustotal.com public API

    • The above code may not be working anymore as VT updates its HTML code.
      I’ll have a look when I’ve got some time and will probably write another blog post to answer your question.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s