Using CloudSearch to read CloudTrail Logs: [Part 3]

In my previous posts, Using CloudSearch to read CloudTrail Logs: [Part 1] and [Part 2], I described setting up the CloudSearch environment and the other groundwork needed to start uploading CloudTrail logs to your CloudSearch domain. Now that the base environment is in place, you need a worker process that runs periodically to move logs from S3 to your CloudSearch domain; in this case, that worker is a PowerShell script.

A quick summary of the environment: you have enabled CloudTrail logging with SNS notifications and subscribed to that SNS topic via SQS. This ensures the SNS notifications are queued and can be processed at a later point in time. Finally, you have launched an IAM-role-enabled EC2 instance with a policy that allows access to SQS, S3 and CloudSearch. Now, let's break down the script that performs the actual job of moving CloudTrail logs from S3 to CloudSearch.

Before we jump into the script, here are a few requirements for it to run.

  1. 7-Zip: Used to unzip the “gz” file downloaded from S3. You may choose to use a different program.
  2. AWS CLI for CloudSearch: The AWS CLI is used to upload the logs to CloudSearch. At the time of writing this script, the Write-CSDDocuments cmdlet would always error out with “Request forbidden by administrative rules”, even with a wide-open document upload policy. I escalated this to the CloudSearch team and they acknowledged it as a bug.

Here is the script block that does the job.

# Store all required parameters in variables
$CloudSearchDomain = 'Your CloudSearch Domain Name'
$DomainEndpoint = 'Your CloudSearch Document Endpoint'
$Source = 'Directory Where you want to save the downloaded S3 file'
$Output = 'Directory where you want to save processed files'
$UploadFormat = 'json'
$Region = 'AWS Region where CloudSearch runs in'
$ContentType = 'application/json'
$CloudTrailQueue = 'The SQS Queue Endpoint'

This block above assigns variables that will be used later in the script.

# Read SQS queue and receive required messages
$SQSMessages = Receive-SQSMessage -QueueUrl $CloudTrailQueue -VisibilityTimeout 60 -MaxNumberOfMessages 1

This block above reads SQS messages and initiates further processing.
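For reference, the body of each queued message is the CloudTrail SNS notification, which names the bucket and object key(s) of the newly delivered log file. A trimmed, illustrative sample (bucket and key are made-up values; this assumes raw message delivery on the SQS subscription, otherwise the SNS envelope wraps this JSON inside a Message attribute):

```json
{
  "s3Bucket": "my-cloudtrail-bucket",
  "s3ObjectKey": [
    "AWSLogs/123456789012/CloudTrail/us-east-1/2015/04/20/123456789012_CloudTrail_us-east-1_example.json.gz"
  ]
}
```

This is why the script below reads $SNSMessage.s3Bucket and $SNSMessage.s3ObjectKey[0] from the parsed message body.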

#Process SQS Message, read S3 Bucket and Key from the message and extract JSON file
$SQSMessages | % {
    Try {
        $SQSMessage = $_
        $SNSMessage = $SQSMessage.Body | ConvertFrom-Json
        $Null = Read-S3Object -BucketName $SNSMessage.s3Bucket -Key $SNSMessage.s3ObjectKey[0] `
            -File "$Source\CloudTrail.json.gz"
        Start-Process -Wait -FilePath 'C:\Program Files\7-Zip\7z.exe' `
            -ArgumentList '-y e CloudTrail.json.gz' -WorkingDirectory $Source

This block above downloads the CloudTrail log archive from S3 and extracts it into a working directory for further processing. This is the block to change if you decide to use an archive program other than 7-Zip.
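If you would rather avoid shelling out to an external archiver at all, the decompression step can be done natively. As an illustration of the idea (not part of the original script), here is a small Python sketch that unpacks a .gz file using only the standard library; the function name and paths are my own placeholders:

```python
import gzip
import shutil

def gunzip(src_path, dest_path):
    """Decompress a .gz file to dest_path using only the standard library."""
    with gzip.open(src_path, "rb") as fin, open(dest_path, "wb") as fout:
        # copyfileobj streams in chunks, so large CloudTrail archives are fine
        shutil.copyfileobj(fin, fout)
```

In PowerShell the equivalent native route is System.IO.Compression.GZipStream; the point is simply that the 7z.exe call is replaceable.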

# Rebuild JSON file in SDF format
        $a = (Get-Content "$Source\CloudTrail.json" -Raw) | ConvertFrom-Json
        ForEach ($Rec in $a.Records){
            $random = Get-Random
            $File = "$Output\$random.json"
            $guid = [guid]::NewGuid()
            Set-Content -Path $File -Value @"
[{"type": "add",
  "id": "$guid",
  "fields": {
    "aws_region": "$($Rec.awsRegion)",
    "event_id": "$($Rec.eventID)",
    "event_name": "$($Rec.eventName)",
    "event_source": "$($Rec.eventSource)",
    "event_time": "$($Rec.eventTime)",
    "eventtype": "$($Rec.eventType)",
    "request_id": "$($Rec.requestID)",
    "source_ip_address": "$($Rec.sourceIPAddress)",
    "user_agent": "$($Rec.userAgent)",
    "accesskey": "$($Rec.userIdentity.accessKeyId)",
    "account_id": "$($Rec.userIdentity.accountId)",
    "arn": "$($Rec.userIdentity.arn)",
    "principalid": "$($Rec.userIdentity.principalId)",
    "usertype": "$($Rec.userIdentity.type)",
    "username": "$($Rec.userIdentity.userName)",
    "eventversion": "$($Rec.eventVersion)",
    "recipientaccountid": "$($Rec.recipientAccountId)"
  }
}]
"@
        }

This block loads the unzipped file into a variable, then writes each record from the source file to its own randomly named JSON file in the output directory, reformatted as a Search Data Format (SDF) add operation.
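The per-record transformation is simple enough to restate outside PowerShell. This Python sketch is illustrative only: the field names follow the script above (a subset, for brevity), and the record structure is assumed from CloudTrail's JSON:

```python
import uuid

def record_to_sdf(record):
    """Wrap one CloudTrail record in a single-document SDF 'add' batch."""
    identity = record.get("userIdentity", {})
    return [{
        "type": "add",
        "id": str(uuid.uuid4()),  # CloudSearch document id, one per record
        "fields": {
            "aws_region": record.get("awsRegion"),
            "event_name": record.get("eventName"),
            "event_time": record.get("eventTime"),
            "source_ip_address": record.get("sourceIPAddress"),
            "username": identity.get("userName"),
        },
    }]
```

Each returned batch can then be serialized with json.dumps and written to its own file, mirroring the Set-Content call above.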

# Upload each file to the CloudSearch Domain. Had to use the AWS CLI because, at the time of
        # writing this script, the Write-CSDDocuments cmdlet would always error.
        ForEach ($Item in (Get-ChildItem $Output -Filter "*.json")){
            $Command = "& 'C:\Program Files\Amazon\AWSCLI\aws.exe' cloudsearchdomain upload-documents --region $Region --endpoint-url $DomainEndpoint --content-type $ContentType --documents `"$($Item.FullName)`""
            Invoke-Expression -Command $Command
        }

This block above uploads each file to the specified CloudSearch domain as an individual document.

# CleanUp working directories and the processed SQS message.
        Remove-Item -Path "$Source\CloudTrail.*" -Force
        Remove-Item -Path "$Output\*.json" -Force
        Remove-SQSMessage -QueueUrl $CloudTrailQueue -ReceiptHandle $SQSMessage.ReceiptHandle -Force
    }
    # Catch any error during the script execution and log it to the console
    Catch {
        Write-Host "Error!" $_
    }
}
Finally, this block cleans up the working directories and deletes the SQS message that was just processed, ensuring those objects are not re-processed the next time the script runs. The last portion of the script catches and logs any errors that occur during execution.




Using CloudSearch to read CloudTrail Logs: [Part 2]

In my previous post, Using CloudSearch to read CloudTrail Logs: [Part 1], I described setting up the CloudSearch environment and the other groundwork needed to start uploading CloudTrail logs to your CloudSearch domain. One last task is to create the right indexes for your search domain, corresponding to the documents you will upload, and that is what we will do now. An index defines what content within your documents is searchable. Having the right indexes is therefore essential, not only to be able to search your documents, but also for successful document uploads. The indexes for the CloudSearch domain are simply the field names under the fields block of each document. During uploads, CloudSearch validates documents against the index fields and will not accept content that does not match them.

AWS supports the following index fields: (As in AWS Knowledgebase)

  • date—contains a timestamp. Dates and times are specified in UTC (Coordinated Universal Time) according to IETF RFC3339: yyyy-mm-ddTHH:mm:ss.SSSZ. In UTC, for example, 5:00 PM August 23, 1970 is: 1970-08-23T17:00:00Z. Note that you can also specify fractional seconds when specifying times in UTC. For example, 1967-01-31T23:20:50.650Z.
  • date-array—a date field that can contain multiple values.
  • double—contains a double-precision 64-bit floating point value.
  • double-array—a double field that can contain multiple values.
  • int—contains a 64-bit signed integer value.
  • int-array—an integer field that can contain multiple values.
  • latlon—contains a location stored as a latitude and longitude value pair (lat, lon).
  • literal—contains an identifier or other data that you want to be able to match exactly. Literal fields are case-sensitive.
  • literal-array—a literal field that can contain multiple values.
  • text—contains arbitrary alphanumeric data.
  • text-array—a text field that can contain multiple values.

The options you can configure for a field vary according to the field type: (As in AWS Knowledgebase)

  • HighlightEnabled—You can get highlighting information for the search hits in any HighlightEnabled text field. Valid for: text, text-array.
  • FacetEnabled—You can get facet information for any FacetEnabled field. Text fields cannot be used for faceting. Valid for: int, int-array, date, date-array, double, double-array, latlon, literal, literal-array.
  • ReturnEnabled—You can retrieve the value of any ReturnEnabled field with your search results. Note that this increases the size of your index, which can increase the cost of running your domain. When possible, it’s best to retrieve large amounts of data from an external source, rather than embedding it in your index. Since it can take some time to apply document updates across the domain, critical data such as pricing information should be retrieved from an external source using the returned document IDs. Valid for: int,int-array, date, date-array, double, double-array, latlon, literal, literal-array, text, text-array.
  • SearchEnabled—You can search the contents of any SearchEnabled field. Text fields are always searchable. Valid for: int, int-array, date, date-array, double, double-array,latlon, literal, literal-array, text, text-array.
  • SortEnabled—You can sort the search results alphabetically or numerically using any SortEnabled field. Array-type fields cannot be SortEnabled. Only sort enabled numeric fields can be used in expressions. Valid for: int, date, latlon, double, literal, text.

Based on the above, the following are the index fields you will need for CloudTrail logs, created either manually or via cs-configure-from-batches:

Index Field Name             Index Type
event_name                   literal
aws_region                   literal
event_id                     literal
request_id                   literal
source_ip_address            literal
user_identity_account_id     literal
user_identity_type           literal
user_identity_user_name      literal
event_source                 literal
error_code                   int
error_message                text
user_agent                   text
user_identity_arn            text
event_time                   date
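If you would rather script the index creation than click through the console, the table above maps directly onto the IndexField structure the CloudSearch configuration API expects (for example via boto3's cloudsearch client and its define_index_field call). This sketch only builds the definitions as data; the actual API call is left as a comment since it needs real credentials and a domain name (the domain name shown is a placeholder):

```python
# Index fields for the CloudTrail documents, taken from the table above.
CLOUDTRAIL_INDEX_FIELDS = {
    "event_name": "literal",
    "aws_region": "literal",
    "event_id": "literal",
    "request_id": "literal",
    "source_ip_address": "literal",
    "user_identity_account_id": "literal",
    "user_identity_type": "literal",
    "user_identity_user_name": "literal",
    "event_source": "literal",
    "error_code": "int",
    "error_message": "text",
    "user_agent": "text",
    "user_identity_arn": "text",
    "event_time": "date",
}

def index_field_definitions(fields=CLOUDTRAIL_INDEX_FIELDS):
    """Build IndexField structures suitable for define_index_field."""
    return [
        {"IndexFieldName": name, "IndexFieldType": ftype}
        for name, ftype in fields.items()
    ]

# With boto3, each definition would then be applied along these lines:
#   client = boto3.client("cloudsearch")
#   for field in index_field_definitions():
#       client.define_index_field(DomainName="cloudtrail-logs", IndexField=field)
```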

Additional configurations, such as enabling search, highlight or facets, depend on your requirements. This completes setting up AWS CloudSearch with the required indexes. In the next article I will describe how to use PowerShell to upload documents to CloudSearch.



Amazon CloudWatch Logs in the US West (N. California) Region


Seamlessly Join EC2 Instances to a Domain


Microsoft adopts first international cloud privacy standard – Microsoft on the Issues


Using CloudSearch to read CloudTrail logs: [Part 1]

Integrating CloudSearch and CloudTrail was fun. I was tasked with finding the right logging solution for CloudTrail, CloudWatch Logs, Windows Event Logs and other application logs. Here I describe uploading CloudTrail logs to CloudSearch.

AWS CloudSearch is a search engine powered by Amazon A9. It offers an easy-to-configure console through which you can set up search indexes with support for search, sort, facets and so on. However, the document upload part is the key. CloudSearch offers various upload paths: the console, command line tools and APIs. While uploading documents using all three methods is fairly straightforward, these methods upload your data in a predefined format that makes it difficult to search the content of the document. Below is a sample of how documents are restructured in Search Data Format (SDF) when using the defaults.

[ {
  "type" : "add",
  "id" : "CloudTrail.json",
  "fields" : {
    "content" : "{ All data are within this block }",
    "resourcename" : "CloudTrail.json",
    "content_encoding" : "ISO-8859-2",
    "content_type" : "application/json"
  }
} ]

The type, id and fields entries are the essentials for successfully uploading a document to a CloudSearch domain. However, dumping all the data into the content field does not really help with searching, nor does it keep your CloudSearch document readable. Understanding how SDF is structured helps you build a custom SDF document that CloudSearch will accept. Let's break the document down entry by entry.

  1. type: Defines the type of operation. Accepts add and delete.
  2. id: The document name that CloudSearch uses to identify each document. It can be related to a document name.
  3. fields: This is the key part. fields holds all the index values you want to be searchable. There is no hard-and-fast rule that fields must be structured like the sample above; as long as you build a custom structure and create the corresponding indexes, your document should upload without errors.
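Putting those three pieces together, a minimal custom SDF batch (with illustrative field values of my own) looks like this:

```json
[ {
  "type" : "add",
  "id" : "example-doc-1",
  "fields" : {
    "event_name" : "RunInstances",
    "aws_region" : "us-east-1"
  }
} ]
```

Compare this with the default sample above: the only difference is that fields now carries your own searchable keys instead of a single content blob.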

Now that you know how to structure a search document, let's look at building the environment to send CloudTrail logs to a CloudSearch domain. CloudTrail logs are stored in S3, and we have two approaches to retrieving them. One is to generate an SNS notification on object creation in S3 and trigger your code to run on each notification; the other is to queue these SNS notifications in an SQS queue and run the code at specific intervals to upload the logs.

At this stage, I assume you have already set up CloudTrail logging and have enough logs to upload to CloudSearch. You will also need to turn on SNS for CloudTrail and create an SQS queue subscribed to the CloudTrail SNS topic, because we are going to use the second method. Then, create the CloudSearch domain to which you want the logs uploaded.

When you have the above-mentioned configuration in place, the next step is to create an IAM role with the required level of access to SQS, S3 and the CloudSearch domain. Below is a policy snip granting only the required privileges: reading and deleting SQS messages, getting logs from S3 and uploading documents to a CloudSearch domain. In practice, scope the Resource list down to your own queue, bucket and domain ARNs rather than "*".

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "cloudtrailworkerrole",
      "Effect": "Allow",
      "Action": [
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage",
        "s3:GetObject",
        "cloudsearch:document"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
Finally, launch an IAM-role-enabled EC2 instance with the above policy attached. This grants the instance the privileges it needs to access all the required resources, and completes the initial configuration for uploading CloudTrail logs to CloudSearch. In my following post I will describe how to upload CloudTrail logs to CloudSearch using PowerShell.



PowerShell: Nagios Plugin to monitor Windows Server Backup Status

Windows Server Backup is a native Windows backup tool installed as a Windows feature. It is a great tool for backing up data to local disks and gives quick restore ability. However, it does not offer a built-in mechanism for monitoring and alerting. This post describes how PowerShell can help us achieve this.

There are two methods by which we can monitor the status / result of overnight backups. One is to monitor the Windows event logs and check for specific entries that indicate successful backup completion; the other is to use the Windows Server Backup PowerShell snap-in to get the backup status.

METHOD 1: [Using Windows Event Viewer]

Upon backup job completion, Windows Server Backup logs a “successful” completion event with Event ID 4 in the Microsoft-Windows-Backup log. This event is logged only for successful backups; when a backup fails, an error event is logged instead. With that said, let's get started with parsing the event log.

Declare Variables
# Define parameters for the script
# Log: Microsoft-Windows-Backup. This can be replaced with other logs.
# ID: 4. This can be replaced with other event IDs.
$Log = 'Microsoft-Windows-Backup'
$ID = 4
$ErrorActionPreference = "SilentlyContinue"

# Declare date and start time to check event logs
$Date = Get-Date -Format "MM-dd-yyyy"
$StartAt = (Get-Date).Date

# Fetch event logs for further processing
$Success = Get-WinEvent -FilterHashtable @{LogName=$Log;Id=$ID;StartTime=$StartAt}
$Failure = Get-WinEvent -FilterHashtable @{LogName=$Log;Level=2;StartTime=$StartAt}

The above script block has some variable declarations and gets the events we are interested in. Moving ahead the script block below processes the fetched information and returns a success or failure to Nagios.

# Validate fetched logs with the condition below
if ($Success.Count -gt 0 -and ($Failure.Count -eq 0 -or `
    ($Success[0].TimeCreated -gt $Failure[0].TimeCreated))) {
    Write-Host "OK:" $Date":" $($Success[0].Message); Exit 0
}
else {
    Write-Host "CRITICAL:" $Date":" $($Failure[0].Message); Exit 2
}

The above script block validates the fetched logs against two conditions: it checks for the presence of Event ID 4, and it ensures there is no newer failure event for the given date. Finally, it returns the result to Nagios as OK or CRITICAL.
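The decision rule is worth stating on its own: report OK when at least one success event exists and either no failure was logged or the newest success is newer than the newest failure. A tiny language-neutral restatement of that rule (a Python sketch of my own; timestamps are any comparable values):

```python
def backup_status(success_times, failure_times):
    """Return 'OK' or 'CRITICAL' per the rule used by the Nagios check above."""
    if success_times and (not failure_times
                          or max(success_times) > max(failure_times)):
        return "OK"
    return "CRITICAL"
```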

METHOD 2: [Windows Server Backup PowerShell Snapin]

The second method is to use the Windows Server Backup PowerShell snap-in to fetch the required information and validate it against our result condition. To use this method we need the Windows Server Backup command-line tools installed and the corresponding PSSnapin loaded, so the initial script block checks for the snap-in's availability and installs / loads it if it is missing.

# Check for the Windows Server Backup PowerShell snap-in and install / load it if unavailable
If ((Get-PSSnapin -Name Windows.ServerBackup -ErrorAction SilentlyContinue) -eq $null) {
    Import-Module ServerManager
    Add-WindowsFeature Backup-Features -IncludeAllSubFeature | Out-Null
    Add-PSSnapin Windows.ServerBackup
}
$ErrorActionPreference = "SilentlyContinue"

Next, we store the result of Get-WBSummary in a variable.

# Declare variables to store information gathered
$Date=Get-Date -format "MM-dd-yyyy"
$BackupSummary = Get-WBSummary

Finally, we process the information gathered and return the result to Nagios with OK or Critical.

# Validate Backup Status
If (($BackupSummary.LastSuccessfulBackupTime).Date -eq (Get-Date).Date `
     -and $BackupSummary.LastBackupResultHR -eq 0) {
    Write-Host "OK:" $Date":" "The backup operation has finished successfully"; Exit 0
}
else {
    Write-Host "CRITICAL:" $Date":" $($BackupSummary.DetailedMessage); Exit 2
}

Moving on to the Nagios configuration, define a Nagios service check and configure NSClient++ to run external scripts. Below are some screen grabs describing them.


Service Definition for Method 1


NSClient.ini Configuration

TIP: You can add -ExecutionPolicy Bypass to the PowerShell command in the NSClient.ini file so Nagios can execute the script even when the machine's ExecutionPolicy would otherwise block it.
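For reference, an external-script definition in NSClient.ini typically looks something like the following sketch (section name per NSClient++ 0.4.x; the check name and script path are my own placeholders):

```ini
[/settings/external scripts/scripts]
; Hypothetical check name and path - adjust to where the .ps1 actually lives
check_wsb_status = cmd /c echo scripts\check_wsb_status.ps1; exit($lastexitcode) | powershell.exe -ExecutionPolicy Bypass -command -
```

The cmd /c echo … | powershell.exe -command - pattern is the usual way to run a PowerShell script from NSClient++ while preserving its exit code for Nagios.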
