How to Create an IT Audit Ecosystem in AWS — Part 2: Living Smart-Reports

Juan Echeverri
7 min read · Oct 14, 2020

How long should it take to go from the opening meeting to the end of your audit plan? If the answer is more than one day, it’s an eternity.

Photo by Johannes Plenio on Unsplash

In part 1, you implemented a serverless architecture that allows you to gather data and run evaluations of some complexity. If you scheduled the Lambda Boss and created more Lambda auditors, within a couple of days you would have enough data to correlate sources and make more comprehensive evaluations.

In part 2, I’m going to show you how to create living smart-reports with SageMaker, using an example.

This is our guide:

  1. Knowing the problem and our scope.
  2. Understanding the simplified architecture at a high level.
  3. Prerequisites.
  4. Creating the preprocessing Notebook and its Lifecycle.
  5. Developing our living smart-report.
  6. Restricting the default SageMaker execution role and creating the user access role.
  7. Implementing an automatic turn on/off for the Notebook.
  8. Running the whole solution.
  9. What we learned in this tutorial.

Knowing the problem and our scope:

It’s common practice for IT teams to create their own AMIs (Amazon Machine Images), but not all of those AMIs are secure enough, and some may even be prohibited by your company or your customer.

In this post our main mission is to identify invalid AMIs. To do this, we have a DynamoDB table with the valid AMIs and an S3 bucket with EC2 metadata from each audited account.

In this example we have two accounts: the audit account and one audited account. In the audited account I created three EC2 instances with different AMIs, while in the audit account I created a small DynamoDB table containing the allowed AMIs. We’re going to query the metadata in S3 through Athena, using PyAthena.

In this case, since we’re auditing just one account, Athena could be replaced by a simple Boto3 call such as get_object, but I want to give you tools for auditing a large multi-account environment.

Understanding the simplified architecture at a high level:

In the picture below you can see the simplified architecture.

Living smart-reports simplified architecture

Here are highlights related to each component:

  1. An S3 bucket that stores metadata collected by the subsystem explained in part 1 of this series.
  2. An Athena table which allows us to query the metadata in (1) through SQL.
  3. A DynamoDB table with the valid-AMI inventory.
  4. A Notebook instance that preprocesses the multiple-source information and saves the results in (5). This Notebook turns off automatically once its mission ends.
  5. An S3 bucket that stores the results of (4).
  6. A Notebook which presents living smart-reports to our customers.
  7. A Lambda function that turns on the preprocessing Notebook (4).
  8. A CloudWatch event rule that triggers the Lambda function (7).

Prerequisites:

You need an Athena table with metadata from EC2. In the next picture you can see a query running against my Athena table “meta_ec2”. To collect this metadata I used the describe_instances method from Boto3’s EC2 client. For our example I used a small table with only three instances.

Here is the query to copy:
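
A minimal sketch of such a query, run through PyAthena, could look like this. It assumes the meta_ec2 table keeps the nested reservations/instances structure returned by describe_instances; the staging bucket, region, and column names are placeholders you should adapt to your own table definition.

```python
import pandas as pd
from pyathena import connect

# Placeholder staging location and region; use your own values.
conn = connect(
    s3_staging_dir="s3://your-athena-staging-bucket/results/",
    region_name="us-east-1",
)

# Unnest the reservations/instances arrays so we get one row per instance,
# keeping only the instance id and the AMI it was launched from.
QUERY = """
SELECT instance.instanceid AS instance_id,
       instance.imageid    AS ami
FROM meta_ec2
CROSS JOIN UNNEST(reservations) AS r(reservation)
CROSS JOIN UNNEST(reservation.instances) AS i(instance)
"""

instances = pd.read_sql(QUERY, conn)
print(instances.head())
```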

Note: If you want to learn how to create tables in Athena, take a look at this AWS reference: https://docs.aws.amazon.com/athena/latest/ug/creating-tables.html

You should have a DynamoDB table with the valid-AMI inventory. I created a table with just two valid AMIs and two fields: ami and os. You can enrich your table with more information according to your scope. Next I’ll show you my example table.

Note: If you want to learn how to create tables in DynamoDB, take a look at the AWS documentation.
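
If you prefer to create the table from code rather than the console, a minimal sketch with Boto3 could look like this. The table name valid_amis and the two sample AMI IDs are placeholders for illustration.

```python
import boto3

dynamodb = boto3.resource("dynamodb")

# Create a small table keyed by the AMI id (table name is a placeholder).
table = dynamodb.create_table(
    TableName="valid_amis",
    KeySchema=[{"AttributeName": "ami", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "ami", "AttributeType": "S"}],
    BillingMode="PAY_PER_REQUEST",
)
table.wait_until_exists()

# Two valid AMIs with the two fields used in this post: ami and os.
with table.batch_writer() as batch:
    batch.put_item(Item={"ami": "ami-0123456789abcdef0", "os": "Amazon Linux 2"})
    batch.put_item(Item={"ami": "ami-0fedcba9876543210", "os": "Ubuntu 18.04"})
```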

Creating the preprocessing Notebook and its Lifecycle:

You can copy the code here:
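
A sketch of such a Lifecycle, registered through Boto3 rather than pasted into the console, is shown below: the OnStart script simply installs the preprocessing dependencies (PyAthena in this example) into the conda_python3 environment. The configuration name and the package list are assumptions you should adapt.

```python
import base64

import boto3

# OnStart script: install the preprocessing dependencies into conda_python3.
ON_START = """#!/bin/bash
set -e
sudo -u ec2-user -i <<'EOF'
source /home/ec2-user/anaconda3/bin/activate python3
pip install pyathena
source /home/ec2-user/anaconda3/bin/deactivate
EOF
"""

boto3.client("sagemaker").create_notebook_instance_lifecycle_config(
    NotebookInstanceLifecycleConfigName="preprocessing-lifecycle",  # placeholder name
    OnStart=[{"Content": base64.b64encode(ON_START.encode("utf-8")).decode("utf-8")}],
)
```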

Once you’ve created the preprocessing Lifecycle, you should create the preprocessing Notebook instance. To do this, click on (1) and then on (2).

Following that you have to fill in the fields in the red boxes as shown in the picture below.

It can take a few minutes for the Notebook to reach the InService status; once it does, click on “Open Jupyter”. For our purpose I suggest creating a new Jupyter notebook with the conda_python3 kernel.

In our preprocessing Notebook we’re going to use Athena to get the AMI information for our EC2 instances, and DynamoDB to get the valid-AMI inventory. Once you’ve finished the preprocessing, use the pickle library to serialize the outputs. Below you can see the preprocessing Notebook.
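
A condensed sketch of that preprocessing logic is shown below. It reuses the nested Athena query from the prerequisites and assumes a DynamoDB table named valid_amis and a results bucket named your-results-bucket; all of these names are placeholders.

```python
import pickle

import boto3
import pandas as pd
from pyathena import connect

# 1. AMI in use per EC2 instance, via Athena (staging bucket is a placeholder).
conn = connect(s3_staging_dir="s3://your-athena-staging-bucket/results/", region_name="us-east-1")
QUERY = """
SELECT instance.instanceid AS instance_id, instance.imageid AS ami
FROM meta_ec2
CROSS JOIN UNNEST(reservations) AS r(reservation)
CROSS JOIN UNNEST(reservation.instances) AS i(instance)
"""
instances = pd.read_sql(QUERY, conn)

# 2. Valid-AMI inventory from DynamoDB.
valid_table = boto3.resource("dynamodb").Table("valid_amis")
valid_amis = {item["ami"] for item in valid_table.scan()["Items"]}

# 3. Flag every instance whose AMI is not in the inventory.
instances["valid_ami"] = instances["ami"].isin(valid_amis)

# 4. Serialize the outputs with pickle and push them to the results bucket.
with open("/tmp/ami_findings.pkl", "wb") as f:
    pickle.dump(instances, f)
boto3.client("s3").upload_file("/tmp/ami_findings.pkl", "your-results-bucket", "reports/ami_findings.pkl")
```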

Developing our living smart-report:

This Notebook needs two kinds of Lifecycles: one with high privileges and all the needed dependencies, and another one with lower privileges to share with our users.

You can copy the restricted Lifecycle below. You should set the root password on line 50.

Likewise, you can copy the permissive Lifecycle below:

You could restrict the Lifecycle even further. For instance, you could restrict commands such as npm, curl, openssl, rpm, wget, git, rm, touch, cat, echo, vi, vim, nano, mkdir, sh, and so on. Additionally, you could protect some important paths. Just be careful to modify both Lifecycles consistently.
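
As an illustration only (not the exact script used here), the restricted OnStart content could set the root password and strip execute permission from the binaries you want to block, along the lines of the sketch below; the configuration name, the password placeholder, and the command list are assumptions.

```python
import base64

import boto3

# Illustrative restricted OnStart content: set the root password and block
# a few binaries for non-root users. Adapt the list to your own policy.
RESTRICTED_ON_START = """#!/bin/bash
set -e
echo 'root:CHANGE_ME' | chpasswd
for b in npm curl wget git vi vim nano; do
    p="$(command -v "$b" || true)"
    if [ -n "$p" ]; then chmod o-x "$p"; fi
done
"""

boto3.client("sagemaker").create_notebook_instance_lifecycle_config(
    NotebookInstanceLifecycleConfigName="report-restricted-lifecycle",  # placeholder name
    OnStart=[{"Content": base64.b64encode(RESTRICTED_ON_START.encode("utf-8")).decode("utf-8")}],
)
```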

In the next gif you can take a closer look at my end-user report.

Restricting the default SageMaker execution role and creating the user access role:

Following the Principle of Least Privilege (PoLP), our end-user report should have only the permissions it needs. So in this step we’re going to prune the default SageMaker role, and then we’ll create an end-user access role.

  • Pruning the SageMaker role: I suggest removing the SageMaker full-access policy (AmazonSageMakerFullAccess) from the role. To do that, click on (1) and then on (2).

The following pruned SageMaker policy allows you to turn on the Notebook instance, read some specific S3 buckets, write logs, and perform some basic actions against ECR:
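
A sketch of what that pruned policy could look like is below, attached as an inline policy; the role name, bucket, and account in the ARNs are placeholders, and you should scope them to your own environment.

```python
import json

import boto3

# Illustrative pruned policy: start/describe the notebook, read specific
# buckets, write logs, and pull images from ECR. Resources are placeholders.
PRUNED_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["sagemaker:StartNotebookInstance", "sagemaker:DescribeNotebookInstance"],
         "Resource": "*"},
        {"Effect": "Allow",
         "Action": ["s3:GetObject", "s3:ListBucket"],
         "Resource": ["arn:aws:s3:::your-results-bucket", "arn:aws:s3:::your-results-bucket/*"]},
        {"Effect": "Allow",
         "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
         "Resource": "*"},
        {"Effect": "Allow",
         "Action": ["ecr:GetAuthorizationToken", "ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer"],
         "Resource": "*"},
    ],
}

boto3.client("iam").put_role_policy(
    RoleName="AmazonSageMaker-ExecutionRole-example",  # placeholder role name
    PolicyName="pruned-sagemaker-policy",
    PolicyDocument=json.dumps(PRUNED_POLICY),
)
```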

  • Creating the end-user access policy: Below you can copy the end-user access policy. Anyone who needs reader access should belong to a group with this policy attached.
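
One possible shape for that end-user policy is sketched below: it only lets a reader look up the report notebook and open it through a presigned URL. The notebook ARN and policy name are placeholders.

```python
import json

import boto3

# Illustrative reader policy: describe the report notebook and open it via a
# presigned URL, nothing else. Replace the ARN with your notebook instance ARN.
END_USER_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["sagemaker:DescribeNotebookInstance",
                    "sagemaker:CreatePresignedNotebookInstanceUrl"],
         "Resource": "arn:aws:sagemaker:us-east-1:111111111111:notebook-instance/living-smart-report"}
    ],
}

boto3.client("iam").create_policy(
    PolicyName="end-user-report-access",  # placeholder name
    PolicyDocument=json.dumps(END_USER_POLICY),
)
```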

Implementing an automatic turn on/off for the Notebook:

We’re going to create:

  1. A Lambda function that turns on the preprocessing Notebook.
  2. A CloudWatch event which triggers the aforementioned Lambda.
  3. A new Lifecycle that performs all the preprocessing tasks as soon as the Notebook is started and then shuts the instance down.

A Lambda function that turns on the preprocessing Notebook

You can copy the following Lambda code:
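
A minimal version of that Lambda needs only a couple of Boto3 calls; in this sketch the Notebook instance name comes from an environment variable I’ve called NOTEBOOK_NAME (an assumed name).

```python
import os

import boto3

sagemaker = boto3.client("sagemaker")


def lambda_handler(event, context):
    """Turn on the preprocessing Notebook instance if it is stopped."""
    notebook_name = os.environ["NOTEBOOK_NAME"]  # assumed environment variable
    status = sagemaker.describe_notebook_instance(NotebookInstanceName=notebook_name)
    if status["NotebookInstanceStatus"] == "Stopped":
        sagemaker.start_notebook_instance(NotebookInstanceName=notebook_name)
    return {"notebook": notebook_name, "previous_status": status["NotebookInstanceStatus"]}
```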

Below is the policy that allows the Lambda to turn on the Notebook:
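
A sketch of that policy is below, attached to the Lambda’s execution role; the role name and notebook ARN are placeholders.

```python
import json

import boto3

# Illustrative permissions for the Lambda: describe and start the notebook.
LAMBDA_NOTEBOOK_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["sagemaker:DescribeNotebookInstance", "sagemaker:StartNotebookInstance"],
         "Resource": "arn:aws:sagemaker:us-east-1:111111111111:notebook-instance/preprocessing-notebook"}
    ],
}

boto3.client("iam").put_role_policy(
    RoleName="start-notebook-lambda-role",  # placeholder Lambda execution role
    PolicyName="start-preprocessing-notebook",
    PolicyDocument=json.dumps(LAMBDA_NOTEBOOK_POLICY),
)
```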

A CloudWatch event which triggers the aforementioned Lambda

I’m going to create a CloudWatch rule that triggers my Lambda every day. Take a look at the next image.
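
If you prefer to script the rule instead of using the console, a sketch with Boto3 could look like this; the rule name, schedule, and Lambda name/ARN are placeholders.

```python
import boto3

events = boto3.client("events")
lambda_client = boto3.client("lambda")

LAMBDA_ARN = "arn:aws:lambda:us-east-1:111111111111:function:start-preprocessing-notebook"  # placeholder

# Rule that fires once a day and targets the Lambda.
rule = events.put_rule(Name="daily-preprocessing", ScheduleExpression="rate(1 day)", State="ENABLED")
events.put_targets(Rule="daily-preprocessing", Targets=[{"Id": "start-notebook", "Arn": LAMBDA_ARN}])

# Let the rule invoke the Lambda.
lambda_client.add_permission(
    FunctionName="start-preprocessing-notebook",
    StatementId="allow-daily-preprocessing-rule",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule["RuleArn"],
)
```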

A new Lifecycle that performs all the preprocessing tasks as soon as the Notebook is started and then shuts the instance down

You can find this Lifecycle in the code snippet below. On line 46 we run the Notebook, and on lines 47 to 49 we modify the crontab so the Notebook instance shuts down automatically.
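
The idea can be sketched as follows: the OnStart script executes the preprocessing notebook with nbconvert and then writes a cron entry that powers the instance off a few minutes later. The configuration name, notebook path, and five-minute delay are assumptions; for long preprocessing jobs you would run the nbconvert step in the background so the Lifecycle script does not time out.

```python
import base64

import boto3

# Illustrative OnStart content: run the preprocessing notebook, then schedule
# a shutdown five minutes after it finishes.
RUN_AND_SHUTDOWN = """#!/bin/bash
set -e
sudo -u ec2-user -i <<'EOF'
source /home/ec2-user/anaconda3/bin/activate python3
jupyter nbconvert --to notebook --execute --inplace /home/ec2-user/SageMaker/preprocessing.ipynb
source /home/ec2-user/anaconda3/bin/deactivate
EOF
echo "$(date -d '+5 minutes' +'%M %H') * * * root shutdown -h now" >> /etc/crontab
"""

boto3.client("sagemaker").create_notebook_instance_lifecycle_config(
    NotebookInstanceLifecycleConfigName="preprocess-and-shutdown",  # placeholder name
    OnStart=[{"Content": base64.b64encode(RUN_AND_SHUTDOWN.encode("utf-8")).decode("utf-8")}],
)
```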

Running the whole solution

Before we run the whole solution, you have to be aware that our implementation has two states. The first one is just for setting up and creating our report, and it looks like this:

The second one is for sharing our report, and it looks like this:

In the next gif you can see how an end-user consults their audit report and how it looks.

What we learned in this tutorial:

  • How to edit and create Notebook Lifecycles.
  • How to connect SageMaker with Athena.
  • How to query a nested Athena table.
  • How to connect SageMaker with DynamoDB.
  • How to install and use NB extensions in your Notebooks.
  • How to schedule a Lambda using CloudWatch.
  • How to automatically turn on and off a Notebook instance.
