AWS Compute Blog
Building NoSQL Database Triggers with Amazon DynamoDB and AWS Lambda
 Tim Wagner, AWS Lambda General Manager
 Tim Wagner, AWS Lambda General Manager
 
SQL databases have offered triggers for years, making it easy to validate and check data, maintain integrity constraints, create compute columns, and more. Why should SQL tables have all the fun…let’s do the equivalent for NoSQL data!
Amazon DynamoDB recently launched their streams feature (table update notifications) in production. When you combine this with AWS Lambda, it’s easy to create NoSQL database triggers that let you audit, aggregate, verify, and transform data. In this blog post we’ll do just that: first, we’ll create a data audit trigger, then we’ll extend it to also transform the data by adding a computed column to our table that the trigger maintains automatically. We’ll use social security numbers and customer names as our sample data, because they’re representative of something you might find in a production environment. Let’s get started…
Data Auditing
Our first goal is to identify and report invalid social security numbers. We’ll accept two formats: 9 digits (e.g., 123456789) and 11 digits (e.g., 123-45-6789). Anything else is an error and will generate an SNS message which, for the purposes of this demo, we’ll use to send an email report of the problem.
Setup Part 1: Defining a Table Schema
First, start by creating a new table; I’m calling it “TriggerDemo”:

For our example we’ll use just two fields: Name and SocialSecurityNumber (the primary hash key and primary range key, respectively, both represented as strings). In a more realistic setting you’d typically have additional customer-specific information keyed off these fields. You can accept the default capacity settings and you don’t need any secondary indices.

You do need to turn on streams in order to be able to send updates to your AWS Lambda function (we’ll get to that in a minute). You can read more about configuring and using DynamoDB streams in the DynamoDB developer guide.

Here’s the summary view of the table we’ve just configured:

Setup Part 2: SNS Topic and Email Subscription
To give us a way to report errors, we’ll create an SNS topic; I’m calling mine, “BadSSNNumbers”.

(The other topic here is the DynamoDB alarm.)

…and then I’ll subscribe my email to it to receive error notifications:

(I haven’t shown it here, but you can also turn on SNS logging as a debugging aid.)
Ok, we have a database and a notification system…now we need a compute service!
Setup Part 3: A Lambda-based Trigger
Now we’ll create an AWS Lambda function that will respond to DynamoDB updates by verifying the integrity of each social security number, using the SNS topic we just created to notify us of any problematic entries.
First, create a new Lambda function by selecting the “dynamodb-process-stream” blueprint. Blueprints help you get started quickly with common tasks.

For the event source, select your TriggerDemo table:

You’ll also need to provide your function with permissions to read from the stream by choosing the recommended role (DynamoDB event stream role):

The blueprint-provided permission policy only assumes you’re going to read from the update stream and create log entries, but we need an additional permission: publishing to the SNS topic. In the later part of this demo we’ll also want to write to the table, so let’s take care of both pieces at once: Hop over to the IAM console and add two managed policies to your role: SNS full access and DynamoDB full access. (Note: This is an quick approach for demo purposes, but if you want to use the techniques described here for a production table, I strongly recommend your create custom “minimal trust” policies that permit only the necessarily operations and resources to be accessed from your Lambda function.)

The code for the Lambda function is straightforward: It receives batches of change notifications from DynamoDB and processes each one in turn by checking its social security number, reporting any malformed ones via the SNS topic we configured earlier. Replace the sample code provided by the blueprint with the following, being sure to replace the SNS ARN with the one from your own topic:
var AWS = require('aws-sdk');
var sns = new AWS.SNS();
exports.handler = function(event, context) {processRecord(context, 0, event.Records);}
// Process each DynamoDB record
function processRecord(context, index, records) {
    if (index == records.length) {
        context.succeed("Processed " + records.length + " records.");
        return;
    }
    record = records[index];
    console.log("ID: " + record.eventID + "; Event: " + record.eventName);
    console.log('DynamoDB Record: %j', record.dynamodb);
    // Assumes SSN# is only set only on row creation
    if ((record.eventName != "INSERT") || valid(record)) processRecord(context, index+1, records);
    else {
        console.log('Invalid SSN # detected');
        var name = record.dynamodb.Keys.Name.S;
        console.log('name: ' + name);
        var ssn  = record.dynamodb.Keys.SocialSecurityNumber.S;
        console.log('ssn: ' + ssn);
        var message = 'Invalid SSN# Detected: Customer ' + name + ' had SSN field of ' + ssn + '.';
        console.log('Message to send: ' + message);
        var params = {
            Message:  message,
            TopicArn: 'YOUR BadSSNNumbers SNS ARN GOES HERE'
        };
        sns.publish(params, function(err, data) {
            if (err) console.log(err, err.stack);
            else console.log('malformed SSN message sent successfully');
            processRecord(context, index+1, records);
        });
    }
}
// Social security numbers must be in one of two forms: nnn-nn-nnnn or nnnnnnnnn.
function valid(record) {
    var SSN = record.dynamodb.Keys.SocialSecurityNumber.S;
    if (SSN.length != 9 && SSN.length != 11) return false;
    if (SSN.length == 9) {
        for (var indx in SSN) if (!isDigit(SSN[indx])) return false;
        return true;
    }
    else {
        return isDigit(SSN[0]) && isDigit(SSN[1]) && isDigit(SSN[2]) &&
               SSN[3] == '-' &&
               isDigit(SSN[4]) && isDigit(SSN[5]) &&
               SSN[6] == '-' &&
               isDigit(SSN[7]) && isDigit(SSN[8]) && isDigit(SSN[9]) && isDigit(SSN[10]);
    }
}
function isDigit(c) {return c >= '0' && c <= '9';}
Testing the Trigger
Ok, now it’s time to see things in action. First, use the “Test” button on the Lambda console to validate your code and make sure the SNS notifications are sending email. Next, if you created your Lambda function event source in a disabled state, enable it now. Then go to the DynamoDB console and enter some sample data. First, let’s try a valid entry:

Since this one was valid, you should get a CloudWatch Log entry but no email. Now for the fun part: Try an invalid entry, such as “Bob Smith” with a social security number of “asdf”. You should receive an email notification something like this for the invalid SSN entry:

You can also check the Amazon CloudWatch Logs to see the analysis and reporting in action and debug any problems:

So in a few lines of Lambda function code we implemented a scalable, serverless NoSQL trigger capable of auditing every change to a DynamoDB table and reporting any errors it detects. You can use similar techniques to validate other data types, aggregate or mark suspected errors instead of reporting them via SNS, and so forth.
Data Transformation
In the previous section we audited the data. Now we’re going to take it a step further and have the trigger also maintain a computed column that describes the format of the social security number. The computed attribute can have one of three values: 9 (meaning, “The social security number in this row is valid and is a 9-digit format”), 11, or “INVALID”.
We don’t need to alter anything about the DynamoDB table or the SNS topic, but in addition to the extra code, the IAM permissions for the Lambda function must now allow us to write to the DynamoDB table in addition to reading from its update stream. If you added the DynamoDBFullAccess managed policy earlier when you did the SNS policy, you’re already good. If not, hop over to the IAM console and add that second managed policy now. (Also see the best practice note above on policy scoping if you’re putting this into production.)
The code changes only slightly to add the new DynamoDB writes:
var AWS = require('aws-sdk');
var sns = new AWS.SNS();
var dynamodb = new AWS.DynamoDB();
exports.handler = function(event, context) {processRecord(context, 0, event.Records);}
function processRecord(context, index, records) {
    if (index == records.length) {
        context.succeed("Processed " + records.length + " records.");
        return;
    }
    record = records[index];
    console.log("ID: " + record.eventID + "; Event: " + record.eventName);
    console.log('DynamoDB Record: %j', record.dynamodb);
    if (record.eventName != "INSERT") processRecord(context, index+1, records);
    else if (valid(record)) {
        var name = record.dynamodb.Keys.Name.S;
        var ssn = record.dynamodb.Keys.SocialSecurityNumber.S;
        dynamodb.putItem({
            "TableName":"TriggerDemo",
            "Item": {
                "Name":                 {"S": name},
                "SocialSecurityNumber": {"S": ssn},
                "SSN Format":           {"S": ssn.length == 9 ? "9" : "11"}
            }
        }, function(err, data){
            if (err) console.log(err, err.stack);
            processRecord(context, index+1, records);
        });
    }
    else {
        console.log('Invalid SSN # detected');
        var name = record.dynamodb.Keys.Name.S;
        console.log('name: ' + name);
        var ssn  = record.dynamodb.Keys.SocialSecurityNumber.S;
        console.log('ssn: ' + ssn);
        var message = 'Invalid SSN# Detected: Customer ' + name + ' had SSN field of ' + ssn + '.';
        console.log('Message to send: ' + message);
        var params = {
            Message:  message,
            TopicArn: 'YOUR BadSSNNumbers SNS ARN GOES HERE'
        };
        sns.publish(params, function(err, data) {
            if (err) console.log(err, err.stack);
            else console.log('malformed SSN message sent successfully');
            dynamodb.putItem({
                "TableName":"TriggerDemo",
                "Item": {
                    "Name":                 {"S": name},
                    "SocialSecurityNumber": {"S": ssn},
                    "SSN Format":           {"S": "INVALID"}
                }
            }, function(err, data){
                if (err) console.log(err, err.stack);
                processRecord(context, index+1, records);
            });
        });
    }
}
// Social security numbers must be in one of two forms: nnn-nn-nnnn or nnnnnnnnn.
function valid(record) {
    var SSN = record.dynamodb.Keys.SocialSecurityNumber.S;
    if (SSN.length != 9 && SSN.length != 11) return false;
    if (SSN.length == 9) {
        for (var indx in SSN) if (!isDigit(SSN[indx])) return false;
        return true;
    }
    else {
        return isDigit(SSN[0]) && isDigit(SSN[1]) && isDigit(SSN[2]) &&
               SSN[3] == '-' &&
               isDigit(SSN[4]) && isDigit(SSN[5]) &&
               SSN[6] == '-' &&
               isDigit(SSN[7]) && isDigit(SSN[8]) && isDigit(SSN[9]) && isDigit(SSN[10]);
    }
}
function isDigit(c) {return c >= '0' && c <= '9';}
Now you can go to the DynamoDB console to add more rows to your table to watch your trigger both check and your entries and maintain a computed format column for them. Don’t forget to refresh the DynamoDB table browse view to see the updates!
(Now that the code updates rows in the original table, testing from the Lambda console will generate double notifications – the first one from the original test, and the second when the item is created for real. You could add an “istest” field to the sample event in the console test experience and a condition in the code to prevent this if you want to keep testing “offline” from the actual table.)
I chose to leave the original data unchanged in this example, but you could also use the trigger to transform the original values instead – for example, choosing the 11-digit format as the canonical one and then converting any 9-digit values into their 11-digit equivalents.
Summary
In this post we explored combining DynamoDB stream notifications with AWS Lambda functions to recreate conventional database triggers in a serverless, NoSQL architecture. We used a simple nodejs function to first audit and later transform rows in the table in order to find invalid social security numbers and to compute the format of the number in each entry. While we worked in JavaScript for this example, you could also use Java, Clojure, Scala, or other jvm-based languages to write your trigger. Our notification method of choice for the demo was an SNS-provided email, but text messages, web hooks, and SQS entries just require a different subscription.
Until next time, happy Lambda (and database trigger) coding!
-Tim