Part 2: Enrichment Latency Testing

Spade's engineering team highlights the testing strategy that allows us to deploy rapidly while maintaining industry-leading latencies.

In the previous blog post, we talked about Enrichment Quality Tests, the suite of tests we’ve developed to make sure that our transaction enrichment API achieves the match rate and match quality we expect. This time, we’re going to dive into the testing strategy behind latency, one of the most important metrics at Spade. Spade’s customers often operate within the card authorization flow, which means our API needs to respond fast enough that customers can act on the enriched data while a credit or debit card transaction is being processed (think between when you swipe your card and when it’s approved). Our testing strategy ensures that deployments don’t introduce unexpected latency increases and keeps our API running lightning fast.

There are three key elements of our latency testing suite that we’re going to discuss: end-to-end testing, integration testing, and monitoring.

End-to-end Testing

Similar to the Enrichment Quality Tests we previously discussed, we run an Enrichment Latency Test as a step in our CI/CD pipeline. This test measures latency percentiles against our staging environment prior to deployment. We define acceptable percentile values for each of our endpoints and run load tests against each one. You can see an example of the test’s output below.

Running load test on card enrichment endpoint

Actual/Allowed latencies for POST /transactions/cards/enrich:

      p50       p75       p90       p95       p99
---------|---------|---------|---------|---------|
    20/25     23/30     25/35     28/45     35/50

Running load test on transfer enrichment endpoint

Actual/Allowed latencies for POST /transactions/transfers/enrich:

      p50       p75       p90       p95       p99
---------|---------|---------|---------|---------|
    16/25     19/30     22/35     24/45     30/50

We have two main enrichment engines that power our API: one for cards and one for transfers. We test a set of synthetic transactions against endpoints which use each engine to validate that the observed latencies for the current deployment fall within the expected percentiles. If any of the percentile SLAs are violated, the deployment is stopped before it can reach production.

We also remove network latency from our results in order to isolate the portion of latency specific to our application. This reduces test flakiness and ensures the reported percentiles reflect only our system’s contribution to response time.
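To give a sense of the shape of this check, here is a minimal sketch of a load test that computes percentiles and fails when an SLA is exceeded. The base URL, health-check endpoint, sample count, and the network-baseline subtraction are illustrative assumptions for the sketch, not our actual tooling; the allowed thresholds mirror the example output above.

import statistics
import time

import requests  # assumed HTTP client for this sketch

STAGING_BASE_URL = "https://staging.example.com"  # placeholder host, not a real endpoint
THRESHOLDS_MS = {"p50": 25, "p75": 30, "p90": 35, "p95": 45, "p99": 50}  # allowed values from the output above


def network_baseline_ms(session, samples=20):
    """Estimate round-trip network latency against a cheap health-check endpoint (hypothetical)."""
    rtts = []
    for _ in range(samples):
        start = time.perf_counter()
        session.get(f"{STAGING_BASE_URL}/health")
        rtts.append((time.perf_counter() - start) * 1000)
    return statistics.median(rtts)


def run_load_test(path, synthetic_transactions):
    session = requests.Session()
    baseline = network_baseline_ms(session)
    latencies = []
    for payload in synthetic_transactions:
        start = time.perf_counter()
        session.post(f"{STAGING_BASE_URL}{path}", json=payload)
        # Subtract the network baseline so we measure only the application's contribution.
        latencies.append((time.perf_counter() - start) * 1000 - baseline)

    cuts = statistics.quantiles(latencies, n=100)  # 99 cut points: cuts[49] is p50, cuts[98] is p99
    observed = {"p50": cuts[49], "p75": cuts[74], "p90": cuts[89], "p95": cuts[94], "p99": cuts[98]}

    violations = {p: f"{observed[p]:.1f}/{THRESHOLDS_MS[p]}"
                  for p in THRESHOLDS_MS if observed[p] > THRESHOLDS_MS[p]}
    if violations:
        # A non-zero exit here is what stops the deployment in CI.
        raise SystemExit(f"Latency SLA violated for {path}: {violations}")
    return observed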

Our staging environment doesn’t have full parity with our production infrastructure. While that means the latency percentiles calculated here aren’t a perfect measure of what we’ll see in production, the test still prevents us from deploying changes that would drastically increase API response times. Because we invest actively in developer experience, engineers can also run these tests locally to quickly validate any change that could affect our API’s response times before going through the deployment process.

Integration Testing

Early on in Spade’s development it was clear that one of the biggest impacts on latency was database queries. Each query against our database requires a trip across the network, significantly increasing the response time for a request. Thus, we pay very close attention to the number of database queries we make in addition to optimizing query speed. 

Managing latency is a standard problem, and we use a very standard solution: writing integration tests that assert on query counts. There’s nothing fancy about this approach, but using a classic tool to help solve a classic problem has proved very effective. This is part of our overall “shift-left” testing philosophy where we try to catch latency issues as far “left” as possible in the development process. You can see a pseudocode example of such a test below.

def test_enrich_transactions_uses_X_db_queries(self):
    """
    Queries should include:
    1. ...
    2. ...
    ...
    """
    with self.assertNumQueries(X):
        enrich_transactions([transactions])

The above test directly validates that only X queries are executed during a particular code path. We write similar tests to cover all the major code paths of our API. If we introduce a new query somewhere deeper in the call stack, this test will fail, and the author of the commit will need to update the test’s assertion and its docstring to explain what the new query is doing. During code review, it becomes extremely clear when new queries are added, allowing engineers to flag changes that might have a concerning impact on latency.

This is a particularly useful test given that Spade’s API is built on the Django ORM. Django can subtly introduce extra database queries when accessing objects related through foreign keys. Without tests that precisely assert the number of expected queries, it would be much easier to accidentally add new DB hits into our application, and each hit matters when your p50 is around 20ms.
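As a purely illustrative example, consider a hypothetical Transaction model with a merchant foreign key (not Spade’s actual schema). The first helper below silently issues one query per row, while the second stays at a single query and is easy to lock in with assertNumQueries:

from django.db import models
from django.test import TestCase


# Hypothetical models for illustration only; assumes they live in an installed Django app.
class Merchant(models.Model):
    name = models.CharField(max_length=255)


class Transaction(models.Model):
    merchant = models.ForeignKey(Merchant, on_delete=models.CASCADE)


def merchant_names(transaction_ids):
    # Looks harmless, but every t.merchant access triggers its own SELECT:
    # 1 query for the transactions plus N queries for the merchants.
    txns = Transaction.objects.filter(id__in=transaction_ids)
    return [t.merchant.name for t in txns]


def merchant_names_one_query(transaction_ids):
    # select_related joins the merchant table up front, keeping this at 1 query.
    txns = Transaction.objects.filter(id__in=transaction_ids).select_related("merchant")
    return [t.merchant.name for t in txns]


class MerchantQueryCountTest(TestCase):
    def test_merchant_names_uses_one_db_query(self):
        with self.assertNumQueries(1):
            merchant_names_one_query([1, 2, 3])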

Monitoring

Finally, we use Datadog to maintain dashboards of our API latency and to alert us if production latency exceeds our SLAs. We learned the hard way that we specifically need to use AWS Application Load Balancer (ALB) latency rather than application latency when calculating percentiles. The issue is obvious in hindsight, but we initially alerted on application latency, which doesn’t account for additional time spent at the reverse proxy (the ALB) or at our WSGI server. Requests queuing up in Gunicorn during high-traffic periods could be causing high latency for customers while remaining invisible to us at the application level. The table below shows the latency percentiles at both levels over the past week.

Percentile    Average Latency (Application)    Average Latency (ALB)
p99           29.2ms                           35.5ms
p95           24.4ms                           28.9ms
p90           22.4ms                           26.2ms
p75           19.8ms                           23.3ms
p50           16.9ms                           19.8ms

Since switching our dashboards and monitoring to ALB latency, this blind spot has been eliminated. While we can’t account for the network latency to reach our endpoints, measuring latency at the ALB level gives us the most holistic metric for the responsiveness of our API.
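For illustration, creating such an alert programmatically might look roughly like the sketch below. It uses the legacy datadog Python client, and the ALB metric name, tags, and 50ms threshold are assumptions about a typical Datadog + ALB setup rather than our exact monitor definition.

from datadog import initialize, api

initialize(api_key="<DD_API_KEY>", app_key="<DD_APP_KEY>")

# Alert when the ALB-level p99 exceeds 50ms; ALB response times are reported in seconds.
# The metric name below is assumed from Datadog's AWS ALB integration; verify it against
# your own account before relying on it.
api.Monitor.create(
    type="metric alert",
    query="avg(last_5m):avg:aws.applicationelb.target_response_time.p99{env:production} > 0.05",
    name="Enrichment API p99 latency (ALB) above SLA",
    message="p99 latency at the load balancer exceeded 50ms over the last 5 minutes. @slack-oncall",
    tags=["service:enrichment-api"],
    options={"thresholds": {"critical": 0.05}},
)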

Maintaining a low-latency service is a continuous challenge. As we work to constantly improve the quality of our transaction enrichment API, the latency tests and monitors we’ve implemented let us confidently push code, experiment with new methods, and enrich more transactions than ever before, all while keeping our API running at light speed.

If you want to explore exciting new technologies while maintaining industry-best transaction enrichment latency, come join the Spade team! And as always, if you need detailed data about card transactions to power your services, don’t hesitate to reach out to our sales team.