How Moov approaches setup and testing

May 21, 2024

Education 9 minute read

When architecting payment solutions, rigorous and extensive testing is not just beneficial, it’s imperative.

In November 2023, a significant Federal Reserve processing glitch impacted around 900,000 payment transactions across three major US banks. Worse, it occurred on a payday, disrupting the direct deposit of wages and resulting in widespread inconvenience and stress.

Incidents like this highlight the critical importance of reliability in payment processing. In what follows, we’ll discuss Moov’s commitment to testing and how we’ve designed our payments platform with reliability and testability in mind.

Moov’s rigorous requirements

When we first began conceiving and building our payments platform, we had very specific requirements for our setup:

We wanted to run the entire stack as close to staging and production environments as possible—meaning our setup had to mirror the challenges and complexities of the real world.
We needed our setup to be fully automated and running in Continuous Integration to enforce consistency across engineers, features, configuration, testing, etc.
We wanted versioned changes and reproducible builds to allow engineers to easily share a feature or fix on a branch.

Making all of this happen is no small feat, but it was necessary. As to how we’ve made it a reality, here’s a closer look at the key components of our dev setup and how they fulfill our requirements:

Testability from the “get Go”

It all begins with our choice of programming language, Go. Of course, we use Go for a lot of reasons: it’s open source, prioritizes readable code, and has excellent libraries. Go also has a built-in testing framework that’s fantastic at facilitating the writing of efficient, concise test cases. This framework supports unit testing across different platforms and integrates seamlessly with our Continuous Integration (CI) setup, ensuring consistent test results across all engineering workstations. In other words, it helps us avoid the dreaded “works on my machine” syndrome.

This approach has not only helped identify potential issues early, but it has also significantly enhanced the onboarding process for new engineers, providing them with a comprehensive, hands-on experience from day one.

Microservice architecture

Moov’s platform has approximately 90 Go microservices, each responsible for a distinct aspect of our payment processing pipeline. This microservice architecture, managed through Docker Compose (more on this below), facilitates modular development and testing, allowing for more granular control and faster iteration on specific platform features. Because each service is packaged as a Docker image, local runs mirror how the services are deployed with Kubernetes in staging and production.

Docker Compose environment

We leverage Docker Compose to orchestrate our Docker containers in testing, which enables the replication of our application stack on all developer laptops. This approach ensures that all services are built and deployed in isolation, which significantly reduces the chances of environmental inconsistencies causing runtime errors. By specifying service dependencies and health checks within docker-compose, we establish a robust, inter-service communication protocol that closely replicates our production environment.
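
Conceptually, declaring dependencies between services gives the stack a startup order in which each service comes after the services it depends on. The following Go sketch illustrates that idea with a small topological sort. It is not how Docker Compose is implemented, and the startOrder function is a hypothetical helper written for this example; the service names are taken from the compose snippets in this article.

```go
package main

import (
	"fmt"
	"sort"
)

// startOrder returns service names ordered so that every service appears
// after the services it depends on. It returns an error if the dependency
// graph contains a cycle. This is an illustrative helper, not Compose code.
func startOrder(deps map[string][]string) ([]string, error) {
	const (
		unvisited = iota
		visiting
		done
	)
	state := make(map[string]int)
	var order []string

	var visit func(name string) error
	visit = func(name string) error {
		switch state[name] {
		case done:
			return nil
		case visiting:
			return fmt.Errorf("dependency cycle at %q", name)
		}
		state[name] = visiting
		for _, dep := range deps[name] {
			if err := visit(dep); err != nil {
				return err
			}
		}
		state[name] = done
		order = append(order, name)
		return nil
	}

	// Sort the names so the result does not depend on map iteration order.
	names := make([]string, 0, len(deps))
	for name := range deps {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		if err := visit(name); err != nil {
			return nil, err
		}
	}
	return order, nil
}

func main() {
	deps := map[string][]string{
		"kafka1": nil,
		"mysql":  nil,
		"smtp":   nil,
		"topics": {"kafka1"},
		"login":  {"kafka1", "mysql", "smtp"},
	}
	order, err := startOrder(deps)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(order)
}
```

Health checks add a second condition on top of ordering: a dependent service waits not only for its dependency to start, but for it to report healthy.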

Event-driven architecture

Moov captures every action on the system as an event. We use a Kafka-compatible broker called Redpanda as an eventing system to store and distribute events across the platform. Eventing allows Moov to handle traffic spikes with flexibility, ensures at-least-once delivery (so your payments always complete), and gives us the ability to replay events.

docker-compose.yml

  kafka1:
    image: docker.redpanda.com/vectorized/redpanda:v22.3.21
    container_name: kafka1
    healthcheck:
      test: curl -f localhost:9644/v1/status/ready
      interval: 1s
      start_period: 30s
    volumes:
      - redpanda-0:/var/lib/redpanda/data
    networks:
      - intranet
    ports:
      - 18081:18081
      - 18082:18082
      - 19092:19092
      - 19644:9644
    command:
      - redpanda
      - start
      - --kafka-addr internal://0.0.0.0:9092,external://0.0.0.0:19092
      - --advertise-kafka-addr internal://kafka1:9092,external://localhost:19092
      - --pandaproxy-addr internal://0.0.0.0:8082,external://0.0.0.0:18082
      - --advertise-pandaproxy-addr internal://kafka1:8082,external://localhost:18082
      - --schema-registry-addr internal://0.0.0.0:8081,external://0.0.0.0:18081
      - --rpc-addr kafka1:33145
      - --advertise-rpc-addr kafka1:33145
      - --smp 1
      - --memory 1G
      - --mode dev-container
      - --default-log-level=info

  topics:
    image: docker.redpanda.com/vectorized/redpanda:v22.3.21
    depends_on:
      kafka1:
        condition: service_healthy
    networks:
      - intranet
    command:
      - topic
      - --brokers kafka1:9092
      - create
      - cron.cmd.v1
      - transfers.ach.cmd.v1
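
At-least-once delivery and event replay both mean a consumer can see the same event more than once, so handlers generally need to be idempotent. The Go sketch below shows one common way to get there: remember the IDs of events already applied and ignore repeats. It is an illustration under assumptions, not Moov's implementation. The Event and Ledger types are hypothetical, events are assumed to carry a unique ID, and the in-memory map stands in for durable storage.

```go
package main

import "fmt"

// Event is a hypothetical event shape. The ID is assumed to be unique per event.
type Event struct {
	ID          string
	AmountCents int64
}

// Ledger applies events idempotently by remembering which IDs it has seen.
// A real service would keep this record in durable storage, not in memory.
type Ledger struct {
	seen         map[string]bool
	BalanceCents int64
}

// NewLedger returns an empty ledger ready to apply events.
func NewLedger() *Ledger {
	return &Ledger{seen: make(map[string]bool)}
}

// Apply processes an event once and reports whether it changed the balance.
// Redelivered or replayed events with a known ID are ignored.
func (l *Ledger) Apply(e Event) bool {
	if l.seen[e.ID] {
		return false
	}
	l.seen[e.ID] = true
	l.BalanceCents += e.AmountCents
	return true
}

func main() {
	ledger := NewLedger()
	// The second delivery of evt-1 simulates a redelivery from the broker.
	deliveries := []Event{
		{ID: "evt-1", AmountCents: 500},
		{ID: "evt-1", AmountCents: 500},
		{ID: "evt-2", AmountCents: 250},
	}
	for _, e := range deliveries {
		fmt.Printf("%s applied=%t\n", e.ID, ledger.Apply(e))
	}
	fmt.Println("balance:", ledger.BalanceCents)
}
```

With the duplicate ignored, the balance ends at 750 cents rather than 1250.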

Isolated databases

At the heart of Moov’s platform-dev environment lies a MySQL database instance running with separate databases for each service. This allows each service to have autonomy over the tables, indexes, and data it stores without impacting other services. Isolating each service’s database also provides an additional layer of security by ensuring each service’s data is inaccessible to the others.

We define MySQL in platform-dev’s docker-compose.yml:

  mysql:
    image: mysql:8
    restart: on-failure
    ports:
      - "3306:3306"
    environment:
      - MYSQL_ROOT_PASSWORD=local
    volumes:
      - ./conf/mysql/initdb.d:/docker-entrypoint-initdb.d
    command:
      - "--log_bin_trust_function_creators=1"
    networks:
      - intranet
    healthcheck:
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
      timeout: 20s
      retries: 10

platform-dev has a ./conf/ directory which holds configuration for every service. The ./conf/mysql/initdb.d directory is mounted into the MySQL container and holds the migrations that set up each service’s database.

01-login.sql

CREATE DATABASE IF NOT EXISTS `login`;

-- IF NOT EXISTS keeps the script safe to re-run, matching CREATE DATABASE above.
CREATE USER IF NOT EXISTS 'login'@'%' IDENTIFIED BY 'login';
GRANT ALL ON `login`.* TO 'login'@'%';

Then each service has a Database configuration block used to connect.

./conf/login/config.yml

Login:
  Database:
    DatabaseName: "login"
    MySQL:
      Address: "tcp(mysql:3306)"
      User: "login"
      Password: "login"
      Connections:
        MaxOpen: 5
        MaxIdle: 5
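
To make the shape of that config block concrete, here is a minimal Go sketch of how a service might turn it into a MySQL connection string. The struct and method names are assumptions made for this example, not Moov's actual types. The output uses the user:password@protocol(address)/dbname form that the go-sql-driver/mysql driver accepts, which is why the Address value already includes the tcp(...) wrapper.

```go
package main

import "fmt"

// MySQLConfig mirrors the MySQL fields of the config block above.
// The type and field names are assumptions for this sketch.
type MySQLConfig struct {
	Address  string // for example "tcp(mysql:3306)"
	User     string
	Password string
}

// DatabaseConfig mirrors the Database block above (hypothetical type).
type DatabaseConfig struct {
	DatabaseName string
	MySQL        MySQLConfig
}

// DSN builds a data source name in the user:password@address/dbname form.
// It does no escaping, so it is only suitable for simple local credentials.
func (c DatabaseConfig) DSN() string {
	return fmt.Sprintf("%s:%s@%s/%s",
		c.MySQL.User, c.MySQL.Password, c.MySQL.Address, c.DatabaseName)
}

func main() {
	cfg := DatabaseConfig{
		DatabaseName: "login",
		MySQL: MySQLConfig{
			Address:  "tcp(mysql:3306)",
			User:     "login",
			Password: "login",
		},
	}
	fmt.Println(cfg.DSN())
}
```

The Connections values would typically map onto the standard library's pool settings, SetMaxOpenConns and SetMaxIdleConns on *sql.DB.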

Service configuration

To streamline the configuration and setup of these databases and services, platform-dev keeps all configuration in a central directory, ./conf/. Each service has its own subdirectory containing a config.yml file with service-specific configuration, which makes settings easy to find and adjust. The ./conf/mysql/initdb.d directory described above is mounted into the MySQL container and drives the database migrations and setup for each service.

docker-compose.yml

  login:
    image: moovfinancial/login:v0.1.0
    depends_on:
      kafka1:
        condition: service_healthy
      mysql:
        condition: service_healthy
      smtp:
        condition: service_started
    networks:
      - intranet
    volumes:
      - "./conf/login:/configs"
      - "./conf/devices:/configs/devices"
    environment:
      - APP_CONFIG=/configs/config.yml
      - HONEYCOMB_API_KEY=${HONEYCOMB_API_KEY}

This setup process is designed to be automatic and replicable, ensuring that each service’s database is initialized following its specific requirements without manual intervention. The approach not only accelerates the development process but also ensures consistency across different development environments, mirroring the precision and reliability we aim for in our production setups.

Load balancer

Moov runs a load balancer and auth layer that routes every request through the required auth checks and on to the backing service. This setup is largely identical to production as it’s powered by configuration files and internal hostnames. A DNS record for local.moov.io points at 127.0.0.1, so scripts and tools on the local machine can reach the local stack by hostname.

Startup

platform-dev is powered by a Makefile which has an up command that runs docker-compose and waits for all containers to start. The script checks docker ps for the .Status of each container until they’re all started or healthy.
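
The Makefile script itself isn't shown here, but the readiness check it describes can be sketched in Go. The isReady and allReady helpers below are hypothetical, and the status strings are assumed to follow the formats docker ps typically prints, such as "Up 2 minutes (healthy)" or "Up 3 seconds (health: starting)".

```go
package main

import (
	"fmt"
	"strings"
)

// isReady reports whether a docker ps status string describes a container
// that is running and, if it has a health check, has passed it.
func isReady(status string) bool {
	if !strings.HasPrefix(status, "Up") {
		return false // for example "Exited (1) 3 seconds ago" or "Created"
	}
	if strings.Contains(status, "(health: starting)") ||
		strings.Contains(status, "(unhealthy)") {
		return false
	}
	// "Up ..." with no health check, or "Up ... (healthy)".
	return true
}

// allReady reports whether every container status is ready.
func allReady(statuses []string) bool {
	for _, s := range statuses {
		if !isReady(s) {
			return false
		}
	}
	return true
}

func main() {
	statuses := []string{
		"Up 2 minutes (healthy)",
		"Up 2 minutes",
		"Up 3 seconds (health: starting)",
	}
	for _, s := range statuses {
		fmt.Printf("%-35s ready=%t\n", s, isReady(s))
	}
	fmt.Println("all ready:", allReady(statuses))
}
```

A wait loop would fetch fresh statuses (for instance with docker ps and a format template on .Status), call allReady, and sleep briefly between attempts until it returns true or a timeout expires.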

Automated testing

Our comprehensive suite of Go tests, executed against Moov API endpoints, forms the backbone of our quality assurance process. These tests ensure that our platform functions identically to our production environment, enabling us to identify and rectify issues well before they impact our users.

The tests load initial data with a “populate” script that connects to http://local.moov.io and makes the API calls for sign up, account creation, and verification that the tests depend on. This high degree of consistency improves confidence in Moov’s platform, as we’re able to catch issues early, often before they make it to staging or production. All tests also run in Moov’s CI environment on every pull request to platform-dev.

Flaky tests

platform-dev has a lot of tests that run on every pull request, but some of them aren’t as reliable as others. We mark tests as “blocking” or “non-blocking” (default) and run blocking tests first. Blocking tests are considered critical to the platform and highlight the most important code paths. If any blocking test fails, the entire CI build fails and that platform-dev change cannot be merged. Non-blocking tests are allowed a couple of retries before they’re marked as failed.

.github/workflows/test.yml

    - name: 'Blocking Tests'
      id: blocking-tests
      continue-on-error: true
      env:
        PRIORITY_LEVEL: "blocking"
      run: go test ./pkg/... -race -count 1 -p 1 -timeout 20m -failfast

    - name: 'Non-Blocking Tests'
      id: non-blocking-tests
      continue-on-error: true
      env:
        PRIORITY_LEVEL: "non-blocking"

      run: |
        go install gotest.tools/gotestsum@latest

        packages=($(find pkg -name "*_test.go" | xargs -n1 dirname | sort -u))
        for PKG in "${packages[@]}"; do
          gotestsum --rerun-fails --rerun-fails-max-failures=3 --rerun-fails-run-root-test --packages="./$PKG" -- -race -count 1 -p 1 || EXIT_CODE=$?
          if [ -n "$EXIT_CODE" ]; then
            echo "::warning ::$PKG has test failures"
            export PKGS_FAILED="$PKGS_FAILED $PKG"
          fi

          unset EXIT_CODE
        done

        if [ -n "$PKGS_FAILED" ]; then
          echo "Tests failed:$PKGS_FAILED"
          exit 1
        fi
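
The workflow selects which tests run through the PRIORITY_LEVEL environment variable, but how the tests read it isn't shown. One possible approach, sketched below with hypothetical names, is for each test to declare its priority and skip itself unless that priority matches the current run. Treating an unset PRIORITY_LEVEL as "run everything" is an assumption made here for local runs.

```go
package main

import (
	"fmt"
	"os"
)

// Priority values matching the PRIORITY_LEVEL settings in the workflow.
const (
	Blocking    = "blocking"
	NonBlocking = "non-blocking"
)

// shouldRun reports whether a test with the given priority belongs to the
// current run. level is the value of PRIORITY_LEVEL. An empty test priority
// defaults to non-blocking; an empty level runs every test (an assumption).
//
// In a _test.go file it might be used at the top of a test like this:
//
//	if !shouldRun(Blocking, os.Getenv("PRIORITY_LEVEL")) {
//		t.Skip("not part of this run")
//	}
func shouldRun(testPriority, level string) bool {
	if testPriority == "" {
		testPriority = NonBlocking
	}
	if level == "" {
		return true
	}
	return testPriority == level
}

func main() {
	level := os.Getenv("PRIORITY_LEVEL")
	fmt.Println("blocking test runs:", shouldRun(Blocking, level))
	fmt.Println("unmarked test runs:", shouldRun("", level))
}
```

Keeping the decision in a small pure function makes the rule itself easy to unit test, independent of the environment.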

Long test durations

We’ve had to monitor and review the tests over time to keep the suite focused on essential functionality and prevent bloat, which would strain computing resources without adding much value. This effort includes watching for overlapping or duplicate tests across team boundaries, and for tests that are better suited to unit tests within an individual service instead of platform-dev. In addition, we’ve had to monitor Docker image size and resource usage. Go itself is lightweight when it comes to CPU and memory requirements, but with 90+ services, any small increase in CPU or memory per service adds up on a laptop.

High resource usage

We use Docker Compose profiles so engineers can selectively run components of the Moov platform. Each service is tagged with its corresponding profile, such as “cards”, which lets developers launch only the segments of the Moov stack they need. Dependent services are still started in order, so each one comes online after the services it depends on.
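
The selection rule behind profiles is simple: in Docker Compose, a service with no profiles always starts, while a service with profiles starts only when one of them is active. The Go sketch below models that rule. The Service type and enabled function are hypothetical, as are the card-service and ach-service names and the "ach" profile; only "cards", mysql, and login come from this article.

```go
package main

import "fmt"

// Service is a hypothetical, minimal model of a compose service entry.
type Service struct {
	Name     string
	Profiles []string
}

// enabled reports whether a service starts for the given active profiles.
// Services without profiles always start; others need one active profile.
func enabled(svc Service, active map[string]bool) bool {
	if len(svc.Profiles) == 0 {
		return true
	}
	for _, p := range svc.Profiles {
		if active[p] {
			return true
		}
	}
	return false
}

func main() {
	services := []Service{
		{Name: "mysql"},
		{Name: "login"},
		{Name: "card-service", Profiles: []string{"cards"}}, // hypothetical service
		{Name: "ach-service", Profiles: []string{"ach"}},    // hypothetical service
	}
	active := map[string]bool{"cards": true}
	for _, svc := range services {
		fmt.Printf("%-13s starts=%t\n", svc.Name, enabled(svc, active))
	}
}
```

With only the "cards" profile active, the shared services and the cards service start while the ACH service stays down, which is what keeps laptop resource usage in check.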

Lessons we’ve learned

Through our journey with platform-dev, we’ve not only streamlined our development process but also created an environment where engineers can explore, experiment, test, and validate new features with unprecedented ease. This approach has virtually eliminated “works on my machine” issues and solidified our configuration management practices.

Because we designed platform-dev as an easy-to-use environment that gives developers full access to the entire Moov platform, we’ve been able to catch countless bugs—from library upgrades with a hard-to-spot breaking change to large multi-system refactors. This has made platform-dev critical to Moov’s rapid development.

platform-dev also serves as a teaching tool for new engineers. We can very easily walk someone through the systems they’ll be working on and give them hands-on experience running the services themselves. They’re able to make changes and test them without leaving their laptop.

Final thoughts

There’s not much room for error when you’re building out payments systems, but there is a lot of room for ingenuity and innovation. The setup we’ve designed and the team we’ve built are doing amazing things every day. This is by design. Moov began as a set of open source libraries and a community where we’ve always wanted to give developers the tools it takes to succeed. If this sounds like something you’d want to be a part of, check out our Slack community—or our docs, if you want to try some builds for yourself.

If you or your company is building fintech, I’d recommend a setup similar to the one we build Moov on. It works.

Also, if you’re looking to connect with like-minded engineers and fintech builders, consider attending our developer-focused conference fintech_devcon.

Hope this has been helpful and happy building!