Here at Reflexion Health, we’re proud to be on the cutting edge of telehealth, a place we wouldn’t be without the hard work of our team of world-class engineers and developers.
Innovative and forward-thinking, this team has worked to create the highly engaging, intuitive interface that’s generated widespread acclaim (most recently from a peer-reviewed journal study led by a team of researchers from a major academic medical center, according to whom our Virtual Exercise Rehabilitation Assistant (VERA) platform “offers the advantage of cost savings, convenience, at-home monitoring, and coordination of care, all of which are geared to improve adherence and overall patient satisfaction”).
In a new series of blog posts, we’re lifting the veil on some of the ways in which our team is innovating development solutions to further improve our services.
Recently, Front-End Software Engineer Katie Huang discussed "How the Reflexion Health Tech Team Uses TypeScript Generics to Define Typographic Hierarchies." Today, one of our Lead Software Engineers, David Torres, discusses how we’ve embraced Docker, a popular application container technology, to optimize the way we approach development workflows.
How to containerize development workflows: 4 initial criteria
Within the past year, the Reflexion Health API team has worked to migrate our developer workflow into a containerized system using Docker. Docker and container technology in general have a host of well-known benefits centered around advancements in DevOps and IT, such as environment consistency and ease of deployment and scaling. However, we found that we had to piece together for ourselves the good practices for using Docker in a software development setting. For the API team at Reflexion Health, we wanted to use Docker to solve one additional use case: How can we quickly and easily develop against a multitude of applications that are costly to set up?
Our backend applications are numerous, consisting of five separate services written in Ruby and Go, with plans to expand to services written in .NET Core. Getting all of these applications installed and running on a developer's computer used to be a lengthy, manual process; we were able to reduce it to just a few commands on the terminal. It’s so inexpensive to spin up services that we now often have multiple copies of our entire service collection running on our laptops.
Our primary concerns were to answer the following questions:
How can we improve orchestration — i.e., easily run all our services locally, with a minimal learning curve for team members?
How do we structure our projects?
How can we minimize the development-build-debug cycle?
How should our applications interface with our database in development?
This blog post discusses how we’ve answered these questions, and how we’re implementing those findings to produce the best possible results.
What do we use for orchestration?
We use Docker Compose to launch our services locally. We chose this option out of the many available, which vary depending on the technology being used to deploy containers into production. (For example, there’s also Docker Swarm or Kubernetes.)
Admittedly, we're aiming for simplicity here. We want to reduce the learning curve associated with the orchestration tool; we don't need to replicate infrastructure components like load balancing, auto-scaling, and self-healing; and we generally want to minimize the time it takes to clone our repos and get them running. Docker Compose fits these criteria well. It comes essentially out of the box with Docker, and it has a small command-line surface that’s easy to fit in your head.
While this may change down the line, we'll probably continue to use Docker Compose for the "average-case" development, and then switch to more advanced local orchestration tools when debugging or testing issues that require load balancing, auto-scaling, or self-healing.
What's our project structure?
We have several independent projects, each in its own repository. We structure them as follows:
At the top of the hierarchy, we have a simple project called the Services Repo. It is a high-level repository that sits one directory above our other services, and its only purpose is to contain the docker-compose.yml file, which defines how our services run together. It doesn't generate any build assets; it exists purely for development purposes.
The services repo also has several helper scripts. For example, we have a “checkout-everything” script that clones all the child repositories in a single go. This is particularly effective because it helps ensure that cloning, building and running our entire collection of API services can be done in a one-liner:
./scripts/checkout-everything.sh && docker-compose build && docker-compose up
It's hard to overstate how cool this is! Just a few months ago, getting all our services running locally could take an hour or more (even for those with the highest level of experience and expertise with our system).
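The checkout-everything script itself isn't reproduced in this post, so here is only a minimal sketch of what such a helper might look like. The Git base URL and the repository names are placeholders (the names are borrowed from the sample docker-compose.yml further down), and the dry-run default is our own addition for safety, not something the real script necessarily has.

```shell
#!/bin/sh
# Hypothetical sketch of scripts/checkout-everything.sh, not the team's script.
# GIT_BASE_URL and REPOS are placeholders; substitute your own values.
GIT_BASE_URL="${GIT_BASE_URL:-git@example.com:your-org}"
REPOS="${REPOS:-auth-server server}"

# Clone one child repo into a directory unless a checkout already exists there.
# With DRY_RUN=1 (the default in this sketch) the command is printed, not run.
checkout_repo() {
  dest_dir="$1"
  repo="$2"
  if [ -d "$dest_dir/$repo/.git" ]; then
    echo "skip $repo (already cloned)"
  elif [ "${DRY_RUN:-1}" = "1" ]; then
    echo "git clone $GIT_BASE_URL/$repo.git $dest_dir/$repo"
  else
    git clone "$GIT_BASE_URL/$repo.git" "$dest_dir/$repo"
  fi
}

# Check out every child repo beside docker-compose.yml (default: current dir).
checkout_everything() {
  for repo_name in $REPOS; do
    checkout_repo "${1:-.}" "$repo_name"
  done
}

checkout_everything "${1:-.}"
```

Run from the root of the services repo, this prints the clone commands it would issue; setting DRY_RUN=0 performs the clones. Skipping repos that already exist keeps the script safe to re-run when a new service is added to the list.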
Each child repo houses two Dockerfiles: Dockerfile and Dockerfile.dev. Dockerfile is used to generate production containers, while Dockerfile.dev is used for local development. An alternative to having two Dockerfiles is to keep both development and production flavors in a single multi-stage Dockerfile; however, we’ve found this harder to manage. The multi-stage build feature was meant to simplify the Dockerfile when optimizing for container size. Using the feature to support multiple builds in one file seems like a misuse of the feature and leads to convoluted Dockerfiles.
Here’s a snippet of a typical docker-compose.yml file that shows how everything is wired up:
    version: '3.5'
    services:
      # AUTH SERVER
      auth:
        build:
          context: ./auth-server
          # Specify the context and Dockerfile.dev here
          dockerfile: Dockerfile.dev
        volumes:
          # Use a bind volume to mount, not copy, your source code; more on this below
          - type: bind
            source: ./auth-server
            target: /app
      # GOAPI
      goapi:
        build:
          context: ./veraserver
          dockerfile: Dockerfile.dev
        volumes:
          - type: bind
            source: ./server
            target: /go
Note that we specify the Dockerfile.dev flavor of the Dockerfile. We also need to set the build context explicitly, since the docker-compose.yml file lives outside of the project repositories.
Speeding up the develop-build-debug cycle
To keep the develop-build-debug cycle short, we use a few tricks to optimize the build time for our containers:
Use Docker bind volumes to mount the source code into the container, instead of copying it.
When you do need to build a container, optimize your Dockerfile.
Using Bind Volumes
Employing this technique means that we don’t have to issue a docker-compose build command every time we change code. As mentioned above, this makes use of Docker bind volumes, a type of volume where a directory on the local filesystem is mapped directly into the container at run time.
Using this method, any code changes made from either the host or the container are immediately available to the other party. (See the volumes section in the sample docker-compose.yml file above.) It’s as easy as specifying the source directory on your local file system, and the target directory on the container.
Optimizing the Dockerfile
We like to optimize our development Dockerfiles by leveraging the layered-cache behavior of Docker. Think of each line in a Dockerfile as generating a new layer in the image. These layers get cached and can be reused across builds, and this is how Docker keeps its build times low.
If Docker finds that a line in the Dockerfile needs to be rebuilt (i.e., a cache miss) then every subsequent line in the Dockerfile, by necessity, needs to be rebuilt, too. So basically, it’s best to strategize how to build Dockerfiles so that the expensive and infrequent steps happen first, and the cheaper, more frequent steps happen later.
Consider this typical “naïve” Dockerfile for a .NET Core service:
    # Layer 1
    FROM microsoft/aspnetcore-build:2.0
    WORKDIR /app
    # First copy your source code into /app (Layer 2)
    COPY . .
    # Next, restore dependencies (lengthy) (Layer 3)
    RUN dotnet restore
    # Run command when starting container (Layer 4)
    CMD ["dotnet", "run"]
This pattern is typical across many programming languages. First, the source code is copied into the container. Then, a command is run to restore dependencies. Finally, the application is run.
There's a problem with the COPY . . line, though! If you change the source code at all, it's going to invalidate the cache at Layer 2, and this will force you to run the lengthy restore command every single time.
Fortunately, this can be optimized, since dependencies aren’t added to projects very often. What you want to do is perform the restore first and then copy the source code:
    # Layer 1
    FROM microsoft/aspnetcore-build:2.0
    WORKDIR /app
    # Copy your dependency manifest first (Layer 2)
    COPY *.csproj .
    # Next, restore dependencies (lengthy) (Layer 3)
    RUN dotnet restore
    # Next, copy source code (Layer 4)
    COPY . .
    # Run command when starting container (Layer 5)
    CMD ["dotnet", "run"]
In the file above, we first copy over the .csproj file, which is our dependency manifest (this is like a Gemfile or a package.json when programming in Ruby or Node). Then, we perform the restore command at Layer 3. Now, when we make a change to our source code, we invalidate Layer 4 — but we don’t invalidate the lengthy restore command at Layer 3. A side note: This method can be further improved by caching the dependencies in a named Docker volume, as outlined here.
With this trick, the same library volume is shared across multiple projects or clones, thus speeding up the rebuild time even further. We didn’t find this all that useful, though. Our build times are not long enough to make speeding up the build of a fresh clone necessary. And even more critically, one can get into trouble if two project branches require different dependency versions, forcing a rebuild from scratch each time. The real use case for this trick is if you have a large collection of services that you can basically guarantee will always have the same set of dependencies.
Developing with databases
When it comes to databases, there are two options: Run the database server in a container, or run it on the host computer; both methods have their pros and cons.
Let’s consider the first option. Running the database in a container makes sense because, logically, a database is just another service. The setup is straightforward; you simply define your database as an additional service in docker-compose.yml, and all the other services then have DNS access to it.
When working with database containers for the first time, an obvious question is: How do you persist your data when the database container goes down? Containers are inherently “memoryless” in this way: bringing them down wipes away the underlying file system by design. This can be dealt with by making use of Docker volumes, i.e., by mounting a directory on the host computer’s file system that will persist your database’s data directory.
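As a rough illustration of the persistence idea, here is the docker CLI equivalent of such a mount. This post doesn't name a database engine, so everything specific below is an assumption: the official postgres image and its /var/lib/postgresql/data directory, the pinned tag, the container name, the throwaway password, and the host directory.

```shell
#!/bin/sh
# Sketch only: the engine, image tag, container name, password, and host
# directory are all assumptions for illustration, not details from this post.

# Start a database container whose data directory lives on the host, so the
# data survives when the container is removed.
# $1: absolute path of the host directory to use (default: $PWD/pgdata).
start_dev_db() {
  data_dir="${1:-$PWD/pgdata}"
  mkdir -p "$data_dir"   # a bind mount needs an existing source directory
  docker run --detach --rm \
    --name dev-db \
    --env POSTGRES_PASSWORD=dev-only-password \
    --publish 5432:5432 \
    --mount "type=bind,source=$data_dir,target=/var/lib/postgresql/data" \
    postgres:16
}
```

Calling start_dev_db, stopping the container, and calling start_dev_db again should bring the database back with its data intact, because the files live in the host directory rather than in the container's own file system. The same mount can be expressed as a volumes entry on a database service in docker-compose.yml.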
This method is the simplest to set up and works well for many use cases; however, if your work involves a lot of database development or ad-hoc querying, it can make accessing your database cumbersome.
First, if you want to log into your database, you must keep your database container running at all times. This will be a problem, because you’ll often be bringing down your entire set of applications as part of your work. If your database container happens to be stopped, you can run your local database engine and point it to the shared data directory. This works but it’s awkward: Since the data directory gets locked by whatever engine is using it, you need to make sure only one database engine is running at a time.
The biggest problem we found, however, was that running ad-hoc SQL queries through your container is going to be noticeably slower than what you’re likely used to — it just doesn’t compare to running a database engine directly on your machine.
The second option, running the database server directly on the host computer, avoids these problems. It requires a bit more manual setup, but only when you’re setting up a computer for the first time. The trick to getting this working is to expose your host computer’s IP address to your services; they will then use this IP address in their database connection string. (Just using localhost isn’t going to work because the host PC really is a separate machine with respect to the container services.) You can identify the IP address by running ipconfig or ifconfig; the correct address will typically be the IPv4 address of your computer’s primary network adapter (the one that is connected to the Internet). To eliminate the guesswork, you can run a dummy netcat server on the host PC:
> nc -l 6667
… and then reach out to it from a container by running the following command:
> docker run -it solidnerd/curl http://<YOUR_IP_ADDRESS>:6667
You have identified the right IP address when the HTTP request from the container shows up in your netcat server’s output. Put this IP address into an OS environment variable like $DOCKER_HOST_IP.
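If you'd rather not copy the address by hand, something like the following sketch may help on Linux hosts. It assumes the iproute2 ip command is available, and it uses 1.1.1.1 purely as an arbitrary outside address for a routing lookup; the function names are ours.

```shell
#!/bin/sh
# Pull the "src" address out of `ip route get` output read from stdin.
parse_src_ip() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}

# Ask the kernel which local address it would use to reach an outside host.
# This is only a routing-table lookup; no packets are sent to 1.1.1.1.
detect_host_ip() {
  ip route get 1.1.1.1 2>/dev/null | parse_src_ip
}

# Only attempt detection where the ip command exists (typically Linux).
if command -v ip >/dev/null 2>&1; then
  DOCKER_HOST_IP="$(detect_host_ip)"
  export DOCKER_HOST_IP
  echo "DOCKER_HOST_IP=${DOCKER_HOST_IP:-<not detected>}"
fi
```

Source this from your shell profile (rather than executing it) so the exported variable is visible to docker-compose. On macOS, where ip is usually absent, ipconfig getifaddr en0 prints the address of the en0 interface, assuming that is your primary adapter. Either way, it's worth confirming the result with the netcat check above.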
Finally, make sure that your services pick up their database connection strings from an environment variable. Then you need only map $DOCKER_HOST_IP to this container variable, as shown here:
    version: '3.5'
    services:
      # GOAPI
      goapi:
        build:
          context: ./server/src/reflexionhealth.com/veraserver
          dockerfile: Dockerfile.dev
        volumes:
          - type: bind
            source: ./server
            target: /go
        ports:
          - "3100:3100"
        # Here’s where $DOCKER_HOST_IP gets injected into a connection string variable.
        # We assume the software will look for the database IP address through its
        # own $DATABASE_HOST environment variable
        environment:
          - DATABASE_HOST=$DOCKER_HOST_IP
This approach to simplifying and combining workflows has reshaped the way we approach and tackle the challenges of development. Because it is trivial to spin up copies of our API services, we can solve problems at a faster pace with far more flexibility than we had before. Perhaps needless to say, this reflects positively not just on our own efficiency and capacity for operational success, but that of our clients, as well.
David Torres is the Lead Software Engineer of the API Team at Reflexion Health.