Migrating our homegrown billing system to Stripe
Over the years, Stripe has become an established name in online payments. That’s why they were one of the main contenders we had in mind, here at WeTransfer, when a major overhaul of our billing system was long overdue. This is a story about how to wear stripes or, more exactly, about how we migrated our entire billing system to Stripe.
The Background
Let’s go back to mid-2018. Our billing system at the time was pretty basic: it included two plans (monthly and yearly) and supported two currencies (€ for our European users and $ for everyone else). It was built in-house, integrating an external payment processor with some assorted billing logic, and had supported our business for a good number of years.
However, as WeTransfer became more popular, it slowly outgrew this solution, to the point where it became obvious that we needed more. So the time came when we gathered everybody who had a say about money around a table and discussed visions, plans and ambitions. It turned out that we had to make a choice: radically refactor our current billing system, or build a new one from scratch.
Weighing the Options
Tempting as it was to pick option number two, we stopped, took a deep breath, and started weighing the options. After all, no strategic, long-lasting decision should ever be made on gut feeling alone. We asked ourselves what would make the dream billing system, and assessed the two options in that context.
First and foremost, we wanted our (well, the company’s) revenue to be in safe hands. Handling money is neither a trivial task, nor one that we had expertise in. So, we decided to continue delegating this to a third party that can provide greater robustness and reliability. Both our old payment processor and Stripe shone at this. 0 - 0, so far.
Then came features and other goodies. At WeTransfer, we needed more flexible plans, support for different currencies and more payment methods. Our home-baked system could have ticked all of these boxes, but we estimated the cost of the necessary refactor to be quite high. On the other hand, Stripe offered these out of the box, plus automated billing, integrated fraud detection and other niceties. 1 - 0 for option two.
In the end, we’re all human and want to spend happy times in the office. That’s why we wanted a system that we could reason about, that integrated with well-designed and well-documented third parties, and that would keep the number of WTFs to a minimum. That was a clear win for option number two. Trust me 😉. What’s more, while the old billing system had certainly seen better days, it was still doing its job, and doing it fine. So, as a wise man once said, why change it?
After carefully considering the above, we came up with the verdict: we would design and implement a brand new billing system, on top of Stripe.
The Fine Print
Building a billing system from scratch is not rocket science. It’s kind of a luxury, actually. In real life, you are almost never given a blank canvas to freely express your engineering talent. In our case, WeTransfer had been live for almost nine years and was key to the everyday lives of many individuals. Money was constantly coming in, and we had to keep it that way.
We couldn’t just build a new system and switch traffic to it. We’re talking about new technology, new user flows, payment information that needed to be migrated; all these would’ve posed a high financial risk. We needed to find a way to gradually roll out the new system. And that meant running two billing systems in parallel for a period of time.
Stripe boasted lots of features, but it was nowhere near absolute parity with our old billing system. Among the biggest gaps were the lack of tax-inclusive plans (which has recently been addressed) and, of course, the lack of PayPal support.
We also had to account for highly asynchronous events: chargebacks. Say a user pays for their subscription today, we migrate them tomorrow, and thirteen months later they decide their payment was collected in error and request a chargeback. Indeed, 13 months: that’s the limit for disputing SEPA Direct Debit payments. So, even long after we had successfully migrated billing systems, we still had to keep the old way of handling chargebacks running, in case our old payment provider received such requests.
Ultimately, we had to admit that things would go wrong sooner or later; that’s how engineering goes. So, we dedicated a good chunk of time to building safety nets, as many as a good night’s sleep would need.
The Fundamental Bits
Every system is built around some core concepts. Given the nature and the complexity of a billing system, we gave these some well-deserved thought when designing ours. The ones that made it into our final design were: having one source of truth for data, building an isolated and extensible system, and ensuring idempotency in key areas. Let’s go through each one of these in more depth.
Single Source of Truth
Delegating a core aspect of your business to a third party sparks off an interesting discussion. Which data will you trust? Who will act as the master and who will be the replica? In our case, it boiled down to who carried more responsibility: us, or Stripe. Because this is what they do for a living, and they’re doing it pretty well, it made sense to trust them.
Therefore, the majority of our billing-related logic is performed directly on Stripe, while our database is simply a reflection of Stripe’s data. In other words, our database acts as a cache for application state (think of subscription statuses, which determine what our frontend will present to the user). For this, we’re making heavy use of webhook events. We silently process them in the background, which works particularly well for our use-case, where eventual consistency is considered good enough.
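To illustrate the idea, here is a minimal sketch of a local cache that is updated from webhook events. It is not our actual implementation: SubscriptionCache and its in-memory storage are hypothetical stand-ins for a database table, and the events are plain hashes shaped like Stripe’s JSON payloads (type, created, data.object). Since events can arrive out of order, the sketch assumes the event’s created timestamp is a good enough way to discard stale ones.

```ruby
# Hypothetical in-memory stand-in for a local subscriptions table.
class SubscriptionCache
  Row = Struct.new(:status, :updated_at)

  def initialize
    @rows = {}
  end

  # Applies one webhook event (already parsed from JSON into a Hash).
  # Returns true if the event was applied, false if it was ignored.
  def apply(event)
    return false unless event["type"].to_s.start_with?("customer.subscription.")

    subscription = event["data"]["object"]
    row = @rows[subscription["id"]]

    # Events may arrive out of order; an older one must not overwrite newer state.
    return false if row && row.updated_at > event["created"]

    @rows[subscription["id"]] = Row.new(subscription["status"], event["created"])
    true
  end

  # What the frontend would read to decide what to show the user.
  def status_of(subscription_id)
    @rows[subscription_id]&.status
  end
end

cache = SubscriptionCache.new
cache.apply("type" => "customer.subscription.updated", "created" => 200,
            "data" => { "object" => { "id" => "sub_foo", "status" => "active" } })
# A delayed, older event is ignored.
cache.apply("type" => "customer.subscription.created", "created" => 100,
            "data" => { "object" => { "id" => "sub_foo", "status" => "incomplete" } })
cache.status_of("sub_foo") # => "active"
```

The cache never decides anything on its own; it only records what Stripe last reported, which is what makes eventual consistency acceptable here.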
Isolation and Extensibility
One of the pain points of our old billing system was its lack of flexibility: it had a fixed number of plans, supported a fixed number of currencies and was tightly coupled to our old payment provider. This worked perfectly fine in the past; however, our company’s ambitions made it fairly obsolete. We learned our lesson and designed the new system with a few things in mind.
First, we changed our hardcoded business rules to be more extensible. That means plans were moved from this:
PLANS = {
  monthly: {
    # ...
  },
  yearly: {
    # ...
  }
}
to a proper ActiveRecord model, backed by its own database table. Magic numbers, in turn, became more versatile configuration objects.
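As a rough sketch of what "plans as data" buys you, the example below uses a plain Struct as a stand-in for the ActiveRecord model, so it runs without Rails. The field names, plan codes and amounts are made-up placeholders, not our real schema or prices.

```ruby
# Plain-Ruby stand-in for an ActiveRecord-backed Plan model.
# Field names are illustrative assumptions.
Plan = Struct.new(:code, :interval, :currency, :amount_cents, keyword_init: true)

class PlanCatalog
  def initialize(plans)
    @plans = plans
  end

  # Similar in spirit to Plan.find_by(interval: ..., currency: ...).
  def find(interval:, currency:)
    @plans.find { |plan| plan.interval == interval && plan.currency == currency }
  end
end

catalog = PlanCatalog.new([
  Plan.new(code: "monthly_eur", interval: "month", currency: "eur", amount_cents: 1000),
  Plan.new(code: "yearly_usd", interval: "year", currency: "usd", amount_cents: 10_000),
  # Supporting a new currency is one more row, not a code change.
  Plan.new(code: "monthly_gbp", interval: "month", currency: "gbp", amount_cents: 1000)
])

catalog.find(interval: "month", currency: "gbp").code # => "monthly_gbp"
```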
Then, we wrote the new system totally isolated from the old one. Put another way, we created new models, services, policies and whatnot, and we also namespaced them differently (for instance, New::Subscription or New::Services::SubscribeUser). This way, we established a very clear boundary between the two systems, so the chances of them clashing with each other were slimmer. As a bonus, this decision turned out to be incredibly useful when the time came to wipe the old system out of our codebase.
Finally, we always kept thinking about tomorrow. While we will only support Stripe in the foreseeable future, we didn’t want to lock ourselves into our payment provider. After all, that was one of the reasons why it was so hard to touch the old system. Thus, all our billing-related objects look like:
#<Payment external_provider: "stripe", external_id: "ch_foo" ...>
This way, we can easily swap payment providers in the future, or externalise some chunks to other third parties (like handling payments through an additional provider – hello PayPal!).
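One way to take advantage of such records is to route each operation through a small registry keyed by external_provider. The sketch below is an assumption about how this could look, not our production code: StripeGateway, PaypalGateway and Gateways are hypothetical names, and the refund methods return strings where a real gateway would call the provider’s SDK.

```ruby
Payment = Struct.new(:external_provider, :external_id, keyword_init: true)

# Hypothetical gateways; real ones would wrap the provider's SDK.
class StripeGateway
  def refund(payment)
    "stripe refund for #{payment.external_id}"
  end
end

class PaypalGateway
  def refund(payment)
    "paypal refund for #{payment.external_id}"
  end
end

module Gateways
  REGISTRY = {
    "stripe" => StripeGateway.new,
    "paypal" => PaypalGateway.new
  }.freeze

  # Picks the gateway from the record itself, so callers never
  # hardcode a provider.
  def self.lookup(payment)
    REGISTRY.fetch(payment.external_provider) do |name|
      raise ArgumentError, "unsupported payment provider: #{name}"
    end
  end
end

payment = Payment.new(external_provider: "stripe", external_id: "ch_foo")
Gateways.lookup(payment).refund(payment) # => "stripe refund for ch_foo"
```

Adding a provider then means adding one gateway class and one registry entry, while existing records keep working untouched.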
Idempotency
Any application dealing with payments is already fairly complex. Adding a third party to the mix only increases this complexity. Many things can go wrong: from random network failures to syntax errors in a worker class, everything is possible. That is why any key action should be idempotent: it can be retried any number of times and still produce the same result. You don’t want to be that developer who makes a silly syntax error, only to find out that all signup-related background jobs have been permanently lost.
We made sure all our critical paths are idempotent:
- requests to Stripe;
- background jobs that process Stripe webhook events;
- migration scripts.
Fortunately, ensuring idempotency is quite a trivial task, in most cases. Stripe enables it very easily through their SDK:
Stripe::Subscription.create(attrs, { idempotency_key: 'foo' })
while in worker classes you can usually get away with something like:
class ProcessEvent < Worker
  def perform
    # Skip events that were already handled, so retries are harmless.
    return if @event.processed?
    # ...
  end
end
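To make the retry behaviour concrete, here is a self-contained sketch of the same guard. ProcessedEvents is a hypothetical in-memory store (in production this would more likely be a database column or a table with a unique constraint), and the work itself is passed in as a block. The event is marked as processed only after the work succeeds, so a failed attempt stays eligible for a retry.

```ruby
require "set"

# Hypothetical store of processed event ids.
class ProcessedEvents
  def initialize
    @ids = Set.new
  end

  def include?(id)
    @ids.include?(id)
  end

  def mark(id)
    @ids.add(id)
  end
end

class ProcessEvent
  def initialize(store)
    @store = store
  end

  # Runs the block at most once per event id.
  # Returns :processed or :skipped.
  def perform(event_id)
    return :skipped if @store.include?(event_id)

    yield                  # the real work; if it raises, the event stays unprocessed
    @store.mark(event_id)  # only reached after the work succeeded
    :processed
  end
end

worker = ProcessEvent.new(ProcessedEvents.new)
calls = 0
worker.perform("evt_foo") { calls += 1 } # => :processed
worker.perform("evt_foo") { calls += 1 } # => :skipped, calls is still 1
```

Note that this check-then-mark sequence is not atomic. With several workers running concurrently, you would also want a lock or a unique constraint to keep two of them from processing the same event at once.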
The Rollout
We were happy with how our brand new billing system took shape and quite eager to see it out in the wild. It was time to welcome all our users into the new system. Because the two billing systems were isolated from each other, we had a fairly simple job rolling out the new one. There were basically two ways a user could land on the new system: either through signup, or as a result of them being migrated. Let’s tackle the former first.
We created a feature flag with a few targeting rules, and evaluated it each time a user signed up. You can think of it as a high-level if statement:
post '/signup' do
  if use_new_system?
    New::Services::SignupUser.new(user, params).perform
  else
    Services::SignupUser.new(user, params).perform
  end
end
We persisted the evaluation on the user record, and used it in a few places to decide which implementation to use:
class User < ActiveRecord::Base
  has_one :old_subscription
  has_one :new_subscription

  def subscription
    uses_new_system? ? new_subscription : old_subscription
  end
end
In order to minimise any potential issue that could arise, we performed a gradual rollout. We enabled the feature flag for a small percentage of users in certain countries, and then went on to increase percentages and select more countries.
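A percentage-and-country rule like this can be sketched in a few lines. The example below is an illustration under assumptions rather than a description of our feature flag tooling: RolloutFlag is a hypothetical class, the country codes are examples, and users are bucketed with a CRC32 hash of their id so that the same user always gets the same answer.

```ruby
require "zlib"

class RolloutFlag
  # rules: country code => percentage (0..100) of users on the new system.
  def initialize(rules)
    @rules = rules
  end

  def enabled_for?(user_id:, country:)
    percentage = @rules.fetch(country, 0) # unlisted countries stay on the old system
    bucket(user_id) < percentage
  end

  private

  # Stable bucket in 0..99: the same user always lands in the same bucket,
  # so raising a percentage only adds users and never flips anyone back.
  def bucket(user_id)
    Zlib.crc32(user_id.to_s) % 100
  end
end

flag = RolloutFlag.new("NL" => 10, "DE" => 50)
flag.enabled_for?(user_id: 42, country: "US") # => false, country not targeted
```

Setting every percentage to 0 acts as the off switch for new signups, while users whose evaluation was already persisted keep using the system they signed up with.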
We preferred to roll out small increments rather than the whole new system at once. Our MVP focused on signup and only supported credit cards as payment methods. It was only later that we added the ability to cancel and reactivate a subscription, to update payment details, and to pay with other payment methods (such as iDEAL or SOFORT). This allowed us to focus on one feature at a time, which we could quickly validate from real production usage. Remember, we always had an emergency button available that would switch us back to the old billing system, if needed.
Migration 101
After we made sure we were not adding any more users to the old system, it was time to move the ones that were already there. This was considerably trickier because we had to deal with some very sensitive payment information that was out of our hands (and for a good reason). We identified two problems here: migrating user information and migrating payment information.
The first problem was not really a problem; we basically mapped data between the two systems, and created some Stripe objects along the way. Quite boring, so I won’t dwell on it.
The real challenge came with the second problem. As one might expect, it was a bit more involved than simply querying API A for payment information, then POSTing it to API B. Payment information is considered sensitive and is subject to a number of regulations one must comply with. In other words, there is no API A or API B. The way this works, in theory, is that you contact your current payment provider and ask them to dump all the payment information they have and securely transfer it to your new payment provider. The new provider, in turn, makes a best effort at importing that payment information, and provides you with a mapping file between old and new data. In reality, it was obviously much more than that. Let’s see how it went.
We are talking about a three-way coordination effort (us, our old payment provider and Stripe) that we were responsible for. We had to break the news to the old provider that we wanted to move away from them. We had to create a plan for the migration, way ahead of time, and check that both providers agreed with it. We had to make sure the two providers had their secure communication channel set up. And lastly, we had to follow up and mediate any issue that arose during the migration process.
We had to make sure the timing was right on our side as well: that no more payment information would creep into the old provider during the migration. It may sound unrelated, given that all signups had already been going to the new system, but what about old users who updated their card details? We had to turn some toggles off in order to keep data consistent. On the bright side, our old system gave us an unexpected, but important advantage in this matter: it used to charge users one week before their current period ended. Because Stripe charges users exactly when their current period ends, this resulted in a week when no recurring payments would be made. That put us more at ease with the migration, giving us a time buffer in case something went wrong.
It is important to mention that such migrations are performed in one go. Neither of the two providers was thrilled with the idea of batch migrations, which would’ve given us room to fix any errors along the way. So, we had to be extra sure everything fitted perfectly on our side.
Migration Time
That being said, on a warm Thursday morning, the big moment arrived: we started the migration. It took less than a day for our old payment provider to dump and transfer all payment information to Stripe, which was considerably faster than we expected. We were set for an early celebration, but reality quickly brought us back down to earth.
First, there were issues with the dump format. Then, the dump was not complete. We settled these issues with the two providers and, soon after, we noticed various errors happening: invalid payment details, unsupported cards, expired mandates and whatnot. Luckily, we had anticipated such issues and prepared for the worst-case scenarios. Thanks to our rollback strategies, extensive data validations and idempotent actions, we managed to keep our data consistent and our users unaware of the burdens faced during the migration process.
Rinse and repeat, we overcame all these issues with the two providers and in around two weeks we managed to migrate almost all payment information to Stripe. I said almost, because some errors were not fixable and we had to give up on several card details, which meant some users needed to submit them again. Not bad, considering that was the biggest problem we had after such a migration.
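For illustration, applying a mapping file on our side could look roughly like the sketch below. Everything here is an assumption made for the example: the real mapping file format is not shown in this article, so the mapping is modelled as a simple hash from old payment reference to new Stripe id, and BillingRecord, PaymentMigration and the "legacy" provider name are hypothetical. The point is that the step can be re-run safely and that records without a mapping are collected rather than silently dropped.

```ruby
BillingRecord = Struct.new(:user_id, :external_provider, :external_id, keyword_init: true)

class PaymentMigration
  Result = Struct.new(:migrated, :skipped, :failed, keyword_init: true)

  # mapping: old provider reference => new Stripe id (assumed format).
  def initialize(mapping)
    @mapping = mapping
  end

  def run(records)
    result = Result.new(migrated: [], skipped: [], failed: [])
    records.each do |record|
      if record.external_provider == "stripe"
        result.skipped << record.user_id   # already migrated: re-running is a no-op
      elsif (new_id = @mapping[record.external_id])
        record.external_provider = "stripe"
        record.external_id = new_id
        result.migrated << record.user_id
      else
        result.failed << record.user_id    # no mapping: user must resubmit details
      end
    end
    result
  end
end

records = [
  BillingRecord.new(user_id: 1, external_provider: "legacy", external_id: "old_a"),
  BillingRecord.new(user_id: 2, external_provider: "legacy", external_id: "old_b")
]
migration = PaymentMigration.new("old_a" => "card_foo")
migration.run(records).failed  # => [2]
migration.run(records).skipped # => [1], the second run changes nothing
```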
Closing Notes
After a little more than ten months, we popped open the secretly stashed champagne bottle and savoured our victory. It was a fun ride and we learned a lot from this joint effort. There were certainly ups and downs, but they are unavoidable at such a scale. On the bright side, all the challenges that got in our way made the celebration even sweeter in the end.
To conclude this article, I made a list of the most important things that I learnt while working on this project. I will definitely keep them in mind the next time I migrate billing systems:
- design the new system to be isolated from, and only loosely coupled to, the old one; implement the switch between the two systems early in the application flow;
- ensure idempotency for key processes;
- document. every. major. step;
- always keep stakeholders updated.