Dealing with domain modelling mismatches on external services

Integrating with external services is a pain. The technical challenges—reliability, performance, caching, downtime, and so on—are well-known and often solvable with infrastructure. But a less-obvious non-technical challenge is domain modelling mismatches.

Domain model mismatches happen because people see the same world differently. To you, it's a House; to them, it's a Building. To you, it's a User; to system A, a Customer; to system B, it's a Client. And the way we describe our world defines how we interact with it.

I've found that these mismatches are especially tough in these cases:

Multiple entrypoints. When entities in your system may be created or modified via other systems. Example: Your app has “Customers” that can sign up directly, but your sales/finance team can also create clients in Stripe for invoicing. You need to sync these new users into your system.
Bidirectional sync. When you need to also sync entities created in your system to external systems. For instance, you may need to push some data about your users to your helpdesk such as Intercom or Zendesk, so agents can have richer context about a user when providing assistance.
Partial domain coverage. When each system only covers a slice of your overall picture. For example, your sales CRM stores sales data, your billing system stores payment info, your helpdesk stores support tickets, and your analytics system stores usage activity. To your system, all these form a complete picture of the “Customer” model.

Here's a sample system that might have to deal with all these cases. It stores a User object, some parts of which it syncs to some other systems, and some parts which it syncs back from others. Each of the other systems only stores a few pieces of the User object.

Of course, ideally, you want to avoid these. And you especially want to avoid them occurring on the same project. But you can't always avoid it.

I've worked on projects that somehow checked all of these. In our case, we wanted to do things such as tracking a user's journey from way before they actually became users, so we allowed them multiple entrypoints into our system.

I'll explore some of the ways mismatches show up, with some real examples I've encountered.

Naming

Different systems may use different names.

Examples:

We had a model called an Appointment. On our calendar tool (Calendly), it was an Event; on our sales tool (Pipedrive), an Activity; on our contact tool (Aircall), a Call.
Similarly, a Customer on our end was a Calendly Invitee, a Pipedrive Person, and an Aircall Contact.

Sometimes these are just differences in nomenclature, with no differences in behaviour. In such cases, the only headache is keeping track of the names, especially when they clash with something else in your system.

Other times, the names reflect the difference in semantic purposes and implicit context that these models carry. Calendly uses the term "event" because their domain is concerned with the generic act of scheduling something to happen at some time; we use "appointment" because we are interested in the time, but also the meeting with someone. Similarly, a call in Aircall focuses only on a conversation via a specific medium, and does not have to be scheduled. And Pipedrive Activities are not necessarily appointments; they only represent an action taken in the sales process.

Apart from the models themselves, the actions available might have different names: Appointment.create vs Event.schedule, Invoice.mark_as_paid vs Transaction.pay, etc. Once again, these could be significant behaviour differences or not.

Overall, these semantic differences are usually pointers to more serious behavioural differences, as we'll see below.

Spelling

The same thing might be spelt (or spelled) differently.

This is the least significant difference, but it's not to be overlooked. Spelling mistakes can cause bugs if the right tooling isn't in place. For example, checking if event[:status] == :cancelled (British spelling) will never match when Calendly spells it canceled (American).
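One way to guard against this is to normalise the spelling once, at the point where the external payload enters your system. Here is a minimal sketch, assuming the status arrives as a string or symbol. The names CANONICAL_STATUSES and normalize_status are hypothetical, and so is the choice of the American spelling as the internal one.

```ruby
# Hypothetical helper: accept either spelling from outside, use one inside.
CANONICAL_STATUSES = {
  "canceled"  => :canceled, # American spelling (what Calendly sends)
  "cancelled" => :canceled, # British spelling, mapped to the same symbol
  "active"    => :active
}.freeze

def normalize_status(raw)
  CANONICAL_STATUSES.fetch(raw.to_s.strip.downcase) do
    # Fail loudly instead of letting an unknown spelling slip through.
    raise ArgumentError, "Unknown status: #{raw.inspect}"
  end
end

normalize_status("canceled")  # => :canceled
normalize_status(:cancelled)  # => :canceled
```

The rest of the codebase then only ever compares against :canceled.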

Structure

One system might represent some information using one field, another might split it across multiple fields or even multiple models.

If the mapping is straightforward, this isn't too annoying; you just need to remember to do the conversion at the service boundaries.

Examples:

We needed to pull in card transactions from an external card provider. On our end, we just cared about the time the user performed the transaction. However, the provider's API had multiple timestamps: authorisationTime, paymentTime, paymentLocalTime, and createdAt, fields with different meanings. We don't care as much about the distinction between these, so we built a transactionTime in our system based on a combination of these.
A user's sales journey was tracked in Pipedrive typically as a Lead object, which transitioned into a Deal object. Even though Pipedrive had a Person object, sales agents tended to track information about the user right on the Deal object. So syncing users to our backend meant we had to pull in data from the Deal as well.
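For the card transaction case, the conversion at the boundary could look something like the sketch below. The order of preference between the provider's timestamps is an assumption made for illustration (the right rule depends on what each field really means), as is the ISO 8601 format. paymentLocalTime is left out because a local time needs extra handling.

```ruby
require "time"

# Assumed order of preference between the provider's timestamp fields.
TIMESTAMP_PREFERENCE = %w[authorisationTime paymentTime createdAt].freeze

# Collapse the provider's timestamps into our single transaction time.
def transaction_time(payload)
  raw = TIMESTAMP_PREFERENCE.map { |key| payload[key] }.compact.first
  raise ArgumentError, "No usable timestamp in payload" if raw.nil?

  Time.iso8601(raw).utc
end

payload = {
  "authorisationTime" => nil,
  "paymentTime"       => "2023-05-01T10:15:00Z",
  "createdAt"         => "2023-05-01T10:16:02Z"
}
transaction_time(payload) # => 2023-05-01 10:15:00 UTC
```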

Validations

One service might allow things another rejects, and vice versa.

Examples:

When creating customer accounts for testing, we often used an email ending with a .test TLD. Our new customer support tool, Dixa, rejected such domains, so when we began importing our existing users into their system, we would run into failures.
Our sales agents sometimes signed users up with phone numbers only, and a "fake" email (for instance, older people who weren't very Internet-savvy). In many cases, these emails were technically valid, but not allowed by Dixa, so we were unable to import these users into it.
Pipedrive allowed agents to create "Activities", which mapped to Appointments on our end, but did not enforce the presence of a date and time.

Flows

The lifecycles and workflows of corresponding domain models might be different across services.

A model in one system might go through states or transitions that don't exist on yours, or vice versa. This hurts especially in a many-to-one setup, where every new provider has its own state machine, and it might not match yours.

Sometimes, an external model may have states or properties that you have no idea about. Sometimes, no one on the team knows!

Examples:

On Calendly, an event can only have two states: active or canceled. On our end, our Appointments could be created, rescheduled, canceled, or completed.
We added a wallet provider who performs KYC on each user. They track this via their User model's status. However, our internal user model was much simpler than their KYC workflow, so we couldn't track the KYC status on it.
We had a bank partner that would send us transactions labelled deleted and hidden. Somehow these made their way into our own Transaction model, but there was never any real clarity on what those meant.
We were syncing card transactions from a card provider. At first, we synced all transactions, but later realized we only cared about Settled transactions. But then we found out a card transaction's lifecycle is more complex, and it can be Settled multiple times, or never, or with a different amount. This flow was new to us, so we displayed wrong info to customers in some cases. Only after a few reports did we dive deep into understanding the lifecycle and deciding how we would model it.
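One habit that would have caught the deleted and hidden transactions earlier is an explicit mapping from each provider's states to ours, where anything unmapped raises instead of being stored. A minimal sketch, with a hypothetical table that only covers the two Calendly states mentioned above:

```ruby
class UnmappedStateError < StandardError; end

# Hypothetical table: provider => { their state => our Appointment state }.
STATE_MAPS = {
  calendly: { "active" => :created, "canceled" => :canceled }
}.freeze

def internal_state(provider, external_state)
  mapping = STATE_MAPS.fetch(provider) do
    raise UnmappedStateError, "No state map for #{provider.inspect}"
  end
  mapping.fetch(external_state) do
    # Surface states nobody has made a decision about yet.
    raise UnmappedStateError, "#{provider} sent unmapped state #{external_state.inspect}"
  end
end

internal_state(:calendly, "canceled") # => :canceled
```

Each new provider then forces a conversation about what its states mean before any of them reach your own model.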

Restricted and allowed behaviours

A service might attribute certain behaviours and rules to their models that you don't.

These can either limit what you can do or force you to add extra validations.

Examples:

Dixa, being a customer service platform, allows multiple emails and phone numbers to be associated with a user. We did not. This created a problem when syncing these users to our system.
We allowed multiple users to share a phone number, but Dixa does not.
Calendly events are nigh-immutable. Rescheduling an event cancels the old one and creates a new one. Calendly sends a separate webhook for cancelled, and one for created. On our end, this looked like two separate actions.
Pipedrive activities are very mutable. This caused us a ton of grief, as the sales agents did a lot of strange (or normal but unexpected) things that regularly caused problems in our system. For instance, we synced only Activities of type "Consultation" to our system (as Appointments). But sometimes, the activity type may be changed later, which wrecks our model. Also, an agent might delete an Activity, which was okay (we can just cancel the appointment on our end), but meant we could not fetch any information about it from the Pipedrive API.

Identifiers

The same resource might be identified by different keys on different systems.

Examples:

Calendly: The email is the unique identifier for an event attendee. However, sometimes our users used a different email to book the appointment on Calendly, which meant we couldn't find them in our database!
Dixa requires a user to have at least one of email or phone. This makes sense for their domain (they're a customer contact platform), but led to a duplicates situation for us: if an unsynced user contacted us via email and via phone separately, Dixa would create this as two separate accounts, one with their email, and one with their phone.
Pipedrive did not require emails or phone numbers for users, or enforce uniqueness of those. This led to a ton of duplicate users (at some point, over two thousand).
Dixa supports an external_id field, allowing us to uniquely link a user on their end with one on ours. But this only worked for users created in our system, and synced to theirs. For users created on their end, we had to use heuristics to match them uniquely to a user on ours.
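The matching described above can be sketched roughly as follows. This is not our actual implementation. The User struct, the helper names, and the rule "only accept a heuristic match if it is unique" are assumptions for illustration.

```ruby
User = Struct.new(:id, :email, :phone, keyword_init: true)

def normalize_email(email)
  email.to_s.strip.downcase
end

def normalize_phone(phone)
  phone.to_s.gsub(/\D/, "") # keep digits only
end

# Returns [user_or_nil, how_it_matched].
def match_user(users, external)
  # 1. Exact link: the external record carries our ID.
  unless external[:external_id].to_s.empty?
    linked = users.find { |u| u.id.to_s == external[:external_id].to_s }
    return [linked, :external_id] if linked
  end

  # 2. Heuristic: email, accepted only if exactly one user has it.
  email = normalize_email(external[:email])
  unless email.empty?
    by_email = users.select { |u| normalize_email(u.email) == email }
    return [by_email.first, :email] if by_email.size == 1
  end

  # 3. Heuristic: phone. Users may share a number, so require uniqueness.
  phone = normalize_phone(external[:phone])
  unless phone.empty?
    by_phone = users.select { |u| normalize_phone(u.phone) == phone }
    return [by_phone.first, :phone] if by_phone.size == 1
  end

  [nil, :unmatched]
end
```

Returning how the match was made is useful later, when you need to explain why two records were linked.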

Scheduling

Different systems might operate on different schedules.

Examples:

Some of our banking providers would notify us of new transactions in real time, which is nice, so our database is always up to date. Other providers provide us a list of new transactions once a day, and others once a week. This is a modelling mismatch—to us, a BankStatement is a report of all transactions on your bank account; to them, it's a report of at least all transactions up to the past week.

Dealing with domain model mismatches

First, think again. Before adding a new domain model, ask, "Do we need this? Does this entity represent an actual object intrinsic to our domain, or is it merely a record we mirror from elsewhere?" I've seen setups with domain models representing external entities that had no semantic value to the service. This led to the rest of the codebase being coupled to these unnecessary models. If the model isn't core to your domain or doesn't affect any key business logic, then avoid maintaining yours. In this case, sometimes all you need is a few lines of code mapping one field to another.

Next, pick your battles. Now that you've decided your own model is probably needed, you must decide how much of a mismatch you're willing to tolerate. Should you try to follow their architecture? Should you relax some of the constraints on your end? Should you assume any future partners will follow the same model? It is tricky, especially when you don't yet know much about the domain. One strategy is to just go with how the external service models it, but this can have the long-term consequence of locking you in.

The best engineers I've worked with have had the ability to come up with a model that was flexible enough to allow for change or different external providers, without over-engineering things. At the very least, you'll have to:

research further into the domain and the external service's model to understand use cases, limitations, and comparison with your business case
communicate and brainstorm with product and business leads to better understand their vision

On the technical side, some things we've done include:

external_reference and external_service columns. In cases where a single model on our end might map to multiple entities on other services, we used a separate table, consisting of reference, service, and entity.
JSON(B) columns to dump some additional service-specific fields that may be relevant only for that service. Some people find this controversial, but I've found it worked better than having sparse columns for each service provider and was easier to maintain than entity-attribute-value.
decoupling models that represent a sub-process or sub-component of another, such as in our KYC mismatch case above
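To make the separate references table concrete, here is an in-memory stand-in for it. In a real app this would be a database table with a unique index on service, entity, and reference. The class name, the user_id column, and the sample IDs are all hypothetical.

```ruby
ExternalReference = Struct.new(:user_id, :service, :entity, :reference, keyword_init: true)

class ExternalReferenceStore
  def initialize
    @rows = []
  end

  # Record that one of our users corresponds to an entity on another service.
  def link(user_id:, service:, entity:, reference:)
    existing = find(service: service, entity: entity, reference: reference)
    if existing && existing.user_id != user_id
      raise ArgumentError, "#{service} #{entity} #{reference} is already linked to user #{existing.user_id}"
    end
    return existing if existing

    row = ExternalReference.new(user_id: user_id, service: service, entity: entity, reference: reference)
    @rows << row
    row
  end

  # Look up the link for a given external entity, e.g. when a webhook arrives.
  def find(service:, entity:, reference:)
    @rows.find { |r| r.service == service && r.entity == entity && r.reference == reference }
  end

  def references_for(user_id)
    @rows.select { |r| r.user_id == user_id }
  end
end

store = ExternalReferenceStore.new
store.link(user_id: 42, service: "pipedrive", entity: "Person", reference: "981")
store.link(user_id: 42, service: "pipedrive", entity: "Deal", reference: "3007")
store.find(service: "pipedrive", entity: "Deal", reference: "3007").user_id # => 42
```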

Consider separating domain models, especially useful when workflows/structures don't match. For instance, in the case of the KYC lifecycle above, we created a User::KYC model to track specifically the KYC process for that user. On our provider, this information was still a part of the User object, but for us, it allowed us to examine the KYC process independently without needing to clutter our User model. It also served as a form of Bounded Context.

Embrace the workarounds. You will run into cases where there's no "clean" solution, and it's okay to do something unorthodox. Some examples:

Scheduling mismatch: Calendly webhooks sometimes were delayed, so to prevent users seeing missing data, we implemented a workaround where we would call their API on demand, to "prefetch" the expected webhook. And since we could potentially be processing the webhook at the same time, we had to handle this race condition with a mutex.
Behaviour mismatch: Since Calendly models appointment rescheduling as cancelling + creating new, we had to at some point implement a "waiting" loop in our webhook processor to check if an appointment had just been rescheduled or truly cancelled. In a related problem, we implemented something called intents to get around some restrictions on rescheduling.
Identifiers mismatch: Since there was no exact way to prevent Pipedrive duplicates, we ended up adjusting our sync process to also look for and merge duplicates whenever syncing.
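The prefetch race in the first workaround boils down to making "create this appointment" safe to call from two places at once. Below is a simplified sketch using Ruby's in-process Mutex and an in-memory hash. With several processes or servers, you would need a database or distributed lock instead. The class and method names are hypothetical.

```ruby
class AppointmentSync
  attr_reader :created_count

  def initialize
    @lock = Mutex.new
    @appointments = {} # external event ID => appointment attributes
    @created_count = 0
  end

  # Called by both the webhook handler and the on-demand prefetch.
  # Whoever gets the lock first creates the record; the other sees it exists.
  def upsert_from_event(event_id, attrs)
    @lock.synchronize do
      return @appointments[event_id] if @appointments.key?(event_id)

      @created_count += 1
      @appointments[event_id] = attrs
    end
  end
end

sync = AppointmentSync.new
webhook  = Thread.new { sync.upsert_from_event("evt_1", { source: :webhook }) }
prefetch = Thread.new { sync.upsert_from_event("evt_1", { source: :prefetch }) }
[webhook, prefetch].each(&:join)
sync.created_count # => 1
```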

Workarounds must be accompanied by documentation. Document key differences and details of the integration. This applies to any aspect of external service integrations, but can be super helpful for domain modelling, because there are so many decisions, small and large. Every kind of documentation helps, from code comments to full Wiki pages. Future developers who need to touch this code can see at a glance that a certain external service represents information in this weird way, so this is why we must do this random piece of witchcraft.

Establishing strong domain boundaries is also key. As much as possible, aim to keep the mapping between different systems at the outer edges of your system. Your core domain models should stay agnostic of the providers you choose, to an extent, and then a Translation Layer can deal with figuring out the various kinks.
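As a rough illustration of such a translation layer, each provider gets a small translator that turns its payload into your own model, and nothing past that point knows where the data came from. The payload keys below are made up for the example and are not the providers' real schemas.

```ruby
require "time"

# Our own provider-agnostic model.
Appointment = Struct.new(:customer_email, :starts_at, :source, keyword_init: true)

module Translators
  module Calendly
    # Hypothetical payload keys: "invitee_email", "start_time".
    def self.to_appointment(payload)
      Appointment.new(
        customer_email: payload.fetch("invitee_email").strip.downcase,
        starts_at: Time.iso8601(payload.fetch("start_time")),
        source: :calendly
      )
    end
  end

  module Pipedrive
    # Hypothetical payload keys: "person_email", "due_at".
    def self.to_appointment(payload)
      # Activities are not guaranteed to have a date and time; skip those.
      return nil if payload["due_at"].to_s.empty?

      Appointment.new(
        customer_email: payload.fetch("person_email").strip.downcase,
        starts_at: Time.iso8601(payload["due_at"]),
        source: :pipedrive
      )
    end
  end
end
```

Provider quirks, such as a Pipedrive Activity with no date, get handled inside the translator instead of leaking into the core models.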

I've also found that setting up good tooling can help smooth things over. Custom lint rules can help, from complex rules that enforce inter-service dependencies to simple ones that correct all spellings of "Cancelled" to "Canceled". Good integration libraries also help. I'm increasingly of the opinion that it's better to make your own API client, because the external SDKs often come with the assumptions of their model that don't translate cleanly to yours.

And don't forget observability. Make your code shout at you. Let your system tell you about its state. Don't just set up logs, metrics, and traces; store information that is useful, such as change history, event history, etc, and build useful tools and admin panels to explore that information. For our syncs, we invested a lot of effort in making our webhook processing as observable as possible: we could explore, track, and replay webhook payloads; we could see the result of a specific webhook, and what entities it had created or modified. This greatly aided us in finding the errors caused by mismatches. In the case where we needed to prefetch items from Calendly, we added metrics to tell us how often we had to do this, how often it was actually useful, and the overall impact on our system.

Lastly, architect for change. I've learnt that migrations are inevitable. Assume that someday we will have to migrate to another provider, or API, or infrastructure, and so on. So don't architect to the current provider, but also don't make your architecture as generic as possible. Instead, focus on making it robust, observable, documented and clear enough that changing it is not an expensive process: its dependencies are not hidden, its tradeoffs are understood, and its structure is clear.

In general, domain-driven design has several patterns that help here. Of course, they come with their own tradeoffs, so choose appropriately. I'm personally still exploring many of these patterns to see how I can leverage them without taking on the overhead.

