Jul 4, 2024
Lessons learned from building a data sync between Airtable and Bubble
💡 Also published on the BlueDot blog.
I built the Course Hub (BlueDot's learning management system) using Bubble. It is easier than coding something from scratch and more powerful than most other drag-and-drop website builders.
One of my challenges was syncing data between Bubble's internal database and Airtable, where we store and analyse all course data.
Why don't you just make an API request to create a record in Airtable?
Making an API call is slower than creating a local record. If many requests come in at once, or someone spams a button, the slower call could break the workflows that update a record shortly after it is created.
Also, the Airtable API in Bubble is not robust to changes in field names. If someone renames a field, the API calls stop working, and we get no alert about it until we explicitly check.
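One way to catch a renamed field before it silently breaks a sync is to compare the field names a workflow expects against the ones the table currently has. Here is a minimal Python sketch of that check. It assumes you already have the table's current field names from somewhere (fetching them is not shown), and the function and field names are made up for illustration.

```python
def find_missing_fields(expected_fields, actual_fields):
    """Return the expected field names that are absent from the table, sorted."""
    return sorted(set(expected_fields) - set(actual_fields))


# Hypothetical field names, for illustration only.
expected = ["Email", "Resource", "Completed", "Feedback"]
actual = ["Email", "Resource", "Done", "Feedback"]  # "Completed" was renamed

missing = find_missing_fields(expected, actual)
if missing:
    # In a real setup this would send an alert instead of printing.
    print(f"Schema drift: missing fields {missing}")
```

Running a check like this on a schedule would turn a silent failure into an alert.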
If you Google "Bubble Airtable data sync", the first solution that comes up is Whalesync. It is a pretty neat tool that handles the API requests behind syncing data so you only need to configure the tables and let it run.
Whalesync worked really well for us. It helped us make near-real-time syncs between Airtable and Bubble for thousands of records. Their team was also quick to respond to any issues and happy to jump on calls to troubleshoot bugs.
But about six months ago, Whalesync stopped syncing records for resource completions, a table that logs whether a person has completed a particular reading and any feedback they have on it. Then, it also stopped syncing records for exercise responses.
I went back and forth with Whalesync’s support team for almost two months until they finally threw in the towel. They had never encountered this issue before and had no idea how to fix it. There was nothing more they could do for me.
Attempting to find my own solution
Every so often, I tried tweaking my Whalesync configuration to test my own hypotheses about why it wasn't working.
Hypothesis #1: Have you tried turning it off and on again?
I was receiving hundreds of syncing issues, none of which made much sense to me. For example, Whalesync was telling me that we had deleted a record in Airtable even though we hadn't deleted any records.
I decided to create a new table in Airtable and a new sync setup on Whalesync. I waited more than 24 hours for all the syncs to go through, but the most recent record created was still stuck six months in the past, in March 2024.
Hypothesis #2: Too many records
The resource completions table also happened to have the most records. At ~70,000 records, this was 10x more than any other table. I thought there might be a hidden record limit, or something else related to the number of records, causing the issue.
After auditing each table, I saw that the exercise responses table, which was facing similar issues to resource completions, had around the same number of records (~30,000) as another table that was syncing perfectly fine. So record count alone could not explain the failures.
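The audit boils down to looking for a counterexample: a healthy table that is about as big as a failing one. Here is a rough Python sketch of that comparison. The record counts are the approximate ones from this post, while the table names, the `syncing` flag, and the 25% tolerance are assumptions made for illustration.

```python
def find_counterexamples(tables, tolerance=0.25):
    """Pair each failing table with healthy tables of similar size.

    Any pair returned is evidence against "too many records" as the cause.
    """
    failing = [t for t in tables if not t["syncing"]]
    healthy = [t for t in tables if t["syncing"]]
    pairs = []
    for bad in failing:
        for good in healthy:
            # "Similar size" means within `tolerance` of the failing table's count.
            if abs(good["records"] - bad["records"]) <= tolerance * bad["records"]:
                pairs.append((bad["name"], good["name"]))
    return pairs


# Approximate counts from the post; "other_table" is a placeholder name.
tables = [
    {"name": "resource_completions", "records": 70_000, "syncing": False},
    {"name": "exercise_responses", "records": 30_000, "syncing": False},
    {"name": "other_table", "records": 30_000, "syncing": True},
]
counterexamples = find_counterexamples(tables)
print(counterexamples)
```

A single pair is enough to set the hypothesis aside without building a test environment.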
Abandoning Whalesync
After circling back on this issue three times in the past six months and spending hours trying to figure out why Whalesync wasn't working, I decided to build my own data sync.
We have other automations set up in Make, so I decided to see if they had a Bubble integration (they did!), and that was all it took.
I tested out some simple logic for creating and updating records to make sure it was syncing the data appropriately, finally putting to bed the issue that had been nagging me for more than six months.
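The Make scenario itself isn't shown in this post, but the create-or-update idea can be illustrated in a few lines of Python. This is a sketch of the general pattern rather than the actual workflow: the dictionary stands in for the Airtable table, and `sync_record` and the `bubble_id` key are hypothetical names.

```python
def sync_record(destination, record, key="bubble_id"):
    """Create the record if its key is new, otherwise update it in place."""
    record_key = record[key]
    if record_key in destination:
        destination[record_key].update(record)
        return "updated"
    destination[record_key] = dict(record)  # copy so later edits don't leak in
    return "created"


airtable = {}  # stand-in for the destination table, keyed by Bubble ID
first = sync_record(airtable, {"bubble_id": "r1", "completed": False})
second = sync_record(airtable, {"bubble_id": "r1", "completed": True})
print(first, second, len(airtable))
```

Keying every write on a stable ID from the source means re-running the sync updates existing rows instead of creating duplicates.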
What took so long?
I honestly don't know why I didn't think of it earlier. Some part of me assumed that it would be really complicated and didn't want to deal with it. I figured that if there was a company built around data syncing, it must be something complicated. But in hindsight, I think low-code tools like this exist to save people like me the time of working out the logic, and to give non-technical people a quick workaround.
Another part of me liked the simplicity of having all data syncs happen in Whalesync instead of keeping track of data syncing from many different sources.
But I think the biggest reason was that I hadn't realised the impact this was having on our courses. No one could access this data, meaning we had no idea if people were doing the readings and exercises or what they thought of them.
I had assumed that someone would tell me they were missing this data. Instead, the rest of the team simply learned to live without it. Once, someone downloaded a CSV of the data and filtered it elsewhere to get a sense of engagement on the course.
When I heard about that, I felt quite crushed that I still hadn't been able to fix the issue and that it had been a wasted opportunity for improvement.
Lessons learned
I started writing this to make a point about how we use low-code/no-code tools at BlueDot, but there's also a lesson here about building products.
#1 focus, focus, focus
If I hadn't dedicated the week to sorting out all the bugs cropping up with our systems, I would not have had the patience and mental focus to figure out how to solve this.
In the past, I had scratched at this problem, but between waiting for responses from support teams and for results from my tests to come in, I spent so much time clueing myself back into the context of the problem that I left it unsolved in frustration.
Marinading my brain in all the software issues, and in the solutions I was implementing across Make, Airtable, customer.io, etc., is what inspired me to figure out how to solve the data syncing.
I had already started setting a top priority for each week, and this experience convinced me even more of how important focus is.
#2 develop specific hypotheses to test
I've found it useful, when debugging particularly tricky issues, to work methodically through hypotheses about the root cause. I start by brainstorming potential causes, then try to disprove them in turn, usually beginning with the lowest-effort, least complicated hypotheses.
With the data sync, the easiest was to assume this was a known error with Whalesync and try to get their team to solve it. This required email exchanges and a few calls spread over a few weeks.
Second, I tried re-doing the setup. Third, I was going to create a test environment to determine if table sizes made a difference. However, a much simpler test was to compare sync performance across tables with similar numbers of records. Since a table of similar size was syncing fine, I could set that hypothesis aside.
Finally, I decided to build a custom syncing workflow, which took much less effort than I had imagined.
I prioritised sticking to Whalesync as much as possible to keep all data syncs within the same workflow, but sometimes things just don't pan out.
#3 understanding your users gives you product specs
Building my own data sync meant figuring out how to keep data in Airtable up to date with any changes made in Bubble. There could be a lot of edge cases to account for, but I knew that the course leads who use this data are mainly looking for:
- Resources which should be updated/supplemented
- Proxies for course engagement
- Exercise prompts which might need to be reworded or changed
So I focused the first version on:
- Ensuring the past two weeks of data is up to date
- Avoiding duplicate data, which could skew aggregate statistics
- Ensuring text came through in a readable format (converting Bubble's BBCode formatting to Airtable's rich text formatting is a whole other nightmare to get into)
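To make those three priorities concrete, here is a small Python sketch, not the actual Make workflow. It assumes records carry a `modified` timestamp and a `bubble_id` (both hypothetical field names), and that the rich text destination accepts Markdown. The converter only handles bold, italic, and link tags, which is far from the full set of BBCode formatting.

```python
import re
from datetime import datetime, timedelta


def recent_records(records, now, days=14):
    """Keep only records modified within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [r for r in records if r["modified"] >= cutoff]


def dedupe(records, key="bubble_id"):
    """Keep one record per key; later entries overwrite earlier ones."""
    latest = {}
    for r in records:
        latest[r[key]] = r
    return list(latest.values())


_SIMPLE_TAGS = {"b": "**", "i": "_"}  # BBCode tag -> Markdown marker


def bbcode_to_markdown(text):
    """Convert a small subset of BBCode ([b], [i], [url=...]) to Markdown."""
    for tag, marker in _SIMPLE_TAGS.items():
        text = re.sub(
            rf"\[{tag}\](.*?)\[/{tag}\]",
            rf"{marker}\1{marker}",
            text,
            flags=re.DOTALL | re.IGNORECASE,
        )
    return re.sub(
        r"\[url=(.*?)\](.*?)\[/url\]",
        r"[\2](\1)",
        text,
        flags=re.DOTALL | re.IGNORECASE,
    )


now = datetime(2024, 7, 4)
records = [
    {"bubble_id": "a", "modified": datetime(2024, 7, 1), "feedback": "[b]Great[/b] reading"},
    {"bubble_id": "b", "modified": datetime(2024, 6, 1), "feedback": "Too long"},
    {"bubble_id": "a", "modified": datetime(2024, 7, 2), "feedback": "[i]Updated[/i]"},
]
to_sync = dedupe(recent_records(records, now))
print([bbcode_to_markdown(r["feedback"]) for r in to_sync])
```

Nested or unsupported tags pass through unchanged in this sketch, so a real converter would need more cases.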
I could have tried to sync all the data we have from Bubble into Airtable, but knowing my users, they won't be using that any time soon. Once the essential features are done, I'll work on that and make the syncs more robust!
I really enjoy reflecting on my product journey and learning from others' experiences, so if you have recommendations on cool product people/blogs I should follow, please let me know!