Streamlining Large-Scale Dataset Migrations with Background Coding Agents at Spotify
Introduction
Migrating thousands of datasets from one system to another is a daunting task. At Spotify, our engineering team faced this exact challenge when we needed to move a vast number of consumer datasets downstream. The process was error-prone and time-consuming, and it risked breaking critical data pipelines. To overcome these obstacles, we developed a solution that combined three tools: Honk (our background coding agent framework), Backstage (our developer portal), and Fleet Management. This article explains how these components worked together to speed up our dataset migrations and reduce operational pain.

The Challenge
As Spotify’s data ecosystem grew, we accumulated thousands of datasets consumed by various downstream services. Over time, changes in storage backends, schema requirements, or performance targets forced us to migrate these datasets to new locations. Manual migration did not scale: each dataset had its own transformation logic, dependencies, and validation needs. Engineers spent weeks writing one-off scripts, testing them, and monitoring the results. The process was slow and prone to human error, and it consumed developer time that could have been spent on core product features.
Background Coding Agents: Honk
Honk is an internal framework we built to automate repetitive data engineering tasks. At its core, Honk operates as a set of background coding agents – lightweight, stateless workers that listen for migration requests, execute the required code on the data, and report results. Unlike traditional batch jobs, these agents run continuously, respond to events, and can be orchestrated to handle complex workflows.
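To make the idea of a stateless, event-driven agent concrete, here is a minimal Python sketch. It is not Honk's actual code: the names MigrationRequest, MigrationResult, and run_agent are hypothetical, and an in-process queue stands in for whatever event source the real agents listen on.

```python
import queue
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MigrationRequest:
    """Hypothetical event payload; the real request format is not described."""
    dataset_id: str
    target: str


@dataclass(frozen=True)
class MigrationResult:
    dataset_id: str
    ok: bool
    detail: str


def run_agent(
    requests: queue.Queue,
    handle: Callable[[MigrationRequest], str],
    report: Callable[[MigrationResult], None],
) -> None:
    """Consume requests until a None sentinel arrives.

    The loop keeps no state between requests, so any number of identical
    agents could read from the same queue.
    """
    while True:
        request = requests.get()
        if request is None:  # sentinel: shut the agent down
            break
        try:
            detail = handle(request)
            report(MigrationResult(request.dataset_id, True, detail))
        except Exception as exc:  # report the failure instead of crashing
            report(MigrationResult(request.dataset_id, False, str(exc)))
```

Because each request carries everything the handler needs, a failed agent can be replaced by another one without any handover of state.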
How Honk Works in Migrations
When a dataset migration is triggered, Honk’s agents perform the following steps:
- Fetch metadata from Backstage about the dataset, including its schema, transformation rules, and downstream consumers.
- Apply transformations using pre‑defined code templates or custom scripts provided by the owning team.
- Write the transformed data to the new target location while logging every action for auditability.
- Validate the output by comparing checksums, row counts, and sample records against the original.
Because Honk agents run as background processes, multiple migrations can occur in parallel without overwhelming the infrastructure. The framework also supports rollbacks: if validation fails, the agent automatically reverts the changes and notifies the team.
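The transform, write, validate, and rollback steps above can be sketched as follows. This is an illustration under stated assumptions rather than Honk's implementation: InMemoryStore is a hypothetical stand-in for a storage backend, the transformation is assumed to be one-to-one (so row counts must match), and validation compares the data read back from the target with the expected transformed rows.

```python
import hashlib
import json
from typing import Callable, Dict, List

Row = Dict[str, object]


class InMemoryStore:
    """Hypothetical stand-in for a storage backend."""

    def __init__(self) -> None:
        self.data: Dict[str, List[Row]] = {}

    def write(self, key: str, rows: List[Row]) -> None:
        self.data[key] = list(rows)

    def read(self, key: str) -> List[Row]:
        return list(self.data.get(key, []))

    def delete(self, key: str) -> None:
        self.data.pop(key, None)


def checksum(rows: List[Row]) -> str:
    """Order-independent SHA-256 over canonical JSON encodings of the rows."""
    digests = sorted(
        hashlib.sha256(json.dumps(r, sort_keys=True).encode()).hexdigest()
        for r in rows
    )
    return hashlib.sha256("".join(digests).encode()).hexdigest()


def migrate(
    source_rows: List[Row],
    store: InMemoryStore,
    key: str,
    transform: Callable[[Row], Row],
    log: List[str],
) -> bool:
    """Transform, write, validate; remove the written data if validation fails."""
    expected = [transform(r) for r in source_rows]
    store.write(key, expected)
    log.append(f"wrote {len(expected)} rows to {key}")

    written = store.read(key)
    valid = (
        len(written) == len(source_rows)  # assumes a one-to-one transform
        and checksum(written) == checksum(expected)
    )
    if valid:
        log.append(f"validated {key}")
        return True

    store.delete(key)  # rollback: discard the bad output
    log.append(f"validation failed for {key}; rolled back")
    return False
```

Every step appends to the log, which mirrors the auditability requirement. A real rollback would also need to restore any data that previously lived at the target, which this sketch leaves out.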
Integration with Backstage
Backstage serves as the single source of truth for all dataset metadata at Spotify. By integrating Honk with Backstage, we enabled engineers to initiate migrations directly from the portal, without writing any code. The integration provides:
- Dataset discovery: Engineers search for datasets they own or consume, viewing current location, schema, and dependency graphs.
- One-click migration request: After selecting a target backend (e.g., Google Cloud Storage for a dataset currently stored in Amazon S3), the user clicks “Migrate” and Backstage automatically generates the migration job for Honk.
- Real‑time status updates: The portal shows progress bars, logs, and any errors encountered by the Honk agents.
This seamless integration eliminated the need for engineers to remember command‑line flags or SSH into servers. They could manage migrations from a single, familiar interface. To learn more about Backstage, see the Backstage documentation.
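As a rough illustration of how a portal action could turn catalog metadata into a migration job, consider the sketch below. Everything in it is hypothetical: DatasetMetadata, MigrationJob, and build_job are illustrative names, not part of the Backstage API or of Honk, and the URI-style location strings are an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatasetMetadata:
    """Hypothetical subset of what the portal knows about a dataset."""
    dataset_id: str
    owner: str
    location: str  # assumed URI form, e.g. "s3://bucket/path"
    consumers: List[str]


@dataclass(frozen=True)
class MigrationJob:
    """Hypothetical job description handed to the agents."""
    dataset_id: str
    source: str
    target_backend: str
    notify: List[str]


def build_job(meta: DatasetMetadata, target_backend: str) -> MigrationJob:
    """Derive a migration job from metadata plus the backend the user picked."""
    if meta.location.startswith(f"{target_backend}://"):
        raise ValueError("dataset already lives on the target backend")
    return MigrationJob(
        dataset_id=meta.dataset_id,
        source=meta.location,
        target_backend=target_backend,
        # the owner and all downstream consumers hear about the outcome
        notify=[meta.owner, *meta.consumers],
    )
```

The point of the design is that the user supplies only the target; everything else comes from metadata that is already recorded in the portal.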

Fleet Management for Scaling
Honk agents run on a fleet of worker nodes that can scale up or down based on demand. Our Fleet Management system automatically provisions and deprovisions these nodes, ensuring that migration workloads are never starved of resources. The system monitors queue depth and node health, spinning up additional agents during peak migration windows (e.g., when a large number of datasets are being moved together). It also handles node failures by restarting failed tasks on healthy instances.
Fleet Management’s auto‑scaling logic uses metrics from Honk (such as task completion rate, average execution time, and error rate) to determine the optimal number of agents. This dynamic approach saved us from over‑provisioning hardware while guaranteeing that migrations finished within the required time windows.
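One simple way to turn such metrics into an agent count is a capacity estimate: total expected work divided by the time available. The sketch below is an assumed formula for illustration, not the actual Fleet Management logic. The function desired_agents, its parameters, and the default bounds are hypothetical, and failed tasks are assumed to be retried until they succeed.

```python
import math


def desired_agents(
    queue_depth: int,
    avg_exec_seconds: float,
    error_rate: float,
    deadline_seconds: float,
    min_agents: int = 1,
    max_agents: int = 100,
) -> int:
    """Estimate how many agents are needed to drain the queue by the deadline."""
    if not 0.0 <= error_rate < 1.0:
        raise ValueError("error_rate must be in [0, 1)")
    if deadline_seconds <= 0:
        raise ValueError("deadline_seconds must be positive")

    # With retries, each task needs 1 / (1 - error_rate) attempts on average.
    expected_attempts = queue_depth / (1.0 - error_rate)
    work_seconds = expected_attempts * avg_exec_seconds
    needed = math.ceil(work_seconds / deadline_seconds)

    # Clamp so the fleet never scales to zero or beyond its budget.
    return max(min_agents, min(max_agents, needed))


# 600 queued tasks at 120 s each, no errors, one hour to finish:
# 600 * 120 / 3600 = 20 agents.
print(desired_agents(600, 120.0, 0.0, 3600.0))
```

The upper bound is what prevents over-provisioning, while the deadline term ties the fleet size to the migration window.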
Results and Benefits
Since deploying Honk with Backstage and Fleet Management, we have migrated over 5,000 datasets downstream with minimal human intervention. Key outcomes include:
- A reduction of more than 90% in migration time per dataset, from an average of 2 days to less than 2 hours.
- Zero data loss, thanks to the built-in validation and rollback features.
- Faster developer onboarding – new team members can migrate datasets after a 15‑minute walkthrough instead of weeks of training.
- Elimination of configuration drift – because all migration logic is version‑controlled and auditable.
The combination of background coding agents and a unified developer portal turned a painful manual process into a smooth, automated operation. Engineers now trust the system to handle even the most complex migrations, freeing them to focus on higher‑value work.
Conclusion
Large-scale dataset migrations don’t have to be a nightmare. By integrating Honk with Backstage and scaling it with Fleet Management, Spotify transformed a fragile, manual process into a robust, self-service system. The key ingredients were automation, metadata integration, and scalable infrastructure, all delivered through background coding agents. If your organization faces similar challenges, we encourage you to consider a similar approach: invest in event-driven automation, consolidate metadata in a portal, and design for elasticity. The result is a migration pipeline that runs quietly in the background, letting your engineers sleep soundly.