Parallels in Time: Epic Software Migrations & The Great Migration of Masai Mara


How to steer software migrations at scale, using similarities and principles drawn from the great migration of the Serengeti.

It was 30th July 2014, and I was in the Masai Mara to witness the great migration. I was spellbound as I watched millions of wildebeests migrate north from the Serengeti plains into Kenya’s Masai Mara in search of greener pastures.

Cloud vendors offer programming models that were once unavailable elsewhere, and they deliver new advances as services. Adopting them entails epic migration journeys, such as the move to a next-generation storage infrastructure.

This article helps Program Managers & Developers steer software migrations at scale, drawing on the similarities and principles I noticed in the migrations across the Serengeti plains. Enjoy!

Migrations can be hard

Cloud as an architecture provides on-demand access to compute. Behind the scenes, however, cloud operations are challenging. Getting every internal team to adopt a new technology, a new kind of physical design in the platform, or other advances requires massive migration programs with mandates, buy-ins, significant staffing, critical work products, and engineering tools.

Step 1: Why do migrations happen?


Wildebeests seek fresh grazing, nutritious grasses, and better-quality water.


Migrations happen for a number of reasons. Typically, an organization decides to move its applications, databases, and other business elements from local servers to the cloud, or it undertakes a custom migration such as:
  • Replacing legacy systems with a cloud solution for greater availability and to enable large-scale computation. The engineering team has to re-engineer the platform for the inevitable migration, and hundreds or even thousands of jobs have to be migrated.
  • Migrating an on-premises data warehouse to BigQuery on Google Cloud.
  • A homogeneous data migration (same database engine at source and target) to a remotely hosted service.

Step 2: Need for Shared Goals

It can be tough to herd teams that lack shared goals and a sense of urgency.
Create shared goals to energise and motivate your teams. Kick off with a “program charter meeting” that builds on those shared goals and spells out risks and dependencies early. Then:
  • Host “Why do we do what we do?” workshops with the relevant teams.
  • Get executive-level support and mandates, if needed. Ensure there are no disconnects between stakeholders and decision makers. Mandates are useful, but it is critical to reach out to adjacent teams and get their buy-in; otherwise the program can be perceived as top-down, which adds friction.
  • Ensure the migration is part of each team’s OKRs and product area roadmap.
  • Give tailored presentations to every division or product area that address their specific challenges; the migration benefits need to be articulated at all levels of the org.
  • Institute exception and escalation procedures: track them in an issue tracker and discuss them at escalation review meetings.

Step 2b: Migration Stages, the Serengeti Way

Migration phases: the timing of these migrations is controlled to a great extent by monsoon patterns. Credits: map source utali.com
The return to the southern pastures can be broken into phases before the mad rush happens: the birth and weaning of calves, the start of the journey, the mating season, the crossing of the Grumeti and Mara Rivers, and finally the growth of calves and new pregnancies (source).
Software migrations go through similar phases: the birth of a migration idea, development of a plan, testing with integration test beds, an experimental rollout, soft rollouts gated by opt-in and opt-out flags to avoid disrupting services, real deployments with limited testers, and finally a fleet-wide rollout.
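To make the opt-in and opt-out stages concrete, here is a minimal sketch of a flag check that decides whether a job takes the new code path at each stage. The stage names, the `uses_new_path` helper, and the rule that an explicit opt-out always wins are hypothetical assumptions, not any particular flag system.

```python
from enum import IntEnum
from typing import Set


class Stage(IntEnum):
    """Hypothetical rollout stages, ordered from least to most exposure."""
    EXPERIMENTAL = 1  # only jobs in the integration test bed
    SOFT = 2          # plus any job whose owner opts in
    LIMITED = 3       # plus a named group of limited testers
    FLEET_WIDE = 4    # every job, unless it explicitly opts out


def uses_new_path(job: str, stage: Stage, test_bed: Set[str],
                  opted_in: Set[str], limited_testers: Set[str],
                  opted_out: Set[str]) -> bool:
    """Return True if `job` should run on the new code path at `stage`.

    Assumption: an explicit opt-out is the escape hatch at every stage,
    so it is checked first.
    """
    if job in opted_out:
        return False
    if stage >= Stage.FLEET_WIDE:
        return True
    if job in test_bed:
        return True
    if stage >= Stage.SOFT and job in opted_in:
        return True
    if stage >= Stage.LIMITED and job in limited_testers:
        return True
    return False
```

Because each stage only widens the set of jobs on the new path, advancing the program is a single change to `stage` rather than a series of per-job edits, and rolling back is equally cheap.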

Step 3: What to migrate?

Scouting the plains for giraffe

As you put together a migration plan:
  • Build automated tools that scrub the fleet for jobs that need to migrate.
  • Auto-assign migration tasks (bugs) to owning groups and contacts (the ground truth data).
  • Have a dashboard that shows the ground truth data.
  • Hand-engineer where needed to account for data, business applications, and data pipelines.
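The scrub-and-assign steps above might look roughly like this. It is a sketch under stated assumptions: the `Job` record, its `backend` values, the helper names, and the `migration-triage` queue for unowned jobs are all hypothetical stand-ins for whatever inventory and issue tracker you actually use.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Job:
    name: str
    owner_group: Optional[str]  # ground-truth owner; None if nobody claims it
    backend: str                # hypothetical values: "legacy" or "new"


def scrub_jobs(jobs: Iterable[Job]) -> List[Job]:
    """Return only the jobs that still point at the legacy backend."""
    return [job for job in jobs if job.backend == "legacy"]


def assign_migration_bugs(jobs: Iterable[Job],
                          triage_queue: str = "migration-triage"
                          ) -> Dict[str, List[str]]:
    """Group legacy jobs into one migration bug per owning group.

    Jobs with no known owner land in a (hypothetical) triage queue
    instead of being silently dropped.
    """
    bugs: Dict[str, List[str]] = defaultdict(list)
    for job in scrub_jobs(jobs):
        bugs[job.owner_group or triage_queue].append(job.name)
    return dict(bugs)
```

Routing ownerless jobs to a triage queue keeps them visible on the dashboard, which matters because some groups may not have a real owner.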

Step 4: Basic Work Products


We left Nairobi at 7 am and reached the Mara by 1 pm. Our Toyota even had a pop-up roof, ideal for game viewing and sightseeing. George, our driver and guide, made the trip transformational: his insight into the plains and the animals, and his skill at navigating without disturbing the surroundings, were crucial.

  • Set up a web page, “make <youreffort> happen”, as the lingua franca that keeps all validated docs in one place. Don’t fragment the content.
  • Strategy doc: gather information about your legacy systems and list a prioritised backlog of use cases.
  • FAQs: document the questions that come up in office hours.
  • Define success metrics for the effort (how much each day of delay in the migration costs in $$$, the number of jobs migrated, the number in flight, etc.)
  • A responsibility matrix (who does what, contact info)
  • Decision bugs (an ordered list of decisions that need to be made to set the pace)
  • A dashboard that tracks the risks you identify and their mitigations
  • Migration blockers and escalation steps
  • Trackers for known issues in every software drop
  • Post mortem
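The success metrics in the list above can be computed from a simple per-job status map. This is a minimal sketch; the state labels, the `migration_metrics` helper, and the flat daily cost of delay are assumptions for illustration, since real cost-of-delay models are usually more involved.

```python
from collections import Counter
from typing import Dict


def migration_metrics(job_states: Dict[str, str], daily_cost_of_delay: float,
                      days_late: int) -> Dict[str, float]:
    """Summarise success metrics from per-job migration states.

    job_states: job name -> "not_started" | "in_flight" | "migrated"
                (hypothetical labels)
    daily_cost_of_delay: assumed $ cost of each extra day on the legacy system
    days_late: days past the target migration date (negative means early)
    """
    counts = Counter(job_states.values())
    total = len(job_states)
    return {
        "migrated": counts["migrated"],
        "in_flight": counts["in_flight"],
        "not_started": counts["not_started"],
        "percent_migrated": 100.0 * counts["migrated"] / total if total else 0.0,
        # Only count delay once the target date has actually passed.
        "cost_of_delay": daily_cost_of_delay * max(days_late, 0),
    }
```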

Step 5: Ensure your team is staffed well

A well-staffed core team of zebras. Wildebeests lean on them to make decisions during migrations, such as when to cross the river.
  • Your core software engineering team is responsible for making the migration happen: producing tools that automate the process and dashboards that reflect the ground truth data, and providing technical insight to mitigate risks for the majority of use cases.
  • The horizontal program should be led by a group of experienced technical program managers and an executive sponsor.
  • The extended team should have a technical point of contact or technical program manager in every impacted product area.
  • Don’t forget to ask for more headcount as complexity grows.

Step 6: Account for unplanned events

Wildebeests struggle with the decision to cross; crocodiles prepare for a feast!
Early integration tests, and test environments that exercise corner cases, come in handy.
To reduce risk, build an integration test suite into the release process very early for stability. A program manager has to lead this testing effort with great care to achieve stability in the migration effort.
Migrations have many unplanned moments to watch out for.
In the Serengeti, during the January to March phase, many calves are vulnerable to predators.

While risks are high, defer asking the most critical team in your company (e.g., a revenue-generating one) to onboard early. I remember one migration effort for an email backend in which a newly introduced script to move some executive accounts was delayed because it hit an error on a “barnacle” account with attachments (barnacle accounts being those that use non-company software). This impacted the many teams relying on the migration running on time.

Step 7: Mapping Critical User Paths

A herd of wildebeests ends up on the wrong part of a riverbed, unable to migrate. It is important to map all critical migration paths so that every team has a path forward and no one is left behind.
You have to watch the movie play many times before you can create a repeatable process.
  • Co-locate with a local team (call it Team A) whose migration wouldn’t negatively impact the company’s core revenue stream. As you work with Team A’s install base, you will map migration replacements for their use case; for example, Team A might put together a collection service to replace the legacy system that ran locally.
  • Understand the migration issues and requirements, and most of the wrinkles and critical paths. Escalate, and work with the core team to navigate technical issues.

Step 8: The early beginnings

100,000 beasts line up near the Mara River
  • Using whatever automated tools your team has, file bugs against the customers who need to migrate their jobs.
  • Be prepared: some groups may not have a real owner, and they may not feel any sense of urgency.
  • In such situations, schedule 1:1s and office hours and invite users.

All this will give you some lift. But the graph may still turn out linear (y = mx + c). Know that this can’t scale.
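One way to show why a linear graph can’t scale is to fit y = mx + c to the cumulative weekly count of migrated jobs and solve for the week the line reaches the total. The helpers `fit_line` and `projected_finish_week` are hypothetical, and a plain least-squares fit is an assumption about how you would estimate m and c.

```python
import math
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = m*x + c; returns (m, c).

    Needs at least two distinct x values, otherwise sxx is zero.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x


def projected_finish_week(weeks: List[float], migrated_so_far: List[float],
                          total_jobs: int) -> float:
    """Week at which the fitted straight line reaches `total_jobs`."""
    m, c = fit_line(weeks, migrated_so_far)
    if m <= 0:
        return math.inf  # no progress: the line never reaches the target
    return (total_jobs - c) / m
```

For example, with cumulative counts of 10, 20, 30, and 40 jobs over weeks 0 to 3 and 1,000 jobs in scope, the line reaches the total at week 99, nearly two years out. That is the gap an “Everest climb” has to close.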

Imagine now an executive comes to you and says:

“I really want to see an Everest climb in the migration graph, can you make it happen?”
It was a turning point for me. It made me think 10x.

Step 9: From Linear to Exponential Migration Progress

To get the Everest climb in the migration graph, here is what worked for me:
  • I defined success metrics based on the throughput gains in a cluster. I identified the top 25 jobs and measured the bandwidth of each. Triage the long tail of unassigned bugs.
  • Then identify the top 5 heavy hitters and provide them with special mentoring, support, and 1:1 office hours.
  • Have the tech lead give a dedicated downstream presentation to every product area (PA), and arrange Q&A hours.
  • Weekly communication (you can never go wrong with over-communication)
  • Triage all blockers, provide quick responses.
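The ranking behind these steps can be sketched in a few lines, assuming you already have a per-job bandwidth measurement. The `rank_heavy_hitters` helper is hypothetical, and its defaults of 25 and 5 simply mirror the numbers above. It also reports the share of total bandwidth the top jobs carry, which tells you whether they are a fair proxy for the whole.

```python
from typing import Dict


def rank_heavy_hitters(bandwidth_by_job: Dict[str, float], top_n: int = 25,
                       focus_n: int = 5) -> Dict[str, object]:
    """Split jobs into the top N by bandwidth, a focus group, and the long tail."""
    # Highest-bandwidth jobs first.
    ranked = sorted(bandwidth_by_job, key=bandwidth_by_job.get, reverse=True)
    total = sum(bandwidth_by_job.values())
    top = ranked[:top_n]
    top_share = sum(bandwidth_by_job[j] for j in top) / total if total else 0.0
    return {
        "top": top,                    # candidates for the success metric
        "focus": ranked[:focus_n],     # heavy hitters that get 1:1 support
        "top_share": top_share,        # fraction of payload the top N carry
        "long_tail": ranked[top_n:],   # everything else, handled by triage
    }
```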

Step 10: Refine Success Metrics & Monitor Progress

Defining success metrics around throughput gains and the migration progress of the top 25 heavy hitters was a game changer. The top 25 formed the majority of the payload and were an excellent proxy for gains in adoption of the new service. I had real-time dashboards in place that showed open vs. closed bugs and bandwidth gains on the old vs. new collection service.
Tracking heavy hitters payload (simulation only)
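Why the heavy hitters make such a good proxy is easiest to see by computing progress two ways: by job count and weighted by payload. This is a hypothetical sketch; the tuple layout and the function name are illustrative only.

```python
from typing import List, Tuple


def progress_by_count_and_payload(
        jobs: List[Tuple[str, float, bool]]) -> Tuple[float, float]:
    """Return (fraction of jobs migrated, fraction of payload migrated).

    jobs: (name, bandwidth, migrated) tuples; bandwidth is the payload weight.
    """
    total_bw = sum(bw for _, bw, _ in jobs)
    done_jobs = sum(1 for _, _, migrated in jobs if migrated)
    done_bw = sum(bw for _, bw, migrated in jobs if migrated)
    by_count = done_jobs / len(jobs) if jobs else 0.0
    by_payload = done_bw / total_bw if total_bw else 0.0
    return by_count, by_payload
```

For instance, if one job carrying 50 of 100 units of bandwidth migrates while three smaller jobs do not, progress is 25% by count but 50% by payload.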

When the top 5 heavy hitters moved, the track recovered from its high-risk phase. Next came the long-tail bugs, so I set dates after which the old support systems would no longer be available (a forcing function to make teams migrate). The long tail would eventually be “auto-converted” as part of the fleet-wide rollout by applying a few defaults in preparation for enabling the new cloud service.
Zebras play a critical role in helping the Chief Beest make a decision. Then the mad run begins.
  • Institute regular weekly updates
  • Tools that communicate system-level changes
  • A newsletter that highlights customer wins

Step 11: Managing the long tail

  • The vast majority of software may just work in the new environment once a few defaults are in place in code. A few edge cases will need hand-engineering, and shedding light on them early is essential.
  • Have regular triages of long tail unassigned bugs. Constant communication is crucial.
  • With flag flips, a schedule to default the long tail to the new path, and constant reminders, the long tail will eventually disappear.
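The flag-flip schedule reduces to one rule per job. A minimal sketch, assuming a single flip date for the long tail and time-boxed exceptions granted through the escalation process; `resolve_backend` and its parameters are hypothetical.

```python
from datetime import date
from typing import Dict, Set


def resolve_backend(job: str, today: date, flip_date: date,
                    migrated: Set[str], exceptions: Dict[str, date]) -> str:
    """Pick "new" or "legacy" for a job under a scheduled default flip.

    migrated:   jobs whose owners already moved them explicitly
    exceptions: approved exceptions, each mapped to the date it expires
    """
    if job in migrated:
        return "new"
    if today < flip_date:
        return "legacy"  # before the flip, the old default still applies
    expiry = exceptions.get(job)
    if expiry is not None and today < expiry:
        return "legacy"  # an approved exception is still in force
    return "new"  # after the flip, the long tail defaults to the new path
```

Giving every exception an expiry date keeps the forcing function intact: exceptions delay a job’s move but never exempt it permanently.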

Step 12: Move to large scale migration

One has to be careful before initiating the fleet-wide mad run. Why risk it all in one mad trip? Over 250,000 wildebeests lose their lives annually due to starvation, thirst, stampede, & predation!
  • Dashboards with leading and lagging indicators: have signals that feed into fleet-wide readiness per product area, per lead, per job, etc. These are super useful for every level of the org to chase its long tail and indicate readiness. If you need dashboard templates, please let me know.
  • A dashboard for core tracks (features being rolled out, what is available, which features are used, etc.)
  • Weekly progress meetings
Simulation of a dashboard you can use to track your migration
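A readiness rollup like the one described above might be sketched as follows. The 0.95 threshold, the `(product_area, migrated)` input shape, and both helper names are hypothetical. The migrated fraction here is a lagging indicator; a leading indicator such as open blockers per area could be rolled up the same way.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def readiness_by_area(jobs: Iterable[Tuple[str, bool]],
                      threshold: float = 0.95) -> Dict[str, Tuple[float, bool]]:
    """Roll per-job status up to (fraction migrated, ready?) per product area."""
    totals: Dict[str, int] = defaultdict(int)
    done: Dict[str, int] = defaultdict(int)
    for area, migrated in jobs:
        totals[area] += 1
        if migrated:
            done[area] += 1
    result = {}
    for area, total in totals.items():
        fraction = done[area] / total
        result[area] = (fraction, fraction >= threshold)
    return result


def fleet_ready(readiness: Dict[str, Tuple[float, bool]]) -> bool:
    """The fleet-wide run is a go only when every product area is ready."""
    return bool(readiness) and all(ready for _, ready in readiness.values())
```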

Every time we hit a glitch in our migration, our launch dates slipped and we had to revise the plan. This was not good. I raised awareness of the need for a new integration test suite and added it to the release process for stability. I also got a commitment from every product area to identify its top heavy hitters and to have at least one service complete its migration to the new code path. In addition, the hardware for this migration had to be pilot-tested separately. With all this, we had to do a total reset of the program.
“Every system has a bottleneck. Value cannot flow through the system any faster than it flows through that bottleneck. So the best way to improve the system flow is to improve the rate at which value flows through the bottleneck.” (Mary Poppendieck)
Cheetahs moving swiftly to catch prey
In the Serengeti, I watched these wildebeests struggle for hours to cross the water because of a pending decision: go or no-go. They come close to the river, then make a U-turn. The saga repeats.

The speed of decision making sets the pace for any org.
Till the next epic migration @ Serengeti

You can reach the author at sudhakar.ramakrishnan108@gmail.com. If you need help with any work product, template, or guidance, please sign up.
