How NOT to Migrate from Svn to Git
OK, where was I? Oh yeah, I was telling the story about version control in the LANDIS-II community. I had described how the LANDIS-II source code started on a Subversion server at FLEL, and eventually ended up in a Subversion repository hosted by Google Code in 2011.
So, last year, in March, Google announced its plans to shut down its project hosting service. I relayed this news to the L-II developers group, which started a community discussion about where the L-II code and its provenance should migrate to.
Given all the problems with the migration to Google Code, I was really concerned about how the migration away from it would be managed. I had hoped that the mistakes from the first migration would motivate Rob to manage this second one better. Unfortunately, my concerns turned out to be well founded.
The developer community put forth a couple of options for the new home for L-II’s source code – CodePlex and GitHub. SourceForge was also a viable option at that time.[1] But it was pretty clear that GitHub was the leading candidate, especially given its increasing prominence in the version control sector. And that’s where LANDIS-II now resides. That is clearly the best home for it, even though, as I pointed out, the new location required a significant change for our developer community: we had to adapt our workflow and mindset from Subversion’s centralized paradigm of version control to Git’s distributed paradigm.
So there’s no problem with where our code base was migrated to. The problems have to do with how it was migrated.
Insufficient Transparency and Documentation
I expected a more thorough and transparent decision-making process. But after my post about the implications of switching from Subversion to Git, there was no more discussion for two weeks. And then Rob simply announced the decision to migrate to GitHub.
It’s not clear how that decision was reached – how the options were evaluated and compared. In his announcement, he said there were many reasons, but he didn’t list them. The rationale behind technical decisions of this magnitude really needs to be documented for the community – for its current members as well as future ones.[2]
Furthermore, no specific plans were posted about how the migration would be done. Rob wrote that “Google has made such a transition very easy for repository managers”. But if it was so easy, then how did it get screwed up? How was a version-control migration poorly managed yet again?
Failure to use GitHub Importer
GitHub actually provides a tool – the GitHub Importer – for importing source code hosted in other version control systems. This tool can import a single project located inside a multi-project Subversion repository into its own separate Git repository. Alex and I demonstrated that it worked with three different types of LANDIS-II components inside the Subversion repository at Google Code:
- the Model Core
- the LANDIS-II Software Development Kit (SDK) [3], and
- the Land Use extension [4].
The GitHub Importer converts the different entities in a Subversion project into their appropriate Git counterparts:
- the project’s Subversion trunk folder into the Git master branch,
- its Subversion branch folders into Git branches, and
- its Subversion tag folders into Git tags.
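To make that mapping concrete, here is a minimal Python sketch of the trunk/branches/tags convention described above. It is only an illustration of the rule, not the GitHub Importer's actual code; the function name and the example paths are hypothetical.

```python
from typing import Optional, Tuple


def svn_path_to_git_ref(svn_path: str) -> Optional[Tuple[str, str]]:
    """Map a path inside ONE Subversion project to a (kind, name) Git ref.

    Assumes the standard Subversion layout (trunk/, branches/<name>/,
    tags/<name>/). Returns None for paths outside that layout.
    """
    parts = [p for p in svn_path.strip("/").split("/") if p]
    if not parts:
        return None
    if parts[0] == "trunk":
        # The trunk folder becomes the master branch.
        return ("branch", "master")
    if parts[0] == "branches" and len(parts) >= 2:
        # Each folder directly under branches/ becomes its own Git branch.
        return ("branch", parts[1])
    if parts[0] == "tags" and len(parts) >= 2:
        # Each folder directly under tags/ becomes a Git tag.
        return ("tag", parts[1])
    return None


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for path in ["trunk/src/Core.cs", "branches/v6.1/src", "tags/release-6.0"]:
        print(path, "->", svn_path_to_git_ref(path))
```

The point of the sketch is that the three folders are metadata about a project, not part of its content, so after a proper conversion none of them should survive as a directory in the Git working tree.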
I can’t think of a reason why this tool shouldn’t have been used. Unfortunately, because it wasn’t, the LANDIS-II Git repositories have various problems.
Crippled Repositories
These problems include:
- Multiple projects shoe-horned into a single Git repository.
  - For example, the Extensions-Output repository contains 10 projects, which Brian M quickly discovered makes it impossible to use Git branching on just one of those projects.
- Even Git repositories with just a single project still have the Subversion folder structure.
  - For example, the Century Succession and Biomass Insect extensions, as well as the Biomass Cohort library, each have folders called “trunk”, “branches” and “tags”.
- A component’s provenance (i.e., its revision history) was imported improperly, or not at all.
  - For example, the Century Succession repository only has commits going back to last July when the migration was done. None of its revisions in Google Code were imported. So, just like the first migration, its provenance was severed once again.
  - At the other end of the spectrum, the Core-Model repository contains two projects: the model core and the SDK. Yet it contains all 3,813 revisions in the entire Google Code repository for all the LANDIS-II components! So its repository contains thousands of empty commits – commits with 0 insertions and 0 deletions – that have nothing to do with either project.
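For anyone who wants to check a repository for the last symptom, the following Python sketch shows one way to find empty commits. It assumes that "empty" means a commit whose tree is identical to that of its single parent, which is my working definition here and may not be exactly how the counts above were produced. The function names are hypothetical; the `git log` placeholders `%H`, `%T` and `%P` are the standard ones for commit hash, tree hash and parent hashes.

```python
import subprocess
from typing import Dict, List

# %H = commit hash, %T = tree hash, %P = parent hashes (space-separated)
LOG_FORMAT = "%H %T %P"


def read_log(repo_dir: str) -> List[str]:
    """Return one 'commit tree parents...' line per commit in repo_dir."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "--all", f"--format={LOG_FORMAT}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


def find_empty_commits(log_lines: List[str]) -> List[str]:
    """Return hashes of commits whose tree equals their single parent's tree.

    Assumption: root commits and merge commits are skipped, since they
    have zero or several parents and need a different comparison.
    """
    tree_of: Dict[str, str] = {}
    parents_of: Dict[str, List[str]] = {}
    for line in log_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        commit, tree, parents = fields[0], fields[1], fields[2:]
        tree_of[commit] = tree
        parents_of[commit] = parents

    empty = []
    for commit, parents in parents_of.items():
        if len(parents) == 1 and tree_of.get(parents[0]) == tree_of[commit]:
            empty.append(commit)
    return empty
```

Calling `find_empty_commits(read_log("path/to/clone"))` on a local clone (the path is a placeholder) lists the commits that changed nothing. The parsing step is kept separate from the `git` call so it can be tried on a few hand-written lines first.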
On the same day that Rob announced the migration was complete, Brian M posted about the problems he had trying to branch a single project in the Extensions-Output repository. Alex responded the following day with a good, detailed explanation of the difference between Subversion and Git branching. He also noted that his colleagues at Notre Dame were struggling with the switch from Subversion to Git. So the issue that I had mentioned back at the end of March last year – the cognitive leap from Subversion’s centralized paradigm to Git’s distributed paradigm – can be challenging even for professional developers used to the former.
I followed up on Alex’s post that same day, describing our success with the GitHub Importer. I posted again the next day, trying to re-emphasize the critical need to import each LANDIS-II component into its own Git repository. But my words went unheeded. In fact, they were not only ignored… they were deleted.
Footnotes
[1] I learned my lessons about preserving provenance from the first migration. I used SourceForge’s import service to archive the LANDIS-II project. That service not only imported all the revision history from Google Code, but also the wiki pages and issues (tickets).
[3] Alex has since deleted the SDK repository that he imported from Google Code last year. So I just re-ran GitHub Importer today using a URL inside the LANDIS-II archive at SourceForge.
[4] The link points to my personal repository for the Land Use Extension that I imported from Google Code last year. Alex also imported the extension from Google Code, and then transferred the new Land Use Git repository to the LANDIS-II Foundation.