How to vendor one git repository into another


Discovering the git vendor extension.

Cross-post from my medium blog: https://medium.com/opsops/git-vendor-295db4bcec3a

I would like to introduce the proper way to handle vendoring of git repositories.

Vendoring is a way to integrate other people's work into your own. It's the opposite of "linking" against a third-party library. Instead of having that library as a dependency, the application uses the library as part of its own source code and keeps that code "inside" itself.

Normally, vendoring is done by language tooling: bundler, cargo, pip, etc. But sometimes you need to vendor something not covered by any existing toolset, or something multi-language, for which it's impossible to pick a single "core" language tool.

The solution for this situation is vendoring at the git level. You have your own git repository (I call it the "destination repo"), and you want to incorporate some other repository (I call it the "source repo") as a directory inside your destination repo.

The things you expect from a well-designed vendoring system (regardless of whether it is based on Git or not):


  • Visibility. You want to know that some code is vendored, meaning it wasn't written by the committer.
  • Provenance. You want to know where it came from. What was the repository someone integrated into your repo two years ago? And what version/commit was it?
  • Updatability. You want to be able to update that code when a bug gets fixed in the original repo (or a long-awaited feature lands). As a special case of updatability, you want to be able to delete the vendored code (and only it).
  • Repeatability. Vendoring shouldn't be an art; it should be a rigid, error-proof process. "Vendor foo into bar" should yield the same result no matter who does it.
  • Transportability. A person cloning your repository should be able to continue dealing with vendoring in exactly the same way as you did. That means all vendoring-related information should live in git and should be transferred on push/pull.
  • Governance. All vendored code should stay "as is" until someone updates it. You don't want unexpected (breaking) updates; moreover, you absolutely want to keep the vendored stuff available even if the source repo is no longer available.
  • Patchability. You want to be able to tweak vendored code and still be able to update it to a newer version. Preferably without conflicts, but at least with clear visibility of where those conflicts happened.

And, given the branching nature of Git, you want that system to be branch-friendly. If branch A has vendored code at version a1, and branch B has it at version b1, you want the vendored code to switch along every time you switch between A and B. Moreover, you want to be able to change version a1 to a2, and version b1 to b2, without worrying about the versions in the other branch.

… And you want to be able to vendor more than one external repository, so vendoring shouldn't be a one-time event per repo.

As you can see, it's a long list of requirements. I analyzed the other existing solutions before settling on the best one (git vendor).


Copy-paste

Copy-paste is such a vicious way to vendor anything that I have nothing good to say about it. You lose provenance, visibility and updatability. You don't lose transportability though, as there is no link to the old repo in the first place. Don't do vendoring like that.


Git-in-git

This is a stupid but somewhat working trick. Create a folder vendored_foobar in your repository, go into vendored_foobar and clone foobar there. Go back to the top level and commit all the changes you have.

Pros: simple to do, provides local provenance, governance and excellent patchability.

Cons: it's brittle and it does not survive a push (the nested .git folder is not included in your repo, so for external observers the vendored code is indistinguishable from your own). So you'll lose transportability, and provenance in the long run.


Submodules

The idea is that some folders of your git repository are managed by another git repository. It's the oldest "something" git has provided for this. Unfortunately, it's branch-unfriendly, and it lacks governance over vendored code. If the remote repo is gone, you can't use your submodules.

And don't forget how hard it is to clone such a repo.
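
To make the cloning pain concrete, here is a small sandbox sketch using only core git. Every name in it (source, dest, vendored, plain, full) is made up for the illustration, and the `protocol.file.allow=always` override is needed only because the stand-in source repo is a local path; a real remote URL would not need it.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; every path and name below is illustrative.
work=$(mktemp -d)
cd "$work"

# Identity for the demo commits, so no global git config is required.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in "source repo" with a single commit.
git init -q source
echo 'hello' > source/lib.txt
git -C source add lib.txt
git -C source commit -q -m 'source: initial commit'

# The "destination repo" that pulls the source in as a submodule.
git init -q dest
git -C dest commit -q --allow-empty -m 'dest: initial commit'
git -C dest -c protocol.file.allow=always submodule --quiet add "$work/source" vendored
git -C dest commit -q -m 'dest: add submodule'

# A plain clone only gets an empty placeholder directory...
git clone -q dest plain
plain_contents=$(ls -A plain/vendored)

# ...the vendored files appear only when submodules are requested explicitly,
# and only while the source repo is still reachable.
git -c protocol.file.allow=always clone -q --recurse-submodules dest full
```

After this runs, plain/vendored is empty while full/vendored contains lib.txt, which is exactly the trap a newcomer cloning the repo falls into.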


git subtree

Git can use the "subtree" way of merging external gits as folders into the local git repository. It's almost perfect, except for updatability, repeatability, provenance and visibility. Short of manually digging through the git history, it's impossible to see which part of the repo is vendored and which is not. And you have no idea where those changes came from. If the committer hasn't written it down, the information is lost. And if they have, it's not repeatable, as minor variations can creep in when that information is filled in by hand.
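
The provenance gap can be seen in a sandbox. This is a sketch under assumptions: it needs the git subtree command to be installed (it ships in git's contrib area and some distributions package it separately), and the repository names foobar and dest are invented for the example.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; every path and name below is illustrative.
work=$(mktemp -d)
cd "$work"

# Identity for the demo commits, so no global git config is required.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in "source repo" with a single commit.
git init -q foobar
echo 'hello' > foobar/lib.txt
git -C foobar add lib.txt
git -C foobar commit -q -m 'foobar: initial commit'
branch=$(git -C foobar symbolic-ref --short HEAD)

# The "destination repo"; git subtree needs an existing commit to merge into.
git init -q dest
cd dest
git commit -q --allow-empty -m 'dest: initial commit'

# Vendor foobar under vendor/foobar, squashing its history into one commit.
git subtree add --prefix=vendor/foobar --squash "$work/foobar" "$branch"

# The vendored files are now ordinary tracked files of the destination repo.
tracked=$(git ls-files vendor/foobar)

# Count how many commit-message lines mention where the code came from.
url_mentions=$(git log --format=%B | grep -c -F "$work/foobar" || true)
```

The commit messages record the prefix and the source commit hash, but the location of the source repository is not mentioned anywhere, so url_mentions ends up at zero unless someone types the URL into a message by hand.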

So, enter the prize winner, git vendor.

Git vendor is an amazing extension for git written by Brett Langdon about three years ago. It's only around 200 lines of bash, but it's so well written that I have no complaints about it at all (it has everything a good program should have: manual pages, help, bash completion, reasonable error handling and failsafe guards).

It uses git subtree and extends it with functions that cover the loose ends of vendoring with git subtree.

Every important point is checked:


  • visibility. Just call git vendor list and see all the vendored stuff.
  • provenance. It shows the remote repo and lets you see which commit was vendored.
  • updatability. There is git vendor update, and it supports branches, tags and commits as ways to pinpoint exactly what to take. And you can do git vendor remove, of course.
  • repeatability. There are no manual operations involved, so everyone will get the same result on the initial vendoring or on later updates.
  • transportability. All changes are stored as special tags in the git history, so they are completely push/pull-friendly. And they work great with branches and arbitrary history checkouts.
  • governance. All vendored code is stored inside your repo.
  • patchability. It is there (you will see clear merge conflicts with your changes), but it's the weakest side of git vendor. I would prefer to have a "patch queue" (like debian/patches for deb packages), but nevertheless, there is minimal support for that.
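
The commands behind this checklist can be sketched as follows. This is an illustration under assumptions, not a transcript: it assumes git-vendor is installed and on PATH, that you are inside a non-empty git repository with network access, that the subcommands take the arguments described in the project's README (a local name, a repository URL and an optional ref), and that the source repo has a branch called master. The local name dibctl and the wrapper function vendor_dibctl are arbitrary; the wrapper only keeps anything from running until you call it.

```shell
#!/bin/sh
# Sketch only; see the assumptions listed above. Nothing runs until
# vendor_dibctl is called from inside a non-empty git repository.

vendor_dibctl() {
    # Vendor the source repo at a given ref (branch, tag or commit).
    # "master" is an assumption about the source repo's branch name.
    git vendor add dibctl https://github.com/serverscom/dibctl master

    # Visibility and provenance: list what is vendored, from where, at which ref.
    git vendor list

    # Updatability: pull in a newer state of the chosen ref.
    git vendor update dibctl master

    # The special case of updatability: drop the vendored code, and only it.
    git vendor remove dibctl
}
```

Check git vendor --help for the exact options before relying on any of this.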

It has a specific policy on where stuff is vendored: if you want to vendor https://github.com/serverscom/dibctl, it goes into vendor/github.com/serverscom/dibctl/. You can change the "vendor/" part, but the rest is hard policy. Symlinks can ease that, though.
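
The naming policy is easy to predict. The helper below is a hypothetical function of my own that mimics the mapping described above for plain https-style URLs; it is not git vendor's actual code, and the name vendor_dir and its optional second argument are invented for the illustration.

```shell
#!/bin/sh
# Hypothetical helper: predict the vendoring directory for a repository URL.
# Usage: vendor_dir <url> [<prefix>]   (prefix defaults to "vendor")
vendor_dir() {
    prefix=${2:-vendor}
    # Drop the "scheme://" part, a trailing slash and a trailing ".git".
    path=$(printf '%s\n' "$1" | sed -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||' -e 's|/$||' -e 's|\.git$||')
    printf '%s/%s/\n' "$prefix" "$path"
}

vendor_dir https://github.com/serverscom/dibctl
# prints: vendor/github.com/serverscom/dibctl/

# A symlink gives the long path a shorter alias, for example:
#   ln -s vendor/github.com/serverscom/dibctl dibctl
```

The symlink is an ordinary tracked file, so it travels with the repo like everything else.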

There are a few minor bugs: you can't use it on empty repos, you can't use local gits as sources, and you can't see the help unless you are inside a git repo. None of them cause problems during normal work with real repos.

git-vendor is a perfect tool to vendor one git repo into another. It provides all required functionality for the best practices of vendoring: keeping provenance, providing visibility and updatability.

© Habrahabr.ru