"GitHub OSS Governance File Dataset", Yen et al., 2023

❓ How many #github repos have a governance.md?

➡️ ~1,600,000 🐘

❓ Of those, how many have the governance.md in their root directory? (i.e. excluding copies that belong to dependencies)

➡️ 1,899 🐭

❓ Of those, how many have at least one issue/commit? (i.e. 'significant')

➡️ 710 👀

https://arxiv.org/abs/2304.00460 #gov #governance
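The filtering funnel above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' scripts (those are in the Zenodo record): the `SearchHit` record and its fields are hypothetical, and reading "at least one issue/commit" as "at least one issue and at least one commit" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchHit:
    """One hypothetical search result: a governance file found in a repo."""
    repo: str     # "owner/name"
    path: str     # file path relative to the repository root
    issues: int   # hypothetical issue count for the repo
    commits: int  # hypothetical commit count for the repo


def in_root(hit: SearchHit) -> bool:
    # A root-level file has no directory component. Compare case-insensitively
    # so GOVERNANCE.md, governance.md and Governance.MD all match.
    return hit.path.lower() == "governance.md"


def is_significant(hit: SearchHit) -> bool:
    # Assumption: "at least one issue/commit" means at least one of each.
    # Change `and` to `or` for the looser reading.
    return hit.issues >= 1 and hit.commits >= 1


def funnel(hits: list[SearchHit]) -> tuple[int, int, int]:
    """Count distinct repos at each stage: any hit, root-level, significant."""
    root = [h for h in hits if in_root(h)]
    significant = [h for h in root if is_significant(h)]
    return (
        len({h.repo for h in hits}),
        len({h.repo for h in root}),
        len({h.repo for h in significant}),
    )


hits = [
    SearchHit("a/app", "node_modules/lib/GOVERNANCE.md", 5, 40),  # dependency copy
    SearchHit("b/proj", "GOVERNANCE.md", 3, 120),                  # kept
    SearchHit("c/empty", "governance.md", 0, 1),                   # no issues
]
print(funnel(hits))  # (3, 2, 1)
```

With the thread's numbers, only about 0.12% of the ~1.6 million results (1,899) survive the root-directory filter, and about 37% of those (710) pass the activity filter.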
The authors note two limitations:

1) GitHub's search API is non-deterministic and doesn't return the full set of results

2) There are many other places projects may keep their governance docs (e.g. in the README, under another filename, or not on GitHub at all...)
Screenshot from the paper supporting the post; the text reads:

Limitations: We note that this dataset has two major limitations regarding completeness and potential bias.

Completeness: 1) Because GitHub’s search API doesn’t return a deterministic and full set of results, the dataset is not a complete set of all GitHub-hosted repositories with a governance file. 2) As we only collected projects which contain the GOVERNANCE.MD file in the root directory, some GitHub-hosted projects are missing from our dataset as they might organize and store their governance files differently. For example, some projects put their governance files directly in the readme file.
In other words: a hell of a lot more than 710 GitHub projects have governance documents.

The 600 MB dataset and the scripts they used are here: https://zenodo.org/records/7530768