GitHub README Analyzer
An experiment to algorithmically improve your GitHub README
Think you've got a great README?
We know writing a README can be a challenge. And how do you know what makes a README any good?
Instead of guessing, we tried to train a model using natural language processing and machine learning. This data science experiment uses the 10,000 most starred GitHub repositories across the 10 most popular programming languages.
Try it out: We'll analyze your README, score it, and provide recommendations for improvement across four categories.
Learn more about our data science approach to analyzing GitHub READMEs.
Your README Report Card
Your README's Overall Grade
Your overall score is the average of your README's header, code sample, text, and image scores. Each section provides insights and suggestions for improving the quality of your README relative to the 10,000 popular repositories we analyzed.
These grades are not definitive. Rather, they're the result of machine learning, and are provided on a "best effort" basis. We recognize that the model doesn't account for all the complexity and nuance a README has. Ultimately, you should use your own judgement about what to include, remove, and ignore. Inevitably, there will be results that don't make sense. tl;dr data science is hard.
Our model rests on a few assumptions:

- Popular repositories probably have a good, well-documented README
- Popular repositories have more stars than bad repositories
- Each programming language has unique characteristics
In general, we found a higher correlation between a README's quality and the specific headers and text used throughout. Conversely, we found a lower correlation between quality and the number of code samples or images in the README.
To correct for this, we removed from our model any repository whose README had zero images or zero code snippets, because these are helpful, additive features.
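That cleaning step can be sketched roughly like this (the field names `image_count` and `code_sample_count` are hypothetical, not the project's real schema):

```python
# Rough sketch of the data-cleaning step: drop any repository whose README
# has no images or no code snippets before training.
# The field names below are hypothetical, not the actual pipeline's schema.

def filter_repos(repos):
    """Keep only repos whose README has at least one image
    and at least one code snippet."""
    return [
        r for r in repos
        if r.get("image_count", 0) > 0 and r.get("code_sample_count", 0) > 0
    ]

sample = [
    {"name": "a", "image_count": 3, "code_sample_count": 5},
    {"name": "b", "image_count": 0, "code_sample_count": 2},  # no images: dropped
    {"name": "c", "image_count": 1, "code_sample_count": 0},  # no code: dropped
]
print([r["name"] for r in filter_repos(sample)])  # → ['a']
```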
How was this made?
Learn more about our data science approach to analyzing GitHub READMEs, and find the complete code sample in the Algorithmia Sample Apps repo here, which earned a B grade. 😎
Having clear section headers helps users quickly find what they're looking for. Our recommendations provide guidance on which sections you should consider adding, changing, or removing. In many cases, section headers come in multiple forms, such as Install, Installing, or Installation. In these situations, we simply pick one and recommend it. Feel free to pick your own flavor.
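One plausible way to canonicalize equivalent headers is a simple lookup table (a sketch only; the mapping below is illustrative, not the model's actual list):

```python
import re

# Illustrative mapping of equivalent section headers to one canonical form,
# mirroring the "pick one and recommend it" idea. Not the model's real list.
CANONICAL_HEADERS = {
    "install": "Installation",
    "installing": "Installation",
    "installation": "Installation",
    "usage": "Usage",
    "how to use": "Usage",
}

def normalize_header(header):
    """Map a README section header to a canonical form when one is known."""
    key = re.sub(r"[^a-z ]", "", header.lower()).strip()
    return CANONICAL_HEADERS.get(key, header.strip())

print(normalize_header("## Installing"))  # → 'Installation'
print(normalize_header("Quick Start"))    # → 'Quick Start' (unknown, unchanged)
```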
It should go without saying that code samples are extremely helpful. Many developers jump straight to code examples, rather than reading the documentation. Here, we attempt to make recommendations relative to the average number of code samples in the most popular repositories.
The average README has 8.33 code samples.
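A simple proxy for that count is the number of fenced code blocks in the Markdown source (a sketch under that assumption; the analyzer's actual parser may work differently):

```python
import re

def count_code_samples(readme_text):
    """Rough proxy: count Markdown fenced code blocks (pairs of ``` fences)."""
    fences = re.findall(r"^```", readme_text, flags=re.MULTILINE)
    return len(fences) // 2  # each sample opens and closes with a fence

readme = (
    "# Demo\n"
    "```python\nprint('hi')\n```\n"
    "Some text\n"
    "```\nmake install\n```\n"
)
print(count_code_samples(readme))  # → 2
```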
The text of your README is important, because it explains what your project is about, how it works, why you need it, and more. Your README should be readable, coherent, and clear.
Our model analyzes every word used throughout your README to suggest keywords commonly used in popular READMEs. For instance, if we recommend that you include a word like "globals," perhaps you should include a sentence or two describing the role of globals in your project.
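A bare-bones version of that comparison might look like the following document-frequency baseline (a sketch with made-up inputs; the real model is more involved):

```python
import re
from collections import Counter

def keyword_suggestions(your_readme, popular_readmes, top_n=5):
    """Suggest words that appear in many popular READMEs but not in yours.
    A simple document-frequency baseline, not the analyzer's actual model."""
    tokenize = lambda text: re.findall(r"[a-z]+", text.lower())
    doc_freq = Counter()
    for text in popular_readmes:
        doc_freq.update(set(tokenize(text)))  # count each word once per README
    yours = set(tokenize(your_readme))
    return [w for w, _ in doc_freq.most_common() if w not in yours][:top_n]

popular = ["install with pip", "install via npm", "usage examples"]
print(keyword_suggestions("my project", popular, top_n=3))  # 'install' ranks first
```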
Popular repositories on GitHub often have images or badges in their READMEs. The badges indicate things like continuous integration, build status, or package manager inclusion. Other types of images, such as screenshots or GIFs, can also be useful in conveying information about the output or how the code works.
The average README has 2.89 images.
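Counting images, badges included, can be approximated by matching Markdown image syntax (a sketch; HTML `<img>` tags would need separate handling):

```python
import re

def count_images(readme_text):
    """Count Markdown image tags like ![alt](url), which covers most badges."""
    return len(re.findall(r"!\[[^\]]*\]\([^)]+\)", readme_text))

readme = "![build](https://img.shields.io/badge.svg) ![demo](demo.gif)"
print(count_images(readme))  # → 2
```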