Why we need to read source code, how to read it, and how to learn as much as possible from it.
I was talking to a younger programmer last week, when he asked me:
“How do I read code?”
We discussed it for some time, and I described a few ways to read source code more efficiently.
“You should do an article about this,” he suggested then.
“It’ll be helpful to beginners. This isn’t the kind of thing you pick up from books or tutorials.”
So here we go. These are my tips for learning from source code.
Why we need to read code
As programmers, we deal with source code every day. After years of study, most programmers can “write” code, or at least copy and modify it.
Still, the way we teach programming emphasizes the art of writing code, not how to read it. When I say “read code” I refer to the practice of reading source code on purpose.
As we know, programming and writing have much in common. Donald Knuth even introduced the programming paradigm of literate programming. Coding and writing share the same ideal: to express our ideas.
Remember how you learned to write at school? Our ability to write comes from having read large volumes of text, from primary school onward. Over the years, we read the works of great writers at differing levels of difficulty and practice various writing techniques.
“If you don’t have time to read, you don’t have the time (or the tools) to write. Simple as that.”
– Stephen King in his memoir, On Writing.
As Stephen King observed, a writer must read widely and frequently to develop their own voice, and learn how to pen sentences and structure stories in ways that compel readers to pick up their work and read it.
Just like reading books, reading code on purpose helps programmers grow much more quickly, especially intermediate programmers.
There are three benefits to reading code intentionally.
To stand on the shoulders of giants
We get to learn from others. Great source code is like a literary masterpiece. It offers enlightenment, not just information and knowledge.
By browsing the source code of the Linux Kernel, Redis, Nginx, Rails or any other famous project, you draw on the wisdom of thousands of top-level programmers all over the world.
There are countless examples of good programming, paradigm choices, designs and architecture to find in these projects. An added benefit of learning from others is the ability to avoid common pitfalls. Most mistakes have already been made by others.
To solve hard problems
Throughout your programming career, you will eventually encounter problems that you can’t solve by googling. If you haven’t met this kind of problem, it just means you haven’t programmed for long enough :). Reading source code is a good way to investigate this kind of problem, and a very good opportunity to learn something new.
To expand your limits
Most programmers only code in a few specific domains. Generally speaking, if you don’t push yourself constantly, your programming skills will settle at the average of your colleagues. Don’t be satisfied with fixing some bugs or adding some trivial features to an existing system. Instead, try to expand into a new area: always look for a domain you haven’t touched in your day-to-day work but that interests you. This will broaden your understanding of coding as a whole.
What kind of source code to read
Ok, so there are benefits to reading source code. The next problem is, with so many great works to choose from, what kind of source code should we read?
You have to start out by choosing a target. Without that focus, your attempts at understanding the source will be less effective.
Here are a few typical scenarios:
- When you want to learn a new programming language. Learning a new programming language doesn’t just mean learning the syntax. When taking on a new language, reading source code is a very efficient learning method. I learned a lot about Rust from the project rust-rosetta. Rosetta Code is a project which collects solutions to common tasks in various programming languages. It’s a useful resource for picking up a new programming language.
- When you want to understand a specific algorithm or implementation. For instance, we’ve all used the sort function from the standard library. Have you ever wondered how it’s implemented? Or say you need to use the Set data structure in Redis: which data structures are used in its implementation? For this purpose you only need to look through the part of a project related to the implementation, which is typically a few files or functions.
- When you code in a specific framework. Since this means you have some experience with the framework in question, it’s a good time to read some parts of the source code of the framework itself. Obviously, knowing its source code will improve your understanding.
- When you want to branch out into a new field. Read the classic, well-known projects of that field. For instance, say you are doing Web development: do distributed systems intrigue you? If so, etcd may be a good choice if you know Golang. Do you want to delve into the internals of an operating system? Then xv6 may be a good start. We live in a great time, with many great open-source projects on GitHub. Try to find a few.
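For the standard library scenario above, some languages let you pull up an implementation without leaving the interpreter. The following is a minimal sketch in Python, which is an assumption since the article does not tie itself to a language. The helper name show_source is made up for illustration. Python's built-in sorted is implemented in C, so the sketch falls back to a note for it and uses heapq.nlargest as an example of a function written in Python.

```python
import heapq
import inspect


def show_source(func, max_lines=15):
    """Return the first max_lines lines of func's source, or a note if unavailable."""
    try:
        source = inspect.getsource(func)
    except (OSError, TypeError):
        # Built-ins implemented in C (such as sorted) have no Python source to show.
        return f"<no Python source available for {getattr(func, '__name__', func)!r}>"
    return "\n".join(source.splitlines()[:max_lines])


if __name__ == "__main__":
    print(inspect.getsourcefile(heapq))  # where the module lives on disk
    print(show_source(heapq.nlargest))   # implemented in Python, so source is shown
    print(show_source(sorted))           # C built-in: falls back to the note
```

When the fallback note appears, the implementation lives in the interpreter's own source tree, which is exactly the "few files or functions" kind of reading target described above.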
Remember, choose projects according to your current programming skills and knowledge level. If you choose a project too far above your current skill level, you’ll end up feeling dejected. Read some smaller projects, then move on to larger ones.
If you can’t understand some specific piece of code at a given time, this means you have a knowledge gap. Put the code away and try reading some books, papers or other related documents, then come back when you have more confidence.
We always make progress in a pattern: reading (code, books, papers), writing, reading more and writing more.
How to read source code
How to Read a Book is a guide to intelligent reading. For beginners, learning how to read code is likewise worth investing time and effort in. Reading code is not easy. It is not enough to simply scan the source; you are trying to understand the design and thinking of others.
To read code efficiently, you need to have a few things prepared and on hand:
- An editor you can use effectively. You will need the ability to quickly search for keywords or variables. Sometimes you need to find the references or the definition of a function. Get comfortable with your editor. To become more effective, learn to use it with just the keyboard. This will let you focus on the code without interruptions.
- Basic skills in Git or similar version control tools, so you can compare diffs between versions.
- Documents related to the source. These will serve as references for your reading, especially the design documents, the code conventions, and so on.
- Some knowledge and experience with the programming languages and design patterns in use. This is mandatory for large projects. If you know a programming language well, you will know how the source code is organized and what the paradigms and best practices are. Of course, this takes time to accumulate. Be patient.
Process and tips
The reading process is not linear. You can’t just read source files one by one. Instead, most of the time we read code top-down, starting from the overall structure and moving into the details. Here are some tips for reading code more efficiently:
1. Read code with context
When you start to read code, always try to ask questions.
For instance, if an application has a cache strategy, a good question is: what happens when a key is invalidated, and how are values in the cache updated?
With this kind of question, you are reading code in context, or with a goal in mind, which makes the reading process enjoyable. You can even form some assumptions of your own. Then, with the code in hand, you confirm or reject them.
It becomes a bit like spying: you want to discover the truth about the code, the logic of the code, how it flows like a story.
2. Run and interact with the code
Source code is like a LEGO kit, only already assembled. If you want to understand how it’s put together, you need to interact with it, sometimes even pick it apart. With code, it’s helpful to read older versions of the same source. Read the diffs from Git, and try to figure out how a specific feature is implemented (changelogs are useful for this). For example, I found the first version of Lua much simpler, which helped me understand the original design ideas of its author.
Debugging is another way to play with code. Try adding some breakpoints (or print points) to the code, and try to understand any output printed to the console.
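As one way to add "print points" without editing a function's body, here is a small sketch in Python (the language is an assumption, and trace and parse_port are hypothetical names standing in for your own helper and the code you are studying). It wraps a function so that every call prints its arguments and return value.

```python
import functools


def trace(func):
    """Print each call to func with its arguments and its return value."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"-> {func.__name__} args={args} kwargs={kwargs}")
        result = func(*args, **kwargs)
        print(f"<- {func.__name__} returned {result!r}")
        return result
    return wrapper


@trace
def parse_port(text):
    # Hypothetical function standing in for the code under study.
    return int(text.strip())


if __name__ == "__main__":
    parse_port(" 8080 ")
```

For a real breakpoint, Python's built-in breakpoint() drops into the debugger at the line where it is called, which serves the same purpose interactively.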
Once you understand enough of the code, try to make some modifications, then rebuild and run it. The simplest technique is to adjust the configuration and see how the results change. After that, you can try adding some trivial features. If the result is useful to others, you should contribute it upstream.
3. The relationship between data structures
“Bad programmers worry about the code. Good programmers worry about data structures and their relationships.”
– Linus Torvalds
Data structures are the most important elements of a program. Draw the relationships between data structures with a pen or any tool you prefer. The result is a map of the source code. You’ll need to refer to this map frequently in the reading process. Some tools, such as scitools, can be used to generate a UML class diagram.
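A rough first draft of such a map can sometimes be produced mechanically. The sketch below assumes the code being read is Python and uses the standard ast module. The SAMPLE source and the function name class_references are made up for illustration. It records, for each top-level class, which other top-level classes are named in its body or base list.

```python
import ast

# Hypothetical source under study.
SAMPLE = '''
class Node:
    def __init__(self, value):
        self.value = value

class LinkedList:
    def __init__(self):
        self.head = None

    def push(self, value):
        node = Node(value)
        node.next = self.head
        self.head = node

class Stack(LinkedList):
    pass
'''


def class_references(source):
    """Map each top-level class to the other top-level classes it mentions."""
    tree = ast.parse(source)
    classes = [n for n in tree.body if isinstance(n, ast.ClassDef)]
    names = {c.name for c in classes}
    graph = {}
    for cls in classes:
        # ast.walk visits base classes as well as everything in the class body.
        used = {
            node.id
            for node in ast.walk(cls)
            if isinstance(node, ast.Name) and node.id in names
        }
        graph[cls.name] = sorted(used - {cls.name})
    return graph


if __name__ == "__main__":
    for name, refs in class_references(SAMPLE).items():
        print(name, "->", refs)
```

This only catches direct name references, so it is a starting point for the hand-drawn map rather than a replacement for it.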
4. The module dependency and boundary
Big projects contain multiple modules, and typically each module has a single responsibility. This reduces the complexity of the code and keeps each abstraction at a proper level. The interface of a module is its abstraction boundary, so we can read one module and then move on to another. If you are reading a C/C++ project that builds with Make, the Makefile is a good entry point for understanding how the modules are organized.
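The Makefile plays that role for C/C++. For a Python project, which is an assumption made here purely for illustration, the import statements give a similar picture of which module depends on which. A minimal sketch using the standard ast module, with imported_modules as a hypothetical helper name:

```python
import ast


def imported_modules(source):
    """Return the sorted module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # "from . import x" has no module name and is skipped here.
            modules.add(node.module)
    return sorted(modules)


if __name__ == "__main__":
    example = "import os\nfrom collections import deque\nimport os.path\n"
    print(imported_modules(example))
```

Running this over each file in a package and noting the results side by side gives a quick dependency sketch to read alongside the code.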
The boundary itself is also useful. Great code is well organized, with variables and functions named in a style meant to be readable. You don’t need to read all the source files; you can ignore the unimportant or familiar parts. If you’ve confirmed that a module is just designed for parsing, you already roughly know what it does and can skip reading it. This, of course, saves a great deal of time.
5. Use the test cases
Test cases are also a very good supplement for understanding code. Test cases are documentation. If you read a class, try to read the related test code. That lets you figure out the interface of the class and its typical usage. An integration test is also useful for debugging code with some specific input, which lets you follow the overall flow of the program.
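To illustrate how a test doubles as documentation, here is a small sketch in Python with the standard unittest module. Counter is a hypothetical class invented for this example, not taken from any real project. Reading only CounterTest already tells you the interface (add and get), the default for a missing key, and that amounts accumulate.

```python
import unittest


class Counter:
    """Hypothetical class under study."""

    def __init__(self):
        self._counts = {}

    def add(self, key, amount=1):
        self._counts[key] = self._counts.get(key, 0) + amount

    def get(self, key):
        return self._counts.get(key, 0)


class CounterTest(unittest.TestCase):
    def test_missing_key_counts_as_zero(self):
        self.assertEqual(Counter().get("a"), 0)

    def test_add_accumulates(self):
        counter = Counter()
        counter.add("a")
        counter.add("a", amount=2)
        self.assertEqual(counter.get("a"), 3)
```

Saved to a file, tests like these can be run with python -m unittest, and stepping through one of them in a debugger is an easy way to follow a single path through the class.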
After spending a long time reading a project, why not write a review of the code? It’s like reviewing a book. You can write down the good or bad parts of this source code and what you’ve learned from reading it. Writing this kind of article will clarify your understanding, and also help others with source code reading.
Some good books
I find code reading is a far more extensive topic than I thought. There is no real systematic way to train this skill. In short, keep practicing to find your own way. These are some good books for improving your code reading ability:
Design Patterns: Elements of Reusable Object-Oriented Software
Clean Architecture: A Craftsman’s Guide to Software Structure and Design
How to Read a Book: The Classic Guide to Intelligent Reading
Aha, this book is also useful for a programmer.