Why PhD Students Should Learn Programming

In Lesson 2, I wrote about habits. In Lesson 3, I focused on one of them: continuous learning. In Lesson 4, I zoom in on one skill I developed through continuous learning because I consider it crucial for PhD students in science: programming.

When I started my PhD, I was funded through a 4-year DFG project with a detailed proposal: hypotheses, methods, datasets, and four work packages. I loved the plan. It looked clear, structured, and straightforward.

Four years later, the funding ended, and the Max Planck Institute generously gave me one more year to wrap up. How many of the original work packages had I finished?

Zero.

Not because I was unproductive, but because a PhD is highly unpredictable. During a PhD, you move into territory no one has fully entered before. Datasets disappoint, methods fail, hypotheses change, reviewers ask unexpected questions, and side projects become main projects.

And when the plan changes, the code usually has to change too.

This is why software engineering skills are so important. Good software engineering means building code with future changes in mind. The goal is not only code that works today, but code that remains useful when your plans change tomorrow.

I did not learn programming only by struggling through my analyses, although there was plenty of that too. I also learned it systematically. During my continuous learning time, I spent 30 minutes every morning taking programming courses, reading about software engineering, and practicing concepts that were not immediately required for my current project. This mattered because learning-by-doing alone often teaches us only how to solve the urgent problem in front of us. Dedicated learning time helped me build a deeper foundation.

Many scientists think research code only needs to be correct. But correctness is not independent of code quality. Readable, modular, documented, and concise code makes errors easier to find. Messy code is therefore not just ugly. In science, it can make the difference between a trustworthy analysis and an unnoticed mistake.
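To make this concrete, here is a minimal sketch in Python of what I mean by modular and documented. It is not code from my PhD: the function names, the z-score rule, and the default threshold of 3.0 are assumptions chosen purely for illustration.

```python
from statistics import mean, stdev


def zscores(values):
    """Return the z-score of each value, using the sample mean and sample standard deviation.

    Assumes at least two values and a non-zero spread (illustrative sketch only).
    """
    mu = mean(values)
    sigma = stdev(values)
    return [(v - mu) / sigma for v in values]


def remove_outliers(values, max_abs_z=3.0):
    """Keep only the values whose absolute z-score is at most max_abs_z.

    The threshold is a named parameter instead of a number buried in the code,
    so a reader (or reviewer) can see and question the decision.
    """
    return [v for v, z in zip(values, zscores(values)) if abs(z) <= max_abs_z]
```

Because each step is a small function with one job, each can be read and checked on its own, and a changed rejection rule touches one line instead of a whole script.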

This also changed how I think about research transparency. In my opinion, scientific journals should not accept papers for publication unless the underlying analysis code is shared. No scientific journal would publish a paper without a detailed methods section. But no methods section can be as precise as the analysis code.

Many researchers are afraid of publishing their code because it may look messy or reveal an error. To help with this, I founded Code Clinics at the MPI in Leipzig and the CRC “ReTune”. Before publication, researchers could receive external code review from someone outside their project. The reviewer checked the analysis code, suggested improvements, and became a coauthor for this important and time-intensive work.

The idea of implementing scientific code reviews received the “Best Presentation: Procedure Idea” award from the Department of Neurology at the MPI in Leipzig.

I experienced this process from both sides. Reviewing code helped me learn from real research code. Having my own code reviewed gave me confidence before publication and provided personalized feedback that made me a better programmer. It was a genuine win-win: better code for the project, better training for the researcher, and more trustworthy science.

In my latest PhD paper, the external code reviewer was included as a coauthor for reviewing my analysis code.

Now there is one obvious question: why should PhD students still learn programming in 2026, when AI can already write code?

I think this is a valid concern, but my answer is clear: AI makes software engineering skills more important, not less. Not learning to program because AI can code is like not learning to write well because AI can write. The goal of learning to write well is not only to produce text. It is to sharpen our thinking. Programming is similar.

We all know the situation where our supervisor suggests a new analysis. We hear it, we understand it, and we believe the task is crystal clear. But when we sit down and start writing the analysis code, we suddenly realize how vague the idea still is and how many decisions are required to implement it. Programming forces us to turn that vague idea into precise instructions.

This is why PhD students usually have a much deeper understanding of their dataset than their supervisors. They have implemented the analyses, handled the exceptions, debugged the edge cases, and made the countless small decisions that turn an idea into a result.

In the era of AI, it is tempting to let an agent implement your vague idea. But without software engineering skills, you cannot fully understand all the decisions the agent makes during implementation. In science, these decisions can be crucial.

With AI, the bottleneck is no longer writing code. The bottleneck becomes specifying what the code should do, checking whether it does it correctly, testing it, and integrating it into a larger research workflow. All of this requires software engineering skills. Believing that you can solve all your software problems with AI without understanding programming is like believing you can lead a team of software engineers without understanding software.
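As a small illustration of specifying and checking, here is a sketch in Python. The function `baseline_correct`, its parameter `n_baseline`, and the numbers in the test are hypothetical examples, not part of any real analysis. The point is that the test states what the code should do, whether a person or an AI agent wrote the function.

```python
def baseline_correct(signal, n_baseline):
    """Subtract the mean of the first n_baseline samples from every sample."""
    if not 0 < n_baseline <= len(signal):
        raise ValueError("n_baseline must be between 1 and len(signal)")
    baseline = sum(signal[:n_baseline]) / n_baseline
    return [x - baseline for x in signal]


def test_baseline_correct():
    """The specification, written as checks that anyone can rerun."""
    # The baseline is the mean of the first two samples: (2 + 4) / 2 = 3.
    assert baseline_correct([2.0, 4.0, 10.0], n_baseline=2) == [-1.0, 1.0, 7.0]

    # An edge-case decision made explicit: an empty baseline window is an error.
    try:
        baseline_correct([2.0, 4.0], n_baseline=0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for n_baseline=0")


test_baseline_correct()
```

Writing the test forces the decisions into the open: which samples count as baseline, and what should happen when the window is invalid.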

So my recommendation is simple: dedicate protected time to learning programming properly. Set aside regular time to take courses, study other people’s code, and understand basic software engineering principles. Learn to write code that is readable, flexible, modular, version-controlled, and reviewable. And if you do not have a beard, I strongly recommend getting a fake one. I never think as clearly or with as much focus as when I am stroking my beard. 🎅

And if you can grow a beard, start today! You will spend a lot of your PhD coding. The better you code, the faster you will progress. The more cleanly you code, the more clearly you will think. And the more openly you code, the more trustworthy your science becomes.