How I Fixed a Real Linux Kernel Bug as a First-Time Contributor

What I Learned About RCU, Maintainers, and Open Source

I am excited to share that I will be speaking at Open Source Summit Mumbai 2026 on the topic “How I Merged 21 Patches as a First-Time Linux Kernel Contributor.”

Before that talk, I want to share one of the most exciting and educational experiences of my journey — fixing a real Linux kernel use-after-free bug reported by syzbot, with guidance from some of the best kernel developers in the world.

This is not a story about being a genius. This is a story about good intentions, curiosity, and the incredible kindness of the Linux kernel community.


What is Syzbot?

Syzbot is Google’s continuous fuzzing system for the kernel, built on the syzkaller fuzzer. It runs 24/7, hammering the Linux kernel with generated system-call sequences to find bugs. When it finds one, it reports it publicly on syzkaller.appspot.com with a detailed crash report.

Anyone in the world can pick up one of these bugs and try to fix it. That is exactly what I did.


The Bug

I picked up a syzbot bug report showing a KASAN use-after-free in sock_def_readable(). KASAN is the Kernel Address Sanitizer. When it says slab-use-after-free, it means some code accessed (read or wrote) slab-allocated memory that had already been freed. This is a serious bug that can cause kernel crashes and security vulnerabilities.

The crash was happening in the ATM LAN Emulation subsystem in net/atm/lec.c. The original bug report that I fixed: syzbot report

The call chain showed:

mld_ifc_work -> lec_start_xmit -> send_to_lecd -> sock_def_readable -> CRASH

Understanding the Race Condition

Using simple tools like grep and sed, I traced the bug to a classic race condition. Two functions were racing against each other:

send_to_lecd() uses the lecd pointer to communicate with the LEC daemon:

if (!priv->lecd)       ->  check lecd, it is valid!
sk = sk_atm(priv->lecd) ->  use lecd
sk->sk_data_ready(sk)  ->  UAF CRASH HERE!

While lec_atm_close() clears it when the daemon disconnects:

priv->lecd = NULL;     ->  socket freed via RCU!

Nothing protected the window between the check and the use of priv->lecd: no lock was held and no reference was taken. Another CPU could free the socket inside that window, which is a classic use-after-free.
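
The window can be made visible with a small userspace model. This is only a sketch under stated assumptions: race_sock, race_priv, race_send() and race_close() are hypothetical stand-ins, not kernel code, and the "other CPU" is simulated by a callback that runs exactly between the check and the use. A freed flag stands in for the real free so the model never touches invalid memory.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (not real kernel types). */
struct race_sock { bool freed; int wakeups; };
struct race_priv { struct race_sock *lecd; };

enum send_result { SEND_OK, SEND_NO_DAEMON, SEND_USE_AFTER_FREE };

/* Models lec_atm_close(): clear the pointer, then release the socket. */
static void race_close(struct race_priv *priv)
{
    struct race_sock *sk = priv->lecd;

    priv->lecd = NULL;
    if (sk)
        sk->freed = true;          /* stands in for the real free */
}

/*
 * Models the unprotected check-then-use pattern.
 * other_cpu, if non-NULL, runs in the window between check and use.
 */
static enum send_result race_send(struct race_priv *priv,
                                  void (*other_cpu)(struct race_priv *))
{
    struct race_sock *sk = priv->lecd;   /* check */

    if (!sk)
        return SEND_NO_DAEMON;

    if (other_cpu)
        other_cpu(priv);                 /* the race window */

    if (sk->freed)                       /* use */
        return SEND_USE_AFTER_FREE;      /* the kernel would touch freed memory here */

    sk->wakeups++;
    return SEND_OK;
}
```

Calling race_send(&priv, NULL) on an attached socket returns SEND_OK, while race_send(&priv, race_close) returns SEND_USE_AFTER_FREE. Note that taking a local copy of the pointer does not help on its own: the copy stays non-NULL while the object behind it is gone.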


My First Attempt: v1 Patch

My initial approach combined a spinlock with sock_hold()/sock_put() to protect the socket while it was in use. The idea was simple: hold a reference to the socket so it cannot be freed while we are using it.
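
The reference-counting idea can be sketched in userspace C. This is an illustrative model, not the v1 patch: ref_sock, ref_sock_init(), ref_sock_hold() and ref_sock_put() are hypothetical names that mimic the spirit of the kernel's sock_hold()/sock_put(), and a freed flag stands in for the real release.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a reference-counted socket. */
struct ref_sock {
    atomic_int refcnt;   /* number of outstanding references */
    bool freed;          /* stands in for the real free */
};

/* The creator starts out owning one reference. */
static void ref_sock_init(struct ref_sock *sk)
{
    atomic_init(&sk->refcnt, 1);
    sk->freed = false;
}

/* Take an extra reference; the caller must already know sk is valid. */
static void ref_sock_hold(struct ref_sock *sk)
{
    atomic_fetch_add(&sk->refcnt, 1);
}

/* Drop a reference; whoever drops the last one releases the object. */
static void ref_sock_put(struct ref_sock *sk)
{
    if (atomic_fetch_sub(&sk->refcnt, 1) == 1)
        sk->freed = true;
}
```

A user would call ref_sock_hold() before touching the socket and ref_sock_put() afterwards, so an owner dropping its own reference in between does not release the object. The hold is only safe if something else (in v1, the spinlock) guarantees the pointer is still valid at the moment the reference is taken.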

I identified four vulnerable sites in lec.c and fixed all of them. I also added proper skb cleanup to prevent memory leaks on early exit paths.

I submitted this as v1 to the netdev mailing list. I was nervous — this was going to be reviewed by some of the best kernel developers in the world.

You can read the full v1 patch and Eric Dumazet’s review here: v1 on lore.kernel.org


The Review — Eric Dumazet

Within hours, I got a reply from Eric Dumazet, a top kernel networking developer at Google:

“What prevents priv->lecd to be NULL after you released priv->lec_arp_lock? More generally, lec_atm_close() clears the sk_receive_queue. So allowing providers to queue more packets would be wrong. So really a better fix is needed.”

Eric found two problems with my approach:

  • I was still accessing priv->lecd directly after releasing the lock instead of using a local copy — the race window was still there.
  • The spinlock did not prevent packets from being queued after lec_atm_close() drains the queue — timer and workqueue paths bypass netif_stop_queue().

Learning RCU

Eric hinted that an RCU-based approach would be better. I had heard of RCU before but never used it. So I had to learn it from scratch.

RCU stands for Read-Copy-Update. Think of it like a library book:

  • Readers can read the book anytime, very fast, with almost no overhead.
  • A writer who wants to replace the book can put the new copy on the shelf right away, but must wait for ALL readers still holding the old copy to finish before throwing it away.

The three key rules of RCU:

  • Writers use rcu_assign_pointer() to safely publish a new value.
  • Readers use rcu_read_lock() and rcu_dereference() to safely access the value.
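  • synchronize_rcu() blocks until every reader that was already inside an RCU read-side critical section has finished, in ANY subsystem. Readers that start later see the new pointer value, so once lecd has been set to NULL, no new packets can be queued after synchronize_rcu() returns.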
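
The three rules can be illustrated with a toy userspace model. This is a sketch only and assumes C11 atomics: every toy_* name is hypothetical, and the shared reader counter is a deliberate simplification. Real kernel RCU does not use a shared counter like this, which is exactly why its readers are so cheap. The toy only shows the ordering: publish, read inside a critical section, wait for readers.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the daemon socket. */
struct toy_daemon { atomic_int queued; };

static _Atomic(struct toy_daemon *) toy_lecd;  /* the shared pointer */
static atomic_int toy_readers;                 /* readers inside a critical section */

static void toy_read_lock(void)   { atomic_fetch_add(&toy_readers, 1); }
static void toy_read_unlock(void) { atomic_fetch_sub(&toy_readers, 1); }

/* Rule 1: the writer publishes a new value. */
static void toy_assign(struct toy_daemon *d)
{
    atomic_store(&toy_lecd, d);
}

/* Rule 3 (toy version): spin until no reader is inside a critical section. */
static void toy_synchronize(void)
{
    while (atomic_load(&toy_readers) != 0)
        ;
}

/* Rule 2: the reader loads the pointer only inside its critical section. */
static int toy_send(void)
{
    struct toy_daemon *d;
    int ret = -1;

    toy_read_lock();
    d = atomic_load(&toy_lecd);    /* plays the role of rcu_dereference() */
    if (d) {
        atomic_fetch_add(&d->queued, 1);
        ret = 0;
    }
    toy_read_unlock();
    return ret;
}

/* Writer on teardown: unpublish first, then wait for readers. */
static void toy_close(void)
{
    toy_assign(NULL);
    toy_synchronize();
    /* Every later toy_send() sees NULL, so the old daemon can be released. */
}
```

After toy_close() returns, toy_send() returns -1 and the old object is no longer touched. The order inside toy_close() is the important design choice: clearing the pointer before waiting is what ensures that no reader can pick up the old value once the wait has finished.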

A Simpler Suggestion — Hillf Danton

Another developer, Hillf Danton, suggested a simpler approach — just reorder the calls in lec_atm_close() so lecd is cleared after stopping the queue and destroying the ARP table.

I investigated this seriously using code evidence. While cancel_delayed_work_sync() inside lec_arp_destroy() does stop lec_arp_work, the bug is triggered from mld_ifc_work — the IPv6 multicast workqueue which belongs to a completely different subsystem outside ATM/LEC control.

One simple command confirmed this:

grep -n "mld_ifc_stop" net/atm/lec.c  ->  empty output!

ATM/LEC has zero control over mld_ifc_work. After reviewing the evidence, Hillf agreed:

“Syncing RCU after clearing lecd is the correct fix because lecd is checked with RCU lock held.”


The v2 Patch: RCU-Based Fix

The complete fix involved converting priv->lecd to an RCU-protected pointer across all access sites:

  • Mark priv->lecd as __rcu in lec.h. This annotation lets sparse, the kernel's static checker, flag any access that bypasses the RCU accessors.
  • Use rcu_assign_pointer() in lec_atm_close() and lecd_attach() for safe pointer assignment.
  • Use rcu_read_lock(), rcu_dereference() and rcu_read_unlock() in send_to_lecd(), lec_handle_bridge() and lec_atm_send() to safely access lecd.
  • Add synchronize_rcu() in lec_atm_close() after clearing lecd — guarantees all readers have finished before proceeding.
  • Remove the redundant sk_receive_queue drain from lec_atm_close() since vcc_destroy_socket() already drains it afterwards.
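
The overall shape of the conversion can be sketched as follows. This is a simplified illustration, not the merged patch. The RCU names are the real kernel API (declared in <linux/rcupdate.h>), but here they are replaced by single-threaded stand-in macros so the sketch compiles in userspace. sock_model, lec_priv_model and the *_sketch functions are hypothetical, and the real functions do considerably more work.

```c
#include <stddef.h>

/*
 * Single-threaded stand-ins so this compiles outside the kernel.
 * ASSUMPTION: in kernel code these come from <linux/rcupdate.h>
 * and do real work; here they only mark where the calls go.
 */
#define __rcu
#define rcu_read_lock()            ((void)0)
#define rcu_read_unlock()          ((void)0)
#define rcu_dereference(p)         (p)
#define rcu_assign_pointer(p, v)   ((p) = (v))
#define synchronize_rcu()          ((void)0)

/* Hypothetical, heavily reduced models of the real structures. */
struct sock_model { int queued; };
struct lec_priv_model { struct sock_model __rcu *lecd; };

/* Writer: publish the daemon socket. */
static void lecd_attach_sketch(struct lec_priv_model *priv,
                               struct sock_model *sk)
{
    rcu_assign_pointer(priv->lecd, sk);
}

/* Reader: fetch the pointer once, and use it only inside the read section. */
static int send_to_lecd_sketch(struct lec_priv_model *priv)
{
    struct sock_model *sk;
    int ret = -1;

    rcu_read_lock();
    sk = rcu_dereference(priv->lecd);
    if (sk) {
        sk->queued++;          /* stands in for queueing the skb */
        ret = 0;
    }
    rcu_read_unlock();
    return ret;
}

/* Writer: unpublish, then wait for readers before teardown continues. */
static void lec_atm_close_sketch(struct lec_priv_model *priv)
{
    rcu_assign_pointer(priv->lecd, NULL);
    synchronize_rcu();
}
```

The reader never touches priv->lecd again after the single rcu_dereference(), so the check and the use always refer to the same object, and that object stays valid until rcu_read_unlock().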

Full v2 patch thread and discussion: v2 on lore.kernel.org

Merged commit: 922814879542


What I Learned

Technical Lessons

  • Always read KASAN reports carefully — they tell you exactly what happened.
  • Use simple tools: grep, sed, git log — they are enough to trace complex bugs.
  • RCU is a good fit for protecting the lifetime of read-mostly pointers in the kernel.
  • Always back your arguments with code evidence, not assumptions.
  • synchronize_rcu() is global: it waits for ALL pre-existing readers in ALL subsystems, not just the ones in your own code.

Community Lessons

  • You do not need to be an expert to contribute — good intentions matter more.
  • Maintainers are guides, not gatekeepers. They want you to succeed.
  • Review feedback is a gift, not a rejection. Every comment teaches you something.
  • Always investigate suggestions seriously before accepting or rejecting them.
  • The kernel community is one of the most welcoming technical communities in the world.

My Message to You

When I started this journey, I did not know what RCU was. I did not know how syzbot worked. I did not know how to write a kernel patch.

But I had good intentions, a willingness to learn, and the courage to pick up a real kernel bug and see it through.

The Linux kernel community is one of the most welcoming technical communities in the world, if you approach it with respect and a genuine desire to learn.

You can do this too.




Copyright 2026 Deepanshu Kartikey, all rights reserved — kartikey406@gmail.com