What I Learned About RCU, Maintainers, and Open Source
I am excited to share that I will be speaking at Open Source Summit Mumbai 2026 on the topic “How I Merged 21 Patches as a First-Time Linux Kernel Contributor.”
Before that talk, I want to share one of the most exciting and educational experiences of my journey — fixing a real Linux kernel use-after-free bug reported by syzbot, with guidance from some of the best kernel developers in the world.
This is not a story about being a genius. This is a story about good intentions, curiosity, and the incredible kindness of the Linux kernel community.
What is Syzbot?
Syzbot is Google’s automated kernel fuzzing system, built on the syzkaller fuzzer. It runs 24/7, hammering the Linux kernel with generated system call sequences to find bugs. When it finds one, it reports it publicly on syzkaller.appspot.com with a detailed crash report.
Anyone in the world can pick up one of these bugs and try to fix it. That is exactly what I did.
The Bug
I picked up a syzbot bug report showing a KASAN use-after-free in sock_def_readable(). KASAN is the Kernel Address Sanitizer — when it says slab-use-after-free, it means some code is reading memory that has already been freed. This is a serious bug that can cause kernel crashes and security vulnerabilities.
The crash was happening in the ATM LAN Emulation subsystem in net/atm/lec.c. The call chain showed:
mld_ifc_work -> lec_start_xmit -> send_to_lecd -> sock_def_readable -> CRASH
The original bug report that I fixed: syzbot report
Understanding the Race Condition
Using simple tools like grep and sed, I traced the bug to a classic race condition. Two functions were racing against each other:
send_to_lecd() uses the lecd pointer to communicate with the LEC daemon:
if (!priv->lecd) -> check lecd, it is valid!
sk = sk_atm(priv->lecd) -> use lecd
sk->sk_data_ready(sk) -> UAF CRASH HERE!
While lec_atm_close() clears it when the daemon disconnects:
priv->lecd = NULL; -> socket freed via RCU!
The check and the use of priv->lecd were not protected. Another CPU could free the socket between the check and the use — classic use-after-free.
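The window can be reproduced deterministically in user space. The sketch below is a toy model, not kernel code: `toy_sock`, `toy_priv`, `toy_close()`, `toy_send_unprotected()` and the `race_hook` callback are all hypothetical names, and "freeing" is modelled by clearing an `alive` flag so the demo never touches invalid memory. The hook plays the part of the other CPU running the close path between the check and the use.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel objects; none of these names exist in lec.c. */
struct toy_sock {
    int alive;    /* 1 while allocated, 0 once "freed" */
    int wakeups;  /* counts simulated sk_data_ready() calls */
};

struct toy_priv {
    struct toy_sock *lecd;  /* plays the role of priv->lecd */
};

enum send_result { SEND_OK, SEND_NO_DAEMON, SEND_USE_AFTER_FREE };

/* Runs between the check and the use, playing the part of another CPU. */
typedef void (*race_hook)(struct toy_priv *priv);

/* Models the close path: clear the pointer, then the socket goes away. */
void toy_close(struct toy_priv *priv)
{
    struct toy_sock *sk = priv->lecd;

    priv->lecd = NULL;
    if (sk)
        sk->alive = 0;  /* stand-in for the real free */
}

/* Models an unprotected check-then-use sequence. */
enum send_result toy_send_unprotected(struct toy_priv *priv, race_hook other_cpu)
{
    struct toy_sock *sk;

    if (!priv->lecd)        /* check: the daemon looks attached */
        return SEND_NO_DAEMON;
    sk = priv->lecd;        /* use: take the socket pointer */

    if (other_cpu)
        other_cpu(priv);    /* the close path runs inside the window */

    if (!sk->alive)         /* real code would touch freed memory here */
        return SEND_USE_AFTER_FREE;
    sk->wakeups++;          /* stand-in for sk->sk_data_ready(sk) */
    return SEND_OK;
}

/* Walks through the three outcomes; returns the result of the last send. */
enum send_result toy_demo(void)
{
    struct toy_sock sock = { 1, 0 };
    struct toy_priv priv = { &sock };
    enum send_result quiet = toy_send_unprotected(&priv, NULL);
    enum send_result raced = toy_send_unprotected(&priv, toy_close);

    assert(quiet == SEND_OK);
    assert(raced == SEND_USE_AFTER_FREE);
    return toy_send_unprotected(&priv, NULL);  /* daemon is gone by now */
}
```

Nothing in `toy_send_unprotected()` ties the lifetime of the socket to the time it is in use, and that is the property any real fix has to provide.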
My First Attempt: v1 Patch
My initial approach was to use a spinlock + sock_hold/sock_put to protect the socket while it is being used. The idea was simple — hold a reference to the socket so it cannot be freed while we are using it.
I identified four vulnerable sites in lec.c and fixed all of them. I also added proper skb cleanup to prevent memory leaks on early exit paths.
I submitted this as v1 to the netdev mailing list. I was nervous — this was going to be reviewed by some of the best kernel developers in the world.
You can read the full v1 patch and Eric Dumazet’s review here: v1 on lore.kernel.org
The Review — Eric Dumazet
Within hours, I got a reply from Eric Dumazet, a top kernel networking developer at Google:
“What prevents priv->lecd to be NULL after you released priv->lec_arp_lock? More generally, lec_atm_close() clears the sk_receive_queue. So allowing providers to queue more packets would be wrong. So really a better fix is needed.”
Eric found two problems with my approach:
- I was still accessing priv->lecd directly after releasing the lock instead of using a local copy, so the race window was still there.
- The spinlock did not prevent packets from being queued after lec_atm_close() drains the queue, because the timer and workqueue paths bypass netif_stop_queue().
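Both points can be seen in a small user-space model. This is a sketch under stated assumptions, not the v1 patch: `toy_hold()` and `toy_put()` are hypothetical stand-ins for sock_hold() and sock_put(), the lock is represented only by comments, and the `other_cpu` callback simulates the close path running in the window. The sender reads the shared field once into a local copy and pins it, which addresses the first point, yet the wakeup is still delivered after the daemon has detached, which is the second.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: toy_hold()/toy_put() stand in for sock_hold()/sock_put(). */
struct toy_sock {
    int refcnt;   /* the object is "freed" when this drops to 0 */
    int alive;    /* 1 while allocated, 0 once "freed" */
    int wakeups;  /* simulated sk_data_ready() calls */
};

struct toy_priv {
    struct toy_sock *lecd;  /* plays the role of priv->lecd */
};

void toy_hold(struct toy_sock *sk)
{
    sk->refcnt++;
}

void toy_put(struct toy_sock *sk)
{
    if (--sk->refcnt == 0)
        sk->alive = 0;  /* stand-in for the real free */
}

/* Close path: detach the daemon and drop the reference that priv owned. */
void toy_close(struct toy_priv *priv)
{
    struct toy_sock *sk = priv->lecd;

    priv->lecd = NULL;
    if (sk)
        toy_put(sk);
}

/* Returns 0 on delivery, -1 if no daemon is attached. */
int toy_send_with_ref(struct toy_priv *priv,
                      void (*other_cpu)(struct toy_priv *))
{
    struct toy_sock *sk;

    /* In kernel code the next three steps would run under the lock. */
    sk = priv->lecd;        /* read the shared field exactly once */
    if (!sk)
        return -1;
    toy_hold(sk);           /* pin the socket before dropping the lock */

    if (other_cpu)
        other_cpu(priv);    /* the close path runs inside the window */

    /* From here on only the local copy is used, never priv->lecd. */
    assert(sk->alive);      /* our reference keeps the object valid */
    sk->wakeups++;          /* delivered even though the daemon detached */
    toy_put(sk);            /* frees the object if close already ran */
    return 0;
}
```

With `toy_close` passed as `other_cpu`, the call returns 0 and `wakeups` becomes 1 even though `priv->lecd` is already NULL: the memory is safe, but data still reaches a socket whose queue has been drained.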
Learning RCU
Eric hinted that an RCU-based approach would be better. I had heard of RCU before but never used it. So I had to learn it from scratch.
RCU stands for Read-Copy-Update. Think of it like a library book:
- Readers can read the book anytime, very fast, with almost no overhead.
- A writer who wants to replace the book can put the new copy on the shelf right away, but must wait for all readers who are already holding the old copy to finish before throwing it away.
The three key rules of RCU:
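- Writers use rcu_assign_pointer() to safely publish a new value.
- Readers use rcu_read_lock() and rcu_dereference() to safely access the value, and rcu_read_unlock() when they are done.
- synchronize_rcu() blocks until all readers that were already inside a read-side critical section, in any subsystem, have finished. Because lecd is cleared before the call, any reader that starts later sees NULL, so no new packets can be queued once it returns.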
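The ordering behind these rules (unpublish first, free only after readers have left) can be modelled in a single thread. The sketch below is a toy and makes several loud assumptions: every `toy_*` name is hypothetical, a plain reader counter replaces the kernel's grace-period machinery, and because a single-threaded demo cannot block the way synchronize_rcu() does, the free is deferred until the last reader leaves, which is closer in spirit to call_rcu(). Unlike real RCU, the toy also waits for readers that started after the pointer was cleared.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sock {
    int alive;  /* 1 while allocated, 0 once "freed" */
};

/* Toy state; real RCU does not keep a shared reader counter like this. */
struct toy_sock *toy_lecd;     /* the protected pointer */
int toy_readers;               /* readers currently inside a section */
struct toy_sock *toy_pending;  /* unpublished object awaiting its free */

void toy_read_lock(void)
{
    toy_readers++;
}

struct toy_sock *toy_dereference(void)
{
    return toy_lecd;
}

void toy_read_unlock(void)
{
    assert(toy_readers > 0);
    if (--toy_readers == 0 && toy_pending) {
        toy_pending->alive = 0;  /* the deferred free runs now */
        toy_pending = NULL;
    }
}

/* Writer side: unpublish first, free only when no reader can hold it. */
void toy_close(void)
{
    struct toy_sock *old = toy_lecd;

    toy_lecd = NULL;             /* role of rcu_assign_pointer(..., NULL) */
    if (!old)
        return;
    if (toy_readers == 0) {
        old->alive = 0;          /* nobody is reading: free at once */
    } else {
        assert(toy_pending == NULL);  /* toy limit: one pending object */
        toy_pending = old;       /* the last reader out triggers the free */
    }
}

/* One reader overlapping one writer; returns the final alive flag. */
int toy_demo(void)
{
    struct toy_sock sock = { 1 };
    struct toy_sock *sk;
    int alive_in_section;
    int hidden_from_new_readers;

    toy_lecd = &sock;            /* the daemon attaches */

    toy_read_lock();
    sk = toy_dereference();
    toy_close();                 /* writer runs while the reader is inside */
    alive_in_section = sk->alive;
    hidden_from_new_readers = (toy_dereference() == NULL);
    toy_read_unlock();           /* last reader leaves, the free happens */

    assert(alive_in_section == 1);
    assert(hidden_from_new_readers);
    return sock.alive;
}
```

In `toy_demo()` the reader keeps a valid object for its whole section, a fresh lookup already sees NULL, and the object is released only after the section ends.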
A Simpler Suggestion — Hillf Danton
Another developer, Hillf Danton, suggested a simpler approach — just reorder the calls in lec_atm_close() so lecd is cleared after stopping the queue and destroying the ARP table.
I investigated this seriously using code evidence. While cancel_delayed_work_sync() inside lec_arp_destroy() does stop lec_arp_work, the bug is triggered from mld_ifc_work — the IPv6 multicast workqueue which belongs to a completely different subsystem outside ATM/LEC control.
One simple command confirmed this:
grep -n "mld_ifc_stop" net/atm/lec.c -> empty output!
ATM/LEC has zero control over mld_ifc_work. After reviewing the evidence, Hillf agreed:
“Syncing RCU after clearing lecd is the correct fix because lecd is checked with RCU lock held.”
The v2 Patch — RCU Based Fix
The complete fix involved converting priv->lecd to an RCU-protected pointer across all access sites:
- Mark priv->lecd as __rcu in lec.h so that static checkers such as sparse know this pointer is RCU-protected.
- Use rcu_assign_pointer() in lec_atm_close() and lecd_attach() for safe pointer assignment.
- Use rcu_read_lock/rcu_dereference/rcu_read_unlock in send_to_lecd(), lec_handle_bridge() and lec_atm_send() to safely access lecd.
- Add synchronize_rcu() in lec_atm_close() after clearing lecd, which guarantees that all readers that could still see the old pointer have finished before proceeding.
- Remove the redundant sk_receive_queue drain from lec_atm_close() since vcc_destroy_socket() already drains it afterwards.
Full v2 patch thread and discussion: v2 on lore.kernel.org
Merged commit: 922814879542
What I Learned
Technical Lessons
- Always read KASAN reports carefully — they tell you exactly what happened.
- Use simple tools: grep, sed, git log. They are enough to trace complex bugs.
- RCU is the right tool for protecting pointer lifetime in the kernel.
- Always back your arguments with code evidence, not assumptions.
- synchronize_rcu() is global: it waits for every reader that is already inside a read-side critical section, in every subsystem.
Community Lessons
- You do not need to be an expert to contribute — good intentions matter more.
- Maintainers are guides, not gatekeepers: they want you to succeed.
- Review feedback is a gift, not a rejection: every comment teaches you something.
- Always investigate suggestions seriously before accepting or rejecting them.
- The kernel community is one of the most welcoming technical communities in the world.
My Message to You
When I started this journey, I did not know what RCU was. I did not know how syzbot worked. I did not know how to write a kernel patch.
But I had good intentions, a willingness to learn, and the courage to pick up a real kernel bug and see it through.
The Linux kernel community is one of the most welcoming technical communities in the world, if you approach it with respect and a genuine desire to learn.
You can do this too.