XDC 2025

17 Nov 2025

It has been a long time since I published any update in this space. Since this was a year of colossal changes for me, maybe it is also time for me to make something different with this blog and publish something just for a change — why not start talking about XDC 2025?

This year, I attended XDC 2025 in Vienna as an Igalia developer. I was thrilled to see some faces from people I worked with in the past and people I’m working with now. I had a chance to hang out with some folks I worked with at AMD (Harry, Alex, Leo, Christian, Shashank, and Pierre), many Igalians (Žan, Job, Ricardo, Paulo, Tvrtko, and many others), and finally some developers from Valve. In particular, I met Tímur in person for the first time, even though we have been talking for months about GPU recovery. Speaking of GPU recovery, we held a workshop on this topic together.

The workshop was packed with developers from different companies, which was nice because it added different angles on this topic. We began our discussion by focusing on the topic of job resubmission. Christian began sharing a brief history of how the AMDGPU driver started handling resubmission and the associated issues. After learning from erstwhile experience, amdgpu ended up adopting the following approach:

  1. When a job cause a hang, call driver specific handler.
  2. Stop the scheduler.
  3. Copy all jobs from the ring buffer, minus the job that caused the issue, to a temporary ring.
  4. Reset the ring buffer.
  5. Copy back the other jobs to the ring buffer.
  6. Resume the scheduler.

Below, you can see one crucial series associated with amdgpu recovery implementation:

The next topic was a discussion around the replacement of drm_sched_resubmit_jobs() since this function became deprecated. Just a few drivers still use this function, and they need a replacement for that. Some ideas were floating around to extract part of the specific implementation from some drivers into a generic function. The next day, Philipp Stanner continued to discuss this topic in his workshop, DRM GPU Scheduler.

Another crucial topic discussed was improving GPU reset debuggability to narrow down which operations cause the hang (keep in mind that GPU recovery is a medicine, not the cure to the problem). Intel developers shared their strategy for dealing with this by obtaining hints from userspace, which helped them provide a better set of information to append to the devcoredump. AMD could adopt this alongside dumping the IB data into the devcoredump (I am already investigating this).

Finally, we discussed strategies to avoid hang issues regressions. In summary, we have two lines of defense:

  • IGT: At the IGT level, we can have more tests that insert malicious instructions into the ring buffer, forcing the driver into an invalid state and triggering the recovery process.
  • HangTest suite: HangTest suite is a tool that simulates some potential hangs using Vulkan. Some tests are already available in this suite, but we should explore more creative combinations for trying to trigger hangs.
Lighting talk

This year, as always, XDC was super cool, packed with many engaging presentations which I highly recommend everyone check out. If you are interested, check the schedule and the presentation recordings available on the X.Org Foundation Youtube page. Anyway, I hope this blog post marks the inauguration of a new era for this site, where I will start posting more content ranging from updates to tutorials. See you soon.


Articles from blogs I follow around the net

Status update, November 2025

Hi! This month a lot of new features have added to the Goguma mobile IRC client. Hubert Hirtz has implemented drafts so that unsent text gets saved and network disconnections don’t disrupt users typing a message. He also enabled replying to one’s own messages…

via emersion November 16, 2025

Kworkflow at Kernel Recipes 2025

This was the first year I attended Kernel Recipes and I have nothing but say how much I enjoyed it and how grateful I’m for the opportunity to talk more about kworkflow to very experienced kernel developers. What I mostly like about Kernel Recipes is its inti…

via Wen.onweb November 3, 2025

October/November Conference News

The last part of October brings us a whole pile of events, with many of them featuring talks by Igalians. After the month ends, we’re looking forward to a couple of events in November. Here’s where we’ll be presenting talks: RISC-V Summit North America, O…

via Igalia October 22, 2025

Generated by openring