How Avowed's QA team stomped out bugs by working inside the design tools

Obsidian Entertainment’s Avowed is a major technical achievement for the fabled role-playing game studio. Though the company has experience with 3D action RPGs like Fallout: New Vegas and The Outer Worlds, Avowed brings the fantasy world of the company’s isometric Pillars of Eternity series to a massive 3D world. Like its predecessors, Avowed allows players to make a dizzying number of decisions in gameplay and dialogue that all intersect and shape their journey. Combine those branching choices with the game’s advancements in first-person combat and animation, and you have an exceptionally complex game at risk of launching with a distressing number of bugs.

But that didn’t happen. Avowed launched to critical acclaim, with some critics noting the game isn’t as “glitchy” as many of Obsidian’s beloved older titles. The studio’s reputation for buggy games was so noteworthy that at the 2025 Game Developers Conference, Obsidian QA lead David Benefield briefly mentioned it in his presentation on Obsidian’s improvements to its QA workflow.

What are those improvements? According to Benefield, Obsidian has spent the last decade (since the release of 2016’s Tyranny) restructuring its QA department to work more closely with the rest of the studio. QA testers became QA analysts, and instead of only running tests on builds of games like Avowed, the QA team began working with designers of all stripes to review their work in Obsidian’s narrative tool and Unreal Engine, spotting bugs before anyone hit the “commit” button.

That process may sound daunting, but if you want to bolster your QA team, Benefield said you can boil it down to one phrase: “train your QA team using whatever methods you’re training your designers.”

Obsidian’s QA testers got access to Tyranny’s narrative tools

The yearslong journey to reinvent Obsidian’s QA department began in 2015 while developing the CRPG Tyranny. Dialogue in Tyranny is accompanied by portraits of characters in different premade “poses” that illustrate their emotional state. Originally these poses were to be set up by the narrative team working in the Obsidian narrative tool, but according to Benefield, that team had become bogged down creating quests and content for the game, and work on implementing poses was falling behind schedule.

Implementing these poses wasn’t a complex process; it just required hours and hours of work, and the QA team had the bandwidth to take up the task. But with access to the tool, they began to realize there was a lot of implemented content they’d never seen or tested before.

“We found lines that had never been tested, or, in some cases, lines you couldn’t even reach as a player due to a bug,” he said. “Sometimes they were entire quest branches, small and large, hidden inside these files that unless you stumbled into it as a player, or it was documented somewhere, you wouldn’t know that it’s even there, meaning…QA couldn’t [log] the bug if we didn’t know the bug was present.”

A crowd of fantasy characters stands over the corpse of a monster in Tyranny.

Image via Obsidian Entertainment/Paradox Interactive.

Benefield was a tester at the time, and while he was implementing poses, he spotted a potentially game-breaking bug tied to the game’s reputation system. In the game’s opening hours, players balance demands from different factions like “The Disfavored” and “The Scarlet Chorus” to garner reputation, culminating in a scene that will cement their initial allegiance. The scene checks the player’s reputation with each faction, which is tracked as a number that goes up or down depending on different choices. Whichever faction the player has a better reputation with will determine what scene plays out and how the game progresses.

Because Benefield was working in the narrative tool and could see the number values and narrative node pathways, he did the math and found it was possible for players to make a precise set of choices that would end with equal reputation values with each faction. This was not an intended outcome, and the player wouldn’t be able to progress the story.
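The check Benefield describes doing by hand can be approximated in a few lines. The sketch below is illustrative only: the choice values, the two generic faction slots, and the function names are all hypothetical, not taken from Tyranny’s data or Obsidian’s narrative tool. It enumerates every combination of choices and reports the ones where neither faction comes out ahead, which is the case a strict greater-than comparison leaves unhandled.

```python
from itertools import product

# Hypothetical data: each decision point lists its options as
# (faction_a_delta, faction_b_delta) reputation changes.
CHOICES = [
    [(2, 0), (0, 2)],
    [(1, 0), (0, 1), (1, 1)],
    [(1, 0), (0, 3)],
]


def pick_scene(rep_a, rep_b):
    """Mimic a branch that only handles strict winners."""
    if rep_a > rep_b:
        return "faction_a_scene"
    if rep_b > rep_a:
        return "faction_b_scene"
    return None  # tie: no branch fires, so the story cannot progress


def find_dead_ends(choices):
    """Return every path of choices that ends in an unhandled tie."""
    dead_ends = []
    for path in product(*choices):
        rep_a = sum(option[0] for option in path)
        rep_b = sum(option[1] for option in path)
        if pick_scene(rep_a, rep_b) is None:
            dead_ends.append(path)
    return dead_ends


DEAD_ENDS = find_dead_ends(CHOICES)
```

With these made-up numbers, two of the twelve possible paths end in a tie. A real quest would have far more paths, but the point is the same: the dead end is visible from the values alone, without playing through a build.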

He showed the bug to his lead…who congratulated him on his initiative but said they couldn’t file a ticket unless he reproduced it in a build. It took him two hours to test and retest his theory, and the fix took 30 seconds. “I found it personally frustrating to see I’d found such a high-severity bug, but the cost and time to reproduce it would prove to be greater than the time to fix it,” he said.

Fortunately, Obsidian listened to his feedback. After Tyranny, the studio bumped the status of its QA testers up to QA analysts (increasing their pay as well) and created a process for analysts to review quest and dialogue node trees before they went into the build.

Expanding in-tool testing on Avowed

Benefield took a few years away from Obsidian to work as a producer at Nexon, but was hired back at the company as a QA lead on Avowed, where he began more rigorously implementing this process. Avowed’s conversation nodes weren’t larger or more complex than Tyranny’s, but now the QA team also had to track animation, audio, and other gameplay bugs that came with first-person combat and animated conversations.

Using a sample conversation with a merchant from early in Avowed that checks if a specific party member was present, Benefield showed three ways a mistake in the narrative tool could lead to a bug in the game. If a designer created a node where a “bark” (a line that occurs without dialogue UI) transitions into a full dialogue sequence, the conversation breaks. If a designer forgets to identify a speaker when setting up a node, or the speaker was deleted after the file was created, the conversation breaks. And if someone forgets to put a “red” talk node (that ends the conversation) at the end of their sequence, the conversation breaks. All three of these bugs came up “dozens” of times when making Avowed, and they’re “much easier” to spot inside the tools than if they’re reported from inside the game.
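The three mistakes above are the kind of thing a static check over the node graph can flag. What follows is a minimal sketch under stated assumptions: the `Node` structure, the "bark", "talk", and "end" kinds, and the `lint_conversation` function are invented for illustration and do not reflect how Obsidian’s narrative tool actually stores or validates conversations. It also assumes ending nodes do not need a speaker.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Node:
    node_id: str
    kind: str  # hypothetical kinds: "bark", "talk", or "end"
    speaker: Optional[str] = None
    next_ids: List[str] = field(default_factory=list)


def lint_conversation(
    nodes: List[Node], known_speakers: Set[str]
) -> List[Tuple[str, str]]:
    """Flag the three conversation-breaking mistakes described above."""
    graph: Dict[str, Node] = {n.node_id: n for n in nodes}
    problems = []
    for node in nodes:
        targets = [graph[i] for i in node.next_ids if i in graph]
        # 1. A bark must not transition into a full dialogue sequence.
        if node.kind == "bark" and any(t.kind == "talk" for t in targets):
            problems.append((node.node_id, "bark leads into full dialogue"))
        # 2. Every spoken node needs a speaker that still exists.
        if node.kind != "end" and node.speaker not in known_speakers:
            problems.append((node.node_id, "missing or deleted speaker"))
        # 3. Every sequence must finish on an ending node.
        if node.kind != "end" and not targets:
            problems.append((node.node_id, "sequence has no ending node"))
    return problems


# A made-up merchant conversation containing one of each mistake.
SAMPLE = [
    Node("greet", "bark", "merchant", ["offer"]),
    Node("offer", "talk", None, ["reply"]),
    Node("reply", "talk", "companion", []),
]
PROBLEMS = lint_conversation(SAMPLE, {"merchant", "companion"})
```

Each entry in `PROBLEMS` names the offending node, which mirrors the benefit Benefield describes: the report points at the root cause in the data rather than at a symptom seen in the game.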

A GDC 2025 talk slide showing off three types of flowchart bugs in Obsidian's narrative tool.

Image by Bryant Francis.

In the process of catching these bugs and other logic breaks, the testing team began to catch more advanced (Benefield called them “fun”) scripting errors, spotting them in the game’s data and flagging them to the narrative designer. This saved time for both teams since narrative designers now knew the root cause of a bug rather than being told the symptom. “It also saves QA time by finding more bugs-per-minute than in the same time they’d spend testing content in the game.”

This process wasn’t restricted to Obsidian’s in-house testers—external testers from service provider QLOC were also given this level of access. Testers from both companies grew so proficient at working with the narrative tool that Benefield began thinking—what if they applied this process to Unreal Blueprints as well?

Obsidian’s Unreal Engine designers embraced working with QA

After enough time with this new process (and a healthy holiday break), Benefield began drafting a pitch for the rest of the studio. “If the analysts know these tools so well, and they know the game so well, the only pieces they’re missing are how triggers, trigger volumes, and blueprints work on the Unreal side,” he argued. “So what if we got them that info too?”

The pitch triggered a bit of “imposter syndrome” in Benefield. When he first joined Obsidian, the company’s org chart kept QA siloed away from the rest of the design team, and there wasn’t a lot of professional overlap between the different ends of the company. There wasn’t a mandated divide like you might find at other game studios (this was the tail end of an era where some companies forbade QA from ever speaking with teams outside their department), but it wasn’t what you’d call a close relationship. Though the departments became closer over the years, the lingering specter of viewing QA as the grunts at the bottom of the organization was still there.

Fortunately—and Benefield said this is one of his “favorite parts” of working at Obsidian—a number of designers quickly warmed up to the idea, and were willing to give it a shot. This led to the creation of the “joint analysis session,” where testers and designers stepped through a quest while reviewing the flow of information in Unreal.

Kai raises his gun while staring down a Beetle monster in Avowed.

Image via Obsidian Entertainment/Microsoft.

“As they go through, the designer calls out every trigger and script used on the quest, literally pulling them up on a shared screen for the analyst to see and ask questions,” Benefield said. “We also record these so we can be very brief about our notes and really just focus on what’s on screen.”

“Because QA is only ever seeing how it does play out, this gives [designers] an opportunity to say ‘wait, that’s what it should have been doing, I didn’t realize that’s a bug.'”

These meetings run for an hour (with multiple meetings scheduled if a quest takes longer). Sometimes bugs were so obvious they could be fixed on a call.

Bringing QA and design together improved morale

According to Benefield, Obsidian’s designers loved these sessions. Bugs were squashed, QA learned more about content they needed to test through conventional means, production continued more efficiently, and maybe most importantly, it was a major boost for morale.

“I didn’t see this one coming, but it was very noticeable to everybody involved and everybody adjacent to these sessions,” Benefield recalled. “Folks were enjoying them. [Designers] felt much better about their quests after they’d been beat on during a joint analysis session.”

Even after these sessions, designers and QA analysts were more comfortable reaching out to each other with questions and comments.

Obsidian made a number of other improvements to the QA process during the making of Avowed, but all of them came back to the core practice of training testers on tools used by designers.

Though the team still used classic “black box” testing to check for bugs organically emerging in the game, this “white box” method brought joy and collaboration to what can be a grinding field in game development.

As Benefield concluded, “By pairing the QA mindset of ‘how do I get this to break’ with the designer mindset of ‘how do I get this to work,’ we’re allowing these people to work closely to each other, and we get a better product in a shorter amount of time.”

GDC and Game Developer are sibling organizations under Informa.
