[Image: the botched Ecce Homo restoration]

DLSS 5 Is an Aesthetic Abomination. It’s Also a Technical Failure

Bruno Dias

By now you’ve seen the infamous Digital Foundry DLSS 5 preview, and the backlash to it, and the backlash to the backlash. And by now a narrative has emerged: Digital Foundry were dazzled by a very impressive technical presentation, so much so that they ignored the aesthetic and artistic problems with what they were seeing (an argument Rob made on Remap Radio 129).

And don’t get me wrong, DLSS 5’s output is aesthetically bankrupt. But I want to make the point that there’s a deeper failure at play in that dazzled reaction to Nvidia’s tech demo, one rooted in the assumption that the demo was impressively free of visual artifacts.

That’s simply not true: that tech demo is riddled with visual artifacts. But they’re a different category of visual artifact than the ones we’re used to seeing and analyzing. Conventional 3D graphics are prone to artifacts like aliasing or light leaks; raytracing and path tracing introduce noise; previous versions of DLSS have had issues with trails, ghosting, and flicker.

The DLSS 5 images seem to be free of all those sorts of problems, and so if you’re thinking of “visual artifacts” as a closed category, they are very clean images. But in reality, they’re actually introducing a whole new set of problems for video game graphics to contend with.

For context, DLSS is really a brand name for a few different products. Most people are familiar with DLSS 2, 3, and 4, which are incrementally improving versions of a form of temporal anti-aliasing (TAA). TAA is an image processing technique where pixels are sampled in slightly different positions over time and then averaged out to reduce jagged edges and create a more precise image.
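To make that concrete, here's a toy sketch of the core idea, not any shipping implementation: one sample per pixel per frame, with the sample position jittered by a different subpixel offset each frame, and the results averaged over time. The 1D "scene," pixel count, and jitter sequence are all illustrative choices.

```python
import numpy as np

def scene(x):
    """Toy 1D 'scene': a hard edge at x = 0.45 (1.0 to the left, 0.0 to the right)."""
    return (x < 0.45).astype(float)

num_pixels = 8
pixel_centers = (np.arange(num_pixels) + 0.5) / num_pixels

# One jittered sample per pixel per frame; the offsets are in pixel units.
# Real implementations use a low-discrepancy sequence (e.g. Halton).
jitters = [0.0, 0.25, -0.25, 0.125]

accum = np.zeros(num_pixels)
for n, offset in enumerate(jitters, start=1):
    frame = scene(pixel_centers + offset / num_pixels)
    accum += (frame - accum) / n  # running average across frames

# Pixels far from the edge stay 0 or 1; the pixel straddling the edge
# ends up partway between, i.e. the jaggy has been smoothed.
print(accum)
```

No single frame ever contains the antialiased value; it only emerges from accumulating differently-jittered frames, which is why TAA-family techniques break down (ghosting, trails) when frames can't be cleanly combined.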

Unlike ordinary TAA, DLSS uses a machine learning model both to decide exactly how to sample pixel positions on screen and to resolve all those samples into the final image. Unlike DLSS 1, which was a “true” upscaling model that had to be trained specifically for each game, DLSS 2/3/4 has access to a lot more information from the ordinary implementation of the game’s render pipeline, and it uses that machine learning model to combine a bunch of partially-rendered frames into higher-resolution frames that are relatively crisp. Whether it works better or worse than ordinary TAA is mostly a question of which kinds of rendering artifacts you tolerate better.
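The "combine frames over time" step is usually built on motion vectors: reproject last frame's accumulated history to where each pixel was, then blend it with the current frame. The sketch below shows that accumulation in 1D; the function names and the blend factor are my own illustrative choices, not Nvidia's actual pipeline.

```python
import numpy as np

def reproject(history, motion):
    """Fetch each pixel's history from where that pixel was last frame
    (1D, nearest-neighbor, clamped at the borders)."""
    n = len(history)
    src = np.clip(np.arange(n) - motion, 0, n - 1)
    return history[src]

def temporal_blend(history, current, motion, alpha=0.1):
    """Exponential moving average: mostly reprojected history,
    plus a little of the new frame."""
    return (1 - alpha) * reproject(history, motion) + alpha * current

# Static scene (zero motion): the history converges to the current signal.
current = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])
history = np.zeros(6)
for _ in range(50):
    history = temporal_blend(history, current, np.zeros(6, dtype=int))
# history is now ~equal to current. With wrong or missing motion vectors,
# stale history bleeds through instead -- the trails and ghosting that
# earlier DLSS versions were criticized for.
```

This also clarifies the article's point about DLSS 5's inputs: motion vectors plus finished frames are enough for this kind of reprojection, but they tell the model nothing about the scene's actual lights or geometry.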

DLSS 5 is not doing any of that at all. Nvidia has confirmed that the only information it gets as input are the fully rendered frames from the first GPU, plus motion vectors. DLSS 5 is taking that full-resolution image rendered by the GPU, and then—in Nvidia’s tech demo, on a whole second RTX 5090 GPU—it’s attempting to “improve” that image to, supposedly, add more lighting detail. The initial image it’s transforming is already rendered on max settings anyway, so DLSS 5 is essentially second-guessing the original render pipeline. It’s analogous to pasting a screenshot of the game into ChatGPT’s dialogue box and asking it to make it look better and more detailed. The debatably impressive thing is that with a whole second 5090 they can do it 60 times a second.

Like DLSS 1, DLSS 5 has to generate a lot of novel information to create frames; in fact, it generates pretty much all of it. Unlike DLSS 1, it’s a generic image model with a much vaguer mission than the simple upscaling task DLSS 1 was semi-good at. DLSS 1 is trying to take a low-resolution image and extrapolate a high-resolution version, based on a highly specific sample of frames from the game running at a higher resolution. DLSS 5 is… punching it up in some nebulous way, based on a model trained on random images from the internet.

But the claim is that this post-processing pass creates an image with higher lighting fidelity; that there’s more lighting detail. There is more visual detail in the image in a sense, but that detail doesn’t necessarily correspond to the actual lights or geometry of the scene being rendered. Like so much genAI output, it’s signalish: it always looks like signal, but sometimes it’s noise.

If you ask a language model like Claude a question, you’ll get a confident answer that looks at first glance like true information, but may just be random nonsense. You can’t know without knowing the ground truth. When it comes to video game graphics, then, what is this “ground truth”? What does “fidelity” even mean in a video game?
