Why Django Ninja Slowed Us Down and the Simple Fix That Saved 30% Latency
Have you ever noticed that sometimes, when handling requests and generating responses from a search API, the CPU time seems unusually long compared to the actual Elasticsearch query? It’s a frustrating anomaly, especially when you’re expecting the system to fly through the tasks. I’ve encountered this myself while dealing with response generation in Django Ninja. Even after switching from dataclass to pydantic and back to dataclass (yes, this has happened), the problem persisted. Given that Django Ninja uses pydantic for requests and responses, I suspected the culprit was something lurking within the framework itself.
Profiling with Py-spy
To get a clearer picture, I turned to py-spy (GitHub link), a lightweight profiler. The results were eye-opening. Below, you can see a snapshot from around 10 API calls. On the far right, we can spot the _result_to_response function — the segment responsible for building the response. Though it didn’t show exact latency, this process was taking almost as long as the actual product query.
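For reference, a snapshot like this can be captured by pointing py-spy at the running worker process. The PID and output path below are placeholders, and attaching to another process on Linux may require sudo.

# Attach to a running worker and record a flame graph
py-spy record -o profile.svg --pid <worker-pid>

# Or wrap a local dev server and profile it from startup
py-spy record -o profile.svg -- python manage.py runserver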
Let’s break it down a bit more.
Unveiling the Culprit: Field Mismatches
In this snapshot:
The portion of the time spent calculating encrypted_product_id stands out. It’s not huge, but it does consume enough resources that skipping the calculation when it’s not needed (like during searches) could lead to a significant optimization. However, this wasn’t the primary issue.
The next major time sink is in generating the response itself. Diving into the code responsible for this, we can see the culprit:
try:
    item = getattr(self._obj, key)
except AttributeError:
    try:
        item = Variable(key).resolve(self._obj)  # 1
    except VariableDoesNotExist as e:
        raise KeyError(key) from e
The critical part here is the resolve function. The code enters the except block and recalculates the item (#1 in the code). This happens because the key wasn’t found on the object, so we’re forced to search for it again. These keys include fields like:
- adPurchaseIndex
- viewCount
- ingredients
- rank
- rankDelta
- product_review_topics
This leads to additional latency, as Django Ninja tries to resolve fields that don’t exist in the dataclass by falling back to the Django template engine’s resolve() method.
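To get a feel for how expensive that fallback is, a quick standalone micro-benchmark (separate from the project code) can compare a plain getattr hit with the Variable(key).resolve() path used for a missing field:

import timeit
from dataclasses import dataclass

from django.conf import settings
from django.template import Variable, VariableDoesNotExist

settings.configure()  # minimal settings so this runs outside a Django project

@dataclass
class Item:
    x: int = 1

obj = Item()

def existing_field():
    return getattr(obj, "x")  # happy path: plain attribute lookup

def missing_field():
    # Mirrors the DjangoGetter excerpt: AttributeError, then the template-variable fallback.
    try:
        return getattr(obj, "missing")
    except AttributeError:
        try:
            return Variable("missing").resolve(obj)
        except VariableDoesNotExist:
            return None

print("getattr hit:     ", timeit.timeit(existing_field, number=100_000))
print("resolve fallback:", timeit.timeit(missing_field, number=100_000))

The exact numbers depend on the machine, but the fallback path attempts a dictionary lookup, an attribute lookup, and a list-index lookup, plus exception handling, for every missing key — which is why it shows up so prominently in the profile.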
I left a detailed issue on the Django Ninja repository, where I described how this mismatch between the dataclass and the response schema causes DjangoGetter to fall back to the slow resolve() path for unmatched fields.
Real-world Example and Profiling Insights
In our project, we noticed this issue when dealing with a mismatch between fields defined in our dataclass and the schema used in the API response. Here’s a simplified structure of the project:
ninja_test/
|- dataclasses.py
|- schemas.py
|- urls.py
|- views.py
In dataclasses.py:
from dataclasses import dataclass
from typing import Literal

@dataclass
class Data:
    x: int
    y: int = 0
    z: Literal[0] = 0
In schemas.py:
from typing import Literal

from ninja import Field, Schema

class HelloData(Schema):
    x: int = 0
    y: int = Field(0)
    z: Literal[0] = 0
    z0: Literal[0] = 0
    z1: Literal[0] = 0

class HelloResponse(Schema):
    data: list[HelloData] = []
In this setup, we have fields like z0 and z1 in the response schema that are not part of the dataclass. When I profiled the call stack using py-spy, I could see that DjangoGetter was spending significant time trying to resolve these missing fields, causing a noticeable delay.
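A minimal views.py and urls.py wiring that reproduces this slow path might look like the following sketch (simplified, not the exact project code):

# views.py (baseline that triggers the fallback)
from ninja import Router

from ninja_test.dataclasses import Data
from ninja_test.schemas import HelloResponse

router = Router()

@router.get("/hello", response=HelloResponse)
def hello(request):
    # Returning raw dataclass instances: z0 and z1 are missing from Data,
    # so DjangoGetter falls back to Variable(key).resolve() for each of them.
    return {"data": [Data(x=1)]}

# urls.py
from django.urls import path
from ninja import NinjaAPI

from ninja_test.views import router

api = NinjaAPI()
api.add_router("/", router)

urlpatterns = [path("api/", api.urls)]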
Here’s a close-up of the call stack generated by py-spy:
But when I manually passed the missing fields in a dictionary, the latency caused by Variable(key).resolve(self._obj) disappeared entirely. The modified code in views.py looked like this:
from dataclasses import asdict

@router.get("/hello", response=HelloResponse)
def hello(request):
    # Serialize the dataclass and explicitly supply the schema-only fields z0 and z1.
    data = [asdict(Data(x=1)) | {"z0": 0, "z1": 0}]
    return {"data": data}
By explicitly setting the missing values, we effectively bypassed the need for resolve() and reduced response latency by 30% across the board. You can see this reflected in the py-spy profile comparison below:
Here’s the call stack before explicit assignment:
And here’s the call stack after explicit assignment: the resolve() call stacks have disappeared.
The Final Fix: Explicit is Better than Implicit
In the end, the solution was straightforward: we needed to be explicit. By providing values manually for the fields that weren’t part of the original dataclass, we eliminated the performance bottleneck caused by Django Ninja’s resolve() method.
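If several schemas carry extra fields like this, the same idea can be generalized with a small helper that fills any schema-only field with its declared default. This is just a sketch, assuming pydantic v2 (Django Ninja 1.x) and the example modules above:

from dataclasses import asdict

from ninja import Router

from ninja_test.dataclasses import Data
from ninja_test.schemas import HelloData, HelloResponse

router = Router()

def with_schema_defaults(instance, schema_cls):
    # Serialize the dataclass, then fill in every field the schema declares but the
    # dataclass lacks, using the schema's declared default (pydantic v2 model_fields).
    # Assumes each schema-only field actually declares a default.
    data = asdict(instance)
    for name, field in schema_cls.model_fields.items():
        data.setdefault(name, field.default)
    return data

@router.get("/hello", response=HelloResponse)
def hello(request):
    return {"data": [with_schema_defaults(Data(x=1), HelloData)]}

Whether this is worth it over hard-coding the two extra keys is a judgment call; the important part is that the dict handed to the response schema already contains every field the schema expects, so the resolve() fallback never runs.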
Above is a graph showing the latency changes after deploying these optimizations. On the left, you see the impact of providing default values and removing encryptedProductId. The middle section shows the original state, while the right side shows the improved state after explicitly setting defaults.
While it’s unclear if this behavior is intended by Django Ninja, I have raised the issue with the Django Ninja maintainers. You can follow the discussion on the official GitHub issue.
Conclusion
By taking the time to profile and analyze your code, you can uncover hidden inefficiencies that might otherwise go unnoticed. In this case, a mismatch between a dataclass and the response schema led to a significant latency issue. By switching to explicit field assignment, we reduced our API response time by 30%.
If you’re encountering unexplained slowdowns in your Django Ninja project, I’d highly recommend checking for field mismatches like this and applying the same fix.