Root Cause Analysis
We deployed a change to the GET /organization API endpoint, introducing a new response property: amount_of_members. This property calculates and returns the number of members in an organization.
Impact:
Shortly after deployment, we observed performance degradation for requests involving the “volunteers” organization. The member count calculation for this specific organization proved too computationally expensive without caching, leading to excessive response times and increased backend load.
Root Cause:
The amount_of_members value was computed on-the-fly for each request. For most organizations, this was acceptable. However, the “volunteers” organization contains a significantly larger number of members. The lack of caching made this query particularly expensive, creating performance bottlenecks.
Resolution:
We reverted the code change, then updated the implementation to exclude the “volunteers” organization from this code path. This mitigates the performance impact without removing the field from the schema.
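As a rough illustration of this kind of opt-out, here is a minimal sketch in Python. It is not our production code: the names COUNT_OPT_OUT, build_organization_response, and count_members are hypothetical, and returning null for excluded organizations is an assumption about how the field could stay in the schema.

```python
from typing import Callable, Optional

# Hypothetical set of organization identifiers excluded from the live count.
COUNT_OPT_OUT = frozenset({"volunteers"})


def build_organization_response(
    org_id: str,
    name: str,
    count_members: Callable[[str], int],
) -> dict:
    """Build a GET /organization payload, skipping the count for opted-out organizations."""
    amount: Optional[int] = None
    if org_id not in COUNT_OPT_OUT:
        # Only organizations outside the opt-out set pay for the on-the-fly count.
        amount = count_members(org_id)
    # The key is always present, so the response schema does not change
    # (assumption: the field is nullable for excluded organizations).
    return {"id": org_id, "name": name, "amount_of_members": amount}
```

With this shape, clients of excluded organizations still see the amount_of_members key, but the expensive query never runs for them.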
Timeline:
11:57 UTC - Code deployed.
11:58 UTC - Began investigating issues reported by our internal tests.
11:59 UTC - Code rolled back.
12:00 UTC - Received confirmation that users were no longer receiving server errors.
Next Steps / Preventive Actions:
Evaluate the use of caching or pre-computation for properties with high computational cost. Add monitoring for expensive queries on API endpoints.
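To make these two actions concrete, the sketch below combines a small time-to-live cache with a warning log for slow computations. It is an assumption-laden example rather than a chosen design: MemberCountCache, count_members, the 300-second TTL, and the 0.5-second threshold are all hypothetical, and the class is an in-process cache that is not thread-safe.

```python
import logging
import time
from typing import Callable, Dict, Tuple

logger = logging.getLogger("organization_api")  # hypothetical logger name


class MemberCountCache:
    """Caches member counts per organization for a fixed time-to-live."""

    def __init__(
        self,
        count_members: Callable[[str], int],
        ttl_seconds: float = 300.0,            # assumed TTL, tune per workload
        slow_threshold_seconds: float = 0.5,   # assumed threshold for "expensive"
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._count_members = count_members
        self._ttl = ttl_seconds
        self._slow = slow_threshold_seconds
        self._clock = clock
        # org_id -> (expires_at, count)
        self._entries: Dict[str, Tuple[float, int]] = {}

    def get(self, org_id: str) -> int:
        entry = self._entries.get(org_id)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # fresh cached value, no query issued

        started = self._clock()
        count = self._count_members(org_id)
        elapsed = self._clock() - started
        if elapsed >= self._slow:
            # Surfaces expensive computations so they can be alerted on.
            logger.warning("slow member count for %s: %.3fs", org_id, elapsed)

        self._entries[org_id] = (self._clock() + self._ttl, count)
        return count
```

The trade-off is staleness: a cached count can lag behind the real membership by up to the TTL, which may or may not be acceptable for this field. Pre-computing the count on membership changes would avoid that lag at the cost of extra write-path work.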