Problem:
The Partner API service experienced a sudden increase in response time to over 1000 ms starting at approximately 01:03 AM on May 08, 2026 AEST. All API endpoints were uniformly affected, with request latencies increasing by several seconds.
Impact:
Response times fluctuated but consistently breached the 1000 ms threshold.
Per-request durations increased across multiple API endpoints.
No errors were observed; however, significant latency degradation occurred.
Root Cause Analysis:
Elevated network latency between the Partner API service and the caching layer caused slower response times. The issue was traced to unusually large cache keys stored for the GET /v1/orders and GET /v2/orders endpoints, which were heavily requested by multiple partners. This inadvertently overloaded the Redis cache infrastructure.
Steps Taken to Resolve:
Adjusted the caching strategy for all endpoints to reduce the load on the cache.
Applied temporary rate limiting to targeted requests to allow the application to recover.
Further Actions:
Segregation of order endpoints from transaction services to be better manage resources.
Investigating alternative caching solutions for further improvement.