Alec Chapman
Community Lead at Bromcom

Root Cause Analysis (RCA) Incident: Morning Registration Performance Degradation (HTTP 429 Errors)

Date: 3–10 March 2026

Resolved: 10 March (stabilisation), 16 March (full correction)

1. Executive Summary

Between 3 and 10 March 2026, customers experienced degraded performance during peak morning registration periods, including slow response times, intermittent unavailability, and “Too Many Requests” (HTTP 429) messages.

The issue was limited to peak usage windows and was fully stabilised following mitigation on 10 March, with further improvements applied on 16 March.

No data loss or security impact occurred.

2. Timeline (Summary)

• 3–10 Mar: Daily performance degradation during peak registration window

• 10 Mar (evening): Mitigation applied to reduce system load

• 11 Mar onwards: Performance stabilised

• 16 Mar: Additional corrections applied to improve resilience

3. Root Cause

Primary Cause (System Behaviour Under Load)

A combination of system configuration and application behaviour reduced effective processing capacity during peak demand.

Technical Factors

• An imbalance in how concurrent processing tasks were handled reduced throughput under load (see the concurrency sketch after this list)

• Increased request contention during peak activity led to request queuing and rejection (HTTP 429)

• Repeated user retries amplified load, further impacting performance
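To make the first factor concrete: this report does not describe Bromcom's internal architecture, so the following is only a minimal sketch, assuming a generic Python async service with a hypothetical handle_registration task and an illustrative concurrency cap. Bounding concurrent work with a semaphore lets a burst queue briefly instead of oversubscribing workers:

```python
import asyncio

# Illustrative cap only; the real service's limits are not public.
MAX_CONCURRENT = 50

async def handle_registration(sem: asyncio.Semaphore, request_id: int) -> str:
    # Excess requests wait here instead of oversubscribing workers
    # and collapsing overall throughput.
    async with sem:
        await asyncio.sleep(0.05)  # stand-in for real database work
        return f"registration {request_id} saved"

async def main() -> None:
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    # Simulate a morning burst of 500 near-simultaneous requests.
    results = await asyncio.gather(*(handle_registration(sem, i) for i in range(500)))
    print(len(results), "requests completed")

asyncio.run(main())
```

Without such a bound, or with one tuned poorly, a peak-time burst can reduce effective throughput in exactly the way described above.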

4. Contributing Factors

• Peak usage pattern with high levels of concurrent activity

• Request retry behaviour increasing system load under stress (a backoff sketch follows this list)

• Request handling and traffic distribution not yet fully optimised for peak periods
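On the retry factor: repeated immediate retries multiply load on an already saturated service. A common countermeasure, shown here as a generic sketch rather than anything Bromcom specifically implements, is exponential backoff with jitter that honours the server's Retry-After header; fetch_with_backoff and its URL argument are hypothetical names:

```python
import random
import time
import urllib.error
import urllib.request

def fetch_with_backoff(url: str, max_attempts: int = 5) -> bytes:
    """Retry a request with exponential backoff and jitter.

    Naive immediate retries amplify load on a saturated service;
    spacing them out gives it time to drain its queue.
    """
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # only 429s are worth retrying here
            # Honour Retry-After if present (assuming a seconds value),
            # otherwise back off exponentially with random jitter.
            retry_after = err.headers.get("Retry-After")
            delay = float(retry_after) if retry_after else (2 ** attempt) + random.random()
            time.sleep(delay)
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```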

5. Impact Assessment

Systems Affected

• Bromcom MIS platform (registration workflows)

User Impact

• Slow performance and intermittent unavailability during peak morning periods

• Delays in completing registration activities

Data Impact

• No data loss, corruption, or unauthorised access

Risk Assessment

• Operational disruption only

• No information security or data integrity risk

Overall Risk: Low

6. Resolution

• Reduced unnecessary system requests through improved caching (see the caching sketch after this list)

• Adjusted request handling behaviour to improve performance under load

• Performance returned to expected levels following mitigation
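The report does not say what was cached; purely as an illustration of the approach, a short TTL (time-to-live) cache can absorb repeated identical lookups, such as the same register being reloaded every few seconds during morning registration. ttl_cache and load_register below are hypothetical names:

```python
import time
from typing import Any, Callable

def ttl_cache(ttl_seconds: float) -> Callable:
    """Cache results for a short window so repeated identical
    lookups do not each hit the backend."""
    def decorator(fn: Callable) -> Callable:
        store: dict[tuple, tuple[float, Any]] = {}

        def wrapper(*args: Any) -> Any:
            now = time.monotonic()
            hit = store.get(args)
            if hit and now - hit[0] < ttl_seconds:
                return hit[1]  # fresh enough: skip the backend
            value = fn(*args)
            store[args] = (now, value)
            return value
        return wrapper
    return decorator

@ttl_cache(ttl_seconds=30)
def load_register(group_id: int) -> dict:
    # Stand-in for the real (expensive) data fetch.
    return {"group": group_id, "pupils": ["..."]}

load_register(7)  # first call hits the backend
load_register(7)  # served from cache for the next 30 seconds
```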

7. Preventative Actions

• Optimisation of system behaviour under high concurrency

• Improved handling of repeated requests and retry patterns

• Additional traffic management controls to protect the platform during peak usage (a generic sketch follows this list)

• Ongoing improvements to resilience under varying load conditions
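One standard form of traffic management control is a token-bucket rate limiter, which smooths bursts to a sustainable rate and rejects overflow early with HTTP 429 plus a Retry-After hint. The sketch below is generic, with illustrative numbers, not a description of Bromcom's actual controls:

```python
import time

class TokenBucket:
    """Token-bucket limiter: allow short bursts up to `burst`,
    sustain `rate_per_sec` thereafter, and reject the overflow
    early rather than letting it queue and degrade everyone."""

    def __init__(self, rate_per_sec: float, burst: int) -> None:
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, up to capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate_per_sec=100, burst=200)  # illustrative numbers only

def handle(request_id: int) -> tuple[int, str]:
    if not bucket.allow():
        # Reject with 429 and a hint so well-behaved clients back off.
        return 429, "Too Many Requests (Retry-After: 1)"
    return 200, f"ok {request_id}"
```

Paired with the client-side backoff shown earlier, this keeps a demand spike from cascading into the retry amplification seen during the incident.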

8. Conclusion

This incident was caused by a combination of system behaviour and configuration under peak demand conditions, which reduced effective processing capacity.

Mitigations have restored stable performance, and further improvements are being implemented to enhance resilience during high-demand periods.
