Dear Infinity Hash Community,

We would like to provide a transparent overview of the recent DDoS attacks on our services, which occurred on multiple days over the last two weeks, and to briefly introduce the new pages and services we have launched to increase uptime transparency:

Summary

All websites, including the blog, landing pages, home page, and dashboard, were intermittently inaccessible due to DDoS attacks, affecting customer access. Notably, BTCPay, our payout systems, mining pool, miners, and backend remained unaffected. The disruptions spanned approximately two weeks, with a total downtime of 15-20% over that period (roughly 2-3 days in total). The response involved our development team, sysadmins, and hosting provider. We resolved the issue by scaling server capacity, implementing server-side DDoS protection, optimizing system efficiency, and adding further protection measures.

Root Cause Analysis

The root cause was inadequate preparation against large-scale DDoS attacks on the landing pages and the dashboard. Limited server-side protection and multiple services hosted on the same server compounded the problem. DDoS attacks are common and can occur randomly, for ransom, or be launched by competitors.

Steps Taken and Timeline

  1. Initial detection through server access logs: First, we checked the server access logs to find the cause of the downtime. We noticed unusual traffic and concluded that a DDoS attack was under way, coming from a comparatively small set of external IP addresses (about 100 new addresses per minute, each making roughly 50 to 150 connection attempts per second). A simplified sketch of this kind of log analysis follows the timeline below.
  2. Migration of each service to dedicated servers and a 10x performance increase for the dashboard: Since most of the services were hosted on only two smaller servers at the time, we decided to migrate each service to its own server and to scale the performance of the dashboard server by roughly 10x.
  3. Worked with hosting providers; terminated the contract with the uncooperative hoster and initiated a domain transfer: We also contacted both of our hosting providers directly for more information and assistance in dealing with the attack, as the hosting plan we were using at the time included DDoS protection that clearly did not hold up against the current attack. We received fast and helpful support from one hoster ("hoster A") and extremely slow and unsatisfactory support from the other ("hoster B"). As a result, we decided to cancel our contract with hoster B, migrated all systems to hoster A, and initiated a domain transfer from hoster B to hoster A.
  4. Implemented server-side protection and logging: After that, we had a few days of uninterrupted service before the next DDoS attack (about twice the size of the first one) started. This attack mainly targeted the landing pages and the dashboard, making both unavailable. After assessing the scale of the second attack, we implemented additional server-side protections that brought the dashboard and landing pages back online; a conceptual sketch of this kind of rate limiting follows the timeline below. We then implemented more extensive logging and protection measures on all services (including the blog and API servers that were not affected by the second attack), and discussed adding an external DDoS protection service, which we decided against at the time due to privacy implications and a worse experience for users who prefer to browse anonymously via Tor or VPNs.
  5. Responded to subsequent attacks, including switching server software from Apache to nginx: After that, we had about two days of uninterrupted service before the third and largest DDoS attack hit our landing pages and dashboard (about 300 new inbound addresses per minute, each making about 100 new connection attempts per second, which amounts to tens of thousands of additional connection attempts per second for every minute the attack ran, along with more sophisticated attack techniques specifically tailored to the server software we were running). Since our servers could not handle the significantly higher load, even with protections triggered and with our list of banned IP addresses growing by the second, we decided to upgrade to more efficient server software and implement a final layer of protection against optimized keep-alive attacks (mainly by switching from Apache to nginx).
  6. Activated fallback server: We additionally made the dashboard available on a fallback server so that customers could access their accounts through our parent company’s (MEATEC) servers.
  7. Full recovery achieved by November 26, 2023: All services and systems are back up and running as of Sunday, November 26, 2023, and appear to be handling the ongoing DDoS attempts with ease.
  8. Decision to add an external DDoS protection service: Since many customers were negatively affected by the unavailability of the dashboard and affiliate partners could not use the landing pages, we decided to add an external DDoS protection service after all. It is already prepared and will be activated as soon as hoster B completes the transfer of our domains.
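
For readers curious about the detection step, the following is a minimal, illustrative Python sketch of the kind of access-log analysis described in step 1. It is not our production tooling: it assumes a standard combined access-log format, and the flag threshold of 1,000 requests per minute is hypothetical.

```python
import re
import sys
from collections import defaultdict
from datetime import datetime

# Matches the client IP and timestamp of a combined-format log line, e.g.:
# 203.0.113.7 - - [25/Nov/2023:14:03:07 +0000] "GET / HTTP/1.1" 200 ...
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

def requests_per_ip_per_minute(path):
    """Count requests per (minute, client IP) bucket from an access log."""
    buckets = defaultdict(lambda: defaultdict(int))
    with open(path) as log:
        for line in log:
            match = LOG_PATTERN.match(line)
            if not match:
                continue
            ip, raw_time = match.groups()
            minute = datetime.strptime(raw_time, TIME_FORMAT).strftime("%Y-%m-%d %H:%M")
            buckets[minute][ip] += 1
    return buckets

if __name__ == "__main__":
    seen = set()
    for minute, ips in sorted(requests_per_ip_per_minute(sys.argv[1]).items()):
        new_ips = set(ips) - seen
        seen.update(ips)
        # During the first attack we saw ~100 previously unseen addresses per
        # minute, each making ~50-150 connection attempts per second, i.e.
        # thousands of requests per minute per address.
        high_rate = [ip for ip, n in ips.items() if n > 1000]
        print(f"{minute}: {len(new_ips)} new IPs, {len(high_rate)} high-rate IPs")
```

A sudden, sustained jump in both numbers at once is what distinguishes an attack from an ordinary traffic spike.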
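
Similarly, the server-side protection from step 4 lives in the web server and firewall layer rather than in application code, but the underlying idea, a sliding window of recent requests per IP plus temporary bans, can be sketched in a few lines of Python; all thresholds here are hypothetical.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 1.0   # measurement window (hypothetical value)
MAX_REQUESTS = 20      # allowed requests per window (hypothetical value)
BAN_SECONDS = 600      # length of a temporary ban (hypothetical value)

recent = defaultdict(deque)  # ip -> timestamps of recent requests
banned_until = {}            # ip -> unix time at which the ban expires

def allow_request(ip, now=None):
    """Return True if the request may proceed, False if it is dropped."""
    now = time.time() if now is None else now
    if banned_until.get(ip, 0.0) > now:
        return False  # still banned
    window = recent[ip]
    window.append(now)
    while window and window[0] < now - WINDOW_SECONDS:
        window.popleft()  # forget requests outside the sliding window
    if len(window) > MAX_REQUESTS:
        banned_until[ip] = now + BAN_SECONDS  # rate exceeded: temporary ban
        window.clear()
        return False
    return True
```

In practice, web-server modules such as nginx's limit_req and limit_conn provide this kind of throttling natively, without any application-level code.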

Learnings and Next Steps

What went well:

  • Isolation of backend and critical systems ensured uninterrupted reward generation: We were very glad that our backend, where all rewards are calculated, as well as our mining farm and payout systems, were not reachable from the public internet and ran on completely separate systems. This way, the actual reward generation (Bitcoin mining) as well as the functioning of user accounts (receiving rewards, auto-compounding, reward splitting, wallet balances, etc.) were not affected by the attacks in any way. In other words: funds are SAFU and no profits were missed.
  • Service migration to dedicated servers improved reliability and response times: Moving each service to its own server improved reliability, reduced response times, and made attacks and general errors easier and faster to detect and resolve.
  • In-house development facilitated immediate response: Having our own in-house development team has proven to be valuable because we can respond to problems immediately, rather than having to wait for support from a third party or hosting provider.
  • Moving to nginx and implementing DDoS protection were effective: Switching from Apache to nginx and implementing our own DDoS protection proved effective and will help keep everything online and stable, complementing the new external DDoS protection service that will be activated next week.

What didn’t go well:

  • Underestimating the probability and magnitude of DDoS attacks: We underestimated the likelihood and magnitude of a potential DDoS attack on a still-small project and did not focus development resources on preventive measures from the very beginning.
  • Prolonged downtime due to various challenges: The downtime caused by the DDoS attacks was far too long because multiple problems occurred at the same time (issues with the second hosting provider and multiple domain transfers, in addition to the attacks themselves).

Preventive measures:

  1. Migrated to a single, reliable hosting provider.
  2. Moved each critical service to its own dedicated server.
  3. Added server-level DDoS protection for each service.
  4. Increased server hardware performance by roughly 10x.
  5. Switched from Apache to nginx for improved performance.
  6. Introduced fallback servers.

Next week’s improvements:

  • Adding a proxy system for traffic filtering: We will add an additional proxy system (a WAF and a dedicated firewall) to filter incoming traffic before it hits the actual servers; a conceptual sketch of this idea follows this list.
  • Integration of an external DDoS protection service: We will add an external DDoS protection service for all critical services.
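
To illustrate what "filtering traffic before it hits the actual servers" means, here is a minimal, hypothetical Python (WSGI) sketch of the filtering idea; a production WAF sits in front of the web servers and applies far more sophisticated, professionally maintained rule sets than the toy signatures below.

```python
from wsgiref.simple_server import make_server

# Hypothetical filter rules; a real WAF uses maintained rule sets.
BLOCKED_AGENT_SUBSTRINGS = ("sqlmap", "masscan")
BLOCKED_PATH_SUBSTRINGS = ("/etc/passwd", "../")

def waf_middleware(app):
    """Wrap a WSGI app so every request is inspected before it is served."""
    def filtered(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        path = environ.get("PATH_INFO", "")
        if (any(s in agent for s in BLOCKED_AGENT_SUBSTRINGS)
                or any(s in path for s in BLOCKED_PATH_SUBSTRINGS)):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Request blocked\n"]
        return app(environ, start_response)  # clean request: pass through
    return filtered

def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from behind the filter\n"]

if __name__ == "__main__":
    # Serve the demo app behind the filter on localhost:8080.
    make_server("127.0.0.1", 8080, waf_middleware(demo_app)).serve_forever()
```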

Thank You!

We sincerely thank our clients and the community for their unwavering trust and support.
