Lately, I have had plenty of conversations about complex legacy systems. Terms like the monolith, the spaghetti ball, tightly coupled, and single point of failure keep coming up, along with stories of modernisation efforts that only end up as a distributed monolith.

I have had my fair share of hair-pulling while working with complex legacy systems and, at the same time, that exhilarating feeling when you fix an MSO or finally get an enhancement working. I learned so much from deep diving into stored procedures full of business logic, various API and messaging components, batch jobs, task queues, a massive variety of helper classes and error-handling utilities, a half-ASP and half-MVC UI layer, data mappers, and so on. All of these are woven into a single codebase to make it more fun, because why not? πŸ˜›

The Complexity of Monolithic Applications

Monolithic applications are notorious for their intricate dependencies. These dependencies make it incredibly challenging to test and automate processes effectively. One small change can have ripple effects, breaking functionalities in unexpected areas. This complexity often leads to significant difficulties in maintaining high-quality standards and ensuring security, especially as the application ages.

Through all of this, I learned about the importance of penetration testing, reverse engineering, challenging the status quo, and the journey of modernising for scale.

β€œOne of the big dangers is to pretend that you can follow a predictable process when you can’t.”

Martin Fowler

Lessons from the SWAT Team

One of the first teams I was part of was the SWAT team: two full-time folks and a half-time SME in a consulting capacity. We effectively took on all the “too hard basket” issues and every MSO/critical incident raised by our customer. Here are some of the things I learned along the way.

Reverse Engineering
Imagine you have a toy robot, and you want to know how it moves and makes sounds. To do this, you carefully open it up, look at all the parts inside, and figure out how they fit together. Similarly, reverse engineering in software lets you understand a system’s design, components, and behaviour. This technique helped us analyse the source code and map dependencies and workflows, while behavioural analysis using logging and monitoring tools revealed system performance and unexpected behaviours.
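
To make the behavioural-analysis part a little more concrete, here is a minimal sketch of the kind of instrumentation you can wrap around a legacy call to see what it actually does. The LegacyBookingService class and its Confirm method are hypothetical stand-ins, not real production code.

```csharp
using System;
using System.Diagnostics;

// Minimal sketch: wrap a legacy call with timing and logging so its behaviour
// can be observed without changing the underlying logic.
// LegacyBookingService and Confirm are hypothetical stand-ins.
public static class InstrumentedBookingService
{
    public static bool ConfirmWithTelemetry(LegacyBookingService service, int bookingId)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            var result = service.Confirm(bookingId);
            Console.WriteLine($"Confirm({bookingId}) returned {result} in {stopwatch.ElapsedMilliseconds} ms");
            return result;
        }
        catch (Exception ex)
        {
            // Unexpected behaviour surfaces here instead of disappearing into the UI layer.
            Console.WriteLine($"Confirm({bookingId}) failed after {stopwatch.ElapsedMilliseconds} ms: {ex.Message}");
            throw;
        }
    }
}

// Hypothetical legacy class, included only so the sketch compiles on its own.
public class LegacyBookingService
{
    public bool Confirm(int bookingId) => bookingId > 0;
}
```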

Decompiling
What do you do when the subject matter experts have left the company or the source code is nowhere to be found? On a few occasions, we had to decompile a DLL (Dynamic Link Library) to understand how it worked, whether because we didn’t have the original source code, were debugging a problem, or needed to analyse a security issue. Decompiling tries to reconstruct the source code from the compiled code using specialised tools; in our case, JetBrains dotPeek and ILSpy. Fair warning: this isn’t always a bulletproof way to produce the exact source code, but it’s the closest we could get to recreating lost source.
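
ILSpy’s decompiler engine is also available as a NuGet package (ICSharpCode.Decompiler), so the same idea can be scripted when there are many assemblies to recover. A minimal sketch, with a placeholder DLL path, might look like this:

```csharp
using System;
using System.IO;
using ICSharpCode.Decompiler;           // NuGet: ICSharpCode.Decompiler (ILSpy's engine)
using ICSharpCode.Decompiler.CSharp;

// Minimal sketch: decompile a whole assembly back to approximate C# text.
// The DLL path is a placeholder for whatever library lost its source.
class DecompileExample
{
    static void Main()
    {
        var assemblyPath = @"C:\legacy\MissingSource.dll";   // hypothetical path
        var decompiler = new CSharpDecompiler(assemblyPath, new DecompilerSettings());

        // Reconstructs approximate C# for the entire module; names and structure
        // will be close to, but not identical to, the original source.
        string code = decompiler.DecompileWholeModuleAsString();
        File.WriteAllText("MissingSource.decompiled.cs", code);

        Console.WriteLine($"Recovered {code.Length} characters of approximate source.");
    }
}
```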

Unit Tests
Legacy systems rarely have unit test coverage, let alone integration tests. There are plenty of blogs about unit testing; one of my favourites is the Test Pyramid by Martin Fowler. Adding unit tests with each incremental refactor, plus integration tests at the different layers, ensured changes didn’t introduce new issues and helped us understand the system’s behaviour.
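
To show the idea, here is a minimal characterisation-style test sketch: pin down what the legacy code does today so each incremental refactor can be checked against the same behaviour. The FareCalculator class and its rules are hypothetical, and the choice of xUnit here is only illustrative.

```csharp
using Xunit;   // test framework choice is illustrative; any runner works

// Hypothetical legacy logic we want to preserve through refactoring.
public class FareCalculator
{
    public decimal ApplyLoyaltyDiscount(decimal fare, int loyaltyYears)
    {
        if (loyaltyYears >= 5) return fare * 0.90m;
        if (loyaltyYears >= 1) return fare * 0.95m;
        return fare;
    }
}

// Characterisation tests: they describe the behaviour as it is today,
// so any refactor that changes the outcome fails fast.
public class FareCalculatorTests
{
    [Fact]
    public void FiveYearMembers_GetTenPercentOff()
    {
        var calculator = new FareCalculator();
        Assert.Equal(90m, calculator.ApplyLoyaltyDiscount(100m, 5));
    }

    [Fact]
    public void NewCustomers_PayFullFare()
    {
        var calculator = new FareCalculator();
        Assert.Equal(100m, calculator.ApplyLoyaltyDiscount(100m, 0));
    }
}
```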

Scaling, Scaling, Scaling

The journey of scaling a legacy system didn’t come without its own challenges. One example was an application that relied on XSLT transformations to integrate with a global distribution system (GDS). We would interrogate the GDS and pass the returned data as messages through our system to show to the customer. The tricky part was that we didn’t know how much load the application could take before it fell over, which made it hard to scale. This challenge took us down the path of performance testing, which gave us the data to identify focus areas and scale the application.
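
For context, the XSLT step itself is simple in .NET; a minimal sketch with placeholder file names (not the real GDS artefacts) looks something like this. The hard part was never the transformation code itself, it was knowing how much of it the application could do under load.

```csharp
using System;
using System.Xml.Xsl;

// Minimal sketch of the XSLT step: transform an internal XML message into the
// format the GDS expects. File names are placeholders, not the real artefacts.
class GdsTransformExample
{
    static void Main()
    {
        var transform = new XslCompiledTransform();
        transform.Load("booking-to-gds.xslt");                 // compiled once, reusable across requests
        transform.Transform("booking.xml", "gds-request.xml"); // one message transformation per call

        Console.WriteLine("Transformed booking.xml into gds-request.xml");
    }
}
```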

Performance Testing

Performance testing ensures that the application can handle the expected load and continues to operate efficiently. I find this exercise very exciting, mainly because I learned how to design tests and analyse historical data based on customer behaviour. Working alongside a software engineer, we incrementally introduced enhancements to address the areas that slowed the application down.

Mimicking Production Behaviour

  • Baselining: Establishing performance baselines helped us plan and implement modernisation efforts. Data was ingested from various sources into the ELK stack (Elasticsearch, Logstash, Kibana), where we could index and search it quickly and build graphs and dashboards to visualise it. This let us understand current performance levels, set realistic goals, and measure improvements.
  • Load Testing: Once the baseline was set, I designed the load test in JMeter to mimic production behaviour. I also introduced spike tests throughout the load test run to simulate rapid bursts of traffic and confirm the system could handle them without crashing (a minimal code sketch of the idea follows this list). The results showed us where the bottlenecks were.
  • Capacity Testing: Over time, the team and I made incremental improvements to provide a good customer experience. Identifying capacity limits under varying loads and measuring performance thresholds let us establish the maximum traffic the system could handle while maintaining acceptable performance. That understanding also fed into plans for responding to growth, such as preparing automated scripts to create virtual machines and scale out as needed.
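
Our real load and spike tests were built in JMeter, but here is a minimal C# sketch of the underlying idea: fire a sudden burst of concurrent requests and record how long each one takes. The endpoint and burst size are placeholders, and JMeter’s listeners and dashboards do the reporting far better than this.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Minimal spike sketch: send a burst of concurrent requests and summarise
// latency and failures. The target URL and burst size are hypothetical.
class SpikeSketch
{
    static async Task Main()
    {
        using var client = new HttpClient();
        var target = "https://legacy-app.example.com/search";   // hypothetical endpoint
        const int burstSize = 50;

        var tasks = Enumerable.Range(0, burstSize).Select(async _ =>
        {
            var stopwatch = Stopwatch.StartNew();
            using var response = await client.GetAsync(target);
            return (response.StatusCode, stopwatch.ElapsedMilliseconds);
        });

        var results = await Task.WhenAll(tasks);

        // A crude summary of the burst; real analysis happened in JMeter dashboards.
        Console.WriteLine($"Slowest response: {results.Max(r => r.ElapsedMilliseconds)} ms");
        Console.WriteLine($"Failures: {results.Count(r => (int)r.StatusCode >= 400)}");
    }
}
```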

Security in Legacy Applications

We also had a good number of security vulnerabilities to address, from outdated libraries and frameworks to cross-site scripting, configuration issues, and insufficient logging and monitoring to detect security threats.

Penetration Testing
While an external third-party company executed the penetration tests, the team and I did most of the remediation, working through the vulnerabilities in order of criticality. Throughout this exercise, I learned about the following:

  • Cross-site scripting (XSS) enables attackers to inject malicious scripts into web pages. This taught me the importance of input validation, sanitisation, and escaping data in an HTML context.
  • Cross-site request forgery (CSRF) tricks users into performing actions on a web application in which they are authenticated. Referrer header checks and same-site cookies come in handy in this instance (of course, there are plenty more ways to prevent CSRF; please refer to the OWASP site for more details).
  • SQL injection allows attackers to manipulate SQL queries, leading to data manipulation or unauthorised access. While using stored procedures reduced the risk of injection, validating and sanitising user inputs to conform to the expected format was another important precaution (a minimal parameterised-query sketch follows this list).
  • Outdated libraries and frameworks: For components that relied heavily on obsolete libraries and frameworks, we either refactored or rewrote parts of the application to use modern alternatives.
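
As an illustration of the SQL injection fix mentioned above, here is a minimal parameterised-query sketch (with a hypothetical Bookings table and column names) showing how user input is kept as data instead of becoming part of the query text:

```csharp
using Microsoft.Data.SqlClient;   // or System.Data.SqlClient in older .NET Framework apps

// Minimal sketch of the remediation: the user-supplied value travels as a
// parameter, never concatenated into the SQL string. Table and column names
// are hypothetical; the connection string comes from configuration.
public static class CustomerLookup
{
    public static int CountBookingsFor(string connectionString, string email)
    {
        using var connection = new SqlConnection(connectionString);
        using var command = new SqlCommand(
            "SELECT COUNT(*) FROM Bookings WHERE CustomerEmail = @email", connection);

        // A value like "' OR 1=1 --" is treated as literal data, not as SQL.
        command.Parameters.AddWithValue("@email", email);

        connection.Open();
        return (int)command.ExecuteScalar();
    }
}
```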

Because of the dependencies and interconnections in legacy systems (remember the spaghetti ball?), regular penetration testing ensures that these interdependencies and new features do not introduce security risks and helps maintain the system’s integrity amidst evolving threats.

Wrapping Up

Monolithic applications are complex and hard to maintain, but working with them has been a rewarding journey filled with valuable lessons. I learned techniques like reverse engineering, decompiling, and writing different types of tests, as well as the necessity of robust performance and security testing. These experiences have enhanced my technical skills and highlighted the significance of continuous improvement and modernisation. Embracing these challenges and opportunities has been instrumental in my growth as a leader, equipping me with the ability to tackle the complexities of both legacy and modern systems.