At an early point in my career I found myself on a project at a scale beyond anything I had ever heard of. It was a rewrite of an in-house ERP system consisting of a number of async background processes, which I was tasked with implementing. Enthusiastic, optimistic, energetic, I was more than willing to take it on.
From a technical standpoint it was not necessarily a difficult task, but the soft complexities turned out to be more than I bargained for.
I worked closely with business resources to develop well-defined flowcharts documenting the processes. The resulting documents were initially intimidating, with a few routines running 30 to 100+ pages of logic, all driven by edge cases. Handed a set of well-documented logic, I proceeded under tight deadlines, breaking functions down into small pieces and unit testing each to the extent possible.
Despite an 'A for effort', this did not go as smoothly as planned. A few mistakes were made, and I have no problem admitting them; really, nobody saw this coming.
Performance became an issue as soon as real-world data was introduced. In general everything functioned under acceptable circumstances, but there were a few edge cases where larger-than-expected data sets were fed into an algorithm, which created issues. ORM technologies are a huge convenience when working with data, but given the nature of the operations, I ended up migrating back to raw SQL for data interactions.
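To make the trade-off concrete, here is a minimal sketch of the kind of change involved. It assumes SQLAlchemy and a hypothetical orders table; none of these names come from the actual system.

```python
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://localhost/erp")  # placeholder connection string

# ORM-style, row-at-a-time processing: convenient, but every row becomes its
# own object load and UPDATE, which falls over on larger-than-expected batches.
#
#   for order in session.query(Order).filter(Order.status == "pending"):
#       order.status = "processed"
#   session.commit()

# Set-based raw SQL: one round trip, and the database does the heavy lifting.
with engine.begin() as conn:
    conn.execute(
        text("UPDATE orders SET status = 'processed' WHERE status = 'pending'")
    )
```

The point is not that ORMs are bad; it is that row-at-a-time convenience has a cost that only shows up at realistic data volumes.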
I really should have pushed for more realistic data sets earlier on. This continues to bite me every time I don't speak up for it up front. The dealbreaker, however, was that the occasional worst-case scenario really exposed the true performance issues.
My advice here: get as close to production data as possible, as soon as possible, and fight for it if you need to. Ignore pushback from team members explaining how this 'costs money' in the cloud, especially since it is a non-production effort during the development phase. This is a cost of doing business, and instances can be powered down when not in use. I remember one particular instance where I requested this and my request was casually denied; then, shortly before go-live, I spent a few weekends babysitting and tuning operations during 'last minute load testing', when the data I had originally requested finally became available. Long story short, drive the conversation about early access to data sets (if available); you might find yourself making more progress than expected.
Another pain point was visibility into what 'the thing you wrote' just did with a transaction. Your output is only as good as your input, and input is subject to human error.
When you are asked to create something, the focus is typically on building something to meet a need. We do what we are asked (that kind of is our job), but monitoring, logging, and troubleshooting are often overlooked. When implementing large sets of logic, you are essentially building a black box. It is paramount to discuss up front how to add visibility into what is happening under the hood, in order to streamline troubleshooting later.
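As a starting point, even basic per-transaction logging pays for itself. Here is a minimal sketch using Python's standard logging module; the transaction fields and the apply_business_rules routine are hypothetical stand-ins, not the real system.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logger = logging.getLogger("erp.batch")

def apply_business_rules(payload):
    # Stand-in for the real routine; the actual logic lived in the flowcharts.
    return {"status": "processed", **payload}

def process_transaction(txn_id, payload):
    # Record what came in, what went out, and what blew up, keyed by
    # transaction id so a single record can be traced end to end.
    logger.info("txn=%s start input=%r", txn_id, payload)
    try:
        result = apply_business_rules(payload)
        logger.info("txn=%s done result=%r", txn_id, result)
        return result
    except Exception:
        logger.exception("txn=%s failed", txn_id)
        raise
```

With something like this in place, 'what did the system just do with transaction X' becomes a search rather than a debugging session.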
For one particular routine, after go-live I realized that approximately 30% of my time was being dedicated to researching transactions and reporting back to the business on what had happened. This was tedious and time consuming. The root causes were split between bad data entry, changing business processes, and, in most cases, just plain validating that everything had gone well. I have to admit, not a bad-paying gig to put band-aids on a leaky dam, but we can all strive to do better than that.
I was eventually able to carve out time to revisit this, build in auditing, and expose on-demand audit views to the analyst and business roles, minimizing my day-to-day support tasks. Bringing the correct level of visibility to the right stakeholders is critical, and it should be part of the up-front discussion, going hand in hand with the original ask.
Again, as an engineer you are tasked with 'building something', but spend a few cycles on this and drive auditability into your design.
The larger the black box you are building, the harder it is to test. Refer back to the point above on auditability. I once had exposure to a rewrite where a similar level of logic was implemented, in the system we were trying to replace, via stored procedures. Scary. I know stored procedures are a hot topic, and there are some very clear use cases where they shine, but effective, automated unit testing falls short, and debuggability is an issue as well.
Enforce testing at a granular, unit level whenever possible. Build a solid foundation, mocking data wherever possible to account for all 'known' use cases.
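As an illustration of that last point, here is a minimal sketch using Python's unittest; classify_order and its rules are invented for the example, not taken from the actual system.

```python
import unittest

def classify_order(order):
    # A small, pure function: the easiest kind of unit to test in isolation.
    if order["quantity"] <= 0:
        raise ValueError("quantity must be positive")
    return "bulk" if order["quantity"] >= 100 else "standard"

class ClassifyOrderTests(unittest.TestCase):
    def test_standard_order(self):
        self.assertEqual(classify_order({"quantity": 5}), "standard")

    def test_bulk_boundary(self):
        # Edge cases at the boundary are exactly what those flowcharts encoded.
        self.assertEqual(classify_order({"quantity": 100}), "bulk")

    def test_bad_data_entry(self):
        # Bad input was a recurring root cause; make it fail loudly and early.
        with self.assertRaises(ValueError):
            classify_order({"quantity": 0})

if __name__ == "__main__":
    unittest.main()
```

Small, pure functions with mocked inputs keep the box from going black in the first place.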