Code Bloat and Performance
The best line of code is … the one that is never written.
Computers in many shapes and forms increasingly touch everyday life. Applications and software solutions to support them are churned out every day, producing a huge amount of code.
In such applications, performance is an unwritten but basic need. If it is not given due attention at every stage of development, application efficiency and user experience suffer. Performance problems are very costly to diagnose and resolve late in the project life cycle, and the infamous “add hardware” approach is not always a viable solution.
Performance needs to be built into the system from the early stages; the system needs to be architected, designed, programmed, assessed, and refined against the identified performance SLAs.
This document discusses the performance impact of programming inefficiencies, particularly those caused by code bloat, and describes techniques to avoid it.
Code Bloat – What is it, what leads to it?
Code bloat is code that is not required to achieve a given function. It creeps in as a result of programming inefficiencies. It can slow the system down, make it too big, waste resources, or all of the above. It is inherently unoptimized and inefficient.
Code bloat can produce a huge body of code that is not only costly to maintain but also defeats locality of code and data, a property that most modern processors exploit to speed up processing and data access. With good locality, the currently executing code and much of the data it touches stay in the processor’s L1 cache, resulting in big performance wins.
Code bloat can cause execution to jump around, forcing code and data in and out of the cache and causing inefficiencies. Here, we focus strictly on code bloat due to programming inefficiencies, NOT on bloat caused by compilers or code generators emitting inefficient code.
Code bloat that originates in programming can be caused by factors such as the following:
- Failure to encapsulate solutions to partial problems so that they can be reused, resulting in code duplication.
- Overuse of object-oriented constructs such as classes and inheritance, which can lead to messy and confusing designs that often take many more lines of code than an optimal solution.
- Forcing design patterns onto problems that do not need them.
- Excessive loop unrolling that is not justified by a measured performance gain.
- Excessive use of chained conditional if statements where, for instance, a lookup table would do.
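The last item in the list above can be sketched as follows. This is a minimal illustration in Python, not a prescription: the zone names, rates, and function names are hypothetical and were invented for this example.

```python
# Hypothetical example: map a shipping zone to a rate.
# All zone names and rates below are made up for illustration.


def rate_with_if_chain(zone: str) -> float:
    """Bloated version: one branch per zone; grows with every new zone."""
    if zone == "local":
        return 2.5
    elif zone == "regional":
        return 4.0
    elif zone == "national":
        return 6.5
    elif zone == "international":
        return 12.0
    else:
        raise ValueError(f"unknown zone: {zone}")


# Lookup-table version: the data lives in one place, the logic is one lookup.
RATES = {
    "local": 2.5,
    "regional": 4.0,
    "national": 6.5,
    "international": 12.0,
}


def rate_with_table(zone: str) -> float:
    """Compact version: adding a zone means adding one table entry."""
    try:
        return RATES[zone]
    except KeyError:
        raise ValueError(f"unknown zone: {zone}") from None
```

Both functions return the same result for every known zone; the table version simply keeps the code path short and moves the variation into data.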
Programming for Performance
Meeting performance SLAs cannot be an afterthought, nor can it be achieved without a sustained effort across all phases of system development. It has to be ingrained within the development flow, including your performance testing. All stages need equal diligence towards achieving desired performance numbers.
Performance engineering is NOT rocket science; it is about knowing, accepting, and religiously following basic practices so that we extract every last drop of performance from the available resources.
The best line of code is the one that is never written.
The ultimate goal should be to achieve the desired functionality in fewer lines of code. This reduces code bloat, defects, and testing effort, and in turn helps exploit modern processors’ ability to cache the executing code and the most frequently accessed data. A few cycles saved or a few seconds shaved from a small operation can give massive returns when that operation runs millions of times as part of millions of requests to a service in the cloud.
Programming is more about problem solving than actual coding. Good code does not just happen; a lot of smart but dogged effort has to go into making it correct, performant, and maintainable. The following practices help.
Reviews – Peer code reviews by experts bring human judgment to bear on refactoring opportunities, optimizations, and possible inefficiencies.
Profiling – Use static analysis, dynamic code profiling, and binary profiling tools to gather code usage statistics and to identify hot spots for optimization, that is, code with high usage or high resource utilization.
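As one possible illustration of dynamic profiling, the sketch below uses Python’s standard cProfile and pstats modules to find where time is spent. The workload functions (slow_sum_of_squares, handle_request) are hypothetical stand-ins for real application code.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n: int) -> int:
    """Deliberately naive hot spot (hypothetical workload)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def handle_request(n: int) -> int:
    """Hypothetical entry point that calls the hot spot several times."""
    return sum(slow_sum_of_squares(n) for _ in range(5))


def profile_report(n: int = 10_000, top: int = 10) -> str:
    """Run handle_request under cProfile and return a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    handle_request(n)
    profiler.disable()

    # Sort by cumulative time so the most expensive call paths come first.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report())
```

The report lists call counts and cumulative times per function, which is usually enough to decide which routine deserves optimization effort first.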
Performance-Critical Code – Identify your performance-critical sections and keep them as small as possible, so that they are loaded into the processor’s L1 cache once and executed many times from there. This is a simple key to optimizing performance.
Refactoring – Refactor commonly used code into efficient subroutines, which reduces code bloat and increases locality.
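A minimal sketch of this kind of refactoring is shown below. The functions and the validation rule are hypothetical; the point is only that logic pasted into two places is pulled into one shared subroutine.

```python
# Before: the same cleaning and validation logic is pasted into two functions.
def register_user_duplicated(email: str) -> str:
    cleaned = email.strip().lower()
    if "@" not in cleaned:
        raise ValueError("invalid email")
    return f"registered:{cleaned}"


def reset_password_duplicated(email: str) -> str:
    cleaned = email.strip().lower()
    if "@" not in cleaned:
        raise ValueError("invalid email")
    return f"reset:{cleaned}"


# After: the shared logic lives in one subroutine that both callers reuse.
def normalize_email(email: str) -> str:
    """Strip whitespace, lowercase, and apply a (deliberately simple) check."""
    cleaned = email.strip().lower()
    if "@" not in cleaned:
        raise ValueError("invalid email")
    return cleaned


def register_user(email: str) -> str:
    return f"registered:{normalize_email(email)}"


def reset_password(email: str) -> str:
    return f"reset:{normalize_email(email)}"
```

Behavior is unchanged, but a fix to the validation rule now has to be made in only one place.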
Right Use of Object-Oriented Concepts – Create self-contained objects that encapsulate code and data and expose interfaces to access them. Again, this goes a long way toward increasing locality and reducing code bloat.
Contiguous Memory Allocation – Allocating memory contiguously can go a long way toward improving locality and hence performance. Where possible, use simple data structures that are stored contiguously.
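The sketch below contrasts a list of separately allocated objects with a contiguous buffer, using Python’s standard array module, which stores its elements as packed C values. The Point class and helper names are hypothetical. Note that in CPython the interpreter’s own overhead may dominate, so this illustrates the data layout rather than promising a particular speedup.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Point:
    """Each instance is a separate heap object (hypothetical record type)."""
    x: float
    y: float


def sum_x_objects(points: list) -> float:
    """Walks a list of references to objects scattered across the heap."""
    return sum(p.x for p in points)


def to_columns(points: list) -> tuple:
    """Copy the fields into two contiguous arrays of C doubles."""
    xs = array("d", (p.x for p in points))
    ys = array("d", (p.y for p in points))
    return xs, ys


def sum_x_contiguous(xs: array) -> float:
    """Walks one contiguous buffer of doubles."""
    return sum(xs)


if __name__ == "__main__":
    pts = [Point(float(i), float(-i)) for i in range(1000)]
    xs, ys = to_columns(pts)
    # Total bytes used by the contiguous x buffer.
    print(len(xs) * xs.itemsize)
    print(sum_x_objects(pts) == sum_x_contiguous(xs))
```

In languages with direct control over memory layout, the same idea appears as preferring arrays or vectors of plain values over linked structures of individually allocated nodes.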
KIS (Keep It Simple) – Consider data in your design, and optimize for the data first rather than for the code. Artificial abstractions can add latency through object creation and deletion.
Design for Specifics, Not Generics – A specific design lets you cut out code that will never be used.
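As a rough illustration of this last point, compare a generic exporter carrying options that, in this hypothetical project, no caller ever uses with a specific function that does only what is needed. All names and options here are invented for the example.

```python
import json


def export_generic(row, fmt="csv", delimiter=",", uppercase=False, wrap=None):
    """Generic version: several branches exist only 'in case' they are needed."""
    values = [str(v) for v in row]
    if uppercase:
        values = [v.upper() for v in values]
    if wrap is not None:
        values = [f"{wrap}{v}{wrap}" for v in values]
    if fmt == "csv":
        return delimiter.join(values)
    elif fmt == "json":
        return json.dumps(values)
    elif fmt == "pipe":
        return "|".join(values)
    raise ValueError(f"unknown format: {fmt}")


def export_csv(row):
    """Specific version: the one behavior the (hypothetical) callers rely on."""
    return ",".join(str(v) for v in row)
```

For the default case the two produce identical output, but the specific version has no unused branches to read, test, or keep in the instruction cache.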
These practices usually lead to code that is simple to understand and easy to maintain and reuse. Solid, performant code is an integral part of every deliverable.
Again, none of this is rocket science, but it is a discipline that must be accepted, acknowledged, and practiced day in and day out, project after project, to produce software that performs better, uses resources optimally, and is designed to scale.