Why does ONNX Runtime run 2–3x slower in C++ than in Python?

Kamran Ahmed Khan
2 min read · Jan 26, 2023

ONNX Runtime

There are a few reasons why ONNX Runtime might run slower in C++ than in Python:

Interpreter overhead:

Python is an interpreted language: CPython compiles source code to bytecode, which a virtual machine then executes instruction by instruction at runtime. This adds overhead compared to ahead-of-time compiled languages such as C++. For example, consider the following Python code that calculates the factorial of a number:

def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)

print(factorial(5))  # 120

This code runs fine in Python, but every call passes through the bytecode interpreter, which takes time. A C++ version of the same function is compiled to machine code ahead of time, so the interpreter overhead is avoided entirely.

Memory management:

Python manages memory for the developer automatically, through reference counting plus a cyclic garbage collector. When an object is no longer in use, the memory it occupies is freed without any work by the programmer, but this convenience adds some runtime overhead. For example, the following Python code creates a large list of integers and assigns it to the variable a; once a is no longer referenced, the memory used by the list is reclaimed automatically:

a = [i for i in range(1000000)]

In contrast, C++ leaves memory management to the developer, either manually with new/delete or, more idiomatically, through RAII types such as std::vector and smart pointers that release memory deterministically. This can lead to more efficient memory usage, but it also means the developer must take care to avoid leaks and other memory errors.

Different implementations of the ONNX Runtime library:

The ONNX Runtime Python package is a binding over the same native core that the C++ API exposes, so the execution kernels themselves are identical. In practice, large speed differences between the two usually come from how each build is configured: a C++ application built in Debug mode, linked against an unoptimized build of the library, or run with different session options (graph optimization level, thread counts, execution providers) can easily be several times slower than the prebuilt, fully optimized Python wheel.
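When the C++ build seems slow, a common first check is whether the session is configured the way the Python package configures it by default. A minimal sketch using the ONNX Runtime C++ API might look like the following (the model path and thread count here are illustrative placeholders, and the program assumes the onnxruntime library and a real model are available):

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "bench");

    Ort::SessionOptions opts;
    // Enable all graph optimizations, as the Python package does by default.
    opts.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
    // Thread count is workload-dependent; 4 is purely illustrative.
    opts.SetIntraOpNumThreads(4);

    // "model.onnx" is a placeholder path for your actual model file.
    Ort::Session session(env, "model.onnx", opts);
    return 0;
}
```

Also make sure the application itself is compiled with optimizations (e.g., -O2/Release); benchmarking a Debug build is a frequent source of "C++ is slower" results.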

Third-party libraries:

Python has a large ecosystem of third-party libraries optimized for machine learning, such as NumPy and TensorFlow. Their speed comes from carefully tuned native code (for example, BLAS routines) under the hood, so for the pre- and post-processing around a model they can outperform a naive hand-written C++ loop that does the same work, unless the C++ code is similarly optimized.

Key Note:

It’s worth noting that performance can vary depending on the specific hardware and operating system you’re using, as well as the specific model and dataset you’re running. Therefore, it’s always a good idea to benchmark your code using representative data and to profile it to identify any performance bottlenecks.

