小能豆

Complexity of string += operation

py

I am kinda confused. I was sure that due do immutability of str operation like s += "abc" need to copy whole s content to another place in memory so effectively adding character to very long string will consume much time.

I wrote snippet to prove my theory:

import timeit

def foo(i):
    res = ""
    for _ in range(i):
        res += "a"  
    return res


def foo2(i):
    res = []
    for _ in range(i):
        res.append("a")
    return "".join(res)


iterations = 100000
print(timeit.timeit('foo(iterations)', globals=globals(), number=100))
print(timeit.timeit('foo2(iterations)', globals=globals(), number=100))

However it looks like

  1. foo execution time grows linearly based on iterations
  2. foo2 is like just two times faster than foo

I tried to inspect bytecode searching for some hidden optimizations. I tried to change constant string to randomized one with proper length to deny interning as well but couldn’t find any explanation of that behaviour.

Was I wrong then? Does += depend on string length or not? If so, how can I prove that?


阅读 68

收藏
2023-12-15

共1个答案

小能豆

You’re correct in your understanding of the immutability of strings in Python, and the += operation on strings can indeed lead to inefficient code due to the creation of new string objects. However, there are some aspects that could explain the behavior you’re observing.

In CPython (the reference implementation of Python), strings are implemented as arrays of characters, and when you use += on a string, a new string is created with the concatenated content. This involves creating a new array, copying the existing characters, and adding the new characters. The time complexity of this operation is O(n), where n is the length of the resulting string.

The reason your foo2 (using a list and join) is not significantly faster than foo could be due to the overhead of creating the list and then joining it back into a string. While the append operation for a list has an average time complexity of O(1), the actual cost of memory allocation and list resizing can contribute to the overall time.

However, for microbenchmarks like this, the specific implementation details, compiler optimizations, and other factors can influence the results.

To prove the inefficiency of += on strings more explicitly, you might want to consider using a profiler or more specialized tools designed for analyzing memory usage and object creation in Python code. Tools like memory_profiler or line_profiler can provide insights into how memory is allocated and deallocated during the execution of your functions.

In summary, while += on strings can lead to inefficient code due to the creation of new string objects, the specific results you observe may be influenced by various factors, and for more detailed analysis, profiling tools can be helpful.

2023-12-15