小能豆

Why is "1000000000000000 in range(1000000000000001)" so fast in Python 3?

javascript

It is my understanding that the range() function, which is actually an object type in Python 3, generates its contents on the fly, similar to a generator.

This being the case, I would have expected the following line to take an inordinate amount of time because, in order to determine whether 1 quadrillion is in the range, a quadrillion values would have to be generated:

1_000_000_000_000_000 in range(1_000_000_000_000_001)

Furthermore: it seems that no matter how many zeroes I add on, the calculation more or less takes the same amount of time (basically instantaneous).

I have also tried things like this, but the calculation is still almost instant:

# count by tens
1_000_000_000_000_000_000_000 in range(0,1_000_000_000_000_000_000_001,10)

If I try to implement my own range function, the result is not so nice!

def my_crappy_range(N):
    i = 0
    while i < N:
        yield i
        i += 1
    return

What is the range() object doing under the hood that makes it so fast?


Martijn Pieters’s answer was chosen for its completeness, but also see abarnert’s first answer for a good discussion of what it means for range to be a full-fledged sequence in Python 3, and some information/warning regarding potential inconsistency for __contains__ function optimization across Python implementations. abarnert’s other answer goes into some more detail and provides links for those interested in the history behind the optimization in Python 3 (and lack of optimization of xrange in Python 2). Answers by poke and by wim provide the relevant C source code and explanations for those who are interested.


阅读 52

收藏
2023-12-25

共1个答案

小能豆

The key to the efficiency of the range() object in Python 3 lies in the fact that it is not a generator, but rather a full-fledged sequence object. In Python 3, range() is implemented in C and optimized for performance.

When you create a range object, it doesn’t generate all the values immediately. Instead, it represents a mathematical sequence of numbers. The start, stop, and step values are stored, and the values are generated on the fly when needed.

In your example:

1_000_000_000_000_000 in range(1_000_000_000_000_001)

Python is able to recognize that the number 1 trillion falls within the specified range without actually generating all those trillion numbers. It can compute this efficiently using simple mathematical operations.

Similarly, in your second example:

1_000_000_000_000_000_000_000 in range(0, 1_000_000_000_000_000_000_001, 10)

Python understands the step value and can quickly determine whether the target number is within the range without generating all intermediate values.

Now, let’s contrast this with your custom my_crappy_range function, which is implemented as a generator. Each value is generated one at a time, and checking membership involves actually generating each value until the target value is reached or surpassed. This makes it much less efficient for large ranges.

The C implementation of range() is optimized for performance, and its efficiency doesn’t degrade significantly with the size of the range. It’s designed to handle large ranges efficiently by exploiting the mathematical properties of arithmetic progressions. This is why you observe consistent and fast performance, even with large numbers.

2023-12-25