from concurrent.futures import ThreadPoolExecutor
from functools import wraps

import numba


def calculate_pi(start: int, end: int) -> float:
    """
    Compute π.

    Uses the formula π = 4(1 − 1/3 + 1/5 − 1/7 + 1/9 − 1/11 + ...)

    :param start: index of the first term
    :type start: int
    :param end: index of the truncation term (exclusive)
    :type end: int
    :return: value of π (the partial sum over the given terms)
    :rtype: float
    """
    result = 0.0
    # Term i carries the sign (-1)**i, so a chunk that starts at an odd index begins negative.
    positive = start % 2 == 0
    for i in range(start, end):
        tmp = 1.0 / float(i * 2 + 1)
        if positive:
            result += tmp
        else:
            result -= tmp
        positive = not positive
    return result * 4.0


calculate_pi_jit = numba.jit()(calculate_pi)
calculate_pi_jit_nogil = numba.jit(nogil=True)(calculate_pi)


def parallelize(thread_cnt: int):
    def decorator(func):
        @wraps(func)
        def wrapper(start: int, end: int) -> float:
            # Ceiling division keeps the number of chunks at thread_cnt or fewer.
            step = max(1, -(-(end - start) // thread_cnt))
            futures = []
            with ThreadPoolExecutor(max_workers=thread_cnt) as executor:
                for i in range(start, end, step):
                    # Clamp the last chunk so it never runs past end.
                    futures.append(executor.submit(func, i, min(i + step, end)))
            result = 0.0
            for future in futures:
                result += future.result()
            return result

        return wrapper

    return decorator

from main import calculate_pi, calculate_pi_jit, calculate_pi_jit_nogil, parallelize

TOTAL_ITERATIONS = 100_000_000


def test_calculate_pi(benchmark) -> None:
    benchmark(calculate_pi, 0, TOTAL_ITERATIONS)


def test_calculate_pi_jit(benchmark) -> None:
    benchmark(calculate_pi_jit, 0, TOTAL_ITERATIONS)


def test_calculate_pi_jit_nogil(benchmark) -> None:
    benchmark(calculate_pi_jit_nogil, 0, TOTAL_ITERATIONS)


def test_multithread_calculate_pi_jit(benchmark) -> None:
    benchmark(parallelize(4)(calculate_pi_jit), 0, TOTAL_ITERATIONS)


def test_multithread_calculate_pi_jit_nogil(benchmark) -> None:
    benchmark(parallelize(4)(calculate_pi_jit_nogil), 0, TOTAL_ITERATIONS)

Important
The following content was generated by Gemini 2.0 Flash.
English version
This pytest output shows the results of running five benchmarked tests that calculate Pi with different optimization techniques (JIT compilation, multithreading, and releasing the Global Interpreter Lock, GIL). Let's break down the analysis:
Test Summary:
- 5 tests passed: All tests completed successfully without errors.
- Total runtime: 4.15 seconds: This is the total time taken to execute all tests, including setup, teardown, and the benchmark iterations.
- Benchmark results: The core of the output is the benchmark table, which provides detailed timing information for each test.
Benchmark Table Analysis:
The table shows the following metrics for each test:
- Name: The name of the test function. These names give clues about the optimization techniques used.
- Min, Max, Mean: The minimum, maximum, and average execution time (in microseconds) across all benchmark iterations.
- StdDev: The standard deviation of the execution times, indicating the variability of the measurements. A lower standard deviation is better, showing more consistent performance.
- Median: The middle value of the execution times. This is less sensitive to outliers than the mean.
- IQR: The interquartile range, the difference between the 75th and 25th percentiles. This is another measure of variability, less sensitive to outliers.
- Outliers: The number of outlier rounds, reported by pytest-benchmark as two counts: rounds more than 1 standard deviation from the mean, and rounds more than 1.5 IQR beyond the 1st or 3rd quartile.
- OPS: Operations Per Second. This is calculated as 1 / Mean, representing how many times the test can be executed per second. Higher is better.
- Rounds: The number of benchmark rounds performed.
- Iterations: The number of iterations run per round.
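To make the column definitions concrete, here is a minimal sketch that derives the same kinds of statistics from a list of per-round timings using only the standard library. The helper name `summarize` and the sample timings are hypothetical and not taken from the benchmark run; quartile conventions differ between tools, so the IQR computed here may not match the table's value exactly.

```python
import statistics


def summarize(timings_s):
    """Summarize per-round timings (in seconds) along the lines of the table columns."""
    ordered = sorted(timings_s)
    mean = statistics.fmean(ordered)
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
    q1, _, q3 = statistics.quantiles(ordered, n=4)
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "mean": mean,
        "stddev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
        "median": statistics.median(ordered),
        "iqr": q3 - q1,
        "ops": 1.0 / mean,  # operations per second = 1 / mean
        "rounds": len(ordered),
    }


# Hypothetical timings with one slow round acting as an outlier.
sample = [0.0010, 0.0012, 0.0011, 0.0013, 0.0040]
stats = summarize(sample)
for key, value in stats.items():
    print(f"{key}: {value}")
```

In this sample the single slow round pulls the mean well above the median, which is why the median and IQR are described as less sensitive to outliers.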
Key Observations and Comparisons:
- `test_calculate_pi` (Baseline): This is the plain Python implementation, serving as a baseline for comparison. It's significantly slower (around 60 milliseconds) than the other tests, highlighting the impact of optimizations.
- `test_calculate_pi_jit` and `test_calculate_pi_jit_nogil`: These tests use Just-In-Time (JIT) compilation. The `_nogil` suffix marks the variant that releases the Global Interpreter Lock (GIL) for potential performance gains. Both JIT versions are dramatically faster than the baseline (around 730 microseconds). The difference between `jit` and `jit_nogil` is minimal, suggesting that for this single-threaded calculation, releasing the GIL doesn't provide a substantial benefit.
- `test_multithread_calculate_pi_jit` and `test_multithread_calculate_pi_jit_nogil`: These tests introduce multithreading in addition to JIT compilation. `test_multithread_calculate_pi_jit_nogil` is the fastest (around 490 microseconds), showing that combining multithreading, JIT compilation, and GIL release provides the best performance. Multithreading with JIT but without GIL release is slower (around 1000 microseconds), indicating that the GIL is a bottleneck for multithreaded Pi calculation in CPython.
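The approximate means quoted in the observations above can be turned into speedup factors relative to the baseline. The sketch below uses only those rounded figures, so the ratios are rough estimates rather than measured values.

```python
# Approximate mean times quoted in the observations, converted to seconds.
timings = {
    "test_calculate_pi": 60e-3,
    "test_calculate_pi_jit": 730e-6,
    "test_calculate_pi_jit_nogil": 730e-6,
    "test_multithread_calculate_pi_jit": 1000e-6,
    "test_multithread_calculate_pi_jit_nogil": 490e-6,
}

baseline = timings["test_calculate_pi"]
# Speedup = baseline time / variant time; values above 1 mean faster than baseline.
speedups = {name: baseline / seconds for name, seconds in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

With these figures the JIT variants come out at roughly 82x the baseline, the multithreaded variant without GIL release at about 60x, and the multithreaded `nogil` variant at roughly 122x.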
Summary:
The benchmark results demonstrate a large performance benefit from JIT compilation. Multithreading helps only when the GIL is released: with `nogil=True` the multithreaded version is the fastest (around 490 microseconds), while without it the threads run slower (around 1000 microseconds) than the single-threaded JIT version (around 730 microseconds). The baseline implementation without optimizations is significantly slower than all of them.
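The multithreaded variants rely on splitting the term range into chunks and adding up the partial sums. The following is a minimal sketch of that idea in plain Python, without numba; the names `leibniz_partial` and `chunked_pi` are hypothetical and not part of the benchmarked code. It assumes each chunk must start with the sign that matches the parity of its first index, and it clamps the last chunk to the end of the range.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def leibniz_partial(start, end):
    """Return 4 * sum of (-1)**i / (2*i + 1) for i in [start, end)."""
    total = 0.0
    sign = 1.0 if start % 2 == 0 else -1.0  # term i has sign (-1)**i
    for i in range(start, end):
        total += sign / (2 * i + 1)
        sign = -sign
    return 4.0 * total


def chunked_pi(start, end, workers):
    """Split [start, end) into at most `workers` chunks and sum the partial results."""
    size = max(1, -(-(end - start) // workers))  # ceiling division
    bounds = [(lo, min(lo + size, end)) for lo in range(start, end, size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda b: leibniz_partial(*b), bounds))


# 100_001 terms over 4 workers gives chunks that start at odd indices too.
whole = leibniz_partial(0, 100_001)
split = chunked_pi(0, 100_001, 4)
print(whole, split, math.pi)
```

Because this sketch runs pure Python inside the threads, the GIL serializes the work and no speedup should be expected; it only shows that the chunked sum agrees with the single-pass sum.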
Test environment
ℹ️ CPU: Intel Core i7-14700KF.
Test result