gh-130167: Improve speed of `ftplib.parse150` by replacing `re` #130243

donBarbos · 2025-02-18T01:21:33Z

I also suggest adding tests for this function because they are missing in the Lib/test/test_ftplib.py file.
I wrote tests (would be in next comment) and benchmarks (results here)

`parse150_bench.py`:

from Lib.ftplib import error_reply
import timeit

# Regex version
_150_re = None
def parse150_re(resp):
    if resp[:3] != "150":
        raise error_reply(resp)
    global _150_re
    if _150_re is None:
        import re
        _150_re = re.compile(r"150 .* \((\d+) bytes\)", re.IGNORECASE | re.ASCII)
    m = _150_re.match(resp)
    if not m:
        return None
    return int(m.group(1))

# No regex version
def parse150(resp):
    if not resp.startswith("150"):
        raise error_reply(resp)

    start = resp.find("(")
    end = resp.find(" bytes)")

    if start == -1 or end == -1 or start >= end:
        return None

    try:
        return int(resp[start + 1 : end])
    except ValueError:
        return None

test_cases = [
    "150 Opening BINARY mode data connection (4096 bytes)",
    "150 Transfer starting (32768 bytes)",
    "150 Data connection accepted (131072 bytes)",
    "150 Some random message without bytes",
    "150 Large file transfer (987654321 bytes)",
    "150 Small file (1 bytes)",
    "150 Transfer starting (99999999 bytes)",
    "150 Weird case (42 bytes) with extra text",
]

for case in test_cases:
    regex_time = timeit.timeit(lambda: parse150_re(case), number=1_000_000)
    no_regex_time = timeit.timeit(lambda: parse150(case), number=1_000_000)
    print(f"Test case: {case}")
    print(f"  Regex version: {regex_time:.6f} sec")
    print(f"  No regex version: {no_regex_time:.6f} sec")

Results for different cases (from x1.82 to x1.92 as fast):

$ ./python -B parse150_bench.py
Test case: 150 Opening BINARY mode data connection (4096 bytes)
  Regex version: 1.147499 sec
  No regex version: 0.630863 sec
Test case: 150 Transfer starting (32768 bytes)
  Regex version: 1.113087 sec
  No regex version: 0.610494 sec
Test case: 150 Data connection accepted (131072 bytes)
  Regex version: 1.140198 sec
  No regex version: 0.616784 sec
Test case: 150 Some random message without bytes
  Regex version: 0.643958 sec
  No regex version: 0.302508 sec
Test case: 150 Large file transfer (987654321 bytes)
  Regex version: 1.160474 sec
  No regex version: 0.605776 sec
Test case: 150 Small file (1 bytes)
  Regex version: 1.005945 sec
  No regex version: 0.522877 sec
Test case: 150 Transfer starting (99999999 bytes)
  Regex version: 1.147927 sec
  No regex version: 0.616777 sec
Test case: 150 Weird case (42 bytes) with extra text
  Regex version: 1.140925 sec
  No regex version: 0.592486 sec

Issue: Improve speed of stdlib functions by replacing re uses #130167

donBarbos · 2025-02-18T01:22:51Z

my tests passed completely. this is a continuation of the previous file:

import unittest

class TestParse150(unittest.TestCase):
    def test_valid_cases(self):
        resp = "150 Opening BINARY mode data connection (1024 bytes)"
        self.assertEqual(parse150(resp), parse150_re(resp), 1024)

        resp = "150 Opening BINARY mode data connection (2048 bytes)"
        self.assertEqual(parse150(resp), parse150_re(resp), 2048)

        resp = "150 Transfer starting (8192 bytes)"
        self.assertEqual(parse150(resp), parse150_re(resp), 8192)

        resp = "150 Data connection accepted (65536 bytes)"
        self.assertEqual(parse150(resp), parse150_re(resp), 65536)

        resp = "150 Data connection accepted (65536 BYTES)"
        self.assertEqual(parse150(resp), parse150_re(resp), 65536)

    def test_invalid_cases(self):
        resp = "150 Opening connection"
        self.assertEqual(parse150(resp), parse150_re(resp), None)

        resp = "150 (not a number bytes)"
        self.assertEqual(parse150(resp), parse150_re(resp), None)

        resp = "150 (1234 bytes"
        self.assertEqual(parse150(resp), parse150_re(resp), None)

        resp = "150 (5678 bytes"
        self.assertEqual(parse150(resp), parse150_re(resp), None)

        resp = "150  () bytes)"
        self.assertEqual(parse150(resp), parse150_re(resp), None)

        resp = "200 OK"
        self.assertRaises(error_reply, parse150, resp)
        self.assertRaises(error_reply, parse150_re, resp)

if __name__ == "__main__":
    unittest.main()

Lib/ftplib.py

eendebakpt · 2025-02-18T12:31:50Z

This improves the performance of parse150 for some cases. But is the performance a bottleneck? It might be performance of ftplib is limited by the actual transfers.

rhettinger · 2025-02-18T18:21:21Z

I'm going to decline this one. It is ancient, well-exercised, stable code that is not much used anymore. It really isn't worth the code churn or the risk that the string method approach isn't exactly equivalent (i.e. lower() vs casefold()).

donBarbos added 3 commits February 18, 2025 04:55

Improve speed of ftplib.parse150 by replacing re

4055dbe

Add blurb NEWS.d entry

043de73

Correct blurb entry file

8ad9c5e

bedevere-app bot added the awaiting review label Feb 18, 2025

bedevere-app bot mentioned this pull request Feb 18, 2025

Improve speed of stdlib functions by replacing re uses #130167

Open

eendebakpt reviewed Feb 18, 2025

View reviewed changes

Lib/ftplib.py Outdated Show resolved Hide resolved

Ignore case while finding

e5dd027

Update NEWS.d entry markup

db0a29a

rhettinger closed this Feb 18, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gh-130167: Improve speed of `ftplib.parse150` by replacing `re` #130243

gh-130167: Improve speed of `ftplib.parse150` by replacing `re` #130243

donBarbos commented Feb 18, 2025 •

edited

Loading

donBarbos commented Feb 18, 2025 •

edited

Loading

eendebakpt commented Feb 18, 2025

rhettinger commented Feb 18, 2025 •

edited

Loading

gh-130167: Improve speed of ftplib.parse150 by replacing re #130243

gh-130167: Improve speed of ftplib.parse150 by replacing re #130243

Conversation

donBarbos commented Feb 18, 2025 • edited Loading

parse150_bench.py:

Results for different cases (from x1.82 to x1.92 as fast):

donBarbos commented Feb 18, 2025 • edited Loading

eendebakpt commented Feb 18, 2025

rhettinger commented Feb 18, 2025 • edited Loading

gh-130167: Improve speed of `ftplib.parse150` by replacing `re` #130243

gh-130167: Improve speed of `ftplib.parse150` by replacing `re` #130243

donBarbos commented Feb 18, 2025 •

edited

Loading

`parse150_bench.py`:

donBarbos commented Feb 18, 2025 •

edited

Loading

rhettinger commented Feb 18, 2025 •

edited

Loading