Skip to content

gh-130167: Improve speed of ftplib.parse150 by replacing re #130243

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

donBarbos
Copy link
Contributor

@donBarbos donBarbos commented Feb 18, 2025

I also suggest adding tests for this function because they are missing in the Lib/test/test_ftplib.py file.
I wrote tests (would be in next comment) and benchmarks (results here)

parse150_bench.py:

from Lib.ftplib import error_reply
import timeit

# Regex version
_150_re = None
def parse150_re(resp):
    if resp[:3] != "150":
        raise error_reply(resp)
    global _150_re
    if _150_re is None:
        import re
        _150_re = re.compile(r"150 .* \((\d+) bytes\)", re.IGNORECASE | re.ASCII)
    m = _150_re.match(resp)
    if not m:
        return None
    return int(m.group(1))

# No regex version
def parse150(resp):
    if not resp.startswith("150"):
        raise error_reply(resp)

    start = resp.find("(")
    end = resp.find(" bytes)")

    if start == -1 or end == -1 or start >= end:
        return None

    try:
        return int(resp[start + 1 : end])
    except ValueError:
        return None

test_cases = [
    "150 Opening BINARY mode data connection (4096 bytes)",
    "150 Transfer starting (32768 bytes)",
    "150 Data connection accepted (131072 bytes)",
    "150 Some random message without bytes",
    "150 Large file transfer (987654321 bytes)",
    "150 Small file (1 bytes)",
    "150 Transfer starting (99999999 bytes)",
    "150 Weird case (42 bytes) with extra text",
]

for case in test_cases:
    regex_time = timeit.timeit(lambda: parse150_re(case), number=1_000_000)
    no_regex_time = timeit.timeit(lambda: parse150(case), number=1_000_000)
    print(f"Test case: {case}")
    print(f"  Regex version: {regex_time:.6f} sec")
    print(f"  No regex version: {no_regex_time:.6f} sec")

Results for different cases (from x1.82 to x1.92 as fast):

$ ./python -B parse150_bench.py
Test case: 150 Opening BINARY mode data connection (4096 bytes)
  Regex version: 1.147499 sec
  No regex version: 0.630863 sec
Test case: 150 Transfer starting (32768 bytes)
  Regex version: 1.113087 sec
  No regex version: 0.610494 sec
Test case: 150 Data connection accepted (131072 bytes)
  Regex version: 1.140198 sec
  No regex version: 0.616784 sec
Test case: 150 Some random message without bytes
  Regex version: 0.643958 sec
  No regex version: 0.302508 sec
Test case: 150 Large file transfer (987654321 bytes)
  Regex version: 1.160474 sec
  No regex version: 0.605776 sec
Test case: 150 Small file (1 bytes)
  Regex version: 1.005945 sec
  No regex version: 0.522877 sec
Test case: 150 Transfer starting (99999999 bytes)
  Regex version: 1.147927 sec
  No regex version: 0.616777 sec
Test case: 150 Weird case (42 bytes) with extra text
  Regex version: 1.140925 sec
  No regex version: 0.592486 sec

@donBarbos
Copy link
Contributor Author

donBarbos commented Feb 18, 2025

my tests passed completely. this is a continuation of the previous file:

import unittest

class TestParse150(unittest.TestCase):
    def test_valid_cases(self):
        resp = "150 Opening BINARY mode data connection (1024 bytes)"
        self.assertEqual(parse150(resp), parse150_re(resp), 1024)

        resp = "150 Opening BINARY mode data connection (2048 bytes)"
        self.assertEqual(parse150(resp), parse150_re(resp), 2048)

        resp = "150 Transfer starting (8192 bytes)"
        self.assertEqual(parse150(resp), parse150_re(resp), 8192)

        resp = "150 Data connection accepted (65536 bytes)"
        self.assertEqual(parse150(resp), parse150_re(resp), 65536)

        resp = "150 Data connection accepted (65536 BYTES)"
        self.assertEqual(parse150(resp), parse150_re(resp), 65536)

    def test_invalid_cases(self):
        resp = "150 Opening connection"
        self.assertEqual(parse150(resp), parse150_re(resp), None)

        resp = "150 (not a number bytes)"
        self.assertEqual(parse150(resp), parse150_re(resp), None)

        resp = "150 (1234 bytes"
        self.assertEqual(parse150(resp), parse150_re(resp), None)

        resp = "150 (5678 bytes"
        self.assertEqual(parse150(resp), parse150_re(resp), None)

        resp = "150  () bytes)"
        self.assertEqual(parse150(resp), parse150_re(resp), None)

        resp = "200 OK"
        self.assertRaises(error_reply, parse150, resp)
        self.assertRaises(error_reply, parse150_re, resp)

if __name__ == "__main__":
    unittest.main()

@eendebakpt
Copy link
Contributor

This improves the performance of parse150 for some cases. But is the performance a bottleneck? It might be performance of ftplib is limited by the actual transfers.

@rhettinger
Copy link
Contributor

rhettinger commented Feb 18, 2025

I'm going to decline this one. It is ancient, well-exercised, stable code that is not much used anymore. It really isn't worth the code churn or the risk that the string method approach isn't exactly equivalent (i.e. lower() vs casefold()).

@rhettinger rhettinger closed this Feb 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants