Scraping prescription drug prices from Rx site using the prescription drug name and zipcode #5959

Contributor

@saptarshi1996 saptarshi1996 commented Jan 30, 2022

Describe your change:

  • Add an algorithm?
  • Fix a bug or typo in an existing algorithm?
  • Documentation change?

Checklist:

  • I have read CONTRIBUTING.md.
  • This pull request is all my own work -- I have not plagiarized.
  • I know that pull requests will not be merged if they fail the automated tests.
  • This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
  • All new Python files are placed inside an existing directory.
  • All filenames are in all lowercase characters with no spaces or dashes.
  • All functions and variable names follow Python naming conventions.
  • All function parameters and return values are annotated with Python type hints.
  • All functions have doctests that pass the automated testing.
  • All new algorithms have a URL in their comments that points to Wikipedia or another similar explanation.
  • If this pull request resolves one or more open issues then the commit message contains Fixes: #{$ISSUE_NO}.
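
As an illustration of the type-hint and doctest items in the checklist above, here is a minimal sketch. The function name normalize_zip_code and its behavior are hypothetical and are not taken from this PR; they only show the shape the checklist asks for.

```python
import doctest


def normalize_zip_code(zip_code: str) -> str:
    """
    Return a ZIP code with surrounding whitespace removed.

    Hypothetical helper, used only to show annotated parameters,
    an annotated return type, and doctests.

    >>> normalize_zip_code(" 30303 ")
    '30303'
    >>> normalize_zip_code("")
    Traceback (most recent call last):
        ...
    ValueError: zip_code must not be empty
    """
    cleaned = zip_code.strip()
    if not cleaned:
        raise ValueError("zip_code must not be empty")
    return cleaned


if __name__ == "__main__":
    # Runs the examples in the docstring; prints nothing when they pass.
    doctest.testmod()
```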

@ghost added the "require tests" label (Tests [doctest/unittest/pytest] are required) Jan 30, 2022

@ghost ghost left a comment

Automated review generated by algorithms-keeper. If there's any problem regarding this review, please open an issue about it.

algorithms-keeper commands and options

algorithms-keeper actions can be triggered by commenting on this PR:

  • @algorithms-keeper review to trigger the checks for only added pull request files
  • @algorithms-keeper review-all to trigger the checks for all the pull request files, including the modified files. As we cannot post review comments on lines not part of the diff, this command will post all the messages in one comment.

NOTE: Commands are in beta and so this feature is restricted only to a member or owner of the organization.

@ghost added the "awaiting reviews" label (This PR is ready to be reviewed) Jan 30, 2022
@ghost removed the "require tests" label (Tests [doctest/unittest/pytest] are required) Jan 30, 2022
@saptarshi1996
Contributor Author

I have addressed all the requests and added tests as well.

@ghost added the "tests are failing" label (Do not merge until tests pass) Jan 30, 2022
@saptarshi1996
Contributor Author

Fixed the issues and tested with black. This is my first contribution so please bear with me.

@cclauss
Member

cclauss commented Jan 30, 2022

#5960 is driving our automated testing a bit crazy.

Comment on lines 68 to 69
request_url: str = f'https://www.wellrx.com/prescriptions/{drug_name}/{zip_code}/?freshSearch=true'
response: Response = get(request_url)
Member

Let's put type hints on function parameters and function return types but we do not need them everywhere. Both Python and mypy are capable of figuring out that a string literal is a string. ;-) Overuse slows down both the writer and reader of code.

Suggested change
- request_url: str = f'https://www.wellrx.com/prescriptions/{drug_name}/{zip_code}/?freshSearch=true'
- response: Response = get(request_url)
+ request_url = f'https://www.wellrx.com/prescriptions/{drug_name}/{zip_code}/?freshSearch=true'
+ response = get(request_url)
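
A minimal sketch of the convention described in this comment: annotate the function's parameters and return type, and leave local variables to inference. The function name build_request_url is hypothetical; only the URL pattern comes from the lines under review.

```python
def build_request_url(drug_name: str, zip_code: str) -> str:
    """Build the search URL quoted in the review from its two inputs."""
    # No annotation on the local variable: the f-string is plainly a str,
    # and mypy infers that without help.
    request_url = (
        f"https://www.wellrx.com/prescriptions/{drug_name}/{zip_code}/?freshSearch=true"
    )
    return request_url
```

The signature still tells callers and mypy everything they need, while the body stays free of redundant annotations.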

response: Response = get(request_url)

# Is the status code ok?
if response.status_code == 200:
Member

Let's use response.raise_for_status() instead to let the caller know what the problem is.
https://docs.python-requests.org/en/master/api/#requests.Response.raise_for_status

Contributor Author

added response.raise_for_status() and changed the code accordingly.
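
For readers following along, a rough sketch of what the suggestion amounts to, assuming the requests library. The helper name ensure_ok is hypothetical, and the response is built by hand so the example runs without network access; the PR's actual code is not shown here.

```python
import requests


def ensure_ok(response: requests.Response) -> requests.Response:
    """Return the response unchanged, or raise HTTPError for a 4xx/5xx status."""
    # Instead of `if response.status_code == 200: ... else: return None`,
    # let requests raise an exception that carries the status code and URL.
    response.raise_for_status()
    return response


if __name__ == "__main__":
    # Hand-built response, used only to demonstrate the failure path offline.
    not_found = requests.Response()
    not_found.status_code = 404
    try:
        ensure_ok(not_found)
    except requests.HTTPError as error:
        print(f"Request failed: {error}")
```

With this shape the caller sees why the request failed, rather than receiving a bare None.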

# Get price of the drug.
price: str = grid.find(
"span", {"p", "price price-large"}).text
formatted_price: float = format_price(price)
Member

What is the reason to get rid of the $ and the two digits to the right of the decimal point? Are we going to do math (add, subtract, multiply, divide) on these numbers? If not, let's not modify the formatting.

Contributor Author

removed the conversion as we were not using those values
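
To illustrate the trade-off discussed here: the scraped price can stay a display string, and conversion is only worth doing when arithmetic is actually needed. In this sketch parse_price is a hypothetical helper (it is not the PR's format_price, whose implementation is not shown), and using Decimal instead of float is an assumption made to avoid rounding surprises with currency.

```python
from decimal import Decimal


def parse_price(price_text: str) -> Decimal:
    """Convert a display price such as '$12.34' to a Decimal."""
    # Drop whitespace, the leading dollar sign, and thousands separators.
    return Decimal(price_text.strip().lstrip("$").replace(",", ""))


# Keep the text as scraped when it is only shown to the user.
displayed_price = "$1,234.50"

# Convert only at the point where math is required.
total_for_three = parse_price(displayed_price) * 3
```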

else:
return None

except Exception as e:
Member

Contributor Author

Replaced "except Exception as e:" with "except (HTTPError, exceptions.RequestException, ValueError):". This was a new learning for me.
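
A hedged sketch of the narrowed handler described in this reply, assuming the requests library. The function name status_or_none is hypothetical. One detail worth knowing: in requests, HTTPError is a subclass of RequestException, so catching RequestException already covers it.

```python
from typing import Optional

import requests


def status_or_none(response: requests.Response) -> Optional[int]:
    """Return the status code, or None when the request is known to have failed."""
    try:
        response.raise_for_status()
        return response.status_code
    except (requests.RequestException, ValueError):
        # RequestException already includes HTTPError (its subclass).
        # Anything else, such as a TypeError from a programming mistake,
        # still propagates instead of being silently swallowed.
        return None
```

Compared with a blanket "except Exception", only the failures the code expects are handled, and unexpected bugs stay visible.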

Comment on lines 112 to 113
drug_name: str = input("Enter drug Name:\n")
zip_code: str = input("Enter zip code:\n")
Member

See README.md for advice on leading and/or trailing spaces in input().

Contributor Author

@saptarshi1996 saptarshi1996 Jan 30, 2022

Went through the README and the codebase and changed the inputs to:
drug_name = input("Enter drug name: ").strip()
zip_code = input("Enter zip code: ").strip()
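
The same idea can be wrapped in a small helper so it is testable without typing at a terminal. This is only a sketch: the prompt function and its read parameter are hypothetical and are not part of the PR.

```python
from typing import Callable


def prompt(message: str, read: Callable[[str], str] = input) -> str:
    """Ask for a value and drop leading and trailing whitespace."""
    # `read` defaults to the built-in input(); tests can pass a stand-in.
    return read(message).strip()
```

In a script this would be called as drug_name = prompt("Enter drug name: "), which behaves like the two lines above.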

@ghost added the "awaiting changes" label (A maintainer has requested changes to this PR) and removed the "awaiting reviews" label (This PR is ready to be reviewed) Jan 30, 2022
@ghost added the "awaiting reviews" label (This PR is ready to be reviewed) and removed the "awaiting changes" label (A maintainer has requested changes to this PR) Jan 30, 2022
@saptarshi1996
Contributor Author

Thank you for the code review. These are really good points that I've missed. I'll go ahead with the requested changes.

@saptarshi1996
Contributor Author

@cclauss Please check my commit. I've added all the requested changes.

@saptarshi1996 saptarshi1996 requested a review from cclauss January 30, 2022 20:40
@saptarshi1996 saptarshi1996 changed the title from "Scraping prescription drug prices from Rx site using the prescription drug name and price" to "Scraping prescription drug prices from Rx site using the prescription drug name and zipcode" Jan 30, 2022
@saptarshi1996 saptarshi1996 deleted the implement_rx_scraping_in_web_programming branch January 31, 2022 00:54
Labels
awaiting reviews (This PR is ready to be reviewed)
tests are failing (Do not merge until tests pass)
2 participants