
Scraping prescription drug prices from Rx site using the prescription drug name and zipcode #5959


112 changes: 112 additions & 0 deletions web_programming/fetch_well_rx_price.py
@@ -0,0 +1,112 @@
"""
Scrape the price and pharmacy name for a prescription drug from the WellRx
site (wellrx.com), given the drug name and a zip code.
"""

from bs4 import BeautifulSoup
from requests import Response, get

# The "lxml" parser used below must be installed, but it need not be imported.
BASE_URL: str = "https://www.wellrx.com/prescriptions/{0}/{1}/?freshSearch=true"


def format_price(price: str) -> float:
    """Remove the dollar sign from the price string and convert it to a float.

    >>> format_price("$14")
    14.0

    >>> format_price("$15.67")
    15.67

    >>> format_price("$0.00")
    0.0

    Args:
        price (str): Price of the drug as a string, e.g. "$15.67".

    Returns:
        float: Price of the drug as a float.
    """
    dollar_removed: str = price.replace("$", "")
    formatted_price: float = float(dollar_removed)
    return formatted_price


def fetch_pharmacy_and_price_list(drug_name: str, zip_code: str) -> list:
    """Fetch the lowest prices of a prescription drug near a zip code.

    Request BASE_URL for the given drug name and zip code, then scrape the
    returned page to build a list of the lowest prices for the drug.

    Args:
        drug_name (str): Name of the prescription drug.
        zip_code (str): Zip code to search around.

    Returns:
        list: Dicts holding each pharmacy's name and the drug's price there.
    """
    try:
        # Has the user provided both inputs?
        if not drug_name or not zip_code:
            return []

        request_url: str = BASE_URL.format(drug_name, zip_code)
        response: Response = get(request_url)

        # Is the status code OK?
        if response.status_code == 200:
Member:

Let's use response.raise_for_status() instead to let the caller know what the problem is.
https://docs.python-requests.org/en/master/api/#requests.Response.raise_for_status

Contributor Author:

Added response.raise_for_status() and changed the code accordingly.
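A minimal sketch of the request step with raise_for_status(), assuming the requests library as in the PR. The helper names build_url and fetch_page are hypothetical, as are two additions the PR does not make: a 10-second timeout, and percent-encoding each URL segment with urllib.parse.quote so that drug names containing spaces (e.g. "insulin glargine") still produce a valid path.

```python
from urllib.parse import quote

import requests

# URL template copied from the PR.
BASE_URL = "https://www.wellrx.com/prescriptions/{0}/{1}/?freshSearch=true"


def build_url(drug_name: str, zip_code: str) -> str:
    """Fill in BASE_URL, percent-encoding each path segment (hypothetical helper)."""
    # safe="" also encodes "/", so an input value cannot add extra path segments.
    return BASE_URL.format(quote(drug_name, safe=""), quote(zip_code, safe=""))


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its text (hypothetical helper).

    Raises requests.HTTPError for 4xx/5xx responses, so the caller sees
    the actual status instead of an empty result.
    """
    # Assumed timeout so a stalled server cannot hang the script forever.
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text
```

Raising instead of returning [] lets the caller distinguish "no pharmacies found" from "the request failed".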


            # Parse the page with BeautifulSoup, using the lxml parser.
            soup: BeautifulSoup = BeautifulSoup(response.text, "lxml")

            # This list will store each pharmacy name and price.
            pharmacy_price_list: list = []

            # Fetch all the grids (result cards) that contain the items.
            grid_list: list = soup.find_all("div", {"class": "grid-x pharmCard"})
            for grid in grid_list:
                # Get the pharmacy name.
                pharmacy_name: str = grid.find("p", {"class": "list-title"}).text

                # Get the price of the drug.
                price: str = grid.find("span", {"class": "price price-large"}).text
                formatted_price: float = format_price(price)
Member:

What is the reason to get rid of the $ and the two digits to the right of the decimal point? Are we going to do math (add, subtract, multiply, divide) on these numbers? If not, let's not modify the formatting.

Contributor Author:

Removed the conversion, since we were not doing any arithmetic on those values.
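Following the reviewer's point, prices can stay as the strings shown on the page. If arithmetic were ever needed (say, sorting pharmacies from cheapest to most expensive), a minimal sketch would parse with decimal.Decimal rather than float, so a value such as "$15.67" stays exact and "$14.00" keeps its two decimal places. The helper names parse_price and sort_by_price, and the handling of thousands separators ("$1,234.56"), are assumptions not found in the PR.

```python
from decimal import Decimal, InvalidOperation


def parse_price(price: str) -> Decimal:
    """Convert a displayed price such as "$1,234.56" to an exact Decimal."""
    cleaned = price.strip().replace("$", "").replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        # Re-raise as ValueError so callers can catch a single, familiar type.
        raise ValueError(f"Unrecognized price: {price!r}") from None


def sort_by_price(pharmacy_price_list: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return the results cheapest first, leaving the price strings untouched."""
    return sorted(pharmacy_price_list, key=lambda item: parse_price(item["price"]))
```

The Decimal is used only as a sort key, so the displayed strings are never reformatted.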


                pharmacy_price_list.append(
                    {
                        "pharmacy_name": pharmacy_name,
                        "price": formatted_price,
                    }
                )

            return pharmacy_price_list

        else:
            return []

    except Exception as e:
Member:

Contributor Author:

Replaced except Exception as e: with except (HTTPError, exceptions.RequestException, ValueError):. This was new to me.
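A minimal sketch of why the narrowed clause behaves differently from except Exception, assuming the requests library. The wrapper prices_or_empty is hypothetical and takes the scraper as a callable so it can be tried without a network connection. Because requests.HTTPError is a subclass of requests.RequestException, catching RequestException and ValueError already covers a failed raise_for_status() call, while unrelated errors still surface.

```python
from typing import Callable

import requests


def prices_or_empty(
    fetch: Callable[[str, str], list], drug_name: str, zip_code: str
) -> list:
    """Run a scraper, returning [] only for network, HTTP or parsing errors.

    `fetch` stands in for fetch_pharmacy_and_price_list; any other
    exception type (for example a KeyError from a coding mistake) still
    propagates instead of being silently swallowed.
    """
    try:
        return fetch(drug_name, zip_code)
    except (requests.RequestException, ValueError):
        # requests.HTTPError (raised by raise_for_status) subclasses
        # RequestException, so it is covered by this clause too.
        return []
```

One consequence for the scraper as written: if a card lacks the expected p or span element, .find() returns None and .text raises AttributeError, which the narrowed clause no longer catches.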

        return []


if __name__ == "__main__":
    # Enter a drug name and a zip code.
    drug_name: str = input("Enter drug name:\n")
    zip_code: str = input("Enter zip code:\n")
Member:

See README.md for advice on handling leading and/or trailing spaces in input().

Contributor Author (@saptarshi1996, Jan 30, 2022):

Went through the README and codebase and changed the input to:
drug_name = input("Enter drug name: ").strip()
zip_code = input("Enter zip code: ").strip()
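A minimal sketch of reading both inputs with .strip(), as in the reply above, plus a basic check before any request is made. The function name read_search_inputs and the US ZIP pattern (five digits with an optional "-1234" suffix) are assumptions; the PR itself only strips whitespace and relies on the empty-input check inside fetch_pharmacy_and_price_list. The read parameter defaults to input() and exists only so the function can be exercised without a terminal.

```python
import re
from typing import Callable

# Assumed format: US ZIP code, 5 digits with an optional "-1234" suffix.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def read_search_inputs(read: Callable[[str], str] = input) -> tuple[str, str]:
    """Prompt for a drug name and zip code, stripping surrounding spaces."""
    drug_name = read("Enter drug name: ").strip()
    zip_code = read("Enter zip code: ").strip()
    if not drug_name:
        raise ValueError("Drug name must not be empty.")
    # fullmatch rejects partial matches such as "303011" or "3030".
    if not ZIP_PATTERN.fullmatch(zip_code):
        raise ValueError(f"Not a valid zip code: {zip_code!r}")
    return drug_name, zip_code
```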

    pharmacy_price_list: list = fetch_pharmacy_and_price_list(drug_name, zip_code)

    print("Search results for {0} at location {1}\n".format(drug_name, zip_code))
    for pharmacy_price in pharmacy_price_list:
        print(
            "Pharmacy: {0} Price: {1}".format(
                pharmacy_price["pharmacy_name"], pharmacy_price["price"]
            )
        )
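A minimal offline sketch of the card-scraping step in fetch_pharmacy_and_price_list, assuming BeautifulSoup with Python's built-in "html.parser" so that lxml is not required. It swaps in CSS selectors (soup.select and select_one) for the find_all/find calls in the PR, and skips a card that lacks a name or price instead of raising AttributeError. SAMPLE_HTML and parse_pharmacy_cards are hypothetical: the markup reuses only the class names the PR targets, and the live WellRx page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the class names come from the PR
# ("grid-x pharmCard", "list-title", "price price-large").
SAMPLE_HTML = """
<div class="grid-x pharmCard">
  <p class="list-title">Pharmacy A</p>
  <span class="price price-large">$14.00</span>
</div>
<div class="grid-x pharmCard">
  <p class="list-title">Pharmacy B</p>
  <span class="price price-large">$15.67</span>
</div>
"""


def parse_pharmacy_cards(html: str) -> list[dict[str, str]]:
    """Extract pharmacy names and price strings from the result cards."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # A CSS selector such as div.grid-x.pharmCard matches elements carrying
    # all of the listed classes, regardless of their order in the attribute.
    for card in soup.select("div.grid-x.pharmCard"):
        name_tag = card.select_one("p.list-title")
        price_tag = card.select_one("span.price.price-large")
        # Skip incomplete cards rather than failing on None.text.
        if name_tag is None or price_tag is None:
            continue
        results.append(
            {
                "pharmacy_name": name_tag.get_text(strip=True),
                "price": price_tag.get_text(strip=True),
            }
        )
    return results
```

Keeping the parser separate from the HTTP request makes it possible to test the scraping logic against saved HTML without hitting the site.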