Commit f448dd2

Release 0.2.4 (#96)
- Increase the timeout for requests that check whether web URLs are alive; it now defaults to 15 seconds.
- Consider status codes between 2xx and 3xx as valid URLs.
- Add Accept and User-Agent headers to the requests, which some websites require.
1 parent 99929c6 commit f448dd2

File tree

20 files changed

+98
-70
lines changed

.github/workflows/python-tests.yaml

+2-2
@@ -42,13 +42,13 @@ jobs:
4242
run: |
4343
cd azure-search-openai-demo
4444
markdown-checker -d . -f check_urls_locale -gu ''
45-
45+
4646
- name: Test Check Broken URLs in azure-search-openai-demo
4747
if: always()
4848
run: |
4949
cd azure-search-openai-demo
5050
markdown-checker -d . -f check_broken_urls -gu ''
51-
51+
5252
tests-phicookbook-repo:
5353
name: Python Tests on Phi-3CookBook
5454
runs-on: ubuntu-latest

CHANGELOG.md

+6
@@ -12,6 +12,12 @@ All notable changes to this project will be documented in this file.
 
 ### Other Changes
 
+## [v0.2.4] 26 Jan 2025
+
+- Increase timeout for requests to check web urls alive or not, defaults to 15 seconds.
+- Consider status codes between 2xx and 3xx as valid URLs.
+- Add headers Accept and User-Agent headers to the requests which are required by some websites.
+
 ## [v0.2.3] 26 Nov 2024
 
 - Skip another domain by @IsuminI

docs/mkdocs.yml

+1-1
@@ -35,7 +35,7 @@ nav:
3535
- About: index.md
3636
- Usage: usage.md
3737
- Advanced Usage: advanced.md
38-
- API Reference:
38+
- API Reference:
3939
- API Reference: api.md
4040
- Main: ./api/main.md
4141
- Urls: ./api/urls.md

docs/source/advanced.md

+36-21
@@ -1,104 +1,119 @@
1+
<!-- markdownlint-disable MD041 -->
12
## Advanced Usage
23

34
To further customize your experience with the Markdown Checker, you can utilize additional command-line interface (CLI) options.
45

56
## Command Line Options
67

78
### `-d`, `--dir`
9+
810
- **Type**: `click.Path`
911
- **Description**: Path to the root directory to check.
1012
- **Required**: Yes
1113

1214
### `-f`, `--func`
15+
1316
- **Type**: `click.Choice`
1417
- **Description**: Function to be executed.
1518
- **Choices**:
16-
- `check_broken_paths`
17-
- `check_broken_urls`
18-
- `check_paths_tracking`
19-
- `check_urls_tracking`
20-
- `check_urls_locale`
19+
- `check_broken_paths`
20+
- `check_broken_urls`
21+
- `check_paths_tracking`
22+
- `check_urls_tracking`
23+
- `check_urls_locale`
2124
- **Required**: Yes
2225

2326
### `-ext`, `--extensions`
27+
2428
- **Type**: `list[str]`
2529
- **Description**: File extensions to filter the files.
26-
- **Default**:
27-
- `.md`
28-
- `.ipynb`
30+
- **Default**:
31+
- `.md`
32+
- `.ipynb`
2933
- **Required**: No
3034

3135
### `-td`, `--tracking-domains`
36+
3237
- **Type**: `list[str]`
3338
- **Description**: List of tracking domains to check.
34-
- **Default**:
35-
- `github.com`
36-
- `microsoft.com`
37-
- `visualstudio.com`
38-
- `aka.ms`
39-
- `azure.com`
39+
- **Default**:
40+
- `github.com`
41+
- `microsoft.com`
42+
- `visualstudio.com`
43+
- `aka.ms`
44+
- `azure.com`
4045
- **Required**: No
4146

4247
### `-sf`, `--skip-files`
48+
4349
- **Type**: `list[str]`
4450
- **Description**: List of file names to skip check.
45-
- **Default**:
46-
- `CODE_OF_CONDUCT.md`
47-
- `SECURITY.md`
51+
- **Default**:
52+
- `CODE_OF_CONDUCT.md`
53+
- `SECURITY.md`
4854
- **Required**: No
4955

5056
### `-sd`, `--skip-domains`
57+
5158
- **Type**: `list[str]`
5259
- **Description**: List of domains to skip checking.
5360
- **Default**: `[]`
5461
- **Required**: No
5562

5663
### `-suc`, `--skip-urls-containing`
64+
5765
- **Type**: `list[str]`
5866
- **Description**: List of strings to skip checking if their urls are working or not.
59-
- **Default**:
60-
- `https://www.microsoft.com/en-us/security/blog`
61-
- `video-embed.html`
67+
- **Default**:
68+
- `https://www.microsoft.com/en-us/security/blog`
69+
- `video-embed.html`
6270
- **Required**: No
6371

6472
### `-gu`, `--guide-url`
73+
6574
- **Type**: `str`
6675
- **Description**: Full URL of your contributing guide.
6776
- **Required**: No
6877

6978
### `-to`, `--timeout`
79+
7080
- **Type**: `Click.IntRange`
7181
- **Description**: Timeout in seconds for the requests before retrying.
72-
- **Default**: `10`
82+
- **Default**: `15`
7383
- **Range**: `0-50`
7484
- **Required**: No
7585

7686
### `-rt`, `--retries`
87+
7788
- **Type**: `Click.IntRange`
7889
- **Description**: Number of retries for the requests before flagging a url as broken.
7990
- **Default**: `3`
8091
- **Range**: `0-10`
8192
- **Required**: No
8293

8394
### `-o`, `--output-file-name`
95+
8496
- **Type**: `str`
8597
- **Description**: Name of the output file.
8698
- **Default**: `comment`
8799
- **Required**: No
88100

89101
### `SRC ...`
102+
90103
- **Type**: `click.Path`
91104
- **Description**: Source files or directories to check.
92105
- **Required**: No
93106

94107
## Other Options
95108

96109
### `--version`
110+
97111
- **Type**: `bool`
98112
- **Description**: Show the version and exit.
99113
- **Required**: No
100114

101115
### `--help`
116+
102117
- **Type**: `bool`
103118
- **Description**: Show the help message and exit.
104119
- **Required**: No

docs/source/api.md

+1
@@ -1,3 +1,4 @@
1+
<!-- markdownlint-disable MD041 -->
12
- [Main](./api/main.md)
23
- [Urls](./api/urls.md)
34
- [Paths](./api/paths.md)

docs/source/api/main.md

+1-1
@@ -1 +1 @@
1-
::: markdown_checker
1+
::: markdown_checker

docs/source/api/markdown_link_base.md

+1-1
@@ -1 +1 @@
1-
::: markdown_checker.markdown_link_base
1+
::: markdown_checker.markdown_link_base

docs/source/api/paths.md

+1-1
@@ -1 +1 @@
1-
::: markdown_checker.paths
1+
::: markdown_checker.paths

docs/source/api/urls.md

+1-1
@@ -1 +1 @@
1-
::: markdown_checker.urls
1+
::: markdown_checker.urls

docs/source/api/utils.md

+1-1
@@ -4,4 +4,4 @@
44

55
::: markdown_checker.utils.list_files
66

7-
::: markdown_checker.utils.spinner
7+
::: markdown_checker.utils.spinner

docs/source/usage.md

+6-6
@@ -4,11 +4,11 @@ The library provides the following functions:
44

55
[Usage](#usage):
66

7-
- [`check_broken_paths`](#check_broken_paths)
8-
- [`check_broken_urls`](#check_broken_urls)
9-
- [`check_urls_locale`](#check_urls_locale)
10-
- [`check_paths_tracking`](#check_paths_tracking)
11-
- [`check_urls_tracking`](#check_urls_tracking)
7+
- [`check_broken_paths`](#check_broken_paths)
8+
- [`check_broken_urls`](#check_broken_urls)
9+
- [`check_urls_locale`](#check_urls_locale)
10+
- [`check_paths_tracking`](#check_paths_tracking)
11+
- [`check_urls_tracking`](#check_urls_tracking)
1212

1313
## `check_broken_paths`
1414

@@ -60,4 +60,4 @@ Example:
6060
markdown-checker -d . -f check_urls_tracking -gu https://github.com/john0isaac/markdown-checker/blob/main/CONTRIBUTING.md
6161
```
6262

63-
## Want to do more? Check out the [Advanced Usage](./advanced.md) page.
63+
## Want to do more? Check out the [Advanced Usage](./advanced.md) page

pyproject.toml

+1-1
@@ -1,7 +1,7 @@
 [project]
 name = "markdown-checker"
 description= "A markdown link validation reporting tool."
-version = "0.2.3"
+version = "0.2.4"
 authors = [{ name = "John Aziz", email = "[email protected]" }]
 maintainers = [{ name = "John Aziz", email = "[email protected]" }]
 license = {file = "LICENSE"}

src/markdown_checker/__init__.py

+7-24
@@ -76,32 +76,15 @@ def detect_issues(
     # currently these domains are known to have restrictions on the requests
     skip_domains.extend(
         [
-            "platform.openai.com",
-            "help.openai.com",
+            "openai.com",
             "beta.openai.com",
-            "marketplace.visualstudio.com",
-            "huggingface.co",
+            "help.openai.com",
+            "platform.openai.com",
+            "vscode.dev",
             "en.wikipedia.org",
-            "twitter.com",
-            "www.linkedin.com",
-            "make.powerautomate.com",
-            "make.powerapps.com",
             "www.midjourney.com",
-            "vscode.dev",
+            "www.linkedin.com",
             "rodtrent.substack.com",
-            "example.com",
-            "www.nuget.org",
-            "www.docker.com",
-            "build.nvidia.com",
-            "dotnet.microsoft.com",
-            "www.gemini.com",
-            "upload.wikimedia.org",
-            "medium.com",
-            "blogs.nvidia.com",
-            "blog.gopenai.com",
-            "towardsdatascience.com",
-            "code.visualstudio.com",
-            "opensource.org",
         ]
     )
     with concurrent.futures.ProcessPoolExecutor() as executor:
@@ -260,7 +243,7 @@ def type_cast_value(self, ctx, value):
     "-to",
     "--timeout",
     type=click.IntRange(0, 50),
-    default=10,
+    default=15,
     help="Timeout in seconds for the requests.",
     required=False,
 )
@@ -289,7 +272,7 @@ def type_cast_value(self, ctx, value):
     required=False,
 )
 @click.version_option(
-    message=(f"%(prog)s, %(version)s\n" f"Python ({platform.python_implementation()}) {platform.python_version()}"),
+    message=(f"%(prog)s, %(version)s\nPython ({platform.python_implementation()}) {platform.python_version()}"),
 )
 def main(
     src: tuple[Path, ...],
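
The skip list in this hunk is keyed by host name, and `host_name` in `urls.py` returns `parsed_url.netloc`. A minimal sketch of how such a filter could work, assuming an exact match on the host. The helper `should_skip` and the trimmed `SKIP_DOMAINS` list are hypothetical and not taken from the package:

```python
from urllib.parse import urlparse

# Trimmed copy of the skip list from the hunk, for illustration only.
SKIP_DOMAINS = ["openai.com", "platform.openai.com", "vscode.dev", "en.wikipedia.org"]


def should_skip(url: str, skip_domains: list[str]) -> bool:
    """Return True when the URL's host exactly matches a skipped domain.

    Exact matching is an assumption; the package may compare hosts differently.
    """
    return urlparse(url).netloc in skip_domains


print(should_skip("https://platform.openai.com/docs", SKIP_DOMAINS))
print(should_skip("https://github.com/", SKIP_DOMAINS))
```

Under exact matching, a host such as `chat.openai.com` would not be skipped by the `openai.com` entry alone, which may be why the list names several `openai.com` subdomains individually.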

src/markdown_checker/reports/md_reports/templates/paths/broken.md

+1-1
@@ -3,4 +3,4 @@
33
We have automatically detected the following broken relative paths in your files.
44
Review and fix the paths to resolve this issue.
55

6-
Check the file paths and associated broken paths inside them.
6+
Check the file paths and associated broken paths inside them.

src/markdown_checker/reports/md_reports/templates/paths/tracking.md

+1-1
@@ -3,4 +3,4 @@
33
We have automatically detected missing tracking IDs from the following relative paths in your files.
44
Review and add tracking to paths to resolve this issue.
55

6-
Check the file paths and associated paths inside them.
6+
Check the file paths and associated paths inside them.

src/markdown_checker/reports/md_reports/templates/urls/broken.md

+1-1
@@ -2,4 +2,4 @@
22

33
We have automatically detected the following broken URLs in your files. Review and fix the paths to resolve this issue.
44

5-
Check the file paths and associated broken URLs inside them.
5+
Check the file paths and associated broken URLs inside them.

src/markdown_checker/reports/md_reports/templates/urls/locale.md

+1-1
@@ -3,4 +3,4 @@
33
We have automatically detected added country locale to URLs in your files.
44
Review and remove country-specific locale from URLs to resolve this issue.
55

6-
Check the file paths and associated URLs inside them.
6+
Check the file paths and associated URLs inside them.

src/markdown_checker/reports/md_reports/templates/urls/tracking.md

+1-1
@@ -3,4 +3,4 @@
33
We have automatically detected missing tracking IDs from the following URLs in your files.
44
Review and add tracking to URLs to resolve this issue.
55

6-
Check the file paths and associated URLs inside them.
6+
Check the file paths and associated URLs inside them.

src/markdown_checker/urls.py

+9-5
@@ -27,7 +27,7 @@ def host_name(self) -> str:
         """
         return self.parsed_url.netloc
 
-    def is_alive(self, timeout: int = 10, retries: int = 3) -> bool:
+    def is_alive(self, timeout: int = 15, retries: int = 3) -> bool:
         """
         Check if the URL is alive
 
@@ -38,14 +38,18 @@ def is_alive(self, timeout: int = 10, retries: int = 3) -> bool:
         Returns:
             bool: True if the URL is alive, False otherwise
         """
+        headers = {
+            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
+            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
+        }
         for _ in range(retries):
             try:
-                response = requests.head(self.link, timeout=timeout, allow_redirects=True)
-                if response.status_code == 200:
+                response = requests.head(self.link, timeout=timeout, allow_redirects=True, headers=headers)
+                if 200 <= response.status_code < 300:
                     return True
                 else:
-                    response = requests.get(self.link, timeout=timeout, allow_redirects=True)
-                    if response.status_code == 200:
+                    response = requests.get(self.link, timeout=timeout, allow_redirects=True, headers=headers)
+                    if 200 <= response.status_code < 300:
                         return True
             except requests.RequestException:
                 continue
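
The control flow of `is_alive` can be exercised without network access. The sketch below mirrors it under stated assumptions: `fetch_status` is a hypothetical callable standing in for `requests.head` and `requests.get`, `check_alive` and `is_success` are illustrative names that do not exist in the package, and the final `return False` is assumed because the hunk does not show what follows the loop. Note that the range check accepts 2xx codes only; because the real calls pass `allow_redirects=True`, redirects are followed first, so it is normally the final response whose status is tested.

```python
from typing import Callable

# Headers added in this release, copied from the hunk.
HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
}


def is_success(status_code: int) -> bool:
    """True for any 2xx status code, matching the new range check."""
    return 200 <= status_code < 300


def check_alive(fetch_status: Callable[[str, dict], int], retries: int = 3) -> bool:
    """Try HEAD, then fall back to GET, up to `retries` times.

    `fetch_status(method, headers)` is a hypothetical stand-in for the HTTP
    call: it returns a status code or raises ConnectionError.
    """
    for _ in range(retries):
        try:
            if is_success(fetch_status("HEAD", HEADERS)):
                return True
            # Some servers reject HEAD, so retry the same URL with GET.
            if is_success(fetch_status("GET", HEADERS)):
                return True
        except ConnectionError:
            continue
    # Assumed behaviour once all retries are exhausted.
    return False
```

A server that answers `405` to HEAD but `204` to GET is reported as alive, which the earlier `== 200` comparison would have rejected.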

src/markdown_checker/utils/logging.py

+19
@@ -0,0 +1,19 @@
+"""
+Utilities for logging config.
+"""
+
+import logging
+
+
+def setup_logging(
+    level: int = logging.INFO,
+    format: str = "[%(asctime)s] %(levelname)s in %(module)s: %(message)s",
+) -> None:
+    """
+    Setup logging
+
+    Args:
+        level (int): The logging level
+        format (str): The logging format
+    """
+    logging.basicConfig(level=level, format=format)
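
To see what the default format string produces, the sketch below repeats `setup_logging` from the new file and adds a hypothetical helper, `render_example`, that formats a single record without touching the global logging configuration. The logger name `markdown_checker` and the path `urls.py` are placeholders chosen for illustration.

```python
import logging

DEFAULT_FORMAT = "[%(asctime)s] %(levelname)s in %(module)s: %(message)s"


def setup_logging(level: int = logging.INFO, format: str = DEFAULT_FORMAT) -> None:
    """Configure the root logger, as in the new utils/logging.py."""
    logging.basicConfig(level=level, format=format)


def render_example(message: str) -> str:
    """Format one WARNING record with the default format (hypothetical helper)."""
    record = logging.LogRecord(
        name="markdown_checker",
        level=logging.WARNING,
        pathname="urls.py",  # %(module)s is derived from this file name
        lineno=1,
        msg=message,
        args=None,
        exc_info=None,
    )
    return logging.Formatter(DEFAULT_FORMAT).format(record)


if __name__ == "__main__":
    setup_logging(logging.DEBUG)
    logging.getLogger(__name__).debug("logging configured")
    print(render_example("request timed out"))
```

`logging.basicConfig` does nothing when the root logger already has handlers, so `setup_logging` is best called once, early in program start-up.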
