Commit 1d670be

Merge pull request #62 from OpenScienceFramework/issue_62
Create utility scripts for docx2html and docx2markdown
2 parents e00173e + d20a7dd

4 files changed, 36 additions, 1 deletion

CHANGELOG

Lines changed: 3 additions & 0 deletions

@@ -1,6 +1,9 @@
 
 Changelog
 =========
+* 0.3.12
+* Added command line support to convert from docx to either html or
+markdown.
 * 0.3.11
 * The non breaking hyphen tag was not correctly being imported. This issue
 has been fixed.

README.rst

Lines changed: 5 additions & 0 deletions

@@ -231,3 +231,8 @@ Optional Arguments
 ##################
 
 You can pass in `convert_root_level_upper_roman=True` to the parser and it will convert all root level upper roman lists to headings instead.
+
+Command Line Execution
+######################
+
+First you have to install pydocx, this can be done by running the command `pip install pydocx`. From there you can simply call the command `pydocx --html path/to/file.docx path/to/output.html`. Change `pydocx --html` to `pydocx --markdown` in order to convert to markdown instead.
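
The README's one-line usage extends naturally to a folder of documents. The following is a minimal sketch, assuming `pydocx` is installed and on `PATH` and that the output directory already exists. `plan_conversions` and `convert_all` are hypothetical helpers, not part of pydocx, and they use only the `--html`/`--markdown` flags and the two positional arguments documented above.

```python
import subprocess
from pathlib import Path

# Only the two flags documented in the README are supported.
SUFFIXES = {"html": ".html", "markdown": ".md"}


def plan_conversions(docx_paths, out_dir, fmt="html"):
    """Build one `pydocx --<fmt> input output` argv list per .docx path.

    Hypothetical helper: each output keeps the input's stem and gets a
    format-specific suffix inside `out_dir`.
    """
    if fmt not in SUFFIXES:
        raise ValueError("fmt must be 'html' or 'markdown'")
    commands = []
    for docx in docx_paths:
        target = Path(out_dir) / (Path(docx).stem + SUFFIXES[fmt])
        commands.append(["pydocx", "--" + fmt, str(docx), str(target)])
    return commands


def convert_all(docx_paths, out_dir, fmt="html", run=subprocess.run):
    """Run each planned command (assumes `pydocx` is on PATH, out_dir exists).

    `run` is injectable so the plan can be checked without invoking pydocx.
    """
    for argv in plan_conversions(docx_paths, out_dir, fmt):
        # check=True raises CalledProcessError when the command exits non-zero.
        run(argv, check=True)
```

For example, `convert_all(sorted(Path("reports").glob("*.docx")), "reports_md", fmt="markdown")`. As committed, `main()` calls `sys.exit()` with no argument after printing a usage error, which exits with status 0, so `check=True` only catches failures that surface as uncaught exceptions.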

pydocx/__init__.py

Lines changed: 23 additions & 0 deletions

@@ -1,3 +1,4 @@
+import sys
 from .parsers import Docx2Html, Docx2Markdown
 
 
@@ -9,3 +10,25 @@ def docx2markdown(path):
     return Docx2Markdown(path).parsed
 
 VERSION = '0.3.11'
+
+
+def main():
+    try:
+        parser_to_use = sys.argv[1]
+        path_to_docx = sys.argv[2]
+        path_to_html = sys.argv[3]
+    except IndexError:
+        print 'Must specify which parser as well as the file to convert and the name of the resulting file.' # noqa
+        sys.exit()
+    if parser_to_use == '--html':
+        html = Docx2Html(path_to_docx).parsed
+    elif parser_to_use == '--markdown':
+        html = Docx2Markdown(path_to_docx).parsed
+    else:
+        print 'Only valid parsers are --html and --markdown'
+        sys.exit()
+    with open(path_to_html, 'w') as f:
+        f.write(html.encode('utf-8'))
+
+if __name__ == '__main__':
+    main()
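
The committed `main()` reads `sys.argv` by position, exits with status 0 on usage errors (`sys.exit()` with no argument), and writes `.encode('utf-8')` bytes to a file opened in text mode, which works on Python 2 but would raise `TypeError` on Python 3. One possible Python 3 restructuring is sketched below with the standard-library `argparse` module. `build_parser`, `run`, and the `converters` mapping are hypothetical names introduced here, not part of pydocx; only `Docx2Html`, `Docx2Markdown`, and `.parsed` come from the diff.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="pydocx", description="Convert a .docx file to HTML or Markdown.")
    # Exactly one of --html / --markdown; argparse exits with status 2 otherwise.
    fmt = parser.add_mutually_exclusive_group(required=True)
    fmt.add_argument("--html", dest="fmt", action="store_const", const="html")
    fmt.add_argument("--markdown", dest="fmt", action="store_const",
                     const="markdown")
    parser.add_argument("input", help="path to the .docx file")
    parser.add_argument("output", help="path of the file to write")
    return parser


def run(argv, converters):
    """Parse argv and return (output_path, converted_text).

    `converters` maps "html"/"markdown" to a callable taking a .docx path.
    """
    args = build_parser().parse_args(argv)
    return args.output, converters[args.fmt](args.input)


def main(argv=None):
    # Imported lazily so run() can be exercised without pydocx installed.
    from pydocx import Docx2Html, Docx2Markdown
    converters = {
        "html": lambda path: Docx2Html(path).parsed,
        "markdown": lambda path: Docx2Markdown(path).parsed,
    }
    output, text = run(argv, converters)
    # Text mode with an explicit encoding replaces the manual .encode('utf-8').
    with open(output, "w", encoding="utf-8") as f:
        f.write(text)
```

Called through the `console_scripts` entry point, `main()` receives `argv=None`, so `argparse` falls back to `sys.argv[1:]`. Separating `run` from the file write keeps the argument handling testable with stub converters.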

setup.py

Lines changed: 5 additions & 1 deletion

@@ -9,7 +9,6 @@
 from ez_setup import use_setuptools
 use_setuptools()
 from setuptools import setup, find_packages # noqa
-
 rel_file = lambda *args: os.path.join(
     os.path.dirname(os.path.abspath(__file__)), *args)
 
@@ -55,4 +54,9 @@ def get_description():
         "Topic :: Text Processing :: Markup :: XML",
     ],
     long_description=get_description(),
+    entry_points={
+        'console_scripts': [
+            'pydocx = pydocx.__init__:main',
+        ],
+    },
 )
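
A `console_scripts` entry has the form `name = module:attr`: installing the package generates an executable called `name` that imports `module` and calls `attr`. The sketch below mimics that lookup with `importlib` to make the structure of the string concrete. `parse_console_script` and `load_console_script` are illustrative helpers, not the setuptools implementation, and they ignore the optional `[extras]` suffix.

```python
import importlib


def parse_console_script(spec):
    """Split "name = module:attr" into (name, module, attr)."""
    name, _, target = spec.partition("=")
    module, _, attr = target.partition(":")
    return name.strip(), module.strip(), attr.strip()


def load_console_script(spec):
    """Import the module and walk the (possibly dotted) attribute path."""
    _, module, attr = parse_console_script(spec)
    obj = importlib.import_module(module)
    for part in attr.split("."):
        obj = getattr(obj, part)
    return obj
```

The committed target `pydocx.__init__:main` does resolve, but naming the package itself (`pydocx:main`) is the more usual spelling: importing `pydocx.__init__` by name generally loads `__init__.py` a second time as a separate module object.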
