
Commit 7097368

ENH: allow fallback when lxml fails to parse
DOC: add release notes for new list convention
ENH: add list of parsers
CLN: remove raise_on_error
BUG: fix format string
BUG: fix different raise type
BLD: travis python 3.2 is strange
BUG: bring in urlparse again since 2to3 shows proper conversion
TST: fix for python26
1 parent 0fdcf98 commit 7097368


6 files changed (+304, -211 lines)


Diff for: RELEASE.rst

+3 -1
@@ -70,7 +70,6 @@ pandas 0.11.1
   - ``melt`` now accepts the optional parameters ``var_name`` and ``value_name``
     to specify custom column names of the returned DataFrame (GH3649_),
     thanks @hoechenberger
-  - ``read_html`` no longer performs hard date conversion
   - Plotting functions now raise a ``TypeError`` before trying to plot anything
     if the associated objects have have a dtype of ``object`` (GH1818_,
     GH3572_). This happens before any drawing takes place which elimnates any
@@ -133,6 +132,9 @@ pandas 0.11.1
     as an int, maxing with ``int64``, to avoid precision issues (GH3733_)
   - ``na_values`` in a list provided to ``read_csv/read_excel`` will match string and numeric versions
     e.g. ``na_values=['99']`` will match 99 whether the column ends up being int, float, or string (GH3611_)
+  - ``read_html`` now defaults to ``None`` when reading, and falls back on
+    ``bs4`` + ``html5lib`` when lxml fails to parse. a list of parsers to try
+    until success is also valid

 **Bug Fixes**

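The release note describes three accepted forms for the parser argument of ``read_html`` (called ``flavor`` in the io.rst examples): ``None``, which tries lxml and then falls back on bs4 + html5lib; a single parser name; or a list of parsers tried in order until one succeeds. The sketch below shows one way to normalize those forms into an ordered tuple. It is an illustration under stated assumptions, not pandas code: the helper ``normalize_flavor``, the constants ``VALID_FLAVORS`` and ``DEFAULT_ORDER``, and the choice of ``'lxml'`` and ``'bs4'`` as the only valid names are all hypothetical.

```python
from typing import Iterable, Optional, Tuple, Union

# Hypothetical set of recognized parser names (assumed, not taken from pandas).
VALID_FLAVORS = ("lxml", "bs4")

# Assumed fallback order for flavor=None: lxml first, then bs4 + html5lib.
DEFAULT_ORDER = ("lxml", "bs4")


def normalize_flavor(
    flavor: Optional[Union[str, Iterable[str]]],
) -> Tuple[str, ...]:
    """Turn a flavor argument into an ordered tuple of parser names.

    Hypothetical helper for illustration; not part of the pandas API.
    """
    if flavor is None:
        # None means "use the default fallback order".
        return DEFAULT_ORDER
    if isinstance(flavor, str):
        # A single name is treated as a one-element list.
        flavors = (flavor,)
    else:
        flavors = tuple(flavor)
    if not flavors:
        raise ValueError("flavor must name at least one parser")
    invalid = [f for f in flavors if f not in VALID_FLAVORS]
    if invalid:
        raise ValueError(
            "invalid flavor(s) {!r}; valid choices are {!r}".format(
                invalid, VALID_FLAVORS
            )
        )
    return flavors


print(normalize_flavor(None))
print(normalize_flavor("bs4"))
print(normalize_flavor(["lxml", "bs4"]))
```

The string check comes before the generic ``tuple(flavor)`` conversion because a ``str`` is itself iterable: ``tuple('lxml')`` would split the name into single characters.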
Diff for: doc/source/io.rst

+15
@@ -1054,6 +1054,21 @@ Read in pandas ``to_html`` output (with some loss of floating point precision)
    dfin[0].columns
    np.allclose(df, dfin[0])

+``lxml`` will raise an error on a failed parse if that is the only parser you
+provide
+
+.. ipython:: python
+
+   dfs = read_html(url, match='Metcalf Bank', index_col=0, flavor=['lxml'])
+
+However, if you have bs4 and html5lib installed and pass ``None`` or ``['lxml',
+'bs4']`` then the parse will most likely succeed. Note that *as soon as a parse
+succeeds, the function will return*.
+
+.. ipython:: python
+
+   dfs = read_html(url, match='Metcalf Bank', index_col=0, flavor=['lxml', 'bs4'])
+


 Writing to HTML files
 ~~~~~~~~~~~~~~~~~~~~~~
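The io.rst addition describes a simple control flow: parsers are tried in the order given, the function returns as soon as one succeeds, and if the only parser provided fails, its error reaches the caller. Below is a minimal sketch of that loop, assuming each parser is a callable that either returns a result or raises. Everything in it is hypothetical: ``ParseError``, ``parse_with_fallback``, ``PARSERS`` and the two stand-in parsers are invented for illustration. They are registered under the names ``'lxml'`` and ``'bs4'`` only to mirror the docs; neither calls those libraries, and each returns a count of tables rather than DataFrames.

```python
from typing import Callable, Dict, Sequence


class ParseError(Exception):
    """Raised by a (hypothetical) parser that cannot handle its input."""


def strict_parser(html: str) -> int:
    """Stand-in for a strict parser: rejects unbalanced <table> tags."""
    if html.count("<table") != html.count("</table>"):
        raise ParseError("unbalanced <table> tags")
    return html.count("<table")


def lenient_parser(html: str) -> int:
    """Stand-in for a forgiving parser: counts opening <table> tags only."""
    return html.count("<table")


# Hypothetical registry; the names mirror the docs, but no real library is used.
PARSERS: Dict[str, Callable[[str], int]] = {
    "lxml": strict_parser,
    "bs4": lenient_parser,
}


def parse_with_fallback(
    html: str,
    flavors: Sequence[str],
    parsers: Dict[str, Callable[[str], int]] = PARSERS,
) -> int:
    """Try each named parser in order and return the first successful result."""
    if not flavors:
        raise ValueError("at least one parser must be given")
    failures = []
    for name in flavors:
        try:
            # Return immediately on success; later parsers are never tried.
            return parsers[name](html)
        except ParseError as err:
            failures.append("{}: {}".format(name, err))
    # Every parser failed, so the caller sees an error (as with flavor=['lxml']).
    raise ParseError("; ".join(failures))


broken = "<table><tr><td>1</td></tr>"  # missing </table>
print(parse_with_fallback(broken, ["lxml", "bs4"]))  # prints 1
```

Here the strict stand-in rejects the malformed input and the lenient one succeeds. Calling ``parse_with_fallback(broken, ["lxml"])`` instead raises ``ParseError``, which mirrors the documented behaviour when lxml is the only parser provided.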

Diff for: doc/source/v0.11.1.txt

+4
@@ -139,6 +139,10 @@ API changes

   - sum, prod, mean, std, var, skew, kurt, corr, and cov

+  - ``read_html`` now defaults to ``None`` when reading, and falls back on
+    ``bs4`` + ``html5lib`` when lxml fails to parse. a list of parsers to try
+    until success is also valid
+
 Enhancements
 ~~~~~~~~~~~~
