Commit f8b686d

authored
Merge pull request #111 from GeospatialPython/more-encoding
Add encoding docs and tests
2 parents e117b76 + b938952 commit f8b686d

3 files changed

+65
-50
lines changed

README.md

Lines changed: 61 additions & 30 deletions
@@ -24,9 +24,11 @@ The Python Shapefile Library (pyshp) reads and writes ESRI Shapefiles in pure Py
 - [Adding Records](#adding-records)
 - [File Names](#file-names)
 - [Saving to File-Like Objects](#saving-to-file-like-objects)
-- [Working with Large Shapefiles](#working-with-large-shapefiles)
 - [Python Geo Interface](#python-geo-interface)
-- [Testing](#testing)
+- [Working with Large Shapefiles](#working-with-large-shapefiles)
+- [Unicode and Shapefile Encodings](#unicode-and-shapefile-encodings)
+
+[Testing](#testing)

 # Overview

@@ -714,6 +716,35 @@ write them.
 >>> # Normally you would call the "StringIO.getvalue()" method on these objects.
 >>> shp = shx = dbf = None

+## Python Geo Interface
+
+The Python \_\_geo_interface\_\_ convention provides a data interchange interface
+among geospatial Python libraries. The interface returns data as GeoJSON which gives you
+nice compatibility with other libraries and tools including Shapely, Fiona, and PostGIS.
+More information on the \_\_geo_interface\_\_ protocol can be found at:
+[https://gist.github.com/sgillies/2217756](https://gist.github.com/sgillies/2217756).
+More information on GeoJSON is available at [http://geojson.org](http://geojson.org).
+
+
+>>> s = sf.shape(0)
+>>> s.__geo_interface__["type"]
+'MultiPolygon'
+
+Just as the library can expose its objects to other applications through the geo interface,
+it also supports receiving objects with the geo interface from other applications.
+To write shapes based on GeoJSON objects, simply send an object with the geo interface or a
+GeoJSON dictionary to the shape() method instead of a Shape object. Alternatively, you can
+construct a Shape object from GeoJSON using the "geojson_as_shape()" function.
+
+
+>>> w = shapefile.Writer()
+>>> w.field('name', 'C')
+
+>>> w.shape( {"type":"Point", "coordinates":[1,1]} )
+>>> w.record('two')
+
+>>> w.save('shapefiles/test/geojson')
+
 ## Working with Large Shapefiles

 Despite being a lightweight library, PyShp is designed to be able to read and write
@@ -756,43 +787,43 @@ process and write any number of items, and even merging many different source fi
 large shapefile. If you need to edit or undo any of your writing you would have to read the
 file back in, one record at a time, make your changes, and write it back out.

-## Python Geo Interface
+## Unicode and Shapefile Encodings

-The Python \_\_geo_interface\_\_ convention provides a data interchange interface
-among geospatial Python libraries. The interface returns data as GeoJSON which gives you
-nice compatibility with other libraries and tools including Shapely, Fiona, and PostGIS.
-More information on the \_\_geo_interface\_\_ protocol can be found at:
-[https://gist.github.com/sgillies/2217756](https://gist.github.com/sgillies/2217756).
-More information on GeoJSON is available at [http://geojson.org](http://geojson.org).
+PyShp has full support for unicode and shapefile encodings, so you can always expect to be working
+with unicode strings in shapefiles that have text fields.
+Most shapefiles are written in UTF-8 encoding, PyShp's default encoding, so in most cases you don't
+have to specify the encoding. For reading shapefiles in any other encoding, such as Latin-1, just
+supply the encoding option when creating the Reader class.


->>> s = sf.shape(0)
->>> s.__geo_interface__["type"]
->>> 'MultiPolygon'
+>>> r = shapefile.Reader("shapefiles/test/latin1.shp", encoding="latin1")
+>>> r.record(0) == [2, u'Ñandú']
+True

-Just as the library can expose its objects to other applications through the geo interface,
-it also supports receiving objects with the geo interface from other applications.
-To write shapes based on GeoJSON objects, simply send an object with the geo interface or a
-GeoJSON dictionary to the shape() method instead of a Shape object. Alternatively, you can
-construct a Shape object from GeoJSON using the "geojson_as_shape()" function.
+Once you have loaded the shapefile, you may choose to save it using another more supportive encoding such
+as UTF-8. Provided the new encoding supports the characters you are trying to write, reading it back in
+should give you the same unicode string you started with.


->>> w = shapefile.Writer()
->>> w.field('name', 'C')
-
->>> w.shape( {"type":"Point", "coordinates":[1,1]} )
->>> w.record('one')
+>>> w = shapefile.Writer(encoding="utf8")
+>>> w.fields = r.fields[1:]
+>>> w.record(*r.record(0))
+>>> w.null()
+>>> w.save("shapefiles/test/latin_as_utf8.shp")

->>> shape = shapefile.geojson_to_shape( {"type":"Point", "coordinates":[2,2]} )
->>> shape.shapeType
-1
->>> shape.points
-[[2, 2]]
+>>> r = shapefile.Reader("shapefiles/test/latin_as_utf8.shp", encoding="utf8")
+>>> r.record(0) == [2, u'Ñandú']
+True

->>> w.shape(shape)
->>> w.record('two')
+If you supply the wrong encoding and the string is unable to be decoded, PyShp will by default raise an
+exception. If however, on rare occasion, you are unable to find the correct encoding and want to ignore
+or replace encoding errors, you can specify the "encodingErrors" to be used by the decode method. This
+applies to both reading and writing.

->>> w.save('shapefiles/test/geojson')
+
+>>> r = shapefile.Reader("shapefiles/test/latin1.shp", encoding="ascii", encodingErrors="replace")
+>>> r.record(0) == [2, u'�and�']
+True

 # Testing

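Reviewer note on the last README example in this hunk: the text says "encodingErrors" is handed to the decode method, so the expected `u'�and�'` should match what Python's own codecs produce for Latin-1 bytes decoded as ASCII. The sketch below uses only built-in string methods, not PyShp, and assumes Python 3; the variable names are illustrative.

```python
# Plain-Python illustration (no PyShp involved): text stored as Latin-1 bytes
# but decoded as ASCII, under the three standard codec error handlers.
text = "\u00d1and\u00fa"        # the record value from the README example
raw = text.encode("latin1")     # two bytes fall outside the ASCII range

replaced = raw.decode("ascii", errors="replace")  # bad bytes become U+FFFD
ignored = raw.decode("ascii", errors="ignore")    # bad bytes are dropped

try:
    raw.decode("ascii")         # the default handler is "strict"
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(repr(replaced), repr(ignored), strict_failed)
```

With "replace", each undecodable byte turns into the U+FFFD replacement character, which is why the doctest compares against a string containing two of them.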
shapefile.py

Lines changed: 4 additions & 3 deletions
@@ -1269,14 +1269,15 @@ def check_output(self, want, got, optionflags):
             if sys.version_info[0] == 2:
                 got = re.sub("u'(.*?)'", "'\\1'", got)
                 got = re.sub('u"(.*?)"', '"\\1"', got)
-            return doctest.OutputChecker.check_output(self, want, got, optionflags)
+            res = doctest.OutputChecker.check_output(self, want, got, optionflags)
+            return res
         def summarize(self):
             doctest.OutputChecker.summarize(True)

     # run tests
     runner = doctest.DocTestRunner(checker=Py23DocChecker(), verbose=verbosity)
-    with open("README.md","r") as fobj:
-        test = doctest.DocTestParser().get_doctest(string=fobj.read(), globs={}, name="README", filename="README.md", lineno=0)
+    with open("README.md","rb") as fobj:
+        test = doctest.DocTestParser().get_doctest(string=fobj.read().decode("utf8"), globs={}, name="README", filename="README.md", lineno=0)
     failure_count, test_count = runner.run(test)

     # print results
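
Reviewer note on the `"rb"` plus `.decode("utf8")` change: the commit does not state its motivation, but a likely reason is that the README now contains non-ASCII doctest output, and reading bytes then decoding explicitly yields the same unicode text on Python 2 and 3 regardless of the platform's default encoding. A minimal sketch of the pattern, assuming Python 3 and using a temporary file as a hypothetical stand-in for README.md:

```python
import os
import tempfile

# Hypothetical stand-in for README.md: a small UTF-8 file with non-ASCII text.
text = "r.record(0) == [2, u'\u00d1and\u00fa']\n"
with tempfile.NamedTemporaryFile("wb", suffix=".md", delete=False) as tmp:
    tmp.write(text.encode("utf8"))
    path = tmp.name

try:
    # Same pattern as the changed lines: open in binary mode, decode explicitly.
    with open(path, "rb") as fobj:
        decoded = fobj.read().decode("utf8")
finally:
    os.remove(path)

print(decoded == text)
```

Opening in text mode without an explicit encoding would instead depend on the locale's preferred encoding on Python 3, and would return undecoded bytes on Python 2.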

test.py

Lines changed: 0 additions & 17 deletions
This file was deleted.
