How to set alt text on image using figure caption? #245

bitscompagnie · 2017-11-01T21:15:12Z

Hello,

How can I use the figure caption text from the source MS Word document as the alt text of the generated `<img>` tag?

In the source `document.xml` I see the following:

```xml
<w:drawing>
    <wp:inline distT="0" distB="0" distL="0" distR="0">
        <wp:extent cx="5943600" cy="8210550"/>
        <wp:effectExtent l="0" t="0" r="0" b="0"/>
        <wp:docPr id="16" name="Picture 16"
            descr="Image description."
            title="Image description"/>
        <wp:cNvGraphicFramePr>
.......
```

I need to grab the value of the `descr` or `title` attribute (`descr="Image description."` / `title="Image description"`) from the `document.xml` excerpt above.
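If you only need the strings, `descr` and `title` can also be read straight out of the `.docx` package without pydocx. A minimal sketch using only the standard library, assuming the pictures carry a `wp:docPr` element like the one above; `alt_texts_from_document_xml` and `alt_texts_from_docx` are hypothetical helper names, and falling back from `descr` to `title` is an assumption about which one you prefer:

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace URI bound to the "wp:" prefix in WordprocessingML drawing markup
WP_NS = 'http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing'


def alt_texts_from_document_xml(xml_bytes):
    """Return one alt text per wp:docPr element, in document order.

    Prefers the "descr" attribute, falls back to "title", else None.
    """
    root = ET.fromstring(xml_bytes)
    # wp:docPr sits under both wp:inline and wp:anchor (floating) drawings
    return [
        doc_pr.get('descr') or doc_pr.get('title')
        for doc_pr in root.iter('{%s}docPr' % WP_NS)
    ]


def alt_texts_from_docx(docx_path):
    """Read word/document.xml out of the .docx zip and extract the alt texts."""
    with zipfile.ZipFile(docx_path) as docx:
        return alt_texts_from_document_xml(docx.read('word/document.xml'))
```

For example, `alt_texts_from_docx('test.docx')` returns one entry per picture, in document order, with `None` for pictures that have neither attribute. Because the result is matched to images only by position, hooking it into the HTML export is more fragile than reading `docPr` inside pydocx itself.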

I am currently using the custom image export function below, which saves my images to a local folder:

```python
# Method of a PyDocXHTMLExporter subclass.
# image_name() is a helper defined elsewhere (not shown) that returns a
# unique name, e.g. str(uuid.uuid4()).
try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

from pydocx.export.html import HtmlTag


def get_image_tag(self, image, width=None, height=None, rotate=None):
    img_src = self.get_image_source(image)
    if not img_src:
        return None
    # img_src is a base64 data URI; get the file extension from its header, see
    # https://matthewdaly.co.uk/blog/2015/07/04/handling-images-as-base64-strings-with-django-rest-framework/
    header, _ = img_src.split(';base64,')  # header ~= "data:image/png"
    ext = header.split('/')[-1]  # guess file extension
    # Build the filename with the proper extension for the img src attribute
    img_src_new = 'img_' + image_name() + '.' + ext
    # urlretrieve decodes the data: URI and writes the image bytes to disk
    urlretrieve(img_src, 'c:/git/output/' + img_src_new)

    # Point the image source at the newly created file
    attrs = {'src': img_src_new}
    if width and height:
        attrs['width'] = width
        attrs['height'] = height
    if rotate:
        attrs['style'] = 'transform: rotate(%sdeg);' % rotate
    return HtmlTag('img', allow_self_closing=True, allow_whitespace=True, **attrs)
```
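`urlretrieve` works here because urllib can open `data:` URLs, but the payload can also be decoded explicitly with the standard `base64` module. A minimal sketch, assuming `get_image_source()` returns a `data:<mime type>;base64,<payload>` string as the function above expects; `decode_data_uri` and `save_data_uri` are hypothetical names:

```python
import base64
import os


def decode_data_uri(data_uri):
    """Split a base64 data URI into (file_extension, raw_bytes).

    Expected input shape: "data:image/png;base64,iVBORw0KGgo...".
    """
    header, encoded = data_uri.split(';base64,', 1)  # header ~= "data:image/png"
    ext = header.split('/')[-1]  # guess the extension from the MIME subtype
    return ext, base64.b64decode(encoded)


def save_data_uri(data_uri, directory, stem):
    """Write the decoded image to directory/img_<stem>.<ext>; return the file name."""
    ext, payload = decode_data_uri(data_uri)
    filename = 'img_%s.%s' % (stem, ext)
    with open(os.path.join(directory, filename), 'wb') as out:
        out.write(payload)
    return filename
```

Inside `get_image_tag`, a call such as `save_data_uri(img_src, 'c:/git/output/', str(uuid.uuid4()))` could replace the split and `urlretrieve` lines. The extension comes straight from the MIME subtype, so `image/jpeg` yields `.jpeg`, and a subtype such as `x-emf` would yield `.x-emf`.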

Thanks for your help.


jlward · 2017-11-03T14:27:48Z

To pull the `descr` or `title` attribute, you would need to change `pydocx.openxml.drawing.wordprocessing.inline` so that it parses that field. If you would like to make a PR to implement this, I'd be happy to review it.

IuryAlves · 2017-11-23T02:42:51Z

Hello @bitscompagnie, @jlward

I've opened PR #248 to pull `descr` from pictures.

Until the PR is merged, you can do the following:

```python
# coding: utf-8
from __future__ import (
    absolute_import,
    print_function,
    unicode_literals,
)

from pydocx.export import PyDocXHTMLExporter
from pydocx.export.html import convert_emus_to_pixels, HtmlTag
from pydocx.models import XmlModel, XmlAttribute, XmlChild
from pydocx.openxml.wordprocessing.drawing import Inline


class DocPr(XmlModel):
    XML_TAG = 'docPr'

    title = XmlAttribute(name='title')
    descr = XmlAttribute(name='descr')


Inline.docPr = XmlChild(type=DocPr)


class PyDocXHTMLExporterWithAlt(PyDocXHTMLExporter):

    def export_drawing(self, drawing):
        length, width = drawing.get_picture_extents()

        try:
            description = drawing.inline.docPr.descr
        except AttributeError:
            description = None
        rotate = drawing.get_picture_rotate_angle()
        relationship_id = drawing.get_picture_relationship_id()
        if not relationship_id:
            return
        image = None
        try:
            image = drawing.container.get_part_by_id(
                relationship_id=relationship_id,
            )
        except KeyError:
            pass
        attrs = {}
        if length and width:
            # The "width" in openxml is actually the height
            width_px = '{px:.0f}px'.format(px=convert_emus_to_pixels(length))
            height_px = '{px:.0f}px'.format(px=convert_emus_to_pixels(width))
            attrs['width'] = width_px
            attrs['height'] = height_px
        if rotate:
            attrs['rotate'] = rotate
        if description:
            attrs['alt'] = description

        tag = self.get_image_tag(image=image, **attrs)
        if tag:
            yield tag

    def get_image_tag(self, image, width=None, height=None, rotate=None, alt=None):
        image_src = self.get_image_source(image)
        if image_src:
            attrs = {
                'src': image_src
            }
            if width and height:
                attrs['width'] = width
                attrs['height'] = height
            if rotate:
                attrs['style'] = 'transform: rotate(%sdeg);' % rotate
            if alt:
                attrs['alt'] = alt

            return HtmlTag(
                'img',
                allow_self_closing=True,
                allow_whitespace=True,
                **attrs
            )


html = PyDocXHTMLExporterWithAlt('test.docx').export()
```
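The `convert_emus_to_pixels` calls in `export_drawing` are needed because DrawingML sizes, such as `cx` (horizontal size) and `cy` (vertical size) on `wp:extent`, are stored in English Metric Units, 914400 per inch. A minimal sketch of that conversion, assuming 96 pixels per inch (the CSS reference pixel); `emus_to_pixels` is a hypothetical name, and pydocx's own `convert_emus_to_pixels` may use a different constant:

```python
# DrawingML lengths are English Metric Units (EMUs): 914400 EMUs per inch.
EMUS_PER_INCH = 914400


def emus_to_pixels(emus, dpi=96):
    """Convert an EMU length to pixels, assuming `dpi` pixels per inch."""
    return emus * dpi / EMUS_PER_INCH
```

With the extent from the question, `emus_to_pixels(5943600)` gives 624.0 and `emus_to_pixels(8210550)` gives 862.0, i.e. roughly a 624 x 862 px image at 96 DPI.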

bitscompagnie · 2017-11-23T05:02:27Z

Thanks a lot @IuryAlves,

I just tested the solution and it worked as expected.

bitscompagnie changed the title ~~Question: How to set alt text on image using figure caption?~~ How to set alt text on image using figure caption? Nov 3, 2017

IuryAlves added a commit to IuryAlves/pydocx that referenced this issue Nov 23, 2017

extracting alt from images - related issue CenterForOpenScience#245

ae9bd43

IuryAlves mentioned this issue Nov 23, 2017

extracting alt from images - related issue #245 #248

Open

bitscompagnie closed this as completed Nov 23, 2017
