Images

In WordML, images are embedded with the <w:pict> element and described in more detail using VML (Vector Markup Language). A detailed description of VML is available on Microsoft's English-language website under the search term »VML Reference«. VML is not a proprietary Microsoft markup language but a so-called »Note« published by the W3C, which Microsoft played a decisive part in designing. The W3C Note is available at www.w3.org/TR/NOTE-VML.

In principle, there are two types of graphics. Referenced graphics are included in the document via a path to a graphic file, as in HTML. Embedded graphics are integrated into the document itself without a link, whether by copy and paste or through the appropriate menus.

Referenced images are included in the document with relatively simple structures. Most VML elements can be ignored for our purposes. The crucial elements are those that determine the width and height and those that contain the path to the graphic file. The following example shows such a graphic reference.

<w:p>
  <w:r>
    <w:pict>                                                 (1)
      <v:shapetype id="_x0000_t75" coordsize="21600,21600" 
       o:spt="75" o:preferrelative="t" path="m@4@5l@4@11@9@11@9@5xe" 
       filled="f" stroked="f">
       <v:stroke joinstyle="miter"/>
       <v:formulas>
          <v:f eqn="if lineDrawn pixelLineWidth 0"/>
          <v:f eqn="sum @0 1 0"/>
          <v:f eqn="sum 0 0 @1"/>
          <v:f eqn="prod @2 1 2"/>
          <v:f eqn="prod @3 21600 pixelWidth"/>
          <v:f eqn="prod @3 21600 pixelHeight"/>
          <v:f eqn="sum @0 0 1"/>
          <v:f eqn="prod @6 1 2"/>
          <v:f eqn="prod @7 21600 pixelWidth"/>
          <v:f eqn="sum @8 21600 0"/>
          <v:f eqn="prod @7 21600 pixelHeight"/>
          <v:f eqn="sum @10 21600 0"/>
       </v:formulas>
       <v:path o:extrusionok="f" gradientshapeok="t" 
        o:connecttype="rect"/>
       <o:lock v:ext="edit" aspectratio="t"/>
      </v:shapetype>
      <v:shape id="_x0000_i1025" type="#_x0000_t75"         (2)
       style="width:333.75pt;height:250.5pt">
        <v:imagedata src="2004_1112Bild0033.JPG"/>          (3)
      </v:shape>
    </w:pict>
  </w:r>
</w:p>

(1) The <w:pict> element is the container element for images and other multimedia objects.

(2) The style attribute of the <v:shape> element determines the width and height at which the graphic is displayed.

(3) The src attribute of the <v:imagedata> element contains the path to the graphic. This path can be relative to the storage location of the WordML document or absolute.

Figure: example with a graphic
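The three callouts can be checked programmatically. The following is a minimal Python sketch, not part of the original chapter, that reads the src path and the width and height from a shortened copy of the listing above. The namespace URIs, the constant SAMPLE, and the helper names parse_style and referenced_images are assumptions made for illustration. The listing in this section does not show its namespace declarations, and the <v:shapetype> block is omitted here because it is not needed for this purpose.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs for the w: and v: prefixes (not shown in the listing).
NS = {
    "w": "http://schemas.microsoft.com/office/word/2003/wordml",
    "v": "urn:schemas-microsoft-com:vml",
}

# Shortened copy of the example paragraph, without the <v:shapetype> block.
SAMPLE = """<w:p xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
     xmlns:v="urn:schemas-microsoft-com:vml">
  <w:r>
    <w:pict>
      <v:shape id="_x0000_i1025" type="#_x0000_t75"
       style="width:333.75pt;height:250.5pt">
        <v:imagedata src="2004_1112Bild0033.JPG"/>
      </v:shape>
    </w:pict>
  </w:r>
</w:p>"""


def parse_style(style):
    """Split a style string such as 'width:1pt;height:2pt' into a dict."""
    result = {}
    for part in style.split(";"):
        if ":" in part:
            key, value = part.split(":", 1)
            result[key.strip()] = value.strip()
    return result


def referenced_images(wordml_text):
    """Return src, width and height for every <v:shape> inside a <w:pict>."""
    root = ET.fromstring(wordml_text)
    images = []
    for shape in root.iterfind(".//w:pict/v:shape", NS):
        imagedata = shape.find("v:imagedata", NS)
        if imagedata is None or imagedata.get("src") is None:
            continue  # shape without a graphic path
        style = parse_style(shape.get("style", ""))
        images.append({
            "src": imagedata.get("src"),
            "width": style.get("width"),
            "height": style.get("height"),
        })
    return images


if __name__ == "__main__":
    for image in referenced_images(SAMPLE):
        print(image)
```

The width and height are returned as strings including their unit (pt), so a transformation can decide for itself how to convert them.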

Graphics that have been embedded into the document are stored as a combination of VML and Base64-encoded binary data. Depending on the application, this storage method can cause considerable problems.

Base64 encoding regroups the bytes of the graphic into 6-bit units and writes each unit as one of 64 printable characters, so every three bytes become four characters. The graphic keeps its original format; only the representation of its binary data changes.
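The regrouping into 6-bit units can be illustrated with a short Python sketch. It is an illustration only: the function name encode_three_bytes is hypothetical, and the sketch handles exactly three bytes without the padding that real data needs. The input FF D8 FF is the byte sequence with which JPEG files begin, and the result can be compared with Python's standard base64 module.

```python
import base64

# The 64 characters of the standard Base64 alphabet, indexed by 6-bit value.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_three_bytes(chunk):
    """Encode exactly three bytes (24 bits) as four Base64 characters."""
    if len(chunk) != 3:
        raise ValueError("this sketch only handles exactly three bytes")
    bits = int.from_bytes(chunk, "big")  # the three bytes as one 24-bit number
    # Cut the 24 bits into four 6-bit groups, most significant group first.
    return "".join(ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0))


if __name__ == "__main__":
    jpeg_start = b"\xff\xd8\xff"  # first three bytes of a JPEG file
    print(encode_three_bytes(jpeg_start))
    print(base64.b64encode(jpeg_start).decode("ascii"))
```

Both calls produce the same four characters, /9j/, which is also how the <w:binData> excerpt in this section begins.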

Base64 encoding can be generated with tools. One of them is Base64, which can be downloaded free of charge at www.fourmilab.ch/webtools/base64.

Embedding graphics in this way is particularly problematic because most XML layout structures work with external, referenced graphics. During a WordML to XML transformation, for example, the graphics must first be recreated by decoding the Base64 data and saving the result in separate files.

Within WordML, the Base64 code is stored in a <w:binData> element and can be extremely long.

Here is a short excerpt from such Base64 data:

<w:binData w:name="wordml://02000001.jpg">
/9j/4S0aRXhpZgAATU0AKgAAAAgACwEPAAIAAAAJAAAAkgEQAAI
  AAAAQAAAAnAESAAMAAAABAAEAAAEaAAUAAAABAAAArAEbAAUAA 
  AABAAAAtAEoAAMAAAABAAIAAAExAAIAAAAnAAAAvAEyAAIAAAA
  UAAAA5AITAAMAAAABAAIAAIKYAAIAAAAFAAAA+Idp
  AAQAAAABAAAA/gAABHBGVUpJRklMTQAARmlu......
</w:binData>
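The decoding step needed for a transformation, turning <w:binData> content back into separate graphic files, could be sketched in Python roughly as follows. This is a sketch under stated assumptions, not the book's own method: the namespace URI, the function name extract_embedded_images, and the rule of deriving the file name from the last part of the w:name value are all assumptions. The SAMPLE payload consists only of the first 16 characters of the excerpt above, so the file it produces contains just the start of the JPEG data and is not a viewable image.

```python
import base64
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed namespace URI for the w: prefix (not shown in the excerpt).
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"

# Illustration only: the first 16 characters of the excerpt, not a full image.
SAMPLE = f"""<w:pict xmlns:w="{W_NS}">
  <w:binData w:name="wordml://02000001.jpg">
/9j/4S0a
  RXhpZgAA
  </w:binData>
</w:pict>"""


def extract_embedded_images(wordml_text, target_dir):
    """Decode every <w:binData> element and save it as a file in target_dir."""
    root = ET.fromstring(wordml_text)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for bin_data in root.iter(f"{{{W_NS}}}binData"):
        # Assumption: the part after the last slash of w:name is the file name.
        name = bin_data.get(f"{{{W_NS}}}name", "")
        file_name = name.rsplit("/", 1)[-1] or "image.bin"
        # The Base64 text is wrapped and indented; drop all whitespace first.
        payload = "".join((bin_data.text or "").split())
        out_path = target / file_name
        out_path.write_bytes(base64.b64decode(payload))
        written.append(out_path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in extract_embedded_images(SAMPLE, tmp):
            print(path.name, path.read_bytes()[:4].hex())
```

The decoded bytes begin with FF D8 FF, the signature of a JPEG file, and contain the marker text Exif. This confirms that the graphic itself is stored unchanged and that only its representation was recoded.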

Copyright © dpunkt.verlag GmbH 2007
Printing of the online version is permitted exclusively for private use. Otherwise this chapter from the book "Professionelle XML-Verarbeitung mit Word" is subject to the same provisions as those applicable for the hardcover edition: The work including all its components is protected by copyright. All rights reserved, including reproduction, translation, microfilming as well as storage and processing in electronic systems.

dpunkt.verlag GmbH, Ringstraße 19B, 69115 Heidelberg, fon 06221-14830, fax 06221-148399, hallo(at)dpunkt.de