Sam Winstanley’s Blog

Market Research Technology News and Views

Sam Winstanley’s Blog header image 1

Understanding the Stylesheet in SpreadsheetML

March 18th, 2011 · 4 Comments

Preamble: Warning this page has nothing to do with Market Research, it is purely technical information about Office OpenXML.

Introduction

This page discusses the way that Styles are applied to cells in SpreadsheetML, it is correct at the time of writing and is correct in accordance with ECMA 376 – Second Edition Part 1 which relates to the Microsoft Office Open XML schemas.

Initially I wrote this document just for me, to explain to myself how my code needed to work, but I decided it might be universally useful.

In terms of the actual XML code in this, its as produced by Excel 2010, therefore where people spot additional  attributes that seem pointless/irregular this should be directed at the Excel team not at me.

Style Anatomy – How Excel Does This

In terms of open office xml, a workbook loosely speaking has this structure (intentionally reduced for readability), I have bolded the areas I will be discussing in the document.

  • Workbook
    • Styles
      • Fonts
      • Fills
      • Borders
      • CellStyleFormats (cellStyleXfs)
      • CellFormats (cellXfs)
      • CellStyles (cellStyles)
      • Differential Formats
      • Table Styles
      • StylesheetExtensionList
  • Theme
  • Worksheet (multiple)
    • SheetData
      • Row (multiple)
        • Cell (multiple)

The Way Styles are Looked Up

The Cell (c)

Each cell in a row can have a StyleIndex (s) attribute, this attribute is a number starting from 0. It represents a lookup into the CellFormats (cellXfs) by position (this is a zero index based lookup).

For example, you have a cell defined as:

<x:c r="C4" s="2" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
	<x:v>123</x:v>
</x:c>

This is telling Excel, this cell (C4) uses CellFormat (cellXfs) index 2. The actually CellFormat xf record at that position looks like this:

The CellFormat (xf) for the cell

<x:xf numFmtId="10" fontId="3" fillId="0" borderId="0" xfId="1" applyNumberFormat="1" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
	<x:alignment horizontal="center" vertical="top" />
</x:xf>

Inside this CellFormat record there are various format specifiers but the one to note is the FormatID (xfId) this  is saying its another lookup, this time into the CellStyleFormats (cellStyleXfs).

The CellFormat for the “style” inside CellStyleFormats

The CellFormat above contains an xfIf (FormatID) of 1, this is a reference to the item at index 1 in the CellStyleFormats (cellStyleXfs), in my workbook this looks as follows:

<x:xf numFmtId="0" fontId="3" fillId="0" borderId="0" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
	<x:alignment horizontal="center" vertical="top" />
</x:xf>

Note that this format record does not contain a FormatID (xfID) and it should not do that according to documentation, although this would make a nice future extension in terms of style inheritance in Excel if Microsoft ever chooses to implement that.

The Relationship between CellStyleFormats and CellFormats

In programming terms the best way I can come up with to describe the difference between the cellStyleXfs and the cellXfs records is like this:

Logically the CellFormat inside the CellStyleFormats acts as an abstract base class to the CellFormat in the CellFormats, as such you cannot bind a cell directly to one as it cannot be instantiated directly. The CellFormat which lives inside CellFormats effectively inherits this base class and can be instantiated and if needed override one or more of the Properties that exist on it in order to customize the appearance applied to one or more cells.

CellStyles (cellStyles)

Cell styles are not directly related to the actual formatting of cells, instead they primarily exist for the Excel UI to  show the user a style so that it can be modified etc. in essence they are like Labels for CellStyleFormats. My CellStyle for the style we’ve been talking about so far looks very simply like this:

<x:cellStyle name="CellValue" xfId="1" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main" />

Some Working Examples

Bold a single cell within a workbook, the logical process.

For the purposes of the example, assume there are many cells in a workbook which share a style. And you want to just bold one of these. This is what you have to do in order to make that happen:

  1. You have your Cell (c) it is currently set up to use a style xfId=1.
  2. You go to your StyleSheet.CellFormats (styleSheet/cellXfs) and find the element
    at index 1 (the 2nd child). This is your CellFormat (xf) for the cell.

    1. If your xf for the cell has ApplyFont (the applyFont) attribute set, then get the
      value of that attribute and use that as the id of the font that is being used from
      the Font (fontID) attribute.
    2. If your xf does not have ApplyFont set, then gets its FormatId (xfId) attribute
      (lets say this is set to 1 for now)

      1. Go to your StyleSheet.CellStyleFormats (styleSheet/cellStyleXfs) and get the child
        at index 1
      2. read the Font ID from the Font (fontId) attribute of this CellFormat (xf)
  3. Assuming a fontID of 1. Go to StyleSheet.Fonts (/styleSheet/fonts) and get the Font
    (font) child element at index 1.
  4. Verify that the Fonts collection does not already contain an identical font with
    Bold=true (b=1) set already

    • if one exists use its position as the font index.
    • if one does not, Copy the font element, then add a value Bold=true (b=1) to it, then append this
      newly cloned font specification back to StyleSheet.Fonts. The increment the value of Fonts.Count by 1. take the position of the newly added font definition as the font index.
  5. Assuming a font index of the newly added/found font of 6. Create a copy of your CellFormat (xf) for your cell (the one that lives in CellFormats (cellXfs))
  6. On your newly copied CellFormat, set the attribute Font=6 (font=”6″), along with the attribute ApplyFont=true (applyFont=”1″).
  7. Verify that you do not already have an identical CellFormat to the one you just made in your CellFormats (cellXfs)
  8. Assuming that you do not have an identical CellFormat already, append your newly created CellFormat (xf) to CellFormats(cellXfs). Then increment the value of CellFormat.Count
  9. Take the index of the newly added/created CellFormat within CellFormats (assume 19), set this value to the StyleIndex attribute of the Cell e.g. Cell.StyleIndex=19 (s=”19)

Although the above is quite a lot of steps, repeating for another cell once you know that styleID=19 is the one to use is very simple.

In terms of being totally correct, you should really check that your old CellFormat for the cell you updated, is still in use by some other cells, then remove it if it has become Orphaned by the process, but be careful doing this as in doing so you are decrementing the index of any higher CellFormats in the workbook so effectively if you remove a CellFormat, you really may need to perform an update on every cell in the workbook. These are the areas that SpreadsheetML becomes really tricky and having good wrapper code becomes pretty important.

General OpenXml Good Practises with Style elements

  1. When Appending children onto a collection, always increment the Count.
  2. Never remove any children from a collection unless you are prepared to do the work of finding all of the references to that collection
    and decrementing ID’s where possible.
  3. Do not blindly add children to a collection without checking that there is not already an appropriate child the same as the one you are adding as it can lead to massive file size bloating.

→ 4 CommentsTags: DeepTech