Hahah! FOOL! This shit is NOT for the faint of heart. Instead of there being one neat little chart object to tweak, that chart has crapped all over the Word document and you have THREE places you need to make the change, otherwise you'll end up with a Word document that won't open in Word.
I originally wrote this code for a proprietary system so, for copyright reasons, I cannot provide the code verbatim, but I will walk you through the steps so that you can re-create the results.
Step 1
Make sure your environment is set up. You need to have Microsoft's Open XML SDK (version 2.0 or later) installed. In your VS project, add a reference to DocumentFormat.OpenXml. Depending on the .NET version you're using, you may also have to add a reference to WindowsBase.
Step 2
Get a WordprocessingDocument object by calling WordprocessingDocument.Open.
Step 3
If you examine the XML for the word document, you'll find that the chart is contained within a series of tags: w:drawing, wp:inline, a:graphic, a:graphicData, and finally c:chart. We don't care about any of that except, tangentially, the c:chart, which is represented in the OpenXML SDK with a ChartReference object. That object doesn't actually contain any of the chart data, but what it does have is a reference to the chart part that does contain the chart data, which is what we want. So we first find the ChartReference that we're interested in. Obviously, a document could contain many charts, so you have to have some way of picking out the chart you want. For this example, I'm just going to pick the first ChartReference for simplicity's sake. If you need something more sophisticated, I recommend looking into putting the charts inside custom XML elements (that way you can use XML metadata to identify them). Anyway, finding the first chart in the document:
var chartRef = doc.MainDocumentPart.Document.Descendants<ChartReference>().First();
Step 4
Now that we've got the chart reference, we can get the chart part, which contains the actual chart data:
var chartPart = (ChartPart)doc.MainDocumentPart.GetPartById( chartRef.Id );
Step 5
Now that we've got the chart part, we can set out actually modifying the data in the chart. This is not, however, straightforward, as the chart data is repeated in three different places in the chart part: chart data in the chart part itself, an embedded spreadsheet object, and a range table in the embedded spreadsheet package. Whew. It's a lot to do. Let's get to it by modifying the chart data first.
In the ChartPart object, there is a ChartSpace object. This contains the various charts contained in this single chart (keep in mind that a single chart could contain a line chart and a bar chart, for exampled to be displayed in the same chart). So in the ChartSpace's list of descendants, there will be BarChart, LineChart, etc. objects. For the purposes of this tutorial, we will assume that there is a single bar chart contained in this chart. If your chart contains different types of charts, you will have to account for them accordingly. So let's grab that bar chart:
var barChart = chartPart.ChartSpace.Descendants<BarChart>().First();
Contained within the BarChart object are are BarDirection, BarGrouping, some BarChartSeries, and two AxisId objects. (Obviously, if we're using line charts or pie charts or whatever, these class names will be different. This is one of the real frustrations of dealing with mixed chart types; there are no super-classes, so we have to handle each chart type separately. Poor design, Microsoft! Poor design!) We're really only concerned with the BarChartSeries objects because they contain the actual series data. Again, we're going to simplify for this example, and assume the number of series, and data points within the series in the embedded chart matches our needs. If it doesn't (i.e., we have to add/remove series or data points), we're really going to have to get our hands dirty. If you need to do this, remember that OpenXmlElement.CloneNode and OpenXmlElement.InsertAfterSelf are your friends.
For the Open XML to be valid, each BarChartSeries object in each BarChart must have a valid series number. It starts at zero, and goes up by one. For this tutorial, it doesn't matter since we're assuming that the number of series matches exactly the amount of data we have to replace, but if you have a more sophisticated example, you'll have to take care to set the series numbers correctly. Unfortunately, there's even more. Since each series is linked to data in the spreadsheet, you have to make sure that matches up correctly, too. I'll show how to do that in this example, even though it's not necessary for our simple case.
We're going to start by changing the series labels. In the original (unmodified) document, there are three series, labeled "Series 1", "Series 2", and "Series 3". I'm going to change those to read "Modified 1", "Modified 2", and "Modified 3". You can be more imaginative when you re-name your series. Here's the code to do that (explanation to follow):
var barChartSeries = barChart.Descendants<BarChartSeries>();
foreach( var series in barChartSeries ) {
uint seriesIdx = series.Index.Val;
var spreadsheetCol = (char) ('B' + seriesIdx);
series.SeriesText.StringReference.Formula.Text = "Sheet1!quot; + spreadsheetCol + "$1";
var sc = series.SeriesText.StringReference.StringCache;
sc.RemoveAllChildren();
sc.AppendChild( new PointCount {Val = 1} );
var sp = new StringPoint( new[] { new NumericValue( "Modified " + seriesIdx ) } );
sp.SetAttribute( new OpenXmlAttribute( "idx", null, "0" ) );
sc.AppendChild( sp );
}
Ok, are you getting an idea how convoluted this is? It's ugly stuff. Ok, so going through this bit of code, we first get the series that make up this particular bar chart, then we get the series index (seriesIdx). This is what you'll have to modify yourself if you're adding/removing series. Then we have to set the spreadsheet reference to match the series. Not necessary in this example, but this gives you an idea how to handle this in more complicated cases. Like most chart-related data, in addition to the spreadsheet reference, there is a data cache. In this case, the series labels are cached. "StringCache" contains "StringPoint" objects that, mysteriously, contain "NumericValue" objects. Oddly, "NumericValue" contains string data. Go figure. So we first have to empty out everything that's there (sc.RemoveAllChildren()), then we replace it with the new data. I've run into trouble (invalid Open XML) by not explicitly setting the "idx" attribute, which is what that penultimate line is about. This has nothing to do with the index of the series; it's the index (zero) of the lone StringPoint in the StringCache.
TO BE CONTINUED...
Ethan -
ReplyDeleteI've had a love/hate relationship with the OpenXML SDK for a couple of years now (I hate it and my boss loves what I do with it *sigh*).
I've been generating charts for a couple of months now (pretty much exactly as you describe above).
My problem is that I've got a base document with 11 different charts, so grabbing the first ome doesn't work for me.
So, I do this:
Stream stream = this._wpDoc.MainDocumentPart.ChartParts.ElementAt( chartIndex ).EmbeddedPackagePart.GetStream( );
where chartIndex is an int containing the zero-based index of the chart I want to update.
But here's the rub: every time I make a change to the base document, it randomly reindexes all of the charts. Of course it does. Why wouldn't it.
Anyway, I'm looking for an alternative to this for finding specific charts. If you have any thoughts on this, I'd love to hear them.
Thanks.
JP
Oh, I forgot to mention why this matters to me:
ReplyDeleteMy base document has 11 charts and growing. Every time it reindexs, I spend a good hour commenting out the chart updates and using trial and error one at a time toget them sorted out.