TFS Cube: Introduction

Forget LINQ, .NET 3.0, WPF and Silverlight. Analysis Services is truly the most wondrous thing to have been released this decade. I’ve spent this morning getting intimate with the TFS Cube, as I have some custom data that I would like to inject into some reports, and since I’ve never used Analysis Services it’s been a fun exercise for me. Over the next couple of days I’ll be keeping a log here explaining what I’ve learnt, but remember that most of this is probably wrong as I only started playing with it this morning :)

In this post I’ll explain what cubes are made of, how to work with the cube in Excel, and what a Pivot Table is.

What is the Cube?

If you take a look at the SQL Server databases that are used by Team Foundation Server, you’ll find a list that looks something like this:

You can probably guess what most of them do. The TFSWarehouse database is the important one here - it contains all of the data that reports are made from, and is the one we’ll focus on today.

The process looks something like this:

  1. Developers, PM’s and testers are doing stuff all the time against the other databases. We call these databases ”Online Transactional Processing” databases, or OLTP, because their job is to store stuff while it is being done. Since developers never sleep, these databases have to be online at all times and they need to be responsive. They also contain a lot of data that is used by the application (TFS), but which probably isn’t necessary for reporting.
  2. As things happen in TFS, the data is added to the TFSWarehouse database. This is a standard, relational SQL server database that contains a slightly simplified version of what’s in the other databases. For example, while the TfsVersionControl database contains the contents of every file over time, the TFSWarehouse database just contains a table of overall Code Churn (how much code was added, removed or deleted) for each file.
  3. Inside Analysis Services, there is an Online Analytical Processing (OLAP) database that contains a “cube”, which is essentially a load of meta data around the TFSWarehouse database. The OLAP database is hooked up to the TFSWarehouse database, and every hour or so it “Processes” the TFSWarehouse, importing the data into analysis services.

Cubes are broken into three main things:

  • Measures
    Measures are the numbers. These will often appear as the values in the report, and are the quantitative figures that tell you what’s going on. Examples of measures are: Total Lines of Code, Total Code Churn, Total Number of Bugs, Total Failed Tests, Total Code Coverage, and Total Compiler Errors. Basically anything that is an aggregate would be called a measure.
  • Dimensions
    These are used for slicing and dicing Measures, and will usually be placed in the row and column headers of the report. Examples of dimensions are Dates, File names, Areas, Team Projects, Work items, and Iterations. If you said to yourself, “I want to see the number of CD Players per Franchise”, CD Player and Franchise would be dimensions.
  • Details
    Think of these as the “columns” for Dimensions. As an example, the Date dimension has a number of details: Year, Month, Week.

I’d suggest opening up the TFSWarehouse database and having a look at the tables inside, to get an idea of exactly what the cube stores. It’s all laid out in a pretty logical schema, and you’ll notice that it relates almost perfectly to the contents of the OLAP database.

Working with the Cube

Now lets have a go at working with the cube in Excel 2007. I’m going to use Excel because it’s the most fun way to work with the cube, because you can create graphs and do other calculations alongside the cube data, and it looks nicer than Visual Studio. In subsequent posts I’ll discuss how to use Visual Studio to edit and manipulate the data in the cube.

To open the cube in Excel, create a new spreadsheet, and select the Data tab in the tab control, erm, I mean, Ribbon. Click the “From Other Sources…” button, and select From Analysis Services:

 

This will bring up a dialog, where you can enter the name of your TFS server. Select the “Team System” cube (you may not have a choice, depending on your version of SQL Server) and hit “Finish”:

This will insert a Pivot Table into your spreadsheet. Pivot Tables are pretty simple when you get used to them, but for me originally they were a bit confusing. The TFS Cube is full of data, and there are many thousands of combinations in which we can view it. Pivot Tables are the tool we use to slice and dice this data to locate exactly what we want.

Pivot Tables are broken into four parts:

As the image indicates, Dimensions can be used as Filters, Column Fields or Row Fields, but not as Values. The only thing that can be used in the Values area are Measures. When you begin to use the pivot table and see the results, you’ll understand why.

Let’s take a scenario: I want to see the total number of lines of code, per project.

Now let’s analyze that sentence:

  • “Total number of lines of code” - that sounds like an aggregate, so it must be a Measure.
  • “per project” - that’s not an aggregate, but rather it’s a breakdown of the above aggregate, so it sounds like a Dimension

In Excel, when you select the Pivot Table, a side window appears giving you a list of all the measures and dimensions, as well as their details:

The top half of the side window lists the measures and dimensions, and the bottom half is broken into four squares - the Filters, Column Fields, Row Fields, and Values, just like the image I showed earlier.

In Excel, Measure Groups (which are the cube’s way of grouping a number of similar Measures) have this icon:

Whilst dimensions and their details have this icon:

In the TFS cube, the best way I’ve found to get the total number of lines of code is in the Code Churn measure group, with the measure being Total Lines. Simply tick the checkbox, and the sum will be shown on the Pivot Table on the sheet:

This is coming from my TFS server at home, so there’s not that much code - I’m sure if you tried this at work there would be a lot more. Note that Total Lines has been added to the Values section of the four squares in the side window:

 

Now we need to ”slice” the data by our Dimension, in this case the Team Project. You can do this by locating the Team Project dimension in Excel, and selecting it:

 

Note that the Team Project has been added to the Row Fields section of the four squares in the side window. As you can see, our dimension (the Team Project) has sliced the data up easily for us.

We can continue to slice the data up by adding other dimensions to the pivot table (either in the Column Fields or Row Fields, or as Filters) or by adding other measures to the Values section. In my next post, I’ll show how we can add custom dimensions to the cube using Visual Studio.

5 Responses to “TFS Cube: Introduction”

  1. […] from my previous post that there are three main types of things in a […]

  2. Analysis Services is unbelievably useful, we use some of the mining algorithms for some forecasting stuff, its very cool. I’m not sure of the structure of TFS cubes but you should look at trying to forecast the number of bugs or something cool like that. Also you could try and use the clustering models to find what areas have the larger bug clusters, I bet you could find some interesting information.

  3. […] why I produce an HTML report when I could have used reporting services. You might remember a couple of blog posts I did a fortnight ago about the TFS Cube. The good news is I’ve managed to […]

  4. I will be waiting for another post about how to add custom dimensions to the cube. This post really helps me a lot! It was great! Thanks!

  5. […] and documentation of FXCop 1.35 on the Code Analysis Team blog Data Warehouse and Reporting TFS Cube – Introduction on Paul Stovell's blog Published Mar 07 2008, 10:32 PM by […]

Leave a Reply