# Every time you get data, save it under a new name, so you can go back to the original

```SPREADSHEETS
(Data for this tutorial at www.peteraldhous.com/Data)
Spreadsheets are great tools for sorting, filtering and running calculations on
tables of data. Journalists who know the basics can “interview” data to find
stories and trends that others may miss. In this tutorial, we will work with
Calc. Once you know your way around a spreadsheet, it’s fairly easy to pick up the
skills needed to work with many other tools for data analysis.
TIP! Whenever you get some new data, save it under a new name! Then
if you mess things up, you can always go back to the original.
So, let’s open the file City Budget.xls, available online at the address above,
by selecting File>Open, and save it under a new name by selecting File>Save As.
Formatting, basic calculations and sorting
Spreadsheets use a simple coordinate system, with letters for columns and
numbers for rows, so cell B2 sits in column B, row 2. You can make
calculations on data using these coordinates.
This data was provided some years ago to a reporter working on a small city
paper.
1. Formatting data
First, let’s format the columns as currency. Select columns B and C by
clicking on their letters while pressing Ctrl (Windows) or command (Mac);
then select Format>Cells from the top menu; when the dialog box pops
up, select Currency, USD $ and 0 decimal places.
So, now we have everything in dollars, but we still can’t see everything in
column A, so let’s adjust the column width.
Click the square at the top left, which selects all the data in the sheet. Then
select Format>Column>Optimal Width and click OK at the next dialog box.
TIP! If you ever see a cell displaying ###, it contains a number that is
wider than the column. Use the same trick to fix the problem.
2. Performing basic calculations
Now let’s calculate how the budget has changed from last year to this year.
All calculations start by entering = into the cell. So, the formula for the
first cell in column D is =(C2-B2).
(At this stage, we may need to format the cells in the column so that they are
also in dollars.)
Move the cursor to the bottom right-hand corner of the cell until you see it
change to a cross, then double-click. Calc autofills the column, performing
the same calculation for each row (C3-B3, C4-B4 and so on) until it reaches
a gap in the data.
TIP! Calc can recognize other patterns in your data and autofill in the same
way. To number a list of records, for instance, type 1, 2 and 3 into the first three
cells next to a column of data, select those three cells and autofill.
You can also Copy (Ctrl-C or command-C) the first cell and Paste (Ctrl-V
or command-V) it all the way down the column. Notice the “marching ants”
that form around the cell you’ve copied.
(This will give 0s where the cells are blank, and #VALUE! errors where Calc
tries to perform a calculation on text. These can simply be deleted.)
TIP! You can lose your work if you later delete data that was used to
calculate values that you are still working with. If this is a possibility, copy
the calculated data and paste it into a new column by selecting Edit>Paste
Special and choosing Numbers, Text, etc., as appropriate.
So let’s do that here.
You may need to format the new column as currency once more. Notice that
its cells now contain the numbers, not the formulas used to calculate them.
Then delete the old column D by selecting its cells and choosing Edit>Delete
Cells, shifting the remaining cells left so that the new column becomes D.
Now let’s calculate the percentage change in budget, rather than the absolute
dollar value.
The formula for the first cell is:
=((C2-B2)/B2) or =(D2/B2).
So enter this, and copy down the column as before. Again, you may need to
format the column so the results display as percentages rather than decimals.
3. Sorting data
Let’s sort by % change. To do so, highlight the range of data you want to
sort, then select Data>Sort. When the dialog box pops up, set Sort key 1 to
% Change and check Descending.
TIP! When sorting, make sure Calc has recognized your data’s header row, if
there is one; otherwise the header will be sorted along with the data. Look
under the Options tab in the Sort dialog box and, if necessary, check Range
contains column labels.
TIP! When sorting, make sure to select the full range of data you want to
sort, and no more. Leaving rows or columns out of a sort, or including
extraneous data, are common ways to scramble your data. Use Edit>Undo if
you make a mistake. In this case, we want to sort the Departments and
Revenues tables separately, taking care not to include the Totals.
Sorting both the Departments and Revenues tables by % Change and by
Difference can produce ideas for story angles. What might the reporter have
pursued in this case?
TIP! Always check any calculations that you haven’t done yourself.
The mantra of journalism should be: if your mother says she loves you,
check it out! We should be as skeptical of numbers as we are of human
sources. So let’s just check that the totals we were given add up correctly.
To check the Departments total in column B, the formula is
=SUM(B2:B14).
Now copy this cell and paste it into the same row of columns C and D; the
column letters in the formula adjust automatically. What’s our reporting
plan now?
4. Performing anchored calculations
In each calculation we’ve performed so far, we’ve moved down or across the
spreadsheet performing the equivalent calculation for each row or column,
with Calc shifting the cell references as it goes.
But sometimes that isn’t what we want to do.
Let’s calculate the increase in revenue from each source as a percentage of
the total increase in revenue.
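Use the formula:
=(D20/$D$31), and then autofill.
The $ signs anchor the reference: as you autofill, D20 becomes D21, D22 and
so on, but $D$31 always points to the total. To anchor to a specific cell, you
need a $ in front of both coordinates, the column letter and the row number.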
Importing data, filtering, subtotals and pivot tables
Now we’re going to look at a larger dataset, used in reporting a story
about the drug company Pfizer’s payments to doctors, to show how you can
use a spreadsheet to drill down to selections from the data, and to perform
useful summary calculations.
Data is often provided as text (.txt) or comma-separated values (.csv) files,
so we’ll first learn how to import a text file into Calc.
TIP! Databases should always be able to export data as .txt or .csv files, so
bear this in mind whenever an organization says it can’t provide data
because it’s in a special format.
1. Importing a text file
Find the file Pfizer payments.txt, available online at the address above, and
open it in a text editor. Notice that individual entries are separated by tabs,
and some of the entries are surrounded by double quote marks.
Open a new document in Calc by selecting File>New>Spreadsheet. Then
select Insert>Sheet From File.
Browse for the file Pfizer payments.txt and open it. In the import dialog box,
set the entries to be separated by tabs and the text delimiter to the double
quote mark (").
Click OK at the next dialog box, and the data should import.
Format the columns as desired, making the last three currency, and the
TIP! To keep the header row visible as you scroll through the
spreadsheet, select the row beneath it and then select Window>Freeze.
Save this file as Pfizer payments, then save it again under a new name by
selecting File>Save As. You can save it as an ODF spreadsheet or in Excel
format.
Notice that we have a list of more than 10,000 payments, which in this form
doesn’t tell us very much.
2. Filtering data
So let’s set up the spreadsheet so we can filter for what we’re interested in.
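Let’s say we want a list of all doctors in California who were paid $10,000
or more to run expert-led forums.
Select the entire spreadsheet by clicking the square at the top left, then select
Data>Filter>Standard Filter, and fill in the dialog box with three conditions
joined by AND: the state is California, the payment category is expert-led
forums, and the payment is $10,000 or more.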
Click OK, and you should have a list of 31 doctors.
For this filter, we connected the filter criteria with AND operators, which
ensured that rows were selected only if they met all the stated criteria.
Filters obey Boolean logic; see what happens if you replace the first AND
with OR.
TIP! If you need to keep a filtered subset of the data, or do further
calculations on it, select the data, copy it and paste it into a new sheet. To
rename the sheet for future reference, right-click on the sheet tab and select
Rename Sheet.
3. Calculating subtotals
Let’s now find out how much money went to each state. Select
Data>Subtotals and, in the dialog box, group by the state column and
calculate the Sum of the payment totals.
Click OK, and then click on the 2 at the top left to hide the individual
payments and leave just the subtotals (clicking 1 hides everything except the
grand total).
Notice that we added together the totals by state using the function Sum, but
we could have used other functions, such as Count or Average, to aggregate
the data in other ways.
Select Data>Subtotals>Delete to clear the subtotals.
4. Making pivot tables
What if we want to know subtotals by state and by category of payment? We
can do that in one step, using a pivot table.
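Select Data>Pivot Table>Create, click OK at the initial dialog box, and
then drag and drop the fields in the dialog box: state and payment category
into the row and column areas, and the payment totals into the data area.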
Notice that the calculation for Totals has defaulted to Sum, which is what we
want, but we can select other aggregate functions by double clicking on Sum
– Total.
Click OK, and the pivot table will be inserted into a new sheet. Format the
column widths as necessary.
The final pivot table should look like this:
```
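If you also work in code, the City Budget calculations translate directly. Below is a minimal Python sketch of the difference (=C2-B2), the percentage change (=((C2-B2)/B2)) and the descending sort on % change. The department names and dollar figures are hypothetical placeholders, not values from City Budget.xls, and the sketch assumes, as the formulas imply, that column B holds last year’s budget and column C this year’s.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One budget line; the names and figures used below are hypothetical."""
    department: str
    last_year: float   # assumed to be column B
    this_year: float   # assumed to be column C


def difference(row: Row) -> float:
    """Equivalent of =(C2-B2), evaluated for each row like an autofill."""
    return row.this_year - row.last_year


def pct_change(row: Row) -> float:
    """Equivalent of =((C2-B2)/B2), returned as a fraction (0.2 means 20%)."""
    return (row.this_year - row.last_year) / row.last_year


rows = [
    Row("Dept A", 100_000, 120_000),
    Row("Dept B", 250_000, 240_000),
    Row("Dept C", 50_000, 65_000),
]

# Data>Sort on % Change, Descending.
ranked = sorted(rows, key=pct_change, reverse=True)

for r in ranked:
    # Difference with a sign and thousands separator, % change as a percent.
    print(f"{r.department:8} {difference(r):>+10,.0f} {pct_change(r):>+8.1%}")
```

Because each function takes a whole row, applying it to every row plays the same role as autofilling a formula with relative references.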
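The totals check and the anchored calculation can be sketched the same way, again in Python with hypothetical revenue sources and figures. Recomputing the sum mirrors =SUM(B2:B14), and dividing every row by one fixed total mirrors the $D$31 reference in =(D20/$D$31).

```python
# Column D: increase in revenue by source (hypothetical figures).
increases = {
    "Source 1": 40_000,
    "Source 2": 10_000,
    "Source 3": 50_000,
}
reported_total = 100_000  # the Totals row we were given (hypothetical)

# =SUM(...) over the column: recompute any total you didn't calculate yourself.
computed_total = sum(increases.values())
if computed_total != reported_total:
    print(f"Check this: totals differ by {computed_total - reported_total:,}")

# The anchored denominator ($D$31) is one fixed value reused for every row,
# while the numerator (D20, D21, ...) moves with the row.
shares = {name: value / computed_total for name, value in increases.items()}

for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```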
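Importing and filtering the Pfizer data can also be sketched in Python with the standard csv module. The column names (Name, State, Category, Total) and the rows are hypothetical stand-ins, since the real layout of Pfizer payments.txt may differ; the point is how tab separators and double-quote delimiters are handled, and how AND and OR combine filter conditions.

```python
import csv
import io

# Hypothetical stand-in for Pfizer payments.txt: tab-separated fields, with
# text containing commas wrapped in double quotes. Not the real data.
SAMPLE = (
    "Name\tState\tCategory\tTotal\n"
    '"Doctor A, MD"\tCA\tExpert-Led Forums\t12000\n'
    '"Doctor B, MD"\tCA\tProfessional Advising\t15000\n'
    '"Doctor C, MD"\tNY\tExpert-Led Forums\t20000\n'
    '"Doctor D, MD"\tCA\tExpert-Led Forums\t8000\n'
    '"Doctor E, MD"\tTX\tProfessional Advising\t3000\n'
)

# Tabs separate the fields; double quotes mark where a text field begins and ends.
reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="\t", quotechar='"')
payments = [dict(row, Total=float(row["Total"])) for row in reader]

# AND: a row is kept only if every condition is true.
and_filter = [
    p for p in payments
    if p["State"] == "CA"
    and p["Category"] == "Expert-Led Forums"
    and p["Total"] >= 10_000
]

# OR: a row is kept if any condition is true. Only two conditions are used
# here so the grouping is unambiguous.
or_filter = [p for p in payments if p["State"] == "CA" or p["Total"] >= 10_000]

print([p["Name"] for p in and_filter])
print([p["Name"] for p in or_filter])
```

Note how the quoted name "Doctor A, MD" survives as a single field even though it contains a comma; with tab separators the quotes matter less, but they become essential when the same data arrives as a .csv file.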
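Subtotals and pivot tables are both group-and-aggregate operations. This Python sketch, on hypothetical (state, category, total) rows, sums payments by state, much as Data>Subtotals does, and then by state and category together, much as the pivot table does. Replacing the += with another aggregation would correspond to choosing a function other than Sum.

```python
from collections import defaultdict

# Hypothetical payments as (state, category, total); these stand in for the
# state, category and total columns, whose real names may differ.
payments = [
    ("CA", "Expert-Led Forums", 12_000),
    ("CA", "Professional Advising", 15_000),
    ("NY", "Expert-Led Forums", 20_000),
    ("CA", "Expert-Led Forums", 8_000),
    ("TX", "Professional Advising", 3_000),
]

# Subtotals: group by state and Sum the totals.
by_state = defaultdict(float)
for state, _category, total in payments:
    by_state[state] += total
grand_total = sum(by_state.values())

# Pivot table: states down the rows, categories across the columns, Sum in cells.
pivot = defaultdict(lambda: defaultdict(float))
for state, category, total in payments:
    pivot[state][category] += total

categories = sorted({category for _, category, _ in payments})
print("State", *categories, sep="\t")
for state in sorted(pivot):
    # .get() returns 0 for empty cells without adding them to the table.
    print(state, *(pivot[state].get(c, 0) for c in categories), sep="\t")
print("Grand total:", grand_total)
```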