SPREADSHEETS

(Data for this tutorial at www.peteraldhous.com/Data)

Spreadsheets are great tools for sorting, filtering and running calculations on tables of data. Journalists who know the basics can “interview” data to find stories and trends that others may miss.

In this tutorial, we will work with LibreOffice Calc, which you can download from here. Once you know your way around a spreadsheet, it’s fairly easy to pick up the skills needed to work with many other tools for data analysis.

TIP! Whenever you get some new data, save it under a new name! Then if you mess things up, you can always go back to the original.

So, let’s open the file City Budget.xls, available online here, by selecting File>Open, and save it under a new name by selecting File>Save As.

Formatting, basic calculations and sorting

Spreadsheets use a simple coordinate system, with letters for columns and numbers for rows. You can make calculations on data using these coordinates. This data was provided some years ago to a reporter working on a small city paper.

1. Formatting data

First, let’s format the columns as currency. Select columns B and C by clicking on the letters while pressing Ctrl (Windows) or command (Mac); then select Format>Cells from the top menu; when the dialog box pops up, select Currency, USD $ and 0 decimal places.

Now everything is in $, but we still can’t see all of the text in column A, so let’s adjust the column width. Click the square at the top left, which selects all the data in the sheet. Then select Format>Column>Optimal Width and click OK at the next dialog box.

TIP! If you ever see a cell containing ###, it holds a number that is wider than the column. Use the same trick to fix the problem.

2. Performing basic calculations

Now let’s calculate how the budget has changed from last year to this year. All calculations start by entering = into the cell. So, the formula for the first cell is =(C2-B2).
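In Calc you type =(C2-B2) once and let the spreadsheet repeat it for each row. As a rough sketch of that row-by-row idea, here is the same subtraction in Python, applied to every row of a small table. The department names and amounts are invented placeholders, not figures from City Budget.xls, and treating the columns as (department, last year, this year) is an assumption based on the formula.

```python
# Hypothetical rows standing in for columns A, B and C of the budget sheet:
# (department, last_year, this_year). These numbers are invented for illustration.
rows = [
    ("Police", 1_000_000, 1_100_000),
    ("Parks", 250_000, 200_000),
    ("Library", 400_000, 400_000),
]


def budget_difference(last_year: float, this_year: float) -> float:
    """Equivalent of the spreadsheet formula =(C2-B2) for a single row."""
    return this_year - last_year


# The code version of autofill: apply the same formula to every row,
# producing what would be column D in the spreadsheet.
differences = [(name, budget_difference(b, c)) for name, b, c in rows]

for name, diff in differences:
    print(f"{name}: ${diff:,.0f}")
```

Writing the formula once as a function and applying it to every row mirrors what autofill does: one definition, used consistently down the column.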
You may also need to format the cells in this column as currency. Move the cursor to the bottom right-hand corner of the cell until you see it change to a cross, then double click. Calc autofills the column, performing the same calculation for each row until it reaches a blank row.

TIP! Calc can recognize other patterns in your data and autofill in the same way. To number a list of records, for instance, type 1, 2 and 3 into the first three cells next to a column of data, select those three cells and autofill.

You can also Copy (Ctrl-C or command-C) the first cell and Paste (Ctrl-V or command-V) all the way down the column. See how the “marching ants” form around the cell you’ve copied. (This will give 0s where the cells are blank, and #VALUE! errors where Calc tries to perform a calculation on text. These can simply be deleted.)

TIP! You can lose your work if you later delete data that was used to calculate values that you are still working with. If this is a possibility, Copy the calculated data and paste it into a new column by selecting Edit>Paste Special and selecting Numbers, Text etc., as appropriate.

So let’s do that here. You may need to format the new column as currency once more. See how the cells now contain the numbers, not the formulas used to calculate them. Then delete the old column D by selecting the cells, then Edit>Delete Cells.

Now let’s calculate the percentage change in budget, rather than the absolute $ value. The formula for the first cell is =((C2-B2)/B2) or =(D2/B2). So enter this, and copy down the column as before. Again, you may need to format the cells to display the results as a percentage, rather than a decimal.

3. Sorting data

Let’s sort by % change. To do so, highlight the range of data you want to sort, then select Data>Sort. When the dialog box pops up, select Sort key 1 > % Change and check Descending order.

TIP! When sorting, make sure Calc has recognized that your data has a header row if there is one; otherwise the header will be sorted as well. Look under the Options tab in the Sort dialog box and, if necessary, check Range contains column labels.

TIP! Make sure to select the full range of data you want to sort, and no more. Leaving rows or columns out of a sort, or including extraneous data, are common ways to scramble your data. Use Edit>Undo if you make a mistake.

In this case we want to sort the Departments and Revenues tables separately, taking care not to include the Totals. Sorting by % Change and Difference, for both Departments and Revenues, can produce ideas for story angles. What might the reporter have pursued in this case?

TIP! Always check any calculations that you haven’t done yourself. The mantra of journalism should be: if your mother says she loves you, check it out! We should be as skeptical of numbers as we are of human sources.

So let’s just check that the totals we were given add up correctly. To check the Departments total in column B, the formula is =SUM(B2:B14). Now Copy this cell, and Paste into columns C and D. What’s our reporting plan now?

4. Performing anchored calculations

In each calculation we’ve performed so far, we’ve moved down or across the spreadsheet performing the equivalent calculation for that row or column. But sometimes that isn’t what we want to do. Let’s calculate the increase in revenue from each source as a percentage of the total increase in revenue. Use the formula =(D20/$D$31), and then autofill. The $ signs anchor the calculation: to anchor to a specific cell, you need a $ in front of both coordinates. As the formula fills down, D20 becomes D21, D22 and so on, while $D$31 stays fixed.

Importing data, filtering, subtotals and pivot tables

Now we’re going to look at a larger dataset, used in reporting this story about the drug company Pfizer’s payments to doctors, to show how you can use a spreadsheet to drill down to selections from the data, and to perform useful summary calculations.
Data is often provided as text (.txt) or comma-separated values (.csv) files, so we’ll first learn how to import a text file into Calc.

TIP! Databases should always be able to export data as .txt or .csv files, so bear this in mind whenever an organization says it can’t provide data because it’s in a special format.

1. Importing a text file

Find the file Pfizer payments.txt, available online here, and open it in a text editor. See that individual entries are separated by Tabs and some of the entries are surrounded by double quote marks.

Open a new document in Calc by selecting File>New>Spreadsheet. Then select Insert>Sheet From File. Browse for the file Pfizer payments.txt and open it. Fill in the dialog box as follows:

Click OK at the next dialog box, and the data should import. Format the columns as desired, making the last three currency and the header row bold.

TIP! To keep the header row visible when you scroll through the spreadsheet, select the row beneath it and then select Window>Freeze.

Save this file as Pfizer payments, and then again under a new name, by selecting File>Save As. You can save as an ODF spreadsheet, or in Excel format. Notice that we have a list of more than 10,000 payments, which in this form doesn’t tell us very much.

2. Filtering data

So let’s set up the spreadsheet so we can filter for what we’re interested in. Let’s say we want a list of all doctors in California who were paid $10,000 or more to run expert-led forums. Select the entire spreadsheet by clicking the square at top left, then select Data>Filter>Standard Filter, and fill in the dialog box as follows:

Click OK, and you should have a list of 31 doctors. For this filter, we connected the filter criteria with AND operators, which ensured that rows were selected only if all the stated criteria were met. Filters obey Boolean logic; see what happens if you replace the first AND with OR.

TIP! If you need to keep a filtered subset of the data, or do further calculations on it, select the data, Copy it and Paste it into a new worksheet. To rename the sheet for future reference, right click on the sheet tab and select Rename Sheet.

Select Data>Filter>Remove Filter to return to the entire spreadsheet.

3. Calculating subtotals

Let’s now find out how much money went to each state. Select Data>Subtotals and fill in the dialog box as follows:

Click OK, and then click on the 2 at top left to hide the data and leave just the subtotals (1 hides everything except the grand total). Notice that we added together the totals by state using the function Sum, but we could have used other functions to aggregate the data in other ways. Select Data>Subtotals and click Delete to clear the subtotals.

4. Making pivot tables

What if we want to know subtotals by state and by category of payment? We can do that in one step, using a pivot table. Select Data>Pivot Table>Create, click OK at the initial dialog box, and then drag and drop in the dialog box as follows:

Notice that the calculation for the Total field has defaulted to Sum, which is what we want, but we can select other aggregate functions by double clicking on Sum – Total. Click OK, and the pivot table will be inserted into a new sheet. Format the column widths as necessary. The final pivot table should look like this:
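Readers who prefer to script their analyses can reproduce the filter, subtotal and pivot table steps in code. The Python below is a minimal sketch using only the standard library, under loudly stated assumptions: the column names (name, state, category, total), the category label "Expert-Led Forums" and all of the sample rows are hypothetical stand-ins, because the real headers and contents of Pfizer payments.txt are not reproduced here. The sample is parsed as tab-separated text with some double-quoted fields, matching the file format described in the import step.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-separated sample standing in for Pfizer payments.txt.
# Column names and values are invented; some fields are double-quoted.
SAMPLE = (
    "name\tstate\tcategory\ttotal\n"
    '"Smith, Ann"\tCA\t"Expert-Led Forums"\t12000\n'
    '"Jones, Bob"\tCA\t"Expert-Led Forums"\t8000\n'
    '"Lee, Cara"\tCA\tConsulting\t15000\n'
    '"Diaz, Dan"\tNY\t"Expert-Led Forums"\t20000\n'
    '"Park, Eve"\tNY\tConsulting\t5000\n'
)

# Tab-separated fields, with " used to quote text.
reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="\t", quotechar='"')
payments = [dict(row, total=float(row["total"])) for row in reader]

# Filter with AND: a row is kept only if every criterion holds.
filtered_and = [
    p for p in payments
    if p["state"] == "CA"
    and p["category"] == "Expert-Led Forums"
    and p["total"] >= 10_000
]

# Filter with OR: a row is kept if either condition holds
# (a two-condition illustration, not a copy of any particular Calc filter).
filtered_or = [p for p in payments if p["state"] == "CA" or p["total"] >= 10_000]

# Subtotals: Sum of total, grouped by state, plus a grand total.
by_state = defaultdict(float)
for p in payments:
    by_state[p["state"]] += p["total"]
grand_total = sum(by_state.values())

# Pivot table: Sum of total, with states as rows and categories as columns.
pivot = defaultdict(lambda: defaultdict(float))
for p in payments:
    pivot[p["state"]][p["category"]] += p["total"]

categories = sorted({p["category"] for p in payments})
print("state\t" + "\t".join(categories))
for state in sorted(pivot):
    cells = [f"{pivot[state].get(c, 0.0):,.0f}" for c in categories]
    print(state + "\t" + "\t".join(cells))
```

For real work, the pandas library's read_csv and pivot_table functions cover the same ground with less code; the standard-library version keeps the sketch dependency-free.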
