About Spreadsheets

Oliver Knill

April 5, 2004
(last update August 2016)
The following Dilbert cartoon summarizes it all:
Source: Dilbert strip, January 2016
A spreadsheet is a collection matrices together with some tools to visualize or the manipulate these data. Open source "Excel" replacements like "gnumeric" or "planmaker" have the advantage that they do the math correctly, but they do not solve conceptional problems of spreadsheet programs like Excel. Virtually any real programming language allows to deal with data more reliably and do that in a more modular way. Here is an example: a course assistant keeps track of homework grades in a text file "hw.txt"
#  student HW grades for course  "Ocean 12a" 
#  name first         HW1 HW2 HW3 HW4 HW5 
#
Clooney  George        10  18  29  17   7  
Pitt     Brad           5   6  34   4   3 
Damon    Matt          13   1  22   0  10
Roberts  Julia          9  19  27   7   6
Affleck  Casey         16  14  15   3  13
Caan     Scott         12   2  41   1  19
Qin      Shaobo         9  28  28  16   7
Mac      Bernie         5  15  34   4   2
Coltrane Robbie        13  11  21  19  10
Willis   Bruce          8  28  27   7   5


Running the line (studentaverage.txt)
cat hw.txt|grep -v "#"|awk '{k=0;s=0; for (i=3;i<=NF;i++) {k++;s=s+$i;} print $1,$2,s/k}'
spits out a list of average homework scores for each student.

The unix tool "grep" filters out comments and "awk" does the math. Average scores for HW1, HW2 of the entire class are obtained with
cat hw.txt|grep -v "#"|awk '{n1+=$3;n2+=$4;s++} END {print n1/s,n2/s}'
Here is a script "homeworkaverage.txt" which gives the averages of each of the colums. These rather silly examples can be part of a larger script which will produce the final grade.

It is more convenient to use Perl or Python or even a computer algebra system like Mathematica do such things. But most of the time, especially in educational setups, spreadsheet are used for trivial things like sorting, taking averages or adding up weighted scores.

Often one has to merge different spreadsheets. Here is an example program merge.perl written in Perl which accesses entries in both matrices individually by its indices. While it does the task not very efficiently, it allows a "mathematically trained mind" to manipulate these matrices. Having a powerful programming language at hand is especially useful when more complex row or column manipulations have to be done, for example when entries have to be checked for certain conditions and if there are exceptions. Dealing with exceptions is especially difficult with "out of the shelf" spreadsheet programs. Exceptions are also the main source for errors, when dealing with data.

The two postings in a slashdot discussion which hit the heart of the matter. Problems with the spreadsheet concept:

  • Mixing of code with data makes it difficult to test and audit things. How am I going to know from looking at a spread sheet which data have been computed and which have been entered? Splitting of code with data allows to test the code with lots of different data. Testing a code with different (also extreme) data can show problems with the code.
  • No Modularization: is the main obstacle to maintain complex data. The Unix philosophy of building things up with independent building blocks or filters is more efficient in the long run. Each filter is typically small and can be developed and tested independently.
  • Difficult to audit: how do we debug a complex spreadsheet written by someone else, how do I proofread it? If I want to believe a statement, I must be able to verify how it was derived or falsify it. This is very difficult to achieve in general and pretty graphs do not help.
  • Spagetti logic : entering formulas for cells just asks for spaghetti coding. Problems come also from rounding errors which build up.

Updates:

  • Added: April 24, 2005:
    The Register writes: "According to both PricewaterhouseCoopers and KPMG, more than 90% of corporate spreadsheets have material errors in them. Worse, estimates suggest that such errors costs between 10'000 and 100'000 Dollars per error per month." Source
  • Added: August 25, 2005:
    Some more links:
  • Added: June 4, 2006:
    An interesting link (see ) a new slashdot discussion on the topic:
  • Added: July 3, 2007
    The German site "Spiegel.de" has an article explaining a fundamental Excel error to some of its competitors.


  • Added December 20, 2008
    Quote:

    "The technology acceptance model holds that there are two main factors that determine the uptake of a technology: the perceived usefulness and the perceived ease-of-use. Perception need not correspond to reality. The perception of the ease-of-use of spreadsheets is to some extent an illusion. It is dead easy to get an answer from a spreadsheet, however, it is not necessarily easy to get the right answer. Thus the distorted view. The difficulty of using alternatives to spreadsheets is overestimated by many people. Safety features can give the appearance of difficulty when in fact these are an aid. The hard way looks easy, the easy way looks hard."

    Source
  • Added August 24, 2016
    20 percent of scientific papers on geners contains gene name conversion errors caused by Excel. Some comments in that article say that it is not a technology problem but a user problem. The fact remains that errors are easily done with that software and that the errors are hard to verify as Excel is opaque as code and data are mixed. There is a Slashdot discussion about it. The topic was picked up at many news outlets: Register, Heise article in German.
    Last update, August 24, 2016, Oliver Knill, 2004-2016