Talk:Observer bias

Latest comment: 5 months ago by Yomomo in topic Interesting example

Examples of cognitive bias - wrong article?

I have removed the following content, as it seems to be relevant not to this article but to cognitive bias. --Piotrus at Hanyang| reply here 07:21, 16 March 2024 (UTC)

Examples of cognitive biases include:

  • Anchoring – a cognitive bias that causes humans to place too much reliance on the initial pieces of information they are provided with for a topic. This causes a skew in judgement and prevents humans and observers from updating their plans and predictions as appropriate.
  • Bandwagon effect – the tendency for people to "jump on the bandwagon" with certain behaviours and attitudes, meaning that they adopt particular ways of doing things based on what others are doing.
  • Bias blind spot – the tendency for people to recognize the impact of bias on others and their judgements, while simultaneously failing to acknowledge and recognize the impact that their own biases have on their own judgement.
  • Confirmation bias – the tendency for people to look for, interpret, and recall information in such a way that their preconceived beliefs and values are affirmed.
  • Guilt and innocence by association bias – the tendency for people to assume that individuals within a group share similar characteristics and behaviours, including those that would mark them as innocent or guilty.
  • Halo effect – the tendency for positive impressions and beliefs about a person, brand, company, product or the like in one area to influence an observer's opinions or feelings in other, unrelated areas.
  • Framing effect – the tendency for people to form conclusions and opinions based on whether the pertinent information is presented to them with positive or negative connotations.
  • Recency effect – the tendency for more recent pieces of information, ideas, or arguments to be remembered more clearly than those that preceded them.

Interesting example

Hi!

I found the following example of statistical bias, which I find interesting, particularly because of its unexpected outcome. As a teacher of mathematics and physics, I would like to use it and also add it as an example to this article. This is how I would present it in the Examples section of the article:

Observer bias may also be related to cultural characteristics of the observer. Suppose that the number of tourists in a village is counted every month. The outcome, including a linear regression of the yearly totals, is shown in the following graph:

The villagers noted a decline in the number of tourists over the years. In 2013 they organised three concerts by a local music group, hoping to increase the number of tourists. They concluded that the concerts had an effect on the number of tourists, because the counts in the concert months were rather high, the total for 2013 was higher than in the years directly before, and it lies outside the 95% interval of the linear model. In this case the R² value is also quite high (0.81), which indicates a rather strong correlation.
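The check described above can be reproduced in a few lines of base R. The sketch below uses only the yearly totals of the first data set, and it fits the straight-line trend to all years with 2013 included, as the R code further below does. It assumes that "95% interval" means the prediction interval for a single year's total, not the narrower confidence interval of the regression line. The names year, total, model, pi95 and outside are illustrative only.

```r
# Yearly totals (in thousands) from the first data set, 1994-2014
year  <- 1994:2014
total <- c(10.51, 10.71, 10.24, 10.04, 10.11, 10.08, 10.01, 10.07, 9.53, 9.66,
           9.62, 9.66, 9.31, 9.32, 9.26, 9.25, 9.13, 8.92, 8.63, 9.64, 8.76)

# Straight-line trend fitted to all years, 2013 included
model <- lm(total ~ year)

# Assumption: the "95% interval" is the prediction interval for one year's total
pi95 <- predict(model, newdata = data.frame(year = year),
                interval = "prediction", level = 0.95)

# Years whose total falls outside the interval
outside <- total < pi95[, "lwr"] | total > pi95[, "upr"]
year[outside]
# [1] 2013
```

With these rounded totals, 2013 is the only year outside the band, and it lies above the upper limit.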

Now suppose that the outcome were as follows:

These are exactly the same data, except that they are shifted by four months: each row of the second table holds the first table's counts from September of the previous year to August. In this case we cannot reject the null hypothesis that the concerts had no effect on the total number of tourists over the year, because the differences could be due to statistical fluctuation. There is, however, a value in 2012 that lies slightly below the lower 95% limit and might need an explanation. The R² value is higher (0.91).
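Here is a minimal base-R sketch of the corresponding check for the second data set. It uses only that table's yearly totals and fits a straight-line trend to all years. As before, it assumes that the 95% limit refers to the prediction interval, and the variable names are illustrative.

```r
# Yearly totals (in thousands) from the second data set, 1995-2014
year  <- 1995:2014
total <- c(10.57, 10.41, 10.07, 10.22, 10.01, 10.05, 9.94, 9.78, 9.60, 9.61,
           9.66, 9.53, 9.21, 9.32, 9.15, 9.22, 9.15, 8.63, 9.18, 9.04)

# Straight-line trend fitted to all years
model <- lm(total ~ year)

# Assumption: the 95% limit is the prediction interval for one year's total
pi95 <- predict(model, newdata = data.frame(year = year),
                interval = "prediction", level = 0.95)

# Years whose total falls outside the interval
outside <- total < pi95[, "lwr"] | total > pi95[, "upr"]
year[outside]
# [1] 2012
```

In this sketch 2012 is the only year outside the band, slightly below its lower limit. The 2013 and 2014 totals stay inside.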

In the first case, two abrupt "highs" fall within a single 12-month period. Of course, the months in which the group played have the highest numbers of tourists, but exactly the same is true in the second case. The only difference is the timing of the unstable maximal values ("highs"). Because the highs are unstable and abrupt, we should, where possible, use intervals that contain only one high in order to draw valid conclusions. The conclusion is therefore that, with these data, we cannot reject the null hypothesis in BOTH cases. That is, we cannot say that the group actually affected the number of tourists (more precisely: we cannot statistically contradict the hypothesis that the group had no effect on the total number of tourists). In the second case this is obvious. In the first case, we can conclude that there is a statistical bias, because two abrupt highs fall within one 12-month period (the year in question, 2013), even though 12-month periods containing just one high could be chosen. The most probable explanation is rather that tourists who would have come anyway preferred to come in the concert months. A cultural characteristic (here: where the beginning and the end of a year are set) can therefore affect the outcome of a statistical analysis.
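The effect of where the 12-month period starts can be seen directly in the monthly counts. The sketch below takes the first table's counts from January 2012 to December 2014 and sums three different 12-month windows. The months with the highest counts are February and October/November 2013. The helper window_total() is hypothetical and not part of the original analysis. The two September-to-August sums equal the 2013 and 2014 totals of the second table (9.18 and 9.04 thousand).

```r
# Monthly tourist counts from the first table, January 2012 to December 2014
counts <- c(762, 718, 699, 650, 679, 689, 670, 738, 721, 765, 808, 731,   # 2012
            828, 1021, 862, 721, 687, 673, 653, 707, 790, 983, 983, 727,  # 2013
            751, 703, 698, 636, 674, 679, 689, 727, 734, 821, 854, 792)   # 2014

# Hypothetical helper: sum of 12 consecutive months starting at position
# `start`, where position 1 is January 2012
window_total <- function(start) sum(counts[start:(start + 11)])

jan_dec_2013 <- window_total(13)  # Jan-Dec 2013: both the Feb and the Oct/Nov highs
sep_aug_2013 <- window_total(9)   # Sep 2012-Aug 2013: only the February high
sep_aug_2014 <- window_total(21)  # Sep 2013-Aug 2014: only the Oct/Nov highs
c(jan_dec_2013, sep_aug_2013, sep_aug_2014)
# [1] 9635 9177 9040
```

The monthly counts are the same in all three sums. Only the choice of the first month changes whether both highs land in the same "year".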

My questions are:

  • Is the example (and its conclusions) correct?
  • If so, do you find it interesting enough to use here, in this article about observer bias (or perhaps also in your classes, if you teach)?

Here are the data sets for the first and second cases, in case you want to check the results.

First case:
year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec total (x1000)
1994 934 895 889 838 861 822 822 855 864 943 948 836 10.51
1995 1010 915 847 791 847 853 808 903 861 984 1040 850 10.71
1996 928 861 811 772 833 811 805 856 825 947 969 819 10.24
1997 875 808 830 780 797 802 780 836 827 921 910 871 10.04
1998 1038 882 827 794 799 788 766 794 790 890 928 812 10.11
1999 906 818 818 790 818 807 796 834 834 889 927 845 10.08
2000 960 856 795 773 790 768 790 823 817 904 909 822 10.01
2001 909 855 806 768 800 768 773 806 805 908 1016 854 10.07
2002 864 794 794 735 746 735 724 805 792 846 915 782 9.53
2003 862 803 782 733 776 749 760 798 791 913 919 770 9.66
2004 866 797 775 712 765 743 754 807 759 865 891 886 9.62
2005 923 812 785 717 754 743 733 791 764 884 921 832 9.66
2006 895 827 754 717 727 727 717 764 752 824 840 767 9.31
2007 798 747 767 705 767 747 726 772 744 826 888 831 9.32
2008 888 795 754 703 729 698 718 744 746 807 883 792 9.26
2009 827 766 726 680 736 726 695 761 755 845 930 805 9.25
2010 825 775 750 685 695 700 690 760 741 815 870 825 9.13
2011 934 771 687 657 736 687 687 736 694 782 816 733 8.92
2012 762 718 699 650 679 689 670 738 721 765 808 731 8.63
2013 828 1021 862 721 687 673 653 707 790 983 983 727 9.64
2014 751 703 698 636 674 679 689 727 734 821 854 792 8.76

Second case:
year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec total (x1000)
1995 864 943 948 836 1010 915 847 791 847 853 808 903 10.57
1996 861 984 1040 850 928 861 811 772 833 811 805 856 10.41
1997 825 947 969 819 875 808 830 780 797 802 780 836 10.07
1998 827 921 910 871 1038 882 827 794 799 788 766 794 10.22
1999 790 890 928 812 906 818 818 790 818 807 796 834 10.01
2000 834 889 927 845 960 856 795 773 790 768 790 823 10.05
2001 817 904 909 822 909 855 806 768 800 768 773 806 9.94
2002 805 908 1016 854 864 794 794 735 746 735 724 805 9.78
2003 792 846 915 782 862 803 782 733 776 749 760 798 9.60
2004 791 913 919 770 866 797 775 712 765 743 754 807 9.61
2005 759 865 891 886 923 812 785 717 754 743 733 791 9.66
2006 764 884 921 832 895 827 754 717 727 727 717 764 9.53
2007 752 824 840 767 798 747 767 705 767 747 726 772 9.21
2008 744 826 888 831 888 795 754 703 729 698 718 744 9.32
2009 746 807 883 792 827 766 726 680 736 726 695 761 9.15
2010 755 845 930 805 825 775 750 685 695 700 690 760 9.22
2011 741 815 870 825 934 771 687 657 736 687 687 736 9.15
2012 694 782 816 733 762 718 699 650 679 689 670 738 8.63
2013 721 765 808 731 828 1021 862 721 687 673 653 707 9.18
2014 790 983 983 727 751 703 698 636 674 679 689 727 9.04

R was used for the linear-regression diagrams, based on the first and last columns of each table (year and total). OpenOffice was used to present the data in diagrams. The R code follows:

library(readxl)
# Load one table at a time; columns: year, Jan ... Dec, total (in thousands)
rs1 <- read_excel("Documents/rs1.xlsx")
rs1f <- data.frame(rs1)
# Linear trend of the yearly totals
modelA <- lm(total ~ year, data = rs1f)
newYears <- data.frame(year = rs1f$year)
# Trend line with exact 95% confidence and prediction limits at each observed year
ConfInSwed <- data.frame(predict(modelA, newdata = newYears, interval = "confidence"))
PredInSwed <- data.frame(predict(modelA, newdata = newYears, interval = "prediction"))
rs1f$pred <- ConfInSwed$fit
rs1f$downCI <- ConfInSwed$lwr
rs1f$upCI <- ConfInSwed$upr
rs1f$downPI <- PredInSwed$lwr
rs1f$upPI <- PredInSwed$upr
# Percentage deviation of each total from the trend line and from the interval limits
rs1f <- transform(rs1f,
                  PercPred = 100 * (total - pred) / pred,
                  PercDownCI = 100 * (total - upCI) / upCI,
                  PercUpCI = 100 * (total - downCI) / downCI,
                  PercDownPI = 100 * (total - upPI) / upPI,
                  PercUpPI = 100 * (total - downPI) / downPI,
                  DifPIMCI = (upPI - downPI) - (upCI - downCI))
# Totals (points), trend (solid), confidence band (dashed), prediction band (dotted)
plot(rs1f$year, rs1f$total, xlab = "Year", ylab = "Tourists (x1000)",
     ylim = range(rs1f$total, rs1f$downPI, rs1f$upPI))
lines(rs1f$year, rs1f$pred, col = 2, lwd = 2)
lines(rs1f$year, rs1f$upCI, col = 2, lwd = 3, lty = 2)
lines(rs1f$year, rs1f$downCI, col = 2, lwd = 3, lty = 2)
lines(rs1f$year, rs1f$upPI, col = 2, lwd = 2, lty = 3)
lines(rs1f$year, rs1f$downPI, col = 2, lwd = 2, lty = 3)
# Print the full results table and the regression summary
write.table(rs1f, col.names = NA)
summary(modelA)

Before running the R code, save the tables one after another in your Documents folder as rs1.xlsx, removing "(x1000)" from each.
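As an alternative to the Excel step, a table can also be read as plain text with base R's read.table(), without readxl. The sketch below assumes the rows are pasted exactly as listed above, with "(x1000)" removed. It shows only the first three rows of the first table, and the names txt and monthly_sum are illustrative. The resulting data frame is assumed to match the spreadsheet columns (year, Jan ... Dec, total). The last line checks that each total equals the sum of the twelve monthly counts divided by 1000, up to rounding.

```r
# First three rows of the first table, pasted as text without "(x1000)"
txt <- "year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec total
1994 934 895 889 838 861 822 822 855 864 943 948 836 10.51
1995 1010 915 847 791 847 853 808 903 861 984 1040 850 10.71
1996 928 861 811 772 833 811 805 856 825 947 969 819 10.24"

rs1 <- read.table(text = txt, header = TRUE)

# month.abb is base R's vector "Jan", "Feb", ..., "Dec"
monthly_sum <- rowSums(rs1[, month.abb]) / 1000

# Totals are rounded to two decimals, so allow a difference below 0.005
all(abs(monthly_sum - rs1$total) < 0.005)
# [1] TRUE
```

With all rows pasted, the same rs1 could stand in for the read_excel() result in the code above.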

Thanks in advance for your advice! Yomomo (talk) 11:32, 16 July 2025 (UTC)


Wikipedia is not for 'self research' - it should only include such analysis if it was published in a trusted source (e.g., peer-reviewed journal) Tal Galili (talk) 13:46, 12 January 2026 (UTC)
Hello Tal! Thanks for the answer and for your time. I am a teacher and would like to know whether the example is right. You already wrote to me in an e-mail that it might be right. It is not "self research", though. It is just an example using already existing knowledge, like the one in Cross-multiplication#Double rule of three or most of the diagrams in linear regression. As far as I know, and as these examples show, such examples can be found in many mathematics articles. The most important thing for me, though, is whether it is correct. If you have some time, please look at it once more and tell me with a bit more certainty :-) whether it is right. If you still think this is "new knowledge" and others in the wiki statistics group agree, then it can be removed. But still, if it is correct and you are a professor teaching statistics, it might also be an interesting example for you to use... Greetings! Yomomo (talk) 20:50, 12 January 2026 (UTC)