In this paper, we perform a linear regression analysis on data related to the number of daily e-mails received and sent by a user. The data is represented as a time series at uniform time intervals. The outcome is a statistical analysis to determine whether a natural temporal ordering is inherent in the data.
Correlation, Simple Linear Regression
In this paper, we perform a linear regression analysis on previously collected data related to the number of daily e-mails received (R) and sent (S) by a particular user (the author). We have depicted the original daily e-mail data as a time series, incrementing N by 1 for each day's measurement. The computed coefficient r is the correlation coefficient: it measures the strength and direction of the linear relationship and shares its sign with the slope of the regression line, but it is not the slope itself.
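For reference, the textbook least-squares definitions of the fitted line and of r are sketched below. The symbols x_i, y_i, their means, and the coefficients a and b are generic notation assumed here for illustration; they are not taken from the original analysis.

```latex
\begin{align}
b &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad a = \bar{y} - b\,\bar{x}, \\
r &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
          {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}.
\end{align}
```

Here b is the slope of the line y = bx + a, while r is the correlation coefficient. The two share a numerator, and therefore a sign, but generally differ in magnitude, as the values reported later for R (slope -0.135243, r = -0.843775) illustrate.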
A time series is a sequence of successive data points measured at uniform time intervals. As such, this exercise represents a meaningful statistical analysis to determine whether a natural temporal ordering is inherent in the data. Recall that the data for each of R and S consisted of 15 daily samples, collected during two exercises spanning 10 and 5 days respectively. This factor is noted in the analysis that follows. Table 1 lists the predicted values used in the regression analysis for the e-mails received (R), and Table 2 lists those for the e-mails sent (S).
This section summarizes the regression analysis for e-mails received (R).
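| Time Series (N) | E-mails Received (R) | Predicted Value |
|---|---|---|
| 1 | | 2.581 |
| 2 | | 0.147 |
| 3 | | 6.098 |
| 4 | | 2.311 |
| 5 | 97 | 6.909 |
| 6 | 72 | 10.290 |
| 7 | 81 | 9.073 |
| 8 | 77 | 9.614 |
| 9 | 87 | 8.261 |
| 10 | 93 | 7.450 |
| 11 | 56 | 12.454 |
| 12 | 67 | 10.966 |
| 13 | 70 | 10.561 |
| 14 | 61 | 11.778 |
| 15 | 63 | 11.507 |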
Table 1: Predicted Values for R.
Based on the online analysis using the tools by Waner et al. (1999), the linear regression equation, the coefficient r, and the resulting graphical portrayal of the time series for R are given by:
y = f(R) = -0.135243x + 20.0276
r = -0.843775
Figure 1: Linear regression for R, all 15 days
The results for parameter R display a distinct downward trend over time, and the best-fit regression line appears to correspond visually to the distribution of the data. This may therefore be considered a credible interpretation of the trend of received e-mails (R) over a period of 15 consecutive days.
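The Predicted Value column in Table 1 appears to be the regression equation evaluated with x set to the e-mail count R; for example, R = 97 gives about 6.909. The Python sketch below checks this for the rows of Table 1 that list all three entries. The function name `predict` and the variable names are illustrative, and reading x as the e-mail count is an assumption inferred from the numbers rather than something the text states.

```python
# Coefficients of the reported fit: y = -0.135243x + 20.0276
SLOPE = -0.135243
INTERCEPT = 20.0276


def predict(x, slope=SLOPE, intercept=INTERCEPT):
    """Evaluate the fitted line at x (assumed here to be the e-mail count R)."""
    return slope * x + intercept


# (N, R, tabulated predicted value) for the rows of Table 1 that are complete.
rows = [
    (5, 97, 6.909), (6, 72, 10.290), (7, 81, 9.073), (8, 77, 9.614),
    (9, 87, 8.261), (10, 93, 7.450), (11, 56, 12.454), (12, 67, 10.966),
    (13, 70, 10.561), (14, 61, 11.778), (15, 63, 11.507),
]

# Largest gap between a recomputed value and the tabulated one.
max_gap = max(abs(predict(count) - tabulated) for _, count, tabulated in rows)
print(f"largest discrepancy: {max_gap:.4f}")
```

Under this reading, every recomputed value agrees with the table to within rounding of the third decimal place.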
This section summarizes the regression analysis for e-mails sent (S).
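| Time Series (N) | E-mails Sent (S) | Predicted Value |
|---|---|---|
| 1 | 68 | 7.771 |
| 2 | 72 | 8.280 |
| 3 | 64 | 7.263 |
| 4 | 55 | 6.118 |
| 5 | 43 | 4.592 |
| 6 | 49 | 5.355 |
| 7 | 52 | 5.737 |
| 8 | 55 | 6.118 |
| 9 | 46 | 4.974 |
| 10 | 62 | 7.008 |
| 11 | 98 | 11.586 |
| 12 | | 12.730 |
| 13 | | 11.967 |
| 14 | 84 | 9.806 |
| 15 | 91 | 10.696 |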
Table 2: Predicted Values for S.
Based on the online analysis using the tools by Waner et al. (1999), the linear regression equation, the coefficient r, and the resulting graphical portrayal of the time series for S are given by:
y = f(S) = 0.127148x - 0.874922
r = 0.606856
Figure 2: Linear regression for S, all 15 days
Analysis of the results for parameter S yields a best-fit regression line that rises gradually over time. In other words, r is positive, and the regression line has a positive (rising) slope. However, this best-fit regression line does not appear to correspond visually to the distribution of the data in S. Furthermore, the data appear to be grouped into two distinct sets, which correspond to the first 10 days and the last 5 days respectively. The result is therefore likely to be better represented by two regression lines, the first for the initial 10 days and the second for the final 5 days, as seen in Figures 3 and 4.
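Fitting a segment separately can be sketched in plain Python. The example below fits only the first ten days, using the S values listed in Table 2, and treats the e-mail count as x and the day index N as y. That orientation is an assumption, chosen because it reproduces the coefficients reported with Figure 3; the helper name `fit_line` is likewise illustrative.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# E-mails sent (S) on days 1-10, as listed in Table 2.
sent_first_10 = [68, 72, 64, 55, 43, 49, 52, 55, 46, 62]
days = list(range(1, 11))  # N = 1..10

# Assumed orientation: x = e-mail count, y = day index N.
slope, intercept, r = fit_line(sent_first_10, days)
print(f"slope={slope:.6f} intercept={intercept:.4f} r={r:.5f}")
```

Under these assumptions the result agrees with the values reported for Figure 3 (slope -0.170591, intercept 15.1555, r = -0.54187).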
Figure 3: Linear regression for S, first 10 days
y = f(S) = -0.170591x + 15.1555
r = -0.54187
Figure 4: Linear regression for S, last 5 days
y = f(S) = -0.11606x + 14.165