1. Dot Maps of Three Datasets
Figure 1. Dot plot representations of the normalized datasets (panels: Cell, Pine, Redwood). The left side of the figure shows the sample sizes of the biological cell, pine tree, and redwood seedling datasets. The mean center was computed for each dataset (see the red dots). The panel on the lower right is a comparative representation of the three datasets along with their respective mean centers (see the scatter dots in the respective colors).
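The mean center mentioned in the caption of Figure 1 is conventionally the arithmetic mean of the x and y coordinates. A minimal Python sketch follows, assuming the points are stored as (x, y) pairs; the function name `mean_center` and the three sample points are hypothetical and are not taken from the datasets.

```python
from statistics import fmean

def mean_center(points):
    """Return the mean center (mean x, mean y) of a list of (x, y) points."""
    xs, ys = zip(*points)
    return fmean(xs), fmean(ys)

# Hypothetical normalized coordinates, for illustration only.
sample_points = [(0.2, 0.4), (0.6, 0.8), (0.4, 0.3)]
cx, cy = mean_center(sample_points)
print(f"mean center: ({cx:.3f}, {cy:.3f})")
```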
The visual representation of the three datasets provides information about their distributions. First of all, the intensity of the cell data is very homogeneous, whereas the redwood and pine datasets are distributed inhomogeneously. Figure 1 also provides information about the stochastic dependence between the data points in a unit area. The neighboring data points of the redwood dataset are clustered, which suggests that the data points forming the same cluster interact with each other. On the other hand, the distribution of the pine dataset appears random; the visual representation therefore suggests that the data points are independent. In the case of the cell dataset, the distribution of data points is regular. In other words, the number of data points per unit area is approximately equal.
2. Quadrat Counts and Histogram
The three datasets (biological cells, redwood seedlings, and pine trees) are divided into rectangular grids of size 2x2 and 4x4. Figures 2 and 3 show the numbering and location of the quadrats. Tables 1 and 2 present the data point counts in each quadrat for the 2x2 and 4x4 grids, respectively. Figures 4 and 5 show the histograms of the frequency distributions of the quadrat counts for the respective grids.
Figure 2. 2x2 grid. Each rectangular unit represents a quadrat. The study area was divided into quadrats of equal size.
Figure 3. 4x4 grid. Each rectangular unit represents a quadrat. The study area was divided into quadrats of equal size.
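The quadrat counting described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the coordinates are taken to be normalized to the unit square, the quadrats are numbered row by row (the actual numbering is the one shown in Figures 2 and 3 and may differ), and the function name `quadrat_counts` and the sample points are hypothetical.

```python
def quadrat_counts(points, k):
    """Count the points falling in each cell of a k x k grid on the unit square.

    Assumption: quadrats are indexed row by row, starting at the origin.
    """
    counts = [0] * (k * k)
    for x, y in points:
        # Points lying exactly on the upper boundary go into the last cell.
        col = min(int(x * k), k - 1)
        row = min(int(y * k), k - 1)
        counts[row * k + col] += 1
    return counts

# Hypothetical points, for illustration only.
sample_points = [(0.1, 0.1), (0.6, 0.2), (0.7, 0.8), (1.0, 1.0), (0.4, 0.9)]
print(quadrat_counts(sample_points, 2))  # [1, 1, 1, 2]
```

Calling the same function with k = 4 or k = 10 gives the counts for the finer grids; the total over all quadrats always equals the number of points.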
Table 1. Number of data points in 2x2 grids.
Quadrat   Cell   Redwood   Pine
1         11     9         13
2         10     18        13
3         10     21        22
4         11     14        17
Figure 4. Histogram of the frequency distribution of the quadrat counts of 2x2 grids.
Table 2. Number of data points in 4x4 grids.
Quadrat   Cell   Redwood   Pine
1         2      0         2
2         2      5         4
3         2      8         5
4         3      2         4
5         3      4         6
6         4      0         1
7         1      2         5
8         4      9         8
9         3      2         6
10        2      7         2
11        4      5         5
12        2      2         4
13        2      9         5
14        3      0         0
15        3      2         4
16        2      5         4
Figure 5. Histogram of the frequency distribution of the quadrat counts of 4x4 grids.
The histograms summarize how the data points are distributed among the quadrats. In Figure 4, the cell data points are almost uniformly distributed over the 2x2 grid, whereas a Gaussian distribution function would fit the redwood data (the maximum number of data points is in the third quadrat). In the same figure, the pine tree data points are randomly distributed over the grid. The distribution density changes with the size of the quadrat area (see Figure 5). For example, in the 4x4 grid the 8th and 13th quadrats of the redwood data exhibit the highest density of data points, whereas the 1st, 6th, and 14th quadrats do not contain any data points.
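The frequency distribution behind such a histogram can be tabulated directly from the quadrat counts. The sketch below uses the redwood column of Table 2; the helper name `count_frequencies` is hypothetical, and the code illustrates the idea rather than the script actually used for Figures 4 and 5.

```python
from collections import Counter

# Redwood counts for quadrats 1..16, copied from Table 2 (4x4 grid).
redwood_4x4 = [0, 5, 8, 2, 4, 0, 2, 9, 2, 7, 5, 2, 9, 0, 2, 5]

def count_frequencies(counts):
    """Map each quadrat count to the number of quadrats having that count."""
    return dict(sorted(Counter(counts).items()))

print(count_frequencies(redwood_4x4))

# Quadrat numbers (1-based) with the maximum count and with no points.
peak = max(redwood_4x4)
densest = [i + 1 for i, c in enumerate(redwood_4x4) if c == peak]
empty = [i + 1 for i, c in enumerate(redwood_4x4) if c == 0]
print(densest, empty)  # [8, 13] [1, 6, 14]
```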
In order to test the CSR (complete spatial randomness) hypothesis, the appropriate quadrat counts were investigated visually (see Figures 2, 3, and 6). Eventually, a grid size of 10x10 was chosen. The mean, variance, standard deviation, and variance-to-mean ratio (VMR) of the quadrat counts were calculated for each dataset using MATLAB functions (i.e., mean, var, std). The results are shown in Table 3.
Table 3. Mean, variance, standard deviation and VMR values calculated for 10x10 quadrats.
           Cell   Redwood   Pine
Mean       0.42   0.62      0.65
Variance   0.25   1.39      0.61
STD        0.50   1.18      0.78
VMR        0.59   2.24      0.94
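The same summary statistics can be sketched in Python with the standard `statistics` module, as a rough analogue of the MATLAB calls named in the text. Because the 10x10 counts are not listed in this report, the example runs on the 4x4 counts of Table 2, so its output is not expected to reproduce Table 3. The function name `quadrat_summary` is hypothetical, and the sample variance (n - 1 denominator) is used, which is also the default of MATLAB's var.

```python
from statistics import mean, stdev, variance

def quadrat_summary(counts):
    """Return mean, sample variance, standard deviation and VMR of quadrat counts."""
    m = mean(counts)
    v = variance(counts)  # sample variance, n - 1 in the denominator
    return {"mean": m, "variance": v, "std": stdev(counts), "vmr": v / m}

# Counts for quadrats 1..16, copied from Table 2 (4x4 grid, not the 10x10 grid).
table2 = {
    "Cell":    [2, 2, 2, 3, 3, 4, 1, 4, 3, 2, 4, 2, 2, 3, 3, 2],
    "Redwood": [0, 5, 8, 2, 4, 0, 2, 9, 2, 7, 5, 2, 9, 0, 2, 5],
    "Pine":    [2, 4, 5, 4, 6, 1, 5, 8, 6, 2, 5, 4, 5, 0, 4, 4],
}

for name, counts in table2.items():
    summary = quadrat_summary(counts)
    print(name, {key: round(value, 2) for key, value in summary.items()})
```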
The mean values of the redwood and pine datasets are similar; however, the analysis above already showed that these two datasets have different distribution characteristics. The variance (the square of the standard deviation) is another value used to describe the distribution of the datasets. Since the variance measures the spread of the quadrat counts around their mean, one could say that the cell data is the most uniform of the analyzed datasets. The last value in Table 3 is the variance-to-mean ratio (VMR). This ratio of the variance to the mean is used to identify whether a dataset is dispersed or clustered compared to the CSR hypothesis, and thus to quantify its distribution model. The VMR of a Poisson distribution is 1. A negative binomial distribution has a VMR larger than 1, while a binomial distribution has a VMR smaller than 1. In this regard, the cell and pine datasets are under-dispersed (VMR of 0.59 and 0.94, respectively), although the pine value is close to 1. In other words, the data points are distributed uniformly in the spatial domain. Therefore, these two datasets follow a binomial distribution. The redwood data is over-dispersed (VMR of 2.24), meaning that the data points are clustered together. Thus, the underlying distribution model for the redwood data is the negative binomial distribution.
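The decision rule described above (VMR below, equal to, or above 1) can be written out as a short Python sketch applied to the VMR row of Table 3. The function name `classify_vmr` and the `tol` parameter are hypothetical additions for illustration; with the default `tol=0.0` the rule is the strict comparison used in the text.

```python
def classify_vmr(vmr, tol=0.0):
    """Classify a variance-to-mean ratio relative to the Poisson value of 1.

    tol is a hypothetical tolerance band around 1; the text uses none.
    """
    if vmr > 1 + tol:
        return "over-dispersed (clustered)"
    if vmr < 1 - tol:
        return "under-dispersed (regular)"
    return "consistent with CSR (Poisson)"

# VMR values copied from Table 3 (10x10 quadrats).
table3_vmr = {"Cell": 0.59, "Redwood": 2.24, "Pine": 0.94}

for name, vmr in table3_vmr.items():
    print(f"{name}: VMR = {vmr:.2f}, {classify_vmr(vmr)}")
```

With the strict rule, the cell and pine datasets are labeled under-dispersed and the redwood dataset over-dispersed, matching the classification in the text.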
Taken together, these results allow one to define the distribution models of the datasets; however, the relationship between individual data points cannot be assessed by the quadrat count method alone.