Research Paper Doctorate 2,601 words

Kde and Kme Kernel Density

Last reviewed: January 17, 2010 ~14 min read

KDE and KME

Kernel Density Estimation (KDE)

Abstract-- Kernel Density Estimation KDE is also known as the Parzen Window Method, after Emanuel Parzen. Parzen is the pioneer of kernel density estimation. Density estimation entails constructing an estimate based upon observed data, where the underlying probability density function cannot be observed. A kernel in turn is used as a weighting function in nonparametric tests. In KDE, kernels play a central role in estimating the density functions of random variables. The phenomenon can be usefully applied to the communication sector of Electrical Engineering.

Index Terms-- KDE, Parzen, Electrical Engineering, Parzen Window Method

Non-parametric statistics is often referred to as distribution free, as they do not require a normal data distribution [8]. These tests also entail that less restrictive data assumptions are required; hence allowing for categorical and rank data analysis. Although nonparametric tests can be used with a broader data type ranges, parametric tests are often preferred for a number of reasons [8]:

They are robust; they are more power efficient than nonparametric tests; they provide unique information; and they answer a specific type of question. Nonparametric tests can be preferred in certain circumstances and as a result of certain data conditions, but they do not encompass all possible areas of study. Nonparametric statistics is generally used when there is significant uncertainty about the data.

A parametric statistical model is therefore one with a distribution that depends upon unknown constants, or parameters [3]. Only the parameters are unknown within such models. Parametric statistics has as its goal to infer unobserved parameters by means of observed statistics. When there is no prior knowledge of underlying distributions, parametric statistics could yield flawed results, and so a nonparametric model is required. The assumption in such a model is that the data distribution is independently identically distributed. Such a model has no parameters.

A third possibility is the semiparametric model [3]. In such a model, parameters are present, but there are very weak assumptions about the distribution form of the observed data. Semiparametric models are often regarded as nonparametric and studied as such. The distinction, originating in the 1960's, is however increasingly common among statisticians. Robust procedures are required for both non- and semiparametric models because of the weak assumptions upon which they depend.

New research [2] has revealed that the traditional method of kernel density estimate can be enhanced by a reasonable parametric guess about density. In this way, the parametric method can be combined with the traditional nonparametric method. This is then an example of the semiparametric approach mentioned above, where the initial parametric guess is multiplied by a kernel estimate of the correction factor. The result is not a density, and is corrected through division by its own total mass [2]. It was found that this corrected estimate generally performs better than the one obtained by the traditional method.

II. KERNEL DENSITY ESTIMATION (KDE)

KDE is also known as the Parzen Window Method, after Emanuel Parzen [1]. Emanuel Parzen is the pioneer of kernel density estimation, and has worked on signal detection theory and time series analysis. Density estimation entails constructing an estimate based upon observed data, where the underlying probability density function cannot be observed. A kernel in turn is used as a weighting function in nonparametric tests. In Kernel density estimation, kernels play a central role in estimating the density functions of random variables.

KDE has several applications; it is most commonly used in specific computer applications such as signal processing and histograms. In electrical engineering, signal processing concerns the operations and analysis of signals, where KDE is then used to perform operations on the signals.

A histogram is a statistical function that displays tabulated frequencies in the form of bars. As a form of data binning, the histogram provides statistical information about categories of information and proportions of the population being studied within these categories. It is the simplest and most frequently encountered nonparametric density estimator [5].

Further applications include MATLAB, a numerical computing environment and programming language. It allows metric manipulation, function and data plotting, algorithm implementation, user interfaces, and interfacing with multi-lingual programs. The Kernel Density Estimator is furthermore used in numerous software programs, such as Mathematica and Strata, as well as programming languages such as R. R presents a software environment for computing and graphics within the statistical field. R is implemented by the S. programming language, and uses lexical scoping semantics derived from Scheme.

KDE is also applied to the Java programming language developed by James Gosling. The Java language includes such facilities as the Weka machine learning suite, developed at the University of Waikato [1]. Gnuplot is an application for the generation of two- and three-dimensional function and data plots. In education, this is used to create graphics and other educational materials. Gnuplot is compatible with all major computers and operating systems.

The software company ESRI provides Geographic Information System software and geodatabase management applications with the use of KDE. It is managed from the Spatial Analyst toolbox by means of the Silverman quadratic kernel function.

From the above, it is clear that KDE has many and varied applications in the computing and thus also in the communication field. It is logical that this function can then also be applied to Electrical Engineering and communication.

The communication field in Electrical Engineering is a rapidly developing field of both investigation and application. All businesses of any significant size in today's world make use of some form of communication technology. Hence it is important to investigate ways in which KDE can be applied to facilitate the storage, movement, and dissemination of data.

III. APPLICATION of KDE to ELECTRICAL ENGINEERING

Some studies focus upon Kernel Density Estimation and its impact on distributed data [6]. The study is based upon the fact that today's business, health and other sectors often incorporate large data repositories distributed over many sites. Often, it is difficult to centralize such data when communication is limited. Hence the application of KDE to the communication field of Electrical Engineering.

In order to address the problem of data centralization, a framework is examined in which local models are build to combine at a central site and produce an approximation to the KDE. It is also noted that, although traditional data mining assumes the data set to be located on a single server, this is not necessarily true. Indeed, particularly when datasets are particularly large, the repositories are distributed over many servers, and also often over great geographic distances.

Traditional data analysis technology would then also require the data to be centralized. The problem here is communication costs, which significantly rise in order to centralize the data of interest. In addition, some data are sensitive and not to be displayed publicly. The necessity to centralize data would incur unacceptable costs in terms of both finance and privacy.

The KDE can then be applied to enhance both cost estimates and privacy. Such an application increases the cost effectiveness of communication and data technology. Security issues are also a very important issue in today's business, health, and especially politics. The application of KDE in this way then has significant advantages in the communications field. KDE is therefore a very useful application in both the computer and engineering fields, where all sectors can benefit.

Kaplan-Meier Estimator

Abstract-- the Kaplan-Meier Estimator is used to calculate the survival rate for subjects of a certain type. The survival rate is monitored, and the probability of continued future accessibility is calculated on a year-by-year basis. The survival rate can then be applied to larger population sizes in order to determine the likelihood of future year-by-year survival. In Electrical Engineering, KME is usefully applied where data sets exist over a large conglomerate of networks. The likelihood of such data being lost as a result of their remote locations, their lack of back-up, or their inefficient storage can be investigated. Companies can then optimize their data in response to these estimates.

Index Terms -- KME, survival rate, data networks, Electrical Engineering

I. INTRODUCTION

Non-parametric statistics is often referred to as distribution free, as they do not require a normal data distribution [8]. These tests also entail that less restrictive data assumptions are required; hence allowing for categorical and rank data analysis. Although nonparametric tests can be used with a broader data type ranges, parametric tests are often preferred for a number of reasons [8]:

They are robust; they are more power efficient than nonparametric tests; they provide unique information; and they answer a specific type of question. Nonparametric tests can be preferred in certain circumstances and as a result of certain data conditions, but they do not encompass all possible areas of study. Nonparametric statistics is generally used when there is significant uncertainty about the data.

A parametric statistical model is therefore one with a distribution that depends upon unknown constants, or parameters [3]. Only the parameters are unknown within such models. Parametric statistics has as its goal to infer unobserved parameters by means of observed statistics. When there is no prior knowledge of underlying distributions, parametric statistics could yield flawed results, and so a nonparametric model is required. The assumption in such a model is that the data distribution is independently identically distributed. Such a model has no parameters.

A third possibility is the semiparametric model [3]. In such a model, parameters are present, but there are very weak assumptions about the distribution form of the observed data. Semiparametric models are often regarded as nonparametric and studied as such. The distinction, originating in the 1960's, is however increasingly common among statisticians. Robust procedures are required for both non- and semiparametric models because of the weak assumptions upon which they depend.

II. KAPLAN-MEIER ESTIMATOR (KME)

The Kaplan-Meier Estimator is used to calculate survival rate for subjects of a certain type. The survival rate is monitored, and the probability of continued future accessibility is calculated on a year-by-year basis. The survival rate can then be applied to larger population sizes in order to determine the likelihood of future year-by-year survival [7].

Such studies are complicated by subjects dying or becoming inaccessible for reasons other than those being studied. This incurs a significant amount of uncertainty, which cannot be controlled [7]. The Kaplan-Meier procedure refers to subjects who become unavailable for such reasons as censored. They nonetheless remain part of the study in order to maintain the integrity of the global population and concurrent information.

The creators of the Kaplan-Meier estimator then determined that, while censored subjects remain as part of the study, they should be deleted from the number at risk for the next time period. They are then included once again for a later time period.

It is emphasized that the Kaplan-Meier procedure does not apply only to biological or indeed lifetime survival. It can also apply to accessibility, or the likelihood of a machine to maintain its performance for a number of years. It can also apply to the success rate of a certain action, and so on.

Because of its nature as an estimator of the survival function, one of the common applications of the Kaplan-Meier Estimator is medical science. The estimator lends itself well to this field, so that estimated survival times can be communicated to patients and family members. It can however also be applied to the shelf life of certain medicines or the success rate of certain procedures. As an estimator of these functions, the Kaplan-Meier estimator has favorable properties, including self-consistency, strong consistency, and asymptotic normality. An inherent problem is however the step-function of the estimator, necessitating a search for a smoothing function. This has led to studies that combine the kernel density estimator with the Kaplan-Meier estimator [4].

A recent suggestion has been the use of Bivariate survival time data in combination with the Kaplan-Meier procedure in order to smooth the data estimates. Bivariate survival time data refers to datasets where a set of two organs or limbs are estimated; such as time to visual loss in the left and right eyes, and so on. When censoring occurs, the above-mentioned nonparametric estimation becomes important for the bivariate survival function.

You’re 82% through this paper. Sign up to read the full paper.

Sign Up Now — Instant Access Already a member? Log in
130,000+ paper examples AI writing assistant Citation generator Cancel anytime
Cite This Paper
PaperDue. (2010). Kde and Kme Kernel Density. PaperDue. https://www.paperdue.com/essay/kde-and-kme-kernel-density-74503

Always verify citation format against your institution’s current style guide requirements.