Mining the Process of Extracting New Information essay

Mining

The process of extracting new information from existing information through the use of a computer system is called text mining. Text mining retrieves data from the available information and establishes connections between the facts mentioned in that data; this is how new information is developed. Because the information is newly formed, it is validated through experimentation. Web search is often confused with text mining, but the two are entirely different processes. In web search, the computer matches keywords against a database and returns the relevant records; that information was written down by somebody and then uploaded to the internet to make it searchable. In text mining, by contrast, altogether new information is generated out of an existing body of knowledge (Berry, 2004).
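The web-search side of this contrast can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `keyword_search` and the sample records are assumptions made for the example, not part of any system the essay describes. It shows that keyword matching only returns records somebody has already written; it does not produce new information.

```python
import re

def keyword_search(records, query):
    """Return the stored records that contain every keyword in the query."""
    keywords = re.findall(r"[a-z]+", query.lower())
    hits = []
    for record in records:
        # Compare whole words so punctuation does not hide a match.
        words = set(re.findall(r"[a-z]+", record.lower()))
        if all(k in words for k in keywords):
            hits.append(record)
    return hits

# Hypothetical records standing in for documents uploaded to the web.
records = [
    "Text mining derives new facts from documents.",
    "Coal mining pollutes rivers.",
    "Web search matches keywords.",
]

print(keyword_search(records, "text mining"))
```

The search can only hand back sentences that are already stored; relating facts across records to form something new is the part that text mining adds.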

Text mining has its roots in data mining, the process in which a computer system retrieves previously unknown information from an existing database. Hence text mining is also called text data mining. Other names for it are intelligent text analysis and knowledge discovery in text (KDT). It extracts interesting information from unstructured text. Mining unstructured information is highly valued in this emerging field because unstructured data is readily available and large in volume. Text mining is perceived to have high commercial value, since more than 80% of information is stored in the form of text and can be explored to generate a new body of knowledge. In addition to data extraction, text mining draws on computational linguistics, statistics and machine learning (Berry, 2004).

Knowledge Discovery in Databases (KDD) is gaining prominence in emerging applications such as text understanding. It works by extracting both implicit and explicit concepts from the existing data and then forming semantic relations among those concepts. This is done with the help of natural language processing techniques, commonly known as NLP techniques. When combined with NLP, KDD discovers useful information through knowledge management, information extraction, machine learning, statistics and reasoning (Navathe et al., 2000).

As mentioned earlier, data mining and text mining are similar concepts. The difference lies in the type of data explored and the tools used. Data mining works well only with highly structured data, while text mining is also applicable to semi-structured and unstructured data. Unstructured data includes HTML files, full-text documents and emails, which makes text mining attractive to companies. There is, however, an obstacle to the use of text mining: its dependence on NLP. Natural language was not originally meant for computer systems, nor has it developed for that purpose. Because of this, structured data and data mining practices remain more prevalent in research and development (Navathe et al., 2000).

The obstacles that NLP poses for computer systems do not exist for human beings. Humans can easily comprehend language patterns and can even distinguish between different patterns used in the same text. Examples are contextual meanings, slang and spelling variations in a database. Computer systems are not yet able to identify such linguistic patterns quickly (Weiguo, 2005).

A collection of documents is provided to the text mining tool. After exploring the collection, the tool selects one document and identifies its character set and format. It then analyzes the text in the document, repeatedly applying various techniques to extract information from the database. The example presented cites three techniques of text analysis; however, there may be many others based on combinations of these techniques. The choice depends on the organizational goals, which provide guidelines about the data to be extracted. The retrieved data is inserted into the organization's management information systems so that end users can retrieve it for their own use (Weiguo, 2005).
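The workflow just described can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tool the essay refers to: the function names, the UTF-8 then Latin-1 fallback, the crude HTML check and the use of term counts as the "analysis" step are all choices made for the example.

```python
import re
from collections import Counter

def decode(raw):
    """Guess the character set: try UTF-8 first, then fall back to Latin-1."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("latin-1"), "latin-1"

def detect_format(text):
    """Very rough format check: look for a few common HTML tags."""
    return "html" if re.search(r"<html|<body|<p>", text, re.I) else "plain"

def analyze(text, fmt):
    """Stand-in analysis step: strip markup if needed and count terms."""
    if fmt == "html":
        text = re.sub(r"<[^>]+>", " ", text)
    return Counter(re.findall(r"[a-z]+", text.lower()))

def process(collection):
    """Run every document through charset detection, format detection and analysis."""
    store = []  # stands in for the management information system
    for raw in collection:
        text, charset = decode(raw)
        fmt = detect_format(text)
        store.append({"charset": charset, "format": fmt, "terms": analyze(text, fmt)})
    return store

# Hypothetical two-document collection: one HTML file, one Latin-1 text file.
collection = [
    b"<html><body><p>Text mining</p></body></html>",
    "caf\xe9 data".encode("latin-1"),
]
result = process(collection)
print(result[0]["format"], result[1]["charset"])
```

In practice the analysis step would be replaced by whichever techniques the organizational goals call for; the sketch only fixes the order of the stages.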

Statement of the problem

There is a gap in the literature regarding the extraction of text information from very large databases.

Purpose of the study

The study investigates how to extract a specific phrase from a text. It employs survey techniques to interview experts in the field and assesses results using coding techniques.

Rationale of the study

Several research studies related to text extraction have been carried out. However, no research has focused on evaluating text information extraction in large databases using survey interview techniques. This research will therefore fill a vital gap in the literature by investigating the extent to which text extraction can be made accurate and precise.

Lastly, this study also offers a number of theoretical contributions. Common analytical and operational issues have become increasingly important as institutions move from comparatively simple methods and communication models to intricate multi-channel models. The combined forces of technology, demography, control and globalization have also been pushing organizational information systems all over the world to change their strategies in order to keep pace with an ever-changing world. Evaluating the extent to which text extraction from large databases can be made accurate and precise has been a neglected topic, and this study will shed light on it.

Research Questions

The main research question is:

How can a specific phrase be extracted from text in large databases?

Literature Review

Technological foundations

The gap between computer and human languages, which arose from the numerous differences between them, is now narrowing as technology improves. Computers can now comprehend, critique and produce text because they have been taught natural language by programs created by people who work in the field of natural language processing. The capabilities developed in these programs include tracking a topic, retrieving relevant information from a database, organizing data, summarizing it, forming links between topics and answering questions. These developments, their roles and their usefulness to the user are discussed in detail below (Sergio, 2002).

A. Extraction of Information

Information extraction (IE) programs identify the main elements of a text by recognizing how the text is written, a technique known as pattern matching. The links between places, times and people are identified so that the user receives useful information from the database. This is helpful when a large quantity of data is being processed. Previously, it was assumed that the information to be mined was already in structured form. However, that is not the case: in many applications the electronic information is free text rather than structured data. IE deals with this issue, since its task is to form structured data from the raw text. To do this, the IE module uses a KDD module. After the useful information has been extracted from all the information provided, DISCOTEX uses discovery rules to check whether any information has been missed in the database (Sergio, 2002).
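A small sketch can make the idea of pattern matching concrete. It assumes one hand-written pattern and made-up sentences, and it is not how DISCOTEX or any particular IE module works; it only shows raw text being turned into structured records that link a person, a place and a time.

```python
import re

# Hypothetical pattern: "<title> <first> <last> visited <place> on <day month year>".
VISIT = re.compile(
    r"(?P<person>(?:Dr\.|Mr\.|Ms\.) [A-Z][a-z]+ [A-Z][a-z]+) visited "
    r"(?P<place>[A-Z][a-z]+) on (?P<date>\d{1,2} [A-Z][a-z]+ \d{4})"
)

def extract_visits(text):
    """Turn free text into a list of structured records (one dict per match)."""
    return [m.groupdict() for m in VISIT.finditer(text)]

# Made-up input text for illustration only.
raw_text = (
    "Dr. Alice Smith visited Boston on 12 March 2004. "
    "Mr. Bob Jones visited Paris on 3 May 2005."
)

visits = extract_visits(raw_text)
for row in visits:
    print(row["person"], "|", row["place"], "|", row["date"])
```

Once the text is in this tabular form, ordinary data mining techniques that expect structured input can be applied to it.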

B. Topic Tracking

Yahoo offers a free topic tracking tool to users at www.alert.yahoo.com. The tool informs the user of any news available on a topic that the user chooses. More generally, a topic tracking system maintains a user's profile and suggests documents related to those the user has viewed earlier. Despite its benefits, topic tracking has limitations. For instance, a user who has set an alert for 'text mining' can receive many news items on mining for minerals or the characteristics of minerals instead of text mining. Through topic tracking, a company can be notified when a competitor enters the market, so that it stays up to date with market changes and can respond accordingly. Students can use topic tracking to research their subjects and find articles related to their studies. Organizations can find news about themselves. Topic tracking can also help doctors and individuals who are searching for treatments and the latest developments in the medical field. More and better text mining tools can be developed to benefit users, who can either state their interests themselves or let the software infer those interests from their previous selections of articles from the database (Sergio, 2002).
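The 'text mining' versus mineral mining limitation can be reproduced with a toy example. The sketch below is an assumption about how a naive tracker might behave, not a description of the Yahoo tool; the function names and headlines are invented for illustration. A tracker that fires on any single alert word returns the unwanted mining news, while one that requires the whole phrase does not.

```python
import re

def tokens(text):
    """Lower-case word tokens with punctuation removed."""
    return re.findall(r"[a-z]+", text.lower())

def naive_alert(alert, headline):
    """Fire when ANY alert word appears; this reproduces the false hits."""
    words = tokens(headline)
    return any(w in words for w in tokens(alert))

def phrase_alert(alert, headline):
    """Fire only when the alert words appear as a consecutive phrase."""
    a, h = tokens(alert), tokens(headline)
    return any(h[i:i + len(a)] == a for i in range(len(h) - len(a) + 1))

# Invented headlines for illustration.
headlines = ["New text mining toolkit released", "Copper mining output falls"]

for headline in headlines:
    print(headline, naive_alert("text mining", headline), phrase_alert("text mining", headline))
```

Phrase matching removes this particular false hit, but it is still a shallow fix; a tracker that builds a profile from previously viewed documents would need a richer notion of similarity.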

Keywords are a set of particular words in an article that give users a significant explanation of its substance. Extracting keywords manually from a given database is very time consuming and almost impossible, and it is even more difficult for news articles, which are published in huge quantities on a daily basis. Keyword extraction has developed into a source for several text mining applications…[continue]
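As a rough illustration of automatic keyword extraction, the sketch below ranks words by how often they occur after discarding common function words. Plain term frequency and the small stop-word list are assumptions chosen for the example; the essay does not say which extraction method it has in mind.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on", "from", "are", "that"}

def extract_keywords(article, top_n=3):
    """Return the top_n most frequent non-stop-words in the article."""
    words = [
        w for w in re.findall(r"[a-z]+", article.lower())
        if w not in STOPWORDS and len(w) > 2
    ]
    return [word for word, _ in Counter(words).most_common(top_n)]

# Made-up article text for illustration.
article = (
    "Text mining extracts keywords from text. "
    "Keywords summarise the text, and mining the text is automated."
)

keywords = extract_keywords(article)
print(keywords)
```

Frequency alone favours long articles and generic words, which is one reason practical systems usually weight terms against the rest of the collection.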

Cite This Essay:

"Mining The Process Of Extracting New Information" (2012, January 09) Retrieved December 10, 2016, from http://www.paperdue.com/essay/mining-the-process-of-extracting-new-information-115220

"Mining The Process Of Extracting New Information" 09 January 2012. Web.10 December. 2016. <http://www.paperdue.com/essay/mining-the-process-of-extracting-new-information-115220>

"Mining The Process Of Extracting New Information", 09 January 2012, Accessed.10 December. 2016, http://www.paperdue.com/essay/mining-the-process-of-extracting-new-information-115220
