Mining the Concept of Text Term Paper

Excerpt from Term Paper:

The heuristics considered are probabilistic machine learning approaches. One such approach is 'Alignment Conditional Random Fields', which is designed for scoring sequences using undirected graphical models. (Bilenko; Mooney, 2005) There is demand for this type of software, and text mining is becoming important across a vast area of information analysis. One such field is the analysis of literature and research reviews.

Literary and Scientific Demands:

Demand for text mining is growing in literature review and library services, and extensive research has gone into algorithms for book-based text mining. Ananiadou et al. (2009) used text mining solutions in creating literature reviews. They built a text mining framework for systematic reviews, and what they called the 'service exemplar' served as a test bed for deriving the possible requirements of text mining tools for literature services. Thus text mining can enhance literature reviews and also create a new stream of literary analysis. (Ananiadou, Sophia; et al., 2009)

In another study, covering news and the internet, Montes et al. (1999) established that text mining techniques are effective in analyzing internet and newspaper news. They focused on the current topics of opinion that emerged from Spanish examples. They used a classical statistical model based on averages, distribution analysis, and standard deviation. The results showed society's interests and their changing nature, and the researchers could pinpoint the points of change.
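The statistical idea in the paragraph above can be illustrated with a minimal sketch. It is not the model Montes et al. used; it only assumes that a topic's mention counts are available per time period and flags periods that deviate from the series average by more than a chosen number of standard deviations. The function name, the threshold, and the sample counts are all hypothetical.

```python
from statistics import mean, stdev


def flag_change_points(counts, threshold=2.0):
    """Return the indices of periods whose count lies more than
    `threshold` sample standard deviations from the series mean."""
    mu = mean(counts)
    sigma = stdev(counts)
    if sigma == 0:
        # A perfectly flat series has no deviations to flag.
        return []
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]


# Hypothetical weekly counts of news items mentioning one topic.
weekly_mentions = [12, 11, 13, 12, 10, 12, 41, 13, 11, 12]
change_points = flag_change_points(weekly_mentions)
print(change_points)
```

With these sample counts only the week with 41 mentions is flagged. A real study would likely compare against a moving baseline rather than the whole series, since one large spike inflates the standard deviation.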

Text mining has likewise been effective in medical research, which is significant because success in so different a field shows how broadly the method applies. For example, Natarajan et al. (2006) compared the expression profiles for the same cell lines under the influence of epidermal growth factor (EGF), an important growth factor, and reported a set of 72 genes that are significantly differentially expressed as a unique response to S1P. "Based on the result of mining full-text articles from 20 scientific journals in the field of cancer research published over a period of five years," Natarajan et al. (2006) said, they found gene-to-gene interaction networks for seventy-two different genes. Thus the researchers say that the "automated extraction of information from biological literature will prompt the progress of the discoveries in biological knowledge." (Natarajan, et al., 2006) The other uses are commercial and business oriented, and also include the analysis of behemoths like the internet.

Uses and Advantages:

Text files hold over eighty percent of any business's information, and that information is the most difficult to find or use, so businesses find the prospect of text mining attractive. Companies are increasingly using the new generation of text mining tools to discover relationships and to summarize information. One such tool is the 'ClearResearch' software from 'ClearForest Corporation', which uses pattern matching and shows the relations it finds as a graph. Though not as accurate as the established data mining tools, text mining tools are basically effective. (Robb, 2004)
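As a rough illustration of pattern matching that yields a relation graph, the sketch below counts how often known names appear together in a sentence. It is not how ClearResearch works internally, which the source does not describe. The entity list, the sample sentences, and the function name are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def build_relation_graph(sentences, entities):
    """Count, for every pair of entities, the sentences in which both occur.

    The result maps an alphabetically ordered pair (a, b) to an edge weight.
    """
    # One alternation pattern that matches any known entity as a whole word.
    pattern = re.compile(r"\b(" + "|".join(re.escape(e) for e in entities) + r")\b")
    edges = Counter()
    for sentence in sentences:
        found = sorted(set(pattern.findall(sentence)))
        for a, b in combinations(found, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical company names and news sentences.
entities = ["Acme", "Globex", "Initech"]
sentences = [
    "Acme signed a supply deal with Globex.",
    "Globex and Initech announced a merger.",
    "Acme and Globex opened a joint office.",
]
graph = build_relation_graph(sentences, entities)
for (a, b), weight in sorted(graph.items()):
    print(f"{a} -- {b}: {weight}")
```

Co-occurrence only shows that two names are mentioned together, not what the relation is, which is one reason such tools can be less accurate than mining structured data.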

Other software that has established a presence in the market includes Wordstat, developed by Provalis Research, and SAS Textminer, from SAS. Both packages were found to have flaws and benefits, and both have features that researchers can use to find associations, but they are not helpful in extracting themes from unstructured data. (Davi; Haughton; Nasr; Shah; Skaletsky; Spack, 2005) Thus, as of now, the available software searches for specific terms or categorizes documents based on those terms. This is not satisfactory, because the same term may mean different things to different people, and so the analytical approach to text mining cannot yet be called complete. Text mining can also be used to review a product that is being marketed by analyzing the reviews obtained through surveys. Since this data is unorganized, mining helps identify the facts about product features and the public opinion of the product, find the polarity of opinions, and rank opinions, which would not be possible otherwise. (Kao; Poteet, 2007)
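The term-based categorization and opinion polarity described above can be sketched in a few lines. This is a simplified illustration, not the method of Wordstat, SAS Textminer, or Kao and Poteet. The category terms, the sentiment word lists, and the sample reviews are hypothetical.

```python
import re

# Hypothetical product features and the terms that signal them.
CATEGORY_TERMS = {
    "battery": {"battery", "charge", "charging"},
    "screen": {"screen", "display"},
}
# Hypothetical, very small sentiment lexicons.
POSITIVE = {"great", "good", "love", "excellent"}
NEGATIVE = {"poor", "bad", "terrible", "dim"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def categorize(text):
    """Return the features whose terms appear in the text."""
    tokens = set(tokenize(text))
    return sorted(cat for cat, terms in CATEGORY_TERMS.items() if tokens & terms)


def polarity(text):
    """Positive-word count minus negative-word count."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


reviews = [
    "Great battery, I love how fast it charges.",
    "The screen is dim and the display scratches easily.",
    "Battery life is poor but the screen is excellent.",
]
# Rank reviews from most positive to most negative.
ranked = sorted(reviews, key=polarity, reverse=True)
for review in ranked:
    print(polarity(review), categorize(review), review)
```

The third review shows the weakness the paragraph points out: it praises one feature and criticizes another, yet a purely term-based score nets to zero and hides both opinions.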

Though this is the general need, there are obstacles to the diffusion of text mining. One is that no conclusive research has shown any particular method to be largely successful. Another is that successful technologies have been kept under wraps for commercial reasons. Beyond that, acceptance of the method in commercial practice is hampered by distrust of the process and reluctance to use it to gather information. The contrast is visible in the CRM sector, where consumer relations are paramount: there, data mining techniques have grown well, and consumer data is put to good use. (Sirmakessis, 2004)

In the same sector there is a need to parse unstructured text. Text mining is a supplementary addition to the process of data mining and can be used independently of the type of database. Client messages, content management, sales records, and client demands, which arrive in varying formats including email, are all potential fields for text mining in 'Customer Relation Management' (CRM) databases, and CRM programs benefit largely from text mining. (Sirmakessis, 2004)

Some software packages are capable of analyzing both types of data, structured and unstructured, but the preference is to use traditional BI software for structured data and separate software for text mining. (Robb, 2004) The internet is a vast gold mine for data miners and also for text miners. The reason is that, although there are other forms of data exchange on the internet, it is mostly the text that is important. The internet has its downsides too, since crimes are committed on it, and text analysis can be used to bring such crime down. (Berry; Kogan, 2010)

The growth of the internet has helped communication among the younger generation and between people of shared interests. Cyber bullying and internet predation are crimes that arrived once the internet became accessible to the masses. Text messages on cell phones, together with the unorganized text of conventional internet exchanges such as chat and mail, are the instruments of these criminals. The same sets of tools are equally useful to those engaged in bringing these people to book. (Berry; Kogan, 2010)

Any technology can be misused, and one of the dangers is that the methods fall into the hands of cyber predators. However, the tables can be turned on these antisocial elements by using text mining methods to understand and isolate cyber predators. The use of transcripts for the analysis of predation is still at the debate stage, but Berry; Kogan (2010) have done some research in this field. In one case a pseudo-victim posed as a teenager, and analysis of the transcripts of the conversations later helped in getting convictions for the predators. Thus transcripts of text matter in many spheres. Criminal justice and law enforcement could benefit from chat logs collected using a crawler, so text mining has a vital role in the administration of justice and the prevention of crime. (Berry; Kogan, 2010) The concept and design of text mining are still evolving and are being redesigned and developed.


The business and research communities are under pressure to decode the information they obtain in mounds of text documents. These documents hold relationships and pointers from which high-end information can be extracted, whether from a database or from other unstructured text, for use in decision making. Text is the most used medium and data type. This is true in all transactions, and although data mining from structured databases is used extensively, it must be remembered that text mining is needed to understand the greater amount of text matter that is not held in databases. Thus text mining helps in knowledge management, analysis, and decision making. Combined with data mining, 'text mining' provides a method of analyzing not only words and phrases but also whole strings from unstructured text.

We can say that text mining is a supplementary addition to data mining. It is the most effective way of digging information out of the internet because of the need to search documents with tags, and it can be used with any type of database. The mining application has business, commercial, and civil uses, and it also finds uses in other areas such as research, including medical research, and even law enforcement. Text mining thus is a modern tool to understand the interconnection with the text matter that is found to…[continue]

Cite This Term Paper:

"Mining The Concept Of Text" (2011, May 04) Retrieved October 23, 2016, from
