The heuristics considered here are probabilistic machine learning approaches. One such approach is 'Alignment Conditional Random Fields', an undirected graphical model designed for scoring sequence alignments. (Bilenko; Mooney, 2005) There is demand for this type of software, and text mining is becoming important across a wide area of information analysis. One such field is the analysis of literature and research reviews.
Literary and Scientific Demands:
Demand for text mining is growing in literature review and library services. Extensive research has gone into creating algorithms for book-based text mining. Ananiadou et al. (2009) used text mining solutions in creating literature reviews. They created a text mining framework for systematic reviews, and what they called the 'service exemplar' served as a test bed for deriving the possible requirements for text mining tools for literature services. Thus the use of text mining can enhance literature reviews and also create a new stream of literary analysis. (Ananiadou, Sophia; et al., 2009)
In another study, of news and the internet, Montes et al. (1999) established that text mining techniques are effective in analyzing internet and newspaper news. They focused on the current topics of opinion that emerged from Spanish examples. They used a classical statistical model based on average calculus, distribution analysis, and standard deviation; the results showed society's interests and their changing nature, and the researchers could pinpoint the points of change.
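As a rough illustration of how an average and a standard deviation can expose a point of change, the sketch below flags the periods in which a topic's mention count departs sharply from its usual level. It is a minimal example under stated assumptions, not the model Montes et al. used: the function name `flag_unusual_periods`, the threshold of two standard deviations, and the weekly counts are all hypothetical.

```python
from statistics import mean, stdev

def flag_unusual_periods(counts, threshold=2.0):
    """Return the indices whose count lies more than `threshold`
    sample standard deviations away from the mean of the series."""
    mu = mean(counts)
    sigma = stdev(counts)
    if sigma == 0:
        # A perfectly flat series has no unusual periods.
        return []
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]

# Hypothetical weekly counts of news stories mentioning one topic.
weekly_mentions = [12, 14, 11, 13, 12, 15, 13, 48, 14, 12]
unusual_weeks = flag_unusual_periods(weekly_mentions)
print(unusual_weeks)
```

Here only the week with 48 mentions is flagged. A single global mean is the simplest choice; a study of changing interests over a long period would more plausibly compare each period against a moving window.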
Text mining has likewise been effective in medical research, and its success in a field as different as medicine shows how broadly applicable the method is. For example, Natarajan et al. (2006) compared the expression profiles for the same cell lines under the influence of epidermal growth factor (EGF), an important growth factor, and found a set of 72 genes that are significantly differentially expressed as a unique response to S1P. "Based on the result of mining full-text articles from 20 scientific journals in the field of cancer research published over a period of five years," Natarajan et al. (2006) said, they found gene-to-gene interaction networks for these seventy-two genes. Thus Natarajan et al. say that the "automated extraction of information from biological literature will prompt the progress of the discoveries in biological knowledge." (Natarajan, et al., 2006) The other uses are commercial and business oriented, and also include the analysis of behemoths like the internet.
Uses and Advantages:
Text files hold over eighty percent of any business's information, and that information is the most difficult to find or use; businesses therefore find the prospect of text mining attractive. Companies are increasingly using the new generation of text mining tools to discover relationships and to summarize information. One such tool is the 'ClearResearch' software from 'ClearForest Corporation', which uses pattern matching and shows the relations as a graph. Though not as accurate as the established data mining tools, text mining tools are basically effective. (Robb, 2004)
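The general idea of matching a pattern in text and recording the result as a graph can be sketched as follows. This is an illustration only and says nothing about how ClearResearch itself works: the single "X acquired Y" pattern, the function name `build_relation_graph`, and the company names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical pattern: "<Name> acquired <Name>", where each name is a
# single capitalized word. Real tools would use far richer patterns.
ACQUIRED = re.compile(r"\b([A-Z][a-z]+) acquired ([A-Z][a-z]+)\b")

def build_relation_graph(sentences):
    """Map each acquiring company to the set of companies it acquired."""
    graph = defaultdict(set)
    for sentence in sentences:
        for buyer, target in ACQUIRED.findall(sentence):
            graph[buyer].add(target)
    return graph

# Made-up sentences standing in for a collection of business documents.
documents = [
    "Last year Acme acquired Globex for an undisclosed sum.",
    "Analysts noted that Acme acquired Initech shortly afterwards.",
    "Globex acquired Hooli in a separate deal.",
]
graph = build_relation_graph(documents)
for buyer in sorted(graph):
    print(buyer, "->", sorted(graph[buyer]))
```

The adjacency mapping is the graph: each key is a node and each member of its set is an edge to another node, which a visualization layer could then draw.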
Other software with an established presence in the market includes WordStat, developed by Provalis Research, and SAS Text Miner from SAS. Both packages were found to have flaws and benefits, and both have features that researchers can use to find associations, but they are not helpful in extracting themes from unstructured data. (Davi; Haughton; Nasr; Shah; Skaletsky; Spack, 2005) Thus, as of now, the available software searches for specific terms or categorizes documents based on those terms. This is not satisfactory, because the same term may mean different things to different people; text mining based on this kind of term analysis is therefore not yet complete. Text mining can also be used to review a product that is being marketed, by analyzing the reviews obtained through surveys. Since these reviews are unorganized data, mining them helps identify facts about product features and public opinion of the product, find the polarity of opinions, and rank opinions, which would not otherwise be possible. (Kao; Poteet, 2007)
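Finding the polarity of opinions and ranking them can be sketched with a simple word-list approach. This is a minimal example under stated assumptions, not the method described by Kao and Poteet: the two word lists, the function names `polarity` and `rank_reviews`, and the sample reviews are hypothetical.

```python
import re

# Hypothetical opinion lexicons; a real system would use a much larger
# resource and handle negation, intensity and context.
POSITIVE = {"good", "great", "reliable", "excellent", "love"}
NEGATIVE = {"bad", "poor", "broken", "slow", "hate"}

def polarity(review):
    """Count positive words minus negative words in one review."""
    words = re.findall(r"[a-z']+", review.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives

def rank_reviews(reviews):
    """Order reviews from most favourable to least favourable."""
    return sorted(reviews, key=polarity, reverse=True)

# Made-up survey responses about a product.
reviews = [
    "The battery is poor and the screen is slow.",
    "Great camera, excellent battery, I love it.",
    "Good price but the charger arrived broken.",
]
for review in rank_reviews(reviews):
    print(polarity(review), review)
```

Because the score depends only on which terms appear, this sketch shares the weakness noted above: a word that one reviewer uses approvingly and another uses critically is counted the same way in both reviews.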
Though this is the general need, there are obstacles to the diffusion of text mining. One is that no conclusive research has shown any particular method to be largely successful. Another is that successful technologies have been kept under wraps for commercial reasons. Beyond that, acceptance of the method in commercial practice is hampered by distrust of the process and reluctance to use it to gather information. This contrasts with the CRM sector, where consumer relations are paramount: there, data mining techniques have grown well and consumer data is put to good use. (Sirmakessis, 2004)
In the same sector there is a need to parse unstructured text. Text mining is a supplementary addition to the process of data mining and can be used independently of the type of database. Client messages, content management, sales, and client demands that arrive in varying formats, as well as emails, are all potential text mining fields in 'Customer Relation Management' (CRM) databases, and CRM programs benefit largely from text mining. (Sirmakessis, 2004)
Some software packages are capable of analyzing both types of data, structured and unstructured, but the preference is to use traditional BI software for structured data and separate software for text mining. (Robb, 2004) The internet is a vast gold mine for data miners and also for text miners, because although there are other forms of data exchange on the internet, it is mostly the text that is important. The internet has its downsides too, with crimes being committed on it, and text analysis can be used to bring such crime down. (Berry; Kogan, 2010)
The growth of the internet has helped communication among the younger generation and among people with shared interests. Cyberbullying and internet predation are crimes that emerged after the internet became accessible to the masses. Unorganized text, both in conventional internet exchanges like chat and mail and in text messages on cell phones, is therefore the instrument of these criminals. The same sets of tools are likewise useful to those engaged in bringing these people to book. (Berry; Kogan, 2010)
Any technology can be misused, and one of the dangers is these methods falling into the hands of cyber predators. However, the tables can be turned on these antisocial individuals by using text mining methods to understand and isolate cyber predators. The use of transcripts for the analysis of predation is still at the debate stage, but some research has been done in this field (Berry; Kogan, 2010): a pseudo-victim posed as a teenager, and later analysis of the conversation transcripts helped in getting convictions for the predators. Thus text transcripts matter in many spheres: criminal justice and law enforcement could benefit from chat logs collected using a crawler, and text mining thus has a vital role in the administration of justice and the prevention of crime. (Berry; Kogan, 2010) The concept and design of text mining is evolving and is being redesigned and developed.
The business and research communities are under pressure to decode the information they hold in mounds of text documents, which contain relationships and pointers from which high-end information can be extracted, whether from a database or from other unstructured text, for decision making. Text is the most used medium and data type. This is true in all transactions, and although data mining from structured databases is used extensively, it must be remembered that text mining is needed to understand the greater amount of text that is not in databases. Thus text mining helps in knowledge management, analysis, and decision making. Combined with data mining, 'text mining' provides a method of analysis not only of words and phrases but also of whole strings from unstructured text.
We can say that text mining is a supplementary addition to data mining. It is the most effective way of digging information out of the internet because of the need to search documents with tags, and it can be used with any type of database. The mining application has business, commercial, and civil uses and also finds uses in other areas like research, including medical research, and even law enforcement. Text mining thus is a modern tool to understand the interconnection with the text matter that is found to…