Extracting Information Sentiment From Blogs Research Proposal

9). Moreover, just as content analysis of other written and symbolic forms has provided new insights that might otherwise have gone unnoticed, the analysis of blog content may reveal some unexpected findings concerning hot topics and significant social trends that are shaping the users of this information. For example, an intern on the data infrastructure engineering team at Facebook recently generated an eerily accurate global map based on Facebook friendship links. According to the developer, "I was interested in seeing how geography and political borders affected where people lived relative to their friends. I wanted a visualization that would show which cities had a lot of friendships between them" (Butler, 2010, para. 3). While Butler had some vague ideas about the types of clusters that would populate the map, he was surprised by how accurately the results mirrored the population densities of the world, with some noticeable absences (Cuba, North Korea, large parts of Africa and South America, the western half of the United States, etc.).

Based on his content analysis of 10 million Facebook friendship links, Butler plotted each individual's location by latitude and longitude and drew a connecting line between each friendship pair, with higher numbers of paired links shown as brighter lines in the map in Figure 1 below.

Figure 1. Butler's Facebook friendship links map: dark areas on the map represent where Facebook use is less prevalent

The map's striking similarity to geopolitical maps was also noted by Butler. According to Butler, "Not only were continents visible, certain international borders were apparent as well. What really struck me, though, was knowing that the lines didn't represent coasts or rivers or political borders, but real human relationships. Each line might represent a friendship made while travelling, a family member abroad, or an old college friend pulled away by the various forces of life" (2010, para. 4).
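The aggregation behind a map of this kind can be sketched in a few lines. The following Python is only an illustration under stated assumptions, not Butler's actual code: the function names, the sample coordinates, and the logarithmic brightness scaling are all hypothetical choices made here to show how repeated links between the same two locations could translate into brighter lines.

```python
from collections import Counter
import math


def pair_weights(friend_links):
    """Count friendship links per unordered pair of (lat, lon) locations."""
    counts = Counter()
    for a, b in friend_links:
        if a == b:
            continue  # friends in the same location produce no line
        # Sort so that (a, b) and (b, a) are counted as the same pair.
        counts[tuple(sorted((a, b)))] += 1
    return counts


def brightness(counts):
    """Scale counts to a 0..1 brightness value.

    Log scaling is an assumption made for this sketch so that a few
    very busy pairs do not wash out all the others.
    """
    if not counts:
        return {}
    top = math.log1p(max(counts.values()))
    return {pair: math.log1p(n) / top for pair, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical sample locations (latitude, longitude).
    new_york, london, tokyo = (40.7, -74.0), (51.5, -0.1), (35.7, 139.7)
    links = [(new_york, london)] * 3 + [(london, new_york), (tokyo, new_york)]
    for pair, level in brightness(pair_weights(links)).items():
        print(pair, round(level, 2))
```

Each resulting pair could then be drawn as a line whose intensity follows the brightness value, which is one plausible way the brighter and darker regions in Figure 1 could emerge.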

This analytical approach is also used by Finin and his associates for sentiment-identification purposes. According to these authorities, "Our approach uses the link structure of a blog graph to associate sentiments with the links connecting blogs. Such links are manifested as a URL that blogger A uses in his blog post to refer to blogger B's post. We call this sentiment link polarity, and the sign and magnitude of this value is based on the sentiment of text surrounding the link" (p. 78). Clearly, this type of online data can be used to reveal some valuable new information in ways that have never been possible in the past.
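The idea of link polarity can be illustrated with a minimal sketch. The Python below is not Finin and his associates' implementation; it simply assumes a tiny hand-made word list (POSITIVE and NEGATIVE are hypothetical), a fixed window of words on either side of each URL, and a score computed as the balance of positive and negative words in that window.

```python
import re

# Tiny illustrative word lists (hypothetical); a real study would rely on
# a full sentiment lexicon or a trained classifier.
POSITIVE = {"great", "love", "excellent", "agree", "insightful"}
NEGATIVE = {"terrible", "hate", "wrong", "disagree", "awful"}

URL_RE = re.compile(r"https?://\S+")


def link_polarity(post_text, window=5):
    """Return {url: polarity} where polarity lies between -1 and 1.

    The sign shows whether the words near the link lean positive or
    negative; the magnitude shows how one-sided they are.
    """
    tokens = post_text.split()
    result = {}
    for i, tok in enumerate(tokens):
        if not URL_RE.fullmatch(tok):
            continue
        # Words within `window` positions before and after the link.
        nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        words = [re.sub(r"\W+", "", w).lower() for w in nearby]
        pos = sum(w in POSITIVE for w in words)
        neg = sum(w in NEGATIVE for w in words)
        total = pos + neg
        result[tok] = (pos - neg) / total if total else 0.0
    return result


if __name__ == "__main__":
    post = "I love this insightful piece http://b.example/post but the rest is fine"
    print(link_polarity(post))
```

In a blog graph, each score would be attached to the edge from the linking blog to the linked blog; how such scores are then combined across the graph is outside the scope of this sketch.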

Such graphic representations are just some of the insights into written communication that content analysis can provide. Blogs (and this term can be expanded to include the idle chit-chat, back-and-forth, thoughts, ramblings, viewpoints and other posts shared on Facebook and other social networking fora every day) represent an incredibly accessible way to reach other people, and people who know those people, and so forth in an ever-widening network of social interaction. This accessibility may be fundamentally more significant in the long term than other important innovations in communication such as the telephone. In this regard, a growing number of observers cite the increasing importance of the Internet in the business world and suggest that blogging has become the platform of choice for consumers and their favorite companies (Pikas, 2005). For instance, Bielski emphasizes that not all bloggers are created equal, at least with respect to their online posts. "Certainly, there is hype surrounding Web 2.0 with its dual message of the internet as application platform and internet as the ultimate participatory forum. And, blogging is viewed as a staple of this new internet" (2007, p. 8).

Identifying recurring themes and emerging trends in this type of dynamic environment is a challenging enterprise to be sure. As Bielski points out, "Yet out of the glare, the reality of user-generated content is a mixed bag. The writing can be freeform, to put it politely. Many blogs look horrible," she notes and adds that many are "boring, or 'safe' might be better adjectives" (2007, p. 8). Furthermore, this "mixed bag" of blog content makes identifying posts that may communicate certain sentiments even more challenging. According to Bielski, "Corporate creators don't make these blogs easy to subscribe to, search through, or otherwise interact with" (2007, p. 8).

Fortunately, Google provides a series of URL templates that can be "invoked via command M-x emacspeak-url-template-fetch normally bound to control e u. This command prompts for the name of the template, and completion is available via Emacs' minibuffer completion" (Google Blog Search, 2010, para. 2). The steps involved in conducting this analysis for each URL template are as follows:

A. Prompt for the relevant information.

B. Fetch the resulting URL using an appropriate fetcher.

...

Set up the resulting resource with appropriate customizations.
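
The template steps listed above (prompt, fetch, set up) can be sketched generically. The Python below is a hypothetical illustration rather than the Emacs command quoted earlier: the function names, the example.com URL, and the injected fetch and post_process callables are all assumptions, and the fetcher is passed in so that the sketch runs without network access.

```python
from string import Formatter
from urllib.parse import quote_plus


def fill_template(template, ask=input):
    """Prompt for each named field in a URL template and fill it in."""
    # Formatter().parse yields (literal, field_name, spec, conversion).
    names = [name for _, name, _, _ in Formatter().parse(template) if name]
    values = {}
    for name in dict.fromkeys(names):  # ask once per distinct field
        values[name] = quote_plus(ask(f"{name}: "))
    return template.format(**values)


def run_template(template, fetch, post_process=lambda res: res, ask=input):
    """Run the three steps: prompt, fetch, then customize the result."""
    url = fill_template(template, ask)   # prompt for the information
    resource = fetch(url)                # fetch with the chosen fetcher
    return post_process(resource)        # set up the resulting resource


if __name__ == "__main__":
    # Hypothetical template and stub fetcher, for illustration only.
    template = "https://example.com/search?q={query}&lang={lang}"
    answers = iter(["blog sentiment", "en"])
    print(run_template(
        template,
        fetch=lambda url: f"<fetched {url}>",
        ask=lambda prompt: next(answers),
    ))
```

Keeping the fetcher and the post-processing step as parameters mirrors the description above, in which each template chooses "an appropriate fetcher" and its own customizations.
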
Although "unblog-related," the template application used by Google Blog Search developers provides a useful example of how this procedure operates. According to Google Blog Search, "As an example, the URL templates that enable access to NPR media streams prompt for a program id and date, and automatically launch the realmedia player after fetching the resource" (2010, para. 3). As to their online application, the developers at Google Blog Search describe their efforts as follows: "Blog Search is Google search technology focused on blogs. Google is a strong believer in the self-publishing phenomenon represented by blogging, and we hope Blog Search will help our users to explore the blogging universe more effectively, and perhaps inspire many to join the revolution themselves" (2010, para. 2). As to the expected blog content that will be sentiment related, the developers make it clear that their index spans the entire range of human experience:

Whether you're looking for Harry Potter reviews, political commentary, summer salad recipes or anything else, Blog Search enables you to find out what people are saying on any subject of your choice. Your results include all blogs, not just those published through Blogger; our blog index is continually updated, so you'll always get the most accurate and up-to-date results; and you can search not just for blogs written in English, but in French, Italian, German, Spanish, Korean, Brazilian Portuguese, Dutch, Russian, Japanese, Swedish, Malay, Polish, Thai, Indonesian, Tagalog, Turkish, Vietnamese and other languages as well (Google Blog Search, 2010, para. 3).

Some of the other key features that make Google Blog Search useful for the purposes of the proposed study include the following:

A. The links allow users to browse Google Blog Search results by topic. For example, clicking the Technology link shows top stories in the tech world.

B. The goal of Blog Search is to include every blog that publishes a site feed (either RSS or Atom). It is not restricted to Blogger blogs, or blogs from any other service.

C. Google Blog Search uses a set of algorithms to try to determine the most popular stories in the blogosphere. The application takes into account factors such as a blog's title and content, as well as its popularity throughout the rest of the blogging community. The results are displayed based on groups of posts that are closely related.

An informal blog search using Google's "search blogs" feature provides the following raw sentiment-related search results:

Table 1

Blog Search Results of Sentiment-Related Terms (as of December 20, 2010)

Search Term     Number of Matches
Love            467,098,607
Hate            67,059,281
Awesome         79,550,156
Terrible        17,692,083
Angry           24,621,192
Like            821,870,100
Dislike         6,399,023
Enjoy           152,132,318

Clearly, a great deal of sentiment is being expressed in blogs, but without knowing the specific context in which these sentiment-related terms are used, it is impossible to discern their true meanings. For instance, some bloggers might enthuse that they "just love the pasta at Joe's Spaghetti House," while others might state they "love the president's economic policies." Likewise, other bloggers might "hate the weather" while others "hate the president's economic policies." Given the enormous number of matches for the search term "like," it is clear that some bloggers might "like Ike" while others use the term as a comparison, as in, "Eating at this restaurant is like a trip to the dentist's office." The context of the sentiment-related posts will therefore require comparison to a corpus of various sentiments used in common practice to distinguish positive from negative sentiments (Ojala, 2009). For example, the word "like" or "love" when used immediately with or adjacent to descriptors such as "movie" or "restaurant" could be categorized as a review, while these words used with descriptors such as personal nouns might indicate a romantic relationship. This corpus would be fine-tuned as the learning process proceeded through additional iterations of the supporting algorithms.
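The kind of context rule described above can be sketched as a small rule-based classifier. The Python below is an illustration under stated assumptions, not the proposed study's algorithm: the seed word lists (REVIEW_NOUNS, PERSONAL_WORDS, COPULAS), the four-word look-ahead window, and the category labels are all hypothetical stand-ins for what a fine-tuned corpus would eventually supply.

```python
import re

SENTIMENT_TERMS = {"love", "like", "hate", "enjoy", "dislike"}
# Hypothetical seed lists; a real corpus would be far larger.
REVIEW_NOUNS = {"movie", "restaurant", "pasta", "book", "film"}
PERSONAL_WORDS = {"you", "him", "her", "them"}
COPULAS = {"is", "was", "are", "were", "feels", "looks"}


def classify_usage(sentence, window=4):
    """Label each sentiment term in a sentence by its surrounding context."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    labels = []
    for i, tok in enumerate(tokens):
        if tok not in SENTIMENT_TERMS:
            continue
        following = tokens[i + 1:i + 1 + window]
        if tok == "like" and i > 0 and tokens[i - 1] in COPULAS:
            # "is like a trip to the dentist" is a comparison, not a liking.
            labels.append((tok, "comparison"))
        elif any(word in REVIEW_NOUNS for word in following):
            labels.append((tok, "review"))
        elif following and following[0] in PERSONAL_WORDS:
            labels.append((tok, "personal"))
        else:
            labels.append((tok, "unclassified"))
    return labels


if __name__ == "__main__":
    for text in (
        "I just love the pasta at Joe's Spaghetti House",
        "Eating at this restaurant is like a trip to the dentist's office",
        "I hate the weather",
    ):
        print(text, "->", classify_usage(text))
```

The "unclassified" label marks cases, such as "hate the weather," that the seed lists do not cover; these are the cases a growing corpus would be expected to absorb over time.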

The results of a study by Manning (2009) that sought to identify effective ways to garner sentiment-related data from online reviews provide some useful insights into the steps involved in the blog-searching process. According to Manning, "A large and growing body of user-generated reviews is available on the Internet, from product reviews at sites like Amazon.com to restaurant reviews at sites like Yelp.com. For users making a purchasing or dining decision, the opinions of others can be an important factor" (p. 1). The need for a method by which blog posts can be…

References

Bichard, S.L. (2006). Building blogs: a multi-dimensional analysis of the distribution of frames on the 2004 presidential candidate Web sites. Journalism and Mass Communication Quarterly, 83, 329-333.

Bielski, L. (2007). Got blogs? Not exactly a banking staple, a few pioneers have embraced this 'new media.' ABA Banking Journal, 99(5), 7-9.

Brynko, B. (2007, June). Northern Light's MI Analyst: New visions in marketing research.

Butler, P. (2010, December 10). Facebook friendship map visualizes connections around the world. Huffington Post. Retrieved from http://www.huffingtonpost.com/2010/12/14/facebook-friendship-map_n_796448.html.

Google Blog Search. (2010). Google. Retrieved from http://emacspeak.sourceforge.net/info/html/URL-Templates.html.

Department of Computer Science. Retrieved from http://nlp.stanford.edu/courses/cs224n/2009/fp/14.pdf.


Cite this Document:

"Extracting Information Sentiment From Blogs" (2010, December 21) Retrieved April 25, 2024, from
https://www.paperdue.com/essay/extracting-information-sentiment-from-11566
