Extracting Information Sentiment From Blogs Research Proposal
- Length: 20 pages
- Sources: 50
- Subject: Education - Computers
- Type: Research Proposal
- Paper: #44933379
Excerpt from Research Proposal:
4. Transparency, authenticity, and focus are good. Bland is bad. Many people are looking for someone who is in authority to share their ideas, experiences, or suggestions (Bielski, 2007, p. 9).
Moreover, just as content analysis of other written and symbolic forms has provided new insights that might have otherwise gone unnoticed, the analysis of blog content may reveal some unexpected findings concerning hot topics and significant social trends that are shaping the users of this information. For example, an intern on Facebook's data infrastructure engineering team recently generated an eerily accurate global map based on Facebook friendship links. According to the developer, "I was interested in seeing how geography and political borders affected where people lived relative to their friends. I wanted a visualization that would show which cities had a lot of friendships between them" (Butler, 2010, para. 3). While Butler had some vague ideas about the types of clusters that would populate the map, he was surprised by how accurately the results mirrored the population densities of the world, with some noticeable absences (Cuba, North Korea, large parts of Africa and South America, the western half of the United States, etc.).
Based on his content analysis of 10 million Facebook friendship links, Butler plotted each individual's location by latitude and longitude and drew a connecting line between each friendship pair, with more heavily linked pairs shown as brighter lines on the map in Figure 1 below.
Figure 1. Butler's Facebook friendship links map: dark areas on the map represent where Facebook use is less prevalent
The map's striking similarity to geopolitical maps was also noted by Butler. According to Butler, "Not only were continents visible, certain international borders were apparent as well. What really struck me, though, was knowing that the lines didn't represent coasts or rivers or political borders, but real human relationships. Each line might represent a friendship made while travelling, a family member abroad, or an old college friend pulled away by the various forces of life" (2010, para. 4).
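The pair-weighting idea behind such a map can be illustrated with a minimal Python sketch. It is not Butler's actual code: the function names, the sample data, and the logarithmic brightness scale are all assumptions made for illustration, and the only element taken from the description above is that more heavily linked pairs are drawn brighter.

```python
import math
from collections import Counter


def pair_weights(friendships, city_of):
    """Count friendships between each unordered pair of distinct cities."""
    counts = Counter()
    for user_a, user_b in friendships:
        city_a, city_b = city_of[user_a], city_of[user_b]
        if city_a != city_b:
            # Sort the pair so (London, Paris) and (Paris, London) are merged.
            counts[tuple(sorted((city_a, city_b)))] += 1
    return counts


def brightness(counts):
    """Scale counts to the range 0..1 (log scale is an assumption)."""
    if not counts:
        return {}
    top = max(counts.values())
    return {pair: math.log1p(n) / math.log1p(top) for pair, n in counts.items()}


# Hypothetical sample data, not drawn from the Facebook data set.
city_of = {"ann": "London", "bob": "Paris", "cy": "London",
           "di": "Paris", "ed": "Tokyo"}
friendships = [("ann", "bob"), ("cy", "di"), ("ann", "ed"), ("ann", "cy")]

counts = pair_weights(friendships, city_of)
levels = brightness(counts)
```

In this sketch the most heavily linked pair receives a brightness of 1.0, and every other pair receives a smaller value that a plotting routine could use as line opacity.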
This analytical approach is also used by Finin and his associates for sentiment-identification purposes. According to these authorities, "Our approach uses the link structure of a blog graph to associate sentiments with the links connecting blogs. Such links are manifested as a URL that blogger A uses in his blog post to refer to blogger B's post. We call this sentiment link polarity, and the sign and magnitude of this value is based on the sentiment of text surrounding the link" (p. 78). Clearly, this type of online data can be used to reveal valuable new information in ways that were not possible in the past.
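The notion of link polarity can be made concrete with a minimal Python sketch. This is not the method published by Finin and his associates: the tiny word lists, the `link_polarity` name, and the fixed window of words around the link are all assumptions chosen for illustration. The sketch only shows how a signed score could be derived from the text surrounding a link.

```python
import re

# Hypothetical toy lexicons; a real study would use a much larger resource.
POSITIVE = {"great", "insightful", "agree", "excellent", "love"}
NEGATIVE = {"wrong", "misleading", "disagree", "terrible", "hate"}


def link_polarity(post_text, link_url, window=5):
    """Return a signed score from the words within `window` tokens of the link.

    A positive result suggests blogger A refers to blogger B approvingly,
    a negative result suggests disapproval, and 0 means no evidence.
    """
    tokens = post_text.split()
    score = 0
    for pos, token in enumerate(tokens):
        if link_url not in token:
            continue
        start = max(0, pos - window)
        nearby = tokens[start:pos] + tokens[pos + 1:pos + 1 + window]
        for raw in nearby:
            word = re.sub(r"[^a-z]", "", raw.lower())
            if word in POSITIVE:
                score += 1
            elif word in NEGATIVE:
                score -= 1
    return score


url = "http://b.example/post"  # placeholder address for blogger B's post
praise = link_polarity("A great and insightful analysis: " + url + " is worth reading", url)
attack = link_polarity("This is wrong and misleading " + url + " sadly", url)
```

Here the sign of the result gives the direction of the sentiment and its absolute value gives a rough magnitude.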
Such graphic representations are just some of the attributes of written communication that content analysis can reveal. Blogs (and this term can be expanded to include the idle chit-chat, back-and-forth, thoughts, ramblings, viewpoints and other posts shared on Facebook and other social networking fora every day) represent an incredibly accessible way to reach other people, and people who know those people, and so forth in an ever-widening network of social interaction. This accessibility may be fundamentally more significant in the long term than other important innovations in communication such as the telephone. In this regard, a growing number of observers cite the increasing importance of the Internet in the business world and suggest that blogging has become the platform of choice for consumers and their favorite companies (Pikas, 2005). Bielski emphasizes, however, that not all bloggers are created equal, at least with respect to their online posts: "Certainly, there is hype surrounding Web 2.0 with its dual message of the internet as application platform and internet as the ultimate participatory forum. And, blogging is viewed as a staple of this new internet" (2007, p. 8).
Identifying recurring themes and emerging trends in this type of dynamic environment is a challenging enterprise to be sure. As Bielski points out, "Yet out of the glare, the reality of user-generated content is a mixed bag. The writing can be freeform, to put it politely. Many blogs look horrible," she notes and adds that many are "boring, or 'safe' might be better adjectives" (2007, p. 8). Furthermore, this "mixed bag" of blog content makes identifying posts that may communicate certain sentiments even more challenging. According to Bielski, "Corporate creators don't make these blogs easy to subscribe to, search through, or otherwise interact with" (2007, p. 8).
Fortunately, Google provides a series of URL templates that can be "invoked via command M-x emacspeak-url-template-fetch normally bound to control e u. This command prompts for the name of the template, and completion is available via Emacs' minibuffer completion" (Google Blog Search, 2010, para. 2). The steps involved in conducting this analysis for each URL template are as follows:
A. Prompt for the relevant information.
B. Fetch the resulting URL using an appropriate fetcher.
C. Set up the resulting resource with appropriate customizations.
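The three steps above can be sketched in Python as a small prompt, fetch, and set-up pipeline. This is not the Emacs implementation quoted above: the template address, the parameter names, and the `run_template` helper are hypothetical, and the prompting, fetching, and set-up behaviors are passed in as functions so that the sketch stays self-contained.

```python
from urllib.parse import quote_plus

# Hypothetical template; the placeholders in braces are filled from prompts.
BLOG_SEARCH_TEMPLATE = "https://blogsearch.example/search?q={query}&lang={lang}"


def fill_template(template, answers):
    """Insert URL-encoded answers into the template's placeholders."""
    encoded = {name: quote_plus(value) for name, value in answers.items()}
    return template.format(**encoded)


def run_template(template, prompts, ask, fetch, setup):
    """Step A: prompt. Step B: fetch the URL. Step C: customize the result."""
    answers = {name: ask(question) for name, question in prompts.items()}
    url = fill_template(template, answers)
    resource = fetch(url)
    return setup(resource)


# Stand-in behaviors so the example runs without a network connection.
canned_answers = {"Search terms: ": "love pasta", "Language code: ": "en"}
result = run_template(
    BLOG_SEARCH_TEMPLATE,
    prompts={"query": "Search terms: ", "lang": "Language code: "},
    ask=canned_answers.get,                     # replace with input() for real prompting
    fetch=lambda url: {"url": url, "body": ""},  # replace with a real HTTP fetcher
    setup=lambda resource: resource["url"],
)
```

Passing the fetcher and the set-up routine as arguments mirrors the description above, in which each template chooses "an appropriate fetcher" and its own customizations.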
Although "unblog-related," the template application used by Google Blog Search developers provides a useful example of how this procedure operates. According to Google Blog Search, "As an example, the URL templates that enable access to NPR media streams prompt for a program id and date, and automatically launch the realmedia player after fetching the resource" (2010, para. 3). As to their online application, the developers at Google Blog Search describe their efforts as follows: "Blog Search is Google search technology focused on blogs. Google is a strong believer in the self-publishing phenomenon represented by blogging, and we hope Blog Search will help our users to explore the blogging universe more effectively, and perhaps inspire many to join the revolution themselves" (2010, para. 2). As to the sentiment-related blog content that can be expected, the developers make it clear that their coverage spans the entire range of human experience:
Whether you're looking for Harry Potter reviews, political commentary, summer salad recipes or anything else, Blog Search enables you to find out what people are saying on any subject of your choice. Your results include all blogs, not just those published through Blogger; our blog index is continually updated, so you'll always get the most accurate and up-to-date results; and you can search not just for blogs written in English, but in French, Italian, German, Spanish, Korean, Brazilian Portuguese, Dutch, Russian, Japanese, Swedish, Malay, Polish, Thai, Indonesian, Tagalog, Turkish, Vietnamese and other languages as well (Google Blog Search, 2010, para. 3).
Some of the other key features that make Google Blog Search useful for the purposes of the proposed study include the following:
A. The links allow users to browse Google Blog Search results by topic. For example, clicking the Technology link shows top stories in the tech world.
B. The goal of Blog Search is to include every blog that publishes a site feed (either RSS or Atom). It is not restricted to Blogger blogs, or blogs from any other service.
C. Google Blog Search uses a set of algorithms to try to determine the most popular stories in the blogosphere. The application takes into account factors such as a blog's title and content, as well as its popularity throughout the rest of the blogging community. The results are displayed based on groups of posts that are closely related.
An informal blog search using Google's "search blogs" feature provides the following raw sentiment-related search results:
Blog Search Results of Sentiment-Related Terms by Number of Matches (as of December 20, 2010)
Clearly, a great deal of sentiment is being expressed in blogs, but without knowing the specific context in which these sentiment-related terms are used, it is impossible to discern their true meanings. For instance, some bloggers might enthuse that they "just love the pasta at Joe's Spaghetti House," while others might state they "love the president's economic policies." Likewise, some bloggers might "hate the weather" while others "hate the president's economic policies." Given the enormous response to the search term "like," it is clear that some bloggers might "like Ike" while others use the term as a comparison, as in, "Eating at this restaurant is like a trip to the dentist's office." The context of the sentiment-related posts will therefore require comparison to a corpus of various sentiments used in common practice to distinguish positive from negative sentiments (Ojala, 2009). For example, the words "like" or "love," when used immediately with or adjacent to descriptors such as "movie" or "restaurant," could be categorized as a review, while the same words used with descriptors such as personal nouns might indicate a romantic relationship. This corpus would be fine-tuned as the learning process proceeded through additional iterations of the supporting algorithms.
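The kind of context rule described above can be illustrated with a minimal Python sketch. It is a hand-written toy, not the corpus-based procedure the proposal envisions: the word lists, the category labels, the four-word window, and the rule that "like" after a linking verb signals a comparison are all assumptions made for illustration.

```python
import re

# Hypothetical toy word lists standing in for a real sentiment corpus.
SENTIMENT_WORDS = {"like": 1, "love": 1, "hate": -1}
REVIEW_NOUNS = {"movie", "restaurant", "pasta", "book"}
PERSON_WORDS = {"you", "him", "her"}
LINKING_VERBS = {"is", "was", "are", "were", "looks", "feels"}


def classify(sentence, window=4):
    """Return (category, polarity) for the first sentiment word found."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    for i, token in enumerate(tokens):
        if token not in SENTIMENT_WORDS:
            continue
        # "is like", "was like" and similar are treated as comparisons.
        if token == "like" and i > 0 and tokens[i - 1] in LINKING_VERBS:
            return ("comparison", 0)
        polarity = SENTIMENT_WORDS[token]
        following = tokens[i + 1:i + 1 + window]
        if any(word in REVIEW_NOUNS for word in following):
            return ("review", polarity)
        if any(word in PERSON_WORDS for word in following):
            return ("relationship", polarity)
        return ("opinion", polarity)
    return ("none", 0)


pasta = classify("I just love the pasta at Joe's Spaghetti House")
dentist = classify("Eating at this restaurant is like a trip to the dentist's office.")
policy = classify("I hate the president's economic policies")
```

With these toy lists, the pasta sentence is labeled a positive review, the dentist sentence a comparison with no polarity, and the policy sentence a negative opinion. A learned corpus would replace the fixed word lists with entries refined over successive passes.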
The results of a study by Manning (2009) that sought to identify effective ways to garner sentiment-related data from online reviews provides…