Over 75% of a typical company’s information is in the form of unstructured text. In small groups, you will track a topic of interest throughout the semester and build up a database of relevant text. Through a series of blog posts, you will present the context and the specific analysis questions related to your topic. When we begin this section of the course, each group will carry out the analysis needed to answer these questions.

For those students interested in working on a text project for a business, Luna Metrics, a certified Google Analytics Partner, has provided a unique problem and relevant data for this part of the class. This group will crawl previous versions of client websites for leading text indicators of client profitability and revenue – stay tuned for more info.

Assignment Details.

1. Analyze a text analytics topic of your choice (seven different text databases are available). Your analysis must include, but is not limited to, the following components:

  • Parsing
  • Filtering
  • Topics (i.e. factor analysis for text)
  • Text Cluster (for exploratory projects) or Regression (for predictive projects)

2. Write a report that summarizes your analysis. Please send me a hard copy as well as an electronic copy (PDF). The report must meet the following criteria:

  • Cover page
  • Appendix
    • Tables, Graphs, etc.
      • Must be numbered (1, 2, 3, …) and labeled with a title
      • Do not contribute to the page requirement
      • Must include an overall diagram
      • For each node in your analysis (e.g., see the four components listed in #1)
      • Every table, graph, etc. must be referenced in the body
  • Body
    • Objective
      • In two to three sentences, state the objective of your analysis.
    • Summary
      • In two to five succinct paragraphs, summarize the main results of your analysis.
    • Data Source
      • In two to three sentences, describe the source and nature of the data set.
    • Data Analysis
      • In two to five content-rich paragraphs, describe each of your analysis steps in depth.
    • Extensions
      • In one to two succinct paragraphs, provide at least two extensions that would make your analysis more complete.
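
As a rough illustration of the first two analysis components listed in #1 (parsing and filtering), here is a minimal Python sketch. It does not represent the software used in the course. The stop-word list, the function names, and the `min_docs` threshold are all hypothetical choices made for this example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real project would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for"}


def parse(document):
    """Parsing: lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", document.lower())


def filter_terms(token_lists, min_docs=2):
    """Filtering: drop stop words and terms found in fewer than min_docs documents."""
    doc_freq = Counter()
    for tokens in token_lists:
        # Count each term once per document (document frequency).
        doc_freq.update(set(tokens) - STOP_WORDS)
    return sorted(term for term, n in doc_freq.items() if n >= min_docs)


def term_document_matrix(documents, min_docs=2):
    """Return (vocabulary, rows); each row holds one document's term counts."""
    token_lists = [parse(d) for d in documents]
    vocab = filter_terms(token_lists, min_docs)
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


if __name__ == "__main__":
    docs = [
        "Revenue grew as site traffic grew.",
        "Site traffic fell and revenue fell.",
        "The blog covers text analytics.",
    ]
    vocab, rows = term_document_matrix(docs)
    print(vocab)
    for row in rows:
        print(row)
```

The resulting term-by-document counts are the kind of input that the later components (topics, then clustering or regression) would typically start from.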