From Book: Text Data Management and Analysis by ChengXiang Zhai and Sean Massung
Question
Thank you
Ch-3
Exercise 3.1: In what way is NLP related to text mining?
Exercise 3.3: Given a collection of documents for a specific topic, how can we use maximum
likelihood estimation to create a topic unigram language model?
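A minimal Python sketch of the maximum likelihood idea: pool all documents on the topic, count each word, and divide by the total number of word tokens, so p(w | topic) = c(w) / (total word count). The helper name `topic_unigram_lm` and the naive lowercase/whitespace tokenization are assumptions for illustration; in practice the estimate is usually smoothed so that unseen words do not get zero probability.

```python
from collections import Counter


def topic_unigram_lm(documents):
    """Estimate p(w | topic) by maximum likelihood: count(w) / total tokens.

    Assumes naive lowercase + whitespace tokenization (illustrative only).
    """
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    total = sum(counts.values())
    # Each probability is the word's relative frequency in the pooled collection.
    return {word: c / total for word, c in counts.items()}


if __name__ == "__main__":
    docs = ["text mining finds patterns", "text retrieval finds documents"]
    model = topic_unigram_lm(docs)
    for word, p in sorted(model.items(), key=lambda kv: -kv[1]):
        print(f"{word}: {p:.3f}")
```

Pooling the documents (rather than averaging per-document models) is what makes this the maximum likelihood estimate for the whole collection treated as one sample.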
Exercise 3.7: A unigram language model as defined in this chapter can take a sequence of words as
input and output its probability. Explain how this calculation has strong independence
assumptions.
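The independence assumption is visible directly in how the probability is computed. In the sketch below (the function name and the toy probabilities are hypothetical), a sequence's probability is simply the product of the per-word probabilities, so scrambling the word order leaves the score unchanged up to floating-point rounding: the model ignores any dependence between neighboring words.

```python
import math


def unigram_sequence_prob(words, model):
    """Score a word sequence under a unigram language model.

    Every position is treated as an independent draw from the same
    distribution, so the joint probability is just the product of the
    individual word probabilities. Unseen words get probability 0 (no smoothing).
    """
    return math.prod(model.get(w, 0.0) for w in words)


if __name__ == "__main__":
    # Hypothetical toy model; probabilities are made up for illustration.
    model = {"text": 0.4, "mining": 0.3, "is": 0.2, "fun": 0.1}
    a = ["text", "mining", "is", "fun"]
    b = ["fun", "is", "mining", "text"]  # same words, scrambled order
    print(unigram_sequence_prob(a, model), unigram_sequence_prob(b, model))
```

`math.prod` requires Python 3.8 or later.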
Exercise 3.9: An n-gram language model records sequences of n words. How does the number of
possible parameters change if we decided to use a 2-gram (bigram) language model
instead of a unigram language model? How about a 3-gram (trigram) model? Give your
answer in terms of V, the unigram vocabulary size.
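One way to reason about the growth: an n-gram model needs one probability for every possible sequence of n words from the vocabulary. Counting raw parameters (and, in parentheses, the free parameters left after each (n-1)-word history's distribution is constrained to sum to 1):

```latex
\begin{aligned}
\text{unigram: } & V \\
\text{bigram: } & V^2 \\
\text{trigram: } & V^3 \\
n\text{-gram: } & V^n \quad \text{(or } V^{n-1}(V-1) \text{ free parameters)}
\end{aligned}
```

For instance, with V = 10,000 a bigram model has 10^8 possible parameters and a trigram model 10^12.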
Ch-5
Exercise 5.3: Often, push and pull modes are combined in a single system. Give an example of such
an application.
Exercise 5.5: In a future chapter, we will discuss recommender systems. These are systems in
push mode that deliver information to users. What are some specific applications of recommender systems? Can you name some services available to you that fit into this access mode?
Exercise 5.7: Design a text information system used to explore musical artists. For example, you can
search for an artist’s name directly. The results are displayed as a graph, with edges
to similar artists (as measured by some similarity algorithm). Use TIS access mode
vocabulary to describe this system and any enhancements you could make to satisfy
different information needs.
Ch-6
Exercise 6.1: Here’s a query and document vector. What is the score for the given document using dot
product similarity?
d = {1, 0, 0, 0, 1, 4}, q = {2, 1, 0, 1, 1, 1}
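Dot-product similarity multiplies the two vectors component by component and sums the products, so only dimensions that are non-zero in both vectors contribute. A minimal Python sketch, assuming both vectors are aligned over the same six vocabulary dimensions in the order shown (the function name is illustrative):

```python
def dot_product_score(d, q):
    """Score document vector d against query vector q by dot product.

    Both vectors must index the same vocabulary terms in the same order;
    each dimension contributes d[i] * q[i] to the score.
    """
    if len(d) != len(q):
        raise ValueError("vectors must have the same dimensionality")
    return sum(di * qi for di, qi in zip(d, q))


if __name__ == "__main__":
    d = [1, 0, 0, 0, 1, 4]
    q = [2, 1, 0, 1, 1, 1]
    print(dot_product_score(d, q))
```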
Exercise 6.3: Let d be a document in a corpus. Suppose we add another copy of d to the collection. How
does this affect the IDF of all words in the corpus?
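To experiment with this question, the sketch below recomputes IDF after appending a copy of one document. It assumes the variant IDF(w) = log((M + 1) / df(w)), where M is the number of documents and df(w) the number of documents containing w; other IDF variants exist, and the helper name `idf` and the toy corpus are illustrative.

```python
import math


def idf(collection):
    """Compute IDF(w) = log((M + 1) / df(w)) for every word in a collection.

    M is the number of documents and df(w) the number of documents that
    contain w. This particular IDF variant is an assumption for illustration.
    """
    m = len(collection)
    df = {}
    for doc in collection:
        for w in set(doc.split()):  # count each word once per document
            df[w] = df.get(w, 0) + 1
    return {w: math.log((m + 1) / k) for w, k in df.items()}


if __name__ == "__main__":
    docs = ["cat sat", "dog ran", "cat ran"]
    before = idf(docs)
    after = idf(docs + [docs[0]])  # add a second copy of the document "cat sat"
    for w in sorted(before):
        print(f"{w}: {before[w]:.3f} -> {after[w]:.3f}")
```

In this toy run, words that occur in the duplicated document (cat, sat) get a lower IDF, because both M and their document frequency grow by one, while words not in it (dog, ran) get a slightly higher IDF, because only M grows.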
Exercise 6.6: If you perform stemming on words in V to create V′, then |V′| > |V|. True or false?
Why?
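Stemming is a function from words to stems, so it can merge several words into one stem but cannot turn one word into several. The sketch below uses a deliberately crude suffix-stripping rule (a hypothetical `toy_stem`, not the Porter stemmer or any library's stemmer) to show the vocabulary shrinking or staying the same, i.e. |V′| ≤ |V|.

```python
def toy_stem(word):
    """A deliberately crude suffix-stripping stemmer (NOT the Porter stemmer).

    Strips the first matching suffix, keeping at least three characters of stem.
    As a function, it may map distinct words to the same stem but never maps
    one word to more than one stem.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    vocab = {"connect", "connected", "connecting", "connects", "text", "texts"}
    stemmed_vocab = {toy_stem(w) for w in vocab}
    print(len(vocab), len(stemmed_vocab), sorted(stemmed_vocab))
```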
Ch-7
Exercise 7.1: How should you set the Rocchio parameters α, β, and γ depending on what type of
feedback you are using? That is, should the parameters be set differently if you are using
pseudo feedback compared to user-supplied relevance judgements? What about implicit
feedback through clickthrough data?
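For reference, the Rocchio update moves the query vector toward the centroid of the relevant documents and away from the centroid of the non-relevant ones: q_new = α·q + β·mean(D_r) - γ·mean(D_n). A minimal Python sketch with plain lists as term vectors; the function name and the default weights are illustrative assumptions, not values taken from the book.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio feedback over term vectors that share one vocabulary order.

    q_new = alpha * query + beta * centroid(relevant) - gamma * centroid(nonrelevant)
    Default weights are illustrative assumptions only.
    """
    def centroid(vectors):
        if not vectors:
            return [0.0] * len(query)  # no examples: contributes nothing
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    pos = centroid(relevant)
    neg = centroid(nonrelevant)
    new_q = [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]
    # Negative term weights are commonly truncated to zero.
    return [max(0.0, w) for w in new_q]


if __name__ == "__main__":
    q = [1.0, 0.0, 0.0]
    judged_relevant = [[0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
    judged_nonrelevant = [[0.0, 0.0, 4.0]]
    print(rocchio(q, judged_relevant, judged_nonrelevant))
    # Pseudo feedback: no known negatives, so gamma has nothing to act on.
    print(rocchio(q, judged_relevant, [], beta=0.5, gamma=0.0))
```

With pseudo feedback there are no known non-relevant documents and the "relevant" set is only the top-ranked results, so one common heuristic is to set γ to 0 and keep β relatively small; explicit relevance judgments justify a larger β and a nonzero γ, and noisier clickthrough data would typically fall somewhere in between.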
Exercise 7.9: Design a heuristic to automatically determine the best _ for mixture model feedback
on a query-by-query basis. You could look at the query itself, the number of matching
documents, or the distribution of ranking scores in the original results. Test your heuristic
by doing experiments.
Explanation / Answer
In what way is NLP related to text mining?
Natural language processing (NLP) deals with the automatic processing and analysis of unstructured textual information. One direction of NLP research relies on statistical techniques, typically based on the words found in texts [7]. Another approach uses rule-based techniques that leverage knowledge resources such as ontologies, taxonomies, and linguistic rule bases. Statistical language processing systems require collections of training material that exemplify the desirable (and/or undesirable) relationships and dependencies, and any later modification of such a system requires some degree of retraining. Instead of training material, rule-based techniques require knowledge in the form of online dictionaries and established linguistic theories, and they can leverage existing classification systems or taxonomic frameworks. NLP applications may use either or both of these techniques; which one to use often depends on the availability of training material, external resources, and the actual text analysis tasks the application requires. Text mining builds directly on this: NLP turns raw text into more structured representations (for example, tokens, part-of-speech tags, named entities, or parse trees), and text mining algorithms then analyze those representations to discover patterns and knowledge, so deeper NLP generally enables more sophisticated text mining.
Often, push and pull modes are combined in a single system. Give an example of such an application.
Definitions are important for setting the context here. At a minimum, a supply chain is a network consisting of a business, its suppliers, and its customers. Supply chain management by a particular business is therefore the management of human capital, processes, materials, and information between that business, its suppliers, and its customers, with the aim of providing maximum customer service at maximum margin to that business. Importantly, while all participants in a supply chain can benefit from improvements to how the chain functions as a whole, they rarely benefit equally.
Customer or demand push is usually defined as a business response made in anticipation of customer demand, and customer or demand pull as a response to actual customer demand. From a whole-supply-chain viewpoint, however, deciding whether a particular supply chain is push or pull is often difficult; the answer generally depends on what is taken to constitute the supply chain and where particular participants sit in it. For example, the manufacture of Toyota automobiles is heralded as a leading example of a demand-driven (pull) supply chain, yet the mining of iron ore, or the operation of the blast furnaces that process that ore for the eventual manufacture of automobiles, is not demand-driven. The same chain therefore combines both modes.
At some point in most supply chains, in their widest sense, demand push meets demand pull, and inventory accumulates at that point. This point is referred to as the push-pull interface, or the supply chain decoupling point.