Question

As discussed in lecture, Information Theory is critical to understanding and developing communication systems. One particularly important application is the efficient encoding of information, for instance to minimize the total number of bits required to send an electronic message or save a file on a hard drive. The mapping of symbols from your original message (i.e., alphanumeric) to symbols in a target alphabet (e.g., binary) is called an encoding, and a variety of algorithms exist. You will learn one such encoding, defined by the Shannon-Fano algorithm, which is used in some zip file algorithms. Before doing so, some basic (Shannon) information theory concepts will need to be introduced.

information: represents the certainty about the probability of the outcome of an event. The event itself will have several possible outcomes that can be modeled by a random variable $X$. Each of the $n$ possible outcomes $x_1, \ldots, x_n \in X$ has a probability $P(X = x_i) = p_i$ of occurring.

information content: can be viewed as how much useful information is contained in $x_i \in X$. Shannon derived the measure $I(x_i) = \log_b(1/p_i)$, where $b$ corresponds to the basis units in which information is measured (e.g., $b = 2$ means information is measured in bits).

entropy: is essentially the expected amount of information from an event, and can be calculated as
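
A minimal sketch of information content and entropy in R, assuming the standard Shannon definitions: information content is -log_b(p_i), and entropy is the probability-weighted sum of information content over all outcomes. The function names and the probability vector are made up for illustration and do not come from the assignment.

```r
# Information content of an outcome with probability p, in base-b units
# (b = 2 gives bits). Hypothetical helper name.
info_content <- function(p, b = 2) -log(p, base = b)

# Entropy: the expected information content over all outcomes.
entropy <- function(p, b = 2) {
  stopifnot(all(p >= 0), abs(sum(p) - 1) < 1e-9)  # p must be a distribution
  p <- p[p > 0]  # zero-probability outcomes contribute nothing
  sum(p * info_content(p, b))
}

p <- c(0.5, 0.25, 0.25)  # assumed example distribution
ic <- info_content(p)    # 1, 2 and 2 bits
h  <- entropy(p)         # 1.5 bits
```

Rarer outcomes carry more information, so a good encoding gives them longer codewords and gives frequent outcomes shorter ones.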

Explanation / Answer

While I generate many (and often very creative) errors in R, there are three simple things that will most often go wrong for me. Those include:

Capitalization. R is case sensitive - a graph vertex named “Jack” is not the same as one named “jack”. The function rowSums won’t work if spelled as rowsums or RowSums.
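
A small base R sketch of the case-sensitivity problem; the matrix is made up for illustration.

```r
m <- matrix(1:6, nrow = 2)

# Correct spelling works.
sums <- rowSums(m)

# A wrong-case spelling is a different (nonexistent) name, so R reports
# that it cannot find the function. tryCatch keeps the script running.
res <- tryCatch(rowsums(m), error = function(e) conditionMessage(e))

# String comparison is case sensitive too.
same_name <- "Jack" == "jack"   # FALSE
```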

Object class. While many functions are willing to take anything you throw at them, some will still surprisingly require a character vector or a factor instead of a numeric vector, or a matrix instead of a data frame. Functions will also occasionally return results in unexpected formats.
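
A short base R sketch of two common class surprises; the values are made up for illustration. Converting a factor straight to numeric returns its internal level codes, and a data frame is not a matrix until it is converted.

```r
f <- factor(c("10", "20", "10"))

codes  <- as.numeric(f)                 # 1 2 1: level codes, not the labels
values <- as.numeric(as.character(f))   # 10 20 10: the labels as numbers

df <- data.frame(a = 1:2, b = 3:4)
df_is_matrix  <- is.matrix(df)              # FALSE
mat_is_matrix <- is.matrix(as.matrix(df))   # TRUE
```

When a function complains about its input, checking class() or str() on the object is usually the fastest way to find the mismatch.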

Package namespaces. Occasionally problems will arise when different packages contain functions with the same name. R may warn you about this by saying something like “The following object(s) are masked from ‘package:igraph’” as you load a package. One way to deal with this is to call functions from a package explicitly using ::. For instance, if function blah() is present in packages A and B, you can call A::blah and B::blah. In other cases the problem is more complicated, and you may have to load packages in a certain order, or not use them together at all. For example (and pertinent to this workshop), igraph and Statnet packages cause some problems when loaded at the same time. It is best to detach one before loading the other.
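
A minimal sketch of masking and the :: operator, using only base R so it runs without extra packages. The user-defined sd() below is a stand-in for a second package exporting a function with the same name; it is not part of the tutorial.

```r
# Defining our own sd() masks stats::sd for unqualified calls.
sd <- function(x) "my own sd"

masked   <- sd(c(1, 2, 3))          # calls the masking definition
explicit <- stats::sd(c(1, 2, 3))   # calls the stats version: 1

rm(sd)  # remove the masking definition again

# Detaching a package before loading a conflicting one, guarded so that
# it only runs when igraph is actually attached.
if ("package:igraph" %in% search()) detach("package:igraph")
```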

The description of an igraph object starts with up to four letters:

The two numbers that follow (7 5) refer to the number of nodes and edges in the graph. The description also lists node & edge attributes, for example:
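
A hedged sketch of how such a description is produced, assuming the igraph package is installed. The graph here is a made-up one with 7 nodes and 5 edges, not the tutorial's original. In igraph's printed header the letters stand for D or U (directed or undirected), N (named vertices), W (weighted edges), and B (bipartite).

```r
if (requireNamespace("igraph", quietly = TRUE)) {
  # Undirected graph on 7 vertices with 5 edges; vertex 7 is isolated.
  g <- igraph::make_graph(c(1, 2,  2, 3,  3, 4,  4, 5,  5, 6),
                          n = 7, directed = FALSE)

  # Adding a "name" vertex attribute makes the graph "named".
  g <- igraph::set_vertex_attr(g, "name", value = letters[1:7])

  n_nodes <- igraph::vcount(g)   # 7
  n_edges <- igraph::ecount(g)   # 5

  # The first printed line carries the letter codes and the two counts,
  # followed by the list of attributes.
  print(g)
}
```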

In the following sections of the tutorial, we will work primarily with two small example data sets. Both contain data about media organizations. One involves a network of hyperlinks and mentions among news sources. The second is a network of links between media venues and consumers. While the example data used here is small, many of the ideas behind the analyses and visualizations we will generate apply to medium and large-scale networks.

The first data set we are going to work with consists of two files, “Media-Example-NODES.csv” and “Media-Example-EDGES.csv”.

Examine the data:

Notice that there are more links than unique from-to combinations. That means we have cases in the data where there are multiple links between the same two nodes. We will collapse all links of the same type between the same two nodes by summing their weights, using aggregate() by “from”, “to”, & “type”. We don’t use simplify() here so as not to collapse different link types.
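
The collapsing step can be sketched as follows. The small links data frame is made up for illustration (the real data comes from the edges file), and the formula interface to aggregate() is one of several equivalent ways to call it.

```r
# Hypothetical edge list with repeated from-to pairs.
links <- data.frame(
  from   = c("s01", "s01", "s01", "s02"),
  to     = c("s02", "s02", "s02", "s03"),
  type   = c("hyperlink", "hyperlink", "mention", "hyperlink"),
  weight = c(10, 5, 2, 7),
  stringsAsFactors = FALSE
)

# More links than unique from-to combinations means duplicates exist.
n_links <- nrow(links)                              # 4
n_pairs <- nrow(unique(links[, c("from", "to")]))   # 2

# Sum the weights of links that share from, to, and type.
links_agg <- aggregate(weight ~ from + to + type, data = links, FUN = sum)
```

The two s01-to-s02 hyperlinks merge into one row with weight 15, while the mention between the same nodes stays separate because its type differs.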