
Any languages can be used, but recommend R, Python, or Matlab Task: Infer a gene


Question

Any language may be used, but R, Python, or MATLAB is recommended.

Task: Infer a gene regulatory network from gene expression data and make a ROC plot.

Download the gene expression data from the links below. There are 500 samples, and each sample has expression values for 10 genes.

http://ksuweb.kennesaw.edu/~mkang9/teaching/CS4491_CS7990/Gene_expression_1.csv

http://ksuweb.kennesaw.edu/~mkang9/teaching/CS4491_CS7990/Adj_1.csv

Task: Gene regulatory network inference using a correlation-based approach.

- Dataset:

o Gene_expression_1.csv: contains gene expression data for task 1

o Adj_1.csv: contains adjacency matrix of ground truth for task 1

1. Load the gene expression data (Gene_expression_1.csv) and the ground truth adjacency matrix (Adj_1.csv).

2. Compute the pairwise correlation matrix and show it. E.g., see Fig. 1.

3. Given a range of thresholds (e.g., 0, 0.1, 0.2, 0.3, …, 0.9, 1), compare the adjacency matrix of the inferred network with the ground truth at each threshold.

4. Compute a confusion matrix for each threshold

5. Compute TPR and FPR for each threshold

6. Make a ROC plot. E.g., see Fig. 2
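
Steps 3 to 5 can be written out explicitly. The sketch below assumes that an edge between genes i and j is predicted when the absolute correlation |r_ij| is at or above the threshold t; the task does not say whether the sign of the correlation should be ignored, so that choice is an assumption. TP, FP, FN, and TN are the confusion matrix counts obtained by comparing the predicted adjacency matrix with the ground truth at threshold t.

```latex
\hat{A}_{ij}(t) =
\begin{cases}
1 & \text{if } |r_{ij}| \ge t \\
0 & \text{otherwise}
\end{cases}
\qquad
\mathrm{TPR}(t) = \frac{TP(t)}{TP(t) + FN(t)},
\qquad
\mathrm{FPR}(t) = \frac{FP(t)}{FP(t) + TN(t)}
```

Each threshold gives one (FPR, TPR) point, and the ROC plot in step 6 connects those points.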

Figure 1. Correlation matrix

Figure 2. ROC in Task 1

Explanation / Answer

R and Python are equally good choices if the goal is only data analysis, such as finding outliers in a dataset. If you also want to build a web service that lets other people upload datasets and find outliers, Python is the better fit: it is a general-purpose programming language, so modules already exist for building websites, interacting with a variety of databases, and managing users.

In general, if you want to build a tool or service that uses data analysis, Python is a better choice.

R builds in data analysis functionality by default, whereas Python relies on packages

Because Python is a general-purpose language, most of its data analysis functionality comes from packages such as NumPy and pandas. R, by contrast, was designed for statistics and data analysis, so many tools that Python gets from packages are built into base R.
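
As an illustration of the point about packages, the six steps in the question can be sketched in Python with NumPy alone. This is a minimal sketch under stated assumptions, not a reference solution: it uses synthetic data in place of Gene_expression_1.csv and Adj_1.csv, treats the network as undirected, scores each gene pair by absolute Pearson correlation, and predicts an edge when the score is at or above the threshold. The function names (`correlation_scores`, `confusion_counts`, `roc_points`, `plot_roc`) are made up for this example.

```python
import numpy as np


def correlation_scores(expr):
    """Step 2: absolute Pearson correlation between genes (columns)."""
    return np.abs(np.corrcoef(expr, rowvar=False))


def confusion_counts(scores, truth, threshold):
    """Steps 3-4: compare thresholded scores with the ground truth.

    Assumption: the network is undirected and self-loops are ignored,
    so only gene pairs (i, j) with i < j are counted.
    """
    upper = np.triu_indices_from(scores, k=1)
    actual = ((truth != 0) | (truth.T != 0))[upper]
    predicted = scores[upper] >= threshold
    tp = int(np.sum(predicted & actual))
    fp = int(np.sum(predicted & ~actual))
    fn = int(np.sum(~predicted & actual))
    tn = int(np.sum(~predicted & ~actual))
    return tp, fp, fn, tn


def roc_points(scores, truth, thresholds):
    """Step 5: (threshold, FPR, TPR) for each threshold."""
    points = []
    for t in thresholds:
        tp, fp, fn, tn = confusion_counts(scores, truth, t)
        tpr = tp / (tp + fn) if (tp + fn) else 0.0
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        points.append((float(t), fpr, tpr))
    return points


def plot_roc(points, path="roc.png"):
    """Step 6: save a ROC plot (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    ordered = sorted((fpr, tpr) for _, fpr, tpr in points)
    plt.plot([p[0] for p in ordered], [p[1] for p in ordered], marker="o")
    plt.plot([0, 1], [0, 1], linestyle="--")  # chance line
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.title("ROC")
    plt.savefig(path)
    plt.close()


# Step 1 stand-in: synthetic data shaped like the assignment's
# (500 samples x 10 genes) with two planted edges, 0-1 and 2-3.
rng = np.random.default_rng(0)
expr = rng.normal(size=(500, 10))
expr[:, 1] = expr[:, 0] + 0.5 * rng.normal(size=500)
expr[:, 3] = expr[:, 2] + 0.5 * rng.normal(size=500)
truth = np.zeros((10, 10), dtype=int)
truth[0, 1] = truth[1, 0] = 1
truth[2, 3] = truth[3, 2] = 1

scores = correlation_scores(expr)
print(np.round(scores, 2))  # show the correlation matrix

points = roc_points(scores, truth, np.linspace(0, 1, 11))
for t, fpr, tpr in points:
    print(f"threshold={t:.1f}  FPR={fpr:.3f}  TPR={tpr:.3f}")
```

To try this on the real files, `expr` and `truth` could instead be loaded with `np.loadtxt(path, delimiter=",")`, assuming the CSVs contain only numbers with no header row. Calling `plot_roc(points)` then writes the ROC curve to an image file, provided matplotlib is installed.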
