

Question

1. Explain why big data can be handled more cost-effectively by clouds than by using a supercomputer. Why do big data scientists require domain expertise? Also, explain the difference between supervised and unsupervised machine learning techniques.

2. Design a healthcare system consisting of body sensors and wearable devices that collect human physiological signals. The system should provide the following functions: real-time monitoring, disease prediction, and early detection of chronic diseases. Also design a monitoring and management system that can optimize the distribution of medical resources and facilitate data sharing for those resources.

Explanation / Answer

Answer 1) Big data is data generated at a very fast pace in the form of videos, images, documents, and so on. It has to be stored and managed somewhere, and it is difficult to manage and process at that scale, which is why frameworks such as Hadoop were introduced. Using a supercomputer to store and process it is considerably more expensive: the owner must buy and maintain everything themselves, including the processors, RAM, large storage capacity, and other input and output devices, all at supercomputer-grade prices. With a cloud, on the other hand, one can simply rent some space to store the data and pay only for the capacity actually used, scaling it up or down according to need, with no responsibility for the underlying hardware. This pay-as-you-go model saves money, so clouds are a more cost-effective means of storing and processing big data than a supercomputer.
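To give a rough sense of the pay-as-you-go argument, here is a hypothetical back-of-the-envelope comparison in Python. All of the sizes and prices below are made-up placeholder values for illustration only, not real quotes.

# Hypothetical, illustrative figures only -- real cloud and hardware prices vary widely.
data_tb = 50                      # amount of big data to store, in TB (assumed)
cloud_price_per_tb_month = 25.0   # assumed pay-as-you-go storage price, USD per TB per month
months = 12

# Cloud: pay only for the space actually used, for as long as it is used.
cloud_cost = data_tb * cloud_price_per_tb_month * months

# Dedicated machine: the whole system must be bought up front,
# regardless of how much of its capacity is actually used.
dedicated_hardware_cost = 100_000.0   # assumed purchase price (placeholder)

print(f"Cloud storage for one year:  ${cloud_cost:,.0f}")
print(f"Dedicated hardware upfront:  ${dedicated_hardware_cost:,.0f}")

Under these assumed numbers the rented cloud capacity costs a fraction of the upfront hardware, which is the point of the comparison above.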

As we know, big data is not easy to handle, and extracting useful information from it is not a simple task. Various tools such as Hive, MapReduce, and Pig, which are part of the Hadoop ecosystem, are used for this, and using them effectively takes considerable learning. A data scientist definitely needs domain expertise: without knowledge of the domain they are working in, they may not be able to mine the data properly. Understanding the data one is working with is essential in data mining; without proper knowledge of the data, there is a real risk of making wrong predictions.
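To make the MapReduce idea mentioned above concrete, here is a minimal in-memory sketch of the map/shuffle/reduce pattern in plain Python. It only illustrates the pattern (a word count) and is not Hadoop's actual API.

from collections import defaultdict

documents = ["big data on the cloud", "cloud storage for big data"]  # toy input

# Map phase: emit (word, 1) pairs from each document.
mapped = []
for doc in documents:
    for word in doc.split():
        mapped.append((word, 1))

# Shuffle phase: group emitted values by key (the word).
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce phase: sum the counts for each word.
word_counts = {word: sum(counts) for word, counts in grouped.items()}
print(word_counts)

In real Hadoop the map and reduce steps run in parallel across many machines, but the logical flow is the same as in this sketch.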

Difference between supervised and unsupervised machine learning

Supervised: training data sets (with known outcomes/labels) are available; the outcomes are known; comparatively easier to work with.

Unsupervised: no training data sets are provided; the outcomes are unknown; more complex to work with.
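As a small illustration of the difference, the following sketch (assuming scikit-learn is installed) trains a supervised classifier on labelled data and an unsupervised clustering algorithm on the same features without labels.

from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Toy data: X are the feature vectors, y are the known labels.
X, y = make_blobs(n_samples=200, centers=2, random_state=0)

# Supervised: the training set includes the known outcomes y,
# so the model learns a mapping from features to labels.
clf = LogisticRegression().fit(X, y)
print("Supervised predictions:", clf.predict(X[:5]))

# Unsupervised: only X is given; the algorithm has to discover
# structure (here, clusters) without any known outcomes.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Unsupervised cluster assignments:", km.labels_[:5])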