
Question

1.Which of the following is a distributed indexing system based on Apache Lucene?

a.ElasticSearch

b.Katta

c.Solr

d.All of the above

Answer D – All of the above.

2.Oozie workflows are defined as

a.A Java program

b.A MapReduce job

c.A Directed Acyclic Graph

d.A shell script

Answer C – Directed Acyclic Graph

3.Which component provides tools to do bulk imports and exports of data stored in HDFS?

a.Hue

b.Oozie

c.Sqoop

d.Impala

Answer C - Sqoop

4.MapReduce jobs can only be written in Java.

a.True

b.False

Answer - False

5.Which of the following Hadoop distributions is completely reimplemented in C++?

a.Cloudera

b.Hortonworks

c.MapR

d.Wandisco

Answer - MapR

6.Which NoSQL implementation uses a marking called Visibility to control access to data?

a.Hive

b.HBase

c.Accumulo

d.Cassandra

Answer C - Accumulo

7.Which NoSQL implementation would be appropriate to use to store unstructured text documents for searching?

a.HBase

b.MongoDB

c.Neo4J

d.Hive

Answer B - MongoDB

8.Apache Drill is a tool that allows SQL queries over data stored in HDFS, HBase, Hive, and even local files, using MapReduce as a query engine.

a.True

b.False

Answer - False

Explanation / Answer

1.

Elasticsearch, Katta, and Solr are all distributed indexing systems built on Apache Lucene.

(A) Elasticsearch is written in Java and was originally released as open source under the Apache License. It exposes Lucene's indexing and search features through a JSON REST API and a Java API.

(B) Katta is an open-source distributed indexing system that serves Lucene indexes across a cluster. It is a separate project and not part of Apache Lucene itself.

(C) Solr is an open-source enterprise search platform from the Apache Software Foundation. It is also written in Java and is built on top of Lucene.

Hence the correct option is (D)
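What these systems distribute is Lucene's inverted index, which maps each term to the documents that contain it. The sketch below is a toy illustration under stated assumptions and does not reflect how Lucene, Solr, Elasticsearch, or Katta are actually implemented. Documents are routed to shards by a CRC32 hash of their ID. Text is tokenized by lowercasing and splitting on whitespace. A search queries every shard and merges the hits. The `ShardedIndex` class and its methods are hypothetical names.

```python
import zlib
from collections import defaultdict


class ShardedIndex:
    """Toy inverted index split across several shards (hypothetical, for illustration only)."""

    def __init__(self, num_shards: int) -> None:
        # One term -> {doc_id} map per shard, standing in for a per-node Lucene index.
        self.shards = [defaultdict(set) for _ in range(num_shards)]

    def _shard_for(self, doc_id: str) -> int:
        # Stable hash (CRC32) so the same document always routes to the same shard.
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def add(self, doc_id: str, text: str) -> None:
        # Index the document only on its own shard.
        shard = self.shards[self._shard_for(doc_id)]
        for term in text.lower().split():
            shard[term].add(doc_id)

    def search(self, term: str) -> list[str]:
        # Scatter the query to every shard, then gather and merge the hits.
        hits: set[str] = set()
        for shard in self.shards:
            hits |= shard.get(term.lower(), set())
        return sorted(hits)


index = ShardedIndex(num_shards=3)
index.add("doc1", "Hadoop stores data in HDFS")
index.add("doc2", "Lucene builds an inverted index")
index.add("doc3", "Solr and Elasticsearch build on Lucene")
print(index.search("lucene"))  # ['doc2', 'doc3']
```

Because search merges results from every shard, the hits are the same no matter which shard each document landed on.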

-------------------------------------------------------------------------------------------------------------------------------------------

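2. Oozie is a workflow scheduler system that manages Apache Hadoop jobs. Oozie workflow jobs are Directed Acyclic Graphs (DAGs) of actions, defined in an XML process definition language (hPDL).
Oozie is integrated with the rest of the Hadoop stack. It supports several types of Hadoop jobs, such as Java MapReduce, Pig, Hive, and Sqoop, as well as system-specific jobs such as Java programs and shell scripts. Those programs and scripts can be individual actions inside a workflow, but the workflow itself is the DAG.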

Hence the correct option is (C)
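Because a workflow is a DAG, its actions always have a valid execution order. A cycle would make the workflow impossible to schedule. The sketch below illustrates that property with Kahn's topological-sort algorithm. It is not Oozie code, since Oozie reads workflow definitions from XML. The node names in `workflow` are hypothetical.

```python
from collections import deque


def execution_order(edges: dict[str, list[str]]) -> list[str]:
    """Return a topological order of workflow nodes, or raise if the graph has a cycle (Kahn's algorithm)."""
    # Count incoming edges for every node, including nodes that only appear as targets.
    indegree = {node: 0 for node in edges}
    for targets in edges.values():
        for target in targets:
            indegree[target] = indegree.get(target, 0) + 1

    # Start with nodes that have no prerequisites.
    ready = deque(node for node, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        # Completing this node may unblock its successors.
        for target in edges.get(node, []):
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)

    # Any node never reached is stuck on a cycle.
    if len(order) != len(indegree):
        raise ValueError("workflow contains a cycle; it is not a DAG")
    return order


# Hypothetical workflow: a Sqoop import feeds two steps that could run in parallel, then the end node.
workflow = {
    "start": ["sqoop-import"],
    "sqoop-import": ["pig-clean", "hive-aggregate"],
    "pig-clean": ["end"],
    "hive-aggregate": ["end"],
    "end": [],
}
print(execution_order(workflow))  # ['start', 'sqoop-import', 'pig-clean', 'hive-aggregate', 'end']
```

Passing a graph such as `{"a": ["b"], "b": ["a"]}` raises `ValueError`. That is the situation the DAG requirement rules out.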

-------------------------------------------------------------------------------------------------------------------------------------------

3.

(A) Hue is an open-source web interface, licensed under Apache, that combines the Apache Hadoop components into a single user interface.

(B) Oozie is a workflow scheduler system that manages Apache Hadoop jobs.

(C) Sqoop is an Apache tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. It works in both directions, importing data into HDFS and exporting it back out.

(D) Impala is a SQL query engine that runs on Apache Hadoop and queries data stored in computer clusters running Apache Hadoop.

Hence the correct option is (C)
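Sqoop parallelizes a bulk import by splitting the source table on a column, usually the primary key, which can be chosen with `--split-by`. It then gives each mapper (`--num-mappers`) one range of values to read. The sketch below shows only the idea for an integer column whose minimum and maximum are already known. Sqoop's own splitter differs in detail, and the `orders` table and `id` column are hypothetical.

```python
def integer_splits(min_val: int, max_val: int, num_mappers: int) -> list[tuple[int, int]]:
    """Divide the inclusive range [min_val, max_val] into at most num_mappers contiguous ranges."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_val < min_val:
        return []  # empty range: nothing to split
    total = max_val - min_val + 1
    count = min(num_mappers, total)  # never hand a mapper an empty range
    size, remainder = divmod(total, count)
    splits = []
    lower = min_val
    for i in range(count):
        # The first `remainder` splits take one extra value so sizes differ by at most one.
        upper = lower + size - 1 + (1 if i < remainder else 0)
        splits.append((lower, upper))
        lower = upper + 1
    return splits


# Hypothetical table `orders` whose `id` column spans 1..10, imported with 4 mappers.
for lo, hi in integer_splits(1, 10, 4):
    print(f"SELECT * FROM orders WHERE id >= {lo} AND id <= {hi}")
```

Each generated query reads a disjoint slice of the table, so the mappers can run concurrently without reading any row twice.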

---------------------------------------------------------------------------------------------------------------------------------------------

4.

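Hadoop MapReduce is a software framework for writing applications that process vast amounts of data.
The framework itself is written in Java, but MapReduce applications can be written in other languages. Hadoop Streaming accepts any executable that reads standard input and writes standard output as the mapper or reducer, for example Python or shell scripts. Hadoop Pipes provides a C++ API.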

Hence the correct Answer is (B)
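Hadoop Streaming runs the mapper and the reducer as external programs. Each reads lines from standard input and writes tab-separated key/value lines to standard output, and the framework sorts mapper output by key before the reducer sees it. The word-count sketch below assumes that line format. It simulates the map, shuffle/sort, and reduce phases in a single Python process so it runs without a cluster. In a real job, the mapper and reducer would be separate scripts passed to the streaming jar's `-mapper` and `-reducer` options.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for each word, the key/value line format Streaming reads from stdout."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts for each word; input must already be sorted by key, as Streaming guarantees."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(text_lines):
    """Simulate map -> shuffle/sort -> reduce in one process; no Hadoop cluster involved."""
    return list(reducer(sorted(mapper(text_lines))))


if __name__ == "__main__":
    for output_line in run_locally(["the quick fox", "the lazy dog"]):
        print(output_line)
```

Running it prints one tab-separated line per distinct word, with `the` counted twice.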

----------------------------------------------------------------------------------------------------------------------------------------------

5.

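Apache Hadoop is developed in Java, and most distributions, such as Cloudera and Hortonworks, package that Java code base. MapR instead reimplemented its core storage layer in C++, replacing the Java-based HDFS with its own file system, MapR-FS.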

Hence the correct option is (C).
----------------------------------------------------------------------------------------------------------------------------------------------

6.

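(A) Apache Hive is a SQL data warehouse layer on Hadoop rather than a NoSQL store. It does not label individual data cells for access control.

(B) HBase has traditionally controlled access with ACLs. Later releases added a similar visibility-labels feature, but the "Visibility" marking described in the question is Accumulo's defining security feature.

(C) Accumulo attaches a column visibility marking to every key/value pair. The marking is a boolean expression of labels, such as admin&(audit|hr). A scan returns a cell only if the user's authorizations satisfy that expression. This gives fine-grained, cell-level access control, which matters for organizations with complex security policies.

(D) Cassandra uses Role-Based Access Control (RBAC) on keyspaces and tables rather than per-cell markings.

Hence the correct option is (C)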
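An Accumulo visibility expression combines labels with `&` (and) and `|` (or), using parentheses to group them. A reader may see a cell only if their authorizations satisfy the expression. The sketch below is a simplified Python re-implementation of that rule, not Accumulo's Java `ColumnVisibility` evaluation code. It makes three assumptions:

* Labels are unquoted.
* Mixing `&` and `|` at one level without parentheses is rejected, as Accumulo does.
* An empty expression is visible to everyone.

`can_see` is a hypothetical name.

```python
import re

# Labels may contain letters, digits, '_', '-', ':', '.', '/'; operators are '&', '|', '(', ')'.
TOKEN = re.compile(r"[A-Za-z0-9_\-:./]+|[&|()]")


def can_see(expression: str, authorizations: set[str]) -> bool:
    """Return True if a reader holding `authorizations` may see a cell labelled `expression`."""
    if expression == "":
        return True  # an empty visibility is readable by everyone
    tokens = TOKEN.findall(expression)
    if "".join(tokens) != expression:
        raise ValueError(f"invalid visibility expression: {expression!r}")
    pos = 0

    def parse_expr() -> bool:
        # expr := term ('&' term)* | term ('|' term)*
        nonlocal pos
        values = [parse_term()]
        operator = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if operator is not None and tokens[pos] != operator:
                raise ValueError("mixing '&' and '|' requires parentheses")
            operator = tokens[pos]
            pos += 1
            values.append(parse_term())
        return all(values) if operator == "&" else any(values)

    def parse_term() -> bool:
        # term := label | '(' expr ')'
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected token {token!r}")
        return token in authorizations  # a bare label is satisfied if the reader holds it

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected trailing input in {expression!r}")
    return result


print(can_see("admin&(audit|hr)", {"admin", "hr"}))  # True
print(can_see("admin&(audit|hr)", {"hr"}))           # False
```

An expression such as `a&b|c` raises `ValueError` because the grouping is ambiguous. Writing `(a&b)|c` or `a&(b|c)` makes the intent explicit.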

-----------------------------------------------------------------------------------------------------------------------------------------------

7.

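MongoDB's document data model stores unstructured and semi-structured data natively, so documents with different fields can sit in the same collection. When application requirements change, it does not require costly, time-consuming schema migrations. Its text indexes also support keyword search over string fields.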

Hence the correct option is (B)


-----------------------------------------------------------------------------------------------------------------------------------------------

8.

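Apache Drill allows SQL queries over data stored in a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS, and local files. However, Drill does not use MapReduce as its query engine. It executes queries with its own distributed execution engine running on Drillbit nodes. Because the statement names MapReduce as the engine, it is false.

Hence the correct option is False (B)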