MCQ on Big Data

Category: MCQ

Post Published On: October 3, 2021

Unit IV: Big Data MCQ

1. Which among the following is Hadoop’s cluster resource management system?
(A) GLOB
(B) YARN
(C) ARM
(D) SPARK
Answer: b
2. Which of the following processing frameworks interact with YARN directly?
(A) Pig
(B) Hive
(C) Crunch
(D) None of these
Answer: D
3. Which of the following processing frameworks run on MapReduce?
(A) Pig
(B) Hive
(C) Crunch
(D) All of the above
Answer: d
4. Which among the following are the core services of YARN?
(A) resource manager and node manager
(B) namenode and datanode
(C) data manager and resource manager
(D) data manager and application manager
Answer: a
5. Which constraints can be used to request a container on a specific node or rack, or anywhere on the cluster in YARN?
(A) Container constraints
(B) Space constraints
(C) Locality constraints
(D) Resource constraints
Answer: c
6. Which among the following can be used to model YARN applications?
(A) one application per user job
(B) one application per workflow or user session
(C) long-running application that is shared by different users
(D) All of the above
Answer: d
7. Which framework follows the one-application-per-user-job model?
(A) MapReduce
(B) Spark
(C) Apache Slider
(D) Samza
Answer: a
8. Which framework runs one application per user session?
(A) MapReduce
(B) Spark
(C) Apache Slider
(D) None of the above
Answer: b
9. Which among the following has a long-running application master for launching other applications on the cluster?
(A) MapReduce
(B) Spark
(C) Apache Slider
(D) None of the above
Answer: c
10. Which among the following can be used for stream processing?
(A) Spark
(B) Samza
(C) Storm
(D) All of the above
Answer: d
11. Which provides a simple programming model for developing distributed applications on YARN?
(A) Apache Slider
(B) Apache Twill
(C) Spark
(D) Tez
Answer: b
12. Which of the following statements are true of Apache Twill? S1: Twill supports real-time logging. S2: Twill allows the use of a Java Runnable interface.
(A) S1 only
(B) S2 only
(C) Both S1 and S2
(D) Neither S1 nor S2
Answer: c
13. Which daemons control the job execution process in MapReduce 1?
(A) jobtracker
(B) tasktrackers
(C) Both jobtracker and tasktrackers
(D) Name node and data node
Answer: c
14. Which among the following coordinates all the jobs run on the system by scheduling tasks in MapReduce 1?
(A) jobtracker
(B) tasktrackers
(C) data node
(D) Name node
Answer: a
15. Which of the following keeps a record of the overall progress of each job in MapReduce 1?
(A) jobtracker
(B) tasktrackers
(C) data node
(D) Name node
Answer: a
16. Which among the following run tasks and send progress reports in MapReduce 1?
(A) jobtracker
(B) tasktrackers
(C) data node
(D) Name node
Answer: b
17. Which of the following are tasks of the jobtracker in MapReduce 1?
(A) job scheduling
(B) task progress monitoring
(C) task bookkeeping
(D) All of the above
Answer: d
18. Which is responsible for storing job history in MapReduce 1?
(A) jobtracker
(B) tasktrackers
(C) data node
(D) Name node
Answer: a
19. In YARN, the responsibilities of the jobtracker are handled by
(A) Resource manager
(B) application master
(C) timeline server
(D) All of the above
Answer: d
20. In YARN, the responsibilities of the tasktracker are handled by
(A) Resource manager
(B) application master
(C) timeline server
(D) Node manager
Answer: d
21. Which stores the application history in YARN?
(A) Resource manager
(B) application master
(C) timeline server
(D) Node manager
Answer: c
22. Which among the following are the features of YARN?
(A) Scalability
(B) Multitenancy
(C) Availability
(D) All of the above
Answer: d
23. Which of the following schedulers is available in YARN?
(A) FIFO
(B) Shortest Job First
(C) Round Robin
(D) Shortest Remaining Time
Answer: a
24. Which schedulers are available in YARN?
(A) FIFO
(B) Capacity
(C) Fair Schedulers
(D) All of the above
Answer: d
25. Which of the following schedulers attempts to allocate resources so that all running applications get the same share of resources in YARN?
(A) FIFO
(B) Capacity
(C) Fair Schedulers
(D) Round Robin
Answer: c
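The Fair Scheduler's goal of giving every running application the same share can be illustrated with max-min fairness ("water-filling"): an application that needs less than an equal split keeps only what it needs, and the leftover is divided among the rest. This is a simplified sketch assuming a single resource type, equal weights and fixed demands; the real Fair Scheduler also handles queues, weights, minimum shares and preemption, and `max_min_fair_share` is a hypothetical helper, not YARN code.

```python
def max_min_fair_share(capacity: float, demands: list[float]) -> list[float]:
    """Split `capacity` among applications using max-min fairness (sketch)."""
    shares = [0.0] * len(demands)
    # Unsatisfied applications, smallest demand first
    unsatisfied = sorted(range(len(demands)), key=lambda i: demands[i])
    remaining = capacity
    while unsatisfied and remaining > 0:
        equal_share = remaining / len(unsatisfied)
        smallest = unsatisfied[0]
        if demands[smallest] <= equal_share:
            # Needs less than an equal split: give exactly its demand
            shares[smallest] = demands[smallest]
            remaining -= demands[smallest]
            unsatisfied.pop(0)
        else:
            # Everyone left wants at least an equal split: divide evenly
            for i in unsatisfied:
                shares[i] = equal_share
            remaining = 0
            unsatisfied = []
    return shares


print(max_min_fair_share(100.0, [10.0, 50.0, 60.0]))  # [10.0, 45.0, 45.0]
```

The small application is fully served, and the two larger ones split the remaining 90 units equally.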
26. Which among the following schedulers provides queue elasticity in YARN?
(A) FIFO
(B) Capacity
(C) Fair Schedulers
(D) Round Robin
Answer: b
27. Which among the following schedulers in YARN is used by default?
(A) FIFO
(B) Capacity
(C) Fair Schedulers
(D) Round Robin
Answer: b
28. In which XML file is the default scheduler configuration changed?
(A) yarn-site.xml
(B) config.xml
(C) scheduler.xml
(D) yarn-scheduler.xml
Answer: a
29. Which of the following queue scheduling policies are supported by the Fair Scheduler in YARN?
(A) FIFO
(B) Dominant Resource Fairness
(C) preemption
(D) All of the above
Answer: d
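Dominant Resource Fairness (DRF) compares applications by their dominant share: the largest fraction of any single cluster resource that one of their containers requests. The sketch below computes that share for an illustrative cluster of 100 CPUs and 10 TB (10,000 GB) of memory; the numbers and the `dominant_share` helper are assumptions for illustration, not YARN's implementation.

```python
def dominant_share(request: dict[str, float], cluster: dict[str, float]) -> tuple[str, float]:
    """Return (resource name, fraction of cluster) for the request's dominant resource."""
    fractions = {name: request[name] / cluster[name] for name in request}
    dominant = max(fractions, key=fractions.get)
    return dominant, fractions[dominant]


cluster = {"cpu": 100, "memory_gb": 10_000}
app_a = {"cpu": 2, "memory_gb": 300}  # 2% of CPU, 3% of memory
app_b = {"cpu": 6, "memory_gb": 100}  # 6% of CPU, 1% of memory

print(dominant_share(app_a, cluster))  # memory_gb is dominant for A (3%)
print(dominant_share(app_b, cluster))  # cpu is dominant for B (6%)
```

Because B's dominant share per container is twice A's, equalizing dominant shares under DRF means B receives about half as many containers as A.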
30. Which holds the list of rules for queue placement in Fair Scheduling?
(A) queuePlacementPolicy
(B) rulePlacementPolicy
(C) scheduleQueuePolicy
(D) schedulingPolicy
Answer: a
31. Which of the following settings enables preemption globally?
(A) yarn.scheduler.fair.preemption = true
(B) yarn.scheduler.preemption = true
(C) yarn.scheduler.global.preemption = true
(D) yarn.scheduler.enable.preemption = true
Answer: a
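Both the scheduler choice and the global preemption switch live in yarn-site.xml. As a sketch, the snippet below renders the two relevant properties: `yarn.resourcemanager.scheduler.class` pointing at the Fair Scheduler, and `yarn.scheduler.fair.preemption` set to `true`. The property names are real YARN settings; building the XML from Python is purely illustrative, and in practice the file is edited directly.

```python
import xml.etree.ElementTree as ET

# Real YARN property names; these values select the Fair Scheduler and enable preemption
PROPERTIES = {
    "yarn.resourcemanager.scheduler.class":
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler",
    "yarn.scheduler.fair.preemption": "true",
}


def build_yarn_site(properties: dict[str, str]) -> str:
    """Render a minimal yarn-site.xml <configuration> element as a string."""
    configuration = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(configuration, encoding="unicode")


print(build_yarn_site(PROPERTIES))
```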
32. Which among the following supports delay scheduling?
(A) FIFO
(B) Capacity Scheduler
(C) Fair Scheduler
(D) Both Capacity and Fair Scheduler
Answer: d
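Delay scheduling trades a short wait for data locality: when a node offers free capacity but holds none of an application's input, the scheduler declines the offer and waits for a better one, giving up on locality only after a bounded number of missed opportunities. The sketch below assumes a hypothetical `max_skips` threshold; in YARN the equivalent knobs are scheduler-specific settings (for example `yarn.scheduler.capacity.node-locality-delay` for the Capacity Scheduler), and `delay_schedule` is not YARN code.

```python
def delay_schedule(offers: list[str], preferred_nodes: set[str], max_skips: int) -> tuple[str, int]:
    """Pick a node for one container from a sequence of heartbeat offers.

    Returns (node, offers skipped). Accepts a data-local node as soon as one is
    offered; otherwise accepts a non-local node after `max_skips` declined offers.
    """
    skipped = 0
    for node in offers:
        if node in preferred_nodes:
            return node, skipped  # data-local placement
        if skipped >= max_skips:
            return node, skipped  # waited long enough: accept a non-local node
        skipped += 1  # decline this offer and wait for the next heartbeat
    raise RuntimeError("ran out of offers before placing the container")


print(delay_schedule(["n3", "n4", "n1"], {"n1"}, max_skips=3))  # ('n1', 2)
print(delay_schedule(["n3", "n4", "n1"], {"n1"}, max_skips=1))  # ('n4', 1)
```

A larger threshold improves locality at the cost of extra waiting before the container starts.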
33. How often does the node manager send heartbeat requests by default?
(A) once per millisecond
(B) once per second
(C) once per minute
(D) once per nanosecond
Answer: b
34. Which error detection code is used in HDFS?
(A) CRC-32
(B) CRC-32C
(C) SHA
(D) SHA-1
Answer: b
35. What is the storage overhead of CRC-32C checksums in HDFS?
(A) less than 1%
(B) less than 5%
(C) less than 10%
(D) less than 2.5%
Answer: a
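With HDFS's default settings, a 4-byte CRC-32C checksum is stored for every 512 bytes of data (`dfs.bytes-per-checksum`), which gives an overhead of 4/512, about 0.78%. The bitwise function below is a slow, illustrative sketch of CRC-32C (Castagnoli polynomial in its reflected form, 0x82F63B78); HDFS itself uses optimized implementations.

```python
CRC32C_POLY = 0x82F63B78  # Castagnoli polynomial, bit-reflected form


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C: easy to follow, far slower than table or hardware versions."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; XOR in the polynomial when the bit shifted out was 1
            crc = (crc >> 1) ^ CRC32C_POLY if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


BYTES_PER_CHECKSUM = 512  # HDFS default for dfs.bytes-per-checksum
CHECKSUM_SIZE = 4  # a CRC-32C value is 32 bits

overhead = CHECKSUM_SIZE / BYTES_PER_CHECKSUM
print(hex(crc32c(b"123456789")))  # 0xe3069283, the standard CRC-32C check value
print(f"{overhead:.2%}")  # 0.78%
```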
36. Heartbeat signals are sent from
(A) Jobtracker to Tasktracker
(B) Tasktracker to Jobtracker
(C) Jobtracker to namenode
(D) Tasktracker to namenode
Answer: b
37. Spark was initially started by ____ at UC Berkeley AMPLab in 2009.
(A) Mahek Zaharia
(B) Matei Zaharia
(C) Doug Cutting
(D) Stonebraker
Answer: (b)
38. ____ is a component on top of Spark Core.
(A) Spark Streaming
(B) Spark SQL
(C) RDDs
(D) All of the mentioned
Answer: (b)
39. Spark SQL provides a domain-specific language to manipulate _____ in Scala, Java, or Python.
(A) Spark Streaming
(B) Spark SQL
(C) RDDs
(D) All of the mentioned
Answer: (c)
40. ______ leverages Spark Core's fast scheduling capability to perform streaming analytics.
(A) MLlib
(B) Spark Streaming
(C) GraphX
(D) RDDs
Answer: (b)
41. ____ is a distributed machine learning framework on top of Spark.
(A) MLlib
(B) Spark Streaming
(C) GraphX
(D) RDDs
Answer: (a)
42. Users can easily run Spark on top of Amazon’s ____
(A) Infosphere
(B) EC2
(C) EMR
(D) None of the mentioned
Answer: (b)
43. Which of the following can be used to launch Spark jobs inside MapReduce?
(A) SIM
(B) SIMR
(C) SIR
(D) RIS
Answer: (b)
44. Which of the following languages is not supported by Spark?
(A) Java
(B) Pascal
(C) Scala
(D) Python
Answer: (b)
45. Spark is packaged with higher level libraries, including support for ___ queries.
(A) SQL
(B) C
(C) C++
(D) None of the mentioned
Answer: (a)
46. Spark includes a collection of over ____ operators for transforming data and familiar data frame APIs for manipulating semi-structured data.
(A) 50
(B) 60
(C) 70
(D) 80
Answer: (d)
47. Spark is engineered from the bottom-up for performance, running _____ faster than Hadoop by exploiting in memory computing and other optimizations.
(A) 100x
(B) 150x
(C) 200x
(D) None of the mentioned
Answer: (a)
48. Spark powers a stack of high-level tools including Spark SQL, MLlib for ___
(A) regression models
(B) statistics
(C) machine learning
(D) reproducible research
Answer: (c)
49. For a multiclass classification problem, which algorithm is not a solution?
(A) Naive Bayes
(B) Random Forests
(C) Logistic Regression
(D) Decision Trees
Answer: (d)
50. Which of the following is a tool of Machine Learning Library?
(A) Persistence
(B) Utilities like linear algebra, statistics
(C) Pipelines
(D) All of the above
Answer: (d)
51. Which of the following is true for Spark core?
(A) It is the kernel of Spark
(B) It enables users to run SQL / HQL queries on top of Spark.
(C) It is the scalable machine learning library which delivers efficiencies
(D) Improves the performance of iterative algorithm drastically.
Answer: (a)
52. Which of the following is true for Spark MLlib?
(A) Provides an execution platform for all the Spark applications
(B) It is the scalable machine learning library which delivers efficiencies
(C) Enables powerful interactive and data analytics applications across live streaming data
(D) All of the above
Answer: (b)
53. Which of the following is true for RDD?
(A) We can operate Spark RDDs in parallel with a low-level API
(B) RDDs are similar to the table in a relational database
(C) It allows processing of a large amount of structured data
(D) It has built-in optimization engine
Answer: (a)
54. RDD is fault-tolerant and immutable
(A) True
(B) False
Answer: (a)
55. The read operation on RDD is
(A) Fine-grained
(B) Coarse-grained
(C) Either fine-grained or coarse-grained
(D) Neither fine-grained nor coarse-grained
Answer: (c)
56. The write operation on RDD is
(A) Fine-grained
(B) Coarse-grained
(C) Either fine-grained or coarse-grained
(D) Neither fine-grained nor coarse-grained
Answer: (b)
57. Is it possible to mitigate stragglers in RDD?
(A) Yes
(B) No
Answer: (a)
58. Fault Tolerance in RDD is achieved using
(A) Immutable nature of RDD
(B) DAG (Directed Acyclic Graph)
(C) Lazy-evaluation
(D) None of the above
Answer: (b)
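The lineage idea behind this answer can be sketched without a cluster: each derived dataset remembers its parent and the function that produced it, so a lost partition is rebuilt by replaying that chain from the source data rather than restored from a replica. `ToyRDD` is a hypothetical pure-Python class written for illustration; it is not the PySpark API.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Hypothetical stand-in for an RDD that remembers its lineage."""

    def __init__(self, partitions: Optional[List[list]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[list], list]] = None):
        self.parent = parent  # upstream dataset in the lineage graph
        self.fn = fn  # derives one partition from the parent's partition
        # Source data is assumed durable (e.g. in HDFS); derived data starts empty
        self._cache: Dict[int, list] = dict(enumerate(partitions or []))

    def map(self, f: Callable) -> "ToyRDD":
        # Record the step in the lineage; nothing is computed yet
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def partition(self, i: int) -> list:
        if i not in self._cache:
            # Never computed or lost: rebuild it by replaying the lineage
            self._cache[i] = self.fn(self.parent.partition(i))
        return self._cache[i]

    def lose(self, i: int) -> None:
        """Simulate losing a derived partition, e.g. when an executor fails."""
        self._cache.pop(i, None)


source = ToyRDD(partitions=[[1, 2], [3, 4]])
doubled = source.map(lambda x: x * 2)
print(doubled.partition(1))  # [6, 8]
doubled.lose(1)
print(doubled.partition(1))  # [6, 8] again, recomputed from the source via lineage
```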
59. What is an action in Spark RDD?
(A) A way to send results from executors to the driver
(B) Takes RDD as input and produces one or more RDD as output.
(C) Creates one or many new RDDs
(D) All of the above
Answer: (a)
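The split between transformations and actions can be illustrated with a small pure-Python pipeline. `LazyDataset` is hypothetical and only mimics Spark's behaviour: `map` and `filter` return a new object with one more recorded step (echoing RDD immutability) and compute nothing, while the actions `collect` and `count` run the recorded plan and hand a result back to the caller, playing the role of the driver.

```python
class LazyDataset:
    """Hypothetical lazy pipeline mimicking Spark's transformation/action split."""

    def __init__(self, data, steps=()):
        self._data = data
        self._steps = steps  # recorded transformations: the execution plan

    # Transformations: return a new dataset, compute nothing
    def map(self, f):
        return LazyDataset(self._data, self._steps + (("map", f),))

    def filter(self, pred):
        return LazyDataset(self._data, self._steps + (("filter", pred),))

    # Actions: run the plan and return a result to the caller
    def collect(self):
        items = iter(self._data)
        for kind, f in self._steps:
            items = map(f, items) if kind == "map" else filter(f, items)
        return list(items)

    def count(self):
        return len(self.collect())


calls = []


def square(x):
    calls.append(x)  # record when the function actually runs
    return x * x


ds = LazyDataset(range(5)).map(square).filter(lambda y: y % 2 == 0)
print(calls)  # [] : no transformation has run yet
print(ds.collect())  # [0, 4, 16]
```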
60. The shortcomings of Hadoop MapReduce were overcome by Spark RDD through
(A) Lazy-evaluation
(B) DAG
(C) In-memory processing
(D) All of the above
Answer: (d)
61. In which language is Spark developed?
(A) Java
(B) Scala
(C) Python
(D) R
Answer: (b)
62. Which of the following is not a component of the Spark Ecosystem?
(A) Sqoop
(B) GraphX
(C) MLlib
(D) BlinkDB
Answer: (a)
63. Which of the following algorithms is not present in MLlib?
(A) Streaming Linear Regression
(B) Streaming KMeans
(C) Tanimoto distance
(D) None of the above
Answer: (c)
64. Which of the following is not a feature of Spark?
(A) Supports in-memory computation
(B) Fault-tolerance
(C) It is cost-efficient
(D) Compatible with other file storage systems
Answer: (c)
65. Which of the following is the reason Spark is faster than MapReduce?
(A) DAG execution engine and in-memory computation
(B) Support for different language APIs like Scala, Java, Python and R
(C) RDDs are immutable and fault-tolerant
(D) None of the above
Answer: (a)
66. Which of the following is true for RDD?
(A) RDD is a programming paradigm
(B) RDD in Apache Spark is an immutable collection of objects
(C) It is a database
(D) None of the above
Answer: (b)
67. Which of the following is a tool of the Machine Learning Library?
(A) Persistence
(B) Utilities like linear algebra, statistics
(C) Pipelines
(D) All of the above
Answer: (d)
68. ____ is an online NoSQL database included in Cloudera's Hadoop distribution.
(A) HCatalog
(B) HBase
(C) Impala
(D) Oozie
Answer: (b)
69. Which of the following is not a NoSQL database?
(A) SQL Server
(B) MongoDB
(C) Cassandra
(D) None of the mentioned
Answer: (a)
70. Which of the following is a NoSQL Database Type?
(A) SQL
(B) Document databases
(C) JSON
(D) All of the mentioned
Answer: (b)
71. Which of the following is a wide-column store?
(A) Cassandra
(B) Riak
(C) MongoDB
(D) Redis
Answer: (a)
72. “Sharding” a database across many server instances can be achieved with ____
(A) LAN
(B) SAN
(C) MAN
(D) All of the mentioned
Answer: (b)
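Whatever storage sits underneath, sharding itself means routing each record to one of several server instances by its key. A common scheme is hash partitioning; the sketch below uses a stable hash (MD5 from `hashlib`, chosen only because Python's built-in `hash()` for strings changes between runs) and hypothetical shard names.

```python
import hashlib


def shard_for(key: str, shards: list[str]) -> str:
    """Map a record key to a shard with a stable hash (hash partitioning)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Treat the first 8 bytes as an unsigned integer and reduce it modulo the shard count
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical server instances

for user_id in ["alice", "bob", "carol", "dave"]:
    print(user_id, "->", shard_for(user_id, SHARDS))
```

Plain modulo hashing remaps most keys when the number of shards changes, which is why production systems often use consistent hashing or range-based sharding instead.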
73. Most NoSQL databases support automatic ____ meaning that you get high availability and disaster recovery.
(A) processing
(B) scalability
(C) replication
(D) all of the mentioned
Answer: (c)
74. Which of the following are the simplest NoSQL databases?
(A) Key-value
(B) Wide-column
(C) Document
(D) All of the mentioned
Answer: (a)
75. ____ stores are used to store information about networks, such as social connections.
(A) Key-value
(B) Wide-column
(C) Document
(D) Graph
Answer: (d)
76. NoSQL databases are used mainly for handling large volumes of ____ data.
(A) unstructured
(B) structured
(C) semi-structured
(D) all of the mentioned
Answer: (a)
77. In which of the following languages is MongoDB written?
(A) Javascript
(B) C
(C) C++
(D) All of the mentioned
Answer: (d)
78. Point out the correct statement.
(A) MongoDB is classified as a NoSQL database
(B) MongoDB favors XML format more than JSON
(C) MongoDB is column-oriented database store
(D) All of the mentioned
Answer: (a)
79. Which of the following formats is supported by MongoDB?
(A) SQL
(B) XML
(C) BSON
(D) All of the mentioned
Answer: (c)
80. NoSQL was designed with security in mind, so developers or security teams don’t need to worry about implementing a security layer. Is it true or false?
(A) True
(B) False
Answer: (b)
81. Which of the following is not a reason NoSQL has become a popular solution for some organizations?
(A) Better scalability
(B) Improved ability to keep data consistent
(C) Faster access to data than relational database management systems (RDBMS)
(D) More easily allows for data to be held across multiple servers
Answer: (b)
82. NoSQL prohibits structured query language (SQL). Is it true or false?
(A) True
(B) False
Answer: (b)
83. When is it best to use a NoSQL database?
(A) When providing confidentiality, integrity, and availability is crucial
(B) When the data is predictable
(C) When the retrieval of large quantities of data is needed
(D) When the retrieval speed of data is not critical
Answer: (c)
84. Which of the following companies developed the NoSQL database Apache Cassandra?
(A) LinkedIn
(B) Twitter
(C) MySpace
(D) Facebook
Answer: (d)
85. NoSQL databases are most often referred to as:
(A) Relational
(B) Distributed
(C) Object-oriented
(D) Network
Answer: (b)
86. SQL databases are:
(A) Horizontally scalable
(B) Vertically scalable
(C) Either horizontally or vertically scalable
(D) They don’t scale
Answer: (b)
87. Which of the following is not an example of a NoSQL database?
(A) CouchDB
(B) MongoDB
(C) HBase
(D) PostgreSQL
Answer: (d)
88. SQL command types include data manipulation language (DML) and data definition language (DDL).
(A) True
(B) False
Answer: (a)
89. ____ systems are scale-out file-based (HDD) systems moving to more uses of memory in the nodes.
(A) NoSQL
(B) NewSQL
(C) SQL
(D) All of the mentioned
Answer: (a)
90. Point out the correct statement.
(A) Hadoop is ideal for the analytical, postoperational, data-warehouse-ish type of workload
(B) HDFS runs on a small cluster of commodity-class nodes
(C) NewSQL is frequently the collection point for big data
(D) None of the mentioned
Answer: (a)
91. Which is an advantage of NewSQL?
(A) Less complex applications, greater consistency.
(B) Convenient standard tooling.
(C) SQL influenced extensions.
(D) All of the mentioned
Answer: (d)
92. Which of the following represents a column in NoSQL?
(A) Database
(B) Field
(C) Document
(D) Collection
Answer: (b)
93. What is the aim of NoSQL?
(A) NoSQL provides an alternative to SQL databases to store textual data.
(B) NoSQL databases allow storing non-structured data.
(C) NoSQL is not suitable for storing structured data.
(D) NoSQL is a new data format to store large datasets.
Answer: (b)
94. Which of the following is not a feature for NoSQL databases?
(A) Data can be easily held across multiple servers
(B) Relational Data
(C) Scalability
(D) Faster data access than SQL databases
Answer: (b)
95. Which of the following statements is correct with respect to MongoDB?
(A) MongoDB is a NoSQL Database
(B) MongoDB uses XML over JSON for data exchange
(C) MongoDB is not scalable
(D) All of the above
Answer: (a)
96. Which of the following represents a column in MongoDB?
(A) document
(B) database
(C) collection
(D) field
Answer: (d)
97. The system-generated _id field is:
(A) A 12 byte hexadecimal value
(B) A 16 byte octal value
(C) A 12 byte decimal value
(D) A 10 bytes binary value
Answer: (a)
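The 12 bytes of an ObjectId are usually displayed as 24 hexadecimal characters. Under the current MongoDB ObjectId specification they consist of a 4-byte big-endian timestamp (seconds since the Unix epoch), a 5-byte random value generated once per process, and a 3-byte incrementing counter; older drivers used a machine ID and process ID in place of the random value. The sketch below builds an ObjectId-like value with that layout purely for illustration; `make_object_id` and `timestamp_of` are hypothetical helpers, and real applications should use their driver's ObjectId type (for example `bson.ObjectId` in PyMongo).

```python
import itertools
import os
import time
from typing import Optional

# Assumed layout: 4-byte timestamp + 5-byte per-process random value + 3-byte counter
_PROCESS_RANDOM = os.urandom(5)  # generated once per process
_counter = itertools.count(int.from_bytes(os.urandom(3), "big"))  # random start value


def make_object_id(timestamp: Optional[int] = None) -> str:
    """Return a 24-hex-character, ObjectId-like identifier (illustration only)."""
    ts = int(time.time()) if timestamp is None else timestamp
    counter = next(_counter) % (1 << 24)  # keep the counter within 3 bytes
    raw = ts.to_bytes(4, "big") + _PROCESS_RANDOM + counter.to_bytes(3, "big")
    return raw.hex()  # 12 bytes -> 24 hex characters


def timestamp_of(object_id: str) -> int:
    """Recover the creation time (seconds since the epoch) from the first 4 bytes."""
    return int.from_bytes(bytes.fromhex(object_id[:8]), "big")


oid = make_object_id(timestamp=1_700_000_000)
print(oid, len(oid), timestamp_of(oid))
```

Because the timestamp comes first, ObjectIds generated later generally sort after earlier ones, and the creation time can be read back from the value itself.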
98. Which of the following is true about MongoDB?
(A) MongoDB is cross-platform
(B) MongoDB is a document-oriented database
(C) MongoDB provides high performance
(D) All of the above
Answer: (d)
99. A collection is a group of MongoDB ____.
(A) Databases
(B) Documents
(C) Fields
(D) None of the above
Answer: (b)
100. A developer wants to build a database for an LFC system in which most of the data is stored in a similar manner. Which database should they use?
(A) Relational
(B) NoSQL
(C) Both A and B can be used
(D) None of the above
Answer: (b)
101. What is it called when documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection’s documents may hold different types of data?
(A) dynamic schema
(B) mongod
(C) mongo
(D) Embedded Documents
Answer: (a)
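Dynamic schema can be pictured with plain Python dictionaries standing in for documents. No MongoDB driver is used here, and the collection and field names are made up: the three documents belong to one collection, yet they carry different fields, and the `phone` field holds a string in one document and a list in another. Code that reads such a collection has to tolerate missing fields and mixed types.

```python
# Hypothetical "users" collection: documents need not share fields or types
users = [
    {"_id": 1, "name": "Asha", "phone": "555-0100"},
    {"_id": 2, "name": "Ben", "phone": ["555-0101", "555-0102"], "age": 31},
    {"_id": 3, "name": "Chen"},  # no phone field at all
]


def phones(doc: dict) -> list[str]:
    """Normalize the 'phone' field, which may be missing, a string, or a list."""
    value = doc.get("phone")
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


for doc in users:
    print(doc["name"], phones(doc))
```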
102. Instead of a primary key, what does MongoDB use?
(A) Embedded Documents
(B) Default key _id
(C) mongod
(D) mongo
Answer: (b)
