Unit II: Big Data MCQ
1. Which one of the following is false about Hadoop?
(A) It is a distributed framework
(B) The main algorithm used in it is Map Reduce
(C) It runs with commodity hardware
(D) All are true
Answer: (d)
2. What license is Apache Hadoop distributed under?
(A) Apache License 2.0
(B) Shareware
(C) Mozilla Public License
(D) Commercial
Answer: (a)
3. Which of the following platforms does Apache Hadoop run on?
(A) Bare metal
(B) Unix-like
(C) Cross-platform
(D) Debian
Answer: (c)
4. Apache Hadoop achieves reliability by replicating the data across multiple hosts and hence does not require ____ storage on hosts.
(A) Standard RAID levels
(B) RAID
(C) ZFS
(D) Operating system
Answer: (b)
5. Hadoop works in
(A) master-worker fashion
(B) master-slave fashion
(C) worker/slave fashion
(D) All of the mentioned
Answer: (b)
6. Which type of data can Hadoop deal with?
(A) Structured
(B) Semi-structured
(C) Unstructured
(D) All of the above
Answer: (d)
7. Which statement is false about Hadoop?
(A) It runs with commodity hardware
(B) It is a part of the Apache project sponsored by the ASF
(C) It is best for live streaming of data
(D) None of the above
Answer: (c)
8. As compared to RDBMS, Apache Hadoop
(A) Has higher data Integrity
(B) Does ACID transactions
(C) Is suitable for read and write many times
(D) Works better on unstructured and semi-structured data
Answer: (d)
9. Hadoop can be used to create distributed clusters, based on commodity servers, that provide low-cost processing and storage for unstructured data
(A) True
(B) False
Answer: (a)
10. __ is a framework for performing remote procedure calls and data serialization.
(A) Drill
(B) BigTop
(C) Avro
(D) Chukwa
Answer: (c)
11. IBM and ____ have announced a major initiative to use Hadoop to support university courses in distributed computer programming.
(A) Google Latitude
(B) Android (operating system)
(C) Google Variations
(D) Google
Answer: (d)
12. What was Hadoop written in?
(A) Java (software platform)
(B) Perl
(C) Java (programming language)
(D) Lua (programming language)
Answer: (c)
13. Apache ___ is a serialization framework that produces data in a compact binary format.
(A) Oozie
(B) Impala
(C) Kafka
(D) Avro
Answer: (d)
14. Avro schemas describe the format of the message and are defined using ______
(A) JSON
(B) XML
(C) JS
(D) All of the mentioned
Answer: (a)
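Illustration for question 14: an Avro schema is an ordinary JSON document. The sketch below parses one with Python's standard json module. The record name, namespace, and fields are hypothetical, chosen only for this example, and no Avro library is involved.

```python
import json

# Hypothetical Avro record schema, written as plain JSON text.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "User",
  "namespace": "example.avro",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": ["null", "int"], "default": null}
  ]
}
"""

# Because the schema is JSON, any JSON parser can read it.
schema = json.loads(SCHEMA_JSON)
field_names = [field["name"] for field in schema["fields"]]
print(schema["type"], schema["name"], field_names)
```

The same text would not parse as XML or JavaScript source, which is why option (A) is the expected answer.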
15. In which languages can you code in Hadoop?
(A) Java
(B) Python
(C) C++
(D) All of the above
Answer: (d)
16. All of the following accurately describe Hadoop, EXCEPT
(A) Open source
(B) Real-time
(C) Java-based
(D) Distributed computing approach
Answer: (b)
17. ____ has the world’s largest Hadoop cluster.
(A) Apple
(B) Datamatics
(C) Facebook
(D) None of the mentioned
Answer: (c)
18. Which among the following is the default OutputFormat?
(A) SequenceFileOutputFormat
(B) LazyOutputFormat
(C) DBOutputFormat
(D) TextOutputFormat
Answer: (d)
19. Which of the following is not an input format in Hadoop?
(A) ByteInputFormat
(B) TextInputFormat
(C) SequenceFileInputFormat
(D) KeyValueInputFormat
Answer: (a)
20. What is the correct sequence of data flow in MapReduce?
a. InputFormat
b. Mapper
c. Combiner
d. Reducer
e. Partitioner
f. OutputFormat
Choose the correct sequence from the list below:
(A) abcdfe
(B) abcedf
(C) acdefb
(D) abcdef
Answer: (b)
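Illustration for question 20: the order InputFormat, Mapper, Combiner, Partitioner, Reducer, OutputFormat can be followed in a small single-process word count. This is only a sketch of the data flow in plain Python, not Hadoop's API. All function names are hypothetical, each input line is treated as its own map task, and the partitioner uses a toy byte-sum hash.

```python
from collections import defaultdict

def input_format(text):
    """a. InputFormat: turn raw text into (byte offset, line) records."""
    offset = 0
    for line in text.splitlines(keepends=True):
        yield offset, line.rstrip("\n")
        offset += len(line.encode("utf-8"))

def mapper(_offset, line):
    """b. Mapper: emit (word, 1) for every word in the line."""
    for word in line.split():
        yield word, 1

def combiner(pairs):
    """c. Combiner: pre-aggregate one map task's output locally."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def partitioner(key, num_reducers):
    """e. Partitioner: pick the reducer for a key (toy deterministic hash)."""
    return sum(key.encode("utf-8")) % num_reducers

def reducer(key, values):
    """d. Reducer: consolidate all values seen for one key."""
    return key, sum(values)

def run_job(text, num_reducers=2):
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for offset, line in input_format(text):
        for key, value in combiner(mapper(offset, line)):
            partitions[partitioner(key, num_reducers)][key].append(value)
    output = []
    for part in partitions:
        for key in sorted(part):  # keys reach each reducer in sorted order
            word, count = reducer(key, part[key])
            output.append(f"{word}\t{count}")  # f. OutputFormat: tab-separated text
    return output

result = run_job("a b a\nb c\n")
print(result)
```

Note that the partitioner runs after the combiner and before the reducer, which is what distinguishes sequence (B) from (D).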
21. Which InputFormat uses the tab character ('\t') to separate key and value?
(A) KeyValueTextInputFormat
(B) TextInputFormat
(C) FileInputFormat
(D) SequenceFileInputFormat
Answer: (a)
22. Which among the following is true about SequenceFileInputFormat?
(A) Key: byte offset. Value: the contents of the line
(B) Key: everything up to the tab character. Value: the remaining part of the line after the tab character
(C) Key and value: both are user-defined
(D) None of the above
Answer: (c)
22. Which are the key and value in TextInputFormat?
(A) Key: byte offset. Value: the contents of the line
(B) Key: everything up to the tab character. Value: the remaining part of the line after the tab character
(C) Key and value: both are user-defined
(D) None of the above
Answer: (a)
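Illustration for the TextInputFormat and KeyValueTextInputFormat questions above: the two record layouts can be mimicked in a few lines of Python. This is a sketch of the key/value conventions only, not Hadoop code. The function names and the sample data are hypothetical.

```python
def text_records(data: bytes):
    """TextInputFormat-style records: (byte offset of line, line contents)."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)

def key_value_records(data: bytes, separator: str = "\t"):
    """KeyValueTextInputFormat-style records: split each line at the first tab."""
    for _offset, line in text_records(data):
        key, _sep, value = line.partition(separator)
        yield key, value

sample = b"k1\tv1\nk2\tv2 more\n"
by_offset = list(text_records(sample))
by_tab = list(key_value_records(sample))
print(by_offset)
print(by_tab)
```

The first line starts at byte 0 and is 6 bytes long including its newline, so the second record's key is offset 6.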
23. Which of the following are Built-In Counters in Hadoop?
(A) FileSystem Counters
(B) FileInputFormat Counters
(C) FileOutputFormat counters
(D) All of the above
Answer: (d)
24. Which of the following is not an output format in Hadoop?
(A) TextOutputFormat
(B) ByteOutputFormat
(C) SequenceFileOutputFormat
(D) DBOutputFormat
Answer: (b)
25. Is it mandatory to set input and output type/format in Hadoop MapReduce?
(A) Yes
(B) No
Answer: (b)
26. The parameters for Mappers are:
(A) Text (input)
(B) LongWritable (input)
(C) Text (intermediate output)
(D) All of the above
Answer: (d)
27. For a 514 MB file, how many InputSplits will be created (assuming the default 128 MB block size)?
(A) 4
(B) 5
(C) 6
(D) 10
Answer: (b)
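Worked arithmetic for question 27, assuming one split per 128 MB block, which is the simplification the answer key appears to rely on. The helper name is hypothetical. Hadoop's real split computation also depends on the configured minimum and maximum split sizes and may fold a small trailing remainder into the previous split, so actual jobs can report a different count.

```python
MB = 1024 * 1024

def split_sizes(file_size, split_size=128 * MB):
    """Cut a file into full-size pieces plus one remainder piece (simplified model)."""
    sizes = []
    remaining = file_size
    while remaining > 0:
        piece = min(split_size, remaining)
        sizes.append(piece)
        remaining -= piece
    return sizes

sizes = split_sizes(514 * MB)
print(len(sizes))                # 5
print([s // MB for s in sizes])  # [128, 128, 128, 128, 2]
```

Four full 128 MB pieces cover 512 MB, and the remaining 2 MB forms the fifth piece.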
28. Which among the following is used to provide multiple inputs to Hadoop?
(A) MultipleInputs class
(B) MultipleInputFormat
(C) FileInputFormat
(D) DBInputFormat
Answer: (a)
29. The Mapper implementation processes one line at a time via the ___ method.
(A) map
(B) reduce
(C) mapper
(D) reducer
Answer: (a)
30. The Hadoop MapReduce framework spawns one map task for each ____ generated by the InputFormat for the job.
(A) OutputSplit
(B) InputSplit
(C) InputSplitStream
(D) All of the mentioned
Answer: (b)
31. ____ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data.
(A) MapReduce
(B) Mahout
(C) Oozie
(D) All of the mentioned
Answer: (a)
32. The _____ part of MapReduce is responsible for processing one or more chunks of data and producing the output results.
(A) Maptask
(B) Mapper
(C) Task execution
(D) All of the mentioned
Answer: (a)
33. ____ function is responsible for consolidating the results produced by each of the Map() functions/tasks.
(A) Map
(B) Reduce
(C) Reducer
(D) Reduced
Answer: (b)
34. The number of maps is usually driven by the total size of
(A) task
(B) output
(C) input
(D) none
Answer: (c)
35. The right number of reduces seems to be ____ multiplied by (number of nodes * maximum number of containers per node):
(A) 0.65
(B) 0.55
(C) 0.95
(D) 0.68
Answer: (c)
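Worked arithmetic for question 35: the Hadoop MapReduce tutorial suggests 0.95 or 1.75 multiplied by (number of nodes * maximum containers per node) as the number of reduces. The sketch below applies that rule of thumb. The function name and the cluster size (10 nodes, 8 containers each) are made up for the example.

```python
def suggested_reduces(nodes, containers_per_node, factor=0.95):
    """Rule-of-thumb reduce count: factor * (nodes * containers per node)."""
    return round(factor * nodes * containers_per_node)

print(suggested_reduces(10, 8))               # 76
print(suggested_reduces(10, 8, factor=1.75))  # 140
```

With 0.95 all reduces can start as soon as the maps finish. With 1.75 the faster nodes run a second wave of reduces, which improves load balancing.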
36. Mapper and Reducer implementations can use the ____ to report progress or just indicate that they are alive.
(A) Partitioner
(B) OutputCollector
(C) Reporter
(D) All of the mentioned
Answer: (c)
37. The number of major components in Hadoop 2.0 is:
(A) 2
(B) 3
(C) 4
(D) 5
Answer: (b)
38. Which of the following statements is true about Pig?
(A) Pig is also a data warehouse system used for analysing the Big Data stored in HDFS
(B) It uses a data flow language for analysing the data
(C) Both (A) and (B)
(D) Relational Database Management System
Answer: (c)
39. Which of the following platforms does Hadoop run on?
(A) Bare metal
(B) Debian
(C) Cross-platform
(D) Unix-like
Answer: (c)
40. The Hadoop list includes the HBase database, the Apache Mahout ____ system, and matrix operations.
(A) Machine learning
(B) Pattern recognition
(C) Statistical classification
(D) Artificial intelligence
Answer: (a)
41. Which node serves as the master, with only one per cluster?
(A) Data Node
(B) NameNode
(C) Data block
(D) Replication
Answer: (b)
42. HDFS is organized as a
(A) master-worker
(B) master node and slave node
(C) worker/slave
(D) all of the mentioned
Answer: (b)
43. The node used when the primary NameNode fails is the:
(A) Rack
(B) Data node
(C) Secondary node
(D) None of the mentioned
Answer: (c)
44. Which of the following scenarios may not be a good fit for HDFS?
(A) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
(B) HDFS is suitable for storing data related to applications requiring low latency data access
(C) HDFS is suitable for storing data related to applications requiring low latency data access
(D) None of the mentioned
Answer: (a)
45. The need for data replication occurs:
(A) Replication Factor is changed
(B) DataNode goes down
(C) Data Blocks get corrupted
(D) All of the mentioned
Answer: (d)
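Illustration for question 45: each of the listed events leaves some block with fewer live replicas than the target, which is what triggers re-replication. The sketch below models that check with plain dictionaries. It is not NameNode code. The function name, block IDs, and DataNode names are hypothetical. A replication factor of 3 is the HDFS default.

```python
def under_replicated(block_locations, live_nodes, replication_factor=3):
    """Return {block_id: missing replica count} for blocks below the target."""
    missing = {}
    for block_id, nodes in block_locations.items():
        live = len(set(nodes) & set(live_nodes))
        if live < replication_factor:
            missing[block_id] = replication_factor - live
    return missing

blocks = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn2", "dn3", "dn4"],
}
all_nodes = {"dn1", "dn2", "dn3", "dn4"}

healthy = under_replicated(blocks, all_nodes)
node_down = under_replicated(blocks, all_nodes - {"dn4"})              # DataNode goes down
factor_raised = under_replicated(blocks, all_nodes, replication_factor=4)  # factor is changed
print(healthy, node_down, factor_raised)
```

A corrupted block can be modelled the same way by dropping the bad replica's node from that block's location list.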
46. HDFS uses only one language for implementation:
(A) C++
(B) Java
(C) Scala
(D) None of the Above
Answer: (d)
47. In YARN, which node is responsible for managing the resources?
(A) Data Node
(B) NameNode
(C) Resource Manager
(D) Replication
Answer: (c)
48. As Hadoop framework is implemented in Java, MapReduce applications are required to be written in Java Language
(A) True
(B) False
Answer: (b)
49. ___ maps input key/value pairs to a set of intermediate key/value pairs.
(A) Mapper
(B) Reducer
(C) Both Mapper and Reducer
(D) None of the mentioned
Answer: (a)
50. The number of maps is usually driven by the total size of _____
(A) Inputs
(B) Outputs
(C) Tasks
(D) None of the mentioned
Answer: (a)
51. Which file system is used by HBase?
(A) Hive
(B) Impala
(C) Hadoop
(D) Scala
Answer: (c)
52. The information mapping data blocks with their corresponding files is stored in
(A) Namenode
(B) Datanode
(C) Job Tracker
(D) Task Tracker
Answer: (a)
53. In HDFS the files cannot be
(A) read
(B) deleted
(C) executed
(D) archived
Answer: (c)
54. The DataNode and NameNode are, respectively, which of the following?
(A) Slave and master nodes
(B) Master and worker nodes
(C) Both worker nodes
(D) Both master nodes
Answer: (a)
55. Hadoop is a framework that works with a variety of related tools. Common cohorts include
(A) MapReduce, Hive and HBase
(B) MapReduce, MySQL and Google Apps
(C) MapReduce, Hummer and Iguana
(D) MapReduce, Heron and Trumpet
Answer: (a)
56. Hadoop was named after?
(A) Creator Doug Cutting's favorite circus act
(B) The toy elephant of Cutting's son
(C) Cutting's high school rock band
(D) A sound Cutting's laptop made during Hadoop's development
Answer: (b)
57. All of the following accurately describe Hadoop, EXCEPT:
(A) Open source
(B) Java-based
(C) Distributed computing approach
(D) Real-time
Answer: (d)
58. Hive also supports custom extensions written in:
(A) C
(B) C#
(C) C++
(D) Java
Answer: (d)
59. The Pig Latin scripting language is not only a higher-level data flow language but also has operators similar to:
(A) JSON
(B) XML
(C) SQL
(D) jQuery
Answer: (c)
60. In comparison to a relational DBMS, Hadoop
(A) Has higher data integrity
(B) Does ACID transactions
(C) Is suitable for read and write many times
(D) Works better on unstructured and semi-structured data
Answer: (d)
61. The files in HDFS are meant for
(A) Low latency data access
(B) Multiple writers and modifications at arbitrary offsets.
(C) Only append at the end of file
(D) Writing into a file only once.
Answer: (c)
62. The main role of the secondary namenode is to
(A) Copy the filesystem metadata from primary namenode.
(B) Copy the filesystem metadata from NFS stored by primary namenode
(C) Monitor if the primary namenode is up and running.
(D) Periodically merge the namespace image with the edit log.
Answer: (d)
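Illustration for question 62: the periodic merge of the namespace image with the edit log described in option (D) can be pictured as replaying logged operations onto a snapshot. The sketch below is a toy model only. The function name, the operation names (create, delete, rename), and the dictionary layout are hypothetical and do not reflect the real on-disk formats.

```python
def checkpoint(fsimage, edit_log):
    """Replay logged namespace edits onto a copy of the image; return the new image."""
    new_image = dict(fsimage)
    for op, path, *args in edit_log:
        if op == "create":
            new_image[path] = args[0]              # args[0]: list of block ids
        elif op == "delete":
            new_image.pop(path, None)
        elif op == "rename":
            new_image[args[0]] = new_image.pop(path)  # args[0]: new path
        else:
            raise ValueError(f"unknown edit: {op}")
    return new_image

fsimage = {"/a": ["blk_1"]}
edits = [
    ("create", "/b", ["blk_2"]),
    ("rename", "/a", "/c"),
    ("delete", "/b"),
]
merged = checkpoint(fsimage, edits)
print(merged)  # {'/c': ['blk_1']}
```

After a merge like this the edit log can start again from empty, which keeps it from growing without bound.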
63. The MapReduce algorithm contains three important tasks, namely ____.
(A) Splitting, mapping, reducing
(B) Scanning, mapping, reduction
(C) Map, Reduction, decluttering
(D) Cleaning, Map, Reduce
Answer: (a)
64. In how many stages does a MapReduce program execute?
(A) 2
(B) 3
(C) 4
(D) 5
Answer: (d)
65. What is the function of the Mapper in MapReduce?
(A) Splitting the Data File
(B) Job
(C) Scanning the subblock of files
(D) PayLoad
Answer: (c)
66. Although the Hadoop framework is implemented in Java, MapReduce applications must be written in ___
(A) C
(B) C#
(C) Java
(D) None of the above
Answer: (d)
67. What is the meaning of commodity hardware in Hadoop?
(A) Very cheap hardware
(B) Industry standard hardware
(C) Discarded hardware
(D) Low specifications Industry grade hardware
Answer: (d)
68. Which of the following are true for Hadoop?
(A) It’s a tool for Big Data analysis
(B) It supports structured and unstructured data analysis
(C) It aims for vertical scaling out/in scenarios
(D) Both (a) and (b)
Answer: (d)
69. Which of the following are the core components of Hadoop 2.0?
(A) HDFS
(B) Map Reduce
(C) YARN
(D) all the above
Answer: (d)
70. Programming Language is used for real time queries.
(A) TRUE
(B) FALSE
Answer: (b)
71. What is the default HDFS block size for Hadoop 2.0?
(A) 32 MB
(B) 128 MB
(C) 128 KB
(D) 64 MB
Answer: (b)
72. Which of the following phases occur simultaneously?
(A) Shuffle and Sort
(B) Reduce and Sort
(C) Shuffle and Map
(D) All of the mentioned
Answer: (a)
73. Major Components of Hadoop 1.0 are:
(A) HDFS and MapReduce
(B) Map Reduce, HDFS and YARN
(C) YARN and HDFS
(D) None of the above
Answer: (a)