Unit III: Big Data MCQ
1. A ____ serves as the master and there is only one NameNode per cluster.
(A) Data Node
(B) NameNode
(C) Data block
(D) Replication
Answer: b
2. Point out the correct statement.
(A) DataNode is the slave/worker node and holds the user data in the form of Data Blocks
(B) Each incoming file is broken into 32 MB by default
(C) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance
(D) None of the mentioned
Answer: a
3. HDFS works in a ____ fashion.
(A) master-worker
(B) master-slave
(C) worker/slave
(D) all of the mentioned
Answer: a
4. ____ NameNode is used when the Primary NameNode goes down.
(A) Rack
(B) Data
(C) Secondary
(D) None of the mentioned
Answer: c
5. Point out the wrong statement.
(A) Replication Factor can be configured at a cluster level (Default is set to 3) and also at a file level
(B) Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode
(C) User data is stored on the local file system of DataNodes
(D) DataNode is aware of the files to which the blocks stored on it belong to
Answer: d
6. Which of the following scenarios may not be a good fit for HDFS?
(A) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
(B) HDFS is suitable for storing data related to applications requiring low latency data access
(C) HDFS is suitable for storing data related to applications requiring low latency data access
(D) None of the mentioned
Answer: a
7. The need for data replication can arise in various scenarios like ____
(A) Replication Factor is changed
(B) DataNode goes down
(C) Data Blocks get corrupted
(D) All of the mentioned
Answer: d
8. ____ is the slave/worker node and holds the user data in the form of Data Blocks.
(A) DataNode
(B) NameNode
(C) Data block
(D) Replication
Answer: a
9. HDFS provides a command line interface called ____ used to interact with HDFS.
(A) “HDFS Shell”
(B) “FS Shell”
(C) “DFS Shell”
(D) None of the mentioned
Answer: b
10. HDFS is implemented in _____ programming language.
(A) C++
(B) Java
(C) Scala
(D) None of the mentioned
Answer: b
11. For YARN, the _____ Manager UI provides host and port information.
(A) Data Node
(B) NameNode
(C) Resource
(D) Replication
Answer: c
12. Point out the correct statement.
(A) The Hadoop framework publishes the job flow status to an internally running web server on the master nodes of the Hadoop cluster
(B) Each incoming file is broken into 32 MB by default
(C) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance
(D) None of the mentioned
Answer: a
13. For ____ the HBase Master UI provides information about the HBase Master uptime.
(A) HBase
(B) Oozie
(C) Kafka
(D) All of the mentioned
Answer: a
14. During start up, the _____ loads the file system state from the fsimage and the edits log file.
(A) DataNode
(B) NameNode
(C) ActionNode
(D) None of the mentioned
Answer: b
15. What is the utility of HBase?
(A) It is the tool for Random and Fast Read/Write operations in Hadoop
(B) Acts as Faster Read only query engine in Hadoop
(C) It is MapReduce alternative in Hadoop
(D) It is Fast MapReduce layer in Hadoop
Answer: a
16. What is Hive used as?
(A) Hadoop query engine
(B) MapReduce wrapper
(C) Hadoop SQL interface
(D) All of the above
Answer: d
17. What is the default size of an HDFS block (in Hadoop 1.x)?
(A) 32 MB
(B) 64 KB
(C) 128 KB
(D) 64 MB
Answer: d
18. In HDFS, what is the default replication factor for data blocks?
(A) 4
(B) 1
(C) 3
(D) 2
Answer: c
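As a rough illustration of questions 17 and 18, the sketch below estimates how many blocks a file occupies and how much raw disk space its replicas consume. The 64 MB block size and replication factor of 3 are the defaults assumed by this quiz; the 200 MB file and the function names are hypothetical.

```python
import math

BLOCK_SIZE_MB = 64      # default assumed by question 17 (Hadoop 1.x)
REPLICATION_FACTOR = 3  # default from question 18


def block_count(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of HDFS blocks needed to hold a file of the given size."""
    return math.ceil(file_size_mb / block_size_mb)


def raw_storage_mb(file_size_mb, replication=REPLICATION_FACTOR):
    """Total disk space used across the cluster, counting every replica."""
    return file_size_mb * replication


if __name__ == "__main__":
    size = 200  # hypothetical 200 MB file
    print(block_count(size))     # 4 blocks: 64 + 64 + 64 + 8 MB
    print(raw_storage_mb(size))  # 600 MB of raw storage
```

The last block holds only the remaining 8 MB, so the raw storage is the file size times the replication factor rather than the block count times the block size.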
19. What is the name of the protocol used to create replicas in HDFS?
(A) Forward protocol
(B) Sliding Window Protocol
(C) HDFS protocol
(D) Store and Forward protocol
Answer: c
20. HDFS data blocks can be read in parallel.
(A) True
(B) False
Answer: a
21. Which of the following is a fact about combiners in Hadoop?
(A) Combiners can be used for mapper only job
(B) Combiners can be used for any Map Reduce operation
(C) Mappers can be used as a combiner class
(D) Combiners are primarily aimed to improve Map Reduce performance
(E) Combiners can’t be applied for associative operations
Answer: d
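To see why combiners are mainly a performance feature (question 21), the following sketch simulates a word count in plain Python, with and without a map-side combiner. It is not Hadoop code; the function names and the input splits are hypothetical, and summing is used because it is associative and commutative, which is what makes a combiner safe here.

```python
from collections import Counter, defaultdict


def map_words(line):
    """Mapper: emit (word, 1) for every word in a line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-sum counts on the map side, before the shuffle."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def reduce_counts(pairs):
    """Reducer: sum all counts that arrive for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def run(splits, use_combiner):
    """Return (number of records shuffled, final word counts)."""
    shuffled = []
    for split in splits:
        pairs = [pair for line in split for pair in map_words(line)]
        if use_combiner:
            pairs = combine(pairs)
        shuffled.extend(pairs)
    return len(shuffled), reduce_counts(shuffled)


if __name__ == "__main__":
    splits = [["a b a", "a"], ["b b"]]  # two hypothetical input splits
    print(run(splits, use_combiner=False))  # 6 records shuffled
    print(run(splits, use_combiner=True))   # 3 records shuffled, same counts
```

The final counts are identical in both runs; only the volume of intermediate data sent to the reducer changes.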
22. In Hadoop, the Distributed Cache can be used in which of the following?
(A) Mapper phase only
(B) Reducer phase only
(C) In either phase, but not on both sides simultaneously
(D) In either phase
Answer: d
23. Which of the following types of joins can be performed in a reduce-side join operation?
(A) Equi Join
(B) Left Outer Join
(C) Right Outer Join
(D) Full Outer Join
(E) All of the above
Answer: e
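The answer to question 23 follows from how a reduce-side join works: records from both inputs are grouped by join key, so the reducer sees every record for a key and can decide what to do with unmatched ones. A minimal plain-Python sketch of that idea, with hypothetical names and data (not Hadoop API code):

```python
from collections import defaultdict


def reduce_side_join(left, right, how="inner"):
    """Join two lists of (key, value) records the way a reduce-side join would."""
    # "Map" phase: tag each record with its source and group by join key,
    # which is the job the shuffle does for the reducer.
    groups = defaultdict(lambda: {"L": [], "R": []})
    for key, value in left:
        groups[key]["L"].append(value)
    for key, value in right:
        groups[key]["R"].append(value)

    # "Reduce" phase: each key sees all of its records from both sides.
    joined = []
    for key in sorted(groups):
        lefts, rights = groups[key]["L"], groups[key]["R"]
        if not rights and how in ("left", "full"):
            rights = [None]  # keep unmatched left records
        if not lefts and how in ("right", "full"):
            lefts = [None]   # keep unmatched right records
        for left_value in lefts:
            for right_value in rights:
                joined.append((key, left_value, right_value))
    return joined


if __name__ == "__main__":
    users = [(1, "ann"), (2, "bob")]      # hypothetical data
    orders = [(2, "book"), (3, "pen")]
    print(reduce_side_join(users, orders, "inner"))  # [(2, 'bob', 'book')]
    print(reduce_side_join(users, orders, "full"))
```

Switching the `how` argument changes only the handling of keys that appear on one side, which is why equi, left outer, right outer, and full outer joins are all possible on the reduce side.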
24. A MapReduce function can be written in:
(A) Java
(B) Ruby
(C) Python
(D) Any Language which can read from input stream
Answer: d
25. Is there an input format for map files?
(A) Yes, but only in Hadoop 0.22+.
(B) Yes, there is a special format for map files.
(C) No, but sequence file input format can read map files.
(D) Both B and C are correct answers
Answer: c
26. Which MapReduce phase is theoretically able to utilize features of the underlying file system in order to optimize parallel execution?
(A) Split
(B) Map
(C) Combine
(D) Reduce
Answer: a
27. Which method of the FileSystem object is used for reading a file in HDFS?
(A) open()
(B) access()
(C) select()
(D) None of the above
Answer: a
28. Which organization runs the world’s largest Hadoop cluster?
(A) Apple
(B) Facebook
(C) Datamatics
(D) None of the mentioned
Answer: b
29. Facebook tackles Big Data with ____ based on Hadoop.
(A) ‘Project Data’
(B) ‘Prism’
(C) ‘Project Big’
(D) ‘Project Prism’
Answer: d
30. How many SequenceFile formats are present in Hadoop I/O?
(A) 2
(B) 8
(C) 9
(D) 3
Answer: d
31. Which of the following is the slowest compression technique?
(A) Bzip2
(B) LZO
(C) Gzip
(D) All of the mentioned
Answer: a
32. Which of the following typically compresses files to within 10% to 15% of the best available techniques?
(A) Bzip2
(B) LZO
(C) Gzip
(D) Both B and C
Answer: a
33. Which of the following provides Java-based indexing and search technology?
(A) Solr
(B) Lucy
(C) Lucene Core
(D) None of these
Answer: c
34. Avro schemas are defined with ____
(A) JAVA
(B) XML
(C) All of the mentioned
(D) JSON
Answer: d
35. Thrift resolves possible conflicts through the ____ of the field.
(A) Name
(B) UID
(C) Static number
(D) All of the mentioned
Answer: c
36. Avro is said to be the future ____ layer of Hadoop.
(A) RMC
(B) RPC
(C) RDC
(D) All of the mentioned
Answer: b
37. Which of the following storage types has high storage density?
(A) RAM_DISK
(B) ARCHIVE
(C) ROM_DISK
(D) All of the mentioned
Answer: b
38. HDFS provides a command line interface called ____ used to interact with HDFS.
(A) “HDFS Shell”
(B) “FS Shell”
(C) “DFS Shell”
(D) None of the mentioned
Answer: b
39. Which of the given SequenceFile formats is the most compression-aggressive?
(A) Partition Compressed
(B) Record Compressed
(C) Block-Compressed
(D) Uncompressed
Answer: c
40. Avro schemas describe the format of the message and are defined using __
(A) JSON
(B) XML
(C) JS
(D) All of the mentioned
Answer: a
41. Which editor is used for editing files in HDFS?
(A) Vi Editor
(B) Python editor
(C) DOS editor
(D) DEV C++ Editor
Answer: a
42. Which command lists the directories and files in a specific HDFS directory?
(A) ls
(B) fs -ls
(C) hadoop fs -ls
(D) hadoop fs
Answer: c
43. Which among the following is correct?
S1: MapReduce is a programming model for data processing
S2: Hadoop can run MapReduce programs written in various languages
S3: MapReduce programs are inherently parallel
(A) S1 and S2
(B) S2 and S3
(C) S1 and S3
(D) S1, S2 and S3
Answer: d
44. Mapper class is
(A) generic type
(B) abstract type
(C) static type
(D) final
Answer: a
45. Which package provides the basic types of Hadoop?
(A) org.apache.hadoop.io
(B) org.apache.hadoop.util
(C) org.apache.hadoop.type
(D) org.apache.hadoop.lang
Answer: a
46. Which among the following does the Job control in Hadoop?
(A) Mapper class
(B) Reducer class
(C) Task class
(D) Job class
Answer: d
47. Hadoop runs the jobs by dividing them into
(A) maps
(B) tasks
(C) individual files
(D) None of these
Answer: b
48. Which are the two nodes that control the job execution process of Hadoop?
(A) Job Tracker and Task Tracker
(B) Map Tracker and Reduce Tracker
(C) Map Tracker and Job Tracker
(D) Map Tracker and Task Tracker
Answer: a
49. Which among the following schedules tasks to be run?
(A) Job Tracker
(B) Task Tracker
(C) Job Scheduler
(D) Task Controller
Answer: a
50. What are the fixed-size pieces into which the input to a MapReduce job is divided called?
(A) records
(B) splits
(C) tasks
(D) maps
Answer: b
51. Where is the output of map tasks written?
(A) local disk
(B) HDFS
(C) File System
(D) secondary storage
Answer: a
52. Which among the following is responsible for processing one or more chunks of data and producing the output results?
(A) Maptask
(B) jobtask
(C) Mapper class
(D) Reducetask
Answer: a
53. Which acts as an interface between Hadoop and the program written?
(A) Hadoop Cluster
(B) Hadoop Streams
(C) Hadoop Sequencing
(D) Hadoop Streaming
Answer: d
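Questions 24 and 53 rest on the same idea: Hadoop Streaming passes records to the mapper and reducer as lines of text on standard input and collects their standard output, so any language that can read an input stream will do. The sketch below mimics that contract in plain Python on an in-memory list; in a real streaming job the input would come from `sys.stdin`, and the function names and sample text here are hypothetical.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum counts for consecutive lines that share the same key."""
    parsed = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    text = ["big data", "big cluster"]  # stand-in for lines read from stdin
    shuffled = sorted(mapper(text))     # the framework sorts map output by key
    for line in reducer(shuffled):
        print(line)
```

The `sorted` call stands in for the shuffle and sort step; the reducer relies on equal keys arriving next to each other, which is why it can use `groupby`.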
54. What are Hadoop Pipes?
(A) Java interface to Hadoop MapReduce
(B) C++ interface to Hadoop MapReduce
(C) Ruby interface to Hadoop MapReduce
(D) Python interface to Hadoop MapReduce
Answer: b
55. What does Hadoop Common Package contain?
(A) war files
(B) msi files
(C) jar files
(D) exe files
Answer: c
56. Which among the following is the master node?
(A) Name Node
(B) Data Node
(C) Job Node
(D) Task Node
Answer: a
57. Which among the following is the slave node?
(A) Name Node
(B) Data Node
(C) Job Node
(D) Task Node
Answer: b
58. Which acts as a checkpoint node in HDFS?
(A) Name Node
(B) Data Node
(C) Secondary Name Node
(D) Secondary Data Node
Answer: c
59. Which among the following holds the location of data?
(A) Name Node
(B) Data Node
(C) Job Tracker
(D) Task Tracker
Answer: a
60. What is the process of applying the code received by the JobTracker on the file called?
(A) Naming
(B) Tracker
(C) Mapper
(D) Reducer
Answer: c
61. In which mode should Hadoop run in order to run a Pipes job?
(A) distributed mode
(B) centralized mode
(C) pseudo distributed mode
(D) parallel mode
Answer: c
62. Which of the following are correct?
S1: Namespace volumes are independent of each other
S2: Namespace volumes are managed by namenodes
(A) S1 only
(B) S2 only
(C) Both S1 and S2
(D) Neither S1 nor S2
Answer: c
63. Which among the following architectural changes need to attain High availability in HDFS?
(A) Clients must be configured to handle namenode failover
(B) Datanodes must send block reports to both namenodes since the block mappings are stored in a namenode’s memory, and not on disk
(C) namenodes must use highly-available shared storage to share the edit log
(D) All of the above
Answer: d
64. Which controller in HDFS manages the transition from the active namenode to the standby?
(A) failover controller
(B) recovery controller
(C) failsafe controller
(D) fencing controller
Answer: a
65. Which among the following is not a fencing mechanism employed by the system in HDFS?
(A) killing the namenode’s process
(B) disabling namenode’s network port via a remote management command
(C) revoking namenode’s access to the shared storage directory
(D) None of the above
Answer: d
66. What is the value of the property dfs.replication in pseudo-distributed mode?
(A) 0
(B) 1
(C) null
(D) yes
Answer: b
67. What is the minimum amount of data that a disk can read or write in HDFS?
(A) block size
(B) byte size
(C) heap
(D) None
Answer: a
68. Which HDFS command checks file system and lists the blocks?
(A) hfsck
(B) fcsk
(C) fblock
(D) fsck
Answer: d
69. What is an administered group used to manage cache permissions and resource usage?
(A) Cache pools
(B) block pool
(C) Namenodes
(D) HDFS Cluster
Answer: a
70. Which object encapsulates a client or server’s configuration?
(A) File Object
(B) Configuration object
(C) Path Object
(D) Stream Object
Answer: b
71. Which interface permits seeking to a position in the file and provides a query method for the current offset from the start of the file?
(A) Seekable
(B) PositionedReadable
(C) Progressable
(D) DataStream
Answer: a
72. Which method is used to list the contents of a directory?
(A) listFiles
(B) listContents
(C) listStatus
(D) listPaths
Answer: c
73. What is the operation that uses wildcard characters to match multiple files with a single expression called?
(A) globbing
(B) pattern matching
(C) regex
(D) regexfilter
Answer: a
74. What does the globStatus() method return?
(A) an array of FileStatus objects
(B) an array of ListStatus objects
(C) an array of PathStatus objects
(D) an array of FilterStatus objects
Answer: a
75. What does the glob question mark (?) match?
(A) zero or more characters
(B) one or more characters
(C) a single character
(D) metacharacter
Answer: c
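The wildcard behaviour in questions 73 to 75 can be tried out with Python's standard `fnmatch` module, used here only as an analogue of Hadoop's glob patterns and not as the Hadoop API. The paths and the helper name `glob_match` are hypothetical. One difference to keep in mind: Python's `*` can also match `/`, which a glob applied per path component would not.

```python
from fnmatch import fnmatchcase

# Hypothetical date-partitioned paths
paths = ["/2007/12/30", "/2007/12/31", "/2008/01/01"]


def glob_match(pattern, candidates):
    """Return the candidates that match a shell-style wildcard pattern."""
    return [path for path in candidates if fnmatchcase(path, pattern)]


if __name__ == "__main__":
    # '?' matches exactly one character
    print(glob_match("/2007/12/3?", paths))  # ['/2007/12/30', '/2007/12/31']
    # '*' matches zero or more characters
    print(glob_match("/200?/*/*", paths))    # all three paths
```

Because `?` needs exactly one character, the pattern `/2007/12/3?` would not match a path ending in `/3` or in `/300`.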
76. Which method on FileSystem is used to permanently remove files or directories?
(A) remove()
(B) rm()
(C) del()
(D) delete()
Answer: d
77. Which streams the packets to the first datanode in the pipeline?
(A) DataStreamer
(B) FileStreamer
(C) InputStreamer
(D) PathStreamer
Answer: a
78. Which queue is responsible for asking the namenode to allocate new blocks by picking a list of suitable datanodes to store the replicas?
(A) ack queue
(B) data queue
(C) path queue
(D) stream queue
Answer: b
79. Which Hadoop program is used to copy files/directories to and from Hadoop filesystems in parallel?
(A) distcp
(B) hcp
(C) copy
(D) cp
Answer: a
80. Which flag is used with distcp to delete any files or directories from the destination that are not present in the source?
(A) -remove
(B) -rm
(C) -del
(D) -delete
Answer: d