1. The branch of statistics that deals with the development of particular statistical methods is classified as
A) industry statistics
B) economic statistics
C) applied statistics
D) mathematical statistics
Answer: D
2. Which of the following is true about regression analysis?
A) answering yes/no questions about the data
B) estimating numerical characteristics of the data
C) modeling relationships within the data
D) describing associations within the data
Answer: C
3. Text Analytics is also referred to as Text Mining.
A) True
B) False
C) Can be true or False
D) Can not say
Answer: A
4. What is a hypothesis?
A) A statement that the researcher wants to test through the data collected in a study.
B) A research question the results will answer.
C) A theory that underpins the study.
D) A statistical method for calculating the extent to which the results could have happened by chance.
Answer: A
5. What is the cyclical process of collecting and analyzing data during a single research study called?
A) Interim Analysis
B) Inter analysis
C) inter-item analysis
D) constant analysis
Answer: A
6. The process of quantifying data is referred to as __
A) Topology
B) Diagramming
C) Enumeration
D) coding
Answer: C
7. An advantage of using computer programs for qualitative data is that they _
A) Can reduce time required to analyse data (i.e., after the data are transcribed)
B) Help in storing and organising data
C) Make many procedures available that are rarely done by hand due to time constraints
D) All of the above
Answer: D
8. __ are the basic building blocks of qualitative data.
A) Categories
B) Units
C) Individuals
D) None of the above
9. This is the process of transforming qualitative research data from written interviews or field notes into typed text.
A) Segmenting
B) Coding
C) Transcription
D) Memoing
Answer: C
10. A graph that uses vertical bars to represent data is called a _
A) Line graph
B) Bar graph
C) Scatterplot
D) Vertical graph
Answer: B
11. __ are used when you want to visually examine the relationship between two quantitative variables.
A) Bar graph
B) pie graph
C) line graph
D) Scatterplot
Answer: D
12. The denominator (bottom) of the z-score formula is
A) The standard deviation
B) The difference between a score and the mean
C) The range
D) The mean
Answer: A
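As a worked illustration of question 12, the z-score divides the difference between a score and the mean by the standard deviation. The sketch below is a minimal example; the function name `z_score` and the sample values (score 85, mean 70, standard deviation 10) are assumptions chosen for illustration.

```python
def z_score(x, mean, std_dev):
    """Standardize x: numerator is (x - mean), denominator is the standard deviation."""
    if std_dev == 0:
        raise ValueError("standard deviation must be non-zero")
    return (x - mean) / std_dev


# Hypothetical values: a score of 85 on a test with mean 70 and standard deviation 10.
z = z_score(85, 70, 10)
print(z)  # 1.5
```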
13. Which of these distributions is used for hypothesis testing?
A) Normal Distribution
B) Chi-Squared Distribution
C) Gamma Distribution
D) Poisson Distribution
Answer: B
14. A statement made about a population for testing purposes is called a
A) Statistic
B) Hypothesis
C) Level of Significance
D) Test-Statistic
Answer: B
15. A hypothesis that is assumed to be true and is tested for possible rejection is called the
A) Null Hypothesis
B) Statistical Hypothesis
C) Simple Hypothesis
D) Composite Hypothesis
Answer: A
16. If the null hypothesis is false, then which of the following is accepted?
A) Null Hypothesis
B) Positive Hypothesis
C) Negative Hypothesis
D) Alternative Hypothesis
Answer: D
17. The alternative hypothesis is also called the
A) Composite hypothesis
B) Research Hypothesis
C) Simple Hypothesis
D) Null Hypothesis
Answer: B
18. Data analysis is a process of
A. inspecting data
B. cleaning data
C. transforming data
D. All of the above
Answer: D
19. Which of the following is not a major data analysis approach?
A. Data Mining
B. Predictive Intelligence
C. Business Intelligence
D. Text Analytics
Answer: B
20. How many main statistical methodologies are used in data analysis?
A. 2
B. 3
C. 4
D. 5
Answer: A
21. In descriptive statistics, data from the entire population or a sample is summarized with
A. integer descriptors
B. floating descriptors
C. numerical descriptors
D. decimal descriptors
Answer: C
22. Data analysis was defined by which statistician?
A. William S.
B. Hans Peter Luhn
C. Gregory Piatetsky-Shapiro
D. John Tukey
Answer: D
23. Which of the following is true about hypothesis testing?
A. answering yes/no questions about the data
B. estimating numerical characteristics of the data
C. describing associations within the data
D. modeling relationships within the data
Answer: A
24. The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
A. TRUE
B. FALSE
C. Can be true or false
D. Can not say
Answer: A
25. The branch of statistics that deals with the development of particular statistical methods is classified as
A. industry statistics
B. economic statistics
C. applied statistics
D. mathematical statistics
Answer: D
26. Which of the following is true about regression analysis?
A. answering yes/no questions about the data
B. estimating numerical characteristics of the data
C. modeling relationships within the data
D. describing associations within the data
Answer: C
27. Text Analytics is also referred to as Text Mining.
A. TRUE
B. FALSE
C. Can be true or false
D. Can not say
Answer: A
28. What is the minimum number of variables/features required to perform clustering?
A) 0
B) 1
C) 2
D) 3
Answer: B
29. For two runs of K-means clustering, is it expected to get the same clustering results?
A) Yes
B) No
Answer: B
30. Which of the following algorithms is most sensitive to outliers?
A) K-means clustering algorithm
B) K-medians clustering algorithm
C) K-modes clustering algorithm
D) K-medoids clustering algorithm
Answer: A
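One way to see why question 30 points to K-means: its cluster centers are means, and a mean shifts far more than a median when a single extreme value is present. The sketch below uses a hypothetical one-dimensional cluster with one outlier; the data values are assumptions for illustration only.

```python
from statistics import mean, median

# Hypothetical 1-D cluster: four typical points and one outlier (100.0).
cluster = [1.0, 2.0, 3.0, 4.0, 100.0]

center_mean = mean(cluster)      # the kind of center K-means would use
center_median = median(cluster)  # the kind of center K-medians would use

print(center_mean)    # 22.0, pulled toward the outlier
print(center_median)  # 3.0, stays with the bulk of the data
```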
31. The discrete variables and continuous variables are two types of
A) Open end classification
B) Time series classification
C) Qualitative classification
D) Quantitative classification
Answer: D
32. Bayesian classifiers are
A) A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.
B) Any mechanism employed by a learning system to constrain the search space of a hypothesis
C) An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
D) None of these
Answer: A
33. Classification accuracy is
A) A subdivision of a set of examples into a number of classes
B) Measure of the accuracy of the classification of a concept that is given by a certain theory
C) The task of assigning a classification to a set of examples
D) None of these
Answer: B
34. Euclidean distance measure is
A) A stage of the KDD process in which new data is added to the existing selection.
B) The process of finding a solution for a problem simply by enumerating all possible solutions according to some predefined order and then testing them
C) The distance between two points as calculated using the Pythagorean theorem
D) None of the above
Answer: C
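The definition in question 34 can be sketched in a few lines: square the coordinate differences, sum them, and take the square root. The function name `euclidean_distance` and the sample points are assumptions for illustration.

```python
import math


def euclidean_distance(p, q):
    """Distance between two points of equal dimension via the Pythagorean theorem."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical points forming a 3-4-5 right triangle.
d = euclidean_distance((0.0, 0.0), (3.0, 4.0))
print(d)  # 5.0
```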
35. Hybrid is
A) Combining different types of method or information
B) Approach to the design of learning algorithms that is structured along the lines of the theory of evolution.
C) Decision support systems that contain an information base filled with the knowledge of an expert formulated in terms of if-then rules.
D) None of the above
Answer: A
36. Decision trees use __ , in that they always choose the option that seems the best available at that moment.
A) Greedy Algorithms
B) divide and conquer
C) Backtracking
D) Shortest path algorithm
Answer: A
37. Discovery is
A) It is hidden within a database and can only be recovered if one is given certain clues (an example is encrypted information).
B) The process of extracting implicit, previously unknown, and potentially useful information from data
C) An extremely complex molecule that occurs in human chromosomes and that carries genetic information in the form of genes.
D) None of these
Answer: B
38. Hidden knowledge refers to
A) A set of databases from different vendors, possibly using different database paradigms
B) An approach to a problem that is not guaranteed to work but performs well in most cases
C) Information that is hidden in a database and that cannot be recovered by a simple SQL query.
D) None of these
Answer: C
39. Enrichment is
A) A stage of the KDD process in which new data is added to the existing selection
B) The process of finding a solution for a problem simply by enumerating all possible solutions according to some predefined order and then testing them
C) The distance between two points as calculated using the Pythagorean theorem.
D) None of these
Answer: A
40. Because ___ are easy to implement and can execute efficiently even without prior knowledge of the data, they are among the most popular algorithms for classifying text documents.
A) IDC
B) Naïve Bayes classifiers
C) CART
D) None of the above
Answer: B
41. High entropy means that the partitions in classification are
A) Pure
B) Not Pure
C) Useful
D) Useless
Answer: B
42. Which of the following statements about Naive Bayes is incorrect?
A) Attributes are equally important.
B) Attributes are statistically dependent on one another given the class value.
C) Attributes are statistically independent of one another given the class value.
D) Attributes can be nominal or numeric
Answer: B
43. The maximum value of entropy depends on the number of classes. With 8 classes, what is the maximum entropy?
A) Max Entropy is 1
B) Max Entropy is 2
C) Max Entropy is 3
D) Max Entropy is 4
Answer: C
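The answer to question 43 follows from the Shannon entropy formula with base-2 logarithms: entropy is largest when all classes are equally likely, giving log2(8) = 3 bits for 8 classes. The sketch below checks this numerically; the function name `entropy` is an assumption for illustration.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Eight equally likely classes give the maximum entropy for 8 classes.
uniform = [1 / 8] * 8
max_entropy = entropy(uniform)
print(max_entropy)  # 3.0

# A perfectly pure partition (one class only) has zero entropy.
pure = entropy([1.0])
```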
44. Point out the wrong statement.
A) k-nearest neighbor is same as k-means
B) k-means clustering is a method of vector quantization
C) k-means clustering aims to partition n observations into k clusters
D) none of the mentioned
Answer: A
45. Consider the following example: "How can we divide a set of articles such that articles in the same group share a theme (we do not know the themes of the articles ahead of time)?" This is:
A) Clustering
B) Classification
C) Regression
D) None of these
Answer: A
46. Clustering techniques are __ in the sense that the data scientist does not determine, in advance, the labels to apply to the clusters.
A) Unsupervised
B) supervised
C) Reinforcement
D) Neural network
Answer: A
47. The ___ metric is examined to determine a reasonably optimal value of k.
A) Mean Square Error
B) Within Sum of Squares (WSS)
C) Speed
D) None of these
Answer: B
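To make the metric in question 47 concrete, WSS sums the squared distances from each point to the centroid of its assigned cluster; it is typically computed for several values of k and the point where it stops dropping sharply is taken as a reasonable k. The sketch below computes WSS for fixed, hand-assigned clusters rather than running K-means itself; the function name `wss`, the points, the labels, and the centroids are all assumptions for illustration.

```python
def wss(points, labels, centroids):
    """Within Sum of Squares: squared distance of each point to its cluster centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        center = centroids[label]
        total += sum((p - c) ** 2 for p, c in zip(point, center))
    return total


# Hypothetical data: two well-separated groups of two points each.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 1.0), (9.0, 3.0)]

# k = 1: every point assigned to the overall mean (5, 2).
wss_k1 = wss(points, [0, 0, 0, 0], [(5.0, 2.0)])

# k = 2: each group assigned to its own mean.
wss_k2 = wss(points, [0, 0, 1, 1], [(1.0, 2.0), (9.0, 2.0)])

print(wss_k1, wss_k2)  # 68.0 4.0
```

The large drop from k = 1 to k = 2 is the kind of change an elbow plot of WSS against k is meant to reveal.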
48. The rule that if an itemset is considered frequent, then any subset of the frequent itemset must also be frequent, is known as the
A) Apriori Property
B) Downward Closure Property
C) Either A or B
D) Both A and B
Answer: D
49. If {bread, eggs, milk} has a support of 0.15 and {bread, eggs} also has a support of 0.15, the confidence of rule {bread, eggs}→{milk} is
A) 0
B) 1
C) 2
D) 3
Answer: B
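The confidence of a rule X→Y is the support of X and Y together divided by the support of X alone. When the two supports are equal, as in question 49, the ratio is 1. The sketch below is a minimal check; the function name `confidence` is an assumption, and 0.15 is used for both supports.

```python
def confidence(support_both, support_antecedent):
    """Confidence of X -> Y: support(X and Y) / support(X)."""
    if support_antecedent <= 0:
        raise ValueError("antecedent support must be positive")
    return support_both / support_antecedent


# support({bread, eggs, milk}) and support({bread, eggs}), both taken as 0.15.
rule_confidence = confidence(0.15, 0.15)
print(rule_confidence)  # 1.0
```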
50. __ recommend items based on similarity measures between users and/or items.
A) Content Based Systems
B) Hybrid System
C) Collaborative Filtering Systems
D) None of these
Answer: C
51. There are __ major classifications of Collaborative Filtering mechanisms.
A) 1
B) 2
C) 3
D) None of the above
Answer: B
52. Movie Recommendation to people is an example of
A) User Based Recommendation
B) Item Based Recommendation
C) Knowledge Based Recommendation
D) content based recommendation
Answer: B
53. ___ recommenders rely on an explicitly defined set of recommendation rules.
A) Constraint Based
B) Case Based
C) Content Based
D) User Based
Answer: A