Most Important questions - Rahul Senchuri

Learn Agriculture Statistics with Rahul

Explain the term “Bayesian Analysis frame”.

Ans: Bayesian Analysis is a method of statistical inference that combines prior information about a population parameter with the evidence contained in a sample to guide the statistical inference process.

Differentiate between “Sampling error” and “Standard error of the mean”.

Sampling error	Standard error of the mean
a. Occurs because a sample is observed rather than the entire population.	a. It is essentially the standard deviation of sample means around the population mean.
b. Sampling error = sample mean – population mean (µ)	b. Standard error = σ/√n
c. Can be found if the population composition is known.	c. Indicates that sampling error decreases as sample size increases.

Explain the term ‘Normal Approximation’.

Ans: The process of using the normal curve to approximate the distribution of a data set is called Normal Approximation.

Let x be a binomial random variable based on n trials with success probability p, so that

µ = np and σ = √(np(1 − p))

If n and p are such that

np ≥ 10 and n(1 − p) ≥ 10,

then x has an approximately normal distribution.
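
The rule above can be checked numerically. The following is a minimal Python sketch, assuming illustrative values n = 100 and p = 0.3 (not from the lesson). It compares the exact binomial probability P(x ≤ 35) with the normal approximation. The 0.5 continuity correction is a common refinement added here as an assumption; it is not part of the answer above.

```python
import math

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma)."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n, p = 100, 0.3                        # illustrative values (assumption)
mu = n * p                             # mean: np
sigma = math.sqrt(n * p * (1 - p))     # standard deviation: sqrt(np(1 - p))

# Rule of thumb from the text: np >= 10 and n(1 - p) >= 10
rule_ok = n * p >= 10 and n * (1 - p) >= 10

exact = binomial_cdf(35, n, p)
approx = normal_cdf(35.5, mu, sigma)   # 0.5 added as a continuity correction
print(rule_ok, round(exact, 4), round(approx, 4))
```

When the rule holds, the two probabilities should agree closely; for small n or extreme p the gap widens.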

Differentiate “ Control” and “Placebo”.

Ans:

Placebo	Control
a. Doesn’t have an active ingredient meant to affect health.	a. Has an active ingredient meant to affect health.
b. Is anything that seems to be a real medical treatment but isn’t.	b. Is a real medicine for a disease.
c. Is used by researchers to understand the effect of a new drug on a particular condition.	c. Is used to treat an illness.
d. Is administered to determine the effectiveness of a new drug and to check for side effects.	d. May or may not have side effects.
e. Eg: sugar pills or dummy drugs.	e. Eg: real medicines having a medical effect.

Explain the term ‘ bootstrap value’.

Ans: A bootstrap value indicates how many times out of 100 the same branch was observed when the phylogenetic reconstruction was repeated on resampled sets of your data.

A low bootstrap value indicates that there is conflicting signal in the data set.

Differentiate between F-test and t-test.

Ans:

t-test	F-test
a. Is used to test whether a given (hypothesized) mean differs significantly from the sample mean.	a. Is used to compare the variances of two samples and check their variability.
b. Two types: i. Paired t-test ii. Unpaired t-test	b. Only one type
c. Null hypothesis: The sample mean is equal to the hypothesized value (for two samples, the difference between the means is zero).	c. Null hypothesis: The two samples have the same variance.
d. Degrees of freedom are n-1, where n is the number of observations in the sample.	d. Degrees of freedom are n1-1 and n2-1, where n1 and n2 are the numbers of observations in samples 1 and 2.
e. Used to compare two sample means.	e. Used to compare more than two sample means (as in analysis of variance).

Meta-Analysis in relation to Agriculture statistics?

Ans: Meta-Analysis is a statistical analysis of a large collection of detailed research results on a specific topic.

Its purpose is to statistically integrate the results of several studies carried out on different samples drawn from the same population. It carefully reviews results that have already been published, and organizes, integrates and scientifically evaluates previous research on a specific topic.

What are the problems faced with Data on research work?

Ans:

During Data collection:

Participants who are reluctant to participate.
Dressing style, i.e. whether to keep it formal or informal.
Lack of experience in conducting qualitative interviews.
Feeling of isolation from peers and other researchers.
Choosing, locating and convincing participants for a research interview.
Dealing with vulnerable populations and sensitive topics.

During Analysis:

Manually organizing the data for processing.
Creating visual representations of the data.
Analyzing data across multiple, disjointed sources leading to incomplete or inaccurate analysis.
Manual errors made during data entry.

During reporting:

Survey design leading to an incomplete report due to time constraints.
Sampling error when the sample is too small or doesn’t truly represent the entire group.
Preparing a mass of data and turning it into a comprehensive report with conclusions and suggestions.

What are the PRISMA guidelines?

Ans: PRISMA is an evidence-based minimum set of items for reporting in systematic reviews and meta-analyses. It comprises a 27-item checklist and a four-phase flow diagram.

Differentiate between Research questions and Research Hypothesis.

Ans:

Research Questions	Research hypothesis
a. It is inquisitive in nature.	a. It is predictive in nature.
b. It can be used if little previous research is available on the subject.	b. It can be used if there is significant knowledge or previous research on this subject.
c. It can be used in both qualitative and quantitative studies.	c. Can be used in quantitative studies.
d. Allows a wide range of outcomes.	d. Doesn’t allow a wide range of outcomes.

What is Cochran’s Q test?

Ans: It is a statistical test that is used to determine whether the proportion of ‘success’ is equal across three or more groups in which the same individuals appear in each group.

For example: We may use Cochran’s Q test to determine if the proportion of students who pass a test is equal when using three different studying techniques.

If the P-value associated with the test statistic is less than a chosen significance level (e.g. α = 0.05), we can reject the null hypothesis and conclude that we have sufficient evidence to say the proportion of success is different in at least one of the groups.
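
The student example can be sketched in Python as follows. The pass/fail data are hypothetical, and the function name `cochran_q` is introduced only for illustration. The statistic is computed with the standard formula Q = (k − 1)(k ΣCj² − N²) / (kN − ΣRi²), where Cj are the column totals, Ri the row totals, N the grand total and k the number of groups. With k = 3 groups there are 2 degrees of freedom, for which the chi-square upper tail has the closed form exp(−Q/2); other values of k would need a general chi-square routine.

```python
import math

def cochran_q(table):
    """Cochran's Q for a list of rows; each row holds 0/1 outcomes for one individual."""
    k = len(table[0])                                   # number of groups
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    n_total = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n_total ** 2)
    denominator = k * n_total - sum(r * r for r in row_totals)
    return numerator / denominator, k - 1               # statistic, degrees of freedom

# Hypothetical pass (1) / fail (0) results: 8 students x 3 studying techniques.
results = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
]
q, df = cochran_q(results)

# Valid only for df = 2: chi-square upper-tail probability is exp(-q / 2).
p_value = math.exp(-q / 2)
reject_null = p_value < 0.05
print(round(q, 3), df, round(p_value, 4), reject_null)
```

Note that the denominator becomes zero if every individual passes (or fails) under all techniques, so such data cannot be tested this way.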

Define I² Statistic.

Ans: It is a measure of heterogeneity. It can be calculated from Cochran’s Q by the formula,

I² = 100% × (Cochran’s Q − degrees of freedom) / Cochran’s Q

Any negative value of I² is considered equal to zero, so that I² ranges from 0% to 100%.
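
As a small sketch, I² = 100% × (Q − df) / Q can be computed as below, with negative values set to zero. The Q and df values passed in are hypothetical, and the function name `i_squared` is illustrative.

```python
def i_squared(q, df):
    """I² as a percentage: 100 * (Q - df) / Q, with negative values set to 0."""
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)

# Hypothetical meta-analysis results
print(i_squared(20.0, 8))   # Q well above df: substantial heterogeneity
print(i_squared(5.0, 8))    # Q below df: truncated to zero
```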

Define effect size.

Ans: It is a quantitative measure of the magnitude of an experimental effect. The larger the effect size, the stronger the relationship between the two variables. It can be measured in 3 ways:

a) Standardized mean difference
b) Odds ratio
c) Correlation coefficient
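
As an illustration of option (a), the sketch below computes one common standardized mean difference, Cohen's d, using the pooled standard deviation. The choice of Cohen's d and the yield figures are assumptions made for the example; the lesson itself does not name a specific formula.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)   # sample variance (n - 1 divisor)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

treated = [5.1, 5.6, 5.3, 5.8, 5.4]   # hypothetical yields with fertilizer
control = [4.7, 5.0, 4.9, 5.2, 4.8]   # hypothetical yields without fertilizer
d = cohens_d(treated, control)
print(round(d, 2))
```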

Why do researchers conduct multiple experiments?

Effects of factors under study vary from location to location or from year to year.
To obtain unbiased estimate.
To determine the effects of factors over time.
To investigate genotype x environment interactions.

What are the major data collection tools?

Ans: It includes the following:

Participatory method
Records and secondary data
Observation
Surveys and interviews
Focus groups
Diaries, journals and self-reported checklists
Expert judgement
Delphi technique
Other tools

In a normal distribution, what % is within 1.6 SD from the mean?

Ans: From the standard normal table (z), 0.4452 of the observations or area lies between the mean and 1.6 SD on one side, i.e. between µ and µ + 1.6σ. So, the total % within 1.6 SD of the mean is 44.52% + 44.52% = 89.04%.
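
The table lookup can be cross-checked with a short Python sketch that uses the error function from the standard library. The helper name `fraction_within` is illustrative.

```python
import math

def fraction_within(z):
    """Fraction of a normal distribution lying within z standard deviations of the mean."""
    return math.erf(z / math.sqrt(2))

total = fraction_within(1.6)   # area between -1.6 SD and +1.6 SD
one_side = total / 2           # area between the mean and +1.6 SD
print(round(one_side, 4), round(total, 4))
```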

Explain the importance of precision and accuracy in sampling.

Ans: Precision is important in sampling as it measures the dispersion or scattering of the observations around the mean value. The less the dispersion, the greater the precision.

Accuracy is important in sampling as it represents the bias of the sample mean relative to the true population mean. It determines how far our known sample mean is from the true (unknown) population mean.

Explain split plot design and mention its advantages.

Ans: A split plot design is an experimental design in which the levels of one or more experimental factors are held constant for a batch of several consecutive experimental runs, which is called a whole plot.

It has a mixture of hard to randomize and easy to randomize factors. The hard to change factors are implemented first, followed by easier-to-change factors.

Fig: Split plot layout across four fields (Field 1 to Field 4).

Advantages:

Cheaper to run
More efficient statistically with increased precision
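
A minimal Python sketch of how such a layout could be randomized is given below. The factor names (irrigation as the hard-to-change whole-plot factor, varieties V1 to V3 as the easy-to-change subplot factor) and the function `split_plot_layout` are assumptions for illustration only.

```python
import random

def split_plot_layout(whole_plot_levels, subplot_levels, n_fields, seed=None):
    """Randomize whole-plot levels over fields, then subplot levels within each field."""
    if n_fields % len(whole_plot_levels) != 0:
        raise ValueError("n_fields must be a multiple of the number of whole-plot levels")
    rng = random.Random(seed)
    # Step 1: the hard-to-change factor is assigned to whole fields first.
    assignment = list(whole_plot_levels) * (n_fields // len(whole_plot_levels))
    rng.shuffle(assignment)
    layout = {}
    for field, level in enumerate(assignment, start=1):
        # Step 2: the easy-to-change factor is randomized separately inside each field.
        subplots = list(subplot_levels)
        rng.shuffle(subplots)
        layout[f"Field {field}"] = (level, subplots)
    return layout

layout = split_plot_layout(["irrigated", "rainfed"], ["V1", "V2", "V3"], n_fields=4, seed=1)
for field, (level, subplots) in layout.items():
    print(field, level, subplots)
```

Each field keeps a single irrigation level for all of its subplots, which is what makes the design cheaper to run than changing irrigation plot by plot.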

Examples of different sampling techniques.

Ans:

Probability sampling:

I. Simple random sampling: Using a lottery system to select random samples.

II. Stratified sampling:

A larger population is divided into smaller groups (strata).
The groups usually don’t overlap and together represent the entire population.
Eg: males grouped with males, females with females, teenagers with teenagers, the old with the old, etc.
Every unit within a group has the same chance of being selected.

III. Systematic sampling: Taking every nth unit of the population, e.g. the 2nd, 4th, 6th, 8th, 10th.

IV. Cluster sampling: Selecting samples from groups of the population that are geographically apart.

V. Multistage sampling: A country can be divided into states, cities, urban and rural areas, and all the areas with similar characteristics can be merged together to form a stratum.

Non-probability sampling:

I. Convenience sampling: Researchers prefer this during the initial stage of survey research, as it is quick and easy to deliver results.

II. Quota sampling: If our population is 45% female and 55% male, then our sample should reflect the same % of males and females.

III. Snowball sampling: Used for sensitive topics like HIV/AIDS, where people will not openly discuss the topic or participate in surveys to share information about it.

IV. Judgmental sampling: If you are trying to find people who are interested in a Master’s degree, then the selection criterion would be, “Are you interested in a Master’s in …?” All the people who respond with ‘No’ will be excluded from our sample.

Briefly explain Type-I and Type-II errors.

Ans: A Type-I error occurs when the sample results cause the rejection of the null hypothesis in spite of the fact that it is true. It also refers to detecting an effect that is not present. Eg: Having a positive test result but not having cancer.

A Type-II error arises when the researcher fails to reject a false null hypothesis. It also refers to failing to detect an effect that is present. Eg: Having a negative test result but having cancer.

Type-I error	Type-II error
a) Non-acceptance of hypothesis which ought to be accepted.	a) Acceptance of hypothesis which ought to be rejected.
b) False positive, eg: a man is told he is pregnant.	b) False negative.
c) Incorrect rejection of a true null hypothesis.	c) Incorrect acceptance of a false null hypothesis.
d) A false hit.	d) A miss.
e) Its probability equals the level of significance.	e) Its probability equals 1 − power of the test.
f) Denoted by ‘α’.	f) Denoted by ‘β’.

Define level of significance / Power of test.

Ans: The level of significance, denoted by ‘α’, is the probability of rejecting the null hypothesis when it is true (committing a type I error).

The power of a test, equal to 1 − β, is the probability of rejecting the null hypothesis when it is false. It is the likelihood that a particular study will detect a deviation from the null hypothesis given that one exists, i.e. the ability to detect an effect (not committing a type II error).

We can calculate power by first calculating β, the probability that we accept the null hypothesis when we shouldn’t, and subtracting it from 1. Power needs to be large (at least 80%), or else we will waste our resources.
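
The calculation can be sketched in Python for one simple case: a one-sided (upper-tail) z-test with known standard deviation. The choice of test and the values of the true shift, σ and n are assumptions made for illustration; the function name `z_test_power` is hypothetical.

```python
import math
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a one-sided (upper-tail) z-test to detect a true mean shift `delta`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)     # rejection cut-off under the null hypothesis
    shift = delta * math.sqrt(n) / sigma       # true shift in standard-error units
    beta = std_normal.cdf(z_crit - shift)      # probability of a type II error
    return 1 - beta                            # power = 1 - beta

power = z_test_power(delta=0.5, sigma=1.0, n=25)
print(round(power, 3))
```

Increasing n raises the power, which is why power calculations are commonly used to choose the sample size before an experiment.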

What is local control and why is it important in field research?

Ans: Local control means the control of all factors except the ones we are investigating. It is done to reduce or control the variation due to extraneous factors and increase the precision of the experiment.

The main objective of local control is to increase the efficiency of the experimental design by reducing the experimental error.

Local control is important in field research because of the following reasons:

a) Reduce variations due to extraneous factors.
b) Increase the precision of experiment.
c) Control genetic variation.
d) Maintain homogeneous environmental conditions affecting plant growth.
e) To provide a valid test of significance.

Difference between descriptive and inferential statistics.

Ans:

Descriptive statistics	Inferential statistics
a) It is concerned with describing the population under study.	a) It is concerned with drawing conclusions about the population.
b) It helps to organize, analyze and present data in a meaningful way.	b) It helps in comparing, testing and predicting data.
c) Result is in form of charts, graphs and tables.	c) Result is in form of probability.
d) It is used to describe the situation.	d) It is used to explain the chances of occurrence of an event.
e) It explains the data, which is already known, to summarize sample.	e) It attempts to reach the conclusion to learn about the population.

Difference between Explained and unexplained variance.

Ans:

Explained variance	Unexplained variance
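a) It is the variation accounted for by the slope of the fitted line. If the line doesn’t go up or down (zero slope), no variation is explained.	a) It is the variation left over, measured by the difference between each point and that line.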
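
The split between the two can be sketched in Python with a least-squares line fitted to hypothetical data. For such a line (with an intercept), total variation = explained variation + unexplained variation, and the explained share is the familiar R².

```python
xs = [1, 2, 3, 4, 5]                 # hypothetical predictor values
ys = [2.1, 3.9, 6.2, 7.8, 10.1]      # hypothetical responses

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Least-squares slope and intercept of the fitted line
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * x for x in xs]

explained = sum((f - mean_y) ** 2 for f in fitted)            # variation due to the line
unexplained = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # variation around the line
total = sum((y - mean_y) ** 2 for y in ys)
r_squared = explained / total
print(round(explained, 3), round(unexplained, 3), round(r_squared, 4))
```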

Difference between Paired and Unpaired t-test.

Ans:

Paired t-test	Unpaired t-test
a) It is a statistical method to determine if there is a significant difference between the means of two dependent samples.	a) It is a statistical test to determine if there is a significant difference between the means of two independent samples.
b) There is relationship between groups.	b) There is no relationship between groups.
c) It doesn’t assume equal variance between groups.	c) Assumes equal variance between groups.

Define standard error.

Ans: Standard error is the standard deviation of the sampling distribution of a statistic, such as the sample mean. It measures the accuracy with which a sample represents a population.

S.E = S.D / √n

As SE increases, it becomes more likely that any given mean is an inaccurate representation of the true population mean.
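
A short Python sketch of the formula, using hypothetical plot yields:

```python
import math
import statistics

yields = [4.2, 3.9, 4.5, 4.1, 4.3, 4.0, 4.4, 4.2]   # hypothetical plot yields (t/ha)

sd = statistics.stdev(yields)          # sample standard deviation (n - 1 divisor)
se = sd / math.sqrt(len(yields))       # S.E = S.D / sqrt(n)
print(round(sd, 4), round(se, 4))
```

Because n sits under the square root, quadrupling the sample size only halves the standard error.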

Define field experimentation.

Ans: Field experiments are experiments carried out outside laboratory settings. They test a hypothesis under natural conditions, unlike laboratory experiments, which enforce scientific control in the artificial and highly controlled setting of a laboratory.

How to test a hypothesis?

Ans :

Hypothesis testing is a formal procedure to accept or reject a hypothesis.
A hypothesis refers to an assumption about a population parameter.
It consists of 4 steps:

i) State the hypotheses, i.e. the null and the alternate hypothesis.
ii) Formulate an analysis plan.
iii) Analyze the sample data.
iv) Interpret the result.
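
The four steps can be walked through in a small Python sketch. The example assumes a two-tailed z-test with a known population standard deviation; the hypothesized mean, σ, α and the sample values are all hypothetical.

```python
import math
from statistics import NormalDist, mean

# Step i: state the hypotheses. H0: mu = 50, H1: mu != 50 (two-tailed).
mu_0 = 50.0

# Step ii: analysis plan. z-test with known sigma, at significance level alpha.
sigma, alpha = 4.0, 0.05

# Step iii: analyze the sample data (hypothetical measurements).
sample = [52.1, 49.8, 53.4, 51.0, 54.2, 50.7, 52.9, 51.6, 53.0]
z = (mean(sample) - mu_0) / (sigma / math.sqrt(len(sample)))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step iv: interpret the result.
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(z, 3), round(p_value, 3), decision)
```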

Difference between experimental unit and treatment.

Ans:

Experimental unit	Treatment
a) Are the basic objects on which experiments are done.	a) Are the experimental conditions applied to experimental units.
b) May be land, plants, animals, etc.	b) Are combinations of specific values, called levels.

What is statistical inference?

Ans: It is the process of drawing conclusions from data that are subject to random variation. It aims to learn characteristics of the population from a sample. It can be done by hypothesis testing.

How can you control errors in experiment?

Ans: It can be done by following ways:

Using principles of experimental design ( Replication, randomization and local control).
Correct selection of experimental materials.
Use of proper plot technique.
Selection of treatment.
Data Analysis

Difference between fixed and random effects.

Ans:

Fixed effects	Random effects
a) Variables that are constant across individuals, like age, sex and ethnicity, i.e. they don’t change or change at a constant rate over time.	a) Variables that are unpredictable.
b) Eg: A person will age at a constant rate.	b) Eg: The price of a car may fluctuate depending on the year it is released.
c) Estimated using the least squares method.	c) Estimated using shrinkage.
d) There is small number of group or treatment.	d) There is large no. of group or treatment.
e) Produce small standard error.	e) Produce large standard error.

Difference between one tail test and two tail test.

Ans:

One-tail test	Two-tail test
a) Allow for the possibility of an effect in one direction.	a) Allow for the possibility of an effect in two directions ( +ve and -ve)
b) Alternate hypothesis has only one end.	b) Alternate hypothesis has two ends
c) Rejection region is either left or right.	c) Both left and right.
d) Denoted by sign > or <	d) Denoted by sign ≠