1. The document discusses key concepts in biostatistics including measures of central tendency, dispersion, correlation, regression, and sampling.
2. Measures of central tendency described are the mean, median, and mode. Measures of dispersion include range, standard deviation, and quartile deviation.
3. The importance of statistical analysis for living organisms in areas like medicine, biology and public health is highlighted. Examples are provided to demonstrate calculation of statistical measures.
WHAT IS STATISTICS
Statistics is a branch of applied mathematics used for the collection, compilation and organisation of data so as to transform them into information. The information so obtained is used for drawing inferences and in planning, management and research.
APPLICATION OF STATISTICS
- Population study
- Methods of data reduction/correction
- Variation study
BIO-STATISTICS
Biostatistics is the application of statistical methods to problems concerning living organisms, such as medical, biological and public health problems. It has two branches: descriptive and inferential.
BIO-STATISTICS: DESCRIPTIVE AND INFERENTIAL
Descriptive:
- Observes all the subjects in the population.
- Used in small population groups.
- Resources are limited.
- Compiles data into tables and graphs to give them meaning and information.
- The findings are applicable only to the population studied.
- It is expressed in numbers and shows a pattern.
Inferential:
- Observes a part (sample) of the population.
- Used when the population group is very large.
- Resources are adequate.
- The information drawn from the sample is taken as valid for the population.
- The findings of the sample are applicable to the whole population.
- It is expressed through rates/ratios, averages and dispersion.
COMPONENTS OF BIO-STATISTICS
- Rates and ratios
- Statistical averages
- Dispersion
- Correlation and regression
- Sampling
- Interpretation
STATISTICAL AVERAGES
Statistical averages are determined through measures of central tendency. When repeated samples are taken from the same population, the value differs from sample to sample. The objective of statistical analysis is to arrive at one numerical value that represents the inherent characteristic of the entire population under study. This value is called the central tendency, and it can be measured by various methods.
MEASURES OF CENTRAL TENDENCY
- Mean: the sum of multiple observations divided by the number of observations.
- Median: the midpoint value of a set of observations arranged in order.
- Mode: the most frequently occurring value among the observations.
BIO-STATISTICS: MEASURES OF CENTRAL TENDENCY (AVERAGES)
The objective of statistical analysis is to arrive at one numerical value that represents the inherent characteristic of the entire population under study. This central tendency can be measured in three ways:
a) Arithmetic mean/average: $M = \frac{x_1 + x_2 + x_3 + \cdots + x_N}{N}$, where $N$ is the number of observations.
b) Median: a positional average whose value depends on the central position occupied by a value in the frequency distribution. It is the value of the $\frac{N+1}{2}$th observation when the observations are arranged in order. When $N$ is even, the mean of the two central values is taken as the median.
c) Mode: the most frequently occurring value in the distribution.
MEAN (ARITHMETIC MEAN/AVERAGE)
- The most commonly used measure of location.
- Calculated by adding all observed values and dividing by the total number of observations.
- Each observation is denoted $x_1, x_2, \ldots, x_n$; the total number of observations is $n$.
- The summation process is denoted by sigma ($\Sigma$): $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Computation of the mean
Duration of stay in a hospital (days): 8, 25, 7, 5, 8, 3, 10, 12, 9
9 observations (n = 9); sum of all observations = 87
Mean duration of stay = 87 / 9 = 9.67 days

Incubation period of a disease (days): 8, 45, 7, 5, 8, 3, 10, 12, 9
9 observations (n = 9); sum of all observations = 107
Mean incubation period = 107 / 9 = 11.89 days
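To make the arithmetic above easy to check, here is a minimal Python sketch of the mean formula applied to the two data sets. The helper name `mean` is our own, not from the source; Python's standard `statistics.mean` gives the same result.

```python
# Arithmetic mean: sum of observations divided by their number (n).
def mean(values):
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)

stay_days = [8, 25, 7, 5, 8, 3, 10, 12, 9]        # hospital stay (days)
incubation_days = [8, 45, 7, 5, 8, 3, 10, 12, 9]  # incubation period (days)

print(sum(stay_days), round(mean(stay_days), 2))              # 87 9.67
print(sum(incubation_days), round(mean(incubation_days), 2))  # 107 11.89
```

The single larger value (45 instead of 25) raises the mean by more than two days, which anticipates the point about outliers below.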
ADVANTAGES AND DISADVANTAGES OF THE MEAN
Advantages:
- Has many desirable mathematical properties.
- Used as the basis of many statistical tests.
- A good summary statistic for a symmetrical distribution.
Disadvantages:
- Less useful for an asymmetric (skewed) distribution.
- Can be distorted by outliers, giving a less "typical" value.
The median describes, literally, the middle value of the data. It is defined as the value above and below which half (50%) of the observations fall. For example, if the median height of women in a village is 160 cm, 50% of the women in the village are taller than 160 cm.
To obtain the median:
1. Arrange the observations in order from smallest to largest (ascending order) or vice versa.
2. Count the number of observations, n.
3. If n is odd, median = value of the ((n + 1)/2)th observation.
4. If n is even, median = average of the (n/2)th and (n/2 + 1)th observations.
Example: what is the median of the following values?
10, 20, 12, 3, 18, 16, 14, 25, 2
Arranged in increasing order: 2, 3, 10, 12, 14, 16, 18, 20, 25
n = 9 (odd), so the median is the ((9 + 1)/2)th = 5th observation = 14.
Suppose there is one more observation (8): 2, 3, 8, 10, 12, 14, 16, 18, 20, 25
n = 10 (even), so the median is the average of the (10/2)th = 5th and (10/2 + 1)th = 6th observations.
Median = mean of 12 and 14 = 13
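The four steps for finding the median can be sketched in Python as follows. The function `median` is a hypothetical helper written for illustration; it sorts a copy of the data and handles odd and even n separately, reproducing the worked example.

```python
def median(values):
    """Median following the steps above: sort, then pick the middle value(s)."""
    ordered = sorted(values)   # step 1: ascending order (original list untouched)
    n = len(ordered)           # step 2: count the observations
    if n == 0:
        raise ValueError("median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:
        # odd n: the ((n + 1)/2)th observation is at 0-based index n // 2
        return ordered[mid]
    # even n: average of the (n/2)th and (n/2 + 1)th observations
    return (ordered[mid - 1] + ordered[mid]) / 2

values = [10, 20, 12, 3, 18, 16, 14, 25, 2]
print(median(values))        # 14
print(median(values + [8]))  # 13.0
```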
The median is not sensitive to extreme values.
[Figure: median shown on two data sets, labelled "Same median"]
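A quick way to see this insensitivity is to reuse the hospital-stay and incubation-period data from the computation of the mean: the two sets differ only in one extreme value (25 vs 45). The sketch below uses Python's standard `statistics` module; the mean shifts while the median stays at 8.

```python
import statistics

# Two data sets that differ only in their most extreme value (25 vs 45).
stay_days = [8, 25, 7, 5, 8, 3, 10, 12, 9]
incubation_days = [8, 45, 7, 5, 8, 3, 10, 12, 9]

for label, data in [("stay", stay_days), ("incubation", incubation_days)]:
    print(label, round(statistics.mean(data), 2), statistics.median(data))
# stay 9.67 8
# incubation 11.89 8
```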
ADVANTAGES AND DISADVANTAGES OF THE MEDIAN
Advantages:
- The median is unaffected by extreme values.
Disadvantages:
- The median does not contain information on the other values of the distribution; it is selected only by its rank.
- The median is less amenable to statistical tests.
The mode of a distribution is the value that is observed most frequently in a given set of data.
How to obtain it:
1. Arrange the data in sequence from low to high.
2. Count the number of times each value occurs.
3. The most frequently occurring value is the mode.
Or prepare a frequency table: the observation with the highest frequency is the mode.
EXAMPLE OF MODE
Values: 4, 3, 3, 2, 3, 8, 4, 3, 7, 2
Arranged in order: 2, 2, 3, 3, 3, 3, 4, 4, 7, 8
The mode is 3, which occurs the highest number of times (4 times).
[Figure: annual salary (in 10,000 rupees)]
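The frequency-table method can be sketched with Python's `collections.Counter`. The helper `modes` is hypothetical; it returns a list so that a data set with several modes can be represented, and it treats data in which every value occurs only once as having no mode, which is one common convention and an assumption here rather than something the source specifies. The second and third calls use made-up data.

```python
from collections import Counter

def modes(values):
    """Return every value with the highest frequency (frequency-table method).

    Assumption: if every value occurs only once, the data set is treated
    as having no mode and an empty list is returned.
    """
    counts = Counter(values)  # frequency table: value -> number of occurrences
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([4, 3, 3, 2, 3, 8, 4, 3, 7, 2]))  # [3]
print(modes([1, 1, 2, 2, 5]))                 # [1, 2]  (two modes)
print(modes([1, 2, 3]))                       # []      (no mode)
```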
IDEAL CENTRAL TENDENCY
- The mode is the most common value.
- The median is suitable when there are extreme values.
- The mean is suitable for a symmetric distribution.
[Figure: frequency graph marking Mean = 10.8, Median = 10, Mode = 13.5]
DIFFERENCE BETWEEN MEAN, MEDIAN AND MODE
Mean: represents the centre of gravity of the data set; sensitive to extreme values; most useful when the data are normally distributed.
Median: represents the middle of the data set (half below and half above); not sensitive to extreme values; most useful when the data set is skewed or has a few extreme values in one direction.
Mode: represents the most common value; a data set may have no mode, one mode or multiple modes; not popular with bio-scientists.
MEASURES OF DISPERSION/VARIATION
1. Measure the scattering/variability of data around a measure of central tendency (CT).
2. Give an idea of the homogeneity or heterogeneity of the distribution of data.
Absolute measures: range, mean deviation (MD), standard deviation (SD), quartile deviation (QD).
Relative measures: coefficients of the absolute measures, for example:
- Coefficient of range $= \frac{L - S}{L + S}$, where $L$ is the largest and $S$ the smallest observation.
- Coefficient of MD $= \frac{MD}{M}$, where $M$ is the mean or the median.
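The two relative measures can be computed with a short Python sketch. Here it is applied to the hospital-stay data from the computation of the mean, with M taken as the mean; the helper names `coefficient_of_range` and `mean_deviation` are our own.

```python
def coefficient_of_range(values):
    largest, smallest = max(values), min(values)  # L and S
    return (largest - smallest) / (largest + smallest)

def mean_deviation(values, centre):
    """Average absolute deviation of the observations from a chosen centre."""
    return sum(abs(x - centre) for x in values) / len(values)

stay_days = [8, 25, 7, 5, 8, 3, 10, 12, 9]
m = sum(stay_days) / len(stay_days)  # M = mean (the median could be used instead)
md = mean_deviation(stay_days, m)

print(round(coefficient_of_range(stay_days), 3))  # 0.786  i.e. (25 - 3) / (25 + 3)
print(round(md, 2), round(md / m, 3))             # 4.0 0.414
```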
NORMAL STATISTICAL CURVE
[Figure: standard normal curve with mean 0 and total area 1; about 68%, 95% and 99.7% of the area lies within 1, 2 and 3 SD of the mean]
At the 95% confidence level, about 5% of the data lie outside ±2 SD (more precisely ±1.96 SD), i.e. 1 in 20. Therefore the probability P = 5/100 = 1/20 = 0.05.
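The 68/95/99.7 figures and the 1.96 multiplier can be checked from the normal distribution itself: the share of values within ±k SD of the mean is erf(k/√2). A minimal sketch using only Python's `math.erf`:

```python
import math

def within_k_sd(k):
    """Share of a normal distribution lying within mean ± k SD: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 2, 3):
    print(k, round(100 * within_k_sd(k), 2))
# 1 68.27
# 1.96 95.0
# 2 95.45
# 3 99.73
```

The output shows that ±2 SD covers about 95.45%, while exactly 95% corresponds to ±1.96 SD.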
MEASURES OF DISPERSION: range, mean deviation, standard deviation
MEASURES OF DISPERSION
This is the measure of the range of the variables and their deviation from the mean.

S No | Diastolic pressure (n) | Mean (m) | Deviation from mean (n - m) | (n - m)²
1 | 83 | 81 | +2 | 4
2 | 75 | 81 | -6 | 36
3 | 81 | 81 | 0 | 0
4 | 79 | 81 | -2 | 4
5 | 71 | 81 | -10 | 100
6 | 95 | 81 | +14 | 196
7 | 75 | 81 | -6 | 36
8 | 77 | 81 | -4 | 16
9 | 84 | 81 | +3 | 9
10 | 90 | 81 | +9 | 81
Total | 810 | 81 | 56 (ignoring signs) | 482

Mean deviation $= \frac{\sum |n - m|}{N} = \frac{56}{10} = 5.6$
Standard deviation $= \sqrt{\frac{\sum (n - m)^2}{N - 1}} = \sqrt{\frac{482}{10 - 1}} = 7.32$
When the number of observations is less than 30, one is deducted from the total number of observations, i.e. N - 1 is used as the divisor.
Range = 71 to 95 (95 - 71 = 24)
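The table can be reproduced with a short Python sketch, which also confirms the totals. It follows the rule above of dividing by N - 1 for a small sample; the standard library's `statistics.stdev` uses the same N - 1 divisor.

```python
import math

diastolic = [83, 75, 81, 79, 71, 95, 75, 77, 84, 90]
n = len(diastolic)
m = sum(diastolic) / n                          # 810 / 10 = 81.0
deviations = [x - m for x in diastolic]         # the (n - m) column
mean_dev = sum(abs(d) for d in deviations) / n  # 56 / 10
ss = sum(d * d for d in deviations)             # sum of squared deviations, 482
# Divisor N - 1 for a small sample (N < 30), as described above.
sd = math.sqrt(ss / (n - 1))
data_range = max(diastolic) - min(diastolic)    # 95 - 71

print(m, mean_dev, ss, round(sd, 2), data_range)  # 81.0 5.6 482.0 7.32 24
```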
INTERPRETATION (STANDARD ERROR)
Standard error is a measure that enables us to judge whether the mean of a given sample lies within the set confidence limits of the population mean. Interpretation uses:
- Standard error of mean
- Standard error of difference between two means
- Standard error of proportion
- Standard error of difference between two proportions
- Chi-square test
STANDARD ERROR OF MEAN
Different samples from the same population taken at different times will produce different means (m). The frequency distribution of all the sample means is a normal distribution, and the mean of the sample means matches the population mean (µ). About 95% of the sample means lie within 2 standard errors on either side of the true (population) mean.
Standard error of mean: $SE = \frac{\sigma}{\sqrt{n}}$ ($n$ = sample size); the 95% limits are $\mu \pm 2\frac{\sigma}{\sqrt{n}}$.
EXAMPLE
Take a random sample of 25 males aged 12 years. The mean height is 50" and the SD is 0.6".
S.E. of mean = SD/√n (n = sample size) = 0.6/√25 = 0.6/5 = 0.12"
At the 95% confidence level: 50" ± (2 × 0.12) = 50" ± 0.24 = 49.76" to 50.24"
That is, there is a 1 in 20 chance that the population mean lies outside these limits.
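A minimal Python sketch of the same calculation. The function name is our own, and the factor of 2 follows the text's rounding of the 95% multiplier (1.96 would give slightly narrower limits).

```python
import math

def standard_error_of_mean(sd, n):
    return sd / math.sqrt(n)

mean_height, sd, n = 50.0, 0.6, 25                      # inches; sample of 25
se = standard_error_of_mean(sd, n)                      # 0.6 / 5 = 0.12
low, high = mean_height - 2 * se, mean_height + 2 * se  # ~95% limits (± 2 SE)

print(round(se, 2), round(low, 2), round(high, 2))  # 0.12 49.76 50.24
```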
STANDARD ERROR OF PROPORTION
p = proportion of males = 52%
q = proportion of females = 48%
n = size of the sample = 100
$SE(p) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{52 \times 48}{100}} = \sqrt{\frac{2496}{100}} = \sqrt{24.96} \approx 5$
95% limits: 52 + 2(5) = 62 and 52 - 2(5) = 42.
In a random sample of 100, the proportion of males is found to be 40%, which lies outside these limits.
Relative deviate $= \frac{52 - 40}{5} = 2.4$, which is more than 2; hence the deviation is significant.
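The same steps in a small Python sketch. Without the rounding of √24.96 to 5, the standard error is about 4.996, which does not change the conclusion.

```python
import math

p, q, n = 52, 48, 100        # % males, % females, sample size
se_p = math.sqrt(p * q / n)  # sqrt(24.96), about 4.996 (rounded to 5 in the text)
low, high = p - 2 * se_p, p + 2 * se_p
observed = 40                # % males found in the sample
relative_deviate = (p - observed) / se_p

print(round(se_p, 2), round(low), round(high), round(relative_deviate, 2))
# 5.0 42 62 2.4
print("significant" if abs(relative_deviate) > 2 else "not significant")
```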
CORRELATION
Correlation measures the extent of the relation between two related variables, such as height and weight, or income and expenditure. If the variables are x and y, with means $\bar{x}$ and $\bar{y}$ (so that $x - \bar{x}$ and $y - \bar{y}$ are deviations from the mean), the coefficient of correlation r is:
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$
The correlation coefficient lies between -1 and +1. The nearer the value is to +1 (or -1), the stronger the positive (or negative) relation between the variables, but correlation cannot establish a cause-and-effect relation.
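A minimal Python sketch of the formula above. The height and weight values are hypothetical, chosen only to show a strong positive correlation; the function name `pearson_r` is our own. On Python 3.10+, `statistics.correlation` computes the same coefficient.

```python
import math

def pearson_r(x, y):
    """r = sum of cross-deviations / sqrt(product of sums of squared deviations)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical height (cm) and weight (kg) data, for illustration only.
height = [150, 155, 160, 165, 170]
weight = [50, 54, 59, 63, 68]
print(round(pearson_r(height, weight), 3))  # 0.999
```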
REGRESSION
Regression measures how the value of a dependent variable changes with an independent variable, and is used to study the relation in the direction of cause and effect. Given the value of the independent variable x, the value of the dependent variable y can be estimated by
$y = \bar{y} + b(x - \bar{x})$
where the regression coefficient b of y on x is
$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
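Using the same kind of hypothetical height and weight data, this sketch computes the regression coefficient b of y on x and uses the regression line to estimate y for a new x. The function and variable names are illustrative.

```python
def regression_of_y_on_x(x, y):
    """Return (b, predict) for the line y = y_bar + b * (x - x_bar)."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx  # regression coefficient of y on x

    def predict(x_new):
        # Estimated value of the dependent variable for a given x
        return y_bar + b * (x_new - x_bar)

    return b, predict

height = [150, 155, 160, 165, 170]  # hypothetical independent variable x (cm)
weight = [50, 54, 59, 63, 68]       # hypothetical dependent variable y (kg)
b, predict = regression_of_y_on_x(height, weight)
print(round(b, 2), round(predict(162), 2))  # 0.9 60.6
```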
SAMPLING
When a large population is to be studied, a sample has to be taken that is optimum, i.e. small enough to make the study feasible but large enough to make it scientifically or statistically valid.
Sample size: $n = \frac{z^2 pq}{d^2}$
where
n = size of the sample
z = value for the confidence level: 1.96 for 95%, 2.58 for 99%
p = prior estimate of the proportion, e.g. 50%
q = 100 - p (or 1 - p when p is a fraction)
d = error to be tolerated in the sample, e.g. 5% (0.05)
p, q and d must all be expressed in the same units (all as percentages or all as fractions).
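A small Python sketch of the sample-size formula, with p, q and d all expressed as fractions and the result rounded up to a whole number of subjects (rounding up is a common convention, assumed here). The function name `sample_size` is our own.

```python
import math

def sample_size(z, p, d):
    """n = z^2 * p * q / d^2, with p, q and d all expressed as fractions (0 to 1)."""
    q = 1 - p
    return math.ceil(z ** 2 * p * q / d ** 2)  # round up to a whole subject

print(sample_size(1.96, 0.5, 0.05))  # 385 at 95% confidence
print(sample_size(2.58, 0.5, 0.05))  # 666 at 99% confidence
```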
SAMPLING METHODS
Random sampling:
- Simple
- Restricted: systematic, stratified, multistage
Non-random sampling:
- Judgment (when the number of units is small)
- Convenience (easy to approach)
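To illustrate how the random methods differ, here is a Python sketch using the standard `random` module on a hypothetical frame of 100 numbered units. The odd/even split standing in for strata is an assumption for illustration only, and multistage sampling is not shown.

```python
import random

population = list(range(1, 101))  # hypothetical sampling frame: unit IDs 1..100
rng = random.Random(42)           # fixed seed so the illustration is repeatable

# Simple random sampling: every unit has an equal chance; no repeats.
simple = rng.sample(population, 10)

# Systematic sampling: random start, then every k-th unit (k = N / n).
k = len(population) // 10
start = rng.randrange(k)
systematic = population[start::k]

# Stratified sampling: divide into strata, then sample randomly within each.
strata = {
    "odd": [u for u in population if u % 2 == 1],
    "even": [u for u in population if u % 2 == 0],
}
stratified = [u for members in strata.values() for u in rng.sample(members, 5)]

print(len(simple), len(systematic), len(stratified))  # 10 10 10
```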
SAMPLING ERROR
If repeated samples are taken from the same population, the result obtained from one sample will differ from another because of the size of the sample and the variability of the individual readings.
Standard error: although different samples will have different means, the sample means are normally distributed around the population mean, with a spread equal to the standard error.
Standard error of mean: population mean = sample mean ± 2 standard errors of the mean at 95% confidence, i.e. p < 0.05.
Standard error of proportion: $SE(p) = \sqrt{\frac{pq}{n}}$. With p = 52 (males), q = 48 (females) and n = 100, SE(p) = √24.96 ≈ 5, giving limits of 52 ± 2(5) = 42 to 62. If a random sample of 100 shows 40% males, the relative deviate is (52 - 40)/5 = 2.4, which is more than 2; hence the deviation is significant and not acceptable.
Hospital Administration Made Easy (http://hospiad.blogspot.com): an effort solely to help students and aspirants in their attempt to become successful hospital administrators. Dr. N. C. Das