BIO-STATISTICS
DR. N. C. DAS
WHAT IS STATISTICS
Statistics is a branch of applied mathematics concerned with the collection, compilation and organisation of data so as to transform them into information. The information so obtained is used for drawing inferences and in planning, management and research.
ELEMENTS OF STATISTICS
APPLICATION OF STATISTICS
Statistics is applied to:
- Population study
- Variation study
- Methods of data reduction/correction
BIO-STATISTICS
Bio-statistics is the application of statistical methods to problems concerning living organisms, such as medical, biological and public health problems. It has two branches: descriptive and inferential.
BIO-STATISTICS: DESCRIPTIVE AND INFERENTIAL
Descriptive:
- Observes all the subjects in the population.
- Used in small population groups.
- Resources are limited.
- Compiles data into tables and graphs to give them meaning and turn them into information.
- The findings are applicable only to the population studied.
- It is expressed in numbers and shows a pattern.
Inferential:
- Observes a part (sample) of the population.
- Used when the population group is very large.
- Resources are adequate.
- The information drawn from the sample is valid for the population.
- The findings from the sample are applicable to the whole population.
- It is expressed through rates/ratios, averages and dispersion.
COMPONENTS OF BIO-STATISTICS
- Rates and ratios
- Statistical averages
- Dispersion
- Correlation and regression
- Sampling
- Interpretation
STATISTICAL AVERAGES
Statistical averages are measures of central tendency. When repeated samples are taken from the same population, the value differs from sample to sample. The objective of statistical analysis is to arrive at one numeric value that represents the inherent characteristic of the entire population under study. This value is called the central tendency, and it can be measured by various methods.
MEASURES OF CENTRAL TENDENCY
- Mean: the sum of the observations divided by the number of observations.
- Median: the mid-point value of the observations when they are arranged in order.
- Mode: the most frequently occurring value among the observations.
BIO-STATISTICS: MEASURES OF CENTRAL TENDENCY (AVERAGES)
The objective of statistical analysis is to arrive at one numeric value which represents the inherent characteristic of the entire population under study. This central tendency can be measured in three ways.
a) Arithmetic mean (average):
$M = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N}$, where N is the number of observations.
b) Median: a positional average whose value depends on the central position occupied by a value in the frequency distribution. Its position is
$M_1 = \frac{N + 1}{2}$
When the total number of observations is even, the median is defined as the mean of the two central values.
c) Mode: the most frequently occurring value in the distribution.
MEAN (ARITHMETIC MEAN / AVERAGE)
- Most commonly used measure of location.
- Calculated by adding all observed values and dividing by the total number of observations.
- Each observation is denoted as x_1, x_2, …, x_n.
- The total number of observations is n.
- The summation process is denoted by sigma (Σ):
$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$
COMPUTATION OF THE MEAN
Duration of stay in days in a hospital: 8, 25, 7, 5, 8, 3, 10, 12, 9
- 9 observations (n = 9)
- Sum of all observations = 87
- Mean duration of stay = 87 / 9 = 9.67
Incubation period in days of a disease: 8, 45, 7, 5, 8, 3, 10, 12, 9
- 9 observations (n = 9)
- Sum of all observations = 107
- Mean incubation period = 107 / 9 = 11.89
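The two computations above can be checked with a few lines of Python. This is only an illustrative sketch; the function name `arithmetic_mean` and the variable names are not from the slides.

```python
# Data copied from the slide (both series are in days).
stay_days = [8, 25, 7, 5, 8, 3, 10, 12, 9]
incubation_days = [8, 45, 7, 5, 8, 3, 10, 12, 9]


def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


print(round(arithmetic_mean(stay_days), 2))        # 9.67
print(round(arithmetic_mean(incubation_days), 2))  # 11.89
```

The two series differ in a single value (25 versus 45), yet the mean moves from 9.67 to 11.89, which shows how one large observation pulls the mean.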
ADVANTAGES AND DISADVANTAGES OF THE MEAN
Advantages:
- Has many good properties.
- Used as the basis of many statistical tests.
- Good summary statistic for a symmetrical distribution.
Disadvantages:
- Less useful for an asymmetric distribution.
- Can be distorted by outliers, therefore giving a less "typical" value.
The median is literally the middle value of the data. It is defined as the value above and below which half (50%) of the observations fall. For example, if the median height of women in a village is 160 cm, then 50% of the women in the village are taller than 160 cm.
How to obtain the median:
1. Arrange the observations in order from smallest to largest (ascending order) or vice versa.
2. Count the number of observations, n.
3. If n is an odd number, median = value of the ((n + 1) / 2)th observation.
4. If n is an even number, median = the average of the (n / 2)th and ((n / 2) + 1)th observations.
What is the median of the following values: 10, 20, 12, 3, 18, 16, 14, 25, 2?
- Arrange the numbers in increasing order: 2, 3, 10, 12, 14, 16, 18, 20, 25
- n = 9 (odd), so the median is the ((n + 1) / 2)th observation.
- Median = (9 + 1) / 2 = 5th observation = 14
Suppose there is one more observation (8): 2, 3, 8, 10, 12, 14, 16, 18, 20, 25
- n = 10 (even)
- Median = the average of the (10 / 2)th and ((10 / 2) + 1)th observations, i.e. the 5th and 6th.
- Median = mean of 12 and 14 = 13
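The odd and even rules can be sketched in Python as follows. The function name `median` is illustrative; note that Python lists are indexed from 0, so the ((n + 1) / 2)th observation sits at index n // 2.

```python
def median(values):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th observation, which is index n // 2 (0-based).
        return ordered[middle]
    # Even n: average of the (n / 2)th and ((n / 2) + 1)th observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_sample = [10, 20, 12, 3, 18, 16, 14, 25, 2]
even_sample = odd_sample + [8]

print(median(odd_sample))   # 14
print(median(even_sample))  # 13.0
```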
The median is not sensitive to extreme values.
[Figure: two data sets, one with an extreme value, that share the same median.]
ADVANTAGES AND DISADVANTAGES OF THE MEDIAN
Advantages:
- The median is unaffected by extreme values.
Disadvantages:
- The median does not contain information on the other values of the distribution; it is selected only by its rank.
- The median is less amenable to statistical tests.
The mode of a distribution is the value that is observed most frequently in a given set of data.
How to obtain it:
1. Arrange the data in sequence from low to high.
2. Count the number of times each value occurs.
3. The most frequently occurring value is the mode.
OR
Prepare a frequency table; the observation with the highest frequency is the mode.
EXAMPLES OF MODE
Annual salary (in 10,000 rupees): 4, 3, 3, 2, 3, 8, 4, 3, 7, 2
Arranging the values in order: 2, 2, 3, 3, 3, 3, 4, 4, 7, 8
The mode is 3, which occurs the highest number of times (four).
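The frequency-table approach can be sketched with `collections.Counter` from the Python standard library. The function name `modes` is illustrative; it returns a list because a data set may have more than one value tied for the highest frequency.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)          # frequency table: value -> count
    highest = max(counts.values())    # the highest frequency
    return sorted(v for v, c in counts.items() if c == highest)


data = [4, 3, 3, 2, 3, 8, 4, 3, 7, 2]
print(Counter(data)[3])  # 4 (the value 3 occurs four times)
print(modes(data))       # [3]
```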
FREQUENCY DISTRIBUTION OF MODE
[Figure: frequency distribution (vertical axis: N) with the mode marked at the highest frequency.]
IDEAL CENTRAL TENDENCY
- The mode is the most common value.
- The median is suited to data with extreme values.
- The mean is suited to a symmetric distribution.
FREQUENCY GRAPH
[Figure: frequency distribution (vertical axis: N) with Mean = 10.8, Median = 10 and Mode = 13.5 marked.]
DIFFERENCE BETWEEN MEAN, MEDIAN AND MODE
Mean:
- Represents the centre of gravity of the data set.
- Sensitive to extreme values.
- Most useful when the data are normally distributed.
Median:
- Represents the middle of the data set (half below and half above).
- Not sensitive to extreme values.
- Most useful when the data set is skewed or has a few extreme values in one direction.
Mode:
- Represents the most common value.
- A data set may have no mode, one mode or multiple modes.
- Not popular with bio-scientists.
MEASURES OF DISPERSION/VARIATION
1. Measure the scattering (variability) of data around a measure of central tendency (CT).
2. Give an idea about the homogeneity or heterogeneity of the distribution of data.
Absolute measures: Range, Mean Deviation (MD), Standard Deviation (SD), Quartile Deviation (QD).
Relative measures:
- Coefficient of range = (L - S) / (L + S), where L is the largest and S the smallest observation.
- Coefficient of MD = MD / M, where M is the mean or the median.
[Figure: a distribution annotated with its position (central tendency, CT) and its dispersion.]
NORMAL STATISTICAL CURVE
- Mean = 0; total area under the curve = 1.
- About 68% of the area lies within 1 SD of the mean, 95% within 2 SD and 99.7% within 3 SD.
- At the 95% confidence level there is a 5% probability that a value lies outside 2 SD, i.e. 1 in 20 samples. Therefore the probability P = 5/100 = 1/20 = 0.05.
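The 68%, 95% and 99.7% figures can be reproduced from the standard normal curve using the error function in Python's `math` module. This is a small sketch; the function name `area_within` is illustrative.

```python
import math


def area_within(k):
    """Area under the standard normal curve between -k SD and +k SD."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    inside = area_within(k)
    print(k, round(inside * 100, 1), round(1 - inside, 3))
# 1 68.3 0.317
# 2 95.4 0.046
# 3 99.7 0.003
```

The area within exactly 2 SD is about 95.4%; the conventional 95% (P = 0.05) corresponds more precisely to 1.96 SD, which is usually rounded to 2.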
NORMAL STATISTICAL DISTRIBUTION
MEASURES OF DISPERSION
- Range
- Mean deviation
- Standard deviation
MEASURES OF DISPERSION
These measure the range of the variable and the deviation of the observations from the mean.

S. No.   Diastolic pr. (n)   Mean (m)   Deviation from mean (n - m)   (n - m)^2
1        83                  81         +2                            4
2        75                  81         -6                            36
3        81                  81         0                             0
4        79                  81         -2                            4
5        71                  81         -10                           100
6        95                  81         +14                           196
7        75                  81         -6                            36
8        77                  81         -4                            16
9        84                  81         +3                            9
10       90                  81         +9                            81
Total    810                 81         56 (ignoring signs)           482

Mean deviation = 56 / 10 = 5.6
Standard deviation:
$SD = \sqrt{\frac{\sum (n - m)^2}{N - 1}} = \sqrt{\frac{482}{10 - 1}} \approx 7.32$
When the number of observations (N) is less than 30, one is deducted from the number of observations in the denominator.
Range = 71 to 95
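The table can be recomputed in Python as a check on the arithmetic. This is a minimal sketch using the ten diastolic pressure readings from the slide; the variable names are illustrative.

```python
import math

# Diastolic pressure readings copied from the slide.
readings = [83, 75, 81, 79, 71, 95, 75, 77, 84, 90]

count = len(readings)                     # N = 10
mean = sum(readings) / count              # 810 / 10 = 81.0

# Mean deviation: average of the absolute deviations from the mean.
mean_deviation = sum(abs(x - mean) for x in readings) / count

# Standard deviation with N - 1 in the denominator (sample size below 30).
sum_of_squares = sum((x - mean) ** 2 for x in readings)
standard_deviation = math.sqrt(sum_of_squares / (count - 1))

print(mean)                          # 81.0
print(mean_deviation)                # 5.6
print(sum_of_squares)                # 482.0
print(round(standard_deviation, 2))  # 7.32
print(min(readings), max(readings))  # 71 95
```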
INTERPRETATION (STANDARD ERROR)
Standard error is a measure that makes it possible to judge whether the mean of a given sample lies within the set confidence limits of the population mean or not.
Tests covered under standard error:
- Standard error of mean
- Standard error of difference between two means
- Standard error of proportion
- Standard error of difference between two proportions
- Chi-square test
STANDARD ERROR OF MEAN
- Different samples from the same population, taken at different times, will produce different means (m).
- The frequency distribution of all the sample means is a normal distribution.
- The mean of the sample means matches the population mean (µ).
- 95% of the sample means lie within 2 standard errors on either side of the true (population) mean.
Standard error of mean: $SE = \frac{\sigma}{\sqrt{n}}$ (n = sample size), so the 95% limits are $\mu \pm \frac{2\sigma}{\sqrt{n}}$.
EXAMPLE
Take a random sample of 25 males aged 12 years. The mean height is 50" and the SD is 0.6".
S.E. of mean = SD / √n [n = sample size]
= 0.6 / √25
= 0.6 / 5
= 0.12
Limits at the 95% confidence level = 50" ± (2 × 0.12)
= 50" ± 0.24
= 49.76" to 50.24"
i.e. the chance that the population mean lies outside these limits is 1 in 20.
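The same example as a short Python sketch. The variable names are illustrative, and the multiplier 2 follows the slide's rounding of 1.96.

```python
import math

sample_mean = 50.0   # mean height in inches
sample_sd = 0.6      # standard deviation in inches
sample_size = 25     # number of boys in the sample

# Standard error of the mean: SD divided by the square root of the sample size.
standard_error = sample_sd / math.sqrt(sample_size)

# 95% confidence limits, using 2 standard errors on either side of the mean.
lower = sample_mean - 2 * standard_error
upper = sample_mean + 2 * standard_error

print(round(standard_error, 2))          # 0.12
print(round(lower, 2), round(upper, 2))  # 49.76 50.24
```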
STANDARD ERROR OF PROPORTION
p = proportion of males = 52%
q = proportion of females = 48%
n = size of the sample = 100
$SE(p) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{52 \times 48}{100}} = \sqrt{\frac{2496}{100}} = \sqrt{24.96} \approx 5$
95% limits: 52 + 2(5) = 62 and 52 - 2(5) = 42
In a random sample of 100 the proportion of males is 40%, while in the population it is 52%.
Relative deviate = (52 - 40) / 5 = 2.4, which is more than 2; hence the deviation is significant.
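A sketch of the same calculation in Python, taking p = 52 as the population percentage of males (as the relative-deviate step on the slide does) and 40 as the percentage observed in the sample. The variable names are illustrative.

```python
import math

p = 52.0          # percentage of males in the population
q = 100.0 - p     # percentage of females (48)
n = 100           # sample size
observed = 40.0   # percentage of males found in the sample

# Standard error of a proportion expressed in percent.
se_proportion = math.sqrt(p * q / n)   # sqrt(24.96), roughly 5

# 95% limits: two standard errors on either side of p.
lower = p - 2 * se_proportion
upper = p + 2 * se_proportion

# Relative deviate: how many standard errors the sample lies from p.
relative_deviate = (p - observed) / se_proportion

print(round(se_proportion, 2))     # 5.0
print(round(lower), round(upper))  # 42 62
print(round(relative_deviate, 1))  # 2.4
print(relative_deviate > 2)        # True, so the deviation is significant
```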
CORRELATION
Correlation measures the extent of the relation between two related variables, such as height and weight, or income and expenditure.
If the variables are taken as x and y, with deviations from their means $x - \bar{x}$ and $y - \bar{y}$, the coefficient of correlation r is
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$
The correlation coefficient lies between -1 and +1. A value nearer to +1 indicates a strong positive relation between the variables (and a value nearer to -1 a strong negative one), but correlation cannot establish a cause-and-effect relation.
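The coefficient can be computed directly from the deviations from the means. The sketch below uses made-up height and weight values purely for illustration (they are not from the slides), and the function name `correlation_coefficient` is likewise an assumption.

```python
import math


def correlation_coefficient(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical data: heights in cm and weights in kg of five people.
heights = [150, 155, 160, 165, 170]
weights = [50, 54, 59, 63, 70]

r = correlation_coefficient(heights, weights)
print(round(r, 2))  # close to +1: a strong positive relation
```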
REGRESSION
Regression describes how a dependent variable changes with an independent variable, so that the relation between the two can be studied in terms of cause and effect.
Given the value of the independent variable x, the value of the dependent variable y can be obtained from the formula
$y = \bar{y} + b(x - \bar{x})$
The regression coefficient b of y upon x is calculated as
$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
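The regression formula can be sketched in Python as follows. The height and weight values are hypothetical illustration data, not from the slides, and the function names are assumptions.

```python
def regression_coefficient(xs, ys):
    """Regression coefficient b of y upon x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    return sum_xy / sum_xx


def predict(x, xs, ys):
    """Estimate y for a given x using y = mean_y + b * (x - mean_x)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return mean_y + regression_coefficient(xs, ys) * (x - mean_x)


# Hypothetical data: heights in cm (independent) and weights in kg (dependent).
heights = [150, 155, 160, 165, 170]
weights = [50, 54, 59, 63, 70]

b = regression_coefficient(heights, weights)
print(round(b, 2))                               # 0.98 kg per cm
print(round(predict(162, heights, weights), 2))  # 61.16
```

At x equal to the mean height (160 cm) the prediction is simply the mean weight, since the term b(x - x̄) vanishes.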
SAMPLING
When a large population is to be studied, a sample has to be taken which is optimum, i.e. small enough to make the study feasible but large enough to make it scientifically or statistically valid.
Sample size:
$n = \frac{z^2 p q}{d^2}$
n = size of the sample
z = standard normal deviate for the chosen confidence level: 1.96 for 95%, 2.58 for 99%
p = prior estimate of the proportion, e.g. 50%
q = 100 - p (when p is in percent)
d = error to be tolerated in the sample, e.g. 5% (i.e. 0.05)
p, q and d must all be expressed in the same units, either all in percent or all as fractions.
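A sketch of the sample-size formula in Python, with p and d expressed as fractions (0.5 and 0.05) so that q = 1 - p. The function name `required_sample_size` is illustrative, and rounding up to the next whole subject is an added assumption.

```python
import math


def required_sample_size(z, p, d):
    """n = z^2 * p * q / d^2, with p and d given as fractions and q = 1 - p."""
    q = 1 - p
    n = (z ** 2) * p * q / (d ** 2)
    return math.ceil(n)  # round up: a fraction of a subject cannot be sampled


# 95% confidence (z = 1.96), prior proportion 50%, tolerated error 5%.
print(required_sample_size(1.96, 0.5, 0.05))  # 385
```

Using p = 50% gives the largest value of p × q and therefore the most conservative (largest) sample size when no better prior estimate is available.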
SAMPLING METHODS
Random sampling:
- Simple
- Restricted: systematic, stratified, multistage
Non-random sampling:
- Judgment (when the number of units is small)
- Convenience (units that are easy to approach)
SAMPLING ERROR
If repeated samples are taken from the same population, the result obtained from one sample will differ from the other because of:
- the size of the sample
- the variability of the individual readings
Standard error: though the different samples will have different means, the distribution of the sample means will be a normal distribution around the population mean, i.e. M ± SD.
Standard error of mean: population mean = sample mean ± 2 standard errors of mean at 95% confidence, i.e. p < 0.05.
Standard error of proportion:
$SE(p) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{52 \times 48}{100}} = \sqrt{\frac{2496}{100}} = \sqrt{24.96} \approx 5$
95% limits: 52 + 2(5) = 62 and 52 - 2(5) = 42
p = proportion of males = 52%
q = proportion of females = 48%
n = size of the sample = 100
In a random sample of 100 the proportion of males is 40%, while in the population it is 52%.
Relative deviate = (52 - 40) / 5 = 2.4, which is more than 2; hence the deviation is significant and not acceptable.
Hospital Administration Made Easy (http//hospiad.blogspot.com): an effort solely to help students and aspirants in their attempt to become successful Hospital Administrators. DR. N. C. DAS
