Bio statistics

WHAT IS STATISTICS
Statistics is a branch of applied mathematics concerned with the collection, compilation and organisation of data so as to transform them into information. The information so obtained is used for drawing inferences and in planning, management and research.

APPLICATION OF STATISTICS
Population study
Methods of data reduction/correction
Variation study

BIO-STATISTICS
Application of statistical methods to living organisms, such as medical, biological and public health related problems.
Bio-statistics has two branches: descriptive and inferential.

BIO-STATISTICS: DESCRIPTIVE AND INFERENTIAL
Descriptive:
Observes all the subjects in the population.
Used in small population groups.
Resources are limited.
Compiles data into tables and graphs to give them meaning and turn them into information.
The findings are applicable only to the population studied.
It is expressed in numbers and shows a pattern.
Inferential:
Observes a part (sample) of the population.
Used when the population group is very large.
Resources are adequate.
The information drawn from the sample is valid for the population.
The findings from the sample are applicable to the whole population.
It is expressed through rates/ratios, averages and dispersion.

COMPONENTS OF BIO-STATISTICS
Rates and ratios
Statistical averages
Dispersion
Correlation and regression
Sampling
Interpretation

STATISTICAL AVERAGES
Statistical averages are measures of central tendency. When repeated samples are taken from the same population, the value differs from sample to sample. The objective of statistical analysis is to arrive at one numeric value that represents the inherent characteristic of the entire population under study. This value is called the central tendency, and it can be measured by various methods.

MEASURES OF CENTRAL TENDENCY
Mean: the aggregate (sum) of the observations divided by the number of observations.
Median: the midpoint value of the observations when they are arranged in order.
Mode: the most frequently occurring value among the observations.

BIO-STATISTICS: MEASURES OF CENTRAL TENDENCY (AVERAGES)
The objective of statistical analysis is to arrive at one numeric value which represents the inherent characteristic of the entire population under study. This central tendency can be measured in three ways.
a) Arithmetic mean (average): $M = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N}$, where N is the number of observations.
b) Median: a positional average whose value depends on the central position it occupies in the ordered frequency distribution. The median is the value at position $\frac{N + 1}{2}$. When the total number of observations is even, the median is the mean of the two central values.
c) Mode: the most frequently occurring value in the distribution.

MEAN (ARITHMETIC MEAN / AVERAGE)
Most commonly used measure of location.
Calculated by adding all observed values and dividing by the total number of observations.
Each observation is denoted as $x_1, x_2, \dots, x_n$.
The total number of observations is n.
The summation process is denoted by sigma ($\Sigma$): mean $\bar{x} = \frac{\sum x}{n}$

Computation of the mean
Duration of stay in days in a hospital: 8, 25, 7, 5, 8, 3, 10, 12, 9
9 observations (n = 9); sum of all observations = 87
Mean duration of stay = 87 / 9 = 9.67 days
Incubation period in days of a disease: 8, 45, 7, 5, 8, 3, 10, 12, 9
9 observations (n = 9); sum of all observations = 107
Mean incubation period = 107 / 9 = 11.89 days
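The two computations above can be reproduced with a few lines of Python. This is only an illustrative sketch; the function and variable names are chosen here and do not come from the slides.

```python
# Data taken from the two worked examples on this slide.
stay_days = [8, 25, 7, 5, 8, 3, 10, 12, 9]
incubation_days = [8, 45, 7, 5, 8, 3, 10, 12, 9]


def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


mean_stay = arithmetic_mean(stay_days)
mean_incubation = arithmetic_mean(incubation_days)

print(sum(stay_days), round(mean_stay, 2))              # 87 9.67
print(sum(incubation_days), round(mean_incubation, 2))  # 107 11.89
```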

Advantages and disadvantages of the mean
Advantages:
Has a lot of good properties.
Used as the basis of many statistical tests.
Good summary statistic for a symmetrical distribution.
Disadvantages:
Less useful for an asymmetric distribution.
Can be distorted by outliers, therefore giving a less "typical" value.

The median is literally the middle value of the data.
It is defined as the value above and below which half (50%) of the observations fall.
If the median height of women in a village is 160 cm, it means that 50% of the women in the village are taller than 160 cm.

1. Arrange the observations in order from smallest to largest (ascending order) or vice versa.
2. Count the number of observations, n.
3. If n is an odd number, median = value of the (n + 1)/2-th observation.
4. If n is an even number, median = the average of the (n/2)-th and (n/2 + 1)-th observations.

What is the median of the following values: 10, 20, 12, 3, 18, 16, 14, 25, 2?
Arrange the numbers in increasing order: 2, 3, 10, 12, 14, 16, 18, 20, 25
n = 9 (odd), so the median is the (n + 1)/2-th observation.
Median = (9 + 1)/2 = 5th observation = 14
Suppose there is one more observation (8): 2, 3, 8, 10, 12, 14, 16, 18, 20, 25
n = 10 (even), so the median is the average of the (10/2)-th and (10/2 + 1)-th observations.
Median = mean of 12 and 14 = 13
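The odd and even cases of the median can be sketched in Python as follows. The function name is illustrative, and Python's 0-based indexing means the (n + 1)/2-th observation sits at index n // 2.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the (n + 1)/2-th observation (1-based) is index n // 2 (0-based).
        return ordered[mid]
    # Even n: average of the (n/2)-th and (n/2 + 1)-th observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


values = [10, 20, 12, 3, 18, 16, 14, 25, 2]
print(median(values))        # 14
print(median(values + [8]))  # 13.0
```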

The median is not sensitive to extreme values.
[Figure: two data sets with the same median.]
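One way to see this, as a rough sketch, is to reuse the two data sets from the mean computation slide, which differ only in one extreme value (25 versus 45). Python's standard `statistics` module is used here; the variable names are illustrative.

```python
from statistics import mean, median

# The two data sets differ only in their largest value (25 versus 45).
stay_days = [8, 25, 7, 5, 8, 3, 10, 12, 9]
incubation_days = [8, 45, 7, 5, 8, 3, 10, 12, 9]

# The median is the same for both data sets.
print(median(stay_days), median(incubation_days))  # 8 8

# The mean is pulled upwards by the extreme value.
print(round(mean(stay_days), 2), round(mean(incubation_days), 2))  # 9.67 11.89
```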

ADVANTAGES AND DISADVANTAGES OF THE MEDIAN
Advantages:
The median is unaffected by extreme values.
Disadvantages:
The median does not contain information on the other values of the distribution; it is selected only by its rank.
The median is less amenable to statistical tests.

The mode of a distribution is the value that is observed most frequently in a given set of data.
How to obtain it?
1. Arrange the data in sequence from low to high.
2. Count the number of times each value occurs.
3. The most frequently occurring value is the mode.
OR
Prepare a frequency table: the observation with the highest frequency is the mode.

EXAMPLES OF MODE
4, 3, 3, 2, 3, 8, 4, 3, 7, 2
Arranging the values in order: 2, 2, 3, 3, 3, 3, 4, 4, 7, 8
The mode is 3, which occurs the highest number of times.
(Figure axis label: annual salary, in 10,000 rupees.)
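The frequency-table approach can be sketched with `collections.Counter` from the Python standard library. Returning a list of modes is a choice made for this sketch, so that data sets with several modes are also handled.

```python
from collections import Counter

values = [4, 3, 3, 2, 3, 8, 4, 3, 7, 2]

# Frequency table: value -> number of occurrences.
frequency = Counter(values)
highest = max(frequency.values())

# Every value that reaches the highest frequency is a mode.
modes = sorted(v for v, count in frequency.items() if count == highest)

print(modes)    # [3]
print(highest)  # 4
```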

FREQUENCY DISTRIBUTION OF MODE
[Figure: frequency distribution (vertical axis N, 0 to 20) with the mode marked at the highest frequency.]

IDEAL CENTRAL TENDENCY
The mode is the most common value.
The median is suited to data with extreme values.
The mean is suited to symmetric distributions.

FREQUENCY GRAPH
[Figure: frequency graph (vertical axis N) of a data set with Mean = 10.8, Median = 10 and Mode = 13.5.]

DIFFERENCE
Mean:
Represents the centre of gravity of the data set.
Sensitive to extreme values.
Most useful when the data are normally distributed.
Median:
Represents the middle of the data set (half below and half above).
Not sensitive to extreme values.
Most useful when the data set is skewed or has a few extreme values in one direction.
Mode:
Represents the most common value.
A data set may have no mode, one mode or multiple modes.
Not popular with bio-scientists.

MEASURES OF DISPERSION/VARIATIONS
1. Measures the scattering/variability of data around a measure of central tendency (CT).
2. Gives an idea about the homogeneity or heterogeneity of the distribution of data.
Absolute measures: Range, Mean Deviation (MD), Standard Deviation (SD), Quartile Deviation (QD).
Relative measures:
Coefficient of range = $\frac{L - S}{L + S}$ (where L = largest value and S = smallest value)
Coefficient of MD = $\frac{MD}{M}$ (where M = mean or median)

NORMAL STATISTICAL CURVE
[Figure: normal curve with mean 0 and total area 1; about 68% of the area lies within 1 SD of the mean, 95% within 2 SD and 99.7% within 3 SD.]
At the 95% confidence level there is a 5% probability that a value lies more than 2 SD from the mean, i.e. 1 in 20 samples; therefore the probability P = 5/100 = 1/20 = 0.05.
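The 68%, 95% and 99.7% figures can be checked numerically. The sketch below assumes an exactly normal distribution and uses the standard identity that the area within k SD of the mean equals erf(k/√2); the function name is illustrative. Note that 2 SD is a rounded value: exactly 95% of the area lies within about 1.96 SD.

```python
import math


def area_within(k):
    """Fraction of a normal distribution lying within k SD of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(area_within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7

# Probability of falling outside 2 SD, roughly 1 in 20.
p_outside = 1 - area_within(2)
print(round(p_outside, 3))  # 0.046
```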

NORMAL STATISTICAL DISTRIBUTION

MEASURES OF DISPERSION
Range
Mean deviation
Standard deviation

Measures of Dispersion
These measure the range of the values and their deviation from the mean.

S. No.   Diastolic pr. (n)   Mean (m)   Deviation from mean (n - m)   (n - m)^2
1        83                  81         +2                            4
2        75                  81         -6                            36
3        81                  81         0                             0
4        79                  81         -2                            4
5        71                  81         -10                           100
6        95                  81         +14                           196
7        75                  81         -6                            36
8        77                  81         -4                            16
9        84                  81         +3                            9
10       90                  81         +9                            81
Total    810                 81         56 (ignoring signs)           482

Mean deviation = 56 / 10 = 5.6
Standard deviation = $\sqrt{\frac{\sum (n - m)^2}{N - 1}} = \sqrt{\frac{482}{10 - 1}} \approx 7.32$
When the number of observations (N) is less than 30, one is deducted from N in the denominator.
Range = 71 to 95 (95 - 71 = 24)
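The table totals can be recomputed with a short Python sketch. The variable names are illustrative; the standard deviation divides by N - 1, as the slide prescribes for samples of fewer than 30 observations.

```python
import math

# Diastolic pressure readings from the table.
pressures = [83, 75, 81, 79, 71, 95, 75, 77, 84, 90]
count = len(pressures)

mean = sum(pressures) / count                              # 810 / 10
value_range = max(pressures) - min(pressures)              # 95 - 71
abs_deviations = sum(abs(x - mean) for x in pressures)     # signs ignored
squared_deviations = sum((x - mean) ** 2 for x in pressures)

mean_deviation = abs_deviations / count
standard_deviation = math.sqrt(squared_deviations / (count - 1))

print(mean, value_range)                  # 81.0 24
print(abs_deviations, squared_deviations) # 56.0 482.0
print(mean_deviation)                     # 5.6
print(round(standard_deviation, 2))       # 7.32
```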

INTERPRETATION (STANDARD ERROR)
Standard error is a measure that makes it possible to judge whether the mean of a given sample lies within the set confidence limits of the population mean.
Standard error of mean
Standard error of difference between two means
Standard error of proportion
Standard error of difference between two proportions
Chi-square test

STANDARD ERROR OF MEAN
Different samples from the same population, taken at different times, will produce different means (m).
The frequency distribution of all the sample means is a normal distribution, and the mean of the sample means matches the population mean (µ).
About 95% of the sample means lie within 2 standard errors on either side of the true (population) mean.
Standard error of mean = $\frac{\sigma}{\sqrt{n}}$ (n = sample size), so the 95% limits are $\mu \pm \frac{2\sigma}{\sqrt{n}}$.

EXAMPLE
Take a random sample of 25 males aged 12 years. The mean height is 50" and the SD is 0.6".
S.E. of mean = SD / √n (n = sample size) = 0.6 / √25 = 0.6 / 5 = 0.12
At the 95% confidence level, population mean = 50" ± (2 × 0.12) = 50" ± 0.24 = 49.76" to 50.24"
i.e. the chance that the population mean lies outside these limits is 1 in 20.
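The same calculation as a minimal Python sketch. The multiplier 2 follows the slide's rounding of the 95% level (a more exact value is 1.96), and the variable names are illustrative.

```python
import math

sample_mean = 50.0  # mean height in inches
sd = 0.6            # standard deviation in inches
n = 25              # sample size

standard_error = sd / math.sqrt(n)

# 95% limits using the slide's rounded multiplier of 2.
lower = sample_mean - 2 * standard_error
upper = sample_mean + 2 * standard_error

print(round(standard_error, 2))          # 0.12
print(round(lower, 2), round(upper, 2))  # 49.76 50.24
```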

Standard Error of proportion:
p = proportion of males in the population = 52%
q = proportion of females in the population = 48%
n = size of the sample = 100
$SE(p) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{52 \times 48}{100}} = \sqrt{\frac{2496}{100}} = \sqrt{24.96} \approx 5$
95% limits: 52 + 2(5) = 62 and 52 - 2(5) = 42
In a random sample of 100 the proportion of males is 40%, while in the population it is 52%.
Relative deviate = (52 - 40) / 5 = 2.4, which is more than 2; hence the deviation is significant.
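A short Python sketch of this calculation, working in percentages as the slide does. The variable names are illustrative, and the unrounded standard error (about 4.996) is used, which gives the same conclusion as the rounded value of 5.

```python
import math

p = 52          # percentage of males in the population
q = 100 - p     # percentage of females
n = 100         # sample size
observed = 40   # percentage of males found in the sample

standard_error = math.sqrt(p * q / n)   # sqrt(24.96)
lower = p - 2 * standard_error
upper = p + 2 * standard_error

# How many standard errors the sample lies from the population value.
relative_deviate = (p - observed) / standard_error
significant = relative_deviate > 2

print(round(standard_error, 2))       # 5.0
print(round(lower), round(upper))     # 42 62
print(round(relative_deviate, 1))     # 2.4
print(significant)                    # True
```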

Correlation: It measures the extent of the relationship between two related variables, such as height and weight, or income and expenditure. If the variables are taken as x and y, with $x - \bar{x}$ and $y - \bar{y}$ their deviations from the respective means, the coefficient of correlation r is
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$
The correlation coefficient lies between -1 and +1. A value nearer to +1 or -1 indicates a stronger relationship between the variables, but correlation cannot establish a cause-and-effect relationship.
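The coefficient of correlation can be computed from the deviations about the two means, as in this sketch. The height and weight figures are hypothetical values invented for illustration and do not come from the slides; the function name is also illustrative.

```python
import math


def pearson_r(xs, ys):
    """Coefficient of correlation r from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical heights (cm) and weights (kg), for illustration only.
heights = [150, 160, 170, 180, 190]
weights = [50, 58, 65, 74, 80]

r = pearson_r(heights, weights)
print(round(r, 3))  # close to +1: a strong positive relationship
```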

Regression: It makes the study of a cause-and-effect relationship between a dependent and an independent variable possible. Given the value of the independent variable x, the value of the dependent variable y can be obtained by the formula
$y = \bar{y} + b(x - \bar{x})$
The regression coefficient b of y upon x can be calculated as
$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
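A minimal sketch of the regression of y upon x in Python. The data are hypothetical heights and weights invented for illustration, and the function names are not from the slides.

```python
def regression_coefficient(xs, ys):
    """Regression coefficient b of y upon x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    return sum_xy / sum_xx


def predict(x, xs, ys):
    """Estimate y for a given x: mean of y plus b times (x minus mean of x)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return mean_y + regression_coefficient(xs, ys) * (x - mean_x)


# Hypothetical heights (cm) and weights (kg), for illustration only.
heights = [150, 160, 170, 180, 190]
weights = [50, 58, 65, 74, 80]

b = regression_coefficient(heights, weights)
print(round(b, 2))                               # 0.76 kg per cm
print(round(predict(175, heights, weights), 1))  # 69.2
```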

SAMPLING
When a large population is to be studied, a sample has to be taken which is optimum, i.e. small enough to make the study feasible but large enough to make it scientifically and statistically valid.
Sample size: $n = \frac{z^2 pq}{d^2}$
n = size of the sample
z = standard normal deviate for the confidence level: 1.96 for 95%, 2.58 for 99%
p = prior estimate of the proportion, e.g. 50% (0.5)
q = 100 - p when p is a percentage (1 - p when p is a fraction)
d = error to be tolerated in the sample, e.g. 5% (0.05), expressed on the same scale as p
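As a worked example of the sample size formula, the sketch below assumes p and d are given as fractions (0.5 and 0.05) and rounds the result up to the next whole subject. Both choices, and the function name, are assumptions made for this illustration.

```python
import math


def sample_size(p, d, z=1.96):
    """Sample size n = z^2 * p * q / d^2, with p and d given as fractions."""
    q = 1 - p
    return math.ceil(z ** 2 * p * q / d ** 2)


# 95% confidence, prior proportion 50%, tolerated error 5%.
print(sample_size(0.5, 0.05))  # 385

# Halving the tolerated error roughly quadruples the required sample.
print(sample_size(0.5, 0.025))
```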

SAMPLING METHODS
Random sampling:
Simple
Restricted: systematic, stratified, multistage
Non-random sampling:
Judgment (when the number of units is small)
Convenience (units that are easy to approach)

Sampling Error: If repeated samples are taken from the same population, the result obtained from one sample will differ from another because of:
1. the size of the sample;
2. the variability of the individual readings.
Standard Error: Though different samples will have different means, the sample means are normally distributed around the population mean, i.e. M ± SD, where SD is the standard deviation of the sample means.
Standard Error of Mean: Population mean = sample mean ± 2 × standard error of mean at 95% confidence, i.e. p < 0.05.
Standard Error of proportion:
p = proportion of males in the population = 52%
q = proportion of females in the population = 48%
n = size of the sample = 100
$SE(p) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{52 \times 48}{100}} = \sqrt{\frac{2496}{100}} = \sqrt{24.96} \approx 5$
95% limits: 52 + 2(5) = 62 and 52 - 2(5) = 42
In a random sample of 100 the proportion of males is 40%, while in the population it is 52%.
Relative deviate = (52 - 40) / 5 = 2.4, which is more than 2; hence the deviation is significant and not acceptable.

Hospital Administration Made Easy
http//hospiad.blogspot.com
An effort solely to help students and aspirants in their attempt to become a successful Hospital Administrator.
DR. N. C. DAS
