# Statistics

I INTRODUCTION

Statistics, branch of mathematics that deals with the collection, organization, and analysis of numerical data and with such problems as experiment design and decision making.

Published on 12/05/2013

## Document excerpt

« Professional pollsters typically conduct their surveys among sample populations of 1,000 people. Statistical measurements show that reductions in the margin of error flatten out considerably after the sample size reaches 1,000. © Microsoft Corporation. All Rights Reserved.

The raw materials of statistics are sets of numbers obtained from enumerations or measurements.

In collecting statistical data, adequate precautions must be taken to secure complete and accurate information. The first problem of the statistician is to determine what and how much data to collect.
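The question of how much data to collect, and the caption's remark that the margin of error flattens out around a sample of 1,000, can be illustrated with a short calculation. This is a minimal sketch, not the pollsters' actual method: it assumes a simple random sample, a 95 percent confidence level (z = 1.96), and the worst-case proportion p = 0.5. The function name `margin_of_error` is chosen here for illustration.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample of size n; p = 0.5 is the worst case
    and z = 1.96 corresponds to a 95 percent confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Print the margin of error, in percentage points, for growing sample sizes.
for n in (100, 250, 500, 1000, 2000, 4000):
    print(f"n = {n:>4}: +/- {100 * margin_of_error(n):.1f} percentage points")
```

Under these assumptions, growing the sample from 100 to 1,000 cuts the margin from about 9.8 to about 3.1 percentage points, while quadrupling it again to 4,000 only brings it down to about 1.5.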

Actually, the problem of the census taker in obtaining an accurate and complete count of the population, like the problem of the physicist who wishes to count the number of molecule collisions per second in a given volume of gas under given conditions, is to decide the precise nature of the items to be counted.

The statistician faces a complex problem when, for example, he or she wishes to take a sample poll or straw vote.

It is no simple matter to gauge the size and constitution of the sample that will yield reasonably accurate predictions concerning the action of the total population.

In protracted studies to establish a physical, biological, or social law, the statistician may start with one set of data and gradually modify it in light of experience.

For example, in early studies of the growth of populations, future change in size of population was predicted by calculating the excess of births over deaths in any given period.

Population statisticians soon recognized that rate of increase ultimately depends on the number of births, regardless of the number of deaths, so they began to calculate future population growth on the basis of the number of births each year per 1000 population.

When predictions based on this method yielded inaccurate results, statisticians realized that other limiting factors exist in population growth.

Because the number of births possible depends on the number of women rather than the total population, and because women bear children during only part of their total lifetime, the basic datum used to calculate future population size is now the number of live births per 1000 females of childbearing age.

The predictive value of this basic datum can be further refined by combining it with other data on the percentage of women who remain childless because of choice or circumstance, sterility, contraception, death before the end of the childbearing period, and other limiting factors.
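The three successive measures described above differ only in what is counted and what it is divided by. The sketch below computes each one; all input figures are hypothetical and chosen only to make the arithmetic easy to follow, and the function names are illustrative rather than standard terminology.

```python
def excess_of_births_over_deaths(births, deaths):
    """Gross population growth over the period."""
    return births - deaths


def births_per_1000_population(births, population):
    """Births in the period per 1000 of the total population."""
    return 1000 * births / population


def births_per_1000_women_of_childbearing_age(live_births, women):
    """Live births in the period per 1000 females of childbearing age."""
    return 1000 * live_births / women


# Hypothetical figures for one year in an imaginary town.
population = 100_000
women_of_childbearing_age = 22_000
births = 1_500
deaths = 900

print(excess_of_births_over_deaths(births, deaths))            # 600
print(births_per_1000_population(births, population))          # 15.0
print(round(births_per_1000_women_of_childbearing_age(
    births, women_of_childbearing_age), 1))                    # 68.2
```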

The excess of births over deaths, therefore, is meaningful only as an indication of gross population growth over a definite period in the past; the number of births per 1000 population is meaningful only as an expression of the proportion of increase during a similar period; and the number of live births per 1000 women of childbearing age is meaningful for predicting future size of populations.

IV TABULATION AND PRESENTATION OF DATA

Frequency-Distribution Table

A frequency-distribution table summarizes data.

For example, there were 1200 grades received on 4 examinations by 10 sections of 30 students each. The first column lists the ten intervals into which the grades were grouped. The second column lists the midpoints of these intervals. The third column lists the number of grades in each interval, that is, their frequency. (There were 20 grades between 0 and 10.) The fourth column lists the proportion of grades in each interval, that is, their relative frequency. (0.017 of the 1200 grades were between 0 and 10.) The fifth column lists the number of grades in an interval and all intervals below it, that is, their cumulative frequency. (35 grades were in or below the interval between 10 and 20.) The sixth column lists the proportion of grades in or below an interval, that is, their relative cumulative frequency. (0.029 of the 1200 grades were in or below the interval 10 to 20.)

To study and interpret the examination-grade distribution in a class of 30 pupils, for instance, the grades are arranged in ascending order: 30, 35, 43, 52, 61, 65, 65, 65, 68, 70, 72, 72, 73, 75, 75, 76, 77, 78, 78, 80, 83, 85, 88, 88, 90, 91, 96, 97, 100, 100.

This progression shows at a glance that the maximum is 100, the minimum 30, and the range, or difference between the maximum and minimum, is 70.

In a cumulative-frequency graph, such as Fig. 1, the grades are marked on the horizontal axis, and the vertical axis carries two scales: the cumulative number of the grades on the left and the corresponding percentage of the total number on the right.

Each dot represents the accumulated number of students who have attained a particular grade or less.

For example, the dot A corresponds to the second 72; reading on the vertical axis, it is evident that 12 of the grades, or 40 percent, are equal to or less than 72.
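The figures quoted for the class of 30 pupils can be reproduced directly from the listed grades. The following is a small sketch; the helper name `cumulative_count` is an illustrative choice, not a term from the text.

```python
# The 30 grades from the text, already in ascending order.
grades = [30, 35, 43, 52, 61, 65, 65, 65, 68, 70,
          72, 72, 73, 75, 75, 76, 77, 78, 78, 80,
          83, 85, 88, 88, 90, 91, 96, 97, 100, 100]

lowest, highest = min(grades), max(grades)
grade_range = highest - lowest


def cumulative_count(data, value):
    """Number of observations equal to or less than value."""
    return sum(1 for x in data if x <= value)


count_72 = cumulative_count(grades, 72)
percent_72 = 100 * count_72 / len(grades)

print(lowest, highest, grade_range)  # 30 100 70
print(count_72, percent_72)          # 12 40.0
```

The second line of output corresponds to dot A: its height on the left-hand scale is 12, and on the right-hand scale 40 percent.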

In analyzing the grades received by 10 sections of 30 pupils each on four examinations, a total of 1200 grades, the amount of data is too large to be exhibited conveniently as in Fig. 1.

The statistician separates the data into suitably chosen groups, or intervals.

For example, ten intervals might be used to tabulate the 1200 grades, as in column (a) of the accompanying frequency-distribution table; the actual number in an interval, called the frequency of the interval, is entered in column (c).

The numbers that define the interval range are called the interval boundaries.

It is convenient to choose the interval boundaries so that the interval ranges are equal to each other and so that the interval midpoints, half the sum of the interval boundaries, are simple numbers, because the midpoints are used in many calculations.

A grade such as 87 will be tallied in the 80-90 interval; a boundary grade such as 90 may be tallied in either the lower or the upper interval, provided the same rule is applied uniformly throughout the groups.
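The tallying rule can be sketched in a few lines. The example below is illustrative only: it assumes ten equal intervals of width 10 covering 0 to 100, assigns every boundary grade to the upper interval (one of the two choices the text allows), and places the top boundary, 100, in the last interval. Because the 1200 individual grades are not given, it tallies the 30 grades listed earlier as a stand-in.

```python
def interval_index(grade, width=10, n_intervals=10):
    """Index of the interval a grade falls in (0 is the 0-10 interval).

    Assumed rule: a boundary grade such as 90 goes to the upper interval;
    the top boundary (100) is kept in the last interval.
    """
    return min(int(grade // width), n_intervals - 1)


def tally(data, width=10, n_intervals=10):
    """Frequency of each interval under the rule above."""
    counts = [0] * n_intervals
    for grade in data:
        counts[interval_index(grade, width, n_intervals)] += 1
    return counts


def midpoints(width=10, n_intervals=10):
    """Half the sum of the lower and upper boundary of each interval."""
    return [(i * width + (i + 1) * width) / 2 for i in range(n_intervals)]


grades = [30, 35, 43, 52, 61, 65, 65, 65, 68, 70,
          72, 72, 73, 75, 75, 76, 77, 78, 78, 80,
          83, 85, 88, 88, 90, 91, 96, 97, 100, 100]

print(interval_index(87))  # 8, the 80-90 interval
print(interval_index(90))  # 9, the upper interval under the assumed rule
print(midpoints())         # [5.0, 15.0, 25.0, ..., 95.0]
print(tally(grades))       # [0, 0, 0, 2, 1, 1, 5, 10, 5, 6]
```

Choosing the lower interval for boundary grades instead would be equally valid; what matters is that one rule is used for every interval.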

The relative frequency, column (d), is the ratio of the frequency of an interval to the total count; the relative frequency is multiplied by 100 to obtain the percent relative frequency.

The cumulative frequency, column (e), is the number of grades equal to or less than the upper boundary of each succeeding interval; thus, the number of grades of 30 or less is obtained by adding the frequencies in column (c) for the first three intervals, which total 53.

The cumulative relative frequency, column (f), is the ratio of the cumulative frequency to the total number of grades. »
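Columns (d), (e), and (f) all follow mechanically from the frequencies in column (c). As a rough sketch, the code below uses only the figures recoverable from the text: a total of 1200 grades and the first three frequencies, 20, 15, and 18, where 15 and 18 are inferred from the quoted cumulative totals of 35 and 53. The function name `frequency_columns` is an illustrative choice.

```python
from itertools import accumulate


def frequency_columns(frequencies, total):
    """Derive columns (d), (e), and (f) from the frequencies in column (c)."""
    relative = [f / total for f in frequencies]            # column (d)
    cumulative = list(accumulate(frequencies))             # column (e)
    relative_cumulative = [c / total for c in cumulative]  # column (f)
    return relative, cumulative, relative_cumulative


# First three intervals only; the remaining frequencies are not given in the text.
first_three = [20, 15, 18]
relative, cumulative, relative_cumulative = frequency_columns(first_three, total=1200)

print(round(relative[0], 3))             # 0.017
print(cumulative)                        # [20, 35, 53]
print(round(relative_cumulative[1], 3))  # 0.029
```

The total is passed in separately because only part of the table is available; with all ten frequencies it could simply be their sum.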
