The research objects of probability theory and mathematical statistics are random phenomena: phenomena whose results are not always the same under fixed conditions, i.e. phenomena whose outcomes cannot be determined in advance. Random phenomena abound in real life. For example, whether a given student in a given major of a school is admitted to graduate school is random: you cannot say in advance which student will be admitted to which school, but from the school's data in previous years you can estimate its postgraduate admission rate, and from that roughly estimate the possibility that a particular student is admitted. Of course, whether one student is admitted is not determined by the school's admission rate, since the outcome is random and uncertain, but there is a certain correlation. Probability theory studies models (probability distributions) of random phenomena; a probability distribution is a tool for describing the characteristics of a random phenomenon. Where there is yin there is yang: alongside random phenomena there are, naturally, deterministic phenomena (such as the sun rising and setting every day).
1.1.2 Sample space:
The set of all possible basic results of a random phenomenon is called the sample space, and the elements of this set are called sample points. When the number of sample points is finite or countable, the sample space is called discrete; when the number of sample points is uncountably infinite, it is called continuous. (To "list" the sample points means to write them out one by one in some order. For example, the number of people arriving at a shopping mall on a given day can be listed as the integers 1, 2, 3, .... By contrast, the lifetime of a TV set cannot be listed: after 100 hours one could consider 100.1 hours, 100.01 hours, 100.001 hours, and so on; there is no way to list, in order, a "next" element after 100.)
1.1.3 Random events:
A subset of the sample space is called a random event; that is, a random event is a collection of basic results of the random phenomenon, while a single-element subset of the sample space is called a basic event. The sample space itself is also an event, called the certain (inevitable) event, and the empty set, the smallest subset of the sample space, is called the impossible event.
1.1.4 Random variables:
Variables used to represent the results of random phenomena are called random variables, and the values of a random variable represent the results of random events. In fact, the results of random events can usually be put in correspondence with the values of a random variable.
1.1.5 Operations and relations between random events:
Since random events are defined as sets, operations between random events can be regarded as operations between sets. Intersection, union, complement, and difference exist between random events just as between sets, and the rules of operation are the same. The relations between events include inclusion, equality, incompatibility (mutual exclusivity), and complementarity (opposite events). The operations on random events satisfy the commutative, associative, and distributive laws, as well as De Morgan's laws.
1.1.6 Event domain:
The event domain is a class of subsets of the sample space that satisfies three conditions. For a sample space with n sample points, the largest event domain consists of all subsets of the sample space and has 2^n elements. Defining the event domain mainly prepares the ground for defining the probability of events.
The most basic problem in probability theory is how to determine the probability of a random event. Although the result of a random event is uncertain, its results exhibit a definite regularity (namely the probabilities of the random events), and the tool used to describe this regularity is probability. But how do we define probability? How do we measure the possibility of an event? That is the problem.
In the history of probability theory there were various definitions of probability for different kinds of random events, but each of those definitions applies only to a certain class of events. How, then, to give a general definition of probability applicable to all random phenomena? In 1900 the mathematician Hilbert proposed establishing an axiomatic definition of probability, i.e. a universal definition that covers all random events and captures the essence of probability. In 1933 the Soviet mathematician Andrey Kolmogorov put forward the axiomatic definition of probability for the first time; it both summarized the common features of the historical definitions and avoided their respective ambiguities. Whatever random phenomenon we consider, anything satisfying the three axioms in the definition can be called a probability. After it was published, the definition was recognized by almost all mathematicians. (An aside: when a mathematician makes a major discovery, a paper must be written and recognized by the academic community before the result enters the textbooks. It is called an axiom because it is both a universally applicable principle and an accepted starting point.)
1.2.1 The three axioms in the definition of probability:
Every random event comes attached to its sample space (just as behind some successful men there stands a wife). Every random event belongs to the event domain of its sample space. If the sample space is chosen differently, the probability of the "same" random event will usually differ.
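The three axioms themselves did not survive in this text; stated explicitly, the standard Kolmogorov axioms referred to here are:

```latex
\begin{aligned}
&\text{(1) Non-negativity: } P(A) \ge 0 \text{ for every event } A;\\
&\text{(2) Normalization: } P(\Omega) = 1 \text{ for the sample space } \Omega;\\
&\text{(3) Countable additivity: for pairwise incompatible events } A_1, A_2, \dots,\\
&\qquad P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i).
\end{aligned}
```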
If a set function satisfies the above three axioms it can be called a probability, and the triple consisting of the sample space, the event domain, and such a probability is called a probability space.
The axiomatic definition of probability does not give a method to calculate probability, so how to determine probability becomes another problem after knowing what probability is.
1.2.2 Frequency method for determining probability:
The frequency method for determining probability applies to random experiments that can be repeated a large number of times. The idea is to use the stable value of the frequency as an estimate of the probability, as follows:
Why think of using frequency to estimate probability? Long practical experience shows that as the number of experiments increases, the frequency stabilizes around some constant, called the stable value of the frequency. Later, Bernoulli's law of large numbers proved that this stable value is the probability of the random event, and it can be verified that frequency satisfies the three axioms of probability, so frequency is a kind of "pseudo-probability".
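To see this stabilization concretely, here is a small simulation sketch (the coin, the trial counts, and the function name are my own illustration, not from the original):

```python
import random

def running_frequency(n_trials, p=0.5, seed=42):
    """Flip a coin with success probability p n_trials times and return the
    relative frequency of successes, illustrating how frequency stabilizes
    near the true probability as the number of trials grows."""
    rng = random.Random(seed)
    successes = sum(1 for _ in range(n_trials) if rng.random() < p)
    return successes / n_trials

# As the number of trials grows, the frequency settles near p = 0.5.
for n in (100, 10_000, 1_000_000):
    print(n, running_frequency(n))
```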
1.2.4 Classical method for determining probability:
Classical problems are the earliest problems in the history of probability theory; the dice problems studied by Pascal, for instance, are classical problems. The classical method is simple and intuitive, and we can analyze such problems clearly by reasoning from empirical facts, without running a large number of experiments.
The idea of determining probability by classical method is as follows:
Obviously classical probability satisfies the three axioms, and the classical method is the oldest common way to determine a probability. Computing a classical probability boils down to counting the total number of sample points in the sample space and the number of sample points in the event, so permutations and combinations are the usual computational tools.
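As a worked sketch of this counting (the urn numbers are a hypothetical example, not from the original):

```python
from math import comb

# Classical probability: favorable outcomes / total outcomes.
# Hypothetical example: an urn holds 6 red and 4 white balls; draw 3 without
# replacement. Probability of exactly 2 red balls:
total = comb(10, 3)                  # all equally likely 3-ball samples: 120
favorable = comb(6, 2) * comb(4, 1)  # choose 2 of 6 reds and 1 of 4 whites: 60
p = favorable / total
print(p)  # 60 / 120 = 0.5
```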
1.2.5 Geometric method for determining probability:
Basic idea:
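The basic idea is that the probability of an event is proportional to its geometric measure (length, area, volume). A minimal sketch, using the classic quarter-disc example (my own illustration, not from the original):

```python
import random

def estimate_pi(n_points, seed=0):
    """Geometric probability: a point uniform on the unit square lands inside
    the quarter disc x^2 + y^2 <= 1 with probability pi/4 (the area ratio),
    so 4 * (fraction of points inside) estimates pi."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_points

print(estimate_pi(200_000))  # close to 3.1416
```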
1.2.6 Subjective method for determining probability:
In the real world, some random phenomena cannot be subjected to random experiments, or the cost of experimenting is so high that the game is not worth the candle. How is the probability to be determined then?
The Bayesian school of statistics holds that the probability of an event is a person's degree of belief, based on experience, in the possibility of the event, so a probability given this way is called a subjective probability. For example, I may say my probability of being admitted to graduate school is 100% (bragging, of course, but it also reflects self-confidence, my understanding of my own preparation, and my knowledge of the institutions I applied to). Or an entrepreneur may say that, based on years of experience and current market information, a new product has an 80% chance of selling well (if an acquaintance tells you this privately you might believe it, but be careful; if a stranger announces it in front of a crowd, you should be skeptical: if the prospects were really that good, why would he share the opportunity rather than keep it for himself?). Subjective probability is an estimate of the possibility of an event based on the actual situation, but the quality of such an estimate remains to be verified.
If you understand the ideas, you don't need to memorize the formulas. (I am a "very diligent" person, too lazy to copy out the rest of them.) Below we analyze only conditional probability, the total probability formula, and Bayes' formula:
1.3.1 Conditional probability:
The so-called conditional probability is the probability that A occurs given that event B has occurred. Let A and B be two events in the same sample space with P(B) > 0. Then P(A|B) = P(AB) / P(B) is the conditional probability of A given that B has occurred, called the conditional probability for short.
This formula is not difficult to understand. It says that "given that B has occurred, the probability of A equals the number of sample points shared by A and B divided by the number of sample points of B", and one can verify that conditional probability so defined satisfies the three axioms of probability.
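The counting interpretation can be checked by enumerating a small sample space; the two-dice events below are a hypothetical example:

```python
from itertools import product

# Conditional probability by counting sample points: roll two fair dice.
# A = {sum is 8}, B = {first die shows 3}. P(A|B) = |A and B| / |B|.
space = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes
B = [w for w in space if w[0] == 3]            # 6 outcomes
A_and_B = [w for w in B if sum(w) == 8]        # only (3, 5)
p_a_given_b = len(A_and_B) / len(B)
print(p_a_given_b)  # 1/6
```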
1.3.2 Multiplication formula:
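The formula itself did not survive in this text; the standard multiplication formula that the heading refers to is:

```latex
P(AB) = P(B)\,P(A\mid B), \qquad P(B) > 0,
```

and more generally, the chain form:

```latex
P(A_1 A_2 \cdots A_n) = P(A_1)\,P(A_2\mid A_1)\cdots P(A_n\mid A_1 A_2 \cdots A_{n-1}).
```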
1.3.3 Total probability formula:
Let B_1, B_2, ..., B_n be a partition of the sample space, i.e. the B_i are mutually incompatible and their union is the whole sample space. Then for any event A:

P(A) = P(B_1)P(A|B_1) + P(B_2)P(A|B_2) + ... + P(B_n)P(A|B_n).
This formula is also easy to understand: because the B_i are mutually incompatible and their union is the sample space, the sample points of event A are exactly the sample points A shares with each B_i, so the probability of A is the sum of the probabilities of the pieces A∩B_i.
1.3.4 Bayesian formula:
Bayes' formula is derived from the total probability formula and the multiplication formula.
If B_1, B_2, ..., B_n is a partition of the sample space (mutually incompatible, with union the whole space) and P(A) > 0, then:

P(B_i|A) = P(B_i)P(A|B_i) / [P(B_1)P(A|B_1) + ... + P(B_n)P(A|B_n)].
The proof starts from the definition of conditional probability, P(B_i|A) = P(AB_i) / P(A); the numerator is then rewritten with the multiplication formula and the denominator with the total probability formula. The probabilities P(B_i), known before the experiment, are called prior probabilities, while the P(B_i|A) are called posterior probabilities. The total probability formula and the multiplication formula reason from cause to effect; Bayes' formula reasons from the effect back to the cause.
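A small numerical sketch of Bayes' formula (the disease-testing numbers are hypothetical, not from the original):

```python
def bayes_posterior(prior, likelihoods):
    """Bayes' formula over a partition B_1..B_n:
    P(B_i|A) = P(B_i) P(A|B_i) / sum_j P(B_j) P(A|B_j).
    `prior` and `likelihoods` are parallel lists of P(B_i) and P(A|B_i)."""
    joint = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(joint)  # total probability formula: P(A)
    return [j / total for j in joint]

# Hypothetical numbers: 1% of a population is sick (B1); a test flags 95% of
# the sick and 5% of the healthy. Posterior of being sick given a flag:
post = bayes_posterior([0.01, 0.99], [0.95, 0.05])
print(post[0])  # about 0.161: the "cause" inferred from the "result"
```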
1.3.5 Event independence:
Above we introduced the concept of conditional probability: given that A has occurred, the probability that B occurs is P(B|A). What if the occurrence of B is not affected by A? Intuitively this means P(B|A) = P(B), i.e. P(AB) = P(A)P(B). So we introduce the following definition: two events A and B are mutually independent if P(AB) = P(A)P(B).
Besides the definition of independence for two events, there are of course definitions of independence for more events. For n events to be mutually independent, it is required not only that every pair of them is independent but that every sub-collection of the events satisfies the product formula.
1.3.6 Bernoulli scheme (n-fold Bernoulli trials):
Definition: if an experiment E has only two possible outcomes, and the experiment is repeated independently n times, the n trials form an n-fold Bernoulli experiment, or Bernoulli scheme. The outcomes of the individual trials are independent of each other, and the number of "successes" among the n trials follows the binomial distribution, which we now introduce.
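A sketch of the binomial probabilities that arise from an n-fold Bernoulli scheme (the parameters are chosen for illustration):

```python
from math import comb

def binom_pmf(n, p, k):
    """P(X = k) for X ~ Binomial(n, p): the probability of exactly k
    successes in n independent Bernoulli trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The probabilities over all possible k sum to 1, as a distribution must.
probs = [binom_pmf(10, 0.3, k) for k in range(11)]
print(sum(probs))
```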
1.4.1 Discrete random variables:
As said before, variables used to represent the results of random phenomena are called random variables; for example, a random variable may take the values 1, 2, 3, ... The results of a random experiment correspond one-to-one with the values of the random variable, so we study the statistical law of the experimental results through the statistical law of the random variable, a convention that is man-made but reasonable. A random variable is called discrete when it can take only finitely many values or a sequence of values.
1.4.2 Distribution list of a random variable:
List the values of the random variable together with their corresponding probabilities; such a table is called a distribution list. The distribution list makes the statistical law of the random variable clear at a glance, and it is convenient for calculating characteristic numbers such as the mean and variance. A distribution list satisfies the following two properties: (1) every probability is non-negative, p_i ≥ 0; (2) the probabilities sum to one, p_1 + p_2 + ... = 1. A list satisfying these two properties is a distribution list.
1.4.3 Distribution function:
Let X be a random variable. For any real number x, the function F(x) = P(X ≤ x) is called the distribution function of the random variable X.
The distribution function has the following three characteristic properties: (1) it is monotonically non-decreasing; (2) F(x) → 0 as x → −∞ and F(x) → 1 as x → +∞; (3) it is right-continuous. These properties are necessary and sufficient conditions for a function to be a distribution function.
1.4.4 Mathematical expectation and variance:
Let's look at an example. A watch factory randomly checks the daily timekeeping error of N = 100 watches from its products (the data table is omitted here). The average daily error of these 100 watches is the sum, over the observed error values, of each value times its frequency, where the frequency of an error value is the proportion of the 100 watches showing it.
The average is a sum of values times frequencies, so in theory frequency should be replaced by probability, since frequency stabilizes at the probability. The average obtained after replacing frequency by probability is called the mathematical expectation, E(X) = Σ x_i p_i (and by the law of large numbers, sample averages likewise stabilize around the mathematical expectation). The expectation reflects, to a certain extent, the average level of the random variable X, i.e. the overall size of its values.
Definition: let X be a random variable. If the mean of the squared deviation, E[(X − E(X))²], exists, it is called the variance of the random variable, written Var(X).
Evidently the variance is itself a mean, but of what? It is the mean of the squared deviations of the random variable from its expectation. Why squared? The sum of the signed deviations of a random variable from its mean is zero, so the mean of the deviations is also zero; if we measured spread by the average of raw deviations, every distribution would score zero. Squaring the deviations avoids this cancellation.

So what does the variance, as a number characterizing a distribution, actually mean? Many people finish a course in probability and statistics without ever understanding it. Variance describes the differences among the data. To describe differences, whether between vectors in space or points in a plane, distance is the natural tool. In physics, to compare the velocity or acceleration of two moving objects correctly we must choose a suitable reference frame; similarly, when comparing data we usually take the mean as the reference point (other reference values are possible, but they inflate the measured spread). The greater the distance from the mean, the greater the difference, and since deviations come with positive and negative signs, we square the distance from the mean. That is the origin of the concept of variance. The smaller the variance, the more concentrated the data; the larger the variance, the more dispersed the data. Variance is also used in finance to assess risk, such as the volatility of stock prices.
Of course, in that setting one hopes the stock price fluctuates less: the smaller the variance, the more stable the return.
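The definitions of expectation and variance for a distribution list can be sketched directly (the fair-die example is my own illustration, not from the original):

```python
def mean_and_variance(values, probs):
    """E(X) = sum of x_i * p_i ; Var(X) = sum of (x_i - E(X))^2 * p_i,
    for a discrete random variable given by its distribution list."""
    assert abs(sum(probs) - 1.0) < 1e-9  # probabilities must sum to 1
    mu = sum(x * p for x, p in zip(values, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return mu, var

# Fair six-sided die: mean 3.5, variance 35/12 (about 2.9167).
mu, var = mean_and_variance(range(1, 7), [1/6] * 6)
print(mu, var)
```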
Because the mean and variance describe some characteristics of random variables and their distribution, they are called characteristic numbers.
1.4.5 Density function of a continuous random variable:
The values of a continuous random variable may fill an entire interval, so its probability distribution can no longer be expressed by the rows of a distribution list; another tool is needed, namely the probability density function.
The origin of the probability density function: suppose a factory measures the lengths of its parts, and we stack the measured parts by length. The horizontal axis is length divided into unit intervals, and the vertical axis is the count of parts per unit interval. When there are many parts, a definite pattern emerges. To stabilize this picture we change the vertical axis to frequency per unit length; as the number of parts increases, frequency gradually stabilizes at probability, and the smaller the unit interval (given enough parts), the more stable the picture. As the unit length tends to zero, the picture becomes a smooth curve, and the ordinate changes from "probability per unit length" to "probability density at a point". The function describing this smooth curve is called the probability density function, written f(x); it expresses the statistical law that X is more likely to take values in some places and less likely in others.
Although the value of the probability density function is not itself a probability, multiplying it by an infinitesimal length gives an approximation of the probability of a small interval, that is, P(x < X ≤ x + dx) ≈ f(x) dx. The probability over an interval is then obtained by accumulating these differential elements, which is nothing more than integration over the interval: P(a < X ≤ b) = ∫_a^b f(x) dx.
From this the distribution function of X can be obtained, F(x) = ∫_{−∞}^{x} f(t) dt. For a continuous random variable, the integral of the density function is the distribution function, and the derivative of the distribution function is the density function.
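This density/distribution relation can be checked numerically; the exponential density, rate, and step count below are illustrative assumptions, not from the original:

```python
import math

def exp_pdf(x, lam=2.0):
    """Density of an exponential distribution: f(x) = lam * exp(-lam*x), x >= 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exp_cdf_numeric(x, lam=2.0, steps=100_000):
    """Distribution function obtained by integrating the density,
    F(x) = integral from 0 to x of f(t) dt, via the midpoint rule."""
    h = x / steps
    return sum(exp_pdf((i + 0.5) * h, lam) * h for i in range(steps))

# Compare with the closed form F(x) = 1 - exp(-lam*x):
x = 1.5
print(exp_cdf_numeric(x), 1 - math.exp(-2.0 * x))
```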
Basic properties of the density function: (1) non-negativity, f(x) ≥ 0; (2) normalization, ∫_{−∞}^{+∞} f(x) dx = 1.
1.4.6 Expectation and variance of a continuous random variable:
Let the density function of the random variable X be f(x).

Mathematical expectation: E(X) = ∫_{−∞}^{+∞} x f(x) dx.

Variance: Var(X) = ∫_{−∞}^{+∞} (x − E(X))² f(x) dx = E(X²) − (E(X))².
1.4.7 Chebyshev's inequality (Chebyshev, 1821-1894):
Let the mathematical expectation E(X) and the variance Var(X) of the random variable X exist. Then for any constant ε > 0:

P(|X − E(X)| ≥ ε) ≤ Var(X) / ε².
The reason for this formula is that the probability of the event {|X − E(X)| ≥ ε} ought to be related to the variance, which is understandable: the greater the variance, the more the values of X deviate from the mean, and the greater the probability that the deviation exceeds a given constant ε. The inequality shows that the upper bound on the probability of a large deviation is proportional to the variance: the greater the variance, the larger the upper bound.
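An empirical sanity check of Chebyshev's inequality on a fair die (my own illustration; note the bound is loose here, as it typically is):

```python
import random

# For X uniform on {1,...,6}: E(X) = 3.5, Var(X) = 35/12.
# Chebyshev: P(|X - E(X)| >= eps) <= Var(X) / eps^2.
mu, var = 3.5, 35 / 12
eps = 2.0
rng = random.Random(1)
n = 100_000
hits = sum(1 for _ in range(n) if abs(rng.randint(1, 6) - mu) >= eps)
empirical = hits / n          # true value is 2/6, from outcomes 1 and 6
bound = var / eps**2          # Chebyshev upper bound, about 0.729
print(empirical, bound)
```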
1.4.8 Common discrete distributions:
1.4.9 Common continuous distributions: