

Probability Distributions play an important role in our daily lives. We commonly use them when trying to summarise and gain insights from different forms of data.

Because of this, they're quite an important topic in fields such as Mathematics, Computer Science, Statistics, and Data Science.


There are two main types of data: Numerical (for example integers and floats), and Categorical (for example strings of text).

Numerical data can also be in either of two forms:


  • Discrete: this form of data can take only a countable set of values (like the number of clothes we own). We can infer probability mass functions from discrete data.

  • Continuous: on the other hand, continuous data describes measurements such as weight or distance, which can take any real value within a range. From continuous data we can instead infer probability density functions.

Probability mass functions give us the probability that a variable is equal to a certain value. The values of probability density functions, on the other hand, are not probabilities on their own: they must first be integrated over a range to give the probability that the variable falls within that range.
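
To make the difference concrete, here is a minimal sketch using `scipy.stats`. The Poisson rate (λ = 2) and the narrow normal distribution (mean 0, standard deviation 0.1) are arbitrary illustrative choices, not values from the original article. A PMF value is directly a probability, while a PDF value can even exceed 1 and only becomes a probability once integrated over an interval (done here through the CDF).

```python
from scipy import stats

# Discrete: a Poisson PMF value is itself a probability.
p_three = stats.poisson.pmf(3, 2.0)   # P(X = 3) when lambda = 2, about 0.180

# Continuous: a narrow normal distribution (illustrative parameters).
narrow = stats.norm(loc=0.0, scale=0.1)
density_at_zero = narrow.pdf(0.0)     # about 3.99: larger than 1, so not a probability

# Integrating the density over [-0.1, 0.1] (via the CDF) gives an actual probability.
p_interval = narrow.cdf(0.1) - narrow.cdf(-0.1)   # about 0.683

# Any single exact value of a continuous variable has probability 0.
p_exact = narrow.cdf(0.0) - narrow.cdf(0.0)
```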

What is a Poisson Distribution?

Poisson Distributions are commonly used for two main purposes:


  • Predicting how many times an event will take place within a chosen time period. This technique can be used for different risk analysis applications such as house insurance price estimation.

  • Estimating the probability that an event will occur, given how often it has happened in the past (for example, how likely it is that there will be a power cut in the next two months).
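
As a sketch of the second use case, the snippet below estimates the chance of at least one power cut in the next two months. It assumes power cuts follow a Poisson process whose rate is estimated from a hypothetical history of 3 cuts in the last 12 months; the function name `prob_at_least_one` is illustrative. Since P(X = 0) = e^(-λ), the probability of at least one event is 1 - e^(-λ).

```python
import math

def prob_at_least_one(past_events: int, past_months: float, horizon_months: float) -> float:
    """P(at least one event within the horizon), assuming a Poisson process
    whose rate is estimated from the historical count (hypothetical helper)."""
    rate_per_month = past_events / past_months   # estimated average rate
    lam = rate_per_month * horizon_months        # expected events within the horizon
    return 1.0 - math.exp(-lam)                  # 1 - P(X = 0)

# Hypothetical history: 3 power cuts in the last 12 months, so lambda = 0.5 for 2 months.
p = prob_at_least_one(past_events=3, past_months=12, horizon_months=2)
print(round(p, 3))  # 0.393
```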


Poisson Distributions let us reason confidently about how many events to expect in a given interval, and therefore about the average time between events. They can't, however, tell us the precise moment an event will take place, since the underlying processes usually behave stochastically.
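
A minimal sketch of this point, assuming a process with a hypothetical average rate of 4 events per hour: in a Poisson process the gaps between consecutive events are exponentially distributed with mean 1 / rate, so their average is predictable while each individual gap is not.

```python
import numpy as np

rate = 4.0  # hypothetical: 4 events per hour on average
rng = np.random.default_rng(seed=42)

# Waiting times between consecutive events of a Poisson process are
# exponentially distributed with mean 1 / rate.
gaps = rng.exponential(scale=1.0 / rate, size=100_000)

print(gaps.mean())   # close to 0.25 hours
print(gaps[:5])      # individual gaps vary widely
```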

Linear vs non-linear systems

Natural systems can, in fact, be divided into two main categories: linear and non-linear (stochastic).

In linear systems, causes always precede their effects, creating a strong time-precedence relationship.


This does not hold for non-linear systems, where small changes in the system's initial conditions can lead to unpredictable outcomes.


Considering how complex and chaotic our real world is, most processes are better described using non-linear systems, although linear approximations are sometimes possible.
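
To illustrate sensitivity to initial conditions, the sketch below starts two trajectories 1e-10 apart and tracks how far they drift. It compares a simple contracting linear map with the logistic map at r = 4, a standard textbook example of a chaotic non-linear system; both maps and the parameters are our illustrative choices, not taken from the original article.

```python
def trajectory(step, x0, n_steps):
    """Iterate the one-dimensional map `step` starting from x0."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(step(xs[-1]))
    return xs

def linear_step(x):
    return 0.5 * x + 0.1          # linear map: differences shrink by half each step

def logistic_step(x):
    return 4.0 * x * (1.0 - x)    # logistic map with r = 4 (chaotic regime)

x0, delta, n_steps = 0.2, 1e-10, 60

def max_divergence(step):
    """Largest gap between two trajectories that start delta apart."""
    a = trajectory(step, x0, n_steps)
    b = trajectory(step, x0 + delta, n_steps)
    return max(abs(u - v) for u, v in zip(a, b))

print(max_divergence(linear_step))    # stays at about the initial 1e-10 gap
print(max_divergence(logistic_step))  # grows far beyond the initial 1e-10 gap
```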


Poisson Distributions can be modeled using the expression in the figure below, where λ represents the expected number of events in the considered time span.
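
The expression referred to above is the standard Poisson probability mass function, which gives the probability of observing exactly k events when λ events are expected:

```latex
P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots
```

Both the mean and the variance of this distribution are equal to λ.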

The main characteristics which describe Poisson Processes are:


  1. Two events can't take place simultaneously.

  2. The average rate at which events occur is constant over time.

  3. Events are independent of each other (if one happens, this does not have any influence on the probability that another event might take place).

  4. Events can take place any number of times (within the considered time-span).
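
One way to see how these characteristics lead to the Poisson Distribution is to simulate them. The sketch below (the rate of 4 events per unit time and the 20,000-interval horizon are arbitrary illustrative values; the helper names are hypothetical) builds a process with a constant average rate and independent, exponentially distributed gaps between events, counts the events in each unit-length interval, and compares the observed frequencies with λ^k e^(-λ) / k!.

```python
import math
import numpy as np

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when lam events are expected."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def simulate_counts(rate: float, n_intervals: int, seed: int = 0) -> np.ndarray:
    """Simulate a Poisson process on [0, n_intervals) and return the number
    of events falling in each unit-length interval."""
    rng = np.random.default_rng(seed)
    # Independent, exponentially distributed gaps with constant mean 1 / rate.
    arrivals = np.cumsum(rng.exponential(1.0 / rate, size=int(rate * n_intervals) + 1000))
    while arrivals[-1] < n_intervals:  # extend until the whole horizon is covered
        more = arrivals[-1] + np.cumsum(rng.exponential(1.0 / rate, size=1000))
        arrivals = np.concatenate([arrivals, more])
    arrivals = arrivals[arrivals < n_intervals]
    # The interval index of each event is the integer part of its arrival time.
    return np.bincount(arrivals.astype(int), minlength=n_intervals)

lam = 4.0
counts = simulate_counts(rate=lam, n_intervals=20_000)
observed = np.bincount(counts, minlength=10)[:10] / len(counts)
expected = np.array([poisson_pmf(k, lam) for k in range(10)])

print(counts.mean(), counts.var())  # both close to lambda = 4
print(np.round(observed, 3))
print(np.round(expected, 3))
```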


An example of a Poisson Distribution

In the figure below, you can see how varying the expected number of events (λ) in a period changes a Poisson Distribution. The image was generated using the following Python code:

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# n = number of events, lambd = expected number of events
# which can take place in a period
n = np.arange(0, 21)  # wide enough to show the peak for every λ below
for lambd in range(2, 12, 2):
    poisson = stats.poisson.pmf(n, lambd)
    plt.plot(n, poisson, '-o', label="λ = {}".format(lambd))

# Axis labels, title and legend only need to be set once
plt.xlabel('Number of Events', fontsize=12)
plt.ylabel('Probability', fontsize=12)
plt.title("Poisson Distribution varying λ")
plt.legend()
plt.show()

Taking a closer look at this simulation, we can discover the following patterns:


  • In each case, the distribution peaks at the value assigned to λ (for an integer λ, k = λ - 1 and k = λ are equally likely) and then trails off as we move further away from the peak.

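  • The more events that are expected to take place (the larger λ), the further to the right the distribution shifts and the wider and flatter it becomes, since its variance is also equal to λ. The total probability, however, always sums to 1.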
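
These patterns can be checked numerically with a short sketch that reuses the λ values from the plotting code. It assumes a range of 0 to 59 events is wide enough that the omitted tail is negligible for λ up to 10. For each λ it records which values of k attain the peak, the peak height, the total probability and the variance reported by `scipy.stats.poisson.var`.

```python
import numpy as np
from scipy import stats

n = np.arange(0, 60)  # wide enough that the omitted tail is negligible for lambda <= 10
results = {}
for lambd in range(2, 12, 2):
    pmf = stats.poisson.pmf(n, lambd)
    results[lambd] = {
        "modes": n[np.isclose(pmf, pmf.max())].tolist(),  # values of k at the peak
        "peak": pmf.max(),                                 # height of the peak
        "total": pmf.sum(),                                # total probability, ~1
        "variance": stats.poisson.var(lambd),              # equals lambda
    }

for lambd, r in results.items():
    print(lambd, r["modes"], round(r["peak"], 3), round(r["total"], 4), r["variance"])
```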


This type of simulation could, for example, be used to help reduce queuing times when shopping at a supermarket.


The owner could record how many customers visit the store at different times of day and on different days of the week, and then fit a Poisson Distribution to this data.


In this way, it would be much easier to determine how many cashiers should be working at different times of the day/week in order to enhance the customer experience.
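
A rough sketch of how that could look, with everything beyond the Poisson fit labelled as an assumption: the hourly counts are hypothetical, and the capacity of 12 customers per cashier per hour is an invented figure. λ is estimated as the sample mean (the maximum-likelihood estimate for a Poisson Distribution), and staffing is sized for the arrival count not exceeded in 95% of such hours. This is a simple rule of thumb, not a full queueing model.

```python
import math
import numpy as np
from scipy import stats

# Hypothetical data: customers arriving during the same hour slot
# (e.g. Saturday 11:00-12:00) over eight different weeks.
hourly_counts = np.array([28, 35, 31, 40, 33, 29, 36, 32])

# For a Poisson Distribution, the maximum-likelihood estimate of lambda is the sample mean.
lam = hourly_counts.mean()

# Smallest arrival count k with P(X <= k) >= 0.95 under the fitted distribution.
busy_hour_arrivals = stats.poisson.ppf(0.95, lam)

# Hypothetical capacity: one cashier serves about 12 customers per hour.
CUSTOMERS_PER_CASHIER_PER_HOUR = 12
cashiers_needed = math.ceil(busy_hour_arrivals / CUSTOMERS_PER_CASHIER_PER_HOUR)

print(lam, busy_hour_arrivals, cashiers_needed)
```

Repeating this for each hour slot of the week would give a staffing table driven by the fitted λ values.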


Wrapping up

In case you are interested in learning more about the applications of distributions in stochastic settings, more information is available here.


I hope you enjoyed this article, thank you for reading!


Translated from: https://www.freecodecamp.org/news/poisson-distribution-a-formula-to-calculate-probability-distribution/


