Marginalization in probability involves summing over the possible values of some variables to obtain the distribution of the remaining ones. It is useful when we care about only a subset of the variables and want to ignore the rest.
Given a joint probability distribution $P(X, Y)$, marginalization can be used to find the marginal probability distribution of $X$ by summing over all possible values of $Y$:

$$P(X) = \sum_{y} P(X, Y = y)$$
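As a concrete sketch, a discrete joint distribution can be stored as a NumPy array and marginalized by summing along an axis. The table below is a hypothetical $P(X, Y)$ over two binary variables, chosen only for illustration:

```python
import numpy as np

# Hypothetical joint distribution P(X, Y) over binary X and Y,
# stored as a 2x2 table: rows index x, columns index y.
P_xy = np.array([[0.10, 0.30],
                 [0.25, 0.35]])

# Marginalize over Y: P(X = x) = sum_y P(X = x, Y = y)
P_x = P_xy.sum(axis=1)

print(P_x)        # -> [0.4 0.6]
print(P_x.sum())  # a valid distribution sums to 1
```

Summing over `axis=1` collapses the $Y$ dimension, leaving a distribution over $X$ alone.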
In the context of conditional probability, marginalization can help us find a marginal conditional probability. For instance, given the joint distribution $P(X, Y, Z)$, the conditional probability $P(X \mid Z)$ can be obtained by marginalizing over $Y$:

$$P(X \mid Z) = \sum_{y} P(X, Y = y \mid Z)$$
Alternatively, if you have the conditional probability $P(X \mid Y, Z)$, you can marginalize over $Y$, weighting by $P(Y \mid Z)$, to find $P(X \mid Z)$:

$$P(X \mid Z) = \sum_{y} P(X \mid Y = y, Z)\, P(Y = y \mid Z)$$
Consider three variables: $X$, $Y$, and $Z$. Suppose we have the joint probability distribution $P(X, Y, Z)$, and we want to find the marginal conditional probability $P(X \mid Z)$.
Joint Distribution: $P(X, Y, Z)$
Marginalize over $Y$ to find $P(X, Z)$:

$$P(X, Z) = \sum_{y} P(X, Y = y, Z)$$

Dividing by $P(Z)$ then gives $P(X \mid Z) = P(X, Z) / P(Z)$.
Alternatively, using conditional probabilities directly, we can marginalize over $Y$:

$$P(X \mid Z) = \sum_{y} P(X, Y = y \mid Z)$$

Both routes yield the same $P(X \mid Z)$.
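The two routes above can be checked numerically. The sketch below builds a hypothetical random joint $P(X, Y, Z)$ over three binary variables (axes ordered $x, y, z$) and verifies that marginalizing then conditioning agrees with conditioning then marginalizing:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical joint P(X, Y, Z) over three binary variables,
# axes ordered (x, y, z); normalized so all entries sum to 1.
P_xyz = rng.random((2, 2, 2))
P_xyz /= P_xyz.sum()

P_z = P_xyz.sum(axis=(0, 1))          # P(Z)

# Route 1: marginalize over Y first, then condition on Z.
P_xz = P_xyz.sum(axis=1)              # P(X, Z)
route1 = P_xz / P_z                   # P(X | Z)

# Route 2: condition on Z first, then marginalize over Y.
P_xy_given_z = P_xyz / P_z            # P(X, Y | Z), broadcast over z
route2 = P_xy_given_z.sum(axis=1)     # P(X | Z)

assert np.allclose(route1, route2)
```

The agreement is exact because division by $P(Z)$ commutes with the sum over $Y$.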
The chain rule of probability, also known as the chain rule for joint distributions, is a method to express the joint probability of a set of random variables in terms of conditional probabilities.
For a set of random variables $X_1, X_2, \dots, X_n$, the chain rule of probability states that the joint probability can be factored as:

$$P(X_1, X_2, \dots, X_n) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1, X_2) \cdots P(X_n \mid X_1, \dots, X_{n-1})$$
In general form, this can be written as:

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \dots, X_{i-1})$$
This formula allows you to break down the joint probability into a product of conditional probabilities, which can be easier to handle, especially when dealing with complex distributions or large numbers of variables.
For three random variables $X$, $Y$, and $Z$, the chain rule can be written as:

$$P(X, Y, Z) = P(X)\, P(Y \mid X)\, P(Z \mid X, Y)$$
Each term in this product represents a conditional probability, showing the dependency of each variable on the preceding ones.
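The three-variable factorization can be verified numerically: estimate each conditional from a joint table and check that their product reconstructs the joint. The joint below is a hypothetical random distribution used only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical joint P(X, Y, Z), axes ordered (x, y, z).
P_xyz = rng.random((2, 2, 2))
P_xyz /= P_xyz.sum()

P_x = P_xyz.sum(axis=(1, 2))                 # P(X)
P_xy = P_xyz.sum(axis=2)                     # P(X, Y)
P_y_given_x = P_xy / P_x[:, None]            # P(Y | X)
P_z_given_xy = P_xyz / P_xy[:, :, None]      # P(Z | X, Y)

# Chain rule: P(X, Y, Z) = P(X) P(Y | X) P(Z | X, Y)
reconstructed = (P_x[:, None, None]
                 * P_y_given_x[:, :, None]
                 * P_z_given_xy)

assert np.allclose(reconstructed, P_xyz)
```

The ordering of the factorization is arbitrary; conditioning in the order $Z, Y, X$ would reconstruct the same joint.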
In Bayesian networks, the chain rule is often used to represent the joint probability distribution over the network's variables by considering the network's structure (i.e., the dependencies among the variables).
Factor conditioning refers to the process of conditioning a probability distribution on a subset of its variables, resulting in a new distribution over the remaining variables.
Let's consider a simple example with three random variables: $X$, $Y$, and $Z$. Suppose we have the joint probability distribution $P(X, Y, Z)$.
Original Joint Distribution: $P(X, Y, Z)$. The joint distribution can be factored into conditional probabilities:

$$P(X, Y, Z) = P(X)\, P(Y \mid X)\, P(Z \mid X, Y)$$
Conditioning on a Variable: Suppose we want to condition on $Z = z$. This means we are interested in the distribution of $X$ and $Y$ given $Z = z$.
Conditional Distribution: The new distribution after conditioning on $Z = z$ is $P(X, Y \mid Z = z)$.
This can be computed using the original factors:

$$P(X, Y \mid Z = z) = \frac{P(X, Y, Z = z)}{P(Z = z)}$$
Using the factorization of the joint distribution, we get:

$$P(X, Y \mid Z = z) = \frac{P(X)\, P(Y \mid X)\, P(Z = z \mid X, Y)}{P(Z = z)}$$
Simplifying, since $P(Z = z)$ is a constant once $z$ is fixed, we get:

$$P(X, Y \mid Z = z) \propto P(X)\, P(Y \mid X)\, P(Z = z \mid X, Y)$$

where the normalizer is $P(Z = z) = \sum_{x, y} P(x)\, P(y \mid x)\, P(Z = z \mid x, y)$.
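In table form, conditioning amounts to slicing the joint at $Z = z$ and renormalizing. A minimal sketch, again using a hypothetical random joint over three binary variables:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical joint P(X, Y, Z), axes ordered (x, y, z).
P_xyz = rng.random((2, 2, 2))
P_xyz /= P_xyz.sum()

z = 0                              # condition on Z = 0
slice_z = P_xyz[:, :, z]           # the factor P(X, Y, Z = z)
P_z = slice_z.sum()                # normalizer P(Z = z)
P_xy_given_z = slice_z / P_z       # P(X, Y | Z = z)

# The result is a proper distribution over the remaining variables.
assert np.isclose(P_xy_given_z.sum(), 1.0)
```

Note that the slice alone is already proportional to the conditional; the division by $P(Z = z)$ only rescales it to sum to 1, which is exactly the proportionality statement above.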