s**s 发帖数: 1 | 1

There is no best choice. Which one is better depends on your problem. The MLE is a
consistent estimator, provided that your model is correct. Consistency means that
the MLE converges to the true parameter in probability as the sample size goes to
infinity. What is more, the MLE is asymptotically efficient, which means that as the
sample size goes to infinity, the variance of the MLE achieves the Cramér-Rao lower
bound. In this sense, no other estimator can beat the MLE. However, if the sample
size is small, other estimators can be better.
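For intuition, here is a minimal simulation sketch (Python; the exponential model, the true rate 2.0, and the replication counts are my own illustrative choices, not from the thread) showing the variance of the MLE approaching the Cramér-Rao lower bound as n grows:

import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                          # true rate of an exponential distribution
for n in (10, 100, 1000, 10000):
    # MLE of the rate from an exponential sample is 1 / sample mean
    mles = 1.0 / rng.exponential(scale=1.0 / lam, size=(2000, n)).mean(axis=1)
    crlb = lam**2 / n              # Cramér-Rao lower bound at the true rate
    print(f"n={n:6d}  var(MLE)={mles.var():.5f}  CRLB={crlb:.5f}  "
          f"bias={mles.mean() - lam:+.4f}")

The small-n rows also show the finite-sample bias mentioned in the next post: the MLE overshoots the true rate for small n, and the bias shrinks as n grows.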
F******n 发帖数: 160 | 2

You are right, the MLE is not Bayesian. The MLE is biased, and the bias --> 0 as n --> infinity (it is asymptotically unbiased). However, I don't think my question is focused on whether it is Bayesian or not, so I will put that point aside. To clarify, let me say a bit more. The general question, as posted below, is:
Probability(P|D,M) = ?
According to Bayes' theorem, it can be expressed as:
Probability(P|D,M)
= Probability(D|P,M) * Probability(P|M) / Probability(D|M)
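As a concrete sketch of this decomposition (Python; the coin-flip model, the Beta(2,2) prior, and the 7-heads-in-10-flips data are my own toy choices), evaluating each factor on a grid also makes the MAP-versus-posterior-mean distinction discussed later in the thread explicit:

import numpy as np

p = np.linspace(0.001, 0.999, 999)    # grid of candidate values of the parameter P
prior = p * (1 - p)                   # unnormalized Beta(2,2) prior: Probability(P|M)
likelihood = p**7 * (1 - p)**3        # Probability(D|P,M) for 7 heads in 10 flips
posterior = prior * likelihood        # numerator of Bayes' theorem
posterior /= posterior.sum()          # normalizing plays the role of Probability(D|M)

print("MAP estimate  :", p[np.argmax(posterior)])   # posterior mode
print("posterior mean:", (p * posterior).sum())     # expectation value
print("MLE           :", p[np.argmax(likelihood)])  # ignores the prior: 0.7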
you said, "There is no best choice. Which one is better depends on your
problem." Actually in many engineering and scientific problems, people
performed estimation on parameter(s) "P" based on maximizing
"Probability(P|D, M)". For example, in wireless detection and estimation,
digital signals are determined based on maximizing "Probability(P|D, M)".
Did people do compare different estimators when they first
developed these theories/anal
g*********n 发帖数: 43 | 4

My opinion is that using the Bayesian rule gives us the minimum error probability when
P = 0/1 (i.e., the parameter is a binary bit).
Let's consider a simple example from communications.
You send a bit s, and after it goes through a noisy channel, it is received as
r. Then you use a function f(r) to estimate s.
Now the task is to find an optimal function f.
Suppose the criterion is minimum error probability, which corresponds to minimum bit
error rate and is often the case in digital communications. Then
Pe = Int( Prob(r) * Prob( f(r) <> s | r ) dr )
and Prob( f(r) <> s | r ) is minimized pointwise by choosing f(r) = argmax_s Prob( s | r ),
i.e., the MAP decision rule.
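A minimal simulation sketch of this rule (Python; the BPSK mapping, the Gaussian channel, and sigma = 0.6 are my own assumptions, since the post does not fix a channel model): with equal priors, the MAP rule reduces to a threshold at zero, and the empirical Pe matches the Gaussian tail formula:

import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(1)
n, sigma = 100_000, 0.6
s = rng.integers(0, 2, n)                          # transmitted bits
r = (2.0 * s - 1.0) + sigma * rng.normal(size=n)   # r = +/-1 plus Gaussian noise

s_hat = (r > 0).astype(int)            # MAP decision: argmax_s Prob(s|r)
pe_empirical = np.mean(s_hat != s)     # long-run bit error rate
pe_theory = 0.5 * erfc(1.0 / (sigma * sqrt(2.0)))  # Q(1/sigma)
print(f"empirical Pe = {pe_empirical:.4f}, theoretical Pe = {pe_theory:.4f}")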
probable" parameter or model. If you use the expection value, you get a model
that infinity times estimations will converge to. As the purpose of the
estimation is just for one time use, MLE or MAP are the right choice.
Unbiased.
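To make the one-shot point concrete, a tiny sketch (Python; the posterior value 0.7 is my own toy number): for a single binary decision, the expectation value is not even a valid answer, while the posterior mode is:

p1 = 0.7                           # toy posterior: Prob(s = 1 | r)
map_est = 1 if p1 > 0.5 else 0     # posterior mode: a valid bit
mean_est = 0 * (1 - p1) + 1 * p1   # expectation value: 0.7, not a bit at all
print(map_est, mean_est)           # -> 1 0.7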
F******n 发帖数: 160 | 6

Your example is interesting, and it makes a couple of good points. The
bottom line of your argument is pretty clear; however, the point
about the optimal function f() is a bit confusing, or maybe I didn't get it
right.
First, please allow me to try to formalize your example:
1. Component analysis:
   * observation data or signals: "r", belonging to space "R"
   * parameter to be estimated: "s", belonging to space "S"
     ("S" is simply {0, 1} in the binary-bit case)
F******n 发帖数: 160 | 7

Even though "the purpose of the estimation is just for one time use", in real
applications people are still concerned about the long-run statistical
behavior rather than the behavior at a single point. For example, in digital comm.,
it is true that 0 or 1 is estimated for each digit separately, but the bigger concern
is the BER (bit error rate) in the long run rather than any specific single run. My
point is: the MLE is also concerned with the statistical aspect rather than a single
event.
However, your comments raise