Necessity of the `generalized formalism of quantum mechanics'
• Necessity of the `generalized formalism of quantum mechanics' is directly seen from the quantum mechanical description of the interaction of object and measuring instrument. Let ρ be the `initial density operator of the microscopic object', and ρa that of the measuring instrument. If H is the interaction Hamiltonian, and T the duration of the measurement interaction, then the final state is given by (compare)
$$\rho_f(T) = e^{-iHT}\,\rho\,\rho_a\,e^{iHT}.$$
Let {Eam} be the `pointer observable of the measuring instrument' (in the simplistic discussion this is the projection-valued measure {|θm> <θm|}). Then, applying the Born rule to the measuring instrument, the probability distribution of the observed pointer positions is given as
$$p_m = \mathrm{Tr}_{oa}\,\rho_f(T)\,E^a_m,$$
where the trace is performed over both Hilbert spaces of object (o) and measuring instrument (a). By using a well-known property of the trace operation (viz. cyclic permutation of the operators) it is now possible to interpret this expression as a property of the initial state of the microscopic object:
$$p_m = \mathrm{Tr}_o\,\rho\,M_m,$$
with
$$M_m = \mathrm{Tr}_a\,\rho_a\,e^{iHT}\,E^a_m\,e^{-iHT}.$$
The operators Mm satisfy the conditions of a positive operator-valued measure.
• It should be noted that, even if the pointer observable {Eam} is a PVM, then there is no reason to assume that the set of operators {Mm} is a PVM too. From the point of view of `quantum measurement theory' PVMs do not have any particular position within the set of `all possible generalized observables'.
Also from a practical point of view PVMs do not present themselves in a particularly conspicuous way: hardly any quantum mechanical measurement that is performed in actual practice is represented by a PVM (see illustrations). On closer inspection most measurement procedures turn out to be represented by POVMs. The special position often attributed to PVMs must have a different origin. Presumably the main cause of this is the idea that Hermitian operators describe objective properties of the microscopic object, whereas generalized observables are (co-)determined by the measurement arrangement (they generally depend on its parameters). Unfortunately, this idea, although attractive, is tantamount to the possessed values principle, and is untenable.
• From the derivation of the POVM {Mm}, given above, it can be seen that it is a representation of the relative frequencies of `pointer positions of the measuring instrument'. This is consistent with an empiricist interpretation, in which an observable is not a `property of the microscopic object', but a `label of a measuring instrument'. For this reason it does not seem to be plausible to interpret POVMs in a realist sense. It is an open question whether such is possible for PVMs (in view of the impossibility of an objectivistic-realist interpretation it would at least have to be a contextualistic-realist interpretation).
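The derivation of {Mm} from the measurement interaction can be checked numerically. The following is a minimal sketch, assuming a two-dimensional object and a two-dimensional measuring instrument; the randomly drawn Hamiltonian `H`, the interaction time `T`, the object state `rho`, and the helper names `evolve_unitary` and `induced_povm` are illustrative assumptions, not taken from the text.

```python
import numpy as np

def evolve_unitary(H, T):
    """Return U = exp(-i H T) for a Hermitian H, via its eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * w * T)) @ V.conj().T

def induced_povm(H, rho_a, pointer, T, dim_o, dim_a):
    """M_m = Tr_a[(I x rho_a) U^dagger (I x E_m) U], with U = exp(-i H T)."""
    U = evolve_unitary(H, T)
    I_o = np.eye(dim_o)
    effects = []
    for E in pointer:
        X = np.kron(I_o, rho_a) @ U.conj().T @ np.kron(I_o, E) @ U
        X = X.reshape(dim_o, dim_a, dim_o, dim_a)
        effects.append(np.einsum('iaja->ij', X))  # partial trace over the instrument
    return effects

rng = np.random.default_rng(0)
dim_o, dim_a = 2, 2
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (A + A.conj().T) / 2                               # arbitrary Hermitian interaction (assumption)
rho_a = np.diag([1.0, 0.0]).astype(complex)            # instrument prepared in |0><0|
pointer = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]   # pointer observable: a PVM
T = 0.7
M = induced_povm(H, rho_a, pointer, T, dim_o, dim_a)

# Compare the Born rule applied to the instrument with Tr_o(rho M_m)
rho = np.array([[0.6, 0.2], [0.2, 0.4]], dtype=complex)
U = evolve_unitary(H, T)
rho_f = U @ np.kron(rho, rho_a) @ U.conj().T
p_direct = [np.trace(rho_f @ np.kron(np.eye(dim_o), E)).real for E in pointer]
p_povm = [np.trace(rho @ Mm).real for Mm in M]

print("sum of effects is identity:", np.allclose(sum(M), np.eye(dim_o)))
print("probabilities agree:", np.allclose(p_direct, p_povm))
print("M_0 is a projector:", np.allclose(M[0] @ M[0], M[0]))
```

The last check lets one see, for a given interaction, whether the induced operators happen to be projectors even though the pointer observable is a PVM.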
• Some illustrations of the necessity of the generalized formalism
• Inefficient photon detection
A realistic photon counter has a quantum efficiency η ≤ 1. As a result such a photon counter does not measure the number observable N (having the numbers n = 0,1,... as eigenvalues, and the number states |n> as eigenvectors). The probability that such a counter detects m photons is given by the expectation value of the operator
$$M_m = \sum_{n=m}^{\infty} \frac{n!}{m!\,(n-m)!}\,\eta^m (1-\eta)^{n-m}\,N_n,$$
{Nn} being the spectral resolution of the number observable N. It is evident that Mm is not a projection operator. If η =1 then Mn = Nn. Hence, the standard formalism only applies to (unrealistic) ideal photon counters.
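The binomial structure of the operators Mm can be made concrete with a small sketch. It assumes a Fock space truncated at `n_max` photons (a hypothetical cut-off chosen only for illustration; the truncation is closed because m ≤ n), and the function name `inefficient_counter_povm` is likewise an assumption.

```python
import numpy as np
from math import comb

def inefficient_counter_povm(eta, n_max):
    """Return the coefficients lam[m, n] and the diagonal effects M_m
    on a Fock space truncated at n_max photons."""
    dim = n_max + 1
    lam = np.zeros((dim, dim))
    for n in range(dim):
        for m in range(n + 1):
            # probability of registering m counts when n photons are present
            lam[m, n] = comb(n, m) * eta**m * (1 - eta)**(n - m)
    effects = [np.diag(lam[m]) for m in range(dim)]
    return lam, effects

lam, M = inefficient_counter_povm(eta=0.5, n_max=5)
lam_ideal, M_ideal = inefficient_counter_povm(eta=1.0, n_max=5)

print("effects sum to identity:", np.allclose(sum(M), np.eye(6)))
print("M_1 is a projector for eta = 0.5:", np.allclose(M[1] @ M[1], M[1]))
print("eta = 1 reproduces the number PVM:", np.allclose(lam_ideal, np.eye(6)))
```

Each column of `lam` is a binomial distribution over the registered counts m, so the columns sum to 1 for every η.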
• The double-slit experiment
figure 4
The example of the inefficient photon counter could be disregarded as being inessential, since it is not unreasonable that a theory be tested using only ideal measurement procedures. Such disregarding is impossible with respect to the double-slit experiment, since that experiment is at the heart of quantum mechanics, and yet needs a POVM for its description. This implies that this `paradigm of quantum mechanics', often used to clarify the meaning of the standard formalism, is not even within the domain of application of the standard theory. In hindsight it is understandable that this has caused quite a bit of confusion.
A simple way to derive a POVM (depending on the precise way the measurement is carried out there exist different ones) is the following:
Let ψ1 and ψ2 be two orthogonal wave packets originating from slits 1 and 2 in screen S, respectively. The general state is the superposition
$$\psi = \alpha\psi_1 + \beta\psi_2, \qquad |\alpha|^2 + |\beta|^2 = 1,$$
bringing about an interference pattern on screen B. As a pointer scale we can take the vertical position at which a particle hits screen B. Then, restricting ourselves to a two-dimensional description, the relative frequencies of the pointer are given by
$$p(z) = \int dx\,|\psi(x,z)|^2 = |\alpha|^2 p_{11}(z) + \alpha^*\beta\,p_{12}(z) + \alpha\beta^*\,p_{21}(z) + |\beta|^2 p_{22}(z),$$
with
$$p_{ij}(z) = \int dx\,\psi_i(x,z)^*\,\psi_j(x,z), \qquad i,j = 1,2.$$
The POVM is defined by putting
p(z) = <ψ|M(z)|ψ>,
yielding M(z) as a 2x2 matrix with components M(z)ij = pji(z). It is straightforward to prove that $M(z)^2 \neq M(z)$. Actually, we have
$$M(z) = \int dx\,\big(|\psi_1(x,z)|^2 + |\psi_2(x,z)|^2\big)\,P(x,z),$$
in which P(x,z) is an idempotent 2x2 matrix for each value of x and z.
• Polarization measurements
One way to measure photon polarization is to send the photon through an analyzing filter (e.g. a nicol), oriented at angle θ in a plane perpendicular to the direction of propagation, and to detect whether or not the photon has passed the analyzer. If the photon detector has quantum efficiency η, then the probability of detecting the photon behind the analyzer is p+ = η <ψ|E+|ψ>, in which |ψ> is the polarization state vector of the incoming photon. Hence, this experiment is represented by the POVM {M+, M−} = {ηE+, I − ηE+}, where {E+, E−} (with E− = I − E+) is the PVM of the standard polarization observable. Once again it is seen that the standard formalism only applies to unrealistically ideal detectors. Note that the POVM {M+, M−} of the nonideal detector is related to the PVM of the ideal one according to
$$M_m = \sum_n \lambda_{mn} E_n, \qquad m,n = +,-,$$
with
$$\lambda_{++} = \eta, \quad \lambda_{+-} = 0, \quad \lambda_{-+} = 1-\eta, \quad \lambda_{--} = 1.$$
• Joint nonideal measurement of incompatible polarization observables
figure 5
In figure 5 a photon impinges on a partly transparent mirror (transmission coefficient γ) sending it either to a polarizer in direction θ (if it is transmitted by the mirror) or θ′ (if it is reflected by the mirror). Angles θ and θ′ are chosen such that the corresponding standard polarization observables are incompatible. Now there are three possibilities: the photon is detected either in detector D or in D′, or it is not detected at all. Here the quantum efficiency is taken to be 1. The quantum mechanical probabilities are found according to, respectively,
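$$p_+ = \gamma\,\langle\psi|E_+|\psi\rangle, \qquad p'_+ = (1-\gamma)\,\langle\psi|E'_+|\psi\rangle, \qquad p_- = 1 - p_+ - p'_+,$$
yielding $\{\gamma E_+,\ (1-\gamma)E'_+,\ I - \gamma E_+ - (1-\gamma)E'_+\}$ as POVM, E+ and E′+ denoting the polarization projectors for the directions θ and θ′, respectively.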

It is important to note here the difference between this experiment and an experiment in which either the standard observable {E+, E−} or {E′+, E′−} is measured, which can be achieved by taking either γ = 1 or γ = 0. The present experiment is fundamentally different from these special cases, which both correspond to a `standard observable'. It should not be overlooked that in these latter experiments the wave packet is completely sent either one way or the other, whereas in the present experiment the wave packet is split so as to send coherent wave information towards both detectors. This difference makes it possible for general values of γ to interpret the experiment as a joint nonideal measurement of the incompatible polarization observables {E+, E−} and {E′+, E′−} rather than as an experiment in which either {E+, E−} or {E′+, E′−} is measured.

In order to interpret the experiment as a joint measurement of two observables we need a bivariate POVM. This can be realized by considering the joint probability distribution of the two detectors D and D' rather than the probabilities of the single ones. Since it does not occur that both detectors are triggered by the same photon, we straightforwardly find:

p++ = 0, p+− = p+, p−+ = p′+, p−− = 1 − p+ − p′+,
from which the bivariate POVM follows as {Rmn, m,n=+/−}, with
$$R_{++} = O, \qquad R_{+-} = \gamma E_+, \qquad R_{-+} = (1-\gamma)E'_+, \qquad R_{--} = I - \gamma E_+ - (1-\gamma)E'_+.$$
It is interesting to consider the marginals {∑nRmn} and {∑mRmn} of this POVM, which are POVMs, too:  {∑nRmn} = {γE+, I − γE+}, {∑mRmn} = {(1 − γ)E′+, I − (1 − γ)E′+}.
It turns out that the marginal POVMs can be interpreted as describing nonideal versions of the measurements of the standard observables {E+, E−} and {E′+, E′−}, respectively. Note that in the limits γ = 1 and γ = 0 we obtain for the marginal POVMs: γ = 1: {∑nRmn} = {E+, E−}, {∑mRmn} = {O, I}; γ = 0: {∑nRmn} = {O, I}, {∑mRmn} = {E′+, E′−}.
Hence, as is to be expected because in these limits all photons go one way or the other, only one standard observable is measured (either {E+, E−} or {E′+, E′−}), no information being provided by the other observable ({O, I}). Evidently, the standard formalism is contained in the generalized one as a limiting case. The example demonstrates that, contrary to what is possible in the standard formalism, in the generalized formalism it is possible to obtain information about different incompatible standard observables in one single measurement arrangement. The aspects of complementarity, exhibited by this experiment, are dealt with in the general treatment of the Martens inequality.
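The bivariate POVM of the beam-splitter arrangement and its two marginals can be written out explicitly. This is a minimal sketch, assuming real linear-polarization vectors (cos θ, sin θ) for the analyzer directions and illustrative values of `gamma`, `theta` and `theta_prime`; the index convention (0 for `+`, 1 for `−`) and the function names are assumptions as well.

```python
import numpy as np

def polarization_projector(theta):
    """Projector E_+ onto linear polarization at angle theta (radians)."""
    v = np.array([np.cos(theta), np.sin(theta)])
    return np.outer(v, v)

def joint_povm(gamma, theta, theta_prime):
    """Bivariate POVM R[m, n] as an array of 2x2 matrices; index 0 is '+', 1 is '-'."""
    E_plus = polarization_projector(theta)
    Ep_plus = polarization_projector(theta_prime)
    R = np.zeros((2, 2, 2, 2))
    R[0, 1] = gamma * E_plus                    # only detector D fires
    R[1, 0] = (1 - gamma) * Ep_plus             # only detector D' fires
    R[1, 1] = np.eye(2) - R[0, 1] - R[1, 0]     # no detection; R[0, 0] stays zero
    return R

gamma, theta, theta_prime = 0.3, 0.0, np.pi / 5   # illustrative values
R = joint_povm(gamma, theta, theta_prime)
first_marginal = R.sum(axis=1)    # {sum_n R_mn} = {gamma E_+, I - gamma E_+}
second_marginal = R.sum(axis=0)   # {sum_m R_mn} = {(1-gamma) E'_+, I - (1-gamma) E'_+}

# Limit gamma = 1: the first marginal is the PVM {E_+, E_-}, the second is {O, I}
R_limit = joint_povm(1.0, theta, theta_prime)
print(np.round(R_limit.sum(axis=1), 3))
print(np.round(R_limit.sum(axis=0), 3))
```

Summing over one index of `R` is all that is needed to pass from the joint measurement to either nonideal single-observable measurement.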

• Nonideal measurements
• Generalizing the examples given above, a generalized observable {Mm} is said to represent a nonideal measurement of observable {Nn} if a nonideality matrix (λmn) exists such that
$$M_m = \sum_n \lambda_{mn} N_n, \qquad \lambda_{mn} \geq 0, \qquad \sum_m \lambda_{mn} = 1.$$
figure 6
The quantities λmn are conditional probabilities, relating the `probability distribution {<Mm>} that is actually measured' to the `probability distribution {<Nn>} that would have been obtained if a measurement of {Nn} had been performed instead of {Mm}'. The nonideality matrix (λmn) is a so-called `stochastic matrix'75. It can be seen as a representation of a transmission channel, {Nn} representing the input channels, {Mm} the output channels, and the nonideality matrix (λmn) describing crossing of signals between subchannels (see figure 6).
If there are no crossings, then the stochastic matrix (λmn) equals the unit matrix (δmn). This corresponds to the situation that the POVM {Mm} is representing an ideal measurement of the POVM {Nn}.
• The observable {∑nλmnNn} can be interpreted as representing a nonideal way of measuring the observable {Nn}. Note that the quantities λmn are determined by the experimental measurement procedure; they are independent of the quantum mechanical state of the microscopic object. These quantities determine the accuracy with which observable {Nn} is measured.
It can also be seen as an application of Bohr's notion of latitude, observable {Nn} in the `measurement procedure represented by POVM {∑nλmnNn}' being defined with a certain `latitude determined by the nonideality matrix'. The `measure of inaccuracy or nonideality POVM {Nn} is measured with' is determined by the `deviation of the nonideality matrix (λmn) from the unit matrix (δmn)'.
There exist several quantities that can play the role of `latitude/measure of inaccuracy or nonideality' (section 7.8 of Publ. 52). A particularly useful measure is the average row entropy, defined as
$$J(\lambda) = -\frac{1}{N}\sum_{m,m'} \lambda_{mm'} \ln\!\left(\frac{\lambda_{mm'}}{\sum_{m''}\lambda_{mm''}}\right),$$
ranging from 0 (if λmm′ = δmm′; i.e. the ideal case) to ln(N) (if λmm′ =1/N; i.e. the most nonideal case, in which the nonideal measurement is completely uninformative).
• Nonideality, in the sense defined here, is not an absolute notion but a relative one: it compares `measurement procedures'. If the measurement of {Nn} would yield undisturbed information about the microscopic object itself, then the nonideality matrix would describe the disturbance by the measurement. However, in an empiricist interpretation this would be questionable even if {Nn} corresponds to a `standard observable', since even the latter are not supposed to describe microscopic reality. Moreover, the relation between {Mm} and {Nn} is possible also if {Nn} is a POVM.
• Other definitions of `nonideal measurement' may be possible, and perhaps useful. The present definition has the advantage that it constitutes a well-defined numerical relation of the measured probability distribution with the one it purports to `measure in a nonideal way'. Moreover, relations of this kind turn out to be quite abundant.
If the nonideality matrix (λmn) is invertible, then the probability distribution {<Nn>} can be calculated from the actually measured one (at least in principle; in practice invertibility is frustrated by the fact that a probability distribution is seldom exactly known over the whole range of its spectrum).
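The remark on invertibility can be illustrated with the nonideality matrix of the inefficient polarization detector discussed earlier. This is a minimal sketch: the distribution `p_ideal` and the efficiencies are hypothetical numbers, and exact knowledge of the measured distribution is assumed, which (as noted above) is rarely available in practice.

```python
import numpy as np

def detector_nonideality(eta):
    """Nonideality matrix (rows m, columns n) of a detector with efficiency eta."""
    return np.array([[eta, 0.0],
                     [1.0 - eta, 1.0]])

p_ideal = np.array([0.3, 0.7])            # hypothetical distribution {<E_n>}
lam = detector_nonideality(eta=0.6)
p_measured = lam @ p_ideal                # distribution {<M_m>} actually observed
p_recovered = np.linalg.solve(lam, p_measured)

# The inversion becomes ill-conditioned when the efficiency is small,
# so statistical errors in p_measured are strongly amplified.
cond_good = np.linalg.cond(detector_nonideality(0.9))
cond_poor = np.linalg.cond(detector_nonideality(0.01))
print(p_recovered, cond_good, cond_poor)
```

The condition number gives a rough indication of how much uncertainty in the measured distribution propagates into the reconstructed one.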

• Joint measurement of generalized observables
• Commeasurability of (generalized) observables
Two observables represented by POVMs {Mm} and {Nn} are `jointly measurable (commeasurable)' if a measurement procedure exists, represented by a bivariate POVM {Rmn}, such that
Mm = ∑nRmn
and
Nn = ∑mRmn.
Note that in general compatibility of the operators of the POVMs is not required for commeasurability (compare the example of the joint nonideal measurement of incompatible polarization observables). However, it can be proven that if {Mm} and {Nn} are PVMs, then, in agreement with the standard formalism, a `bivariate POVM {Rmn} yielding these PVMs as marginals' only exists if the PVMs are compatible. Evidently, the impossibility of jointly measuring incompatible observables is restricted to the domain of application of the standard formalism.
• Joint nonideal measurement of (generalized) observables
A weaker kind of `joint measurability' is defined as follows:
Two observables, represented by POVMs {Mm} and {Nn}, are `jointly nonideally measurable' if a measurement procedure exists, represented by a bivariate POVM {Rmn}, such that its marginals {∑nRmn} and {∑mRmn} represent nonideal measurements of {Mm} and {Nn}, respectively. Thus,
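$$\sum_n R_{mn} = \sum_{m'} \lambda_{mm'} M_{m'}, \qquad \lambda_{mm'} \geq 0, \qquad \sum_m \lambda_{mm'} = 1,$$
$$\sum_m R_{mn} = \sum_{n'} \mu_{nn'} N_{n'}, \qquad \mu_{nn'} \geq 0, \qquad \sum_n \mu_{nn'} = 1.$$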
The two generalized observables {∑m′ λmm′Mm′} and {∑n′ μnn′ Nn′} can be seen as being defined in the sense of Bohr's strong correspondence principle either as autonomous (generalized) observables, or as defining the observables {Mm} and {Nn} with a certain latitude determined by the nonideality matrices (λmm′) and (μnn′), respectively.
It is directly seen that the joint nonideal measurement of incompatible polarization observables satisfies this general definition for the special case of `standard observables', `nonideality matrices' being given by
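$$(\lambda_{mm'}) = \begin{pmatrix} \gamma & 0 \\ 1-\gamma & 1 \end{pmatrix}, \qquad (\mu_{nn'}) = \begin{pmatrix} 1-\gamma & 0 \\ \gamma & 1 \end{pmatrix},$$
with rows and columns ordered as (+, −); these entries follow from the marginals {γE+, I − γE+} and {(1 − γ)E′+, I − (1 − γ)E′+}.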
• figure 7
The two nonideality matrices (λmm′) and (μnn′), given above, exhibit a certain `complementarity', in the sense that if one matrix equals the unit matrix (for either γ = 1 or 0), then the other one corresponds to a particularly bad transmission channel (cf. figure 6), yielding no information at all about the initial probability distribution.
Taking as nonideality measures of these matrices the average row entropies we find
$$J(\lambda) = \tfrac{1}{2}\big[(2-\gamma)\ln(2-\gamma) - (1-\gamma)\ln(1-\gamma)\big], \qquad J(\mu) = \tfrac{1}{2}\big[(1+\gamma)\ln(1+\gamma) - \gamma\ln\gamma\big].$$
In figure 7 a parametric plot of these quantities is given (curved line, parameter γ). It is seen that if the measurement is ideal with respect to one PVM, then it is completely uninformative with respect to the other (incompatible) one (and vice versa).
In the following this is demonstrated to be an example of a generic type of measurements, shedding new light on the implementation of the notion of `complementarity' within the mathematical formalism of quantum mechanics.
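The average row entropies of the two nonideality matrices of the polarization example can be evaluated directly from the definition and compared with the closed-form expressions. This is a minimal sketch: the matrices are those implied by the marginals {γE+, I − γE+} and {(1 − γ)E′+, I − (1 − γ)E′+}, the convention 0 ln 0 = 0 is assumed, and all function names are illustrative.

```python
import numpy as np

def average_row_entropy(lam):
    """J(lam) = -(1/N) sum_{m,m'} lam_mm' ln(lam_mm' / sum_m'' lam_mm'')."""
    lam = np.asarray(lam, dtype=float)
    N = lam.shape[0]
    total = 0.0
    for row in lam:
        s = row.sum()
        for x in row:
            if x > 0:                      # convention: 0 ln 0 = 0
                total += x * np.log(x / s)
    return -total / N

def lam_matrix(g):
    return np.array([[g, 0.0], [1.0 - g, 1.0]])

def mu_matrix(g):
    return np.array([[1.0 - g, 0.0], [g, 1.0]])

def xlogx(x):
    return x * np.log(x) if x > 0 else 0.0

def J_lam_closed(g):
    return 0.5 * (xlogx(2 - g) - xlogx(1 - g))

def J_mu_closed(g):
    return 0.5 * (xlogx(1 + g) - xlogx(g))

# Parametric curve (J(lam), J(mu)) with parameter gamma, as plotted in figure 7
gammas = np.linspace(0.0, 1.0, 11)
curve = [(average_row_entropy(lam_matrix(g)), average_row_entropy(mu_matrix(g)))
         for g in gammas]
for g, (jl, jm) in zip(gammas, curve):
    print(f"gamma={g:.1f}  J(lam)={jl:.4f}  J(mu)={jm:.4f}")
```

At γ = 1 the first entropy vanishes while the second equals ln 2, and conversely at γ = 0, which is the behaviour described for the end points of the curve.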
• Complementarity in a joint nonideal measurement of incompatible standard observables93
• The Martens inequality
Let us now consider an arbitrary `joint nonideal measurement of two standard observables {Em} and {Fn}'. Hence we have a bivariate POVM {Rmn} satisfying
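$$\sum_n R_{mn} = \sum_{m'} \lambda_{mm'} E_{m'}, \qquad \lambda_{mm'} \geq 0, \qquad \sum_m \lambda_{mm'} = 1, \qquad E_m^2 = E_m,$$
$$\sum_m R_{mn} = \sum_{n'} \mu_{nn'} F_{n'}, \qquad \mu_{nn'} \geq 0, \qquad \sum_n \mu_{nn'} = 1, \qquad F_n^2 = F_n.$$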
Using the `average row entropies' J(λ) and J(μ) Martens (Publs 25, 26) has derived an inequality for the general case76:
$$J(\lambda) + J(\mu) \geq -\ln\!\big(\max_{m,n} |\langle a_m|b_n\rangle|^2\big),$$
|am> and |bn> the eigenvectors of the two incompatible standard observables {Em} and {Fn}, respectively. I shall refer to this inequality (and its generalizations) as the Martens inequality.
• Proof of the `Martens inequality'
Since the derivation of the Martens inequality is not easily accessible in the literature, it is presented here:
From the equalities ∑nRmn = ∑m′λmm′Em′ and Em |am′>= δmm′|am′> it follows that
λmm′ = <am′| ∑nRmn |am′>.
Making use of the von Neumann entropy the average row entropy J(λ) can be expressed according to
$$J(\lambda) = \frac{1}{N}\sum_m \Big(\mathrm{Tr}\sum_n R_{mn}\Big)\, H_{\{E_m\}}\!\left(\frac{\sum_n R_{mn}}{\mathrm{Tr}\sum_n R_{mn}}\right).$$
In an analogous way we have
$$J(\mu) = \frac{1}{N}\sum_n \Big(\mathrm{Tr}\sum_m R_{mn}\Big)\, H_{\{F_n\}}\!\left(\frac{\sum_m R_{mn}}{\mathrm{Tr}\sum_m R_{mn}}\right).$$
It is now important to note that the `arguments of the von Neumann entropies', although not being related to `preparation' but to `measurement', both have the mathematical properties of density operators. This enables us to apply the following well-known property of the von Neumann entropy:
$$H_{\{E_m\}}(\rho) \geq \sum_k r_k\, H_{\{E_m\}}(\rho_k), \qquad \rho = \sum_k r_k \rho_k, \qquad r_k \geq 0, \qquad \sum_k r_k = 1,$$
valid if the ensemble described by ρ consists of subensembles described by the density operators ρk.
Application of this inequality with
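$$\rho = \frac{\sum_k R_{mk}}{\mathrm{Tr}\sum_{k'} R_{mk'}}, \qquad r_k = \frac{\mathrm{Tr}\,R_{mk}}{\mathrm{Tr}\sum_{k'} R_{mk'}}, \qquad \rho_k = \frac{R_{mk}}{\mathrm{Tr}\,R_{mk}}$$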
yields
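$$H_{\{E_m\}}\!\left(\frac{\sum_k R_{mk}}{\mathrm{Tr}\sum_{k'} R_{mk'}}\right) \geq \sum_k \frac{\mathrm{Tr}\,R_{mk}}{\mathrm{Tr}\sum_{k'} R_{mk'}}\, H_{\{E_m\}}\!\left(\frac{R_{mk}}{\mathrm{Tr}\,R_{mk}}\right).$$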
From this it directly follows that
$$J(\lambda) \geq \frac{1}{N}\sum_{m,n} (\mathrm{Tr}\,R_{mn})\, H_{\{E_m\}}\!\left(\frac{R_{mn}}{\mathrm{Tr}\,R_{mn}}\right).$$
In a completely analogous way we find
$$J(\mu) \geq \frac{1}{N}\sum_{m,n} (\mathrm{Tr}\,R_{mn})\, H_{\{F_n\}}\!\left(\frac{R_{mn}}{\mathrm{Tr}\,R_{mn}}\right).$$
It then directly follows that
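$$J(\lambda) + J(\mu) \geq \frac{1}{N}\sum_{m,n} (\mathrm{Tr}\,R_{mn}) \left\{ H_{\{E_m\}}\!\left(\frac{R_{mn}}{\mathrm{Tr}\,R_{mn}}\right) + H_{\{F_n\}}\!\left(\frac{R_{mn}}{\mathrm{Tr}\,R_{mn}}\right) \right\}.$$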
Finally we use an inequality derived by Maassen and Uffink80, viz.
$$H_{\{E_m\}}(\rho) + H_{\{F_n\}}(\rho) \geq -\ln\!\big(\max_{m,n} \mathrm{Tr}\,E_m F_n\big),$$
valid for an arbitrary density operator ρ, once again realizing that the operators Rmn/Tr Rmn are mathematically equivalent to density operators. For the nondegenerate, N-dimensional case we have restricted ourselves to94, the Martens inequality then follows from the condition that for a bivariate POVM we have ∑mnRmn = I, and Tr I = N.
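The last two steps of the proof can be checked numerically for the polarization example. The sketch below samples random qubit density operators to test the Maassen-Uffink inequality, and then evaluates the weighted sum over the normalized operators Rmn/Tr Rmn that bounds J(λ) + J(μ) from below. The angles, the value of `gamma`, the random sampling scheme and all function names are illustrative assumptions; a numerical check of this kind is of course not a substitute for the proof.

```python
import numpy as np

def projectors(theta):
    """PVM {E_+, E_-} for linear polarization at angle theta (radians)."""
    v = np.array([np.cos(theta), np.sin(theta)])
    E_plus = np.outer(v, v)
    return [E_plus, np.eye(2) - E_plus]

def measurement_entropy(rho, pvm):
    """H_{E}(rho) = -sum_m p_m ln p_m, with p_m = Tr(rho E_m)."""
    H = 0.0
    for E in pvm:
        p = np.trace(rho @ E).real
        if p > 1e-15:                      # convention: 0 ln 0 = 0
            H -= p * np.log(p)
    return H

def maassen_uffink_bound(pvm_E, pvm_F):
    return -np.log(max(np.trace(E @ F).real for E in pvm_E for F in pvm_F))

theta, theta_prime, gamma = 0.0, np.pi / 5, 0.3   # illustrative values
E, F = projectors(theta), projectors(theta_prime)
bound = maassen_uffink_bound(E, F)

# Maassen-Uffink inequality on randomly drawn density operators
rng = np.random.default_rng(1)
gaps = []
for _ in range(200):
    A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    rho = A @ A.conj().T
    rho = rho / np.trace(rho).real
    gaps.append(measurement_entropy(rho, E) + measurement_entropy(rho, F) - bound)

# Nonzero elements of the bivariate POVM (R_{++} = O is omitted)
I = np.eye(2)
R = [gamma * E[0], (1 - gamma) * F[0], I - gamma * E[0] - (1 - gamma) * F[0]]
N = 2
rhs = sum(np.trace(Rmn) * (measurement_entropy(Rmn / np.trace(Rmn), E)
                           + measurement_entropy(Rmn / np.trace(Rmn), F))
          for Rmn in R) / N
print("smallest Maassen-Uffink gap:", min(gaps))
print("weighted entropy sum:", rhs, " bound:", bound)
```

Because the traces Tr Rmn add up to N, the weighted sum is an average of terms that each satisfy the Maassen-Uffink inequality, which is the content of the final step.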
• `Martens inequality' versus `entropic uncertainty relation'
The Martens inequality, with J(λ) the average row entropy of the nonideality matrix (λmm′), should be distinguished from the entropic uncertainty relation, which looks deceptively similar. Yet, they are completely different, the entropic quantity H{Em}(ρ) having a meaning that is very different from J(λ), even though referring to the same standard observable {Em}.
It is important to note that the quantities involved in the `Martens inequality' do not depend on the `initial density operator ρ of the microscopic object', but only on `parameters of a measurement procedure'. The `Martens inequality' unambiguously is not a property of the state (density operator) but it is a `property of a generalized observable (representing a joint nonideal measurement procedure of incompatible standard observables)'. It is a property of the `measurement procedure' only. It has the meaning of an inaccuracy relation satisfied by the `inaccuracies of nonideal measurements of incompatible standard observables that are performed jointly'.
• `Martens inequality' versus `Heisenberg-Kennard-Robertson inequality'
Note also the similarity of the meanings of the entropic uncertainty relation and the Heisenberg-Kennard-Robertson inequality, the standard deviations of the latter inequality too depending only on the `initial state ρ'. Both inequalities can be tested in `separate ideal measurements' of the two PVMs, and, therefore, they are `not properties of a joint measurement' (more accurately: they are `properties of two different joint measurements', each of which usually being interpreted as a `measurement of a single standard observable', compare). This draws a sharp line distinguishing, on the one hand, the `Martens inequality' from, on the other hand, the `Heisenberg-Kennard-Robertson inequality' and the `entropic uncertainty relation', thus clarifying a basic confusion in the notion of complementarity (cf. Publ. 48).
• The `Martens inequality' and `complementarity'
This distinction enables us to strengthen Ballentine's criticism of the `Heisenberg-Kennard-Robertson inequality' by distinguishing two different notions of `complementarity', viz. `complementarity of preparation' and `complementarity of measurement'. As stressed by Ballentine, the `Heisenberg-Kennard-Robertson inequality' has unjustifiably been interpreted as a property of a `joint nonideal measurement of incompatible standard observables': it is, on the contrary, a `property of a (preparation of a) state of the microscopic object'.
Actually the Martens inequality is quantifying `complementarity in a joint nonideal measurement of incompatible standard observables' in the way exemplified by figure 7. The essential feature, generally valid for any `joint nonideal measurement of incompatible standard observables', is that there is a forbidden region (shaded area), defined by the `Martens inequality', and signifying that the nonideality measures J(λ) and J(μ) cannot both approach zero for any value of the `parameters specifying the measurement arrangement'. The Martens inequality is a perfect representation of the idea of `complementarity' as embodied in the idea of `mutual exclusiveness of measurement arrangements of incompatible standard observables' implied by Heisenberg's disturbance theory of measurement, disturbance of the measurement results of one standard observable ({Em}) being induced by changing the measurement arrangement so as to also obtain information on an incompatible standard observable ({Fn}), and vice versa.
On the other hand, inequalities like the `Heisenberg-Kennard-Robertson inequality' and the `entropic uncertainty relation' are best considered as restrictions on our ability to prepare `quantum mechanical states having arbitrarily small dispersions'.
• Remarks on `Martens inequality and complementarity'
• Note that here `Heisenberg disturbance' is not taken in the `preparative (predictive) sense as conceived by Heisenberg himself' (i.e. referring to the future), but in the `determinative (retrodictive) sense of the empiricist interpretation' (i.e. referring to the initial state). As we can learn from the general theory of quantum measurement, a `nonideality measure' has both an ontological and an epistemological meaning (compare): it reflects how a `pointer position of a measuring instrument' is (ontologically) changed when changing the measurement arrangement (for instance, by introducing a semi-transparent mirror having transmissivity γ), at the same time (epistemologically) changing the `information obtained on the initial state of the microscopic object'.
• That the `Heisenberg-Kennard-Robertson inequality' has been widely (though unjustifiedly) accepted as a property of a `joint measurement of incompatible standard observables' can be understood as a consequence of a number of circumstances:
• i) in 1925 quantum mechanics did not come into being equipped with a full-blown generalized formalism, nor even with the `well-developed standard formalism nowadays presented in textbooks', but it has grown from discussions on the `restricted set of measurements carried out experimentally at that time' as well as on `thought experiments', in which different notions of correspondence have played important parts, and in which classical reasonings (in terms of both position and momentum) were applied (be it that allowance was made for deviation from classical mechanics into the direction of a certain latitude or uncertainty); as a result not all conclusions then obtained are still generally valid today;
• ii) the inequalities of the `uncertainty principle', derived in rather informal ways from the `thought experiments', appeared to be represented by the inequality derived by Heisenberg from the mathematical formalism of quantum mechanics, thus causing physicists to jump to the conclusion that the inequalities should be identical;
• iii) Heisenberg did not apply his inequality to the `initial state of the microscopic object' (as is usually done in textbooks), but to its `post-measurement state' (compare), thus implementing the possibility of `measurement disturbance' as well as of `an influence the choice of a particular measurement arrangement may have';
• iv) only as late as 1970 it was realized by Ballentine that this is inconsistent with the way the `Heisenberg-Kennard-Robertson inequality' is usually presented in textbooks, viz. as a `property of the pre-measurement state', not depending on any parameter of the measurement arrangement for a measurement to be performed later;
• v) it was not noticed that there actually are at stake two different kinds of complementarity, viz. `complementarity of state preparation' and 'complementarity of measurement', the `Heisenberg-Kennard-Robertson inequality' representing a `property of the post-measurement state of the microscopic object' rather than a `property of the measurement arrangement';
• vi) the difference between these kinds of complementarity was not obvious because it is not unreasonable that the post-measurement state be co-determined by the parameters of the measurement arrangement; however, before Ballentine's observation it was overlooked that the standard deviations involved in the `Heisenberg-Kennard-Robertson inequality' are not found by means of a `joint nonideal measurement of incompatible observables', but through separate ideal measurements of these observables;
• vii) only after the `Martens inequality' had become available, was it obvious that the `Heisenberg-Kennard-Robertson inequality' is only referring to the two extremes of the parameter range determining the generalized observables really describing a `joint nonideal measurement'; the latter measurements, although already discussed as `thought experiments', could not be formally treated as long as the `generalized mathematical formalism' had not been developed. Standing on the shoulders of many giants we now can do better.