Why Schrödinger's equation?
I recently overheard someone ask this about Schrödinger's equation. The answer they received was, for me, unsatisfying: "Because it agrees with experiment." Of course, that explains perfectly well why the equation was adopted by later generations of physicists, and indeed the calculation of the spectrum of atomic hydrogen from the energy eigenvalues of the Schrödinger operator is one of the most convincing and wholesome computations a young physicist can do. But the question that was left unanswered, the question I believe was being asked, was: "Why did Schrödinger write this equation down? Why not something else?" I don't believe for a second that Schrödinger sat down with an array of different equations and worked out what each of them predicted about hydrogen before he found the one that fit...
I turned to one of Schrödinger's (eminently readable) original papers on the subject:
- E. Schrödinger, An Undulatory Theory of the Mechanics of Atoms and Molecules, Physical Review (1926) Vol. 28, No. 6, pp. 1049-1070
Hamilton first developed his formalism for geometric optics and later applied it to describe classical mechanics. Since geometric optics is the short wavelength limit of wave optics, Schrödinger and de Broglie thought that one might reasonably expect there to be a wave theory underlying classical mechanics and reducing to it in the short wavelength limit. The difficulty was how to guess a wave equation that would give the right short wavelength limit.
Schrödinger took as his starting point the Hamilton-Jacobi equation, so let's review this. I'll assume you're happy with the usual Hamiltonian/Lagrangian formulation of classical mechanics.
Hamilton-Jacobi theory
For any pair of points \(A,\ B\in\mathbf{R}^n\) consider the space of paths \(\gamma\colon [0,T]\to\mathbf{R}^n\) between \(A\) and \(B\). Let \(L\) be a Lagrangian and \[I(\gamma)=\int_0^TL(t,\gamma(t),\dot{\gamma}(t))dt\] be the action of that path. A classical path of time \(T\) joining \(A\) and \(B\) is a solution to the corresponding Euler-Lagrange equation. Let's suppose we're in an ideal situation: for any \(T\) and any pair of points \(A\), \(B\) there is a unique classical path \(\gamma_{A,B,T}\) of time \(T\) joining \(A\) and \(B\).
Fix \(A\), but allow \(B\) and \(T\) to vary. Define the function \[W(B,T)=I(\gamma_{A,B,T}).\]
The Hamilton-Jacobi equation is a PDE satisfied by this function. Let's first compute the derivatives of \(W\) with respect to \(B\). Replace \(B\) by \(B+b\) and suppose that \(\gamma_{A,B+b,T}(t)=\gamma_{A,B,T}(t)+\eta(t)\). Then, for small \(b\), writing \(\gamma_{A,B,T}(t)=(x_1(t),\ldots,x_n(t))\), we have \[I(\gamma_{A,B+b,T})=I(\gamma_{A,B,T})+\sum_i\int_0^T\left(\frac{\partial L}{\partial x_i}-\frac{d}{dt}\frac{\partial L}{\partial\dot{x}_i}\right)\eta_i(t)\,dt+\sum_i\left[\frac{\partial L}{\partial \dot{x}_i}\eta_i(t)\right]_0^T+\cdots\] by the usual Euler-Lagrange argument for computing the first variation of \(I\). Since \(\gamma_{A,B,T}\) is the classical path, it satisfies the Euler-Lagrange equation and the integral term vanishes. Since \(\eta_i(0)=0\) (the point \(A\) is fixed) the only remaining term is \(\sum_i\frac{\partial L}{\partial \dot{x}_i}(T)\eta_i(T)\). Since \(b_i=\eta_i(T)\), this means that the first variation of \(W\) is \[\frac{\partial W}{\partial B_i}=\frac{\partial L}{\partial\dot{x}_i}(T)\] By the definition of the conjugate momentum, \(\frac{\partial L}{\partial\dot{x}_i}=p_i\), so this says that \(\partial W/\partial B_i\) is the ith component of momentum at the endpoint of the path.
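As a quick sanity check, here is a worked example that is not taken from Schrödinger's paper: assume the free-particle Lagrangian \(L=\frac{1}{2}m|\dot{x}|^2\). The classical path from \(A\) to \(B\) in time \(T\) is then the straight line traversed at constant velocity, \[\gamma_{A,B,T}(t)=A+\frac{t}{T}(B-A),\qquad \dot{\gamma}_{A,B,T}(t)=\frac{B-A}{T},\] so the action is \[W(B,T)=\int_0^T\frac{m}{2}\frac{|B-A|^2}{T^2}dt=\frac{m|B-A|^2}{2T}\] and its derivative with respect to the endpoint is \[\frac{\partial W}{\partial B_i}=\frac{m(B_i-A_i)}{T}=m\dot{x}_i(T),\] which is indeed the ith component of the momentum at the endpoint.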
Now let the endpoint move along a fixed classical path starting at \(A\), so that \(B=B(T)\) is the position at time \(T\) and \(W(B(T),T)=\int_0^TL\,dt\). By the fundamental theorem of calculus: \[\frac{dW}{dT}=L(T,B_i,\dot{B}_i)\] but by the chain rule \[\frac{dW}{dT}=\frac{\partial W}{\partial T}+\sum_i\frac{\partial W}{\partial B_i}\dot{B}_i\] This gives \[\frac{\partial W}{\partial T}=L-\sum_ip_i\dot{B}_i\] where \(p_i=\partial W/\partial B_i\) is the momentum at the endpoint. Since the Hamiltonian \(H\) and Lagrangian \(L\) are related by a Legendre transform, we have \[L(T,B_i,\dot{B}_i)-\sum_ip_i\dot{B}_i=-H(T,B_i,p_i)=-H\left(T,B_i,\frac{\partial W}{\partial B_i}\right)\] so we see that \(W\) satisfies the Hamilton-Jacobi equation \[\frac{\partial W}{\partial T}=-H(T,B,\nabla W).\]
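The Hamilton-Jacobi equation can be checked numerically on an example where the action is known in closed form. The sketch below is my own illustration, not something from Schrödinger's paper: it assumes a one-dimensional harmonic oscillator with \(H=p^2/2m+m\omega^2x^2/2\), for which the classical action from \(A\) to \(B\) in time \(T\) is \(W=\frac{m\omega}{2\sin\omega T}\left((A^2+B^2)\cos\omega T-2AB\right)\) whenever \(\sin\omega T\neq 0\). The function names and the finite-difference step are hypothetical choices.

```python
import math

def action_sho(A, B, T, m=1.0, omega=1.0):
    """Action W(B, T) of the classical oscillator path from A (time 0) to B (time T).

    Assumes sin(omega*T) != 0, so that the classical path is unique.
    """
    s = math.sin(omega * T)
    c = math.cos(omega * T)
    return m * omega * ((A * A + B * B) * c - 2.0 * A * B) / (2.0 * s)

def hamiltonian_sho(x, p, m=1.0, omega=1.0):
    """H = p^2/(2m) + m*omega^2*x^2/2."""
    return p * p / (2.0 * m) + 0.5 * m * omega ** 2 * x * x

def endpoint_momentum(A, B, T, m=1.0, omega=1.0):
    """m * x'(T) for the path x(t) = A cos(wt) + (B - A cos(wT)) sin(wt)/sin(wT)."""
    s = math.sin(omega * T)
    c = math.cos(omega * T)
    return m * omega * (B * c - A) / s

def dW_dB(A, B, T, h=1e-5):
    """Central finite difference of W in the endpoint B."""
    return (action_sho(A, B + h, T) - action_sho(A, B - h, T)) / (2.0 * h)

def dW_dT(A, B, T, h=1e-5):
    """Central finite difference of W in the travel time T."""
    return (action_sho(A, B, T + h) - action_sho(A, B, T - h)) / (2.0 * h)

def hj_residual(A, B, T):
    """dW/dT + H(B, dW/dB); this is zero when W solves the Hamilton-Jacobi equation."""
    return dW_dT(A, B, T) + hamiltonian_sho(B, dW_dB(A, B, T))

if __name__ == "__main__":
    A, B, T = 0.3, 1.1, 0.7  # arbitrary sample values
    print("dW/dB - p(T):", dW_dB(A, B, T) - endpoint_momentum(A, B, T))
    print("HJ residual :", hj_residual(A, B, T))
```

Both printed numbers should be zero up to finite-difference and rounding error, which checks \(\partial W/\partial B=p\) and \(\partial W/\partial T=-H\) for this particular system.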
Autonomous case
If we assume that the Hamiltonian is of the form \[\frac{1}{2m}(p_1^2+p_2^2+p_3^2)+V(x,y,z)\] (in particular time-independent) then energy is conserved along classical paths, and we can look for solutions of the Hamilton-Jacobi equation of the form \[W(t,x,y,z)=-Et+S(x,y,z)\] for some constant \(E\), the energy. (Here we write \((x,y,z)\) for the endpoint \(B\) and \(t\) for \(T\).) Substituting \(\partial W/\partial t=-E\) and \(\nabla W=\nabla S\), the Hamilton-Jacobi equation reduces to \[|\nabla S|=\sqrt{2m(E-V(x,y,z))}.\] We also have the momentum \(p=\nabla S\) from our earlier computation so the speed \(\sqrt{\dot{x}^2+\dot{y}^2+\dot{z}^2}\) is \(|p|/m=|\nabla S|/m=\sqrt{\frac{2(E-V(x,y,z))}{m}}\).
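In one dimension the reduced equation can be integrated directly: \(S(x)-S(0)=\int_0^x\sqrt{2m(E-V(s))}\,ds\) for a particle moving to the right inside the classically allowed region \(E\geq V\). The following is a minimal sketch of that integration, not a general solver. The harmonic potential, the choice of Simpson's rule and all function names are my own assumptions for illustration. For this potential the integral has a closed form, which the sketch uses as a cross-check.

```python
import math

def momentum_magnitude(x, E, V, m=1.0):
    """|dS/dx| = sqrt(2m(E - V(x))); only valid where E >= V(x)."""
    return math.sqrt(2.0 * m * (E - V(x)))

def reduced_action(x_end, E, V, m=1.0, n=1000):
    """S(x_end) - S(0) by composite Simpson's rule with n (even) subintervals."""
    h = x_end / n
    total = momentum_magnitude(0.0, E, V, m) + momentum_magnitude(x_end, E, V, m)
    for k in range(1, n):
        weight = 4.0 if k % 2 == 1 else 2.0
        total += weight * momentum_magnitude(k * h, E, V, m)
    return total * h / 3.0

def speed(x, E, V, m=1.0):
    """Classical particle speed |p|/m at position x."""
    return momentum_magnitude(x, E, V, m) / m

def oscillator_potential(x, m=1.0, omega=1.0):
    """Illustrative potential V = m*omega^2*x^2/2 (an assumption of this sketch)."""
    return 0.5 * m * omega ** 2 * x * x

def oscillator_reduced_action(x, E, m=1.0, omega=1.0):
    """Closed form of the same integral for the oscillator potential."""
    p = math.sqrt(2.0 * m * E - (m * omega * x) ** 2)
    return 0.5 * x * p + (E / omega) * math.asin(m * omega * x / math.sqrt(2.0 * m * E))

if __name__ == "__main__":
    E = 1.0  # turning points are at x = +/- sqrt(2) for m = omega = 1
    print("numerical S(1) - S(0):", reduced_action(1.0, E, oscillator_potential))
    print("closed form          :", oscillator_reduced_action(1.0, E))
    print("speed at x = 0       :", speed(0.0, E, oscillator_potential))
```

The endpoint must stay inside the turning points, since outside them \(E-V\) is negative and the square root is no longer real.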
Schrödinger's idea
In going from geometric optics to wave optics you imagine little sine waves travelling along your rays and you imagine that the phase of the sine wave changes linearly with the optical path length. Via the optical/mechanical analogy (the direct correspondence between the Hamiltonian formalism of optics and of mechanics) one translates optical path length into the action \(W\), so Schrödinger's guess was to replace classical trajectories by sine waves whose phase is proportional to the function \(W\). In other words, he postulates a wavefunction \[\psi(x,y,z,t)=A(x,y,z)\sin(W(x,y,z,t)/K)=A(x,y,z)\sin(-Et/K+S(x,y,z)/K)\] for some constant \(K\). This constant has units of action so that the argument of sine is dimensionless.
The frequency of this sine wave is \(\nu=E/(2\pi K)\) so, comparing with the empirical relationship \(E=h\nu\) coming from the Einstein/Planck analyses of the photoelectric effect/black body radiation formula, Schrödinger guessed \[K=h/2\pi=\hbar.\]
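To see what this choice of \(K\) means in the simplest case, here is a worked example under the assumption \(V=0\) (a free particle) and constant amplitude \(A\). One solution of \(|\nabla S|=\sqrt{2mE}\) is the linear function \[S(x,y,z)=p_1x+p_2y+p_3z,\qquad |p|=\sqrt{2mE},\] for a constant vector \(p\), which is the momentum \(\nabla S\). The wavefunction is then the plane wave \[\psi=A\sin\left(\frac{p_1x+p_2y+p_3z-Et}{\hbar}\right),\] whose phase advances by \(2\pi\) over a distance \(2\pi\hbar/|p|\) in the direction of \(p\). Its wavelength is therefore \[\lambda=\frac{2\pi\hbar}{|p|}=\frac{h}{|p|},\] which is de Broglie's relation between wavelength and momentum.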
I want to quickly recall the notion of phase velocity. This is different from the velocity of our classical particles: it is the speed of a crest of the underlying wave. A crest is a surface of constant phase \(W/K=\pi/2\), so at each instant \(t\) the crest is a level surface of the function \(S\), namely \(S=\pi K/2+Et\). Let \(u\) be the vector field \(E\nabla S/|\nabla S|^2\) and let \((x(t),y(t),z(t))\) be an integral curve of \(u\) starting at time 0 on the crest. Then \[\frac{d}{dt}W(x(t),y(t),z(t),t)=-E+\nabla S\cdot u=0\] so the integral curve keeps up with the crest. We think of \(u\) as the phase velocity vector, so the phase speed is \[|u|=E/|\nabla S|=\frac{E}{\sqrt{2m(E-V)}}\]
Crucially, the phase speed depends on \(E\), which is proportional to the frequency, so the wave equation which underlies Schrödinger's matter waves must be dispersive (which means exactly that the phase speed depends on the frequency; in other words, different frequencies will disperse because they travel with different speeds).
Schrödinger next made the simplest guess as to what the equation should be governing waves of a fixed frequency, namely he guessed the usual wave equation \[\Delta\psi=\frac{1}{u^2}\frac{\partial^2\psi}{\partial t^2}\] for waves whose time dependence is through a factor \(e^{2\pi iEt/h}\), where \(u\) now denotes the phase speed. The usual (light) wave equation has the constant \(c\) in place of \(u\). Here \(u\) is given by the dispersion relation \[u=\frac{E}{\sqrt{2m(E-V)}}.\] Since \(\psi\) has time dependence \(e^{2\pi iEt/h}=e^{iEt/\hbar}\) this means that \[\partial^2\psi/\partial t^2=-\frac{E^2}{\hbar^2}\psi\] Now substituting this and the dispersion relation into the wave equation gives \[\Delta\psi=-\frac{2m(E-V)}{E^2}\frac{E^2}{\hbar^2}\psi=-\frac{2m(E-V)}{\hbar^2}\psi\] or the more familiar \[-\frac{\hbar^2}{2m}\Delta\psi+V\psi=E\psi\] which is Schrödinger's (time-independent) equation.
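A quick way to get a feel for the final equation is to plug in a known solution and check that both sides agree. The sketch below is my own illustration and not part of Schrödinger's argument. It assumes the one-dimensional harmonic oscillator \(V=m\omega^2x^2/2\) in units where \(\hbar=m=\omega=1\), whose ground state is \(\psi=e^{-x^2/2}\) with \(E=1/2\), and it approximates \(\psi''\) by a central finite difference. The function names and step size are hypothetical choices.

```python
import math

# Illustrative units (an assumption of this sketch): hbar = m = omega = 1.
HBAR = 1.0
M = 1.0
OMEGA = 1.0

def potential(x):
    """Harmonic oscillator potential V = m*omega^2*x^2/2."""
    return 0.5 * M * OMEGA ** 2 * x * x

def psi_ground(x):
    """Unnormalised oscillator ground state exp(-m*omega*x^2/(2*hbar))."""
    return math.exp(-M * OMEGA * x * x / (2.0 * HBAR))

def schrodinger_residual(psi, V, E, x, h=1e-4):
    """-(hbar^2/2m) psi'' + V psi - E psi at x, with psi'' by central differences."""
    d2psi = (psi(x + h) - 2.0 * psi(x) + psi(x - h)) / (h * h)
    return -HBAR ** 2 / (2.0 * M) * d2psi + V(x) * psi(x) - E * psi(x)

if __name__ == "__main__":
    E_ground = 0.5 * HBAR * OMEGA
    for x in (-1.0, 0.0, 0.3, 2.0):
        good = schrodinger_residual(psi_ground, potential, E_ground, x)
        bad = schrodinger_residual(psi_ground, potential, 1.0, x)
        print(f"x = {x:5.2f}  residual at E = 1/2: {good: .2e}  at E = 1: {bad: .2e}")
```

With the correct energy the residual vanishes up to discretisation error at every sample point, while a wrong value of \(E\) leaves a residual proportional to \(\psi\). This is the sense in which the equation picks out particular energies.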
For me, this route to Schrödinger's equation seems extremely natural when compared to Dirac's magic with Poisson brackets.