Monthly Archives: January 2017

A peek into the world of tensors


When I was in high school, two cool things that I learned from physics were “vectors” and “calculus”. I was (and still am) awestruck by the following statement:

Uniform circular motion is an accelerated motion.

At college, during my first (and second-to-last) physics course, I was taught “vector calculus” and I didn’t enjoy it. Last year I learned “linear algebra”, that is, the study of vector spaces, matrices, linear transformations… Also, a few months ago I wrote about my understanding of Algebra. In it I briefly mentioned that “…study of symmetry of equations, geometric objects, etc. became one of the central topics of interest…”, and this led to what we call “Abstract Algebra”, of which linear algebra is a part. The following video by 3blue1brown explains how our understanding of vectors from physics can be used to develop the subject of linear algebra.

But one should ask: “Why do we care to classify physical quantities as scalars and vectors?”. The answer lies in the quest of physics to find the invariants in terms of which we can state the laws of nature. In general, finding invariants is a useful problem-solving strategy in mathematics (the language of physics). For example, consider the following problem from the book “Problem-Solving Strategies” by Arthur Engel:

Suppose the positive integer n is odd. The numbers 1, 2, …, 2n are written on the blackboard. Then one can pick any two numbers a and b, erase them and write instead |a-b|. Prove that an odd number will remain at the end.
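Before looking at the proof idea, one can check the claim empirically with a short simulation (a sketch; the two numbers are erased at random, and final_number is just an illustrative name, not from the original problem):

```python
import random

def final_number(n):
    """Play the blackboard game on 1, 2, ..., 2n until one number remains."""
    nums = list(range(1, 2 * n + 1))
    while len(nums) > 1:
        a, b = random.sample(nums, 2)   # pick two numbers at distinct positions
        nums.remove(a)
        nums.remove(b)
        nums.append(abs(a - b))         # erase them, write |a - b| instead
    return nums[0]

for n in (1, 3, 5, 7):                  # odd n, as in the problem
    assert final_number(n) % 2 == 1     # the last number is always odd
```

Note that random.sample picks two entries at distinct positions, so duplicate values on the board (e.g. two zeros) are handled correctly.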

To prove this statement, one will have to use the “parity” of the sum 1+2+3+…+2n as the invariant. And as stated in the video above, vectors are invariant under transformations of coordinate systems (the components change, but the length and direction of the arrow remain unchanged). For example, consider the rotation of the 2D axes by an angle θ, keeping the origin fixed.


By Guy vandegrift (Own work) [CC BY-SA 3.0], via Wikimedia Commons

\displaystyle{x' =  x \cos \theta + y \sin \theta; \quad y'= -x \sin \theta + y \cos \theta}

Now, we can rewrite this by using x_1 and x_2 instead of x and y; and putting different subscripts on the single letter a instead of functions of \theta:

\displaystyle{x_1' =  a_{11} x_1 + a_{12}x_2; \quad x_2'=  a_{21}x_1 + a_{22}x_2}

Now we differentiate this system of equations to get:

\displaystyle{dx_1'= \frac{\partial x_1'}{\partial x_1} dx_1 + \frac{\partial x_1'}{\partial x_2} dx_2 ; \quad dx_2' = \frac{\partial x_2'}{\partial x_1} dx_1  + \frac{\partial x_2'}{\partial x_2} dx_2 }

where a_{ij} = \frac{\partial x_i'}{\partial x_j} . We can rewrite this system in condensed form as:

\displaystyle{dx_{\mu}' = \sum_{\sigma} \frac{\partial x_{\mu}'}{\partial x_{\sigma}} dx_{\sigma}}

for \mu =1,2 and \sigma =1,2. We can further abbreviate it by omitting the summation symbol \sum_{\sigma} with the understanding that whenever a subscript occurs twice in a single term, we do summation on that subscript.

\displaystyle{\boxed{dx_{\mu}' = \frac{\partial x_{\mu}'}{\partial x_{\sigma}} dx_{\sigma} }}

for \mu =1,2 and \sigma =1,2. This equation represents ANY transformation of coordinates whenever the values of (x_{\sigma}) and (x_{\mu}') are in one-to-one correspondence. Moreover, it can be extended to represent transformation of coordinates of any n-dimensional vector. For example, if  \mu =1,2,3 and \sigma =1,2,3 then it represents coordinate  transformations of a 3-dimensional vector.
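For the rotation above, the coefficients \partial x_{\mu}'/\partial x_{\sigma} are just the constants a_{\mu\sigma}, so the boxed law can be checked numerically. A sketch using NumPy (the angle and the displacement components are arbitrary test values):

```python
import numpy as np

theta = 0.3                                       # arbitrary rotation angle
a = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])   # a[mu][sigma] = dx'_mu/dx_sigma

dx = np.array([1.0, 2.0])       # components dx_1, dx_2
dx_prime = a @ dx               # dx'_mu = a[mu][sigma] dx_sigma (summed over sigma)

# the length ds is invariant under the rotation
assert np.isclose(np.linalg.norm(dx_prime), np.linalg.norm(dx))
```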

But there are physical quantities which can’t be classified as scalars or vectors. For example, “stress”, the internal force experienced by a material due to the “strain” caused by an external force, is described as a “tensor of rank 2”. This is so because the stress at any point on a surface depends upon both the external force vector and the area vector, i.e. it describes things happening due to the interaction between two vectors. The Cauchy stress tensor \boldsymbol{\sigma} consists of nine components \sigma_{ij} that completely define the state of stress at a point inside a material in the deformed state (where i corresponds to the force component direction and j corresponds to the area component direction). The tensor relates a unit-length direction vector  n to the stress vector  \mathbf{T}^{(\mathbf{n})} across an imaginary surface perpendicular to n:

\displaystyle{\mathbf{T}^{(\mathbf n)}= \mathbf n \cdot\boldsymbol{\sigma}\quad \text{or} \quad T_j^{(n)}= \sigma_{ij}n_i}        where,  \boldsymbol{\sigma} = \left[{\begin{matrix} \mathbf{T}^{(\mathbf{e}_1)} \\  \mathbf{T}^{(\mathbf{e}_2)} \\  \mathbf{T}^{(\mathbf{e}_3)} \\  \end{matrix}}\right] =  \left[{\begin{matrix}  \sigma _{11} & \sigma _{12} & \sigma _{13} \\  \sigma _{21} & \sigma _{22} & \sigma _{23} \\  \sigma _{31} & \sigma _{32} & \sigma _{33} \\  \end{matrix}}\right]
where \sigma_{11}, \sigma_{22} and \sigma_{33} are normal stresses, and \sigma_{12}, \sigma_{13}, \sigma_{21}, \sigma_{23}, \sigma_{31} and \sigma_{32} are shear stresses. We can represent the stress vector acting on a plane with normal unit vector n, as:


By Sanpaz (Own work) [CC BY-SA 3.0 or GFDL], via Wikimedia Commons

Here, the tetrahedron is formed by slicing a parallelepiped along an arbitrary plane n. So, the force acting on the plane n is the reaction exerted by the other half of the parallelepiped and has an opposite sign.
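To make the relation T_j^{(\mathbf{n})} = \sigma_{ij} n_i concrete, here is a small numerical sketch (the stress components below are made-up values for illustration):

```python
import numpy as np

# a hypothetical (symmetric) stress state, components sigma_ij
sigma = np.array([[10.0, 2.0, 0.0],
                  [ 2.0, 5.0, 1.0],
                  [ 0.0, 1.0, 3.0]])

n = np.array([1.0, 0.0, 0.0])   # unit normal along e_1
T = n @ sigma                   # T_j = sigma_ij n_i

# for n = e_1 this picks out the first row of sigma, i.e. T^(e_1)
assert np.allclose(T, sigma[0])
```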

In this terminology, a scalar is a tensor of rank zero and a vector is a tensor of rank one. Moreover, in an n-dimensional space:

  • a vector has n components
  • a tensor of rank two has n^2 components
  • a tensor of rank three has n^3 components
  • and so on …

Just like vectors, tensors in general are invariant under transformations of coordinate systems. We wish to exploit this further. Let’s reconsider the boxed equation stated earlier. Since we are working with the Euclidean metric, i.e. the length s of a vector is given by s^2=x_1^2+x_2^2, we have ds^2=dx_1^2+dx_2^2, i.e. dx_1 and dx_2 are the components of ds. So, replacing dx_1 and dx_2 by A^1 and A^2 we get (the motivation is to capture the idea of the area vector)

\displaystyle{\boxed{A^{' \mu} = \frac{\partial x_{\mu}'}{\partial x_{\sigma}} A^{\sigma}}}

where A^1, A^2, A^3, \ldots are the components of a vector in a certain coordinate system (note that the superscripts are just for indexing purposes and do NOT represent exponents). Any set of quantities which transforms according to this equation is defined to be a contravariant vector. Moreover, we can generalize this equation to a tensor of any rank. For example, a contravariant tensor of rank two is defined by:

\displaystyle{A^{' \alpha \beta} = \frac{\partial x_{\alpha}'}{\partial x_{\gamma}} \frac{\partial x_{\beta}'}{\partial x_{\delta}} A^{\gamma \delta}}

where the sum is over the indices \gamma and \delta (since they occur twice in the term on right). We can illustrate this for 3 dimensional space, i.e. \alpha , \beta , \gamma , \delta = 1,2,3 but summation performed only on  \gamma  and  \delta; for instance, if  \alpha=1 and  \beta=2 then we have:

\displaystyle{A^{' 12} = \frac{\partial x_{1}'}{\partial x_{1}} \frac{\partial x_{2}'}{\partial x_{1}} A^{11} + \frac{\partial x_{1}'}{\partial x_{1}} \frac{\partial x_{2}'}{\partial x_{2}} A^{12} + \frac{\partial x_{1}'}{\partial x_{1}} \frac{\partial x_{2}'}{\partial x_{3}} A^{13}+ \frac{\partial x_{1}'}{\partial x_{2}} \frac{\partial x_{2}'}{\partial x_{1}} A^{21}  + \frac{\partial x_{1}'}{\partial x_{2}} \frac{\partial x_{2}'}{\partial x_{2}} A^{22} + \frac{\partial x_{1}'}{\partial x_{2}} \frac{\partial x_{2}'}{\partial x_{3}} A^{23} }

\displaystyle{+\frac{\partial x_{1}'}{\partial x_{3}} \frac{\partial x_{2}'}{\partial x_{1}} A^{31} + \frac{\partial x_{1}'}{\partial x_{3}} \frac{\partial x_{2}'}{\partial x_{2}} A^{32}  + \frac{\partial x_{1}'}{\partial x_{3}} \frac{\partial x_{2}'}{\partial x_{3}} A^{33} }
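The nine-term expansion above is exactly a double matrix contraction, which is easy to verify numerically. A sketch (the matrix J of partial derivatives and the components A^{\gamma\delta} are arbitrary test values):

```python
import numpy as np

theta = 0.3
J = np.array([[ np.cos(theta), np.sin(theta), 0.0],
              [-np.sin(theta), np.cos(theta), 0.0],
              [ 0.0,           0.0,           1.0]])   # J[alpha][gamma] = dx'_alpha/dx_gamma

A = np.arange(1.0, 10.0).reshape(3, 3)                 # test components A^{gamma delta}

# the double sum over gamma and delta, written index-by-index with einsum
A_prime = np.einsum('ag,bd,gd->ab', J, J, A)

# equivalently, in matrix form: A' = J A J^T
assert np.allclose(A_prime, J @ A @ J.T)
```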

So, we have just analysed the invariance of one of the flavours of tensors. Thinking mathematically, one should expect the existence of something “like an algebraic inverse” of a contravariant tensor, because a tensor is a generalization of a vector, and in linear algebra we study inverse operations. Consider a situation where we want to analyse the density of an object at different points. For simplicity, let’s consider a point A(x_1,x_2) on a plane surface with variable density.


A surface whose density is different in different parts

If we designate by \psi the density at A, then \frac{\partial \psi}{\partial x_1} and \frac{\partial \psi}{\partial x_2} represent, respectively, the partial variation of \psi in the x_1 and x_2 directions. Although \psi is a scalar quantity, the “change in \psi” is a directed quantity with components \frac{\partial \psi}{\partial x_1} and \frac{\partial \psi}{\partial x_2}. Note that the “change in \psi” is a tensor of rank one because it depends upon the various directions. But it’s a tensor in a sense different from what we saw in the case of “stress”. This “difference” will become clear once we analyse what happens to this quantity when the coordinate system is changed.

Now our motive is to express \frac{\partial \psi}{\partial x_1'}, \frac{\partial \psi}{\partial x_2'} in terms of \frac{\partial \psi}{\partial x_1}, \frac{\partial \psi}{\partial x_2}. Note that a change in x_1' will affect “both” x_1 and x_2 (as seen in the rotation of the 2D axes in the case of a vector). Hence, the resulting changes in x_1 and x_2 will affect \psi:

\displaystyle{\frac{\partial \psi}{\partial x_1'} = \frac{\partial \psi}{\partial x_1} \frac{\partial x_1}{\partial x_1'} + \frac{\partial \psi}{\partial x_2} \frac{\partial x_2}{\partial x_1'}; \quad  \frac{\partial \psi}{\partial x_2'} = \frac{\partial \psi}{\partial x_1} \frac{\partial x_1}{\partial x_2'} + \frac{\partial \psi}{\partial x_2} \frac{\partial x_2}{\partial x_2'}}

Here we have used the chain rule: if x, y, z are three variables such that z depends on y and y depends on x, and directly calculating the change in z per unit change in x is NOT easy, then we can compute it as \frac{dz}{dx} = \frac{dz}{dy} \frac{dy}{dx}. We can rewrite this system in condensed form as:

\displaystyle{\frac{\partial \psi}{\partial x_{\mu}'} = \sum_{\sigma} \frac{\partial \psi}{\partial x_{\sigma}} \frac{\partial x_{\sigma}}{\partial x_{\mu}'} }

for \mu =1,2 and \sigma =1,2. We can further abbreviate it by omitting the summation symbol \sum_{\sigma} with the understanding that whenever a subscript occurs twice in a single term, we do summation on that subscript.

\displaystyle{\frac{\partial \psi}{\partial x_{\mu}'} =  \frac{\partial \psi}{\partial x_{\sigma}} \frac{\partial x_{\sigma}}{\partial x_{\mu}'} }

for \mu =1,2 and \sigma =1,2. Finally replacing \frac{\partial \psi}{\partial x_{\mu}'} by A_{\mu}' and \frac{\partial \psi}{\partial x_{\sigma}} by A_{\sigma} (to make it similar to notation introduced in case of stress tensor)

\displaystyle{\boxed{A_{ \mu}' = \frac{\partial x_{\sigma}}{\partial x_{\mu}'} A_{\sigma}}}

where A_1, A_2, \ldots are the components of a vector in a certain coordinate system. Any set of quantities which transforms according to this equation is defined to be a covariant vector. Moreover, we can generalize this equation to a tensor of any rank. For example, a covariant tensor of rank two is defined by:

\displaystyle{A_{ \alpha \beta}' = \frac{\partial x_{\gamma}}{\partial x_{\alpha}'} \frac{\partial x_{\delta}}{\partial x_{\beta}'} A_{\gamma \delta}}

where the sum is over the indices \gamma and \delta (since they occur twice in the term on right).

Comparing the (boxed) equations describing contravariant and covariant vectors, we observe that the coefficients on the right are reciprocals of each other (as promised…). Moreover, all these boxed equations represent the law of transformation for tensors of rank one (a.k.a. vectors), which can be generalized to a tensor of any rank.
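The “reciprocal” relationship can be seen numerically with a non-orthogonal change of coordinates x' = Mx (all numbers below are arbitrary test values): contravariant components transform with M, covariant ones with the transpose of M's inverse, and the contraction A^{\sigma}A_{\sigma} comes out invariant.

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [0.0, 1.0]])        # dx'_mu/dx_sigma for the linear map x' = Mx
M_inv = np.linalg.inv(M)          # dx_sigma/dx'_mu, the "reciprocal" coefficients

A_contra = np.array([1.0, 1.0])   # contravariant components A^sigma
A_co = np.array([3.0, 4.0])       # covariant components A_sigma (e.g. a gradient)

A_contra_p = M @ A_contra         # A'^mu = (dx'_mu/dx_sigma) A^sigma
A_co_p = M_inv.T @ A_co           # A'_mu = (dx_sigma/dx'_mu) A_sigma

# the pairing A^sigma A_sigma is the same in both coordinate systems
assert np.isclose(A_contra @ A_co, A_contra_p @ A_co_p)
```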

Our final task is to see how these two flavours of tensors interact with each other. Let’s study the algebraic operations of addition and multiplication for both flavours of tensors, just like we did for vectors (note that by vector product we mean the dot product, because the cross product can’t be generalized to n-dimensional vectors).

First consider the case of contravariant tensors. Let A^{\alpha} be a vector having two components A^{1} and A^{2} in a plane and B^{\alpha} be another such vector. If we define  A^{\alpha}+B^\alpha = C^\alpha and A^{\alpha} B^\beta = C^{\alpha \beta} (this allows 4 components, namely C^{11}, C^{12}, C^{21}, C^{22})  with

\displaystyle{A^{' \lambda} = \frac{\partial x_{\lambda}'}{\partial x_{\alpha}} A^{\alpha}; \quad B^{' \mu} = \frac{\partial x_{\mu}'}{\partial x_{\beta}} B^{\beta}}

for \lambda, \mu, \alpha, \beta =1,2, then on their addition and multiplication (called outer multiplication) we get:

\displaystyle{C^{' \lambda} = \frac{\partial x_{\lambda}'}{\partial x_{\alpha}} C^{\alpha}; \quad C^{' \lambda \mu} = \frac{\partial x_{\lambda}'}{\partial x_{\alpha}} \frac{\partial x_{\mu}'}{\partial x_{\beta}} C^{\alpha \beta}}

for \lambda, \mu, \alpha, \beta =1,2. One can prove this by patiently multiplying each term and then rearranging them. In general, if two contravariant tensors of rank m and n respectively, are multiplied together, the result is a contravariant tensor of rank m+n.
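The “patient multiplication” can be delegated to the computer. A sketch checking that the outer product C^{\alpha\beta} = A^{\alpha}B^{\beta} of two vectors obeys the rank-two transformation law (test values are arbitrary):

```python
import numpy as np

theta = 0.3
J = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])   # dx'_lambda/dx_alpha

A = np.array([1.0, 2.0])
B = np.array([3.0, 4.0])
C = np.outer(A, B)               # C^{alpha beta} = A^alpha B^beta

A_p, B_p = J @ A, J @ B          # transform each factor as a contravariant vector
C_p = J @ C @ J.T                # rank-two contravariant law applied to C

# the outer product of the transformed vectors equals the transformed product
assert np.allclose(np.outer(A_p, B_p), C_p)
```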

For the case of covariant tensors, the addition and (outer) multiplication is done in same manner as above. Let A_{\alpha} be a vector having two components A_{1} and A_{2} in a plane and B_{\alpha} be another such vector. If we define  A_{\alpha}+B_\alpha = C_\alpha and A_{\alpha} B_\beta = C_{\alpha \beta} (this allows 4 components, namely C_{11}, C_{12}, C_{21}, C_{22})  with

\displaystyle{A_{ \lambda}' = \frac{\partial x_{\alpha}}{\partial x_{\lambda}'} A_{\alpha}; \quad B_{ \mu}' = \frac{\partial x_{\beta}}{\partial x_{\mu}'} B_{\beta}}

for \lambda, \mu, \alpha, \beta =1,2, then on their addition and multiplication (called outer multiplication) we get:

\displaystyle{C_{ \lambda} '= \frac{\partial x_{\alpha}}{\partial x_{\lambda}'} C_{\alpha}; \quad C_{ \lambda \mu}'= \frac{\partial x_{\alpha}} {\partial x_{\lambda}'}\frac{\partial x_{\beta}}{\partial x_{\mu}'} C_{\alpha \beta}}

for \lambda, \mu, \alpha, \beta =1,2. In general, if two covariant tensors of rank m and n respectively, are multiplied together, the result is a covariant tensor of rank m+n.

Now, as promised, it’s time to see how both of these flavours of tensors interact with each other. Let’s extend the notion of outer multiplication defined for each flavour of tensor to the outer product of a contravariant tensor with a covariant tensor. For example, consider vectors (a.k.a. tensors of rank 1) of each type:

\displaystyle{A^{' \lambda} = \frac{\partial x_{\lambda}'}{\partial x_{\alpha}} A^{\alpha}; \quad B_{ \mu}' = \frac{\partial x_{\beta}}{\partial x_{\mu}'} B_{\beta}}

then their outer product leads to

\displaystyle{C^{' \lambda}_{ \mu}= \frac{\partial x_{\lambda}'}{\partial x_{\alpha}} \frac{\partial x_{\beta}}{\partial x_{\mu}'} C^{\alpha}_{\beta}}

where A^\alpha B_\beta = C^\alpha _\beta. This is neither a contravariant nor a covariant tensor, and hence is called a mixed tensor of rank 2. More generally, if a contravariant tensor of rank m and a covariant tensor of rank n are multiplied together so as to form their outer product, the result is a mixed tensor of rank m+n.

In general, if two mixed tensors of rank m (having m_1 indices/superscripts of contravariance and m_2 indices/subscripts of covariance, such that m_1+m_2=m) and n (having n_1 indices/superscripts of contravariance and n_2 indices/subscripts of covariance, such that n_1+n_2=n)  respectively, are multiplied together, the result is a mixed tensor of rank m+n (having m_1+n_1 indices/superscripts of contravariance and m_2+n_2 indices/subscripts of covariance, such that m_1+n_1+m_2+n_2=m+n) .

Unlike the previous two types of tensors, we can’t illustrate this using a simple physical example. To convince yourself, consider the following two mixed tensors of rank 3 and rank 2, respectively:

\displaystyle{A^{'\alpha \beta}_{\gamma} = \frac{\partial x_{\nu} }{\partial x_{\gamma}'}\frac{\partial x_{\alpha}'}{\partial x_{\lambda}}\frac{\partial x_{\beta}'}{\partial x_{\mu}} A^{\lambda \mu}_{\nu}; \quad B^{'\kappa}_{\delta} = \frac{\partial x_{\sigma}}{\partial x_{\delta}'}\frac{\partial x_{\kappa}'}{\partial x_{\rho}}B^{\rho}_{\sigma}}

then following the notations introduced, their outer product is of rank 5 and is given by

\displaystyle{C^{'\alpha\beta\kappa}_{\gamma\delta} = \frac{\partial x_{\nu} }{\partial x_{\gamma}'} \frac{\partial x_{\sigma}}{\partial x_{\delta}'}\frac{\partial x_{\alpha}'}{\partial x_{\lambda}}\frac{\partial x_{\beta}'}{\partial x_{\mu}}\frac{\partial x_{\kappa}'}{\partial x_{\rho}} C^{\lambda\mu\rho}_{\nu\sigma}}

Behind this notation, the processes are really complicated. Now, suppose that we are working in 3D vector space. Then, the transformation law for tensor \mathbf{A} represents a set of 27 (=3^3) equations with each equation having 27 terms on the right. And the transformation law for tensor \mathbf{B} represents a set of 9 (=3^2) equations with each equation having 9 terms on the right. Therefore, the transformation law of their outer product tensor  \mathbf{C} represents a set of 243 (=3^5) equations with each equation having 243 terms on the right.

So, unlike the previous two cases of contravariant and covariant tensors, the proof for the outer product of mixed tensors is rather complicated and out of scope for this introductory post.


[L] Lillian R. Lieber, The Einstein Theory of Relativity. Internet Archive.

Imaginary Angles


You would have heard about imaginary numbers, the most famous of which is i=\sqrt{-1}. I personally don’t like this name, because all of mathematics is man/woman made, hence all mathematical objects are imaginary (there is no perfect circle in nature…) and lack physical meaning. Moreover, these numbers are very useful in physics (a.k.a. the study of nature using mathematics). For example, the “time-dependent Schrödinger equation”:

\displaystyle{i \hbar \frac{\partial}{\partial t}\Psi(\mathbf{r},t) = \hat H \Psi(\mathbf{r},t)}

But, as described here:

Complex numbers are a tool for describing a theory, not a property of the theory itself. Which is to say that they can not be the fundamental difference between classical and quantum mechanics (QM). The real origin of the difference is the non-commutative nature of measurement in QM. Now this is a property that can be captured by all kinds of beasts — even real-valued matrices. [Physics.SE]

For more on this interpretation see Volume 1, Chapter 22 of “The Feynman Lectures on Physics”, and also this discussion about Hawking’s wave function.

All these facts may not have fascinated you, but the following fact from Einstein’s Special Relativity should fascinate you:

In 1908 Hermann Minkowski explained how the Lorentz transformation could be seen as simply a hyperbolic rotation of the spacetime coordinates, i.e., a rotation through an imaginary angle. [Wiki: Rapidity]

Whether or not you understand Einstein’s relativity, the concept of an imaginary angle appears bizarre. But mathematically it’s just another consequence of non-Euclidean geometry, which can be interpreted via the hyperbolic law of cosines, etc. For example:

\displaystyle{\cos (\alpha+i\beta) = \cos (\alpha) \cosh (\beta) - i \sin (\alpha) \sinh (\beta)}

\displaystyle{\sin (\alpha+i\beta) = \sin (\alpha) \cosh (\beta) + i \cos (\alpha) \sinh (\beta)}
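Both expansions can be checked with Python’s cmath module (the values of \alpha and \beta below are arbitrary):

```python
import cmath
import math

alpha, beta = 0.7, 1.2   # arbitrary real values

# cos(a + ib) = cos(a) cosh(b) - i sin(a) sinh(b)
assert abs(cmath.cos(complex(alpha, beta))
           - (math.cos(alpha) * math.cosh(beta)
              - 1j * math.sin(alpha) * math.sinh(beta))) < 1e-12

# sin(a + ib) = sin(a) cosh(b) + i cos(a) sinh(b)
assert abs(cmath.sin(complex(alpha, beta))
           - (math.sin(alpha) * math.cosh(beta)
              + 1j * math.cos(alpha) * math.sinh(beta))) < 1e-12

# setting alpha = 0 gives the purely imaginary angle:
# cos(bi) = cosh(b) and sin(bi) = i sinh(b)
assert abs(cmath.cos(1j * beta) - math.cosh(beta)) < 1e-12
assert abs(cmath.sin(1j * beta) - 1j * math.sinh(beta)) < 1e-12
```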

Let’s try to understand what is meant by an “imaginary angle” by following the article “A geometric view of complex trigonometric functions” by Richard Hammack. Consider the complex unit circle U=\{(z,w)\in \mathbb{C}^2 \ : \ z^2+w^2=1\} of \mathbb{C}^2, defined in a manner exactly analogous to the definition of the standard unit circle in \mathbb{R}^2. Apparently U is some sort of surface in \mathbb{C}^2, but it can’t be drawn as simply as the usual unit circle, owing to the four-dimensional character of \mathbb{C}^2. But we can examine its lower dimensional cross sections. For example, if z=x+iy and w=u+iv, then by setting y = 0 we get the circle x^2+u^2=1 in the x-u plane for v=0, and the hyperbola x^2-v^2 = 1 in the x-vi plane for u=0.


The cross-section of complex unit circle (defined by z^2+w^2=1 for complex numbers z and w) with the x-u-vi coordinate space (where z=x+iy and w=u+iv) © 2007 Mathematical Association of America

These two curves (circle and hyperbola) touch at the points ±o, where o=(1,0) in \mathbb{C}^2, as illustrated above. The symbol o is used by Richard Hammack because this point will turn out to be the origin of complex radian measure.

Let’s define the complex distance between points \mathbf{a} =(z_1,w_1) and \mathbf{b}=(z_2,w_2) in \mathbb{C}^2 as

\displaystyle{d(\mathbf{a},\mathbf{b}) = \sqrt{(z_2-z_1)^2+(w_2-w_1)^2}}

where the square root is taken in the half-plane H of \mathbb{C} consisting of the non-negative imaginary axis and the numbers with a positive real part. Therefore, the complex distance between two points in \mathbb{C}^2 is a complex number (with non-negative real part).

Starting at the point o in the figure above, one can move either along the circle or along the right-hand branch of the hyperbola. On investigating these two choices, we conclude that they involve traversing either a real or an imaginary distance. Generalizing the idea of real radian measure, we define imaginary radian measure to be the oriented arclength from o to a point p on the hyperbola, as illustrated below.


(a) Real radian measure (b) Imaginary radian measure. © 2007 Mathematical Association of America

If p is above the x axis, its radian measure is \beta i with \beta >0, while if it is below the x axis, its radian measure is \beta i with \beta <0. As in the real case, we define \cos (\beta i) and \sin (\beta i) to be the z and w coordinates of p. According to above figure (b), this gives

\displaystyle{\cos (\beta i) = \cosh (\beta); \qquad \sin (\beta i) = i \sinh (\beta)}

\displaystyle{\cos (\pi + \beta i) = -\cosh (\beta); \qquad \sin (\pi + \beta i) = -i \sinh (\beta)}

Notice that both these relations hold for both positive and negative values of \beta, and are in agreement with the expansions of  \cos (\alpha+i\beta)  and \sin (\alpha+i\beta)  stated earlier.

But, to “see” what a complex angle looks like we will have to examine the complex versions of lines and rays. Despite its four-dimensional flavour, \mathbb{C}^2 is a two-dimensional vector space over the field \mathbb{C}, just like \mathbb{R}^2 over \mathbb{R}.

Since a line (through the origin) in \mathbb{R}^2 is the span of a nonzero vector, we define a complex line in \mathbb{C}^2 analogously. For a nonzero vector u in \mathbb{C}^2, the complex line \Lambda through u is span(u), which is isomorphic to the complex plane.

In \mathbb{R}^2, the ray \overline{\mathbf{u}} passing through a nonzero vector u can be defined as the set of all nonnegative real multiples of u. Extending this to \mathbb{C}^2 seems problematic, for the word “nonnegative” has no meaning in \mathbb{C}. Using the half-plane H (where the complex square root is defined) seems a reasonable alternative. If u is a nonzero vector in \mathbb{C}^2, then the complex ray through u is the set \overline{\mathbf{u}} = \{\lambda u \ : \  \lambda\in H\}.

Finally, we define a complex angle to be the union of two complex rays \overline{\mathbf{u}_1} and \overline{\mathbf{u}_2}.

I will end my post by quoting an application of imaginary angles in optics from here:

… in optics, when a light ray hits a surface such as glass, Snell’s law tells you the angle of the refracted beam, Fresnel’s equations tell you the amplitudes of reflected and transmitted waves at an interface in terms of that angle. If the incidence angle is very oblique when travelling from glass into air, there will be no refracted beam: the phenomenon is called total internal reflection. However, if you try to solve for the angle using Snell’s law, you will get an imaginary angle. Plugging this into the Fresnel equations gives you the 100% reflectance observed in practice, along with an exponentially decaying “beam” that travels a slight distance into the air. This is called the evanescent wave and is important for various applications in optics. [Mathematics.SE]

Different representations of a number as sum of squares


A couple of weeks ago, I wrote a post on Ramanujan’s 129th birthday. In that post I couldn’t verify the fact that:

129 is the smallest number that can be written as the sum of 3 squares in 4 ways.

So I contacted Sannidhya, the only good programmer I know. He wrote a program in Python which finds all numbers less than 1000 that can be written as a sum of three squares in four different ways. Here is the program:
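The original listing is not preserved in this archive, so here is a sketch reconstructing the program from the description that follows (the names sumOfSquares and sumOfSquares3, the loop bounds, and the sorted-based de-duplication match that description; details of Sannidhya’s original may differ):

```python
import math

def sumOfSquares(n):
    """All unordered pairs (a, b) with a, b >= 1 and a^2 + b^2 = n."""
    pairs = []
    i = 1
    while i * i <= n / 2:               # up to sqrt(n/2) to avoid permuted pairs
        rest = n - i * i
        r = math.isqrt(rest)
        if r * r == rest:               # rest is a perfect square
            pairs.append((i, r))
        i += 1
    return pairs

def sumOfSquares3(n):
    """All unordered triplets (a, b, c) with a^2 + b^2 + c^2 = n."""
    triplets = []
    i = 1
    while i * i <= n / 3:               # up to sqrt(n/3)
        for pair in sumOfSquares(n - i * i):
            t = tuple(sorted((i,) + pair))   # sorted() controls repetitions
            if t not in triplets:
                triplets.append(t)
        i += 1
    return triplets

# numbers below 1000 expressible as a sum of three squares in exactly 4 ways
four_ways = [n for n in range(1, 1000) if len(sumOfSquares3(n)) == 4]
print(four_ways[:2])   # → [129, 134]
```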

Now, we can conclude that 129 is the smallest such number and the next is 134.

Let’s try to understand how this program works. Firstly, the sumOfSquares(n) procedure finds all a, b such that n = a^2 + b^2. Then the sumOfSquares3(n) procedure finds all a, b, c such that n = a^2 + b^2 + c^2. It works by repeatedly invoking sumOfSquares on (n-i^2), where i is incremented after each iteration from 1 to the square root of n/3. Finally, we run a loop up to 1000 and find those numbers for which sumOfSquares3 returns 4 triplets (a,b,c). Similarly, one can also find numbers which can be expressed as a sum of 3 squares in 5 different ways. The smallest one is 194, with the 5 triplets being (1, 7, 12), (3, 4, 13), (3, 8, 11), (5, 5, 12) and (7, 8, 9).
But how do the functions sumOfSquares(n) and sumOfSquares3(n) work? This is how Sannidhya explains:

 The sumOfSquares(n) function finds all the unordered pairs (a, b) such that a^2 + b^2 = n. It works by subtracting a perfect square (i^2) from n, and checking if the remaining part is a perfect square as well. If so, then (i, sqrt(n-i^2)) will form a required unordered pair. Here, i loops from 1 to the square root of n/2. Note: one could also loop from 1 to the square root of n, but after the square root of n/2, further iterations will generate redundant pairs which are just permutations of the pairs already obtained. For example, consider 25: the expected output should be (3, 4) only, but if the loop runs from 1 to the square root of n, then the output will be (3, 4), (4, 3). As you can see, we are getting redundant pairs. So, we run the loop from 1 to the square root of n/2.

The sumOfSquares3 function calls sumOfSquares repeatedly with the argument n - i^2, where i is incremented from 1 to the square root of n/3. Note that each element of sumOfSquares(n - i^2) is a pair. For each of these elements, the loop forms a triplet consisting of i and the pair. This triplet is then appended to the list, which is finally returned.

The repetition of triplets can easily be controlled by using the sorted function from Python in sumOfSquares3(n).

Indeed, these types of questions are a bit hard computationally. For example, see:

Related discussions on MathOverflow:

– Is there a simple way to compute the number of ways to write a positive integer as the sum of three squares? : Note that this is not an answer to my question, since r_k(n) counts the number of representations of n by k squares, allowing zeros and distinguishing signs and order.

– Efficient computation of integer representation as sum of three squares

Related discussions on ComputerScience.SE

– Listing integers as the sum of three squares m=x^2+y^2+z^2 : Sannidhya did a clever improvement to this algorithm, but still, as pointed out here, Sannidhya’s algorithm is O(n).

Related discussions on Mathematics.SE

– When is a rational number a sum of three squares?

– Why can’t this number be written as a sum of three squares of rationals?

– Sum of one, two, and three squares

Hello 2017


2017 is a prime number!! In fact, it is the 306th prime year A.D. The previous prime year was 2011 and the next one will be 2027.

Moreover, 2017 leaves a remainder of 1 when divided by 4. So, it can be represented as a sum of two squares. How do we know?

Fermat’s two-square theorem: An odd prime p can be written as a sum of two squares if and only if it can be written as 4n+1 for some integer n. Moreover, this representation is unique.

Unlike the three-square theorem discussed in the previous post, this is not that difficult to prove (the only challenging part is to show that if the prime leaves a remainder of 1 when divided by 4 then it “can” be written as a sum of two squares). There are many ways to prove this, and there is a Wikipedia page dedicated to the popular proofs of it. But my favorite proof is the one by Richard Dedekind; it requires knowledge of Gaussian integers and some college algebra (properties of unique factorization domains). You can find the “existence” proof here and the “uniqueness” proof here.

I will end this post with a step-by-step procedure for writing 2017 as a sum of two squares, following the algorithm explained here:

Step 1: Find z such that z^2+1 is divisible by 2017.

Choose any quadratic non-residue a modulo 2017, because then a^{1008} +1 is divisible by 2017 (Euler’s criterion). Since half of the residues modulo 2017 are quadratic non-residues, it’s easy to check a guess using the divisibility of a^{1008} +1. It is easy to observe that a=5 is the smallest solution (in fact, here is the list of quadratic non-residues modulo 2017 generated by WolframAlpha). Hence z\equiv 5^{504} \pmod{2017}, and we get z=229.
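This step is easy to verify with Python’s built-in modular exponentiation:

```python
p = 2017

# Euler's criterion: a is a non-residue iff a^((p-1)/2) = -1 (mod p)
assert pow(5, (p - 1) // 2, p) == p - 1

z = pow(5, (p - 1) // 4, p)   # z = 5^504 mod 2017
assert (z * z + 1) % p == 0   # z^2 + 1 is divisible by 2017
assert z in (229, p - 229)    # the two square roots of -1 mod 2017 are +-229
```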

Step 2: Compute the greatest common divisor of 2017 and 229+i using the Euclidean algorithm for the Gaussian integers. The answer will be x+yi where x^2+y^2=2017.

Note that the norm of any Gaussian integer r+si is r^2+s^2. Hence the norm of 229+i is N(229+i) = 52442. For the Euclidean algorithm I will use long division (with a calculator) as follows:

For Gaussian integers, we first multiply by the conjugate of the denominator and then use a calculator to compute the real and imaginary parts of the quotient separately.

2017 = (229+i)(8) +(185-8i) ;   N(185-8i) = 34289

229 + i = (185-8i)(1) + (44+9i);   N(44+9i) = 2017

185-8i = (44+9i)(4-i) + 0

Hence, the gcd is 44+9i.

Finally, we get: 44^2 + 9^2 = 2017.
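The whole two-step procedure can be automated. Here is a sketch (two_squares is a hypothetical helper name, not from the algorithm source; it rounds to the nearest Gaussian integer in Step 2, using Python complex floats, which stay exact for numbers of this size):

```python
def two_squares(p):
    """Write a prime p = 4n + 1 as x^2 + y^2 (a sketch, not optimized)."""
    # Step 1: find z with z^2 = -1 (mod p) from a quadratic non-residue a
    a = 2
    while pow(a, (p - 1) // 2, p) != p - 1:
        a += 1
    z = pow(a, (p - 1) // 4, p)

    # Step 2: Euclidean algorithm on p and z + i in the Gaussian integers
    w1, w2 = complex(p, 0), complex(z, 1)
    while abs(w2) > 0.5:                            # until the remainder is 0
        q = w1 / w2
        q = complex(round(q.real), round(q.imag))   # nearest Gaussian integer
        w1, w2 = w2, w1 - q * w2                    # division with remainder
    x, y = abs(round(w1.real)), abs(round(w1.imag))
    return tuple(sorted((x, y)))

print(two_squares(2017))   # → (9, 44)
```

Rounding to the nearest Gaussian integer makes each remainder's norm at most half the divisor's, so the loop terminates; the final gcd has norm p, recovering 44^2 + 9^2 = 2017.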

You may find this property of 2017 not so special, since there are infinitely many primes of form 4n+1. Please let me know more interesting properties of this number…