# Geometry & Arithmetic


A couple of weeks ago I discussed a geometric solution to an arithmetic problem. In this post, I will discuss an arithmetical solution to a geometry problem. Consider the following question:

Given a square whose sides are reflecting mirrors. A ray of light leaves a point inside the square and is reflected repeatedly in the mirrors. What is the nature of its paths?

It may happen that the ray passes through a corner of the square. In that case, we assume that it returns along its former path.

In the figure, the lines parallel to the axes are $x = m + \frac{1}{2}$ and $y = n + \frac{1}{2}$, where $m$ and $n$ are integers. The thick square of side 1 around the origin is the square of the problem, and $E\equiv(a,b)$ is the starting point. We construct all images of $E$ in the mirrors, under direct or repeated reflection. One can observe that they are of four types, the coordinates of the images of the different types being

1. $(a+2n, b+2m)$
2. $(a+2n, -b+2m+1)$
3. $(-a+2n+1, b+2m)$
4. $(-a+2n+1, -b+2m+1)$

where $m$ and $n$ are arbitrary integers. Further, if the velocity at $E$ has direction cosines $\lambda, \mu$, then the corresponding images of the velocity have direction cosines

1. $(\lambda, \mu)$
2. $(\lambda, -\mu)$
3. $(-\lambda, \mu)$
4. $(-\lambda, -\mu)$

where we suppose (on the grounds of symmetry) that $\mu$ is positive. If we think of the plane as divided into squares of unit side, the interior of a typical square being

$\displaystyle{n -\frac{1}{2} < x < n+\frac{1}{2}, \qquad m-\frac{1}{2} < y < m+\frac{1}{2}}$

then each square contains just one image of every point of the original square, given by $n=m=0$ (shown by the bold points in the figure). And if the image in one of these squares of some point of the original square is of type (1), (2), (3) or (4), then the image in that same square of any other point of the original square is of the same type.

We can now imagine $E$ moving with the ray (shown by dotted lines in the figure). When $E$ meets a mirror, it coincides with one of its images, and the image which momentarily coincides with $E$ continues the motion of $E$, in its original direction, in one of the squares adjacent to the fundamental square (the thick square). We follow the motion of this image until it, in its turn, meets a side of its square. Clearly, the original path of $E$ will be continued indefinitely along the same straight line $L$ (the dotted line in the figure), by a series of different images.

The segment of $L$ in any square (for a given $n$ and $m$) is the image of a straight portion of the path of $E$ in the original square. There is a one-to-one correspondence between the segments of $L$ in different squares and the portions of the path of $E$ between successive reflections, each segment of $L$ being an image of the corresponding portion of the path of $E$.

The path of $E$ in the original square will be periodic if $E$ returns to its original position moving in the same direction; and this will happen if and only if $L$ passes through an image of type (1) of the original point $E$. The coordinates of an arbitrary point of $L$ are $x=a+\lambda t, \quad y = b+\mu t$.

Hence the path will be periodic if and only if $a+\lambda t = a+2n$ and $b+\mu t = b+2m$, i.e. $\lambda t = 2n, \ \mu t = 2m$, for some $t$ and integers $n, m$; in other words, if and only if $\frac{\lambda}{\mu}$ is rational.
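To make the periodicity condition concrete, here is a minimal Python sketch (the function names are mine). It follows the straight unfolded line and folds it back into a unit square $[0,1]^2$ rather than the centred square of the figure; the condition $\lambda t = 2n$, $\mu t = 2m$ is the same.

```python
def fold(u):
    """Fold an unfolded coordinate back into [0, 1] (the 'tent map'
    that undoes repeated reflections in the mirrors u = 0 and u = 1)."""
    u = u % 2.0
    return u if u <= 1.0 else 2.0 - u

def billiard_position(a, b, lam, mu, t):
    """Position of the ray at time t: follow the straight unfolded line
    (a + lam*t, b + mu*t) and fold both coordinates back into the square."""
    return fold(a + lam * t), fold(b + mu * t)

# Rational slope lam/mu = 1/2: at t = 2 we have lam*t = 2 and mu*t = 4
# (both even), so the ray is back at its start, moving in the same direction.
a, b, lam, mu = 0.3, 0.4, 1.0, 2.0
x, y = billiard_position(a, b, lam, mu, 2.0)
print(abs(x - a) < 1e-12 and abs(y - b) < 1e-12)  # True
```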

When $\frac{\lambda}{\mu}$ is irrational, the path of $E$ approaches arbitrarily near to every point $(c,d)$ of the square. This follows directly from Kronecker’s Theorem in one dimension (see § 23.3 of G. H. Hardy and E. M. Wright’s An Introduction to the Theory of Numbers):

[Kronecker’s Theorem in one dimension] If $\theta$ is irrational, $\alpha$ is arbitrary, and $N$ and $\epsilon$ are positive, then there are integers $p$ and $q$ such that $p>N$ and $|p\theta - q-\alpha|<\epsilon$.

Here, we have $\theta = \frac{\lambda}{\mu}$ and $\alpha = (b-d)\frac{\lambda}{2\mu} - \frac{1}{2}(a-c)$, with large enough integers $p=m$ and $q=n$. Hence we can conclude that

[König-Szücs Theorem] Given a square whose sides are reflecting mirrors. A ray of light leaves a point inside the square and is reflected repeatedly in the mirrors. Either the path is closed and periodic, or it is dense in the square, passing arbitrarily near to every point of the square. A necessary and sufficient condition for periodicity is that the angle between a side of the square and the initial direction of the ray should have a rational tangent.

Another way of stating the above Kronecker’s theorem is:

[Kronecker’s Theorem in one dimension] If $\theta$ is irrational, then the set of points $n\theta - \lfloor n\theta\rfloor$, for $n = 1, 2, 3, \ldots$, is dense in the interval $(0,1)$.
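The density statement is easy to probe numerically. A minimal sketch (the target value and tolerance are arbitrary choices of mine): for $\theta = \sqrt{2}$, search for an $n$ whose fractional part $n\theta - \lfloor n\theta \rfloor$ lands within $\epsilon$ of a chosen point of $(0,1)$.

```python
import math

def frac(x):
    """Fractional part x - floor(x)."""
    return x - math.floor(x)

# For irrational theta the fractional parts of n*theta come arbitrarily
# close to any target in (0, 1); search for one within eps of 0.25.
theta = math.sqrt(2)
target, eps = 0.25, 1e-4
n = next(k for k in range(1, 10**6) if abs(frac(k * theta) - target) < eps)
print(n, abs(frac(n * theta) - target) < eps)
```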

Then, with some knowledge of Fourier series, we can try to answer a more general question:

Given an irrational number $\theta$, what can be said about the distribution of the fractional parts of the sequence of numbers $n\theta$, for $n=1,2,3,\ldots$?

The answer to this question is called Weyl’s Equidistribution Theorem (see §4.2 of Elias M. Stein & Rami Shakarchi’s Fourier Analysis: An Introduction):

[Weyl’s Equidistribution Theorem] If $\theta$ is irrational, then the sequence of fractional parts $\{n\theta - \lfloor n\theta\rfloor\}_{n=1}^{\infty}$ is equidistributed in $[0,1)$.
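Equidistribution is stronger than density: not only do the fractional parts visit every subinterval, they visit each subinterval with frequency proportional to its length. A small numerical sketch (the choice of $\theta = \sqrt{2}$ and ten bins is mine):

```python
import math

# Equidistribution of {n*sqrt(2)}: each tenth of [0, 1) should receive
# roughly 10% of the first N fractional parts.
theta, N = math.sqrt(2), 100_000
counts = [0] * 10
for n in range(1, N + 1):
    counts[int(((n * theta) % 1.0) * 10)] += 1

print([round(c / N, 3) for c in counts])  # each entry close to 0.1
```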

I really enjoyed reading about this unexpected link between geometry and arithmetic (and Fourier analysis). Most of the material has been taken/copied from Hardy’s book. The solution to the geometry problem reminds me of the solution to the Cross Diagonal Cover Problem.

# A peek into the world of tensors


When I was in high school, two cool things that I learned from physics were “vectors” and “calculus”. I was (and still am) awestruck by the following statement:

Uniform circular motion is an accelerated motion.

At college, during my first (and second-to-last) physics course, I was taught “vector calculus” and I didn’t enjoy it. Last year I learned “linear algebra”, that is, the study of vector spaces, matrices, linear transformations… Also, a few months ago I wrote about my understanding of Algebra. In it I briefly mentioned that “…study of symmetry of equations, geometric objects, etc. became one of the central topics of interest…” and this led to what we call “Abstract Algebra”, of which linear algebra is a part. The following video by 3blue1brown explains how our understanding of vectors from physics can be used to develop the subject of linear algebra.

But one should ask: “Why do we care to classify physical quantities as scalars and vectors?”. The answer to this question lies in the quest of physics to find the invariants in terms of which we can state the laws of nature. In general, the idea of finding invariants is a useful problem-solving strategy in mathematics (the language of physics). For example, consider the following problem from the book “Problem-Solving Strategies” by Arthur Engel:

Suppose the positive integer n is odd. The numbers 1, 2, …, 2n are written on the blackboard. Then one can pick any two numbers a and b, erase them, and write |a-b| instead. Prove that an odd number will remain at the end.
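As a sanity check before the proof idea, the process can be simulated; a minimal Python sketch (the function name is mine):

```python
import random

def final_number(n, seed=0):
    """Write 1, 2, ..., 2n on the board, then repeatedly replace a
    randomly chosen pair a, b by |a - b| until one number remains."""
    rng = random.Random(seed)
    nums = list(range(1, 2 * n + 1))
    while len(nums) > 1:
        a = nums.pop(rng.randrange(len(nums)))
        b = nums.pop(rng.randrange(len(nums)))
        nums.append(abs(a - b))
    return nums[0]

# n = 3 is odd: whatever choices are made, the survivor is always odd,
# because |a - b| has the same parity as a + b, so the parity of the
# total sum (1 + 2 + ... + 6 = 21, odd) never changes.
print(all(final_number(3, seed=s) % 2 == 1 for s in range(200)))  # True
```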

To prove this statement, one has to use the “parity” of the sum 1+2+3+…+2n as the invariant. And as stated in the video above, vectors are invariant under transformations of coordinate systems (the components change, but the length and direction of the arrow remain unchanged). For example, consider the rotation of the 2D axes by an angle θ, keeping the origin fixed.

By Guy vandegrift (Own work) [CC BY-SA 3.0], via Wikimedia Commons

$\displaystyle{x' = x \cos \theta + y \sin \theta; \quad y'= -x \sin \theta + y \cos \theta}$
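These formulas can be checked numerically: the components change with the rotation, but the length, the invariant, does not (a minimal sketch; the helper name is mine):

```python
import math

def rotate_axes(x, y, theta):
    """Components of the same vector referred to axes rotated by theta."""
    return (x * math.cos(theta) + y * math.sin(theta),
            -x * math.sin(theta) + y * math.cos(theta))

x, y = 3.0, 4.0
xp, yp = rotate_axes(x, y, 0.7)
# The components change, but the length (the invariant) does not.
print(math.hypot(x, y), math.hypot(xp, yp))  # both approximately 5.0
```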

Now, we can rewrite this by using $x_1$ and $x_2$ instead of $x$ and $y$; and putting different subscripts on the single letter $a$ instead of functions of $\theta$:

$\displaystyle{x_1' = a_{11} x_1 + a_{12}x_2; \quad x_2'= a_{21}x_1 + a_{22}x_2}$

Now we differentiate this system of equations to get:

$\displaystyle{dx_1'= \frac{\partial x_1'}{\partial x_1} dx_1 + \frac{\partial x_1'}{\partial x_2} dx_2 ; \quad dx_2' = \frac{\partial x_2'}{\partial x_1} dx_1 + \frac{\partial x_2'}{\partial x_2} dx_2 }$

where $a_{ij} = \frac{\partial x_i'}{\partial x_j}$. We can rewrite this system in condensed form as:

$\displaystyle{dx_{\mu}' = \sum_{\sigma} \frac{\partial x_{\mu}'}{\partial x_{\sigma}} dx_{\sigma}}$

for $\mu =1,2$ and $\sigma =1,2$. We can further abbreviate it by omitting the summation symbol $\sum_{\sigma}$, with the understanding that whenever a subscript occurs twice in a single term, we sum over that subscript (the Einstein summation convention):

$\displaystyle{\boxed{dx_{\mu}' = \frac{\partial x_{\mu}'}{\partial x_{\sigma}} dx_{\sigma} }}$

for $\mu =1,2$ and $\sigma =1,2$. This equation represents ANY transformation of coordinates whenever the values of $(x_{\sigma})$ and $(x_{\mu}')$ are in one-to-one correspondence. Moreover, it can be extended to represent transformation of coordinates of any n-dimensional vector. For example, if  $\mu =1,2,3$ and $\sigma =1,2,3$ then it represents coordinate  transformations of a 3-dimensional vector.
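The summation convention in the boxed equation is exactly what `numpy.einsum` expresses: a repeated index in the subscript string is summed over. A small sketch (the Jacobian here is an assumed rotation, not taken from the text):

```python
import numpy as np

# dx'_mu = (dx'_mu / dx_sigma) dx_sigma, summed over the repeated index sigma.
J = np.array([[0.8, 0.6],
              [-0.6, 0.8]])            # an assumed Jacobian (a rotation matrix)
dx = np.array([1.0, 2.0])              # components dx_sigma

dx_prime = np.einsum('ms,s->m', J, dx)   # the repeated index s is summed
print(np.allclose(dx_prime, J @ dx))     # True: same as matrix multiplication
```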

But there are physical quantities which can’t be classified as scalars or vectors. For example, “stress” (the internal force experienced by a material due to the “strain” caused by an external force) is described by a “tensor of rank 2”. This is because the stress at any point depends on both the external force vector and the area vector, i.e. it describes what happens due to the interaction between two vectors. The Cauchy stress tensor $\boldsymbol{\sigma}$ consists of nine components $\sigma_{ij}$ that completely define the state of stress at a point inside a material in the deformed state (where $i$ corresponds to the force component direction and $j$ corresponds to the area component direction). The tensor relates a unit-length direction vector $\mathbf{n}$ to the stress vector $\mathbf{T}^{(\mathbf{n})}$ across an imaginary surface perpendicular to $\mathbf{n}$:

$\displaystyle{\mathbf{T}^{(\mathbf n)}= \mathbf n \cdot\boldsymbol{\sigma}\quad \text{or} \quad T_j^{(n)}= \sigma_{ij}n_i}$ where $\boldsymbol{\sigma} = \left[{\begin{matrix} \mathbf{T}^{(\mathbf{e}_1)} \\ \mathbf{T}^{(\mathbf{e}_2)} \\ \mathbf{T}^{(\mathbf{e}_3)} \\ \end{matrix}}\right] = \left[{\begin{matrix} \sigma _{11} & \sigma _{12} & \sigma _{13} \\ \sigma _{21} & \sigma _{22} & \sigma _{23} \\ \sigma _{31} & \sigma _{32} & \sigma _{33} \\ \end{matrix}}\right]$

where $\sigma_{11}$, $\sigma_{22}$ and $\sigma_{33}$ are normal stresses, and $\sigma_{12}$, $\sigma_{13}$, $\sigma_{21}$, $\sigma_{23}$, $\sigma_{31}$ and $\sigma_{32}$ are shear stresses. We can represent the stress vector acting on a plane with normal unit vector $\mathbf{n}$ as:

By Sanpaz (Own work) [CC BY-SA 3.0 or GFDL], via Wikimedia Commons

Here, the tetrahedron is formed by slicing a parallelepiped along an arbitrary plane n. So, the force acting on the plane n is the reaction exerted by the other half of the parallelepiped and has an opposite sign.
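The index equation $T_j^{(n)} = \sigma_{ij} n_i$ can be evaluated directly. A sketch with made-up sample stress values (not from any real material):

```python
import numpy as np

sigma = np.array([[50.0, 10.0,  0.0],
                  [10.0, 20.0,  5.0],
                  [ 0.0,  5.0, 30.0]])   # sample Cauchy stress, sigma_ij = sigma_ji
n = np.array([1.0, 0.0, 0.0])            # unit normal along e_1

# T_j = sigma_ij n_i (sum over the repeated index i), i.e. T = n . sigma
T = np.einsum('ij,i->j', sigma, n)
print(T)   # the first row of sigma: [50. 10.  0.]
```

With $\mathbf{n} = \mathbf{e}_1$ the traction is simply the first row of $\boldsymbol{\sigma}$, matching the row interpretation of the matrix above.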

In this terminology, a scalar is a tensor of rank zero and a vector is a tensor of rank one. Moreover, in an n-dimensional space:

• a vector has $n$ components
• a tensor of rank two has $n^2$ components
• a tensor of rank three has $n^3$ components
• and so on …

Just like vectors, tensors in general are invariant under transformations of coordinate systems. We wish to exploit this further. Let’s reconsider the boxed equation stated earlier. Since we are working with the Euclidean metric, i.e. the length $s$ of a vector is given by $s^2=x_1^2+x_2^2$, we have $ds^2=dx_1^2+dx_2^2$, i.e. $dx_1$ and $dx_2$ are the components of $ds$. So, replacing $dx_1$ and $dx_2$ by $A^1$ and $A^2$, we get (the motivation is to capture the idea of the area vector):

$\displaystyle{\boxed{A^{' \mu} = \frac{\partial x_{\mu}'}{\partial x_{\sigma}} A^{\sigma}}}$

where $A^1, A^2, A^3, \ldots$ are components of a vector in a certain coordinate system (note that the superscripts are just for indexing purposes and do NOT represent exponents). Any set of quantities which transforms according to this equation is defined to be a contravariant vector. Moreover, we can generalize this equation to a tensor of any rank. For example, a contravariant tensor of rank two is defined by:

$\displaystyle{A^{' \alpha \beta} = \frac{\partial x_{\alpha}'}{\partial x_{\gamma}} \frac{\partial x_{\beta}'}{\partial x_{\delta}} A^{\gamma \delta}}$

where the sum is over the indices $\gamma$ and $\delta$ (since they occur twice in the term on the right). We can illustrate this for 3-dimensional space, i.e. $\alpha , \beta , \gamma , \delta = 1,2,3$, with summation performed only over $\gamma$ and $\delta$; for instance, if $\alpha=1$ and $\beta=2$ then we have:

$\displaystyle{A^{' 12} = \frac{\partial x_{1}'}{\partial x_{1}} \frac{\partial x_{2}'}{\partial x_{1}} A^{11} + \frac{\partial x_{1}'}{\partial x_{1}} \frac{\partial x_{2}'}{\partial x_{2}} A^{12} + \frac{\partial x_{1}'}{\partial x_{1}} \frac{\partial x_{2}'}{\partial x_{3}} A^{13}+ \frac{\partial x_{1}'}{\partial x_{2}} \frac{\partial x_{2}'}{\partial x_{1}} A^{21} + \frac{\partial x_{1}'}{\partial x_{2}} \frac{\partial x_{2}'}{\partial x_{2}} A^{22} + \frac{\partial x_{1}'}{\partial x_{2}} \frac{\partial x_{2}'}{\partial x_{3}} A^{23} }$

$\displaystyle{+\frac{\partial x_{1}'}{\partial x_{3}} \frac{\partial x_{2}'}{\partial x_{1}} A^{31} + \frac{\partial x_{1}'}{\partial x_{3}} \frac{\partial x_{2}'}{\partial x_{2}} A^{32} + \frac{\partial x_{1}'}{\partial x_{3}} \frac{\partial x_{2}'}{\partial x_{3}} A^{33} }$
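The nine-term expansion above is tedious by hand but easy to check numerically; a sketch with an arbitrary assumed Jacobian (random values stand in for $\partial x_\alpha' / \partial x_\gamma$):

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.normal(size=(3, 3))    # assumed Jacobian dx'_alpha / dx_gamma
A = rng.normal(size=(3, 3))    # contravariant rank-2 components A^{gamma delta}

# A'^{alpha beta} = (dx'_alpha/dx_gamma)(dx'_beta/dx_delta) A^{gamma delta}
A_prime = np.einsum('ag,bd,gd->ab', J, J, A)

# Spell out the nine-term sum for alpha = 1, beta = 2 (0-indexed: 0 and 1).
nine_terms = sum(J[0, g] * J[1, d] * A[g, d] for g in range(3) for d in range(3))
print(np.isclose(A_prime[0, 1], nine_terms))  # True
```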

So, we have just analysed the invariance of one of the flavours of tensors. Mathematically, one should expect the existence of something like an “algebraic inverse” of the contravariant tensor, since a tensor is a generalization of a vector and in linear algebra we study inverse operations. Consider a situation where we want to analyse the density of an object at different points. For simplicity, let’s consider a point $A(x_1,x_2)$ on a plane surface with variable density.

A surface whose density is different in different parts

If we designate by $\psi$ the density at $A$, then $\frac{\partial \psi}{\partial x_1}$ and $\frac{\partial \psi}{\partial x_2}$ represent, respectively, the partial variation of $\psi$ in the $x_1$ and $x_2$ directions. Although $\psi$ is a scalar quantity, the “change in $\psi$” is a directed quantity with components $\frac{\partial \psi}{\partial x_1}$ and $\frac{\partial \psi}{\partial x_2}$. Note that the “change in $\psi$” is a tensor of rank one, because it depends upon the various directions. But it is a tensor in a sense different from what we saw in the case of “stress”. This difference will become clear once we analyse what happens to this quantity when the coordinate system is changed.

Our aim now is to express $\frac{\partial \psi}{\partial x_1'}$ and $\frac{\partial \psi}{\partial x_2'}$ in terms of $\frac{\partial \psi}{\partial x_1}$ and $\frac{\partial \psi}{\partial x_2}$. Note that a change in $x_1'$ will affect both $x_1$ and $x_2$ (as seen in the rotation of the 2D axes in the case of a vector). Hence, the resulting changes in $x_1$ and $x_2$ will affect $\psi$:

$\displaystyle{\frac{\partial \psi}{\partial x_1'} = \frac{\partial \psi}{\partial x_1} \frac{\partial x_1}{\partial x_1'} + \frac{\partial \psi}{\partial x_2} \frac{\partial x_2}{\partial x_1'}; \quad \frac{\partial \psi}{\partial x_2'} = \frac{\partial \psi}{\partial x_1} \frac{\partial x_1}{\partial x_2'} + \frac{\partial \psi}{\partial x_2} \frac{\partial x_2}{\partial x_2'}}$

Here we have used the chain rule: if $x, y, z$ are three variables such that $z$ depends on $y$ and $y$ depends on $x$, and computing the change in $z$ per unit change in $x$ directly is not easy, then we can compute it as $\frac{dz}{dx} = \frac{dz}{dy} \frac{dy}{dx}$. We can rewrite this system in condensed form as:

$\displaystyle{\frac{\partial \psi}{\partial x_{\mu}'} = \sum_{\sigma} \frac{\partial \psi}{\partial x_{\sigma}} \frac{\partial x_{\sigma}}{\partial x_{\mu}'} }$

for $\mu =1,2$ and $\sigma =1,2$. We can further abbreviate it by omitting the summation symbol $\sum_{\sigma}$ with the understanding that whenever a subscript occurs twice in a single term, we do summation on that subscript.

$\displaystyle{\frac{\partial \psi}{\partial x_{\mu}'} = \frac{\partial \psi}{\partial x_{\sigma}} \frac{\partial x_{\sigma}}{\partial x_{\mu}'} }$

for $\mu =1,2$ and $\sigma =1,2$. Finally, replacing $\frac{\partial \psi}{\partial x_{\mu}'}$ by $A_{\mu}'$ and $\frac{\partial \psi}{\partial x_{\sigma}}$ by $A_{\sigma}$ (to make the notation similar to that introduced for the stress tensor):

$\displaystyle{\boxed{A_{ \mu}' = \frac{\partial x_{\sigma}}{\partial x_{\mu}'} A_{\sigma}}}$

where $A_1, A_2, \ldots$ are components of a vector in a certain coordinate system. Any set of quantities which transforms according to this equation is defined to be a covariant vector. Moreover, we can generalize this equation to a tensor of any rank. For example, a covariant tensor of rank two is defined by:

$\displaystyle{A_{ \alpha \beta}' = \frac{\partial x_{\gamma}}{\partial x_{\alpha}'} \frac{\partial x_{\delta}}{\partial x_{\beta}'} A_{\gamma \delta}}$

where the sum is over the indices $\gamma$ and $\delta$ (since they occur twice in the term on right).
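That the gradient really transforms with $\frac{\partial x_\sigma}{\partial x_\mu'}$, the inverse Jacobian, can be checked on a concrete scalar field. The field $\psi$ and the matrix $M$ below are made-up examples of mine, with the linear change of coordinates $x' = Mx$:

```python
import numpy as np

# psi = x1^2 + 3 x1 x2; its gradient should obey the covariant law
# A'_mu = (dx_sigma / dx'_mu) A_sigma under x' = M x.
M = np.array([[2.0, 1.0], [0.5, 3.0]])   # dx'_mu / dx_sigma
Minv = np.linalg.inv(M)                  # dx_sigma / dx'_mu

def psi(x):
    x1, x2 = x
    return x1**2 + 3 * x1 * x2

def grad_psi(x):
    x1, x2 = x
    return np.array([2 * x1 + 3 * x2, 3 * x1])

x = np.array([1.0, 2.0])
A_prime = np.einsum('sm,s->m', Minv, grad_psi(x))   # covariant law

# Cross-check: differentiate psi expressed in the primed coordinates
# numerically, via central differences at x' = M x.
xp, h = M @ x, 1e-6
numeric = np.array([(psi(Minv @ (xp + h * e)) - psi(Minv @ (xp - h * e))) / (2 * h)
                    for e in np.eye(2)])
print(np.allclose(A_prime, numeric))  # True
```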

Comparing the (boxed) equations describing contravariant and covariant vectors, we observe that the coefficients on the right are reciprocals of each other (as promised). Moreover, all these boxed equations represent the law of transformation for tensors of rank one (a.k.a. vectors), which can be generalized to a tensor of any rank.

Our final task is to see how these two flavours of tensors interact with each other. Let’s study the algebraic operations of addition and multiplication for both flavours of tensors, just as we did for vectors (note that here “vector product” means the dot product, because the cross product doesn’t generalize to $n$-dimensional vectors).

First consider the case of contravariant tensors. Let $A^{\alpha}$ be a vector having two components $A^{1}$ and $A^{2}$ in a plane and $B^{\alpha}$ be another such vector. If we define  $A^{\alpha}+B^\alpha = C^\alpha$ and $A^{\alpha} B^\beta = C^{\alpha \beta}$ (this allows 4 components, namely $C^{11}, C^{12}, C^{21}, C^{22}$)  with

$\displaystyle{A^{' \lambda} = \frac{\partial x_{\lambda}'}{\partial x_{\alpha}} A^{\alpha}; \quad B^{' \mu} = \frac{\partial x_{\mu}'}{\partial x_{\beta}} B^{\beta}}$

for $\lambda, \mu, \alpha, \beta =1,2$, then on their addition and multiplication (called outer multiplication) we get:

$\displaystyle{C^{' \lambda} = \frac{\partial x_{\lambda}'}{\partial x_{\alpha}} C^{\alpha}; \quad C^{' \lambda \mu} = \frac{\partial x_{\lambda}'}{\partial x_{\alpha}} \frac{\partial x_{\mu}'}{\partial x_{\beta}} C^{\alpha \beta}}$

for $\lambda, \mu, \alpha, \beta =1,2$. One can prove this by patiently multiplying each term and then rearranging them. In general, if two contravariant tensors of rank m and n respectively, are multiplied together, the result is a contravariant tensor of rank m+n.
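Instead of multiplying out each term by hand, the product rule can be checked numerically: transforming the factors first and multiplying must agree with transforming the outer product directly by the rank-2 law (the Jacobian is an arbitrary assumed one):

```python
import numpy as np

rng = np.random.default_rng(1)
J = rng.normal(size=(2, 2))    # assumed Jacobian dx'_lambda / dx_alpha
A = rng.normal(size=2)         # contravariant vector A^alpha
B = rng.normal(size=2)         # contravariant vector B^beta

C = np.outer(A, B)             # outer product C^{alpha beta} = A^alpha B^beta
# Transform the factors first, then multiply ...
C1 = np.outer(J @ A, J @ B)
# ... or transform C by the rank-2 contravariant law: both must agree.
C2 = np.einsum('la,mb,ab->lm', J, J, C)
print(np.allclose(C1, C2))  # True
```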

For the case of covariant tensors, addition and (outer) multiplication are done in the same manner as above. Let $A_{\alpha}$ be a vector having two components $A_{1}$ and $A_{2}$ in a plane and $B_{\alpha}$ be another such vector. If we define $A_{\alpha}+B_\alpha = C_\alpha$ and $A_{\alpha} B_\beta = C_{\alpha \beta}$ (this allows 4 components, namely $C_{11}, C_{12}, C_{21}, C_{22}$) with

$\displaystyle{A_{ \lambda}' = \frac{\partial x_{\alpha}}{\partial x_{\lambda}'} A_{\alpha}; \quad B_{ \mu}' = \frac{\partial x_{\beta}}{\partial x_{\mu}'} B_{\beta}}$

for $\lambda, \mu, \alpha, \beta =1,2$, then on their addition and multiplication (called outer multiplication) we get:

$\displaystyle{C_{ \lambda} '= \frac{\partial x_{\alpha}}{\partial x_{\lambda}'} C_{\alpha}; \quad C_{ \lambda \mu}'= \frac{\partial x_{\alpha}} {\partial x_{\lambda}'}\frac{\partial x_{\beta}}{\partial x_{\mu}'} C_{\alpha \beta}}$

for $\lambda, \mu, \alpha, \beta =1,2$. In general, if two covariant tensors of rank m and n respectively, are multiplied together, the result is a covariant tensor of rank m+n.

Now, as promised, it’s the time to see how both of these flavours of tensors interact with each other.  Let’s extend the notion of outer multiplication defined for each flavour of tensor, to outer product of a contravariant tensor with a covariant tensor. For example, consider vectors (a.k.a. tensors of rank 1) of each type:

$\displaystyle{A^{' \lambda} = \frac{\partial x_{\lambda}'}{\partial x_{\alpha}} A^{\alpha}; \quad B_{ \mu}' = \frac{\partial x_{\beta}}{\partial x_{\mu}'} B_{\beta}}$

then their outer product leads to

$\displaystyle{C^{' \lambda}_{ \mu}= \frac{\partial x_{\lambda}'}{\partial x_{\alpha}} \frac{\partial x_{\beta}}{\partial x_{\mu}'} C^{\alpha}_{\beta}}$

where $A^\alpha B_\beta = C^\alpha _\beta$. This is neither a contravariant nor a covariant tensor, and so is called a mixed tensor of rank 2. More generally, if a contravariant tensor of rank m and a covariant tensor of rank n are multiplied together so as to form their outer product, the result is a mixed tensor of rank m+n.
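The mixed law can likewise be verified numerically: transform $A^\alpha$ contravariantly and $B_\beta$ covariantly, and compare with transforming $C^\alpha_\beta$ directly (the Jacobian is an arbitrary assumed one, nudged to be invertible):

```python
import numpy as np

rng = np.random.default_rng(2)
J = rng.normal(size=(2, 2)) + 2 * np.eye(2)  # assumed Jacobian dx'/dx
Jinv = np.linalg.inv(J)                      # dx_beta / dx'_mu

A = rng.normal(size=2)                       # contravariant A^alpha
B = rng.normal(size=2)                       # covariant B_beta
C = np.outer(A, B)                           # mixed C^alpha_beta = A^alpha B_beta

# C'^lambda_mu = (dx'_lambda/dx_alpha)(dx_beta/dx'_mu) C^alpha_beta
C_prime = np.einsum('la,bm,ab->lm', J, Jinv, C)
print(np.allclose(C_prime, np.outer(J @ A, Jinv.T @ B)))  # True
```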

In general, if two mixed tensors of rank m (having $m_1$ indices/superscripts of contravariance and $m_2$ indices/subscripts of covariance, such that $m_1+m_2=m$) and n (having $n_1$ indices/superscripts of contravariance and $n_2$ indices/subscripts of covariance, such that $n_1+n_2=n$)  respectively, are multiplied together, the result is a mixed tensor of rank m+n (having $m_1+n_1$ indices/superscripts of contravariance and $m_2+n_2$ indices/subscripts of covariance, such that $m_1+n_1+m_2+n_2=m+n$) .

Unlike the previous two types of tensors, we can’t illustrate this with a simple physical example. To convince yourself, consider the following two mixed tensors of rank 3 and rank 2, respectively:

$\displaystyle{A^{'\alpha \beta}_{\gamma} = \frac{\partial x_{\nu} }{\partial x_{\gamma}'}\frac{\partial x_{\alpha}'}{\partial x_{\lambda}}\frac{\partial x_{\beta}'}{\partial x_{\mu}} A^{\lambda \mu}_{\nu}; \quad B^{'\kappa}_{\delta} = \frac{\partial x_{\sigma}}{\partial x_{\delta}'}\frac{\partial x_{\kappa}'}{\partial x_{\rho}}B^{\rho}_{\sigma}}$

then following the notations introduced, their outer product is of rank 5 and is given by

$\displaystyle{C^{'\alpha\beta\kappa}_{\gamma\delta} = \frac{\partial x_{\nu} }{\partial x_{\gamma}'} \frac{\partial x_{\sigma}}{\partial x_{\delta}'}\frac{\partial x_{\alpha}'}{\partial x_{\lambda}}\frac{\partial x_{\beta}'}{\partial x_{\mu}}\frac{\partial x_{\kappa}'}{\partial x_{\rho}} C^{\lambda\mu\rho}_{\nu\sigma}}$

Behind this notation, the computations are really involved. Suppose that we are working in a 3-dimensional vector space. Then the transformation law for the tensor $\mathbf{A}$ represents a set of $27 \,(=3^3)$ equations, each with 27 terms on the right. And the transformation law for the tensor $\mathbf{B}$ represents a set of $9 \,(=3^2)$ equations, each with 9 terms on the right. Therefore, the transformation law of their outer product tensor $\mathbf{C}$ represents a set of $243 \,(=3^5)$ equations, each with 243 terms on the right.

So, unlike the previous two cases of contravariant and covariant tensors, the proof of the transformation law for the outer product of mixed tensors is rather complicated and out of scope for this introductory post.

Reference:

[L] Lillian R. Lieber, The Einstein Theory of Relativity. Internet Archive: https://archive.org/details/einsteintheoryof032414mbp