
FLT for rational functions


Following is the problem 2.16 in The Math Problems Notebook:

Prove that if n>2, then we do not have any nontrivial solutions of the equation F^n + G^n = H^n where F,G,H are rational functions. Solutions of the form F = aJ, G=bJ, H=cJ where J is a rational function and a,b,c are complex numbers satisfying a^n + b^n = c^n, are called trivial.

This problem is analogous to Fermat's Last Theorem (FLT), which states that for n> 2, x^n + y^n = z^n has no nontrivial integer solutions.

The solution of this problem involves proof by contradiction:

Since any rational-function solution yields a complex polynomial solution by clearing denominators, it suffices to assume that (f,g,h) is a polynomial solution such that r=\max(\deg(f),\deg(g),\deg(h)) is minimal among all polynomial solutions, where r>0.

Assume also that f,g,h are pairwise relatively prime. From f^n+g^n = h^n we get g^n = h^n-f^n, i.e. f^n-h^n = -g^n; replacing g by \omega g, where \omega is an nth root of -1, we may write f^n-h^n = g^n. Now using the simple factorization identity involving the roots of unity, we get:

\displaystyle{\prod_{\ell = 0}^{n-1}\left(f-\zeta^\ell h\right) = g^n}

where \zeta = e^{\frac{2\pi i}{n}} with i = \sqrt{-1}.
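The underlying identity \prod_{\ell=0}^{n-1}(a-\zeta^\ell b) = a^n - b^n can be sanity-checked numerically; a minimal sketch, where the sample polynomials f and h below are arbitrary choices:

```python
import numpy as np

n = 5
zeta = np.exp(2j * np.pi / n)  # primitive n-th root of unity

# arbitrary sample polynomials f and h, evaluated pointwise
f = lambda t: t**2 + 3*t + 1
h = lambda t: t - 2

ts = np.linspace(-2.0, 2.0, 9)
lhs = np.prod([f(ts) - zeta**l * h(ts) for l in range(n)], axis=0)
rhs = f(ts)**n - h(ts)**n  # the identity: prod_l (f - zeta^l h) = f^n - h^n
assert np.allclose(lhs, rhs)
```

Since both sides are polynomials of degree 2n, agreement at more than 2n sample points already forces the identity.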

Since \gcd(f,g) = \gcd(f,h) = 1, we have \gcd(f-\zeta^\ell h, f-\zeta^k h)=1 for \ell\neq k. Since the ring of complex polynomials has the unique factorization property, each of these pairwise coprime factors must itself be an nth power (up to a constant, which has an nth root in \mathbb{C}): we must have g = g_0g_1\cdots g_{n-1}, where the g_\ell are polynomials satisfying \boxed{g_\ell^n = f-\zeta^\ell h}.

Now consider the factors f-h, f-\zeta h, f-\zeta^2 h. Note that, since n>2, these elements belong to the 2-dimensional vector space generated by f,h over \mathbb{C}.  Hence these three elements are linearly dependent, i.e. there exists a vanishing linear combination of them with complex coefficients, not all zero; since f and h are coprime and not both constant, any two of the three factors are linearly independent, so in fact all three coefficients are nonzero. Thus there exist nonzero a_\ell\in\mathbb{C} so that a_0g_0^n + a_1g_1^n = a_2g_2^n. We then set h_\ell = \sqrt[n]{a_\ell}\,g_\ell, and observe that \boxed{h_0^n+h_1^n = h_2^n}.

Moreover, \gcd(h_\ell,h_k)=1 for \ell \neq k and \max_{\ell}\deg(h_\ell) = \max_{\ell} \deg(g_\ell) < r, since n\deg(g_\ell) = \deg(f-\zeta^\ell h) \le r. This contradicts the minimality of r, i.e. a minimal-degree solution f,g,h cannot exist. Hence no nontrivial solution exists.

The above argument fails for proving the non-existence of integer solutions since two coprime integers don’t form a 2-dimensional vector space over \mathbb{C}.

In the praise of norm


If you have spent some time with undergraduate mathematics, you have probably heard the word “norm”. This term is encountered in various branches of mathematics, like (as per Wikipedia):

But, it seems to occur only in abstract algebra. Although the definition of this term is always algebraic, it has a topological interpretation when we are working with vector spaces. By satisfying the conditions of a metric, it secretly connects a vector space to a metric space, where we can study differentiation. This point of view, along with an inner product structure, is explored when we study functional analysis.

Some facts to remember:

  1. Every vector space has a norm. [Proof]
  2. Every vector space has an inner product (assuming “Axiom of Choice”). [Proof]
  3. An inner product naturally induces an associated norm, thus an inner product space is also a normed vector space.  [Proof]
  4. All norms are equivalent on finite-dimensional vector spaces. [Proof]
  5. Every normed vector space is a metric space (and NOT vice versa). [Proof]
  6. In general, a vector space is NOT the same as a metric space. [Proof]
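Fact 4 can be illustrated numerically: on \mathbb{R}^n the 1-, 2- and \infty-norms bound each other by constants depending only on n. A minimal sketch (the dimension n=8 and the random samples are arbitrary choices):

```python
import numpy as np

n = 8
rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.standard_normal(n)
    one = np.sum(np.abs(x))    # ||x||_1
    two = np.linalg.norm(x)    # ||x||_2
    inf = np.max(np.abs(x))    # ||x||_inf
    # standard equivalence chain on R^n:
    #   ||x||_inf <= ||x||_2 <= ||x||_1 <= n * ||x||_inf
    assert inf <= two <= one <= n * inf
```

Any two norms being squeezed between constant multiples of each other is exactly what "equivalent" means: they induce the same topology.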

Real vs Complex numbers


I want to talk about the algebraic and analytic differences between real and complex numbers. Firstly, let’s have a look at the following beautiful explanation by Richard Feynman (from his QED lectures) about the similarities between real and complex numbers:


From Chapter 2 of the book “QED – The Strange Theory of Light and Matter” © Richard P. Feynman, 1985.

Before reading this explanation, I used to believe that the need to establish the “Fundamental Theorem of Algebra” (read this beautiful paper by Daniel J. Velleman to learn about a proof of this theorem) was the only way to motivate the study of complex numbers.

The fundamental difference between real and complex numbers is

Real numbers form an ordered field, but complex numbers can’t form an ordered field. [Proof]

where we define an ordered field as follows:

Let \mathbf{F} be a field. Suppose that there is a set \mathcal{P} \subset \mathbf{F} which satisfies the following properties:

  • For each x \in \mathbf{F}, exactly one of the following statements holds: x \in \mathcal{P}, -x \in \mathcal{P}, x =0.
  • For x,y \in \mathcal{P}, xy \in \mathcal{P} and x+y \in \mathcal{P}.

If such a \mathcal{P} exists, then \mathbf{F} is an ordered field. Moreover, we define x \le y \Leftrightarrow y -x \in \mathcal{P} \vee x = y.
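With this definition, the impossibility of ordering \mathbb{C} follows in two lines. Suppose such a set \mathcal{P} existed. Since i \neq 0, the first property forces i \in \mathcal{P} or -i \in \mathcal{P}, and the second property is violated either way:

\displaystyle{i \in \mathcal{P} \Rightarrow i\cdot i = -1 \in \mathcal{P}, \qquad -i \in \mathcal{P} \Rightarrow (-i)(-i) = -1 \in \mathcal{P}}

In either case -1 \in \mathcal{P}, and then (-1)(-1) = 1 \in \mathcal{P} as well, so both 1 and -1 lie in \mathcal{P}, contradicting the trichotomy in the first property.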

Note that, without requiring compatibility with the field operations of complex numbers, we CAN establish an order on the complex numbers [Proof], but that is useless. I find this consequence pretty interesting because, though \mathbb{R} and \mathbb{C} are isomorphic as additive groups (and as vector spaces over \mathbb{Q}), they are not isomorphic as rings (and hence not isomorphic as fields).

Now let’s have a look at the consequence of the difference between the two number systems due to the order structure.

Though both the real and complex numbers form complete fields (completeness being a property of metric spaces), only the real numbers have the least upper bound property.

Where we define least upper bound property as follows:

Let \mathcal{S} be a non-empty set of real numbers.

  • A real number x is called an upper bound for \mathcal{S} if x \geq s for all s\in \mathcal{S}.
  • A real number x is the least upper bound (or supremum) for \mathcal{S} if x is an upper bound for \mathcal{S} and x \leq y for every upper bound y of \mathcal{S} .

The least-upper-bound property states that any non-empty set of real numbers that has an upper bound must have a least upper bound in real numbers.
This least upper bound property is referred to as Dedekind completeness. Therefore, though both \mathbb{R} and \mathbb{C} are complete as metric spaces [proof], only \mathbb{R} is Dedekind complete.

In an arbitrary ordered field one has the notion of Dedekind completeness — every nonempty bounded above subset has a least upper bound — and also the notion of sequential completeness — every Cauchy sequence converges. The main theorem relating these two notions of completeness is as follows [source]:

For an ordered field \mathbf{F}, the following are equivalent:
(i) \mathbf{F} is Dedekind complete.
(ii) \mathbf{F} is sequentially complete and Archimedean.

where we define an Archimedean field as an ordered field such that for each element there exists a finite sum 1+1+\ldots+1 whose value is greater than that element; that is, there are no infinite elements.
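A standard example separating the two notions of completeness: the field \mathbb{R}(x) of rational functions can be ordered by declaring a function positive when its values are positive for all sufficiently large arguments. Under this order,

\displaystyle{x > \underbrace{1+1+\cdots+1}_{m \text{ times}} \quad \text{for every } m \in \mathbb{N}}

so x is an infinite element, \mathbb{R}(x) is not Archimedean, and hence (by the theorem above) it cannot be Dedekind complete.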

As remarked earlier, \mathbb{C} is not an ordered field and hence can’t be Archimedean. Therefore, \mathbb{C} can’t have the least-upper-bound property, though it’s complete as a metric space. So, the consequence of all this is:

We can’t use complex numbers for counting.

But still, complex numbers are very important part of modern arithmetic (number-theory), because they enable us to view properties of numbers from a geometric point of view [source].

Dimension clarification


In several of my previous posts I have mentioned the word “dimension”. Recently I realized that dimension can be of two types, as pointed out by Bernhard Riemann in his famous lecture in 1854. Let me quote Donal O’Shea from pp. 99 of his book “The Poincaré Conjecture” :

Continuous spaces can have any dimension, and can even be infinite dimensional. One needs to distinguish between the notion of a space and a space with a geometry. The same space can have different geometries. A geometry is an additional structure on a space. Nowadays, we say that one must distinguish between topology and geometry.

[Here by the term “space(s)” the author means “topological space”]

In mathematics, the word “dimension” can have different meanings. But, broadly speaking, there are only three different ways of defining/thinking about “dimension”:

  • Dimension of Vector Space: It’s the number of elements in a basis of the vector space. This is the sense in which the term dimension is used in geometry (while doing calculus) and algebra. For example:
    • A circle is a two dimensional object since we need a two dimensional vector space (aka coordinates) to write it. In general, this is how we define dimension for Euclidean space (which is an affine space, i.e. what is left of a vector space after you’ve forgotten which point is the origin).
    • Dimension of a differentiable manifold is the dimension of its tangent vector space at any point.
    • Dimension of a variety (an algebraic object) is the dimension of tangent vector space at any regular point. Krull dimension is remotely motivated by the idea of dimension of vector spaces.
  • Dimension of Topological Space: It’s the smallest integer that is somehow related to open sets in the given topological space. In contrast to a basis of a vector space, a base of a topological space need not be maximal; indeed, the only maximal base is the topology itself. Moreover, dimension in this case can be defined using the “Lebesgue covering dimension” or, in some nice cases, using the “Inductive dimension“.  This is the sense in which the term dimension is used in topology. For example:
    • A circle is one dimensional object and a disc is two dimensional by topological definition of dimension.
    • Dimension in this sense is a topological invariant: homeomorphic spaces have the same dimension. Due to this, a curve and a plane have different dimensions even though curves can fill space.  Space-filling curves are special cases of fractal constructions. No differentiable space-filling curve can exist. Roughly speaking, differentiability puts a bound on how fast the curve can turn.
  • Fractal Dimension:  It’s a notion designed to study complex sets/structures like fractals, and it allows objects to have non-integer dimensions. Its definition lies in between those of the dimension of vector spaces and of topological spaces. It can be defined in various similar ways. The most common way is to define it as the “Hausdorff dimension of a metric space” (measure theory enables us to integrate a function without worrying about its smoothness, and the defining property of fractals is that they are NOT smooth). This sense of dimension is used in very specific cases. For example:
    • A curve with fractal dimension very near to 1, say 1.10, behaves quite like an ordinary line, but a curve with fractal dimension 1.9 winds convolutedly through space very nearly like a surface.
      • The fractal dimension of the Koch curve is \frac{\ln 4}{\ln 3} \approx 1.26186, but its topological dimension is 1 (just like the space-filling curves). The Koch curve is continuous everywhere but differentiable nowhere.
      • The fractal dimension of space-filling curves is 2, but their topological dimension is 1. [source]
    • A surface with fractal dimension of 2.1 fills space very much like an ordinary surface, but one with a fractal dimension of 2.9 folds and flows to fill space rather nearly like a volume.
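For strictly self-similar fractals, the fractal dimension reduces to the similarity dimension \ln N/\ln s, where the object is made of N copies of itself, each scaled down by a factor s. A minimal computation (the Sierpinski triangle is an extra standard example, not discussed above):

```python
import math

def similarity_dimension(copies, scale):
    """d = ln(N)/ln(s) for a fractal built from N copies scaled by 1/s."""
    return math.log(copies) / math.log(scale)

koch = similarity_dimension(4, 3)        # Koch curve: 4 copies at scale 1/3
sierpinski = similarity_dimension(3, 2)  # Sierpinski triangle: 3 copies at scale 1/2
print(koch)        # approx 1.26186, matching ln 4 / ln 3 above
print(sierpinski)  # approx 1.585
```

Note how both values sit strictly between the topological dimension 1 of a curve and the dimension 2 of a surface.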

This simple observation has very interesting consequences. For example, consider the following statement from pp. 167 of the book “The Poincaré Conjecture” by Donal O’Shea:

… there are infinitely many incompatible ways of doing calculus in four-space. This contrasts with every other dimension…

This leads to a natural question:

Why is it difficult to develop calculus for any \mathbb{R}^n in general?

Actually, if we consider \mathbb{R}^n as a vector space, then developing calculus is not a big deal (as done in multivariable calculus).  But if we start from \mathbb{R}^n as a topological space, it becomes a challenging task because a differentiable structure must first be chosen. So, Donal O’Shea is actually pointing to the fact that the topological space \mathbb{R}^4 admits infinitely many inequivalent differentiable structures (and hence inequivalent ways of doing calculus), whereas \mathbb{R}^n for n\neq 4 admits only one.

Now, I will end this post by pointing to the way in which definition of dimension should be seen in my older posts:

A peek into the world of tensors


When I was in high school, two cool things that I learned from physics were “vectors” and “calculus”. I was (and still am) awestruck by the following statement:

Uniform circular motion is an accelerated motion.

At college, during my first (and second to last) physics course, I was taught “vector calculus” and I didn’t enjoy it. Last year I learned “linear algebra”, that is, the study of vector spaces, matrices, linear transformations… Also, a few months ago I wrote about my understanding of Algebra. In it I briefly mentioned that “…study of symmetry of equations, geometric objects, etc. became one of the central topics of interest…” and this led to what we call “Abstract Algebra”, of which linear algebra is a part. The following video by 3blue1brown explains how our understanding of vectors from physics can be used to develop the subject of linear algebra.

But, one should ask: “Why do we care to classify physical quantities as scalars and vectors?”. The answer to this question lies in the quest of physics to find the invariants in terms of which we can state the laws of nature. In general, the idea of finding invariants is a useful problem-solving strategy in mathematics (the language of physics). For example, consider the following problem from the book “Problem-Solving Strategies” by Arthur Engel:

Suppose the positive integer n is  odd. The numbers 1,2,…, 2n are written on the blackboard. Then one can pick any two numbers a and b, erase them and write instead |a-b|. Prove that an odd number will remain at the end.
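One can check this claim experimentally before proving it; a minimal simulation, where the random choice of the pair a, b is an arbitrary strategy:

```python
import random

def final_number(n, seed=0):
    """Play the game on 1..2n: repeatedly replace two numbers a, b with |a - b|."""
    rng = random.Random(seed)
    board = list(range(1, 2*n + 1))
    while len(board) > 1:
        a = board.pop(rng.randrange(len(board)))
        b = board.pop(rng.randrange(len(board)))
        # |a - b| has the same parity as a + b, so the parity
        # of the total sum on the board never changes
        board.append(abs(a - b))
    return board[0]

for n in (1, 3, 5, 7):
    assert final_number(n) % 2 == 1  # the last number is odd for odd n
```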

To prove this statement, one will have to use the “parity” of the sum 1+2+3+…+2n = n(2n+1) as the invariant: replacing a, b by |a-b| does not change the parity of the total, and n(2n+1) is odd when n is odd. And as stated in the video above, vectors are invariant under transformations of coordinate systems (the components change, but the length and direction of the arrow remain unchanged). For example, consider the rotation of the 2D axes by an angle θ, keeping the origin fixed.


By Guy vandegrift (Own work) [CC BY-SA 3.0], via Wikimedia Commons

\displaystyle{x' =  x \cos \theta + y \sin \theta; \quad y'= -x \sin \theta + y \cos \theta}

Now, we can rewrite this by using x_1 and x_2 instead of x and y, and putting different subscripts on the single letter a instead of the functions of \theta:

\displaystyle{x_1' =  a_{11} x_1 + a_{12}x_2; \quad x_2'=  a_{21}x_1 + a_{22}x_2}

Now we differentiate this system of equations to get:

\displaystyle{dx_1'= \frac{\partial x_1'}{\partial x_1} dx_1 + \frac{\partial x_1'}{\partial x_2} dx_2 ; \quad dx_2' = \frac{\partial x_2'}{\partial x_1} dx_1  + \frac{\partial x_2'}{\partial x_2} dx_2 }

where a_{ij} = \frac{\partial x_i'}{\partial x_j} . We can rewrite this system in condensed form as:

\displaystyle{dx_{\mu}' = \sum_{\sigma} \frac{\partial x_{\mu}'}{\partial x_{\sigma}} dx_{\sigma}}

for \mu =1,2 and \sigma =1,2. We can further abbreviate it by omitting the summation symbol \sum_{\sigma} with the understanding that whenever a subscript occurs twice in a single term, we do summation on that subscript.

\displaystyle{\boxed{dx_{\mu}' = \frac{\partial x_{\mu}'}{\partial x_{\sigma}} dx_{\sigma} }}

for \mu =1,2 and \sigma =1,2. This equation represents ANY transformation of coordinates whenever the values of (x_{\sigma}) and (x_{\mu}') are in one-to-one correspondence. Moreover, it can be extended to represent transformation of coordinates of any n-dimensional vector. For example, if  \mu =1,2,3 and \sigma =1,2,3 then it represents coordinate  transformations of a 3-dimensional vector.
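For the rotation above, the boxed transformation law is just multiplication by the matrix of partial derivatives a_{\mu\sigma} = \partial x_{\mu}'/\partial x_{\sigma}, and the length ds is invariant. A minimal numerical check (the angle and components are arbitrary choices):

```python
import numpy as np

theta = 0.7
# Jacobian a[mu][sigma] = d x'_mu / d x_sigma for the plane rotation
a = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

dx = np.array([1.5, -2.0])   # components (dx_1, dx_2)
dx_prime = a @ dx            # dx'_mu = a[mu][sigma] dx_sigma (summed over sigma)

# ds^2 = dx_1^2 + dx_2^2 is unchanged by the rotation
assert np.isclose(np.linalg.norm(dx_prime), np.linalg.norm(dx))
```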

But, there are physical quantities which can’t be classified as scalars or vectors.  For example, “stress” (the internal force experienced by a material due to the “strain” caused by an external force) is described as a “tensor of rank 2”. This is so because the stress at any point on a surface depends upon the external force vector and the area vector, i.e. it describes things happening due to the interaction between two vectors. The Cauchy stress tensor \boldsymbol{\sigma} consists of nine components \sigma_{ij} that completely define the state of stress at a point inside a material in the deformed state (where i corresponds to the force component direction and j corresponds to the area component direction). The tensor relates a unit-length direction vector  n to the stress vector  \mathbf{T}^{(\mathbf{n})} across an imaginary surface perpendicular to n:

\displaystyle{\mathbf{T}^{(\mathbf n)}= \mathbf n \cdot\boldsymbol{\sigma}\quad \text{or} \quad T_j^{(n)}= \sigma_{ij}n_i}        where,  \boldsymbol{\sigma} = \left[{\begin{matrix} \mathbf{T}^{(\mathbf{e}_1)} \\  \mathbf{T}^{(\mathbf{e}_2)} \\  \mathbf{T}^{(\mathbf{e}_3)} \\  \end{matrix}}\right] =  \left[{\begin{matrix}  \sigma _{11} & \sigma _{12} & \sigma _{13} \\  \sigma _{21} & \sigma _{22} & \sigma _{23} \\  \sigma _{31} & \sigma _{32} & \sigma _{33} \\  \end{matrix}}\right]
where \sigma_{11}, \sigma_{22} and \sigma_{33} are normal stresses, and \sigma_{12}, \sigma_{13}, \sigma_{21}, \sigma_{23}, \sigma_{31} and \sigma_{32} are shear stresses. We can represent the stress vector acting on a plane with normal unit vector n, as:


By Sanpaz (Own work) [CC BY-SA 3.0 or GFDL], via Wikimedia Commons

Here, the tetrahedron is formed by slicing a parallelepiped along an arbitrary plane n. So, the force acting on the plane n is the reaction exerted by the other half of the parallelepiped and has an opposite sign.

In this terminology, a scalar is a tensor of rank zero and a vector is a tensor of rank one. Moreover, in an n-dimensional space:

  • a vector has n components
  • a tensor of rank two has n^2 components
  • a tensor of rank three has n^3 components
  • and so on …

Just like vectors, tensors in general are invariant under transformations of coordinate systems. We wish to exploit this further. Let’s reconsider the boxed equation stated earlier.  Since we are working with the Euclidean metric, i.e. the length s of a vector is given by s^2=x_1^2+x_2^2, we have ds^2=dx_1^2+dx_2^2, i.e. dx_1 and dx_2 are the components of ds. So, replacing dx_1 and dx_2 by A^1 and A^2 we get (the motivation is to capture the idea of the area vector)

\displaystyle{\boxed{A^{' \mu} = \frac{\partial x_{\mu}'}{\partial x_{\sigma}} A^{\sigma}}}

where A^1, A^2, A^3, \ldots  are components of a vector in a certain coordinate system (note that the superscripts are just for indexing purposes and do NOT represent exponents). Any set of quantities which transforms according to this equation is defined to be a contravariant vector. Moreover, we can generalize this equation to a tensor of any rank.  For example, a contravariant tensor of rank two is defined by:

\displaystyle{A^{' \alpha \beta} = \frac{\partial x_{\alpha}'}{\partial x_{\gamma}} \frac{\partial x_{\beta}'}{\partial x_{\delta}} A^{\gamma \delta}}

where the sum is over the indices \gamma and \delta (since they occur twice in the term on right). We can illustrate this for 3 dimensional space, i.e. \alpha , \beta , \gamma , \delta = 1,2,3 but summation performed only on  \gamma  and  \delta; for instance, if  \alpha=1 and  \beta=2 then we have:

\displaystyle{A^{' 12} = \frac{\partial x_{1}'}{\partial x_{1}} \frac{\partial x_{2}'}{\partial x_{1}} A^{11} + \frac{\partial x_{1}'}{\partial x_{1}} \frac{\partial x_{2}'}{\partial x_{2}} A^{12} + \frac{\partial x_{1}'}{\partial x_{1}} \frac{\partial x_{2}'}{\partial x_{3}} A^{13}+ \frac{\partial x_{1}'}{\partial x_{2}} \frac{\partial x_{2}'}{\partial x_{1}} A^{21}  + \frac{\partial x_{1}'}{\partial x_{2}} \frac{\partial x_{2}'}{\partial x_{2}} A^{22} + \frac{\partial x_{1}'}{\partial x_{2}} \frac{\partial x_{2}'}{\partial x_{3}} A^{23} }

\displaystyle{+\frac{\partial x_{1}'}{\partial x_{3}} \frac{\partial x_{2}'}{\partial x_{1}} A^{31} + \frac{\partial x_{1}'}{\partial x_{3}} \frac{\partial x_{2}'}{\partial x_{2}} A^{32}  + \frac{\partial x_{1}'}{\partial x_{3}} \frac{\partial x_{2}'}{\partial x_{3}} A^{33} }
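For a rotation in the plane, this rank-two law is the matrix identity A' = JAJ^T with J the Jacobian, and numpy's einsum expresses it index-for-index. A minimal check (the angle and the components of A are arbitrary choices):

```python
import numpy as np

theta = 0.3
J = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])  # J[a, g] = d x'_a / d x_g

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])                       # components A^{gamma delta}

# A'^{alpha beta} = (dx'_alpha/dx_gamma)(dx'_beta/dx_delta) A^{gamma delta},
# summed over gamma and delta
A_prime = np.einsum('ag,bd,gd->ab', J, J, A)
assert np.allclose(A_prime, J @ A @ J.T)
```

The einsum subscript string makes the double summation over \gamma and \delta explicit, which is exactly what the nine-term expansion above does by hand.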

So, we just analysed the invariance of one of the flavours of tensors. Thinking mathematically, one should expect the existence of something like an “algebraic inverse” of the contravariant tensor, because a tensor is a generalization of a vector and in linear algebra we study inverse operations. Let’s consider a situation where we want to analyse the density of an object at different points. For simplicity, let’s consider a point A(x_1,x_2) on a plane surface with variable density.


A surface whose density is different in different parts

If we designate by \psi the density at A, then \frac{\partial \psi}{\partial x_1} and \frac{\partial \psi}{\partial x_2} represent, respectively the partial variation of \psi in the x_1 and x_2 directions. Although \psi is a scalar quantity, the “change in \psi” is a directed quantity with components \frac{\partial \psi}{\partial x_1} and \frac{\partial \psi}{\partial x_2}. Note that, “change in \psi” is a tensor of rank one because it depends upon the various directions. But it’s a tensor in a sense different from what we saw in case of “stress”. This “difference” will become clear once we analyse what happens to this quantity when the coordinate system is changed.

Now our motive is to express \frac{\partial \psi}{\partial x_1'}, \frac{\partial \psi}{\partial x_2'} in terms of \frac{\partial \psi}{\partial x_1}, \frac{\partial \psi}{\partial x_2}. Note that a change in x_1' will affect both x_1 and x_2 (as seen in the rotation of the 2D axes in the case of a vector). Hence, the resulting changes in x_1 and x_2 will affect \psi:

\displaystyle{\frac{\partial \psi}{\partial x_1'} = \frac{\partial \psi}{\partial x_1} \frac{\partial x_1}{\partial x_1'} + \frac{\partial \psi}{\partial x_2} \frac{\partial x_2}{\partial x_1'}; \quad  \frac{\partial \psi}{\partial x_2'} = \frac{\partial \psi}{\partial x_1} \frac{\partial x_1}{\partial x_2'} + \frac{\partial \psi}{\partial x_2} \frac{\partial x_2}{\partial x_2'}}

Here we have used the chain rule: if x,y,z are three variables such that z depends on y and y depends on x, and the direct calculation of the change in z per unit change in x is NOT easy, then we can calculate it using \frac{dz}{dx}  = \frac{dz}{dy} \frac{dy}{dx}. We can rewrite this system in condensed form as:

\displaystyle{\frac{\partial \psi}{\partial x_{\mu}'} = \sum_{\sigma} \frac{\partial \psi}{\partial x_{\sigma}} \frac{\partial x_{\sigma}}{\partial x_{\mu}'} }

for \mu =1,2 and \sigma =1,2. We can further abbreviate it by omitting the summation symbol \sum_{\sigma} with the understanding that whenever a subscript occurs twice in a single term, we do summation on that subscript.

\displaystyle{\frac{\partial \psi}{\partial x_{\mu}'} =  \frac{\partial \psi}{\partial x_{\sigma}} \frac{\partial x_{\sigma}}{\partial x_{\mu}'} }

for \mu =1,2 and \sigma =1,2. Finally replacing \frac{\partial \psi}{\partial x_{\mu}'} by A_{\mu}' and \frac{\partial \psi}{\partial x_{\sigma}} by A_{\sigma} (to make it similar to notation introduced in case of stress tensor)

\displaystyle{\boxed{A_{ \mu}' = \frac{\partial x_{\sigma}}{\partial x_{\mu}'} A_{\sigma}}}

where A_1, A_2, \ldots  are components of a vector in a certain coordinate system. Any set of quantities which transforms according to this equation is defined to be a covariant vector. Moreover, we can generalize this equation to a tensor of any rank.  For example, a covariant tensor of rank two is defined by:

\displaystyle{A_{ \alpha \beta}' = \frac{\partial x_{\gamma}}{\partial x_{\alpha}'} \frac{\partial x_{\delta}}{\partial x_{\beta}'} A_{\gamma \delta}}

where the sum is over the indices \gamma and \delta (since they occur twice in the term on right).
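The covariant law can also be checked numerically: the gradient components transform with the inverse Jacobian \partial x_{\sigma}/\partial x_{\mu}'. A minimal sketch for the plane rotation (the angle and gradient components are arbitrary choices):

```python
import numpy as np

theta = 1.1
J = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])  # J[i, j] = d x'_i / d x_j
Jinv = np.linalg.inv(J)                          # Jinv[s, m] = d x_s / d x'_m

A = np.array([0.5, -1.25])   # A_sigma = d psi / d x_sigma in old coordinates

# covariant law: A'_mu = (d x_sigma / d x'_mu) A_sigma  (summed over sigma)
A_prime = np.einsum('sm,s->m', Jinv, A)

# for a rotation the inverse Jacobian is the transpose, so this agrees
# with rotating the gradient components directly
assert np.allclose(A_prime, J @ A)
```

For a rotation the two laws happen to coincide numerically (orthogonal Jacobian); for a general coordinate change they differ, which is the whole point of the distinction.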

Comparing the (boxed) equations describing contravariant and covariant vectors, we observe that the coefficient matrices on the right are inverses of each other (as promised…).  Moreover, all these boxed equations represent the law of transformation for tensors of rank one (a.k.a. vectors), which can be generalized to a tensor of any rank.

Our final task is to see how these two flavours of tensors interact with each other. Let’s study the algebraic operations of addition and multiplication for both flavours of tensors, just like the way we did for vectors (note that by vector product we mean the dot product, because the cross product can’t be generalized to n-dimensional vectors).

First consider the case of contravariant tensors. Let A^{\alpha} be a vector having two components A^{1} and A^{2} in a plane and B^{\alpha} be another such vector. If we define  A^{\alpha}+B^\alpha = C^\alpha and A^{\alpha} B^\beta = C^{\alpha \beta} (this allows 4 components, namely C^{11}, C^{12}, C^{21}, C^{22})  with

\displaystyle{A^{' \lambda} = \frac{\partial x_{\lambda}'}{\partial x_{\alpha}} A^{\alpha}; \quad B^{' \mu} = \frac{\partial x_{\mu}'}{\partial x_{\beta}} B^{\beta}}

for \lambda, \mu, \alpha, \beta =1,2, then on their addition and multiplication (called outer multiplication) we get:

\displaystyle{C^{' \lambda} = \frac{\partial x_{\lambda}'}{\partial x_{\alpha}} C^{\alpha}; \quad C^{' \lambda \mu} = \frac{\partial x_{\lambda}'}{\partial x_{\alpha}} \frac{\partial x_{\mu}'}{\partial x_{\beta}} C^{\alpha \beta}}

for \lambda, \mu, \alpha, \beta =1,2. One can prove this by patiently multiplying each term and then rearranging them. In general, if two contravariant tensors of rank m and n respectively, are multiplied together, the result is a contravariant tensor of rank m+n.

For the case of covariant tensors, the addition and (outer) multiplication is done in same manner as above. Let A_{\alpha} be a vector having two components A_{1} and A_{2} in a plane and B_{\alpha} be another such vector. If we define  A_{\alpha}+B_\alpha = C_\alpha and A_{\alpha} B_\beta = C_{\alpha \beta} (this allows 4 components, namely C_{11}, C_{12}, C_{21}, C_{22})  with

\displaystyle{A_{ \lambda}' = \frac{\partial x_{\alpha}}{\partial x_{\lambda}'} A_{\alpha}; \quad B_{ \mu}' = \frac{\partial x_{\beta}}{\partial x_{\mu}'} B_{\beta}}

for \lambda, \mu, \alpha, \beta =1,2, then on their addition and multiplication (called outer multiplication) we get:

\displaystyle{C_{ \lambda} '= \frac{\partial x_{\alpha}}{\partial x_{\lambda}'} C_{\alpha}; \quad C_{ \lambda \mu}'= \frac{\partial x_{\alpha}} {\partial x_{\lambda}'}\frac{\partial x_{\beta}}{\partial x_{\mu}'} C_{\alpha \beta}}

for \lambda, \mu, \alpha, \beta =1,2. In general, if two covariant tensors of rank m and n respectively, are multiplied together, the result is a covariant tensor of rank m+n.

Now, as promised, it’s the time to see how both of these flavours of tensors interact with each other.  Let’s extend the notion of outer multiplication defined for each flavour of tensor, to outer product of a contravariant tensor with a covariant tensor. For example, consider vectors (a.k.a. tensors of rank 1) of each type:

\displaystyle{A^{' \lambda} = \frac{\partial x_{\lambda}'}{\partial x_{\alpha}} A^{\alpha}; \quad B_{ \mu}' = \frac{\partial x_{\beta}}{\partial x_{\mu}'} B_{\beta}}

then their outer product leads to

\displaystyle{C^{' \lambda}_{ \mu}= \frac{\partial x_{\lambda}'}{\partial x_{\alpha}} \frac{\partial x_{\beta}}{\partial x_{\mu}'} C^{\alpha}_{\beta}}

where A^\alpha B_\beta = C^\alpha _\beta. This is neither a contravariant nor a covariant tensor, hence it is called a mixed tensor of rank 2. More generally, if a contravariant tensor of rank m and a covariant tensor of rank  n  are multiplied together so as to form their outer product, the result is a mixed tensor of rank m+n.
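A quick numerical check of the mixed law, together with the related fact that contracting a contravariant index against a covariant one yields an invariant scalar; a minimal sketch for the plane rotation (all component values are arbitrary choices):

```python
import numpy as np

theta = 0.9
J = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])  # J[i, j] = d x'_i / d x_j
Jinv = np.linalg.inv(J)                          # Jinv[i, j] = d x_i / d x'_j

A = np.array([2.0, -1.0])   # contravariant components A^alpha
B = np.array([0.5,  3.0])   # covariant components B_beta

C = np.outer(A, B)          # mixed tensor C^alpha_beta = A^alpha B_beta
# C'^lambda_mu = (dx'_lambda/dx_alpha)(dx_beta/dx'_mu) C^alpha_beta
C_prime = np.einsum('la,bm,ab->lm', J, Jinv, C)

A_prime = J @ A             # contravariant transformation
B_prime = Jinv.T @ B        # covariant transformation
assert np.allclose(C_prime, np.outer(A_prime, B_prime))
assert np.isclose(A_prime @ B_prime, A @ B)   # the contraction is invariant
```

The last assertion is the numerical shadow of why the Einstein convention pairs one upper index with one lower index: that pairing is exactly what survives a change of coordinates.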

In general, if two mixed tensors of rank m (having m_1 indices/superscripts of contravariance and m_2 indices/subscripts of covariance, such that m_1+m_2=m) and n (having n_1 indices/superscripts of contravariance and n_2 indices/subscripts of covariance, such that n_1+n_2=n)  respectively, are multiplied together, the result is a mixed tensor of rank m+n (having m_1+n_1 indices/superscripts of contravariance and m_2+n_2 indices/subscripts of covariance, such that m_1+n_1+m_2+n_2=m+n) .

Unlike the previous two types of tensors, we can’t illustrate this using a simple physical example. To convince yourself, consider the following two mixed tensors of rank 3 and rank 2, respectively:

\displaystyle{A^{'\alpha \beta}_{\gamma} = \frac{\partial x_{\nu} }{\partial x_{\gamma}'}\frac{\partial x_{\alpha}'}{\partial x_{\lambda}}\frac{\partial x_{\beta}'}{\partial x_{\mu}} A^{\lambda \mu}_{\nu}; \quad B^{'\kappa}_{\delta} = \frac{\partial x_{\sigma}}{\partial x_{\delta}'}\frac{\partial x_{\kappa}'}{\partial x_{\rho}}B^{\rho}_{\sigma}}

then following the notations introduced, their outer product is of rank 5 and is given by

\displaystyle{C^{'\alpha\beta\kappa}_{\gamma\delta} = \frac{\partial x_{\nu} }{\partial x_{\gamma}'} \frac{\partial x_{\sigma}}{\partial x_{\delta}'}\frac{\partial x_{\alpha}'}{\partial x_{\lambda}}\frac{\partial x_{\beta}'}{\partial x_{\mu}}\frac{\partial x_{\kappa}'}{\partial x_{\rho}} C^{\lambda\mu\rho}_{\nu\sigma}}

Behind this notation, the processes are really complicated. Now, suppose that we are working in 3D vector space. Then, the transformation law for tensor \mathbf{A} represents a set of 27 (=3^3) equations with each equation having 27 terms on the right. And the transformation law for tensor \mathbf{B} represents a set of 9 (=3^2) equations with each equation having 9 terms on the right. Therefore, the transformation law of their outer product tensor  \mathbf{C} represents a set of 243 (=3^5) equations with each equation having 243 terms on the right.

So, unlike the previous two cases of contravariant and covariant tensors, the proof for the outer product of mixed tensors is rather complicated and out of scope for this introductory post.


[L] Lillian R. Lieber, The Einstein Theory of Relativity. Internet Archive: https://archive.org/details/einsteintheoryof032414mbp