Modular Arithmetic
Introduction
Modular arithmetic is not ordinarily taught in high school and I have been surprised at how many otherwise well educated people I have met who have never even heard of it. This is unfortunate. The basic concept is very simple. It helps clarify some things that you already know and it is just plain fun to work with, even if you never have any practical application for it.
I will introduce modular arithmetic by presenting two problems involving specific cases of modular arithmetic with which everyone is familiar:
236
* 498 + 164 = 117,69?
Find the missing last digit by doing a simple calculation in your head.
Without doing any calculations or algebra explain why it is not possible to find two consecutive whole numbers that add to 273,826.
In the first problem it is only necessary to work with the last digits of each of the numbers. Working with the last digits of 236 and 498 we get 6 * 8 = 48. Using the 8 from 48 and the 4 from 164 yields 8 + 4 = 12 and so the last digit of the answer must be 2. 236 * 498 + 164 = 117,692.
For the second problem observe that any pair of consecutive numbers consists of an even number and an odd number. Since even + odd = odd, the sum of two consecutive numbers is always odd and therefore can never equal 273,826, which is even.
What These Examples Have in Common
It may not be clear how these two examples relate if you are not already familiar with modular arithmetic and are not a genius of the caliber of the great early nineteenth century German mathematician Carl Friedrich Gauss, who developed the theory of modular arithmetic among many other accomplishments.
Note first of all that the last digit of a decimal number is the same as the remainder obtained when dividing the number by 10. For example, the remainder of 236 when divided by 10 is 6.
Similarly, the evenness or oddness of a number is found by looking at the remainder when dividing by 2. Even numbers are represented by 0, since they have no remainder when divided by 2 and odd numbers are represented by 1, since they have a remainder of 1 when divided by 2. Instead of memorizing the rules of evenness and oddness as we have all done, we could allow 0 to represent all even numbers and 1 to represent all odd numbers. To show that even + odd = odd, we would write 0 + 1 = 1. For odd + odd = even we get 1 + 1 = 2, which we could write as 1 + 1 = 0, since we are using 0 to represent all even numbers and 2 is even.
More properly, the last equation is written as:
1 + 1 º 0 (mod 2).
The symbol "º " is read as "is congruent to" and the "mod 2" tells us that we are working with the remainders of numbers when dividing by 2. X º Y (mod B) means that X and Y have the same remainder when divided by B
The rules for evenness and oddness in multiplication in the above notation are:
1 * 1 º 1 (mod 2) and 1 * 0 º 0 (mod 2).
In modular form the answer to the first problem would be expressed as:
236 * 498 + 164 º 6 * 8 + 4 (mod 10) º 2 (mod 10).
Definition
– a º
b
(mod m) if (a – b) is divisible by m. The value m is referred
to as the modulus.
This definition is equivalent to talking
about having equal remainders when divided by m . If a and b have the
same remainder when divided by m then a=im + r and b = jm + r for
some numbers i, j and r.
Then (a – b) = (im + r) –
(jm + r) = (i – j)m, and so (a – b) is divisible by
m.
In the congruence notation all numbers congruent to one another mod m are equivalent, but it is convenient to give special attention to the numbers that represent the remainder when divided by m. I will therefore sometimes refer to a number mod m. For example, 127 mod 10 = 7. In the most commonly used compuer languages there is a mod operator that works in this way.
The Basic Proofs
You will notice that modular arithmetic looks a lot like regular arithmetic. However, looks can be deceiving, so we should not make any assumptions of similar behavior without first carrying out a proof.
If x º y
(mod m) then (a + x) º (a
+ y) (mod m)
Proof: Given
that (x – y) = km for some
k, we want to show that (a + x) – (a + y) is divisible by m.
(a
+ x) – (a + y) = x - y = km and so (a +
x) º (a
+ y) (mod m).
If x º y
(mod m) then ax º
ay.
Proof:
We are given that x -y = km. Then ax – ay = a(x -y) = akm,
and
so is divisible by m.
We can work with negative numbers in modular arithmetic. For example -1 º 2 (mod 3), since -1 – 2 = -3, which is divisible by 3. Another way of showing this is to say that -1 º -1 + 0 (mod 3) º -1 + 3 (mod 3) º 2 (mod 3).
We can therefore solve any equation a + x º b (mod m) for x by adding -a to both sides.
Can we always solve for x in an equation of the form ax º b (mod m)? Of course if a º 0 (mod m) there will not be a solution, but what about the other cases? Consider the equation 2x º 3 (mod 4). If this were to have a solution then 3 - 2x = 4k, or 3 = 2x + 4k = 2(x + 2k). Since 2(x + 2k) is always even and 3 is odd, there is no solution. The problem in this case came from the fact that 2 had a divisor in common with 4. We will look at cases later where this limitation can be avoided.
Casting Out Nines
In the days before calculators this
method
was sometimes taught to students for checking their arithmetic. It is
based on the properties of decimal numbers mod 9. A student could
check that x × y
= z
by making sure that
x (mod 9) ×
y (mod 9) º
z (mod 9). Before
reading on, see if you determine an easy way of computing a number
mod 9. Hint: Think of a number like 257 as 2 ×102
+
5
×10
+ 7 and apply the rules of modular arithmetic.
The key to computing
numbers
mod 9 is to realize that 10 º 1
(mod
9).
Therefore 257 = 2
×102
+
5
×10 +
7 º
2 ×1 ×1 + 5 ×1
+ 7 (mod 9) º
14 (mod 9). Applying
the same method to 14, we get 14 (mod 9)º 1
+ 4 (mod 9) º 5
(mod 9).
To apply the casting out nines a student would add each of the digits of a number involved in a sum or product. As the student was adding, if the sum contained two digits, this value would be replaced by the sum of the two digits.
Exercise: Show that every number is congruent mod 3 to the sum of its digits.
Computer Arithmetic
One very practical use of modular arithmetic is the representation of numbers in computers.
Numbers on computers are represented as binary numbers. It should be clear that the addition of two positive numbers is easy to implement in the hardware. To compute each digit of a sum it is only necessary to look at the the two corresponding digits in the numbers being added along with any carry from the additon of the two previous digits.
How can subtraction be carried out and how should negative numbers be represented? Trying to incorporate the method used in subtracting on paper gets very messy. If, for example, you have 100100 – 11101, there are going to be a lot of cases of trying to subtract a 1 from a 0, which requires “borrowing” from possibly several adjacent columns. There are more problems. If you try to subtract a positive number from a smaller positive number it would be necessary to recognize this situation and subtract the smaller from the larger and then negate the answer.
All
of these difficulties are avoided by using modular arithmetic to
represent negative numbers. To see how this is done, imagine a car
odometer that goes up to one million. We can think of the odometer as
representing numbers mod one million, because if the car ever got to
one million miles the odometer would roll over and start again from
0. If you took a car with its odometer set to 0 and drove backwards
you would get 999,999, showing that
-1 º
999,999 (mod 1,000,000).
Computers typically use 32 or 64 binary bits to represent a number. To keep things simple we will just use four digits. The numbers from 0000 to 1111 correspond to the decimal numbers from 0 to 15. The four columns represent the digits for 1, 2, 4 and 8. If we added a fifth column it would represent 16. By restricting ourselves to four columns we are in effect doing arithmetic mod 16 in the same way that the car odometer can be thought of as doing arithmetic mod one million.
The binary numbers 0000 to 0111 will correspond to the numbers 0 to 7. By representing the numbers -1 to -8 mod 16, we get binary 1111 representing -1, binary 1110 standing for -2, up to binary 1000 standing for -8. We can therefore represent all the numbers from -8 to 7. Note that it is easy to tell if a number is negative because the most significant binary digit, sometimes called the sign bit, will only be equal to 1 for negative numbers. So binary 1100 = 12 º -4 (mod 16).
Using this notation, we can perform addition with negative numbers in the same way as for positive numbers. To add 4 and -2, we would add the binary numbers 0100 and 1110 to get 0100 + 1110 = 1 0010, but since we only have four bits this is 0010, or 2 decimal.
To subtract a number, we negate it and add. The negative of a number x is formed by finding 16 – x. This is easily done by realizing that 16 = binary 1 0000 = binary 1 + 1111, so that to find -2, we compute binary 1111 – 0010 + 1. Subtracting from binary 1111 is easy. The effect is to change the 1's to 0's and the 0's to 1's so that 0010 becomes 1101.
There will be a problem for addition if the numbers are too large. We only allow for numbers up to 7. Adding 6 and 5 gives 11, which would be too large. In binary terms we would get 0110 + 0101 = 1011. The computer can easily detect the error by noting that the sum of two numbers with a sign bit of 0 resulted in a number with a sign bit of 1. Similarly, there would be an error if two numbers with sign bit of 1 resulted in a sum with sign bit 0.
By the rules of modular arithmetic, the modular notation for negative binary numbers works for multiplication and division as well.
Prime Modulus
It was shown above that the equation ax º b (mod m) will not have a solution if a and m are divisible by a common number and b is not divisible by this number. This problem will not arise if m is a prime number p, because p is only divisible by 1 and itself.
ax
º
b
(mod p) has exactly one solution for prime p and non-zero a
Proof:
Suppose, for example, we want to
solve 2x º
1 (mod 7). Consider the
numbers 2n (mod 7) where n goes from 1
to 6. If we can show that the numbers 2n are all different from one
another then the six values of 2n must correspond to the numbers from
1 to 6, though not necessarily in that order, and therefore we must
have 2n º
1 (mod 7) for one of them.
Suppose 2n1 º 2n2 (mod 7). Then 2n1 – 2n2 º 0 (mod 7). 2(n1 – n2 ) º 0 (mod 7). That is, 2(n1 – n2 ) is divisible by 7. This can happen only if (n1 – n2 ) is divisible by 7, that is, if n1 º n2 (mod 7). Therefore the six values 2n for n from 1 to 6 must map to distinct values and so 2n º 1 (mod 7) for exactly one of them.
The same argument can be given for the solution of any equation ax º 1 (mod p) . It is tempting to say that x º 1/a (mod p) , but this notation is not used. Instead we will say that x º a-1. We can use a-1 to solve any equation ax º b (mod p), provided a is not 0, by multiplying both sides by a-1 .
For
a prime p and non-zero a, a(p
-1) º 1 (mod p)
Proof:
Again consider the six non-zero
congruence classes mod 7. Since
the six values 2n map to the six values of n, it follows that k = 1 ×
2 × 3
×
4 × 5
×
6 º
(2 ×
1)(2 ×
2)(2 ×
3)(2 × 4)(2
×
5)(2 × 6)
(mod 7).
k º 26k
(mod 7). Multiplying both sides by k-1
gives 26 º
1
(mod7). The exact same reasoning applies for any prime p
and any non-zero value of a, leading to a(p
-1) º
1
(mod p).
For
a prime p and non-zero a, the smallest value of x such that
ax
º
1 (mod p) must be
such that x divides (p – 1).
Proof: Consider
5x
º 1
(mod 13). From the previous proof we know that 512
º
1
(mod 13). We want to find the smallest value of x such
that 5x º
1. Suppose that 55
º 1
. 12 is not divisible by 5. It has a remainder of 2. We would then
have 55 55
52
º
512
º 1,
but since that 55 º 1
we would have 52 º 1,
contradicting the assumption that 5 was the smallest value of x such
that 5x º
1 (mod 13). In this case not
only is 5 not the smallest value
of x such that 5x º
1 (mod 13), but 5x
is not congruent to 1 (mod
13).
However 58 º
1 (mod 13) is a true
statement. When 12 is divided by 8 it
leaves a remainder of 4. Proceeding as previously, we conclude that
54 º
1 (mod 13) . 4 divides 12 and
it turns out that 4 is the
smallest value of x such that 5x
º
1 (mod 13).
Perfect Card Shuffle Problem
Enough of the theory for a while. Let's apply some of it.
To shuffle a deck of 52 cards perfectly is to split the deck into two 26 card piles and to alternately lay a card from the two piles on top of one another. How many times would this have to be done before the deck returned to its original sequence?
The first thing to notice is that there are two ways of doing a perfect shuffle, depending on whether the first card laid down comes from the top or bottom half of the deck. Let us first consider the case where the first card comes from the top half.
Number the cards from 1 to 52 starting from the bottom card of the deck. The first card from the bottom half will go to position 2, the second to position 4, the third to position 6 and so on up to card 26 going to position 52. For the bottom half of the deck we have the new position given in terms of the previous one given by function f(x) = 2x . For the top half of the deck, card 27 goes to position 1, 28 goes to position 3 and so on. The function g(x) in this case satisfies g(x+1) – g(x) = 2. It should be clear, or at least reasonable to suppose, that g(x) is a linear equation of the form 2x + a for some constant a. Substitution of values yields a = 53, so for cards from the bottom half of the deck satisfy the function g(x) = 2x – 53.
It would appear at first that we are working with two different functions for the top and bottom halves of the deck. However, for x between 26 and 52, 2x – 53 = 2x mod 53. For x from 1 to 26, 2x = 2x mod 53. We can therefore replace the two functions with a single function f(x) = 2x (mod 53). The position of card x after n shuffles is 2nx(mod 53). The deck will return its original state when 2n º 1. Since 53 is a prime, we know something about the possible values of n required to have 2n º 1 , namely that the smallest value must divide 53 -1 = 52. Listing all the divisors of 52, the possible values of n are 2, 4, 13, 26 and 52. Testing these values, it turns out that it requires 52 perfect shuffles to return the deck to its original state.
Starting from the bottom half
If we start the shuffle with the pile from the bottom half of the deck, card number 1 will always be in the same spot, so we can disregard it. Since we are ignoring card 1, let us renumber the cards from 1 to 51, starting with the second card. We now have the first card coming from the top half of the deck (on top of the ignored first card) and the analysis is similar to the previous one. The 25 cards from the bottom half again go from x to 2x. The 26 cards from the top half, 26 to 51, this time go from x to 2x – 51 and so we can use the function f(x) = 2x (mod 51). Unfortunately, 51 = 3 ×17 is not a prime and so we cannot apply the same analysis as before to limit the possible values of n that satisfy 2n º 1 (mod 51). We are going to need to develop more theory to handle this.
Reduced Residue Systems
Definition – all numbers congruent to one another mod m are said to belong to the same residue class.
One way of being able to get around the problem of having a and m with a common divisor in order to be able to solve the equation ax º b (mod m) was to choose m to be a prime. If m is not a prime another strategy is to work only with values of a that do not have a common divisor with m. For m = 15 we would work with 1, 2, 4, 7, 8, 11, 13 and 14. This is referred to as a reduced residue system. Note that if a is relatively prime to m then the entire residue class a+mx is relatively prime to m.
Definition
– two numbers are
said to be relatively prime if they do not have a common factor other
than 1.
We
can then define a reduced residue system for modulus m as being the
set of numbers relatively prime to m.
If a and b do not have a common factor with m then a × b is also relatively prime to m.
Working with a prime modulus is a special case of a reduced residue system. The proofs we did for prime modulus can all be generalized to reduced residue systems.
ax º
b (mod m) has exactly one
solution for a and b belonging to
the reduced residue system of m
Proof:
If we are
trying to solve 4x º1
(mod 15), then
similar to what we did previously, multiply all the members of the
reduced residue system by 4. The products will all be reduced
residues. If we can show that they are all different then exactly one
of them equals 1.
Suppose 4n1 º 4n2 (mod 15). (4n1 - 4n2) º 0 (mod 15). 4(n1 - n2) º 0 (mod 15). 4(n1 - n2) must be a multiple of 15 and the only way that can happen is if (n1 - n2) is a multiple of 15, that is n1 º n2 (mod 15).
Again this can be generalized for any a and b belonging to a reduced residue system m.
Definition
– The totient
function φ(m)
is the number
of residue classes relatively prime to m
For a prime number p,
φ(p) = p – 1. φ(15) = 8.
For
a relativley prime to m
aφ(m)
º
1 (mod
m)
Proof:
This is completely analogous to the proof
done for prime modulus. Again we know that an maps the reduced
residue system to itself. See if you can complete the proof using the
prime modulus proof as a guide.
For
a in the reduced residue system of m, the smallest value x such that
ax º 1
(mod m) is a divisor of
φ(m)
This proof is completely analogous to the one for
prime modulus and is left as an exercise.
Completion of Perfect Card Shuffle Problem
We are now in a position to tackle rest of the card shuffling problem. We are looking to solve ax º 1 (mod 51). We know that the smallest value of x that satisfies this equation must divide φ(51). Let's compute φ(51). The number of values with a factor of 3 is 51/3 = 17. That leaves 51 – 17 = 34 values. Of these, two values, 17 and 34, are divisible by 17. Removing them from our 34 values gives us φ(51) = 34 – 2 = 32. The value of x must divide 32. The possible values are 2, 4, 8, 16 and 32. 22 = 4 ,24 = 16 , 28 = 256 º 1 (mod 51). If we start shuffling with the bottom half of the deck it will take 8 perfect shuffles to return us to the starting position.
As
a generalization of the computation of φ(51)
we have:
φ (pq)=(p-1)(q-1),
where p
and q are primes
Proof: The
numbers up to pq divisible
by p are p, 2p, 3p,...,(q-1)p, qp. The numbers up to pq divisible
by q are q,2q,3q,...,(p-1)q, pq . We have p numbers divisible by p
and q numbers divisible by q, with pq counted twice. Therefore φ
(pq) =
pq – (p + q -1) = pq – p – q +
1 = (p-1)(q-1).
Finding an inverse
a-1 is the solution of ax º 1. We showed above that for a relatively prime to m, ax º 1 (mod φ(m)) always has a solution, but we did not provide a method for solving the equation other than trying out all possible values. Public key encryption, which I will explain in the next section, requires finding an inverse when φ(m) is very large. The idea behind public key encryption is that for m having hundreds of digits computing a-1 (mod m) can be done in a reasonable amount of time if φ(m) is known and not possible in a reasonable amount of time if φ(m) is not known.
I am going to outline the proof of the method used to find a-1 and provide an example of how it is used to compute an inverse.
The solution of ax º 1 (mod m) requires finding numbers x and y such that ax – 1 = ym or ax + ym =1, where a and m are relatively prime. If a and m are not relatively prime then if d is a commond divisor of a and m then ax + my will always be divisible by d. A method called Euclid's algorithm can be used to find x and y such that ax + ym = g, where g is the greatest common divisor, of a and m, gcd(a,m). In the case where a and m are relatively prime this will give us ax + ym =1, since in that case the gcd of a and m is 1.
Let us first
consider Euclid's algorithm as a way of finding the gcd of any two
numbers a and m. The method is based on a simple observation. If g
divides a and m then g divides a and (m – a). This is obvious
since if a=gs and m = gt then
(m – a) = gs – gt = g(s
– t). Similarly, any number that divides a and (m –
a)
will divide a + (m – a) = m, so the gcd of a and m
– a is
the same as gcd of a and m.
We can therefore transform the problem of finding gcd(a,m) to the simpler problem of finding gcd(a, m-a). We now look at a and m - a and subtract the smaller from the larger to give us a either a and (m – 2a )or [a – (m - a) ] and m-a. Notice that in either case the two numbers chosen can be expressed in terms of a and m. We continue in this way until we get two numbers that are equal to one another, which will then be the gcd.
The process can be speeded up if instead of subtracting the smaller number from the larger one we divide the smaller into the larger and work with the remainder. This is equivalent to peforming multiple subtractions from the larger number until a number is reached which is smaller than it.
Since at each stage we can express the two numbers being worked with in terms of a and m, we can use this method to solve ax + ym = gcd(a,m) and in the case when a and m are relatively prime this will give us ax + ym = 1, or x = a-1 (mod m).
An example should help to clarify things.
Let us find x
such that 23x º
1 (mod 76).
Start
with 23 and 76.
Replace 76 with
76 – 3 ×
23 =
7
This leaves:
7 = (76 - 3 ×
23 ) and 23
Replace 23 with
2 = 23 – 3 × 7
=
23 – 3 ×
(76 - 3 ×
23 ) = –
3 ×76
+ 10 ×
23
We have:
7 =(76 - 3 ×
23 ) and 2 = (– 3 ×76
+ 10 ×
23)
Finally, replace
7 with 1 = 7 - 3 ×
2
=
(76 - 3 ×
23)
– 3 ×
(–
3 ×76
+ 10 ×
23) =
10 ×
76 –
33 ×
23
This
leaves us with 2 and 1 and since 2 is divisible by 1, we are
finished.
So we have 1 =
10 ×
76 –
33 ×
23
–
33 ×
23 – 1 =
10 ×
76,
-33 ×
23 º
1 (mod 76)
23-1 º -33 (mod76) º -33 + 76 º 43 (mod 76).
Realize that it is simple work for a computer to keep track of the multipliers of a and m as the calculation procedes.
Public Key Encryption
What makes public key encryption public is that the means for encoding a message is made public, but it is not possible for anyone but the creater of the code to decrypt the message in a reasonable amount of time.
Here
are the steps in the encryption
process:
1. Find two very large (hundreds of digits) prime numbers
p and q
2. Use modulus m = pq.
3. Compute φ(m)
= (p-1)(q-1)
4. Choose some public encrypting key = e. e and m are
publicly accessible.
5. Divide a message into several messages M <
m and not divisible by p or q; the encrypted messages will be given
by M' = Me
(mod
m)
To decrypt the messages, use Euclid's algorithm to find d
such that de º
1(mod φ(m))
and
compute the original message as M
= M'd
(mod φ(m))
To
see why this works, we know that
de =
kφ(m)+1 for some k and so:
(M')d
= Mde
= Mkφ(m)+1.
Mkφ(m)+1
(mod
m) = Mkφ(m)
×
M (mod
m).
Since
Mφ(m)
º
1
(mod m), it follows that Mkφ(m)
º
1
(mod m) and therefore Mkφ(m)
×
M º
M (mod
m).
m and e are publicly known. To easily decrypt requires knowing φ(m) = (p-1)(q-1) and this requires being able to factor m as pq. This is a process that takes many years of computation on even the fastest computers and is the reason why publicly encrypted messages are safe.
There are some questions that you may have.
Firstly, how does one find large prime numbers? The basic method is to randomly choose numbers and to test them for primality. Recall that a(p-1) º 1(mod p) if p is prime. This provides a test for any candidate number p. There is a small chance that the equality will be satisfied by a non-prime number and this can be taken care of by choosing several numbers a so that the chances of the equation being satisfied by all of them is vanishingly small.
Another question that you may have is how do we compute powers of M mod m where the exponents may be fairly large. Suppose that you wish to compute M8 (mod m). 8 = 23. We would find M2 (mod m). Then we would square M2 (mod m) to get M4 (mod m), and finally we would square to M4 get M8 (mod m). Since numbers are represented on the computer in binary format it is not difficult to get the exponent e in this format. Each bit in the binary format of e corresponds to a power of 2, so if e = 23 + 26 + 210, we would find M to the eighth (23) power by going through the squaring process 3 times as above then mulitiply by M raised to the 26 power, found by going through the squaring process six times, and then mulitiply by M raised to the 210 power, which would be M gone through the squaring process 10 times. In this way a number can be raised to a power of a number with hundreds of digits by performing only several hundreds of multiplicaitons. Each time the multiplication is done mod m, so we never have to work with a number greater than m.
Finally, how do we know that M is not divisible by p or q. I do not know how this is handled in practice, but this is not a major problem. One method would be to break M into pieces that are smaller than p and q. Even if this is not done, the chances of a message M being divisible by p is 1/p and of being divisible by 1 is 1/q. Since p and q are very large numbers that means the chances of being divisible is for all practical purposes 0.