\myheading{Modulo inverse, part I}
\label{modulo_arith_1}

\leveldown{}

Example: this piece of code divides by 17:

\begin{lstlisting}[style=customc]
#include <stdio.h>
#include <stdint.h>

uint32_t div17 (uint32_t a)
{
    return a*0xf0f0f0f1;
};

int main()
{
    printf ("%d\n", div17(1700)); // result=100
    printf ("%d\n", div17(34));   // result=2
    printf ("%d\n", div17(2091)); // result=123
};
\end{lstlisting}

How it works?

\myhrule{}

Let's imagine, we work on 4-bit CPU, it has 4-bit registers, each can hold a value in 0..15 range.

Now we want to divide by 3 using multiplication.
Let's find modulo inverse of 3 using Wolfram Mathematica:

\begin{lstlisting}
In[]:= PowerMod[3, -1, 16]
Out[]= 11
\end{lstlisting}

This is in fact solution of a $3m=16k+1$ equation ($16 = 2^4$):

\begin{lstlisting}
In[]:= FindInstance[3 m == 16 k + 1, {m, k}, Integers]
Out[]= {{m -> 11, k -> 2}}
\end{lstlisting}

The "magic number" for division by 3 is 11. Multiply by 11 instead of dividing by 3 and you'll get a result (quotient).

This works, let's divide 6 by 3. We can now do this by multiplying 6 by 11, this is 66=0x42,
but on 4-bit register, only 0x2 will be left in register ($0x42 \equiv 2 \mod 2^4$).
Yes, 2 is correct answer, 6/3=2.

Let's divide 3, 6 and 9 by 3, by multiplying by 11 (m).

\iffalse
% FIXME: texify
\begin{lstlisting}[basicstyle=\ttfamily\tiny]
           |123456789abcdef0|123456789abcdef0|123456789abcdef0|123456789abcdef0|123456789abcdef0|123456789abcdef0|123456789abcdef0|
    m=11   |***********     |                |                |                |                |                |                |
3/3 3m=33  |****************|****************|*               |                |                |                |                |
6/3 6m=66  |****************|****************|****************|****************|**              |                |                |
9/3 9m=99  |****************|****************|****************|****************|****************|****************|***             |
\end{lstlisting}
\fi

\begin{lstlisting}[basicstyle=\ttfamily\tiny]
           |123456789abcdef0|12 ... f0|123456789abcdef0|12 ... f0|123456789abcdef0|12 ... f0|123456789abcdef0|
    m=11   |***********     |   ...   |                |   ...   |                |   ...   |                |
3/3 3m=33  |****************|** ... **|*               |   ...   |                |   ...   |                |
6/3 6m=66  |****************|** ... **|****************|** ... **|**              |   ...   |                |
9/3 9m=99  |****************|** ... **|****************|** ... **|****************|** ... **|***             |
\end{lstlisting}

A ``protruding'' asterisk(s) (``*'') in the last non-empty chunk is what will be left in 4-bit register.
This is 1 in case of 33, 2 if 66, 3 if 99.

In fact, this "protrusion" is defined by 1 in the equation we've solved.
Let's replace 1 with 2:

\begin{lstlisting}
In[]:= FindInstance[3 m == 16 k + 2, {m, k}, Integers]
Out[]= {{m -> 6, k -> 1}}
\end{lstlisting}

Now the new "magic number" is 6.
Let's divide 3 by 3. 3*6=18=0x12, 2 will be left in 4-bit register. This is incorrect, we have 2 instead of 1. 2 asterisks are "protruding".
Let's divide 6 by 3. 6*6=36=0x24, 4 will be left in the register. This is also incorrect, we now have 4 "protruding" asterisks instead of correct 2.

Replace 1 in the equation by 0, and nothing will "protrude".

\myheading{No remainder?}

Now the problem: this only works for dividends in 3x form, i.e., which can be divided by 3 with no remainder.
Try to divide 4 by 3, 4*11=44=0x2c, 12 will be left in register, this is incorrect.
The correct quotient is 1.

We can also notice that the 4-bit register is "overflown" during multiplication twice as much as in "incorrect" result in low 4 bits.

Here is what we can do: use only high 4 bits and drop low 4 bits.
4*11=0x2c and 2 is high 4 bits.
Divide 2 by 2, this is 1.

Let's "divide" 8 by 3. 8*11=88=0x58. 5/2=2, this is correct answer again.

Now this is the formula we can use on our 4-bit CPU to divide numbers by 3: "x*3 >> 4 / 2" or "x*3 >> 5".
This is the same as almost all modern compilers do instead of integer division, but they do this for 32-bit and 64-bit registers.

\levelup{}