\myheading{Autocomplete using Markov chains} \label{markov}

TL;DR: collect statistics, for a given natural language, of which words most often follow a given word, pair of words, or triplet of words.

\myhrule{}

Which chess moves are played most often after 1.e4 e5 2.Nf3 Nf6? A big database of chess games can be queried, showing the statistics:

\begin{figure}[H]
\centering
\frame{\includegraphics[scale=0.7]{prob/markov/chess.png}}
\end{figure}

(from \url{https://chess-db.com/})

The statistics shown are simply the number of games in which each candidate 3rd move was played. The same kind of database can be built for the words of a natural language.

\leveldown{}

\myheading{Dissociated press}

This is a well-known joke: \url{https://en.wikipedia.org/wiki/Dissociated_press}, \url{https://en.wikipedia.org/wiki/SCIgen}, \url{https://en.wikipedia.org/wiki/Mark_V._Shaney}.

I wrote a Python script and took the Sherlock Holmes stories from the Gutenberg library...

\lstinputlisting[style=custompy]{prob/markov/dissociated.py}

Here are some 1st-order Markov chains. The first part is a word; the second is a list of the words that appear after it in the Sherlock Holmes stories, with the frequency of each. The probabilities are given as raw word counts; in other words: how often does each word appear after the given word?

\lstinputlisting{prob/markov/tbl1.txt}

Now some snippets from the 2nd-order Markov chains.

\lstinputlisting{prob/markov/tbl2.txt}

Here a pair of words is the dictionary key, and the tables answer the question: "how often does each word appear after a given sequence of two words?"

Now let's generate some rubbish:

\lstinputlisting{prob/markov/diss.txt}

At first glance these pieces of text look plausible, but they are senseless. Some people (including me) find them amusing. Spammers also use this technique to make an email message look like meaningful text, although it is not meaningful at all, just absurd and funny.

\myheading{Autocomplete}

It's surprising how easily this can be turned into something practically useful.
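The core of such a script fits in a few lines. Here is a minimal sketch of the idea (the function names are mine, not the ones used in dissociated.py): build a table mapping each $n$-word context to a counter of the words that follow it, then walk the table at random, weighting each step by the counts.

```python
import random
from collections import defaultdict, Counter

def build_chain(words, order=1):
    """Map each tuple of `order` consecutive words to a Counter of
    the words that follow that tuple in the text."""
    chain = defaultdict(Counter)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key][words[i + order]] += 1
    return chain

def generate(chain, order, length=20, seed=None):
    """Random walk over the chain; each next word is drawn with a
    probability proportional to how often it followed the context."""
    rng = random.Random(seed)
    out = list(rng.choice(list(chain)))   # start from a random context
    for _ in range(length - order):
        counter = chain.get(tuple(out[-order:]))
        if not counter:                    # dead end: unseen context
            break
        words, counts = zip(*counter.items())
        out.append(rng.choices(words, weights=counts)[0])
    return " ".join(out)

text = "the dog saw the cat and the cat saw the dog".split()
chain = build_chain(text, order=1)
print(chain[("the",)])   # "dog" and "cat" each follow "the" twice
print(generate(chain, order=1, length=8))
```

On a toy text the "rubbish" is not very impressive; fed with a few megabytes of Conan Doyle, the same walk produces the snippets shown above.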
\lstinputlisting[style=custompy]{prob/markov/autocomplete.py}

First, let's also build 3rd-order Markov chain tables. Here are some snippets from them:

\lstinputlisting{prob/markov/tbl3.txt}

As you see, they look more precise, but the tables are just smaller; you can't use them to generate rubbish. The 1st-order tables are big, but "less precise". And here I test some 3-word queries, as if a user had typed them:

\lstinputlisting{prob/markov/res.txt}

The results from all 3 tables can be combined, with the data from the 3rd-order table given the highest priority (or weight). And that's it: the result can be shown to the user.

Aside from Conan Doyle's works, your software can collect the user's own input to adapt itself to the user's lexicon, slang, memes, etc. Of course, the user's "tables" should be used with the highest priority. I have no idea what Apple/Android devices use to generate hints while the user is typing, but this is what I would try first.

As a bonus, this can also help language learners get an idea of how a word is used in a sentence.

\myheading{Further work}

A comma can be treated as a separator as well, etc...

\myheading{The files}

... including Conan Doyle's stories (2.5M). \MakeURL{\RepoPath/prob/markov}. Of course, any other texts can be used, in any language...

Another related post is about typos: \url{https://yurichev.com/blog/fuzzy_string/}.

\myheading{Read more}

\href{https://www.kaggle.com/nulldata/meaningful-random-headlines-by-markov-chain}{Meaningful Random Headlines by Markov Chain}

\href{https://news.ycombinator.com/item?id=19204186}{A discussion at Hacker News}

This wonderful chart based on the lyrics of popular songs reflects the idea perfectly: \MakeURL{\RepoPath/prob/markov/khi7nxbepeb41.jpg}. Found on Reddit: \url{https://www.reddit.com/r/dataisbeautiful/comments/eq6s6j/oc_i_made_this_chart_of_some_songs/}, \url{https://archive.is/jgR6I}.
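One way to combine the three tables is a simple weighted vote: every candidate word earns its count from each table, multiplied by a weight that grows with the table's order. This is only a sketch of that idea under my own assumptions (the weights are arbitrary, and the names are mine, not those of autocomplete.py):

```python
from collections import defaultdict, Counter

def build_chains(words, max_order=3):
    """Build 1st..max_order Markov tables: context tuple -> Counter."""
    chains = {k: defaultdict(Counter) for k in range(1, max_order + 1)}
    for order in range(1, max_order + 1):
        for i in range(len(words) - order):
            chains[order][tuple(words[i:i + order])][words[i + order]] += 1
    return chains

def suggest(chains, context, weights=(1, 10, 100), top=3):
    """Score candidate next words from all tables; higher-order tables
    (longer matched contexts) get a larger weight."""
    scores = Counter()
    for order, weight in zip(sorted(chains), weights):
        key = tuple(context[-order:])
        for word, count in chains[order].get(key, {}).items():
            scores[word] += weight * count
    return [w for w, _ in scores.most_common(top)]

words = "it is a truth universally acknowledged that it is a fine day".split()
chains = build_chains(words)
print(suggest(chains, ["it", "is", "a"]))   # candidates: truth, fine
```

A context never seen in any table simply yields an empty list; a real implementation would then fall back to the most frequent words overall.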
\myheading{Another thought-provoking example}

Found in \textit{Wolfram Mathematica Tutorial Collection: Graph Drawing}\footnote{\url{https://library.wolfram.com/infocenter/Books/8508/}, \url{https://library.wolfram.com/infocenter/Books/8508/GraphDrawing.pdf}.}: As they say, ``this generates a graph by linking words in a text with subsequent words'':

\begin{lstlisting}
TextPlot[w_String] := GraphPlot[(x = Map[ToLowerCase, StringCases[w, WordCharacter ..]]; g = Thread[Drop[x, -1] -> Drop[x, 1]]; g), DirectedEdges -> True, VertexLabeling -> True];

TextPlot["to be or not to be, that is the question"]
\end{lstlisting}

\begin{figure}[H]
\centering
\includesvg[width=1\textwidth]{prob/markov/textplot1}
\end{figure}

Nice. I wanted to experiment further, so I took some popular songs:

\begin{lstlisting}
TextPlot["Hey Jude, don't make it bad. Take a sad song and make it better. Remember to let her into your heart, Then you can start to make it better. Hey Jude, don't be afraid. You were made to go out and get her.
The minute you let her under your skin, Then you begin to make it better."] \end{lstlisting} \begin{figure}[H] \centering \includesvg[width=1\textwidth]{prob/markov/textplot2} \end{figure} \begin{lstlisting} TextPlot["When I find myself in times of trouble, Mother Mary comes \ to me Speaking words of wisdom, let it be And in my hour of darkness she is standing right in front of me Speaking words of wisdom, let it be Let it be, let it be, let it be, let it be Whisper words of wisdom, let it be And when the broken hearted people living in the world agree There will be an answer, let it be"] \end{lstlisting} \begin{figure}[H] \centering \includesvg[width=1\textwidth]{prob/markov/textplot3} \end{figure} \begin{lstlisting} TextPlot["So close, no matter how far Couldn't be much more from the heart Forever trusting who we are And nothing else matters Never opened myself this way Life is ours, we live it our way All these words, I don't just say And nothing else matters Trust I seek and I find in you Every day for us something new Open mind for a different view And nothing else matters Never cared for what they do Never cared for what they know But I know"] \end{lstlisting} \begin{figure}[H] \centering \includesvg[width=1\textwidth]{prob/markov/textplot4} \end{figure} \levelup{}
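The edge list that TextPlot computes before drawing can be reproduced in a few lines of Python (a sketch only: `re.findall(r"\w+", ...)` plays the role of `StringCases[w, WordCharacter ..]`, and the actual graph drawing is left to Mathematica):

```python
import re

def word_edges(text):
    """Pair each lowercased word with its successor, i.e. the
    word1 -> word2 links that TextPlot feeds to GraphPlot."""
    words = re.findall(r"\w+", text.lower())
    return list(zip(words, words[1:]))

edges = word_edges("to be or not to be, that is the question")
print(edges)
# the edge ("to", "be") occurs twice: repeated bigrams are what
# merge into shared vertices and cycles in the drawn graph
```

Feeding this a chorus-heavy song produces many repeated edges, which is exactly why the lyrics charts above collapse into tight loops around the refrain.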