There's this assignment I did in high school. We simulated/emulated a computer processor chip, a CPU, by writing C code to do exactly what every part of it did on the level of individual 1s and 0s. The processor wasn't real. It was the "MIC-1," a fictional chip that has never been produced. But it was real in the sense that the design worked the same way. It could have been produced. It was just too simple to be worth producing commercially. An exercise. But that exercise may have taught me more than any other assignment in high school.
It taught me how binary numbers do work on a chip, how they move things around. How they combine and juggle information. How the operations and the data worked on are both strings of 1s and 0s, and how that makes sense. Structure and movement are one. Data and calculation are one. Both actions and things acted upon and moved are just ons and offs. Switches. It was like directing the flow of water around a maze... with water... down forking river-railroads, like a splitting train. A branching ferry of water-electricity 1s and 0s. The shape of the maze at each moment was what it did in that moment. Its final answer now could be its shape later.
There were two other lessons. First (or second of three, really), I completed the assignment, but I never got full credit for it. Right as I was getting ready to turn it in, very happy and proud that it was working, I made a mistake on the command line and overwrote my source code. I put ">" instead of "<". If you've used Unix much you know what I mean. There was no recycle bin to save me there. I lost it and had to go back to a very old version, and scramble to make it semi-acceptable. I don't believe I've ever repeated that mistake. And I've lost almost no data in my life unintentionally. And I tie this to chance. The only big example that comes to mind was when my hard drive was destroyed by an EMP from a freak electrical surge from a distant lightning strike which caused an electrical fire in and under the building. My laptop was unplugged, but the pulse through the air killed my hard drive. And I only lost a small fraction of files that were in a folder that wasn't backed up. I have very often not lost my data, and I've always reminded myself that this is mainly because I'm lucky, not because I'm smart or doing everything right. There is a key truth. Knowing what luck is by now has been a big reason my data loss hasn't been much worse, but also luck is a big factor itself. This lesson here with my code helped, and I think anxiety helps. It's similar with driving. I'm still alive because of luck: or, rather, largely unexplainable probability so far. But I'm much worse at driving than at saving data, so that's a reason I want to segue to walking and public transit again, sometime soon. (I know, I seem to be rambling, but these things are truly connected and I could elaborate the connections more but will leave them to your imagination.)
The third lesson was the best. Right at the end, when I was finalizing the code that I was about to destroy, something happened. All of the code came together in one line. One line of code tied the entire virtual processor together, all the little pieces, the shifting channels of information, the storage and retrieval from registers and from the slower memory of RAM, the arithmetic operations, the comparisons between two things. Everything occurred, finally, in one line. That was the moment. It was so dumbfoundingly simple that I felt this had to be the main insight of the entire assignment. While I wasn't sure, I had that moment anyway: almost a eureka moment. And it worked. The chip functioned, I fixed a couple little issues, and a few minutes later I'd deleted it all by mistake.
Wasn't this realization, this "simplification" or "big picture" just coming from writing C code, code written to be easier for humans to understand than 1s and 0s and transistors and registers? Was it just how I had written the thing, and someone else could have written it differently? Was it a real insight, or just a cute line of code, a long string of the function of the function of the function of the function of the function of... It was pretty long, but the entire line was just function composition. Maybe I'd just written it that way, or I'd been encouraged to write it that way by the outline of the assignment. Maybe it didn't really mean anything.
And the funny thing is, it's taken 20 years for me to realize, for certain, that yes, this was absolutely a core insight about all computers. And not just all computers, but all physical processes in general. It can all be seen as function composition. I hesitate to go into detail on this, because I don't even know where to begin and I barely know what I'm saying myself, but if you do want to know, go and read about Alonzo Church's lambda calculus. It's... exactly what I saw a few moments (maybe an hour, actually; I think I'm embellishing and shortening that part, but hey I'm telling you) before I deleted that code. And it wasn't a coincidence or how I happened to have organized my code. No, it was the nature of all computers in the world, and all the computers and physical processes that are theoretically possible.
That's a pretty good assignment, huh?
And so I should have known. There have been many opportunities to make the connection, and I did. Somewhat. Partially. I thought I got it. For example, there's Lisp and the whole "functional programming" movement. But it was only the other day that the whole ton of bricks fell on me. Yes, no, yes, that was in no way a coincidence. The power of the parenthesis. It was even bigger than I imagined.
The class was called Computer Architecture, and it was taught by a middle-aged, overweight woman who was unassuming but sharp as a razor. She made a passing reference to all those function compositions afterwards, so I had one lead that it actually was important. But she left it to our imaginations and our curiosities to find out.
The best lesson is finding out, but I'll try to put it into normal words.
It goes back to algebra and functions. You send a number in, you get a number out. Right?
So f(x) = x*x becomes 36 when x = 6. If there's always just one number coming out (36), not indecision about two or three or four numbers that could be the answer, if it's always just one number coming out, then that's a function. That's basic determinism. And it's basic calculation.
And when you learn this, you can represent all the math you did before that moment in algebra, all the pluses and minuses and timeses and so on, with functions that add or subtract or multiply or whatever. It's hopefully an insight most people get from algebra, that a function is a "machine" that could do any of the math they know so far. Each function is one particular "machine" with one way of working. When you punch numbers into a calculator and hit "=" and get an answer, you get one answer. What you just did there was a function on those numbers.
Well, the insight is actually that this covers a lot more. It covers the rest of math, too. It covers everything that can be calculated, and that means all deterministic processes. It even, amazingly, covers non-deterministic processes. It covers quantum mechanics, including the purely random, non-deterministic component of it.
When you write your function, f(x) = 20x - 4.2, or whatever it is, you're writing a little program, a scrap of code. You send something in by replacing the x. Let's try 17.
Ok, so x = 17. And so f(x) is now f(17), which equals 20(17) - 4.2. I don't really care what that number is. It isn't important. But for completeness, it's 335.8. And I didn't need a calculator for that, or even paper (being good at arithmetic is a third of my job now, I practice), but the fact I can do it and a calculator can do it is not really a coincidence.
In Alonzo Church's idea (it's called "lambda calculus" or "λ-calculus," but as for the significance of the name, he once returned a postcard that asked this question with "eeny, meeny, miny, moe," so, really, don't worry about it), this idea of applying a tiny machine to a specific number coming in is reduced to its absolute, purest, most distilled logic. We can do this with our "f(x)" notation of parentheses or without, but the parentheses help us visualize the "boundary" of each tiny machine. You "bind" a number, 17 in my example, to another symbol, we like letters, x in my example. What we're doing is one teensy tinesy moment of memory. That right there is all of computer memory. It's all binding one thing (here, 17) to a symbol/space/slot that stores things (here, x). Data and a gap for data. That iota of memory is conceptually enough to cover - and represent - all computation.
x = 17. Memory. f(x). Process.
It seems bizarre, doesn't it?
But isn't this too simple - what if we want to work with 4 different numbers, or 15 variables containing numbers?
Good question.
When we want to work with a bunch of iotas of memory, different numbers assigned to different variables, how do we do that? We make a teensy tinesy machine for each, a function that will "bind" each one to a variable, and we put them in a chain, one sending its output as the next one's input. It's a little assembly line.
The function f(x, y, z), for example, which we can understand as a machine that works on points in 3D space to give you some corresponding piece of information about each one, can be rewritten as f(x, f(y, f(z))).
f(x, y, z) = f(x, f(y, f(z)))
We don't need a new concept at all, we just need to remember that these different functions could be different from each other. Each "f" here might mean something different, a different process from the others.
In algebra we'd say something like:
f(x, y, z) = g(x, h(y, i(z)))
Or even:
f(x, y, z) = g(y, h(z, i(x)))
Or
f(x, y, z) = a(y, c(z, q(x)))
The letters don't matter. Just remember the three variables are (or could be) different, and the three functions are (or could be) different from each other. A function or "machine" with many inputs can be expressed as many simpler functions or "machines" on single inputs, and you don't lose any functionality (no pun intended).
But don't even worry if you're confused yet. The core idea is that we don't need a new core idea. We can just keep packaging the same idea of a function, the same kind you learned about in algebra, with one number going in and one number going out.
In fact, we can even write the numbers that way. For example, 0 can be f(). 1 can be f(f())). Those empty parentheses, by the way, "contain" nothing. We call this nothingness "the empty string." It's the twin of 0 (a symbol, agreed?) when you're talking about strings of symbols - say, that unwritten term paper or novel, or the code you're about to write to calculate a cell in Google Sheets but haven't actually written. No symbol written. Blank page. Blank canvas. No code. Empty space. 0 is a number with no magnitude. The empty string is just nothing written, but it's a math concept. Small difference, but see it?
Ok.
2 can be f(f(f()). And so on.
There are even ways to write +, -, etc, as pure functions of functions.
If it helps you to see it more clearly, let me rewrite those this way:
0 = f(nothing)
1 = f(f(nothing))
2 = f(f(f(nothing)))
Etc.
You get 0 when you put nothing in the machine. When you feed that 0 into the next machine, you get 1. When you feed 1 into the next machine, you get 2: or two machines since the 0. Two assembly line steps. Think of the function as telling you how many steps led up to it. Or you can imagine it as looking inward, counting how many functions are inside it, held within its outer parentheses. (Incidentally, I mentioned above that we need to remember the f(x)'s may not be the same function, and I replaced some of the f's with other letters to illustrate, but in this case, it's all the same f(x). We don't need to worry: it's just one function working on itself.)
We can make all the numbers this way, and actually it goes deeper. There are many ways to make all the numbers with this concept, and this is only one. It simply illustrates the potential.
We already talked about how each function has its tiny iota of memory, right? Each f(x) has an x it's responsible for keeping in mind. And so when we do f(f(f(....))), we're actually building up both memory and process. And so maybe now it makes sense that f(nothing) is a symbol for nothing, ie, 0, and it's a little less than f(f(nothing)), which is 1.
It can actually be argued that every definition of numbers and operations depends on this concept, just in different ways. Data and process. Function.
Alonzo Church realized this independently of Alan Turing at almost exactly the same time, a little after 1930. What I've just outlined above, "Church's virtual atomic math machine" you could say, is equivalent to a universal Turing machine, even though the definitions of the two concepts look extremely different. They look different, but their functionality is identical. One (Church's) has functions of functions of functions, the other (Turing's) has you imagining an infinitely long spool of magnetic tape with a read/write head that can move back and forth according to some logic using a finite number of internal states: reading, writing, overwriting, or erasing 1s and 0s on the tape, and either doing this forever or at some point stopping and leaving its final result as a string of 1s and 0s on the tape. Every programming language (all the ones that are "Turing-complete," which is most of the good ones: it just means they can in theory calculate anything calculable, even if the computer might take a long time, or run out of memory space) lines up exactly with Turing's infinite tape and read/write head vision, which is also functionally identical to Church's idea above with functions of functions.
A key piece of this I haven't mentioned, but which the MIC-1 CPU in C assignment demonstrated, is that actually you only need a finite number of functions to get up and running. It's a small number, 5-10. That function composition I wrote at the end of the assignment was not infinitely long, but what it ran was a computer. Turing machines that are fully capable have a finite number of internal states, and correspondingly, real-world CPUs only need a finite number of defined machine operations. And correspondingly again, a finite set of rules of logic is your Swiss Army knife for all logical reasoning. (This is another mathematically proven result, one from Kurt Gödel that actually inspired both Turing and Church. It's closely related.) There are probably many ways to think about and visualize this same universal machine concept, ways we haven't imagined yet. Most of the coding languages look radically different from either way, and are much more workable. The Lisp family of languages draws on Church's functions of functions idea directly and elegantly, and that's why their code is famously stuffed with parentheses... and why some people actually like that. It goes right down to this core way to think about data and process.
Note: I don't think I've defined Church's lambda calculus precisely enough here that it would equate to a universal Turing machine yet, and I'm not an expert. Also, and this is an honest mistake, there's an inaccuracy where I talk about breaking functions with many inputs down into several functions (named by different letters) on single inputs. The equations I write with f(x, y, z) aren't quite correct. They hint at how it actually works; I'll try to fix it later. (Done: haven't changed the above, see below.) But I've described the idea as well as I know how.
Now if you're curious, the error comes down to the fact lambda functions have no names, but each can store a single piece of information in a named variable. This is why different notation was needed. It does mean, specifically, that a function can take in one function, do some transformations on it, and send out a different function as an answer, which then can be run elsewhere on some other input. Look up "currying" if you'd like a better explanation.
Church's notation gets at an idea that sort of combines these ideas:
f(x, y, z) = f( g( h(z) ) ), with y = h(z) and x = g(y)
f(x, y, z) = f(x, g(y, h(z)))
As far as I understand, it can't be represented fully in normal function notation (probably why we have lambda calculus and all these programming languages, huh?), but it's the same idea, just thought about and used in a more expansive way. 1) A complicated function can always be decomposed into simpler functions, and 2) functions can process other functions and even work on themselves and give themselves and each other as answers. While a function is waiting for all its inputs to have arrived (like guests for Thanksgiving), you can consider the welcoming of each arriving input (or guest) as a function that takes the number into account by memorizing it (seats him/her/hir/them at the table) and then returns to watching for remaining inputs (straggler guests). Decomposing a strict seating pattern into an open-ended, fluid arrangement allows us to build up infinitely complicated dinners from extremely simple Lego-like bits. Remember, a function doesn't only do things. A function can also modify a function that does things. And the "atoms" that make the molecules and mountains and continents and planets of functions are these tiny units of memory and process. Everything reduces to correlations working on correlations. It seems too basic, but a lot emerges. That's the gist of it.
And that's the best I can do! Admittedly I could probably do better if I understood better!
It seems almost as if we have made this more complicated, but the core insight is that we've actually made it simpler, and now we can use this as a basis for writing a computer language - or computer hardware - or any kind of hardware - that will do anything you want. Anything you've ever seen a computer do and in theory infinitely more. Anything you see happening around you in the physical world can be encoded and worked with this way. That's zany.