## Artificial intelligence (I?)

This book was okay, but nothing all that special. In my opinion there’s too much philosophy and similar stuff in there (‘what does intelligence *really mean* anyway?’), and the coverage isn’t nearly as focused on technological aspects as e.g. Winfield’s (…in my opinion better…) book on robotics from the same series (which I covered here). I am certain I’d have liked this book better if it had provided a similar *type* of coverage to Winfield’s, but it didn’t. However, it’s far from terrible, and I liked the author’s skeptical approach to e.g. singularitarianism. Below I have added some quotes and links, as usual.

…

“Artificial intelligence (AI) seeks to make computers do the sorts of things that minds can do. Some of these (e.g. reasoning) are normally described as ‘intelligent’. Others (e.g. vision) aren’t. But all involve psychological skills — such as perception, association, prediction, planning, motor control — that enable humans and animals to attain their goals. Intelligence isn’t a single dimension, but a richly structured space of diverse information-processing capacities. Accordingly, AI uses many different techniques, addressing many different tasks. […] although AI needs *physical* machines (i.e. computers), it’s best thought of as using what computer scientists call *virtual* machines. A virtual machine isn’t a machine depicted in virtual reality, nor something like a simulated car engine used to train mechanics. Rather, it’s the *information-processing system* that the programmer has in mind when writing a program, and that people have in mind when using it. […] Virtual machines in general are comprised of patterns of activity (information processing) that exist at various levels. […] *the human mind* can be understood as the virtual machine – or rather, the set of mutually interacting virtual machines, running in parallel […] – that is implemented in the brain. Progress in AI requires progress in defining interesting/useful virtual machines. […] How the information is processed depends on the virtual machine involved. [There are many different approaches.] […] In brief, all the main types of AI were being thought about, and even implemented, by the late 1960s – and in some cases, much earlier than that. […] Neural networks are helpful for modelling aspects of the brain, and for doing pattern recognition and learning. Classical AI (especially when combined with statistics) can model learning too, and also planning and reasoning. Evolutionary programming throws light on biological evolution and brain development. 
Cellular automata and dynamical systems can be used to model development in living organisms. Some methodologies are closer to biology than to psychology, and some are closer to non-reflective behaviour than to deliberative thought. To understand the full range of mentality, all of them will be needed […]. Many AI researchers [however] don’t care about how minds work: they seek technological efficiency, not scientific understanding. […] In the 21st century, […] it has become clear that different questions require different types of answers”.

“State-of-the-art AI is a many-splendoured thing. It offers a profusion of virtual machines, doing many different kinds of information processing. There’s no key secret here, no core technique unifying the field: AI practitioners work in highly diverse areas, sharing little in terms of goals and methods. […] A host of AI applications exist, designed for countless specific tasks and used in almost every area of life, by laymen and professionals alike. Many outperform even the most expert humans. In that sense, progress has been spectacular. But the AI pioneers weren’t aiming only for specialist systems. They were also hoping for systems with general intelligence. Each human-like capacity they modelled — vision, reasoning, language, learning, and so on — would cover its entire range of challenges. Moreover, these capacities would be integrated when appropriate. Judged by those criteria, progress has been far less impressive. […] General intelligence is still a major challenge, still highly elusive. […] problems can’t always be solved merely by increasing computer power. New problem-solving *methods* are often needed. Moreover, even if a particular method *must* succeed in principle, it may need too much time and/or memory to succeed in practice. […] Efficiency is important, too: the fewer the number of computations, the better. In short, problems must be made tractable. There are several basic strategies for doing that. All were pioneered by classical symbolic AI, or GOFAI, and all are still essential today. One is to direct attention to only a part of the *search space* (the computer’s representation of the problem, within which the solution is assumed to be located). Another is to construct a smaller search space by making simplifying assumptions. A third is to order the search efficiently. Yet another is to construct a different search space, by representing the problem in a new way. 
These approaches involve *heuristics*, *planning*, *mathematical simplification*, and *knowledge representation*, respectively. […] Often, the hardest part of AI problem solving is presenting the problem to the system in the first place. […] the information (‘knowledge’) concerned must be presented to the system in a fashion that the machine can understand – in other words, that it can deal with. […] AI’s ways of doing this are highly diverse.”
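The ‘direct attention to only a part of the search space’ strategy is easy to illustrate with a toy greedy best-first search, in which a heuristic decides which candidate to expand next. The sketch below is my own illustration, not from the book; the toy problem and function names are made up:

```python
import heapq

def best_first_search(start, goal, neighbours, heuristic):
    """Greedy best-first search: always expand the frontier node that the
    heuristic scores as closest to the goal, ignoring path cost. Only a
    small part of the search space is ever examined."""
    frontier = [(heuristic(start), start)]
    visited = set()
    parent = {start: None}
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if node in visited:
            continue
        visited.add(node)
        for nxt in neighbours(node):
            if nxt not in visited and nxt not in parent:
                parent[nxt] = node
                heapq.heappush(frontier, (heuristic(nxt), nxt))
    return None

# Toy problem: walk on a number line from 0 to 7 in steps of -1, +1, or +2.
path = best_first_search(
    0, 7,
    neighbours=lambda n: [n - 1, n + 1, n + 2],
    heuristic=lambda n: abs(7 - n),
)
print(path)  # [0, 2, 4, 6, 7]
```

The heuristic here is exact, so the search marches straight to the goal; with a weaker heuristic the same code still works, it just expands more nodes.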

“The rule-based form of knowledge representation enables programs to be built gradually, as the programmer – or perhaps an AGI system itself – learns more about the domain. A new rule can be added at any time. There’s no need to rewrite the program from scratch. However, there’s a catch. If the new rule isn’t logically consistent with the existing ones, the system won’t always do what it’s supposed to do. It may not even *approximate* what it’s supposed to do. When dealing with a small set of rules, such logical conflicts are easily avoided, but larger systems are less transparent. […] An alternative form of knowledge representation for concepts is semantic networks […] A semantic network links concepts by semantic relations […] semantic networks aren’t the same thing as neural networks. […] *distributed* neural networks represent knowledge in a very different way. There, individual concepts are represented not by a single node in a carefully defined associative net, but by the changing patterns of activity across an entire network. Such systems can tolerate conflicting evidence, so aren’t bedevilled by the problems of maintaining logical consistency […] Even a single mind involves distributed cognition, for it integrates many cognitive, motivational, and emotional subsystems […] Clearly, human-level AGI would involve distributed cognition.”
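A minimal sketch of the rule-based idea (my own illustration, not the book’s; the rule and fact names are invented): rules fire whenever all their premises are known, and a new rule can simply be appended to the list without touching the rest of the program:

```python
def forward_chain(facts, rules):
    """Naive forward chaining: repeatedly fire any rule whose premises
    are all known, adding its conclusion, until nothing new appears."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [
    (frozenset({"has_fever", "has_rash"}), "suspect_measles"),
    (frozenset({"suspect_measles"}), "order_blood_test"),
]
# A new rule can be added at any time -- no rewrite needed:
rules.append((frozenset({"order_blood_test"}), "notify_lab"))

print(forward_chain({"has_fever", "has_rash"}, rules))
```

The catch the book mentions shows up here too: nothing in this loop notices if an appended rule contradicts an existing one.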

“In short, most human visual achievements surpass today’s AI. Often, AI researchers aren’t clear about what questions to ask. For instance, think about folding a slippery satin dress neatly. No robot can do this (although some can be instructed, step by step, how to fold an oblong terry towel). Or consider putting on a T-shirt: the head must go in first, and *not* via a sleeve — but *why*? Such topological problems hardly feature in AI. None of this implies that human-level computer vision is impossible. But achieving it is much more difficult than most people believe. So this is a special case of the fact noted in Chapter 1: that AI has taught us that human minds are hugely richer, and more subtle, than psychologists previously imagined. Indeed, that is *the* main lesson to be learned from AI. […] Difficult though it is to build a high-performing AI specialist, building an AI generalist is orders of magnitude harder. (Deep learning isn’t the answer: its aficionados admit that ‘new paradigms are needed’ to combine it with complex reasoning — scholarly code for ‘we haven’t got a clue’.) That’s why most AI researchers abandoned that early hope, turning instead to multifarious narrowly defined tasks—often with spectacular success.”

“Some machine learning uses neural networks. But much relies on symbolic AI, supplemented by powerful statistical algorithms. In fact, the statistics really do the work, the GOFAI merely guiding the worker to the workplace. Accordingly, some professionals regard machine learning as computer science and/or statistics —not AI. However, there’s no clear boundary here. Machine learning has three broad types: supervised, unsupervised, and reinforcement learning. […] In *supervised* learning, the programmer ‘trains’ the system by defining a set of desired outcomes for a range of inputs […], and providing continual feedback about whether it has achieved them. The learning system generates hypotheses about the relevant features. Whenever it classifies incorrectly, it amends its hypothesis accordingly. […] In *unsupervised* learning, the user provides no desired outcomes or error messages. Learning is driven by the principle that co-occurring features engender expectations that they will co-occur in future. Unsupervised learning can be used to discover knowledge. The programmers needn’t know what patterns/clusters exist in the data: the system finds them for itself […but even though Boden does not mention this fact, caution is most definitely warranted when applying such systems/methods to data (..it remains true that “Truth and true models are not statistically identifiable from data” – as usual, the go-to reference here is *Burnham & Anderson*)]. Finally, reinforcement learning is driven by analogues of reward and punishment: feedback messages telling the system that what it just did was good or bad. Often, reinforcement isn’t simply binary […] Given various theories of probability, there are many different algorithms suitable for distinct types of learning and different data sets.”
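The supervised-learning loop described above – generate a hypothesis, amend it whenever a classification comes out wrong – is captured in miniature by the classic perceptron update rule. A toy sketch of mine (not Boden’s), learning logical OR from labelled examples:

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    """Supervised learning in miniature: the perceptron keeps a hypothesis
    (its weights) and amends it whenever it misclassifies a sample."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), label in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = label - pred          # feedback from the 'teacher'
            w[0] += lr * err * x1       # amend the hypothesis on error
            w[1] += lr * err * x2
            b += lr * err
    return w, b

# Desired outcomes for a range of inputs: the truth table of OR.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(data)
preds = [1 if w[0] * x1 + w[1] * x2 + b > 0 else 0 for (x1, x2), _ in data]
print(preds)  # [0, 1, 1, 1]
```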

“Countless AI applications use natural language processing (NLP). Most focus on the computer’s ‘understanding’ of language that is presented to it, not on its own linguistic production. That’s because NLP generation is even more difficult than NLP acceptance [I had a suspicion this might be the case before reading the book, but I didn’t *know* – US]. […] It’s now clear that handling fancy syntax isn’t necessary for summarizing, questioning, or translating a natural-language text. Today’s NLP relies more on brawn (computational power) than on brain (grammatical analysis). Mathematics — specifically, statistics — has overtaken logic, and machine learning (including, but not restricted to, deep learning) has displaced syntactic analysis. […] In modern-day NLP, powerful computers do statistical searches of huge collections (‘corpora’) of texts […] to find word patterns both commonplace and unexpected. […] In general […], the focus is on words and phrases, not syntax. […] Machine-matching of languages from different language groups is usually difficult. […] Human judgements of relevance are often […] much too subtle for today’s NLP. Indeed, relevance is a linguistic/conceptual version of the unforgiving ‘frame problem’ in robotics […]. Many people would argue that it will never be wholly mastered by a non-human system.”
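The kind of shallow word-pattern statistics described above needs no grammatical analysis at all; a few lines suffice to count word bigrams across a corpus. The (tiny, made-up) corpus below is my own toy illustration, not an example from the book:

```python
from collections import Counter

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]

# Count word bigrams across the corpus -- frequent pairs are the
# 'commonplace patterns'; rare ones are the 'unexpected' ones.
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    bigrams.update(zip(words, words[1:]))

print(bigrams.most_common(3))
```

Real systems work with vastly larger corpora and smarter association measures, but the focus on words and phrases rather than syntax is the same.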

“[M]any AI research groups are now addressing emotion. Most (not quite all) of this research is theoretically shallow. And most is potentially lucrative, being aimed at developing ‘computer companions’. These are AI systems — some screen-based, some ambulatory robots — designed to interact with people in ways that (besides being practically helpful) are affectively comfortable, even satisfying, for the user. Most are aimed at the elderly and/or disabled, including people with incipient dementia. Some are targeted on babies or infants. Others are interactive ‘adult toys’. […] AI systems can already recognize human emotions in various ways. Some are physiological: monitoring the person’s breathing rate and galvanic skin response. Some are verbal: noting the speaker’s speed and intonation, as well as their vocabulary. And some are visual: analysing their facial expressions. At present, all these methods are relatively crude. The user’s emotions are both easily missed and easily misinterpreted. […] [A] point here [in the development and evaluation of AI] is that emotions aren’t merely *feelings*. They involve functional, as well as phenomenal, consciousness […]. Specifically, they are computational mechanisms that enable us to schedule competing motives – and without which we couldn’t function. […] If we are ever to achieve AGI, emotions such as anxiety will have to be included – and *used*.”

[The point made in the book is better made in Aureli *et al.*’s book, especially the last chapters to which the coverage in the linked post refers. The point is that emotions enable us to make better decisions, or perhaps even to make a decision in the first place; the emotions we feel in specific contexts will tend not to be even remotely random, rather they will tend to a significant extent to be Nature’s (…and Mr. Darwin’s) attempt to tell us how to handle a specific conflict of interest in the ‘best’ manner. You don’t need to do the math, your forebears did it for you, which is why you’re now …angry, worried, anxious, etc. If you had to do the math every time before you made a decision, you’d be in trouble, and emotions provide a great shortcut in many contexts. The potential for such shortcuts seems really important if you want an agent to act intelligently, regardless of whether said agent is ‘artificial’ or not. The book very briefly mentions a few of Minsky’s thoughts on these topics, and people who are curious could probably do worse than read some of his stuff. This book seems like a place to start.]

…

Links:

GOFAI (“Good Old-Fashioned Artificial Intelligence”).

Ada Lovelace. Charles Babbage. Alan Turing. Turing machine. Turing test. Norbert Wiener. John von Neumann. W. Ross Ashby. William Grey Walter. Oliver Selfridge. Kenneth Craik. Gregory Bateson. Frank Rosenblatt. Marvin Minsky. Seymour Papert.

A logical calculus of the ideas immanent in nervous activity (McCulloch & Pitts, 1943).

Propositional logic. Logic gate.

Arthur Samuel’s checkers player. Logic Theorist. General Problem Solver. The Homeostat. Pandemonium architecture. Perceptron. Cyc.

Fault-tolerant computer system.

Cybernetics.

Programmed Data Processor (PDP).

Artificial life.

Forward chaining. Backward chaining.

Rule-based programming. MYCIN. Dendral.

Semantic network.

Non-monotonic logic. Fuzzy logic.

Facial recognition system. Computer vision.

Bayesian statistics.

Helmholtz machine.

DQN algorithm.

AlphaGo. AlphaZero.

Human Problem Solving (Newell & Simon, 1970).

ACT-R.

NELL (Never-Ending Language Learning).

SHRDLU.

ALPAC.

Google translate.

Data mining. Sentiment analysis. *Siri*. Watson (computer).

Paro (robot).

Uncanny valley.

CogAff architecture.

Connectionism.

Constraint satisfaction.

Content-addressable memory.

Graceful degradation.

Physical symbol system hypothesis.

## Combinatorics (II)

I really liked this book. Below I have added some links and quotes related to the second half of the book’s coverage.

…

“An *n × n magic square*, or a *magic square of order n*, is a square array of numbers — usually (but not necessarily) the numbers from 1 to *n*^{2} — arranged in such a way that the sum of the numbers in each of the *n* rows, each of the *n* columns, or each of the two main diagonals is the same. A *semi-magic square* is a square array in which the sum of the numbers in each row or column, but not necessarily the diagonals, is the same. We note that if the entries are 1 to *n*^{2}, then the sum of the numbers in the whole array is

1 + 2 + 3 + … + *n*^{2} = *n*^{2}(*n*^{2} + 1) / 2

on summing the arithmetic progression. Because the *n* rows and columns have the same ‘magic sum’, the numbers in each single row or column add up to (1/*n*)th of this, which is *n*(*n*^{2} + 1) / 2 […] An *n × n* *latin square*, or a *latin square of order n*, is a square array with *n* symbols arranged so that each symbol appears just once in each row and column. […] Given a latin square, we can obtain others by rearranging the rows or the columns, or by permuting the symbols. For an *n × n* latin square with symbols 1, 2, … , *n*, we can thereby arrange that the numbers in the first row and the first column appear in order as 1, 2, … , *n*. Such a latin square is called *normalized* […] A familiar form of latin square is the *sudoku puzzle* […] How many *n × n* latin squares are there for a given order *n*? The answer is known only for *n* ≤ 11. […] The number of normalized latin squares of order 11 has an impressive forty-eight digits.”
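The definitions above are easy to verify mechanically. Here is a small sketch of mine (not from the book) that checks a square against the magic sum *n*(*n*^{2} + 1)/2, using the classic 3 × 3 Lo Shu square mentioned among the links below:

```python
def is_magic(square):
    """Check whether a square array with entries 1..n^2 is magic: all rows,
    all columns, and both main diagonals share the magic sum n(n^2+1)/2."""
    n = len(square)
    target = n * (n * n + 1) // 2
    rows = [sum(row) for row in square]
    cols = [sum(square[i][j] for i in range(n)) for j in range(n)]
    diags = [sum(square[i][i] for i in range(n)),
             sum(square[i][n - 1 - i] for i in range(n))]
    return all(s == target for s in rows + cols + diags)

# The Lo Shu square; its magic sum is 3 * (9 + 1) / 2 = 15.
lo_shu = [[4, 9, 2],
          [3, 5, 7],
          [8, 1, 6]]
print(is_magic(lo_shu))  # True
```

Dropping the `diags` check from the final `all(...)` would turn this into a test for a *semi-magic* square instead.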

“A particular type of latin square is the *cyclic square*, where the symbols appear in the same cyclic order, moving one place to the left in each successive row, so that the entry at the beginning of each line appears at the end of the next one […] An extension of this idea is where the symbols move more places to the left in each successive row […] We can construct a latin square row by row from its first row, always taking care that no symbol appears twice in any column. […] An important concept […] is that of a set of *orthogonal latin squares* […] two *n × n* latin squares are *orthogonal* if, when superimposed, each of the *n*^{2} possible pairings of a symbol from each square appears exactly once. […] pairs of orthogonal latin squares are […] used in agricultural experiments. […] We can extend the idea of orthogonality beyond *pairs* […] A set of mutually orthogonal latin squares (sometimes abbreviated to MOLS) is a set of latin squares, any two of which are orthogonal […] Note that there can be at most *n* − 1 MOLS of order *n*. […] A full set of MOLS is called a *complete set* […] We can ask the following question: For which values of *n* does there exist a complete set of *n × n* mutually orthogonal latin squares? As several authors have shown, a complete set exists whenever *n* is a prime number (other than 2) or a power of a prime […] In 1922, H. L. MacNeish generalized this result by observing that if *n* has prime factorization *p*_{1}^{a} × *p*_{2}^{b} × … × *p*_{k}^{z}, then the number of MOLS is at least min(*p*_{1}^{a}, *p*_{2}^{b}, … , *p*_{k}^{z}) − 1.”
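Both constructions above are short to code. The sketch below (mine, not the book’s) builds cyclic latin squares with a given leftward shift and tests the orthogonality condition by counting the superimposed symbol pairs:

```python
def cyclic_latin_square(n, step=1):
    """n x n latin square whose successive rows shift the symbols 1..n
    `step` places to the left (a latin square whenever gcd(step, n) = 1)."""
    return [[(i * step + j) % n + 1 for j in range(n)] for i in range(n)]

def are_orthogonal(a, b):
    """Two latin squares are orthogonal if superimposing them yields all
    n^2 ordered pairs of symbols, each exactly once."""
    n = len(a)
    pairs = {(a[i][j], b[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n

# For prime n, cyclic squares with different shifts are orthogonal:
a = cyclic_latin_square(5, step=1)
b = cyclic_latin_square(5, step=2)
print(are_orthogonal(a, b))  # True
```

Iterating `step` from 1 to *n* − 1 for prime *n* produces a complete set of *n* − 1 MOLS, matching the prime case of the result quoted above.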

“Consider the following [problem] involving comparisons between a number of varieties of a commodity: A consumer organization wishes to compare seven brands of detergent and arranges a number of tests. But since it may be uneconomic or inconvenient for each tester to compare all seven brands it is decided that each tester should compare just three brands. How should the trials be organized if each brand is to be tested the same number of times and each pair of brands is to be compared directly? […] A block design consists of a set of *v* varieties arranged into *b* blocks. […] [if we] further assume that each block contains the same number *k* of varieties, and each variety appears in the same number *r* of blocks […] [the design is] called [an] equireplicate design […] for every block design we have *v* × *r* = *b* × *k*. […] It would clearly be preferable if all pairs of varieties in a design were compared the same number of times […]. Such a design is called balanced, or a balanced incomplete-block design (often abbreviated to BIBD). The number of times that any two varieties are compared is usually denoted by *λ* […] In a balanced block design the parameters *v*, *b*, *k*, *r*, and *λ* are not independent […] [Rather it is the case that:] *r* × (*k* − 1) = *λ* × (*v* − 1). […] The conditions *v* × *r* = *b* × *k* and *r* × (*k* − 1) = *λ* × (*v* − 1) are both necessary for a design to be balanced, but they’re not sufficient since there are designs satisfying both conditions which are not balanced. Another necessary condition for a design to be balanced is *v ≤ b*, a result known as Fisher’s inequality […] A balanced design for which *v = b*, and therefore *k = r*, is called a symmetric design.”
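The two necessary conditions, together with Fisher’s inequality, can be checked mechanically. A sketch of mine (not from the book), applied to the parameters that solve the detergent problem – 7 brands in 7 blocks of 3, each brand tested 3 times, each pair compared once:

```python
def check_bibd_parameters(v, b, k, r, lam):
    """Necessary (but not sufficient) conditions for a balanced
    incomplete-block design:
      v * r = b * k,  r * (k - 1) = lam * (v - 1),  and v <= b (Fisher)."""
    return (v * r == b * k
            and r * (k - 1) == lam * (v - 1)
            and v <= b)

# The detergent-testing parameters (those of the Fano plane):
print(check_bibd_parameters(v=7, b=7, k=3, r=3, lam=1))  # True
```

Since *v* = *b* (and hence *k* = *r*) here, the design is also symmetric in the sense defined above.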

“A block design with *v* varieties is *resolvable* if its blocks can be rearranged into subdesigns, called replicates, each of which contains every variety just once. [….] we define a *finite projective plane* to be an arrangement of a finite number of points and a finite number of lines with the properties that: [i] Any two points lie on exactly one line. [ii] Any two lines pass through exactly one point.

Note that this differs from our usual Euclidean geometry, where any two lines pass through exactly one point *unless they’re parallel*. Omitting these italicized words produces a completely different type of geometry from the one we’re used to, since there’s now a ‘duality’ or symmetry between points and lines, according to which any statement about points lying on lines gives rise to a statement about lines passing through points, and vice versa. […] We say that the finite projective plane has *order n* if each line contains *n* + 1 points. […] removing a single line from a projective plane of order *n*, and the *n* + 1 points on this line, gives a square pattern with *n*^{2} points and *n*^{2} + *n* lines where each line contains *n* points and each point lies on *n* + 1 lines. Such a diagram is called an *affine plane of order n*. […] This process is reversible. If we start with an affine plane of order *n* and add another line joined up appropriately, we get a projective plane of order *n*. […] **Every finite projective plane gives rise to a symmetric balanced design**. […] In general, a finite projective plane of order *n*, with *n*^{2} + *n* + 1 points and lines and with *n* + 1 points on each line and *n* + 1 lines through each point, gives rise to a balanced symmetric design with parameters *v* = *b* = *n*^{2} + *n* + 1, *k* = *r* = *n* + 1, and *λ* = 1. […] **Every finite affine plane gives rise to a resolvable design**. […] In general, an affine plane of order *n*, obtained by removing a line and *n* + 1 points from a projective plane of order *n*, gives rise to a resolvable design with parameters *v* = *n*^{2}, *b* = *n*^{2} + *n*, *k* = *n*, and *r* = *n* + 1. […] **Every finite affine plane corresponds to a complete set of orthogonal latin squares**.”
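The parameter bookkeeping in the last two results is simple enough to encode directly. My own sketch, which also cross-checks the balance condition *r*(*k* − 1) = *λ*(*v* − 1) from the previous quote:

```python
def projective_plane_design(n):
    """Parameters of the symmetric balanced design arising from a
    projective plane of order n."""
    return dict(v=n * n + n + 1, b=n * n + n + 1, k=n + 1, r=n + 1, lam=1)

def affine_plane_design(n):
    """Parameters of the resolvable design arising from an affine
    plane of order n (projective plane minus one line and its points)."""
    return dict(v=n * n, b=n * n + n, k=n, r=n + 1)

p = projective_plane_design(2)   # the Fano plane: v = b = 7, k = r = 3
print(p)
# The balance condition r(k-1) = lam(v-1) holds for every order n:
assert all(
    (d := projective_plane_design(n))["r"] * (d["k"] - 1)
    == d["lam"] * (d["v"] - 1)
    for n in range(2, 20)
)
```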

…

Links:

Regular polygon.

Polyhedron.

Internal and external angles.

Triangular tiling. Square tiling. Hexagonal tiling.

Semiregular tessellations.

Penrose tiling.

Platonic solid.

Euler’s polyhedron formula.

Prism (geometry). Antiprism.

Fullerene.

Geodesic dome.

Graph theory.

Complete graph. Complete bipartite graph. Cycle graph.

Degree (graph theory).

Handshaking lemma.

Ramsey theory.

Tree (graph theory).

Eulerian and Hamiltonian Graphs. Hamiltonian path.

Icosian game.

Knight’s tour problem.

Planar graph. Euler’s formula for plane graphs.

Kuratowski’s theorem.

Dual graph.

Lo Shu Square.

Melencolia I.

Euler’s Thirty-six officers problem.

Steiner triple system.

Partition (number theory).

Pentagonal number. Pentagonal number theorem.

Ramanujan’s congruences.

## Combinatorics (I)

This book is not a particularly easy read compared to the general format of the series in which it is published, but in my view that is a good thing: it means the author managed to go into enough detail in specific contexts to touch upon at least some properties/topics of interest. You don’t need any specific background knowledge to read and understand the book – at least not any sort of background knowledge one would not expect a prospective reader of a book like this to already have – but you do need, when reading it, the sort of mental surplus that enables you to think carefully about what’s going on and devote a few mental resources to understanding the details.

Some quotes and links from the first half of the book below.

…

“The subject of *combinatorial analysis* or *combinatorics* […] [w]e may loosely describe [as] the branch of mathematics concerned with *selecting*, *arranging*, *constructing*, *classifying*, and *counting* or *listing* things. […] the subject involves *finite sets* or *discrete elements* that proceed in separate steps […] rather than continuous systems […] Mathematicians sometimes use the term ‘combinatorics’ to refer to a larger subset of discrete mathematics that includes graph theory. In that case, what is commonly called combinatorics is then referred to as ‘enumeration’. […] Combinatorics now includes a wide range of topics, some of which we cover in this book, such as the geometry of tilings and polyhedra […], the theory of graphs […], magic squares and latin squares […], block designs and finite projective planes […], and partitions of numbers […]. [The] chapters [of the book] are largely independent of each other and can be read in any order. Much of combinatorics originated in recreational pastimes […] in recent years the subject has developed in depth and variety and has increasingly become a part of mainstream mathematics. […] Undoubtedly part of the reason for the subject’s recent importance has arisen from the growth of computer science and the increasing use of algorithmic methods for solving real-world practical problems. These have led to combinatorial applications in a wide range of subject areas, both within and outside mathematics, including network analysis, coding theory, probability, virology, experimental design, scheduling, and operations research.”

“[C]ombinatorics is primarily concerned with four types of problem:

*Existence problem*: Does □□□ exist?

*Construction problem*: If □□□ exists, how can we construct it?

*Enumeration problem*: How many □□□ are there?

*Optimization problem*: Which □□□ is best? […]

[T]hese types of problems are not unrelated; for example, the easiest way to prove that something exists may be to construct it explicitly.”

“In this book we consider two types of enumeration problem – *counting problems* in which we simply wish to know the number of objects involved, and *listing problems* in which we want to list them all explicitly. […] It’s useful to have some basic counting rules […] In what follows, all the sets are finite. […] In general we have the following rule; here, subsets are disjoint if they have no objects in common: *Addition rule:* To find the number of objects in a set, split the set into disjoint subsets, count the objects in each subset, and add the results. […] *Subtraction rule:* If a set of objects can be split into two subsets A and B, then the number of objects in *B* is obtained by subtracting the number of objects in A from the number in the whole set. […] The subtraction rule extends easily to sets that are split into more than two subsets with no elements in common. […] the *inclusion-exclusion principle* […] extends this simple idea to the situation where the subsets may have objects in common. […] In general we have the following result: *Multiplication rule:* If a counting problem can be split into stages with several options at each stage, then the total number of possibilities is the product of options at each stage. […] Another useful principle in combinatorics is the following: *Correspondence rule:* We can solve a counting problem if we can put the objects to be counted in one-to-one correspondence with the objects of a set that we have already counted. […] We conclude this section with one more rule: *Division rule:* If a set of *n* elements can be split into *m* disjoint subsets, each of size *k*, then *m = n / k*.”
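The inclusion-exclusion principle mentioned above is worth a worked example (mine, not the book’s): counting the integers from 1 to 100 divisible by 2, 3, or 5, with a brute-force cross-check:

```python
# Inclusion-exclusion: |A ∪ B ∪ C| = |A| + |B| + |C|
#                                  - |A∩B| - |A∩C| - |B∩C| + |A∩B∩C|
N = 100
d2 = N // 2      # divisible by 2: 50
d3 = N // 3      # divisible by 3: 33
d5 = N // 5      # divisible by 5: 20
d23 = N // 6     # divisible by 2 and 3: 16
d25 = N // 10    # divisible by 2 and 5: 10
d35 = N // 15    # divisible by 3 and 5: 6
d235 = N // 30   # divisible by 2, 3, and 5: 3

total = d2 + d3 + d5 - d23 - d25 - d35 + d235
print(total)  # 74

# Cross-check by direct enumeration:
assert total == sum(1 for n in range(1, N + 1)
                    if n % 2 == 0 or n % 3 == 0 or n % 5 == 0)
```

Simply adding the three counts would overshoot because the overlaps are counted twice; subtracting the pairwise overlaps then undershoots, which is why the triple overlap gets added back.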

“Every algorithm has a *running time* […] this may be the time that a computer needs to carry out all the necessary calculations, or the actual number of such calculations. Each problem [also] has an *input size* […] the running time *T* usually depends on the input size *n*. Particularly important, because they’re the most efficient, are the *polynomial-time algorithms*, where the maximum running time is proportional to a power of the input size […] The collection of all polynomial-time algorithms is called P. […] In contrast, there are inefficient algorithms that don’t take polynomial time, such as the exponential-time algorithms […] At this point we introduce NP, the set of ‘non-deterministic polynomial-time problems’. These are problems for which a solution, when given, can be *checked* in polynomial time. Clearly P is contained in NP, since if a problem can be solved in polynomial time then a solution can certainly be checked in polynomial time – checking solutions is far easier than finding them in the first place. But are they the same? […] Few people believe that the answer is ‘yes’, but no one has been able to prove that P ≠ NP. […] a problem is *NP-complete* if its solution in polynomial time means that *every* NP problem can be solved in polynomial time. […] If there were a polynomial algorithm for just one of them, then polynomial algorithms would exist for the whole lot and P would equal NP. On the other hand, if just one of them has no polynomial algorithm, then none of the others could have a polynomial algorithm either, and P would be different from NP.”
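The ‘checking is easier than finding’ point can be made concrete with subset-sum, a standard NP-complete problem: verifying a proposed subset (a ‘certificate’) takes a single pass, whereas finding one may require examining exponentially many subsets. A toy sketch of mine:

```python
def verify_subset_sum(numbers, target, certificate):
    """Polynomial-time check of a claimed solution: every element of the
    certificate must come from the input list, and the sum must match."""
    pool = list(numbers)
    for x in certificate:
        if x in pool:
            pool.remove(x)   # respect multiplicities
        else:
            return False
    return sum(certificate) == target

numbers = [3, 34, 4, 12, 5, 2]
print(verify_subset_sum(numbers, 9, [4, 5]))   # True
print(verify_subset_sum(numbers, 9, [3, 12]))  # False: sums to 15
```

No comparably fast method is known for *producing* the certificate in the first place; the naive search tries up to 2^n subsets.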

“In how many different ways can *n* objects be arranged? […] generally, we have the following result: *Arrangements*: The number of arrangements of *n* objects is *n* × (*n* − 1) × (*n* − 2) × … × 3 × 2 × 1. This number is called *n factorial* and is denoted by *n!*. […] The word permutation is used in different ways. We’ll use it to mean an *ordered selection without repetition*, while others may use it to mean an arrangement […] generally, we have the following rule: *Ordered selections without repetition (permutations)*: If we select *k* items from a set of *n* objects, and if the selections are ordered and repetition is not allowed, then the number of possible selections is *n* × (*n* − 1) × (*n* − 2) × … × (*n* − *k* + 1). We denote this expression by *P(n,k)*. […] Since *P(n,n)* = *n* × (*n* − 1) × (*n* − 2) × … × 3 × 2 × 1 = *n!*, an arrangement is a permutation for which *k = n*. […] generally, we have the following result: *P(n,k) = n!/(n−k)!*. […] unordered selections without repetition are called *combinations*, giving rise to the words *combinatorial* and *combinatorics*. […] generally, we have the following result: *Unordered selections without repetition (combinations)*: If we select *k* items from a set of *n* objects, and if the selections are unordered and repetition is not allowed, then the number of possible selections is *P(n,k)/k!* = *n* × (*n* − 1) × (*n* − 2) × … × (*n* − *k* + 1)/*k!*. We denote this expression by *C(n,k)* […] *Unordered selections with repetition*: If we select *k* items from a set of *n* objects, and if the selections are unordered and repetition is allowed, then the number of possible selections is *C(n + k − 1, k)*. […] *Combination rule 1*: For any numbers *k* and *n* with *k ≤ n*, *C(n,k) = C(n,n−k)* […] *Combination rule 2*: For any numbers *n* and *k* with *k ≤ n*, *C(n,n−k) = n!/((n−k)!(n−(n−k))!) = n!/((n−k)!k!) = C(n,k)*. […] *Combination rule 3*: For any number *n*, *C(n,0) + C(n,1) + C(n,2) + … + C(n,n−1) + C(n,n) = 2^{n}*”
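All the counting formulas above can be sanity-checked with Python’s `math` module, which implements P(n,k) and C(n,k) directly as `perm` and `comb`. My own quick check, not from the book:

```python
from math import comb, perm, factorial

n, k = 10, 4
assert perm(n, k) == factorial(n) // factorial(n - k)    # P(n,k) = n!/(n-k)!
assert perm(n, n) == factorial(n)                        # arrangements: k = n
assert comb(n, k) == perm(n, k) // factorial(k)          # C(n,k) = P(n,k)/k!
assert comb(n, k) == comb(n, n - k)                      # combination rules 1-2
assert sum(comb(n, i) for i in range(n + 1)) == 2 ** n   # combination rule 3

# Unordered selections with repetition: C(n + k - 1, k) = C(13, 4) = 715.
print(comb(n + k - 1, k))  # 715
```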

…

Links:

Tilings/Tessellation.

Knight’s tour.

Seven Bridges of Königsberg problem.

Three utilities problem.

Four color theorem.

Tarry’s algorithm (p.7) (formulated slightly differently in the book, but it’s the same algorithm).

Polyomino.

Arthur Cayley.

Combinatorial principles.

Minimum connector problem.

Travelling salesman problem.

Algorithmic efficiency. Running time/time complexity.

Boolean satisfiability problem. Cook–Levin theorem.

Combination.

Mersenne primes.

Permutation. Factorial. Stirling’s formula.

Birthday problem.

Varāhamihira.

Manhattan distance.

Fibonacci number.

Pascal’s triangle. Binomial coefficient. Binomial theorem.

Pigeonhole principle.

Venn diagram.

Derangement (combinatorial mathematics).

Tower of Hanoi.

Stable marriage problem. Transversal (combinatorics). Hall’s marriage theorem.

Generating function (*the topic covered in the book more specifically is related to a symbolic generator of the subsets of a set, but a brief search yielded no good links to this particular topic – US*).

Group theory.

Ferdinand Frobenius. Burnside’s lemma.

## Lyapunov Arguments in Optimization

…

I’d say that if you’re interested in the intersection of mathematical optimization methods/-algorithms and dynamical systems analysis it’s probably a talk well worth watching. The lecture is reasonably high-level and covers a fairly satisfactory amount of ground in a relatively short amount of time, and it is not particularly hard to follow if you have at least some passing familiarity with the fields involved (dynamical systems analysis, statistics, mathematical optimization, computer science/machine learning).

…

Some links:

Dynamical system.

Euler–Lagrange equation.

Continuous optimization problem.

Gradient descent algorithm.

Lyapunov stability.

Condition number.

Fast (/accelerated-) gradient descent methods.

The Mirror Descent Algorithm.

Cubic regularization of Newton method and its global performance (Nesterov & Polyak).

A Differential Equation for Modeling Nesterov’s Accelerated Gradient Method: Theory and Insights (Su, Boyd & Candès).

A Variational Perspective on Accelerated Methods in Optimization (Wibisono, Wilson & Jordan).

Breaking Locality Accelerates Block Gauss-Seidel (Tu, Venkataraman, Wilson, Gittens, Jordan & Recht).

A Lyapunov Analysis of Momentum Methods in Optimization (Wilson, Recht & Jordan).

Bregman divergence.

Estimate sequence methods.

Variance reduction techniques.

Stochastic gradient descent.

Langevin dynamics.

## An introduction to Invariant Theory

…

At several points during the lecture I was strongly considering not watching this one to the end (…such as it is – the video cuts off before the talk was completely finished, but I’m okay with missing out on the last 2 (?) minutes anyway), mainly because this is definitely far from a ‘gentle’ introduction; it is tough for an introductory lecture and I had to look up a lot of stuff just to sort-of-kind-of muddle along. One of the things that kept me from giving the rest of the lecture a miss along the way was that some parts of the coverage made me review a few topics in group theory which I’d previously encountered but did not remember at all well – basically I was reminded that concepts X, Y, and Z existed, and that I’d forgotten how they worked and what they were useful for.

I think most (…non-mathematicians? …people?) who watch this one will miss a lot of the details, and although you might get some idea of what this stuff is about by watching it, I’m quite sure I’d not recommend this lecture to non-mathematicians; I don’t think it’s really worth the effort.

I’ve posted some links below to things related to the lecture’s coverage.

…

Invariant (mathematics).

Loop invariant.

Knot invariant.

Jones polynomial.

Homogeneous polynomial.

Invariant polynomial.

Invariant of a binary form.

Paul Gordan.

Polynomial ring.

Indeterminate (variable).

Ideal (ring theory).

Hilbert’s basis theorem.

Hilbert’s Nullstellensatz.

Group representation.

Group action.

Subalgebra.

Permutation matrix.

Symmetric polynomial.

Hilbert’s finiteness theorem.

Irreducible representation.

Multiplicative group.

One-parameter group.

Hilbert–Mumford criterion.

Strictly Upper Triangular Matrix.

Nilpotent matrix.

Characteristic polynomial.

## Mathematics in Cryptography III

As she puts it herself, most of this lecture [~first 47 minutes or so] was basically “an explanation by a non-expert on how the internet uses public key” (-cryptography). The last 20 minutes cover, again in her own words, “more theoretical aspects”.

Some links:

ARPANET.

NSFNET.

Hypertext Transfer Protocol (HTTP). HTTPS.

Project Athena. Kerberos (protocol).

Pretty Good Privacy (PGP).

Secure Sockets Layer (SSL)/Transport Layer Security (TLS).

IPsec.

Wireshark.

Cipher suite.

Elliptic Curve Digital Signature Algorithm (ECDSA).

Request for Comments (RFC).

Elliptic-curve Diffie–Hellman (ECDH).

The SSL/TLS Handshake: an Overview.

Advanced Encryption Standard.

Galois/Counter Mode.

XOR gate.

Hexadecimal.

IP Header.

Time to live (TTL).

Transmission Control Protocol. TCP segment structure.

TLS record.

Security level.

Birthday problem. Birthday attack.

Handbook of Applied Cryptography (Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone). (§3.6 in particular is mentioned/referenced as this is stuff she talks about in the last ‘theoretical’ part of the lecture).

## Mathematics in Cryptography II

…

Some links to stuff covered in the lecture:

Public-key cryptography.

New Directions in Cryptography (Diffie & Hellman, 1976).

The history of Non-Secret Encryption (James Ellis).

Note on “Non-Secret Encryption” – Cliff Cocks (1973).

RSA (cryptosystem).

Discrete Logarithm Problem.

Diffie–Hellman key exchange.

AES (Advanced Encryption Standard).

Triple DES.

Trusted third party (TTP).

Key management.

Man-in-the-middle attack.

Digital signature.

Public key certificate.

Secret sharing.

Hash function. Cryptographic hash function.

Secure Hash Algorithm 2 (SHA-2).

Non-repudiation (digital security).

L-notation. L (complexity).

ElGamal signature scheme.

Digital Signature Algorithm (DSA).

Schnorr signature.

Identity-based cryptography.

Identity-Based Cryptosystems and Signature Schemes (Adi Shamir, 1984).

Algorithms for Quantum Computation: Discrete Logarithms and Factoring (Peter Shor, 1994).

Quantum resistant cryptography.

Elliptic curve. Elliptic-curve cryptography.

Projective space.

I have included very few links relating to the topics covered in the last part of the lecture. This was deliberate and not just a result of the type of coverage included in that part of the lecture. In my opinion non-mathematicians should probably skip the last 25 minutes or so, as they’re not really worth the effort – and not only because of technical issues (for several minutes the lecturer is writing on the blackboard and you’re unable to see what she’s writing, which is …unfortunate), although those certainly were not helping. The first hour of the lecture is great; the last 25 minutes are, well, less great, in my opinion. You should however not miss the first part of the coverage of ECC-related stuff (in particular the coverage ~55-58 minutes in) if you’re interested in making sense of how ECC works; I certainly found that part of the coverage very helpful.

## Mathematics in Cryptography

…

Some relevant links:

Caesar cipher.

Substitution cipher.

Frequency analysis.

Vigenère cipher.

ADFGVX cipher.

One-time pad.

Arthur Scherbius.

Enigma machine.

Permutation.

Cycle notation.

Permutation group.

Cyclic permutation.

Involution (mathematics).

An Application of the Theory of Permutations in Breaking the Enigma Cipher – Marian Rejewski.

## Medical Statistics (III)

In this post I’ll include some links and quotes related to topics covered in chapters 4, 6, and 7 of the book. Before diving in, I’ll however draw attention to some of Gerd Gigerenzer’s work, as it is quite relevant in particular to the coverage in chapter 4 (‘Presenting research findings’), even if the authors seem unaware of this. One of Gigerenzer’s key insights, which I consider important and which I have thus tried to keep in mind, unfortunately goes unmentioned in the book; namely the idea that *how* you communicate risk might be very important in terms of whether or not people *actually understand* what you are trying to tell them. A related observation is that people have studied these things and they’ve figured out that some types of risk communication are demonstrably better than others at enabling people to understand the issues at hand and the trade-offs involved in a given situation. I covered some of these ideas in a comment on SCC some time ago; if those comments spark your interest you should definitely go read the book.

…

IMRAD format.

CONSORT Statement (randomized trials).

Equator Network.

“Abstracts may appear easy to write since they are very short […] and often required to be written in a structured format. It is therefore perhaps surprising that they are sometimes poorly written, too bland, contain inaccuracies, and/or are simply misleading.^{1} The reasons for poor-quality abstracts are complex; abstracts are often written at the end of a long process of data collection, analysis, and writing up, when time is short and researchers are weary. […] statistical issues […] can lead to an abstract that is not a fair representation of the research conducted. […] **it is important that the abstract is consistent with the body of text** and that it gives a balanced summary of the work. […] **To maximize its usefulness, a summary or abstract should include estimates and confidence intervals for the main findings and not simply present P values**.”

“The methods section should describe how the study was conducted. […] it is important to include the following: *The setting or area […] The date(s) […] subjects included […] study design […] measurements used […] source of any non-original data […] sample size, including a justification […] statistical methods, including any computer software used […] The discussion section is where the findings of the study are discussed and interpreted […] this section tends to include less statistics than the results section […] Some medical journals have a specific structure for the discussion for researchers to follow, and so it is important to check the journal’s guidelines before submitting. […] [When] reporting statistical analyses from statistical programs: *Don’t put unedited computer output into a research document. *Extract the relevant data only and reformat as needed […] Beware of presenting percentages for very small samples as they may be misleading. Simply give the numbers alone. […] In general the following is recommended for P values: *Give the actual P value whenever possible. *Rounding: Two significant figures are usually enough […] [Confidence intervals] should be given whenever possible to indicate the precision of estimates. […] Avoid graphs with missing zeros or stretched scales […] a table or graph should stand alone so that a reader does not need to read the […] article to be able to understand it.”

…

Statistical data type.

Level of measurement.

Descriptive statistics.

Summary statistics.

Geometric mean.

Harmonic mean.

Mode.

Interquartile range.

Histogram.

Stem and leaf plot.

Box and whisker plot.

Dot plot.

…

“Quantitative data are data that can be **measured numerically** and may be continuous or discrete. *Continuous data** lie on a continuum and so can take any value between two limits. […] **Discrete data** do not lie on a continuum and can only take certain values, usually **counts** (integers) […] On an **interval** scale, differences between values at different points of the scale have the same meaning […] Data can be regarded as on a **ratio** scale if the ratio of the two measurements has a meaning. For example we can say that twice as many people in one group had a particular characteristic compared with another group and this has a sensible meaning. […] Quantitative data are always **ordinal** – the data values can be arranged in a numerical order from the smallest to the largest. […] *Interval scale data are always ordinal. Ratio scale data are always interval scale data and therefore must also be ordinal.* In practice, **continuous data may look discrete** because of the way they are measured and/or reported. […] All continuous measurements are limited by the accuracy of the instrument used to measure them, and many quantities such as age and height are reported in whole numbers for convenience”.

“Categorical data are data where individuals fall into a number of **separate categories or classes**. […] Different categories of categorical data may be assigned a number for coding purposes […] and if there are several categories, there may be an implied ordering, such as with stage of cancer where stage I is the least advanced and stage IV is the most advanced. This means that such data are **ordinal but not interval** because the ‘distance’ between adjacent categories has no real measurement attached to it. The ‘gap’ between stages I and II disease is not necessarily the same as the ‘gap’ between stages III and IV. […] Where categorical data are coded with numerical codes, it might appear that there is an ordering but this may not necessarily be so. It is important to **distinguish between ordered and non-ordered data** because it affects the analysis.”

“It is usually useful to present more than one summary measure for a set of data […] **If the data are going to be analyzed using methods based on means then it makes sense to present means rather than medians**. If the data are skewed they may need to be transformed before analysis and so it is best to present summaries based on the transformed data, such as geometric means. […] For very skewed data rather than reporting the median, it may be helpful to present a different percentile (i.e. not the 50th), which better reflects the shape of the distribution. […] Some researchers are reluctant to present the standard deviation when the data are skewed and so present the median and range and/or quartiles. If analyses are planned which are based on means then it makes sense to be consistent and give standard deviations. Further, the useful relationship that approximately 95% of the data lie between mean +/- 2 standard deviations, holds even for skewed data […] If data are transformed, the standard deviation cannot be back-transformed correctly and so for transformed data a standard deviation cannot be given. In this case the untransformed standard deviation can be given or another measure of spread. […] For discrete data with a narrow range, such as stage of cancer, it may be better to present the actual frequency distribution to give a fair summary of the data, rather than calculate a mean or dichotomize it. […] It is often useful to tabulate one categorical variable against another to show the proportions or percentages of the categories of one variable by the other”.
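The point about transformed data can be made concrete with a small sketch: the geometric mean is just the back-transformed (exponentiated) mean of the log-transformed data, which is why it is the natural summary when analyses are done on the log scale. The sample values below are made up for illustration:

```python
import math
import statistics

# A small right-skewed sample (made-up numbers, for illustration only).
data = [1.2, 1.5, 1.7, 2.0, 2.4, 3.1, 4.8, 9.6]

mean = statistics.mean(data)
median = statistics.median(data)
sd = statistics.stdev(data)

# Geometric mean = back-transformed mean of the log-transformed data.
log_data = [math.log(x) for x in data]
geometric_mean = math.exp(statistics.mean(log_data))

# For right-skewed data the geometric mean sits below the arithmetic mean,
# closer to the bulk of the observations.
assert geometric_mean < mean
```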

…

Random variable.

Independence (probability theory).

Probability.

Probability distribution.

Binomial distribution.

Poisson distribution.

Continuous probability distribution.

Normal distribution.

Uniform distribution.

…

“The central limit theorem is a very important mathematical theorem that **links the Normal distribution with other distributions** in a unique and surprising way and is therefore very useful in statistics. *The sum of a large number of independent random variables will follow an approximately Normal distribution irrespective of their underlying distributions.* This means that any random variable which can be regarded as the sum of a large number of small, independent contributions is likely to follow the Normal distribution. [*I didn’t really like this description as it’s insufficiently detailed for my taste (and this was pretty much all they wrote about the CLT in that chapter); and one problem with the CLT is that people often think it applies when it might not actually do so, because the data restrictions implied by the theorem(s) are not really fully appreciated. On a related note people often seem to misunderstand what these theorems actually say and where they apply – see e.g. paragraph 10 in this post. See also the wiki link above for a more comprehensive treatment of these topics* – *US*] *The Normal distribution can be used as an approximation to the Binomial distribution **when n is large** […] The Normal distribution can be used as an approximation to the Poisson distribution **as the mean of the Poisson distribution increases** […] The main advantage in using the Normal rather than the Binomial or the Poisson distribution is that it makes it easier to calculate probabilities and confidence intervals”
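A quick simulation makes the theorem concrete: sums of many small, independent uniform contributions are approximately Normal, so roughly 95% of them land within two standard deviations of the mean. The sample sizes and tolerances below are arbitrary choices of mine:

```python
import random

random.seed(0)

# Each observation is the sum of 50 small, independent Uniform(0,1) contributions.
n_contrib = 50
samples = [sum(random.random() for _ in range(n_contrib)) for _ in range(20_000)]

# Theoretical mean and variance of a sum of n Uniform(0,1) variables:
# n * 1/2 and n * 1/12 respectively.
mean_theory = n_contrib * 0.5
sd_theory = (n_contrib / 12) ** 0.5

# If the sums are approximately Normal, about 95% of them should lie
# within two standard deviations of the mean.
within = sum(abs(x - mean_theory) < 2 * sd_theory for x in samples) / len(samples)
assert 0.93 < within < 0.97
```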

“The t distribution plays an important role in statistics as the sampling distribution of the sample mean divided by its standard error and is used in significance testing […] The shape is symmetrical about the mean value, and is similar to the Normal distribution but with a higher peak and longer tails to take account of the reduced precision in smaller samples. The exact shape is determined by the mean and variance plus the degrees of freedom. As the degrees of freedom increase, the shape comes closer to the Normal distribution […] The chi-squared distribution also plays an important role in statistics. If we take several variables, say *n*, which each follow a standard Normal distribution, and square each and add them, the sum of these will follow a chi-squared distribution with *n* degrees of freedom. This theoretical result is very useful and widely used in statistical testing […] The chi-squared distribution is always positive and its shape is uniquely determined by the degrees of freedom. The distribution becomes more symmetrical as the degrees of freedom increases. […] [The (noncentral) *F* distribution] is the distribution of the ratio of two chi-squared distributions and is used in hypothesis testing when we want to compare variances, such as in doing analysis of variance […] Sometimes data may follow a positively skewed distribution which becomes a Normal distribution when each data point is log-transformed […] In this case the original data can be said to follow a lognormal distribution. The transformation of such data from log-normal to Normal is very useful in allowing skewed data to be analysed using methods based on the Normal distribution since these are usually more powerful than alternative methods”.
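The stated construction of the chi-squared distribution, as a sum of *n* squared standard Normals, is also easy to check by simulation; the sample size and tolerances below are my own choices:

```python
import random
import statistics

random.seed(1)

# Sum of n squared standard Normal variables: chi-squared with n degrees of freedom.
n_df = 5
chi2_samples = [sum(random.gauss(0, 1) ** 2 for _ in range(n_df))
                for _ in range(20_000)]

# A chi-squared distribution with n degrees of freedom has mean n and variance 2n.
assert abs(statistics.mean(chi2_samples) - n_df) < 0.15
assert abs(statistics.variance(chi2_samples) - 2 * n_df) < 0.8
```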

…

Half-Normal distribution.

Bivariate Normal distribution.

Negative binomial distribution.

Beta distribution.

Gamma distribution.

Conditional probability.

Bayes theorem.

## On the cryptographic hardness of finding a Nash equilibrium

…

I found it annoying that you generally can’t really hear the questions posed by the audience (which includes people like Avi Wigderson), especially considering that there are quite a few of these, particularly in the middle section of the lecture. There are intermittent issues with the camera’s focus throughout the talk, but those are all transitory problems that should not keep you from watching the lecture. The sound issue at the beginning of the talk is resolved after 40 seconds.

One important take-away from this talk, if you choose not to watch it: “to date, there is no known efficient algorithm to find Nash equilibrium in games”. In general this paper – coauthored by the lecturer – seems from a brief skim to cover many of the topics also included in the lecture. I have added some other links to articles and topics covered/mentioned in the lecture below.

…

Nash’s Existence Theorem.

Reducibility Among Equilibrium Problems (Goldberg & Papadimitriou).

Three-Player Games Are Hard (Daskalakis & Papadimitriou).

3-Nash is PPAD-Complete (Chen & Deng).

PPAD (complexity).

NP-hardness.

On the (Im)possibility of Obfuscating Programs (Barak *et al*.).

On the Impossibility of Obfuscation with Auxiliary Input (Goldwasser & Kalai).

On Best-Possible Obfuscation (Goldwasser & Rothblum).

Functional Encryption without Obfuscation (Garg *et al.*).

On the Complexity of the Parity Argument and Other Inefficient Proofs of Existence (Papadimitriou).

Pseudorandom function family.

Revisiting the Cryptographic Hardness of Finding a Nash Equilibrium (Garg, Pandey & Srinivasan).

Constrained Pseudorandom Functions and Their Applications (Boneh & Waters).

Delegatable Pseudorandom Functions and Applications (Kiayias *et al.*).

Functional Signatures and Pseudorandom Functions (Boyle, Goldwasser & Ivan).

Universal Constructions and Robust Combiners for Indistinguishability Obfuscation and Witness Encryption (Ananth *et al.*).

## Networks

I actually think this was a really nice book, considering the format – I gave it four stars on goodreads. One of the things I noticed people didn’t like about it in the reviews is that it ‘jumps’ a bit in terms of topic coverage; it covers a wide variety of applications and analytical settings. I mostly don’t consider this a weakness of the book – even if occasionally it does get a bit excessive – and I can definitely understand the authors’ choice of approach; it’s sort of hard to illustrate the potential of the analytical techniques described within this book if you’re not allowed to talk about all the areas in which they have been – or could gainfully be – applied. A related point is that many people who read the book might be familiar with the application of these tools in specific contexts but have perhaps not thought about the fact that similar methods are applied in many other areas (and each of them might be a bit annoyed that the authors don’t talk more about computer science applications, or foodweb analyses, or infectious disease applications, or perhaps sociometry…). Most of the book is about graph-theory-related stuff, but a very decent amount of the coverage deals with *applications*, in a broad sense of the word at least, not *theory*. The discussion of theoretical constructs in the book always felt to me driven to a large degree by their *usefulness* in specific contexts.

I have covered related topics before here on the blog, also quite recently – e.g. there’s at least some overlap between this book and Holland’s book about complexity theory in the same series (I incidentally think these books probably go well together) – and as I found the book slightly difficult to blog as it was I decided against covering it in as much detail as I sometimes do when covering these texts – this means that I decided to leave out the links I usually include in posts like these.

Below some quotes from the book.

…

“The network approach focuses all the attention on the global structure of the interactions within a system. The detailed properties of each element on its own are simply ignored. Consequently, systems as different as a computer network, an ecosystem, or a social group are all described by the same tool: a *graph*, that is, a bare architecture of nodes bounded by connections. […] Representing widely different systems with the same tool can only be done by a high level of abstraction. What is lost in the specific description of the details is gained in the form of *universality* – that is, thinking about very different systems as if they were different realizations of the same theoretical structure. […] This line of reasoning provides many insights. […] The network approach also sheds light on another important feature: the fact that certain systems that grow without external control are still capable of spontaneously developing an internal order. […] Network models are able to describe in a clear and natural way how self-organization arises in many systems. […] In the study of complex, emergent, and self-organized systems (the modern science of complexity), networks are becoming increasingly important as a universal mathematical framework, especially when massive amounts of data are involved. […] networks are crucial instruments to sort out and organize these data, connecting individuals, products, news, etc. to each other. […] While the network approach eliminates many of the individual features of the phenomenon considered, it still maintains some of its specific features. Namely, it does not alter the size of the system — i.e. the number of its elements — or the pattern of interaction — i.e. the specific set of connections between elements. Such a simplified model is nevertheless enough to capture the properties of the system. […] The network approach [lies] somewhere between the description by individual elements and the description by big groups, bridging the two of them. In a certain sense, networks try to explain how a set of isolated elements are transformed, through a pattern of interactions, into groups and communities.”

“[T]he random graph model is very important because it quantifies the properties of a totally random network. Random graphs can be used as a benchmark, or *null case*, for any real network. This means that a random graph can be used in comparison to a real-world network, to understand how much chance has shaped the latter, and to what extent other criteria have played a role. The simplest recipe for building a random graph is the following. We take all the possible pairs of vertices. For each pair, we toss a coin: if the result is heads, we draw a link; otherwise we pass to the next pair, until all the pairs are finished (this means drawing the link with a probability p = ½, but we may use whatever value of *p*). […] Nowadays [the random graph model] is a benchmark of comparison for all networks, since any deviations from this model suggest the presence of some kind of structure, order, regularity, and non-randomness in many real-world networks.”
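The coin-tossing recipe from the quote translates almost directly into code. A minimal sketch (`random_graph` is a helper name of mine):

```python
import itertools
import random

random.seed(42)

def random_graph(n, p):
    """The recipe from the quote: for every possible pair of vertices,
    draw an edge independently with probability p."""
    edges = set()
    for u, v in itertools.combinations(range(n), 2):
        if random.random() < p:  # the coin toss
            edges.add((u, v))
    return edges

n, p = 200, 0.5
edges = random_graph(n, p)

# With p = 1/2, roughly half of all C(n,2) possible pairs become edges.
n_pairs = n * (n - 1) // 2
assert abs(len(edges) / n_pairs - p) < 0.05
```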

“…in networks, *topology* is more important than *metrics*. […] In the network representation, the connections between the elements of a system are much more important than their specific positions in space and their relative distances. The focus on topology is one of the biggest strengths of the network approach, useful whenever topology is more relevant than metrics. […] In social networks, the relevance of topology means that *social structure* matters. […] Sociology has classified a broad range of possible links between individuals […]. The tendency to have several kinds of relationships in social networks is called multiplexity. But this phenomenon appears in many other networks: for example, two species can be connected by different strategies of predation, two computers by different cables or wireless connections, etc. We can modify a basic graph to take into account this multiplexity, e.g. by attaching specific tags to edges. […] Graph theory [also] allows us to encode in edges more complicated relationships, as when connections are not reciprocal. […] If a direction is attached to the edges, the resulting structure is a *directed graph* […] In these networks we have both *in-degree* and *out-degree*, measuring the number of inbound and outbound links of a node, respectively. […] in most cases, relations display a broad variation or intensity [i.e. they are not binary/dichotomous]. […] *Weighted networks* may arise, for example, as a result of different frequencies of interactions between individuals or entities.”
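In- and out-degrees are straightforward to compute from an edge list. The toy foodweb below (edges point from eater to eaten, echoing the book’s wolves/sheep/grass example; the species list is my own illustration):

```python
# A toy directed foodweb: each edge points from eater to eaten.
edges = [
    ("wolf", "sheep"),
    ("sheep", "grass"),
    ("fox", "sheep"),
    ("fox", "mouse"),
    ("mouse", "grass"),
]

out_degree, in_degree = {}, {}
for u, v in edges:
    out_degree[u] = out_degree.get(u, 0) + 1  # outbound links (what u eats)
    in_degree[v] = in_degree.get(v, 0) + 1    # inbound links (what eats v)

# "sheep" is eaten by two species (in-degree 2) and eats one thing (out-degree 1).
assert in_degree["sheep"] == 2
assert out_degree["sheep"] == 1
```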

“An organism is […] the outcome of several layered networks and not only the deterministic result of the simple sequence of genes. Genomics has been joined by *epigenomics*, *transcriptomics*, *proteomics*, *metabolomics*, etc., the disciplines that study these layers, in what is commonly called the *omics revolution*. Networks are at the heart of this revolution. […] The brain is full of networks where various web-like structures provide the integration between specialized areas. In the cerebellum, neurons form modules that are repeated again and again: the interaction between modules is restricted to neighbours, similarly to what happens in a lattice. In other areas of the brain, we find random connections, with a more or less equal probability of connecting local, intermediate, or distant neurons. Finally, the neocortex — the region involved in many of the higher functions of mammals — combines local structures with more random, long-range connections. […] typically, food chains are not isolated, but interwoven in intricate patterns, where a species belongs to several chains at the same time. For example, a specialized species may predate on only one prey […]. If the prey becomes extinct, the population of the specialized species collapses, giving rise to a set of co-extinctions. An even more complicated case is where an omnivore species predates a certain herbivore, and both eat a certain plant. A decrease in the omnivore’s population does not imply that the plant thrives, because the herbivore would benefit from the decrease and consume even more plants. As more species are taken into account, the population dynamics can become more and more complicated. This is why a more appropriate description than ‘foodchains’ for ecosystems is the term foodwebs […]. These are networks in which nodes are species and links represent relations of predation. Links are usually directed (big fishes eat smaller ones, not the other way round). These networks provide the interchange of food, energy, and matter between species, and thus constitute the circulatory system of the biosphere.”

“In the cell, some groups of chemicals interact only with each other and with nothing else. In ecosystems, certain groups of species establish small foodwebs, without any connection to external species. In social systems, certain human groups may be totally separated from others. However, such disconnected groups, or components, are a strikingly small minority. In all networks, almost all the elements of the systems take part in one large connected structure, called a *giant connected component*. […] In general, the giant connected component includes not less than 90 to 95 per cent of the system in almost all networks. […] In a directed network, the existence of a path from one node to another does not guarantee that the journey can be made in the opposite direction. Wolves eat sheep, and sheep eat grass, but grass does not eat sheep, nor do sheep eat wolves. This restriction creates a complicated architecture within the giant connected component […] according to an estimate made in 1999, more than 90 per cent of the WWW is composed of pages connected to each other, if the direction of edges is ignored. However, if we take direction into account, the proportion of nodes mutually reachable is only 24 per cent, the *giant strongly connected component*. […] most networks are *sparse*, i.e. they tend to be quite frugal in connections. Take, for example, the airport network: the personal experience of every frequent traveller shows that direct flights are not that common, and intermediate stops are necessary to reach several destinations; thousands of airports are active, but each city is connected to less than 20 other cities, on average. The same happens in most networks. A measure of this is given by the mean number of connections of their nodes, that is, their *average degree*.”

“[A] puzzling contradiction — a sparse network can still be very well connected — […] attracted the attention of the Hungarian mathematicians […] Paul Erdős and Alfréd Rényi. They tackled it by producing different realizations of their random graph. In each of them, they changed the density of edges. They started with a very low density: less than one edge per node. It is natural to expect that, as the density increases, more and more nodes will be connected to each other. But what Erdős and Rényi found instead was a quite abrupt transition: several disconnected components coalesced suddenly into a large one, encompassing almost all the nodes. The sudden change happened at one specific critical density: when the average number of links per node (i.e. the average degree) was greater than one, then the giant connected component suddenly appeared. This result implies that networks display a very special kind of economy, intrinsic to their disordered structure: a small number of edges, even randomly distributed between nodes, is enough to generate a large structure that absorbs almost all the elements. […] Social systems seem to be very tightly connected: in a large enough group of strangers, it is not unlikely to find pairs of people with quite short chains of relations connecting them. […] The small-world property consists of the fact that the average distance between any two nodes (measured as the shortest path that connects them) is very small. Given a node in a network […], few nodes are very close to it […] and few are far from it […]: the majority are at the average — and very short — distance. This holds for all networks: starting from one specific node, almost all the nodes are at very few steps from it; the number of nodes within a certain distance increases exponentially fast with the distance. 
Another way of explaining the same phenomenon […] is the following: even if we add many nodes to a network, the average distance will not increase much; one has to increase the size of a network by several orders of magnitude to notice that the paths to new nodes are (just a little) longer. The small-world property is crucial to many network phenomena. […] The small-world property is something intrinsic to networks. Even the completely random Erdős–Rényi graphs show this feature. By contrast, regular grids do not display it. If the Internet was a chessboard-like lattice, the average distance between two routers would be of the order of 1,000 jumps, and the Net would be much slower [the authors note elsewhere that “The Internet is composed of hundreds of thousands of routers, but just about ten ‘jumps’ are enough to bring an information packet from one of them to any other.”] […] The key ingredient that transforms a structure of connections into a small world is the presence of a little disorder. No real network is an ordered array of elements. On the contrary, there are always connections ‘out of place’. It is precisely thanks to these connections that networks are small worlds. […] Shortcuts are responsible for the small-world property in many […] situations.”
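
The abrupt appearance of the giant component is easy to reproduce numerically. Here is a quick simulation of my own (not from the book; sizes and seeds chosen arbitrarily):

```python
import random

def er_graph(n, avg_degree, seed):
    """Erdős–Rényi graph: each pair of nodes is linked independently,
    with probability chosen to give the desired average degree."""
    rng = random.Random(seed)
    p = avg_degree / (n - 1)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def largest_component_fraction(adj):
    """Fraction of nodes in the largest connected component."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best / len(adj)

n = 1000
below = largest_component_fraction(er_graph(n, 0.5, seed=1))  # under the critical density
above = largest_component_fraction(er_graph(n, 3.0, seed=1))  # over it
print(f"<k>=0.5: largest component holds {below:.0%} of nodes; <k>=3.0: {above:.0%}")
```

With an average degree of 0.5 the largest component holds only a few per cent of the nodes; at 3 it absorbs the vast majority – the transition Erdős and Rényi found.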

“Body size, IQ, road speed, and other magnitudes have a *characteristic scale*: that is, an average value that in the large majority of cases is a rough predictor of the actual value that one will find. […] While height is a homogeneous magnitude, the number of social connection[s] is a heterogeneous one. […] A system with this feature is said to be *scale-free* or *scale-invariant*, in the sense that it does not have a characteristic scale. This can be rephrased by saying that the individual fluctuations with respect to the average are too large for us to make a correct prediction. […] In general, a network with heterogeneous connectivity has a set of clear hubs. When a graph is small, it is easy to find whether its connectivity is homogeneous or heterogeneous […]. In the first case, all the nodes have more or less the same connectivity, while in the latter it is easy to spot a few hubs. But when the network to be studied is very big […] things are not so easy. […] the distribution of the connectivity of the nodes of the […] network […] is the *degree distribution* of the graph. […] In homogeneous networks, the degree distribution is a bell curve […] while in heterogeneous networks, it is a power law […]. The power law implies that there are many more hubs (and much more connected) in heterogeneous networks than in homogeneous ones. Moreover, hubs are not isolated exceptions: there is a full hierarchy of nodes, each of them being a hub compared with the less connected ones.”

“Looking at the degree distribution is the best way to check if a network is heterogeneous or not: if the distribution is fat tailed, then the network will have hubs and heterogeneity. A mathematically perfect power law is never found, because this would imply the existence of hubs with an infinite number of connections. […] Nonetheless, a strongly skewed, fat-tailed distribution is a clear signal of heterogeneity, even if it is never a perfect power law. […] While the small-world property is something intrinsic to networked structures, hubs are not present in all kind[s] of networks. For example, power grids usually have very few of them. […] hubs are not present in random networks. A consequence of this is that, while random networks are small worlds, heterogeneous ones are *ultra-small worlds*. That is, the distance between their vertices is relatively smaller than in their random counterparts. […] Heterogeneity is not equivalent to randomness. On the contrary, it can be the signature of a hidden order, not imposed by a top-down project, but generated by the elements of the system. The presence of this feature in widely different networks suggests that some common underlying mechanism may be at work in many of them. […] the Barabási–Albert model gives an important take-home message. A simple, local behaviour, iterated through many interactions, can give rise to complex structures. This arises without any overall blueprint”.
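
The Barabási–Albert mechanism itself – ‘preferential attachment’, where new nodes link to existing nodes with probability proportional to their current degree – fits in a few lines. This is my own minimal rendering, not the authors’ code:

```python
import random

def ba_degrees(n, m=2, seed=42):
    """Preferential attachment: each new node links to m existing nodes,
    picked in proportion to their current degree. Returns node degrees."""
    rng = random.Random(seed)
    degree = [0] * n
    # each node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list is exactly a degree-proportional draw
    endpoints = []
    for v in range(m, n):
        if not endpoints:
            targets = set(range(m))  # the first newcomer links to the seed nodes
        else:
            targets = set()
            while len(targets) < m:
                targets.add(rng.choice(endpoints))
        for t in targets:
            degree[v] += 1
            degree[t] += 1
            endpoints += [v, t]
    return degree

deg = ba_degrees(5000)
avg = sum(deg) / len(deg)
print(f"average degree = {avg:.1f}, maximum degree = {max(deg)}")
```

The average degree stays close to 2m = 4, but the best-connected node ends up with a degree one to two orders of magnitude larger: a hub, produced by purely local rules with no blueprint.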

“*Homogamy*, the tendency of like to marry like, is very strong […] Homogamy is a specific instance of *homophily*: this consists of a general trend of like to link to like, and is a powerful force in shaping social networks […] *assortative mixing* [is] a special form of homophily, in which nodes tend to connect with others that are similar to them in the number of connections. By contrast [when] high- and low-degree nodes are more connected to each other [it] is called *disassortative mixing*. Both cases display a form of correlation in the degrees of neighbouring nodes. When the degrees of neighbours are positively correlated, then the mixing is assortative; when negatively, it is disassortative. […] In random graphs, the neighbours of a given node are chosen completely at random: as a result, there is no clear correlation between the degrees of neighbouring nodes […]. On the contrary, correlations are present in most real-world networks. Although there is no general rule, most natural and technological networks tend to be disassortative, while social networks tend to be assortative. […] Degree assortativity and disassortativity are just an example of the broad range of possible correlations that bias how nodes tie to each other.”
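
Degree assortativity can be computed exactly as described: as the Pearson correlation between the degrees found at the two ends of every edge. A quick sketch of my own (not the book’s), using a star graph as the extreme disassortative case:

```python
from math import sqrt

def assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each edge
    (each edge counted in both directions, so the measure is symmetric)."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    xs, ys = [], []
    for a, b in edges:
        xs += [degree[a], degree[b]]
        ys += [degree[b], degree[a]]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

star = [(0, i) for i in range(1, 6)]  # one hub tied to five leaves
print(round(assortativity(star), 2))  # -1.0: perfectly disassortative
```

In the star every edge joins a degree-5 node to a degree-1 node, so high degree at one end always means low degree at the other: the correlation is exactly −1.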

“[N]etworks (neither ordered lattices nor random graphs), can have both large clustering and small average distance at the same time. […] in almost all networks, the clustering of a node depends on the degree of that node. Often, the larger the degree, the smaller the clustering coefficient. Small-degree nodes tend to belong to well-interconnected local communities. Similarly, hubs connect with many nodes that are not directly interconnected. […] Central nodes usually act as bridges or bottlenecks […]. For this reason, centrality is an estimate of the load handled by a node of a network, assuming that most of the traffic passes through the shortest paths (this is not always the case, but it is a good approximation). For the same reason, damaging central nodes […] can impair radically the flow of a network. Depending on the process one wants to study, other definitions of centrality can be introduced. For example, *closeness centrality* computes the distance of a node to all others, and *reach centrality* factors in the portion of all nodes that can be reached in one step, two steps, three steps, and so on.”
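
Closeness centrality, at least, is simple to state precisely: the reciprocal of a node’s average distance to everybody else. A little sketch of my own (not from the book), on a five-node path where the middle node is clearly the most central:

```python
from collections import deque

def closeness(adj, node):
    """(n - 1) divided by the sum of BFS distances from `node` to all others."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for nb in adj[cur]:
            if nb not in dist:
                dist[nb] = dist[cur] + 1
                queue.append(nb)
    total = sum(d for n, d in dist.items() if n != node)
    return (len(dist) - 1) / total if total else 0.0

# a path 0-1-2-3-4: node 2 sits in the middle
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(round(closeness(path, 2), 3), round(closeness(path, 0), 3))  # 0.667 0.4
```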

“Domino effects are not uncommon in foodwebs. Networks in general provide the backdrop for large-scale, sudden, and surprising dynamics. […] most of the real-world networks show a double-edged kind of robustness. They are able to function normally even when a large fraction of the network is damaged, but suddenly certain small failures, or targeted attacks, bring them down completely. […] networks are very different from engineered systems. In an airplane, damaging one element is enough to stop the whole machine. In order to make it more resilient, we have to use strategies such as duplicating certain pieces of the plane: this makes it almost 100 per cent safe. In contrast, networks, which are mostly not blueprinted, display a natural resilience to a broad range of errors, but when certain elements fail, they collapse. […] A random graph of the size of most real-world networks is destroyed after the removal of half of the nodes. On the other hand, when the same procedure is performed on a heterogeneous network (either a map of a real network or a scale-free model of a similar size), the giant connected component resists even after removing more than 80 per cent of the nodes, and the distance within it is practically the same as at the beginning. The scene is different when researchers simulate a targeted attack […] In this situation the collapse happens much faster […]. However, now the most vulnerable [network] is the second: while in the homogeneous network it is necessary to remove about one-fifth of its more connected nodes to destroy it, in the heterogeneous one this happens after removing the first few hubs. Highly connected nodes seem to play a crucial role, in both errors and attacks. […] hubs are mainly responsible for the overall cohesion of the graph, and removing a few of them is enough to destroy it.”
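
The error/attack asymmetry is easy to reproduce. Below is a rough simulation of my own (toy sizes, with a minimal preferential-attachment graph standing in for a heterogeneous network): remove 10 per cent of the nodes at random, then instead remove the 10 per cent with the highest degree, and compare what remains of the giant component.

```python
import random

def ba_adj(n, m=2, seed=3):
    """Minimal preferential-attachment graph, as a toy heterogeneous network."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    endpoints = []  # one entry per incident edge: degree-proportional sampling
    for v in range(m, n):
        targets = set(range(m)) if not endpoints else set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            adj[v].add(t)
            adj[t].add(v)
            endpoints += [v, t]
    return adj

def giant_fraction(adj, removed):
    """Fraction of all nodes left in the largest component after removals."""
    alive = set(adj) - removed
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb in alive and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best / len(adj)

n = 2000
adj = ba_adj(n)
k = n // 10  # knock out 10% of the nodes
random_hit = set(random.Random(7).sample(sorted(adj), k))
hubs = set(sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k])
rand_frac = giant_frac = giant_fraction(adj, random_hit)
hub_frac = giant_fraction(adj, hubs)
print(f"after random failures: {rand_frac:.2f}; after targeted attack: {hub_frac:.2f}")
```

Random failures barely dent the giant component; removing the same number of hubs shrinks it far more, which is the point of the quoted passage.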

“Studies of errors and attacks have shown that hubs keep different parts of a network connected. This implies that they also act as bridges for spreading diseases. Their numerous ties put them in contact with both infected and healthy individuals: so hubs become easily infected, and they infect other nodes easily. […] The vulnerability of heterogeneous networks to epidemics is bad news, but understanding it can provide good ideas for containing diseases. […] if we can immunize just a fraction, it is not a good idea to choose people at random. Most of the times, choosing at random implies selecting individuals with a relatively low number of connections. Even if they block the disease from spreading in their surroundings, hubs will always be there to put it back into circulation. A much better strategy would be to target hubs. Immunizing hubs is like deleting them from the network, and the studies on targeted attacks show that eliminating a small fraction of hubs fragments the network: thus, the disease will be confined to a few isolated components. […] in the epidemic spread of sexually transmitted diseases the timing of the links is crucial. Establishing an unprotected link with a person before they establish an unprotected link with another person who is infected is not the same as doing so afterwards.”
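
A toy spread model makes the immunization point numerically (again my own sketch, with made-up parameters): immunize 10 per cent of a heterogeneous network either at random or by highest degree, then let an infection percolate from a random seed.

```python
import random

def ba_adj(n, m=2, seed=5):
    """Minimal preferential-attachment graph, as a toy contact network."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    endpoints = []
    for v in range(m, n):
        targets = set(range(m)) if not endpoints else set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            adj[v].add(t)
            adj[t].add(v)
            endpoints += [v, t]
    return adj

def outbreak(adj, immune, p, rng):
    """Spread from a random susceptible seed; each contact transmits with prob. p."""
    start = rng.choice([v for v in adj if v not in immune])
    infected, frontier = {start}, [start]
    while frontier:
        nxt = []
        for v in frontier:
            for nb in adj[v]:
                if nb not in infected and nb not in immune and rng.random() < p:
                    infected.add(nb)
                    nxt.append(nb)
        frontier = nxt
    return len(infected)

adj = ba_adj(1000)
k = 100  # doses for 10% of the population
hubs = set(sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k])
randoms = set(random.Random(11).sample(sorted(adj), k))
runs = 30
rand_sizes = [outbreak(adj, randoms, 0.5, random.Random(100 + i)) for i in range(runs)]
hub_sizes = [outbreak(adj, hubs, 0.5, random.Random(100 + i)) for i in range(runs)]
print(f"mean outbreak size: random immunization {sum(rand_sizes) / runs:.0f}, "
      f"hub immunization {sum(hub_sizes) / runs:.0f}")
```

Immunizing hubs fragments the contact network, so the average outbreak is far smaller than under random immunization, exactly as the targeted-attack results would lead one to expect.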

## Sieve methods: what are they, and what are they good for?

…

Given the nature of the lecture it was difficult to come up with relevant links to include in this post, but these seemed relevant enough to include them here:

Sieve theory.

Inclusion–exclusion principle.

Fundamental lemma of sieve theory.

Parity problem (sieve theory).

Viggo Brun (the lecturer mentions along the way that many of the things he talks about in this lecture are things this guy figured out, but the wiki article is unfortunately very short).

As he notes early on, when working with sieves we’re: “Interested in objects which are output of some inclusion-exclusion process” and “Rather than counting precisely, we want to gain good bounds, but work flexibly.”

‘Counting’ should probably be interpreted loosely here, in the general scheme of things; sieves are mostly used in number theory, but as Maynard mentions, similar methods can presumably be used in other mathematical contexts – hence the deliberate use of the word ‘objects’. It seems to be all about trying to ascertain some properties of some objects/sets/whatever, without necessarily imposing much structure (‘are we within the right order of magnitude?’ rather than ‘did we get them all?’). The basic idea behind restricting the amount of structure imposed is, as far as I gathered from the lecture, to make the problem you’re faced with more tractable.
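
To make the inclusion-exclusion connection concrete, here is a small Legendre-style toy of my own (not from the lecture): count the integers up to N that survive ‘sifting out’ the multiples of all primes up to √N, by inclusion-exclusion over squarefree products of those primes.

```python
from itertools import combinations
from math import isqrt, prod

def small_primes(limit):
    """Primes up to `limit`, via the classic sieve of Eratosthenes."""
    is_p = [True] * (limit + 1)
    is_p[0:2] = [False, False]
    for i in range(2, isqrt(limit) + 1):
        if is_p[i]:
            is_p[i * i :: i] = [False] * len(is_p[i * i :: i])
    return [i for i, ok in enumerate(is_p) if ok]

def legendre_count(n):
    """Integers in [1, n] with no prime factor <= sqrt(n), counted by
    inclusion-exclusion: sum over squarefree d of mu(d) * floor(n / d)."""
    ps = small_primes(isqrt(n))
    total = 0
    for k in range(len(ps) + 1):
        for combo in combinations(ps, k):
            total += (-1) ** k * (n // prod(combo))
    return total

# survivors up to 100 are 1 together with the 21 primes between 10 and 100
print(legendre_count(100))  # 22
```

The catch – and a reason actual sieve theory exists – is the number of terms: with all the primes up to √N there are 2^π(√N) of them, so for good bounds one truncates the inclusion-exclusion (in the spirit of Brun) and accepts inequalities instead of exact counts.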

## Some things you need to know about machine learning but didn’t know whom to ask (the grad school version)

…

Some links to stuff related to the lecture’s coverage:

An overview of gradient descent optimization algorithms.

Rectifier (neural networks) [Relu].

Backpropagation.

Escaping From Saddle Points – Online Stochastic Gradient for Tensor Decomposition (Ge et al.).

How to Escape Saddle Points Efficiently (closely related to the paper above, presumably one of the ‘recent improvements’ mentioned in the lecture).

Linear classifier.

Concentration inequality.

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks (Neyshabur et al.).

Off the convex path (the lecturer’s blog).
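
Related to the saddle-point papers linked above, a toy illustration of my own (not from the lecture): on f(x, y) = x² - y² the origin is a saddle; plain gradient descent started exactly on the x-axis converges straight to it, while an arbitrarily small random perturbation of the gradient is enough to escape.

```python
import random

def grad(x, y):
    """Gradient of f(x, y) = x^2 - y^2, which has a saddle at the origin."""
    return 2 * x, -2 * y

def descend(x, y, steps=200, lr=0.1, noise=0.0, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        gx, gy = grad(x, y)
        x -= lr * (gx + noise * rng.gauss(0, 1))
        y -= lr * (gy + noise * rng.gauss(0, 1))
    return x, y

x0, y0 = descend(1.0, 0.0)              # no noise: converges to the saddle
x1, y1 = descend(1.0, 0.0, noise=1e-3)  # tiny noise: y gets pushed off the axis
print(f"plain GD ends at ({x0:.2g}, {y0:.2g}); with noise, |y| = {abs(y1):.1g}")
```

Once perturbed, the y-coordinate grows geometrically along the descent direction, i.e. the iterate leaves the saddle – the basic phenomenon analysed (far more carefully) in the Ge et al. paper above.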

## Complexity

Complexity theory is a topic I’ve previously been exposed to through various channels; examples include Institute for Advanced Study comp sci lectures, notes included in a few computer science-related books like Louridas and Dasgupta, and probably also e.g. some of the systems analysis/-science books I’ve read – Konieczny *et al.*’s text, which I recently finished reading, is another example of a book which peripherally covers content also covered in this book. Holland’s book pretty much doesn’t cover *computational* complexity theory at all, but some knowledge of computer science will probably still be useful as e.g. concepts from graph theory are touched upon/applied in the coverage; I am also aware that I derived some benefit while reading this book from having previously spent time on signalling models in microeconomics, as there were conceptual similarities between those models and their properties and some of the stuff Holland includes. I’m not really sure if you need to know ‘anything’ to read the book and get something out of it, but although Holland doesn’t use much mathematical formalism, some of the ‘hidden’ formalism lurking in the background will probably not be easy to understand if you e.g. haven’t seen a mathematical equation since the 9th grade, and people who e.g. have seen hierarchical models before will definitely have a greater appreciation of some of the material covered than people who have not. Obviously I’ve read a lot of stuff over time that made the book easier for me to read and understand than it otherwise would have been, but how easy would the book have been for me to read if I hadn’t read those other things? It’s really difficult for me to say. I found the book hard to judge/rate/evaluate, so I decided against rating it on goodreads.

Below I have added some quotes from the book.

…

“[C]omplex systems exhibit a distinctive property called *emergence*, roughly described by the common phrase ‘the action of the whole is more than the sum of the actions of the parts’. In addition to complex systems, there is a subfield of computer science, called *computational complexity*, which concerns itself with the difficulty of solving different kinds of problems. […] The object of the computational complexity subfield is to assign levels of difficulty — levels of complexity — to different collections of problems. There are intriguing conjectures about these levels of complexity, but an understanding of the theoretical framework requires a substantial background in theoretical computer science — enough to fill an entire book in this series. For this reason, and because computational complexity does not touch upon emergence, I will confine this book to *systems* and the ways in which they exhibit *emergence*. […] *emergent behaviour* is an essential requirement for calling a system ‘complex’. […] Hierarchical organization is […] closely tied to emergence. Each level of a hierarchy typically is governed by its own set of laws. For example, the laws of the periodic table govern the combination of hydrogen and oxygen to form H_{2}O molecules, while the laws of fluid flow (such as the Navier-Stokes equations) govern the behaviour of water. The laws of a new level must not violate the laws of earlier levels — that is, the laws at lower levels constrain the laws at higher levels. […] Restated for complex systems: emergent properties at any level must be consistent with interactions specified at the lower level(s). […] Much of the motivation for treating a system as complex is to get at questions that would otherwise remain inaccessible. Often the first steps in acquiring a deeper understanding are through comparisons of similar systems. 
By treating hierarchical organization as *sine qua non* for complexity we focus on the interactions of emergent properties at various levels. The combination of ‘top–down’ effects (as when the daily market average affects actions of the buyers and sellers in an equities market) and ‘bottom–up’ effects (the interactions of the buyers and sellers determine the market average) is a pervasive feature of complex systems. The present exposition, then, centres on complex systems where emergence, and the reduction(s) involved, offer a key to new kinds of understanding.”

“As the field of complexity studies has developed, it has split into two subfields that examine two different kinds of emergence: the study of *complex physical systems* (CPS) and the study of *complex adaptive systems* (CAS): The study of complex physical systems focuses on geometric (often lattice-like) arrays of elements, in which interactions typically depend only on effects propagated from nearest neighbours. […] the study of CPS has a distinctive set of tools and questions centring on elements that have fixed properties – atoms, the squares of the cellular automaton, and the like. […] The tools used for studying CPS come, with rare exceptions, from a well-developed part of mathematics, the theory of partial differential equations […] CAS studies, in contrast to CPS studies, concern themselves with elements that are not fixed. The elements, usually called *agents*, learn or adapt in response to interactions with other agents. […] It is unusual for CAS agents to converge, even momentarily, to a single ‘optimal’ strategy, or to an equilibrium. As the agents adapt to each other, new agents with new strategies usually emerge. Then each new agent offers opportunities for still further interactions, increasing the overall complexity. […] The complex feedback loops that form make it difficult to analyse, or even describe, CAS. […] Analysis of complex systems almost always turns on finding recurrent patterns in the system’s ever-changing configurations. […] *perpetual novelty*, produced with a limited number of rules or laws, is a characteristic of most complex systems: DNA consists of strings of the same four nucleotides, yet no two humans are exactly alike; the theorems of Euclidean geometry are based on just five axioms, yet new theorems are still being derived after two millennia; and so it is for the other complex systems.”

“In a typical physical system the whole is (at least approximately) the sum of the parts, making the use of PDEs straightforward for a mathematician, but in a typical generated system the parts are put together in an interconnected, non-additive way. It is possible to write a concise set of partial differential equations to describe the basic elements of a computer, say an interconnected set of binary counters, but the existing theory of PDEs does little to increase our understanding of the circuits so-described. The formal grammar approach, in contrast, has already considerably increased our understanding of computer languages and programs. One of the major tasks of this book is to use a formal grammar to convert common features of complex systems into ‘stylized facts’ that can be examined carefully within the grammar.”

“Many CPS problems (e.g. the flow of electrons in superconductive materials) […] involve flows — flows that are nicely described by networks. Networks provide a detailed snapshot of CPS and complex adaptive systems (CAS) interactions at any given point in their development, but there are few studies of the evolution of networks […]. The distinction between the fast dynamic of flows (change of state) and the slow dynamic of adaptation (change of the network of interactions) often distinguishes CPS studies from CAS studies. […] all well-studied CAS exhibit *lever points*, points where a small directed action causes large predictable changes in aggregate behaviour, as when a vaccine produces long-term changes in an immune system. At present, lever points are almost always located by trial and error. However, by extracting mechanisms common to different lever points, a relevant CAS theory would provide a principled way of locating and testing lever points. […] activities that are easy to observe in one complex system often suggest ‘where to look’ in other complex systems where the activities are difficult to observe.”

“Observation shows that agents acting in a niche continually undergo ‘improvements’, without ever completely outcompeting other agents in the community. These improvements may come about in either of two ways: (i) an agent may become more of a generalist, processing resources from a wider variety of sources, or (ii) it may become more specialized, becoming more efficient than its competitors at exploiting a particular source of a vital resource. Both changes allow for still more interactions and still greater diversity. […] All CAS that have been examined closely exhibit trends toward increasing numbers of specialists.”

“Emergence is tightly tied to the formation of boundaries. These boundaries can arise from symmetry breaking, […] or they can arise by assembly of component building blocks […]. For CAS, the agent-defining boundaries determine the interactions between agents. […] Adaptation, and the emergence of new kinds of agents, then arises from changes in the relevant boundaries. Typically, a boundary only looks to a small segment of a signal, a tag, to determine whether or not the signal can pass through the boundary. […] an agent can be modelled by a set of conditional IF/THEN rules that represent both the effects of boundaries and internal signal-processing. Because tags are short, a given signal may carry multiple tags, and the rules that process signals can require the presence of more than one tag for the processing to proceed. Agents are *parallel processors* in the sense that all rules that are satisfied simultaneously in the agent are executed simultaneously. As a result, the interior of an agent will usually be filled with multiple signals […]. The central role of tags in routing signals through this complex interior puts emphasis on the mechanisms for tag modification as a means of adaptation. Recombination of extant conditions and signals […] turns tags into building blocks for specifying new routes. Parallel processing then makes it possible to test new routes so formed without seriously disrupting extant useful routes. Sophisticated agents have another means of adaptation: anticipation (‘lookahead’). If an agent has a set of rules that simulates part of its world, then it can run this internal model to examine the outcomes of different action sequences *before* those actions are executed.”
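
As a toy rendering of the rule-and-tag idea (my own sketch, not Holland’s formalism): signals are just sets of tags, and in each step every rule whose tag-condition is satisfied fires in parallel.

```python
def step(signals, rules):
    """One parallel step: every rule whose required tags are contained in
    some current signal fires, emitting its output signal."""
    return [emitted for required, emitted in rules
            if any(required <= sig for sig in signals)]

# a rule is (required tags, emitted signal); signals are sets of short tags
rules = [
    ({"food", "near"}, frozenset({"approach"})),
    ({"threat"},       frozenset({"flee"})),
    ({"approach"},     frozenset({"eat"})),
]
signals = [frozenset({"food", "near", "ripe"})]  # extra tags are simply ignored
signals = step(signals, rules)  # the first rule fires
signals = step(signals, rules)  # its output now satisfies the third rule
print(signals)  # [frozenset({'eat'})]
```

Even this caricature shows the routing role of tags: the ‘ripe’ tag is irrelevant to every condition and is ignored, while the ‘approach’ signal emitted by one rule becomes the input that triggers another.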

“The flow of signals within and between agents can be represented by a directed network, where nodes represent rules, and there is a connection from node x to node y if rule x sends a signal satisfying a condition of rule y. Then, the flow of signals over this network spells out the performance of the agent at a point in time. […] The networks associated with CAS are typically highly tangled, with many loops providing feedback and recirculation […]. An agent adapts by changing its signal-processing rules, with corresponding changes in the structure of the associated network. […] Most machine-learning models, including ‘artificial neural networks’ and ‘Bayesian networks’, lack feedback cycles — they are often called ‘feedforward networks’ (in contrast to networks with substantial feedback). In the terms used in Chapter 4, such networks have no ‘recirculation’ and hence have no autonomous subsystems. Networks with substantial numbers of cycles are difficult to analyse, but a large number of cycles is the essential requirement for the autonomous internal models that make lookahead and planning possible. […] The complexities introduced by loops have so far resisted most attempts at analysis. […] The difficulties of analysing the behaviour of networks with many interior loops have, both historically and currently, encouraged the study of networks without loops called *trees*. Trees occur naturally in the study of games. […] because trees are easier to analyse, most artificial neural networks constructed for pattern recognition are trees. […] *Evolutionary game theory* makes use of the tree structure of games to study the ways in which agents can modify their strategies as they interact with other agents playing the same game. […] However, evolutionary game theory does not concern itself with the evolution of the game’s laws.”
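
Structurally, the feedforward-versus-feedback distinction drawn here is just the presence or absence of directed cycles, which a depth-first search detects. A quick sketch of my own (the two toy ‘networks’ are made up):

```python
def has_cycle(graph):
    """Detect a directed cycle via DFS; graph: dict node -> successor list."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on current path / done
    colour = {v: WHITE for v in graph}

    def visit(v):
        colour[v] = GREY
        for w in graph.get(v, []):
            if colour.get(w, WHITE) == GREY:
                return True  # back edge: a feedback loop
            if colour.get(w, WHITE) == WHITE and visit(w):
                return True
        colour[v] = BLACK
        return False

    return any(colour[v] == WHITE and visit(v) for v in graph)

feedforward = {"in": ["h1", "h2"], "h1": ["out"], "h2": ["out"], "out": []}
recurrent   = {"in": ["h"], "h": ["out", "h2"], "h2": ["h"], "out": []}
print(has_cycle(feedforward), has_cycle(recurrent))  # False True
```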

“It has been observed that innovation in CAS is mostly a matter of combining well-known components in new ways. […] Recombination abets the formation of new cascades. […] By extracting general mechanisms that modify CAS, such as recombination, we go from examination of particular instances to a unified study of characteristic CAS properties. The mechanisms of interest act mainly on extant substructures, using them as building blocks for more complex substructures […]. Because signals and boundaries are a pervasive feature of CAS, their modification has a central role in this adaptive process.”

## Random stuff

I have almost stopped posting posts like these, which has resulted in the accumulation of a very large number of links and studies which I figured I might like to blog at some point. This post is mainly an attempt to deal with the backlog – I won’t cover the material in too much detail.

…

i. Do Bullies Have More Sex? The answer seems to be a qualified yes. A few quotes:

“Sexual behavior during adolescence is fairly widespread in Western cultures (Zimmer-Gembeck and Helfland 2008) with nearly two thirds of youth having had sexual intercourse by the age of 19 (Finer and Philbin 2013). […] Bullying behavior may aid in intrasexual competition and intersexual selection as a strategy when competing for mates. In line with this contention, bullying has been linked to having a higher number of dating and sexual partners (Dane et al. 2017; Volk et al. 2015). This may be one reason why adolescence coincides with a peak in antisocial or aggressive behaviors, such as bullying (Volk et al. 2006). However, not all adolescents benefit from bullying. Instead, bullying may only benefit adolescents with certain personality traits who are willing and able to leverage bullying as a strategy for engaging in sexual behavior with opposite-sex peers. Therefore, we used two independent cross-sectional samples of older and younger adolescents to determine which personality traits, if any, are associated with leveraging bullying into opportunities for sexual behavior.”

“…bullying by males signal[s] the ability to provide good genes, material resources, and protect offspring (Buss and Shackelford 1997; Volk et al. 2012) because bullying others is a way of displaying attractive qualities such as strength and dominance (Gallup et al. 2007; Reijntjes et al. 2013). As a result, this makes bullies attractive sexual partners to opposite-sex peers while simultaneously suppressing the sexual success of same-sex rivals (Gallup et al. 2011; Koh and Wong 2015; Zimmer-Gembeck et al. 2001). Females may denigrate other females, targeting their appearance and sexual promiscuity (Leenaars et al. 2008; Vaillancourt 2013), which are two qualities relating to male mate preferences. Consequently, derogating these qualities lowers a rival’s appeal as a mate and also intimidates or coerces rivals into withdrawing from intrasexual competition (Campbell 2013; Dane et al. 2017; Fisher and Cox 2009; Vaillancourt 2013). Thus, males may use direct forms of bullying (e.g., physical, verbal) to facilitate intersexual selection (i.e., appear attractive to females), while females may use relational bullying to facilitate intrasexual competition, by making rivals appear less attractive to males.”

The study relies on the use of self-report data, which I find very problematic – so I won’t go into the results here. I’m not quite clear on how those studies mentioned in the discussion ‘have found self-report data [to be] valid under conditions of confidentiality’ – and I remain skeptical. You’ll usually want data from independent observers (e.g. teacher or peer observations) when analyzing these kinds of things. Note in the context of the self-report data problem that if there’s a strong stigma associated with being bullied (there often is, or bullying wouldn’t work as well), asking people if they have been bullied is not much better than asking people if they’re bullying others.

…

ii. Some topical advice that some people might soon regret not having followed, from the wonderful Things I Learn From My Patients thread:

“If you are a teenage boy experimenting with fireworks, do not empty the gunpowder from a dozen fireworks and try to mix it in your mother’s blender. But if you do decide to do that, don’t hold the lid down with your other hand and stand right over it. This will result in the traumatic amputation of several fingers, burned and skinned forearms, glass shrapnel in your face, and a couple of badly scratched corneas as a start. You will spend months in rehab and never be able to use your left hand again.”

…

iii. I haven’t talked about the AlphaZero-Stockfish match, but I was of course aware of it and did read a bit about that stuff. Here’s a reddit thread where one of the Stockfish programmers answers questions about the match. A few quotes:

## The history of astronomy

It’s been a while since I read this book, and I was for a while strongly considering not blogging it at all. In the end I figured I ought to cover it after all in at least a little bit of detail, though when I made the decision to cover the book here I also decided not to cover it in nearly as much detail as I usually cover the books in this series.

Below some random observations from the book which I found sufficiently interesting to add here.

…

“The *Almagest* is a magisterial work that provided geometrical models and related tables by which the movements of the Sun, Moon, and the five lesser planets could be calculated for the indefinite future. […] Its catalogue contains over 1,000 fixed stars arranged in 48 constellations, giving the longitude, latitude, and apparent brightness of each. […] the *Almagest* would dominate astronomy like a colossus for 14 centuries […] In the universities of the later Middle Ages, students would be taught Aristotle in philosophy and a simplified Ptolemy in astronomy. From Aristotle they would learn the basic truth that the heavens rotate uniformly about the central Earth. From the simplified Ptolemy they would learn of epicycles and eccentrics that violated this basic truth by generating orbits whose centre was not the Earth; and those expert enough to penetrate deeper into the Ptolemaic models would encounter equant theories that violated the (yet more basic) truth that heavenly motion is uniform. […] with the models of the *Almagest* – whose parameters would be refined over the centuries to come – the astronomer, and the astrologer, could compute the future positions of the planets with economy and reasonable accuracy. There were anomalies – the Moon, for example, would vary its apparent size dramatically in the Ptolemaic model but does not do so in reality, and Venus and Mercury were kept close to the Sun in the sky by a crude *ad hoc* device – but as a geometrical compendium of how to grind out planetary tables, the *Almagest* worked, and that was what mattered.”

“The revival of astronomy – and astrology – among the Latins was stimulated around the end of the first millennium when the astrolabe entered the West from Islamic Spain. Astrology in those days had a [‘]rational[‘] basis rooted in the Aristotelian analogy between the microcosm – the individual living body – and the macrocosm, the cosmos as a whole. Medical students were taught how to track the planets, so that they would know when the time was favourable for treating the corresponding organs in their patients.” [*Aaargh!* *– US*]

“The invention of printing in the 15th century had many consequences, none more significant than the stimulus it gave to the mathematical sciences. All scribes, being human, made occasional errors in preparing a copy of a manuscript. These errors would often be transmitted to copies of the copy. But if the works were literary and the later copyists attended to the meaning of the text, they might recognize and correct many of the errors introduced by their predecessors. Such control could rarely be exercised by copyists required to reproduce texts with significant numbers of mathematical symbols. As a result, a formidable challenge faced the medieval student of a mathematical or astronomical treatise, for it was available to him only in a manuscript copy that had inevitably become corrupt in transmission. After the introduction of printing, all this changed.”

“Copernicus, like his predecessors, had been content to work with observations handed down from the past, making new ones only when unavoidable and using instruments that left much to be desired. Tycho [Brahe], whose work marks the watershed between observational astronomy ancient and modern, saw accuracy of observation as the foundation of all good theorizing. He dreamed of having an observatory where he could pursue the research and development of precision instrumentation, and where a skilled team of assistants would test the instruments even as they were compiling a treasury of observations. Exploiting his contacts at the highest level, Tycho persuaded King Frederick II of Denmark to grant him the fiefdom of the island of Hven, and there, between 1576 and 1580, he constructed Uraniborg (‘Heavenly Castle’), the first scientific research institution of the modern era. […] Tycho was the first of the modern observers, and in his catalogue of 777 stars the positions of the brightest are accurate to a minute or so of arc; but he himself was probably most proud of his cosmology, which Galileo was not alone in seeing as a retrograde compromise. Tycho appreciated the advantages of heliocentric planetary models, but he was also conscious of the objections […]. In particular, his inability to detect annual parallax even with his superb instrumentation implied that the Copernican excuse, that the stars were too far away for annual parallax to be detected, was now implausible in the extreme. The stars, he calculated, would have to be at least 700 times further away than Saturn for him to have failed for this reason, and such a vast, purposeless empty space between the planets and the stars made no sense. He therefore looked for a cosmology that would have the geometrical advantages of the heliocentric models but would retain the Earth as the body physically at rest at the centre of the cosmos. 
The solution seems obvious in hindsight: make the Sun (and Moon) orbit the central Earth, and make the five planets into satellites of the Sun.”

“Until the invention of the telescope, each generation of astronomers had looked at much the same sky as their predecessors. If they knew more, it was chiefly because they had more books to read, more records to mine. […] Galileo could say of his predecessors, ‘If they had seen what we see, they would have judged as we judge’; and ever since his time, the astronomers of each generation have had an automatic advantage over their predecessors, because they possess apparatus that allows them access to objects unseen, unknown, and therefore unstudied in the past. […] astronomers [for a long time] found themselves in a situation where, as telescopes improved, the two coordinates of a star’s position on the heavenly sphere were being measured with ever increasing accuracy, whereas little was known of the star’s third coordinate, distance, except that its scale was enormous. Even the assumption that the nearest stars were the brightest was […*rightly, US*] being called into question, as the number of known proper motions increased and it emerged that not all the fastest-moving stars were bright.”

“We know little of how Newton’s thinking developed between 1679 and the visit from Halley in 1684, except for a confused exchange of letters between Newton and the Astronomer Royal, John Flamsteed […] the visit from the suitably deferential and tactful Halley encouraged Newton to promise him written proof that elliptical orbits would result from an inverse-square force of attraction residing in the Sun. The drafts grew and grew, and eventually resulted in *The Mathematical Principles of Natural Philosophy* (1687), better known in its abbreviated Latin title of the *Principia*. […] All three of Kepler’s laws (the second in ‘area’ form), which had been derived by their author from observations, with the help of a highly dubious dynamics, were now shown to be consequences of rectilinear motion under an inverse-square force. […] As the drafts of *Principia* multiplied, so too did the number of phenomena that at last found their explanation. The tides resulted from the difference between the effects on the land and on the seas of the attraction of Sun and Moon. The spinning Earth bulged at the equator and was flattened at the poles, and so was not strictly spherical; as a result, the attraction of Sun and Moon caused the Earth’s axis to wobble and so generated the precession of the equinoxes first noticed by Hipparchus. […] Newton was able to use the observed motions of the moons of Earth, Jupiter, and Saturn to calculate the masses of the parent planets, and he found that Jupiter and Saturn were huge compared to Earth – and, in all probability, to Mercury, Venus, and Mars.”

## Quotes

i. “The party that negotiates in haste is often at a disadvantage.” (Howard Raiffa)

ii. “Advice: don’t embarrass your bargaining partner by forcing him or her to make all the concessions.” (-ll-)

iii. “Disputants often fare poorly when they each act greedily and deceptively.” (-ll-)

iv. “Each man does seek his own interest, but, unfortunately, not according to the dictates of reason.” (Kenneth Waltz)

v. “Whatever is said after I’m gone is irrelevant.” (Jimmy Savile)

vi. “Trust is an important lubricant of a social system. It is extremely efficient; it saves a lot of trouble to have a fair degree of reliance on other people’s word. Unfortunately this is not a commodity which can be bought very easily. If you have to buy it, you already have some doubts about what you have bought.” (Kenneth Arrow)

vii. “… an author never does more damage to his readers than when he hides a difficulty.” (Évariste Galois)

viii. “A technical argument by a trusted author, which is hard to check and looks similar to arguments known to be correct, is hardly ever checked in detail” (Vladimir Voevodsky)

ix. “Suppose you want to teach the “cat” concept to a very young child. Do you explain that a cat is a relatively small, primarily carnivorous mammal with retractible claws, a distinctive sonic output, etc.? I’ll bet not. You probably show the kid a lot of different cats, saying “kitty” each time, until it gets the idea. To put it more generally, generalizations are best made by abstraction from experience. They should come one at a time; too many at once overload the circuits.” (Ralph P. Boas Jr.)

x. “Every author has several motivations for writing, and authors of technical books always have, as one motivation, the personal need to understand; that is, they write because they want to learn, or to understand a phenomenon, or to think through a set of ideas.” (Albert Wymore)

xi. “Great mathematics is achieved by solving difficult problems not by fabricating elaborate theories in search of a problem.” (Harold Davenport)

xii. “Is science really gaining in its assault on the totality of the unsolved? As science learns one answer, it is characteristically true that it also learns several new questions. It is as though science were working in a great forest of ignorance, making an ever larger circular clearing within which, not to insist on the pun, things are clear… But as that circle becomes larger and larger, the circumference of contact with ignorance also gets longer and longer. Science learns more and more. But there is an ultimate sense in which it does not gain; for the volume of the appreciated but not understood keeps getting larger. We keep, in science, getting a more and more sophisticated view of our essential ignorance.” (Warren Weaver)

xiii. “When things get too complicated, it sometimes makes sense to stop and wonder: Have I asked the right question?” (Enrico Bombieri)

xiv. “The mean and variance are unambiguously determined by the distribution, but a distribution is, of course, not determined by its mean and variance: A number of different distributions have the same mean and the same variance.” (Richard von Mises)
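Von Mises’ point is easy to verify concretely. As a sketch (the distributions below are my own illustrative choices, not from the quote): a fair coin on {-1, +1} and a three-point distribution on {-√2, 0, √2} both have mean 0 and variance 1, yet they are plainly different distributions.

```python
def mean_var(values, probs):
    """Exact mean and variance of a finite discrete distribution."""
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    return mean, var

# A fair coin on {-1, +1}: mean 0, variance 1.
m1, v1 = mean_var([-1.0, 1.0], [0.5, 0.5])

# A different distribution, on {-sqrt(2), 0, sqrt(2)} with
# probabilities {1/4, 1/2, 1/4}: also mean 0, variance 1
# (up to floating-point rounding).
m2, v2 = mean_var([-2**0.5, 0.0, 2**0.5], [0.25, 0.5, 0.25])

print(m1, v1)
print(m2, v2)
```

Matching higher moments distinguishes them: the fourth moment is 1 for the coin but 2 for the three-point distribution, which is why two moments alone can never pin down a distribution.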

xv. “Algorithms existed for at least five thousand years, but people did not know that they were algorithmizing. Then came Turing (and Post and Church and Markov and others) and formalized the notion.” (Doron Zeilberger)

xvi. “When a problem seems intractable, it is often a good idea to try to study “toy” versions of it in the hope that as the toys become increasingly larger and more sophisticated, they would metamorphose, in the limit, to the *real thing*.” (-ll-)

xvii. “The kind of mathematics foisted on children in schools is not meaningful, fun, or even very useful. This does not mean that an individual child cannot turn it into a valuable and enjoyable personal game. For some the game is scoring grades; for others it is outwitting the teacher and the system. For many, school math is enjoyable in its repetitiveness, precisely because it is so mindless and dissociated that it provides a shelter from having to think about what is going on in the classroom. But all this proves is the ingenuity of children. It is not a justification for school math to say that *despite* its intrinsic dullness, inventive children can find excitement and meaning in it.” (Seymour Papert)

xviii. “The optimist believes that this is the best of all possible worlds, and the pessimist fears that this might be the case.” (Ivar Ekeland)

xix. “An equilibrium is not always an optimum; it might not even be good. This may be the most important discovery of game theory.” (-ll-)

xx. “It’s not all that rare for people to suffer from a self-hating monologue. Any good theories about what’s going on there?”

“If there’s things you don’t like about your life, you can blame yourself, or you can blame others. If you blame others and you’re of low status, you’ll be told to cut that out and start blaming yourself. If you blame yourself and you can’t solve the problems, self-hate is the result.” (Nancy Lebovitz & ‘The Nybbler’)

## Interactive Coding with “Optimal” Round and Communication Blowup

…

The YouTube description of this one was rather longer than usual, and I decided to quote it in full below:

“The problem of constructing error-resilient interactive protocols was introduced in the seminal works of Schulman (FOCS 1992, STOC 1993). These works show how to convert any two-party interactive protocol into one that is resilient to constant-fraction of error, while blowing up the communication by only a constant factor. Since these seminal works, there have been many follow-up works which improve the error rate, the communication rate, and the computational efficiency. All these works assume that in the underlying protocol, in each round, each party sends a *single* bit. This assumption is without loss of generality, since one can efficiently convert any protocol into one which sends one bit per round. However, this conversion may cause a substantial increase in *round* complexity, which is what we wish to minimize in this work. Moreover, all previous works assume that the communication complexity of the underlying protocol is *fixed* and a priori known, an assumption that we wish to remove. In this work, we consider protocols whose messages may be of *arbitrary* lengths, and where the length of each message and the length of the protocol may be *adaptive*, and may depend on the private inputs of the parties and on previous communication. We show how to efficiently convert any such protocol into another protocol with comparable efficiency guarantees, that is resilient to constant fraction of adversarial error, while blowing up both the *communication* complexity and the *round* complexity by at most a constant factor. Moreover, as opposed to most previous work, our error model not only allows the adversary to toggle with the corrupted bits, but also allows the adversary to *insert* and *delete* bits. In addition, our transformation preserves the computational efficiency of the protocol. Finally, we try to minimize the blowup parameters, and give evidence that our parameters are nearly optimal. This is joint work with Klim Efremenko and Elad Haramaty.”

…

A few links to stuff covered/mentioned in the lecture:

Coding for interactive communication correcting insertions and deletions.

Efficiently decodable insertion/deletion codes for high-noise and high-rate regimes.

Common reference string model.

Small-bias probability spaces: Efficient constructions and applications.

Interactive Channel Capacity Revisited.

Collision (computer science).

Chernoff bound.

## Quantifying tumor evolution through spatial computational modeling

…

Two general remarks: 1. She talks *very* fast, in my opinion unpleasantly fast – the lecture would have been at least slightly easier to follow if she’d slowed down a little. 2. A few of the lectures uploaded in this lecture series (from the IAS Mathematical Methods in Cancer Evolution and Heterogeneity Workshop) seem to have some sound issues; in this lecture there are multiple 1-2 second-long ‘chunks’ where the sound drops out and some words are lost. This is really annoying, and a similar problem (which was likely ‘the same problem’) previously led me to quit another lecture in the series; however in this case I decided to give it a shot anyway, and I actually think it’s not a big deal; the sound losses are very short in duration, and usually no more than one or two words are lost, so you can usually figure out what was said. During this lecture there were incidentally also some issues with the monitor roughly 27 minutes in, but this isn’t a big deal as no information was lost and unlike the people who originally attended the lecture you can just skip ahead approximately one minute (that was how long it took to solve that problem).

A few relevant links to stuff she talks about in the lecture:

A Big Bang model of human colorectal tumor growth.

Approximate Bayesian computation.

Site frequency spectrum.

Identification of neutral tumor evolution across cancer types.

Using tumour phylogenetics to identify the roots of metastasis in humans.