Imitation Games – Avi Wigderson
…
If you wish to skip the introduction the talk starts at 5.20. The talk itself lasts roughly an hour, with the last ca. 20 minutes devoted to Q&A – that part is worth watching as well.
Some links related to the talk below:
Theory of computation.
Turing test.
COMPUTING MACHINERY AND INTELLIGENCE.
Probabilistic encryption & how to play mental poker keeping secret all partial information GoldwasserMicali82.
Probabilistic algorithm
How To Generate Cryptographically Strong Sequences Of PseudoRandom Bits (Blum&Micali, 1984)
Randomness extractor
Dense graph
Periodic sequence
Extremal graph theory
Szemerédi’s theorem
Green–Tao theorem
Szemerédi regularity lemma
New Proofs of the GreenTaoZiegler Dense Model Theorem: An Exposition
Calibrating Noise to Sensitivity in Private Data Analysis
Generalization in Adaptive Data Analysis and Holdout Reuse
Book: Math and Computation  Avi Wigderson
Oneway function
Latticebased cryptography
James Simons interview
James Simons. Differential geometry. Minimal varieties in riemannian manifolds. ShiingShen Chern. Characteristic Forms and Geometric Invariants. Renaissance Technologies.
“That’s really what’s great about basic science and in this case mathematics, I mean, I didn’t know any physics. It didn’t occur to me that this material, that Chern and I had developed would find use somewhere else altogether. This happens in basic science all the time that one guy’s discovery leads to someone else’s invention and leads to another guy’s machine or whatever it is. Basic science is the seed corn of our knowledge of the world. …I loved the subject, but I liked it for itself, I wasn’t thinking of applications. […] the government’s not doing such a good job at supporting basic science and so there’s a role for philanthropy, an increasingly important role for philanthropy.”
“My algorithm has always been: You put smart people together, you give them a lot of freedom, create an atmosphere where everyone talks to everyone else. They’re not hiding in the corner with their own little thing. They talk to everybody else. And you provide the best infrastructure. The best computers and so on that people can work with and make everyone partners.”
“We don’t have enough teachers of mathematics who know it, who know the subject … and that’s for a simple reason: 3040 years ago, if you knew some mathematics, enough to teach in let’s say high school, there weren’t a million other things you could do with that knowledge. Oh yeah, maybe you could become a professor, but let’s suppose you’re not quite at that level but you’re good at math and so on.. Being a math teacher was a nice job. But today if you know that much mathematics, you can get a job at Google, you can get a job at IBM, you can get a job in Goldman Sachs, I mean there’s plenty of opportunities that are going to pay more than being a high school teacher. There weren’t so many when I was going to high school … so the quality of high school teachers in math has declined, simply because if you know enough to teach in high school you know enough to work for Google…”
Quotes
 “It is not the answer that enlightens, but the question.” (Eugène Ionesco)
 “Where would be the merit if heroes were never afraid?” (Alphonse Daudet)
 “In wartime a man is called a hero. It doesn’t make him any braver, and he runs for his life. But at least it’s a hero who is running away.” (Jean Giraudoux)
 “Love is worth whatever it costs.” (Françoise Sagan)
 “It is healthier to see the good points of others than to analyze our own bad ones.” (ll)
 “When a man has dreamed of winning something by a colossal stroke of luck, he is prone to neglect petty but more practical ways of attaining it.” (ll)
 “I find war detestable but those who praise it without participating in it even more so.” (Romain Rolland)
 “There is something sadder to lose than life – the reason for living; Sadder than to lose one’s possessions is to lose one’s hope.” (Paul Claudel)
 “This is rather as if you imagine a puddle waking up one morning and thinking, ‘This is an interesting world I find myself in — an interesting hole I find myself in — fits me rather neatly, doesn’t it? In fact it fits me staggeringly well, must have been made to have me in it!’ This is such a powerful idea that as the sun rises in the sky and the air heats up and as, gradually, the puddle gets smaller and smaller, frantically hanging on to the notion that everything’s going to be alright, because this world was meant to have him in it, was built to have him in it; so the moment he disappears catches him rather by surprise. I think this may be something we need to be on the watch out for.” (Douglas Adams)
 “The Englishman of 1750 was closer in material things to Caesar’s legionnaires than to his own greatgrandchildren.” (Walter Scheidel, Escape from Rome (The Princeton Economic History of the Western World, Princeton University Press))
 “In the Western world, […] mature male stature rose by five inches between the late eighteenth and the late twentieth centuries.” (ll)
 People exaggerate both happiness and unhappiness; we are never so fortunate nor so unfortunate as people say we are. (/On amplifie également le malheur et le bonheur, nous ne sommes jamais ni si malheureux, ni si heureux qu’on le dit.) (Honoré de Balzac)
 “When women love, they forgive everything, even our crimes; when they do not love, they cannot forgive anything, not even our virtues.” (/Lorsque les femmes nous aiment, elles nous pardonnent tout, même nos crimes; lorsqu’elles ne nous aiment pas, elles ne nous pardonnent rien, pas même nos vertus!) (ll)
 “Those who spend too fast never grow rich.” (/Qui dépense trop n’est jamais riche) (ll)
 “Numerical results of mathematical problems can be tested by comparing them to observed numbers, or to a commonsense estimate of observable numbers. […] Yet every teacher knows that students achieve incredible things in this respect. Some students are not disturbed at all when they find 16,130 ft. for the length of the boat and 8 years, 2 months for the age of the captain who is, by the way, known to be a grandfather. Such neglect of the obvious does not show necessarily stupidity but rather indifference toward artificial problems. […] [A] teacher of mathematics has a great opportunity. If he fills his allotted time with drilling his students in routine operations he kills their interest, hampers their intellectual development, and misuses his opportunity. But if he challenges the curiosity of his students by setting them problems proportionate to their knowledge, and helps them to solve their problems with stimulating questions, he may give them a taste for, and some means of, independent thinking.” (George Pólya, How to Solve It. Princeton University Press)
 “If you can’t solve a problem, then there is an easier problem you can’t solve. Find it.” (ll)
 “No idea is really bad, unless we are uncritical. What is really bad is to have no idea at all. […] in theoretical matters, the best of ideas is hurt by uncritical acceptance and thrives on critical examination.” (ll)
 “Let us sum up. Recollecting formerly solved problems with the same or a similar unknown (formerly proved theorems with the same or a similar conclusion) we have a good chance to start in the right direction and we may conceive a plan of the solution. In simple cases, which are the most frequent in less advanced classes, the most elementary problems with the same unknown (theorems with the same conclusion) are usually sufficient. Trying to recollect problems with the same unknown is an obvious and commonsense device […]. It is rather surprising that such a simple and useful device is not more widely known […] neither students nor teachers of mathematics can afford to ignore the proper use of the suggestion: Look at the unknown! And try to think of a familiar problem having the same or a similar unknown.” (ll)
 “Speaking and thinking are closely connected, the use of words assists the mind. […] choosing a suitable notation may contribute essentially to understanding the problem. […] A good notation should be unambiguous, pregnant, easy to remember; it should avoid harmful second meanings, and take advantage of useful second meanings; the order and connection of signs should suggest the order and connection of things. […] we should choose our notation carefully, and have some good reason for our choice. […] Not only the most hopeless boys in the class but also quite intelligent students may have an aversion for algebra. There is always something arbitrary and artificial about notation; to learn a new notation is a burden for the memory. The intelligent student refuses to assume the burden if he does not see any compensation for it. The intelligent student is justified in his aversion for algebra if he is not given ample opportunity to convince himself by his own experience that the language of mathematical symbols assists the mind. To help him to such experience is an important task of the teacher, one of his most important tasks.” (ll)
 “Pedantry and mastery are opposite attitudes toward rules. […] To apply a rule to the letter, rigidly, unquestioningly, in cases where it fits and in cases where it does not fit, is pedantry. Some pedants are poor fools; they never did understand the rule which they apply so conscientiously and so indiscriminately. Some pedants are quite successful; they understood their rule, at least in the beginning (before they became pedants), and chose a good one that fits in many cases and fails only occasionally. To apply a rule with natural ease, with judgment, noticing the cases where it fits, and without ever letting the words of the rule obscure the purpose of the action or the opportunities of the situation, is mastery.” (ll)
 “L’amour est un tyran qui n’épargne personne.” (/Love is a tyrant, sparing none.) (Pierre Corneille)
 “To conquer without risk is to triumph without glory.” (ll)
 “Il faut bonne mémoire après qu’on a menti.” (/It takes a good memory to keep up a lie.) (ll)
 “The immune system functions so well that most of the time we do not notice it is actually working at all. However, it is continuously active, preventing severe infection from the microorganisms which colonize our skin and our gut, and suppressing the chronic virus infections most of us picked up as infants. […] There are [even] data to suggest that mate choice (including in humans) can be driven by olfactory signals derived from […] MHC molecules — such that those with divergent MHC types are chosen, hence maximizing the number of different MHC molecules available to the offspring.” (Paul Klenerman – The Immune System: A Very Short Introduction, Oxford University Press)
 “Music expresses that which cannot be said and on which it is impossible to be silent.” (Victor Hugo)
 “Being a husband is a wholetime job. That is why so many husbands fail. They can’t give their entire attention to it.” (Arnold Bennett)
 “Journalists say a thing that they know isn’t true, in the hope that if they keep on saying it long enough it will be true.” (ll) (They are wrong, and people should really stop taking those people seriously – see part ii. here)
 “Dying is a very dull, dreary affair. And my advice to you is to have nothing whatever to do with it.” (W. Somerset Maugham)
 “People ask you for criticism, but they only want praise.” (ll)
 “Unfortunately, theories that explain everything often explain very little.” (William Bynum. The History of Medicine: A Very Short Introduction (p. 76). Oxford University Press)
 “Whatever the system of medical care, in Western societies, thirdparty arrangements are the norm in hospital payments, so large are the bills. The costs of building, heating, lighting, maintaining, equipping, and staffing these complex institutions have been an increasing concern for the past century. The guaranteeing body has been variously the state, the municipality, a religious organization, an insurance company, a charitable group, individual governors, a rich benefactor, or a combination of these. […] the drive for efficiency, and the adoption of business models, characterizes almost all modern hospitals. […] While developed nations can take the surveillance and regulations of public health for granted, or be incensed when they fail, […] the problems encountered in poorer parts of the world would not have surprised Edwin Chadwick or other advocates in 19thcentury Europe. Issues of child and maternal mortality, epidemic diseases, poverty, and poor sanitation are still with us.” (Ibid., pp. 127128, 136)
 “I used to watch a lot of news and commentary until one day I tried to tally up what I had learned during a month of it and found the quantity of facts could fit on a postage stamp.” (Zach Weiner)
 “Eppur si muove…” (Galileo Galilei)
 “Birds born in a cage think flying is an illness.” (Alejandro Jodorowsky)
Designing Fast and Robust Learning Algorithms – Yu Cheng
…
Some links related to the lecture’s coverage:
Recommender system.
Collaborative filtering.
Matrix completion.
NonConvex Matrix Completion Against a SemiRandom Adversary (Cheng & Ge, 2018).
Singular value decomposition.
Spectral graph theory.
Spectral Sparsification of Graphs (Spielman & Teng).
Cut (graph theory).
Split (graph theory).
Robust statistics.
Being Robust (in High Dimensions) Can Be Practical (Diakonikolas et al).
HighDimensional Robust Mean Estimation in NearlyLinear Time (Cheng, Diakonikolas and Ge).
Data science (I?)
I’m not sure if I’ll actually blog this book in detail – I might, later on, but for now I’ll just cover it extremely lazily, by adding links to topics covered which I figured I wanted to include in this post.
The book is ‘okay’ – it’ll both allow (relatively) nontechnical (management) people to at least begin to understand what sort of tasks the more technical guys are spending time on (and how to prioritize regarding critical resources, and engage with the nerds!), and it might also give the data guys a few more tools that they’ll be able to use when confronted with a specific issue. I really liked the book’s emphasis on conceptualizing data as a strategic asset. On the other hand I imagine some parts of the book will often be close to painful to read for people who have spent at least a few semesters dealing with statsrelated topics in the past: This is the sort of book which is also at least in part written for people who might not be completely clear on what a statistical hypothesis test is, which discusses text mining without at any point in the coverage even mentioning the existence of regular expressions, and which discusses causal evaluation without mentioning topics like IV estimation.
Although there are some major gaps in the coverage the level of coverage is however not really all that bad; I hope to refer to at least some of the more technical material included in the book in my work in the future, but it’s not clear at this point how relevant this stuff’ll actually end up being longterm.
…
Links (…in random order, I did not have the book in front of me as I was writing this post so this is just a collection of links/topics I could recall being potentially worth including here):
Training, validation, and test sets
Crossvalidation (statistics)
Statistical classification
Tree model
Decision tree pruning
Random forest
Naive Bayes classifier
Bigram
ngram
Data mining
Zipf’s law (not covered, but relevant to some parts of the coverage)
Nearest neighbor search
Knearest_neighbors_algorithm
Cluster analysis
Jaccard index
Bias–variance tradeoff
Hierarchical clustering
Dendrogram
Boosting (machine learning)
Ensemble learning
Feature (machine learning)
Feature selection
Curse of dimensionality
Regularization (mathematics)
Overfitting
Association rule learning
Labeled data
Dimensionality reduction
Supervised_learning/Unsupervised learning
Model selection
Rubin causal model (not covered, but relevant to some parts of the coverage)
Regression discontinuity design (ll)
Lift (data mining)
Receiver operating characteristic
Stepwise regression
Grid search (hyperparameter optimization).
The Shapes of Spaces and the Nuclear Force
…
This one was in my opinion a great lecture which I enjoyed watching. It covers some quite highlevel mathematics and physics and some of the ways in which these two fields intersected in a specific historical research context; however it does so in a way that will enable many people outside of the fields involved to be able to follow the narrative reasonably easily.
Some links related to the lecture coverage:
Topological space.
Topological invariant.
Topological isomorphism.
Dimension of a mathematical space.
Metrically topologically complete space.
Genus (mathematics).
Quotient space (topology).
Will we ever classify simplyconnected smooth 4manifolds? (Stern, 2005).
Nuclear force.
Coulomb’s law.
Maxwell’s equations.
Commutative property.
Abelian group.
Nonabelian group.
Yang–Mills theory.
Soliton.
Instanton.
Michael Atiyah.
Donaldson theory.
Michael Freedman.
Topological (quantum) field theory.
Edward Witten.
Effective field theory.
Seiberg–Witten invariants.
“Theoretical mathematics”: toward a cultural synthesis of mathematics and theoretical physics (Jaffe & Quinn, 1993).
Responses to “Theoretical mathematics: toward a cultural synthesis of mathematics and theoretical physics (Atiyah et al, 1994).
A recent perspective on invariant theory
…
Some time ago I covered here on the blog a lecture with a somewhat technical introduction to invariant theory. Even though I didn’t recommend the lecture, I do recommend that you don’t watch the lecture above without first knowing the sort of stuff that might have been covered in that lecture (for all you know, that is), as well as some other lectures on related topics – to be more specific, to get anything out of this lecture you need some linear algebra, you need graph theory, you need some understanding of group theory, you need to know a little about computational complexity, it’ll probably help if you know a bit about invariant theory already, and surely you need some knowledge of a few other topics I forgot to mention. One of the statements I made about the introductory lecture to which I linked above also applies here: “I had to look up a lot of stuff to just sortofkindof muddle along”.
Below some links to stuff I looked up while watching the lecture:
Algebraically closed field.
Reductive group.
Rational representation.
Group homomorphism.
Morphism of algebraic varieties.
Fixedpoint subring.
Graph isomorphism.
Adjacency matrix.
Group action (mathematics).
General linear group.
Special linear group.
Alternating minimization, scaling algorithms, and the nullcone problem from invariant theory. (Bürgisser, Garg, Oliveira, Walter, and Wigderson (2017))
Noether normalization lemma.
Succinct data structure. (This link is actually not directly related to the lecture’s coverage; I came across it by accident while looking for topics he did talk about and I found it interesting, so I decided to include the link here anyway)
Characteristic polynomial.
Matrix similarity.
Monomial.
Associative algebra.
Polynomial degree bounds for matrix semiinvariants (Derksen & Makam, 2015).
Semiinvariant of a quiver.
The pleasure of finding things out (II)
Here’s my first post about the book. In this post I have included a few more quotes from the last half of the book.
…
“Are physical theories going to keep getting more abstract and mathematical? Could there be today a theorist like Faraday in the early nineteenth century, not mathematically sophisticated but with a very powerful intuition about physics?
Feynman: I’d say the odds are strongly against it. For one thing, you need the math just to understand what’s been done so far. Beyond that, the behavior of subnuclear systems is so strange compared to the ones the brain evolved to deal with that the analysis has to be very abstract: To understand ice, you have to understand things that are themselves very unlike ice. Faraday’s models were mechanical – springs and wires and tense bands in space – and his images were from basic geometry. I think we’ve understood all we can from that point of view; what we’ve found in this century is different enough, obscure enough, that further progress will require a lot of math.”
“There’s a tendency to pomposity in all this, to make it all deep and profound. My son is taking a course in philosophy, and last night we were looking at something by Spinoza – and there was the most childish reasoning! There were all these Attributes, and Substances, all this meaningless chewing around, and we started to laugh. Now, how could we do that? Here’s this great Dutch philosopher, and we’re laughing at him. It’s because there was no excuse for it! In that same period there was Newton, there was Harvey studying the circulation of the blood, there were people with methods of analysis by which progress was being made! You can take every one of Spinoza’s propositions, and take the contrary propositions, and look at the world – and you can’t tell which is right. Sure, people were awed because he had the courage to take on these great questions, but it doesn’t do any good to have the courage if you can’t get anywhere with the question. […] It isn’t the philosophy that gets me, it’s the pomposity. If they’d just laugh at themselves! If they’d just say, “I think it’s like this, but von Leipzig thought it was like that, and he had a good shot at it, too.” If they’d explain that this is their best guess … But so few of them do”.
“The lesson you learn as you grow older in physics is that what we can do is a very small fraction of what there is. Our theories are really very limited.”
“The first principle is that you must not fool yourself – and you are the easiest person to fool. So you have to be very careful about that. After you’ve not fooled yourself, it’s easy not to fool other scientists. You just have to be honest in a conventional way after that.”
“When I was an undergraduate I worked with Professor Wheeler* as a research assistant, and we had worked out together a new theory about how light worked, how the interaction between atoms in different places worked; and it was at that time an apparently interesting theory. So Professor Wigner†, who was in charge of the seminars there [at Princeton], suggested that we give a seminar on it, and Professor Wheeler said that since I was a young man and hadn’t given seminars before, it would be a good opportunity to learn how to do it. So this was the first technical talk that I ever gave. I started to prepare the thing. Then Wigner came to me and said that he thought the work was important enough that he’d made special invitations to the seminar to Professor Pauli, who was a great professor of physics visiting from Zurich; to Professor von Neumann, the world’s greatest mathematician; to Henry Norris Russell, the famous astronomer; and to Albert Einstein, who was living near there. I must have turned absolutely white or something because he said to me, “Now don’t get nervous about it, don’t be worried about it. First of all, if Professor Russell falls asleep, don’t feel bad, because he always falls asleep at lectures. When Professor Pauli nods as you go along, don’t feel good, because he always nods, he has palsy,” and so on. That kind of calmed me down a bit”.
“Well, for the problem of understanding the hadrons and the muons and so on, I can see at the present time no practical applications at all, or virtually none. In the past many people have said that they could see no applications and then later they found applications. Many people would promise under those circumstances that something’s bound to be useful. However, to be honest – I mean he looks foolish; saying there will never be anything useful is obviously a foolish thing to do. So I’m going to be foolish and say these damn things will never have any application, as far as I can tell. I’m too dumb to see it. All right? So why do you do it? Applications aren’t the only thing in the world. It’s interesting in understanding what the world is made of. It’s the same interest, the curiosity of man that makes him build telescopes. What is the use of discovering the age of the universe? Or what are these quasars that are exploding at long distances? I mean what’s the use of all that astronomy? There isn’t any. Nonetheless, it’s interesting. So it’s the same kind of exploration of our world that I’m following and it’s curiosity that I’m satisfying. If human curiosity represents a need, the attempt to satisfy curiosity, then this is practical in the sense that it is that. That’s the way I would look at it at the present time. I would not put out any promise that it would be practical in some economic sense.”
“To science we also bring, besides the experiment, a tremendous amount of human intellectual attempt at generalization. So it’s not merely a collection of all those things which just happen to be true in experiments. It’s not just a collection of facts […] all the principles must be as wide as possible, must be as general as possible, and still be in complete accord with experiment, that’s the challenge. […] Evey one of the concepts of science is on a scale graduated somewhere between, but at neither end of, absolute falsity or absolute truth. It is necessary, I believe, to accept this idea, not only for science, but also for other things; it is of great value to acknowledge ignorance. It is a fact that when we make decisions in our life, we don’t necessarily know that we are making them correctly; we only think that we are doing the best we can – and that is what we should do.”
“In this age of specialization, men who thoroughly know one field are often incompetent to discuss another.”
“I believe that moral questions are outside of the scientific realm. […] The typical human problem, and one whose answer religion aims to supply, is always of the following form: Should I do this? Should we do this? […] To answer this question we can resolve it into two parts: First – If I do this, what will happen? – and second – Do I want that to happen? What would come of it of value – of good? Now a question of the form: If I do this, what will happen? is strictly scientific. […] The technique of it, fundamentally, is: Try it and see. Then you put together a large amount of information from such experiences. All scientists will agree that a question – any question, philosophical or other – which cannot be put into the form that can be tested by experiment (or, in simple terms, that cannot be put into the form: If I do this, what will happen?) is not a scientific question; it is outside the realm of science.”
Random stuff
i. Your Care Home in 120 Seconds. Some quotes:
“In order to get an overall estimate of mental power, psychologists have chosen a series of tasks to represent some of the basic elements of problem solving. The selection is based on looking at the sorts of problems people have to solve in everyday life, with particular attention to learning at school and then taking up occupations with varying intellectual demands. Those tasks vary somewhat, though they have a core in common.
Most tests include Vocabulary, examples: either asking for the definition of words of increasing rarity; or the names of pictured objects or activities; or the synonyms or antonyms of words.
Most tests include Reasoning, examples: either determining which pattern best completes the missing cell in a matrix (like Raven’s Matrices); or putting in the word which completes a sequence; or finding the odd word out in a series.
Most tests include visualization of shapes, examples: determining the correspondence between a 3D figure and alternative 2D figures; determining the pattern of holes that would result from a sequence of folds and a punch through folded paper; determining which combinations of shapes are needed to fill a larger shape.
Most tests include episodic memory, examples: number of idea units recalled across two or three stories; number of words recalled from across 1 to 4 trials of a repeated word list; number of words recalled when presented with a stimulus term in a pairedassociate learning task.
Most tests include a rather simple set of basic tasks called Processing Skills. They are rather humdrum activities, like checking for errors, applying simple codes, and checking for similarities or differences in word strings or line patterns. They may seem low grade, but they are necessary when we try to organise ourselves to carry out planned activities. They tend to decline with age, leading to patchy, unreliable performance, and a tendency to muddled and even harmful errors. […]
A brain scan, for all its apparent precision, is not a direct measure of actual performance. Currently, scans are not as accurate in predicting behaviour as is a simple test of behaviour. This is a simple but crucial point: so long as you are willing to conduct actual tests, you can get a good understanding of a person’s capacities even on a very brief examination of their performance. […] There are several tests which have the benefit of being quick to administer and powerful in their predictions.[..] All these tests are good at picking up illness related cognitive changes, as in diabetes. (Intelligence testing is rarely criticized when used in medical settings). Delayed memory and working memory are both affected during diabetic crises. Digit Symbol is reduced during hypoglycaemia, as are Digits Backwards. Digit Symbol is very good at showing general cognitive changes from age 70 to 76. Again, although this is a limited time period in the elderly, the decline in speed is a notable feature. […]
The most robust and consistent predictor of cognitive change within old age, even after control for all the other variables, was the presence of the APOE e4 allele. APOE e4 carriers showed over half a standard deviation more general cognitive decline compared to noncarriers, with particularly pronounced decline in their Speed and numerically smaller, but still significant, declines in their verbal memory.
It is rare to have a big effect from one gene. Few people carry it, and it is not good to have.“
…
ii. What are common mistakes junior data scientists make?
Apparently the OP had second thoughts about this query so s/he deleted the question and marked the thread nsfw (??? …nothing remotely nsfw in that thread…). Fortunately the replies are all still there, there are quite a few good responses in the thread. I added some examples below:
“I think underestimating the domain/business side of things and focusing too much on tools and methodology. As a fairly new data scientist myself, I found myself humbled during this one project where I had I spent a lot of time tweaking parameters and making sure the numbers worked just right. After going into a meeting about it became clear pretty quickly that my little microoptimizations were hardly important, and instead there were X Y Z big picture considerations I was missing in my analysis.”
[…]

Forgetting to check how actionable the model (or features) are. It doesn’t matter if you have amazing model for cancer prediction, if it’s based on features from tests performed as part of the postmortem. Similarly, predicting account fraud after the money has been transferred is not going to be very useful.

Emphasis on lack of understanding of the business/domain.

Lack of communication and presentation of the impact. If improving your model (which is a quarter of the overall pipeline) by 10% in reducing customer churn is worth just ~100K a year, then it may not be worth putting into production in a large company.

Underestimating how hard it is to productionize models. This includes acting on the models outputs, it’s not just “run model, get score out per sample”.

Forgetting about model and feature decay over time, concept drift.

Underestimating the amount of time for data cleaning.

Thinking that data cleaning errors will be complicated.

Thinking that data cleaning will be simple to automate.

Thinking that automation is always better than heuristics from domain experts.
 Focusing on modelling at the expense of [everything] else”
“unhealthy attachments to tools. It really doesn’t matter if you use R, Python, SAS or Excel, did you solve the problem?”
“Starting with actual modelling way too soon: you’ll end up with a model that’s really good at answering the wrong question.
First, make sure that you’re trying to answer the right question, with the right considerations. This is typically not what the client initially told you. It’s (mainly) a data scientist’s job to help the client with formulating the right question.”
…
iii. Some random wikipedia links: Ottoman–Habsburg wars. Planetshine. Anticipation (genetics). Cloze test. Loop quantum gravity. Implicature. Starfish Prime. Stall (fluid dynamics). White Australia policy. Apostatic selection. Deimatic behaviour. Antipredator adaptation. Lefschetz fixedpoint theorem. Hairy ball theorem. Macedonia naming dispute. Holevo’s theorem. Holmström’s theorem. Sparse matrix. Binary search algorithm. Battle of the Bismarck Sea.
…
iv. 5HTTLPR: A Pointed Review. This one is hard to quote, you should read all of it. I did however decide to add a few quotes from the post, as well as a few quotes from the comments:
“…what bothers me isn’t just that people said 5HTTLPR mattered and it didn’t. It’s that we built whole imaginary edifices, whole castles in the air on top of this idea of 5HTTLPR mattering. We “figured out” how 5HTTLPR exerted its effects, what parts of the brain it was active in, what sorts of things it interacted with, how its effects were enhanced or suppressed by the effects of other imaginary depression genes. This isn’t just an explorer coming back from the Orient and claiming there are unicorns there. It’s the explorer describing the life cycle of unicorns, what unicorns eat, all the different subspecies of unicorn, which cuts of unicorn meat are tastiest, and a blowbyblow account of a wrestling match between unicorns and Bigfoot.
This is why I start worrying when people talk about how maybe the replication crisis is overblown because sometimes experiments will go differently in different contexts. The problem isn’t just that sometimes an effect exists in a cold room but not in a hot room. The problem is more like “you can get an entire field with hundreds of studies analyzing the behavior of something that doesn’t exist”. There is no amount of contextsensitivity that can help this. […] The problem is that the studies came out positive when they shouldn’t have. This was a perfectly fine thing to study before we understood genetics well, but the whole point of studying is that, once you have done 450 studies on something, you should end up with more knowledge than you started with. In this case we ended up with less. […] I think we should take a second to remember that yes, this is really bad. That this is a rare case where methodological improvements allowed a conclusive test of a popular hypothesis, and it failed badly. How many other cases like this are there, where there’s no geneticist with a 600,000 person sample size to check if it’s true or not? How many of our scientific edifices are built on air? How many useless products are out there under the guise of good science? We still don’t know.”
A few more quotes from the comment section of the post:
“most things that are obviously advantageous or deleterious in a major way aren’t gonna hover at 10%/50%/70% allele frequency.
Population variance where they claim some gene found in > [non trivial]% of the population does something big… I’ll mostly tend to roll to disbelieve.
But if someone claims a family/village with a load of weirdly depressed people (or almost any other disorder affecting anything related to the human condition in any horrifying way you can imagine) are depressed because of a genetic quirk… believable but still make sure they’ve confirmed it segregates with the condition or they’ve got decent backing.
And a large fraction of people have some kind of rare disorder […]. Long tail. Lots of disorders so quite a lot of people with something odd.
It’s not that single variants can’t have a big effect. It’s that really big effects either win and spread to everyone or lose and end up carried by a tiny minority of families where it hasn’t had time to die out yet.
Very few variants with big effect sizes are going to be half way through that process at any given time.
Exceptions are
1: mutations that confer resistance to some disease as a tradeoff for something else […] 2: Genes that confer a big advantage against something that’s only a very recent issue.”
“I think the summary could be something like:
A single gene determining 50% of the variance in any complex trait is inherently atypical, because variance depends on the population plus environment and the selection for such a gene would be strong, rapidly reducing that variance.
However, if the environment has recently changed or is highly variable, or there is a tradeoff against adverse effects it is more likely.
Furthermore – if the test population is specifically engineered to target an observed trait following an apparently Mendelian inheritance pattern – such as a family group or a small genetically isolated population plus controls – 50% of the variance could easily be due to a single gene.”
…
“The most overused and underanalyzed statement in the academic vocabulary is surely “more research is needed”. These four words, occasionally justified when they appear as the last sentence in a Masters dissertation, are as often to be found as the coda for a megatrial that consumed the lion’s share of a national research budget, or that of a Cochrane review which began with dozens or even hundreds of primary studies and progressively excluded most of them on the grounds that they were “methodologically flawed”. Yet however large the trial or however comprehensive the review, the answer always seems to lie just around the next empirical corner.
With due respect to all those who have used “more research is needed” to sum up months or years of their own work on a topic, this ultimate academic cliché is usually an indicator that serious scholarly thinking on the topic has ceased. It is almost never the only logical conclusion that can be drawn from a set of negative, ambiguous, incomplete or contradictory data.” […]
“Here is a quote from a typical genomewide association study:
“Genomewide association (GWA) studies on coronary artery disease (CAD) have been very successful, identifying a total of 32 susceptibility loci so far. Although these loci have provided valuable insights into the etiology of CAD, their cumulative effect explains surprisingly little of the total CAD heritability.” [1]
The authors conclude that not only is more research needed into the genomic loci putatively linked to coronary artery disease, but that – precisely because the model they developed was so weak – further sets of variables (“genetic, epigenetic, transcriptomic, proteomic, metabolic and intermediate outcome variables”) should be added to it. By adding in more and more sets of variables, the authors suggest, we will progressively and substantially reduce the uncertainty about the multiple and complex geneenvironment interactions that lead to coronary artery disease. […] We predict tomorrow’s weather, more or less accurately, by measuring dynamic trends in today’s air temperature, wind speed, humidity, barometric pressure and a host of other meteorological variables. But when we try to predict what the weather will be next month, the accuracy of our prediction falls to little better than random. Perhaps we should spend huge sums of money on a more sophisticated weatherprediction model, incorporating the tides on the seas of Mars and the flutter of butterflies’ wings? Of course we shouldn’t. Not only would such a hyperinclusive model fail to improve the accuracy of our predictive modeling, there are good statistical and operational reasons why it could well make it less accurate.”
…
vi. Why software projects take longer than you think – a statistical model.
“Anyone who built software for a while knows that estimating how long something is going to take is hard. It’s hard to come up with an unbiased estimate of how long something will take, when fundamentally the work in itself is about solving something. One pet theory I’ve had for a really long time, is that some of this is really just a statistical artifact.
Let’s say you estimate a project to take 1 week. Let’s say there are three equally likely outcomes: either it takes 1/2 week, or 1 week, or 2 weeks. The median outcome is actually the same as the estimate: 1 week, but the mean (aka average, aka expected value) is 7/6 = 1.17 weeks. The estimate is actually calibrated (unbiased) for the median (which is 1), but not for the the mean.
A reasonable model for the “blowup factor” (actual time divided by estimated time) would be something like a lognormal distribution. If the estimate is one week, then let’s model the real outcome as a random variable distributed according to the lognormal distribution around one week. This has the property that the median of the distribution is exactly one week, but the mean is much larger […] Intuitively the reason the mean is so large is that tasks that complete faster than estimated have no way to compensate for the tasks that take much longer than estimated. We’re bounded by 0, but unbounded in the other direction.”
I like this way to conceptually frame the problem, and I definitely do not think it only applies to software development.
“I filed this in my brain under “curious toy models” for a long time, occasionally thinking that it’s a neat illustration of a real world phenomenon I’ve observed. But surfing around on the interwebs one day, I encountered an interesting dataset of project estimation and actual times. Fantastic! […] The median blowup factor turns out to be exactly 1x for this dataset, whereas the mean blowup factor is 1.81x. Again, this confirms the hunch that developers estimate the median well, but the mean ends up being much higher. […]
If my model is right (a big if) then here’s what we can learn:
 People estimate the median completion time well, but not the mean.
 The mean turns out to be substantially worse than the median, due to the distribution being skewed (lognormally).
 When you add up the estimates for n tasks, things get even worse.
 Tasks with the most uncertainty (rather the biggest size) can often dominate the mean time it takes to complete all tasks.”
…
vii. Attraction inequality and the dating economy.
“…the relentless focus on inequality among politicians is usually quite narrow: they tend to consider inequality only in monetary terms, and to treat “inequality” as basically synonymous with “income inequality.” There are so many other types of inequality that get air time less often or not at all: inequality of talent, height, number of friends, longevity, inner peace, health, charm, gumption, intelligence, and fortitude. And finally, there is a type of inequality that everyone thinks about occasionally and that young single people obsess over almost constantly: inequality of sexual attractiveness. […] One of the useful tools that economists use to study inequality is the Gini coefficient. This is simply a number between zero and one that is meant to represent the degree of income inequality in any given nation or group. An egalitarian group in which each individual has the same income would have a Gini coefficient of zero, while an unequal group in which one individual had all the income and the rest had none would have a Gini coefficient close to one. […] Some enterprising data nerds have taken on the challenge of estimating Gini coefficients for the dating “economy.” […] The Gini coefficient for [heterosexual] men collectively is determined by [ll] women’s collective preferences, and vice versa. If women all find every man equally attractive, the male dating economy will have a Gini coefficient of zero. If men all find the same one woman attractive and consider all other women unattractive, the female dating economy will have a Gini coefficient close to one.”
“A data scientist representing the popular dating app “Hinge” reported on the Gini coefficients he had found in his company’s abundant data, treating “likes” as the equivalent of income. He reported that heterosexual females faced a Gini coefficient of 0.324, while heterosexual males faced a much higher Gini coefficient of 0.542. So neither sex has complete equality: in both cases, there are some “wealthy” people with access to more romantic experiences and some “poor” who have access to few or none. But while the situation for women is something like an economy with some poor, some middle class, and some millionaires, the situation for men is closer to a world with a small number of superbillionaires surrounded by huge masses who possess almost nothing. According to the Hinge analyst:
On a list of 149 countries’ Gini indices provided by the CIA World Factbook, this would place the female dating economy as 75th most unequal (average—think Western Europe) and the male dating economy as the 8th most unequal (kleptocracy, apartheid, perpetual civil war—think South Africa).”
Btw., I’m reasonably certain “Western Europe” as most people think of it is not average in terms of Gini, and that halfway down the list should rather be represented by some other region or country type, like, say Mongolia or Bulgaria. A brief look at Gini lists seemed to support this impression.
“Quartz reported on this finding, and also cited another article about an experiment with Tinder that claimed that that “the bottom 80% of men (in terms of attractiveness) are competing for the bottom 22% of women and the top 78% of women are competing for the top 20% of men.” These studies examined “likes” and “swipes” on Hinge and Tinder, respectively, which are required if there is to be any contact (via messages) between prospective matches. […] Yet another study, run by OkCupid on their huge datasets, found that women rate 80 percent of men as “worselooking than medium,” and that this 80 percent “belowaverage” block received replies to messages only about 30 percent of the time or less. By contrast, men rate women as worselooking than medium only about 50 percent of the time, and this 50 percent belowaverage block received message replies closer to 40 percent of the time or higher.
If these findings are to be believed, the great majority of women are only willing to communicate romantically with a small minority of men while most men are willing to communicate romantically with most women. […] It seems hard to avoid a basic conclusion: that the majority of women find the majority of men unattractive and not worth engaging with romantically, while the reverse is not true. Stated in another way, it seems that men collectively create a “dating economy” for women with relatively low inequality, while women collectively create a “dating economy” for men with very high inequality.”
…
I think the author goes a bit off the rails later in the post, but the data is interesting. It’s however important keeping in mind in contexts like these that sexual selection pressures apply at multiple levels, not just one, and that partner preferences can be nontrivial to model satisfactorily; for example as many women have learned the hard way, males may have very different standards for whom to a) ‘engage with romantically’ and b) ‘consider a longterm partner’.
…
viii. Flipping the Metabolic Switch: Understanding and Applying Health Benefits of Fasting.
“Intermittent fasting (IF) is a term used to describe a variety of eating patterns in which no or few calories are consumed for time periods that can range from 12 hours to several days, on a recurring basis. Here we focus on the physiological responses of major organ systems, including the musculoskeletal system, to the onset of the metabolic switch – the point of negative energy balance at which liver glycogen stores are depleted and fatty acids are mobilized (typically beyond 12 hours after cessation of food intake). Emerging findings suggest the metabolic switch from glucose to fatty acidderived ketones represents an evolutionarily conserved trigger point that shifts metabolism from lipid/cholesterol synthesis and fat storage to mobilization of fat through fatty acid oxidation and fattyacid derived ketones, which serve to preserve muscle mass and function. Thus, IF regimens that induce the metabolic switch have the potential to improve body composition in overweight individuals. […] many experts have suggested IF regimens may have potential in the treatment of obesity and related metabolic conditions, including metabolic syndrome and type 2 diabetes.(20)”
“In most studies, IF regimens have been shown to reduce overall fat mass and visceral fat both of which have been linked to increased diabetes risk.(139) IF regimens ranging in duration from 8 to 24 weeks have consistently been found to decrease insulin resistance.(12, 115, 118, 119, 122, 123, 131, 132, 134, 140) In line with this, many, but not all,(7) largescale observational studies have also shown a reduced risk of diabetes in participants following an IF eating pattern.”
“…we suggest that future randomized controlled IF trials should use biomarkers of the metabolic switch (e.g., plasma ketone levels) as a measure of compliance and the magnitude of negative energy balance during the fasting period. It is critical for this switch to occur in order to shift metabolism from lipidogenesis (fat storage) to fat mobilization for energy through fatty acid βoxidation. […] As the health benefits and therapeutic efficacies of IF in different disease conditions emerge from RCTs, it is important to understand the current barriers to widespread use of IF by the medical and nutrition community and to develop strategies for broad implementation. One argument against IF is that, despite the plethora of animal data, some human studies have failed to show such significant benefits of IF over CR [Calorie Restriction].(54) Adherence to fasting interventions has been variable, some shortterm studies have reported over 90% adherence,(54) whereas in a one year ADMF study the dropout rate was 38% vs 29% in the standard caloric restriction group.(135)”
…
ix. Selfrepairing cells: How single cells heal membrane ruptures and restore lost structures.
Artificial intelligence (I?)
This book was okay, but nothing all that special. In my opinion there’s too much philosophy and similar stuff in there (‘what does intelligence really mean anyway?’), and the coverage isn’t nearly as focused on technological aspects as e.g. Winfield’s (…in my opinion better…) book from the same series on robotics (which I covered here) was; I am certain I’d have liked this book better if it’d provided a similar type of coverage as did Winfield, but it didn’t. However it’s far from terrible and I liked the authors skeptical approach to e.g. singularitarianism. Below I have added some quotes and links, as usual.
…
“Artificial intelligence (AI) seeks to make computers do the sorts of things that minds can do. Some of these (e.g. reasoning) are normally described as ‘intelligent’. Others (e.g. vision) aren’t. But all involve psychological skills — such as perception, association, prediction, planning, motor control — that enable humans and animals to attain their goals. Intelligence isn’t a single dimension, but a richly structured space of diverse informationprocessing capacities. Accordingly, AI uses many different techniques, addressing many different tasks. […] although AI needs physical machines (i.e. computers), it’s best thought of as using what computer scientists call virtual machines. A virtual machine isn’t a machine depicted in virtual reality, nor something like a simulated car engine used to train mechanics. Rather, it’s the informationprocessing system that the programmer has in mind when writing a program, and that people have in mind when using it. […] Virtual machines in general are comprised of patterns of activity (information processing) that exist at various levels. […] the human mind can be understood as the virtual machine – or rather, the set of mutually interacting virtual machines, running in parallel […] – that is implemented in the brain. Progress in AI requires progress in defining interesting/useful virtual machines. […] How the information is processed depends on the virtual machine involved. [There are many different approaches.] […] In brief, all the main types of AI were being thought about, and even implemented, by the late 1960s – and in some cases, much earlier than that. […] Neural networks are helpful for modelling aspects of the brain, and for doing pattern recognition and learning. Classical AI (especially when combined with statistics) can model learning too, and also planning and reasoning. Evolutionary programming throws light on biological evolution and brain development. Cellular automata and dynamical systems can be used to model development in living organisms. Some methodologies are closer to biology than to psychology, and some are closer to nonreflective behaviour than to deliberative thought. To understand the full range of mentality, all of them will be needed […]. Many AI researchers [however] don’t care about how minds work: they seek technological efficiency, not scientific understanding. […] In the 21st century, […] it has become clear that different questions require different types of answers”.
“Stateoftheart AI is a manysplendoured thing. It offers a profusion of virtual machines, doing many different kinds of information processing. There’s no key secret here, no core technique unifying the field: AI practitioners work in highly diverse areas, sharing little in terms of goals and methods. […] A host of AI applications exist, designed for countless specific tasks and used in almost every area of life, by laymen and professionals alike. Many outperform even the most expert humans. In that sense, progress has been spectacular. But the AI pioneers weren’t aiming only for specialist systems. They were also hoping for systems with general intelligence. Each humanlike capacity they modelled — vision, reasoning, language, learning, and so on — would cover its entire range of challenges. Moreover, these capacities would be integrated when appropriate. Judged by those criteria, progress has been far less impressive. […] General intelligence is still a major challenge, still highly elusive. […] problems can’t always be solved merely by increasing computer power. New problemsolving methods are often needed. Moreover, even if a particular method must succeed in principle, it may need too much time and/or memory to succeed in practice. […] Efficiency is important, too: the fewer the number of computations, the better. In short, problems must be made tractable. There are several basic strategies for doing that. All were pioneered by classical symbolic AI, or GOFAI, and all are still essential today. One is to direct attention to only a part of the search space (the computer’s representation of the problem, within which the solution is assumed to be located). Another is to construct a smaller search space by making simplifying assumptions. A third is to order the search efficiently. Yet another is to construct a different search space, by representing the problem in a new way. These approaches involve heuristics, planning, mathematical simplification, and knowledge representation, respectively. […] Often, the hardest part of AI problem solving is presenting the problem to the system in the first place. […] the information (‘knowledge’) concerned must be presented to the system in a fashion that the machine can understand – in other words, that it can deal with. […] AI’s way of doing this are highly diverse.”
“The rulebaed form of knowledge representation enables programs to be built gradually, as the programmer – or perhaps an AGI system itself – learns more about the domain. A new rule can be added at any time. There’s no need to rewrite the program from scratch. However, there’s a catch. If the new rule isn’t logically consistent with the existing ones, the system won’t always do what it’s supposed to do. It may not even approximate what it’s supposed to do. When dealing with a small set of rules, such logical conflicts are easily avoided, but larger systems are less transparent. […] An alternative form of knowledge representation for concepts is semantic networks […] A semantic network links concepts by semantic relations […] semantic networks aren’t the same thing as neural networks. […] distributed neural networks represent knowledge in a very different way. There, individual concepts are represented not by a single node in a carefully defined associative net, but by the changing patterns of activity across an entire network. Such systems can tolerate conflicting evidence, so aren’t bedevilled by the problems of maintaining logical consistency […] Even a single mind involves distributed cognition, for it integrates many cognitive, motivational, and emotional subsystems […] Clearly, humanlevel AGI would involve distributed cognition.”
“In short, most human visual achievements surpass today’s AI. Often, AI researchers aren’t clear about what questions to ask. For instance, think about folding a slippery satin dress neatly. No robot can do this (although some can be instructed, step by step, how to fold an oblong terry towel). Or consider putting on a Tshirt: the head must go in first, and not via a sleeve — but why? Such topological problems hardly feature in AI. None of this implies that humanlevel computer vision is impossible. But achieving it is much more difficult than most people believe. So this is a special case of the fact noted in Chapter 1: that AI has taught us that human minds are hugely richer, and more subtle, than psychologists previously imagined. Indeed, that is the main lesson to be learned from AI. […] Difficult though it is to build a highperforming AI specialist, building an AI generalist is orders of magnitude harder. (Deep learning isn’t the answer: its aficionados admit that ‘new paradigms are needed’ to combine it with complex reasoning — scholarly code for ‘we haven’t got a clue’.) That’s why most AI researchers abandoned that early hope, turning instead to multifarious narrowly defined tasks—often with spectacular success.”
“Some machine learning uses neural networks. But much relies on symbolic AI, supplemented by powerful statistical algorithms. In fact, the statistics really do the work, the GOFAI merely guiding the worker to the workplace. Accordingly, some professionals regard machine learning as computer science and/or statistics —not AI. However, there’s no clear boundary here. Machine learning has three broad types: supervised, unsupervised, and reinforcement learning. […] In supervised learning, the programmer ‘trains’ the system by defining a set of desired outcomes for a range of inputs […], and providing continual feedback about whether it has achieved them. The learning system generates hypotheses about the relevant features. Whenever it classifies incorrectly, it amends its hypothesis accordingly. […] In unsupervised learning, the user provides no desired outcomes or error messages. Learning is driven by the principle that cooccurring features engender expectations that they will cooccur in future. Unsupervised learning can be used to discover knowledge. The programmers needn’t know what patterns/clusters exist in the data: the system finds them for itself […but even though Boden does not mention this fact, caution is most definitely warranted when applying such systems/methods to data (..it remains true that “Truth and true models are not statistically identifiable from data” – as usual, the goto reference here is Burnham & Anderson)]. Finally, reinforcement learning is driven by analogues of reward and punishment: feedback messages telling the system that what it just did was good or bad. Often, reinforcement isn’t simply binary […] Given various theories of probability, there are many different algorithms suitable for distinct types of learning and different data sets.”
“Countless AI applications use natural language processing (NLP). Most focus on the computer’s ‘understanding’ of language that is presented to it, not on its own linguistic production. That’s because NLP generation is even more difficult than NLP acceptance [I had a suspicion this might be the case before reading the book, but I didn’t know – US]. […] It’s now clear that handling fancy syntax isn’t necessary for summarizing, questioning, or translating a naturallanguage text. Today’s NLP relies more on brawn (computational power) than on brain (grammatical analysis). Mathematics — specifically, statistics — has overtaken logic, and machine learning (including, but not restricted to, deep learning) has displaced syntactic analysis. […] In modernday NLP, powerful computers do statistical searches of huge collections (‘corpora’) of texts […] to find word patterns both commonplace and unexpected. […] In general […], the focus is on words and phrases, not syntax. […] Machinematching of languages from different language groups is usually difficult. […] Human judgements of relevance are often […] much too subtle for today’s NLP. Indeed, relevance is a linguistic/conceptual version of the unforgiving ‘frame problem‘ in robotics […]. Many people would argue that it will never be wholly mastered by a nonhuman system.”
“[M]any AI research groups are now addressing emotion. Most (not quite all) of this research is theoretically shallow. And most is potentially lucrative, being aimed at developing ‘computer companions’. These are AI systems — some screenbased, some ambulatory robots — designed to interact with people in ways that (besides being practically helpful) are affectively comfortable, even satisfying, for the user. Most are aimed at the elderly and/or disabled, including people with incipient dementia. Some are targeted on babies or infants. Others are interactive ‘adult toys’. […] AI systems can already recognize human emotions in various ways. Some are physiological: monitoring the person’s breathing rate and galvanic skin response. Some are verbal: noting the speaker’s speed and intonation, as well as their vocabulary. And some are visual: analysing their facial expressions. At present, all these methods are relatively crude. The user’s emotions are both easily missed and easily misinterpreted. […] [An] point [point], here [in the development and evaluation of AI], is that emotions aren’t merely feelings. They involve functional, as well as phenomenal, consciousness […]. Specifically, they are computational mechanisms that enable us to schedule competing motives – and without which we couldn’t function. […] If we are ever to achieve AGI, emotions such as anxiety will have to be included – and used.”
[The point made in the book is better made in Aureli et al.‘s book, especially the last chapters to which the coverage in the linked post refer. The point is that emotions enable us to make better decisions, or perhaps even to make a decision in the first place; the emotions we feel in specific contexts will tend not to be even remotely random, rather they will tend to a significant extent to be Nature’s (…and Mr. Darwin’s) attempt to tell us how to handle a specific conflict of interest in the ‘best’ manner. You don’t need to do the math, your forebears did it for you, which is why you’re now …angry, worried, anxious, etc. If you had to do the math every time before you made a decision, you’d be in trouble, and emotions provide a great shortcut in many contexts. The potential for such shortcuts seems really important if you want an agent to act intelligently, regardless of whether said agent is ‘artificial’ or not. The book very briefly mentions a few of Minsky’s thoughts on these topics, and people who are curious could probably do worse than read some of his stuff. This book seems like a place to start.]
…
Links:
GOFAI (“Good OldFashioned Artificial Intelligence”).
Ada Lovelace. Charles Babbage. Alan Turing. Turing machine. Turing test. Norbert Wiener. John von Neumann. W. Ross Ashby. William Grey Walter. Oliver Selfridge. Kenneth Craik. Gregory Bateson. Frank Rosenblatt. Marvin Minsky. Seymour Papert.
A logical calculus of the ideas immanent in nervous activity (McCulloch & Pitts, 1943).
Propositional logic. Logic gate.
Arthur Samuel’s checkers player. Logic Theorist. General Problem Solver. The Homeostat. Pandemonium architecture. Perceptron. Cyc.
Faulttolerant computer system.
Cybernetics.
Programmed Data Processor (PDP).
Artificial life.
Forward chaining. Backward chaining.
Rulebased programming. MYCIN. Dendral.
Semantic network.
Nonmonotonic logic. Fuzzy logic.
Facial recognition system. Computer vision.
Bayesian statistics.
Helmholtz machine.
DQN algorithm.
AlphaGo. AlphaZero.
Human Problem Solving (Newell & Simon, 1970).
ACTR.
NELL (NeverEnding Language Learning).
SHRDLU.
ALPAC.
Google translate.
Data mining. Sentiment analysis. Siri. Watson (computer).
Paro (robot).
Uncanny valley.
CogAff architecture.
Connectionism.
Constraint satisfaction.
Contentaddressable memory.
Graceful degradation.
Physical symbol system hypothesis.
Combinatorics (II)
I really liked this book. Below I have added some links and quotes related to the second half of the book’s coverage.
…
“An n × n magic square, or a magic square of order n, is a square array of numbers — usually (but not necessarily) the numbers from 1 to n^{2 }— arranged in such a way that the sum of the numbers in each of the n rows, each of the n columns, or each of the two main diagonals is the same. A semimagic square is a square array in which the sum of the numbers in each row or column, but not necessarily the diagonals, is the same. We note that if the entries are 1 to n^{2}, then the sum of the numbers in the whole array is
1 + 2 + 3 + … + n^{2} = n^{2 }(n^{2 }+ 1) / 2
on summing the arithmetic progression. Because the n rows and columns have the same ‘magic sum’, the numbers in each single row or column add up to (1/n)th of this, which is n (n^{2}+1) / 2 […] An n x n latin square, or a latin square of order n, is a square array with n symbols arranged so that each symbol appears just once in each row and column. […] Given a latin square, we can obtain others by rearranging the rows or the columns, or by permuting the symbols. For an n × n latin square with symbols 1, 2, … , n, we can thereby arrange that the numbers in the first row and the first column appear in order as 1, 2, … , n. Such a latin square is called normalized […] A familiar form of latin square is the sudoku puzzle […] How many n x n latin squares are there for a given order of n? The answer is known only for n ≤ 11. […] The number of normalized latin squares of order 11 has an impressive fortyeight digits.”
“A particular type of latin square is the cyclic square, where the symbols appear in the same cyclic order, moving one place to the left in each successive row, so that the entry at the beginning of each line appears at the end of the next one […] An extension of this idea is where the symbols move more places to the left in each successive row […] We can construct a latin square row by row from its first row, always taking care that no symbol appears twice in any column. […] An important concept […] is that of a set of orthogonal latin squares […] two n × n latin squares are orthogonal if, when superimposed, each of the n^{2} possible pairings of a symbol from each square appears exactly once. […] pairs of orthogonal latin squares are […] used in agricultural experiments. […] We can extend the idea of orthogonality beyond pairs […] A set of mutually orthogonal latin squares (sometimes abbreviated to MOLS) is a set of latin squares, any two of which are orthogonal […] Note that there can be at most n1 MOLS of order n. […] A full set of MOLS is called a complete set […] We can ask the following question: For which values of n does there exist a complete set of n × n mutually orthogonal latin squares? As several authors have shown, a complete set exists whenever n is a prime number (other than 2) or a power of a prime […] In 1922, H. L. MacNeish generalized this result by observing that if n has prime factorization p, then the number of MOLS is at least min (p_{1}^{a} x p_{2}^{b}_{, … , }p_{k}^{z}) – 1″.
“Consider the following [problem] involving comparisons between a number of varieties of a commodity: A consumer organization wishes to compare seven brands of detergent and arranges a number of tests. But since it may be uneconomic or inconvenient for each tester to compare all seven brands it is decided that each tester should compare just three brands. How should the trials be organized if each brand is to be tested the same number of times and each pair of brands is to be compared directly? […] A block design consists of a set of v varieties arranged into b blocks. […] [if we] further assume that each block contains the same number k of varieties, and each variety appears in the same number r of blocks […] [the design is] called [an] equireplicate design […] for every block design we have v x r = b x k. […] It would clearly be preferable if all pairs of varieties in a design were compared the same number of times […]. Such a design is called balanced, or a balanced incompleteblock design (often abbreviated to BIBD). The number of times that any two varieties are compared is usually denoted by λ […] In a balanced block design the parameters v, b, k, r, and λ are not independent […] [Rather it is the case that:] r x (k 1) = λ x (v – 1). […] The conditions v x r = b x k and r x (k 1) = λ x (v – 1) are both necessary for a design to be balanced, but they’re not sufficient since there are designs satisfying both conditions which are not balanced. Another necessary condition for a design to be balanced is v ≤ b, a result known as Fisher’s inequality […] A balanced design for which v = b, and therefore k = r, is called a symmetric design“.
“A block design with v varieties is resolvable if its blocks can be rearranged into subdesigns, called replicates, each of which contains every variety just once. [….] we define a finite projective plane to be an arrangement of a finite number of points and a finite number of lines with the properties that: [i] Any two points lie on exactly one line. [ii] Any two lines pass through exactly one point.
Note that this differs from our usual Euclidean geometry, where any two lines pass through exactly one point unless they’re parallel. Omitting these italicized words produces a completely different type of geometry from the one we’re used to, since there’s now a ‘duality’ or symmetry between points and lines, according to which any statement about points lying on lines gives rise to a statement about lines passing through points, and vice versa. […] We say that the finite projective plane has order n if each line contains n + 1 points. […] removing a single line from a projective plane of order n, and the n + 1 points on this line, gives a square pattern with n^{2} points and n^{2 }+ n lines where each line contains n points and each point lies on n + 1 lines. Such a diagram is called an affine plane of order n. […] This process is reversible. If we start with an affine plane of order n and add another line joined up appropriately, we get a projective plane of order n. […] Every finite projective plane gives rise to a symmetric balanced design. […] In general, a finite projective plane of order n, with n^{2} + n + 1 points and lines and with n + 1 points on each line and n + 1 lines through each point, gives rise to a balanced symmetric design with parameters v = b = n^{2} + n + 1, k = r = n + 1, and λ = 1. […] Every finite affine plane gives rise to a resolvable design. […] In general, an affine plane of order n, obtained by removing a line and n + 1 points from a projective plane of order n, gives rise to a resolvable design with parameters v = n^{2} , b = n^{2 }+ n , k = n , and r = n + 1. […] Every finite affine plane corresponds to a complete set of orthogonal latin squares.”
…
Links:
Regular polygon.
Polyhedron.
Internal and external angles.
Triangular tiling. Square tiling. Hexagonal tiling.
Semiregular tessellations.
Penrose tiling.
Platonic solid.
Euler’s polyhedron formula.
Prism (geometry). Antiprism.
Fullerene.
Geodesic dome.
Graph theory.
Complete graph. Complete bipartite graph. Cycle graph.
Degree (graph theory).
Handshaking lemma.
Ramsey theory.
Tree (graph theory).
Eulerian and Hamiltonian Graphs. Hamiltonian path.
Icosian game.
Knight’s tour problem.
Planar graph. Euler’s formula for plane graphs.
Kuratowski’s theorem.
Dual graph.
Lo Shu Square.
Melencolia I.
Euler’s Thirtysix officers problem.
Steiner triple system.
Partition (number theory).
Pentagonal number. Pentagonal number theorem.
Ramanujan’s congruences.
Combinatorics (I)
This book is not a particularly easy read, compared to what is the general format of the series in which it is published, but this is a good thing in my view as it also means the author managed to go into enough details in specific contexts to touch upon at least some properties/topics of interest. You don’t need any specific background knowledge to read and understand the book – at least not any sort of background knowledge one would not expect someone who might decide to read a book like this one to already have – but you do need when reading it to have the sort of mental surplus that enables you to think carefully about what’s going on and devote a few mental resources to understanding the details.
Some quotes and links from the first half of the book below.
…
“The subject of combinatorial analysis or combinatorics […] [w]e may loosely describe [as] the branch of mathematics concerned with selecting, arranging, constructing, classifying, and counting or listing things. […] the subject involves finite sets or discrete elements that proceed in separate steps […] rather than continuous systems […] Mathematicians sometimes use the term ‘combinatorics’ to refer to a larger subset of discrete mathematics that includes graph theory. In that case, what is commonly called combinatorics is then referred to as ‘enumeration’. […] Combinatorics now includes a wide range of topics, some of which we cover in this book, such as the geometry of tilings and polyhedra […], the theory of graphs […], magic squares and latin squares […], block designs and finite projective planes […], and partitions of numbers […]. [The] chapters [of the book] are largely independent of each other and can be read in any order. Much of combinatorics originated in recreational pastimes […] in recent years the subject has developed in depth and variety and has increasingly become a part of mainstream mathematics. […] Undoubtedly part of the reason for the subject’s recent importance has arisen from the growth of computer science and the increasing use of algorithmic methods for solving realworld practical problems. These have led to combinatorial applications in a wide range of subject areas, both within and outside mathematics, including network analysis, coding theory, probability, virology, experimental design, scheduling, and operations research.”
“[C]ombinatorics is primarily concerned with four types of problem:
Existence problem: Does □□□ exist?
Construction problem: If □□□ exists, how can we construct it?
Enumeration problem: How many □□□ are there?
Optimization problem: Which □□□ is best? […]
[T]hese types of problems are not unrelated; for example, the easiest way to prove that something exists may be to construct it explicitly.”
“In this book we consider two types of enumeration problem – counting problems in which we simply wish to know the number of objects involved, and listing problems in which we want to list them all explicitly. […] It’s useful to have some basic counting rules […] In what follows, all the sets are finite. […] In general we have the following rule; here, subsets are disjoint if they have no objects in common: Addition rule: To find the number of objects in a set, split the set into disjoint subsets, count the objects in each subset, and add the results. […] Subtraction rule: If a set of objects can be split into two subsets A and B, then the number of objects in B is obtained by subtracting the number of objects in A from the number in the whole set. […] The subtraction rule extends easily to sets that are split into more than two subsets with no elements in common. […] the inclusionexclusion principle […] extends this simple idea to the situation where the subsets may have objects in common. […] In general we have the following result: Multiplication rule: If a counting problem can be split into stages with several options at each stage, then the total number of possibilities is the product of options at each stage. […] Another useful principle in combinatorics is the following: Correspondence rule: We can solve a counting problem if we can put the objects to be counted in onetoone correspondence with the objects of a set that we have already counted. […] We conclude this section with one more rule: Division rule: If a set of n elements can be split into m disjoint subsets, each of size k, then m = n / k.”
“Every algorithm has a running time […] this may be the time that a computer needs to carry out all the necessary calculations, or the actual number of such calculations. Each problem [also] has an input size […] the running time T usually depends on the input size n. Particularly important, because they’re the most efficient, are the polynomialtime algorithms, where the maximum running time is proportional to a power of the input size […] The collection of all polynomialtime algorithms is called P. […] In contrast, there are inefficient algorithms that don’t take polynomial time, such as the exponentialtime algorithms […] At this point we introduce NP, the set of ‘nondeterministic polynomialtime problems’. These are algorithms for which a solution, when given, can be checked in polynomial time. Clearly P is contained in NP, since if a problem can be solved in polynomial time then a solution can certainly be checked in polynomial time – checking solutions is far easier than finding them in the first place. But are they the same? […] Few people people believe that the answer is ‘yes’, but no one has been able to prove that P ≠ NP. […] a problem is NPcomplete if its solution in polynomial time means that every NP problem can be solved in polynomial time. […] If there were a polynomial algorithm for just one of them, then polynomial algorithms would exist for the whole lot and P would equal NP. On the other hand, if just one of them has no polynomial algorithm, then none of the others could have a polynomial algorithm either, and P would be different from NP.”
“In how many different ways can n objects be arranged? […] generally, we have the following result: Arrangements: The number of arrangements of n objects is n x (n 1) x (n – 2) x … x 3 x 2 x 1. This number is called n factorial and is denoted by n!. […] The word permutation is used in different ways. We’ll use it to mean an ordered selection without repetition, while others may use it to mean an arrangement […] generally, we have the following rule: Ordered selections without repetition (permutations): If we select k items from a set of n objects, and if the selections are ordered and repetition is not allowed, then the number of possible selections is n x (n – 1) x (n – 2) x … x (n – k +1). We denote this expression by P(n,k). […] Since P(n, n) = n x (n 1) x (n – 2) x … x 3 x 2 x 1 = n!, an arrangement is a permutation for which k = n. […] generally, we have the following result: P(n,k) = n! /(nk)!. […] unordered selections without repetition are called combinations, giving rise to the words combinatorial and combinatorics. […] generally, we have the following result: Unordered selections without repetition (combinations): If we select k items from a set of n objects, and if the selections are unordered and repetition is not allowed, then the number of possible selections is P(n,k)/k! = n x (n1) x (n2) x … x (n – k + 1)/k!. We denote this expression by C(n,k) […] Unordered selections with repetition: If we select k items from a set of n objects, and if the selections are unordered and repetition is allowed, then the number of possible selections is C(n + k – 1, k). […] Combination rule 1: For any numbers k and n with k ≤ n, C(n,k) = C(n,nk) […] Combination rule 2: For any numbers n and k with k ≤ n, C(n, nk) = n!/(nk)!(n(nk))! = n!/(nk)!k! = C(n,k). […] Combination rule 3: For any number n, C(n,0) + C(n,1) + C(n,2) + … + C(n,n1) + C(n,n) = 2^{n}”
…
Links:
Tilings/Tessellation.
Knight’s tour.
Seven Bridges of Königsberg problem.
Three utilities problem.
Four color theorem.
Tarry’s algorithm (p.7) (formulated slightly differently in the book, but it’s the same algorithm).
Polyomino.
Arthur Cayley.
Combinatorial principles.
Minimum connector problem.
Travelling salesman problem.
Algorithmic efficiency. Running time/time complexity.
Boolean satisfiability problem. Cook–Levin theorem.
Combination.
Mersenne primes.
Permutation. Factorial. Stirling’s formula.
Birthday problem.
Varāhamihira.
Manhattan distance.
Fibonacci number.
Pascal’s triangle. Binomial coefficient. Binomial theorem.
Pigeonhole principle.
Venn diagram.
Derangement (combinatorial mathematics).
Tower of Hanoi.
Stable marriage problem. Transversal (combinatorics). Hall’s marriage theorem.
Generating function (the topic covered in the book more specifically is related to a symbolic generator of the subsets of a set, but a brief search yielded no good links to this particular topic – US).
Group theory.
Ferdinand Frobenius. Burnside’s lemma.
Lyapunov Arguments in Optimization
…
I’d say that if you’re interested in the intersection of mathematical optimization methods/algorithms and dynamical systems analysis it’s probably a talk well worth watching. The lecture is reasonably highlevel and covers a fairly satisfactory amount of ground in a relatively short amount of time, and it is not particularly hard to follow if you have at least some passing familiarity with the fields involved (dynamical systems analysis, statistics, mathematical optimization, computer science/machine learning).
…
Some links:
Dynamical system.
Euler–Lagrange equation.
Continuous optimization problem.
Gradient descent algorithm.
Lyapunov stability.
Condition number.
Fast (/accelerated) gradient descent methods.
The Mirror Descent Algorithm.
Cubic regularization of Newton method and its global performance (Nesterov & Polyak).
A Differential Equation for Modeling Nesterov’s Accelerated Gradient Method: Theory and Insights (Su, Boyd & Candès).
A Variational Perspective on Accelerated Methods in Optimization (Wibisono, Wilson & Jordan).
Breaking Locality Accelerates Block GaussSeidel (Tu, Venkataraman, Wilson, Gittens, Jordan & Recht).
A Lyapunov Analysis of Momentum Methods in Optimization (Wilson, Recht & Jordan).
Bregman divergence.
Estimate sequence methods.
Variance reduction techniques.
Stochastic gradient descent.
Langevin dynamics.
An introduction to Invariant Theory
…
I was strongly considering not watching this one to the end (…such as it is – the video cuts off before the talk was completely finished, but I’m okay with missing out on the last 2 (?) minutes anyway) at several points during the lecture, mainly because this is definitely far from a ‘gentle’ introduction; this one is tough for an introductory lecture and I had to look up a lot of stuff to just sortofkindof muddle along. One of the things that I recall kept me from giving the rest of the lecture a miss along the way was that some parts of the coverage made me review a few topics in group theory which I’d previously encountered, but did not remember at all well – basically I was reminded along the way that concepts X, Y, and Z existed, and that I’d forgot how they worked/what they were useful for.
I think most (…nonmathematicians? …people?) who watch this one will miss a lot of stuff and details, and although you by watching it might get some idea what this stuff’s about I’m quite sure I’d not recommend this lecture to nonmathematicians; I don’t think it’s really worth it to watch it.
I’ve posted some links below to things related to the lecture’s coverage.
…
Invariant (mathematics).
Loop invariant.
Knot invariant.
Jones polynomial.
Homogeneous polynomial.
Invariant polynomial.
Invariant of a binary form.
Paul Gordan.
Polynomial ring.
Indeterminate (variable).
Ideal (ring theory).
Hilbert’s basis theorem.
Hilbert’s Nullstellensatz.
Group representation.
Group action.
Subalgebra.
Permutation matrix.
Symmetric polynomial.
Hilbert’s finiteness theorem.
Irreducible representation.
Multiplicative group.
Oneparameter group.
Hilbert–Mumford criterion.
Strictly Upper Triangular Matrix.
Nilpotent matrix.
Characteristic polynomial.
Mathematics in Cryptography III
As she puts it herself, most of this lecture [~first 47 minutes or so] was basically “an explanation by a nonexpert on how the internet uses public key” (cryptography). The last 20 minutes cover, again in her own words, “more theoretical aspects”.
Some links:
ARPANET.
NSFNET.
Hypertext Transfer Protocol (HTTP). HTTPS.
Project Athena. Kerberos (protocol).
Pretty Good Privacy (PGP).
Secure Sockets Layer (SSL)/Transport Layer Security (TLS).
IPsec.
Wireshark.
Cipher suite.
Elliptic Curve Digital Signature Algorithm (ECDSA).
Request for Comments (RFC).
Ellipticcurve Diffie–Hellman (ECDH).
The SSL/TLS Handshake: an Overview.
Advanced Encryption Standard.
Galois/Counter Mode.
XOR gate.
Hexadecimal.
IP Header.
Time to live (TTL).
Transmission Control Protocol. TCP segment structure.
TLS record.
Security level.
Birthday problem. Birthday attack.
Handbook of Applied Cryptography (Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone). (§3.6 in particular is mentioned/referenced as this is stuff she talks about in the last ‘theoretical’ part of the lecture).
Mathematics in Cryptography II
…
Some links to stuff covered in the lecture:
Publickey cryptography.
New Directions in Cryptography (Diffie & Hellman, 1976).
The history of NonSecret Encryption (James Ellis).
Note on “NonSecret Encryption” – Cliff Cocks (1973).
RSA (cryptosystem).
Discrete Logarithm Problem.
Diffie–Hellman key exchange.
AES (Advanced Encryption Standard).
Triple DES.
Trusted third party (TTP).
Key management.
Maninthemiddle attack.
Digital signature.
Public key certificate.
Secret sharing.
Hash function. Cryptographic hash function.
Secure Hash Algorithm 2 (SHA2).
Nonrepudiation (digital security).
Lnotation. L (complexity).
ElGamal signature scheme.
Digital Signature Algorithm (DSA).
Schnorr signature.
Identitybased cryptography.
IdentityBased Cryptosystems and Signature Schemes (Adi Shamir, 1984).
Algorithms for Quantum Computation: Discrete Logarithms and Factoring (Peter Shor, 1994).
Quantum resistant cryptography.
Elliptic curve. Ellipticcurve cryptography.
Projective space.
I have included very few links relating to the topics covered in the last part of the lecture. This was deliberate and not just a result of the type of coverage included in that part of the lecture. In my opinion nonmathematicians should probably skip the last 25 minutes or so as they’re – not only due to technical issues (the lecturer is writing stuff on the blackboard and for several minutes you’re unable to see what she’s writing, which is …unfortunate), but those certainly were not helping – not really worth the effort. The first hour of the lecture is great, the last 25 minutes are, well, less great, in my opinion. You should however not miss the first part of the coverage of ECCrelated stuff (in particular the coverage ~5558 minutes in), if you’re interested in making sense of how ECC works; I certainly found that part of the coverage very helpful.
Mathematics in Cryptography
…
Some relevant links:
Caesar cipher.
Substitution cipher.
Frequency analysis.
Vigenère cipher.
ADFGVX cipher.
Onetime pad.
Arthur Scherbius.
Enigma machine.
Permutation.
Cycle notation.
Permutation group.
Cyclic permutation.
Involution (mathematics).
An Application of the Theory of Permutations in Breaking the Enigma Cipher – Marian Rejewski.
Medical Statistics (III)
In this post I’ll include some links and quotes related to topics covered in chapters 4, 6, and 7 of the book. Before diving in, I’ll however draw attention to some of Gerd Gigerenzer’s work as it is quite relevant to in particular the coverage included in chapter 4 (‘Presenting research findings’), even if the authors seem unaware of this. One of Gigerenzer’s key insights, which I consider important and which I have thus tried to keep in mind, unfortunately goes unmentioned in the book; namely the idea that how you communicate risk might be very important in terms of whether or not people actually understand what you are trying to tell them. A related observation is that people have studied these things and they’ve figured out that some types of risk communication are demonstrably better than others at enabling people to understand the issues at hand and the tradeoffs involved in a given situation. I covered some of these ideas in a comment on SCC some time ago; if those comments spark your interest you should definitely go read the book).
…
IMRAD format.
CONSORT Statement (randomized trials).
Equator Network.
“Abstracts may appear easy to write since they are very short […] and often required to be written in a structured format. It is therefore perhaps surprising that they are sometimes poorly written, too bland, contain inaccuracies, and/or are simply misleading.^{1} The reason for poor quality abstracts are complex; abstracts are often written at the end of a long process of data collection, analysis, and writing up, when time is short and researchers are weary. […] statistical issues […] can lead to an abstract that is not a fair representation of the research conducted. […] it is important that the abstract is consistent with the body of text and that it gives a balanced summary of the work. […] To maximize its usefulness, a summary or abstract should include estimates and confidence intervals for the main findings and not simply present P values.”
“The methods section should describe how the study was conducted. […] it is important to include the following: *The setting or area […] The date(s) […] subjects included […] study design […] measurements used […] source of any nonoriginal data […] sample size, including a justification […] statistical methods, including any computer software used […] The discussion section is where the findings of the study are discussed and interpreted […] this section tends to include less statistics than the results section […] Some medical journals have a specific structure for the discussion for researchers to follow, and so it is important to check the journal’s guidelines before submitting. […] [When] reporting statistical analyses from statistical programs: *Don’t put unedited computer output into a research document. *Extract the relevant data only and reformat as needed […] Beware of presenting percentages for very small samples as they may be misleading. Simply give the numbers alone. […] In general the following is recommended for P values: *Give the actual P value whenever possible. *Rounding: Two significant figures are usually enough […] [Confidence intervals] should be given whenever possible to indicate the precision of estimates. […] Avoid graphs with missing zeros or stretched scales […] a table or graph should stand alone so that a reader does not need to read the […] article to be able to understand it.”
…
Statistical data type.
Level of measurement.
Descriptive statistics.
Summary statistics.
Geometric mean.
Harmonic mean.
Mode.
Interquartile range.
Histogram.
Stem and leaf plot.
Box and whisker plot.
Dot plot.
…
“Quantitative data are data that can be measured numerically and may be continuous or discrete. *Continuous data lie on a continuum and so can take any value between two limits. […] *Discrete data do not lie on a continuum and can only take certain values, usually counts (integers) […] On an interval scale, differences between values at different points of the scale have the same meaning […] Data can be regarded as on a ratio scale if the ratio of the two measurements has a meaning. For example we can say that twice as many people in one group had a particular characteristic compared with another group and this has a sensible meaning. […] Quantitative data are always ordinal – the data values can be arranged in a numerical order from the smallest to the largest. […] *Interval scale data are always ordinal. Ratio scale data are always interval scale data and therefore must also be ordinal. *In practice, continuous data may look discrete because of the way they are measured and/or reported. […] All continuous measurements are limited by the accuracy of the instrument used to measure them, and many quantities such as age and height are reported in whole numbers for convenience”.
“Categorical data are data where individuals fall into a number of separate categories or classes. […] Different categories of categorical data may be assigned a number for coding purposes […] and if there are several categories, there may be an implied ordering, such as with stage of cancer where stage I is the least advanced and stage IV is the most advanced. This means that such data are ordinal but not interval because the ‘distance’ between adjacent categories has no real measurement attached to it. The ‘gap’ between stages I and II disease is not necessarily the same as the ‘gap’ between stages III and IV. […] Where categorical data are coded with numerical codes, it might appear that there is an ordering but this may not necessarily be so. It is important to distinguish between ordered and nonordered data because it affects the analysis.”
“It is usually useful to present more than one summary measure for a set of data […] If the data are going to be analyzed using methods based on means then it makes sense to present means rather than medians. If the data are skewed they may need to be transformed before analysis and so it is best to present summaries based on the transformed data, such as geometric means. […] For very skewed data rather than reporting the median, it may be helpful to present a different percentile (i.e. not the 50th), which better reflects the shape of the distribution. […] Some researchers are reluctant to present the standard deviation when the data are skewed and so present the median and range and/or quartiles. If analyses are planned which are based on means then it makes sense to be consistent and give standard deviations. Further, the useful relationship that approximately 95% of the data lie between mean +/ 2 standard deviations, holds even for skewed data […] If data are transformed, the standard deviation cannot be backtransformed correctly and so for transformed data a standard deviation cannot be given. In this case the untransformed standard deviation can be given or another measure of spread. […] For discrete data with a narrow range, such as stage of cancer, it may be better to present the actual frequency distribution to give a fair summary of the data, rather than calculate a mean or dichotomize it. […] It is often useful to tabulate one categorical variable against another to show the proportions or percentages of the categories of one variable by the other”.
…
Random variable.
Independence (probability theory).
Probability.
Probability distribution.
Binomial distribution.
Poisson distribution.
Continuous probability distribution.
Normal distribution.
Uniform distribution.
…
“The central limit theorem is a very important mathematical theorem that links the Normal distribution with other distributions in a unique and surprising way and is therefore very useful in statistics. *The sum of a large number of independent random variables will follow an approximately Normal distribution irrespective of their underlying distributions. *This means that any random variable which can be regarded as a the sum of a large number of small, independent contributions is likely to follow the Normal distribution. [I didn’t really like this description as it’s insufficiently detailed for my taste (and this was pretty much all they wrote about the CLT in that chapter); and one problem with the CLT is that people often think it applies when it might not actually do so, because the data restrictions implied by the theorem(s) are not really fully appreciated. On a related note people often seem to misunderstand what these theorems actually say and where they apply – see e.g. paragraph 10 in this post. See also the wiki link above for a more comprehensive treatment of these topics – US] *The Normal distribution can be used as an approximation to the Binomial distribution when n is large […] The Normal distribution can be used as an approximation to the Poisson distribution as the mean of the Poisson distribution increases […] The main advantage in using the Normal rather than the Binomial or the Poisson distribution is that it makes it easier to calculate probabilities and confidence intervals”
“The t distribution plays an important role in statistics as the sampling distribution of the sample mean divided by its standard error and is used in significance testing […] The shape is symmetrical about the mean value, and is similar to the Normal distribution but with a higher peak and longer tails to take account of the reduced precision in smaller samples. The exact shape is determined by the mean and variance plus the degrees of freedom. As the degrees of freedom increase, the shape comes closer to the Normal distribution […] The chisquared distribution also plays an important role in statistics. If we take several variables, say n, which each follow a standard Normal distribution, and square each and add them, the sum of these will follow a chisquared distribution with n degrees of freedom. This theoretical result is very useful and widely used in statistical testing […] The chisquared distribution is always positive and its shape is uniquely determined by the degrees of freedom. The distribution becomes more symmetrical as the degrees of freedom increases. […] [The (noncentral) F distribution] is the distribution of the ratio of two chisquared distributions and is used in hypothesis testing when we want to compare variances, such as in doing analysis of variance […] Sometimes data may follow a positively skewed distribution which becomes a Normal distribution when each data point is logtransformed [..] In this case the original data can be said to follow a lognormal distribution. The transformation of such data from lognormal to Normal is very useful in allowing skewed data to be analysed using methods based on the Normal distribution since these are usually more powerful than alternative methods”.
…
HalfNormal distribution.
Bivariate Normal distribution.
Negative binomial distribution.
Beta distribution.
Gamma distribution.
Conditional probability.
Bayes theorem.
On the cryptographic hardness of finding a Nash equilibrium
…
I found it annoying that you generally can’t really hear the questions posed by the audience (which includes people like Avi Wigderson), especially considering that there are quite a few of these, especially in the middle section of the lecture. There are intermittent issues with the camera’s focus occasionally throughout the talk, but those are all transitory problems that should not keep you from watching the lecture. The sound issue at the beginning of the talk is resolved after 40 seconds.
One important takeaway from this talk, if you choose not to watch it: “to date, there is no known efficient algorithm to find Nash equilibrium in games”. In general this paper – coauthored by the lecturer – seems from a brief skim to cover many of the topics also included in the lecture. I have added some other links to articles and topics covered/mentioned in the lecture below.
…
Nash’s Existence Theorem.
Reducibility Among Equilibrium Problems (Goldberg & Papadimitriou).
ThreePlayer Games Are Hard (Daskalakis & Papadimitriou).
3Nash is PPADComplete (Chen & Deng).
PPAD (complexity).
NPhardness.
On the (Im)possibility of Obfuscating Programs (Barak et al.).
On the Impossibility of Obfuscation with Auxiliary Input (Goldwasser & Kalai).
On BestPossible Obfuscation (Goldwasser & Rothblum).
Functional Encryption without Obfuscation (Garg et al.).
On the Complexity of the Parity Argument and Other Inefficient Proofs of Existence (Papadimitriou).
Pseudorandom function family.
Revisiting the Cryptographic Hardness of Finding a Nash Equilibrium (Garg, Pandei & Srinivasan).
Constrained Pseudorandom Functions and Their Applications (Boneh & Waters).
Delegatable Pseudorandom Functions and Applications (Kiayias et al.).
Functional Signatures and Pseudorandom Functions (Boyle, Goldwasser & Ivan).
Universal Constructions and Robust Combiners for Indistinguishability Obfuscation and Witness Encryption (Ananth et al.).
Networks
I actually think this was a really nice book, considering the format – I gave it four stars on goodreads. One of the things I noticed people didn’t like about it in the reviews is that it ‘jumps’ a bit in terms of topic coverage; it covers a wide variety of applications and analytical settings. I mostly don’t consider this a weakness of the book – even if occasionally it does get a bit excessive – and I can definitely understand the authors’ choice of approach; it’s sort of hard to illustrate the potential the analytical techniques described within this book have if you’re not allowed to talk about all the areas in which they have been – or could be gainfully – applied. A related point is that many people who read the book might be familiar with the application of these tools in specific contexts but have perhaps not thought about the fact that similar methods are applied in many other areas (and they might all of them be a bit annoyed the authors don’t talk more about computer science applications, or foodweb analyses, or infectious disease applications, or perhaps sociometry…). Most of the book is about graphtheoryrelated stuff, but a very decent amount of the coverage deals with applications, in a broad sense of the word at least, not theory. The discussion of theoretical constructs in the book always felt to me driven to a large degree by their usefulness in specific contexts.
I have covered related topics before here on the blog, also quite recently – e.g. there’s at least some overlap between this book and Holland’s book about complexity theory in the same series (I incidentally think these books probably go well together) – and as I found the book slightly difficult to blog as it was I decided against covering it in as much detail as I sometimes do when covering these texts – this means that I decided to leave out the links I usually include in posts like these.
Below some quotes from the book.
…
“The network approach focuses all the attention on the global structure of the interactions within a system. The detailed properties of each element on its own are simply ignored. Consequently, systems as different as a computer network, an ecosystem, or a social group are all described by the same tool: a graph, that is, a bare architecture of nodes bounded by connections. […] Representing widely different systems with the same tool can only be done by a high level of abstraction. What is lost in the specific description of the details is gained in the form of universality – that is, thinking about very different systems as if they were different realizations of the same theoretical structure. […] This line of reasoning provides many insights. […] The network approach also sheds light on another important feature: the fact that certain systems that grow without external control are still capable of spontaneously developing an internal order. […] Network models are able to describe in a clear and natural way how selforganization arises in many systems. […] In the study of complex, emergent, and selforganized systems (the modern science of complexity), networks are becoming increasingly important as a universal mathematical framework, especially when massive amounts of data are involved. […] networks are crucial instruments to sort out and organize these data, connecting individuals, products, news, etc. to each other. […] While the network approach eliminates many of the individual features of the phenomenon considered, it still maintains some of its specific features. Namely, it does not alter the size of the system — i.e. the number of its elements — or the pattern of interaction — i.e. the specific set of connections between elements. Such a simplified model is nevertheless enough to capture the properties of the system. […] The network approach [lies] somewhere between the description by individual elements and the description by big groups, bridging the two of them. In a certain sense, networks try to explain how a set of isolated elements are transformed, through a pattern of interactions, into groups and communities.”
“[T]he random graph model is very important because it quantifies the properties of a totally random network. Random graphs can be used as a benchmark, or null case, for any real network. This means that a random graph can be used in comparison to a realworld network, to understand how much chance has shaped the latter, and to what extent other criteria have played a role. The simplest recipe for building a random graph is the following. We take all the possible pair of vertices. For each pair, we toss a coin: if the result is heads, we draw a link; otherwise we pass to the next pair, until all the pairs are finished (this means drawing the link with a probability p = ½, but we may use whatever value of p). […] Nowadays [the random graph model] is a benchmark of comparison for all networks, since any deviations from this model suggests the presence of some kind of structure, order, regularity, and nonrandomness in many realworld networks.”
“…in networks, topology is more important than metrics. […] In the network representation, the connections between the elements of a system are much more important than their specific positions in space and their relative distances. The focus on topology is one of its biggest strengths of the network approach, useful whenever topology is more relevant than metrics. […] In social networks, the relevance of topology means that social structure matters. […] Sociology has classified a broad range of possible links between individuals […]. The tendency to have several kinds of relationships in social networks is called multiplexity. But this phenomenon appears in many other networks: for example, two species can be connected by different strategies of predation, two computers by different cables or wireless connections, etc. We can modify a basic graph to take into account this multiplexity, e.g. by attaching specific tags to edges. […] Graph theory [also] allows us to encode in edges more complicated relationships, as when connections are not reciprocal. […] If a direction is attached to the edges, the resulting structure is a directed graph […] In these networks we have both indegree and outdegree, measuring the number of inbound and outbound links of a node, respectively. […] in most cases, relations display a broad variation or intensity [i.e. they are not binary/dichotomous]. […] Weighted networks may arise, for example, as a result of different frequencies of interactions between individuals or entities.”
“An organism is […] the outcome of several layered networks and not only the deterministic result of the simple sequence of genes. Genomics has been joined by epigenomics, transcriptomics, proteomics, metabolomics, etc., the disciplines that study these layers, in what is commonly called the omics revolution. Networks are at the heart of this revolution. […] The brain is full of networks where various weblike structures provide the integration between specialized areas. In the cerebellum, neurons form modules that are repeated again and again: the interaction between modules is restricted to neighbours, similarly to what happens in a lattice. In other areas of the brain, we find random connections, with a more or less equal probability of connecting local, intermediate, or distant neurons. Finally, the neocortex — the region involved in many of the higher functions of mammals — combines local structures with more random, longrange connections. […] typically, food chains are not isolated, but interwoven in intricate patterns, where a species belongs to several chains at the same time. For example, a specialized species may predate on only one prey […]. If the prey becomes extinct, the population of the specialized species collapses, giving rise to a set of coextinctions. An even more complicated case is where an omnivore species predates a certain herbivore, and both eat a certain plant. A decrease in the omnivore’s population does not imply that the plant thrives, because the herbivore would benefit from the decrease and consume even more plants. As more species are taken into account, the population dynamics can become more and more complicated. This is why a more appropriate description than ‘foodchains’ for ecosystems is the term foodwebs […]. These are networks in which nodes are species and links represent relations of predation. Links are usually directed (big fishes eat smaller ones, not the other way round). These networks provide the interchange of food, energy, and matter between species, and thus constitute the circulatory system of the biosphere.”
“In the cell, some groups of chemicals interact only with each other and with nothing else. In ecosystems, certain groups of species establish small foodwebs, without any connection to external species. In social systems, certain human groups may be totally separated from others. However, such disconnected groups, or components, are a strikingly small minority. In all networks, almost all the elements of the systems take part in one large connected structure, called a giant connected component. […] In general, the giant connected component includes not less than 90 to 95 per cent of the system in almost all networks. […] In a directed network, the existence of a path from one node to another does not guarantee that the journey can be made in the opposite direction. Wolves eat sheep, and sheep eat grass, but grass does not eat sheep, nor do sheep eat wolves. This restriction creates a complicated architecture within the giant connected component […] according to an estimate made in 1999, more than 90 per cent of the WWW is composed of pages connected to each other, if the direction of edges is ignored. However, if we take direction into account, the proportion of nodes mutually reachable is only 24 per cent, the giant strongly connected component. […] most networks are sparse, i.e. they tend to be quite frugal in connections. Take, for example, the airport network: the personal experience of every frequent traveller shows that direct flights are not that common, and intermediate stops are necessary to reach several destinations; thousands of airports are active, but each city is connected to less than 20 other cities, on average. The same happens in most networks. A measure of this is given by the mean number of connection of their nodes, that is, their average degree.”
“[A] puzzling contradiction — a sparse network can still be very well connected — […] attracted the attention of the Hungarian mathematicians […] Paul Erdős and Alfréd Rényi. They tackled it by producing different realizations of their random graph. In each of them, they changed the density of edges. They started with a very low density: less than one edge per node. It is natural to expect that, as the density increases, more and more nodes will be connected to each other. But what Erdős and Rényi found instead was a quite abrupt transition: several disconnected components coalesced suddenly into a large one, encompassing almost all the nodes. The sudden change happened at one specific critical density: when the average number of links per node (i.e. the average degree) was greater than one, then the giant connected component suddenly appeared. This result implies that networks display a very special kind of economy, intrinsic to their disordered structure: a small number of edges, even randomly distributed between nodes, is enough to generate a large structure that absorbs almost all the elements. […] Social systems seem to be very tightly connected: in a large enough group of strangers, it is not unlikely to find pairs of people with quite short chains of relations connecting them. […] The smallworld property consists of the fact that the average distance between any two nodes (measured as the shortest path that connects them) is very small. Given a node in a network […], few nodes are very close to it […] and few are far from it […]: the majority are at the average — and very short — distance. This holds for all networks: starting from one specific node, almost all the nodes are at very few steps from it; the number of nodes within a certain distance increases exponentially fast with the distance. Another way of explaining the same phenomenon […] is the following: even if we add many nodes to a network, the average distance will not increase much; one has to increase the size of a network by several orders of magnitude to notice that the paths to new nodes are (just a little) longer. The smallworld property is crucial to many network phenomena. […] The smallworld property is something intrinsic to networks. Even the completely random ErdősRenyi graphs show this feature. By contrast, regular grids do not display it. If the Internet was a chessboardlike lattice, the average distance between two routers would be of the order of 1,000 jumps, and the Net would be much slower [the authors note elsewhere that “The Internet is composed of hundreds of thousands of routers, but just about ten ‘jumps’ are enough to bring an information packet from one of them to any other.”] […] The key ingredient that transforms a structure of connections into a small world is the presence of a little disorder. No real network is an ordered array of elements. On the contrary, there are always connections ‘out of place’. It is precisely thanks to these connections that networks are small worlds. […] Shortcuts are responsible for the smallworld property in many […] situations.”
“Body size, IQ, road speed, and other magnitudes have a characteristic scale: that is, an average value that in the large majority of cases is a rough predictor of the actual value that one will find. […] While height is a homogeneous magnitude, the number of social connection[s] is a heterogeneous one. […] A system with this feature is said to be scalefree or scaleinvariant, in the sense that it does not have a characteristic scale. This can be rephrased by saying that the individual fluctuations with respect to the average are too large for us to make a correct prediction. […] In general, a network with heterogeneous connectivity has a set of clear hubs. When a graph is small, it is easy to find whether its connectivity is homogeneous or heterogeneous […]. In the first case, all the nodes have more or less the same connectivity, while in the latter it is easy to spot a few hubs. But when the network to be studied is very big […] things are not so easy. […] the distribution of the connectivity of the nodes of the […] network […] is the degree distribution of the graph. […] In homogeneous networks, the degree distribution is a bell curve […] while in heterogeneous networks, it is a power law […]. The power law implies that there are many more hubs (and much more connected) in heterogeneous networks than in homogeneous ones. Moreover, hubs are not isolated exceptions: there is a full hierarchy of nodes, each of them being a hub compared with the less connected ones.”
“Looking at the degree distribution is the best way to check if a network is heterogeneous or not: if the distribution is fat tailed, then the network will have hubs and heterogeneity. A mathematically perfect power law is never found, because this would imply the existence of hubs with an infinite number of connections. […] Nonetheless, a strongly skewed, fattailed distribution is a clear signal of heterogeneity, even if it is never a perfect power law. […] While the smallworld property is something intrinsic to networked structures, hubs are not present in all kind of networks. For example, power grids usually have very few of them. […] hubs are not present in random networks. A consequence of this is that, while random networks are small worlds, heterogeneous ones are ultrasmall worlds. That is, the distance between their vertices is relatively smaller than in their random counterparts. […] Heterogeneity is not equivalent to randomness. On the contrary, it can be the signature of a hidden order, not imposed by a topdown project, but generated by the elements of the system. The presence of this feature in widely different networks suggests that some common underlying mechanism may be at work in many of them. […] the Barabási–Albert model gives an important takehome message. A simple, local behaviour, iterated through many interactions, can give rise to complex structures. This arises without any overall blueprint”.
“Homogamy, the tendency of like to marry like, is very strong […] Homogamy is a specific instance of homophily: this consists of a general trend of like to link to like, and is a powerful force in shaping social networks […] assortative mixing [is] a special form of homophily, in which nodes tend to connect with others that are similar to them in the number of connections. By contrast [when] high and lowdegree nodes are more connected to each other [it] is called disassortative mixing. Both cases display a form of correlation in the degrees of neighbouring nodes. When the degrees of neighbours are positively correlated, then the mixing is assortative; when negatively, it is disassortative. […] In random graphs, the neighbours of a given node are chosen completely at random: as a result, there is no clear correlation between the degrees of neighbouring nodes […]. On the contrary, correlations are present in most realworld networks. Although there is no general rule, most natural and technological networks tend to be disassortative, while social networks tend to be assortative. […] Degree assortativity and disassortativity are just an example of the broad range of possible correlations that bias how nodes tie to each other.”
“[N]etworks (neither ordered lattices nor random graphs), can have both large clustering and small average distance at the same time. […] in almost all networks, the clustering of a node depends on the degree of that node. Often, the larger the degree, the smaller the clustering coefficient. Smalldegree nodes tend to belong to wellinterconnected local communities. Similarly, hubs connect with many nodes that are not directly interconnected. […] Central nodes usually act as bridges or bottlenecks […]. For this reason, centrality is an estimate of the load handled by a node of a network, assuming that most of the traffic passes through the shortest paths (this is not always the case, but it is a good approximation). For the same reason, damaging central nodes […] can impair radically the flow of a network. Depending on the process one wants to study, other definitions of centrality can be introduced. For example, closeness centrality computes the distance of a node to all others, and reach centrality factors in the portion of all nodes that can be reached in one step, two steps, three steps, and so on.”
“Domino effects are not uncommon in foodwebs. Networks in general provide the backdrop for largescale, sudden, and surprising dynamics. […] most of the realworld networks show a doublededged kind of robustness. They are able to function normally even when a large fraction of the network is damaged, but suddenly certain small failures, or targeted attacks, bring them down completely. […] networks are very different from engineered systems. In an airplane, damaging one element is enough to stop the whole machine. In order to make it more resilient, we have to use strategies such as duplicating certain pieces of the plane: this makes it almost 100 per cent safe. In contrast, networks, which are mostly not blueprinted, display a natural resilience to a broad range of errors, but when certain elements fail, they collapse. […] A random graph of the size of most realworld networks is destroyed after the removal of half of the nodes. On the other hand, when the same procedure is performed on a heterogeneous network (either a map of a real network or a scalefree model of a similar size), the giant connected component resists even after removing more than 80 per cent of the nodes, and the distance within it is practically the same as at the beginning. The scene is different when researchers simulate a targeted attack […] In this situation the collapse happens much faster […]. However, now the most vulnerable is the second: while in the homogeneous network it is necessary to remove about onefifth of its more connected nodes to destroy it, in the heterogeneous one this happens after removing the first few hubs. Highly connected nodes seem to play a crucial role, in both errors and attacks. […] hubs are mainly responsible for the overall cohesion of the graph, and removing a few of them is enough to destroy it.”
“Studies of errors and attacks have shown that hubs keep different parts of a network connected. This implies that they also act as bridges for spreading diseases. Their numerous ties put them in contact with both infected and healthy individuals: so hubs become easily infected, and they infect other nodes easily. […] The vulnerability of heterogeneous networks to epidemics is bad news, but understanding it can provide good ideas for containing diseases. […] if we can immunize just a fraction, it is not a good idea to choose people at random. Most of the times, choosing at random implies selecting individuals with a relatively low number of connections. Even if they block the disease from spreading in their surroundings, hubs will always be there to put it back into circulation. A much better strategy would be to target hubs. Immunizing hubs is like deleting them from the network, and the studies on targeted attacks show that eliminating a small fraction of hubs fragments the network: thus, the disease will be confined to a few isolated components. […] in the epidemic spread of sexually transmitted diseases the timing of the links is crucial. Establishing an unprotected link with a person before they establish an unprotected link with another person who is infected is not the same as doing so afterwards.”