Artificial intelligence (I)

This book was okay, but nothing all that special. In my opinion there’s too much philosophy and similar material in it (‘what does intelligence really mean anyway?’), and the coverage isn’t nearly as focused on the technological aspects as e.g. Winfield’s (…in my opinion better…) book on robotics from the same series (which I covered here) was; I am certain I’d have liked this book better if it had provided the same type of coverage Winfield did, but it didn’t. However, it’s far from terrible, and I liked the author’s skeptical approach to e.g. singularitarianism. Below I have added some quotes and links, as usual.

“Artificial intelligence (AI) seeks to make computers do the sorts of things that minds can do. Some of these (e.g. reasoning) are normally described as ‘intelligent’. Others (e.g. vision) aren’t. But all involve psychological skills — such as perception, association, prediction, planning, motor control — that enable humans and animals to attain their goals. Intelligence isn’t a single dimension, but a richly structured space of diverse information-processing capacities. Accordingly, AI uses many different techniques, addressing many different tasks. […] although AI needs physical machines (i.e. computers), it’s best thought of as using what computer scientists call virtual machines. A virtual machine isn’t a machine depicted in virtual reality, nor something like a simulated car engine used to train mechanics. Rather, it’s the information-processing system that the programmer has in mind when writing a program, and that people have in mind when using it. […] Virtual machines in general are comprised of patterns of activity (information processing) that exist at various levels. […] the human mind can be understood as the virtual machine – or rather, the set of mutually interacting virtual machines, running in parallel […] – that is implemented in the brain. Progress in AI requires progress in defining interesting/useful virtual machines. […] How the information is processed depends on the virtual machine involved. [There are many different approaches.] […] In brief, all the main types of AI were being thought about, and even implemented, by the late 1960s – and in some cases, much earlier than that. […] Neural networks are helpful for modelling aspects of the brain, and for doing pattern recognition and learning. Classical AI (especially when combined with statistics) can model learning too, and also planning and reasoning. Evolutionary programming throws light on biological evolution and brain development. 
Cellular automata and dynamical systems can be used to model development in living organisms. Some methodologies are closer to biology than to psychology, and some are closer to non-reflective behaviour than to deliberative thought. To understand the full range of mentality, all of them will be needed […]. Many AI researchers [however] don’t care about how minds work: they seek technological efficiency, not scientific understanding. […] In the 21st century, […] it has become clear that different questions require different types of answers”.

“State-of-the-art AI is a many-splendoured thing. It offers a profusion of virtual machines, doing many different kinds of information processing. There’s no key secret here, no core technique unifying the field: AI practitioners work in highly diverse areas, sharing little in terms of goals and methods. […] A host of AI applications exist, designed for countless specific tasks and used in almost every area of life, by laymen and professionals alike. Many outperform even the most expert humans. In that sense, progress has been spectacular. But the AI pioneers weren’t aiming only for specialist systems. They were also hoping for systems with general intelligence. Each human-like capacity they modelled — vision, reasoning, language, learning, and so on — would cover its entire range of challenges. Moreover, these capacities would be integrated when appropriate. Judged by those criteria, progress has been far less impressive. […] General intelligence is still a major challenge, still highly elusive. […] problems can’t always be solved merely by increasing computer power. New problem-solving methods are often needed. Moreover, even if a particular method must succeed in principle, it may need too much time and/or memory to succeed in practice. […] Efficiency is important, too: the fewer the number of computations, the better. In short, problems must be made tractable. There are several basic strategies for doing that. All were pioneered by classical symbolic AI, or GOFAI, and all are still essential today. One is to direct attention to only a part of the search space (the computer’s representation of the problem, within which the solution is assumed to be located). Another is to construct a smaller search space by making simplifying assumptions. A third is to order the search efficiently. Yet another is to construct a different search space, by representing the problem in a new way. 
These approaches involve heuristics, planning, mathematical simplification, and knowledge representation, respectively. […] Often, the hardest part of AI problem solving is presenting the problem to the system in the first place. […] the information (‘knowledge’) concerned must be presented to the system in a fashion that the machine can understand – in other words, that it can deal with. […] AI’s ways of doing this are highly diverse.”

“The rule-based form of knowledge representation enables programs to be built gradually, as the programmer – or perhaps an AGI system itself – learns more about the domain. A new rule can be added at any time. There’s no need to rewrite the program from scratch. However, there’s a catch. If the new rule isn’t logically consistent with the existing ones, the system won’t always do what it’s supposed to do. It may not even approximate what it’s supposed to do. When dealing with a small set of rules, such logical conflicts are easily avoided, but larger systems are less transparent. […] An alternative form of knowledge representation for concepts is semantic networks […] A semantic network links concepts by semantic relations […] semantic networks aren’t the same thing as neural networks. […] distributed neural networks represent knowledge in a very different way. There, individual concepts are represented not by a single node in a carefully defined associative net, but by the changing patterns of activity across an entire network. Such systems can tolerate conflicting evidence, so aren’t bedevilled by the problems of maintaining logical consistency […] Even a single mind involves distributed cognition, for it integrates many cognitive, motivational, and emotional subsystems […] Clearly, human-level AGI would involve distributed cognition.”
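
[To make the rule-based idea quoted above a bit more concrete, here is a minimal forward-chaining sketch of my own – not any specific system described in the book, and the rules and fact names are made up for illustration. It shows what the quote claims: a new rule can simply be appended without rewriting anything, which is also exactly why a later rule that conflicts with the existing ones can silently change what the system concludes.]

```python
def forward_chain(facts, rules):
    """Repeatedly fire rules whose premises are all known, until nothing new follows."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known

# Rules are (premises, conclusion) pairs; facts are plain strings.
rules = [
    ({"has_fever", "has_cough"}, "suspect_flu"),
    ({"suspect_flu"}, "recommend_rest"),
]
print(forward_chain({"has_fever", "has_cough"}, rules))

# Adding a rule later is trivial -- no rewrite of the existing program needed:
rules.append(({"has_fever"}, "suspect_infection"))
```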

“In short, most human visual achievements surpass today’s AI. Often, AI researchers aren’t clear about what questions to ask. For instance, think about folding a slippery satin dress neatly. No robot can do this (although some can be instructed, step by step, how to fold an oblong terry towel). Or consider putting on a T-shirt: the head must go in first, and not via a sleeve — but why? Such topological problems hardly feature in AI. None of this implies that human-level computer vision is impossible. But achieving it is much more difficult than most people believe. So this is a special case of the fact noted in Chapter 1: that AI has taught us that human minds are hugely richer, and more subtle, than psychologists previously imagined. Indeed, that is the main lesson to be learned from AI. […] Difficult though it is to build a high-performing AI specialist, building an AI generalist is orders of magnitude harder. (Deep learning isn’t the answer: its aficionados admit that ‘new paradigms are needed’ to combine it with complex reasoning — scholarly code for ‘we haven’t got a clue’.) That’s why most AI researchers abandoned that early hope, turning instead to multifarious narrowly defined tasks—often with spectacular success.”

“Some machine learning uses neural networks. But much relies on symbolic AI, supplemented by powerful statistical algorithms. In fact, the statistics really do the work, the GOFAI merely guiding the worker to the workplace. Accordingly, some professionals regard machine learning as computer science and/or statistics —not AI. However, there’s no clear boundary here. Machine learning has three broad types: supervised, unsupervised, and reinforcement learning. […] In supervised learning, the programmer ‘trains’ the system by defining a set of desired outcomes for a range of inputs […], and providing continual feedback about whether it has achieved them. The learning system generates hypotheses about the relevant features. Whenever it classifies incorrectly, it amends its hypothesis accordingly. […] In unsupervised learning, the user provides no desired outcomes or error messages. Learning is driven by the principle that co-occurring features engender expectations that they will co-occur in future. Unsupervised learning can be used to discover knowledge. The programmers needn’t know what patterns/clusters exist in the data: the system finds them for itself […but even though Boden does not mention this fact, caution is most definitely warranted when applying such systems/methods to data (it remains true that “Truth and true models are not statistically identifiable from data” – as usual, the go-to reference here is Burnham & Anderson)]. Finally, reinforcement learning is driven by analogues of reward and punishment: feedback messages telling the system that what it just did was good or bad. Often, reinforcement isn’t simply binary […] Given various theories of probability, there are many different algorithms suitable for distinct types of learning and different data sets.”
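
[The supervised-learning loop described in the quote – generate a hypothesis, amend it whenever an example is classified incorrectly – is easy to illustrate with a classic perceptron. The sketch below is mine, not Boden’s, and the toy data and learning rate are made up; it just shows the error-driven update in its simplest form.]

```python
def perceptron(samples, epochs=50, lr=0.1):
    """Error-driven supervised learning: amend the hypothesis on each mistake."""
    w0, w1, b = 0.0, 0.0, 0.0               # initial hypothesis: a linear boundary
    for _ in range(epochs):
        for (x0, x1), label in samples:     # label is +1 or -1
            pred = 1 if w0 * x0 + w1 * x1 + b > 0 else -1
            if pred != label:               # feedback: classified incorrectly
                w0 += lr * label * x0       # ...so amend the hypothesis
                w1 += lr * label * x1
                b += lr * label
    return w0, w1, b

# Linearly separable toy data: roughly, points with x0 + x1 > 1 are labelled +1.
data = [((0, 0), -1), ((1, 0), -1), ((0, 1), -1), ((1, 1), 1), ((2, 1), 1)]
w0, w1, b = perceptron(data)
# After training, every training example is classified correctly:
for (x0, x1), label in data:
    assert (1 if w0 * x0 + w1 * x1 + b > 0 else -1) == label
```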

“Countless AI applications use natural language processing (NLP). Most focus on the computer’s ‘understanding’ of language that is presented to it, not on its own linguistic production. That’s because NLP generation is even more difficult than NLP acceptance [I had a suspicion this might be the case before reading the book, but I didn’t know – US]. […] It’s now clear that handling fancy syntax isn’t necessary for summarizing, questioning, or translating a natural-language text. Today’s NLP relies more on brawn (computational power) than on brain (grammatical analysis). Mathematics — specifically, statistics — has overtaken logic, and machine learning (including, but not restricted to, deep learning) has displaced syntactic analysis. […] In modern-day NLP, powerful computers do statistical searches of huge collections (‘corpora’) of texts […] to find word patterns both commonplace and unexpected. […] In general […], the focus is on words and phrases, not syntax. […] Machine-matching of languages from different language groups is usually difficult. […] Human judgements of relevance are often […] much too subtle for today’s NLP. Indeed, relevance is a linguistic/conceptual version of the unforgiving ‘frame problem’ in robotics […]. Many people would argue that it will never be wholly mastered by a non-human system.”

“[M]any AI research groups are now addressing emotion. Most (not quite all) of this research is theoretically shallow. And most is potentially lucrative, being aimed at developing ‘computer companions’. These are AI systems — some screen-based, some ambulatory robots — designed to interact with people in ways that (besides being practically helpful) are affectively comfortable, even satisfying, for the user. Most are aimed at the elderly and/or disabled, including people with incipient dementia. Some are targeted on babies or infants. Others are interactive ‘adult toys’. […] AI systems can already recognize human emotions in various ways. Some are physiological: monitoring the person’s breathing rate and galvanic skin response. Some are verbal: noting the speaker’s speed and intonation, as well as their vocabulary. And some are visual: analysing their facial expressions. At present, all these methods are relatively crude. The user’s emotions are both easily missed and easily misinterpreted. […] [One] point here [in the development and evaluation of AI] is that emotions aren’t merely feelings. They involve functional, as well as phenomenal, consciousness […]. Specifically, they are computational mechanisms that enable us to schedule competing motives – and without which we couldn’t function. […] If we are ever to achieve AGI, emotions such as anxiety will have to be included – and used.”

[The point made in the book is better made in Aureli et al.‘s book, especially the last chapters to which the coverage in the linked post refers. The point is that emotions enable us to make better decisions, or perhaps even to make a decision in the first place; the emotions we feel in specific contexts will tend not to be even remotely random, rather they will tend to a significant extent to be Nature’s (…and Mr. Darwin’s) attempt to tell us how to handle a specific conflict of interest in the ‘best’ manner. You don’t need to do the math, your forebears did it for you, which is why you’re now …angry, worried, anxious, etc. If you had to do the math every time before you made a decision, you’d be in trouble, and emotions provide a great shortcut in many contexts. The potential for such shortcuts seems really important if you want an agent to act intelligently, regardless of whether said agent is ‘artificial’ or not. The book very briefly mentions a few of Minsky’s thoughts on these topics, and people who are curious could probably do worse than read some of his stuff. This book seems like a place to start.]


GOFAI (“Good Old-Fashioned Artificial Intelligence”).
Ada Lovelace. Charles Babbage. Alan Turing. Turing machine. Turing test. Norbert Wiener. John von Neumann. W. Ross Ashby. William Grey Walter. Oliver Selfridge. Kenneth Craik. Gregory Bateson. Frank Rosenblatt. Marvin Minsky. Seymour Papert.
A logical calculus of the ideas immanent in nervous activity (McCulloch & Pitts, 1943).
Propositional logic. Logic gate.
Arthur Samuel’s checkers player. Logic Theorist. General Problem Solver. The Homeostat. Pandemonium architecture. Perceptron. Cyc.
Fault-tolerant computer system.
Programmed Data Processor (PDP).
Artificial life.
Forward chaining. Backward chaining.
Rule-based programming. MYCIN. Dendral.
Semantic network.
Non-monotonic logic. Fuzzy logic.
Facial recognition system. Computer vision.
Bayesian statistics.
Helmholtz machine.
DQN algorithm.
AlphaGo. AlphaZero.
Human Problem Solving (Newell & Simon, 1972).
NELL (Never-Ending Language Learning).
Google Translate.
Data mining. Sentiment analysis. Siri. Watson (computer).
Paro (robot).
Uncanny valley.
CogAff architecture.
Constraint satisfaction.
Content-addressable memory.
Graceful degradation.
Physical symbol system hypothesis.


January 10, 2019 Posted by | Biology, Books, Computer science, Engineering, Language, Mathematics, Papers, Psychology, Statistics

Some observations on a cryptographic problem

It’s been a long time since I last posted one of these sorts of ‘rootless’ posts – posts not based on a specific book or a specific lecture or something along those lines – but a question on r/science made me think about these topics and start writing a bit about them, and I decided I might as well add my thoughts and ideas here.

The reddit question which motivated me to write this post was this one: “Is it difficult to determine the password for an encryption if you are given both the encrypted and unencrypted message?

By “difficult” I mean requiring an inordinate amount of computation. If given both an encrypted and unencrypted file/message, is it reasonable to be able to recover the password that was used to encrypt the file/message?”

Judging from the way the question is worded, the inquirer obviously knows very little about these topics, but that was part of what motivated me when I started out writing; s/he quite obviously has a faulty model of how this kind of stuff actually works, and just by virtue of the way s/he asks the question s/he illustrates some of the ways in which s/he gets things wrong.

When I decided to transfer my efforts towards discussing these topics to the blog I also implicitly decided against using language that would be expected to be easily comprehensible for the original inquirer, as s/he was no longer in the target group and there’s a cost to using that kind of language when discussing technical matters. I’ve sort of tried to make this post both useful and readable to people not all that familiar with the related fields, but I tend to find it difficult to evaluate the extent to which I’ve succeeded when I try to do things like that.

I decided against adding stuff already commented on when I started out writing this, so I’ll not e.g. repeat noiwontfixyourpc’s reply below. However I have added some other observations that seem to me to be relevant and worth mentioning to people who might consider asking a similar question to the one the original inquirer asked in that thread:

i. Finding a way to make plaintext turn into cipher text (…or cipher text into plaintext; and no, these two things are not actually always equivalent, see below…) is a very different (and in many contexts much easier) problem than finding out the actual encryption scheme that is at work producing the text strings you observe. There can be many, many different ways to go from a specific sample of plaintext to a specific sample of cipher text, and most of the solutions won’t work if you’re faced with a new piece of cipher text; especially not if the original samples are small, so that only a small amount of (potential) information would be expected to be included in the text strings.

If you only get a small amount of plaintext and corresponding cipher text, you may decide that algorithm A is the one that was applied to the message even if the algorithm actually applied was a more complex algorithm, B. To illustrate in a very simple way how this might happen, A might be a particular case of B: B might be a superset comprising A and a large number of other potential encryption algorithms (…or the encryption scheme C, because B also happens to be a subset of C, or… etc.). In such a context A might be an encryption scheme/approach that only applies in very specific situations; for example, (part of) the coding algorithm might have been to decide that ‘next Tuesday we’ll use this specific algorithm to translate plaintext into cipher text, and we’ll never use that specific translation-/mapping algorithm (which may be but one component of the encryption algorithm) again’. If such a situation applies, then even if your rule ‘worked’ in that particular instance, in terms of translating plaintext into cipher text and vice versa, it only ‘worked’ because you blindly fitted the two data sets in a way that looked right; you actually had no idea how the coding scheme really worked (you only guessed A, not B, and in this particular instance A is never going to happen again).
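
[A toy illustration of this under-determination, with two made-up ‘algorithms’ of my own choosing: a Caesar shift by 2 and the affine map x → 3x + 2 (mod 26) both take the one-letter sample ‘A’ to ‘C’, so the sample alone cannot tell you which scheme was actually used – and they diverge as soon as new text arrives.]

```python
A = ord('A')

def shift2(text):
    """Caesar shift by 2 (candidate algorithm 'A')."""
    return ''.join(chr((ord(c) - A + 2) % 26 + A) for c in text)

def affine(text):
    """Affine map x -> 3x + 2 mod 26 (a different candidate)."""
    return ''.join(chr((3 * (ord(c) - A) + 2) % 26 + A) for c in text)

sample = 'A'
assert shift2(sample) == affine(sample) == 'C'  # both fit the observed pair...
print(shift2('B'), affine('B'))                 # ...but disagree on new text: D F
```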

On a more general level, some of the above comments in my view quite obviously link to results from classical statistics; there are many ways to link random variables through data fitting methods, but reliably identifying proper causal linkages through the application of such approaches is, well, difficult (and, according to some, often ill-advised)…

ii. In my view, it does not seem possible in general to prove that any specific proposed encryption/decryption algorithm is ‘the correct one’. This is because the proposed algorithm will never be a unique solution to the problem you’re evaluating. How are you going to convince me that The True Algorithm is not a more general/complex one (or perhaps a completely different one – see iii. below) than the one you propose, and that your solution is not missing relevant variables? The only way to truly test whether the proposed algorithm is valid is to apply it to new data and compare its performance on this new data set with the performance of competing solution proposals which also managed to correctly link cipher text and plaintext. If the algorithm doesn’t work on the new data, you got it wrong. If it does work on new data, well, you might still just have been lucky. You might get more confident with more correctly-assessed (…guessed?) data, but you never get certain. In other similar contexts a not uncommon approach for trying to get around these sorts of problems is to limit the analysis to a subset of the data available in order to obtain the algorithm, and then use the rest of the data for validation purposes (here’s a relevant link), but here even with highly efficient estimation approaches you almost certainly will run out of information (/degrees of freedom) long before you get anywhere if the encryption algorithm is at all non-trivial. In these settings information is likely to be a limiting resource.
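
[A sketch of the validation idea, with made-up candidate ‘algorithms’ echoing the day-of-the-week example I use later in this post: candidate A always shifts letters by 3; candidate B shifts by 3 except on Wednesdays, when it shifts by 5. Both fit a sample captured on a Saturday; only held-out data from a Wednesday can separate them.]

```python
A_ORD = ord('A')

def shift(text, k):
    return ''.join(chr((ord(c) - A_ORD + k) % 26 + A_ORD) for c in text)

def cand_a(text, day):
    return shift(text, 3)                      # always shift by 3

def cand_b(text, day):
    return shift(text, 5 if day == 'Wed' else 3)  # special rule on Wednesdays

train = ('HELLO', 'Sat', shift('HELLO', 3))    # observed plaintext/cipher pair
held_out = ('HELLO', 'Wed', shift('HELLO', 5))  # the true scheme was actually B

# Both candidates 'work' on the training data, so it cannot distinguish them:
assert cand_a(train[0], train[1]) == cand_b(train[0], train[1]) == train[2]
# Only the held-out Wednesday message discriminates between the candidates:
assert cand_a(held_out[0], held_out[1]) != held_out[2]
assert cand_b(held_out[0], held_out[1]) == held_out[2]
```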

iii. There are many different types of encryption schemes, and people who ask questions like the one above tend, I believe, to have a quite limited view of which methods and approaches are truly available to one who desires secrecy when exchanging information with others. Imagine a situation where the plaintext is ‘See you next Wednesday’ and the encrypted text is an English translation of Tolstoy’s book War and Peace (or, to make it even more fun, all pages published on the English version of Wikipedia, say on November the 5th, 2017 at midnight GMT). That’s an available encryption approach that might be applied. It might be a part (‘A’) of a more general (‘B’) encryption approach of linking specific messages from a preconceived list of messages, which had been considered worth sending in the future when the algorithm was chosen, to specific book titles decided on in advance. So if you want to say ‘good Sunday!’, Eve gets to read the Bible and see where that gets her. You could also decide that in half of all cases the book cipher text links to specific messages from a list but in the other half of the cases what you actually mean to communicate is on page 21 of the book; this might throw a hacker who saw a combined cipher text and plaintext combination resulting from that part of the algorithm off in terms of the other half, and vice versa – and it illustrates well one of the key problems you’re faced with as an attacker when working on cryptographic schemes about which you have limited knowledge; the opponent can always add new layers on top of the ones that already exist/apply to make the problem harder to solve. And so you could also link the specific list message with some really complicated cipher-encrypted version of the Bible. There’s a lot more to encryption schemes than just exchanging a few letters here and there. On related topics, see this link. 
On a different if related topic, people who desire secrecy when exchanging information may also attempt to try to hide the fact that any secrets are exchanged in the first place. See also this.

iv. The specific usage of the word ‘password’ in the original query calls for comment for multiple reasons, some of which have been touched upon above, perhaps mainly because it implicitly betrays a lack of knowledge about how modern cryptographic systems actually work. The thing is, even if you might consider an encryption scheme to just be an advanced sort of ‘password’, finding the password (singular) is not always the task you’re faced with today. In symmetric-key algorithm settings you might sort-of-kind-of argue that it sort-of is – in such settings you might say that you have one single (collection of) key(s) which you use to encrypt messages and also use to decrypt the messages. So you can both encrypt and decrypt the message using the same key(s), and so you only have one ‘password’. That’s however not how asymmetric-key encryption works. As wiki puts it: “In an asymmetric key encryption scheme, anyone can encrypt messages using the public key, but only the holder of the paired private key can decrypt.”
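
[The symmetric/asymmetric distinction can be sketched in a few lines. The XOR cipher below uses one and the same key in both directions; the RSA example uses the standard textbook parameters (p=61, q=53, so n=3233, e=17, d=2753 – numbers far too small for real use, obviously): anyone can encrypt with the public pair (n, e), but decryption requires the separate private exponent d.]

```python
# Symmetric: the same single key both encrypts and decrypts.
def xor_cipher(data: bytes, key: int) -> bytes:
    return bytes(b ^ key for b in data)

msg = b'secret'
assert xor_cipher(xor_cipher(msg, 0x2A), 0x2A) == msg  # one 'password'

# Asymmetric (toy RSA, textbook numbers): public key (n, e), private exponent d.
n, e, d = 3233, 17, 2753
m = 65                      # the message, encoded as a number < n
c = pow(m, e, n)            # anyone can compute this from the public key
assert pow(c, d, n) == m    # only the private-key holder can invert it
```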

This of course relates to what you actually want to do/achieve when you get your samples of cipher text and plaintext. In some cryptographic contexts, by design, the route you need to go to get from cipher text to plaintext is conceptually different from the route you need to go to get from plaintext to cipher text. And some of the ‘passwords’ that relate to how the schemes work are public knowledge by design.

v. I have already touched a bit upon the problem of the existence of an information constraint, but I realized I probably need to spell this out in a bit more detail. The original inquirer to me seems implicitly to be under the misapprehension that computational complexity is the only limiting constraint here (“By “difficult” I mean requiring an inordinate amount of computation.”). Given the setting he or she proposes, I don’t think that’s true, and why that is is sort of interesting.

If you think about what kind of problem you’re facing, what you have here in this setting is really a very limited amount of data which relates in an unknown manner to an unknown data-generating process (‘algorithm’). There are, as has been touched upon, in general many ways to obtain linkage between two data sets (the cipher text and the plaintext) using an algorithm – too many ways for comfort, actually. The search space is large, there are too many algorithms to consider; or equivalently, the amount of information supplied by the data will often be too small for us to properly evaluate the algorithms under consideration. An important observation is that more complex algorithms will both take longer to calculate (‘identify’ …at least as candidates) and be expected to require more data to evaluate, at least to the extent that algorithmic complexity constrains the data (/relates to changes in data structure/composition that needs to be modeled in order to evaluate/identify the goal algorithm). If the algorithm says a different encryption rule is at work on Wednesdays, you’re going to have trouble figuring that out if you only got hold of a cipher text/plaintext combination derived from an exchange which took place on a Saturday. There are methods from statistics that might conceivably help you deal with problems like these, but they have their own issues and trade-offs. You might limit yourself to considering only settings where you have access to all known plaintext and cipher text combinations, so you got both Wednesday and Saturday, but even here you can’t be safe – next (metaphorical, I probably at this point need to add) Friday might be different from last (metaphorical) Friday, and this could even be baked into the algorithm in very non-obvious ways.
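
[A back-of-the-envelope illustration of the information constraint, using an assumption of my own for concreteness: suppose the scheme were a simple letter substitution. If the intercepted sample only exercises k distinct letters, then every one of the (26 − k)! ways of filling in the unseen letters remains consistent with the data – the sample simply does not contain enough information to pin the key down, no matter how much computing power you throw at it.]

```python
from math import factorial

def consistent_substitutions(plaintext: str) -> int:
    """Count substitution keys still consistent with an observed sample."""
    k = len(set(plaintext))      # distinct letters actually observed
    return factorial(26 - k)     # all completions of the unseen letters survive

# 10 distinct letters observed -> 16! candidate keys remain:
print(consistent_substitutions('SEEYOUNEXTWED'))  # 20922789888000
```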

The above remarks might give you the idea that I’m just coming up with these kinds of suggestions to try to foil your approaches to figuring out the algorithm ‘by cheating’ (…it shouldn’t matter whether or not it was ‘sent on a Saturday’), but the main point is that a complex encryption algorithm is complex, and even if you see it applied multiple times you might not get enough information about how it works from the data suggested to be able to evaluate if you guessed right. In fact, given a combination of a sparse data set (one message, or just a few messages, in plaintext and cipher text) and a complex algorithm involving a very non-obvious mapping function, the odds are strongly against you.

vi. I had the thought that one reason why the inquirer might be confused about some of these things is that s/he might well be aware of the existence of modern cryptographic techniques which do rely to a significant extent on computational complexity aspects. I.e., here you do have settings where you’re asked to provide ‘the right answer’ (‘the password’), but it’s hard to calculate the right answer in a reasonable amount of time unless you have the relevant (private) information at hand – see e.g. these links for more. One way to think about how such a problem relates to the other problem at hand (you have been presented with samples of cipher text and plaintext and you want to guess all the details about how the encryption and decryption schemes which were applied work) is that this kind of algorithm/approach may be applied in combination with other algorithmic approaches to encrypt/decrypt the text you’re analyzing. A really tough prime factorization problem might for all we know be an embedded component of the cryptographic process that is applied to our text. We could call it A.

In such a situation we would definitely be in trouble because stuff like prime factorization is really hard and computationally complex, and to make matters worse just looking at the plaintext and the cipher text would not make it obvious to us that a prime factorization scheme had even been applied to the data. But a really important point is that even if such a tough problem was not present and even if only relatively less computationally demanding problems were involved, we almost certainly still just wouldn’t have enough information to break any semi-decent encryption algorithm based on a small sample of plaintext and cipher text. It might help a little bit, but in the setting contemplated by the inquirer a ‘faster computer’ (/…’more efficient decision algorithm’, etc.) can only help so much.

vii. Shannon and Kerckhoffs may have a point in a general setting, but in specific settings like this particular one I think it is well worth taking into account the implications of not having a (publicly) known algorithm to attack. As wiki notes (see the previous link), ‘Many ciphers are actually based on publicly known algorithms or are open source and so it is only the difficulty of obtaining the key that determines security of the system’. The above remarks were of course all based on an assumption that Eve does not here have the sort of knowledge about the encryption scheme applied that she in many cases today actually might have. There are obvious and well-known weaknesses associated with having security-associated components of a specific cryptographic scheme be independent of the key, but I do not see how keeping the algorithm secret would fail, in this particular setting, to cause a search-space blow-up making the decision problem (did we actually guess right?) intractable in many cases. A key feature of the problem considered by the inquirer is that you here – unlike in many ‘guess the password’ settings where for example a correct password will allow you access to an application or a document or whatever – do not get any feedback, whether you guess right or wrong; it’s a decision problem, not a calculation problem. (However it is perhaps worth noting on the other hand that in a ‘standard guess-the-password problem’ you may also sometimes implicitly face a similar decision problem due to e.g. the potential for a combination of cryptographic security and steganographic complementary strategies like e.g. these having been applied).

August 14, 2018 Posted by | Computer science, Cryptography, Data, rambling nonsense, Statistics

Combinatorics (I)

This book is not a particularly easy read compared to the general format of the series in which it is published, but in my view this is a good thing, as it also means the author managed to go into enough detail in specific contexts to touch upon at least some properties/topics of interest. You don’t need any specific background knowledge to read and understand the book – at least not any sort of background knowledge one would not expect someone who might decide to read a book like this one to already have – but you do need, when reading it, the sort of mental surplus that enables you to think carefully about what’s going on and devote a few mental resources to understanding the details.

Some quotes and links from the first half of the book below.

“The subject of combinatorial analysis or combinatorics […] [w]e may loosely describe [as] the branch of mathematics concerned with selecting, arranging, constructing, classifying, and counting or listing things. […] the subject involves finite sets or discrete elements that proceed in separate steps […] rather than continuous systems […] Mathematicians sometimes use the term ‘combinatorics’ to refer to a larger subset of discrete mathematics that includes graph theory. In that case, what is commonly called combinatorics is then referred to as ‘enumeration’. […] Combinatorics now includes a wide range of topics, some of which we cover in this book, such as the geometry of tilings and polyhedra […], the theory of graphs […], magic squares and latin squares […], block designs and finite projective planes […], and partitions of numbers […]. [The] chapters [of the book] are largely independent of each other and can be read in any order. Much of combinatorics originated in recreational pastimes […] in recent years the subject has developed in depth and variety and has increasingly become a part of mainstream mathematics. […] Undoubtedly part of the reason for the subject’s recent importance has arisen from the growth of computer science and the increasing use of algorithmic methods for solving real-world practical problems. These have led to combinatorial applications in a wide range of subject areas, both within and outside mathematics, including network analysis, coding theory, probability, virology, experimental design, scheduling, and operations research.”

“[C]ombinatorics is primarily concerned with four types of problem:
Existence problem: Does □□□ exist?
Construction problem: If □□□ exists, how can we construct it?
Enumeration problem: How many □□□ are there?
Optimization problem: Which □□□ is best? […]
[T]hese types of problems are not unrelated; for example, the easiest way to prove that something exists may be to construct it explicitly.”

“In this book we consider two types of enumeration problem – counting problems in which we simply wish to know the number of objects involved, and listing problems in which we want to list them all explicitly. […] It’s useful to have some basic counting rules […] In what follows, all the sets are finite. […] In general we have the following rule; here, subsets are disjoint if they have no objects in common: Addition rule: To find the number of objects in a set, split the set into disjoint subsets, count the objects in each subset, and add the results. […] Subtraction rule: If a set of objects can be split into two subsets A and B, then the number of objects in B is obtained by subtracting the number of objects in A from the number in the whole set. […] The subtraction rule extends easily to sets that are split into more than two subsets with no elements in common. […] the inclusion-exclusion principle […] extends this simple idea to the situation where the subsets may have objects in common. […] In general we have the following result: Multiplication rule: If a counting problem can be split into stages with several options at each stage, then the total number of possibilities is the product of options at each stage. […] Another useful principle in combinatorics is the following: Correspondence rule: We can solve a counting problem if we can put the objects to be counted in one-to-one correspondence with the objects of a set that we have already counted. […] We conclude this section with one more rule: Division rule: If a set of n elements can be split into m disjoint subsets, each of size k, then m = n / k.”
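The counting rules quoted above are easy to check by brute-force enumeration. A small illustrative sketch of my own (the menu example and all the numbers are mine, not the book’s):

```python
from itertools import product

# Multiplication rule: a 3-course menu with 4 starters, 5 mains, 3 desserts
starters, mains, desserts = range(4), range(5), range(3)
menus = list(product(starters, mains, desserts))
assert len(menus) == 4 * 5 * 3  # 60 possibilities in total

# Addition rule: split the menus into disjoint subsets by starter and re-count
counts = [sum(1 for m in menus if m[0] == s) for s in starters]
assert sum(counts) == len(menus)

# Division rule: 60 menus split into m disjoint subsets of size k = 15,
# so m = n / k = 60 / 15 = 4 (one subset per starter)
assert len(menus) // 15 == 4
```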

“Every algorithm has a running time […] this may be the time that a computer needs to carry out all the necessary calculations, or the actual number of such calculations. Each problem [also] has an input size […] the running time T usually depends on the input size n. Particularly important, because they’re the most efficient, are the polynomial-time algorithms, where the maximum running time is proportional to a power of the input size […] The collection of all polynomial-time algorithms is called P. […] In contrast, there are inefficient algorithms that don’t take polynomial time, such as the exponential-time algorithms […] At this point we introduce NP, the set of ‘non-deterministic polynomial-time problems’. These are problems for which a solution, when given, can be checked in polynomial time. Clearly P is contained in NP, since if a problem can be solved in polynomial time then a solution can certainly be checked in polynomial time – checking solutions is far easier than finding them in the first place. But are they the same? […] Few people believe that the answer is ‘yes’, but no one has been able to prove that P ≠ NP. […] a problem is NP-complete if its solution in polynomial time means that every NP problem can be solved in polynomial time. […] If there were a polynomial algorithm for just one of them, then polynomial algorithms would exist for the whole lot and P would equal NP. On the other hand, if just one of them has no polynomial algorithm, then none of the others could have a polynomial algorithm either, and P would be different from NP.”
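The ‘checking is easier than finding’ asymmetry is easy to illustrate with subset sum, a classic NP-complete problem. A sketch of my own (the numbers are arbitrary): verifying a proposed solution takes one pass over it, whereas the naive search may have to examine up to 2^n subsets.

```python
from itertools import combinations

def verify(nums, target, certificate):
    """Checking a given solution is fast: one pass to sum and validate it."""
    return sum(certificate) == target and all(c in nums for c in certificate)

def solve(nums, target):
    """Finding a solution by brute force may examine up to 2^n subsets."""
    for r in range(len(nums) + 1):
        for subset in combinations(nums, r):
            if sum(subset) == target:
                return subset
    return None

nums = [3, 34, 4, 12, 5, 2]
cert = solve(nums, 9)          # exponential-time search
assert verify(nums, 9, cert)   # polynomial-time check
```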

“In how many different ways can n objects be arranged? […] generally, we have the following result: Arrangements: The number of arrangements of n objects is n x (n – 1) x (n – 2) x … x 3 x 2 x 1. This number is called n factorial and is denoted by n!. […] The word permutation is used in different ways. We’ll use it to mean an ordered selection without repetition, while others may use it to mean an arrangement […] generally, we have the following rule: Ordered selections without repetition (permutations): If we select k items from a set of n objects, and if the selections are ordered and repetition is not allowed, then the number of possible selections is n x (n – 1) x (n – 2) x … x (n – k + 1). We denote this expression by P(n,k). […] Since P(n,n) = n x (n – 1) x (n – 2) x … x 3 x 2 x 1 = n!, an arrangement is a permutation for which k = n. […] generally, we have the following result: P(n,k) = n!/(n-k)!. […] unordered selections without repetition are called combinations, giving rise to the words combinatorial and combinatorics. […] generally, we have the following result: Unordered selections without repetition (combinations): If we select k items from a set of n objects, and if the selections are unordered and repetition is not allowed, then the number of possible selections is P(n,k)/k! = n x (n – 1) x (n – 2) x … x (n – k + 1)/k!. We denote this expression by C(n,k) […] Unordered selections with repetition: If we select k items from a set of n objects, and if the selections are unordered and repetition is allowed, then the number of possible selections is C(n + k – 1, k). […] Combination rule 1: For any numbers k and n with k ≤ n, C(n,k) = C(n,n-k) […] Combination rule 2: For any numbers n and k with k ≤ n, C(n,n-k) = n!/(n-k)!(n-(n-k))! = n!/(n-k)!k! = C(n,k). […] Combination rule 3: For any number n, C(n,0) + C(n,1) + C(n,2) + … + C(n,n-1) + C(n,n) = 2^n.”
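All of these identities can be checked directly with Python’s standard-library `math.perm` and `math.comb` functions; a quick sketch of my own (n = 10, k = 4 chosen arbitrarily):

```python
from math import comb, perm, factorial

n, k = 10, 4
# Ordered selections without repetition: P(n,k) = n!/(n-k)!
assert perm(n, k) == factorial(n) // factorial(n - k)    # 10*9*8*7 = 5040
# Unordered selections: C(n,k) = P(n,k)/k!
assert comb(n, k) == perm(n, k) // factorial(k)          # 5040/24 = 210
# Combination rule 1: C(n,k) = C(n, n-k)
assert comb(n, k) == comb(n, n - k)
# Combination rule 3: the numbers C(n,0), ..., C(n,n) sum to 2^n
assert sum(comb(n, i) for i in range(n + 1)) == 2 ** n   # 1024
# Unordered selections with repetition: C(n + k - 1, k)
assert comb(n + k - 1, k) == 715
```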


Knight’s tour.
Seven Bridges of Königsberg problem.
Three utilities problem.
Four color theorem.
Tarry’s algorithm (p.7) (formulated slightly differently in the book, but it’s the same algorithm).
Arthur Cayley.
Combinatorial principles.
Minimum connector problem.
Travelling salesman problem.
Algorithmic efficiency. Running time/time complexity.
Boolean satisfiability problem. Cook–Levin theorem.
Mersenne primes.
Permutation. Factorial. Stirling’s formula.
Birthday problem.
Manhattan distance.
Fibonacci number.
Pascal’s triangle. Binomial coefficient. Binomial theorem.
Pigeonhole principle.
Venn diagram.
Derangement (combinatorial mathematics).
Tower of Hanoi.
Stable marriage problem. Transversal (combinatorics). Hall’s marriage theorem.
Generating function (the topic covered in the book more specifically is related to a symbolic generator of the subsets of a set, but a brief search yielded no good links to this particular topic – US).
Group theory.
Ferdinand Frobenius. Burnside’s lemma.

August 4, 2018 Posted by | Books, Computer science, Mathematics | 1 Comment

Lyapunov Arguments in Optimization

I’d say that if you’re interested in the intersection of mathematical optimization methods/algorithms and dynamical systems analysis, it’s probably a talk well worth watching. The lecture is reasonably high-level and covers a fairly satisfactory amount of ground in a relatively short amount of time, and it is not particularly hard to follow if you have at least some passing familiarity with the fields involved (dynamical systems analysis, statistics, mathematical optimization, computer science/machine learning).

Some links:

Dynamical system.
Euler–Lagrange equation.
Continuous optimization problem.
Gradient descent algorithm.
Lyapunov stability.
Condition number.
Fast (/accelerated-) gradient descent methods.
The Mirror Descent Algorithm.
Cubic regularization of Newton method and its global performance (Nesterov & Polyak).
A Differential Equation for Modeling Nesterov’s Accelerated Gradient Method: Theory and Insights (Su, Boyd & Candès).
A Variational Perspective on Accelerated Methods in Optimization (Wibisono, Wilson & Jordan).
Breaking Locality Accelerates Block Gauss-Seidel (Tu, Venkataraman, Wilson, Gittens, Jordan & Recht).
A Lyapunov Analysis of Momentum Methods in Optimization (Wilson, Recht & Jordan).
Bregman divergence.
Estimate sequence methods.
Variance reduction techniques.
Stochastic gradient descent.
Langevin dynamics.


July 22, 2018 Posted by | Computer science, Lectures, Mathematics, Physics, Statistics | Leave a comment

Big Data (II)

Below I have added a few observations from the last half of the book, as well as some coverage-related links to topics of interest.

“With big data, using correlation creates […] problems. If we consider a massive dataset, algorithms can be written that, when applied, return a large number of spurious correlations that are totally independent of the views, opinions, or hypotheses of any human being. Problems arise with false correlations — for example, divorce rate and margarine consumption […]. [W]hen the number of variables becomes large, the number of spurious correlations also increases. This is one of the main problems associated with trying to extract useful information from big data, because in doing so, as with mining big data, we are usually looking for patterns and correlations. […] one of the reasons Google Flu Trends failed in its predictions was because of these problems. […] The Google Flu Trends project hinged on the known result that there is a high correlation between the number of flu-related online searches and visits to the doctor’s surgery. If a lot of people in a particular area are searching for flu-related information online, it might then be possible to predict the spread of flu cases to adjoining areas. Since the interest is in finding trends, the data can be anonymized and hence no consent from individuals is required. Using their five-year accumulation of data, which they limited to the same time-frame as the CDC data, and so collected only during the flu season, Google counted the weekly occurrence of each of the fifty million most common search queries covering all subjects. These search query counts were then compared with the CDC flu data, and those with the highest correlation were used in the flu trends model. […] The historical data provided a baseline from which to assess current flu activity on the chosen search terms and by comparing the new real-time data against this, a classification on a scale from 1 to 5, where 5 signified the most severe, was established. 
Used in the 2011–12 and 2012–13 US flu seasons, Google’s big data algorithm famously failed to deliver. After the flu season ended, its predictions were checked against the CDC’s actual data. […] the Google Flu Trends algorithm over-predicted the number of flu cases by at least 50 per cent during the years it was used.” [For more details on why blind/mindless hypothesis testing/p-value hunting on big data sets is usually a terrible idea, see e.g. Burnham & Anderson, US]
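The point above about spurious correlations multiplying with the number of variables is easy to demonstrate by simulation. A small sketch of my own (sample size, number of predictors, and the 0.4 threshold are all arbitrary choices): every series below is pure noise, yet screening many of them against one ‘outcome’ reliably turns up ‘strong’ correlations.

```python
import random

random.seed(0)

def corr(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# One 'outcome' series and 1,000 candidate predictors -- all pure noise
outcome = [random.gauss(0, 1) for _ in range(30)]
predictors = [[random.gauss(0, 1) for _ in range(30)] for _ in range(1000)]

strong = sum(1 for p in predictors if abs(corr(outcome, p)) > 0.4)
print(strong)  # typically dozens of 'strong' correlations, all spurious
```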

“The data Google used [in the Google Flu Trends algorithm], collected selectively from search engine queries, produced results [with] obvious bias […] for example by eliminating everyone who does not use a computer and everyone using other search engines. Another issue that may have led to poor results was that customers searching Google on ‘flu symptoms’ would probably have explored a number of flu-related websites, resulting in their being counted several times and thus inflating the numbers. In addition, search behaviour changes over time, especially during an epidemic, and this should be taken into account by updating the model regularly. Once errors in prediction start to occur, they tend to cascade, which is what happened with the Google Flu Trends predictions: one week’s errors were passed along to the next week. […] [Similarly,] the Ebola prediction figures published by WHO [during the West African Ebola virus epidemic] were over 50 per cent higher than the cases actually recorded. The problems with both the Google Flu Trends and Ebola analyses were similar in that the prediction algorithms used were based only on initial data and did not take into account changing conditions. Essentially, each of these models assumed that the number of cases would continue to grow at the same rate in the future as they had before the medical intervention began. Clearly, medical and public health measures could be expected to have positive effects and these had not been integrated into the model.”

“Every time a patient visits a doctor’s office or hospital, electronic data is routinely collected. Electronic health records constitute legal documentation of a patient’s healthcare contacts: details such as patient history, medications prescribed, and test results are recorded. Electronic health records may also include sensor data such as Magnetic Resonance Imaging (MRI) scans. The data may be anonymized and pooled for research purposes. It is estimated that in 2015, an average hospital in the USA will store over 600 Tb of data, most of which is unstructured. […] Typically, the human genome contains about 20,000 genes and mapping such a genome requires about 100 Gb of data. […] The interdisciplinary field of bioinformatics has flourished as a consequence of the need to manage and analyze the big data generated by genomics. […] Cloud-based systems give authorized users access to data anywhere in the world. To take just one example, the NHS plans to make patient records available via smartphone by 2018. These developments will inevitably generate more attacks on the data they employ, and considerable effort will need to be expended in the development of effective security methods to ensure the safety of that data. […] There is no absolute certainty on the Web. Since e-documents can be modified and updated without the author’s knowledge, they can easily be manipulated. This situation could be extremely damaging in many different situations, such as the possibility of someone tampering with electronic medical records. […] [S]ome of the problems facing big data systems [include] ensuring they actually work as intended, [that they] can be fixed when they break down, and [that they] are tamper-proof and accessible only to those with the correct authorization.”

“With transactions being made through sales and auction bids, eBay generates approximately 50 Tb of data a day, collected from every search, sale, and bid made on their website by a claimed 160 million active users in 190 countries. […] Amazon collects vast amounts of data including addresses, payment information, and details of everything an individual has ever looked at or bought from them. Amazon uses its data in order to encourage the customer to spend more money with them by trying to do as much of the customer’s market research as possible. In the case of books, for example, Amazon needs to provide not only a huge selection but to focus recommendations on the individual customer. […] Many customers use smartphones with GPS capability, allowing Amazon to collect data showing time and location. This substantial amount of data is used to construct customer profiles allowing similar individuals and their recommendations to be matched. Since 2013, Amazon has been selling customer metadata to advertisers in order to promote their Web services operation […] Netflix collects and uses huge amounts of data to improve customer service, such as offering recommendations to individual customers while endeavouring to provide reliable streaming of its movies. Recommendation is at the heart of the Netflix business model and most of its business is driven by the data-based recommendations it is able to offer customers. Netflix now tracks what you watch, what you browse, what you search for, and the day and time you do all these things. It also records whether you are using an iPad, TV, or something else. […] As well as collecting search data and star ratings, Netflix can now keep records on how often users pause or fast forward, and whether or not they finish watching each programme they start. They also monitor how, when, and where they watched the programme, and a host of other variables too numerous to mention.”

“Data science is becoming a popular study option in universities but graduates so far have been unable to meet the demands of commerce and industry, where positions in data science offer high salaries to experienced applicants. Big data for commercial enterprises is concerned with profit, and disillusionment will set in quickly if an over-burdened data analyst with insufficient experience fails to deliver the expected positive results. All too often, firms are asking for a one-size-fits-all model of data scientist who is expected to be competent in everything from statistical analysis to data storage and data security.”

“In December 2016, Yahoo! announced that a data breach involving over one billion user accounts had occurred in August 2013. Dubbed the biggest ever cyber theft of personal data, or at least the biggest ever divulged by any company, thieves apparently used forged cookies, which allowed them access to accounts without the need for passwords. This followed the disclosure of an attack on Yahoo! in 2014, when 500 million accounts were compromised. […] The list of big data security breaches increases almost daily. Data theft, data ransom, and data sabotage are major concerns in a data-centric world. There have been many scares regarding the security and ownership of personal digital data. Before the digital age we used to keep photos in albums and negatives were our backup. After that, we stored our photos electronically on a hard-drive in our computer. This could possibly fail and we were wise to have back-ups but at least the files were not publicly accessible. Many of us now store data in the Cloud. […] If you store all your photos in the Cloud, it’s highly unlikely with today’s sophisticated systems that you would lose them. On the other hand, if you want to delete something, maybe a photo or video, it becomes difficult to ensure all copies have been deleted. Essentially you have to rely on your provider to do this. Another important issue is controlling who has access to the photos and other data you have uploaded to the Cloud. […] although the Internet and Cloud-based computing are generally thought of as wireless, they are anything but; data is transmitted through fibre-optic cables laid under the oceans. Nearly all digital communication between continents is transmitted in this way. My email will be sent via transatlantic fibre-optic cables, even if I am using a Cloud computing service. 
The Cloud, an attractive buzz word, conjures up images of satellites sending data across the world, but in reality Cloud services are firmly rooted in a distributed network of data centres providing Internet access, largely through cables. Fibre-optic cables provide the fastest means of data transmission and so are generally preferable to satellites.”


Health care informatics.
Electronic health records.
European influenza surveillance network.
Public Health Emergency of International Concern.
Virtual Physiological Human project.
Watson (computer).
Natural language processing.
Anthem medical data breach.
Electronic delay storage automatic calculator (EDSAC). LEO (computer). ICL (International Computers Limited).
E-commerce. Online shopping.
Pay-per-click advertising model. Google AdWords. Click fraud. Targeted advertising.
Recommender system. Collaborative filtering.
Anticipatory shipping.
BlackPOS Malware.
Data Encryption Standard algorithm. EFF DES cracker.
Advanced Encryption Standard.
Tempora. PRISM (surveillance program). Edward Snowden. WikiLeaks. Tor (anonymity network). Silk Road (marketplace). Deep web. Internet of Things.
Songdo International Business District. Smart City.
United Nations Global Pulse.

July 19, 2018 Posted by | Books, Computer science, Cryptography, Data, Engineering, Epidemiology, Statistics | Leave a comment

Big Data (I?)

Below a few observations from the first half of the book, as well as some links related to the topic coverage.

“The data we derive from the Web can be classified as structured, unstructured, or semi-structured. […] Carefully structured and tabulated data is relatively easy to manage and is amenable to statistical analysis, indeed until recently statistical analysis methods could be applied only to structured data. In contrast, unstructured data is not so easily categorized, and includes photos, videos, tweets, and word-processing documents. Once the use of the World Wide Web became widespread, it transpired that many such potential sources of information remained inaccessible because they lacked the structure needed for existing analytical techniques to be applied. However, by identifying key features, data that appears at first sight to be unstructured may not be completely without structure. Emails, for example, contain structured metadata in the heading as well as the actual unstructured message […] and so may be classified as semi-structured data. Metadata tags, which are essentially descriptive references, can be used to add some structure to unstructured data. […] Dealing with unstructured data is challenging: since it cannot be stored in traditional databases or spreadsheets, special tools have had to be developed to extract useful information. […] Approximately 80 per cent of the world’s data is unstructured in the form of text, photos, and images, and so is not amenable to the traditional methods of structured data analysis. ‘Big data’ is now used to refer not just to the total amount of data generated and stored electronically, but also to specific datasets that are large in both size and complexity, with which new algorithmic techniques are required in order to extract useful information from them.”

“In the digital age we are no longer entirely dependent on samples, since we can often collect all the data we need on entire populations. But the size of these increasingly large sets of data cannot alone provide a definition for the term ‘big data’ — we must include complexity in any definition. Instead of carefully constructed samples of ‘small data’ we are now dealing with huge amounts of data that has not been collected with any specific questions in mind and is often unstructured. In order to characterize the key features that make data big and move towards a definition of the term, Doug Laney, writing in 2001, proposed using the three ‘v’s: volume, variety, and velocity. […] ‘Volume’ refers to the amount of electronic data that is now collected and stored, which is growing at an ever-increasing rate. Big data is big, but how big? […] Generally, we can say the volume criterion is met if the dataset is such that we cannot collect, store, and analyse it using traditional computing and statistical methods. […] Although a great variety of data [exists], ultimately it can all be classified as structured, unstructured, or semi-structured. […] Velocity is necessarily connected with volume: the faster the data is generated, the more there is. […] Velocity also refers to the speed at which data is electronically processed. For example, sensor data, such as that generated by an autonomous car, is necessarily generated in real time. If the car is to work reliably, the data […] must be analysed very quickly […] Variability may be considered as an additional dimension of the velocity concept, referring to the changing rates in flow of data […] computer systems are more prone to failure [during peak flow periods]. […] As well as the original three ‘v’s suggested by Laney, we may add ‘veracity’ as a fourth. Veracity refers to the quality of the data being collected. 
[…] Taken together, the four main characteristics of big data – volume, variety, velocity, and veracity – present a considerable challenge in data management.” [As regular readers of this blog might be aware, not everybody would agree with the author here about the inclusion of veracity as a defining feature of big data – “Many have suggested that there are more V’s that are important to the big data problem [than volume, variety & velocity] such as veracity and value (IEEE BigData 2013). Veracity refers to the trustworthiness of the data, and value refers to the value that the data adds to creating knowledge about a topic or situation. While we agree that these are important data characteristics, we do not see these as key features that distinguish big data from regular data. It is important to evaluate the veracity and value of all data, both big and small.” (Knoth & Schmid)]

“Anyone who uses a personal computer, laptop, or smartphone accesses data stored in a database. Structured data, such as bank statements and electronic address books, are stored in a relational database. In order to manage all this structured data, a relational database management system (RDBMS) is used to create, maintain, access, and manipulate the data. […] Once […] the database [has been] constructed we can populate it with data and interrogate it using structured query language (SQL). […] An important aspect of relational database design involves a process called normalization which includes reducing data duplication to a minimum and hence reduces storage requirements. This allows speedier queries, but even so as the volume of data increases the performance of these traditional databases decreases. The problem is one of scalability. Since relational databases are essentially designed to run on just one server, as more and more data is added they become slow and unreliable. The only way to achieve scalability is to add more computing power, which has its limits. This is known as vertical scalability. So although structured data is usually stored and managed in an RDBMS, when the data is big, say in terabytes or petabytes and beyond, the RDBMS no longer works efficiently, even for structured data. An important feature of relational databases and a good reason for continuing to use them is that they conform to the following group of properties: atomicity, consistency, isolation, and durability, usually known as ACID. Atomicity ensures that incomplete transactions cannot update the database; consistency excludes invalid data; isolation ensures one transaction does not interfere with another transaction; and durability means that the database must update before the next transaction is carried out. All these are desirable properties but storing and accessing big data, which is mostly unstructured, requires a different approach. 
[…] given the current data explosion there has been intensive research into new storage and management techniques. In order to store these massive datasets, data is distributed across servers. As the number of servers involved increases, the chance of failure at some point also increases, so it is important to have multiple, reliably identical copies of the same data, each stored on a different server. Indeed, with the massive amounts of data now being processed, systems failure is taken as inevitable and so ways of coping with this are built into the methods of storage.”
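The atomicity property described above is easy to see in action with Python’s built-in `sqlite3` module (a toy sketch of my own; the table and account names are illustrative). Used as a context manager, a connection opens a transaction, commits on success, and rolls back on error, so a half-finished transfer leaves the database untouched:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

# Atomicity: a transfer is all-or-nothing. If a failure occurs after the
# first update, the whole transaction is rolled back.
try:
    with conn:  # transaction: commit on success, rollback on exception
        conn.execute("UPDATE accounts SET balance = balance - 50 "
                     "WHERE name = 'alice'")
        raise RuntimeError("simulated crash mid-transaction")
except RuntimeError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # alice is back at 100: the partial update was undone
```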

“A distributed file system (DFS) provides effective and reliable storage for big data across many computers. […] Hadoop DFS [is] one of the most popular DFS […] When we use Hadoop DFS, the data is distributed across many nodes, often tens of thousands of them, physically situated in data centres around the world. […] The NameNode deals with all requests coming in from a client computer; it distributes storage space, and keeps track of storage availability and data location. It also manages all the basic file operations (e.g. opening and closing files) and controls data access by client computers. The DataNodes are responsible for actually storing the data and in order to do so, create, delete, and replicate blocks as necessary. Data replication is an essential feature of the Hadoop DFS. […] It is important that several copies of each block are stored so that if a DataNode fails, other nodes are able to take over and continue with processing tasks without loss of data. […] Data is written to a DataNode only once but will be read by an application many times. […] One of the functions of the NameNode is to determine the best DataNode to use given the current usage, ensuring fast data access and processing. The client computer then accesses the data block from the chosen node. DataNodes are added as and when required by the increased storage requirements, a feature known as horizontal scalability. One of the main advantages of Hadoop DFS over a relational database is that you can collect vast amounts of data, keep adding to it, and, at that time, not yet have any clear idea of what you want to use it for. […] structured data with identifiable rows and columns can be easily stored in a RDBMS while unstructured data can be stored cheaply and readily using a DFS.”
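The replication idea behind a DFS can be sketched in a few lines. This is a toy model of my own (it ignores the NameNode bookkeeping entirely and the node names are made up): each block is copied to several distinct data nodes, so losing a whole node loses no data.

```python
import random

REPLICATION = 3  # number of copies kept of each block

class ToyDFS:
    """Toy model of DFS-style block storage with replication."""
    def __init__(self, nodes):
        self.nodes = {n: {} for n in nodes}  # node name -> {block_id: data}

    def write(self, block_id, data):
        # Store the block on REPLICATION distinct nodes
        for node in random.sample(list(self.nodes), REPLICATION):
            self.nodes[node][block_id] = data

    def read(self, block_id):
        # Any surviving replica will do
        for store in self.nodes.values():
            if block_id in store:
                return store[block_id]
        raise KeyError(block_id)

    def fail(self, node):
        del self.nodes[node]

dfs = ToyDFS(["n1", "n2", "n3", "n4", "n5"])
dfs.write("blk-1", b"some data")
dfs.fail("n1")                             # lose an entire node...
assert dfs.read("blk-1") == b"some data"   # ...the block survives elsewhere
```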

“NoSQL is the generic name used to refer to non-relational databases and stands for Not only SQL. […] The non-relational model has some features that are necessary in the management of big data, namely scalability, availability, and performance. With a relational database you cannot keep scaling vertically without loss of function, whereas with NoSQL you scale horizontally and this enables performance to be maintained. […] Within the context of a distributed database system, consistency refers to the requirement that all copies of data should be the same across nodes. […] Availability requires that if a node fails, other nodes still function […] Data, and hence DataNodes, are distributed across physically separate servers and communication between these machines will sometimes fail. When this occurs it is called a network partition. Partition tolerance requires that the system continues to operate even if this happens. In essence, what the CAP [Consistency, Availability, Partition Tolerance] Theorem states is that for any distributed computer system, where the data is shared, only two of these three criteria can be met. There are therefore three possibilities; the system must be: consistent and available, consistent and partition tolerant, or partition tolerant and available. Notice that since in a RDMS the network is not partitioned, only consistency and availability would be of concern and the RDMS model meets both of these criteria. In NoSQL, since we necessarily have partitioning, we have to choose between consistency and availability. By sacrificing availability, we are able to wait until consistency is achieved. If we choose instead to sacrifice consistency it follows that sometimes the data will differ from server to server. The somewhat contrived acronym BASE (Basically Available, Soft, and Eventually consistent) is used as a convenient way of describing this situation.
BASE appears to have been chosen in contrast to the ACID properties of relational databases. ‘Soft’ in this context refers to the flexibility in the consistency requirement. The aim is not to abandon any one of these criteria but to find a way of optimizing all three, essentially a compromise. […] The name NoSQL derives from the fact that SQL cannot be used to query these databases. […] There are four main types of non-relational or NoSQL database: key-value, column-based, document, and graph – all useful for storing large amounts of structured and semi-structured data. […] Currently, an approach called NewSQL is finding a niche. […] the aim of this latest technology is to solve the scalability problems associated with the relational model, making it more useable for big data.”
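The consistency/availability trade-off can be made concrete with a toy two-replica store (hypothetical, not modelled on any real NoSQL system): in ‘AP’ mode it keeps answering during a partition at the cost of stale reads, in ‘CP’ mode it refuses requests it cannot answer consistently, and healing the partition replays the ‘soft’ pending state until the replicas converge:

```python
class Replica:
    def __init__(self):
        self.data = {}

class PartitionedStore:
    """Two replicas that cannot communicate while `partitioned` is True."""
    def __init__(self, mode):
        self.mode = mode                # "CP": refuse when unsafe; "AP": always answer
        self.a, self.b = Replica(), Replica()
        self.partitioned = False
        self.pending = []               # un-replicated writes: the "soft" state

    def write(self, key, value):
        if self.partitioned:
            if self.mode == "CP":       # sacrifice availability, keep consistency
                raise RuntimeError("unavailable: replicas cannot be kept consistent")
            self.a.data[key] = value    # sacrifice consistency, stay available
            self.pending.append((key, value))
        else:
            self.a.data[key] = value
            self.b.data[key] = value

    def read_from_b(self, key):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable during partition")
        return self.b.data.get(key)     # in AP mode this may be stale

    def heal(self):
        # Partition over: replay pending writes -- "eventually consistent".
        self.partitioned = False
        for key, value in self.pending:
            self.b.data[key] = value
        self.pending.clear()

store = PartitionedStore(mode="AP")
store.partitioned = True
store.write("x", "new")                 # still available during the partition...
stale = store.read_from_b("x")          # ...but this read returns None (stale)
store.heal()                            # afterwards both replicas agree
```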

“A popular way of dealing with big data is to divide it up into small chunks and then process each of these individually, which is basically what MapReduce does by spreading the required calculations or queries over many, many computers. […] Bloom filters are particularly suited to applications where storage is an issue and where the data can be thought of as a list. The basic idea behind Bloom filters is that we want to build a system, based on a list of data elements, to answer the question ‘Is X in the list?’ With big datasets, searching through the entire set may be too slow to be useful, so we use a Bloom filter which, being a probabilistic method, is not 100 per cent accurate—the algorithm may decide that an element belongs to the list when actually it does not; but it is a fast, reliable, and storage efficient method of extracting useful knowledge from data. Bloom filters have many applications. For example, they can be used to check whether a particular Web address leads to a malicious website. In this case, the Bloom filter would act as a blacklist of known malicious URLs against which it is possible to check, quickly and accurately, whether it is likely that the one you have just clicked on is safe or not. Web addresses newly found to be malicious can be added to the blacklist. […] A related example is that of malicious email messages, which may be spam or may contain phishing attempts. A Bloom filter provides us with a quick way of checking each email address and hence we would be able to issue a timely warning if appropriate. […] they can [also] provide a very useful way of detecting fraudulent credit card transactions.”
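A minimal Bloom filter is short enough to write out in full. The sketch below uses SHA-256 to derive the hash positions (a common trick, though real implementations use faster non-cryptographic hashes), and the blacklisted URLs are of course made up:

```python
import hashlib

class BloomFilter:
    """Probabilistic membership test: false positives possible, false negatives not."""
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = bytearray(size)     # the bit array (one byte per bit, for clarity)

    def _positions(self, item):
        # Derive k positions by hashing the item with k different prefixes.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # Every position set -> "probably on the list"; any clear -> definitely not.
        return all(self.bits[pos] for pos in self._positions(item))

blacklist = BloomFilter()
blacklist.add("http://malicious.example/phish")
blacklist.add("http://malicious.example/spam")
```

A hit only means “probably on the blacklist”, so a real system would follow up with an exact (slower) lookup; a miss, on the other hand, can always be trusted.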


Punched card.
Clickstream log.
HTTP cookie.
Australian Square Kilometre Array Pathfinder.
The Millionaire Calculator.
Data mining.
Supervised machine learning.
Unsupervised machine learning.
Statistical classification.
Cluster analysis.
Moore’s Law.
Cloud storage. Cloud computing.
Data compression. Lossless data compression. Lossy data compression.
ASCII. Huffman algorithm. Variable-length encoding.
Data compression ratio.
Discrete cosine transform.
Bit array. Hash function.
PageRank algorithm.
Common Crawl.

July 14, 2018 Posted by | Books, Computer science, Data, Statistics | Leave a comment


“This book is not about the psychology or cultural anthropology of robotics, interesting as those are. I am an engineer and roboticist, so I confine myself firmly to the technology and application of real physical robots. […] robotics is the study of the design, application, and use of robots, and that is precisely what this Very Short Introduction is about: what robots do and what roboticists do.”

The above quote is from the book’s preface; the book is quite decent and occasionally really quite fascinating. Below I have added some sample quotes and links to topics/stuff covered in the book.

“Some or all of […] five functions – sensing, signalling, moving, intelligence, and energy, integrated into a body – are present in all robots. The actual sensors, motors, and behaviours designed into a particular robot body shape depend on the job that robot is designed to do. […] A robot is: 1. an artificial device that can sense its environment and purposefully act on or in that environment; 2. an embodied artificial intelligence; or 3. a machine that can autonomously carry out useful work. […] Many real-world robots […] are not autonomous but remotely operated by humans. […] These are also known as tele-operated robots. […] From a robot design point of view, the huge advantage of tele-operated robots is that the human in the loop provides the robot’s ‘intelligence’. One of the most difficult problems in robotics — the design of the robot’s artificial intelligence — is therefore solved, so it’s not surprising that so many real-world robots are tele-operated. The fact that tele-operated robots alleviate the problem of AI design should not fool us into making the mistake of thinking that tele-operated robots are not sophisticated — they are. […] counter-intuitively, autonomous robots are often simpler than tele-operated robots […] When roboticists talk about autonomous robots they normally mean robots that decide what to do next entirely without human intervention or control. We need to be careful here because they are not talking about true autonomy, in the sense that you or I would regard ourselves as self-determining individuals, but what I would call ‘control autonomy’. By control autonomy I mean that the robot can undertake its task, or mission, without human intervention, but that mission is still programmed or commanded by a human. In fact, there are very few robots in use in the real world that are autonomous even in this limited sense.
[…] It is helpful to think about a spectrum of robot autonomy, from remotely operated at one end (no autonomy) to fully autonomous at the other. We can then place robots on this spectrum according to their degree of autonomy. […] On a scale of autonomy, a robot that can react on its own in response to its sensors is highly autonomous. A robot that cannot react, perhaps because it doesn’t have any sensors, is not.”

“It is […] important to note that autonomy and intelligence are not the same thing. A robot can be autonomous but not very smart, like a robot vacuum cleaner. […] A robot vacuum cleaner has a small number of preprogrammed (i.e. instinctive) behaviours and is not capable of any kind of learning […] These are characteristics we would associate with very simple animals. […] When roboticists describe a robot as intelligent, what they mean is ‘a robot that behaves, in some limited sense, as if it were intelligent’. The words as if are important here. […] There are basically two ways in which we can make a robot behave as if it is more intelligent: 1. preprogram a larger number of (instinctive) behaviours; and/or 2. design the robot so that it can learn and therefore develop and grow its own intelligence. The first of these approaches is fine, providing that we know everything there is to know about what the robot must do and all of the situations it will have to respond to while it is working. Typically we can only do this if we design both the robot and its operational environment. […] For unstructured environments, the first approach to robot intelligence above is infeasible simply because it’s impossible to anticipate every possible situation a robot might encounter, especially if it has to interact with humans. The only solution is to design a robot so that it can learn, either from its own experience or from humans or other robots, and therefore adapt and develop its own intelligence: in effect, grow its behavioural repertoire to be able to respond appropriately to more and more situations. This brings us to the subject of learning robots […] robot learning or, more generally, ‘machine learning’ — a branch of AI — has proven to be very much harder than was expected in the early days of Artificial Intelligence.”

“Robot arms on an assembly line are typically programmed to go through a fixed sequence of moves over and over again, for instance spot-welding car body panels, or spray-painting the complete car. These robots are therefore not intelligent. In fact, they often have no exteroceptive sensors at all. […] when we see an assembly line with multiple robot arms positioned on either side along a line, we need to understand that the robots are part of an integrated automated manufacturing system, in which each robot and the line itself have to be carefully programmed in order to coordinate and choreograph the whole operation. […] An important characteristic of assembly-line robots is that they require the working environment to be designed for and around them, i.e. a structured environment. They also need that working environment to be absolutely predictable and repeatable. […] Robot arms either need to be painstakingly programmed, so that the precise movement required of each joint is worked out and coded into a set of instructions for the robot arm or, more often (and rather more easily), ‘taught’ by a human using a control pad to move its end-effector (hand) to the required positions in the robot’s workspace. The robot then memorizes the set of joint movements so that they can be replayed (over and over again). The human operator teaching the robot controls the trajectory, i.e. the path the robot arm’s end-effector follows as it moves through its 3D workspace, and a set of mathematical equations called the ‘inverse kinematics’ converts the trajectory into a set of individual joint movements. Using this approach, it is relatively easy to teach a robot arm to pick up an object and move it smoothly to somewhere else in its workspace while keeping the object level […]. However […] most real-world robot arms are unable to sense the weight of the object and automatically adjust accordingly. 
They are simply designed with stiff enough joints and strong enough motors that, whatever the weight of the object (providing it’s within the robot’s design limits), it can be lifted, moved, and placed with equal precision. […] The robot arm and gripper are a foundational technology in robotics. Not only are they extremely important as […] industrial assembly-line robot[s], but they have become a ‘component’ in many areas of robotics.”
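For the simplest case – a two-joint arm moving in a plane – the inverse kinematics the author mentions can be written in closed form using the law of cosines. Real six-axis industrial arms need considerably more machinery, so treat this purely as a sketch of the idea:

```python
import math

def two_link_ik(x, y, l1, l2):
    """Joint angles (radians) placing a two-link planar arm's end-effector at (x, y).

    Standard closed-form solution; raises ValueError if the target is out of reach.
    """
    d2 = x * x + y * y
    # Law of cosines relates the target distance to the elbow angle.
    cos_elbow = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= cos_elbow <= 1.0:
        raise ValueError("target out of reach")
    elbow = math.acos(cos_elbow)
    # Shoulder: direction to the target, corrected for the offset of the bent elbow.
    shoulder = math.atan2(y, x) - math.atan2(l2 * math.sin(elbow),
                                             l1 + l2 * math.cos(elbow))
    return shoulder, elbow

def forward(shoulder, elbow, l1, l2):
    """Forward kinematics, used here only to verify the solution."""
    x = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    y = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return x, y

# A taught trajectory is then just a sequence of (x, y) waypoints, each
# converted into a pair of joint movements.
waypoints = [(1.5, 0.5), (1.0, 1.0), (0.5, 1.5)]
angles = [two_link_ik(px, py, 1.0, 1.0) for px, py in waypoints]
```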

“Planetary rovers are tele-operated mobile robots that present the designer and operator with a number of very difficult challenges. One challenge is power: a planetary rover needs to be energetically self-sufficient for the lifetime of its mission, and must either be launched with a power source or — as in the case of the Mars rovers — fitted with solar panels capable of recharging the rover’s on-board batteries. Another challenge is dependability. Any mechanical fault is likely to mean the end of the rover’s mission, so it needs to be designed and built to exceptional standards of reliability and fail-safety, so that if parts of the rover should fail, the robot can still operate, albeit with reduced functionality. Extremes of temperature are also a problem […] But the greatest challenge is communication. With a round-trip signal delay time of twenty minutes to Mars and back, tele-operating the rover in real time is impossible. If the rover is moving and its human operator in the command centre on Earth reacts to an obstacle, it’s likely to be already too late; the robot will have hit the obstacle by the time the command signal to turn reaches the rover. An obvious answer to this problem would seem to be to give the rover a degree of autonomy so that it could, for instance, plan a path to a rock or feature of interest — while avoiding obstacles — then, when it arrives at the point of interest, call home and wait. Although path-planning algorithms capable of this level of autonomy have been well developed, the risk of a failure of the algorithm (and hence perhaps the whole mission) is deemed so high that in practice the rovers are manually tele-operated, at very low speed, with each manual manoeuvre carefully planned. When one also takes into account the fact that the Mars rovers are contactable only for a three-hour window per Martian day, a traverse of 100 metres will typically take up one day of operation at an average speed of 30 metres per hour.”
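The kind of obstacle-avoiding path planning mentioned here can be illustrated with a breadth-first search over a grid map. Actual rover planners also weigh terrain hazards and slopes, so this is only a toy:

```python
from collections import deque

def plan_path(grid, start, goal):
    """Breadth-first search on a grid: a shortest obstacle-avoiding path, or None.

    grid[r][c] == 1 marks an obstacle; movement is 4-directional.
    """
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}
    frontier = deque([start])
    while frontier:
        r, c = frontier.popleft()
        if (r, c) == goal:
            path = []                   # walk the parent links back to the start
            node = goal
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 \
                    and (nr, nc) not in came_from:
                came_from[(nr, nc)] = (r, c)
                frontier.append((nr, nc))
    return None                         # goal unreachable

terrain = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],   # a wall of obstacles the rover must route around
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
route = plan_path(terrain, (0, 0), (2, 2))
```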

“The realization that the behaviour of an autonomous robot is an emergent property of its interactions with the world has important and far-reaching consequences for the way we design autonomous robots. […] when we design robots, and especially when we come to decide what behaviours to programme the robot’s AI with, we cannot think about the robot on its own. We must take into account every detail of the robot’s working environment. […] Like all machines, robots need power. For fixed robots, like the robot arms used for manufacture, power isn’t a problem because the robot is connected to the electrical mains supply. But for mobile robots power is a huge problem because mobile robots need to carry their energy supply around with them, with problems of both the size and weight of the batteries and, more seriously, how to recharge those batteries when they run out. For autonomous robots, the problem is acute because a robot cannot be said to be truly autonomous unless it has energy autonomy as well as computational autonomy; there seems little point in building a smart robot that ‘dies’ when its battery runs out. […] Localization is a[nother] major problem in mobile robotics; in other words, how does a robot know where it is, in 2D or 3D space. […] [One] type of robot learning is called reinforcement learning. […] it is a kind of conditioned learning. If a robot is able to try out several different behaviours, test the success or failure of each behaviour, then ‘reinforce’ the successful behaviours, it is said to have reinforcement learning. Although this sounds straightforward in principle, it is not. It assumes, first, that a robot has at least one successful behaviour in its list of behaviours to try out, and second, that it can test the benefit of each behaviour — in other words, that the behaviour has an immediate measurable reward. 
If a robot has to try every possible behaviour or if the rewards are delayed, then this kind of so-called ‘unsupervised’ individual robot learning is very slow.”
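The “try behaviours, reinforce the successful ones” loop can be sketched as a simple bandit-style learner. The behaviour names and success rates below are invented, and note that the code bakes in exactly the assumption the author flags: every trial returns an immediate reward.

```python
import random

def reinforcement_learn(behaviours, reward_fn, trials=500, epsilon=0.1, seed=0):
    """Try behaviours, keep running value estimates, and mostly pick the best so far.

    reward_fn(behaviour) must return an *immediate* reward -- delayed rewards
    make the problem far harder.
    """
    rng = random.Random(seed)
    value = {b: 0.0 for b in behaviours}
    count = {b: 0 for b in behaviours}
    for _ in range(trials):
        if rng.random() < epsilon:                  # occasionally explore...
            b = rng.choice(behaviours)
        else:                                       # ...otherwise exploit
            b = max(behaviours, key=lambda x: value[x])
        reward = reward_fn(b)
        count[b] += 1
        value[b] += (reward - value[b]) / count[b]  # incremental mean update
    return max(behaviours, key=lambda x: value[x])

# A toy environment: each behaviour succeeds with a fixed (noisy) probability.
env_rng = random.Random(42)
success_rate = {"turn_left": 0.8, "turn_right": 0.4, "reverse": 0.1}
best = reinforcement_learn(
    list(success_rate),
    lambda b: 1.0 if env_rng.random() < success_rate[b] else 0.0,
)
```

With only three behaviours and immediate rewards, 500 trials are plenty; the slowness the author describes appears as soon as the behaviour space grows or the rewards arrive late.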

“A robot is described as humanoid if it has a shape or structure that to some degree mimics the human form. […] A small subset of humanoid robots […] attempt a greater degree of fidelity to the human form and appearance, and these are referred to as android. […] It is a recurring theme of this book that robot intelligence technology lags behind robot mechatronics – and nowhere is the mismatch between the two so starkly evident as it is in android robots. The problem is that if a robot looks convincingly human, then we (not unreasonably) expect it to behave like a human. For this reason whole-body android robots are, at the time of writing, disappointing. […] It is important not to overstate the case for humanoid robots. Without doubt, many potential applications of robots in human work- or living spaces would be better served by non-humanoid robots. The humanoid robot to use human tools argument doesn’t make sense if the job can be done autonomously. It would be absurd, for instance, to design a humanoid robot in order to operate a vacuum cleaner designed for humans. Similarly, if we want a driverless car, it doesn’t make sense to build a humanoid robot that sits in the driver’s seat. It seems that the case for humanoid robots is strongest when the robots are required to work alongside, learn from, and interact closely with humans. […] One of the most compelling reasons why robots should be humanoid is for those applications in which the robot has to interact with humans, work in human workspaces, and use tools or devices designed for humans.”

“…to put it bluntly, sex with a robot might not be safe. As soon as a robot has motors and moving parts, then assuring the safety of human-robot interaction becomes a difficult problem and if that interaction is intimate, the consequences of a mechanical or control systems failure could be serious.”

“All of the potential applications of humanoid robots […] have one thing in common: close interaction between human and robot. The nature of that interaction will be characterized by close proximity and communication via natural human interfaces – speech, gesture, and body language. Human and robot may or may not need to come into physical contact, but even when direct contact is not required they will still need to be within each other’s body space. It follows that robot safety, dependability, and trustworthiness are major issues for the robot designer. […] making a robot safe isn’t the same as making it trustworthy. One person trusts another if, generally speaking, that person is reliable and does what they say they will. So if I were to provide a robot that helps to look after your grandmother and I claim that it is perfectly safe — that it’s been designed to cover every risk or hazard — would you trust it? The answer is probably not. Trust in robots, just as in humans, has to be earned. […for more on these topics, see this post – US] […] trustworthiness cannot just be designed into the robot — it has to be earned by use and by experience. Consider a robot intended to fetch drinks for an elderly person. Imagine that the person calls for a glass of water. The robot then needs to fetch the drink, which may well require the robot to find a glass and fill it with water. Those tasks require sensing, dexterity, and physical manipulation, but they are problems that can be solved with current technology. The problem of trust arises when the robot brings the glass of water to the human. How does the robot give the glass to the human? If the robot has an arm so that it can hold out the glass in the same way a human would, how would the robot know when to let go? The robot clearly needs sensors in order to see and feel when the human has taken hold of the glass. The physical process of a robot handing something to a person is fraught with difficulty. 
Imagine, for instance, that the robot holds out its arm with the glass but the human can’t reach the glass. How does the robot decide where and how far it would be safe to bring its arm toward the person? What if the human takes hold of the glass but then the glass slips; does the robot let it fall or should it — as a human would — renew its grip on the glass? At what point would the robot decide the transaction has failed: it can’t give the glass of water to the person, or they won’t take it; perhaps they are asleep, or simply forgotten they wanted a glass of water, or confused. How does the robot sense that it should give up and perhaps call for assistance? These are difficult problems in robot cognition. Until they are solved, it’s doubtful we could trust a robot sufficiently well to do even a seemingly simple thing like handing over a glass of water.”

“The fundamental problem with Asimov’s laws of robotics, or any similar construction, is that they require the robot to make judgments. […] they assume that the robot is capable of some level of moral agency. […] No robot that we can currently build, or will build in the foreseeable future, is ‘intelligent’ enough to be able to even recognize, let alone make, these kinds of choices. […] Most roboticists agree that for the foreseeable future robots cannot be ethical, moral agents. […] precisely because, as we have seen, present-day ‘intelligent’ robots are not very intelligent, there is a danger of a gap between what robot users believe those robots to be capable of and what they are actually capable of. Given humans’ propensity to anthropomorphize and form emotional attachments to machines, there is clearly a danger that such vulnerabilities could be either unwittingly or deliberately exploited. Although robots cannot be ethical, roboticists should be.”

“In robotics research, the simulator has become an essential tool of the roboticist’s trade. The reason for this is that designing, building, and testing successive versions of real robots is both expensive and time-consuming, and if part of that work can be undertaken in the virtual rather than the real world, development times can be shortened, and the chances of a robot that works first time substantially improved. A robot simulator has three essential features. First, it must provide a virtual world. Second, it must offer a facility for creating a virtual model of the real robot. And third, it must allow the robot’s controller to be installed and ‘run’ on the virtual robot in the virtual world; the controller then determines how the robot behaves when running in the simulator. The simulator should also provide a visualization of the virtual world and simulated robots in it so that the designer can see what’s going on. […] These are difficult challenges for developers of robot simulators.”
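The three essential features – a virtual world, a virtual robot model, and a pluggable controller – show up even in a skeleton simulator like the following (a toy of my own construction, unrelated to any real simulation package):

```python
class World:
    """Virtual world: a 1-D corridor with a wall at the far end."""
    def __init__(self, length=10):
        self.length = length

class SimulatedRobot:
    """Virtual model of the robot: a position plus a simulated range sensor."""
    def __init__(self, world):
        self.world = world
        self.position = 0

    def sense(self):
        return self.world.length - self.position    # distance to the wall

    def move(self, step):
        self.position = max(0, min(self.world.length, self.position + step))

def controller(distance):
    """The robot's controller, 'installed' on the virtual robot:
    advance until the sensor says the wall is close, then stop."""
    return 1 if distance > 2 else 0

def run(world, robot, controller, steps=50):
    trace = []
    for _ in range(steps):                  # the simulation loop
        robot.move(controller(robot.sense()))
        trace.append(robot.position)        # visualization stand-in: log the state
    return trace

world = World()
robot = SimulatedRobot(world)
trace = run(world, robot, controller)
```

The point of the separation is that `controller` is the only part that would transfer to a real robot; everything else exists to stand in for the world.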

“The next big step in miniaturization […] requires the solution of hugely difficult problems and, in all likelihood, the use of exotic approaches to design and fabrication. […] It is impossible to shrink mechanical and electrical components, or MEMS devices, in order to reduce total robot size to a few micrometres. In any event, the physics of locomotion through a fluid changes at the microscale and simply shrinking mechanical components from macro to micro — even if it were possible — would fail to address this problem. A radical approach is to leave behind conventional materials and components and move to a bioengineered approach in which natural bacteria are modified by adding artificial components. The result is a hybrid of artificial and natural (biological) components. The bacterium has many desirable properties for a microbot. By selecting a bacterium with a flagellum, we have locomotion perfectly suited to the medium. […] Another hugely desirable characteristic is that the bacteria are able to naturally scavenge for energy, thus avoiding the otherwise serious problem of powering the microbots. […] Whatever technology is used to create the microbots, huge problems would have to be overcome before a swarm of medical microbots could become a practical reality. The first is technical: how do surgeons or medical technicians reliably control and monitor the swarm while it’s working inside the body? Or, assuming we can give the microbots sufficient intelligence and autonomy (also a very difficult challenge), do we forgo precise control and human intervention altogether by giving the robots the swarm intelligence to be able to do the job, i.e. find the problem, fix it, then exit? […] these questions bring us to what would undoubtedly represent the greatest challenge: validating the swarm of medical microbots as effective, dependable, and above all safe, then gaining approval and public acceptance for its use. 
[…] Do we treat the validation of the medical microbot swarm as an engineering problem, and attempt to apply the same kinds of methods we would use to validate safety-critical systems such as air traffic control systems? Or do we instead regard the medical microbot swarm as a drug and validate it with conventional and (by and large) trusted processes, including clinical trials, leading to approval and licensing for use? My suspicion is that we will need a new combination of both approaches.”


E-puck mobile robot.
Jacques de Vaucanson’s Digesting Duck.
Alan Turing. W. Ross Ashby. Norbert Wiener. Warren McCulloch. William Grey Walter.
Turtle (robot).
Industrial robot. Mechanical arm. Robotic arm. Robot end effector.
Automated guided vehicle.
Remotely operated vehicle. Unmanned aerial vehicle. Remotely operated underwater vehicle. Wheelbarrow (robot).
Robot-assisted surgery.
Lego Mindstorms NXT. NXT Intelligent Brick.
Biomimetic robots.
Artificial life.
Braitenberg vehicle.
Shakey the robot. Sense-Plan-Act. Rodney Brooks. A robust layered control system for a mobile robot.
Toto the robot.
Slugbot. Ecobot. Microbial fuel cell.
Simultaneous localization and mapping (SLAM).
Programming by demonstration.
Evolutionary algorithm.
NASA Robonaut. BERT 2. Kismet (robot). Jules (robot). Frubber. Uncanny valley.
AIBO. Paro.
Cronos Robot. ECCEROBOT.
Swarm robotics. S-bot mobile robot. Swarmanoid project.
Artificial neural network.
Microelectromechanical systems. I-SWARM project.
ALICE (Artificial Linguistic Internet Computer Entity). BINA 48 (Breakthrough Intelligence via Neural Architecture 48).

June 15, 2018 Posted by | Books, Computer science, Engineering, Medicine | Leave a comment

Mathematics in Cryptography III

As she puts it herself, most of this lecture [~first 47 minutes or so] was basically “an explanation by a non-expert on how the internet uses public key” (-cryptography). The last 20 minutes cover, again in her own words, “more theoretical aspects”.

Some links:

Hypertext Transfer Protocol (HTTP). HTTPS.
Project Athena. Kerberos (protocol).
Pretty Good Privacy (PGP).
Secure Sockets Layer (SSL)/Transport Layer Security (TLS).
Cipher suite.
Elliptic Curve Digital Signature Algorithm (ECDSA).
Request for Comments (RFC).
Elliptic-curve Diffie–Hellman (ECDH).
The SSL/TLS Handshake: an Overview.
Advanced Encryption Standard.
Galois/Counter Mode.
XOR gate.
IP Header.
Time to live (TTL).
Transmission Control Protocol. TCP segment structure.
TLS record.
Security level.
Birthday problem. Birthday attack.
Handbook of Applied Cryptography (Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone). (§3.6 in particular is mentioned/referenced as this is stuff she talks about in the last ‘theoretical’ part of the lecture).


June 8, 2018 Posted by | Computer science, Cryptography, Lectures, Mathematics | Leave a comment

Mathematics in Cryptography II

Some links to stuff covered in the lecture:

Public-key cryptography.
New Directions in Cryptography (Diffie & Hellman, 1976).
The history of Non-Secret Encryption (James Ellis).
Note on “Non-Secret Encryption” – Cliff Cocks (1973).
RSA (cryptosystem).
Discrete Logarithm Problem.
Diffie–Hellman key exchange.
AES (Advanced Encryption Standard).
Triple DES.
Trusted third party (TTP).
Key management.
Man-in-the-middle attack.
Digital signature.
Public key certificate.
Secret sharing.
Hash function. Cryptographic hash function.
Secure Hash Algorithm 2 (SHA-2).
Non-repudiation (digital security).
L-notation. L (complexity).
ElGamal signature scheme.
Digital Signature Algorithm (DSA).
Schnorr signature.
Identity-based cryptography.
Identity-Based Cryptosystems and Signature Schemes (Adi Shamir, 1984).
Algorithms for Quantum Computation: Discrete Logarithms and Factoring (Peter Shor, 1994).
Quantum resistant cryptography.
Elliptic curve. Elliptic-curve cryptography.
Projective space.

I have included very few links relating to the topics covered in the last part of the lecture. This was deliberate and not just a result of the type of coverage included in that part of the lecture. In my opinion non-mathematicians should probably skip the last 25 minutes or so, as they’re not really worth the effort – and not only because of technical issues (the lecturer is writing stuff on the blackboard and for several minutes you’re unable to see what she’s writing, which is …unfortunate), though those certainly were not helping. The first hour of the lecture is great, the last 25 minutes are, well, less great, in my opinion. You should however not miss the first part of the coverage of ECC-related stuff (in particular the coverage ~55-58 minutes in), if you’re interested in making sense of how ECC works; I certainly found that part of the coverage very helpful.

June 2, 2018 Posted by | Computer science, Cryptography, Lectures, Mathematics, Papers | Leave a comment

Computers, People and the Real World

“An exploration of some spectacular failures of modern day computer-aided systems, which fail to take into account the real-world […] Almost nobody wants an IT system. What they want is a better way of doing something, whether that is buying and selling shares on the Stock Exchange, flying an airliner or running a hospital. So the system they want will usually involve changes to the way people work, and interactions with physical objects and the environment. Drawing on examples including the new programme for IT in the NHS, this lecture explores what can go wrong when business change is mistakenly viewed as an IT project.” (Quote from the video description on youtube).

Some links related to the lecture coverage:
Computer-aided dispatch.
London Ambulance Service – computerization.
Report of the Inquiry Into The London Ambulance Service (February 1993).
Sociotechnical system.
Tay (bot).

A few observations/quotes from the lecture (-notes):

The bidder who least understands the complexity of a requirement is likely to put in the lowest bid.
“It is a mistake to use a computer system to impose new work processes on under-trained or reluctant staff. – Front line staff are often best placed to judge what is practical.” [A quote from later in the lecture [~36 mins] is even more explicit: “The experts in any work process are usually the people who have been carrying it out.”]
“It is important to understand that in any system implementation the people factor is as important, and arguably more important, than the technical infrastructure.” (This last one is a full quote from the report linked above; the lecture includes a shortened version – US) [Quotes and observations above from ~16 minute mark unless otherwise noted]

“There is no such thing as an IT project”
“(almost) every significant “IT Project” is actually a business change project that is enabled and supported by one or more IT systems. Business processes are expensive to change. The business changes take at least as long and cost as much as the new IT system, and need at least as much planning and management” [~29 mins]

“Software packages are packaged business processes
*Changing a package to fit the way you want to work can cost more than writing new software” [~31-32 mins]

“Most computer systems interact with people: the sociotechnical view is that the people and the IT are two components of a larger system. Designing that larger system is the real task.” [~36 mins]

May 31, 2018 Posted by | Computer science, Economics, Engineering, Lectures | Leave a comment

On the cryptographic hardness of finding a Nash equilibrium

I found it annoying that you generally can’t really hear the questions posed by the audience (which includes people like Avi Wigderson), particularly considering that there are quite a few of these, especially in the middle section of the lecture. There are intermittent issues with the camera’s focus throughout the talk, but those are all transitory problems that should not keep you from watching the lecture. The sound issue at the beginning of the talk is resolved after 40 seconds.

One important take-away from this talk, if you choose not to watch it: “to date, there is no known efficient algorithm to find Nash equilibrium in games”. In general this paper – coauthored by the lecturer – seems from a brief skim to cover many of the topics also included in the lecture. I have added some other links to articles and topics covered/mentioned in the lecture below.

Nash’s Existence Theorem.
Reducibility Among Equilibrium Problems (Goldberg & Papadimitriou).
Three-Player Games Are Hard (Daskalakis & Papadimitriou).
3-Nash is PPAD-Complete (Chen & Deng).
PPAD (complexity).
On the (Im)possibility of Obfuscating Programs (Barak et al.).
On the Impossibility of Obfuscation with Auxiliary Input (Goldwasser & Kalai).
On Best-Possible Obfuscation (Goldwasser & Rothblum).
Functional Encryption without Obfuscation (Garg et al.).
On the Complexity of the Parity Argument and Other Inefficient Proofs of Existence (Papadimitriou).
Pseudorandom function family.
Revisiting the Cryptographic Hardness of Finding a Nash Equilibrium (Garg, Pandey & Srinivasan).
Constrained Pseudorandom Functions and Their Applications (Boneh & Waters).
Delegatable Pseudorandom Functions and Applications (Kiayias et al.).
Functional Signatures and Pseudorandom Functions (Boyle, Goldwasser & Ivan).
Universal Constructions and Robust Combiners for Indistinguishability Obfuscation and Witness Encryption (Ananth et al.).

April 18, 2018 Posted by | Computer science, Cryptography, Game theory, Lectures, Mathematics, Papers | Leave a comment

The Internet of Things


Some links to stuff he talks about in the lecture:

The Internet of Things: making the most of the Second Digital Revolution – A report by the UK Government Chief Scientific Adviser.
South–North Water Transfer Project.
FDA approves first smart pill that tracks drug regimen compliance from the inside.
The Internet of Things (IoT)* units installed base by category from 2014 to 2020.
Share of the IoT market by sub-sector worldwide in 2017.
San Diego to Cover Half the City with Intelligent Streetlights.
IPv4 and IPv6 (specifically, he talks a little about the IPv4 address space problem).
General Data Protection Regulation (GDPR).
Shodan (website).
Mirai botnet.
Gait analysis.
Website reveals 73,000 unprotected security cameras with default passwords. (This was just an example link – it’s unclear if the site he used to illustrate his point in that part of the lecture was actually Insecam, but he does talk about the widespread use of default passwords and related security implications during the lecture).
Strava’s fitness heatmaps are a ‘potential catastrophe’.
‘Secure by Design’ (a very recently published proposed UK IoT code of practice).

March 26, 2018 Posted by | Computer science, Engineering, Lectures | Leave a comment

The Computer

Below some quotes and links related to the book’s coverage:

“At the heart of every computer is one or more hardware units known as processors. A processor controls what the computer does. For example, it will process what you type in on your computer’s keyboard, display results on its screen, fetch web pages from the Internet, and carry out calculations such as adding two numbers together. It does this by ‘executing’ a computer program that details what the computer should do […] Data and programs are stored in two storage areas. The first is known as main memory and has the property that whatever is stored there can be retrieved very quickly. Main memory is used for transient data – for example, the result of a calculation which is an intermediate result in a much bigger calculation – and is also used to store computer programs while they are being executed. Data in main memory is transient – it will disappear when the computer is switched off. Hard disk memory, also known as file storage or backing storage, contains data that are required over a period of time. Typical entities that are stored in this memory include files of numerical data, word-processed documents, and spreadsheet tables. Computer programs are also stored here while they are not being executed. […] There are a number of differences between main memory and hard disk memory. The first is the retrieval time. With main memory, an item of data can be retrieved by the processor in fractions of microseconds. With file-based memory, the retrieval time is much greater: of the order of milliseconds. The reason for this is that main memory is silicon-based […] hard disk memory is usually mechanical and is stored on the metallic surface of a disk, with a mechanical arm retrieving the data. […] main memory is more expensive than file-based memory”.

“The Internet is a network of computers – strictly, it is a network that joins up a number of networks. It carries out a number of functions. First, it transfers data from one computer to another computer […] The second function of the Internet is to enforce reliability. That is, to ensure that when errors occur then some form of recovery process happens; for example, if an intermediate computer fails then the software of the Internet will discover this and resend any malfunctioning data via other computers. A major component of the Internet is the World Wide Web […] The web […] uses the data-transmission facilities of the Internet in a specific way: to store and distribute web pages. The web consists of a number of computers known as web servers and a very large number of computers known as clients (your home PC is a client). Web servers are usually computers that are more powerful than the PCs that are normally found in homes or those used as office computers. They will be maintained by some enterprise and will contain individual web pages relevant to that enterprise; for example, an online book store such as Amazon will maintain web pages for each item it sells. The program that allows users to access the web is known as a browser. […] A part of the Internet known as the Domain Name System (usually referred to as DNS) will figure out where the page is held and route the request to the web server holding the page. The web server will then send the page back to your browser which will then display it on your computer. Whenever you want another page you would normally click on a link displayed on that page and the process is repeated. Conceptually, what happens is simple. However, it hides a huge amount of detail involving the web discovering where pages are stored, the pages being located, their being sent, the browser reading the pages and interpreting how they should be displayed, and eventually the browser displaying the pages.
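The resolve-then-route-then-display sequence can be sketched as follows. This is a toy sketch with an invented in-memory DNS table and page store; a real browser delegates these steps to the operating system’s resolver and an HTTP stack:

```python
# Toy sketch of the lookup-then-fetch sequence: a DNS-like table maps a
# hostname to an address, and a table of 'web servers' maps addresses to
# pages. Hostnames, addresses, and page content here are invented.

DNS_TABLE = {"www.example.com": "93.184.216.34"}
WEB_SERVERS = {"93.184.216.34": {"/index.html": "<html>Hello</html>"}}

def browse(url):
    host, _, path = url.partition("/")
    address = DNS_TABLE[host]                # step 1: DNS resolves the name
    page = WEB_SERVERS[address]["/" + path]  # step 2: the server returns the page
    return page                              # step 3: the browser displays it

print(browse("www.example.com/index.html"))
```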
[…] without one particular hardware advance the Internet would be a shadow of itself: this is broadband. This technology has provided communication speeds that we could not have dreamed of 15 years ago. […] Typical broadband speeds range from one megabit per second to 24 megabits per second, the lower rate being about 20 times faster than dial-up rates.”

“A major idea I hope to convey […] is that regarding the computer as just the box that sits on your desk, or as a chunk of silicon that is embedded within some device such as a microwave, is only a partial view. The Internet – or rather broadband access to the Internet – has created a gigantic computer that has unlimited access to both computer power and storage to the point where even applications that we all thought would never migrate from the personal computer are doing just that. […] the Internet functions as a series of computers – or more accurately computer processors – carrying out some task […]. Conceptually, there is little difference between these computers and [a] supercomputer, the only difference is in the details: for a supercomputer the communication between processors is via some internal electronic circuit, while for a collection of computers working together on the Internet the communication is via external circuits used for that network.”

“A computer will consist of a number of electronic circuits. The most important is the processor: this carries out the instructions that are contained in a computer program. […] There are a number of individual circuit elements that make up the computer. Thousands of these elements are combined together to construct the computer processor and other circuits. One basic element is known as an And gate […]. This is an electrical circuit that has two binary inputs A and B and a single binary output X. The output will be one if both the inputs are one and zero otherwise. […] the And gate is only one example – when some action is required, for example adding two numbers together, [the different circuits] interact with each other to carry out that action. In the case of addition, the two binary numbers are processed bit by bit to carry out the addition. […] Whatever actions are taken by a program […] the cycle is the same; an instruction is read into the processor, the processor decodes the instruction, acts on it, and then brings in the next instruction. So, at the heart of a computer is a series of circuits and storage elements that fetch and execute instructions and store data and programs.”
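The gate behaviour described above is easy to mimic in software. Below is a minimal sketch (my own illustration, not the book’s): an And gate plus an Xor gate combined into a half adder, the basic building block of the bit-by-bit addition mentioned in the quote:

```python
# The And gate described above, plus an example of how gates combine:
# a half adder sums two bits using an XOR (sum bit) and an AND (carry bit).

def AND(a, b): return a & b
def XOR(a, b): return a ^ b

def half_adder(a, b):
    return XOR(a, b), AND(a, b)   # (sum bit, carry bit)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", half_adder(a, b))
```

Chaining half adders (plus the carry logic) gives the multi-bit addition circuits the book alludes to.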

“In essence, a hard disk unit consists of one or more circular metallic disks which can be magnetized. Each disk has a very large number of magnetizable areas which can either represent zero or one depending on the magnetization. The disks are rotated at speed. The unit also contains an arm or a number of arms that can move laterally and which can sense the magnetic patterns on the disk. […] When a processor requires some data that is stored on a hard disk […] then it issues an instruction to find the file. The operating system – the software that controls the computer – will know where the file starts and ends and will send a message to the hard disk to read the data. The arm will move laterally until it is over the start position of the file and when the revolving disk passes under the arm the magnetic pattern that represents the data held in the file is read by it. Accessing data on a hard disk is a mechanical process and usually takes a small number of milliseconds to carry out. Compared with the electronic speeds of the computer itself – normally measured in fractions of a microsecond – this is incredibly slow. Because disk access is slow, systems designers try to minimize the amount of access required to files. One technique that has been particularly effective is known as caching. It is, for example, used in web servers. Such servers store pages that are sent to browsers for display. […] Caching involves placing the frequently accessed pages in some fast storage medium such as flash memory and keeping the remainder on a hard disk.”
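The caching idea can be sketched in a few lines (invented page store; a real server would keep the hot pages in flash or RAM as described):

```python
# Sketch of the caching idea: keep recently requested pages in fast
# storage, falling back to the slow 'disk' only on a miss.
# functools.lru_cache does the bookkeeping; the disk read is simulated.

from functools import lru_cache

DISK = {"/home": "welcome page", "/about": "about page"}  # invented pages
disk_reads = 0

@lru_cache(maxsize=128)
def fetch_page(path):
    global disk_reads
    disk_reads += 1            # a slow, mechanical disk access
    return DISK[path]

fetch_page("/home"); fetch_page("/home"); fetch_page("/about")
print(disk_reads)  # → 2  (only two slow reads for three requests)
```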

“The first computers had a single hardware processor that executed individual instructions. It was not too long before researchers started thinking about computers that had more than one processor. The simple theory here was that if a computer had n processors then it would be n times faster. […] it is worth debunking this notion. If you look at many classes of problems […], you see that a strictly linear increase in performance is not achieved. If a problem that is solved by a single computer is solved in 20 minutes, then you will find a dual processor computer solving it in perhaps 11 minutes. A 3-processor computer may solve it in 9 minutes, and a 4-processor computer in 8 minutes. There is a law of diminishing returns; often, there comes a point when adding a processor slows down the computation. What happens is that each processor needs to communicate with the others, for example passing on the result of a computation; this communicational overhead becomes bigger and bigger as you add processors to the point when it dominates the amount of useful work that is done. The sort of problems where they are effective is where a problem can be split up into sub-problems that can be solved almost independently by each processor with little communication.”
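A toy model makes the diminishing-returns argument concrete. The constants below are invented for illustration, chosen only to reproduce the qualitative shape of the numbers quoted above – fast gains at first, then a communication term that eventually dominates:

```python
# Toy model of the diminishing returns described above: runtime = a serial
# part + a perfectly parallel part divided by n + a per-processor
# communication overhead. All constants are invented for illustration.

def runtime(n, serial=1.0, parallel=19.0, comm=1.0):
    return serial + parallel / n + comm * (n - 1)

times = {n: runtime(n) for n in range(1, 11)}
best = min(times, key=times.get)
print(times[1], round(times[2], 1), best)
# adding processors helps at first; past the optimum the communication
# overhead makes the computation slower, not faster
```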

“Symmetric encryption methods are very efficient and can be used to scramble large files or long messages being sent from one computer to another. Unfortunately, symmetric techniques suffer from a major problem: if there are a number of individuals involved in a data transfer or in reading a file, each has to know the same key. This makes it a security nightmare. […] public key cryptography removed a major problem associated with symmetric cryptography: that of a large number of keys in existence some of which may be stored in an insecure way. However, a major problem with asymmetric cryptography is the fact that it is very inefficient (about 10,000 times slower than symmetric cryptography): while it can be used for short messages such as email texts, it is far too inefficient for sending gigabytes of data. However, […] when it is combined with symmetric cryptography, asymmetric cryptography provides very strong security. […] One very popular security scheme is known as the Secure Sockets Layer – normally shortened to SSL. It is based on the concept of a one-time pad. […] SSL uses public key cryptography to communicate the randomly generated key between the sender and receiver of a message. This key is only used once for the data interchange that occurs and, hence, is an electronic analogue of a one-time pad. When each of the parties to the interchange has received the key, they encrypt and decrypt the data employing symmetric cryptography, with the generated key carrying out these processes. […] There is an impression amongst the public that the main threats to security and to privacy arise from technological attack. However, the threat from more mundane sources is equally high.
Data thefts, damage to software and hardware, and unauthorized access to computer systems can occur in a variety of non-technical ways: by someone finding computer printouts in a waste bin; by a window cleaner using a mobile phone camera to take a picture of a display containing sensitive information; by an office cleaner stealing documents from a desk; by a visitor to a company noting down a password written on a white board; by a disgruntled employee putting a hammer through the main server and the backup server of a company; or by someone dropping an unencrypted memory stick in the street.”
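The hybrid scheme described above can be sketched in a few lines. This is a toy illustration only: the XOR cipher stands in for a real symmetric cipher such as AES, and the public-key exchange of the session key is merely noted in a comment rather than implemented:

```python
# Toy sketch of the hybrid scheme: a random one-off session key is (in a
# real protocol) exchanged using slow public-key cryptography, and the
# bulk data then travels over the fast symmetric path. The XOR cipher
# below is a toy stand-in for a real symmetric cipher.

import secrets

def xor_cipher(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

session_key = secrets.token_bytes(16)   # generated afresh per exchange
# ...in SSL/TLS, the session key would now be sent under public-key crypto...
message = b"gigabytes of data go through the fast symmetric path"
ciphertext = xor_cipher(message, session_key)
assert xor_cipher(ciphertext, session_key) == message  # same key decrypts
print(ciphertext != message)
```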

“The basic architecture of the computer has remained unchanged for six decades since IBM developed the first mainframe computers. It consists of a processor that reads software instructions one by one and executes them. Each instruction will result in data being processed, for example by being added together; and data being stored in the main memory of the computer or being stored on some file-storage medium; or being sent to the Internet or to another computer. This is what is known as the von Neumann architecture; it was named after John von Neumann […]. His key idea, which still holds sway today, is that in a computer the data and the program are both stored in the computer’s memory in the same address space. There have been few challenges to the von Neumann architecture.”
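A stored-program machine can be sketched in a few lines (the three-instruction machine below is invented for illustration; real instruction sets are of course far richer). Note how the program and the data it operates on sit in the same memory, which is the essence of the von Neumann idea:

```python
# Minimal sketch of the von Neumann idea: instructions and data share one
# memory, and the processor repeatedly fetches, decodes, and executes.

def run(memory):
    pc, acc = 0, 0                # program counter and accumulator
    while True:
        op, arg = memory[pc]      # fetch
        pc += 1
        if op == "LOAD":          # decode + execute
            acc = memory[arg]
        elif op == "ADD":
            acc += memory[arg]
        elif op == "HALT":
            return acc

# the program occupies cells 0-2; the data it operates on sits in cells 3-4
memory = [("LOAD", 3), ("ADD", 4), ("HALT", 0), 2, 40]
print(run(memory))  # → 42
```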

“[A] ‘neural network’ […] consists of an input layer that can sense various signals from some environment […]. In the middle (hidden layer), there are a large number of processing elements (neurones) which are arranged into sub-layers. Finally, there is an output layer which provides a result […]. It is in the middle layer that the work is done in a neural computer. What happens is that the network is trained by giving it examples of the trend or item that is to be recognized. What the training does is to strengthen or weaken the connections between the processing elements in the middle layer until, when combined, they produce a strong signal when a new case is presented to them that matches the previously trained examples and a weak signal when an item that does not match the examples is encountered. Neural networks have been implemented in hardware, but most of the implementations have been via software where the middle layer has been implemented in chunks of code that carry out the learning process. […] although the initial impetus was to use ideas in neurobiology to develop neural architectures based on a consideration of processes in the brain, there is little resemblance between the internal data and software now used in commercial implementations and the human brain.”
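The ‘strengthen or weaken the connections’ process can be sketched with a single artificial neurone learning the AND function via the classic perceptron rule (my own minimal illustration; the multi-layer networks the quote describes work on the same principle at a much larger scale):

```python
# Sketch of 'training strengthens or weakens connections': one artificial
# neurone learns the AND function by nudging its weights toward the
# examples it gets wrong (the perceptron learning rule).

def predict(w, x):
    return 1 if w[0] + w[1]*x[0] + w[2]*x[1] > 0 else 0

examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [0.0, 0.0, 0.0]               # bias and two connection weights
for _ in range(10):               # a few passes over the training data
    for x, target in examples:
        error = target - predict(w, x)   # wrong answers adjust the weights
        w[0] += 0.1 * error
        w[1] += 0.1 * error * x[0]
        w[2] += 0.1 * error * x[1]

print([predict(w, x) for x, _ in examples])  # → [0, 0, 0, 1]
```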


Byte. Bit.
Moore’s law.
Computer program.
Programming language. High-level programming language. Low-level programming language.
Zombie (computer science).
Cloud computing.
Instructions per second.
Fetch-execute cycle.
Grace Hopper. Software Bug.
Transistor. Integrated circuit. Very-large-scale integration. Wafer (electronics). Photomask.
Read-only memory (ROM). Read-write memory (RWM). Bus (computing). Address bus. Programmable read-only memory (PROM). Erasable programmable read-only memory (EPROM). Electrically erasable programmable read-only memory (EEPROM). Flash memory. Dynamic random-access memory (DRAM). Static random-access memory (static RAM/SRAM).
Hard disc.
Wireless communication.
Radio-frequency identification (RFID).
NP-hardness. Set partition problem. Bin packing problem.
Cray X-MP. Beowulf cluster.
Vector processor.
Denial-of-service attack. Melissa (computer virus). Malware. Firewall (computing). Logic bomb. Fork bomb/rabbit virus. Cryptography. Caesar cipher. Social engineering (information security).
Application programming interface.
Data mining. Machine translation. Machine learning.
Functional programming.
Quantum computing.

March 19, 2018 Posted by | Books, Computer science, Cryptography, Engineering | Leave a comment

Safety-Critical Systems

Some related links to topics covered in the lecture:

Safety-critical system.
Safety engineering.
Fault tree analysis.
Failure mode and effects analysis.
Value of a statistical life.
ALARP principle.
Hazards and Risk (HSA).
Software system safety.
Aleatoric and epistemic uncertainty.
N-version programming.
An experimental evaluation of the assumption of independence in multiversion programming (Knight & Leveson).
Safety integrity level.
Software for Dependable Systems – Sufficient Evidence? (consensus study report).

March 15, 2018 Posted by | Computer science, Economics, Engineering, Lectures, Statistics | Leave a comment

Some things you need to know about machine learning but didn’t know whom to ask (the grad school version)

Some links to stuff related to the lecture’s coverage:
An overview of gradient descent optimization algorithms.
Rectifier (neural networks) [Relu].
Escaping From Saddle Points – Online Stochastic Gradient for Tensor Decomposition (Ge et al.).
How to Escape Saddle Points Efficiently (closely related to the paper above, presumably one of the ‘recent improvements’ mentioned in the lecture).
Linear classifier.
Concentration inequality.
A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks (Neyshabur et al.).
Off the convex path (the lecturer’s blog).
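As a minimal reminder of the basic update that all the optimization algorithms surveyed in the first link build on (the function and step size below are invented for illustration):

```python
# The basic gradient descent update: repeatedly step against the gradient.
# Toy objective f(x) = (x - 3)^2, minimized at x = 3.

def grad(x):
    return 2 * (x - 3)     # f'(x) for f(x) = (x - 3)^2

x, lr = 0.0, 0.1           # starting point and (invented) learning rate
for _ in range(100):
    x -= lr * grad(x)      # the update every fancier variant builds on

print(round(x, 4))  # → 3.0
```

Momentum, Adam, and the saddle-point escapes discussed in the papers above all modify this one update, e.g. by adding noise or per-coordinate step sizes.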

February 19, 2018 Posted by | Computer science, Lectures, Mathematics | Leave a comment


Complexity theory is a topic I’ve previously been exposed to through various channels; examples include Institute for Advanced Study computer science lectures, notes included in a few computer science-related books like Louridas and Dasgupta, and probably also e.g. some of the systems analysis/-science books I’ve read – Konieczny et al.’s text, which I recently finished reading, is another example of a book which peripherally covers content also covered in this book. Holland’s book pretty much doesn’t cover computational complexity theory at all, but some knowledge of computer science will probably still be useful, as e.g. concepts from graph theory are touched upon/applied in the coverage; I also derived some benefit while reading this book from having previously spent time on signalling models in microeconomics, as there were conceptual similarities between those models and some of the material Holland includes. I’m not really sure whether you need to know ‘anything’ to read the book and get something out of it; although Holland doesn’t use much mathematical formalism, some of the ‘hidden’ formalism lurking in the background will probably not be easy to understand if you e.g. haven’t seen a mathematical equation since the 9th grade, and people who have e.g. seen hierarchical models before will definitely have a greater appreciation of some of the material covered than people who have not. Obviously I’ve read a lot of stuff over time that made the book easier for me to read and understand than it otherwise would have been, but how easy would the book have been for me to read if I hadn’t read those other things? It’s really difficult for me to say. I found the book hard to judge/rate/evaluate, so I decided against rating it on goodreads.

Below I have added some quotes from the book.

“[C]omplex systems exhibit a distinctive property called emergence, roughly described by the common phrase ‘the action of the whole is more than the sum of the actions of the parts’. In addition to complex systems, there is a subfield of computer science, called computational complexity, which concerns itself with the difficulty of solving different kinds of problems. […] The object of the computational complexity subfield is to assign levels of difficulty — levels of complexity — to different collections of problems. There are intriguing conjectures about these levels of complexity, but an understanding of the theoretical framework requires a substantial background in theoretical computer science — enough to fill an entire book in this series. For this reason, and because computational complexity does not touch upon emergence, I will confine this book to systems and the ways in which they exhibit emergence. […] emergent behaviour is an essential requirement for calling a system ‘complex’. […] Hierarchical organization is […] closely tied to emergence. Each level of a hierarchy typically is governed by its own set of laws. For example, the laws of the periodic table govern the combination of hydrogen and oxygen to form H2O molecules, while the laws of fluid flow (such as the Navier-Stokes equations) govern the behaviour of water. The laws of a new level must not violate the laws of earlier levels — that is, the laws at lower levels constrain the laws at higher levels. […] Restated for complex systems: emergent properties at any level must be consistent with interactions specified at the lower level(s). […] Much of the motivation for treating a system as complex is to get at questions that would otherwise remain inaccessible. Often the first steps in acquiring a deeper understanding are through comparisons of similar systems. By treating hierarchical organization as sine qua non for complexity we focus on the interactions of emergent properties at various levels.
The combination of ‘top–down’ effects (as when the daily market average affects actions of the buyers and sellers in an equities market) and ‘bottom–up’ effects (the interactions of the buyers and sellers determine the market average) is a pervasive feature of complex systems. The present exposition, then, centres on complex systems where emergence, and the reduction(s) involved, offer a key to new kinds of understanding.”

“As the field of complexity studies has developed, it has split into two subfields that examine two different kinds of emergence: the study of complex physical systems (CPS) and the study of complex adaptive systems (CAS): The study of complex physical systems focuses on geometric (often lattice-like) arrays of elements, in which interactions typically depend only on effects propagated from nearest neighbours. […] the study of CPS has a distinctive set of tools and questions centring on elements that have fixed properties – atoms, the squares of the cellular automaton, and the like. […] The tools used for studying CPS come, with rare exceptions, from a well-developed part of mathematics, the theory of partial differential equations […] CAS studies, in contrast to CPS studies, concern themselves with elements that are not fixed. The elements, usually called agents, learn or adapt in response to interactions with other agents. […] It is unusual for CAS agents to converge, even momentarily, to a single ‘optimal’ strategy, or to an equilibrium. As the agents adapt to each other, new agents with new strategies usually emerge. Then each new agent offers opportunities for still further interactions, increasing the overall complexity. […] The complex feedback loops that form make it difficult to analyse, or even describe, CAS. […] Analysis of complex systems almost always turns on finding recurrent patterns in the system’s ever-changing configurations. […] perpetual novelty, produced with a limited number of rules or laws, is a characteristic of most complex systems: DNA consists of strings of the same four nucleotides, yet no two humans are exactly alike; the theorems of Euclidian geometry are based on just five axioms, yet new theorems are still being derived after two millennia; and so it is for the other complex systems.”
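The CPS picture – fixed elements on a lattice, interactions only with nearest neighbours – can be sketched with a one-dimensional cellular automaton. The sketch below uses rule 90, where each cell’s next state is simply the XOR of its two neighbours; the grid size is arbitrary:

```python
# CPS-style sketch: a one-dimensional cellular automaton where each
# cell's next state depends only on its nearest neighbours (rule 90:
# new cell = XOR of the two neighbours). Simple local rules generate a
# rich global pattern from a single 'on' cell.

def step(cells):
    n = len(cells)
    return [cells[(i - 1) % n] ^ cells[(i + 1) % n] for i in range(n)]

cells = [0] * 15
cells[7] = 1                      # a single 'on' cell in the middle
for _ in range(5):
    print("".join(".#"[c] for c in cells))
    cells = step(cells)
```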

“In a typical physical system the whole is (at least approximately) the sum of the parts, making the use of PDEs straightforward for a mathematician, but in a typical generated system the parts are put together in an interconnected, non-additive way. It is possible to write a concise set of partial differential equations to describe the basic elements of a computer, say an interconnected set of binary counters, but the existing theory of PDEs does little to increase our understanding of the circuits so-described. The formal grammar approach, in contrast, has already considerably increased our understanding of computer languages and programs. One of the major tasks of this book is to use a formal grammar to convert common features of complex systems into ‘stylized facts’ that can be examined carefully within the grammar.”

“Many CPS problems (e.g. the flow of electrons in superconductive materials) […] involve flows — flows that are nicely described by networks. Networks provide a detailed snapshot of CPS and complex adaptive systems (CAS) interactions at any given point in their development, but there are few studies of the evolution of networks […]. The distinction between the fast dynamic of flows (change of state) and the slow dynamic of adaptation (change of the network of interactions) often distinguishes CPS studies from CAS studies. […] all well-studied CAS exhibit lever points, points where a small directed action causes large predictable changes in aggregate behaviour, as when a vaccine produces long-term changes in an immune system. At present, lever points are almost always located by trial and error. However, by extracting mechanisms common to different lever points, a relevant CAS theory would provide a principled way of locating and testing lever points. […] activities that are easy to observe in one complex system often suggest ‘where to look’ in other complex systems where the activities are difficult to observe.”

“Observation shows that agents acting in a niche continually undergo ‘improvements’, without ever completely outcompeting other agents in the community. These improvements may come about in either of two ways: (i) an agent may become more of a generalist, processing resources from a wider variety of sources, or (ii) it may become more specialized, becoming more efficient than its competitors at exploiting a particular source of a vital resource. Both changes allow for still more interactions and still greater diversity. […] All CAS that have been examined closely exhibit trends toward increasing numbers of specialists.”

“Emergence is tightly tied to the formation of boundaries. These boundaries can arise from symmetry breaking, […] or they can arise by assembly of component building blocks […]. For CAS, the agent-defining boundaries determine the interactions between agents. […] Adaptation, and the emergence of new kinds of agents, then arises from changes in the relevant boundaries. Typically, a boundary only looks to a small segment of a signal, a tag, to determine whether or not the signal can pass through the boundary. […] an agent can be modelled by a set of conditional IF/THEN rules that represent both the effects of boundaries and internal signal-processing. Because tags are short, a given signal may carry multiple tags, and the rules that process signals can require the presence of more than one tag for the processing to proceed. Agents are parallel processors in the sense that all rules that are satisfied simultaneously in the agent are executed simultaneously. As a result, the interior of an agent will usually be filled with multiple signals […]. The central role of tags in routing signals through this complex interior puts emphasis on the mechanisms for tag modification as a means of adaptation. Recombination of extant conditions and signals […] turns tags into building blocks for specifying new routes. Parallel processing then makes it possible to test new routes so formed without seriously disrupting extant useful routes. Sophisticated agents have another means of adaptation: anticipation (‘lookahead’). If an agent has a set of rules that simulates part of its world, then it can run this internal model to examine the outcomes of different action sequences before those actions are executed.”
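The tag-matching idea can be sketched as follows. This is a toy illustration with invented tags and rules; Holland’s actual classifier systems are considerably richer:

```python
# Toy sketch of tag-based signal processing: each rule fires only on
# signals whose short prefix (the tag) it matches, and all satisfied
# rules act on the same batch of signals. Tags and rules are invented.

RULES = [
    ("01", lambda s: "10" + s[2:]),   # IF tag is 01 THEN retag to 10
    ("10", lambda s: s + "!"),        # IF tag is 10 THEN mark the signal
]

def process(signals):
    out = []
    for s in signals:
        for tag, action in RULES:
            if s.startswith(tag):     # the boundary only inspects the tag
                out.append(action(s))
    return out

signals = ["01AAA", "10BBB", "11CCC"]   # the last one matches no rule
print(process(signals))  # → ['10AAA', '10BBB!']
```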

“The flow of signals within and between agents can be represented by a directed network, where nodes represent rules, and there is a connection from node x to node y if rule x sends a signal satisfying a condition of rule y. Then, the flow of signals over this network spells out the performance of the agent at a point in time. […] The networks associated with CAS are typically highly tangled, with many loops providing feedback and recirculation […]. An agent adapts by changing its signal-processing rules, with corresponding changes in the structure of the associated network. […] Most machine-learning models, including ‘artificial neural networks’ and ‘Bayesian networks’, lack feedback cycles — they are often called ‘feedforward networks’ (in contrast to networks with substantial feedback). In the terms used in Chapter 4, such networks have no ‘recirculation’ and hence have no autonomous subsystems. Networks with substantial numbers of cycles are difficult to analyse, but a large number of cycles is the essential requirement for the autonomous internal models that make lookahead and planning possible. […] The complexities introduced by loops have so far resisted most attempts at analysis. […] The difficulties of analysing the behaviour of networks with many interior loops has, both historically and currently, encouraged the study of networks without loops called trees. Trees occur naturally in the study of games. […] because trees are easier to analyse, most artificial neural networks constructed for pattern recognition are trees. […] Evolutionary game theory makes use of the tree structure of games to study the ways in which agents can modify their strategies as they interact with other agents playing the same game. […] However, evolutionary game theory does not concern itself with the evolution of the game’s laws.”
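Whether a rule network is ‘feedforward’ or has the recirculation discussed above comes down to whether the directed graph contains a cycle, which a standard depth-first search can answer (the example graphs below are invented):

```python
# The feedforward-vs-feedback distinction above amounts to asking whether
# the directed rule network contains a cycle; depth-first search with
# three node colours answers that question.

def has_cycle(graph):
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in graph}
    def visit(v):
        colour[v] = GREY
        for w in graph.get(v, []):
            if colour[w] == GREY or (colour[w] == WHITE and visit(w)):
                return True        # reached a grey node again: feedback loop
        colour[v] = BLACK
        return False
    return any(colour[v] == WHITE and visit(v) for v in graph)

feedforward = {"x": ["y"], "y": ["z"], "z": []}
recirculating = {"x": ["y"], "y": ["z"], "z": ["x"]}   # signals recirculate
print(has_cycle(feedforward), has_cycle(recirculating))  # → False True
```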

“It has been observed that innovation in CAS is mostly a matter of combining well-known components in new ways. […] Recombination abets the formation of new cascades. […] By extracting general mechanisms that modify CAS, such as recombination, we go from examination of particular instances to a unified study of characteristic CAS properties. The mechanisms of interest act mainly on extant substructures, using them as building blocks for more complex substructures […]. Because signals and boundaries are a pervasive feature of CAS, their modification has a central role in this adaptive process.”

February 12, 2018 Posted by | Books, Computer science, Mathematics | Leave a comment

Random stuff

I have almost stopped posting posts like these, which has resulted in the accumulation of a very large number of links and studies which I figured I might like to blog at some point. This post is mainly an attempt to deal with the backlog – I won’t cover the material in too much detail.

i. Do Bullies Have More Sex? The answer seems to be a qualified yes. A few quotes:

“Sexual behavior during adolescence is fairly widespread in Western cultures (Zimmer-Gembeck and Helfland 2008) with nearly two thirds of youth having had sexual intercourse by the age of 19 (Finer and Philbin 2013). […] Bullying behavior may aid in intrasexual competition and intersexual selection as a strategy when competing for mates. In line with this contention, bullying has been linked to having a higher number of dating and sexual partners (Dane et al. 2017; Volk et al. 2015). This may be one reason why adolescence coincides with a peak in antisocial or aggressive behaviors, such as bullying (Volk et al. 2006). However, not all adolescents benefit from bullying. Instead, bullying may only benefit adolescents with certain personality traits who are willing and able to leverage bullying as a strategy for engaging in sexual behavior with opposite-sex peers. Therefore, we used two independent cross-sectional samples of older and younger adolescents to determine which personality traits, if any, are associated with leveraging bullying into opportunities for sexual behavior.”

“…bullying by males signal the ability to provide good genes, material resources, and protect offspring (Buss and Shackelford 1997; Volk et al. 2012) because bullying others is a way of displaying attractive qualities such as strength and dominance (Gallup et al. 2007; Reijntjes et al. 2013). As a result, this makes bullies attractive sexual partners to opposite-sex peers while simultaneously suppressing the sexual success of same-sex rivals (Gallup et al. 2011; Koh and Wong 2015; Zimmer-Gembeck et al. 2001). Females may denigrate other females, targeting their appearance and sexual promiscuity (Leenaars et al. 2008; Vaillancourt 2013), which are two qualities relating to male mate preferences. Consequently, derogating these qualities lowers a rivals’ appeal as a mate and also intimidates or coerces rivals into withdrawing from intrasexual competition (Campbell 2013; Dane et al. 2017; Fisher and Cox 2009; Vaillancourt 2013). Thus, males may use direct forms of bullying (e.g., physical, verbal) to facilitate intersexual selection (i.e., appear attractive to females), while females may use relational bullying to facilitate intrasexual competition, by making rivals appear less attractive to males.”

The study relies on the use of self-report data, which I find very problematic – so I won’t go into the results here. I’m not quite clear on how those studies mentioned in the discussion ‘have found self-report data [to be] valid under conditions of confidentiality’ – and I remain skeptical. You’ll usually want data from independent observers (e.g. teacher or peer observations) when analyzing these kinds of things. Note in the context of the self-report data problem that if there’s a strong stigma associated with being bullied (there often is, or bullying wouldn’t work as well), asking people if they have been bullied is not much better than asking people if they’re bullying others.

ii. Some topical advice that some people might soon regret not having followed, from the wonderful Things I Learn From My Patients thread:

“If you are a teenage boy experimenting with fireworks, do not empty the gunpowder from a dozen fireworks and try to mix it in your mother’s blender. But if you do decide to do that, don’t hold the lid down with your other hand and stand right over it. This will result in the traumatic amputation of several fingers, burned and skinned forearms, glass shrapnel in your face, and a couple of badly scratched corneas as a start. You will spend months in rehab and never be able to use your left hand again.”

iii. I haven’t talked about the AlphaZero-Stockfish match, but I was of course aware of it and did read a bit about that stuff. Here’s a reddit thread where one of the Stockfish programmers answers questions about the match. A few quotes:

“Which of the two is stronger under ideal conditions is, to me, neither particularly interesting (they are so different that it’s kind of like comparing the maximum speeds of a fish and a bird) nor particularly important (since there is only one of them that you and I can download and run anyway). What is super interesting is that we have two such radically different ways to create a computer chess playing entity with superhuman abilities. […] I don’t think there is anything to learn from AlphaZero that is applicable to Stockfish. They are just too different, you can’t transfer ideas from one to the other.”

“Based on the 100 games played, AlphaZero seems to be about 100 Elo points stronger under the conditions they used. The current development version of Stockfish is something like 40 Elo points stronger than the version used in Google’s experiment. There is a version of Stockfish translated to hand-written x86-64 assembly language that’s about 15 Elo points stronger still. This adds up to roughly half the Elo difference between AlphaZero and Stockfish shown in Google’s experiment.”
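To put these Elo figures in perspective, the standard logistic rating formula (this is textbook Elo arithmetic, not something from the thread) converts a rating gap into an expected per-game score:

```python
def elo_expected(diff):
    """Expected per-game score of a player rated `diff` points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

# The ~100-point gap observed in the match: AlphaZero scores roughly 0.64/game.
print(round(elo_expected(100), 2))
# If the newer and hand-assembly Stockfish versions claw back ~55 points,
# the remaining ~45-point gap corresponds to only about 0.56/game.
print(round(elo_expected(100 - 40 - 15), 2))
```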

“It seems that Stockfish was playing with only 1 GB for transposition tables (the area of memory used to store data about the positions previously encountered in the search), which is way too little when running with 64 threads.” [I seem to recall a comp sci guy observing elsewhere that this was less than what was available to his smartphone version of Stockfish, but I didn’t bookmark that comment].

“The time control was a very artificial fixed 1 minute/move. That’s not how chess is traditionally played. Quite a lot of effort has gone into Stockfish’s time management. It’s pretty good at deciding when to move quickly, and when to spend a lot of time on a critical decision. In a fixed time per move game, it will often happen that the engine discovers that there is a problem with the move it wants to play just before the time is out. In a regular time control, it would then spend extra time analysing all alternative moves and trying to find a better one. When you force it to move after exactly one minute, it will play the move it already knows is bad. There is no doubt that this will cause it to lose many games it would otherwise have drawn.”

iv. Thrombolytics for Acute Ischemic Stroke – no benefit found.

“Thrombolysis has been rigorously studied in >60,000 patients for acute thrombotic myocardial infarction, and is proven to reduce mortality. It is theorized that thrombolysis may similarly benefit ischemic stroke patients, though a much smaller number (8120) has been studied in relevant, large scale, high quality trials thus far. […] There are 12 such trials 1-12. Despite the temptation to pool these data the studies are clinically heterogeneous. […] Data from multiple trials must be clinically and statistically homogenous to be validly pooled.14 Large thrombolytic studies demonstrate wide variations in anatomic stroke regions, small- versus large-vessel occlusion, clinical severity, age, vital sign parameters, stroke scale scores, and times of administration. […] Examining each study individually is therefore, in our opinion, both more valid and more instructive. […] Two of twelve studies suggest a benefit […] In comparison, twice as many studies showed harm and these were stopped early. This early stoppage means that the number of subjects in studies demonstrating harm would have included over 2400 subjects based on originally intended enrollments. Pooled analyses are therefore missing these phantom data, which would have further eroded any aggregate benefits. In their absence, any pooled analysis is biased toward benefit. Despite this, there remain five times as many trials showing harm or no benefit (n=10) as those concluding benefit (n=2), and 6675 subjects in trials demonstrating no benefit compared to 1445 subjects in trials concluding benefit.”

“Thrombolytics for ischemic stroke may be harmful or beneficial. The answer remains elusive. We struggled therefore, debating between a ‘yellow’ or ‘red’ light for our recommendation. However, over 60,000 subjects in trials of thrombolytics for coronary thrombosis suggest a consistent beneficial effect across groups and subgroups, with no studies suggesting harm. This consistency was found despite a very small mortality benefit (2.5%), and a very narrow therapeutic window (1% major bleeding). In comparison, the variation in trial results of thrombolytics for stroke and the daunting but consistent adverse effect rate caused by ICH suggested to us that thrombolytics are dangerous unless further study exonerates their use.”

“There is a Cochrane review that pooled estimates of effect. 17 We do not endorse this choice because of clinical heterogeneity. However, we present the NNT’s from the pooled analysis for the reader’s benefit. The Cochrane review suggested a 6% reduction in disability […] with thrombolytics. This would mean that 17 were treated for every 1 avoiding an unfavorable outcome. The review also noted a 1% increase in mortality (1 in 100 patients die because of thrombolytics) and a 5% increase in nonfatal intracranial hemorrhage (1 in 20), for a total of 6% harmed (1 in 17 suffers death or brain hemorrhage).”
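The NNT/NNH figures quoted are simply reciprocals of the absolute risk differences; a quick sanity check of the arithmetic, using the numbers from the quote above:

```python
import math

def number_needed(absolute_risk_difference):
    """NNT/NNH: patients treated per one additional outcome (rounded up)."""
    return math.ceil(1.0 / absolute_risk_difference)

print(number_needed(0.06))         # 6% less disability   -> NNT of 17
print(number_needed(0.01))         # 1% excess mortality  -> 1 in 100 harmed
print(number_needed(0.05))         # 5% excess nonfatal ICH -> 1 in 20
print(number_needed(0.01 + 0.05))  # combined 6% harm     -> 1 in 17
```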

v. Suicide attempts in Asperger Syndrome. An interesting finding: “Over 35% of individuals with AS reported that they had attempted suicide in the past.”

Related: Suicidal ideation and suicide plans or attempts in adults with Asperger’s syndrome attending a specialist diagnostic clinic: a clinical cohort study.

“374 adults (256 men and 118 women) were diagnosed with Asperger’s syndrome in the study period. 243 (66%) of 367 respondents self-reported suicidal ideation, 127 (35%) of 365 respondents self-reported plans or attempts at suicide, and 116 (31%) of 368 respondents self-reported depression. Adults with Asperger’s syndrome were significantly more likely to report lifetime experience of suicidal ideation than were individuals from a general UK population sample (odds ratio 9·6 [95% CI 7·6–11·9], p<0·0001), people with one, two, or more medical illnesses (p<0·0001), or people with psychotic illness (p=0·019). […] Lifetime experience of depression (p=0·787), suicidal ideation (p=0·164), and suicide plans or attempts (p=0·06) did not differ significantly between men and women […] Individuals who reported suicide plans or attempts had significantly higher Autism Spectrum Quotient scores than those who did not […] Empathy Quotient scores and ages did not differ between individuals who did or did not report suicide plans or attempts (table 4). Patients with self-reported depression or suicidal ideation did not have significantly higher Autism Spectrum Quotient scores, Empathy Quotient scores, or age than did those without depression or suicidal ideation”.

The fact that people with Asperger’s are more likely to be depressed and to contemplate suicide is consistent with previous observations that they’re also more likely to die from suicide – for example, a paper I blogged a while back (a large Swedish population-based cohort study) found that people with ASD were more than 7 times as likely to die from suicide as the comparable controls.

Also related: Suicidal tendencies hard to spot in some people with autism.

This link has some great graphs and tables of suicide data from the US.

Also autism-related: Increased perception of loudness in autism. This is one of the ‘important ones’ for me personally – I am much more sound-sensitive than are most people.

vi. Early versus Delayed Invasive Intervention in Acute Coronary Syndromes.

“Earlier trials have shown that a routine invasive strategy improves outcomes in patients with acute coronary syndromes without ST-segment elevation. However, the optimal timing of such intervention remains uncertain. […] We randomly assigned 3031 patients with acute coronary syndromes to undergo either routine early intervention (coronary angiography ≤24 hours after randomization) or delayed intervention (coronary angiography ≥36 hours after randomization). The primary outcome was a composite of death, myocardial infarction, or stroke at 6 months. A prespecified secondary outcome was death, myocardial infarction, or refractory ischemia at 6 months. […] Early intervention did not differ greatly from delayed intervention in preventing the primary outcome, but it did reduce the rate of the composite secondary outcome of death, myocardial infarction, or refractory ischemia and was superior to delayed intervention in high-risk patients.”

vii. Some wikipedia links:

Behrens–Fisher problem.
Sailing ship tactics (I figured I had to read up on this if I were to get anything out of the Aubrey-Maturin books).
Anatomical terms of muscle.
Phatic expression (“a phatic expression […] is communication which serves a social function such as small talk and social pleasantries that don’t seek or offer any information of value.”)
Three-domain system.
Beringian wolf (featured).
Subdural hygroma.
Cayley graph.
Schur polynomial.
Solar neutrino problem.
Hadamard product (matrices).
True polar wander.
Newton’s cradle.

viii. Determinant versus permanent (mathematics – technical).

ix. Some years ago I wrote a few English-language posts about various statistical/demographic properties of immigrants living in Denmark, based on numbers included in a publication by Statistics Denmark which was only available in Danish; I wrote the posts by translating the observations included in that publication. I briefly considered doing the same thing again when the 2017 data arrived, but decided against it – I recalled that those posts took a lot of time to write back then, and it didn’t seem worth the effort. Danish readers might however be interested in having a look at the data, if they haven’t already – here’s a link to the publication Indvandrere i Danmark 2017.

x. A banter blitz session with grandmaster Peter Svidler, who recently became the first Russian ever to win the Russian Chess Championship 8 times. He’s currently in shared second place in the World Rapid Championship after 10 rounds and is now in the top 10 of the live rating list in both classical and rapid – seems like he’s had a very decent year.

xi. I recently discovered Dr. Whitecoat’s blog. The patient encounters are often interesting.

December 28, 2017 Posted by | Astronomy, autism, Biology, Cardiology, Chess, Computer science, History, Mathematics, Medicine, Neurology, Physics, Psychiatry, Psychology, Random stuff, Statistics, Studies, Wikipedia, Zoology | Leave a comment

The mystery of over-parametrization in neural networks


October 6, 2017 Posted by | Computer science, Lectures, Mathematics | Leave a comment

Interactive Coding with “Optimal” Round and Communication Blowup

The youtube description of this one was rather longer than usual, and I decided to quote it in full below:

“The problem of constructing error-resilient interactive protocols was introduced in the seminal works of Schulman (FOCS 1992, STOC 1993). These works show how to convert any two-party interactive protocol into one that is resilient to constant-fraction of error, while blowing up the communication by only a constant factor. Since these seminal works, there have been many follow-up works which improve the error rate, the communication rate, and the computational efficiency. All these works assume that in the underlying protocol, in each round, each party sends a *single* bit. This assumption is without loss of generality, since one can efficiently convert any protocol into one which sends one bit per round. However, this conversion may cause a substantial increase in *round* complexity, which is what we wish to minimize in this work. Moreover, all previous works assume that the communication complexity of the underlying protocol is *fixed* and a priori known, an assumption that we wish to remove. In this work, we consider protocols whose messages may be of *arbitrary* lengths, and where the length of each message and the length of the protocol may be *adaptive*, and may depend on the private inputs of the parties and on previous communication. We show how to efficiently convert any such protocol into another protocol with comparable efficiency guarantees, that is resilient to constant fraction of adversarial error, while blowing up both the *communication* complexity and the *round* complexity by at most a constant factor. Moreover, as opposed to most previous work, our error model not only allows the adversary to toggle with the corrupted bits, but also allows the adversary to *insert* and *delete* bits. In addition, our transformation preserves the computational efficiency of the protocol. Finally, we try to minimize the blowup parameters, and give evidence that our parameters are nearly optimal. This is joint work with Klim Efremenko and Elad Haramaty.”

A few links to stuff covered/mentioned in the lecture:

Coding for interactive communication correcting insertions and deletions.
Efficiently decodable insertion/deletion codes for high-noise and high-rate regimes.
Common reference string model.
Small-bias probability spaces: Efficient constructions and applications.
Interactive Channel Capacity Revisited.
Collision (computer science).
Chernoff bound.

September 6, 2017 Posted by | Computer science, Cryptography, Lectures, Mathematics | Leave a comment


I gave the book two stars. Some quotes and links below.

“Lenses are ubiquitous in image-forming devices […] Imaging instruments have two components: the lens itself, and a light detector, which converts the light into, typically, an electrical signal. […] In every case the location of the lens with respect to the detector is a key design parameter, as is the focal length of the lens which quantifies its ‘ray-bending’ power. The focal length is set by the curvature of the surfaces of the lens and its thickness. More strongly curved surfaces and thicker materials are used to make lenses with short focal lengths, and these are used usually in instruments where a high magnification is needed, such as a microscope. Because the refractive index of the lens material usually depends on the colour of light, rays of different colours are bent by different amounts at the surface, leading to a focus for each colour occurring in a different position. […] lenses with a big diameter and a short focal length will produce the tiniest images of point-like objects. […] about the best you can do in any lens system you could actually make is an image size of approximately one wavelength. This is the fundamental limit to the pixel size for lenses used in most optical instruments, such as cameras and binoculars. […] Much more sophisticated methods are required to see even smaller things. The reason is that the wave nature of light puts a lower limit on the size of a spot of light. […] At the other extreme, both ground- and space-based telescopes for astronomy are very large instruments with relatively simple optical imaging components […]. The distinctive feature of these imaging systems is their size. The most distant stars are very, very faint. Hardly any of their light makes it to the Earth. It is therefore very important to collect as much of it as possible. This requires a very big lens or mirror”.
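The claim that lenses with a big diameter and a short focal length produce the tiniest images can be made concrete with the standard Rayleigh diffraction-limit formula (my addition, not from the book): the Airy spot diameter is roughly 2.44·λ·f/D, so it shrinks with shorter focal length f and larger aperture D.

```python
def spot_diameter(wavelength, focal_length, aperture_diameter):
    """Diffraction-limited (Airy) spot diameter: 2.44 * wavelength * f / D."""
    return 2.44 * wavelength * focal_length / aperture_diameter

# Green light (550 nm) through a hypothetical f/2 lens (f = 50 mm, D = 25 mm):
# the focused spot is a few wavelengths across -- the same order of
# magnitude as the book's "approximately one wavelength" limit.
d = spot_diameter(550e-9, 50e-3, 25e-3)
print(f"{d * 1e6:.2f} micrometres")
```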

“[W]hat sort of wave is light? This was […] answered in the 19th century by James Clerk Maxwell, who showed that it is an oscillation of a new kind of entity: the electromagnetic field. This field is effectively a force that acts on electric charges and magnetic materials. […] In the early 19th century, Michael Faraday had shown the close connections between electric and magnetic fields. Maxwell brought them together, as the electromagnetic force field. […] in the wave model, light can be considered as very high frequency oscillations of the electromagnetic field. One consequence of this idea is that moving electric charges can generate light waves. […] When […] charges accelerate — that is, when they change their speed or their direction of motion — then a simple law of physics is that they emit light. Understanding this was one of the great achievements of the theory of electromagnetism.”

“It was the observation of interference effects in a famous experiment by Thomas Young in 1803 that really put the wave picture of light as the leading candidate as an explanation of the nature of light. […] It is interference of light waves that causes the colours in a thin film of oil floating on water. Interference transforms very small distances, on the order of the wavelength of light, into very big changes in light intensity — from no light to four times as bright as the individual constituent waves. Such changes in intensity are easy to detect or see, and thus interference is a very good way to measure small changes in displacement on the scale of the wavelength of light. Many optical sensors are based on interference effects.”
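The “from no light to four times as bright” figure follows directly from the fact that it is wave *amplitudes* that add, while the observed *intensity* is the squared magnitude of the sum; a minimal numerical sketch:

```python
import cmath

def interference_intensity(amplitude, phase_difference):
    """Two equal-amplitude waves: amplitudes add as complex numbers,
    and the observed intensity is the squared magnitude of the sum."""
    total = amplitude + amplitude * cmath.exp(1j * phase_difference)
    return abs(total) ** 2

single = 1.0 ** 2  # intensity of one wave on its own
print(interference_intensity(1.0, 0.0))       # in phase: four times `single`
print(interference_intensity(1.0, cmath.pi))  # out of phase: darkness
```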

“[L]ight beams […] gradually diverge as they propagate. This is because a beam of light, which by definition has a limited spatial extent, must be made up of waves that propagate in more than one direction. […] This phenomenon is called diffraction. […] if you want to transmit light over long distances, then diffraction could be a problem. It will cause the energy in the light beam to spread out, so that you would need a bigger and bigger optical system and detector to capture all of it. This is important for telecommunications, since nearly all of the information transmitted over long-distance communications links is encoded on to light beams. […] The means to manage diffraction so that long-distance communication is possible is to use wave guides, such as optical fibres.”

“[O]ptical waves […] guided along a fibre or in a glass ‘chip’ […] underpins the long-distance telecommunications infrastructure that connects people across different continents and powers the Internet. The reason it is so effective is that light-based communications have much more capacity for carrying information than do electrical wires, or even microwave cellular networks. […] In optical communications, […] bits are represented by the intensity of the light beam — typically low intensity is a 0 and higher intensity a 1. The more of these that arrive per second, the faster the communication rate. […] Why is optics so good for communications? There are two reasons. First, light beams don’t easily influence each other, so that a single fibre can support many light pulses (usually of different colours) simultaneously without the messages getting scrambled up. The reason for this is that the glass of which the fibre is made does not absorb light (or only absorbs it in tiny amounts), and so does not heat up and disrupt other pulse trains. […] the ‘crosstalk’ between light beams is very weak in most materials, so that many beams can be present at once without causing a degradation of the signal. This is very different from electrons moving down a copper wire, which is the usual way in which local ‘wired’ communications links function. Electrons tend to heat up the wire, dissipating their energy. This makes the signals harder to receive, and thus the number of different signal channels has to be kept small enough to avoid this problem. Second, light waves oscillate at very high frequencies, and this allows very short pulses to be generated. This means that the pulses can be spaced very close together in time, making the transmission of more bits of information per second possible. […] Fibre-based optical networks can also support a very wide range of colours of light.”
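The intensity encoding described here — low intensity for a 0, higher intensity for a 1 — is on–off keying; a toy encoder/decoder (the LOW/HIGH/THRESHOLD values are made-up illustrative numbers):

```python
LOW, HIGH = 0.1, 1.0  # arbitrary illustrative pulse intensities
THRESHOLD = 0.5       # receiver's decision level

def encode(bits):
    """Map each bit to a light-pulse intensity (on-off keying)."""
    return [HIGH if b else LOW for b in bits]

def decode(intensities):
    """Recover bits by thresholding the received intensities."""
    return [1 if i > THRESHOLD else 0 for i in intensities]

message = [1, 0, 1, 1, 0]
assert decode(encode(message)) == message
```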

“Waves can be defined by their wavelength, amplitude, and phase […]. Particles are defined by their position and direction of travel […], and a collection of particles by their density […] and range of directions. The media in which the light moves are characterized by their refractive indices. This can vary across space. […] Hamilton showed that what was important was how rapidly the refractive index changed in space compared with the length of an optical wave. That is, if the changes in index took place on a scale of close to a wavelength, then the wave character of light was evident. If it varied more smoothly and very slowly in space then the particle picture provided an adequate description. He showed how the simpler ray picture emerges from the more complex wave picture in certain commonly encountered situations. The appearance of wave-like phenomena, such as diffraction and interference, occurs when the size scales of the wavelength of light and the structures in which it propagates are similar. […] Particle-like behaviour — motion along a well-defined trajectory — is sufficient to describe the situation when all objects are much bigger than the wavelength of light, and have no sharp edges.”

“When things are heated up, they change colour. Take a lump of metal. As it gets hotter and hotter it first glows red, then orange, and then white. Why does this happen? This question stumped many of the great scientists [in the 19th century], including Maxwell himself. The problem was that Maxwell’s theory of light, when applied to this problem, indicated that the colour should get bluer and bluer as the temperature increased, without a limit, eventually moving out of the range of human vision into the ultraviolet—beyond blue—region of the spectrum. But this does not happen in practice. […] Max Planck […] came up with an idea to explain the spectrum emitted by hot objects — so-called ‘black bodies’. He conjectured that when light and matter interact, they do so only by exchanging discrete ‘packets’, or quanta, of energy. […] this conjecture was set to radically change physics.”
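The ‘bluer without limit’ failure of the classical theory is the Rayleigh–Jeans ultraviolet catastrophe, which Planck’s formula fixes; a small numerical comparison using the standard formulas (my addition, not from the book):

```python
import math

H, C, K = 6.626e-34, 2.998e8, 1.381e-23  # Planck constant, speed of light, Boltzmann constant

def planck(wavelength, temperature):
    """Planck's law for spectral radiance: finite at every wavelength."""
    x = H * C / (wavelength * K * temperature)
    return (2 * H * C**2 / wavelength**5) / (math.exp(x) - 1)

def rayleigh_jeans(wavelength, temperature):
    """The classical (Maxwell-era) prediction: diverges as wavelength -> 0."""
    return 2 * C * K * temperature / wavelength**4

# At 5000 K the two agree in the infrared, but the classical curve keeps
# growing into the ultraviolet while Planck's falls back towards zero.
for lam in (10e-6, 1e-6, 100e-9):
    ratio = rayleigh_jeans(lam, 5000) / planck(lam, 5000)
    print(f"{lam:.0e} m: classical/Planck = {ratio:.3g}")
```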

“What Dirac did was to develop a quantum mechanical version of Maxwell’s theory of electromagnetic fields. […] It set the quantum field up as the fundamental entity on which the universe is built — neither particle nor wave, but both at once; complete wave–particle duality. It is a beautiful reconciliation of all the phenomena that light exhibits, and provides a framework in which to understand all optical effects, both those from the classical world of Newton, Maxwell, and Hamilton and those of the quantum world of Planck, Einstein, and Bohr. […] Light acts as a particle of more or less well-defined energy when it interacts with matter. Yet it retains its ability to exhibit wave-like phenomena at the same time. The resolution [was] a new concept: the quantum field. Light particles — photons — are excitations of this field, which propagates according to quantum versions of Maxwell’s equations for light waves. Quantum fields, of which light is perhaps the simplest example, are now regarded as being the fundamental entities of the universe, underpinning all types of material and non-material things. The only explanation is that the stuff of the world is neither particle nor wave but both. This is the nature of reality.”

Some links:

Coherence (physics).
Electromagnetic spectrum.
Joseph von Fraunhofer.
Transverse wave.
Spatial frequency.
Polarization (waves).
Specular reflection.
Negative-index metamaterial.
Interference (wave propagation).
Young’s interference experiment.
Photoactivated localization microscopy.
Stimulated emission depletion (STED) microscopy.
Fourier’s theorem (I found it hard to find a good source on this one. According to the book, “Fourier’s theorem says in simple terms that the smaller you focus light, the broader the range of wave directions you need to achieve this spot”)
X-ray diffraction.
Brewster’s angle.
Liquid crystal.
Liquid crystal display.
Wave–particle duality.
Fermat’s principle.
Maupertuis’ principle.
Johann Jakob Balmer.
Max Planck.
Photoelectric effect.
Niels Bohr.
Matter wave.
Quantum vacuum.
Lamb shift.
Light-emitting diode.
Fluorescent tube.
Synchrotron radiation.
Quantum state.
Quantum fluctuation.
Spontaneous emission/stimulated emission.
Optical cavity.
X-ray absorption spectroscopy.
Diamond Light Source.
Atomic clock.
Time dilation.
High harmonic generation.
Frequency comb.
Optical tweezers.
Bose–Einstein condensate.
Pump probe spectroscopy.
Vulcan laser.
Plasma (physics).
Nonclassical light.
Photon polarization.
Quantum entanglement.
Bell test experiments.
Quantum key distribution/Quantum cryptography/Quantum computing.

August 31, 2017 Posted by | Books, Chemistry, Computer science, Engineering, Physics | Leave a comment