Econstudentlog

Imitation Games – Avi Wigderson

If you wish to skip the introduction, the talk starts at 5.20. The talk itself lasts roughly an hour, with the last ca. 20 minutes devoted to Q&A – that part is worth watching as well.

Some links related to the talk below:

Theory of computation.
Turing test.
COMPUTING MACHINERY AND INTELLIGENCE.
Probabilistic encryption & how to play mental poker keeping secret all partial information (Goldwasser & Micali, 1982).
Probabilistic algorithm.
How To Generate Cryptographically Strong Sequences Of Pseudo-Random Bits (Blum & Micali, 1984).
Randomness extractor.
Dense graph.
Periodic sequence.
Extremal graph theory.
Szemerédi’s theorem.
Green–Tao theorem.
Szemerédi regularity lemma.
New Proofs of the Green-Tao-Ziegler Dense Model Theorem: An Exposition.
Calibrating Noise to Sensitivity in Private Data Analysis.
Generalization in Adaptive Data Analysis and Holdout Reuse.
Book: Math and Computation | Avi Wigderson.
One-way function.
Lattice-based cryptography.

August 23, 2021 Posted by | Computer science, Cryptography, Data, Lectures, Mathematics, Science, Statistics

Some observations on a cryptographic problem

It’s been a long time since I last posted one of these sorts of ‘rootless’ posts – posts which are not based on a specific book or a specific lecture or something along those lines – but a question on r/science made me think about these topics and start writing a bit about them, and I decided I might as well add my thoughts and ideas here.

The reddit question which motivated me to write this post was this one: “Is it difficult to determine the password for an encryption if you are given both the encrypted and unencrypted message?

By “difficult” I mean requiring an inordinate amount of computation. If given both an encrypted and unencrypted file/message, is it reasonable to be able to recover the password that was used to encrypt the file/message?”

Judging from the way the question is worded, the inquirer obviously knows very little about these topics, but that was part of what motivated me when I started out writing; s/he quite obviously has a faulty model of how this kind of stuff actually works, and the very way the question is asked illustrates some of the ways in which s/he gets things wrong.

When I decided to transfer my efforts towards discussing these topics to the blog, I also implicitly decided against using language that would be expected to be easily comprehensible to the original inquirer, as s/he was no longer in the target group and there’s a cost to using that kind of language when discussing technical matters. I’ve tried to make this post both useful and readable to people not all that familiar with the related fields, but I find it difficult to evaluate the extent to which I’ve succeeded when I try to do things like that.

I decided against covering points already addressed in the thread when I started out writing this, so I’ll not e.g. repeat noiwontfixyourpc’s reply there. However, I have added some other observations that seem to me relevant and worth mentioning to people who might consider asking a question similar to the one the original inquirer asked in that thread:

i. Finding a way to make plaintext turn into cipher text (…or cipher text into plaintext; and no, these two things are not actually always equivalent, see below…) is a very different (and in many contexts much easier) problem than finding out the actual encryption scheme that is at work producing the text strings you observe. There can be many, many different ways to go from a specific sample of plaintext to a specific sample of ciphertext, and most of those solutions won’t work if you’re faced with a new piece of ciphertext; especially not if the original samples are small, so that only a small amount of (potential) information could be expected to be contained in the text strings.

If you only get a small amount of plaintext and corresponding cipher text, you may decide that algorithm A is the one that was applied to the message, even if the algorithm actually applied was a more complex algorithm, B. To illustrate in a very simple way how this might happen, A might be a special case of B, because B is a superset comprising A and a large number of other potential encryption algorithms (…or the actually applied scheme might be C, because B in turn happens to be a subset of C, or… etc.). In such a context A might be an encryption scheme/approach that only applies in very specific contexts; for example (part of) the coding algorithm might have been to decide that ‘next Tuesday we’ll use this specific algorithm to translate plaintext into cipher text, and we’ll never use that specific translation-/mapping algorithm (which may be but one component of the encryption algorithm) again’. If such a situation applies, then you’re faced with the problem that even if your rule ‘worked’ in that particular instance, in terms of translating your plaintext into cipher text and vice versa, it only ‘worked’ because you blindly fitted the two data sets in a way that looked right, even though you actually had no idea how the coding scheme really worked (you only guessed A, not B, and in this particular instance A is never actually going to be used again).

On a more general level, some of the above comments in my view quite obviously link to results from classical statistics; there are many ways to link random variables through data-fitting methods, but reliably identifying proper causal linkages through the application of such approaches is, well, difficult (and, according to some, often ill-advised)…

ii. In my view, it does not seem possible in general to prove that any specific proposed encryption/decryption algorithm is ‘the correct one’, because the proposed algorithm will never be a unique solution to the problem you’re evaluating. How are you going to convince me that The True Algorithm is not a more general/complex one (or perhaps a completely different one – see iii. below) than the one you propose, and that your solution is not missing relevant variables? The only way to truly test whether the proposed algorithm is valid is to test it on new data and compare its performance on this new data set with the performances of competing solution proposals which also managed to correctly link cipher text and plaintext. If the algorithm doesn’t work on the new data, you got it wrong. If it does work on new data, well, you might still just have been lucky. You might get more confident with more correctly-assessed (…guessed?) data, but you never get certain. In other similar contexts a not uncommon approach for trying to get around these sorts of problems is to limit the analysis to a subset of the data available in order to obtain the algorithm, and then use the rest of the data for validation purposes (here’s a relevant link), but here even with highly efficient estimation approaches you will almost certainly run out of information (/degrees of freedom) long before you get anywhere if the encryption algorithm is at all non-trivial. In these settings information is likely to be a limiting resource.
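To make the points in i. and ii. a bit more concrete, here’s a minimal sketch in Python. The plaintext/cipher text pairs, the candidate cipher families and the key lengths are all made up for illustration; the point is only that several candidate ‘algorithms’ fit a small observed sample, and that even a held-out second sample may fail to discriminate between them, because parts of the key are never exercised by the data you happen to have:

```python
import string

ALPHABET = string.ascii_uppercase

def caesar_encrypt(plain, shift):
    """Shift every letter by the same fixed amount."""
    return "".join(ALPHABET[(ALPHABET.index(c) + shift) % 26] for c in plain)

def vigenere_encrypt(plain, key):
    """Shift letter i by the amount given by key letter i (the key repeats)."""
    return "".join(ALPHABET[(ALPHABET.index(c) + ALPHABET.index(k)) % 26]
                   for c, k in zip(plain, key * len(plain)))

# One observed (made-up) plaintext/cipher text pair:
plaintext, ciphertext = "HI", "KL"

# Candidates from two different cipher families that are consistent with the sample:
caesar_fits = [s for s in range(26) if caesar_encrypt(plaintext, s) == ciphertext]
vig3_fits = [a + b + c for a in ALPHABET for b in ALPHABET for c in ALPHABET
             if vigenere_encrypt(plaintext, a + b + c) == ciphertext]
print(caesar_fits)     # [3]  -- the 'simple' explanation (scheme A)
print(len(vig3_fits))  # 26   -- three-letter keys also fit; the third key letter is unconstrained

# 'Validation' on a held-out second pair produced by the same (unknown) scheme:
new_plain, new_cipher = "NO", "QR"
survivors = [k for k in vig3_fits if vigenere_encrypt(new_plain, k) == new_cipher]
print(len(survivors))  # still 26: two-letter messages never exercise the third key letter,
                       # so this kind of extra data adds no information about it
```

If the true scheme happened to be a three-letter Vigenère key, the Caesar ‘solution’ would look perfectly fine on both samples and would still fail the first time a message long enough to reach the third key letter came along.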

iii. There are many different types of encryption schemes, and people who ask questions like the one above tend, I believe, to have a quite limited view of which methods and approaches are truly available to one who desires secrecy when exchanging information with others. Imagine a situation where the plaintext is ‘See you next Wednesday’ and the encrypted text is an English translation of Tolstoy’s War and Peace (or, to make it even more fun, all pages published on the English version of Wikipedia, say on November 5th, 2017 at midnight GMT). That’s an available encryption approach that might be applied. It might in turn be a part (‘A’) of a more general (‘B’) encryption approach which links specific messages from a preconceived list – messages considered worth sending in the future when the algorithm was chosen – to specific book titles decided on in advance. So if you want to say ‘good Sunday!’, Eve gets to read the Bible and see where that gets her. You could also decide that in half of all cases the book cipher text links to a specific message from the list, while in the other half of the cases what you actually mean to communicate is on page 21 of the book; a hacker who saw a cipher text/plaintext combination resulting from one half of the algorithm might then be thrown off in terms of the other half, and vice versa – which illustrates well one of the key problems you’re faced with as an attacker when working on cryptographic schemes about which you have limited knowledge: the opponent can always add new layers on top of the ones that already exist/apply to make the problem harder to solve. And you could also link the specific list message with some really complicated cipher-encrypted version of the Bible. There’s a lot more to encryption schemes than just exchanging a few letters here and there. On related topics, see this link. On a different if related topic, people who desire secrecy when exchanging information may also attempt to hide the fact that any secrets are being exchanged in the first place. See also this.

iv. The specific usage of the word ‘password’ in the original query calls for comment for multiple reasons, some of which have been touched upon above; perhaps mainly because it implicitly betrays a lack of knowledge about how modern cryptographic systems actually work. Even if you consider an encryption scheme to be just an advanced sort of ‘password’, finding the password (singular) is not always the task you’re faced with today. In symmetric-key settings you might sort-of-kind-of argue that it is: there you have one single (collection of) key(s) which is used both to encrypt and to decrypt messages, so you only have one ‘password’. That’s however not how asymmetric-key encryption works. As wiki puts it: “In an asymmetric key encryption scheme, anyone can encrypt messages using the public key, but only the holder of the paired private key can decrypt.”

This of course relates to what you actually want to do/achieve when you get your samples of cipher text and plaintext. In some cryptographic contexts, by design, the route you need to go to get from cipher text to plaintext is conceptually different from the route you need to go to get from plaintext to cipher text. And some of the ‘passwords’ that relate to how the schemes work are public knowledge by design.
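To spell out the asymmetry with a concrete toy example – textbook RSA with absurdly small, insecure numbers chosen purely for readability (the modular-inverse call requires Python 3.8+) – note that the ‘password’ used for encryption is public by design, while the one needed for decryption is not:

```python
p, q = 61, 53                  # secret primes
n = p * q                      # 3233: public modulus
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent (coprime to phi)
d = pow(e, -1, phi)            # 2753: private exponent, derivable only via the factorization

message = 65                   # a message encoded as a number smaller than n
cipher = pow(message, e, n)    # anyone can perform this step with the public key (n, e)
recovered = pow(cipher, d, n)  # only the holder of the private key d can perform this step
assert recovered == message
```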

v. I have already touched a bit upon the problem of the existence of an information constraint, but I realized I probably need to spell this out in a bit more detail. The original inquirer seems to me to be implicitly under the misapprehension that computational complexity is the only limiting constraint here (“By “difficult” I mean requiring an inordinate amount of computation.”). Given the setting he or she proposes, I don’t think that’s true, and the reason why is sort of interesting.

If you think about what kind of problem you’re facing, what you have in this setting is really a very limited amount of data which relates in an unknown manner to an unknown data-generating process (‘algorithm’). There are, as has been touched upon, in general many ways to obtain linkage between two data sets (the cipher text and the plaintext) using an algorithm – too many ways for comfort, actually. The search space is large; there are too many algorithms to consider, or equivalently, the amount of information supplied by the data will often be too small for us to properly evaluate the algorithms under consideration. An important observation is that more complex algorithms will both take longer to identify (at least as candidates) and be expected to require more data to evaluate, at least to the extent that algorithmic complexity relates to changes in data structure/composition that need to be modeled in order to evaluate/identify the goal algorithm. If the algorithm says a different encryption rule is at work on Wednesdays, you’re going to have trouble figuring that out if you only got hold of a cipher text/plaintext combination derived from an exchange which took place on a Saturday. There are methods from statistics that might conceivably help you deal with problems like these, but they have their own issues and trade-offs. You might limit yourself to considering only settings where you have access to all known plaintext and cipher text combinations, so you got both Wednesday and Saturday, but even here you can’t be safe – next (metaphorical, I probably at this point need to add) Friday might be different from last (metaphorical) Friday, and this could even be baked into the algorithm in very non-obvious ways.

The above remarks might give you the idea that I’m just coming up with these kinds of suggestions to try to foil your approaches to figuring out the algorithm ‘by cheating’ (…it shouldn’t matter whether or not it was ‘sent on a Saturday’), but the main point is that a complex encryption algorithm is complex, and even if you see it applied multiple times you might not get enough information about how it works from the data suggested to be able to evaluate whether you guessed right. In fact, given the combination of a sparse data set (one message, or just a few messages, in plaintext and cipher text) and a complex algorithm involving a very non-obvious mapping function, the odds are strongly against you.

vi. I had the thought that one reason why the inquirer might be confused about some of these things is that s/he might well be aware of the existence of modern cryptographic techniques which do rely to a significant extent on computational complexity aspects. I.e., here you do have settings where you’re asked to provide ‘the right answer’ (‘the password’), but it’s hard to calculate the right answer in a reasonable amount of time unless you have the relevant (private) information at hand – see e.g. these links for more. One way to think about how such a problem relates to the other problem at hand (you have been presented with samples of cipher text and plaintext and you want to guess all the details about how the encryption and decryption schemes which were applied work) is that this kind of algorithm/approach may be applied in combination with other algorithmic approaches to encrypt/decrypt the text you’re analyzing. A really tough prime factorization problem might for all we know be an embedded component of the cryptographic process that is applied to our text. We could call it A.

In such a situation we would definitely be in trouble because stuff like prime factorization is really hard and computationally complex, and to make matters worse just looking at the plaintext and the cipher text would not make it obvious to us that a prime factorization scheme had even been applied to the data. But a really important point is that even if such a tough problem was not present and even if only relatively less computationally demanding problems were involved, we almost certainly still just wouldn’t have enough information to break any semi-decent encryption algorithm based on a small sample of plaintext and cipher text. It might help a little bit, but in the setting contemplated by the inquirer a ‘faster computer’ (/…’more efficient decision algorithm’, etc.) can only help so much.

vii. Shannon and Kerckhoffs may have a point in a general setting, but in specific settings like this particular one I think it is well worth taking into account the implications of not having a (publicly) known algorithm to attack. As wiki notes (see the previous link), ‘Many ciphers are actually based on publicly known algorithms or are open source and so it is only the difficulty of obtaining the key that determines security of the system’. The above remarks were of course all based on an assumption that Eve does not here have the sort of knowledge about the encryption scheme applied that she in many cases today actually might have. There are obvious and well-known weaknesses associated with having security-associated components of a specific cryptographic scheme be independent of the key, but in this particular setting the lack of a known algorithm does seem to me to cause a search-space blow-up which makes the decision problem (did we actually guess right?) intractable in many cases. A key feature of the problem considered by the inquirer is that you here – unlike in many ‘guess the password’ settings, where for example a correct password will grant you access to an application or a document or whatever – do not get any feedback, neither when you guess right nor when you guess wrong; it’s a decision problem, not a calculation problem. (It is however perhaps worth noting that in a ‘standard guess-the-password problem’ you may also sometimes implicitly face a similar decision problem, due e.g. to the potential for a combination of cryptographic security and complementary steganographic strategies like e.g. these having been applied.)

August 14, 2018 Posted by | Computer science, Cryptography, Data, rambling nonsense, Statistics

Big Data (II)

Below I have added a few observations from the last half of the book, as well as some coverage-related links to topics of interest.

“With big data, using correlation creates […] problems. If we consider a massive dataset, algorithms can be written that, when applied, return a large number of spurious correlations that are totally independent of the views, opinions, or hypotheses of any human being. Problems arise with false correlations — for example, divorce rate and margarine consumption […]. [W]hen the number of variables becomes large, the number of spurious correlations also increases. This is one of the main problems associated with trying to extract useful information from big data, because in doing so, as with mining big data, we are usually looking for patterns and correlations. […] one of the reasons Google Flu Trends failed in its predictions was because of these problems. […] The Google Flu Trends project hinged on the known result that there is a high correlation between the number of flu-related online searches and visits to the doctor’s surgery. If a lot of people in a particular area are searching for flu-related information online, it might then be possible to predict the spread of flu cases to adjoining areas. Since the interest is in finding trends, the data can be anonymized and hence no consent from individuals is required. Using their five-year accumulation of data, which they limited to the same time-frame as the CDC data, and so collected only during the flu season, Google counted the weekly occurrence of each of the fifty million most common search queries covering all subjects. These search query counts were then compared with the CDC flu data, and those with the highest correlation were used in the flu trends model. […] The historical data provided a baseline from which to assess current flu activity on the chosen search terms and by comparing the new real-time data against this, a classification on a scale from 1 to 5, where 5 signified the most severe, was established. Used in the 2011–12 and 2012–13 US flu seasons, Google’s big data algorithm famously failed to deliver. After the flu season ended, its predictions were checked against the CDC’s actual data. […] the Google Flu Trends algorithm over-predicted the number of flu cases by at least 50 per cent during the years it was used.” [For more details on why blind/mindless hypothesis testing/p-value hunting on big data sets is usually a terrible idea, see e.g. Burnham & Anderson, US]
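The spurious-correlation point in the quote above is easy to demonstrate with a small simulation (numpy assumed; all the ‘variables’ below are pure noise):

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs, n_vars = 100, 200
data = rng.normal(size=(n_obs, n_vars))     # 200 mutually independent noise variables

corr = np.corrcoef(data, rowvar=False)      # 200 x 200 correlation matrix
upper = corr[np.triu_indices(n_vars, k=1)]  # the 19,900 distinct variable pairs
print((np.abs(upper) > 0.2).sum())          # several hundred pairs exceed |r| = 0.2 by chance
```

None of those correlations reflect any real relationship, and their expected number grows roughly with the square of the number of variables considered – which is exactly why mindlessly mining ever larger collections of variables for ‘patterns’ is problematic.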

“The data Google used [in the Google Flu Trends algorithm], collected selectively from search engine queries, produced results [with] obvious bias […] for example by eliminating everyone who does not use a computer and everyone using other search engines. Another issue that may have led to poor results was that customers searching Google on ‘flu symptoms’ would probably have explored a number of flu-related websites, resulting in their being counted several times and thus inflating the numbers. In addition, search behaviour changes over time, especially during an epidemic, and this should be taken into account by updating the model regularly. Once errors in prediction start to occur, they tend to cascade, which is what happened with the Google Flu Trends predictions: one week’s errors were passed along to the next week. […] [Similarly,] the Ebola prediction figures published by WHO [during the West African Ebola virus epidemic] were over 50 per cent higher than the cases actually recorded. The problems with both the Google Flu Trends and Ebola analyses were similar in that the prediction algorithms used were based only on initial data and did not take into account changing conditions. Essentially, each of these models assumed that the number of cases would continue to grow at the same rate in the future as they had before the medical intervention began. Clearly, medical and public health measures could be expected to have positive effects and these had not been integrated into the model.”

“Every time a patient visits a doctor’s office or hospital, electronic data is routinely collected. Electronic health records constitute legal documentation of a patient’s healthcare contacts: details such as patient history, medications prescribed, and test results are recorded. Electronic health records may also include sensor data such as Magnetic Resonance Imaging (MRI) scans. The data may be anonymized and pooled for research purposes. It is estimated that in 2015, an average hospital in the USA will store over 600 Tb of data, most of which is unstructured. […] Typically, the human genome contains about 20,000 genes and mapping such a genome requires about 100 Gb of data. […] The interdisciplinary field of bioinformatics has flourished as a consequence of the need to manage and analyze the big data generated by genomics. […] Cloud-based systems give authorized users access to data anywhere in the world. To take just one example, the NHS plans to make patient records available via smartphone by 2018. These developments will inevitably generate more attacks on the data they employ, and considerable effort will need to be expended in the development of effective security methods to ensure the safety of that data. […] There is no absolute certainty on the Web. Since e-documents can be modified and updated without the author’s knowledge, they can easily be manipulated. This situation could be extremely damaging in many different situations, such as the possibility of someone tampering with electronic medical records. […] [S]ome of the problems facing big data systems [include] ensuring they actually work as intended, [that they] can be fixed when they break down, and [that they] are tamper-proof and accessible only to those with the correct authorization.”

“With transactions being made through sales and auction bids, eBay generates approximately 50 Tb of data a day, collected from every search, sale, and bid made on their website by a claimed 160 million active users in 190 countries. […] Amazon collects vast amounts of data including addresses, payment information, and details of everything an individual has ever looked at or bought from them. Amazon uses its data in order to encourage the customer to spend more money with them by trying to do as much of the customer’s market research as possible. In the case of books, for example, Amazon needs to provide not only a huge selection but to focus recommendations on the individual customer. […] Many customers use smartphones with GPS capability, allowing Amazon to collect data showing time and location. This substantial amount of data is used to construct customer profiles allowing similar individuals and their recommendations to be matched. Since 2013, Amazon has been selling customer metadata to advertisers in order to promote their Web services operation […] Netflix collects and uses huge amounts of data to improve customer service, such as offering recommendations to individual customers while endeavouring to provide reliable streaming of its movies. Recommendation is at the heart of the Netflix business model and most of its business is driven by the data-based recommendations it is able to offer customers. Netflix now tracks what you watch, what you browse, what you search for, and the day and time you do all these things. It also records whether you are using an iPad, TV, or something else. […] As well as collecting search data and star ratings, Netflix can now keep records on how often users pause or fast forward, and whether or not they finish watching each programme they start. They also monitor how, when, and where they watched the programme, and a host of other variables too numerous to mention.”

“Data science is becoming a popular study option in universities but graduates so far have been unable to meet the demands of commerce and industry, where positions in data science offer high salaries to experienced applicants. Big data for commercial enterprises is concerned with profit, and disillusionment will set in quickly if an over-burdened data analyst with insufficient experience fails to deliver the expected positive results. All too often, firms are asking for a one-size-fits-all model of data scientist who is expected to be competent in everything from statistical analysis to data storage and data security.”

“In December 2016, Yahoo! announced that a data breach involving over one billion user accounts had occurred in August 2013. Dubbed the biggest ever cyber theft of personal data, or at least the biggest ever divulged by any company, thieves apparently used forged cookies, which allowed them access to accounts without the need for passwords. This followed the disclosure of an attack on Yahoo! in 2014, when 500 million accounts were compromised. […] The list of big data security breaches increases almost daily. Data theft, data ransom, and data sabotage are major concerns in a data-centric world. There have been many scares regarding the security and ownership of personal digital data. Before the digital age we used to keep photos in albums and negatives were our backup. After that, we stored our photos electronically on a hard-drive in our computer. This could possibly fail and we were wise to have back-ups but at least the files were not publicly accessible. Many of us now store data in the Cloud. […] If you store all your photos in the Cloud, it’s highly unlikely with today’s sophisticated systems that you would lose them. On the other hand, if you want to delete something, maybe a photo or video, it becomes difficult to ensure all copies have been deleted. Essentially you have to rely on your provider to do this. Another important issue is controlling who has access to the photos and other data you have uploaded to the Cloud. […] although the Internet and Cloud-based computing are generally thought of as wireless, they are anything but; data is transmitted through fibre-optic cables laid under the oceans. Nearly all digital communication between continents is transmitted in this way. My email will be sent via transatlantic fibre-optic cables, even if I am using a Cloud computing service. The Cloud, an attractive buzz word, conjures up images of satellites sending data across the world, but in reality Cloud services are firmly rooted in a distributed network of data centres providing Internet access, largely through cables. Fibre-optic cables provide the fastest means of data transmission and so are generally preferable to satellites.”

Links:

Health care informatics.
Electronic health records.
European influenza surveillance network.
Overfitting.
Public Health Emergency of International Concern.
Virtual Physiological Human project.
Watson (computer).
Natural language processing.
Anthem medical data breach.
Electronic delay storage automatic calculator (EDSAC). LEO (computer). ICL (International Computers Limited).
E-commerce. Online shopping.
Pay-per-click advertising model. Google AdWords. Click fraud. Targeted advertising.
Recommender system. Collaborative filtering.
Anticipatory shipping.
BlackPOS Malware.
Data Encryption Standard algorithm. EFF DES cracker.
Advanced Encryption Standard.
Tempora. PRISM (surveillance program). Edward Snowden. WikiLeaks. Tor (anonymity network). Silk Road (marketplace). Deep web. Internet of Things.
Songdo International Business District. Smart City.
United Nations Global Pulse.

July 19, 2018 Posted by | Books, Computer science, Cryptography, Data, Engineering, Epidemiology, Statistics

Mathematics in Cryptography III

As she puts it herself, most of this lecture [~first 47 minutes or so] was basically “an explanation by a non-expert on how the internet uses public key” (-cryptography). The last 20 minutes cover, again in her own words, “more theoretical aspects”.

Some links:

ARPANET.
NSFNET.
Hypertext Transfer Protocol (HTTP). HTTPS.
Project Athena. Kerberos (protocol).
Pretty Good Privacy (PGP).
Secure Sockets Layer (SSL)/Transport Layer Security (TLS).
IPsec.
Wireshark.
Cipher suite.
Elliptic Curve Digital Signature Algorithm (ECDSA).
Request for Comments (RFC).
Elliptic-curve Diffie–Hellman (ECDH).
The SSL/TLS Handshake: an Overview.
Advanced Encryption Standard.
Galois/Counter Mode.
XOR gate.
Hexadecimal.
IP Header.
Time to live (TTL).
Transmission Control Protocol. TCP segment structure.
TLS record.
Security level.
Birthday problem. Birthday attack.
Handbook of Applied Cryptography (Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone). (§3.6 in particular is mentioned/referenced as this is stuff she talks about in the last ‘theoretical’ part of the lecture).


June 8, 2018 Posted by | Computer science, Cryptography, Lectures, Mathematics

Mathematics in Cryptography II

Some links to stuff covered in the lecture:

Public-key cryptography.
New Directions in Cryptography (Diffie & Hellman, 1976).
The history of Non-Secret Encryption (James Ellis).
Note on “Non-Secret Encryption” – Cliff Cocks (1973).
RSA (cryptosystem).
Discrete Logarithm Problem.
Diffie–Hellman key exchange.
AES (Advanced Encryption Standard).
Triple DES.
Trusted third party (TTP).
Key management.
Man-in-the-middle attack.
Digital signature.
Public key certificate.
Secret sharing.
Hash function. Cryptographic hash function.
Secure Hash Algorithm 2 (SHA-2).
Non-repudiation (digital security).
L-notation. L (complexity).
ElGamal signature scheme.
Digital Signature Algorithm (DSA).
Schnorr signature.
Identity-based cryptography.
Identity-Based Cryptosystems and Signature Schemes (Adi Shamir, 1984).
Algorithms for Quantum Computation: Discrete Logarithms and Factoring (Peter Shor, 1994).
Quantum resistant cryptography.
Elliptic curve. Elliptic-curve cryptography.
Projective space.

I have included very few links relating to the topics covered in the last part of the lecture. This was deliberate, and not just a result of the type of coverage included in that part of the lecture. In my opinion non-mathematicians should probably skip the last 25 minutes or so, as they’re not really worth the effort – not only because of technical issues (the lecturer is writing stuff on the blackboard and for several minutes you’re unable to see what she’s writing, which is …unfortunate), though those certainly were not helping. The first hour of the lecture is great, the last 25 minutes are, well, less great, in my opinion. You should however not miss the first part of the coverage of ECC-related stuff (in particular the coverage ~55-58 minutes in) if you’re interested in making sense of how ECC works; I certainly found that part of the coverage very helpful.

June 2, 2018 Posted by | Computer science, Cryptography, Lectures, Mathematics, Papers

Mathematics in Cryptography

Some relevant links:

Caesar cipher.
Substitution cipher.
Frequency analysis.
Vigenère cipher.
ADFGVX cipher.
One-time pad.
Arthur Scherbius.
Enigma machine.
Permutation.
Cycle notation.
Permutation group.
Cyclic permutation.
Involution (mathematics).
An Application of the Theory of Permutations in Breaking the Enigma Cipher – Marian Rejewski.

May 23, 2018 Posted by | Cryptography, Lectures, Mathematics

On the cryptographic hardness of finding a Nash equilibrium

I found it annoying that you generally can’t really hear the questions posed by the audience (which includes people like Avi Wigderson), especially considering that there are quite a few of these, particularly in the middle section of the lecture. There are also intermittent issues with the camera’s focus throughout the talk, but those are all transitory problems that should not keep you from watching the lecture. The sound issue at the beginning of the talk is resolved after 40 seconds.

One important take-away from this talk, if you choose not to watch it: “to date, there is no known efficient algorithm to find Nash equilibrium in games”. In general this paper – coauthored by the lecturer – seems from a brief skim to cover many of the topics also included in the lecture. I have added some other links to articles and topics covered/mentioned in the lecture below.

Nash’s Existence Theorem.
Reducibility Among Equilibrium Problems (Goldberg & Papadimitriou).
Three-Player Games Are Hard (Daskalakis & Papadimitriou).
3-Nash is PPAD-Complete (Chen & Deng).
PPAD (complexity).
NP-hardness.
On the (Im)possibility of Obfuscating Programs (Barak et al.).
On the Impossibility of Obfuscation with Auxiliary Input (Goldwasser & Kalai).
On Best-Possible Obfuscation (Goldwasser & Rothblum).
Functional Encryption without Obfuscation (Garg et al.).
On the Complexity of the Parity Argument and Other Inefficient Proofs of Existence (Papadimitriou).
Pseudorandom function family.
Revisiting the Cryptographic Hardness of Finding a Nash Equilibrium (Garg, Pandey & Srinivasan).
Constrained Pseudorandom Functions and Their Applications (Boneh & Waters).
Delegatable Pseudorandom Functions and Applications (Kiayias et al.).
Functional Signatures and Pseudorandom Functions (Boyle, Goldwasser & Ivan).
Universal Constructions and Robust Combiners for Indistinguishability Obfuscation and Witness Encryption (Ananth et al.).

April 18, 2018 Posted by | Computer science, Cryptography, Game theory, Lectures, Mathematics, Papers

The Computer

Below are some quotes and links related to the book’s coverage:

“At the heart of every computer is one or more hardware units known as processors. A processor controls what the computer does. For example, it will process what you type in on your computer’s keyboard, display results on its screen, fetch web pages from the Internet, and carry out calculations such as adding two numbers together. It does this by ‘executing’ a computer program that details what the computer should do […] Data and programs are stored in two storage areas. The first is known as main memory and has the property that whatever is stored there can be retrieved very quickly. Main memory is used for transient data – for example, the result of a calculation which is an intermediate result in a much bigger calculation – and is also used to store computer programs while they are being executed. Data in main memory is transient – it will disappear when the computer is switched off. Hard disk memory, also known as file storage or backing storage, contains data that are required over a period of time. Typical entities that are stored in this memory include files of numerical data, word-processed documents, and spreadsheet tables. Computer programs are also stored here while they are not being executed. […] There are a number of differences between main memory and hard disk memory. The first is the retrieval time. With main memory, an item of data can be retrieved by the processor in fractions of microseconds. With file-based memory, the retrieval time is much greater: of the order of milliseconds. The reason for this is that main memory is silicon-based […] hard disk memory is usually mechanical and is stored on the metallic surface of a disk, with a mechanical arm retrieving the data. […] main memory is more expensive than file-based memory”.

The Internet is a network of computers – strictly, it is a network that joins up a number of networks. It carries out a number of functions. First, it transfers data from one computer to another computer […] The second function of the Internet is to enforce reliability. That is, to ensure that when errors occur then some form of recovery process happens; for example, if an intermediate computer fails then the software of the Internet will discover this and resend any malfunctioning data via other computers. A major component of the Internet is the World Wide Web […] The web […] uses the data-transmission facilities of the Internet in a specific way: to store and distribute web pages. The web consists of a number of computers known as web servers and a very large number of computers known as clients (your home PC is a client). Web servers are usually computers that are more powerful than the PCs that are normally found in homes or those used as office computers. They will be maintained by some enterprise and will contain individual web pages relevant to that enterprise; for example, an online book store such as Amazon will maintain web pages for each item it sells. The program that allows users to access the web is known as a browser. […] A part of the Internet known as the Domain Name System (usually referred to as DNS) will figure out where the page is held and route the request to the web server holding the page. The web server will then send the page back to your browser which will then display it on your computer. Whenever you want another page you would normally click on a link displayed on that page and the process is repeated. Conceptually, what happens is simple. However, it hides a huge amount of detail involving the web discovering where pages are stored, the pages being located, their being sent, the browser reading the pages and interpreting how they should be displayed, and eventually the browser displaying the pages. […] without one particular hardware advance the Internet would be a shadow of itself: this is broadband. This technology has provided communication speeds that we could not have dreamed of 15 years ago. […] Typical broadband speeds range from one megabit per second to 24 megabits per second, the lower rate being about 20 times faster than dial-up rates.”
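A rough sketch of the request cycle described in the quote, using Python’s standard library (example.com stands in for any web server, and a working network connection is assumed):

```python
import socket
from urllib.request import urlopen

host = "example.com"                          # stand-in for any site you might visit
address = socket.gethostbyname(host)          # the DNS step: figure out where the page is held
print(host, "resolves to", address)

with urlopen(f"http://{host}/") as response:  # the request is routed to that web server
    page = response.read()                    # the server sends the page back
print(page[:60])                              # a browser would now interpret and display this HTML
```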

“A major idea I hope to convey […] is that regarding the computer as just the box that sits on your desk, or as a chunk of silicon that is embedded within some device such as a microwave, is only a partial view. The Internet – or rather broadband access to the Internet – has created a gigantic computer that has unlimited access to both computer power and storage to the point where even applications that we all thought would never migrate from the personal computer are doing just that. […] the Internet functions as a series of computers – or more accurately computer processors – carrying out some task […]. Conceptually, there is little difference between these computers and [a] supercomputer, the only difference is in the details: for a supercomputer the communication between processors is via some internal electronic circuit, while for a collection of computers working together on the Internet the communication is via external circuits used for that network.”

“A computer will consist of a number of electronic circuits. The most important is the processor: this carries out the instructions that are contained in a computer program. […] There are a number of individual circuit elements that make up the computer. Thousands of these elements are combined together to construct the computer processor and other circuits. One basic element is known as an And gate […]. This is an electrical circuit that has two binary inputs A and B and a single binary output X. The output will be one if both the inputs are one and zero otherwise. […] the And gate is only one example – when some action is required, for example adding two numbers together, [the different circuits] interact with each other to carry out that action. In the case of addition, the two binary numbers are processed bit by bit to carry out the addition. […] Whatever actions are taken by a program […] the cycle is the same; an instruction is read into the processor, the processor decodes the instruction, acts on it, and then brings in the next instruction. So, at the heart of a computer is a series of circuits and storage elements that fetch and execute instructions and store data and programs.”
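As a small illustration of how such gates combine (a software stand-in, not the book’s circuit): an And gate as described, plus a full adder built from a few gates, chained to add two binary numbers bit by bit.

```python
def AND(a, b): return a & b
def OR(a, b):  return a | b
def XOR(a, b): return a ^ b

def full_adder(a, b, carry_in):
    """Add two bits plus an incoming carry; return (sum_bit, carry_out)."""
    s1 = XOR(a, b)
    return XOR(s1, carry_in), OR(AND(a, b), AND(s1, carry_in))

def add_binary(x_bits, y_bits):
    """Add two equal-length bit lists, least significant bit first."""
    carry, result = 0, []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    return result + [carry]

# 6 + 3: 6 is [0, 1, 1] and 3 is [1, 1, 0] with the least significant bit first
print(add_binary([0, 1, 1], [1, 1, 0]))   # [1, 0, 0, 1], i.e. binary 1001 = 9
```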

“In essence, a hard disk unit consists of one or more circular metallic disks which can be magnetized. Each disk has a very large number of magnetizable areas which can either represent zero or one depending on the magnetization. The disks are rotated at speed. The unit also contains an arm or a number of arms that can move laterally and which can sense the magnetic patterns on the disk. […] When a processor requires some data that is stored on a hard disk […] then it issues an instruction to find the file. The operating system – the software that controls the computer – will know where the file starts and ends and will send a message to the hard disk to read the data. The arm will move laterally until it is over the start position of the file and when the revolving disk passes under the arm the magnetic pattern that represents the data held in the file is read by it. Accessing data on a hard disk is a mechanical process and usually takes a small number of milliseconds to carry out. Compared with the electronic speeds of the computer itself – normally measured in fractions of a microsecond – this is incredibly slow. Because disk access is slow, systems designers try to minimize the amount of access required to files. One technique that has been particularly effective is known as caching. It is, for example, used in web servers. Such servers store pages that are sent to browsers for display. […] Caching involves placing the frequently accessed pages in some fast storage medium such as flash memory and keeping the remainder on a hard disk.”
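A minimal sketch of the caching idea, with a made-up page-fetching function standing in for the slow disk read; real web-server caches are of course far more elaborate.

```python
from collections import OrderedDict

class PageCache:
    def __init__(self, capacity, fetch_from_disk):
        self.capacity = capacity
        self.fetch_from_disk = fetch_from_disk   # the slow fallback, e.g. a disk read
        self.store = OrderedDict()               # fast storage; oldest entries first

    def get(self, page_name):
        if page_name in self.store:              # cache hit: served from fast memory
            self.store.move_to_end(page_name)    # mark as recently used
            return self.store[page_name]
        page = self.fetch_from_disk(page_name)   # cache miss: slow disk access
        self.store[page_name] = page
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)       # evict the least recently used page
        return page

cache = PageCache(capacity=2, fetch_from_disk=lambda name: f"<html>{name}</html>")
cache.get("index"); cache.get("about"); cache.get("index")   # 'index' stays frequently used
cache.get("contact")                                         # evicts 'about', not 'index'
print(list(cache.store))                                     # ['index', 'contact']
```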

“The first computers had a single hardware processor that executed individual instructions. It was not too long before researchers started thinking about computers that had more than one processor. The simple theory here was that if a computer had n processors then it would be n times faster. […] it is worth debunking this notion. If you look at many classes of problems […], you see that a strictly linear increase in performance is not achieved. If a problem that is solved by a single computer is solved in 20 minutes, then you will find a dual processor computer solving it in perhaps 11 minutes. A 3-processor computer may solve it in 9 minutes, and a 4-processor computer in 8 minutes. There is a law of diminishing returns; often, there comes a point when adding a processor slows down the computation. What happens is that each processor needs to communicate with the others, for example passing on the result of a computation; this communicational overhead becomes bigger and bigger as you add processors to the point when it dominates the amount of useful work that is done. The sort of problems where they are effective is where a problem can be split up into sub-problems that can be solved almost independently by each processor with little communication.”
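The diminishing-returns point can be sketched with a crude model in which total run time is the divided-up work plus a communication overhead that grows with the number of processors; the constants below are invented purely to reproduce the shape of the book’s example.

```python
def run_time(n_processors, work=20.0, overhead_per_extra_processor=0.5):
    compute = work / n_processors                                       # the work is shared out
    communication = overhead_per_extra_processor * (n_processors - 1)   # coordination cost
    return compute + communication

for n in range(1, 11):
    print(n, round(run_time(n), 1))
# Run time falls quickly at first, flattens out, and eventually starts rising again
# once the communication overhead dominates the useful work being done.
```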

Symmetric encryption methods are very efficient and can be used to scramble large files or long messages being sent from one computer to another. Unfortunately, symmetric techniques suffer from a major problem: if there are a number of individuals involved in a data transfer or in reading a file, each has to know the same key. This makes it a security nightmare. […] public key cryptography removed a major problem associated with symmetric cryptography: that of a large number of keys in existence some of which may be stored in an insecure way. However, a major problem with asymmetric cryptography is the fact that it is very inefficient (about 10,000 times slower than symmetric cryptography): while it can be used for short messages such as email texts, it is far too inefficient for sending gigabytes of data. However, […] when it is combined with symmetric cryptography, asymmetric cryptography provides very strong security. […] One very popular security scheme is known as the Secure Sockets Layer – normally shortened to SSL. It is based on the concept of a one-time pad. […] SSL uses public key cryptography to communicate the randomly generated key between the sender and receiver of a message. This key is only used once for the data interchange that occurs and, hence, is an electronic analogue of a one-time pad. When each of the parties to the interchange has received the key, they encrypt and decrypt the data employing symmetric cryptography, with the generated key carrying out these processes. […] There is an impression amongst the public that the main threats to security and to privacy arise from technological attack. However, the threat from more mundane sources is equally high. Data thefts, damage to software and hardware, and unauthorized access to computer systems can occur in a variety of non-technical ways: by someone finding computer printouts in a waste bin; by a window cleaner using a mobile phone camera to take a picture of a display containing sensitive information; by an office cleaner stealing documents from a desk; by a visitor to a company noting down a password written on a white board; by a disgruntled employee putting a hammer through the main server and the backup server of a company; or by someone dropping an unencrypted memory stick in the street.”
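A toy sketch of that hybrid arrangement – textbook RSA on tiny numbers standing in for the public-key step, and a simple XOR stream standing in for the symmetric cipher; both are insecure stand-ins, used only to show the division of labour the quote describes.

```python
import secrets

def xor_stream(data: bytes, key: bytes) -> bytes:
    """Insecure stand-in for a fast symmetric cipher."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# 1. One side generates a fresh random session key (the 'one-time' element) ...
session_key = secrets.token_bytes(16)

# 2. ... and sends it encrypted under the other side's public key (toy RSA, tiny numbers).
n, e, d = 3233, 17, 2753                       # public (n, e); private exponent d
encrypted_key = [pow(byte, e, n) for byte in session_key]

# 3. The receiver recovers the session key with the private key ...
recovered_key = bytes(pow(c, d, n) for c in encrypted_key)
assert recovered_key == session_key

# 4. ... and the bulk data is then encrypted and decrypted symmetrically with that key.
ciphertext = xor_stream(b"a large file would go here", session_key)
print(xor_stream(ciphertext, recovered_key))   # the original bytes back
```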

“The basic architecture of the computer has remained unchanged for six decades since IBM developed the first mainframe computers. It consists of a processor that reads software instructions one by one and executes them. Each instruction will result in data being processed, for example by being added together; and data being stored in the main memory of the computer or being stored on some file-storage medium; or being sent to the Internet or to another computer. This is what is known as the von Neumann architecture; it was named after John von Neumann […]. His key idea, which still holds sway today, is that in a computer the data and the program are both stored in the computer’s memory in the same address space. There have been few challenges to the von Neumann architecture.”
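A minimal sketch of that arrangement: an invented instruction set, program and data sharing one and the same memory, and a loop that fetches, decodes and executes one instruction at a time.

```python
# Program (addresses 0-3) and data (addresses 8-10) live in the same memory.
memory = [
    ("LOAD", 8),     # 0: load the value stored at address 8 into the accumulator
    ("ADD", 9),      # 1: add the value stored at address 9
    ("STORE", 10),   # 2: store the result at address 10
    ("HALT", None),  # 3: stop
    None, None, None, None,
    6, 3, 0,         # 8, 9, 10: the data
]

accumulator, pc = 0, 0
while True:
    opcode, operand = memory[pc]               # fetch the instruction the counter points at
    pc += 1
    if opcode == "LOAD":
        accumulator = memory[operand]          # decode and act on it ...
    elif opcode == "ADD":
        accumulator += memory[operand]
    elif opcode == "STORE":
        memory[operand] = accumulator
    elif opcode == "HALT":
        break                                  # ... then fetch the next one, until halted

print(memory[10])   # 9
```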

[A] ‘neural network‘ […] consists of an input layer that can sense various signals from some environment […]. In the middle (hidden layer), there are a large number of processing elements (neurones) which are arranged into sub-layers. Finally, there is an output layer which provides a result […]. It is in the middle layer that the work is done in a neural computer. What happens is that the network is trained by giving it examples of the trend or item that is to be recognized. What the training does is to strengthen or weaken the connections between the processing elements in the middle layer until, when combined, they produce a strong signal when a new case is presented to them that matches the previously trained examples and a weak signal when an item that does not match the examples is encountered. Neural networks have been implemented in hardware, but most of the implementations have been via software where the middle layer has been implemented in chunks of code that carry out the learning process. […] although the initial impetus was to use ideas in neurobiology to develop neural architectures based on a consideration of processes in the brain, there is little resemblance between the internal data and software now used in commercial implementations and the human brain.”
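A minimal sketch of the training idea described in the quote – a tiny network with an input layer, one hidden (‘middle’) layer and an output layer, trained by gradient descent on the XOR pattern (numpy assumed; the architecture and numbers are chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # signals at the input layer
y = np.array([[0], [1], [1], [0]], dtype=float)              # desired output (XOR pattern)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)   # connections into the hidden ('middle') layer
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)   # connections into the output layer

for _ in range(10000):                 # training: examples are shown repeatedly and the
    h = sigmoid(X @ W1 + b1)           # connection strengths are nudged up or down
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)          # gradient of the squared error at the output
    d_h = (d_out @ W2.T) * h * (1 - h)           # error propagated back to the hidden layer
    W2 -= h.T @ d_out;  b2 -= d_out.sum(axis=0)
    W1 -= X.T @ d_h;    b1 -= d_h.sum(axis=0)

print(out.round(2).ravel())   # approaches [0, 1, 1, 0]: strong signal for the trained pattern
```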

Links:

Computer.
Byte. Bit.
Moore’s law.
Computer program.
Programming language. High-level programming language. Low-level programming language.
Zombie (computer science).
Therac-25.
Cloud computing.
Instructions per second.
ASCII.
Fetch-execute cycle.
Grace Hopper. Software Bug.
Transistor. Integrated circuit. Very-large-scale integration. Wafer (electronics). Photomask.
Read-only memory (ROM). Read-write memory (RWM). Bus (computing). Address bus. Programmable read-only memory (PROM). Erasable programmable read-only memory (EPROM). Electrically erasable programmable read-only memory (EEPROM). Flash memory. Dynamic random-access memory (DRAM). Static random-access memory (static RAM/SRAM).
Hard disc.
Miniaturization.
Wireless communication.
Radio-frequency identification (RFID).
Metadata.
NP-hardness. Set partition problem. Bin packing problem.
Routing.
Cray X-MP. Beowulf cluster.
Vector processor.
Folding@home.
Denial-of-service attack. Melissa (computer virus). Malware. Firewall (computing). Logic bomb. Fork bomb/rabbit virus. Cryptography. Caesar cipher. Social engineering (information security).
Application programming interface.
Data mining. Machine translation. Machine learning.
Functional programming.
Quantum computing.

March 19, 2018 Posted by | Books, Computer science, Cryptography, Engineering

Interactive Coding with “Optimal” Round and Communication Blowup

The youtube description of this one was rather longer than usual, and I decided to quote it in full below:

“The problem of constructing error-resilient interactive protocols was introduced in the seminal works of Schulman (FOCS 1992, STOC 1993). These works show how to convert any two-party interactive protocol into one that is resilient to constant-fraction of error, while blowing up the communication by only a constant factor. Since these seminal works, there have been many follow-up works which improve the error rate, the communication rate, and the computational efficiency. All these works assume that in the underlying protocol, in each round, each party sends a *single* bit. This assumption is without loss of generality, since one can efficiently convert any protocol into one which sends one bit per round. However, this conversion may cause a substantial increase in *round* complexity, which is what we wish to minimize in this work. Moreover, all previous works assume that the communication complexity of the underlying protocol is *fixed* and a priori known, an assumption that we wish to remove. In this work, we consider protocols whose messages may be of *arbitrary* lengths, and where the length of each message and the length of the protocol may be *adaptive*, and may depend on the private inputs of the parties and on previous communication. We show how to efficiently convert any such protocol into another protocol with comparable efficiency guarantees, that is resilient to constant fraction of adversarial error, while blowing up both the *communication* complexity and the *round* complexity by at most a constant factor. Moreover, as opposed to most previous work, our error model not only allows the adversary to toggle with the corrupted bits, but also allows the adversary to *insert* and *delete* bits. In addition, our transformation preserves the computational efficiency of the protocol. Finally, we try to minimize the blowup parameters, and give evidence that our parameters are nearly optimal. This is joint work with Klim Efremenko and Elad Haramaty.”

A few links to stuff covered/mentioned in the lecture:

Coding for interactive communication correcting insertions and deletions.
Efficiently decodable insertion/deletion codes for high-noise and high-rate regimes.
Common reference string model.
Small-bias probability spaces: Efficient constructions and applications.
Interactive Channel Capacity Revisited.
Collision (computer science).
Chernoff bound.

September 6, 2017 Posted by | Computer science, Cryptography, Lectures, Mathematics

Random Stuff

i. Some new words I’ve encountered (not all of them are from vocabulary.com, but many of them are):

Uxoricide, persnickety, logy, philoprogenitive, impassive, hagiography, gunwale, flounce, vivify, pelage, irredentism, pertinacity, callipygous, valetudinarian, recrudesce, adjuration, epistolary, dandle, picaresque, humdinger, newel, lightsome, lunette, inflect, misoneism, cormorant, immanence, parvenu, sconce, acquisitiveness, lingual, Macaronic, divot, mettlesome, logomachy, raffish, marginalia, omnifarious, tatter, licit.

ii. A lecture:

I got annoyed a few times by the fact that you can’t tell where he’s pointing when he’s talking about the slides, which makes the lecture harder to follow than it ought to be, but it’s still an interesting lecture.

iii. Facts about Dihydrogen Monoxide. Includes coverage of important neglected topics such as ‘What is the link between Dihydrogen Monoxide and school violence?’ After reading the article, I am frankly outraged that this stuff’s still legal!

iv. Some wikipedia links of interest:

Steganography.

Steganography […] is the practice of concealing a file, message, image, or video within another file, message, image, or video. The word steganography combines the Greek words steganos (στεγανός), meaning “covered, concealed, or protected”, and graphein (γράφειν) meaning “writing”. […] Generally, the hidden messages appear to be (or be part of) something else: images, articles, shopping lists, or some other cover text. For example, the hidden message may be in invisible ink between the visible lines of a private letter. Some implementations of steganography that lack a shared secret are forms of security through obscurity, whereas key-dependent steganographic schemes adhere to Kerckhoffs’s principle.[1]

The advantage of steganography over cryptography alone is that the intended secret message does not attract attention to itself as an object of scrutiny. Plainly visible encrypted messages—no matter how unbreakable—arouse interest, and may in themselves be incriminating in countries where encryption is illegal.[2] Thus, whereas cryptography is the practice of protecting the contents of a message alone, steganography is concerned with concealing the fact that a secret message is being sent, as well as concealing the contents of the message.”
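A toy sketch of the least-significant-bit approach used by many simple image-steganography tools: the message bits are written into the lowest bit of each cover value, which barely changes the cover data. The ‘pixel’ values below are just made-up integers standing in for image data.

```python
def hide(cover_pixels, message):
    """Write the message bits into the least significant bit of each cover value."""
    bits = [int(b) for byte in message.encode() for b in format(byte, "08b")]
    assert len(bits) <= len(cover_pixels), "cover too small for the message"
    stego = [(p & ~1) | bit for p, bit in zip(cover_pixels, bits)]
    return stego + list(cover_pixels[len(bits):])

def reveal(stego_pixels, n_chars):
    """Read the hidden bits back out and reassemble the message."""
    bits = [p & 1 for p in stego_pixels[:n_chars * 8]]
    return bytes(int("".join(map(str, bits[i:i + 8])), 2)
                 for i in range(0, len(bits), 8)).decode()

cover = list(range(50, 130))                           # 80 made-up brightness values
stego = hide(cover, "hi")
print(reveal(stego, 2))                                # -> 'hi'
print(max(abs(a - b) for a, b in zip(cover, stego)))   # each cover value changes by at most 1
```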

H. H. Holmes. A really nice guy.

Herman Webster Mudgett (May 16, 1861 – May 7, 1896), better known under the name of Dr. Henry Howard Holmes or more commonly just H. H. Holmes, was one of the first documented serial killers in the modern sense of the term.[1][2] In Chicago, at the time of the 1893 World’s Columbian Exposition, Holmes opened a hotel which he had designed and built for himself specifically with murder in mind, and which was the location of many of his murders. While he confessed to 27 murders, of which nine were confirmed, his actual body count could be up to 200.[3] He brought an unknown number of his victims to his World’s Fair Hotel, located about 3 miles (4.8 km) west of the fair, which was held in Jackson Park. Besides being a serial killer, H. H. Holmes was also a successful con artist and a bigamist. […]

Holmes purchased an empty lot across from the drugstore where he built his three-story, block-long hotel building. Because of its enormous structure, local people dubbed it “The Castle”. The building was 162 feet long and 50 feet wide. […] The ground floor of the Castle contained Holmes’ own relocated drugstore and various shops, while the upper two floors contained his personal office and a labyrinth of rooms with doorways opening to brick walls, oddly-angled hallways, stairways leading to nowhere, doors that could only be opened from the outside and a host of other strange and deceptive constructions. Holmes was constantly firing and hiring different workers during the construction of the Castle, claiming that “they were doing incompetent work.” His actual reason was to ensure that he was the only one who fully understood the design of the building.[3]

Minnesota Starvation Experiment.

“The Minnesota Starvation Experiment […] was a clinical study performed at the University of Minnesota between November 19, 1944 and December 20, 1945. The investigation was designed to determine the physiological and psychological effects of severe and prolonged dietary restriction and the effectiveness of dietary rehabilitation strategies.

The motivation of the study was twofold: First, to produce a definitive treatise on the subject of human starvation based on a laboratory simulation of severe famine and, second, to use the scientific results produced to guide the Allied relief assistance to famine victims in Europe and Asia at the end of World War II. It was recognized early in 1944 that millions of people were in grave danger of mass famine as a result of the conflict, and information was needed regarding the effects of semi-starvation—and the impact of various rehabilitation strategies—if postwar relief efforts were to be effective.”

“most of the subjects experienced periods of severe emotional distress and depression.[1]:161 There were extreme reactions to the psychological effects during the experiment including self-mutilation (one subject amputated three fingers of his hand with an axe, though the subject was unsure if he had done so intentionally or accidentally).[5] Participants exhibited a preoccupation with food, both during the starvation period and the rehabilitation phase. Sexual interest was drastically reduced, and the volunteers showed signs of social withdrawal and isolation.[1]:123–124 […] One of the crucial observations of the Minnesota Starvation Experiment […] is that the physical effects of the induced semi-starvation during the study closely approximate the conditions experienced by people with a range of eating disorders such as anorexia nervosa and bulimia nervosa.”

Post-vasectomy pain syndrome. Vasectomy reversal is a risk people probably know about, but this one seems to also be worth being aware of if one is considering having a vasectomy.

Transport in the Soviet Union (‘good article’). A few observations from the article:

“By the mid-1970s, only eight percent of the Soviet population owned a car. […]  From 1924 to 1971 the USSR produced 1 million vehicles […] By 1975 only 8 percent of rural households owned a car. […] Growth of motor vehicles had increased by 224 percent in the 1980s, while hardcore surfaced roads only increased by 64 percent. […] By the 1980s Soviet railways had become the most intensively used in the world. Most Soviet citizens did not own private transport, and if they did, it was difficult to drive long distances due to the poor conditions of many roads. […] Road transport played a minor role in the Soviet economy, compared to domestic rail transport or First World road transport. According to historian Martin Crouch, road traffic of goods and passengers combined was only 14 percent of the volume of rail transport. It was only late in its existence that the Soviet authorities put emphasis on road construction and maintenance […] Road transport as a whole lagged far behind that of rail transport; the average distance moved by motor transport in 1982 was 16.4 kilometres (10.2 mi), while the average for railway transport was 930 km per ton and 435 km per ton for water freight. In 1982 there was a threefold increase in investment since 1960 in motor freight transport, and more than a thirtyfold increase since 1940.”

March 3, 2016 Posted by | Biology, Cryptography, Engineering, History, Language, Lectures, Ophthalmology, Random stuff, Wikipedia, Zoology | Leave a comment

Wikipedia articles of interest

i. Trade and use of saffron.

Saffron has been a key seasoning, fragrance, dye, and medicine for over three millennia.[1] One of the world’s most expensive spices by weight,[2] saffron consists of stigmas plucked from the vegetatively propagated and sterile Crocus sativus, known popularly as the saffron crocus. The resulting dried “threads”[N 1] are distinguished by their bitter taste, hay-like fragrance, and slight metallic notes. The saffron crocus is unknown in the wild; its most likely precursor, Crocus cartwrightianus, originated in Crete or Central Asia;[3] The saffron crocus is native to Southwest Asia and was first cultivated in what is now Greece.[4][5][6]

From antiquity to modern times the history of saffron is full of applications in food, drink, and traditional herbal medicine: from Africa and Asia to Europe and the Americas the brilliant red threads were—and are—prized in baking, curries, and liquor. It coloured textiles and other items and often helped confer the social standing of political elites and religious adepts. Ancient peoples believed saffron could be used to treat stomach upsets, bubonic plague, and smallpox.

Saffron crocus cultivation has long centred on a broad belt of Eurasia bounded by the Mediterranean Sea in the southwest to India and China in the northeast. The major producers of antiquity—Iran, Spain, India, and Greece—continue to dominate the world trade. […] Iran has accounted for around 90–93 percent of recent annual world production and thereby dominates the export market on a by-quantity basis. […]

The high cost of saffron is due to the difficulty of manually extracting large numbers of minute stigmas, which are the only part of the crocus with the desired aroma and flavour. An exorbitant number of flowers need to be processed in order to yield marketable amounts of saffron. Obtaining 1 lb (0.45 kg) of dry saffron requires the harvesting of some 50,000 flowers, the equivalent of an association football pitch’s area of cultivation, or roughly 7,140 m2 (0.714 ha).[14] By another estimate some 75,000 flowers are needed to produce one pound of dry saffron. […] Another complication arises in the flowers’ simultaneous and transient blooming. […] Bulk quantities of lower-grade saffron can reach upwards of US$500 per pound; retail costs for small amounts may exceed ten times that rate. In Western countries the average retail price is approximately US$1,000 per pound.[5] Prices vary widely elsewhere, but on average tend to be lower. The high price is somewhat offset by the small quantities needed in kitchens: a few grams at most in medicinal use and a few strands, at most, in culinary applications; there are between 70,000 and 200,000 strands in a pound.”

ii. Scramble for Africa.

“The “Scramble for Africa” (also the Partition of Africa and the Conquest of Africa) was the invasion and occupation, colonization and annexation of African territory by European powers during the period of New Imperialism, between 1881 and 1914. In 1870, 10 percent of Africa was under European control; by 1914 it was 90 percent of the continent, with only Abyssinia (Ethiopia) and Liberia still independent.”

Here’s a really neat illustration from the article:

[Map from the article: the Scramble for Africa, 1880–1913]

“Germany became the third largest colonial power in Africa. Nearly all of its overall empire of 2.6 million square kilometres and 14 million colonial subjects in 1914 was found in its African possessions of Southwest Africa, Togoland, the Cameroons, and Tanganyika. Following the 1904 Entente cordiale between France and the British Empire, Germany tried to isolate France in 1905 with the First Moroccan Crisis. This led to the 1905 Algeciras Conference, in which France’s influence on Morocco was compensated by the exchange of other territories, and then to the Agadir Crisis in 1911. Along with the 1898 Fashoda Incident between France and Britain, this succession of international crises reveals the bitterness of the struggle between the various imperialist nations, which ultimately led to World War I. […]

David Livingstone‘s explorations, carried on by Henry Morton Stanley, excited imaginations. But at first, Stanley’s grandiose ideas for colonisation found little support owing to the problems and scale of action required, except from Léopold II of Belgium, who in 1876 had organised the International African Association (the Congo Society). From 1869 to 1874, Stanley was secretly sent by Léopold II to the Congo region, where he made treaties with several African chiefs along the Congo River and by 1882 had sufficient territory to form the basis of the Congo Free State. Léopold II personally owned the colony from 1885 and used it as a source of ivory and rubber.

While Stanley was exploring Congo on behalf of Léopold II of Belgium, the Franco-Italian marine officer Pierre de Brazza travelled into the western Congo basin and raised the French flag over the newly founded Brazzaville in 1881, thus occupying today’s Republic of the Congo. Portugal, which also claimed the area due to old treaties with the native Kongo Empire, made a treaty with Britain on 26 February 1884 to block off the Congo Society’s access to the Atlantic.

By 1890 the Congo Free State had consolidated its control of its territory between Leopoldville and Stanleyville, and was looking to push south down the Lualaba River from Stanleyville. At the same time, the British South Africa Company of Cecil Rhodes was expanding north from the Limpopo River, sending the Pioneer Column (guided by Frederick Selous) through Matabeleland, and starting a colony in Mashonaland.

To the West, in the land where their expansions would meet, was Katanga, site of the Yeke Kingdom of Msiri. Msiri was the most militarily powerful ruler in the area, and traded large quantities of copper, ivory and slaves — and rumours of gold reached European ears. The scramble for Katanga was a prime example of the period. Rhodes and the BSAC sent two expeditions to Msiri in 1890 led by Alfred Sharpe, who was rebuffed, and Joseph Thomson, who failed to reach Katanga. Leopold sent four CFS expeditions. First, the Le Marinel Expedition could only extract a vaguely worded letter. The Delcommune Expedition was rebuffed. The well-armed Stairs Expedition was given orders to take Katanga with or without Msiri’s consent. Msiri refused, was shot, and the expedition cut off his head and stuck it on a pole as a “barbaric lesson” to the people. The Bia Expedition finished the job of establishing an administration of sorts and a “police presence” in Katanga.

Thus, the half million square kilometres of Katanga came into Leopold’s possession and brought his African realm up to 2,300,000 square kilometres (890,000 sq mi), about 75 times larger than Belgium. The Congo Free State imposed such a terror regime on the colonised people, including mass killings and forced labour, that Belgium, under pressure from the Congo Reform Association, ended Leopold II’s rule and annexed it in 1908 as a colony of Belgium, known as the Belgian Congo. […]

“Britain’s administration of Egypt and the Cape Colony contributed to a preoccupation over securing the source of the Nile River. Egypt was overrun by British forces in 1882 (although not formally declared a protectorate until 1914, and never an actual colony); Sudan, Nigeria, Kenya and Uganda were subjugated in the 1890s and early 20th century; and in the south, the Cape Colony (first acquired in 1795) provided a base for the subjugation of neighbouring African states and the Dutch Afrikaner settlers who had left the Cape to avoid the British and then founded their own republics. In 1877, Theophilus Shepstone annexed the South African Republic (or Transvaal – independent from 1857 to 1877) for the British Empire. In 1879, after the Anglo-Zulu War, Britain consolidated its control of most of the territories of South Africa. The Boers protested, and in December 1880 they revolted, leading to the First Boer War (1880–81). British Prime Minister William Gladstone signed a peace treaty on 23 March 1881, giving self-government to the Boers in the Transvaal. […] The Second Boer War, fought between 1899 and 1902, was about control of the gold and diamond industries; the independent Boer republics of the Orange Free State and the South African Republic (or Transvaal) were this time defeated and absorbed into the British Empire.”

There are a lot of unsourced claims in the article and some parts of it actually aren’t very good, but this is a topic about which I did not know much (I had no idea most of colonial Africa was acquired by the European powers as late as was actually the case). This is another good map from the article to have a look at if you just want the big picture.

iii. Cursed soldiers.

“The cursed soldiers (that is, “accursed soldiers” or “damned soldiers”; Polish: Żołnierze wyklęci) is a name applied to a variety of Polish resistance movements formed in the later stages of World War II and afterwards. Created by some members of the Polish Secret State, these clandestine organizations continued their armed struggle against the Stalinist government of Poland well into the 1950s. The guerrilla warfare included an array of military attacks launched against the new communist prisons as well as MBP state security offices, detention facilities for political prisoners, and concentration camps set up across the country. Most of the Polish anti-communist groups ceased to exist in the late 1940s or 1950s, hunted down by MBP security services and NKVD assassination squads.[1] However, the last known ‘cursed soldier’, Józef Franczak, was killed in an ambush as late as 1963, almost 20 years after the Soviet take-over of Poland.[2][3] […] Similar eastern European anti-communists fought on in other countries. […]

Armia Krajowa (or simply AK)-the main Polish resistance movement in World War II-had officially disbanded on 19 January 1945 to prevent a slide into armed conflict with the Red Army, including an increasing threat of civil war over Poland’s sovereignty. However, many units decided to continue on with their struggle under new circumstances, seeing the Soviet forces as new occupiers. Meanwhile, Soviet partisans in Poland had already been ordered by Moscow on June 22, 1943 to engage Polish Leśni partisans in combat.[6] They commonly fought Poles more often than they did the Germans.[4] The main forces of the Red Army (Northern Group of Forces) and the NKVD had begun conducting operations against AK partisans already during and directly after the Polish Operation Tempest, designed by the Poles as a preventive action to assure Polish rather than Soviet control of the cities after the German withdrawal.[5] Soviet premier Joseph Stalin aimed to ensure that an independent Poland would never reemerge in the postwar period.[7] […]

The first Polish communist government, the Polish Committee of National Liberation, was formed in July 1944, but declined jurisdiction over AK soldiers. Consequently, for more than a year, it was Soviet agencies like the NKVD that dealt with the AK. By the end of the war, approximately 60,000 soldiers of the AK had been arrested, and 50,000 of them were deported to the Soviet Union’s gulags and prisons. Most of those soldiers had been captured by the Soviets during or in the aftermath of Operation Tempest, when many AK units tried to cooperate with the Soviets in a nationwide uprising against the Germans. Other veterans were arrested when they decided to approach the government after being promised amnesty. In 1947, an amnesty was passed for most of the partisans; the Communist authorities expected around 12,000 people to give up their arms, but the actual number of people to come out of the forests eventually reached 53,000. Many of them were arrested despite promises of freedom; after repeated broken promises during the first few years of communist control, AK soldiers stopped trusting the government.[5] […]

The persecution of the AK members was only a part of the reign of Stalinist terror in postwar Poland. In the period of 1944–56, approximately 300,000 Polish people had been arrested,[21] or up to two million, by different accounts.[5] There were 6,000 death sentences issued, the majority of them carried out.[21] Possibly, over 20,000 people died in communist prisons including those executed “in the majesty of the law” such as Witold Pilecki, a hero of Auschwitz.[5] A further six million Polish citizens (i.e., one out of every three adult Poles) were classified as suspected members of a ‘reactionary or criminal element’ and subjected to investigation by state agencies.”

iv. Affective neuroscience.

Affective neuroscience is the study of the neural mechanisms of emotion. This interdisciplinary field combines neuroscience with the psychological study of personality, emotion, and mood.[1]

This article is actually related to the Delusion and self-deception book, which covered some of the stuff included in this article, but I decided I might as well include the link in this post. I think some parts of the article are written in a somewhat different manner than most wiki articles – there are specific paragraphs briefly covering the results of specific meta-analyses conducted in this field. I can’t really tell from this article if I actually like this way of writing a wiki article or not.

v. Hamming distance. Not a long article, but this is a useful concept to be familiar with:

“In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In another way, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other. […]

The Hamming distance is named after Richard Hamming, who introduced it in his fundamental paper on Hamming codes Error detecting and error correcting codes in 1950.[1] It is used in telecommunication to count the number of flipped bits in a fixed-length binary word as an estimate of error, and therefore is sometimes called the signal distance. Hamming weight analysis of bits is used in several disciplines including information theory, coding theory, and cryptography. However, for comparing strings of different lengths, or strings where not just substitutions but also insertions or deletions have to be expected, a more sophisticated metric like the Levenshtein distance is more appropriate.”
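
The definition translates almost directly into code; a toy implementation (not tied to any particular library), run on a few standard examples, might look like this:

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal-length strings")
    return sum(x != y for x, y in zip(a, b))

print(hamming_distance("karolin", "kathrin"))   # 3
print(hamming_distance("1011101", "1001001"))   # 2
print(hamming_distance("toned", "roses"))       # 3
```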

vi. Menstrual synchrony. I came across that one recently in a book, and it was obvious that the author had not read this article and lacked some of the knowledge included in it: the phenomenon was assumed to be real in the coverage, and theory was developed on that assumption, which would not make sense if the phenomenon were not real. I figured that if that author didn’t know this stuff, a lot of other people – including people reading along here – probably don’t either, so I should cover the topic somewhere, and this is an obvious place to do so. Okay, on to the article coverage:

Menstrual synchrony, also called the McClintock effect,[2] is the alleged process whereby women who begin living together in close proximity experience their menstrual cycle onsets (i.e., the onset of menstruation or menses) becoming closer together in time than previously. “For example, the distribution of onsets of seven female lifeguards was scattered at the beginning of the summer, but after 3 months spent together, the onset of all seven cycles fell within a 4-day period.”[3]

Martha McClintock’s 1971 paper, published in Nature, says that menstrual cycle synchronization happens when the menstrual cycle onsets of two or more women become closer together in time than they were several months earlier.[3] Several mechanisms have been hypothesized to cause synchronization.[4]

After the initial studies, several papers were published reporting methodological flaws in studies reporting menstrual synchrony including McClintock’s study. In addition, other studies were published that failed to find synchrony. The proposed mechanisms have also received scientific criticism. A 2013 review of menstrual synchrony concluded that menstrual synchrony is doubtful.[4] […] in a recent systematic review of menstrual synchrony, Harris and Vitzthum concluded that “In light of the lack of empirical evidence for MS [menstrual synchrony] sensu stricto, it seems there should be more widespread doubt than acceptance of this hypothesis.” […]

The experience of synchrony may be the result of the mathematical fact that menstrual cycles of different frequencies repeatedly converge and diverge over time and not due to a process of synchronization.[12] It may also be due to the high probability of menstruation overlap that occurs by chance.[6]
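
The 'converge and diverge' point is just arithmetic, and a tiny simulation makes it concrete. The numbers below are pure assumptions (two perfectly regular women with 28- and 30-day cycles who happen to start on the same day): the gap between their nearest onsets drifts apart and then closes again, with no synchronizing mechanism anywhere in sight.

```python
# Two regular cycles with different periods drift apart and come back together
# purely as a matter of arithmetic; the periods and start days are assumptions.
period_a, period_b = 28, 30
onsets_a = [k * period_a for k in range(14)]   # roughly a year of onsets
onsets_b = [k * period_b for k in range(13)]

for day in onsets_a:
    gap = min(abs(day - other) for other in onsets_b)
    print(f"day {day:3d}: nearest onset of the other cycle is {gap:2d} days away")
```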

 

December 4, 2014 Posted by | Biology, Botany, Computer science, Cryptography, Geography, History, Medicine, Neurology, Psychology, Wikipedia | Leave a comment

Wikipedia articles of interest

Some of these, though I don’t remember precisely which, are from the wikipedia list of unusual articles I recently linked to:

i. S. A. Andrée’s Arctic Balloon Expedition of 1897 (featured).

[Photograph from the article: the wrecked Eagle on the pack ice]

S. A. Andrée’s Arctic balloon expedition of 1897 was an ill-fated effort to reach the North Pole in which all three expedition members perished. S. A. Andrée (1854–97),[1] the first Swedish balloonist, proposed a voyage by hydrogen balloon from Svalbard to either Russia or Canada, which was to pass, with luck, straight over the North Pole on the way. The scheme was received with patriotic enthusiasm in Sweden, a northern nation that had fallen behind in the race for the North Pole.

Andrée neglected many early signs of the dangers associated with his balloon plan. Being able to steer the balloon to some extent was essential for a safe journey, and there was plenty of evidence that the drag-rope steering technique he had invented was ineffective; yet he staked the fate of the expedition on drag ropes. Worse, the polar balloon Örnen (Eagle) was delivered directly to Svalbard from its manufacturer in Paris without being tested; when measurements showed it to be leaking more than expected, Andrée refused to acknowledge the alarming implications of this. Most modern students of the expedition see Andrée’s optimism, faith in the power of technology, and disregard for the forces of nature as the main factors in the series of events that led to his death and the deaths of his two companions Nils Strindberg (1872–97) and Knut Frænkel (1870–97).[2]

After Andrée, Strindberg, and Frænkel lifted off from Svalbard in July 1897, the balloon lost hydrogen quickly and crashed on the pack ice after only two days. The explorers were unhurt but faced a grueling trek back south across the drifting icescape. Inadequately clothed, equipped, and prepared, and shocked by the difficulty of the terrain, they did not make it to safety. As the Arctic winter closed in on them in October, the group ended up exhausted on the deserted Kvitøya (White Island) in Svalbard and died there. For 33 years the fate of the Andrée expedition remained one of the unsolved riddles of the Arctic. The chance discovery in 1930 of the expedition’s last camp created a media sensation in Sweden, where the dead men were mourned and idolized.”

ii. Raven paradox.

“The Raven paradox, also known as Hempel’s paradox or Hempel’s ravens is a paradox arising from the question of what constitutes evidence for a statement. Observing objects that are neither black nor ravens may formally increase the likelihood that all ravens are black—even though intuitively these observations are unrelated.”
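
One standard Bayesian way to make sense of this is to set up a toy model and simply compute. All the numbers below are made up: a world with 1,000 non-black objects and rival hypotheses saying that 0, 1, 2, … of them are (non-black) ravens. Drawing a non-black object at random and finding that it is not a raven is slightly more probable if there are no non-black ravens at all, so the posterior probability of 'all ravens are black' goes up a little:

```python
# Toy Bayesian reading of Hempel's ravens; every number here is an assumption.
N_NONBLACK = 1000                       # non-black objects in the toy world
hypotheses = range(6)                   # H_k: exactly k non-black objects are ravens
prior = {k: 1 / 6 for k in hypotheses}  # uniform prior; H_0 means "all ravens are black"

# Observation: one non-black object, sampled at random, turns out not to be a raven.
likelihood = {k: (N_NONBLACK - k) / N_NONBLACK for k in hypotheses}

evidence = sum(prior[k] * likelihood[k] for k in hypotheses)
posterior = {k: prior[k] * likelihood[k] / evidence for k in hypotheses}

print(f"P(all ravens black) before the observation: {prior[0]:.6f}")
print(f"P(all ravens black) after the observation:  {posterior[0]:.6f}")  # slightly larger
```

The increase is minuscule, which fits the intuition that a white shoe is at best extremely weak evidence about ravens.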

iii. Voynich manuscript.

“The Voynich manuscript, described as “the world’s most mysterious manuscript”,[3] is a work which dates to the early 15th century (1404–1438), possibly from northern Italy.[1][2] It is named after the book dealer Wilfrid Voynich, who purchased it in 1912.

Some pages are missing, but there are now about 240 vellum pages, most with illustrations. Much of the manuscript resembles herbal manuscripts of the 1500s, seeming to present illustrations and information about plants and their possible uses for medical purposes. However, most of the plants do not match known species, and the manuscript’s script and language remain unknown. Possibly some form of encrypted ciphertext, the Voynich manuscript has been studied by many professional and amateur cryptographers, including American and British codebreakers from both World War I and World War II. It has defied all decipherment attempts, becoming a famous case of historical cryptology. The mystery surrounding it has excited the popular imagination, making the manuscript a subject of both fanciful theories and novels. None of the many speculative solutions proposed over the last hundred years has yet been independently verified.[4]

iv. Infinite monkey theorem. Most people have heard about this one, but the article may have some stuff you didn’t know. This part made me laugh:

“In 2003, lecturers and students from the University of Plymouth MediaLab Arts course used a £2,000 grant from the Arts Council to study the literary output of real monkeys. They left a computer keyboard in the enclosure of six Celebes Crested Macaques in Paignton Zoo in Devon in England for a month, with a radio link to broadcast the results on a website.[10]

Not only did the monkeys produce nothing but five pages[11] consisting largely of the letter S, the lead male began by bashing the keyboard with a stone, and the monkeys continued by urinating and defecating on it. ”
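
For contrast with the Paignton Zoo result, the arithmetic behind the theorem is easy to run. Assuming a 50-key keyboard and independent, uniformly random keystrokes (both simplifying assumptions, and neither a good description of an actual macaque):

```python
# Chance of random typing hitting a short target string, assuming a 50-key
# keyboard and independent, uniformly random keystrokes.
keys = 50
target = "banana"

p_block = (1 / keys) ** len(target)      # one specific 6-keystroke block matches
print(f"P(a given 6-keystroke block spells '{target}') = {p_block:.3e}")

n_blocks = 10**9                         # a billion non-overlapping attempts
p_any = 1 - (1 - p_block) ** n_blocks
print(f"P(at least one hit in {n_blocks:,} such blocks) ≈ {p_any:.3f}")
```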

v. Boston Massacre.

“The Boston Massacre, known as the Incident on King Street by the British, was an incident on March 5, 1770, in which British Army soldiers killed five civilian men and injured six others. British troops had been stationed in Boston, capital of the Province of Massachusetts Bay, since 1768 in order to protect and support crown-appointed colonial officials attempting to enforce unpopular Parliamentary legislation. Amid ongoing tense relations between the population and the soldiers, a mob formed around a British sentry, who was subjected to verbal abuse and harassment. He was eventually supported by eight additional soldiers, who were subjected to verbal threats and thrown objects. They fired into the crowd, without orders, instantly killing three people and wounding others. Two more people died later of wounds sustained in the incident.

The crowd eventually dispersed after Acting Governor Thomas Hutchinson promised an inquiry, but reformed the next day, prompting the withdrawal of the troops to Castle Island. Eight soldiers, one officer, and four civilians were arrested and charged with murder. Defended by the lawyer and future American President, John Adams, six of the soldiers were acquitted, while the other two were convicted of manslaughter and given reduced sentences. The sentence that the men guilty of manslaughter received was a branding on their hand.

Depictions, reports, and propaganda about the event […] heightened tensions throughout the Thirteen Colonies. The event is widely viewed as foreshadowing the outbreak of the American Revolutionary War five years later. […]

The Boston Massacre is considered one of the most important events that turned colonial sentiment against King George III and British Parliamentary authority. John Adams wrote that the “foundation of American independence was laid” on March 5, 1770, and Samuel Adams and other Patriots used annual commemorations (Massacre Day) of the event to fulminate against British rule.[68] Christopher Monk, the boy who was wounded [and crippled – US] in the attack and died in 1780, was paraded before the crowds as a reminder of British hostility.[29] Later events such as the Boston Tea Party further illustrated the crumbling relationship between Britain and its colonies. Although five years passed between the massacre and outright revolution, and direct connections between the massacre and the later war are (according to historian Neil Langley York) somewhat tenuous,[69] it is widely perceived as a significant event leading to the violent rebellion that followed.[70][71]

vi. History of Chinese Americans. I thought this was a fascinating article – it has a lot of stuff.

Chinese immigration to the U.S. consisted of three major waves, with the first beginning in the 19th century. Chinese immigrants in the 19th century worked as laborers, particularly on the transcontinental railroad, such as the Central Pacific Railroad. They also worked as laborers in the mining industry, and suffered racial discrimination. While industrial employers were eager to get this new and cheap labor, the ordinary white public was stirred to anger by the presence of this “yellow peril.” Despite the provisions for equal treatment of Chinese immigrants in the 1868 Burlingame Treaty, political and labor organizations rallied against the immigration of what they regarded as a degraded race and “cheap Chinese labor.” Newspapers condemned the policies of employers, and even church leaders denounced the entrance of these aliens into what was regarded as a land for whites only. So hostile was the opposition that in 1882 the United States Congress eventually passed the Chinese Exclusion Act, which prohibited immigration from China for the next ten years. This law was then extended by the Geary Act in 1892. The Chinese Exclusion Act was the only U.S. law ever to prevent immigration and naturalization on the basis of race.[1] These laws not only prevented new immigration but also brought additional suffering as they prevented the reunion of the families of thousands of Chinese men already living in the U.S. (that is, men who had left China without their wives and children); anti-miscegenation laws in many states prohibited Chinese men from marrying white women.[2]

In 1924 the law barred further entries of Chinese; those already in the United States had been ineligible for citizenship since the previous year. Also by 1924, all Asian immigrants (except people from the Philippines, which had been annexed by the United States in 1898) were utterly excluded by law, denied citizenship and naturalization, and prevented from marrying Caucasians or owning land.[3]

Only since the 1940s when the US and China became allies during World War II, did the situation for Chinese Americans begin to improve, as restrictions on entry into the country, naturalization and mixed marriage were being lessened. In 1943, Chinese immigration to the U.S. was once again permitted — by way of the Magnuson Act — thereby repealing 61 years of official racial discrimination against the Chinese. Large-scale Chinese immigration did not occur until 1965 when the Immigration and Nationality Act of 1965[4] lifted national origin quotas.[5] […] As of the 2010 United States Census, there are more than 3.3 million Chinese in the United States — about 1% of the total population. […]

Of the first wave of Chinese who came to America, few were women. In 1850, the Chinese community of San Francisco consisted of 4018 men and only 7 women. In 1855, women made up only two percent of the Chinese population in the U.S., and even in 1890 it had increased to only 4.8 percent. The lack of visibility of Chinese women in the general public was due partially to factors such as the cost of making the voyage when there was a lack of work opportunities for Chinese women in America, harsh working conditions and having the traditional female responsibility of looking after the children and extended family back in China. The only women who did go to America were usually the wives of merchants. […] With the heavily uneven gender ratio, prostitution grew rapidly and the Chinese sex trade and trafficking became a lucrative business. From the documents of the 1870 U.S. Census, 61 percent of 3536 Chinese women in California had been classified as prostitutes as an occupation. The existence of Chinese prostitution was detected early, after which the police, legislature and popular press singled out Chinese prostitutes for criticism and were seen as further evidence of the depravity of the Chinese and the repression of their women by their patriarchal cultural values.[25] […]

After the 1893 economic downturn, measures adopted in the severe depression included anti-Chinese riots that eventually spread throughout the West from which came racist violence and massacres. Most of the Chinese farm workers, who in 1890 made up a 75 percent share of all Californian agricultural workers, were expelled. The Chinese found refuge and shelter in the Chinatowns of large cities. The vacant agricultural jobs subsequently proved to be so unattractive to the unemployed white Europeans that they avoided signing up; most of the vacancies were then filled by Japanese workers, after whom in the decades later came Filipinos, and finally Mexicans.[64] […]

Other laws included the Cubic Air Ordinance, which prohibited Chinese from occupying a sleeping room with less than 500 cubic feet (14 m3) of breathing space between each person, the Queue Ordinance,[70] which forced Chinese with long hair worn in a queue to pay a tax or to cut it, and Anti-Miscegenation Act of 1889 that prohibited Chinese men from marrying white women, and the Cable Act of 1922, which terminated citizenship for white American women who married an Asian man. The majority of these laws were not fully overturned until the 1950s, at the dawn of the modern American civil rights movement. […] Many Western states also enacted discriminatory laws that made it difficult for Chinese and Japanese immigrants to own land and find work. Some of these Anti-Chinese laws were the Foreign Miners’ License tax, which required a monthly payment of three dollars from every foreign miner who did not desire to become a citizen. Foreign-born Chinese could not become citizens because they had been rendered ineligible to citizenship by the Naturalization Act of 1790 […]

Between 1850 and 1875, the most popular complaint against Chinese residents was their involvement in prostitution.[85] During this time, Hip Yee Tong, a secret society, imported over six-thousand Chinese women to serve as prostitutes.[86] Most of these women came from southeastern China and were either kidnapped, purchased from poor families or lured to ports like San Francisco with the promise of marriage.[86] Prostitutes fell into three categories, namely, those sold to wealthy Chinese merchants as concubines, those purchased for high-class Chinese brothels catering exclusively to Chinese men or those purchased for prostitution in lower-class establishments frequented by a mixed clientele.[86] In late-19th century San Francisco, most notably Jackson Street, prostitutes were often housed in rooms 10×10 or 12×12 feet and were often beaten or tortured for not attracting enough business or refusing to work for any reason.[87] […]

Another major concern of European-Americans in relation to Chinatowns was the smoking of opium, even though the importation and consumption of opium long predated Chinese immigration to the United States.[92] Tariff acts of 1832 established opium regulation and in 1842 opium was taxed at seventy-five cents per pound.[93] In New York, by 1870, opium dens had opened on Baxter and Mott Streets in Manhattan Chinatown,[93] while in San Francisco, by 1876, Chinatown supported over 200 opium dens, each with a capacity of between five and fifteen people.[93] After the Burlingame Commercial Treaty of 1880, only American citizens could legally import opium into the United States, thus Chinese businessmen had to rely on non-Chinese importers to maintain opium supply.”

vii. Eye (cyclone) (featured).

“The eye is a region of mostly calm weather at the center of strong tropical cyclones. The eye of a storm is a roughly circular area, typically 30–65 km (20–40 miles) in diameter. It is surrounded by the eyewall, a ring of towering thunderstorms where the most severe weather occurs. The cyclone’s lowest barometric pressure occurs in the eye, and can be as much as 15 percent lower than the pressure outside the storm.[1]

In strong tropical cyclones, the eye is characterized by light winds and clear skies, surrounded on all sides by a towering, symmetric eyewall. In weaker tropical cyclones, the eye is less well defined, and can be covered by the central dense overcast, an area of high, thick clouds that show up brightly on satellite imagery. Weaker or disorganized storms may also feature an eyewall that does not completely encircle the eye, or have an eye that features heavy rain. In all storms, however, the eye is the location of the storm’s minimum barometric pressure: the area where the atmospheric pressure at sea level is the lowest.[1][2]

May 19, 2013 Posted by | Cryptography, Demographics, History, Philosophy, Wikipedia | Leave a comment

Stuff

i. Econometric methods for causal evaluation of education policies and practices: a non-technical guide. This one is ‘work-related’; in one of my courses I’m writing a paper and this working paper is one (of many) of the sources I’m planning on using. Most of the papers I work with are unfortunately not freely available online, which is part of why I haven’t linked to them here on the blog.

I should note that there are no equations in this paper, so you should focus on the words ‘a non-technical guide’ rather than the words ‘econometric methods’ in the title – I think this is a very readable paper for the non-expert as well. I should of course also note that I have worked with most of these methods in a lot more detail, and that without the math it’s very hard to understand the details and really know what’s going on e.g. when applying such methods – or related methods such as IV methods on panel data, a topic which was covered in another class just a few weeks ago but which is not covered in this paper.

This is a place to start if you want to know something about applied econometric methods, particularly if you want to know how they’re used in the field of educational economics, and especially if you don’t have a strong background in stats or math. It should be noted that some of the methods covered see widespread use in other areas of economics as well; IV is widely used, and the difference-in-differences estimator has seen a lot of applications in health economics.
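
To give a flavour of how simple the core of one of those methods is, here is a difference-in-differences calculation on made-up numbers (the outcomes and groups are assumptions for illustration, not taken from the paper): the estimator is just the before–after change in the treated group minus the before–after change in the comparison group.

```python
# Difference-in-differences on made-up group means (illustration only).
# Outcome: say, average test scores before and after some education policy.
treated_pre, treated_post = 62.0, 70.0   # group exposed to the policy (assumed)
control_pre, control_post = 61.0, 65.0   # comparison group (assumed)

naive_change = treated_post - treated_pre   # 8.0, mixes policy effect and underlying trend
common_trend = control_post - control_pre   # 4.0, proxy for what would have happened anyway
did_estimate = naive_change - common_trend  # 4.0, the DiD estimate of the policy effect

print(f"Naive before/after change in treated group: {naive_change:.1f}")
print(f"Change in comparison group:                 {common_trend:.1f}")
print(f"Difference-in-differences estimate:         {did_estimate:.1f}")
```

The identifying assumption is, of course, that the two groups would have followed parallel trends in the absence of the policy.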

ii. Regulating the Way to Obesity: Unintended Consequences of Limiting Sugary Drink Sizes. The law of unintended consequences strikes again.

You could argue with some of the assumptions made here (e.g. that prices (/oz) remain constant) but I’m not sure the findings are that sensitive to that assumption, and without an explicit model of the pricing mechanism at work it’s mostly guesswork anyway.

iii. A discussion about the neurobiology of memory. Razib Khan posted a short part of the video recently, so I decided to watch it today. A few relevant wikipedia links: Memory, Dead reckoning, Hebbian theory, Caenorhabditis elegans. I’m skeptical, but I agree with one commenter who put it this way: “I know darn well I’m too ignorant to decide whether Randy is possibly right, or almost certainly wrong — yet I found this interesting all the way through.” I also agree with another commenter who mentioned that it’d have been useful for Gallistel to go into details about the differences between short term and long term memory and how these differences relate to the problem at hand.

iv. Plos-One: Low Levels of Empathic Concern Predict Utilitarian Moral Judgment.

“An extensive body of prior research indicates an association between emotion and moral judgment. In the present study, we characterized the predictive power of specific aspects of emotional processing (e.g., empathic concern versus personal distress) for different kinds of moral responders (e.g., utilitarian versus non-utilitarian). Across three large independent participant samples, using three distinct pairs of moral scenarios, we observed a highly specific and consistent pattern of effects. First, moral judgment was uniquely associated with a measure of empathy but unrelated to any of the demographic or cultural variables tested, including age, gender, education, as well as differences in “moral knowledge” and religiosity. Second, within the complex domain of empathy, utilitarian judgment was consistently predicted only by empathic concern, an emotional component of empathic responding. In particular, participants who consistently delivered utilitarian responses for both personal and impersonal dilemmas showed significantly reduced empathic concern, relative to participants who delivered non-utilitarian responses for one or both dilemmas. By contrast, participants who consistently delivered non-utilitarian responses on both dilemmas did not score especially high on empathic concern or any other aspect of empathic responding.”

In case you were wondering, the difference hasn’t got anything to do with a difference in the ability to ‘see things from the other guy’s point of view’: “the current study demonstrates that utilitarian responders may be as capable at perspective taking as non-utilitarian responders. As such, utilitarian moral judgment appears to be specifically associated with a diminished affective reactivity to the emotions of others (empathic concern) that is independent of one’s ability for perspective taking”.

On a small sidenote, I’m not really sure I get the authors at all – one of the questions they ask in the paper’s last part is whether ‘utilitarians are simply antisocial?’ This is such a stupid way to frame this I don’t even know how to begin to respond; I mean, utilitarians make better decisions that save more lives, and that’s consistent with them being antisocial? I should think the ‘social’ thing to do would be to save as many lives as possible. Dead people aren’t very social, and when your actions cause more people to die they also decrease the scope for future social interaction.

v. Lastly, some Khan Academy videos:

(Relevant links: Compliance, Preload).

(This one may be very hard to understand if you haven’t covered this stuff before, but I figured I might as well post it here. If you don’t know e.g. what myosin and actin are you probably won’t get much out of this video. If you don’t watch it, this part of what’s covered is probably the most important part to take away from it.)

It’s been a long time since I checked out the Brit Cruise information theory playlist, and I was happy to learn that he’s updated it and added some more stuff. I like the way he combines historical stuff with a ‘how does it actually work, and how did people realize that’s how it works’ approach – learning how people figured out stuff is to me sometimes just as fascinating as learning what they figured out:

(Relevant wikipedia links: Leyden jar, Electrostatic generator, Semaphore line. Cruise’s play with the cat and the amber may look funny, but there’s a point to it: “The Greek word for amber is ηλεκτρον (“elektron”) and is the origin of the word “electricity”.” – from the first link).

(Relevant wikipedia links: Galvanometer, Morse code)

April 14, 2013 Posted by | Cardiology, Computer science, Cryptography, Econometrics, Khan Academy, Medicine, Neurology, Papers, Physics, Random stuff, Statistics | Leave a comment

Stuff

i. Temporal view of the costs and benefits of self-deception, by Chance, Norton, Gino, and Ariely. The abstract:

“Researchers have documented many cases in which individuals rationalize their regrettable actions. Four experiments examine situations in which people go beyond merely explaining away their misconduct to actively deceiving themselves. We find that those who exploit opportunities to cheat on tests are likely to engage in self-deception, inferring that their elevated performance is a sign of intelligence. This short-term psychological benefit of self-deception, however, can come with longer-term costs: when predicting future performance, participants expect to perform equally well—a lack of awareness that persists even when these inflated expectations prove costly. We show that although people expect to cheat, they do not foresee self-deception, and that factors that reinforce the benefits of cheating enhance self-deception. More broadly, the findings of these experiments offer evidence that debates about the relative costs and benefits of self-deception are informed by adopting a temporal view that assesses the cumulative impact of self-deception over time.”

A bit more from the paper:

“People often rationalize their questionable behavior in an effort to maintain a positive view of themselves. We show that, beyond merely sweeping transgressions under the psychological rug, people can use the positive outcomes resulting from negative behavior to enhance their opinions of themselves—a mistake that can prove costly in the long run. We capture this form of self-deception in a series of laboratory experiments in which we give some people the opportunity to perform well on an initial test by allowing them access to the answers. We then examine whether the participants accurately attribute their inflated scores to having seen the answers, or whether they deceive themselves into believing that their high scores reflect new-found intelligence, and therefore expect to perform similarly well on future tests without the answer key.

Previous theorists have modeled self-deception after interpersonal deception, proposing that self-deception—one part of the self deceiving another part of the self—evolved in the service of deceiving others, since a lie can be harder to detect if the liar believes it to be true (1, 2). This interpersonal account reflects the calculated nature of lying; the liar is assumed to balance the immediate advantages of deceit against the risk of subsequent exposure. For example, people frequently lie in matchmaking contexts by exaggerating their own physical attributes, and though such deception might initially prove beneficial in convincing an attractive prospect to meet for coffee, the ensuing disenchantment during that rendezvous demonstrates the risks (3, 4). Thus, the benefits of deceiving others (e.g., getting a date, getting a job) often accrue in the short term, and the costs of deception (e.g., rejection, punishment) accrue over time.

The relative costs and benefits of self-deception, however, are less clear, and have spurred a theoretical debate across disciplines (5–10). […]

As we had expected, social recognition exacerbated self-deception: those who were commended for their answers-aided performance were even more likely to inflate their beliefs about their subsequent performance. The fact that social recognition, which so often accompanies self-deception in the real world, enhances self-deception has troubling implications for the prevalence and magnitude of self-deception in everyday life.”

ii. Nonverbal Communication, by Albert Mehrabian. Some time ago I decided that I wanted to know more about this stuff, but I haven’t really gotten around to it until now. It’s old stuff, but it’s quite interesting. Some quotes:

“The work of Condon and Ogston (1966, 1967) has dealt with the synchronous relations of a speaker’s verbal cues to his own and his addressee’s nonverbal behaviors. One implication of their work is the existence of a kind of coactive regulation of communicator-addressee behaviors which is an intrinsic part of social interaction and which is certainly not exhausted through a consideration of speech alone. Kendon (1967a) recognized these and other functions that are also served by implicit behaviors, particularly eye contact. He noted that looking at another person helps in getting information about how that person is behaving (that is, to monitor), in regulating the initiation and termination of speech, and in conveying emotionality or intimacy. With regard to the regulatory function, Kendon’s (1967a) findings showed that when the speaker and his listener are about to change roles, the speaker looks in the direction of his listener as he stops talking, and his listener in turn looks away as he starts speaking. Further, when speech is fluent, the speaker looks more in the direction of his listener than when his speech is disrupted with errors and hesitations. Looking away during these awkward moments implies recognition by the speaker that he has less to say, and is demanding less attention from his listener. It also provides the speaker with some relief to organize his thoughts.

The concept of regulation has also been studied by Scheflen (1964, 1965). According to him, a communicator may use changes in posture, eye contact, or position to indicate that (1) he is about to make a new point, (2) he is assuming an attitude relative to several points being made by himself or his addressee, or (3) he wishes to temporarily remove himself from the communication situation, as would be the case if he were to select a great distance from the addressee or begin to turn his back on him. There are many interesting aspects of this regulative function of nonverbal cues that have been dealt with only informally. […]

One of the first attempts for a more general characterization of the referents of implicit behavior and, therefore, possibly of the behaviors themselves, was made by Schlosberg (1954). He suggested a three-dimensional framework involving pleasantness-unpleasantness, sleep-tension, and attention-rejection. Any feeling could be assigned a value on each of these three dimensions, and different feelings would correspond to different points in this three-dimensional space. This shift away from the study of isolated feelings and their corresponding nonverbal cues and toward a characterization of the general referents of nonverbal behavior on a limited set of dimensions was seen as beneficial. It was hoped that it could aid in the identification of large classes of interrelated nonverbal behaviors.

Recent factor-analytic work by Williams and Sundene (1965) and Osgood (1966) provided further impetus for characterizing the referents of implicit behavior in terms of a limited set of dimensions. Williams and Sundene (1965) found that facial, vocal, or facial-vocal cues can be categorized primarily in terms of three orthogonal factors: general evaluation, social control, and activity.

For facial expression of emotion, Osgood (1966) suggested the following dimensions as primary referents: pleasantness (joy and glee versus dread and anxiety), control (annoyance, disgust, contempt, scorn, and loathing versus dismay, bewilderment, surprise, amazement, and excitement), and activation (sullen anger, rage, disgust, scorn, and loathing versus despair, pity, dreamy sadness, boredom, quiet pleasure, complacency, and adoration). […]

Scheflen (1964, 1965, 1966) provided detailed observations of an informal quality on the significance of postures and positions in interpersonal situations. Along similar lines, Kendon (1967a) and Exline and his colleagues explored the many-faceted significance of eye contact with, or observation of, another […] These investigations consistently found, among same-sexed pairs of communicators, that females generally had more eye contact with each other than did males; also, members of both sexes had less eye contact with one another when the interaction between them was aversive […] In generally positive exchanges, males had a tendency to decrease their eye contact over a period of time, whereas females tended to increase it (Exline and Winters, 1965). […]

extensive data provided by Kendon (1967a) showed that observation of another person during a social exchange varied from about 30 per cent to 70 per cent, and that corresponding figures for eye contact ranged from 10 per cent to 40 per cent. […]

Physical proximity, touching, eye contact, a forward lean rather than a reclining position, and an orientation of the torso toward rather than away from an addressee have all been found to communicate a more positive attitude toward him. A second set of cues that indicates postural relaxation includes asymmetrical placement of the limbs, a sideways lean and/or reclining position by the seated communicator, and specific relaxation measures of the hands or neck. This second set of cues relates primarily to status differences between the communicator and his addressee: there is more relaxation with an addressee of lower status, and less relaxation with one of higher status. […]

In sum, the findings from studies of posture and position and subtle variations in verbal statements […] show that immediacy cues primarily denote evaluation, and postural relaxation cues denote status or potency in a relationship. It is interesting to note a weaker effect: less relaxation of one’s posture also conveys a more positive attitude toward another. One way to interpret this overlap of the referential significance of less relaxation and more immediacy in communicating a more positive feeling is in terms of the implied positive connotations of higher status in our culture. A respectful attitude (that is, when one conveys that the other is of higher status) does indeed have implied positive connotations. Therefore it is not surprising that the communication of respect and of positive attitude exhibits some similarity in the nonverbal cues that they require. However, whereas the communication of liking is more heavily weighted by variations in immediacy, that of respect is weighted more by variations in relaxation.”

I should probably note here that whereas it makes a lot of sense to be skeptical of some of the reported findings in the book, simply to get an awareness of some of the key variables and some proposed dynamics may actually be helpful. I don’t know how deficient I am in these areas because I haven’t really given body language and similar stuff much thought; I assume most people haven’t/don’t, but I may be mistaken.

iii. A friend let me know about this resource and I thought I should share it here. It’s a collection of free online courses/lectures provided by Yale University.

iv. Prevalence, Heritability, and Prospective Risk Factors for Anorexia Nervosa. It’s a pretty neat setup: “During a 4-year period ending in 2002, all living, contactable, interviewable, and consenting twins in the Swedish Twin Registry (N = 31 406) born between January 1, 1935, and December 31, 1958, underwent screening for a range of disorders, including AN. Information collected systematically in 1972 to 1973, before the onset of AN, was used to examine prospective risk factors for AN.”

Results  The overall prevalence of AN was 1.20% and 0.29% for female and male participants, respectively. The prevalence of AN in both sexes was greater among those born after 1945. Individuals with lifetime AN reported lower body mass index, greater physical activity, and better health satisfaction than those without lifetime AN. […]

[…]

This study represents, to our knowledge, the largest twin study conducted to date of individuals with rigorously diagnosed AN. Our results confirm and extend the findings of previous studies on prevalence, risk factors, and heritability.

Consistent with several studies, the lifetime prevalence of AN identified by all sources was 1.20% in female participants and 0.29% in male participants, reflecting the typically observed disproportionate sex ratio. Similarly, our data show a clear increase in prevalence of DSM-IV AN (broadly and narrowly defined) with historical time in Swedish twins. The increase was apparent for both sexes. Hoek and van Hoeken3 also reported a consistent increase in prevalence, with a leveling out of the trajectory around the 1970s. Future studies in younger STR participants will allow verification of this observation.

Several observed differences between individuals with and without AN were expected, ie, more frequent endorsement of symptoms of eating disorders. Other differences are noteworthy. Consistent with previous observations, individuals with lifetime AN reported lower BMIs at the time of interview than did individuals with no history of AN. Although this could be partially accounted for by the presence of currently symptomatic individuals in the sample, our results remained unchanged when we excluded individuals likely to have current AN (ie, current BMI, ≤17.5). Previous studies have shown that, even after recovery, individuals with a history of AN have a low BMI.59 Although perhaps obvious, a history of AN appears to offer protection against becoming overweight. The protective effect also holds for obesity (BMI, ≥30), although there were too few individuals in the sample with histories of AN who had become obese for meaningful analyses. Despite the obvious nature of this observation, the mechanism whereby protection against overweight is afforded is not immediately clear. Those with a history of AN reported greater current exercise and a perception of being in better physical health. One possible interpretation of this pattern of findings is that individuals with a history of AN continue to display subthreshold symptoms of AN (ie, excessive exercise and caloric restriction) that contribute to their low BMIs. Alternatively, symptoms that were pathologic during acute phases of AN, such as excessive exercise and decreased caloric intake, may resolve over time into healthy behaviors, such as consistent exercise patterns and a healthful diet, that result in better weight control and self-rated health.

Regardless of which of these hypotheses is true, another intriguing difference is that individuals with lifetime AN report a lower age at highest BMI, although the magnitude of the highest lifetime BMI does not differ in those with and without a history of AN. Those with AN report their highest lifetime BMIs early in their fourth decade of life on average, whereas those without AN report their highest BMIs in the middle of their fifth decade of life (close to the age at interview). On a population level, adults tend to gain on average 2.25 kg (5 lb) per decade until reaching their eighth decade of life.60 Although more detailed data are necessary to make definitive statements about different weight trajectories, our results suggest not only that individuals with AN may maintain low BMIs but also that they may not follow the typical adult weight gain trajectories. These data are particularly intriguing in light of recent reports of AN being associated with reduced risk of certain cancers61 – 62 and protective against mortality due to diseases of the circulatory system.63 – 64 Energy intake is closely related to fat intake and obesity, both of which have also been related to cancer development65 – 66 and both of which are reduced in AN. Further detailed studies of the weight trajectories and health of individuals with histories of AN are required to explicate the nature and magnitude of these intriguing findings.

Of the variables assessed in 1972 to 1973, neuroticism emerged as the only significant prospective predictor of AN. This is notable because there have been few truly prospective risk factor studies of AN.”

v. The music is a bit much for me towards the end, but this is just an awesome video. I think I’d really have liked to know that guy:

vi. Political Sorting in Social Relationships: Evidence from an Online Dating Community, by Huber and Malhotra.

I found these data surprising (and I’m skeptical about the latter finding):

“Among paid content, online dating is the third largest driver of Internet traffic behind music and games (Jupiter Research 2011). A substantial number of marriages also result from interactions started online. For instance, a Harris Interactive study conducted in 2007 found that 2% of U.S. marriages could be traced back to relationships formed on eHarmony.com, a single online dating site (Bialik 2009).”

Anyway I’ll just post some data/results below and leave out the discussion (click to view tables in full size). Note that there are a lot of significant results here:

The last few figures are also interesting (people really care about that black/white thing when they date (online)…), but you can go have a look for yourself. As I’ve already mentioned, there are a lot of significant results – they had a huge amount of data to work with (170,413 men and 132,081 women).

vii. John Nash on Cryptography.

November 16, 2012 Posted by | Books, Cryptography, Data, dating, education, Papers, Psychology, Random stuff | Leave a comment

A few notes on Singh’s The Code Book

It seems that nine out of ten readers don’t read/like my book posts, so I probably will try to hold back on those in the future or at least put a bit less effort into them. But I thought I’d just post a quick note here anyway:

I spent part of yesterday and a big chunk of today reading Simon Singh’s The Code Book. I generally liked the book – if you liked Fermat’s Last Theorem, you’ll probably like this book too. I didn’t think much of the last two chapters, but the rest of it was quite entertaining and instructive. You know you have your hands on a book that covers quite a bit of stuff when you find yourself looking up something in an archaeology textbook to check some details in a book about cryptography (the book has a brief chapter which covers the decipherment of the Linear B script, among other things). Having read the book, I can’t not mention here that I blogged this some time ago – needless to say, back then I had no idea how big of a name Hellman is ‘in the cryptography business’ (this was a very big deal – in Singh’s words: “The Diffie-Hellman-Merkle key exchange scheme […] is one of the most counterintuitive discoveries in the history of science, and it forced the cryptographic establishment to rewrite the rules of encryption. […] Hellman had shattered one of the tenets of cryptography and proved that Bob and Alice did not need to meet to agree a secret key.” (p.267))
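Since the Diffie-Hellman idea is so short once you see it, here is a minimal sketch of the key exchange in Python. Everything here is illustrative: the prime and base are toy values I picked for the example (real deployments use vetted parameters of thousands of bits), and the variable names are mine, not Singh's.

```python
import secrets

# Public parameters - toy values for illustration only.
p = 0xFFFFFFFB  # 4294967291, the largest prime below 2**32
g = 5           # public base

# Each party picks a private exponent and publishes g**x mod p.
a = secrets.randbelow(p - 2) + 1   # Alice's secret
b = secrets.randbelow(p - 2) + 1   # Bob's secret
A = pow(g, a, p)                   # sent to Bob over the open channel
B = pow(g, b, p)                   # sent to Alice over the open channel

# Each side raises the other's public value to its own secret exponent.
shared_alice = pow(B, a, p)        # (g**b)**a mod p
shared_bob = pow(A, b, p)          # (g**a)**b mod p

assert shared_alice == shared_bob  # same key, and it was never transmitted
print(hex(shared_alice))
```

The point the quote is getting at: an eavesdropper sees p, g, A and B, but recovering a or b from those values is the discrete logarithm problem, which is believed to be computationally infeasible for well-chosen parameters – so Alice and Bob end up with a shared secret without ever meeting.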

August 22, 2012 Posted by | Books, Computer science, Cryptography | Leave a comment

Wikipedia articles of interest

i. Song Dynasty. [featured, great article with lots of links]

“The Song Dynasty (Chinese: 宋朝; pinyin: Sòng Cháo; Wade-Giles: Sung Ch’ao; IPA: [sʊ̂ŋ tʂʰɑ̌ʊ̯]) was a ruling dynasty in China between 960 and 1279; it succeeded the Five Dynasties and Ten Kingdoms Period, and was followed by the Yuan Dynasty. It was the first government in world history to issue banknotes or paper money, and the first Chinese government to establish a permanent standing navy. This dynasty also saw the first known use of gunpowder, as well as first discernment of true north using a compass. […]

The population of China doubled in size during the 10th and 11th centuries. This growth came through expanded rice cultivation in central and southern China, the use of early-ripening rice from southeast and southern Asia, and the production of abundant food surpluses.[4][5] […]

The Song Dynasty was an era of administrative sophistication and complex social organization. Some of the largest cities in the world were found in China during this period (Kaifeng and Hangzhou had populations of over a million).[1][48] People enjoyed various social clubs and entertainment in the cities, and there were many schools and temples to provide the people with education and religious services.[1] The Song government supported multiple forms of social welfare programs, including the establishment of retirement homes, public clinics, and pauper's graveyards.[1] The Song Dynasty supported a widespread postal service that was modeled on the earlier Han Dynasty (202 BC – AD 220) postal system to provide swift communication throughout the empire.[49] The central government employed thousands of postal workers of various ranks and responsibilities to provide service for post offices and larger postal stations.[50] […]

The Song military was chiefly organized to ensure that the army could not threaten Imperial control, often at the expense of effectiveness in war. Northern Song’s Military Council operated under a Chancellor, who had no control over the imperial army. The imperial army was divided among three marshals, each independently responsible to the Emperor. Since the Emperor rarely led campaigns personally, Song forces lacked unity of command.[89] The imperial court often believed that successful generals endangered royal authority, and relieved or even executed them (notably Li Gang,[90] Yue Fei, and Han Shizhong.[91])

Although the scholar-officials viewed military soldiers as lower members in the hierarchic social order,[92] a person could gain status and prestige in society by becoming a high ranking military officer with a record of victorious battles.[93] At its height, the Song military had one million soldiers[22] divided into platoons of 50 troops, companies made of two platoons, and one battalion composed of 500 soldiers.[94][95] Crossbowmen were separated from the regular infantry and placed in their own units as they were prized combatants, providing effective missile fire against cavalry charges.[95] The government was eager to sponsor new crossbow designs that could shoot at longer ranges, while crossbowmen were also valuable when employed as long-range snipers.[96] Song cavalry employed a slew of different weapons, including halberds, swords, bows, spears, and 'fire lances' that discharged a gunpowder blast of flame and shrapnel.[97]

Military strategy and military training were treated as science that could be studied and perfected; soldiers were tested in their skills of using weaponry and in their athletic ability.[98] The troops were trained to follow signal standards to advance at the waving of banners and to halt at the sound of bells and drums.[95] […]

The economy of the Song Dynasty was one of the most prosperous and advanced economies in the medieval world. […] The Song economy was stable enough to produce over a hundred million kilograms (over two hundred million pounds) of iron product a year.[133] […] The annual output of minted copper currency in 1085 alone reached roughly six billion coins.[4] The most notable advancement in the Song economy was the establishment of the world’s first government issued paper-printed money, known as Jiaozi […] The size of the workforce employed in paper money factories was large; it was recorded in 1175 that the factory at Hangzhou employed more than a thousand workers a day.[135] […]

The innovation of movable type printing was made by the artisan Bi Sheng (990–1051), first described by the scientist and statesman Shen Kuo in his Dream Pool Essays of 1088.[179][180] The collection of Bi Sheng’s original clay-fired typeface was passed on to one of Shen Kuo’s nephews, and was carefully preserved.[180][181] Movable type enhanced the already widespread use of woodblock methods of printing thousands of documents and volumes of written literature, consumed eagerly by an increasingly literate public. The advancement of printing had a deep impact on education and the scholar-official class, since more books could be made faster while mass-produced, printed books were cheaper in comparison to laborious handwritten copies.[67][71] The enhancement of widespread printing and print culture in the Song period was thus a direct catalyst in the rise of social mobility and expansion of the educated class of scholar elites, the latter which expanded dramatically in size from the 11th to 13th centuries.[67][182]

ii. Tidal acceleration.

“Tidal acceleration is an effect of the tidal forces between an orbiting natural satellite (e.g. the Moon), and the primary planet that it orbits (e.g. the Earth). The acceleration is usually negative, as it causes a gradual slowing and recession of a satellite in a prograde orbit away from the primary, and a corresponding slowdown of the primary’s rotation. The process eventually leads to tidal locking of first the smaller, and later the larger body. The Earth-Moon system is the best studied case.

The similar process of tidal deceleration occurs for satellites that have an orbital period that is shorter than the primary’s rotational period, or that orbit in a retrograde direction. […]

Because the Moon's mass is a considerable fraction of that of the Earth (about 1:81), the two bodies can be regarded as a double planet system, rather than as a planet with a satellite. The plane of the Moon’s orbit around the Earth lies close to the plane of the Earth’s orbit around the Sun (the ecliptic), rather than in the plane perpendicular to the axis of rotation of the Earth (the equator) as is usually the case with planetary satellites. The mass of the Moon is sufficiently large, and it is sufficiently close, to raise tides in the matter of the Earth. In particular, the water of the oceans bulges out along both ends of an axis passing through the centers of the Earth and Moon. The average tidal bulge closely follows the Moon in its orbit, and the Earth rotates under this tidal bulge in just over a day. However, the rotation drags the position of the tidal bulge ahead of the position directly under the Moon. As a consequence, there exists a substantial amount of mass in the bulge that is offset from the line through the centers of the Earth and Moon. Because of this offset, a portion of the gravitational pull between Earth’s tidal bulges and the Moon is perpendicular to the Earth-Moon line, i.e. there exists a torque between the Earth and the Moon. This boosts the Moon in its orbit, and decelerates the rotation of the Earth.

As a result of this process, the mean solar day, which is nominally 86400 seconds long, is actually getting longer when measured in SI seconds with stable atomic clocks. (The SI second, when adopted, was already a little shorter than the current value of the second of mean solar time.[9]) The small difference accumulates every day, which leads to an increasing difference between our clock time (Universal Time) on the one hand, and Atomic Time and Ephemeris Time on the other hand: see ΔT. This makes it necessary to insert a leap second at irregular intervals. […]

Tidal acceleration is one of the few examples in the dynamics of the Solar System of a so-called secular perturbation of an orbit, i.e. a perturbation that continuously increases with time and is not periodic. Up to a high order of approximation, mutual gravitational perturbations between major or minor planets only cause periodic variations in their orbits, that is, parameters oscillate between maximum and minimum values. The tidal effect gives rise to a quadratic term in the equations, which leads to unbounded growth. In the mathematical theories of the planetary orbits that form the basis of ephemerides, quadratic and higher order secular terms do occur, but these are mostly Taylor expansions of very long time periodic terms. The reason that tidal effects are different is that unlike distant gravitational perturbations, friction is an essential part of tidal acceleration, and leads to permanent loss of energy from the dynamical system in the form of heat.”
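A back-of-the-envelope sketch may help make the 'small difference accumulates every day' point above concrete. Assuming – and this is my assumption for illustration, not a figure from the article – that the excess length of day grows roughly linearly at the commonly quoted long-term average of about 1.7 ms per century, the accumulated clock offset grows quadratically:

```python
# If the excess length of day grows linearly, e(t) = r*t (in ms, t in centuries),
# the clock offset is the running sum of that excess over all elapsed days:
# Delta_T(t) ~ 0.5 * r * t**2 * days_per_century.

DAYS_PER_CENTURY = 36525
R_MS_PER_CENTURY = 1.7  # assumed average day-lengthening rate (illustrative)

def accumulated_offset_seconds(t_centuries):
    """Quadratic accumulation of a linearly growing daily excess, in seconds."""
    return 0.5 * R_MS_PER_CENTURY * t_centuries ** 2 * DAYS_PER_CENTURY / 1000.0

for t in (1, 2, 5, 10):
    print(f"after {t:2d} centuries: ~{accumulated_offset_seconds(t):6.0f} s")
```

The roughly 31 seconds per century squared that falls out of this is in the right ballpark for the parabolic fits used for historical ΔT, and it is exactly the 'secular, not periodic' behaviour the article is describing.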

iii. Error function. Somewhat technical, but interesting (the article has a lot more):

“In mathematics, the error function (also called the Gauss error function) is a special function (non-elementary) of sigmoid shape which occurs in probability, statistics and partial differential equations. It is defined as:[1][2]

\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_{0}^x e^{-t^2} dt.

(When x is negative, the integral is interpreted as the negative of the integral from x to zero.) […]

The error function is used in measurement theory (using probability and statistics), and although its use in other branches of mathematics has nothing to do with the characterization of measurement errors, the name has stuck.

The error function is related to the cumulative distribution \Phi, the integral of the standard normal distribution (the “bell curve”), by[2]

\Phi (x) = \frac{1}{2}+ \frac{1}{2} \operatorname{erf} \left(x/ \sqrt{2}\right)

The error function, evaluated at  \frac{x}{\sigma \sqrt{2}}  for positive x values, gives the probability that a measurement, under the influence of normally distributed errors with standard deviation \sigma, has a distance less than x from the mean value.[3] This function is used in statistics to predict behavior of any sample with respect to the population mean. This usage is similar to the Q-function, which in fact can be written in terms of the error function.”
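As a quick numerical check of the relation quoted above, Python's standard library happens to have both sides: math.erf, and (from Python 3.8) statistics.NormalDist for the normal CDF. A minimal sketch:

```python
import math
from statistics import NormalDist  # Python 3.8+

std_normal = NormalDist(mu=0.0, sigma=1.0)

for x in (-2.0, -0.5, 0.0, 0.5, 1.0, 2.0):
    # Phi(x) = 1/2 + 1/2 * erf(x / sqrt(2))
    via_erf = 0.5 + 0.5 * math.erf(x / math.sqrt(2))
    via_cdf = std_normal.cdf(x)
    print(f"x = {x:5.2f}   0.5 + 0.5*erf(x/sqrt(2)) = {via_erf:.10f}   Phi(x) = {via_cdf:.10f}")

# The measurement-error reading: probability that a normally distributed error
# with standard deviation sigma is smaller than x in absolute value.
sigma, x = 2.0, 3.0
print(math.erf(x / (sigma * math.sqrt(2))))
```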

iv. Lake Vostok

Lake Vostok (Russian: озеро Восток, lit. “Lake East”) is the largest of more than 140 sub-glacial lakes and was recently drilled into by Russian scientists. The overlying ice provides a continuous paleoclimatic record of 400,000 years, although the lake water itself may have been isolated for 15[3][4] to 25 million years.[5]

Lake Vostok is located at the southern Pole of Cold, beneath Russia's Vostok Station under the surface of the central East Antarctic Ice Sheet, which is at 3,488 metres (11,444 ft) above mean sea level. The surface of this fresh water lake is approximately 4,000 m (13,100 ft) under the surface of the ice, which places it at approximately 500 m (1,600 ft) below sea level. Measuring 250 km (160 mi) long by 50 km (30 mi) wide at its widest point, and covering an area of 15,690 km2 (6,060 sq mi), it is similar in area to Lake Ontario, but with over three times the volume. The average depth is 344 m (1,129 ft). It has an estimated volume of 5,400 km3 (1,300 cu mi).[2] The lake is divided into two deep basins by a ridge. The liquid water over the ridge is about 200 m (700 ft), compared to roughly 400 m (1,300 ft) deep in the northern basin and 800 m (2,600 ft) deep in the southern. […]

The coldest temperature ever observed on Earth, −89 °C (−128 °F), was recorded at Vostok Station on 21 July 1983.[3] The average water temperature is calculated to be around −3 °C (27 °F); it remains liquid below the normal freezing point because of high pressure from the weight of the ice above it.[30] Geothermal heat from the Earth’s interior may warm the bottom of the lake.[31][32][33] The ice sheet itself insulates the lake from cold temperatures on the surface. […]

The lake is under complete darkness, under 350 atmospheres (5143 psi) of pressure and expected to be rich in oxygen, so there is speculation that any organisms inhabiting the lake could have evolved in a manner unique to this environment.[19][36] These adaptations to an oxygen-rich environment might include high concentrations of protective oxidative enzymes.

Living Hydrogenophilus thermoluteolus micro-organisms have been found in Lake Vostok’s deep ice core drillings; they are an extant surface-dwelling species.[35][40] This suggests the presence of a deep biosphere utilizing a geothermal system of the bedrock encircling the subglacial lake. There is optimism that microbial life in the lake may be possible despite high pressure, constant cold, low nutrient input, potentially high oxygen concentration and an absence of sunlight.[35][41][42]

Jupiter's moon Europa and Saturn's moon Enceladus may also harbor lakes or oceans below a thick crust of ice. Any confirmation of life in Lake Vostok could strengthen the prospect for the presence of life on icy moons.[35][43]

v. Nicosia

Nicosia (/ˌnɪkəˈsiːə/ NIK-ə-SEE-ə), known locally as Lefkosia (Greek: Λευκωσία, Turkish: Lefkoşa), is the capital and largest city in Cyprus, as well as its main business center.[2] After the collapse of the Berlin Wall, Nicosia remained the only divided capital in the world,[3] with the southern and the northern portions divided by a Green Line.[4] It is located near the center of the island, on the banks of the Pedieos River.

Nicosia is the capital and seat of government of the Republic of Cyprus. The northern part of the city functions as the capital of the self-proclaimed Turkish Republic of Northern Cyprus, a disputed breakaway region whose independence is recognized only by Turkey, and which the rest of the international community considers as occupied territory of the Republic of Cyprus since the Turkish invasion in 1974. […]

The Turkish invasion, the continuous occupation of Cyprus as well as the self-declaration of independence of the TRNC have been condemned by several United Nations Resolutions adopted by the General Assembly and the Security Council. The Security Council is reaffirming their condemnation every year.[40]

vi. Perennial plant

“A perennial plant or simply perennial (Latin per, “through”, annus, “year”) is a plant that lives for more than two years.[1] The term is often used to differentiate a plant from shorter lived annuals and biennials. The term is sometimes misused by commercial gardeners or horticulturalists to describe only herbaceous perennials. More correctly, woody plants like shrubs and trees are also perennials.

Perennials, especially small flowering plants, grow and bloom over the spring and summer and then die back every autumn and winter, then return in the spring from their root-stock, in addition to seeding themselves as an annual plant does. These are known as herbaceous perennials. However, depending on the rigors of local climate, a plant that is a perennial in its native habitat, or in a milder garden, may be treated by a gardener as an annual and planted out every year, from seed, from cuttings or from divisions. […]

Although most of humanity is fed by seeds from annual grain crops, perennial crops provide numerous benefits.[3] Perennial plants often have deep, extensive root systems which can hold soil to prevent erosion, capture dissolved nitrogen before it can contaminate ground and surface water, and outcompete weeds (reducing the need for herbicides). These potential benefits of perennials have resulted in new attempts to increase the seed yield of perennial species,[4] which could result in the creation of new perennial grain crops.[5] Some examples of new perennial crops being developed are perennial rice and intermediate wheatgrass.”

vii. Anaconda Plan.

“The Anaconda Plan or Scott’s Great Snake is the name widely applied to an outline strategy for subduing the seceding states in the American Civil War. Proposed by General-in-Chief Winfield Scott, the plan emphasized the blockade of the Southern ports, and called for an advance down the Mississippi River to cut the South in two. Because the blockade would be rather passive, it was widely derided by the vociferous faction who wanted a more vigorous prosecution of the war, and who likened it to the coils of an anaconda suffocating its victim. The snake image caught on, giving the proposal its popular name.”

viii. Caesar cipher (featured).

“In cryptography, a Caesar cipher, also known as Caesar’s cipher, the shift cipher, Caesar’s code or Caesar shift, is one of the simplest and most widely known encryption techniques. It is a type of substitution cipher in which each letter in the plaintext is replaced by a letter some fixed number of positions down the alphabet. For example, with a shift of 3, A would be replaced by D, B would become E, and so on. The method is named after Julius Caesar, who used it in his private correspondence.

The encryption step performed by a Caesar cipher is often incorporated as part of more complex schemes, such as the Vigenère cipher, and still has modern application in the ROT13 system. As with all single alphabet substitution ciphers, the Caesar cipher is easily broken and in modern practice offers essentially no communication security.”

If you don’t really know much about cryptography but would like a quick and accessible introduction to the subject matter, I recommend Brit Cruise’s videos on the subject at Khan Academy.
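And since the Caesar cipher fits comfortably in a few lines, here is a small sketch in Python (uppercase letters only, as in the classical presentation); the loop at the end is the standard brute-force break – with only 25 non-trivial shifts there is nothing to do but try them all and read off the one that makes sense.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar(text: str, shift: int) -> str:
    """Shift each letter by `shift` positions down the alphabet; leave everything else alone."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)

ciphertext = caesar("ATTACK AT DAWN", 3)
print(ciphertext)                  # DWWDFN DW GDZQ
print(caesar(ciphertext, -3))      # decrypt by shifting back

# Breaking it is trivial: try all 25 shifts and eyeball the output.
for k in range(1, 26):
    print(k, caesar(ciphertext, -k))
```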

ix. Water purification. From the article:

“It is not possible to tell whether water is of an appropriate quality by visual examination. Simple procedures such as boiling or the use of a household activated carbon filter are not sufficient for treating all the possible contaminants that may be present in water from an unknown source. Even natural spring water – considered safe for all practical purposes in the 19th century – must now be tested before determining what kind of treatment, if any, is needed. Chemical and microbiological analysis, while expensive, are the only way to obtain the information necessary for deciding on the appropriate method of purification.

According to a 2007 World Health Organization (WHO) report, 1.1 billion people lack access to an improved drinking water supply, 88 percent of the 4 billion annual cases of diarrheal disease are attributed to unsafe water and inadequate sanitation and hygiene, and 1.8 million people die from diarrheal diseases each year. The WHO estimates that 94 percent of these diarrheal cases are preventable through modifications to the environment, including access to safe water.[1] Simple techniques for treating water at home, such as chlorination, filters, and solar disinfection, and storing it in safe containers could save a huge number of lives each year.[2] Reducing deaths from waterborne diseases is a major public health goal in developing countries.”

Here’s a related paper on ‘Global Distribution of Outbreaks of Water-Associated Infectious Diseases‘ which I’ve previously blogged here.

June 6, 2012 Posted by | Astronomy, Biology, Botany, Cryptography, Geography, History, Infectious disease, Mathematics, Medicine, Microbiology, Physics, Wikipedia | Leave a comment

(More) random stuff

Can’t let the blog die so I sort of have to at least post something from time to time. So here goes…

1. Global sex ratios:

At birth: 1.07 male(s)/female
Under 15 years: 1.07 male(s)/female
15-64 years: 1.02 male(s)/female
65 years and over: 0.79 male(s)/female
Total population: 1.01 male(s)/female (2011 est.)

Link. Here’s an image of child sex ratios in India (via brownpundits:

Here’s one for the whole population, image credit: Wikipedia (much larger version at the link):

I’ve from time to time read about the Chinese gender ratio problem, but I didn’t know there was much going on on that score in India. The clustering of gender ratio frequencies seems to me sufficiently non-random to merit some explanation or other, especially when it comes to the northern states (Punjab, Haryana & Kashmir). Here’s a pic dealing with more countries:

Link.

2. Gambler’s ruin. I remember having read about this before, but you forget that kind of stuff over time, so it’s worth rehashing. I think the version of the idea I’ve seen before is the first of the four in the article: ‘a gambler who raises his bet to a fixed fraction of bankroll when he wins, but does not reduce it when he loses, will eventually go broke, even if he has a positive expected value on each bet.’ I assume all readers of this blog already know about the Gambler’s fallacy, but in case one or two of you don’t, do click the link (and go here afterwards – there’s lots of good stuff at that link, and I shall quote from it below as well). That one is likely far more important in terms of ‘useful stuff to know’ because we’re so prone to committing this error; basically the important thing to note there is that random and independent events are actually random and independent.
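The quoted variant is easy to convince yourself of with a quick simulation: raise the bet to a fixed fraction of the bankroll after a win, never lower it after a loss, and even a gambler with a genuine edge is eventually wiped out by a losing streak. A minimal sketch – the win probability, fraction and round cap are arbitrary values I chose for illustration:

```python
import random

def ruined(p_win=0.55, fraction=0.2, bankroll=100.0, max_rounds=100_000) -> bool:
    """Simulate the 'raise on a win, never reduce on a loss' strategy at even money.

    With fraction=0.2, five consecutive losses after any win exhaust the bankroll,
    because the bet is never reduced - so ruin comes despite the positive edge.
    """
    bet = fraction * bankroll
    for _ in range(max_rounds):
        if random.random() < p_win:
            bankroll += bet
            bet = fraction * bankroll   # raise the bet after a win...
        else:
            bankroll -= bet             # ...but never reduce it after a loss
        if bankroll <= 0:
            return True
    return False

trials = 1000
print(sum(ruined() for _ in range(trials)) / trials)  # fraction ruined within the cap
```

With these numbers the ruin fraction comes out very close to 1: the bet keeps ratcheting upwards with the bankroll, so sooner or later a modest run of losses is enough to exhaust it, positive expected value per bet notwithstanding.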

A couple of statistics quotes from the tvtropes link:

“The Science Of Discworld books have an arguably accurate but somewhat twisted take on statistics: the chances of anything at all happening are so remote that it doesn’t make sense to be surprised at specific unlikely things.”

“There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.” (Mark Twain. Maybe it’s more of a science quote really – or perhaps a ‘science’ quote?)

“People (especially TV or movie characters who are against the idea of marriage) often like to cite the “50 percent of marriages end in divorce” statistic as the reason they won’t risk getting hitched. That is actually a misleading statistic as it seems to imply that half of all people who get married will wind up divorced. What it doesn’t take into account is the fact that a single person could be married and divorced more than once in a single lifetime. Thus the number of marriages will exceed the number of people and skew the statistics. The likelihood that any one person chosen at random will be divorced during their lifetime is closer to 35 percent (the rate fluctuates wildly for males, females, educated and uneducated populations). It’s still a huge chunk of people, but not as high a failure rate for marriage for an individual as the oft-cited “50 percent of all marriages” statistic would leave you to believe.” (comment after this: “How can you give that setup and not deliver the punchline. “But the other half end in death!””)

A mathematics quote:

“Black Mage: 2 + 2 = 4
Fighter: You can’t transform numbers into other numbers like that. It’d just go on forever. That’s like Witchcraft! ”

3. Messier 87. Interesting stuff, ‘good article’, lots of links.

4. Substitution cipher. I’d guess most people think of codes and codebreaking within this context:

“In cryptography, a substitution cipher is a method of encryption by which units of plaintext are replaced with ciphertext according to a regular system; the “units” may be single letters (the most common), pairs of letters, triplets of letters, mixtures of the above, and so forth. The receiver deciphers the text by performing an inverse substitution.

Substitution ciphers can be compared with transposition ciphers. In a transposition cipher, the units of the plaintext are rearranged in a different and usually quite complex order, but the units themselves are left unchanged. By contrast, in a substitution cipher, the units of the plaintext are retained in the same sequence in the ciphertext, but the units themselves are altered.

There are a number of different types of substitution cipher. If the cipher operates on single letters, it is termed a simple substitution cipher; a cipher that operates on larger groups of letters is termed polygraphic. A monoalphabetic cipher uses fixed substitution over the entire message, whereas a polyalphabetic cipher uses a number of substitutions at different times in the message, where a unit from the plaintext is mapped to one of several possibilities in the ciphertext and vice-versa.”
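To make the monoalphabetic case above concrete: instead of a fixed shift (as in the Caesar cipher quoted earlier), the key is an arbitrary permutation of the alphabet, giving 26! possible keys – which sounds impressive but falls quickly to frequency analysis. A minimal sketch, uppercase letters only, with a randomly generated key for the example:

```python
import random
import string

ALPHABET = string.ascii_uppercase

def make_key() -> str:
    """A key is just a shuffled alphabet: plaintext letter i maps to key[i]."""
    letters = list(ALPHABET)
    random.shuffle(letters)
    return "".join(letters)

def substitute(text: str, key: str, decrypt: bool = False) -> str:
    """Apply the substitution (or its inverse) letter by letter."""
    src, dst = (key, ALPHABET) if decrypt else (ALPHABET, key)
    return text.upper().translate(str.maketrans(src, dst))

key = make_key()
ciphertext = substitute("DEFEND THE EAST WALL", key)
print(key)
print(ciphertext)
print(substitute(ciphertext, key, decrypt=True))  # recovers the plaintext
```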

The related one-time pad stuff is quite fascinating; that encryption scheme has literally been proven unbreakable if applied correctly (it has other shortcomings, though…).
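The one-time pad itself is tiny to write down – XOR the message with a truly random key of the same length, used once and then thrown away – and the 'other shortcomings' are visible immediately: the key is as long as the message, has to be shared securely in advance, and must never be reused. A minimal sketch:

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    if len(key) != len(data):
        raise ValueError("one-time pad key must be exactly as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"MEET ME AT NOON"
key = secrets.token_bytes(len(message))   # truly random, used once, then discarded

ciphertext = xor_bytes(message, key)
recovered = xor_bytes(ciphertext, key)    # XOR with the same key undoes the XOR

print(ciphertext.hex())
print(recovered)                          # b'MEET ME AT NOON'

# Why reuse is fatal: XORing two ciphertexts made with the same key cancels the
# key entirely, leaving the XOR of the two plaintexts for the attacker to work on.
```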

5. Evolution may explain why baby comes early.

“there’s only so much a human female pelvis can increase in terms of width before serious functional problems in locomotion make change in that direction unfeasible. […] If the pelvis was prevented from getting any wider due to biomechanics, and a large adult brain was a necessary condition of high fitness value for humans, then one had to accelerate the timing of childbirth so that the neonate exited while the cranium was manageable in circumference.”

Interesting stuff.

6. Random walk. The article actually has some stuff related to the previous remarks on gambler’s ruin.
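To tie the two links together: unit bets on a fair coin are exactly a symmetric simple random walk, and because such a walk is recurrent, a gambler with a finite bankroll playing against an arbitrarily rich opponent goes broke with probability 1, even though the game is fair. A quick simulation sketch (the starting bankroll and the step cap are arbitrary illustration values):

```python
import random

def steps_until_broke(bankroll=10, max_steps=1_000_000):
    """Symmetric +1/-1 walk started at `bankroll`; returns the step at which it
    first hits 0, or None if that doesn't happen within the cap."""
    position = bankroll
    for step in range(1, max_steps + 1):
        position += 1 if random.random() < 0.5 else -1
        if position == 0:
            return step
    return None

results = [steps_until_broke() for _ in range(100)]
hits = sorted(r for r in results if r is not None)
print(f"{len(hits)}/{len(results)} walks hit 0 within the cap; "
      f"median hitting time: {hits[len(hits) // 2] if hits else 'n/a'}")
```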

April 18, 2011 Posted by | Astronomy, Cryptography, Data, Demographics, Evolutionary biology, marriage, Mathematics, Random stuff, Statistics, Wikipedia | 2 Comments