Econstudentlog

Some data

I spent a bit of time on Statistikbanken, a site run by Statistics Denmark which gives you access to a lot of neat Danish data. Below is a table I made from SKI5, one of the databases:

Divorce 1

The variable to the left is a marriage duration indicator at the time of measurement. Note that the years at the top (1980, 1990, …) are not the years in which the marriages were formed but the years of measurement; the figures look back in time and implicitly include marriages which were dissolved decades ago. So if you take the year 1980, for example, 21% of marriages which had been (or would have been) going on for 10 years had been dissolved through divorce, whereas 36% of marriages which had been (or would have been) going on for 30 years had ended in divorce. When I last looked at this stuff, I didn’t include these particular numbers and I got curious (plus I was bored).

Here’s what happens if you zoom in on the first 10 years of marriage:

Divorce 2

The bolded ones are the cohorts with the highest divorce rate for that specific marriage duration. Interestingly, although the 2012 numbers are generally a bit smaller than the rest, the 1990 numbers are in most cases marginally higher than the 1980 numbers; a constant, ‘rule-based’ (monotonic?) development in divorce risk over time is hard to identify when you demand that it be consistent with the information in the two tables above. That said, the numbers are in my opinion very similar, all things considered. I’d assume that if you could compare these cohorts with earlier cohorts, you’d see more dramatic differences.

Okay, what about cars, buses and so on? How many of those are there in Denmark? This is the kind of question children ask, but most people stop asking these questions when they become adults. I (childishly...) had a look; here are the numbers for the entire country (Statistikbanken, BIL707):

Cars

Despite population growth, there’s been a decrease in the number of Danish buses, vans, and lorries during the last six years: the number of lorries has dropped by 15%, and the number of vans by roughly 10%.

Here are the numbers for Region Hovedstaden, the area around Copenhagen. With 1.7 million people, the region accounts for almost a third of the Danish population:

Cars2

Whereas the population share of the region is around 30%, the 2013 share of car-owners is about 27%, quite close to the national average. This really surprised me; I’d have assumed that the number of car-owners was smaller than this and that people relied more on public transportation. But the proportion of all Danish buses committed to this region is 28.7%, also close to the population share of the region. I’d have expected the numbers to look different: that a biggish proportion of all Danish buses were committed to this region and that the number of car-owners was lower.

Incidentally there’s roughly one bus per 400 people in Denmark.

How many people are actually caught violating the national gun laws? (Strictly speaking these are ‘weapons laws’: they also regulate the use of other weapons such as knives and explosives. For example, in Denmark it’s illegal to carry a knife with a blade longer than 7 centimeters, and until last year a violation of that law would lead to a mandatory one-week prison sentence in the absence of exceptional extenuating circumstances.) I didn’t know, so I got curious. I looked at the data included in STRAF11, and it turns out that there were 6808 violations of the weapons law in Denmark in 2007 (before the knife law mentioned above was introduced in 2008) and 6517 in 2012. That is close to 18 violations per day over the course of the year.
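For what it's worth, the per-day figure is easy to check. A minimal sketch using only the two STRAF11 totals quoted above; the function name is made up for illustration, and the leap-year handling is my own addition (2012 had 366 days):

```python
import calendar

# Yearly totals of weapons-law violations quoted in the text (STRAF11).
violations = {2007: 6808, 2012: 6517}


def violations_per_day(year, total):
    """Average number of violations per calendar day in the given year."""
    days = 366 if calendar.isleap(year) else 365
    return total / days


for year, total in violations.items():
    print(f"{year}: {violations_per_day(year, total):.1f} violations per day")
```

Both years come out at roughly 18 per day, so the leap-year detail does not change the conclusion.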

Computers and internet? How many families own a computer and/or have internet access at home? Unfortunately there are some missing data problems here, but here’s what they’ve got (VARFORBR):

Computer and internet

As you’d expect, internet lags computers a bit, but there seems to have been convergence over time, and by now only a small minority do not have a computer at home. The above data is not, however, all they have when it comes to internet usage. I looked around and found the DIS129 dataset, which deals with active internet subscriptions in Denmark. A funny thing is that the numbers you get from the two datasets don’t really add up: internet penetrance is significantly lower if you base your conclusions on the register data from DIS129 than if you use VARFORBR, which is survey based. (It’s clear from the description that the DIS129 dataset is also partly survey based, but it’s also made clear that the specific data I use here are from the register-based part of that dataset, which contains a lot of data.)

I combined the DIS129 data with the household data from FAM55N to construct a variable indicating the proportion of households with active internet subscriptions (we don’t care about internet subscriptions as such; we care about penetrance/adoption rates). From DIS129 I used private (non-corporate) subscriptions as well as corporate internet subscriptions used by private individuals; i.e. ‘purely corporate’ internet subscriptions were excluded from the sample. The DIS129 data has a data point for each six-month period; I decided I didn’t like that very much, so I averaged the two half-year observations in order to report only one data point per year. Results are given below: first the ‘raw’ (averaged) subscription numbers, then the household data, and lastly the proportion of households with active internet subscriptions:

Active internet users

Households

Internet subscribers

Maybe I should have included the word ‘estimated’ in front of ‘proportion’ in the title above, but all we have are estimates anyway, so… Do note that the x-axes are not identical for the figures based on the VARFORBR and the DIS129 data; what you want to compare is the last graph above and the part of the VARFORBR graph for which the two x-axes match. It’s obvious that the VARFORBR numbers are significantly higher than the DIS129 numbers. In case you were wondering why I don’t compare identical time periods: I figured the development in the 90s was interesting (unsurprisingly, the growth rate was much higher then than it has been later on, and most adoption took place in the 90s), but the register data didn’t go back further than 2003. If it had, I’d have included the data, but I didn’t think it made a lot of sense to exclude the 90s from the VARFORBR data set just because corresponding figures didn’t exist in the DIS129 data set.
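The construction of the proportion variable (average the two half-year subscription counts, then divide by the number of households) can be sketched roughly as follows. This is only an illustration of the procedure: the function names and the input layout are my own assumptions, not how DIS129 or FAM55N are actually delivered, and the numbers in the usage example are placeholders rather than real data.

```python
from collections import defaultdict


def yearly_average(half_year_counts):
    """Average half-year observations into one value per year.

    `half_year_counts` maps (year, half) -> subscription count,
    where half is 1 or 2 (a hypothetical layout chosen for this sketch).
    """
    by_year = defaultdict(list)
    for (year, _half), count in half_year_counts.items():
        by_year[year].append(count)
    return {year: sum(vals) / len(vals) for year, vals in sorted(by_year.items())}


def subscription_share(yearly_subscriptions, households):
    """Subscriptions per household, for the years present in both series."""
    return {
        year: yearly_subscriptions[year] / households[year]
        for year in yearly_subscriptions
        if year in households
    }


# Placeholder inputs, NOT real figures from Statistics Denmark.
example_subscriptions = {(2010, 1): 100.0, (2010, 2): 200.0}
example_households = {2010: 300.0}

averaged = yearly_average(example_subscriptions)
print(subscription_share(averaged, example_households))
```

Note that the result is subscriptions per household, which only approximates the share of households with a subscription if few households hold more than one.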

Purely corporate subscriptions make up roughly 10 percent of the market, so not excluding those when calculating adoption rates may lead to a significant overestimate of household internet use. I believe I’ve seen higher adoption rates than the ones derived from the DIS129 data set reported in the media before, but I also believe these estimates have all been based on surveys by Statistics Denmark, so presumably they’re derived from the VARFORBR data set or its source material. Note that if you’re basing your estimate on the DIS129 sample, you could probably argue that the numbers provided are overestimates of the actual penetrance rates; some households may have more than one active internet subscription, and this arrangement is presumably more common than the one where different households share the same internet connection. On the other hand, they note in the documentation that the registers, despite being very comprehensive, may not be complete and that some relevant data may be missing from them.

Basing our analysis on the register data provided, in the second half of 2011 there were 1.94 million active internet subscriptions used by private individuals, and there were 2.58 million households. I consider the data from DIS129 to be more reliable than the data from VARFORBR; register data is usually better than survey data, although measurement error is always a potential problem. I also think that an overestimate of the adoption rate resulting from the use of survey data, which is likely here given the discrepancy, is more plausible from a theoretical point of view than an underestimate would be; people participating in surveys are more likely to say that they have an internet connection even though they don’t than they are to say that they don’t have one even though they do. I also believe that this bias is likely to be increasing in people’s estimates of the ‘true’ penetrance rate; when you think everybody else has internet access, you become less likely to admit that you don’t. But there are multiple ways to explain the gap. For now, perhaps the important point is that there is a gap, and that this should be kept in mind the next time the media talks about the results of the latest survey (people rarely talk about the results of the latest register update…).
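To put a number on it, the two register figures quoted above imply roughly three subscriptions for every four households. A minimal check, using only those two numbers (the variable names are just for illustration, and this is the second-half-2011 observation, not the yearly average used in the graph):

```python
# Register-based figures quoted in the text, second half of 2011.
active_private_subscriptions = 1.94e6  # DIS129, excluding purely corporate subscriptions
households = 2.58e6                    # FAM55N

share = active_private_subscriptions / households
print(f"Estimated share of households with an active subscription: {share:.1%}")
```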

June 12, 2013 - Posted by | data, demographics, statistics
