The old and new Gutenberg

Johannes Gutenberg is well known as the guy who invented the printing press. Thanks for that, mate. At the time this was a huge thing. Before Johannes's revolutionary method enabled the production of books on a large scale, books were a luxury item. Suddenly information could be distributed economically and accessed easily by the masses, although some production and distribution costs remained. With the advent of the internet those costs were dramatically reduced. Only one caveat remained: the reading experience in the digital medium was, until recently, not as good as that of the traditional physical book.

Fortunately we got Gutenberg 2.0, also known as electronic ink technology, in the form of the ebook reader. An e-reader is a digital device equipped with an electronic ink screen instead of a traditional light-emitting screen. The reading experience is finally something close to a traditional book (many people disagree, however, so take this with a grain of salt). To be more accurate, these devices resemble a library more than a book. Most of them have several gigabytes of internal storage, enough to hold hundreds or even thousands of books. So now you don't carry just your favourite book but your whole collection.

The differences

The distinctions don't end here. Besides reading, a book is used for other things too: you usually take notes and highlight relevant passages. Traditionally this was done with a pen and a piece of paper. Now your highlights and annotations are made on the device, and the old piece of paper is a database in which your activity is recorded as meta-information.

And there is more. A staggering amount of meta-information is also being recorded and sent off for market analysis. This is where the similarities between the old and the new world break down. It is a bold claim and, as expected, extraordinary claims require extraordinary evidence.

I have an ereader, a Kobo. Kobo was my choice because, unlike Amazon's Kindle, they have a very open attitude. They have their Kobo OS on GitHub and they neither obfuscate nor hide the inner workings of the device. So you can, as I did, inspect and analyse the meta-information being gathered and pushed to their servers. And boy, this is an interesting exercise.

The Kobo ereader database

When connected to your laptop, the Kobo ereader is mounted as a local drive containing a .kobo directory. Inside it there is an SQLite database with the recorded meta-information.

sqlite> .tables  
AbTest                 OverDriveCards         WordList  
Achievement            OverDriveCheckoutBook  content  
Activity               OverDriveLibrary       content_keys  
AnalyticsEvents        Reviews                content_settings  
Authors                Rules                  ratings  
BookAuthors            Shelf                  shortcover_page  
Bookmark               ShelfContent           user  
DbVersion              SubscriptionProducts   volume_shortcovers  
Dictionary             SyncQueue              volume_tabs  
DropboxItem            Tab  
Event                  Wishlist  

These are the tables Kobo uses to store all your meta-information. A few of them are of obvious interest: as expected, your activity is stored mainly in the Activity and AnalyticsEvents tables.
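The same listing can be produced from a script rather than the sqlite shell. The following is a minimal sketch using Python's standard sqlite3 module; the mount point and the database file name in DB_PATH are assumptions and will differ per system, and the helper names are made up for illustration. The database is opened read-only so that nothing on the device is changed.

```python
import sqlite3

# Assumption: the device is mounted here and the database file has this name.
# Adjust both to whatever you find inside the .kobo directory on your system.
DB_PATH = "/media/KOBOeReader/.kobo/KoboReader.sqlite"


def open_read_only(path):
    """Open the database read-only so nothing on the device is modified."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def list_tables(conn):
    """Return the names of all tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]
```

With the device connected, `list_tables(open_read_only(DB_PATH))` should return the same names that `.tables` prints.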

sqlite> .schema Activity
CREATE TABLE Activity (
    Id      TEXT,
    Enabled BIT default TRUE,
    Type    TEXT,
    Action  INTEGER,
    Date    TEXT,
    Data    BLOB,
    PRIMARY KEY(Id, Type)
);
CREATE INDEX activity_id_index ON Activity (Id);
sqlite> .schema AnalyticsEvents  
CREATE TABLE AnalyticsEvents(  
    Id TEXT, 
    Type TEXT, 
    Timestamp TEXT, 
    Attributes TEXT, 
    Metrics TEXT, 
    TestGroups TEXT, ClientApplicationVersion TEXT, Mandatory BIT DEFAULT FALSE, 
CREATE INDEX analytics_events_timestamp ON AnalyticsEvents (Timestamp);  

Both tables have a very interesting column: Type. To understand these two tables a little better, I queried the distinct values of this column.

select distinct type from Activity;

'Library, WhatsNew, CategoryFTE, Help, QuickTour, Sync, RecentBook, RelatedItems, Bookstore, TopPicksTab, NewReleases, Top50, ArticlesAdded, RecentFTEBook, RecentPocketArticle, Recommendations, Extras, SideLoaded'

select distinct type from AnalyticsEvents;

'WifiSettings, StatusBarOption, BatteryLevelAtSync, WifiToggle, FilterSelected, AccessLibrary, AdobeErrorEncountered, OpenContent, CreateHighlight, LeaveContent, OpenReadingSettingsMenu, MainNavOption, Books, DictionaryLookup, BookProgress, ReadingSandboxUsed, FinishedReadingBook, LibraryTabSelected, StartReadingBook, PocketSectionOverflowMenu, AddToCollection, BrightnessAdjusted, AmbientLightSensorToggled, Search, TableOfContentsOpened, PluggedIn'  
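Knowing which types exist is a start; knowing how many rows each one has shows where most of the tracking happens. A small sketch, assuming a sqlite3 connection to the Kobo database is already open (the function name is hypothetical):

```python
import sqlite3


def count_by_type(conn, table):
    """Return {type: row_count} for one table, most frequent type first."""
    # Table names cannot be bound as SQL parameters, so accept only known names.
    if table not in ("Activity", "AnalyticsEvents"):
        raise ValueError(f"unexpected table: {table}")
    rows = conn.execute(
        f"SELECT Type, COUNT(*) FROM {table} "
        "GROUP BY Type ORDER BY COUNT(*) DESC, Type"
    )
    return dict(rows)
```

Calling `count_by_type(conn, "AnalyticsEvents")` returns a dictionary whose first keys are the event types recorded most often.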

Want to know when I last opened the bookstore on my Kobo? Easy. (Last since the most recent synchronization, that is: from the data I observed, these two tables seem to be truncated every time the information is uploaded to the server.)

select * from Activity as a where a."Type"='Bookstore';

I don't know what an Action of 2 really means, but it is clear that this kind of activity can be found.

Things became interesting when I looked at the SideLoaded type in the Activity table. Sideloading is a term similar to "upload" and "download", but it refers to transferring files between two local devices, in particular between a computer and a mobile device such as a smartphone, tablet, portable media player or e-reader. So if the semantics hold, we should find interesting data here.

select * from Activity as a where a."Type"='SideLoaded';
"file:///mnt/onboard/david gerard-attack of the 50 foot blockchain_ bitcoin, blockchain, ethereum & smart contracts-smashwords (2017).epub",true,SideLoaded,2,"2018-03-24T00:53:34"
file:///mnt/onboard/randall munroe-what if__ serious scientific answers to absurd hypothetical questions-houghton mifflin harcourt (2014).epub,true,SideLoaded,2,"2018-03-24T00:54:31"  
file:///mnt/onboard/max tegmark-our mathematical universe_ my quest for the ultimate nature of reality-knopf (2014).epub,true,SideLoaded,2,"2018-03-24T00:55:10"

Wow. Apparently Kobo knows which books I have on my ereader, when I added them and by which method. How about that for a right to privacy? Don't jump to accusations, though, and don't misunderstand me. Kobo is actually the most transparent company: they do this knowing how easy it is to get at this information, and nevertheless they don't try to hide it. They should be praised for being this open. At least with Kobo we know what they are collecting. Disclaimer aside, this is scary, right? Well, it doesn't end here.
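To see how little effort it takes to turn those SideLoaded rows into a reading list, here is a short sketch. It assumes, as in the rows shown, that the Id column holds a file:// URI and the Date column the moment the file was added; the function names are made up for illustration.

```python
import sqlite3

FILE_PREFIX = "file://"


def file_name(uri):
    """Strip the file:// prefix and the directories, keeping the file name."""
    path = uri[len(FILE_PREFIX):] if uri.startswith(FILE_PREFIX) else uri
    return path.rsplit("/", 1)[-1]


def sideloaded_files(conn):
    """Return (file_name, date_added) pairs for every SideLoaded row."""
    rows = conn.execute(
        "SELECT Id, Date FROM Activity WHERE Type = 'SideLoaded' ORDER BY Date"
    )
    return [(file_name(uri), date) for uri, date in rows]
```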

Now let's look at the second table, AnalyticsEvents. What a fancy name. What do they mean by analytics events? We'll soon find out.

"select * from AnalyticsEvents as a where a.Type='WifiSettings'": [
    {
        "Id" : "08eb6209-d4ad-48a5-80ec-2e78818714c0",
        "Type" : "WifiSettings",
        "Timestamp" : "2019-10-27T16:02:27Z",
        "Attributes" : "{}",
        "Metrics" : "{}",
        "TestGroups" : "{\"001f6e06-9025-4a2b-af51-9abb5084ce48\": 2,\"06ede8e3-0dce-4f91-a943-a7cc3c8d84a8\": 1,\"09f49520-edfc-43c4-9b58-2855ccbc0a16\": 2,\"6aa564c8-1571-4f9f-84e7-6fa411abca91\": 2,\"7b1a118d-0504-4a9d-bf21-75968b710f70\": 2,\"c93fbbf6-ec13-43a6-a922-583ff35e567f\": 2}",
        "ClientApplicationVersion" : "4.18.13737",
        "Mandatory" : "false"
    }
]

There are a few dozen rows like this one, all about my wifi settings. Nothing very critical, so let's proceed with another query.

select * from AnalyticsEvents as a where a.Type='StatusBarOption';
StatusBarOption,"2019-10-28T23:23:56Z","{""Action"": ""Battery""}"  
StatusBarOption,"2019-10-28T23:24:44Z","{""Action"": ""Battery""}"  
StatusBarOption,"2019-10-28T23:25:13Z","{""Action"": ""Wifi""}"  
StatusBarOption,"2019-10-29T08:30:41Z","{""Action"": ""Battery""}"  
StatusBarOption,"2019-10-29T12:09:25Z","{""Action"": ""Battery""}"  
StatusBarOption,"2019-10-29T12:13:40Z","{""Action"": ""Wifi""}"  
StatusBarOption,"2019-10-29T12:13:41Z","{""Action"": ""Light""}"  
StatusBarOption,"2019-10-29T21:12:29Z","{""Action"": ""Search""}"  
StatusBarOption,"2019-10-29T21:12:31Z","{""Action"": ""Wifi""}"  
StatusBarOption,"2019-10-29T21:12:32Z","{""Action"": ""Battery""}"  
StatusBarOption,"2019-10-29T23:38:05Z","{""Action"": ""Battery""}"  
StatusBarOption,"2019-11-01T00:57:36Z","{""Action"": ""Battery""}"  
StatusBarOption,"2019-11-01T08:34:16Z","{""Action"": ""Battery""}"  
StatusBarOption,"2019-11-01T10:00:36Z","{""Action"": ""Battery""}"  

Damn. They know exactly what I did with my status bar, and when. The Kobo interface has a status bar that lets you check the wifi status and the battery status. Now you know when!
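Since the Attributes column is plain JSON text, summarising these rows takes only a few lines. The sketch below tallies the status bar actions; it assumes an open sqlite3 connection and that each row carries an "Action" key as in the output above (the function name is hypothetical).

```python
import json
import sqlite3
from collections import Counter


def status_bar_actions(conn):
    """Count how often each status bar action was recorded."""
    rows = conn.execute(
        "SELECT Attributes FROM AnalyticsEvents WHERE Type = 'StatusBarOption'"
    )
    counts = Counter()
    for (attributes,) in rows:
        # Attributes is stored as JSON text, e.g. {"Action": "Battery"}.
        action = json.loads(attributes or "{}").get("Action", "unknown")
        counts[action] += 1
    return counts
```

For the fourteen rows listed above this gives Battery 9, Wifi 3, Light 1 and Search 1.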

Could be worse, right? They could know, for instance, the exact battery level at the time of each sync. Oh wait...

select * from AnalyticsEvents as a where a.Type='BatteryLevelAtSync';
"2019-10-27T16:02:29Z",{},"{""Battery"": 91}"
"2019-10-28T20:41:36Z",{},"{""Battery"": 83}"

Well, this is only battery information, nothing critical. What about my wifi connection? Do they store any information about that?

select * from AnalyticsEvents as a where a.Type='WifiToggle';
WifiToggle,"2019-10-27T16:02:30Z","{""WifiState"": ""Off""}"  
WifiToggle,"2019-10-28T20:40:59Z","{""WifiState"": ""On""}"  
WifiToggle,"2019-10-28T22:09:34Z","{""WifiState"": ""Off""}"

Yep. They also know when I toggled my wifi. Actually it's worse: they know whether I turned it on or off.

select * from AnalyticsEvents as a where a.Type='PluggedIn';

The PluggedIn event type carries more meta-information, in this case the battery level when the device was plugged in and the time elapsed since the previous plug-in.

PluggedIn,"2019-11-01T08:05:02Z",{},"{""BatteryLevel"": 37,""TimeSinceLastPlugIn"": 428949}"  
PluggedIn,"2019-11-01T10:00:50Z",{},"{""BatteryLevel"": 99,""TimeSinceLastPlugIn"": 6947}"  
PluggedIn,"2019-11-01T10:00:54Z",{},"{""BatteryLevel"": 99,""TimeSinceLastPlugIn"": 4}"  
PluggedIn,"2019-11-01T10:00:54Z",{},"{""BatteryLevel"": 99,""TimeSinceLastPlugIn"": 0}"
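The TimeSinceLastPlugIn numbers are easier to appreciate in human terms. The sketch below assumes the value is in seconds; that is not documented anywhere I know of, but the first two rows above are about 6948 seconds apart by their timestamps and the second row reports 6947, so seconds looks like a safe reading. The function name is made up for illustration.

```python
from datetime import timedelta


def describe_plug_in(metrics):
    """Format the Metrics dict of one PluggedIn row as a readable sentence."""
    # Assumption: TimeSinceLastPlugIn is expressed in seconds.
    elapsed = timedelta(seconds=metrics["TimeSinceLastPlugIn"])
    return f"plugged in at {metrics['BatteryLevel']}% after {elapsed}"


print(describe_plug_in({"BatteryLevel": 37, "TimeSinceLastPlugIn": 428949}))
# prints: plugged in at 37% after 4 days, 23:09:09
```

So the first row says the device went almost five days without a charge.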

Want more good stuff? What about the moment I finish a reading session? You want it, you get it.
Each session is recorded with the book and the progress reached, and the timestamps tell you how long the session lasted. The information is so abundant and so detailed that I'm embarrassed to confess that my Kobo knows more about me than my mom, or even me.

select * from AnalyticsEvents as a where a.Type='LeaveContent';
LeaveContent,"2019-10-27T16:13:51Z","{""ContentFormat"": ""application/epub+zip"",""Monetization"": ""Sideloaded"",""author"": ""Mike Julian"",""isbn"": ""urn:uuid:4c2d19c4-43f9-42b5-b459-e25373feae00"",""progress"": ""30"",""title"": ""Practical Monitoring""}"  
LeaveContent,"2019-10-27T19:56:55Z","{""ContentFormat"": ""application/epub+zip"",""Monetization"": ""Sideloaded"",""author"": ""Mike Julian"",""isbn"": ""urn:uuid:4c2d19c4-43f9-42b5-b459-e25373feae00"",""progress"": ""31"",""title"": ""Practical Monitoring""}"  
LeaveContent,"2019-10-27T20:42:52Z","{""ContentFormat"": ""application/pdf"",""Monetization"": ""Sideloaded"",""author"": ""John Arundel & Justin Domingus"",""progress"": ""63"",""title"": ""Cloud Native DevOps with Kubernetes""}"  
LeaveContent,"2019-10-27T20:52:53Z","{""ContentFormat"": ""application/epub+zip"",""Monetization"": ""Sideloaded"",""author"": ""Mike Julian"",""isbn"": ""urn:uuid:4c2d19c4-43f9-42b5-b459-e25373feae00"",""progress"": ""33"",""title"": ""Practical Monitoring""}"

And by the way, you get the same when you open content:

select * from AnalyticsEvents as a where a.Type='OpenContent';
OpenContent,"2019-10-27T16:03:25Z","{""ContentFormat"": ""application/epub+zip"",""Monetization"": ""Sideloaded"",""ViewType"": ""MyBooks"",""author"": ""Mike Julian"",""isbn"": ""urn:uuid:4c2d19c4-43f9-42b5-b459-e25373feae00"",""progress"": ""29"",""title"": ""Practical Monitoring""}"  
OpenContent,"2019-10-27T19:47:31Z","{""ContentFormat"": ""application/epub+zip"",""Monetization"": ""Sideloaded"",""ViewType"": ""Sleep"",""author"": ""Mike Julian"",""isbn"": ""urn:uuid:4c2d19c4-43f9-42b5-b459-e25373feae00"",""progress"": ""30"",""title"": ""Practical Monitoring""}"  
OpenContent,"2019-10-27T20:04:54Z","{""ContentFormat"": ""application/pdf"",""Monetization"": ""Sideloaded"",""ViewType"": ""MyBooks"",""author"": ""John Arundel & Justin Domingus"",""progress"": ""56"",""title"": ""Cloud Native DevOps with Kubernetes""}"  
OpenContent,"2019-10-27T20:43:10Z","{""ContentFormat"": ""application/epub+zip"",""Monetization"": ""Sideloaded"",""ViewType"": ""MyBooks"",""author"": ""Mike Julian"",""isbn"": ""urn:uuid:4c2d19c4-43f9-42b5-b459-e25373feae00"",""progress"": ""31"",""title"": ""Practical Monitoring""}"

Here you can see my current progress and also which book I opened at which date. But let's proceed. What about the books I start reading?

select * from AnalyticsEvents as a where a.Type='StartReadingBook';
StartReadingBook,"2019-10-28T23:16:21Z","{""ContentFormat"": ""application/epub+zip"",""Monetization"": ""Sideloaded"",""author"": ""Alex Petrov"",""isbn"": ""calibre:55"",""progress"": ""0"",""title"": ""Database Internals""}"  
StartReadingBook,"2019-10-29T14:47:26Z","{""ContentFormat"": ""application/epub+zip"",""Monetization"": ""Sideloaded"",""author"": ""Kaufman, Josh"",""isbn"": ""7f117b41-1e39-43a7-95b4-908911aee56a"",""progress"": ""0"",""title"": ""The Personal MBA""}"
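Combining the OpenContent and LeaveContent rows shown earlier reconstructs complete reading sessions. The sketch below pairs each OpenContent event with the next LeaveContent event for the same title and reports the time in between. It assumes an open sqlite3 connection and timestamps in the format seen above; the function name is hypothetical, and whatever Kobo does with this data on their side is unknown to me.

```python
import json
import sqlite3
from datetime import datetime


def reading_sessions(conn):
    """Return (title, seconds) for each OpenContent/LeaveContent pair."""
    rows = conn.execute(
        "SELECT Type, Timestamp, Attributes FROM AnalyticsEvents "
        "WHERE Type IN ('OpenContent', 'LeaveContent') ORDER BY Timestamp"
    )
    opened_at = {}  # title -> time the book was last opened
    sessions = []
    for event_type, timestamp, attributes in rows:
        title = json.loads(attributes).get("title")
        # Timestamps look like 2019-10-27T16:03:25Z.
        when = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
        if event_type == "OpenContent":
            opened_at[title] = when
        elif title in opened_at:
            elapsed = when - opened_at.pop(title)
            sessions.append((title, int(elapsed.total_seconds())))
    return sessions
```

On the eight rows shown earlier this yields sessions of 626, 564 and 583 seconds for Practical Monitoring and one of 2278 seconds, about 38 minutes, for Cloud Native DevOps with Kubernetes.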

The funniest event is the brightness one:

select * from AnalyticsEvents as a where a.Type='BrightnessAdjusted';
"2019-10-29T12:13:45Z","{""Method"": ""MenuTapped""}","{""NewBrightness"": 71,""OldBrightness"": 0}"
"2019-10-29T12:14:10Z","{""Method"": ""MenuTapped""}","{""NewBrightness"": 0,""OldBrightness"": 71}"

Now what?

I think by now there is enough evidence to convince you that there are some differences between a traditional book and the modern digital medium. Don't get me wrong, I have nothing against this. When I use this kind of equipment I already assume it will gather all kinds of information. The problem is that most people are completely oblivious to it. And while this knowledge can be used for good, it also has the power to be used for very nasty purposes that go against the interests of the paying user of the device.

The reality is that many people are not OK with this kidnapping of personal information. It is this behaviour that justifies technologies living at the other end of the spectrum. Personal VPNs, Tor, web tracking filters and many other tools are created not as criminal tools but as a way to combat the excessive collection of personal data, which can ultimately be used in our interest but also against us. I hope this article helps you understand the privacy issues we currently face and makes you aware of the potential dangers that come with them.
