
Insecurity in Depth

If I put a fence with a hole in it in front of a broken wall in front of a partly filled in moat, is my castle secure?

The answer is ‘No’.

On the other hand, if the defects are not immediately visible and do not line up with each other, then these three layers could stop some attackers completely, while others would need time to find the flaw in each. The layers could thus require more time and effort on the part of an attacker.

If everyone in the village knows about the flaws, then there might as well not be any barriers. If every weekend they walk through the various openings to have a picnic on the castle grounds, then all know that these barriers are not meaningful, at least to those who are informed.

It is interesting that Defense in Depth was supposedly conceived by the NSA, or at least documented by them, the masters of penetrating systems. To be honest, security in depth has its place, since one of the rationales is that attackers may come from different points in the system, so different security measures may be needed to address different aspects of the overall concern. As the NSA notes, an understanding of the system, adversaries, risks and so on is required. Thus “security in depth” has a place as part of a broader understanding but is not functional merely as a mantra.

Security in Depth is mentioned repeatedly in the OPM oversight hearing, which is interesting viewing for both the questions and the answers (or lack of answers). Mention of security in depth is usually followed by a statement that there is no security silver bullet (other than security in depth).

There is an alternative to security in depth: security through simplicity.

Take the case of the OPM, where it is speculated that security clearance background check forms (form SF-86) were taken, each containing a wealth of personal information about an individual and their contacts. Security technologies failed to prevent the breach, or even to detect it while it was in progress. (While the OPM is not disclosing details, apparently there were first breaches of the contractors working for OPM, then at least two subsequent breaches; information on one later breach was loaded into Einstein, an intrusion detection and analysis system, which then flagged a previously unknown earlier breach.)

Rather than piling up all these questionable and complex technologies, wouldn’t it have been simpler and safer to document and follow a single governance rule:

“All clearance forms and their related documentation, including backups, will be immediately and completely destroyed following the decision whether to grant clearance on the basis of those forms.”

The principle here is that the information is collected to make a decision, so once the decision is made, get rid of the information. The only reason to keep it is that, in the event a mistaken decision was made, one could go back and look for signs that should have revealed the mistake. Is the ability to go back worth the time, costs and risks of keeping the information? It seems not.

During the OPM hearings the question of priorities came up, with the theme of “Isn’t security your #1 priority, so why did you let this happen?”. There was no clear statement of the obvious, which might have been ‘No, security was not the only priority. The priority was the running of operational support systems for other functions, with security as an aspect of that.’

So if those in charge are not willing to destroy the records once a decision is made, what would be the next best alternative? Probably to keep those records on a machine without internet/network access in a locked room. This would raise the cost of adding or reviewing records. But why should they be online once a decision is made?

All of this leads to the question of whether the costs and risks of (in)security in depth even needed to be the primary concern in this case, when a policy decision to ‘eliminate records that have served their purpose’ might have sufficed.

The core problem might not have been technology mechanisms or the speed of their deployment, but rather governance decisions.

Taking Time

Many things in nature take time.

There is a certain time frame (typically nine months) to have a baby; you cannot accelerate the process. If honey hardens you can soften it in warm water, but that takes time: boiling water won’t make it faster, it will just ruin the honey. You cannot accelerate the process so that you have teenagers the week after they are born; it takes time to get there.

Why in business do we limit ourselves to very short time horizons, when some things take time?

The answer is presumably that we do not know the outcome, so we want to manage the risk and limit the cost. With a pregnancy you can pretty much expect it will take nine months; there is past experience.

That said, there are companies that are doing extremely well since they do take a long view, ignoring the chatter around them. We know who they are.

Annotating the Internet of Things: Annotate Automation

As sensors proliferate, the quantity and variety of data will become overwhelming and difficult to process efficiently and effectively. Information of value will be derived from multiple sensors, possibly of different types and with different creators and software. This situation will call for standardization for interoperability, yet the standardization must be able to scale and must not require centralized management.

Adoption will also require simplicity and usability. Conceptually, an extensible annotation mechanism will allow arbitrary information to be associated with sensor information, then shared and used in an open and extensible manner. The key idea is that rather than requiring every single sensor schema/data structure to define additional data formats or extension mechanisms, annotations can be added in a uniform manner at any time and at any point in the processing flow, with the definitions of what the annotations mean deferred until they are needed and used.

In some ways the Internet of Things may turn out like “web services” were envisioned earlier (e.g. WS-*, SOAP, etc.): there will be multiple sources of information that send messages, which may be aggregated and correlated by intermediaries that in turn serve as sources for further recipients. Ultimately there will be a sink providing information to a “user”, although this may be a software application (not depicted in the diagram below). Annotations could serve as an interoperable means to associate semantic information. The following diagram [1] might represent this, though it comes from a paper about a specific sports monitoring application.

[Figure: Network of sensors]

One can envision a world where sensors use annotations to provide metadata associated with sensor data, such as calibration, sensitivity, location and environmental readings relevant to the sensor’s core data. One can also imagine web intermediaries adding annotations (e.g. weather data to associate with the basic readings of a thermostat), or humans or other systems adding annotations later. All of this additional information can aid the correlation and processing of the data. There are also numerous practical applications of annotations beyond sensors.
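To make this concrete, here is a minimal sketch of what such an annotated reading might look like in JSON-LD. Everything in it is invented for illustration: the context URL, the types and the property names do not come from any existing sensor vocabulary; they simply show the pattern of a core reading carrying extra, independently defined metadata.

{
  "@context": "http://example.com/contexts/sensor.jsonld",
  "@type": "TemperatureReading",
  "sensor": "greenhouse-7-thermometer-2",
  "value": 21.4,
  "unit": "Celsius",
  "timestamp": "2015-07-01T14:30:00Z",
  "annotation": [
    {
      "@type": "CalibrationNote",
      "lastCalibrated": "2015-06-15",
      "accuracy": "+/- 0.2 Celsius"
    },
    {
      "@type": "LocationNote",
      "latitude": 37.3861,
      "longitude": -122.0839
    }
  ]
}

The point is that an intermediary could later append, say, a weather entry to the same list without the original sensor schema having anticipated it, and a consumer that does not understand a given annotation type can simply ignore it.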

The web community has already defined widely-used core mechanisms such as HTTP and REST APIs, JSON, HTML5, etc. that can be used to form a stack for sensor sharing. One aspect of this stack will be the need to share the meaning of data, so that it can be combined and used by applications that offer more value. The semantic web community has worked for years to create a strong, flexible, well-defined model covering many of the needed aspects, including a powerful triple model and the use of URLs for type definitions. Semantic web adoption has occurred behind the scenes but has not been very user-visible, since some technologies are verbose and complicated (RDF) and some discussions tend to be obscure (e.g. debates about ontology theory). None of this takes away from the fact that there is a well-defined and tested infrastructure that has been analyzed and developed over years.

The definition of a simple approach to associate JSON name-value pairs with triples (or quads) in the semantic web model, through the inclusion of simple JSON context definition files, can be considered a breakthrough. This JSON-LD approach hides the semantic web machinery from web developers and does not require the explicit use of RDF, yet it enables the full power of the semantic web to be used behind the scenes when information is processed, without burdening information creators (or creating large data traffic to represent the information). The following hides a richer model behind an easy-to-use syntax [2]:

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Restaurant",
  "name": "Fondue for Fun and Fantasy",
  "description": "Fantastic and fun for all your cheesy occasions",
  "openingHours": "Mo,Tu,We,Th,Fr,Sa,Su 11:30-23:00",
  "telephone": "+155501003333",
  "menu": "http://example.com/menu"
}
</script>

This means that the semantic layer in the diagram above need not imply syntax that is hard to understand and thus hard to adopt, nor excessive markup or message sizes that create a barrier to deployment.

The W3C Annotation Community Group has already created an Open Annotation Data Model that leverages the semantic web model to represent a wide variety of annotation use cases, whether the target of the annotation is text, audio, video, raw data or what have you, while also enabling a wide variety of annotations on those targets. This flexible model does not have to be expressed in JSON-LD, but I believe that JSON-LD will pave the way toward rapid adoption.
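To give a feel for the shape of the model, here is a rough sketch of an annotation of the hypothetical sensor reading above, this time adding weather data from an intermediary. The property names and identifiers reflect my reading of the Community Group drafts and are illustrative only; the eventual Working Group deliverables may well differ, and the example.com URLs are invented for this example.

{
  "@context": "http://www.w3.org/ns/oa.jsonld",
  "@type": "oa:Annotation",
  "hasBody": {
    "@type": "http://example.com/types/WeatherObservation",
    "condition": "sunny",
    "outdoorTemperature": 28.0
  },
  "hasTarget": "http://example.com/readings/greenhouse-7-thermometer-2/2015-07-01T143000Z",
  "motivatedBy": "oa:describing",
  "annotatedBy": "http://example.com/services/weather-enricher",
  "annotatedAt": "2015-07-01T14:31:05Z"
}

The essential pattern is simply a body (the added information) linked to a target (the thing being described), with the rest of the semantic web machinery available behind the scenes but not in the way.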

The W3C Web Annotation Working Group will produce standards to address these needs. This work does not start from scratch; it builds on the previous work of the Community Group. As outlined in the Annotation Working Group charter, the deliverables will include key components needed to make annotations useful:

  1. Abstract Data Model: An abstract data model for annotations
  2. Vocabulary: A precise vocabulary describing/defining the data model
  3. Serializations: One or more serialization formats of the abstract data model, such as JSON/JSON-LD or HTML
  4. HTTP API: An API specification to create, edit, access, search, manage, and otherwise manipulate annotations through HTTP
  5. Client-side API: A script interface and events to ease the creation of annotation systems in a browser, a reading system, or a JavaScript plugin
  6. Robust Link Anchoring: One or more mechanisms to determine a selected range of text or portion of media that may serve as a target for an annotation within, in a predictable and interoperable manner, with allowance for some degree of document changes; these mechanisms must work in HTML5, and must provide an extension point for additional media types and formats

Take a look at the Web Annotation Architecture diagram.

Creating a layer in the stack to enable collecting and combining sensor information in a meaningful way will require the means to associate additional information with sensor data (data model and vocabulary), a means to share the model in a concrete representation (serialization), and application interfaces (APIs).
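The HTTP API has not been defined yet, but to make the idea concrete, creating an annotation could be as simple as posting a JSON-LD document to an annotation service. The sketch below is purely illustrative: the host, path, headers and response are assumptions of mine, not the Working Group’s protocol.

POST /annotations/ HTTP/1.1
Host: annotations.example.com
Content-Type: application/ld+json

{
  "@context": "http://www.w3.org/ns/oa.jsonld",
  "@type": "oa:Annotation",
  "hasBody": { "@type": "http://example.com/types/WeatherObservation", "condition": "sunny" },
  "hasTarget": "http://example.com/readings/greenhouse-7-thermometer-2/2015-07-01T143000Z"
}

HTTP/1.1 201 Created
Location: http://annotations.example.com/annotations/42

A client or intermediary could then retrieve, search or update the annotation at the returned URL, independently of whoever produced the original sensor data.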

Sensors are only one area where open annotations will have value. There are many use cases, including annotations of e-books, web pages, audio, video, maps, portions of data sets and many more. Annotations are fundamental to human interaction and, as I suggest in this post, also to automated systems such as those using sensors.

To learn more about the basis for the W3C Annotation WG effort see the W3C Workshop on Annotations report and the materials from the I Annotate conference.


[1] Combining Wireless Sensor Networks and Semantic Middleware for an Internet of Things-Based Sportsman/Woman Monitoring Application, Jesús Rodríguez-Molina, José-Fernán Martínez, Pedro Castillejo, and Lourdes López, in Sensors 2013. See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3649371/

[2] See What is JSON-LD? A Talk with Gregg Kellogg by Aaron Bradley on September 10, 2014 and schema.org examples for location pages.

The Problem with Defaults

Recently I was using a map application on my phone, an application that gives turn-by-turn driving directions and works offline without a network connection. It works very well, but it provided a lesson in defaults.

Typically I use the application as it comes “out of the box”, preferring highway travel as it is typically faster and simpler. Being in California’s Silicon Valley, I decided after one agonizing rush-hour drive on 87 and 101 that maybe back roads would be better, so I changed a preference to disable highway travel. To my delight I discovered that back roads were much preferable, especially on short trips from San Jose to Mountain View, for example. Why bother sitting on 101 if you do not need to?

Everything was fine for a few days until I decided to leave California and drive back to SFO (the San Francisco airport, for you non-frequent travelers who haven’t memorized a wide variety of airport codes). Guess what: I started driving and soon realized I was getting a local sightseeing tour of San Jose, and had a pretty good idea I was not aimed at the highway entrance leading to 101. I definitely wanted to use 101 for that drive (or so I thought; has traffic really gotten that bad in the Valley, or did I just hit a bad day?). I pulled off the road, changed the preference, and then turned off the confused device, since I had neither the time nor the patience to wait while it sorted itself out. I made my own decisions on how to get to 87/101 and the problem was solved.

There are two lessons here. First, it is easy to forget about preferences (that is the idea and why they are “defaults” after all). Second, recovery might require some “out of band” effort, like giving up on the tool, making a manual (human, dare I suggest) correction.

My navigation experience was not a problem because I was somewhat familiar with the area, was not really relying on the device after a few days, and could just “punt”. If I had really needed to, I could have driven around a while until the device (hopefully) oriented itself.

I’m not sure what would happen in a case where a preference is related to privacy, but I suspect that I would not be able to recover, as the personal data would already have been deposited in a giant “big data” store somewhere, ready to be sold, shared and used without my control or knowledge. Thus, if I choose to set a default to remember my decision to grant access (to location, address book, camera, microphone, etc.), forgetting this decision might be more serious. Although I do not use many such apps now, someday I might (1). If I forget the default, seeing an indicator in the chrome probably won’t help, as ads are training me to ignore every pane except the text in the pane I care about (2).

So let us say I mistakenly forget my privacy settings and realize it later. Is there a manual, human way to recover? Ideally I would go to the record on my device of which databases the apps shared information with, follow the links and request that the data be removed, which it would be. That would be nice, but I suspect not so likely.

Thus perhaps a more significant change might be needed if user privacy matters. The best story I’ve heard is that the new currency is your information, and thus it should be marked appropriately, shared conservatively, and we should all participate in the monetization. Obviously this will require some work, but seems very interesting. Privacy will be a byproduct of the monetization, not the end in itself.

Asides:

(1) Perhaps someone can explain to me why so many Lumia apps seem to require knowing my location to be installed. For example, why does a battery level app need to know my location? I can only assume it is not for me, the end-user, but for ad delivery.

(2) On Safari in Reader mode the browser removes the non-interesting material, a feature that is very useful, and on Firefox I can suppress ads with an extension, but many times I find myself in a raw, ad-splattered browser window when I forget to take special action.