
Annotating the Internet of Things: Annotate Automation

As sensors proliferate, the quantity and variety of data will become overwhelming and difficult to process efficiently and effectively. Information of value will be derived from multiple sensors, possibly of different types and with different creators and software. This situation will call for standardization for interoperability, yet that standardization must also scale and not require centralized management.

Adoption will also require simplicity and usability. Conceptually, providing an extensible annotation mechanism will allow arbitrary information to be associated with sensor information, shared and used in an open and extensible manner. The key idea is that rather than requiring every single sensor schema/data structure to define additional data formats or extension mechanisms, annotations can be added in a uniform manner at any time and at any point in the processing flow, with the definitions of what the annotations mean deferred until they are needed and used.

In some ways the Internet of Things may be like “web services” as envisioned earlier (e.g. WS*, SOAP etc.): there will be multiple sources of information that send messages, which may be aggregated and correlated by intermediaries that in turn become sources for further recipients. Ultimately there will be a sink to provide information to a “user”, although this may be a software application (not depicted in the diagram below). Annotations could serve as an interoperable means to associate semantic information. The following diagram [1] might represent this, though it comes from a paper about a specific sports monitoring application.

Network of sensors

One can envision a world where sensors use annotations to provide meta-data associated with sensor data, such as providing calibration, sensitivity, location and environmental readings relevant to the core data of the sensor. One can also imagine web intermediaries adding annotations (e.g. weather data to associate with basic readings of a thermostat) or humans or others later adding annotations. All of this additional information can aid with the correlation and processing of the data. There are also numerous practical applications of annotations beyond sensors.

The web community has already defined widely-used core mechanisms such as HTTP and REST APIs, JSON, HTML5, etc. that can be used to form a stack for sensor sharing. One aspect of this stack will be the need to share the meaning of data, so that it can be combined and used by applications that offer more value. The semantic web community has worked for years to create a strong, flexible, well-defined model including many needed aspects such as a powerful triple model and use of URLs for type definitions. Semantic web adoption has occurred behind the scenes but has not been very user-visible, since some technologies are verbose and complicated (RDF) and some discussions tend to be obscure (e.g. debates about ontology theory). None of this takes away from the fact that there is a well-defined and tested infrastructure that has been analyzed and well developed.

The definition of a simple approach to associate JSON named values with triples (or quads) in the semantic web model through the inclusion of simple JSON definition files can be considered a breakthrough. This JSON-LD approach hides the entire semantic web mechanism from web developers and does not require the apparent explicit use of RDF, yet enables the full power of the semantic web to be used behind the scenes when information is processed, without burdening information creators (or creating large data traffic to represent the information). The following hides a richer model behind an easy to use syntax [2]:

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Restaurant",
  "name": "Fondue for Fun and Fantasy",
  "description": "Fantastic and fun for all your cheesy occasions",
  "openingHours": "Mo,Tu,We,Th,Fr,Sa,Su 11:30-23:00",
  "telephone": "+155501003333",
  "menu": "http://example.com/menu"
}
</script>

This means that the semantic layer in the diagram above need not imply syntax that is hard to understand and thus hard to adopt, nor excessive syntax or message sizes that create a barrier to deployment.
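To make the idea of a richer model behind a simple syntax more concrete, here is a minimal Python sketch of how a consumer might turn the short names from the restaurant example into full identifiers and triples. It is a deliberately simplified stand-in for a real JSON-LD processor: it assumes the context is a single vocabulary URL and simply prefixes each name with it, whereas real processing applies the full context document. The function name `to_triples` and the subject label `_:b0` are illustrative choices, not part of any specification.

```python
import json

DOC = json.loads("""
{
  "@context": "http://schema.org",
  "@type": "Restaurant",
  "name": "Fondue for Fun and Fantasy",
  "telephone": "+155501003333"
}
""")


def to_triples(doc, subject="_:b0"):
    """Naively expand short JSON names into (subject, predicate, object) triples.

    Assumption: "@context" is a single vocabulary URL and every name is
    expanded by prefixing it. A real JSON-LD processor does much more.
    """
    vocab = doc["@context"].rstrip("/") + "/"
    triples = []
    for key, value in doc.items():
        if key == "@context":
            continue
        if key == "@type":
            triples.append((subject, "rdf:type", vocab + value))
        else:
            triples.append((subject, vocab + key, value))
    return triples


if __name__ == "__main__":
    for triple in to_triples(DOC):
        print(triple)
```

The author of the data only writes `"name"`; the longer identifier `http://schema.org/name` appears only when the information is processed.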

The W3C Annotation Community Group has already created an Open Annotation Data Model that leverages the semantic web model to enable representation of a wide variety of annotation use cases, whether the annotation is of text, audio, video, raw data or what have you, while also enabling a wide variety of annotations on those targets. This flexible model does not require JSON-LD, but I believe that JSON-LD will pave the way toward rapid adoption.
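As a rough illustration of what an annotation on sensor data could look like, the following Python sketch builds one as a plain dictionary and serializes it as JSON. The key names (`type`, `body`, `target`, `creator`) and the context URL follow the W3C Web Annotation Data Model as it was later published, so treat them as assumptions relative to this post; the function name and the example.com URLs are hypothetical placeholders.

```python
import json


def annotate_reading(reading_url, note, creator):
    """Build an annotation that attaches a note to a sensor reading.

    Key names follow the later W3C Web Annotation Data Model; the URLs
    passed in are placeholders for illustration.
    """
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "creator": creator,
        "body": {"type": "TextualBody", "value": note},
        "target": reading_url,
    }


if __name__ == "__main__":
    annotation = annotate_reading(
        "http://example.com/sensors/thermostat-1/readings/42",
        "Outdoor temperature was 3 C when this reading was taken.",
        "http://example.com/services/weather",
    )
    print(json.dumps(annotation, indent=2))
```

Here a hypothetical weather intermediary adds context to a thermostat reading without the thermostat's own data format having to anticipate it.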

The W3C Web Annotation Working Group will produce standards to address these needs. This work does not start from scratch but builds on the previous work of the community group. As outlined in the Annotation Working Group charter, the deliverables will include key components needed to make annotations useful:

  1. Abstract Data Model: An abstract data model for annotations
  2. Vocabulary: A precise vocabulary describing/defining the data model
  3. Serializations: one or more serialization formats of the abstract data model, such as JSON/JSON-LD or HTML
  4. HTTP API: An API specification to create, edit, access, search, manage, and otherwise manipulate annotations through HTTP
  5. Client-side API: A script interface and events to ease the creation of annotation systems in a browser, a reading system, or a JavaScript plugin
  6. Robust Link Anchoring: One or more mechanisms to determine a selected range of text or portion of media that may serve as a target for an annotation within, in a predictable and interoperable manner, with allowance for some degree of document changes; these mechanisms must work in HTML5, and must provide an extension point for additional media types and formats
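The robust anchoring item in the list above can be sketched in a few lines of Python. This is only one possible approach, not the mechanism the Working Group will define: the selected text is stored together with a little surrounding context, and later re-located by looking for the occurrence whose neighbours match best. The names `make_selector`, `find_anchor`, `exact`, `prefix` and `suffix` are assumptions made for this sketch.

```python
def make_selector(text, start, end, context=10):
    """Describe text[start:end] by its content plus a little surrounding context."""
    return {
        "exact": text[start:end],
        "prefix": text[max(0, start - context):start],
        "suffix": text[end:end + context],
    }


def find_anchor(text, selector):
    """Return (start, end) of the best match for the selector, or None.

    Every occurrence of the exact quote is a candidate; the one whose
    neighbours agree best with the stored prefix and suffix wins.
    """
    exact = selector["exact"]
    best, best_score = None, -1
    pos = text.find(exact)
    while pos != -1:
        end = pos + len(exact)
        score = 0
        if text[:pos].endswith(selector["prefix"]):
            score += 1
        if text[end:].startswith(selector["suffix"]):
            score += 1
        if score > best_score:
            best, best_score = (pos, end), score
        pos = text.find(exact, pos + 1)
    return best
```

Because the selector does not depend on character offsets, the annotation can still be attached after text is inserted elsewhere in the document, which is the "allowance for some degree of document changes" the charter asks for.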

Take a look at the Web Annotation Architecture diagram.

Creating a layer in the stack to enable collecting and combining sensor information in a meaningful way will require the means to associate additional information with sensor data (data model and vocabulary), a means to share the model in a concrete representation (serialization), and application interfaces (APIs).

Sensors are only one area where open annotations will have value. There are many use cases, including annotations of e-books, web pages, audio, video, maps, portions of data sets and many more. Annotations are fundamental to human interaction and, as I suggest in this post, also to automated systems such as those using sensors.

To learn more about the basis for the W3C Annotation WG effort see the W3C Workshop on Annotations report and the materials from the I Annotate conference.


[1] Combining Wireless Sensor Networks and Semantic Middleware for an Internet of Things-Based Sportsman/Woman Monitoring Application, Jesús Rodríguez-Molina, José-Fernán Martínez, Pedro Castillejo, and Lourdes López, in Sensors 2013. See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3649371/

[2] See What is JSON-LD? A Talk with Gregg Kellogg by Aaron Bradley on September 10, 2014 and schema.org examples for location pages.

Chasing threats – a security (and privacy) symptom

Reactively responding to security threats is like a never-ending session of “whack-a-mole”. It will keep everyone busy but probably never end and does not scale with the complexity of the web, its applications and context. Responding to threats is important but we need a longer term solution to the underlying problem.

A house on top of a mishmash of bricks is not very stable
A shaky foundation (Source)

With the advent of the Web and the reshaping of entire industries to base themselves on the Open Web Platform, we are all becoming more dependent on the underlying Trust associated with this platform. Fixing numerous flaws in security and privacy that stem from ambiguities or errors in the deployment, implementation and design of numerous technologies and their interactions is a huge task that could outlast the businesses and individuals that are relying on the technology. What makes this especially hard is that the complexity of both individual technologies and the composition of those technologies offers many creative avenues for attack (not to mention the impact of Moore’s law in reducing the efficacy of older cryptographic algorithms). To give one example, HTML5 is generally taken to mean the composition of many specifications, such as HTML5, CSS, JavaScript, and a variety of web APIs.

Ultimately what is needed is accountability, as noted by Professor Hal Abelson of MIT (slides, PDF). What is also needed are systematic approaches to the underlying issues. For the most part this currently consists of best practices for code development (e.g. validate inputs), for operating system design (e.g. sandbox applications) and for deployments (e.g. enforce password strength rules). One issue with this is that everyone is busy meeting time-to-market constraints and focused on “getting the job done”, which typically means the visible functionality, not security. It takes a lot of discipline to build in security, and even so, the degradation of algorithms over time and attacks based on complexity remain, creating a long-term cost issue. Security and Privacy by Design are worthy approaches toward incorporating concern for these issues into the entire process, but are easier said than done.

A mole is popping out of its hole.
A mole looking out of the hole… (Source)

Creating standards to enable interoperability is a lot of work, even when the standards are based on previous development experience. Just as code is modularized, so are standards, enabling writing, reviewing and interop testing in a reasonable time frame. This also allows the work to scale as different people work on different standards. It also creates issues, as not all assumptions are documented or shared, and new ideas and approaches appear later in the process (Promises might be an example). Some work is also abandoned for a variety of reasons, and this can be good as the community learns. The net result is that there can be inconsistencies among specifications in basic approaches (e.g. in API designs). All of these groups are tasked with creating specific deliverables that specify functionality to be composed with the implementation of other specifications to create applications. This puts the application developer in charge of security and privacy, for only they understand the application, its context and its end-to-end requirements. The designer of a component cannot speak to the possibilities for re-use or retention of private data, or to key distribution approaches, for example.

This does not mean that security or privacy cannot be improved by the standardization community. They can. Notable examples include HTTP Strict Transport Security (HSTS), which ensures that all requests for all web page resources use TLS regardless of web page links, and Cross-Origin Resource Sharing (CORS), which defines a uniform approach for web browsers to enforce cross-origin web access, enabling a web application to use resources from a site other than its own source. What else can be done?
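Both mechanisms show up to a developer as HTTP response headers. The following Python sketch shows roughly what a server might send; the header names `Strict-Transport-Security`, `Access-Control-Allow-Origin` and `Vary` are the real ones, while the function name, the one-year lifetime and the allowed origin are assumptions chosen for illustration.

```python
ALLOWED_ORIGINS = {"https://app.example.com"}  # hypothetical trusted origin


def security_headers(request_origin=None):
    """Return response headers for an HTTPS site that shares resources selectively."""
    headers = {
        # Tell the browser to use HTTPS for this host for the next year.
        "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
    }
    if request_origin in ALLOWED_ORIGINS:
        # Let scripts from this one origin read the response.
        headers["Access-Control-Allow-Origin"] = request_origin
        # Caches must keep per-origin responses apart.
        headers["Vary"] = "Origin"
    return headers
```

In both cases the developer states a simple policy and the browser does the enforcement, which is what makes these mechanisms easier to deploy correctly than ad hoc checks in each application.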

Taking an overall architectural view is helpful (see “Framework for Web Science”). The 2001 semantic web layering diagram is illuminating in that the capstone is “Trust” and that “Digital Signature” is a glue binding the parts together, showing the fundamental importance of trust based on security mechanisms (the 2006 version is also in the text showing Crypto instead of Digital Signature and other refinements but still requiring security mechanisms and proof to support trust):

Semantic web layering including trust at the top

XML Digital Signature 1.1 reached W3C Recommendation this year, demonstrating that creating the security basis is not easy (the JSON approach simplified the requirements and thus the effort but I expect fundamental issues will remain).

I offer another security-centric architectural diagram to suggest the magnitude of the task of “simply providing a security foundation”:

Security functionality can be layered as well

Working through the diagram we see the following items:

  1. Entropy. The basis of most digital security (as opposed to building a physical moat around your castle) is the amount of true randomness or entropy upon which the techniques depend. If the randomness is not there, then the digital techniques fall apart. That makes this the basis, though often ignored.
  2. Key Management. A fundamental security principle is that only the key need be secret, not the algorithms etc. Thus given good entropy, the next building block is suitable keys, keeping private keys secret and so on. A lousy key won’t be of much use.
  3. Next is some means of associating keys with their purpose, discovering and using appropriate keys, and knowing they are valid. I put this as Certificate management (including revocation) and all that goes behind CA certificate issuance. I use PKI terminology, but this may not be the only way to accomplish this (in fact the question arises whether X.509 should be replaced, given its ambiguities and complexity).
  4. To be useful, crypto algorithms depend on keys and meaningful associations (even though certs may be created using crypto functions as well).
  5. Confidentiality and integrity are fundamental security features. I add identity as an essential building block in this layer (though certs obviously may support this functionality, there may be more to it in terms of policy, access control etc.).
  6. Next we get to the Open Web Platform, including a variety of APIs that may use the underlying functionality (yes, Web Crypto may also offer some of this stack in the JavaScript layer as well; layers are not clean, are they?).
  7. Finally we get to the Web Applications that pull it all together (or do they?).
  8. The reason for doubt is on the side: implementation quality for all items matters a great deal, as does the fact that everything must evolve over time (e.g. key and certificate roll-over, algorithm agility etc.).

I put trust on the other side to indicate that items must operate in an integrated manner to produce a usable result. (I also left out reputation management as another trust mechanism).

As experienced with Internet protocol layering, some functionality is replicated in different layers and we can discuss what the exact layering should be. However, it is clear that there are a large number of logical components, all of which must work correctly depending on correct design, correct implementation, and correct deployment and use. That offers a large number of opportunities for failure.
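To give a feel for how the lowest layers depend on each other, here is a small Python sketch using only the standard library. It skips certificates and identity entirely and covers only entropy, key generation and an integrity check; the function names and the choice of HMAC-SHA256 are my assumptions for illustration, not a recommendation for any particular design.

```python
import hashlib
import hmac
import secrets


def new_key(num_bytes=32):
    """Layers 1 and 2: draw a key from the operating system's entropy source."""
    return secrets.token_bytes(num_bytes)


def sign(key, message):
    """Layers 4 and 5: compute an HMAC-SHA256 integrity tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key, message, tag):
    """Check the tag with a constant-time comparison."""
    return hmac.compare_digest(sign(key, message), tag)
```

Even in this tiny example each step can fail independently: a weak random source, a leaked key, or a careless comparison would each undermine the integrity check built on top.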

What is needed are generic high-level simplifications to make trust more achievable. Strict Transport Security does that, taking a successfully deployed protocol and reducing the attack surface. CORS works toward that end as well, by slightly increasing an attack surface to enable needed functionality but in a controlled and understood manner.

It seems that we need more work to reduce the attack surface in a consistent manner, by reducing optionality and choices. It seems that one area is certification – are there too many choices and details in creating certificates and managing them? Can we reduce the choices and ambiguities?

How about JavaScript APIs and WebIDL? Can more be done to unify and simplify? Is a best practices guide needed?

It seems a good time to review how much can be simplified, how many options can be removed, and how much consistency can be encouraged. Maybe the W3C TAG could work on this, for example. It seems fundamental to next steps for the Architecture of the World Wide Web.

© Frederick Hirsch 2011-2013

What should the W3C TAG do next?

I am running for election to the W3C Technical Architecture Group (TAG).

What is the TAG known for? Well, primarily the Architecture of the World Wide Web, Volume 1 which was created in 2004. This document summarized principles, constraints, and good practice notes as of that time.

Since then the TAG has reviewed architectural issues on what appears to be an ad hoc basis, producing what are known as “TAG Findings”, expert answers to specific questions. I surmise that the visibility and usefulness of these has been less than it could be, probably because W3C working groups are either not aware of them at all, not sure which might apply to the problems they are working on, or, having seen them, not sure how to apply them to the specific problem they are facing.

The TAG has also devoted much effort to work that has not reached a conclusion, given the wide constituency and difficulty of the problems they have worked on. In some cases the work has stopped with “draft findings”, in other cases with only discussion on the mailing list. A lot of good ideas and information may be lost, but it is hard to tell.

The W3C is in a sea of change, including movement toward “living documents” as opposed to versioned, dated documents. The idea of versioned documents was simple – a group agrees to approve a definitive version as of a date in time, making it a standard. The benefits of this approach are clear – there is a static document that can be referenced, will not change, and for which consensus can be clearly recorded. The issues are also clear: with faster development cycles, static documents can become out of date more quickly, making it misleading to have a static approved document as the target of links when newer corrected material is available yet not readily found. The solution to this issue is to have continuously updated drafts, so that the latest version is the definitive version: so-called “living documents”. The concern here is that the process can spin out of control, with editors adding what they will without checks and balances – “if you don’t spot it, you must approve it” could become the new mantra. We need both – a rapid cycle time as well as clarity of approval and agreement. (The solution in progress appears to be pull requests in git accepted by those we trust.)

What does this have to do with the TAG? Well, the TAG is part of this sea of change as well, as reflected in the previous TAG election, where a common theme was that the TAG needs to work less on abstractions and closer to the needs of developers and working groups. There is a desire that the TAG produce material relevant to current work on web applications (and other topics) and that this material can be easily found and used.

It seems the time has come for a new volume of the Architecture of the World Wide Web, addressing topics related to Open Web Platform applications, APIs and programming, open data, and security and privacy. I’d argue for a new volume, as I’d rather not see history rewritten (e.g. to expunge XML, despite its continued use in various communities). It is usually easier to correct and improve a draft than to create a new one, so the TAG should seed the process of “documenting and creating consensus around principles of Web architecture” by creating a new Architecture of the World Wide Web volume and working with the W3C community to get it right. This continues the original TAG mission yet represents a change from the recent past in how it is done.

I suggest that addressing high-priority issues of web architectural evolution and interoperability, with the aim of producing a specific document, can give the TAG focus and enable feedback, organizing “findings” and issue resolution into a specific result that can be readily shared. I can help with this, both with the writing and creation of this new work as well as collaborating with the TAG, in the Working Groups and communities (such as PING and others). What I can do, as a member of the TAG, is to get this started, organized, written down, and communicated so that the work of the TAG is visible and useful to the community and so the community can get involved to make it better.

W3C Workshop on Web Tracking and User Privacy

One topic that is getting a lot of press lately is privacy on the Internet, especially web tracking [Notes].

The W3C held a “Workshop on Web Tracking and User Privacy” on 28/29 April 2011, for which an agenda with links to presentations, workshop papers and a final report are available.

This is a difficult topic since there is a need for a balance between what appears to be a legitimate need to enable advertising-based business models to support “free” content and the ability of users to protect their privacy, not losing control over their own personal data.  

Discussion at the workshop reflected the privacy needs of individuals on the web as well as support for business models driven by advertising. Technical proposals such as an HTTP “Do Not Track” header and the use of tracking protection lists were considered.
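On the wire, the header proposal is very small: the browser adds `DNT: 1` to each request. The Python sketch below shows how a server might read it. The function name is hypothetical, and treating a missing header as "tracking permitted" is an assumption made only for this example; what a site should actually do in each case is exactly the policy question discussed at the workshop.

```python
def tracking_permitted(headers):
    """Return False when the request carries the "DNT: 1" opt-out header."""
    # Header names are case-insensitive, so normalize before looking up.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") != "1"
```

The simplicity of the signal is the point: the hard parts are agreeing on what "tracking" means and on what a server is expected to do when it sees the header.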

Ed Felten of the FTC noted five desired properties of a “Do Not Track” mechanism in his slides:

  1. Is it universal? Will it cover all trackers? 
  2. Is it usable?  Easy to find, understand and use?
  3. Is it permanent? Does opt-out get lost? 
  4. Is it effective and enforceable? Does it cover all tracking technologies? 
  5. Does it cover collection in general and not just some uses like ads? 

A significant issue noted at the workshop is that “user expectations may not match what is implemented”. One example is that the discussion is not about “opting out of ads” but about opting out of “tracking”, so even with opt-out, ads might still appear. More complicated for users is that nuances might be possible, such as allowing first-party tracking but not third-party tracking – yet what does this mean at the edge cases? Is a subsidiary a third party? What about outsourced work? This could be confusing for users and lead to results that are not what they expect or want. As mentioned at the workshop, the details will matter here.
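A small Python sketch shows why the edge cases are awkward. It classifies a request as third party purely by comparing domain names, which is a naive rule I am assuming for illustration (real code would at least consult the Public Suffix List); the function names and hostnames are hypothetical.

```python
def site_of(hostname):
    """Naive "site" for a hostname: its last two labels.

    Assumption for illustration only; real code would consult the Public
    Suffix List, since this is wrong for names such as example.co.uk.
    """
    return ".".join(hostname.lower().rstrip(".").split(".")[-2:])


def is_third_party(page_host, request_host):
    """Treat a request as third party when it goes to a different site than the page."""
    return site_of(page_host) != site_of(request_host)
```

Under this rule a subsidiary on its own domain counts as a third party even if users think of it as the same company, while an outsourced service hosted on a subdomain of the first party does not. Neither result necessarily matches what a user would expect.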


Craig Wills of the Computer Science Department, Worcester Polytechnic Institute, noted that first parties have a responsibility not to “leak” private information to third parties through careless implementations. This is detailed in his paper.


Helen Nissenbaum made an important point during the discussion. Consent is not always needed, but only when user expectations are not met (or there is a risk of not meeting user expectations, I assume). Consent is not needed every step of the way. This relates to the theme of avoiding unnecessary user interaction, avoiding meaningless dialogs and increasing usability.

Questions to ask before tracking include:

  1. Is it necessary to collect the data?
  2. Can the goal be accomplished another way, with less data?


Regulations and laws should not be overly prescriptive with respect to technology details, otherwise as the technology changes they lose effect. Instead they should focus on the policy and goals. This is similar to mandating fuel efficiency in cars rather than the way it is achieved.

Apparently enabling some tracking but not all tracking, for a variety of parties, is difficult.

Workshop participants recognized the complexity and difficulties of the topic but also the need for steps to be taken in the near term. During the workshop goals were mentioned that included providing transaction transparency, relevant information, and meaningful user choices. It is clear that some changes may be required.
Workshop participants noted that there is much research into the economics of “do not track”.

John Morris of CDT enumerated in his slides the typical objections raised with respect to implementing mechanisms to increase user privacy and indicated how they might be addressed, for example relying on  non-technical mechanisms such as reputation, law or regulation rather than technology for enforcement. 

Given the various stakeholders and concerns, the principle of doing what is “reasonable” seems to apply here, just as in other aspects of law. 


Thus it is not surprising that there was general acceptance by workshop participants of adopting a middle-ground approach – specifically there was no objection to the proposal from CDT that includes the following definition:

“Tracking is the collection and correlation of data about the web-based activities of a particular user, computer, or device across non-commonly branded websites, for any purpose other than specifically excepted third-party ad reporting practices, narrowly scoped fraud prevention, or compliance with law enforcement requests.”

As noted in the W3C workshop report, possible next steps include the W3C chartering a general Interest Group to consider ongoing Web privacy issues and a W3C Working Group to standardize technologies and explore policy definitions of tracking.

Notes:

[1] Retargeting Ads Follow Surfers to Other Sites, August 29, 2010, New York Times

[2] How to Fix (or Kill) Web Data About You, April 13, 2011, New York Times

[3] Tracking File Found in iPhones, April 20, 2011, New York Times

[4] Apple, Google Collect User Data, April 22, 2011, Wall Street Journal

[5] Avoiding Mobile Trackers, April 22, 2011, Wall Street Journal

[6] Facebook hit by privacy complaints, June 9, 2011, Financial Times


(edit – added paragraphs at end re CDT proposal)