Re: Example Immunoglobulin Detection Test Credential


Eric Welton (Korsimoro) <eric@...>
 


My original concern about the proposed VC involved the linking of id-proofing material with the medical data.  For example
{
  { IgG...., nameData...., humanSexualityFields...., schema.orgPersonDescriptions..... }
}
And this was replaced with the following model
{
  { IgG...., identityDocumentLinkData.....}
  { identityDocumentLinkData...., restOfIdentityData..... }
}
where identityDocumentLinkData included an "identity document type" and a field value for the externally defined "primary-key-like field" for that identity document type.  The example given was the highly U.S.-specific "Driving License", but it could easily be extended to include other identifying documents, like national IDs.  This, however, introduces "low-value complexity" because the medical information model must then include a semantic model of allowable identity-proofing documents - for example, are Saudi Arabian Iqamas or Thai Resident Laborer Alien Cards ("Pink Cards") or United States IR-visa cards acceptable?  Where is this registry maintained?  It is required so that you can look up the appropriate "field name" in the id-proofing data - either that, or you start pulling even more of the id-proofing metadata into the medical information.

And what does this identity proofing data have to do with the raw medical information?  All of this complexity can be avoided by simply taking advantage of the adjacency of the credentialSubject material and the fact that there is a unifying signature over that data - as in
{
 {.... medical information }
 {.... id-proofing data, "this is a <name of document type>" }
} => signature (or hash)

This lets you focus on modeling the medical information independent of the identity-proofing information, and lets the verifier, holder, and issuer use the id-proofing information as is appropriate to the context.  It does not mandate, for example, disclosure of a Social Security Number to a medical testing lab.
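
A minimal sketch of the adjacency idea above, assuming the two blocks sit side by side in credentialSubject and a single digest covers both. The field names (test, result, documentType, documentNumber) and the helper names are hypothetical, and the SHA-256 digest only stands in for a real issuer signature.

```python
import hashlib
import json


def digest_credential(blocks):
    """Hash all credentialSubject blocks together (stand-in for a signature)."""
    # Canonicalize so the same blocks always yield the same bytes.
    canonical = json.dumps(blocks, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_intact(cred):
    """True when the stored digest still matches the adjacent blocks."""
    return digest_credential(cred["credentialSubject"]) == cred["digest"]


# Hypothetical field names; the two blocks are modeled independently and
# the medical block contains no reference to the id-proofing block.
medical = {"test": "IgG", "result": "detected"}
id_proofing = {"documentType": "passport", "documentNumber": "X1234567"}

credential = {"credentialSubject": [medical, id_proofing]}
credential["digest"] = digest_credential(credential["credentialSubject"])
```

The only thing tying the blocks together here is the digest over both, so the medical model never needs to know which identity document types exist.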

We also have a need to associate multiple pieces of medical information, which are produced later - at different times and places - with the identity information.  For example - blood collected at a medical intake facility might produce an initial record (SampleCollectionCredential) like
{
 { linkCode = <collision-resistant-pii-free-identifier> }
 {.... medical information }
 {.... id-proofing data, "this is a <name of document type>" }
} => signature
where the medical information may be nothing more than a bar-code printed on a label printer and attached to a test-tube, or may be more sophisticated - as long as it is PII free.  Location, facility, attendant and that sort of information is split across the medical information and issuer identification as appropriate - or perhaps there are other data blocks {.... facility information } or {.... weather conditions }, etc.

At a later time, the above SampleCollectionCredential is used to produce the following TestResultsCredential
{
 { .... linkCode }
 { .... second medical information }
 { .... third medical information }
} => signature

where the lab facility and processing information is again split between the issuer and the medical records - perhaps there are serial numbers of reagent lots or testing-kit GTIN/NTIN numbers, or any bits of other data TBD.

Importantly, no PII is used to link these records - it is an arbitrary id, and not sha256(id-proofing-document-type, id-proofing-document-primary-key-field-value, 8-digit-one-time-use-passcode).  Use of a purely random collision resistant identifier, like a UUID or public-key fingerprint, avoids the problems inherent in id-proofing document meta modeling (which require that we have a central registry of allowable id-proof types and models which minimally allows us to identify the primary key field in the id-proofing data model)...
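
A rough sketch of linking the two credentials by a purely random identifier, assuming a version 4 UUID is an acceptable linkCode. The function and field names are hypothetical, and signing is left out to keep the focus on the link.

```python
import uuid


def new_link_code():
    """Random, PII-free identifier; a version 4 UUID carries 122 random bits."""
    return str(uuid.uuid4())


def make_sample_collection_credential(medical, id_proofing):
    """Issued at the collection point; the only place id-proofing data appears."""
    return {
        "linkCode": new_link_code(),
        "medical": medical,
        "idProofing": id_proofing,
    }


def make_test_results_credential(collection, results):
    """Issued later by the lab; only the opaque linkCode is carried over."""
    return {"linkCode": collection["linkCode"], "results": list(results)}


collection = make_sample_collection_credential(
    {"tubeBarcode": "0001"},  # hypothetical PII-free medical block
    {"documentType": "national ID card", "documentNumber": "12345"},
)
results = make_test_results_credential(
    collection, [{"IgG": "detected"}, {"IgM": "not detected"}]
)
```

Because the linkCode is not derived from any document number, no registry of identity document types or primary key fields is needed to compute or check it.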

It also avoids placing the cognitive burden on the subject for remembering yet another passcode - which they should probably write down on whatever document is provided.  For example, a feasible way to maintain reference to the data is through a couple of QR codes pointing to cloud-backed storage of the information, such as
http://example.com/covid/collection/<linkCode>?password=12345678
with the later-produced lab results available at
http://example.com/covid/results/<linkCode>?password=12345678
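
As an illustration only, the two URLs could be assembled as below before being rendered into QR codes. The helper name record_url is hypothetical, and QR rendering itself is not shown.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

BASE = "http://example.com/covid"


def record_url(kind, link_code, password):
    """Build the collection or results URL; the password travels in plaintext."""
    if kind not in ("collection", "results"):
        raise ValueError("kind must be 'collection' or 'results'")
    query = urlencode({"password": password})
    return f"{BASE}/{kind}/{quote(link_code, safe='')}?{query}"


collection_url = record_url("collection", "abc-123", "12345678")
results_url = record_url("results", "abc-123", "12345678")
```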

Importantly, the password is just encoded in the QR code, but otherwise in plaintext.  If we force the user to memorize the password, we should provide a way to record the user's password and allow them to recover it, as remembering "yet another series of random digits" is very, very difficult.  Perhaps they could "store it in their phone" as a fake phone number, or write it down on a little sticker, or we can give them a password recovery password, which itself would need a password recovery password recovery password, which itself would need a ......  Perhaps we could let them "authenticate with google" or use their facebook login to retrieve the password automatically....

The difficulty of supporting this password makes me want to step back and look at the role it plays.  The need to restrict access to information is *critical* when the medical information contains personally identifying information, such as is the case in this model
{
 { medical information...., idType=texasDriversLicense, idPrimaryKey="12345" }
 { detailed id-proofing information, idType=texasDriversLicense, texasDepartmentOfPublicSafetyNumber="12345" }
}
and, the subsequent test results are modeled as
{
 { hashOfPII_and_Password }
 { test1Results..... }
 { test2Results..... }
}
but it is less of a concern when modeled as
{
 { <collision-resistant-arbitrary-pii-free-identifier> }
 { medical information.... }
 { detailed id-proofing information, idType=texasDriversLicense, texasDepartmentOfPublicSafetyNumber="12345" }
}
and, the subsequent test results are modeled as
{
 { <collision-resistant-arbitrary-pii-free-identifier> }
 { test1Results..... }
 { test2Results..... }
}

In the latter there is no strong connection between the source and result data, but the connection is there and acceptance of it is a matter of personal taste and the reality of the sponsoring agency or health authority operating the initial point of medical contact and testing.  Subjects should not trust any random person sitting in a cardboard box under a bridge to collect their blood and promise to send it for testing - rather, there is some organization making the outreach and contracting with testing labs.  It is *this* organization which is in a position to provide some level of access control, such as running the example.com website allowing subjects and policy enforcement officers to access the data later on.

Whether subjects "log on to the lab website" rather than going through the example.com website is a secondary consideration which may or may not make sense.  In either case, the orchestrators of the collection process have first-line access to id-proofing material, and this material can be used to log in later - perhaps making use of any of the numerous services that provide direct verification of primary id-proofing documents.  Alternatively, sponsoring organizations are likely to require that users download an "app" for the devices that manage those users, and use a "log in with facebook" model for a web-site serving non-device managed people.

The business concerns of that sponsoring organization, or the lab's business model, should not influence the credential design - but they do benefit from maximum separation of concerns within the credential design.  This gives the most flexibility and minimizes PII liability.

Jurisdiction also plays a huge role - at MyData 2019 I saw a slide (I am blanking on whose presentation) that listed three approaches to data: the American approach, where data control belongs to businesses; the Chinese approach, where data control belongs to the state; and the European model, where data control belongs to the individual.  The organization supporting the testing and issuance, and any direct access to results via the testing lab or 3rd parties, will operate in one of these contexts.  It is not the responsibility of the covid credential model to take a position on this - rather it must remain orthogonal to these concerns.  This means that ensuring individual control over the use of the data is *not* in scope, while any structure which *prohibits* control over the data must be avoided.

What we must maximize is the independence of data points - maximize the separation of concerns - because this and only this gives us the flexibility we need to operate on a global scale.

We also need to note that, while the initial { medical information... } can be PII free, the following test results may or may not be PII free - depending on the depth and detail of the test.  If the test is only about "antibody present/absent" - then it is not PII sensitive, but if it contained detailed analysis of a set of genetic markers such that it was effectively a DNA fingerprint, then one might argue that it was PII.  This influences the acceptability of the other components and the risks inherent in the linkability of data - and it is sensitive to your operating context.

A great deal can be achieved in mitigating the current virus distribution by supporting only low-PII test results, and zero-PII linking keys.

Regardless of PII exposure in the testing pipeline - the verification context we seek looks like this:

1. subject approaches policy enforcement officer and presents
1.1 - QR code (digital or analog)
1.2 - original identity proofing document (digital or analog)
1.3 - self (analog)
2. policy enforcement officer....
2.1 - scans QR code and retrieves id-proofing data for <collision-resistant-arbitrary-pii-free-identifier> (performing issuer verification)
2.2 - makes judgement call as to whether id-proofing matches, using whatever tools are appropriate - this could include everything from a checkpoint guard simply looking at the photo on their screen, along with basic demographic data and then looking at the person in front of them - or it could be more sophisticated, using biometrics ranging from facial recognition to voice printing and fingerprints.
2.3 - optionally requests some indication from the subject to "release" the test results - but this is optional, and depends upon the tooling available to the policy enforcement officer and the subject and the context - this is roughly equivalent to the use of the PIN, but could be expanded (if SSI enabled) to include consent tracking (in situations where that is relevant)
2.4 - retrieves the relevant test results (performing issuer verification) and determines appropriate course of action - allowing the subject to continue, blocking the subject, or detaining the subject

Note that the subject is almost entirely passive.  The digital signatures support the identification of the issuer - and that can easily be tied in with existing PKI, where that exists, as well as using DIDs if that works - this is determined by the policy enforcement context.  Data could be presented by the subject either digitally or analog, as is appropriate to the context.  What we generally do not need is the subject to engage in negotiation, or be able to "forget a password" or otherwise hose up the policy enforcement pipeline.  Imagine how long it would take to get on the subway if everyone had to "haggle" a ticket price - it is bad enough using PIN based debit cards, which is why the typical subway access is mediated by side-loaded stored-value systems - which excludes "putting in a PIN" from the primary enforcement pipeline - this is the sensibility we need to pursue.

What is horrific about the above model is that the policy enforcement is going to a central database that has "all the records" - that is kinda what we want to avoid.  To avoid this we need to get a third credential - and ideally one that is "fit for purpose" (e.g. tailored to the policy enforcement need, and perhaps sponsored by/issued by that agency) - or, alternatively, one that is generic - in which case it should be offered by some common agency which is financially supported by the policy enforcement agencies.

Such a third party might be related to the same agency supporting the testing tents and outreach - and the same considerations about "logging in" apply.

What is important is that these tertiary credentials can effectively act like a zero-knowledge-proof or a ZCAP key.  For example - let's imagine that I wanted to screen people at an inter-state border, only allowing people to enter my state if they could prove that they had recently been tested and are negative.  What I want at the border is a very fast test with no network access - like scanning a "big QR code" - it is easy to imagine a system with this verification strategy:
1. subject approaches policy enforcement officer (or system) and presents
1.1. self (stares into camera)
1.2 qr code (holds up to camera)
2. policy enforcement officer (or system)
2.1 performs facial recognition between self and image encoded in qr code and/or bound to id-proofing document
2.2 checks that 'virus testing result' matches policy
2.3 verifies issuer credentials
2.4 makes policy decision (for example, raises gate and allows subject to pass)
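
The offline check above could be sketched roughly as follows. Several things here are assumptions made for illustration: the payload fields (result, testedOn), the 14-day recency window, and above all the HMAC, which is only a standard-library stand-in for verifying a real asymmetric issuer signature. Facial recognition is treated as an external yes/no input.

```python
import hashlib
import hmac
import json
from datetime import date, timedelta

# Stand-in only: a real deployment would verify an asymmetric issuer
# signature against a trusted public key, not a shared secret.
ISSUER_KEY = b"demo-issuer-key"


def sign(payload):
    """Compute the stand-in issuer tag over the canonicalized payload."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hmac.new(ISSUER_KEY, body, hashlib.sha256).hexdigest()


def decide(qr, face_matches, today, max_age_days=14):
    """Return True to raise the gate; no network access is needed."""
    payload = qr["payload"]
    if not face_matches:                                   # step 2.1
        return False
    tested = date.fromisoformat(payload["testedOn"])
    recent = timedelta(0) <= today - tested <= timedelta(days=max_age_days)
    if payload["result"] != "negative" or not recent:      # step 2.2
        return False
    if not hmac.compare_digest(sign(payload), qr["sig"]):  # step 2.3
        return False
    return True                                            # step 2.4


payload = {"result": "negative", "testedOn": "2020-04-10"}
qr = {"payload": payload, "sig": sign(payload)}
```

Everything the decision needs is inside the scanned QR content plus the verifier's locally held issuer key, which is what removes the network round trip.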

This would be completely feasible - it has no network round trips so there is no essential central honeypot of test results.  Furthermore, the subject could (in jurisdictions with GDPR-like legislation) request that any central records of the first stages be deleted, meaning that the QR code (either on a sticker or in their phone or any other device that manages them) is the only record of the data - yet it was born with a chain-of-evidence and data-provenance which provides strong indication of trustworthiness.

You can also imagine that later, as this ecosystem matures, tertiary credentials could be issued almost automatically - using the ever improving systems that support online verification of the original id-proofing document, along with liveness checks and strong audio/visual biometrics - and this brings us to *SSI* and *cloud agents*.  All of the above can be improved through the use of SSI technology - be it either cloud-agents or edge-agents.  What is essential is that the process is possible without SSI technology.  SSI and private-key management must be an enhancement to this process, not a requirement.

Furthermore, SSI agents enable advanced control over the further uses of testing associated information - where agents are understood generically - perhaps they are Aries agents, or perhaps they are HIE-of-one Trustees.  Until that infrastructure is ubiquitous and business practices and policy enforcement endpoints are well poised to make use of them, we must not require it.  We are building towards a world where our solution looks like this
{
 {.... medical information }
 {.... did }
} => signature (or hash)
And where eventually we will use the service_endpoints of the DID to answer every request for id-proofing data with the question "who wants to know" and make a private-policy driven decision for information release, therefore giving the individual control over all exchanges of their health information - we just aren't there yet.  On the other hand, this is a phenomenal opportunity for bootstrapping that universe if we can find just the right way to sidestep the chicken-and-egg problem.

The only step that can be taken - at this point - is to isolate PII from medical information as early as possible, and to break from traditional practices, which would use PII in place of opaque identifiers.  This is a significant step forward - as it is extremely tempting to start out with data using structures like
{
 IgG...
 IgM...
 gender....
 firstName...
 lastName...
 address....
 mobileNumber...
 emailAddress...
}

So - I think we can make a huge step forward using the technology we have and deploying it in a way that works for DID-free and phone-free individuals, and supports rapid, low cost, offline deployment in front-line policy enforcement scenarios across all tiers of technical capabilities.  We can achieve these goals with this model:
Sample Collection Credential:
{
 { linkCode = <collision-resistant-pii-free-identifier> }
 {.... medical information }
 {.... id-proofing data, "this is a <name of document type>" }
} => signature
Test Results Credential:
{
 { .... linkCode }
 { .... second medical information }
 { .... third medical information }
} => signature
and derive Rapid Clearance Credentials (QR encoding) to facilitate zero-network, minimal-PII, high quality, point-of-enforcement decision making.

This is something we have all the tools to deliver quickly.  Maximal separation of data concerns minimizes risk of abuse and can adapt to worldwide conditions across a huge range of technical realities.  This opens the door to improvement via SSI technology - specifically agents and trustees, and sophisticated personal information wallets.

best,

 -e


On Sun, Apr 12, 2020 at 12:28 AM orie <orie@...> wrote:
Based on feedback from Daniel Hardman and Adrian's comments, I'm planning on implementing a new ImmunoglobulinDetectionTest schema.

The first format was aimed at anyone with a face, and assumed a cassette test and that the credential subject has a DID.

There are pros and cons to that approach... the most obvious con is that absolutely nobody has DIDs.

I like the idea of splitting up the sample collection part and test results part into 2 credentials, and using existing identifiers and hashing to link them.

Here would be the new user story:

1. Subject drives up to a tent in a parking lot.

2. Testing Facility Checks some id ( "presentedIDType: a picklist with strings such as "drivers license", "passport", "national ID card", etc " )

3. Testing Facility Collects blood sample and issues a "SampleCollectionCredential"

subject = sha256 ( presentedIDType + presentedIDNumber + 8-DIGIT-PIN )
presentedIDType (repeated in credential)
testResultURL: https://example.com/covid-19/vc-test-results/ ( subject )

credential is provided on paper, to the subject after sample is taken (multiple copies are provided, and it's safe for them to be copied further).

4. sample gets sent to lab... days go by, etc...

5. Subjects can check a registry for their test results, when test results are ready they are published at a URL, which is provided to them in their credential.

testResultURL: https://example.com/covid-19/vc-test-results/ ( subject )

6. Subject can present test results to TSA / Law Enforcement when traveling by presenting their "SampleCollectionCredential" , whatever ID type they used for it, and disclosing their 8 DIGIT PIN

7. Verification is as follows
7.1 confirming the face / gender / eyes / height (etc) of the ID Card used for "SampleCollectionCredential"
7.2 Verify "SampleCollectionCredential" (no VP here, since the subject has no keys / DID).
7.3 Confirm  subject = sha256 ( presentedIDType + presentedIDNumber + 8-DIGIT-PIN ) (website helps them do this)
7.4 Lookup testResultURL: https://example.com/covid-19/vc-test-results/ ( subject )
7.5 Verify "SampleTestResultCredential"
7.6 Apply allow / deny list (any other business logic rules)
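
A small sketch of the subject derivation and the verifier's recomputation from the steps above. The message does not say how the three values are concatenated, so the "|" separator is an assumption, as are the function names.

```python
import hashlib

BASE_URL = "https://example.com/covid-19/vc-test-results/"


def subject_id(id_type, id_number, pin):
    """sha256 over presentedIDType, presentedIDNumber and the 8-digit PIN."""
    if len(pin) != 8 or not (pin.isascii() and pin.isdigit()):
        raise ValueError("PIN must be exactly 8 digits")
    # The "|" separator is an assumption; it keeps field boundaries unambiguous.
    material = "|".join([id_type, id_number, pin])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def result_url(subject):
    """Where the test results would be published for this subject value."""
    return BASE_URL + subject


def verifier_confirms(credential_subject, id_type, id_number, pin):
    """Step 7.3: recompute the hash from what the subject presents."""
    return subject_id(id_type, id_number, pin) == credential_subject


subject = subject_id("drivers license", "12345", "00000000")
```

Note that the verifier must know the exact presentedIDType string and which field of the document counts as the number, which is the registry dependency raised earlier in the thread.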


Only the issuers would have DIDs in this scenario, and there would be no signed verifiable presentations.

Anyone with presentedIDType + presentedIDNumber + 8-DIGIT-PIN could claim a test result belonged to them, and it's the responsibility of the verifier to check the presentedIDType.

Assuming that the presentedIDType were digital and that SampleCollectionCredential were digital, a Presentation that included the disclosure of the 8-DIGIT-PIN could be made over any transport that was supported ( CHAPI / DIDComm / Bluetooth )

OS










On Sat, Apr 11, 2020 at 5:40 PM Adrian Gropper <agropper@...> wrote:
inline...

On Sat, Apr 11, 2020 at 6:06 PM Orie Steele <orie@...> wrote:
I'm not sure the exact context but I think the following is equivalent to what you are suggesting.

The context is getting through a door by showing the bouncer your phone the way you might a driver's license. This is not like showing a boarding pass to the gate agent because there's no pre-registration. The main issue in this context, is whether the bouncer thinks you've borrowed somebody else's license or tampered with your own license. There is no privacy issue as long as the bouncer promises not to save any PII after he makes a decision to let you in or not.

1. Testing facility draws blood from people with driver's licenses.
2. Testing facility labels samples with unique id = sha256(drivers license + salt).
3. When I log into the lab (how? by email, phone number?, anybody can see results?)... I can find the result of my test, which contains no PII, by knowing my drivers license number and the salt.

By the hash. The hash was generated by your wallet. The lab never sees the actual driver's license.

The lab issues a VC to the hash that includes your test result.
 
4. When someone asks for my results, I can show them my drivers license and disclose the salt, and they can go download the results themselves, or confirm that the results I provided have the same identifier... but I also need them to believe that whatever I provided them has not been tampered with, hence they also verify the VC.

I'm not sure about the salt. If I trust the verifier not to store the data from my driver's license, I can let them calculate the hash (with salt) and match it to the VC that was issued to the hash.

There are a couple of things combined in these which it's probably a good idea to separate.

1. VC Format (sample identifier is a deterministic function of an existing identifier, and we trust the test facility to generate this).

Not necessarily. The association between the sample and the driver's license may be made by a nurse that draws the blood. The nurse saves nothing but might sign the hash with her credentials in order to be held accountable.

2. VC is signed at the lab, not at the point of collection... so we trust the lab, they are the issuer, and we trust them not to change the identifier from the facility... the lab does not need any PII to do its job... unless they provide a web portal to log in with PII.. if they just disclose test results publicly, then they don't need PII.

Yes, mostly. The result is accessed by the hash. This is similar to how COVID bluetooth proximity schemes (Google and Apple announcement) are based on pseudorandom rotating IDs that can only be re-identified by the issuing app which is under the control of the subject (the wallet).

3. VCs are disclosed via some permission system (login)... not defined, but I would assume sms / email / IAL Level 2... implies the lab needs PII. or that the login mechanism is knowing a sample-id...

No. The lab has zero PII. 

 
4. Presentation of the VC includes the disclosure of information which can be used to bind an existing identifier (drivers license) to the test results...

Yes.

VCs for tests that have to be retrieved from the lab are harder than ones that come from a facility that just does everything, which is why I tackled the easy case first... but I guess it's probably much more realistic that people would have to wait / look up their results.

Yes.

Thank you,
- Adrian

OS



On Fri, Apr 10, 2020 at 9:06 PM Adrian Gropper <agropper@...> wrote:
Apologies if I missed the answer elsewhere but I'm lost when it comes to the photo on a driver's license. I'll ask again:
- The state driver's license in my pocket has a photo, a license number, and some tamper-resistant features.
- When I go get a serology test, the person drawing my blood might look at my license, write a hash of my license number and photo on the tube and send to the lab
- I log in to the lab, search for the hash that was put on the tube and download the result to my wallet
- When someone asks for my result, I show them my driver's license with my photo and the hash of the number matches the VC I received from the lab

So,
- My photo and the driver's license number never left my wallet.
- A tamper-resistant scheme has to prevent me from changing the photo on my license without also changing the hash that labels the sample and the result.

What am I missing?

- Adrian

On Fri, Apr 10, 2020 at 11:56 AM Orie Steele <orie@...> wrote:
CC'ing the W3C Mailing list, since this discussion of COVID-19 Credentials has been discussed there as well...

Most of the attributes are just leftovers from basing the credential on a Permanent Resident Card.

I'm not sure how the VC Data Model values would be collected, but it's sometimes the case that an organization will use birthdate, gender and name to double check that things like SSN / Driver's License are accurate (I've seen this kind of overcollection in healthcare, for this exact reason)... people make mistakes when entering data, having a group of values to check against, helps mitigate the damage caused by these mistakes, but it's not a perfect solution.

I was expecting some request for a binding to a SSN / Drivers License... I'm not sure that's actually a good idea, but I'm not an expert.

My thought was that this credential could be provided by a laptop computer in a tent, to people who have no existing identification (persons experiencing homelessness, refugees, etc...)

Obviously you don't need a picture or any of the PII fields if you are just going to bind to another identity system like drivers license number... but that credential won't work for people who are not registered... 

The credential format could be expanded to include either a binding to a well known identity system, OR the current approach... that might give us the best of both worlds.

If you can leave comments on the PR, that will help make sure that other communities (outside of these mailing lists) can see your thoughts.

Thanks for the feedback!

OS



On Fri, Apr 10, 2020 at 9:11 AM Daniel Hardman <daniel.hardman@...> wrote:
Regarding Eric's comments about identifying the subject:

The strategy proposed in the schemas doc in a couple places [1, 2] is to provide just enough information about the holder to let them be linked to other credentials (physical or digital/VC) that provide strong identification as needed. Orie's example is mostly aligned with this proposal, though its birthdate + photo may be a little more than is needed. The reasoning behind this is that a lab isn't going to be authoritative about facts of birth, and probably isn't going to take a photo of each test subject, but probably will check a stronger form of ID when the test sample is submitted -- so whatever form of ID they check, they need to embed just enough info about the holder in their results to allow the holder to present the same strong identification later.

An example of how this could be tweaked to embody the proposal a little better might be to remove the photo and birthdate fields, and to add the following two fields:

presentedIDType: a picklist with strings such as "drivers license", "passport", "national ID card", etc
presentedIDNumber: the number from whatever strong identification the test subject supplied when submitting the sample

Now it becomes clear how Eric can explain the trust dynamics to a harried government official: "The testing regime has the same trust dynamics as our national ID card/passport/driver's licenses, because that form of ID has to be used to submit a sample, and the same ID has to be used when presenting the test results."

On Fri, Apr 10, 2020 at 3:58 AM Eric Welton (Korsimoro) <eric@...> wrote:
Fantastic!  Thanks!

I have two questions and am thinking about how I could summarize/present this to a government minister and relate it to a paper form version of the same.

First question: what is a TestCard? and what role does that play?

Second - and this is a question that is more "general" - I'm not nitpicking this specific example, but wondering more about credential design in general and how we want to deal with the issue of subject identification:

- in addition to IgG and IgM, the context explicitly calls out a name-pair, birthday, and something to do with the subject's sexuality, and the Person structure from schema.org is called out, where most of the fields in the Person model are not particularly useful for identifying a Person but are more about "describing" a Person or Person-like thing.

Taken together, the presented information doesn't let me easily point to a Person in a way that is immediately useful to me - for my use cases, I would imagine one of the two:
- a national id number or semantic model, with optional image (citizens)
- a passport semantic model, with optional image (foreigners)

I don't see this as a deep problem, because I can always build up context that matches the identification context relative to my expected use context - e.g. I want a checkpoint guard to be able to see the IgM/IgG information, an F2F presented plastic national id card or passport, and make a policy enforcement decision.

So the question is just more generic - drawing on this example as a starting point and using it to explore guidance - how can we do this systematically so that we don't have covid credentials that vary for every issuance context based solely on the properties of "subject identification"?

One option is to push that out of the credential entirely, and let that come from the wallet or alternate documents provided during presentation - linked only by cryptographic material.  But that brings in a raft of problems and would be a hard sell in a 30 second elevator pitch to a busy and distracted government minister - especially one with a mental model of a physical form with tons of lateral information on it.

The other option is to try to "define the subject information" in the credential over and over - like, family name, given name, birth date, sexual idiosyncrasies, DUNS number, brand, funder, honorificSuffix, interactionStatistic, product offerings, performances, employer, or many of the other Person attributes ;)

Perhaps a strategy of figuring out how to pool information in loosely coupled groups - e.g. only the Ig* values in one group, the person identification in another - perhaps as a one-or-more-of-many selection - there might be a pattern we can establish here that clearly isolates the human-identification-variability from the relatively stable science-driven covid-19 data.

again - my concern is for explaining this to a non-technical politician as soon as Monday - and we assume that person has an existing mental model, one that looks like "all the other test result documentation" they've seen - with a bunch of socially-specific subject identification information, issuer identification information, document photocopies, and signatures, stamps, and more signatures, and more stamps - in red, for extra authentication and security.

best,

  -e



On Fri, Apr 10, 2020 at 1:06 AM orie <orie@...> wrote:
https://github.com/w3c-ccg/vc-examples/pull/30

Based on the new schema.org definitions for COVID-19 testing facilities and the DHS SVIP hypothetical Permanent Resident Card. 

Issued from a did:web, Presented by a did:key.

Comments welcome.

--
ORIE STEELE
Chief Technical Officer
www.transmute.industries





