identity verification service

Context (~2025.04.02): This document was originally shared within Tlon on ~2024.07.26 and contains the initial feature & implementation spec for the verifier service. This publicly accessible copy lacks discussion and roadmap sections, but is otherwise unaltered, provided for posterity.

overview

This document describes an identity verification service for Urbit ships. The primary end goal of implementing this service is supporting contact discovery.

The specification below aims to support the following features:

Verification: registering identifiers under users' own @ps.
Proving: Providing proofs of registration of identifiers, for display on profile pages and similar.
- And confirming or denying a specific registration on-demand.
Discovery: Finding users' @ps by providing verified identifiers.
- Being discoverable is opt-in per registered identifier.
- Discovery can happen before, during and after onboarding, for access gating, contact discovery, and "x joined" notifications respectively.

We define the different participants in the verification system as follows:

Verifier: Central service for proving and tracking identifier registration.
- This is an Urbit agent with access to "sidecars" for Earth-y verification processes.
- Users should also be able to interact with third-party verifiers, but Tlon will run the default one.
User: A person using a specific Urbit ship.
Ship: That specific Urbit ship.
- This runs an Urbit agent that facilitates and tracks the ship's interactions with verifiers. The agent should be accessible through the Tlon frontend.
Client: Device owned by the user, running any software that needs to interact with the discovery system.

verification and proving

Normally, all identifiers are registered as belonging to a specific ship. To register an identifier, a ship must initiate the process by sending the identifier to the verifier over ames.

During onboarding, it may be desirable to register an identifier prior to obtaining an urbit. To this end, the verifier may also support interactions over HTTP, authenticating not with Urbit IDs, but with short-lived authentication tokens generated by the verifier. Registrations made this way may be attached to a ship by sending the authentication token over ames, or expire after some period of non-use.
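
As a minimal sketch of this HTTP-side bookkeeping, the Python below holds registrations made before the user has a ship under a verifier-generated token, returns them for binding when a ship presents the token, and drops them after a period of non-use. `PendingRegistrations`, its methods, and the seven-day `EXPIRY_SECONDS` are illustrative assumptions; the spec leaves the expiry period open.

```python
import secrets
import time
from dataclasses import dataclass, field

EXPIRY_SECONDS = 7 * 24 * 3600  # assumed; the spec only says "some period of non-use"


@dataclass
class Pending:
    identifiers: set = field(default_factory=set)
    last_used: float = 0.0


class PendingRegistrations:
    """Hypothetical store for registrations made over HTTP before the user has a ship."""

    def __init__(self):
        self._by_token = {}

    def issue_token(self, now=None):
        # Short-lived authentication token generated by the verifier.
        token = secrets.token_urlsafe(32)
        self._by_token[token] = Pending(last_used=time.time() if now is None else now)
        return token

    def register(self, token, identifier, now=None):
        now = time.time() if now is None else now
        entry = self._live(token, now)
        entry.identifiers.add(identifier)
        entry.last_used = now

    def claim(self, token, ship, now=None):
        # A ship sent the token over ames: return the identifier -> ship
        # bindings to record, and retire the token.
        now = time.time() if now is None else now
        entry = self._live(token, now)
        del self._by_token[token]
        return {identifier: ship for identifier in entry.identifiers}

    def _live(self, token, now):
        entry = self._by_token.get(token)
        if entry is None or now - entry.last_used > EXPIRY_SECONDS:
            self._by_token.pop(token, None)  # forget expired registrations
            raise KeyError("unknown or expired token")
        return entry
```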

general verification process

Upon receiving the registration request, the verifier checks whether the identifier is already registered. If it is, the verification fails instantly. (And, for edge cases: if this becomes the case at any point during the registration process, the in-progress registration fails.)

xx or: should we just support binding the same identifier to multiple ships? this lets you go @mdfang->palfun and @mdfang->paldev, instead of @mdfang->palfun->paldev. in practice, many people already run multiple ships, even if it's just as testing ships.

If it is still unregistered, the verifier starts the registration process. The specifics of this vary depending on the kind of identifier being verified. It may involve the verifier talking to a "sidecar" or other external service.

The verifier sends updates back to the ship as needed. These may include instructions for user action ("do x", "tell me y", "wait for z"). The agent running on the ship should know how to present these to the user as necessary.

The ship, in turn, communicates back to the verifier as it completes its instructions. This may prompt the verifier to, well, verify that the correct action was taken, eventually completing the registration.
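
The exchange above can be sketched as verifier-side bookkeeping for a code-based process such as email ("tell us the code we sent"). This is a hypothetical illustration: `Verifier`, `Status`, and the method names are not part of the spec, and the code is passed in rather than generated and delivered by a sidecar. It does show both failure rules from the general process: an already-registered identifier fails instantly, and an in-progress registration fails if another ship completes first.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Status(Enum):
    AWAITING_USER = auto()  # an instruction was sent, e.g. "tell us the code we sent"
    DONE = auto()
    FAILED = auto()


@dataclass
class Registration:
    ship: str
    identifier: str
    expected_code: str


class Verifier:
    """Hypothetical verifier-side bookkeeping for a code-based verification process."""

    def __init__(self):
        self.bound = {}        # identifier -> ship, for completed registrations
        self.in_progress = {}  # (ship, identifier) -> Registration

    def start(self, ship, identifier, sent_code):
        if identifier in self.bound:
            return Status.FAILED  # already registered: fail instantly
        self.in_progress[(ship, identifier)] = Registration(ship, identifier, sent_code)
        return Status.AWAITING_USER

    def respond(self, ship, identifier, code):
        reg = self.in_progress.pop((ship, identifier), None)
        if reg is None or identifier in self.bound:
            return Status.FAILED  # nothing pending, or another ship completed first
        if code != reg.expected_code:
            return Status.FAILED
        self.bound[identifier] = ship
        return Status.DONE
```

A real process would allow retries and several rounds of instructions; a single round keeps the sketch short.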

Completed registrations will bind the identifier to the ship, and produce some amount of metadata. This always includes the timestamp at which the registration was completed and an attestation signed by the verifier, and may include pointers to publicly-accessible proofs.

The attestation signed by the verifier is always sent back to the ship, which may share it as a "proof of verification" that can be checked without contacting the verifier directly. Of course, as long as the verifier is online, it remains possible to ask it "is this legitimate?" about arbitrary registration claims. (Such queries should either require the signed message or be subject to stringent rate-limiting. See also the abuse protection section below.)
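
One possible shape for the attestation and its offline check is sketched below, under stated assumptions. The field set and JSON serialization are guesses rather than a decided format, and the `life` field anticipates the key-revision requirement from the section on revoking registrations. HMAC-SHA256 is only a placeholder for a signature made with the verifier's networking keys: it keeps the example dependency-free, but it is symmetric, so unlike a real signature it could not be checked by third parties without handing them the secret.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Attestation:
    verifier: str    # the verifier's ship name
    ship: str        # the registrant
    kind: str        # "phone", "email", "website", ...
    identifier: str
    completed: int   # unix timestamp at which the registration completed
    life: int        # revision of the verifier key used to sign

    def payload(self) -> bytes:
        # Canonical serialization (sorted keys, no whitespace) so that signer
        # and validator process byte-identical input.
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":")).encode()


# PLACEHOLDER: HMAC-SHA256 stands in for a signature made with the verifier's
# networking key. A real validator would check against the verifier's public key.
def sign(key: bytes, att: Attestation) -> bytes:
    return hmac.new(key, att.payload(), hashlib.sha256).digest()


def check(keyring: dict, att: Attestation, sig: bytes) -> bool:
    # Pick the key for the revision the attestation names, so attestations
    # signed before a key rotation remain checkable afterwards.
    key = keyring.get(att.life)
    return key is not None and hmac.compare_digest(sign(key, att), sig)
```

Looking the key up by `life` is what lets older attestations stay checkable after a rotation, as long as the validator still knows the older key.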

identifiers & specific processes

We provide some suggestions for identifiers and their verification processes below. In the immediate term, phone numbers are the primary use-case, but some other identifiers would not be difficult to support.

Phone numbers: Tlon hosting already knows how to verify phone numbers. Concrete proofs beyond "we say so" are hard to provide here.
Email addresses: Tlon hosting already knows how to send emails. Verification could consist of the classic "tell us the code we sent". Concrete proofs beyond "we say so" may be hard to provide here. (Could probably do something with DKIM signatures, but not very ergonomic.)
Social media accounts: The user may be given a nonce, either generated on or provided to their ship, and be requested to post a Keybase-style "verifying myself" message on their social media account. The verifier retrieves the message, registers the account to the ship associated with the nonce, and stores a link to the message as proof.
- Alternatively, OAuth could be used to prove control of the account without having to post anything publicly.
Websites: Let's Encrypt has long supported the HTTP-01 challenge type for proving control over the server behind a domain name. Verification of this kind is trivially implemented Urbit-natively. The response could even be served by an Urbit itself, making the process fully automatic for domains that a user's ship is serving on. The proof is the link to the challenge URL.
- The DNS-01 challenge could be supported as well, but would be less trivial to verify.
Urbits: Making ~foo sign a message that says "I am ~bar" and subsequently making ~bar sign ~foo's signature is trivial, and can be as simple as pressing a button on both ships.

In general, nonces used for verification should:

Be generated by the verifier.
- Letting the user's ship generate them would let the user provide a still-present verification for someone else's identifier.
Not contain the name of the ship that's trying to register the identifier.
- Otherwise these types of registration would be inherently non-private, letting anyone discover the ship name from a given identifier regardless of the user's verifier-side settings.
- The verifier can privately keep track of nonce<->ship bindings.
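
A minimal sketch of nonce handling that follows both rules: nonces come from the verifier's CSPRNG and carry no ship name, while the nonce-to-ship binding stays in the verifier's private state. The `verify-` prefix, the `NonceBook` class, and single-use redemption are illustrative assumptions. `redeem` would run on text the verifier retrieved itself, such as a social media post or an HTTP-01-style challenge response.

```python
import re
import secrets

NONCE_RE = re.compile(r"verify-[0-9a-f]{32}")


class NonceBook:
    """Hypothetical private store of nonce -> ship bindings on the verifier."""

    def __init__(self):
        self._owner = {}

    def issue(self, ship: str) -> str:
        # Pure randomness from the verifier: nothing in the nonce names the ship,
        # so a public post containing it does not reveal who registered.
        nonce = "verify-" + secrets.token_hex(16)
        self._owner[nonce] = ship
        return nonce

    def redeem(self, retrieved_text: str):
        # Run on text the verifier fetched itself (a post, a challenge URL, ...).
        for nonce in NONCE_RE.findall(retrieved_text):
            ship = self._owner.pop(nonce, None)  # single use
            if ship is not None:
                return ship
        return None
```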

displaying and validating proofs

All registrations come with an attestation signed by the verifier's networking keys. These attestations may be presented alongside the claim to an identifier, like a "verified by" checkmark or otherwise. The attestation should be easy to copy into an Urbit, which can use its knowledge of the verifier's networking keys to validate the attestation.

In contexts where the visitor has control over the display code (that is, in local or self-serve contexts, as opposed to a profile page served by its owner) it may be possible to reach out to the verifier over HTTP to validate the attestation "live". But this may be considered overkill.

When publicly-visible proofs are available, it may be good to present a clickable link to the proof if possible, making it trivial to verify the proof. (As in, clicking "@mdfang" taking you to the account, but clicking the checkmark taking you to the verification tweet.)

revoking registrations

Users may ask the verifier to unregister any identifier they had previously registered.

Of course, the signed attestation may have been shared by the user, and be irrevocably "out there". Both the ship and the verifier should therefore support being asked about attestations, confirming or denying whether they are still valid or have been revoked (à la OCSP).
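
The OCSP-like status query could be as small as the sketch below, which borrows OCSP's three answers (good, revoked, unknown). Keying the store on a digest of the full attestation is an assumption, chosen so that a caller must already hold the attestation to ask about it, in line with the earlier note on restricting such queries. `RevocationList` is a hypothetical name.

```python
import hashlib
from enum import Enum


class CertStatus(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNKNOWN = "unknown"  # OCSP-style three-way answer


class RevocationList:
    """Hypothetical attestation status store, usable by the ship or the verifier."""

    def __init__(self):
        self._status = {}  # attestation digest -> CertStatus

    @staticmethod
    def _digest(attestation: bytes) -> str:
        return hashlib.sha256(attestation).hexdigest()

    def issued(self, attestation: bytes):
        self._status[self._digest(attestation)] = CertStatus.GOOD

    def revoke(self, attestation: bytes):
        d = self._digest(attestation)
        if d in self._status:
            self._status[d] = CertStatus.REVOKED

    def query(self, attestation: bytes) -> CertStatus:
        # Requiring the full attestation, not just an identifier, means the
        # query cannot be used to probe for unknown registrations.
        return self._status.get(self._digest(attestation), CertStatus.UNKNOWN)
```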

The verifier may rotate its networking keys. Attestations should specify which networking key revision they were signed with, and so should remain legible and valid after key rotation. But the verifier should send re-signed attestations to ships that ask for them, and ships should ask for them as soon as they hear of the key rotation.

chain of trust

xx put a verifier on zod, then register tlon's verifier on there as controlling tlon.io etc, to make it less "yes this verifier is ours just trust me bro"

discovery

The motivating use-case for the verification service is to make it possible to discover users' ships from identifiers they use elsewhere. In other words: as a user, I want to know the ship associated with a given identifier, if any.

The primary, must-solve challenge for discoverability is preventing abuse. More on this in the section about abuse protection below.

Another important aspect of discoverability is user control. The user may not want to be discoverable, or only be discoverable through specific identifiers or to other users matching certain criteria.

A user may set any registered identifier to either discoverable, undiscoverable, or only discoverable by others who have a registered identifier of the same kind. Additionally, a user may set themselves to discoverable, entirely undiscoverable, or only discoverable when a minimum number of identifiers is known.

Another discoverability option would be "mutual discovery", where a user's ship is only discoverable to users they themselves know. Doing this without storing legible social graphs on the verifier is likely possible, but very non-trivial. As such, we will not be implementing it in the short term. (See also TLON-2169 for further details.)

To discover someone's ship, the client presents the verifier with a bundle of identifiers they know that person has. If any of those are registered and pass the registrant's discoverability settings, the verifier responds with the corresponding ship name.

xx what if there are conflicts in the bundle? give response based on identifier "priority"? ie phone nrs > email > twitter > domains. or do "all or nothing"?
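
Under some assumptions, the discoverability settings and the bundle lookup can be combined into one function. In this hypothetical sketch, a user's "minimum number of identifiers" counts only bundle identifiers that pass their per-identifier settings, `requester_kinds` holds the kinds of identifiers the requester has registered, and conflicts are not resolved: every qualifying ship is returned, leaving the open question above untouched.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum


class IdMode(Enum):
    DISCOVERABLE = "discoverable"
    HIDDEN = "undiscoverable"
    SAME_KIND = "same-kind-only"


@dataclass
class User:
    ship: str
    ids: dict = field(default_factory=dict)  # (kind, value) -> IdMode
    hidden: bool = False                       # entirely undiscoverable
    min_known: int = 1                         # 1 means plainly discoverable


def discover(users, bundle, requester_kinds):
    """Hypothetical lookup: every ship the bundle resolves to, conflicts left open."""
    owner = {ident: u for u in users for ident in u.ids}
    hits = defaultdict(int)  # ship -> qualifying identifiers found in the bundle
    for ident in set(bundle):
        u = owner.get(ident)
        if u is None or u.hidden:
            continue
        mode = u.ids[ident]
        if mode is IdMode.HIDDEN:
            continue
        if mode is IdMode.SAME_KIND and ident[0] not in requester_kinds:
            continue
        hits[u.ship] += 1
    by_ship = {u.ship: u for u in users}
    return {ship for ship, n in hits.items() if n >= by_ship[ship].min_known}
```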

The client may periodically query the verifier even after onboarding has completed, to discover contacts that have since registered themselves.

Discovery should be supported over both ames and HTTP. The latter is important for the use case of gating access to hosting on knowing at least one person on the network. (xx should tlon just proxy the discovery request so it can verify the result, instead of trusting the client? arguably if you'd spoof to get onto urbit we should let you in regardless of known contacts)

privacy

The verifier necessarily needs to know the identifiers any given ship is registering. This may mean knowing sensitive, personally-identifying information (such as phone numbers). The verifier may obscure those values in its state to avoid accidentally exposing them to operators, but otherwise makes no attempts to hide them. A malicious actor with direct access to the verifier can discover the identifier mapping, and this is unavoidable.

Company-internal processes may help reduce the risk of unintended access to the mappings, but those are outside of the scope of this document.

User-facing privacy controls were discussed in the discovery section above.

Even data the verifier doesn't intentionally remember, like the contents of discovery requests, may still be retained in the verifier's event log. The urbit should be set up to prune its event log early and often.

The client may generate dummy identifiers and include them in its requests, to provide some plausible deniability.
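
A client might generate decoys as in the sketch below, which mixes random phone-number-shaped values into the request and shuffles it so position gives nothing away. The `+1555` prefix and `pad_with_dummies` are illustrative assumptions; decoys only help if they are plausible for the identifier kind.

```python
import secrets


def pad_with_dummies(real_numbers, n_dummies, prefix="+1555"):
    """Hypothetical helper: mix random, phone-number-shaped decoys into a request."""
    rng = secrets.SystemRandom()
    real = set(real_numbers)
    dummies = set()
    while len(dummies) < n_dummies:
        candidate = prefix + "".join(rng.choice("0123456789") for _ in range(7))
        if candidate not in real:
            dummies.add(candidate)
    request = list(real) + list(dummies)
    rng.shuffle(request)  # position must not reveal which entries are real
    return request
```

One caveat: under the contact book-based rate-limiting described below, fresh decoys on every request would likely be charged as new identifiers, so a client would probably want to keep its decoys stable between requests.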

abuse protection

The verifier knows which identifiers belong to the same person. Part of its intended functionality is to share this information with those who ask. Inherently, this leaks information all over the place.

As a user, I may not want the world to know my sensitive identifiers (such as my phone number), but also still want to be able to be discoverable through them. The assumption is that, so far, I have only shared my phone number with those I trust.

With a naive verifier, finding someone's registered identifiers is no harder than scraping through all possible identifiers using the discovery feature ("do you know a ship with this identifier?"). For phone numbers (and most other identifiers) that space is small, making the brute-force attack trivial.
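
To put a rough number on "small", the arithmetic below assumes one 10-digit national numbering plan (an upper bound, since not every number is assigned) and a scraper making 10 requests per second with 1,000 identifiers each. Both figures are illustrative, not measurements.

```python
# Illustrative assumptions, not measurements.
space = 10 ** 10               # one 10-digit national numbering plan (upper bound)
ids_per_second = 10 * 1_000    # 10 requests/s, 1,000 identifiers per request
seconds = space / ids_per_second
print(f"{seconds / 86_400:.1f} days")  # prints "11.6 days"
```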

It's easy to come up with complicated hashing schemes that feel like they protect against this abuse. They don't. There is no way around the root cause: the space of possible identifiers is small. Instead, it is easier and safer to rely on properties of legitimate usage to detect and prevent abusive behavior.
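
As a small demonstration of why hashing alone does not help: if clients sent hashed identifiers instead of raw ones, anyone holding a hash could recover the identifier by hashing the whole space. The illustrative sketch enumerates a 4-digit suffix to stay fast; a full phone number space is only a constant factor larger. Salts or slow hashes raise that constant, but the verifier must still be able to compare, so the small space remains the root cause.

```python
import hashlib


# Illustrative only: the "protection" some schemes apply before sending.
def h(identifier: str) -> str:
    return hashlib.sha256(identifier.encode()).hexdigest()


def invert(target: str, prefix: str, digits: int):
    # Hash every candidate in the (small) space until one matches.
    for i in range(10 ** digits):
        candidate = f"{prefix}{i:0{digits}d}"
        if h(candidate) == target:
            return candidate
    return None
```

For example, `invert(h("+15550104242"), "+1555010", 4)` returns `"+15550104242"` after at most 10,000 hashes.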

contact book-based rate-limiting

We assume that the rate of change in any given user's contact book is low, and that that rate is much lower than the rate at which a malicious actor would want to scrape through all possible identifiers. In other words, the key difference between legitimate usage and abuse is that legitimate usage asks repeatedly about a slowly changing set of identifiers, whereas abusive usage asks about entirely new sets in rapid succession.

Leveraging that assumption to prevent abuse is slightly non-trivial, because we do not want to explicitly store identifiers that are being asked about. (See also the privacy section above.)

Hagen et al. (2022) describe a rate-limiting scheme they call Differential Contact Discovery (DCD) that solves for this challenge.

In short, DCD has the server store a hash of the set of identifiers contained in the previous request, and has the client send the previous request alongside the current request. This way, the server can tell how many _new_ identifiers the client is asking about, and do rate-limiting based on those, keeping re-checks of previously asked-about identifiers free. Crucially, the set of identifiers is hashed with a unique nonce (or salt) for each new request, and that nonce is stored by the client (not the server), making it practically impossible for the server to recover the set from its hash alone.
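
A simplified sketch of the mechanism as summarized above, assuming the client generates each request's salt and keeps it (with the previous set) between requests, while the server keeps only a salted digest per client. `DcdServer`, `set_digest`, and the flat `new_id_budget` are hypothetical, Hagen et al.'s actual construction may differ in detail, and replenishing the budget over time is omitted.

```python
import hashlib


def set_digest(salt: bytes, identifiers) -> bytes:
    # Order-independent digest: hash the salt, then the sorted identifiers.
    h = hashlib.sha256(salt)
    for identifier in sorted(identifiers):
        h.update(identifier.encode() + b"\x00")
    return h.digest()


class DcdServer:
    """Simplified, hypothetical DCD-style rate limiter; stores digests, never sets."""

    def __init__(self, new_id_budget: int):
        self.new_id_budget = new_id_budget
        self.last_digest = {}  # client -> digest of its previous request
        self.spent = {}        # client -> new identifiers charged so far

    def request(self, client, current, salt, previous=frozenset(), prev_salt=b""):
        current, previous = set(current), set(previous)
        expected = self.last_digest.get(client)
        if expected is None or set_digest(prev_salt, previous) != expected:
            previous = set()  # previous set not proven: every identifier counts as new
        new = len(current - previous)
        spent = self.spent.get(client, 0)
        if spent + new > self.new_id_budget:
            raise PermissionError("rate limit exceeded")
        self.spent[client] = spent + new
        # The client generated `salt` and keeps it; only the digest is stored here.
        self.last_digest[client] = set_digest(salt, current)
        # Discovery lookups for `current` would happen here.
```

A client that cannot prove its previous set (lost state, wrong salt) is simply charged for every identifier, which fails safe. Real clients would draw salts from a CSPRNG such as `secrets.token_bytes(16)`.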

DCD should be relatively easy to implement and provides us with strong rate-limiting as a primary line of defense against scraping.

rate-limiting without ames

In the above, we implicitly assumed clients were doing discovery over ames, letting us bind request statistics to Urbit IDs. Acquiring a planet is non-trivial, or at least not completely free. It provides an immediate barrier and convenient anchor for request statistics.

However, during onboarding it may be desirable to let users do discovery before they acquire a planet. The aim is to limit new users to only those who will have something to do or someone to talk to once they're on the network.

As mentioned earlier, to support this case, the verifier should support discovery over HTTP in addition to ames. But this raises the question: what does the client authenticate with? IP addresses are easy to come by, so they may not be sufficient to effectively rate-limit malicious actors.

Since the use-case for discovery over HTTP is so narrow (practically: one request prior to each onboarding attempt), we propose the following token-based scheme.

The verifier gives out authentication tokens to onboarding users. These tokens come from a limited pool that replenishes slowly over time. Using such a token, the client may make a single discovery request (with up to the number of identifiers normally allowed by the rate-limiting), after which the token expires.
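
The pool can be sketched as a token bucket: up to `capacity` tokens are available, refilling at `refill_per_hour`, and each issued token is good for exactly one discovery request of at most `max_ids` identifiers. The class name, the parameters, and the choice to consume a token even when its request is rejected are illustrative assumptions.

```python
import secrets
import time


class OnboardingTokenPool:
    """Hypothetical token-bucket pool of single-use HTTP discovery tokens."""

    def __init__(self, capacity, refill_per_hour, max_ids, now=None):
        self.capacity = capacity
        self.refill_per_hour = refill_per_hour
        self.max_ids = max_ids
        self.available = float(capacity)
        self.updated = time.time() if now is None else now
        self.outstanding = set()

    def _refill(self, now):
        gained = (now - self.updated) * self.refill_per_hour / 3600
        self.available = min(self.capacity, self.available + gained)
        self.updated = now

    def issue(self, now=None):
        self._refill(time.time() if now is None else now)
        if self.available < 1:
            return None  # pool exhausted; the client must wait for it to replenish
        self.available -= 1
        token = secrets.token_urlsafe(24)
        self.outstanding.add(token)
        return token

    def redeem(self, token, identifiers):
        # Each token buys exactly one request; it is consumed even if the
        # request is rejected for carrying too many identifiers.
        if token not in self.outstanding:
            return False
        self.outstanding.remove(token)
        return len(identifiers) <= self.max_ids
```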

For additional defense, the verifier may gate access to an authentication token on verifying a phone number or other "high barrier" identifier. This implies supporting registration over HTTP as well, but could (in the phone number case) also be handled by hosting directly, which already has processes for verifying phone numbers.

paths not taken

private set intersection

Private set intersection (PSI), for which Hetz et al. (2023) describe a maximally private and scalable scheme, would let users do existence checks on identifiers without telling the service which specific identifiers they're interested in, and without the server showing them all registered identifiers. This approach is not useful in our discovery case, where we don't just want to know whether an identifier is registered, but also which ship it is bound to. We would want something more like "private map reading", which is a very different problem.

private contact discovery through secure enclave

Similarly, Signal's open-source contact discovery setup is not fit for our particular use-case. Their system relies on identifiers and addresses being the same thing (phone numbers). This lets them do something akin to PSI, where the client talks to the server over an encrypted connection to a secure enclave, making it theoretically impossible for Signal to learn anything about users' contact discovery requests. In our case, again, we don't want to do existence checks; we want additional information matching the data we ask about. And even running Signal's setup would be non-trivial, let alone adapting it to our very different use-case.

mutual contact discovery

As briefly mentioned above (and discussed in more detail in TLON-2169), mutual contact discovery, wherein a user is only discoverable to those they themselves know, is likely possible (see Hoepman (2023)). However, it is sufficiently challenging to implement (it depends on specific cryptographic properties and has no reference implementation) that we have chosen to punt on it. Notably, Signal does not support mutual contact discovery either.

Fang-/verification-and-discovery.md