General Discovery fixes and features #692

Closed
tegefaulkes opened this issue Mar 28, 2024 · 15 comments
Labels
development Standard development r&d:polykey:core activity 3 Peer to Peer Federated Hierarchy

Comments

@tegefaulkes
Contributor

tegefaulkes commented Mar 28, 2024

Specification

This is an epic tracking the current work related to Discovery. There are a few things that need to be addressed. There are a bunch of existing issues, but they need to be flattened out into atomic tasks that can be done separately. This epic will be handled by @tegefaulkes and @amydevs.

Currently there is no feedback when discovery is being done. Since this is a background system, this would've been tricky to address in the past, but our work on event systems and the audit domain has worked out a lot of the kinks. We'll need to add a CLI command that outputs discovery steps as they happen, with some degree of filtering. I think it makes more sense for this to be an audit domain command since it shares a lot of similarity with the connections auditing.

Currently there is a bug with discovering identities. By design there shouldn't be any constraint on an identity claiming multiple nodes. But there is a bug where multiple cryptolinks to an identity are not recognised, so our logic is not handling multiple gists.

We're missing periodic re-discovery, so it seems that discovery always needs to be triggered manually. When a node is discovered, it needs to be added to the discovery queue to be discovered again after a period of time.
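A minimal sketch of the re-queue idea, assuming a fixed re-discovery interval and an injected clock. `RediscoverySchedule`, `markDiscovered` and `takeDue` are hypothetical names for illustration, not part of the existing Discovery domain.

```typescript
// Hypothetical sketch: none of these names come from the Polykey codebase.
type VertexId = string;

interface RediscoveryEntry {
  vertexId: VertexId;
  dueAt: number; // Epoch milliseconds at which the vertex should be re-discovered
}

class RediscoverySchedule {
  // Keyed by vertex so that re-discovering a vertex replaces its earlier entry
  private entries: Map<VertexId, RediscoveryEntry> = new Map();
  private intervalMs: number;

  constructor(intervalMs: number) {
    this.intervalMs = intervalMs;
  }

  // Call once a vertex has been discovered at time `now`
  markDiscovered(vertexId: VertexId, now: number): void {
    this.entries.set(vertexId, { vertexId, dueAt: now + this.intervalMs });
  }

  // Remove and return every vertex whose delay has elapsed, earliest first
  takeDue(now: number): VertexId[] {
    const due = Array.from(this.entries.values())
      .filter((entry) => entry.dueAt <= now)
      .sort((a, b) => a.dueAt - b.dueAt);
    for (const entry of due) {
      this.entries.delete(entry.vertexId);
    }
    return due.map((entry) => entry.vertexId);
  }
}
```

Whatever drives discovery would call `takeDue` periodically and push the returned vertices back onto the discovery queue; processing them calls `markDiscovered` again, which closes the loop.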

The GestaltGraph isn't updating with new information. This needs to be investigated. The gestalt graph is updated by the discovery process. So if there is a failure there or no re-discovery is being done then that could be the cause.

We need some quality of life features to streamline the sharing and permissions process. I think right now we can't set ACL permissions unless a node already exists in the GestaltGraph. We can also trigger automatic discovery of identities that are friends of your linked identity.

We need some way of handling dead nodes and revocation of links. I'm unsure if we check with both sides of a link before considering it valid. Any certificate indicating a cryptolink has signatures of both sides, so it can be validated without actually contacting anyone. However, we need to re-check links to see if they're still valid. In this case, if a gist is deleted then that would invalidate the link, as would the claim missing from a node's sigchain. Basically, if we can't find the original copy of a link then we need to consider the link revoked.
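As a rough sketch of that rule, assuming each side of a link can be reduced to a boolean saying whether its original copy (gist or sigchain claim) was found. The `LinkEvidence` shape and the status names are made up for illustration.

```typescript
// Hypothetical model; names are illustrative and not Polykey's API.
type LinkStatus = "valid" | "one-sided" | "missing";

interface LinkEvidence {
  // True if the original copy of the claim was found on that side,
  // e.g. the gist still exists or the sigchain claim is present and not revoked
  sideAFound: boolean;
  sideBFound: boolean;
}

// Classify a link by how many of its original copies could be found
function linkStatus(evidence: LinkEvidence): LinkStatus {
  if (evidence.sideAFound && evidence.sideBFound) return "valid";
  if (evidence.sideAFound || evidence.sideBFound) return "one-sided";
  return "missing";
}

// Only a link found on both sides is treated as valid; anything else is revoked
function isLinkValid(evidence: LinkEvidence): boolean {
  return linkStatus(evidence) === "valid";
}
```

Keeping the status separate from the boolean is an assumption on my part; it would let feedback distinguish a half-revoked link from one that is gone entirely.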

The discovery logic is a little messy right now. Parts of it can be factored out into protected utility functions. But generally readability of the domain needs to be improved.

Additional context

Tasks

These are the sub-issues for addressing each point above.

  1. General feedback for the discovery process. - Discovery background processing feedback Polykey-CLI#162
  2. Bug with handling multiple cryptolinks for a single identity. - Failure to Update Gestalt with New Node ID and Issues with Discovery Mechanism Transparency Polykey-CLI#163 Discovery - revisiting Gestalt Vertices and error handling #328
  3. Periodic re-discovery and make sure the GestaltGraph is updating with new information. - Background Discovery Mechanisms #691
  4. Quality of life and streamline features such as automatic friend discovery for identities -
  5. Handling dead links and gestalt revocation - identities unclaim command Polykey-CLI#164
  6. General discovery code cleaning and refactoring -
@tegefaulkes tegefaulkes added the development Standard development label Mar 28, 2024

linear bot commented Mar 28, 2024

@tegefaulkes tegefaulkes added the epic Big issue with multiple subissues label Mar 28, 2024
@tegefaulkes
Contributor Author

tegefaulkes commented Mar 28, 2024

#462 - This is a pretty broad issue that wants to address a few problems. It's good for reference and relevant, but it's too bloated. For the sake of dividing up work I need more atomic issues.

I also need to create a new issue to address Quality of life and streamline features such as automatic friend discovery for identities. I'll need to spec that out some more, we need to work out some pain points with sharing vaults and discovery to get a better idea for this. @CryptoTotalWar

And one more issue for handling dead claims.

@tegefaulkes
Contributor Author

Some quick notes.

  • How to handle automatic discovery of peers:
    • We need to handle it as tasks. There would be one step where each peer is queued.
    • Checking each peer would be its own task.
    • We need to factor in rate limiting for API calls, but we don't want to block tasks while doing so.
  • Rediscovery: the existing issue for TTL actually covers this mostly, with some extra stuff. The extra stuff can be separated out from that later.
  • QOL is the discovering of peers but also other things that streamline the discovery/sharing process. We need some user input on pain points to expand on this.
  • Handling dead links and revocation. There are two parts. The gist can just be deleted from the identity, but sigchain claims can't be deleted. They can only be unclaimed with a new claim on the sigchain. So when an identity is un-linked, the gist is deleted and the claim is revoked with a new claim. Node to node claims have both sides revoke the claim, but it's possible that only one side is actually revoked. In this case the claim is technically invalid. That leaves 3 levels of validity: both claims are found, only one side is found, or neither is found. Only having both claims makes the link valid.
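For the rate limiting note in the first bullet, here is a minimal sketch of one way to avoid blocking: a token bucket that reports how long to wait, so a peer check that can't get a token is pushed back with a later start time instead of holding up the queue. `TokenBucket`, `PeerTask` and `runReadyTasks` are hypothetical and not the Polykey tasks API; the clock is passed in to keep the logic deterministic.

```typescript
// Hypothetical sketch; not the Polykey tasks API.
class TokenBucket {
  private capacity: number;
  private refillPerMs: number;
  private tokens: number;
  private lastRefill: number;

  constructor(capacity: number, refillPerSecond: number, now: number) {
    this.capacity = capacity;
    this.refillPerMs = refillPerSecond / 1000;
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns 0 if a token was taken, otherwise the milliseconds to wait before retrying
  tryTake(now: number): number {
    const elapsed = Math.max(0, now - this.lastRefill);
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return 0;
    }
    return Math.ceil((1 - this.tokens) / this.refillPerMs);
  }
}

interface PeerTask {
  peerId: string;
  notBefore: number; // Earliest time (ms) at which this task may run
}

// Runs every task that is ready and can get a token. Tasks that are not ready,
// or that were rate limited, are returned for a later pass instead of blocking.
function runReadyTasks(
  queue: PeerTask[],
  bucket: TokenBucket,
  now: number,
  checkPeer: (peerId: string) => void,
): PeerTask[] {
  const remaining: PeerTask[] = [];
  for (const task of queue) {
    if (task.notBefore > now) {
      remaining.push(task);
      continue;
    }
    const waitMs = bucket.tryTake(now);
    if (waitMs === 0) {
      checkPeer(task.peerId);
    } else {
      remaining.push({ peerId: task.peerId, notBefore: now + waitMs });
    }
  }
  return remaining;
}
```

The queuing step would create one `PeerTask` per peer, and each pass only does as much API work as the bucket allows.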

Moving forward I'll start on the re-discovery logic in issue #691. @amydevs will start on looking into the bug with issues MatrixAI/Polykey-CLI#163 #328

@tegefaulkes
Contributor Author

tegefaulkes commented Apr 8, 2024

Discovery progress report.

Only thing that's been addressed so far is updating visited vertex tracking and skipping.

We still need to do everything else. Highest priority internal stuff is

  1. rediscovery - 2 days
  2. Better error handling - 1 day
  3. Retrieving only new claim information - 1 day

External CLI stuff

  1. Identities status command - needs to be scoped out more
  2. Identities unclaim and handling dead cryptolinks - needs to be scoped more.
  3. Streamlining features - Needs to be scoped more and an issue created.
  4. General code cleaning and refactoring - Could use an issue, but so far I've been addressing it as needed in my current work.

External but related.

While technically not a discovery problem, @amy is addressing the poor feedback from the vaults share command and, as part of that, upgrading the notifications domain. #695

@tegefaulkes tegefaulkes closed this as not planned Apr 11, 2024
@tegefaulkes tegefaulkes reopened this Apr 11, 2024
Contributor Author

tegefaulkes commented Apr 12, 2024

Here is a general diagram of what a social network would look like.

Untitled-2024-01-23-1145 excalidraw

This is the kind of network we need to perform discovery on. The network is essentially a graph, with vertices made up of identities and Polykey nodes, and edges forming links between them.

There are 3 tiers of edges.

  1. Cryptolinks. These are the most concrete form of a link. You can think of a gestalt as a fully formed distinct sub-graph made up of JUST cryptolinks. Cryptolinks are depicted as black arrow edges above. The circles grouping them are the gestalts.
  2. Trust and permission links, depicted as the blue arrows above. These are the main relationships between nodes. There are gestalt level permissions such as trusting that gestalt. And node-node level permissions such as sharing a vault. These edges form a relationship between gestalts such that we want to know more about them since we're directly interacting with them.
  3. Weak relationships between identities. Depending on the kind of identity they could be friends, followers, part of the same group, whatever. It just implies a social relation between two identities. These exist outside of the Polykey ecosystem and don't really affect interactions within Polykey. But they're useful to know about for inviting friends into the Polykey ecosystem and for finding friends already using Polykey.
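To make the tiers concrete, here is a small sketch of the graph as tagged edges, with a gestalt computed as the set of vertices reachable over cryptolinks alone. The `Edge` type and `gestaltOf` are illustrative only, and treating cryptolinks as undirected for reachability is my assumption.

```typescript
// Hypothetical model of the three edge tiers; not the GestaltGraph implementation.
type EdgeTier = "cryptolink" | "permission" | "social";

interface Edge {
  from: string;
  to: string;
  tier: EdgeTier;
}

// A gestalt is everything reachable from `start` using cryptolinks only.
// Assumption: cryptolinks are treated as undirected for reachability.
function gestaltOf(start: string, edges: Edge[]): Set<string> {
  const adjacency = new Map<string, string[]>();
  const link = (a: string, b: string): void => {
    const neighbours = adjacency.get(a) ?? [];
    neighbours.push(b);
    adjacency.set(a, neighbours);
  };
  for (const edge of edges) {
    if (edge.tier !== "cryptolink") continue;
    link(edge.from, edge.to);
    link(edge.to, edge.from);
  }
  const seen = new Set<string>([start]);
  const stack: string[] = [start];
  while (stack.length > 0) {
    const vertex = stack.pop() as string;
    for (const neighbour of adjacency.get(vertex) ?? []) {
      if (!seen.has(neighbour)) {
        seen.add(neighbour);
        stack.push(neighbour);
      }
    }
  }
  return seen;
}
```

Permission and social edges are ignored here on purpose: they connect gestalts to each other rather than forming part of one.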

Currently Polykey discovery only operates on the first tier of edges. So only whole gestalts are discovered and the user needs to manually trigger discovery on each gestalt to discover them.

To address task 4 in the above issue description (quality of life and streamlining features such as automatic friend discovery for identities), we need to make some upgrades to the discovery system. We need the ability to do the following:

  1. Follow permission links between gestalts to discover them in the background. We only really need to follow our own permissions.
  2. Allow the ability to trust or set permissions between gestalts or nodes without having to discover them first.
  3. Trusting or sharing should trigger background discovery.
  4. Starting up Polykey should trigger initial discovery on our own node moving outwards.
  5. We should check social level (tier 3) edges to enable the following.
    1. Compile a list of friends/followers to invite to use Polykey.
    2. Find friend/followers that already use Polykey.
  6. We'd need to have a priority system for processing tier 2-3 edges. Social edges alone could crowd out all other forms of discovery and grind useful discovery to a halt.
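A minimal sketch of the priority idea in point 6, assuming strict priority by tier with first-in-first-out order inside each tier. `TieredDiscoveryQueue` is a hypothetical name and the real queue may want something softer than strict priority.

```typescript
// Hypothetical sketch; tier numbers follow the three edge tiers described above.
type Tier = 1 | 2 | 3;

interface DiscoveryJob {
  vertexId: string;
  tier: Tier;
}

class TieredDiscoveryQueue {
  // One FIFO lane per tier
  private lanes: Record<Tier, DiscoveryJob[]> = { 1: [], 2: [], 3: [] };

  push(job: DiscoveryJob): void {
    this.lanes[job.tier].push(job);
  }

  // Always serve the lowest-numbered non-empty tier first
  pop(): DiscoveryJob | undefined {
    const order: Tier[] = [1, 2, 3];
    for (const tier of order) {
      const job = this.lanes[tier].shift();
      if (job !== undefined) return job;
    }
    return undefined;
  }
}
```

With this shape a burst of social (tier 3) jobs can never delay cryptolink or permission work. The trade-off is that tier 3 only runs when the other lanes are empty, which may or may not be what we want.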

As a note, I want to avoid indiscriminate discovery across social links. Social links alone will form a very large graph of potentially all global identities, and we don't need to know about most of them unless we decide to trust them. So rather than processing a social link and queuing further links via it, I'd rather trigger further discovery via the action of trusting an identity.

This ties back to the 3 rings of what we care about in the gestalt network. We only really need to track the first 2 rings. That would form a reasonable amount of data to handle.

  1. Our own gestalt
  2. Gestalts we interact with
  3. Everything else.

Contributor Author

After some discussion, it'll be fine to explore first-order social links and their gestalts, but we stop at follower-of-follower links. This should reasonably restrict our exploration space of the overall gestalt network.
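Roughly, that limit is a hop-bounded breadth-first walk over social links. A sketch, assuming the social links are available as a plain adjacency map; `socialNeighbourhood` and its parameters are made up for illustration.

```typescript
// Hypothetical sketch: bounded exploration over social links.
// `links` maps an identity to the identities it is socially linked to.
function socialNeighbourhood(
  start: string,
  links: Map<string, string[]>,
  maxHops: number,
): Set<string> {
  const seen = new Set<string>([start]);
  let frontier: string[] = [start];
  for (let hop = 0; hop < maxHops; hop++) {
    const next: string[] = [];
    for (const identity of frontier) {
      for (const neighbour of links.get(identity) ?? []) {
        if (!seen.has(neighbour)) {
          seen.add(neighbour);
          next.push(neighbour);
        }
      }
    }
    frontier = next;
  }
  return seen;
}
```

Calling this with `maxHops` set to 1 gives our identity plus its first-order links and never reaches follower-of-follower identities.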

Member

A future optimisation is to focus on outward directed edges for IdPs that have asymmetric links. For example, an Instagram user could have millions of followers but only follow a couple hundred people. Therefore auto-discovery would focus on discovering outward directed edges, not inward directed edges. IdPs that only have bidirectional edges, like LinkedIn and Facebook, generally have more limited connectivity, so it is fine to discover it all, but we would add heuristics to prioritise recency, activity and other metrics measuring "closeness" if possible.
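A rough sketch of that edge selection, assuming each IdP can be flagged as symmetric or asymmetric. `SocialEdge` and `identitiesToFollow` are hypothetical names, and the closeness heuristics are left out.

```typescript
// Hypothetical sketch: choose which social edges auto-discovery should follow.
interface SocialEdge {
  from: string; // The identity that follows or friends
  to: string; // The identity being followed or friended
}

function identitiesToFollow(
  identity: string,
  edges: SocialEdge[],
  idpIsSymmetric: boolean,
): string[] {
  const targets = new Set<string>();
  for (const edge of edges) {
    if (edge.from === identity) {
      // Outward edges are always followed
      targets.add(edge.to);
    } else if (idpIsSymmetric && edge.to === identity) {
      // Inward edges are only followed on IdPs with bidirectional links
      targets.add(edge.from);
    }
  }
  return Array.from(targets);
}
```

On an asymmetric IdP this ignores the potentially huge follower set and only looks at who the identity itself follows.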

The need to discover immediate neighbourhood is essential for UX reasons.

Member

This issue's title is way too vague. Can this be made more specific?

Member

CMCDragonkai commented Apr 19, 2024

BTW ENG-31 @pablo.padillo this diagram is a great addition to the decentralized trust network concepts that should go into the docs.

@tegefaulkes
Contributor Author

This issue's title is way too vague. Can this be made more specific?

It's vague because it's a parent issue for a bunch of smaller tasks relating to the Discovery domain.

@CMCDragonkai
Member

What's the status of this? Can we close this and turn the remaining discovery issues into new issues?

Contributor Author

Current status is this. All of these will either have to be resolved or added back to the backlog. There is also a new bug report about discovery failing with multiple claims that will have to be looked into. But right now we're focusing on other stuff.

image.png

Member

Can you set an estimate for this in relation to the subissues and allocate them to the appropriate cycles? I think you might also need to consider how these translate to the 1.0.0 Project now, since the Polykey CLI Beta Launch is now closed.

@tegefaulkes
Contributor Author

Just 3 issues left for this. They're not really important enough to be addressed right now, not compared to current work. I'll probably move the remaining issues to the backlog and close out this issue.

@tegefaulkes
Contributor Author

I'm closing this issue as all the most important issues here have been done. The have

@CMCDragonkai CMCDragonkai added r&d:polykey:core activity 3 Peer to Peer Federated Hierarchy and removed epic Big issue with multiple subissues labels Aug 12, 2024