Private data and deleted content can be permanently accessed, GitHub official: intentionally designed

王林 (original), 2024-07-30 02:06:44

Recently, a piece of news shocked the open source community: deleted content on GitHub and data in private repositories can remain permanently accessible, and GitHub says this is by design.

Open source security software company Truffle Security detailed the issue in a blog post.

Truffle Security introduces a new term, CFOR (Cross Fork Object Reference): a CFOR vulnerability exists when one repository fork can access sensitive data from another fork, including data from private and deleted forks.

Similar to an insecure direct object reference (IDOR), in CFOR a user supplies a commit hash to directly access commit data that would otherwise not be visible to them.

The following is the original content of the Truffle Security blog.

Accessing data from a deleted fork

Imagine the following workflow:

  • You fork a public repository on GitHub;

  • You commit code to your fork;

  • You delete your fork.

You would expect the code you committed to the fork to be inaccessible now, since you deleted the fork. However, it remains permanently accessible and is out of your control.

In other words, if you fork a repository, commit data to the fork, and then delete the fork, the "deleted" commit data can still be accessed through the original repository.

This happens often. Truffle Security surveyed three frequently forked public repositories from a large AI company and easily found 40 valid API keys in deleted forks.

Accessing data from a deleted repository

Consider the following workflow:

  • You have a public repository on GitHub;

  • A user forks your repository;

  • You commit data after they fork, and they never sync their forked repository with your updates;

  • You delete the entire repository.

Then the code you committed after the user forked your repository is still accessible.

GitHub stores repositories and their forks in a repository network, with the original "upstream" repository acting as the root node. When a public "upstream" repository that has been forked is deleted, GitHub reassigns the root role to one of the downstream forks. However, all commits from the "upstream" repository still exist and are accessible through any fork.
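The behavior described above can be sketched as a toy model. This is only an illustration of the article's description, not GitHub's actual implementation: the class `RepoNetwork`, its methods, the repository names, and the commit hashes are all hypothetical, and the rule that the oldest remaining fork becomes the new root is an assumption made for the example.

```python
class RepoNetwork:
    """Toy model of a repository network with one shared object store."""

    def __init__(self, root):
        self.root = root
        self.repos = [root]    # root first, then forks in creation order
        self.objects = set()   # commit hashes shared by the whole network

    def fork(self, name):
        self.repos.append(name)

    def commit(self, repo, sha):
        if repo not in self.repos:
            raise KeyError(repo)
        # Commits land in the network-wide store, not in a per-repo store.
        self.objects.add(sha)

    def delete(self, repo):
        self.repos.remove(repo)
        if repo == self.root:
            # Assumption: the oldest remaining fork is promoted to root.
            self.root = self.repos[0] if self.repos else None
        if not self.repos:
            # Assumption: objects go away only when the last repo is gone.
            self.objects.clear()

    def reachable(self, repo, sha):
        """True if `sha` can be viewed through the existing repo `repo`."""
        return repo in self.repos and sha in self.objects


net = RepoNetwork("you/project")
net.fork("them/project")
net.commit("you/project", "a1b2c3d")   # committed after the fork, never synced
net.delete("you/project")
print(net.root, net.reachable("them/project", "a1b2c3d"))
```

In this model, deleting a repository only removes a name from the network. The commit stays in the shared store for as long as any repository in the network remains, which matches both the deleted-fork and the deleted-upstream scenarios.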

This is not a hypothetical scenario. Something like it happened last week:

Truffle Security submitted a P1 vulnerability to a large technology company, showing that the company had accidentally committed to GitHub a key for an employee's GitHub account, and that this account had critical access to the company's entire GitHub organization. The company immediately deleted the repository, but because the repository had been forked, the commits containing sensitive data were still accessible through the fork, even though the fork had never been synchronized with the original "upstream" repository.

That is, any code committed to a public repository is permanently accessible as long as the repository has at least one fork repository.

Accessing private repository data

Consider the following workflow:

  • You create a private repository that will eventually be made public;

  • You create a private, internal version of that repository (via a fork) and commit additional code for features that you do not intend to make public;

  • You make your "upstream" repository public and keep your fork private.

Then your private features and the related code are available for public viewing. Any code committed between the time you created the internal fork of your tool and the time you open-sourced the tool is accessible through the public repository.

Commits made to your private fork after you make the "upstream" repository public are not visible. This is because changing the visibility of a private "upstream" repository results in two repository networks: one for the private version and one for the public version.

Unfortunately, this workflow is one of the most common approaches that users and organizations take when developing open source software. As a result, confidential data may be inadvertently exposed in public GitHub repositories.

How to access data?

Destructive operations in a GitHub repository network (like the three scenarios above) remove references to commit data from the standard GitHub UI and from normal git operations. However, the data still exists and is accessible to anyone who knows the commit hash. This is the link between CFOR and IDOR vulnerabilities.
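As a minimal sketch of what "accessible by commit hash" means in practice, the helper below builds the web address of a commit as viewed through a given repository. It relies on the familiar `github.com/<owner>/<repo>/commit/<hash>` page layout. The function name `commit_url` and the owner, repository, and hash in the usage line are hypothetical, and the code only builds a string without contacting GitHub.

```python
import re


def commit_url(owner: str, repo: str, sha: str) -> str:
    """Build the web URL of commit `sha` as seen through `owner/repo`."""
    # Accept a short or full SHA-1: 4 to 40 lowercase hex characters.
    if not re.fullmatch(r"[0-9a-f]{4,40}", sha):
        raise ValueError("expected 4 to 40 lowercase hex characters")
    return f"https://github.com/{owner}/{repo}/commit/{sha}"


# Hypothetical names: a surviving repository and a hash from a deleted fork.
print(commit_url("example-org", "example-repo", "07f01e8"))
```

The point of the sketch is that the repository in the URL does not have to be the one the commit was originally pushed to. Any repository still present in the same network serves as the entry point.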

Commit hashes can be brute-forced via GitHub's UI, especially since the git protocol allows the use of short SHA-1 values when referencing commits. A short SHA-1 value is the minimum number of characters required to avoid a collision with another commit hash, with an absolute minimum of 4. The key space for all 4-character SHA-1 values is 65,536 (16^4). Brute-forcing all possible values can be achieved relatively easily.
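The key-space arithmetic can be checked with a short sketch. The function names `keyspace` and `short_sha_candidates` are made up for this example. The code only enumerates candidate prefixes locally and does not send any requests.

```python
from itertools import product

HEX_DIGITS = "0123456789abcdef"


def keyspace(length: int) -> int:
    """Number of distinct hex prefixes of the given length."""
    return 16 ** length


def short_sha_candidates(length: int = 4):
    """Yield every possible hex prefix of `length` characters, in order."""
    for chars in product(HEX_DIGITS, repeat=length):
        yield "".join(chars)


print(keyspace(4))  # 65536
```

Each extra character multiplies the space by 16, so a 5-character prefix already has 1,048,576 candidates. The 4-character minimum is what keeps the search small.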

Interestingly, GitHub exposes a public events API endpoint. You can also look up commit hashes in an events archive that is managed by a third party and stores all GitHub events from the past ten years outside of GitHub, even after a repository has been deleted.
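A rough sketch of how commit hashes might be pulled out of event records follows. It assumes the events have already been fetched and decoded from JSON, and that push events carry `type`, `payload.before`, and `payload.head` fields. Those field names reflect the PushEvent layout as commonly documented and should be checked against the current API reference. The function name `push_hashes` and the sample records are invented for illustration.

```python
def push_hashes(events):
    """Collect distinct commit hashes referenced by push events, in order."""
    hashes = []
    for event in events:
        if event.get("type") != "PushEvent":
            continue
        payload = event.get("payload", {})
        # `before` and `head` are the branch tips before and after the push.
        for key in ("before", "head"):
            sha = payload.get(key)
            if sha and sha not in hashes:
                hashes.append(sha)
    return hashes


# Made-up sample records, shaped like decoded event JSON.
sample = [
    {"type": "WatchEvent", "payload": {"action": "started"}},
    {"type": "PushEvent", "payload": {"before": "1" * 40, "head": "2" * 40}},
    {"type": "PushEvent", "payload": {"before": "2" * 40, "head": "3" * 40}},
]
print(len(push_hashes(sample)))  # 3
```

Because an archive of this kind lives outside GitHub, hashes recorded in it remain available for lookup regardless of what is later deleted on GitHub itself.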

GitHub's policies

Truffle Security submitted its findings to GitHub through GitHub's VDP (vulnerability disclosure program). GitHub responded: "This is by design" and attached documentation.

Documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/what-happens-to-forks-when-a-repository-is-deleted-or-changes-visibility

Truffle Security applauds GitHub for being transparent about its architecture, but argues that ordinary users view the separation of private and public repositories as a security boundary and assume that public users cannot access any data in private repositories. Unfortunately, as shown above, this is not always the case.

Truffle Security concluded that as long as a fork repository exists, any commits to that repository network (i.e. commits on the "upstream" repository or the "downstream" fork repository) will persist forever.

Truffle Security also makes the point that the only way to securely remediate a key leaked in a public GitHub repository is to rotate the key.

GitHub’s repository architecture suffers from these design flaws. Unfortunately, the vast majority of GitHub users will never understand how a repository network actually works, and their security will suffer as a result.

Original link: https://trufflesecurity.com/blog/anyone-can-access-deleted-and-private-repo-data-github
