AI Security: Don't Share Sensitive Data

Protect your sensitive information when using artificial intelligence tools

⚠️ Golden rule:

NEVER paste real sensitive data into an AI prompt. Always use placeholders and swap the real values back in afterward.
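The placeholder workflow can be partly automated. Below is a minimal sketch. It assumes that you list the sensitive values yourself, that none of them is empty, and that no value contains another. The helper names `maskValues` and `unmaskValues` are hypothetical and do not come from any library. The sketch replaces each real value with a numbered placeholder before you paste, then puts the real values back into the AI's answer.

```javascript
// Hypothetical helpers: swap known sensitive values for placeholders before
// pasting into a prompt, then restore them in the AI's response.
function maskValues(text, secrets) {
  const mapping = new Map(); // placeholder -> real value
  let masked = text;
  secrets.forEach((secret, i) => {
    const placeholder = `[SECRET_${i + 1}]`;
    mapping.set(placeholder, secret);
    // split/join replaces every occurrence without needing regex escaping
    masked = masked.split(secret).join(placeholder);
  });
  return { masked, mapping };
}

function unmaskValues(text, mapping) {
  let restored = text;
  for (const [placeholder, secret] of mapping) {
    restored = restored.split(placeholder).join(secret);
  }
  return restored;
}

// Usage: only `masked` ever leaves your machine; `mapping` stays local.
const { masked, mapping } = maskValues(
  'const key = "sk-live-123"; // contact ana@example.com',
  ['sk-live-123', 'ana@example.com']
);
console.log(masked); // const key = "[SECRET_1]"; // contact [SECRET_2]
```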

What You Should NEVER Share

API Keys and tokens: OpenAI, AWS, Stripe, GitHub...
Passwords: Of any kind
Personal data: Real emails, phone numbers, addresses
Financial information: Card numbers, bank accounts
Production secrets: Real environment variables
Customer data: Names, emails, histories
Proprietary code: Your company's secret algorithms

How to Sanitize Your Code Before Pasting

Replace with placeholders

DON'T do this:

const API_KEY = "sk-proj-abc123def456ghi789";
const DB_URL = "mongodb://user:pass123@db.example.com:27017";

Do this:

const API_KEY = process.env.API_KEY; // "sk-proj-xxx"
const DB_URL = process.env.DB_URL; // "mongodb://user:pass@host:port/db"
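If you would rather not edit every snippet by hand, a small redaction pass can catch common patterns. This is a sketch under stated assumptions. The regular expressions are illustrative and far from exhaustive: they cover an `sk-`-style key prefix, AWS access key IDs, and passwords embedded in connection URLs. `redactSecrets` is a hypothetical helper. Treat it as a safety net, not a substitute for reading what you paste.

```javascript
// Illustrative patterns only; dedicated scanners ship far larger rule sets.
const PATTERNS = [
  // "sk-" style API keys such as "sk-proj-..." (assumed key format)
  [/\bsk-[A-Za-z0-9_-]{16,}/g, 'sk-xxx'],
  // AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits
  [/\bAKIA[0-9A-Z]{16}\b/g, 'AKIAXXXXXXXXXXXXXXXX'],
  // "user:password@" credentials inside URLs (mongodb://, postgresql://, ...)
  [/(\/\/[^:/@\s]+):[^@\s/]+@/g, '$1:[REDACTED]@'],
];

function redactSecrets(text) {
  return PATTERNS.reduce((out, [regex, replacement]) => out.replace(regex, replacement), text);
}

const snippet = [
  'const API_KEY = "sk-proj-abc123def456ghi789";',
  'const DB_URL = "mongodb://user:pass123@db.example.com:27017";',
].join('\n');
console.log(redactSecrets(snippet));
```

Regex-based redaction can miss secrets in unusual formats and can also over-match, so review the output before you paste it.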

Safe Prompts

Debug with sanitized data

"I have this error when connecting to the database:

[code with placeholders]

Environment variables (fictional values):
- DB_HOST: localhost
- DB_PORT: 27017
- DB_USER: admin
- DB_PASS: [REDACTED]

Error: ConnectionTimeoutError"
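You can generate the environment-variable list in a prompt like this instead of typing it, so a real password never slips in. Below is a minimal sketch. It assumes that a simple name-based rule decides which variables count as sensitive. `describeEnv` is a hypothetical helper.

```javascript
// Variable names matching this pattern are treated as sensitive
// (an assumption; extend it for your own naming conventions).
const SENSITIVE_NAME = /(PASS|SECRET|TOKEN|KEY|PRIVATE)/i;

// Build the "- NAME: value" lines for a prompt, masking sensitive values.
function describeEnv(env, names) {
  return names
    .map((name) => {
      const value = env[name];
      if (value === undefined) return `- ${name}: [NOT SET]`;
      return `- ${name}: ${SENSITIVE_NAME.test(name) ? '[REDACTED]' : value}`;
    })
    .join('\n');
}

const fakeEnv = { DB_HOST: 'localhost', DB_PORT: '27017', DB_USER: 'admin', DB_PASS: 'hunter2' };
console.log(describeEnv(fakeEnv, ['DB_HOST', 'DB_PORT', 'DB_USER', 'DB_PASS']));
// - DB_HOST: localhost
// - DB_PORT: 27017
// - DB_USER: admin
// - DB_PASS: [REDACTED]
```

In practice you would pass `process.env`. Still read the result before sending it, because a variable with an innocent-looking name can carry a secret. One example is a `DB_URL` with an embedded password.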

Secure configuration

"I need to configure JWT authentication for my API.

My stack: Node.js + Express + MongoDB
Don't include real secrets, use process.env

Generate:
1. Authentication middleware
2. Login function
3. .env.example with the required variables"

Tools for Sanitizing Code

Environment variables: Load them with dotenv (Node.js) or your platform's equivalent
.env.example: Share only the structure
Secrets managers: AWS Secrets Manager, HashiCorp Vault, Doppler
Pre-commit hooks: Detect secrets before committing
git-secrets: Prevent committing credentials

Security Checklist

Did I remove all real API keys?
Did I replace passwords with placeholders?
Did I use fictional data for examples?
Did I remove customer information?
Did I verify there are no tokens in URLs?
Did I check logs and stack traces for sensitive data?
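Tokens in URLs are easy to miss, because credentials can hide in the `user:password@` part and in query parameters alike. Below is a minimal sketch that uses Node's built-in WHATWG `URL` class. The list of parameter names to mask is an assumption you should adapt to your own APIs, and `sanitizeUrl` is a hypothetical helper.

```javascript
// Query parameter names assumed to carry secrets (adapt to your APIs).
const SECRET_PARAMS = ['token', 'access_token', 'api_key', 'apikey', 'key', 'signature', 'password'];

function sanitizeUrl(raw) {
  const url = new URL(raw); // URL is a global in Node.js 10+
  if (url.username) url.username = 'user';
  if (url.password) url.password = 'REDACTED';
  // Copy the keys first so we don't mutate while iterating
  for (const name of [...url.searchParams.keys()]) {
    if (SECRET_PARAMS.includes(name.toLowerCase())) {
      url.searchParams.set(name, 'REDACTED');
    }
  }
  return url.toString();
}

console.log(sanitizeUrl('https://admin:s3cret@api.example.com/v1/items?access_token=abc123&page=2'));
// https://user:REDACTED@api.example.com/v1/items?access_token=REDACTED&page=2
```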

What to Do if You Shared Data by Accident

Immediately change the exposed API key or password
Revoke the compromised token or credential
Delete the chat from your history if possible
Notify your security team
Document the incident to prevent repetition

Automatic Sanitization Tools

There are tools that automatically detect and block the exposure of sensitive data in your code. Integrating them into your workflow is one of the best security investments you can make.

dotenv and environment variables

The dotenv package is a widely used way to manage environment variables in Node.js projects. It loads the credentials stored in a .env file into process.env. Listing .env in .gitignore ensures the file is never pushed to the repository, which keeps secrets out of your code.

Installation and basic usage:

npm install dotenv

# .env file (NEVER push to git)
DATABASE_URL=postgresql://user:pass@localhost:5432/mydb
JWT_SECRET=my-super-secret-12345
STRIPE_KEY=sk_test_abc123

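# .env.example file (DO push to git)
DATABASE_URL=postgresql://USER:PASSWORD@HOST:5432/DBNAME
JWT_SECRET=your-secret-here
STRIPE_KEY=sk_test_xxx

# .gitignore (keeps the real .env out of git)
.env

// In your code, load .env before reading any variable:
require('dotenv').config();
const db = connect(process.env.DATABASE_URL); // connect() stands for your database client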
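Because `.env.example` lists every variable the app expects, it can double as a startup check. Below is a minimal sketch. It assumes a simple `KEY=value` format with `#` comments, so it ignores dotenv features such as quoting and multiline values. `parseEnvKeys` and `missingEnvVars` are hypothetical helpers and are not part of dotenv.

```javascript
// Extract variable names from .env-style text (KEY=value lines only).
function parseEnvKeys(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line && !line.startsWith('#') && line.includes('='))
    .map((line) => line.slice(0, line.indexOf('=')).trim());
}

// Names declared in the template but missing (or empty) in `env`.
function missingEnvVars(templateText, env) {
  return parseEnvKeys(templateText).filter((key) => !env[key]);
}

// At startup, after require('dotenv').config(), fail fast:
// const template = require('fs').readFileSync('.env.example', 'utf8');
// const missing = missingEnvVars(template, process.env);
// if (missing.length) throw new Error(`Missing env vars: ${missing.join(', ')}`);
```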

AWS git-secrets

git-secrets is a tool developed by AWS Labs that scans your changes and prevents you from committing credentials to the repository. It installs as git hooks that check each commit before it is created.

git-secrets configuration:

# Installation
brew install git-secrets  # macOS
# or download from github.com/awslabs/git-secrets

# Configure in your repository
git secrets --install
git secrets --register-aws

# Scan entire history
git secrets --scan-history

# Result if it detects a secret (illustrative):
# config.js:3:const accessKey = "AKIAXXXXXXXXXXXXXXXX";
# [ERROR] Matched one or more prohibited patterns
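Conceptually, git-secrets checks content against a list of prohibited regular expressions. It then ignores matches that also match an allowed pattern. The sketch below imitates that idea in JavaScript so the logic is visible. It is not git-secrets itself. The simplified AWS key-ID regex and the allowlist entry (AWS's published documentation example key) are both assumptions, and `findProhibited` is a hypothetical helper.

```javascript
// Simplified imitation of a prohibited/allowed pattern check.
// Prohibited: AWS-style access key IDs (simplified prefix list, an assumption).
const PROHIBITED = [/\b(AKIA|ASIA)[0-9A-Z]{16}\b/];
// Allowed: known example values that should not block a commit.
const ALLOWED = [/AKIAIOSFODNN7EXAMPLE/];

// Report "file:line:content" for lines that match a prohibited
// pattern and no allowed pattern.
function findProhibited(fileName, content) {
  const hits = [];
  content.split('\n').forEach((line, i) => {
    const prohibited = PROHIBITED.some((re) => re.test(line));
    const allowed = ALLOWED.some((re) => re.test(line));
    if (prohibited && !allowed) hits.push(`${fileName}:${i + 1}:${line}`);
  });
  return hits;
}

console.log(findProhibited('config.js', 'const k = "AKIAABCDEFGHIJKLMNOP";'));
```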

truffleHog and secret detection

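truffleHog is an open-source tool that searches for exposed credentials across the complete history of a git repository and in other sources, such as plain directories. Current versions use a large set of pattern-based detectors and can check whether a credential they find is still valid. Older versions relied on regular expressions and entropy analysis to find API keys, tokens, and passwords that may have been leaked in previous commits.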

Using truffleHog:

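# Installation (current Go-based version)
brew install trufflehog  # macOS
# or download a release from github.com/trufflesecurity/trufflehog

# Scan the full git history of a local repository
trufflehog git file:///path/to/repo

# Scan remote repository
trufflehog git https://github.com/user/repo.git

# Scan the files in a directory (current contents only, no history)
trufflehog filesystem /path/to/repo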

# Example output:
# Found verified result 🐷🔑
# Detector Type: AWS
# Decoder Type: PLAIN
# Raw result: AKIAIOSFODNN7EXAMPLE
# Commit: abc123 - "added config file"
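Entropy analysis rests on a simple idea. Random keys use many distinct characters evenly, so their Shannon entropy per character, H = -Σ p·log2(p), is high, while ordinary words and identifiers score lower. Below is a minimal sketch of that heuristic. It is not truffleHog's code. The 4.0 bits-per-character threshold and the 20-character minimum length are assumptions, and the function names are hypothetical.

```javascript
// Shannon entropy in bits per character: H = -sum(p * log2(p)).
function shannonEntropy(str) {
  const chars = [...str]; // iterate by code point
  const counts = new Map();
  for (const ch of chars) counts.set(ch, (counts.get(ch) || 0) + 1);
  let entropy = 0;
  for (const count of counts.values()) {
    const p = count / chars.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

// Flag long key-like tokens whose entropy exceeds the threshold.
function findHighEntropyTokens(text, { minLength = 20, threshold = 4.0 } = {}) {
  return (text.match(/[A-Za-z0-9+\/=_-]+/g) || [])
    .filter((token) => token.length >= minLength && shannonEntropy(token) > threshold);
}

console.log(findHighEntropyTokens('token: aB3dE5gH7jK9mN1pQ2rS4tU6 and hello_world_hello_world'));
```

Entropy alone produces false positives (hashes, UUIDs) and false negatives (short or low-variety secrets). That limitation is why dedicated tools combine it with, or replace it by, specific detectors.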

Gitleaks and pre-commit hooks

Gitleaks is another popular open-source scanner that integrates easily with pre-commit hooks. Its default configuration includes rules for many types of secrets, including cloud provider keys, SaaS platform tokens, and database credentials, and you can add rules of your own.

Pre-commit configuration:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.0
    hooks:
      - id: gitleaks

# Install hooks
pip install pre-commit
pre-commit install

# Now every commit will be scanned automatically
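A pre-commit hook is effective because it only has to inspect what you are about to commit. Below is a minimal sketch of that idea. It assumes the hook receives unified-diff text (for example, the output of `git diff --cached`) and a list of regular-expression rules. `scanAddedLines` and the rule shown are hypothetical, and the whole thing is much simpler than Gitleaks' rule engine.

```javascript
// Scan only lines added in a unified diff (lines starting with "+",
// excluding the "+++ b/file" header) and report which rule matched.
function scanAddedLines(diffText, rules) {
  const findings = [];
  let currentFile = null;
  for (const line of diffText.split('\n')) {
    if (line.startsWith('+++ ')) {
      currentFile = line.replace(/^\+\+\+ (b\/)?/, '');
      continue;
    }
    if (!line.startsWith('+')) continue; // context and removed lines are ignored
    for (const rule of rules) {
      if (rule.regex.test(line)) {
        findings.push({ file: currentFile, rule: rule.id, line: line.slice(1) });
      }
    }
  }
  return findings;
}

// Hypothetical rule for Stripe-style test keys
const rules = [{ id: 'stripe-key', regex: /sk_test_[0-9A-Za-z]{10,}/ }];
```

In a real hook, exiting with a non-zero status when `findings` is non-empty makes git abort the commit.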

Security Best Practices in Development

Security is not just a one-time task, but a habit that should be integrated into every step of the development cycle. These practices will help you build more secure applications from day one.

Principle of least privilege

Every component in your system should have only the permissions it needs to function. Development environments should use Stripe test-mode keys, never live ones, and a reporting service that only reads from the database should connect as a read-only user. Configure granular roles for each service and environment.
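Least privilege is easiest to enforce when each service's permissions are written down explicitly and everything else is denied by default. Below is a minimal deny-by-default sketch. The role names and permission strings are hypothetical.

```javascript
// Each role lists exactly what it may do; anything unlisted is denied.
const ROLES = new Map([
  ['reporting-service', new Set(['orders:read'])],
  ['checkout-service', new Set(['orders:read', 'orders:write', 'payments:create'])],
]);

function isAllowed(role, permission) {
  const granted = ROLES.get(role);
  return Boolean(granted && granted.has(permission)); // unknown role -> denied
}

console.log(isAllowed('reporting-service', 'orders:read'));  // true
console.log(isAllowed('reporting-service', 'orders:write')); // false
```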

Regular credential rotation

Establish a rotation schedule for API keys, tokens, and passwords. Some services can rotate credentials automatically; AWS Secrets Manager, for example, supports scheduled rotation. For keys that can't be rotated automatically, set quarterly reminders. Document the rotation process so any team member can carry it out.
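A quarterly schedule can be checked automatically instead of relying on memory. Below is a minimal sketch. It assumes you keep an inventory recording when each credential was last rotated. The inventory format, the 90-day limit, and `dueForRotation` are assumptions for illustration.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Return credentials older than maxAgeDays, oldest first.
function dueForRotation(inventory, now = new Date(), maxAgeDays = 90) {
  return inventory
    .map((cred) => ({ ...cred, ageDays: Math.floor((now - new Date(cred.rotatedAt)) / DAY_MS) }))
    .filter((cred) => cred.ageDays > maxAgeDays)
    .sort((a, b) => b.ageDays - a.ageDays);
}

// Hypothetical inventory: names only, never the secret values themselves
const inventory = [
  { name: 'STRIPE_KEY', rotatedAt: '2024-01-10' },
  { name: 'JWT_SECRET', rotatedAt: '2024-05-01' },
];
console.log(dueForRotation(inventory, new Date('2024-06-01')));
```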

Environment separation

Keep completely separate environments for development, staging, and production. Each environment should have its own credentials, databases, and configurations. Never use production data in development. If you need realistic data, use fictional data generators like Faker or Mockaroo.
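Even without a generator library, a few lines can produce clearly fictional records. Below is a minimal sketch; it is not Faker's API. The name pools are assumptions. Emails use `example.com`, a domain reserved for documentation by RFC 2606, and phone numbers use the 555-0100 to 555-0199 range reserved for fictional use. Output is deterministic, so tests stay reproducible. `fakeUsers` is a hypothetical helper.

```javascript
// Small pools of obviously fictional values (assumptions; extend as needed).
const FIRST = ['Ana', 'Ben', 'Chloe', 'Dev'];
const LAST = ['Sample', 'Tester', 'Example', 'Demo'];

// Deterministic fictional users: same input, same output, no real people.
function fakeUsers(count) {
  return Array.from({ length: count }, (_, i) => {
    const first = FIRST[i % FIRST.length];
    const last = LAST[Math.floor(i / FIRST.length) % LAST.length];
    return {
      id: i + 1,
      name: `${first} ${last}`,
      // example.com is reserved for documentation (RFC 2606)
      email: `${first}.${last}.${i + 1}@example.com`.toLowerCase(),
      // 555-0100 to 555-0199 is reserved for fictional numbers
      phone: `555-01${String(i % 100).padStart(2, '0')}`,
    };
  });
}

console.log(fakeUsers(3));
```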

Environment structure:

# Recommended structure
.env.development      # Local variables (not committed)
.env.staging          # Staging variables (not committed)
.env.production       # Production variables (not committed)
.env.example          # Template with no real values (committed)

// In your code, load the file for the current environment:
const env = process.env.NODE_ENV || 'development';
require('dotenv').config({ path: `.env.${env}` });
const config = require(`./config/${env}.js`);

# Each config file reads its own values from process.env:
# config/development.js  -> uses local services
# config/staging.js      -> uses staging services
# config/production.js   -> uses production services

Security review in Pull Requests

Include a security checklist in every Pull Request. Verify that no hardcoded credentials have been introduced, that new dependencies don't have known vulnerabilities, and that changes to authentication or authorization have been reviewed by a second developer. Tools like Dependabot and Snyk can automate part of this review.

Encryption of sensitive data

Sensitive data should be encrypted both in transit (HTTPS/TLS) and at rest. For passwords, use a slow, salted password-hashing algorithm such as bcrypt or Argon2. Never use fast general-purpose hashes like MD5 or SHA-256 for passwords, even with a salt. For data you need to read back (like card numbers), use authenticated symmetric encryption such as AES-256-GCM and store the encryption keys in a secrets manager.
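For the read-back case, Node's built-in `crypto` module provides AES-256 in GCM mode. GCM also authenticates the ciphertext, so decryption detects any tampering. Below is a minimal sketch. The key is generated inline only to keep the example self-contained; in production it would come from your secrets manager. The `iv:tag:ciphertext` hex storage format is an assumption of this example, not a standard.

```javascript
const crypto = require('crypto');

// Encrypt with AES-256-GCM; every message needs a fresh random 12-byte IV.
function encrypt(plaintext, key) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16-byte authentication tag
  return [iv, tag, ciphertext].map((b) => b.toString('hex')).join(':');
}

// Decrypt; final() throws if the ciphertext, tag, or key is wrong.
function decrypt(payload, key) {
  const [iv, tag, ciphertext] = payload.split(':').map((h) => Buffer.from(h, 'hex'));
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}

const key = crypto.randomBytes(32); // 256-bit key; load from a secrets manager in practice
const stored = encrypt('sensitive value', key);
console.log(decrypt(stored, key)); // sensitive value
```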

Secure hashing example with bcrypt:

const bcrypt = require('bcrypt');
const SALT_ROUNDS = 12; // cost factor: each +1 roughly doubles hashing time

// Register user: store only the hash, never the plain password
async function hashPassword(plainPassword) {
  return bcrypt.hash(plainPassword, SALT_ROUNDS);
}
// e.g. await db.users.create({ password: await hashPassword(plain) }); (db is your data layer)

// Verify login: compare() reads the salt embedded in storedHash
async function verifyPassword(plainPassword, storedHash) {
  return bcrypt.compare(plainPassword, storedHash);
}
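If you prefer to avoid the native bcrypt dependency, Node's built-in `crypto.scrypt` is another deliberately slow, salted password-hashing function. This swaps in scrypt; it does not show bcrypt's API. The minimal sketch below uses the synchronous variant for brevity. On a server, prefer the async `crypto.scrypt` so hashing doesn't block the event loop. The sketch relies on Node's default scrypt cost parameters. The `salt:hash` storage format and the function names are hypothetical.

```javascript
const crypto = require('crypto');
const KEY_LENGTH = 64;

// Hash: random 16-byte salt per password, stored alongside the derived key.
function scryptHash(plainPassword) {
  const salt = crypto.randomBytes(16);
  const derived = crypto.scryptSync(plainPassword, salt, KEY_LENGTH);
  return `${salt.toString('hex')}:${derived.toString('hex')}`;
}

// Verify: re-derive with the stored salt and compare in constant time.
function scryptVerify(plainPassword, stored) {
  const [saltHex, hashHex] = stored.split(':');
  const expected = Buffer.from(hashHex, 'hex');
  const actual = crypto.scryptSync(plainPassword, Buffer.from(saltHex, 'hex'), expected.length);
  return crypto.timingSafeEqual(actual, expected);
}

const record = scryptHash('correct horse battery staple');
console.log(scryptVerify('correct horse battery staple', record)); // true
```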

Regulations and Compliance

Depending on where you operate and what type of data you handle, you may need to comply with specific data protection regulations. Understanding the fundamentals of these regulations will help you make better technical decisions.

GDPR (General Data Protection Regulation)

GDPR is the European Union's data protection regulation. It applies to organizations established in the EU and also to organizations elsewhere that offer goods or services to people in the EU or monitor their behavior. Its main requirements include:
- having a lawful basis (such as explicit consent) before collecting data;
- allowing users to access, correct, and delete their personal data;
- notifying the supervisory authority of data breaches within 72 hours of becoming aware of them;
- appointing a Data Protection Officer (DPO) when required.

Fines can reach 20 million euros or 4% of global annual revenue, whichever is higher.

CCPA (California Consumer Privacy Act)

CCPA grants California residents rights over their personal data, including the right to know what data is collected, the right to delete it, and the right to opt out of the sale or sharing of their personal information. It applies to businesses that meet certain revenue or data volume thresholds. From a technical standpoint, this means implementing mechanisms to export user data, completely delete accounts (including backups), and respect privacy preferences.

Technical implications of compliance

Complying with these regulations requires implementing several technical measures: granular consent systems (cookies, marketing, analytics), API endpoints for exporting and deleting user data, audit logs documenting who accessed what data and when, encryption of personal data at rest and in transit, and data retention policies with automatic deletion. When using AI tools, be especially careful not to send users' personal data through prompts, as this could constitute an unauthorized data transfer.
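Audit logs and retention with automatic deletion both come down to simple data handling. Below is a minimal in-memory sketch. It assumes each record carries a `createdAt` timestamp and that 365 days is your retention period; both are assumptions. A real system would run this as a scheduled job against the database. The function names are hypothetical.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Append-only audit trail: who accessed which user's data, and when.
function auditAccess(log, actor, subjectUserId, action, at = new Date()) {
  log.push({ actor, subjectUserId, action, at: at.toISOString() });
}

// Split records into those to keep and those past the retention period.
function applyRetention(records, now = new Date(), retentionDays = 365) {
  const cutoff = now.getTime() - retentionDays * DAY_MS;
  const keep = [];
  const expired = [];
  for (const r of records) {
    (new Date(r.createdAt).getTime() < cutoff ? expired : keep).push(r);
  }
  return { keep, expired }; // delete `expired` from storage in your job
}
```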

Basic GDPR technical checklist:

[ ] Explicit consent before collecting data
[ ] Cookie banner with granular options
[ ] Endpoint GET /api/user/data - Export data
[ ] Endpoint DELETE /api/user/account - Delete account
[ ] Audit logs for personal data access
[ ] AES-256 encryption for sensitive data in DB
[ ] HTTPS mandatory on all endpoints
[ ] Retention policy with auto-deletion
[ ] Breach notification process (< 72h)
[ ] Data processing agreements with third parties

Frequently Asked Questions

Does AI save my conversations? Can they use my code to train models?

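It depends on the service and the plan. On OpenAI's consumer ChatGPT plans (Free and Plus), conversations may be used to improve its models unless you turn this off in Data Controls. OpenAI states that data sent through its API and business plans is not used for training by default. Anthropic also distinguishes between its commercial offerings, including the API, which are not used for training by default, and consumer Claude accounts, where a privacy setting controls it. These policies change, so always check the current data policy of the service you use and, when in doubt, never share sensitive information.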

What should I do if my company doesn't have security policies for using AI?

Propose creating an internal policy. In the meantime, follow the principle of maximum caution: treat all company information as confidential. Don't share proprietary code, customer data, credentials, or financial information with external AI tools. If you need to use AI for sensitive tasks, suggest self-hosted or local tools such as Ollama or LM Studio, which run models on your own hardware so data stays local.

Is it safe to use AI extensions in my code editor (Copilot, Codeium, etc.)?

AI extensions like GitHub Copilot send fragments of your code to their servers to generate suggestions. Whether that code is retained or used to train models depends on your plan and settings, so review GitHub's documentation and your organization's Copilot policies. Either way, the code you send as context may accidentally include credentials or sensitive data. Exclude sensitive files (.env, config files) from the extension's context where it supports this, and always review suggestions before accepting them.

How can I verify if my repository has exposed credentials?

Use tools like truffleHog, gitleaks, or git-secrets to scan the complete history of your repository. GitHub also has an automatic "secret scanning" system that detects tokens and keys from more than 200 services. If you find exposed credentials, it's not enough to remove them in a new commit: you must rotate them immediately, as they remain in git history. To remove them from history, use tools like git-filter-repo or BFG Repo-Cleaner.