Managing Credentials in Production

Costly mistake

A common mistake I often see junior data scientists make, especially those who rarely work in a production environment, is hardcoding access tokens and credentials directly in a Python script, as in the example below. This is bad practice: when we push the code to GitHub, the credentials are pushed along with it, and anyone who gains access to the repository (including its commit history) can use them.

# This is our source code Python file

ID = "test123"
PASSWORD = "test123"

Use .env files

A better approach is to create a .env file in the working directory and keep all our credentials there.

# This is our .env file

ID="test123"
PASSWORD="test123"

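We can then load these variables into our script with the python-dotenv package (pip install python-dotenv). Its load_dotenv() function reads the .env file and adds its entries to the process environment, where os.getenv can read them.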

# This is our source code Python file

from dotenv import load_dotenv
import os

load_dotenv()

storage_account_name = os.getenv("ID")
storage_account_password = os.getenv("PASSWORD")
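One caveat: os.getenv returns None when a variable is missing, so a typo in the .env file may only surface much later as a confusing authentication error. A small helper that fails fast can make this explicit. The sketch below assumes that an empty value is just as unusable as a missing one; require_env and MissingCredentialError are hypothetical names, not part of python-dotenv.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required environment variable is not set (hypothetical)."""


def require_env(name):
    """Return the value of an environment variable, failing loudly if missing or empty."""
    value = os.getenv(name)
    if not value:
        # Report only the variable name, never a secret value
        raise MissingCredentialError(
            f"Environment variable {name!r} is not set; check your .env file."
        )
    return value
```

With this helper, the lines above would become storage_account_name = require_env("ID") and storage_account_password = require_env("PASSWORD"). Note that the error message names the variable but never prints its value, so the credential does not leak into logs.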

We can load the same credentials into our bash scripts as well.

#!/bin/bash

# Source the .env file; "set -a" exports every variable it defines,
# so programs launched from this script can see them too
set -a
source <file_path>/.env
set +a

# Load credentials
storage_account_name="$ID"
storage_account_password="$PASSWORD"

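The final step is to add .env to the .gitignore file. Git does not track files that match a pattern in .gitignore, so the .env file will not be committed or pushed to GitHub and will stay only on the machines where we created it. Note that .gitignore only affects untracked files: if .env was already committed, we need to remove it from the index with git rm --cached .env and rotate the exposed credentials, since they remain in the repository history.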

# This is our .gitignore file

.env
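Before the first commit, it can help to confirm that this entry is actually in place, for example in a CI job or pre-commit hook. The helper below is a hypothetical, minimal sketch that only looks for an exact .env (or /.env) line in the repository's top-level .gitignore. It does not implement full gitignore semantics such as wildcards, "!" negation, or nested .gitignore files. For a definitive answer, git check-ignore -v .env asks Git itself which rule, if any, ignores the file.

```python
from pathlib import Path


def env_is_gitignored(repo_root=".", filename=".env"):
    """Return True if repo_root/.gitignore has a line that exactly ignores filename.

    Simplified sketch (hypothetical helper): matches only exact entries such
    as ".env" or "/.env"; wildcards, negation, and nested .gitignore files
    are not considered.
    """
    gitignore = Path(repo_root) / ".gitignore"
    if not gitignore.is_file():
        return False
    for raw_line in gitignore.read_text().splitlines():
        line = raw_line.strip()
        if line in (filename, "/" + filename):
            return True
    return False
```

A hook could then stop early with if not env_is_gitignored(): raise SystemExit(".env is not listed in .gitignore").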